The making of “Update AI/ML models OtA”

Deploying trained AI models on edge devices might seem challenging. However, using minimal WebAssembly runtimes and automatic conversion from ONNX to WebAssembly, modular AI/ML model deployment Over the Air (OtA) to pretty much any edge device is possible.

Maurits Kaptein
5 min read · Jan 15, 2021

We received a number of responses after posting the video above on LinkedIn (find the original on YouTube). The video shows how an edge device equipped with a camera can be used to recognize hand-written digits and characters. The crux of the video is the demonstration of modular AI deployment: in the first ~20 seconds we demonstrate that the deployed model is able to recognize digits but fails on characters. Next, the video demonstrates an Over the Air (OtA) update of the deployed model such that the device can also recognize characters. This is made possible by installing a small WebAssembly microcontainer (a virtual CPU, really) on the device that supports modular deployment of trained AI models in WebAssembly format. In this post we detail, step by step, how this video was made.

Decomposing the video: the Making of…

Let’s decompose the video chronologically and discuss the hardware, the AI models, and the tooling involved:

  • 0:00: The video starts with a shot of the edge device. The device involved is a simple Raspberry Pi, equipped with an off-the-shelf camera and screen. This device is relatively large compared to many other (I)IoT devices. However, what is shown in the video can, depending on the size of the trained AI model, also be done on small MCUs. Note that the only cable visible in the video merely provides power to the device.
  • 0:07: At this point in the video we start demonstrating the performance of the AI model running on the device. The device is equipped with a simple digit recognition model (see below) in Scailable WASI format, running in a small WebAssembly runtime. For this video we used the Scailable golang-medium runtime (which consumes a bit under 4 MB). Note that the c-minimal runtime, which provides the same modular AI deployment functionality (but without logging, OtA updates, and model testing), weighs in at only 64 KB. A custom application running on the device takes pictures every 1/10th of a second and feeds these images to the runtime containing the AI model (a hedged sketch of such a loop follows this list).
  • 0:09: Here we clearly demonstrate the performance of the model, which is trained on the famous MNIST dataset. Model training details can be found in this Jupyter Notebook. This same model, using the Scailable web node (technically, a WASI runtime implemented in JavaScript), can be tested here. As can be seen in the notebook, the trained model is stored in ONNX format and subsequently, using the Scailable Platform, compiled to WebAssembly (an illustrative sketch of such an export step also follows this list). The resulting binary weighs in at 313 KB.
  • 0:15: At this point it becomes clear that while the digits 7, 4, and 3 are correctly identified, the character E fails. Note that we render a “?” when the inference is uncertain: whenever the model’s recognition certainty is below 0.6 for all digits, we show the question mark. This logic is implemented in the custom application that also feeds the images to the model; the first sketch after this list includes this thresholding step.
  • 0:26: After establishing the poor performance on characters, we move to the Scailable Platform. We briefly show our login to the platform; however, our intent was not to show the UI of the Platform but rather the functionality of seamless OtA deployment…
  • 0:29: …Things progress rapidly at this point in the video. First we show how the Scailable Platform allows users to manage their models. All the visible models have been converted to WebAssembly automatically, either by using the sclblpy package or by uploading a trained model in ONNX format to the Platform. Uploaded models can immediately be tested and consumed as a REST endpoint, in which case inference is executed in the Scailable cloud (a hypothetical example of such a REST call follows this list). The trained eye will notice that an alternative model for both digit and character recognition is available in the model list (see this Jupyter Notebook for training details). After compiling, this WebAssembly binary weighs in at 335 KB. The screenshot below demonstrates the model list.
Scailable model maintenance and testing. Models are compiled to WebAssembly automatically and can be consumed using REST calls. (Image by Author)
  • 0:33: After looking at the various models, we move to the Deploy tab of the Platform. Next, the demo device is selected (left column), the new character and digit recognition model is selected (middle column), and we press the “Deploy to Device” button (see screenshot below). At this point, the selected model is added to the assignments of the demo device. Since both models have the same alias (chars in this case), the newly assigned model overwrites the previously assigned one and will be accessible inside the on-device runtime under that alias.
Demonstrating AI model deployment using the Scailable Platform. (Image by author)
  • 0:46: Here we move back to the device. Admittedly, we hide the fact that the runtime only checks for new assignments once every x minutes (this interval is configurable); we simply make sure the new assignment has been retrieved by the runtime before moving forward. At this point in the video the new WebAssembly binary, this time encoding a model that can recognize both digits and characters, is received by the device (or, technically, by the runtime) Over the Air. Note that deploying this newly trained model does not require a restart of the device: deployment is fully modular and sandboxed. (The final sketch after this list pictures this polling behavior.)
  • 0:51: With the new model deployed, the functionality of the device has effectively changed: the video demonstrates how the OtA update allows the device to recognize both the digits (7, 4, and 3) and the written character E. Yay!
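
The custom application driving the camera and the runtime is not shown in the video. Below is a minimal sketch of what such a loop could look like; the camera and runtime interfaces (capture_grayscale, run_model) and the use of Python are assumptions, while the 1/10th-second capture interval and the 0.6 confidence threshold come straight from the description above.

```python
# Hypothetical sketch of the on-device application: capture a frame every
# 1/10th of a second, run inference, and render "?" when no class reaches
# the 0.6 confidence threshold. capture_grayscale() and run_model() are
# stand-ins for the actual camera and WebAssembly runtime interfaces.
import random
import time

CONFIDENCE_THRESHOLD = 0.6
CAPTURE_INTERVAL = 0.1  # seconds; the application takes ten pictures per second

def capture_grayscale():
    """Stand-in for the camera interface (returns a dummy 28x28 frame)."""
    return [[0.0] * 28 for _ in range(28)]

def run_model(frame):
    """Stand-in for inference inside the WebAssembly runtime."""
    return {c: random.random() for c in "0123456789"}  # dummy class scores

def classify(frame):
    """Pick the most confident class; fall back to '?' when uncertain."""
    scores = run_model(frame)
    label, confidence = max(scores.items(), key=lambda kv: kv[1])
    return label if confidence >= CONFIDENCE_THRESHOLD else "?"

while True:
    print(classify(capture_grayscale()))  # the device renders this on screen
    time.sleep(CAPTURE_INTERVAL)
```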
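
The exact training code lives in the linked Jupyter Notebooks; the sketch below only illustrates the general shape of the final step, i.e., storing a trained model in ONNX format so the Platform can compile it to WebAssembly. The network architecture, the use of PyTorch, and the file name are illustrative assumptions, not the notebooks’ actual contents.

```python
# Illustrative sketch: export a (hypothetical) trained MNIST classifier to
# ONNX. The resulting .onnx file is what gets uploaded to the platform,
# which then compiles it to a WebAssembly binary.
import torch
import torch.nn as nn

class DigitNet(nn.Module):
    """Tiny fully connected classifier for 28x28 grayscale images."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, 10),  # ten classes: the digits 0-9
        )

    def forward(self, x):
        return self.layers(x)

model = DigitNet()
# ... training loop omitted; see the linked notebooks for details ...
model.eval()

dummy_input = torch.randn(1, 1, 28, 28)  # batch of one 28x28 image
torch.onnx.export(model, dummy_input, "digits.onnx", opset_version=12)
```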
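
For completeness, the snippet below illustrates what consuming an uploaded model as a REST endpoint might look like. The endpoint URL, payload schema, and authentication header are all assumptions made for the sake of the example; the Platform’s actual REST interface may differ.

```python
# Hypothetical illustration of cloud inference via a REST call. The URL,
# payload format, and auth scheme are assumptions, not the documented API.
import requests

ENDPOINT = "https://example.scailable.net/models/digits-and-chars"  # assumed
API_TOKEN = "YOUR-API-TOKEN"  # assumed authentication scheme

payload = {"input": [0.0] * (28 * 28)}  # flattened 28x28 image (assumed schema)

response = requests.post(
    ENDPOINT,
    json=payload,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=10,
)
response.raise_for_status()
print(response.json())  # e.g., class probabilities returned by the model
```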
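
Finally, the assignment polling described at 0:46 could be pictured as a simple loop like the one below. Everything here is a stand-in: the actual runtimes are proprietary (written in Go and C), and the post deliberately leaves the check interval as “x minutes”.

```python
# Hypothetical picture of the OtA assignment polling. fetch_assignment() and
# swap_model() are stand-ins; the real runtime interfaces are not shown in
# the post, and the check interval below is an assumed placeholder value.
import time

CHECK_INTERVAL_SECONDS = 60  # assumed; the post only says "every x minutes"

def fetch_assignment():
    """Stand-in for the OtA call that asks the platform for new assignments."""
    return None  # e.g., ("chars", b"...wasm bytes...") when an update exists

def swap_model(alias, wasm_binary):
    """Stand-in for swapping the sandboxed model; no device restart needed."""
    print(f"replaced model under alias {alias!r}")

while True:
    assignment = fetch_assignment()
    if assignment is not None:
        swap_model(*assignment)
    time.sleep(CHECK_INTERVAL_SECONDS)
```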

Wrap up

With the Scailable Platform, fully modular, efficient, and secure deployment of trained AI models on virtually any edge device (or in the cloud) becomes easy. We think that efficient edge deployment of trained AI models is a necessity: in many use cases it is simply not feasible to send data back to a central cloud. WebAssembly provides a portable compilation target that enables seamless deployment in micro runtimes; we have been able to deploy DNNs on ESP32 devices using this setup.

Feel free to give it a try!

Disclaimer

It’s good to note my own involvement here: I am a professor of Data Science at the Jheronimus Academy of Data Science and one of the co-founders of Scailable. Thus, no doubt, I have a vested interest in Scailable: I want it to grow so that we can finally bring AI to production and deliver on its promises. The opinions expressed here are my own.

Note: The Scailable runtimes (e.g., the golang-medium runtime and the c-minimal runtime) introduced in this post are proprietary.

Maurits Kaptein

I am a professor at Tilburg University working on bandits, causality, and Bayesian inference with various applications. I also care about AI & ML deployment.