Object recognition on the edge
Using PyTorch, ONNX, WebAssembly, and the sclbl-webnode to deploy object recognition models directly in the browser.
Object recognition has been a challenging Machine Learning task for decades. However, remarkable progress has been made, with some even suggesting that we are done with object recognition. I think we are never done, but by now we do have some remarkably well-performing pre-trained models that are directly available to us. More and more of these pre-trained models are available in ONNX format. In this post we will show how we used PyTorch, ONNX, WebAssembly, and the sclbl-webnode to deploy object recognition directly in the browser. The end result is an interactive demo that classifies an image entirely in the browser.
Note that this is running locally: the model itself is retrieved only once from the servers at Scailable and the image never leaves the user's machine (with some obvious privacy advantages). You can play around with the application at https://www.scailable.net/demo/cifar/
How it was built
Although object recognition on the edge might sound like a hard task, building the interactive demo took less than an hour.
1. First, we took a pre-trained object recognition model. We picked a version of ResNet, pre-trained on the CIFAR10 dataset. You can find the model here.
2. The original model came in PyTorch (.pth) format, so we converted it to ONNX…
“ONNX defines a common set of operators — the building blocks of machine learning and deep learning models — and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers”.
3. Next, we uploaded the ONNX file to Scailable to convert the model to a WebAssembly binary.
4. After conversion, the WebAssembly binary is available as a REST endpoint, and it can be downloaded for deployment on various edge devices (we have run AI models on an ESP32 MCU board using WebAssembly). For deployment in the browser we made use of the sclbl-webnode: an open-source JavaScript project implementing an extremely small WebAssembly runtime.
5. While finalizing the demo took a bit of additional .js to make the bar plot at the bottom, the actual call to do the object recognition took very little code:
// JSON input: the base64 image plus preprocessing hints ("norm" appears to interleave per-channel mean and std).
let inputData = "{\"input\": \"" + base64string + "\", \"type\":\"image\", " +
    "\"norm\":[0.4914, 0.2023, 0.4822, 0.1994, 0.4465, 0.2010], " +
    "\"width\":32, \"height\":32, \"channels\":3, \"crop\":true }";

sclblRuntime.run(inputData).then(function(response) {
    document.getElementById("output-integers").innerHTML = response;
    showResults(response);
}, function(error) {
    document.getElementById("output-integers").innerHTML = "An error occurred (see console for more details): " + error;
});
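The hand-escaped JSON string in the call above is easy to get wrong. As a minimal sketch, the same payload could be assembled from a plain object with `JSON.stringify`. The helper names `dataUrlToBase64` and `buildInput` are hypothetical and not part of the sclbl-webnode; the field names and values are simply copied from the call above. Reading `norm` as interleaved per-channel mean and standard deviation is an assumption, based on the values matching the commonly used CIFAR-10 statistics.

```javascript
// Hypothetical helper: strip the "data:<mime>;base64," prefix that
// FileReader.readAsDataURL() puts in front of the base64 payload.
function dataUrlToBase64(dataUrl) {
    const comma = dataUrl.indexOf(",");
    return dataUrl.startsWith("data:") && comma !== -1
        ? dataUrl.slice(comma + 1)
        : dataUrl; // already a bare base64 string
}

// Hypothetical helper: build the JSON input string used in the post.
function buildInput(base64string) {
    return JSON.stringify({
        input: base64string,
        type: "image",
        // Assumed layout: [mean_R, std_R, mean_G, std_G, mean_B, std_B]
        norm: [0.4914, 0.2023, 0.4822, 0.1994, 0.4465, 0.2010],
        width: 32,
        height: 32,
        channels: 3,
        crop: true
    });
}

// Example with a made-up (truncated) data URL.
const exampleInput = buildInput(dataUrlToBase64("data:image/png;base64,iVBORw0KGgo="));
```

The resulting string is equivalent JSON rather than a byte-for-byte copy of the original (whitespace differs and `0.2010` is serialized as `0.201`), so it should be accepted wherever the original string is, but that has not been verified against the runtime.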
You can find details for going from PyTorch to ONNX for this specific model here, while the sclbl-webnode is explained in detail here.
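The `showResults` function that draws the bar plot is not shown in this post, and neither is the exact format of the response. As a rough sketch of what such post-processing might involve, assume the ten raw class scores have already been extracted from the response into an array (that extraction step is an assumption and is left out here). The names `softmax` and `rankClasses` are hypothetical; the label order is the standard CIFAR-10 class order.

```javascript
// Standard CIFAR-10 class order.
const CIFAR10_LABELS = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck"
];

// Numerically stable softmax: turns raw scores into probabilities.
function softmax(scores) {
    const max = Math.max(...scores);
    const exps = scores.map(s => Math.exp(s - max));
    const sum = exps.reduce((a, b) => a + b, 0);
    return exps.map(e => e / sum);
}

// Hypothetical helper: pair each probability with its label and sort
// from most to least likely, ready to feed into a bar plot.
function rankClasses(scores) {
    if (scores.length !== CIFAR10_LABELS.length) {
        throw new Error("Expected " + CIFAR10_LABELS.length + " scores");
    }
    return softmax(scores)
        .map((p, i) => ({ label: CIFAR10_LABELS[i], probability: p }))
        .sort((a, b) => b.probability - a.probability);
}
```

If the model already outputs probabilities rather than raw scores, the softmax step would be unnecessary.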
Why WebAssembly (as opposed to the ONNX.js runtime)?
While the demo clearly works like a charm, one could wonder why we go through the trouble of transpiling the ONNX model to a WebAssembly binary: can one not just use the JavaScript ONNX runtime (ONNX.js)?
Well, while ONNX.js (and, similarly, the ONNX runtimes available for other edge devices) might seem easier because it allows you to skip the WebAssembly step, ONNX runtimes are often both much larger and slower in execution. In this specific example, the .wasm and the .onnx files used to specify the model (in the .wasm case actually including the necessary operators) both take about 1 MB (1069 KB) to store. The runtimes, however, differ vastly in size: the onnx.min.js as it can be found here weighs in at over 430 KB, while the sclbl-webnode-min.js (here) weighs in at under 10 KB. The execution times also differ substantially: we find ~164 milliseconds (estimated based on 1000 predictions) for the .wasm implementation, and ~420 milliseconds for the .onnx implementation.
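The post does not show how the timings were collected. One plausible way to obtain such an estimate is to average the wall-clock time over many sequential calls, as sketched below. The function name `meanLatencyMs` is hypothetical, and the two ratios are simply arithmetic on the figures quoted above.

```javascript
// Use the high-resolution timer where available, else fall back to Date.
const now = () => (typeof performance !== "undefined" ? performance.now() : Date.now());

// Hypothetical helper: average wall-clock time per call of an async
// prediction function, measured over `runs` sequential calls.
async function meanLatencyMs(predict, input, runs = 1000) {
    const start = now();
    for (let i = 0; i < runs; i++) {
        await predict(input); // sequential, so calls do not overlap
    }
    return (now() - start) / runs;
}

// Ratios implied by the numbers quoted in the text.
const runtimeSizeRatio = 430 / 10; // onnx.min.js vs sclbl-webnode-min.js (KB): at least 43x
const latencyRatio = 420 / 164;    // .onnx vs .wasm timings (ms): roughly 2.6x
```

With the runtime from the demo, a call might look like `meanLatencyMs(x => sclblRuntime.run(x), inputData)`. Since the size figures are quoted as "over 430 KB" and "under 10 KB", the factor of 43 is a lower bound.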
The size and speed gains (and, admittedly, the usability gains, as for many edge devices the ONNX runtimes are far less mature than WebAssembly runtimes) can be crucial for many applications. They are also easy to understand:
- An ONNX runtime effectively supports all of ONNX. The input to an inference task on the edge will be a model specification (the .onnx file) and the data (e.g., the image); both are processed to generate inferences. To do so, the runtime needs to implement all the possible ONNX operators.
- An inference task transpiled to WebAssembly running in a WebAssembly runtime only takes the data as input. It only contains the operators necessary for the task at hand. The runtime runs general WebAssembly, and does not need to process the ONNX file.
Thus, while the ONNX runtime is a general tool for running all kinds of models, the process of transpiling models to WebAssembly provides a specialized and highly optimized binary for each specific model. As the transpiling only needs to happen once, the benefits on the edge are pretty striking.
“WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable compilation target for programming languages, enabling deployment on the web for client and server applications.”
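To make the "stack-based virtual machine" part concrete, here is a minimal sketch that uses only the standard `WebAssembly` JavaScript API and is unrelated to the Scailable binaries. It loads a hand-assembled 41-byte module exporting a single `add` function: the function body pushes its two arguments onto the stack and `i32.add` pops them and pushes their sum.

```javascript
// A complete WebAssembly module, written out byte by byte.
const wasmBytes = new Uint8Array([
    0x00, 0x61, 0x73, 0x6d,             // magic: "\0asm"
    0x01, 0x00, 0x00, 0x00,             // version 1
    // Type section: one function type (i32, i32) -> i32
    0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
    // Function section: one function, using type 0
    0x03, 0x02, 0x01, 0x00,
    // Export section: export function 0 under the name "add"
    0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,
    // Code section: local.get 0, local.get 1, i32.add, end
    0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b
]);

// Synchronous compilation is fine for a module this small.
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule);
const add = wasmInstance.exports.add;
```

Here `add(2, 3)` returns `5`. For a model-sized binary in the browser, the asynchronous `WebAssembly.instantiate` or `WebAssembly.instantiateStreaming` would be the usual choice instead of the synchronous constructors.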
Wrap up
I hope you enjoyed this post and/or enjoyed playing around with the interactive demo. Together with a Scailable account, the steps above should allow you to push your own AI and Machine Learning models to the edge.
Disclaimer
It’s good to note my own involvement here: I am a professor of Data Science at the Jheronimus Academy of Data Science and one of the co-founders of Scailable. Thus, no doubt, I have a vested interest in Scailable; I have an interest in making it grow such that we can finally bring AI to production and deliver on its promises. The opinions expressed here are my own.