CoreML and friends

There were a lot of announcements at Apple's WWDC, but what do they mean for your users? The first thing to mention is that everything announced this year was already technically doable. The difference lies in how quickly you can accomplish those things, and in how they affect your users. The machine learning frameworks Apple has released are mainly focused on bringing machine learning to developers who aren't experts in the field, while still allowing experts to bring extra complexity to their products.

Apple's machine learning model

Apple's machine learning effort is, at its core, backed by a model that you include in your app. This model is prepared and trained before you release your app and can be updated every time you update your app. All the machine learning work is done solely on the user's device, using the device's own chip. There are pros and cons to this approach. The most important advantage is that it protects your users' privacy: the data never leaves the device, which greatly reduces the chance of it ending up somewhere your users didn't intend it to.
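
To make this concrete, here is a minimal sketch of what on-device inference with a bundled CoreML model can look like in Swift. The model name "DogClassifier" and its "image" input are hypothetical placeholders; the real names come from the .mlmodel file you add to your Xcode project.

```swift
import Foundation
import CoreML
import CoreVideo

// A minimal sketch of on-device inference with a bundled Core ML model.
// "DogClassifier" and the "image" input name are hypothetical placeholders.
func classify(pixelBuffer: CVPixelBuffer) throws -> MLFeatureProvider {
    // Xcode compiles DogClassifier.mlmodel into DogClassifier.mlmodelc inside the app bundle.
    guard let modelURL = Bundle.main.url(forResource: "DogClassifier", withExtension: "mlmodelc") else {
        throw CocoaError(.fileNoSuchFile)
    }
    let model = try MLModel(contentsOf: modelURL)
    let input = try MLDictionaryFeatureProvider(
        dictionary: ["image": MLFeatureValue(pixelBuffer: pixelBuffer)])
    // The prediction runs entirely on the device; nothing is sent over the network.
    return try model.prediction(from: input)
}
```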

The most important disadvantage is processing power. One of the reasons machine learning has made such great strides in recent years is the increase in available processing power. If you can dispatch a machine learning task to a cloud machine, you can, in theory, have endless processing power available if your pockets are deep enough. There is, however, a caveat to this approach: you have to account for the latency of transferring the "question" to the cloud and getting the "answer" back. At 25 frames per second, for example, you have only 40 milliseconds per frame, which a network round trip alone will typically exceed. This makes machine learning on live video infeasible unless you use an on-device approach like Apple's.

One of the most important takeaways from this year’s releases is that the ease of use has been significantly improved.
Peter Ringset

New tools for the sound use of machine learning

Apple has focused a lot on tools this year and is releasing new tools that make it possible to use machine learning in ways that are technically sound, yet surprisingly simple for people who don't know the nitty-gritty details. In last year's release, they included CoreML, the framework for embedding models in your apps. The assumption then was that the machine learning community would be happiest continuing to use the tools and methods they had been using for years, and simply converting the resulting models to the CoreML format.

This left developers without machine learning competence in a difficult situation. Either they had to use a stock model found online, usually running hundreds of megabytes in size, or they could try to create a model themselves, which typically meant hiring data scientists, who can get pretty expensive. This year, however, a few new tools are arriving to help developers create these models.

These tools allow developers with limited in-depth knowledge of machine learning to develop and train models that are useful and perform well.
Peter Ringset

Cutting-edge tech

Deep learning is perhaps the most important field within modern machine learning. Simply put, deep learning means really big and complicated models that are computationally intensive both to prepare and train before you ship your app and to use in production. Deep learning is widespread in the image processing domain, for instance on live video where you analyse 25 frames per second. Bigger and more complicated models also take up more space when we include them in our apps, so there is often a trade-off between model size and speed on the one side, and accuracy on the other. Generally, even the small models take up quite some space, sometimes upwards of 100 MB; a model with 25 million weights stored as 32-bit floating point numbers already comes to about 100 MB.

One of the things that impressed us with the announcements at WWDC was how forward-looking Apple is in their inclusion of new machine learning technology.
Peter Ringset

Transfer learning for training models

One of the things announced at WWDC is that Apple's new frameworks for creating and training models can use a technique called transfer learning, which has become more and more popular in the machine learning community in recent years. The technique is based on taking an already trained model and training it just a little more on your own data so that it performs well for your task. If you, for instance, want to train a model that can classify different breeds of dogs, you could start with a model that can classify different types of animals (dogs, cats, horses, etc.). This base model already has a pretty good concept of what a dog looks like, and by using it as a starting point, we can train a better model in a shorter amount of time.

An example of transfer learning. The existing layers are from an already trained model, while your layers are just placed on top of that model to tailor it to your needs.
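
For developers, the practical entry point to this is Apple's new Create ML framework, one of the tools announced this year. Here is a minimal sketch, assuming a folder of training images with one sub-folder per dog breed; the file paths are made up for illustration, and Create ML does the transfer learning on top of an Apple-provided base model for you.

```swift
import CreateML
import Foundation

// A minimal sketch of training an image classifier with Create ML in a macOS playground.
// Assumed directory layout: one sub-folder per dog breed, each full of example photos.
let trainingDir = URL(fileURLWithPath: "/Users/me/DogBreeds/Training")

// Create ML only trains the task-specific layers; the heavy lifting is done by a
// base model that Apple provides.
let classifier = try MLImageClassifier(trainingData: .labeledDirectories(at: trainingDir))

// Export the trained model so it can be dropped into an Xcode project.
try classifier.write(to: URL(fileURLWithPath: "/Users/me/DogBreedClassifier.mlmodel"))
```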

Training models with Apple's frameworks

When you use Apple's frameworks to train models like this, your model can be built on top of a base model that Apple already includes on the device (for its own needs), and your app only ships the last part of the model, the part that is specific to your app. Some of the included base models are pretty impressive, cutting-edge models, like for instance the YOLO model. This means that the size of your model, and consequently the size of your app, can be reduced to a fraction of what it would otherwise be, while maintaining all the performance and accuracy you could get from a full-size model. In some cases, you can even gain performance and accuracy, since the base model is tailored to run on a mobile device.

Number quantisation: the numbers on the top scale have been rounded off to four distinct values on the bottom scale. This allows us to use fewer bytes to store the numbers.

Model quantisation for size reduction

Another thing introduced this year is the possibility of using a technique called model quantisation. A machine learning model is typically a big collection of numbers that are carefully calculated during training. These numbers are usually decimal numbers that each require a certain number of bytes to store. With deep learning models, the model gets so big that using fewer bytes per number gives a considerable size reduction, and model quantisation is a way to achieve this. By analysing the statistics of the numbers, it finds a small set of key values and rounds each number off to the nearest one, so that all of the numbers in the model can be represented using only a few bits each. This can lead to a dramatic reduction in model size. Model quantisation can, however, affect the accuracy of the model, so it is essential to test the quantised model against the original one to make sure we haven't lost too much accuracy.
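
To illustrate the idea, here is a toy sketch of linear quantisation in Swift, mapping 32-bit floating point weights onto 256 levels so that each one fits in a single byte. This is a simplified illustration of the principle, not Apple's actual implementation, and the function names are made up.

```swift
import Foundation

// Toy linear quantisation: map each 32-bit float weight onto one of 256 levels,
// so it can be stored in a single byte instead of four.
func quantise(_ weights: [Float]) -> (levels: [UInt8], minValue: Float, step: Float) {
    let minValue = weights.min() ?? 0
    let maxValue = weights.max() ?? 0
    let step = (maxValue - minValue) / 255           // spacing between neighbouring levels
    let levels = weights.map { weight -> UInt8 in
        step == 0 ? 0 : UInt8(((weight - minValue) / step).rounded())
    }
    return (levels, minValue, step)
}

// To run the model, the stored bytes are mapped back to approximate float values.
func dequantise(_ levels: [UInt8], minValue: Float, step: Float) -> [Float] {
    return levels.map { minValue + Float($0) * step }
}
```

Stored this way, the weights take up a quarter of the space of the 32-bit originals, at the cost of each value being rounded to the nearest of 256 levels; that rounding error is exactly what needs to be checked against the original model's accuracy.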

Empathy for users

We at EGGS are always focused on our users, and we think all of this is a massive step in the right direction. The download size of an app using these techniques is significantly reduced, meaning users don't have to burn through their data plan to download your app, or delete music or precious photos from their device to make room for it.

One of the developers we talked to at WWDC told us about an app they were working on. One of its features let the user take pictures of their belongings with the device's camera, and machine learning would then catalogue them in the app. It's a pretty straightforward feature, but it ended up increasing the footprint of their app by over 100 MB, because the model had to be that large to categorise all the different categories properly. The two new features described above could reduce that model to a fraction of its original size. This is something we think developers and companies delivering apps should keep in mind.

Nothing exists in a vacuum, and it’s important to consider the system that the app is supposed to live on and how it interacts with other parts of the system — without compromising itself or the other parts.
Peter Ringset
Here is a demo of me using a machine learning object recognition model on a live video stream from my mobile phone's camera.
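
The general pattern behind a demo like this is to feed camera frames from AVFoundation into a Vision request that wraps the CoreML model. Below is a rough sketch with the camera session setup and error handling omitted; how the model is obtained and how the results are used are stand-ins for whatever your app actually does.

```swift
import AVFoundation
import Vision
import CoreML

// A rough sketch of running a Core ML classifier on live camera frames with Vision.
// Camera session setup and error handling are omitted for brevity.
final class FrameClassifier: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    private let request: VNCoreMLRequest

    init(model: MLModel) throws {
        let visionModel = try VNCoreMLModel(for: model)
        request = VNCoreMLRequest(model: visionModel) { request, _ in
            // For a classification model, the results are VNClassificationObservation values.
            guard let best = request.results?.first as? VNClassificationObservation else { return }
            print("Saw \(best.identifier) with confidence \(best.confidence)")
        }
        super.init()
    }

    // Called by AVFoundation for every captured video frame, typically 25–30 times per second.
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
        try? handler.perform([request])
    }
}
```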