A review on TinyML: State-of-the-art and prospects
Machine learning has become an indispensable part of the existing technological domain. Edge computing and the Internet of Things (IoT) together present a new opportunity to apply machine learning techniques on resource-constrained embedded devices at the edge of the network. Conventional machine learning requires an enormous amount of power to make predictions. The embedded machine learning paradigm, TinyML, aims to shift such workloads from traditional high-end systems to low-end clients. Several challenges arise in this transition, such as maintaining the accuracy of learning models, providing train-to-deploy facilities on resource-frugal tiny edge devices, optimizing processing capacity, and improving reliability. In this paper, we present an intuitive review of such possibilities for TinyML. First, we present the background of TinyML. Second, we list the tool sets that support TinyML. Third, we present key enablers for improving TinyML systems. Fourth, we present the state of the art in frameworks for TinyML. Finally, we identify key challenges and prescribe a future roadmap for mitigating several research issues in TinyML.
Deep Learning on Microcontrollers: A Study on Deployment Costs and Challenges
Microcontrollers are an attractive deployment target due to their low cost, modest power usage and abundance in the wild. However, deploying models to such hardware is nontrivial due to a small amount of on-chip RAM (often < 512KB) and limited compute capabilities. In this work, we delve into the requirements and challenges of fast DNN inference on MCUs: we describe how the memory hierarchy influences the architecture of the model, expose often under-reported costs of compression and quantization techniques, and highlight issues that become critical when deploying to MCUs compared to mobiles. Our findings and experiences are also distilled into a set of guidelines that should ease the future deployment of DNN-based applications on microcontrollers.
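To make the memory constraint concrete, the following is a minimal sketch of symmetric per-tensor int8 quantization together with a footprint check against the 512 KB figure quoted above. It is an illustration only and is not taken from the paper: the function names, the single-scale scheme, and the treatment of the budget as one memory pool are all assumptions. On many microcontrollers weights sit in flash while activations occupy RAM, so a real budget would be split accordingly.

```python
# Hypothetical sketch: symmetric per-tensor int8 quantization and a
# memory-budget check. Names and the single-pool budget are assumptions.

RAM_BUDGET_BYTES = 512 * 1024  # assumed budget, taken from the "< 512KB" figure


def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one symmetric scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the int8 representation."""
    return [q * scale for q in quantized]


def footprint_bytes(num_params, bytes_per_param):
    """Storage needed for the parameters alone (activations not counted)."""
    return num_params * bytes_per_param


def fits_in_budget(num_params, bytes_per_param, budget=RAM_BUDGET_BYTES):
    """True if the parameters alone fit within the assumed budget."""
    return footprint_bytes(num_params, bytes_per_param) <= budget


if __name__ == "__main__":
    q, s = quantize_int8([1.27, -0.64, 0.0])
    print(q, s)
    print(fits_in_budget(200_000, 4))  # float32 parameters
    print(fits_in_budget(200_000, 1))  # int8 parameters
```

In this sketch a 200,000-parameter model needs 800,000 bytes as float32, which exceeds the assumed budget, but only 200,000 bytes as int8. The price is a rounding error of at most half the scale per weight, which is one of the costs of quantization that the abstract refers to.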
Machine Learning for Microcontroller-Class Hardware - A Review
Advancements in machine learning have opened a new opportunity to bring intelligence to low-end Internet-of-Things nodes such as microcontrollers. Conventional machine learning deployment has a high memory and compute footprint, hindering its direct use on ultra-resource-constrained microcontrollers. This paper highlights the unique requirements of enabling onboard machine learning for microcontroller-class devices. Researchers use a specialized model development workflow for resource-limited applications to ensure that the compute and latency budget stays within the device limits while still maintaining the desired performance. We characterize a closed-loop, widely applicable workflow of machine learning model development for microcontroller-class devices and show that several classes of applications adopt a specific instance of it. We present both qualitative and numerical insights into different stages of model development by showcasing several use cases. Finally, we identify the open research challenges and unsolved questions that demand careful consideration moving forward.
Machine Learning for Space Applications on Embedded Systems
As space missions continue to increase in complexity, the operational capabilities and amount of gathered data demand ever more advanced systems. Currently, mission capabilities are often constrained by the link bandwidth as well as on-board processing capabilities. A large number of commands and complex ground station systems are required to allow spacecraft operations. Thus, methods to allow more efficient use of the bandwidth, computing capacity and increased autonomous capabilities are of strong research interest. Artificial Intelligence (AI), with its vast areas of application scenarios, allows for these challenges and more to be tackled in the spacecraft design. Particularly, the flexibility of Artificial Neural Networks as Machine Learning technology provides many possibilities. For example, Artificial Neural Networks can be used for object detection and classification tasks. Unfortunately, the execution of current Machine Learning algorithms consumes a large amount of power and memory resources. Additionally, the qualification of such algorithms remains challenging, which limits their possible applications in space systems. Thus, an increase in efficiency in all aspects is required to further enable these technologies for space applications. The optimisation of the algorithm for System on Chip (SoC) platforms allows it to benefit from the best of a generic processor and hardware acceleration. This increased complexity of the processing system shall allow broader and more flexible applications of these technologies with a minimum increase of power consumption. As commercial off-the-shelf embedded systems are commonly used in NewSpace applications and such SoCs are not yet available in a qualified manner, the deployment of Machine Learning algorithms on such devices has been evaluated. For deployment of machine learning on such devices, a Convolutional Neural Network model was optimised on a workstation.
Then, the neural network is deployed with Xilinx's Vitis AI onto an SoC which includes a powerful generic processor as well as the hardware programming capabilities of a Field Programmable Gate Array (FPGA). This result was evaluated based on relevant performance and efficiency parameters and a summary is given in this thesis. Additionally, a tool utilising a different approach was developed. With a high-level synthesis tool, the hardware description language of an accelerated linear algebra optimised network is created and directly deployed into FPGA logic. The implementation of this tool was started, and the proof of concept is presented. Furthermore, existing challenges with the auto-generated code are outlined and future steps to automate and improve the entire workflow are presented. As the two workflows are very different and thus aim for different usage scenarios, both are outlined along with their benefits and disadvantages.
Machine Learning: A Review of Learning Types
In this paper, various machine learning techniques are discussed. These algorithms are used for many applications which include data classification, prediction, or pattern recognition. The primary goal of machine learning is to automate human assistance by training an algorithm on relevant data. This paper should also serve as a collection of various machine learning terminology for easy reference.
A Primer for tinyML Predictive Maintenance: Input and Model Optimisation
In this paper, we investigate techniques used to optimise tinyML-based Predictive Maintenance (PdM). We first describe PdM and tinyML and how they can provide an alternative to cloud-based PdM. We present the background behind deploying PdM using tinyML, including commonly used libraries, hardware, datasets and models. Furthermore, we show known techniques for optimising tinyML models. We argue that an optimisation of the entire tinyML pipeline, not just the actual models, is required to deploy tinyML-based PdM in an industrial setting. To provide an example, we create a tinyML model and provide early results of optimising the input given to the model.
TinyML: From Basic to Advanced Applications
TinyML aims to implement machine learning (ML) applications on small, low-powered devices like microcontrollers. Typically, edge devices need to be connected to data centers in order to run ML applications. However, this approach is not possible in many scenarios, such as when connectivity is lacking. This project investigates the tools and techniques used in TinyML, the constraints of using low-powered devices, and the feasibility of implementing advanced machine learning applications on microcontrollers.
To test the feasibility of implementing ML applications on microcontrollers, three TinyML programs were developed. The first is a basic keyword-spotting application able to recognize a set of words. The second is a program for training a neural network model on a microcontroller following an online learning approach. The third is a federated learning program able to train a single global model by aggregating local models trained on multiple microcontrollers. The results show optimal performance in all three applications once deployed on microcontrollers. The development of basic TinyML applications is straightforward when the machine learning pipeline is understood. However, the development of advanced applications turned out to be very complex, as it requires a deep understanding of both machine learning and embedded systems.
These results prove the feasibility of successfully implementing advanced ML applications on microcontrollers, and thus, unveil a bright future for TinyML.
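The aggregation step of the federated learning program described above can be illustrated with a short sketch. The abstract does not say which aggregation rule the project used. The code below assumes sample-weighted averaging of flat weight vectors (the rule commonly known as federated averaging), and the name `federated_average` and the data layout are hypothetical.

```python
# Hypothetical sketch of one aggregation round in federated learning.
# Assumes each device reports a flat list of weights plus the number of
# local training samples; the averaging rule itself is an assumption.


def federated_average(local_weights, sample_counts):
    """Combine local models into one global model, weighted by sample count."""
    if not local_weights or len(local_weights) != len(sample_counts):
        raise ValueError("need one sample count per local model")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(local_weights[0])
    global_weights = [0.0] * dim
    for weights, count in zip(local_weights, sample_counts):
        if len(weights) != dim:
            raise ValueError("all local models must have the same size")
        for i, value in enumerate(weights):
            # Each device contributes in proportion to its share of the data.
            global_weights[i] += value * count / total
    return global_weights


if __name__ == "__main__":
    # Two devices: the second trained on three times as many samples.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Weighting by sample count keeps a device with little local data from pulling the global model as strongly as a device with a lot of it. A plain unweighted mean would be the simpler alternative when all devices see similar amounts of data.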