DESIGN & DEVELOP A SOLUTION FOR IOT GATEWAY PERFORMANCE BENCHMARKING

Opportunity and Scope

The Problem Statement

  • The company had built an Edge Gateway for IoT based solutions supporting multiple devices.
  • The Edge Gateway supported multiple protocols, including MQTT, Modbus, ZigBee, LoRa, HTTP and other common protocols.
  • These Gateways needed to be configured based on a host of parameters including types of sensors, protocols, number of sensors, analytics to be performed on the gateway / cloud and volume of data.
  • The number of gateways and their configurations were chosen on a trial-and-error basis, which took a lot of time and yielded only an approximate solution.

Solution Needed

  • The company wanted us to come up with a Machine Learning algorithm to measure the various parameters and suggest the best possible configurations and number of gateways that would be needed.
  • The solution needed to measure the experience attributes (such as CPU, memory and latency) of the IoT Gateway and Cloud in real time.
  • The solution also needed to predict, near-instantly, the performance attributes under varied workloads by creating an analysis model trained on previously recorded profiles.
  • The solution was also to provide end-to-end testing (Sensors, Gateway, Cloud, Actuation) and profiling of the IoT Gateway.

Our solution

  • Worked closely with Product Management and Engineering Architect to identify customer scenarios and component level interaction/integration tests.
  • Designed an IoT solution with multi-tiered architecture and REST API interface that integrates with all layers.
  • Used protocols such as ZigBee over UART, MQTT, and Modbus over TCP for the sensors.
  • Developed a single point responsive UI that controls Sensor, Gateway, Cloud & Actuation engine.
  • Developed a visualization UI to show the different experience attributes of IoT gateway based on different work-loads.
  • Developed a predictive analytics module which uses linear regression to generate the expected performance numbers as the workload varies, for each kind of scenario. This module learns from previously executed scenario results to generate the prediction model.
  • Visualization of previously executed workflows and the predicted performance numbers.
  • Unit, Integration and Systems testing of different components involved.
  • Used tools like Perf & Call Graphs to measure the IoT Gateway performance bottlenecks and made improvements in an automated way.

Solution Details for Machine Learning (ML)

Linear regression to predict system performance:

Why Linear Regression?

The choice of ML algorithm depends on the type of problem being solved and the relationship within the data. In this case, the performance numbers vary linearly with the inputs and the goal is prediction, so linear regression is the best-fit algorithm for this problem.

Types of sensors, number of sensors and gateways are considered independent variables (x), and the performance numbers, viz. CPU, memory and network latency, are considered dependent variables (y).

The regression model is a linear equation represented as y = mx + c, where x is the independent variable (there can be multiple such variables), m is its corresponding coefficient and c is a constant.
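
With several inputs, the same equation takes one coefficient per input. Written out, with x_1 ... x_n standing for inputs such as the number of sensors and the number of gateways (the subscripted symbols are notation introduced here for illustration only):

```latex
y = m_1 x_1 + m_2 x_2 + \dots + m_n x_n + c
```

One such equation would be fitted per predicted performance number (CPU, memory, latency).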

  • Data Collection: All possible combinations of available gateways, sensors and their respective quantities for a particular scenario are listed and run to collect the performance data.
  • This is done for every kind of scenario, and the collected data is stored in the DB for the ML module.
  • Validate & Train the Model: For each scenario, the ML module generates a linear regression model that takes all the provided sensors and gateways into account. This model is stored for future reference.
  • Test Model: When a user checks the system performance for a custom scenario, the ML module maps the requirement to the available models and provides the result. The user can then vary the numbers of sensors and gateways to see the predicted performance measurements on the fly.
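
The train-and-predict steps above can be sketched roughly as follows. The project itself used Scikit-Learn; this sketch uses plain Python least squares instead so that it stays self-contained. The function names, the scenario key "mqtt_temperature" and the run data are all hypothetical, and the numbers are synthetic values chosen for illustration, not measurements.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: inputs are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(rows, targets):
    """Least-squares fit; returns [c, m1, m2, ...]."""
    xs = [[1.0] + list(r) for r in rows]  # the leading 1.0 models the constant c
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coeffs, row):
    """Evaluate y = c + m1*x1 + m2*x2 + ... for one input row."""
    return coeffs[0] + sum(m * x for m, x in zip(coeffs[1:], row))


# Synthetic, illustrative runs: (number of sensors, number of gateways) -> CPU %
runs = [(10, 1), (20, 1), (20, 2), (40, 2), (40, 4)]
cpu = [6.0, 10.0, 7.0, 15.0, 9.0]

models = {"mqtt_temperature": fit(runs, cpu)}  # one stored model per scenario
estimate = predict(models["mqtt_temperature"], (30, 2))  # custom scenario
```

Keeping one fitted coefficient list per scenario means a custom request only needs a dictionary lookup and a handful of multiplications, which is what makes on-the-fly exploration cheap.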

Anomaly detection from performance numbers:

  • In the user/production environment, a gateway device may perform worse than expected because its resources are being used by something other than the sensors and their corresponding processes.
  • In this case, the previously built ML model of the required scenario for that specific gateway is used to predict the performance attributes, and the deviation from the actual numbers is calculated.
  • An anomaly is flagged if the deviation exceeds the acceptable standard deviation (SD), which is calculated during model generation.
  • A DoS attack can cause the system to hang for a certain period of time, stopping all ongoing processes in userland, including the anomaly detection process.
  • A kernel module is registered with the system to detect these kinds of attacks and raise an alarm accordingly.
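
The deviation check described above might look like the following minimal sketch. It assumes the acceptable SD is the standard deviation of the model's residuals on its training data; the function names and the `tolerance` multiplier are hypothetical and are not taken from the original implementation.

```python
import math


def residual_sd(actuals, predictions):
    """Standard deviation of (actual - predicted), computed at model generation."""
    residuals = [a - p for a, p in zip(actuals, predictions)]
    mean = sum(residuals) / len(residuals)
    return math.sqrt(sum((r - mean) ** 2 for r in residuals) / len(residuals))


def is_anomaly(actual, predicted, sd, tolerance=1.0):
    """Flag a reading whose deviation exceeds tolerance * sd (tolerance is assumed)."""
    return abs(actual - predicted) > tolerance * sd


# Illustrative values only: measured vs. predicted CPU % on the training runs
sd = residual_sd([10.0, 12.0, 9.0, 11.0], [11.0, 11.0, 10.0, 10.0])

flag_high = is_anomaly(14.0, 11.0, sd)    # deviation of 3.0 against an SD of 1.0
flag_normal = is_anomaly(11.5, 11.0, sd)  # deviation of 0.5 against an SD of 1.0
```

A larger `tolerance` trades fewer false alarms for slower detection; the right value depends on how noisy the gateway's measurements are.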

Anomaly detection from Sensor data:

  • In the user/production environment, sensors that malfunction can produce invalid data.
  • A predictive model is generated by learning from the sensor data over a sufficient period, assuming the sensors give valid data during this learning period.
  • This model is then used to predict the sensor data, and the deviation from the actual sensor values is calculated. The observed data is labelled as an anomaly if the deviation exceeds the acceptable standard deviation (SD), which is calculated during model generation.

Online learning of Sensor data:

  • Some sensors’ values vary over time because they depend on the climate. For these sensors, the standard deviation can give an unacceptably wide band, and the volume of collected data can become very large.
  • Detecting anomalies in this kind of environment can produce many false positives.
  • The approach taken is continuous learning: the model is updated for each valid incoming data point. However, regenerating the model takes longer as the data size grows, which is not acceptable for real-time anomaly detection.
  • A moving average is applied on top of the linear regression to generate the model in real time, which improves overall performance. This moving average can consider all the previously collected data or only a specific period of previous data.
  • Accuracy was not affected by this change: it remained above 90%, similar to the usual approach.
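
One way to combine a moving average with linear regression is to keep running averages of x, y, x*y and x*x and derive the line from them, so that each new reading costs a constant amount of work. The sketch below is an assumption about how this could be done, not the original implementation: the class name and the `alpha` parameter are hypothetical, and an exponential moving average stands in for "a specific period of previous data".

```python
class OnlineLinearRegression:
    """Single-input y = m*x + c, updated per sample without storing samples."""

    def __init__(self, alpha=None):
        # alpha=None: average over all data seen so far.
        # 0 < alpha < 1: exponential moving average favouring recent data.
        self.alpha = alpha
        self.n = 0
        self.mx = self.my = self.mxy = self.mxx = 0.0

    def update(self, x, y):
        self.n += 1
        # Use a plain average until enough samples exist for alpha to apply.
        w = 1.0 / self.n if self.alpha is None else max(self.alpha, 1.0 / self.n)
        self.mx += w * (x - self.mx)
        self.my += w * (y - self.my)
        self.mxy += w * (x * y - self.mxy)
        self.mxx += w * (x * x - self.mxx)

    def coefficients(self):
        variance = self.mxx - self.mx ** 2
        if variance <= 1e-12:          # too little variation in x to fit a slope
            return 0.0, self.my
        m = (self.mxy - self.mx * self.my) / variance
        return m, self.my - m * self.mx

    def predict(self, x):
        m, c = self.coefficients()
        return m * x + c


# Illustrative stream following y = 2x + 1 exactly
model = OnlineLinearRegression(alpha=0.2)
for step in range(10):
    model.update(float(step), 2.0 * step + 1.0)
forecast = model.predict(20.0)
```

Only valid readings should be passed to `update`; feeding flagged anomalies back into the averages would let a faulty sensor gradually redefine what counts as normal.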

Linear Regression on Low memory/config devices:

  • One of the requirements was to run the anomaly detection algorithm on devices so low in configuration/memory that installing third-party ML libraries is difficult.
  • The solution provided was to implement n-degree linear regression with the moving average concept, which avoids both installing the libraries and storing the whole data set for learning.
  • This algorithm gives the same accuracy as the libraries, and because the moving average is used instead of the whole data set, prediction and learning happen in real time for up to 3 independent variables.
  • As the number of independent variables increases, the mathematical calculations start taking more time because of the low configuration of the device. The frequency of incoming data can be adjusted to overcome this issue.
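
A library-free version for several independent variables could keep running averages of the products x_i * x_j and x_i * y and solve the resulting small linear system on demand. This is a sketch under that assumption, with hypothetical names throughout; it uses a cumulative average over all data and covers the multi-variable linear case only.

```python
class RunningRegression:
    """Multi-variable least squares from running averages; no samples are stored."""

    def __init__(self, n_inputs):
        self.k = n_inputs + 1                      # +1 for the constant term
        self.n = 0
        self.xtx = [[0.0] * self.k for _ in range(self.k)]
        self.xty = [0.0] * self.k

    def update(self, inputs, y):
        x = [1.0] + list(inputs)
        self.n += 1
        w = 1.0 / self.n                           # cumulative moving average
        for i in range(self.k):
            self.xty[i] += w * (x[i] * y - self.xty[i])
            for j in range(self.k):
                self.xtx[i][j] += w * (x[i] * x[j] - self.xtx[i][j])

    def coefficients(self):
        """Solve the k-by-k system by Gauss-Jordan elimination; returns [c, m1, ...]."""
        k = self.k
        m = [self.xtx[i][:] + [self.xty[i]] for i in range(k)]
        for col in range(k):
            pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
            if abs(m[pivot][col]) < 1e-12:
                raise ValueError("not enough varied samples yet")
            m[col], m[pivot] = m[pivot], m[col]
            for r in range(k):
                if r != col:
                    f = m[r][col] / m[col][col]
                    for c in range(col, k + 1):
                        m[r][c] -= f * m[col][c]
        return [m[i][k] / m[i][i] for i in range(k)]

    def predict(self, inputs):
        coeffs = self.coefficients()
        return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], inputs))


# Illustrative stream following y = 1 + 2*a - b exactly
reg = RunningRegression(n_inputs=2)
for (a, b), y in [((0, 0), 1.0), ((1, 0), 3.0), ((0, 1), 0.0),
                  ((1, 1), 2.0), ((2, 1), 4.0)]:
    reg.update((a, b), y)
value = reg.predict((3, 2))
```

Memory stays fixed at roughly k*k + k numbers regardless of how much data arrives, while the elimination step grows with the cube of k, which is consistent with the slowdown noted above as variables are added.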

Tools

  • Python Libraries – Modbus, Mosquitto, PyModbus, Pandas, Scikit-Learn, Robot Framework
  • Sensor Simulator
  • jQuery, Bootstrap for UI
  • Created a Simulator coded in Python and C
  • Implemented a regression machine learning algorithm in Python using Scikit-Learn library
  • Worked on different protocols – MQTT, Modbus, ZigBee, Wi-Fi, Bluetooth