Real-Time Stream Processing for IoT
Objective
In this research topic, I review some best-practice IoT architectures and frameworks for processing real-time IoT streams.
IoT Architecture
Most IoT architectures involve a few key components: sensors, a place to collect and store data, a way to analyze and interact with the data, and finally a way to act on it.
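The four components can be sketched as a tiny pipeline. This is only an illustration: the Reading type, the function names, the sample temperatures, and the 30 degree limit are all hypothetical, and a plain dictionary stands in for real storage.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Reading:
    """One sensor sample (hypothetical shape, for illustration only)."""
    device_id: str
    temperature_c: float


def collect(readings, store):
    """Collection and storage step: append each reading under its device id."""
    for reading in readings:
        store.setdefault(reading.device_id, []).append(reading.temperature_c)


def analyze(store):
    """Analysis step: average temperature per device."""
    return {device: mean(values) for device, values in store.items()}


def act(averages, limit_c):
    """Action step: return the devices whose average exceeds the limit."""
    return sorted(device for device, avg in averages.items() if avg > limit_c)


store = {}  # a dict stands in for a real database
collect(
    [
        Reading("sensor-1", 21.0),
        Reading("sensor-2", 35.0),
        Reading("sensor-1", 23.0),
        Reading("sensor-2", 37.0),
    ],
    store,
)
averages = analyze(store)
alerts = act(averages, limit_c=30.0)
print(averages)
print(alerts)
```

In a real deployment each step would typically be a separate service, which is what the platform layers described later provide.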
Basic IoT architecture:
In the basic architecture, which has long been popular, a few hundred sensor devices collect data and send it to monolithic backend servers that store, visualize, and react to it. This approach works well in limited use cases, but it does not scale once the number of devices grows from hundreds to thousands.
Platform Architecture:
In this section, we focus on platforms for developing and processing real-time streams from sensors and devices. A platform can have three layers.
1. Data Acquisition:
This layer is responsible for building, managing, and integrating IoT devices. Kaa is an enterprise-grade, multi-purpose, open-source IoT platform for device management, data collection, analytics and visualization, remote control, software updates, and more. It is fault tolerant and scales horizontally. Platforms such as Amazon Web Services (AWS) IoT (the most popular), Azure IoT Suite, IBM Watson, Google Cloud IoT, and Oracle IoT are the key players in the IoT world.
Data collectors:
Kafka is used to develop real-time data pipelines and streaming IoT architectures. It is horizontally scalable, fault tolerant, and very fast.
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
Amazon Kinesis collects, processes, and analyzes real-time streaming data in a cost-effective way, so you can get timely insights and react quickly to new input. It can ingest real-time data such as audio, video, website clickstreams, application logs, and IoT telemetry for machine learning, analytics, and other applications.
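Collectors such as Kafka scale horizontally largely by splitting a stream into partitions and routing each message by key. The sketch below illustrates that idea only. It assumes a hypothetical partition count of 4 and uses CRC32 as a stand-in hash (Kafka's default partitioner uses a murmur2 hash for keyed records), so the partition numbers will not match a real cluster.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for illustration


def partition_for(device_id, num_partitions=NUM_PARTITIONS):
    """Map a device id to a partition using a stable hash.

    CRC32 is a stand-in here, not the hash a real Kafka client uses.
    """
    return zlib.crc32(device_id.encode("utf-8")) % num_partitions


def route(messages, num_partitions=NUM_PARTITIONS):
    """Group (device_id, payload) messages by partition, preserving arrival order."""
    partitions = defaultdict(list)
    for device_id, payload in messages:
        partition = partition_for(device_id, num_partitions)
        partitions[partition].append((device_id, payload))
    return partitions


messages = [
    ("sensor-1", 20.5),
    ("sensor-2", 31.0),
    ("sensor-1", 20.7),
    ("sensor-3", 18.2),
    ("sensor-2", 31.4),
]
partitions = route(messages)
for partition, batch in sorted(partitions.items()):
    print(partition, batch)
```

Because the hash is stable, all messages from one device land in the same partition and keep their order there, while different devices can be spread across brokers and consumers.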
2. Data Processing:
This layer is responsible for processing IoT big data and analyzing device information to extract insights from massive data sets. The following are some big data platforms for real-time data processing.
a) Apache Spark is a leading open-source analytics engine and distributed cluster-computing framework for big data processing, with built-in modules for streaming, machine learning, SQL, and graph processing.
b) Apache Storm is a free and open-source distributed real-time computation system that reliably processes unbounded streams of data. It does for real-time processing what Hadoop does for batch processing. Use cases include real-time analytics, continuous computation, online machine learning, distributed RPC, and ETL.
c) S4, from Yahoo, is an open-source framework for processing continuous, unbounded streams of data. It is a good fit for massively distributed computations on constantly changing data.
d) Amazon Kinesis, described above under data collectors, also processes and analyzes the real-time streaming data it ingests, so it works in all three layers.
e) Microsoft StreamInsight is a platform for quickly building and deploying robust and highly efficient complex event processing (CEP) applications.
f) Apache Flink is a framework and distributed processing engine for stateful computations over data streams.
g) Pulsar is a newer entrant, backed by eBay.
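A common job for these engines is a stateful, windowed aggregation over an unbounded stream. The sketch below shows the core idea as a tumbling (fixed, non-overlapping) window average per device in plain Python. The event shape, the sample values, and the 10 second window are assumptions made for illustration, and the sketch ignores what real engines add on top, such as distribution, fault tolerance, and handling of late data.

```python
from collections import defaultdict


def tumbling_window_average(events, window_seconds):
    """Average values per (window_start, device_id).

    events is an iterable of (timestamp_seconds, device_id, value) tuples.
    Each event falls into exactly one window, identified by its start time.
    """
    sums = defaultdict(float)   # running sum per window and device
    counts = defaultdict(int)   # running count per window and device
    for timestamp, device_id, value in events:
        window_start = (timestamp // window_seconds) * window_seconds
        key = (window_start, device_id)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical sample stream: (timestamp in seconds, device id, temperature)
events = [
    (0, "sensor-1", 20.0),
    (4, "sensor-1", 22.0),
    (7, "sensor-2", 30.0),
    (12, "sensor-1", 26.0),
    (15, "sensor-2", 34.0),
    (19, "sensor-2", 36.0),
]
averages = tumbling_window_average(events, window_seconds=10)
for key in sorted(averages):
    print(key, averages[key])
```

The two dictionaries are the "state" of the computation; a stream processing engine keeps equivalent state per key and checkpoints it so the job can recover after a failure.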
3. Data storage and visualization:
Processed data can be stored in a NoSQL database for fast visualization. Several NoSQL databases are available, such as MongoDB, Cassandra, DynamoDB, Azure DocumentDB, Cloud Bigtable, and Cloud Firestore. For visualization, tools such as Tableau, Microsoft Power BI, or Amazon QuickSight can be used.
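One reason such stores suit dashboards is that data can be laid out so a time-range query for one device is a cheap slice. The toy in-memory class below sketches that layout, loosely modeled on the partition key plus clustering key idea used by wide-column stores such as Cassandra. The class name, method names, and sample values are hypothetical and do not correspond to any database's API.

```python
import bisect
from collections import defaultdict


class TimeSeriesStore:
    """Toy in-memory store: one time-ordered list of rows per device."""

    def __init__(self):
        # device_id -> list of (timestamp, value), kept sorted by timestamp
        self._rows = defaultdict(list)

    def put(self, device_id, timestamp, value):
        """Insert a row, keeping the device's rows in timestamp order."""
        bisect.insort(self._rows[device_id], (timestamp, value))

    def range(self, device_id, start, end):
        """Return rows with start <= timestamp < end for one device."""
        rows = self._rows.get(device_id, [])
        # A 1-tuple sorts before any 2-tuple with the same first element,
        # so (start,) finds the first row at or after the start time.
        low = bisect.bisect_left(rows, (start,))
        high = bisect.bisect_left(rows, (end,))
        return rows[low:high]


db = TimeSeriesStore()
db.put("sensor-1", 10, 26.0)   # rows may arrive out of order
db.put("sensor-1", 0, 21.0)
db.put("sensor-2", 0, 30.0)
db.put("sensor-1", 20, 24.5)

chart_points = db.range("sensor-1", 0, 20)
print(chart_points)
```

A visualization tool would issue the equivalent range query against the real database and plot the returned points.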
Apache Kafka Example:
This example shows how a message can be transferred in real time. It can easily be hooked up to an IoT data source.
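A minimal sketch of the idea follows. To stay runnable without a broker, it uses an in-memory append-only log as a stand-in for a single-partition Kafka topic. The InMemoryTopic class and its send and poll methods are hypothetical and are not the Kafka client API; they only mimic the produce, offset, and consume pattern.

```python
import json


class InMemoryTopic:
    """Append-only log standing in for a single-partition Kafka topic."""

    def __init__(self):
        self._log = []  # list of (key, serialized value bytes)

    def send(self, key, value):
        """Serialize a record as JSON, append it, and return its offset."""
        self._log.append((key, json.dumps(value).encode("utf-8")))
        return len(self._log) - 1

    def poll(self, offset, max_records=10):
        """Return (next_offset, records) for records starting at offset."""
        batch = self._log[offset:offset + max_records]
        records = [(key, json.loads(raw.decode("utf-8"))) for key, raw in batch]
        return offset + len(batch), records


topic = InMemoryTopic()

# Producer side: an IoT data source publishes readings keyed by device id.
topic.send("sensor-1", {"temperature_c": 21.5})
topic.send("sensor-2", {"temperature_c": 30.1})

# Consumer side: read from the last known offset, then remember the new one.
offset = 0
offset, first_batch = topic.poll(offset)

# A later message is picked up by the next poll, without re-reading old ones.
topic.send("sensor-1", {"temperature_c": 21.9})
offset, second_batch = topic.poll(offset)

print(first_batch)
print(second_batch)
```

With a real broker, a client library such as kafka-python or confluent-kafka would take the place of this class, and the IoT data source would play the producer role.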
References:
https://dzone.com/articles/pushing-iot-data-gathering-analysis-and-response-to-the-edge
https://solace.com/blog/use-cases/iot-needs-messaging
https://www.sam-solutions.com/blog/top-5-iot-platforms-2017/
https://internetofthingswiki.com/top-20-iot-platforms/634/
https://aws.amazon.com/kinesis/
https://arxiv.org/pdf/1705.05988.pdf
https://www.sciencedirect.com/science/article/pii/S1877050917316903
Index: Edge Computing, IoT Architectures, Fog Computing, Stream Processing, Big Data, Internet of Things, IoT, IoT Architecture at Scale, Kafka, IoT Collectors, IoT Data Collection, IoT Data Aggregation.
Manoj Kumar
Solutions Architect