AWS Serverless Data Processing With Kinesis
Data Processing With AWS Kinesis
AWS Kinesis is a streaming service that allows you to process large amounts of data in real time.
A stream is a continuous flow of data delivered at a high rate.
Processing a stream as it arrives allows you to react quickly to your important data.
For downstream processing, the stream also acts as an asynchronous data buffer.
A data buffer is temporary storage in memory that holds data while it is being moved.
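The idea of an asynchronous buffer can be sketched with Python's standard library. This is only an in-memory illustration, not how Kinesis is implemented: the function name `run_pipeline` and the sample events are made up for this example.

```python
import queue
import threading

def run_pipeline(items):
    """Move items from a producer to a consumer through an in-memory buffer."""
    buffer = queue.Queue(maxsize=100)  # temporary storage between the two sides
    processed = []
    done = object()  # sentinel that marks the end of the stream

    def producer():
        for item in items:
            buffer.put(item)  # the producer does not wait for processing
        buffer.put(done)

    def consumer():
        while True:
            item = buffer.get()  # the consumer reads at its own pace
            if item is done:
                break
            processed.append(item.upper())

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed

print(run_pipeline(["click", "view", "purchase"]))  # ['CLICK', 'VIEW', 'PURCHASE']
```

The producer and the consumer never call each other directly. The buffer in the middle is what lets them work at different speeds.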
AWS Kinesis has three independent data processing services:
- Kinesis Data Streams
- Kinesis Data Firehose
- Kinesis Data Analytics
All of them are fully managed and serverless.
Kinesis Data Streams
There are two types of participants in Kinesis Data Streams:
- Producers
- Consumers
Producers contribute data records to the stream.
Consumers receive and process those data records.
Producers can be:
- Kinesis Producer Library (KPL)
- AWS SDK
- Third-party tools
Consumers can be:
- Applications created with Kinesis Client Library (KCL)
- AWS Lambda functions
- Other streams
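The producer and consumer roles above can be sketched in Python. This is a minimal example under a few assumptions: `kinesis_client` is expected to behave like the client returned by `boto3.client("kinesis")`, whose `put_record` call takes `StreamName`, `Data`, and `PartitionKey`. The function names, the stream name, and the JSON payload are hypothetical. The consumer assumes the event shape that AWS Lambda receives from a Kinesis stream, where each record's data is base64-encoded.

```python
import base64
import json

def send_event(kinesis_client, stream_name, event, partition_key):
    """Producer: write one JSON-encoded data record to the stream."""
    return kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=partition_key,
    )

def consumer_handler(event, context):
    """Consumer: a Lambda function that decodes the records it receives."""
    decoded = []
    for record in event["Records"]:
        # Lambda delivers the record payload as a base64 string
        payload = base64.b64decode(record["kinesis"]["data"])
        decoded.append(json.loads(payload))
    return decoded
```

With boto3 installed and credentials configured, the producer could be called as `send_event(boto3.client("kinesis"), "orders", {"id": 1}, "customer-1")`, where `orders` is a placeholder stream name.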
Kinesis Data Streams Limits
Each shard in a Kinesis data stream has its limits.
It can write up to 1,000 records per second.
It can write up to 1 MB per second.
A single read call can return up to 10,000 records.
It can read up to 2 MB per second.
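These limits apply to each shard, so they can be used to estimate how many shards a workload needs. The sketch below is a rough calculation only: the function name `shards_needed` and the sample traffic numbers are invented for illustration.

```python
import math

# Per-shard limits taken from the list above
WRITE_RECORDS_PER_SEC = 1000
WRITE_MB_PER_SEC = 1.0
READ_MB_PER_SEC = 2.0

def shards_needed(records_per_sec, write_mb_per_sec, read_mb_per_sec):
    """Return the smallest shard count that satisfies every limit."""
    return max(
        1,
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC),
        math.ceil(write_mb_per_sec / WRITE_MB_PER_SEC),
        math.ceil(read_mb_per_sec / READ_MB_PER_SEC),
    )

# 4,500 records/s needs 5 shards, 3.2 MB/s in needs 4, 6.4 MB/s out needs 4
print(shards_needed(4500, 3.2, 6.4))  # 5
```

The strictest limit decides the result. In this example the record rate, not the data volume, determines the shard count.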
Kinesis Data Streams Scaling
The Kinesis Data Streams service scales by adding data shards.
A data shard is one piece of a larger set of data.
Each shard holds its own ordered sequence of data records.
The Kinesis service assigns a sequence number to each data record.
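Shards and per-record numbering can be illustrated with a toy model. This is a simulation, not the Kinesis API: the class `ToyStream` is hypothetical, and choosing a shard with a modulo on an MD5 hash of a partition key is a simplification of how the real service maps records to shards.

```python
import hashlib
from collections import defaultdict
from itertools import count

class ToyStream:
    """In-memory stand-in for a stream that is split into shards."""

    def __init__(self, shard_count):
        self.shard_count = shard_count
        self.shards = defaultdict(list)  # shard id -> ordered list of records
        self._sequence = count(1)        # ever-increasing record numbers

    def put_record(self, partition_key, data):
        # Simplified routing: hash the key, then pick a shard by modulo
        digest = hashlib.md5(partition_key.encode("utf-8")).digest()
        shard_id = int.from_bytes(digest, "big") % self.shard_count
        sequence_number = next(self._sequence)
        self.shards[shard_id].append((sequence_number, data))
        return shard_id, sequence_number

stream = ToyStream(shard_count=2)
for key, data in [("user-a", "login"), ("user-b", "login"), ("user-a", "logout")]:
    print(key, stream.put_record(key, data))
```

Records with the same key always land in the same shard, and the numbers inside each shard only ever increase. Adding shards spreads the records over more lists without changing the order inside any one of them.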
Aggregation
To increase throughput, you can add shards, or you can use aggregation to increase the number of records delivered per API call.
Aggregation is the process of storing multiple records in a single Kinesis Data Streams record.
To use the data in the record, a consumer must de-aggregate it first.
You can use the Kinesis aggregation library to handle data aggregation and de-aggregation.
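The idea behind aggregation and de-aggregation can be sketched with a simple length-prefixed format. This is an illustration only and does not reproduce the actual format used by the Kinesis aggregation library. The function names `aggregate` and `deaggregate` are made up for this example.

```python
import struct

def aggregate(user_records):
    """Pack several byte strings into one payload (4-byte length + data each)."""
    parts = []
    for record in user_records:
        parts.append(struct.pack(">I", len(record)))  # big-endian length prefix
        parts.append(record)
    return b"".join(parts)

def deaggregate(blob):
    """Split a payload produced by aggregate() back into the original records."""
    records = []
    offset = 0
    while offset < len(blob):
        (length,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        records.append(blob[offset:offset + length])
        offset += length
    return records

packed = aggregate([b"first", b"second", b"third"])
print(len(packed))          # 28
print(deaggregate(packed))  # [b'first', b'second', b'third']
```

Three small records travel as one stream record here, so they count once against the per-second record limit instead of three times.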
Kinesis Data Firehose
You don't need to manage shards or write consumer applications with Kinesis Data Firehose.
Kinesis Data Firehose automatically delivers the data to a specified destination.
It can also be configured to transform the data before sending it.
Kinesis Data Firehose is a strong choice for consuming massive amounts of data.
This is an example of how Kinesis Data Firehose works:
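Editing the data before delivery is typically done with a Lambda function. The sketch below assumes the event and response shape used for Firehose data transformation: each incoming record has a `recordId` and base64-encoded `data`, and each returned record carries the same `recordId`, a `result`, and the new `data`. The handler name and the added `processed` field are hypothetical.

```python
import base64
import json

def transform_handler(event, context):
    """Add a field to every JSON record before Firehose delivers it."""
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        payload["processed"] = True  # hypothetical edit, for illustration only
        transformed = (json.dumps(payload) + "\n").encode("utf-8")
        output.append({
            "recordId": record["recordId"],  # must match the incoming record
            "result": "Ok",
            "data": base64.b64encode(transformed).decode("utf-8"),
        })
    return {"records": output}
```

The trailing newline keeps one JSON object per line in the delivered output, which is a common convention rather than a requirement.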
- The client connects to a Kinesis Data Firehose stream through API Gateway
- API Gateway loads the data onto the Kinesis Data Firehose stream
- Kinesis Data Firehose sends the raw data records to Amazon S3
- Amazon S3 calls a Lambda function, which modifies the data before storing it
- The data is written to DynamoDB
Kinesis Data Analytics
Kinesis Data Analytics allows you to run real-time SQL analysis on the data before persisting it.
It is designed for near real-time queries.
It also allows you to use a Lambda function to preprocess data before executing an SQL query.
In that function you can change the data format, filter the data, or enrich it.
Kinesis Data Analytics can output the data to both Kinesis Data Streams and Kinesis Data Firehose.
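A preprocessing Lambda function can be sketched as follows. It assumes the record format used for Kinesis Data Analytics preprocessing: incoming records carry a `recordId` and base64-encoded `data`, and each returned record has a `result` of `Ok`, `Dropped`, or `ProcessingFailed`. The handler name and the CSV layout (`sensor_id,temperature_c`) are invented for this example.

```python
import base64
import json

def preprocess_handler(event, context):
    """Convert CSV lines to JSON, add Fahrenheit, and drop malformed lines."""
    results = []
    for record in event["records"]:
        line = base64.b64decode(record["data"]).decode("utf-8").strip()
        fields = line.split(",")
        try:
            sensor_id, celsius = fields[0], float(fields[1])
        except (IndexError, ValueError):
            # Filter: malformed lines never reach the SQL query
            results.append({
                "recordId": record["recordId"],
                "result": "Dropped",
                "data": record["data"],
            })
            continue
        row = {
            "sensor_id": sensor_id,                   # format change: CSV to JSON
            "temperature_c": celsius,
            "temperature_f": celsius * 9 / 5 + 32,    # enrichment
        }
        results.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(json.dumps(row).encode("utf-8")).decode("utf-8"),
        })
    return {"records": results}
```

The SQL query then only sees well-formed JSON rows, which keeps the query itself simple.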
Related reads:
Kinesis Data Streams Developer Guide
Amazon Kinesis Data Streams Terminology and Concepts
Kinesis Data Firehose Developer Guide
Amazon Kinesis Data Firehose Data Transformation
Amazon Kinesis Data Analytics for SQL Applications Developer Guide
Kinesis Data Analytics: Preprocessing Data Using a Lambda Function