Streaming Data Services Comparison: Alibaba Cloud, AWS, Azure, Google Cloud, IBM Cloud

Streaming data

Streaming data is becoming the next wave in data analytics and machine learning. Processing large volumes of data is no longer sufficient on its own: a business must also process that data quickly and turn it into real-time insights so that it can react to a changing environment as it happens.

Cloud computing requires stream processing engines to be highly scalable and fault-tolerant. Cloud-based stream processing systems in particular are built to scale dynamically to hundreds of computing nodes and to cope with diverse workloads automatically.

Recognizing the importance of data streaming and the growing variety of use cases, organizations are adopting hybrid platforms so that they can leverage the advantages of both batch and streaming analytics.

To help enterprises choose the best data streaming service, we have compiled a list of the most feature-rich tools for you and your business.

Alibaba Cloud

DataHub by Alibaba Cloud is a real-time data distribution platform designed to process streaming data in the cloud. Its key features are the ability to publish, subscribe to, and distribute streaming data. It also lets you build applications on top of streaming data and analyze that data easily.
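The publish/subscribe idea can be illustrated with a minimal in-memory sketch. It is not the DataHub API: the `Topic` class, its `publish` and `poll` methods, and the subscriber names are all hypothetical and exist only to show how several subscribers can read the same stream independently.

```python
from collections import defaultdict


class Topic:
    """In-memory stand-in for a streaming topic (hypothetical, not the DataHub API)."""

    def __init__(self):
        self.records = []                 # append-only log of published records
        self.cursors = defaultdict(int)   # next unread index per subscriber

    def publish(self, record):
        """Append a record and return its position in the log."""
        self.records.append(record)
        return len(self.records) - 1

    def poll(self, subscriber, limit=10):
        """Return up to `limit` unread records for this subscriber."""
        start = self.cursors[subscriber]
        batch = self.records[start:start + limit]
        self.cursors[subscriber] = start + len(batch)
        return batch


topic = Topic()
for reading in (21.5, 22.0, 22.4):
    topic.publish({"sensor": "s1", "temp": reading})

# Two subscribers read the same data independently of each other.
dashboard = topic.poll("dashboard")
archiver = topic.poll("archiver", limit=2)
```

Because each subscriber keeps its own cursor, a slow consumer such as the archiver does not hold back a fast one such as the dashboard.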

DataHub offers high availability, low latency, high throughput, and high scalability. It can also deliver streaming data to other cloud services such as MaxCompute and OSS. Pricing is based on the resources you actually use.

See the architecture of the Alibaba big data demo system.

In the figure, the architecture comprises a data source system, a data warehouse, a big data platform, a web/app platform, process scheduling, data processing and a real-time data streaming platform. Here, real-time data is processed through DataHub + StreamCompute.

This produces a variety of processing results in real time, including real-time charts, statistics, and other information. Overall, Alibaba's DataHub is a good fit if you want to stream complex data.

Concepts | Alibaba Cloud DataHub
Data Warehouse | MaxCompute
Data Retention | Default – 24 hours
SDK Support | MaxCompute Tunnel SDK
Configuration | Writer plug-in
Real-time Store | ApsaraDB
Cost | Pay-As-You-Go

Read reviews of Alibaba Cloud.

AWS

AWS Kinesis processes data in real time. Its key built-in capability is processing high-volume data streams, on the order of hundreds of terabytes per hour. It simplifies the development of applications that make real-time decisions about business operations from streaming data.

AWS Kinesis is built around a few key concepts for stream storage, plus an API for implementing data producers and data consumers. A data producer sends records to the stream as they are generated, and a data consumer retrieves records from the stream as they arrive.
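The producer and consumer roles can be sketched with a toy sharded stream. This is a local model, not a call into AWS: the `Stream` class and its method names are hypothetical, and the even split of the hash range across shards is a simplifying assumption. The one detail taken from Kinesis itself is that the partition key is hashed with MD5 to choose a shard.

```python
import hashlib


class Stream:
    """Toy model of a sharded stream; it does not call AWS."""

    def __init__(self, shard_count):
        self.shards = [[] for _ in range(shard_count)]

    def _shard_for(self, partition_key):
        # Hash the key to a 128-bit integer, then split that range evenly across shards.
        digest = hashlib.md5(partition_key.encode("utf-8")).digest()
        hash_key = int.from_bytes(digest, "big")
        return hash_key * len(self.shards) // 2**128

    def put_record(self, partition_key, data):
        """Producer side: append to the shard chosen by the partition key."""
        shard_id = self._shard_for(partition_key)
        self.shards[shard_id].append(data)
        sequence = len(self.shards[shard_id]) - 1
        return shard_id, sequence

    def get_records(self, shard_id, start=0):
        """Consumer side: read a shard in order from a sequence position."""
        return self.shards[shard_id][start:]


stream = Stream(shard_count=4)
first = stream.put_record("device-42", b"reading-1")
second = stream.put_record("device-42", b"reading-2")
```

Records that share a partition key land on the same shard, so a consumer reading that shard sees them in the order they were produced.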

AWS charges per hour for each stream partition (shard) and per volume of data that flows through the stream.
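As a rough illustration of a bill with these two dimensions, the sketch below adds a time-based part to a volume-based part. The function name, its parameters, and especially the rates are placeholders chosen for the example; they are not AWS prices, and the real billing units may differ.

```python
def estimate_stream_cost(shards, hours, gb_ingested,
                         rate_per_shard_hour, rate_per_gb):
    """Return the estimated charge: a time-based part plus a volume-based part."""
    shard_cost = shards * hours * rate_per_shard_hour
    volume_cost = gb_ingested * rate_per_gb
    return shard_cost + volume_cost


# Placeholder rates for illustration only; check the provider's price list.
cost = estimate_stream_cost(shards=4, hours=24, gb_ingested=100,
                            rate_per_shard_hour=0.5, rate_per_gb=0.25)
```

With these made-up numbers the shard-hours dominate the total, which is why the number of provisioned shards matters for streams that are mostly idle.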

See the diagram below summarizing key concepts of Amazon Kinesis.

AWS Data Stream

Source: AWS

In terms of features, the AWS SDK for Kinesis supports Android, Java, Go and .NET. In terms of performance, it writes each message synchronously to three different machines. However, configuration is limited to the retention period (days) and the number of shards.

Concepts | AWS Kinesis
Data Warehouse | Athena, Redshift
Data Retention | Default – 24 hours, 1-7 days (maximum 7 days)
SDK Support | AWS SDK supports Android, Java, Go, .NET
Configuration | Days/Shards
Real-time Store | Amazon DynamoDB
Cost | Pay and use

Read reviews of AWS Kinesis data streams.

Azure

Azure Stream Analytics is a fully managed event processing engine for real-time analytics on one or more data streams from sources such as social media, sensors, web data sources, and other applications. It delivers low latency, high throughput, and high scalability.

Stream Analytics is built on a pull-based communication model with built-in recovery and checkpointing. The service can also protect data from downstream failures. It supports two input types, stream data and reference data, and two source types, Azure Event Hubs and Azure Blob Storage.
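How checkpointing supports recovery in a pull-based consumer can be sketched in a few lines. This is a generic illustration, not Stream Analytics internals: the function, the dictionary used as a checkpoint store, and the handler that crashes once are all hypothetical.

```python
def process_with_checkpoints(events, checkpoint, handle, every=2):
    """Pull events starting at checkpoint["offset"], saving progress every `every` events."""
    offset = checkpoint["offset"]
    while offset < len(events):
        handle(events[offset])
        offset += 1
        if offset % every == 0:
            checkpoint["offset"] = offset   # durable storage in a real system
    checkpoint["offset"] = offset


events = ["e0", "e1", "e2", "e3", "e4"]
checkpoint = {"offset": 0}
processed = []
crashed = {"done": False}


def handle(event):
    # Hypothetical handler that fails once, the first time it sees "e3".
    if event == "e3" and not crashed["done"]:
        crashed["done"] = True
        raise RuntimeError("simulated crash")
    processed.append(event)


try:
    process_with_checkpoints(events, checkpoint, handle)
except RuntimeError:
    pass  # the job restarts below from the last saved offset

process_with_checkpoints(events, checkpoint, handle)
```

After the restart the consumer resumes from the last saved offset rather than from the beginning. In this sketch the event processed just before the crash is replayed once, so the handler would need to tolerate duplicates.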

The diagram summarizes how data is received, analyzed and sent for other actions in Stream Analytics.

Azure data stream

Source: Microsoft

Azure Event Hubs can ingest millions of events per second in various formats and feed them to Stream Analytics. Blob Storage can also store data and direct it to Stream Analytics for processing. Currently, Stream Analytics is billed by the volume of data processed and the number of streaming units used.

Concepts | Azure Stream Analytics
Data Warehouse | Azure SQL
Data Retention | -
SDK Support | Management .NET SDK
Configuration | -
Real-time Store | Azure CosmosDB
Cost | Pay-As-You-Go

Read reviews of Azure Streaming Analytics.

Google Cloud

Cloud Dataflow is a managed data processing service that uses data pipelines to ingest, transform and analyze both real-time and batch data. It is based on Apache Beam and supports jobs written in Python and Java.

In Dataflow, events pass through three steps: validation, enrichment, and ingestion. The service streams, processes and stores over 120,000 events per second with very low latency. Every incoming event is validated and written to partitioned tables in BigQuery.

The diagram below shows Dataflow stream and batch processing.
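The three steps can be sketched as a chain of plain Python generators. This does not use the Apache Beam SDK or BigQuery: the field names `user` and `ts`, the derived `day` partition key, and the dictionary standing in for a partitioned table are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def validate(events):
    """Keep only events that carry the required fields."""
    for event in events:
        if "user" in event and "ts" in event:
            yield event


def enrich(events):
    """Add a derived date field used as the partition key."""
    for event in events:
        day = datetime.fromtimestamp(event["ts"], tz=timezone.utc).date().isoformat()
        yield {**event, "day": day}


def ingest(events, table):
    """Write each event into the partition for its day."""
    for event in events:
        table[event["day"]].append(event)


table = defaultdict(list)   # stand-in for a date-partitioned table
raw = [
    {"user": "a", "ts": 0},
    {"ts": 86400},               # missing "user": dropped by validate
    {"user": "b", "ts": 86400},
]
ingest(enrich(validate(raw)), table)
```

Because each stage is a generator, events flow through one at a time, so the same three functions work on a finite batch or on an unbounded stream.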

Dataflow stream

Source: Google

Google Cloud Dataflow is a good choice for organizations that want to run production-level data processing in the cloud. Users are charged in per-second increments based on actual use of the service. Any other Google Cloud resources consumed are billed at that service's own rates.

Concepts | Google Dataflow
Data Warehouse | BigQuery
Data Retention | -
SDK Support | Apache Beam SDK
Configuration | -
Real-time Store | Cloud Bigtable
Cost | Based on the actual use of Dataflow batch or streaming workers

Read reviews of Google Cloud Dataflow.

IBM Cloud

IBM Streaming Analytics can manage high data rates and perform analysis with low latency. It can be used to ingest, analyze and monitor data coming from real-time data sources. With IBM Streams, companies can view information and events as they unfold.

The image below summarizes the architecture of IBM Streaming Analytics.

Source: IBM

The architecture takes a dynamic approach to resource allocation: organizations define the maximum number of nodes to use in their environment, and the service scales up or down accordingly. This ensures that a company pays only for the resources it uses while it monitors, manages and makes informed decisions with little effort.
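Scaling within a configured maximum can be sketched as a simple clamp. This is an illustration of the idea, not IBM's scaling logic: the function name, the per-node capacity, and the load figures are all hypothetical.

```python
import math


def target_nodes(events_per_sec, capacity_per_node, max_nodes, min_nodes=1):
    """Return the node count for the current load, clamped to the configured bounds."""
    needed = math.ceil(events_per_sec / capacity_per_node)
    return max(min_nodes, min(max_nodes, needed))


# Hypothetical load samples, in events per second.
loads = [500, 4200, 12000, 800]
plan = [target_nodes(load, capacity_per_node=1000, max_nodes=8) for load in loads]
```

The third sample would need twelve nodes but is capped at the configured maximum of eight, and the plan shrinks back to one node when the load drops.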

Concepts | IBM Streaming Analytics
Data Warehouse | IBM Db2 Warehouse
Data Retention | -
SDK Support | Eclipse SDK
Configuration | -
Real-time Store | IBM Cloud Object Storage
Cost | Based on instance per hour

Read reviews of IBM Streaming Analytics.

The time is NOW!

Streaming data architecture is constantly evolving. So, before rushing to pick any of these solutions, it is important to understand your existing systems in depth and get a clear picture of them. Note that each of these services is strong at what it does, in its own way.

The question, however, is which one is right for you. To answer it, go through the features of each and see which best suits your use case and available resources.

Brief comparison: Alibaba Cloud vs AWS vs Azure vs Google Cloud vs IBM Cloud

Concepts | Alibaba Cloud | AWS | Azure | Google Cloud | IBM Cloud
Data Warehouse | MaxCompute | Athena, Redshift | Azure SQL | BigQuery | IBM Db2 Warehouse
Data Retention | Default – 24 hours | Default – 24 hours, 1-7 days (maximum 7 days) | - | - | -
SDK Support | MaxCompute Tunnel SDK | AWS SDK supports Android, Java, Go, .NET | Management .NET SDK | Apache Beam SDK | Eclipse SDK
Configuration | Writer plug-in | Days/Shards | - | - | -
Real-time Store | ApsaraDB | Amazon DynamoDB | Azure CosmosDB | Cloud Bigtable | IBM Cloud Object Storage
Cost | Pay-As-You-Go | Pay and use | Pay-As-You-Go | Based on the actual use of Dataflow batch or streaming workers | Based on instance per hour

READ NEXT: IoT security comparison: Alibaba Cloud, AWS, Azure, Google Cloud, IBM Cloud

2 Comments

  1. Where did you get your info on the IBM Cloud offering? The info you provided is several years old. For more up to date info please see: https://www.ibm.com/cloud/streaming-analytics

    • Hi Andy, thanks for the feedback. We have updated the blog and confirmed it with the IBM team. In case, we missed out on something here, please let us know and share the accurate link with us.
