Thus far we’ve discussed how Cassandra, Spark, Kafka, Docker, and Kubernetes can be useful for building a global data platform. These components are powerful in their own right, and managing them is a little simpler if we decide to use commercial components from DataStax and Confluent.
There are other tools and services we can use to further accelerate our timeline to deliver a world-class global data and analytics platform. Although standing up distributed data (Cassandra), distributed computing (Spark), and distributed communication (Kafka) is a great start for a framework, it still needs a few more components to become a “platform” that allows quick creation and delivery of services an enterprise can use.
In our experience, the raw components of Cassandra, Spark, and Kafka are great but require more development work to be usable by the average developer. This is why technical leadership on such an endeavor should consider adopting existing applications that let developers use the technology without needing to be experts in it.
Example Global Data & Analytics Platform
- MediaWiki RESTBase – An open source REST API caching layer that uses Cassandra and exposes a storage API similar to Amazon DynamoDB and Google Datastore, built and used by Wikipedia.
- Apache Usergrid – An open source Backend as a Service built on Cassandra that manages users, files, and data, used by companies like Korea Telecom and Apigee.
- Confluent Kafka REST Proxy – A REST API layer to produce messages to and consume messages from a Kafka cluster.
- Spark Streaming – Spark’s Structured Streaming allows for continuous data processing to and from Kafka.
- Spark – For data that needs to be churned en masse, Spark distributes batch processing across a cluster, so capacity grows by adding nodes.
- Akka – A toolkit based on the actor model, well suited to microservices that consume messages, process them, and send messages.
- Akka.NET – A .NET port of Akka, usable from C#.
- Kafka Connect
- Kafka Streams
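To give a feel for what a developer-friendly layer looks like in practice, here is a minimal Python sketch of producing messages through the Kafka REST Proxy mentioned above. It assumes the proxy's v2 API, where a POST to `/topics/{topic}` carries a `records` array with the `application/vnd.kafka.json.v2+json` content type. The address `http://localhost:8082`, the topic name `page-views`, and the helper names `build_produce_request` and `send` are our own illustrative assumptions, not part of any particular deployment.

```python
import json
import urllib.request

# Assumed address of a Kafka REST Proxy instance (hypothetical for this sketch).
REST_PROXY_URL = "http://localhost:8082"


def build_produce_request(topic, values, base_url=REST_PROXY_URL):
    """Build (but do not send) a v2 produce request for JSON-encoded records."""
    # Each message is wrapped as {"value": ...} inside a "records" array.
    body = {"records": [{"value": v} for v in values]}
    return urllib.request.Request(
        url=f"{base_url}/topics/{topic}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/vnd.kafka.json.v2+json",
            "Accept": "application/vnd.kafka.v2+json",
        },
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON reply (needs a live proxy)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # "page-views" is a hypothetical topic name.
    req = build_produce_request("page-views", [{"user": "alice", "page": "/home"}])
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

The point of the sketch is that the developer only needs HTTP and JSON; no Kafka client library, partitioner, or serializer configuration is involved. Calling `send(req)` against a running proxy would actually publish the records.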
There are many reasons why companies would want to have their own complete platform managed on their infrastructure versus using whatever a provider like AWS (DynamoDB, Kinesis, EMR), Google (Spanner, Pub/Sub, Dataflow), PubNub, etc. happens to offer. These platform as
In a company that has specific needs, this won’t fly.
In the next and final part of this series we will discuss monitoring and scaling a distributed business data & communications platform. If you want me or our company to come and talk to your company about your Global Data & Analytics Platform, feel free to email me or my team at Anant.
- Part 1/5: Foundation of a Business Data, Computing, and Communication Framework
- Part 2/5: Foundation for Properly Managing a Business Data & Communications Framework
- Part 3/5: Deploy Frameworks that Scale on any Cloud (Containers, Azure, AWS, VMs, Baremetal)
- Part 4/5: Building a Developer-Friendly Platform on top of a World-Class Framework
- Part 5/5: Monitoring & Scaling a Distributed Business Data & Communications Platform