Real Time Business Platforms

Scaling Business Platform Performance with Spark, Mesos, Akka, Cassandra, Kafka, Kubernetes – Part 1/6

Spark, Mesos, Akka, Cassandra, Kafka, Kubernetes? If you don’t already know what these are and you have no plans to build software that works at a global scale, then you don’t need to read this article at all. Seriously, it’ll be a waste of your time. These technologies, now open source, originated in research labs at the University of California, Berkeley and in the halls of high-tech companies such as Google, Twitter, LinkedIn, and Facebook. Their creators built them for different purposes, but now that they are available to the public, they have been flourishing on their own in the wild ether of the Internet. Why would any CIO, CTO, CMO, or CEO consider these technologies?

 

In business platforms, the design of the platform’s information and systems depends on the needs of its people and processes. If a business platform only needs to serve 1,000 users per day, it can be designed differently than one that is going to serve 1,000,000 users per minute or even 1,000,000,000 users per second. Google, Facebook, and LinkedIn didn’t start out serving billions of users, but they eventually had to. What did they do to build their systems to such scale? This article introduces ideas and questions that a business platform leader has to ask before they can even begin to think about scaling their platform.
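To make the gap between those first two figures concrete, they can be converted to a common per-second rate. This is a minimal Python sketch, not a capacity-planning method: the function name is illustrative, and it assumes traffic is spread perfectly evenly over the period, which real traffic rarely is.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_MINUTE = 60


def per_second(count: float, period_seconds: float) -> float:
    """Average arrival rate, assuming perfectly even traffic (a simplification)."""
    return count / period_seconds


small = per_second(1_000, SECONDS_PER_DAY)         # 1,000 users per day
large = per_second(1_000_000, SECONDS_PER_MINUTE)  # 1,000,000 users per minute

print(f"small platform: {small:.4f} users/second")
print(f"large platform: {large:,.0f} users/second")
print(f"ratio: {large / small:,.0f}x")
```

Under that even-traffic assumption, the second platform handles roughly 1.4 million times the load of the first, which is why the two can reasonably be designed very differently.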

 

 

Where does my business platform need to scale? Does it need to scale at all?

Business platforms are collections of systems and information processes that facilitate the business processes owned by users. They usually span several layers across different systems. Each layer raises questions that need to be answered before a system can go from zero to sixty. There are more layers, but we’re sticking to these eight for simplicity.

  1. User Interface – Is the user interface performant in the browser? Does it load fast?
  2. Interface Framework – Is the user interface framework robust enough for high speed or real-time processes?
  3. Software Architecture – Are there bottlenecks in the software logic? How can we measure where we’re slow?
  4. Software Framework – Are we using the right framework for what is needed?
  5. Data Model – Is the data model correct? Are we trying to cram an elephant into a shoebox?
  6. Data Technology – Is the data technology correct for what we are trying to do?
  7. System Infrastructure – Is our hardware going to be able to handle it?
  8. Network Infrastructure – Is our network good enough for what we want to do?

 

 

What are the next potential choices we can make in our business platform? What can we actually do?

Depending on the state of the current system, your team may have only a few immediate choices available. Not all applications

  1. User Interface – Static HTML or compiled JavaScript?
  2. Interface Framework – Should we use AngularJS, Angular, React, React Native, or Vue?
  3. Software Architecture – Should we use MVC, MVP, MVVM or Redux?
  4. Software Framework – Should we use .NET, Spring/Java, Play/Akka, Laravel, Symfony, Django, Flask, Ruby on Rails, Express, Meteor, Spark, Flink, Hadoop, or Purple Almonds*?
  5. Data Model – Should we store data as JSON, XML, as SQL Rows, or as flat files?
  6. Data Technology – Should we use PostgreSQL, MySQL, Cassandra / DataStax, Kafka / Confluent, MongoDB, Elasticsearch, Solr, or DSE Graph?
  7. System Infrastructure – Do we need bare metal, virtual machines, containers, or go serverless?
  8. Network Infrastructure – Are we going to host this in our closet, at a $3.95/month hosting company, in our own datacenter in a private cloud, or on AWS, Azure, or Google Cloud?

 

 

Where can I make my system faster? Let’s measure how fast we’re going first. 

If the business platform was built with a layered, modular architecture, then each layer can be monitored and measured. Why not just replace the components? Because if you don’t know how things are performing now, you can’t know whether a change is an improvement. The monitoring tools listed below also log errors and problems that you may otherwise have no visibility into.

  1. User Interface – You can use tools like Google Analytics or New Relic to learn how your user interface layer is performing. Free tools like Chrome Developer Tools or WebPageTest can show you a lot as well.
  2. Interface Framework – Some JavaScript frameworks can be optimized for production in several ways, such as changing how modules are loaded or minifying the JavaScript files. Webpack is great for this.
  3. Software Architecture – Some tools can tell you whether your “architecture” is doing well by measuring the speed of its different components. Both New Relic and AppDynamics have self-discovering systems that can detect how the components of your platform are wired together and monitor their speeds. You can also roll your own log aggregation and metrics system across all your components with tools like Grafana or Kibana backed by Prometheus or Elasticsearch.
  4. Software Framework – Most of the tools mentioned above also apply to measuring the software framework, but some integrate better than others. For example, if a tool like New Relic can monitor your PHP processes and show which methods are running slow and which are fast, that is better than home-growing your own logging system.
  5. Data Model – This is a little harder to measure, but there are data profiling tools. We made our own to visualize skew on distributed/partitioned/replicated databases like Cassandra. Some cloud providers can show data sizes and data growth.
  6. Data Technology – Most databases and data technologies come with their own tools, but Prometheus / Graphite / Grafana is a good combination. And yes, New Relic can help here. CloudWatch on Amazon is good too, but you have to be on Amazon Web Services.
  7. System Infrastructure – The usual suspects: CloudWatch for AWS infrastructure, New Relic for everything, or Nagios, which has been around for ages.
  8. Network Infrastructure – The same tools used for system infrastructure monitoring apply here.
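Before adopting any of these tools, it can help to see what "measuring a layer" amounts to at its simplest. The following is a minimal Python sketch using only the standard library, not how New Relic, Prometheus, or any other tool named above works internally. The layer names and the two stand-in functions (`query_database`, `render_page`) are hypothetical placeholders for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from statistics import quantiles

# Layer name -> list of observed durations in seconds.
timings = defaultdict(list)


@contextmanager
def measure(layer: str):
    """Record how long the wrapped block takes, filed under `layer`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[layer].append(time.perf_counter() - start)


def summarize(samples):
    """Return (median, 95th percentile) of a list of durations."""
    cuts = quantiles(samples, n=100, method="inclusive")  # 99 cut points
    return cuts[49], cuts[94]


# Hypothetical stand-ins for real work in two of the layers.
def query_database():
    time.sleep(0.002)


def render_page():
    time.sleep(0.001)


for _ in range(20):
    with measure("data_technology"):
        query_database()
    with measure("user_interface"):
        render_page()

for layer, samples in timings.items():
    p50, p95 = summarize(samples)
    print(f"{layer}: p50={p50 * 1000:.2f} ms, p95={p95 * 1000:.2f} ms "
          f"over {len(samples)} calls")
```

Reporting a median and a high percentile per layer, rather than a single average, makes it easier to see which layer has occasional slow calls. A production setup would ship these numbers to a metrics system instead of printing them.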

 

 

I want to make everything fast, so what do I do? What can SMACKK do for me? 

LAMP vs. SMACK Business Platforms
LAMP vs. SMACK: Image Courtesy of Mesosphere

 

 

Subsequent articles will show how these technologies I mentioned (Spark, Mesos, Akka, Cassandra, Kafka, and Kubernetes) can be used to speed up different parts of the platform. Here’s a preview of what’s to come.

  1. User Interface – How does real-time data platform technology shape how the User Interface is designed?
  2. Interface Framework – How can Kafka or Akka help me create real-time interactive user interfaces?
  3. Software Architecture – What kind of framework choices are needed to do scalable real-time user platforms? How can Spark or Akka help here?
  4. Software Framework – Can I use Node or Python, or do I have to use JVM-based (Spark, Akka) or CLR-based (Akka.NET) systems to do real-time?
  5. Data Model – How do I design data for a truly distributed and real-time data platform? What does my data need to look like in order for it to work across Spark, Akka, Cassandra, & Kafka?
  6. Data Technology – How are Kafka and Cassandra commonly used in real-time business platforms?
  7. System Infrastructure – Can I coordinate all this with Kubernetes or Mesos? Why Kubernetes? Why Mesos? Both?
  8. Network Infrastructure – Does Kubernetes take care of everything including the network?

 

 

In this article, we examined eight layers of real-time business platform systems, learned about some of the choices we can make at each layer, understood how to measure each layer’s performance, and then finally started to ask how these new technologies could help us at each layer. Building real-time business platforms is not a trivial task. Being a real-time company is hard and only two types of companies seem to be able to grasp it easily: 1. Startups that are free to make choices with all of their layers. 2. Large companies that are willing to sacrifice their sacred cows (old systems) and look to new technologies without looking back. Which one are you? You can email us and ask any questions about this article or any of the following articles.

  1. Part 1/6: Scaling Business Platform Performance with Spark, Mesos, Akka, Cassandra, Kafka, Kubernetes
  2. Part 2/6: Real-time Data Processing & Distributed Computation with Spark in Business Platforms
  3. Part 3/6: Reactive Business Platform Applications & Services with Akka
  4. Part 4/6: Resilient & Scalable Business Platform Database with Cassandra
  5. Part 5/6: Real-time Data Pipeline & Streaming Platform with Kafka
  6. Part 6/6: Scalable Business Platform Development & Data Operations with Kubernetes & Mesos

 

 

Photo by Dan Freeman on Unsplash