Cover image for Data Engineer's Lunch #26

Data Engineer’s Lunch #26: Akka Actors for Data Processing

In Data Engineer’s Lunch #26, we will discuss how to use Akka Actors for concurrent data processing operations. The live recording of the Data Engineer’s Lunch, which includes a more in-depth discussion, is also embedded below in case you were not able to attend live. If you would like to attend a Data Engineer’s Lunch live, it is hosted every Monday at noon EST. Register here now!

The Actor Model

The Actor Model was proposed by Carl Hewitt in 1973 as a way to “handle parallel processing in a high performance network”. At the time of the inception of this model, the described environment did not exist. However, as time has passed, current computers and processors have far exceeded anything Hewitt could have possibly imagined.

In Computer Science, the actor model is a mathematical model of concurrent computation that treats “actor” as the universal primitive of concurrent computation. In response to a message it receives, an actor can do a few things:

  • Make local decisions
  • Create more actors
  • Send more messages
  • Decide how to respond to future messages

One very important thing about actors is that they may change their own private state, but can only affect other actors (indirectly) through sending messages. The actor model can also be used as a framework for modeling and understanding a wide range of concurrent systems: I.E. Email can be modeled as an actor system with accounts as actors and email addresses as actor addresses.

Akka and the Actor Model

Akka is a free, open source toolkit and runtime for making concurrent and distributed applications on JVM. Akka’s approach to concurrency is based on the Actor model mentioned above. Akka is written in Scala but can be used with either Java or Scala (as both Java and Scala compile to JVM code). Besides the ability to easily create and interact between actors that Akka provides, there are also plenty of modules in Akka which are built to work alongside Akka’s Actor system:

  • Akka HTTP: A full server and client-side HTTP stack on top of Akka-actor and Akka-stream
  • Alpakka: Reactive Enterprise Integration library for Java and Scala, based on Reactive Streams and Akka
  • Akka Streams: Library in Akka, “used to deal with processing a sequence of elements using bounded buffer space”
  • More example Akka modules and libraries can be found in the documentation:

Example problem that Akka can resolve

One large motivation for creating Akka and a problem it resolves in the OOP model is a problem with encapsulation

  • Objects can only guarantee encapsulation (protection of invariants) in the face of single-threaded access.
  • In multi threaded programs, multiple threads can be inside the same method at the same time.
    • This can cause corruption of state and nothing is guaranteed about what happens to the method/object that multiple threads are accessing and modifying at the same time.
  • One way around the above mentioned issue is locks. (Restricting access to all other threads if a thread is inside a particular object/method).
    • Without sufficient locks, state may still get corrupted
    • With a lot of locks in place, performance suffers significantly and this can also easily lead to deadlocks. (Set of locks active in such a way that the program cannot continue doing anything)

With Akka Actors, an actor reacts to messages sequentially and one at a time. Since there is always at most one message being processed per actor, the invariants of an actor can be kept without synchronization. This happens automatically without locks in Akka.

Akka’s Classic Actors vs Typed Actors

Prior to Akka’s current version, Akka’s actors used to be untyped (now referred to as Classic Actors).

  • “Classic” actors are untyped. Actors are defined which accept messages of type Any in Scala and then pattern matching is used to determine what messages that particular actor will actually process.
    • Wrong messages can be sent to actors with no explicit error thrown by Akka itself
  • A lot of existing examples of Akka programs are still written with classic actors

Akka’s current version of actors is referred to as Typed Actors. When making an actor, their behavior is typed which tells you what kinds of messages a particular actor can accept. Additionally, references to actors (known as ActorRef) are also typed. This feature helps a user send only particular types of messages to a given actor.

Although using Akka’s Typed actors is now recommended, software which currently uses classic Actors is still supported and the two types of actors can exist in the same software at the same time. This is referred to as “coexistence” in Akka documentation. More information on this can be found in the docs: https://doc.akka.io/docs/akka/current/typed/coexisting.html

Demo Project

Link to Github repo for the demo: https://github.com/Anant/example-akka-actors-for-data-processing
Technology used in the project:

  • Scala 2.13.1
  • Akka 2.6.14
  • Compiled with SBT 1.4.7

The following diagram shows how data and messages move throughout the demo project, how the three unique actors are designed and how they interact with each other.

Diagram describing the demo application, including all unique actors and messages sent between them

A brief description of each actor follows:

  • CounterMain (ActorSystem for the program)
    • On First initiation, create a CounterAggregator.
      • Message to receive: Line containing words to count (InitializeMessage case class)
    • Upon receiving a message, make a CounterBot. Send it a message
      • Message to send: Line containing words to count, and an ActorRef to the CounterAggregator to report results to. (LineMessage case class)
  • CounterBot (Actor which does the data processing: counting the number of words in a line of text in this example)
    • On first initiation, do nothing (get ready to receive messages)
      • Message to receive: Line containing words to count, along with a CounterAggregator to report to. (LineMessage case class)
    • Count the words in the received line.
      • Message to send: Message with results from counting to the CounterAggregator it was linked before. (CountMessage case class)
      • Then change behavior to Behaviors.stopped (voluntarily shut down the actor)
  • CounterAggregator (Singular actor which aggregates results from CounterBot’s)
    • On first initiation: Do nothing.
      • Message to receive: Message with results of the counting process of a line from the initial file (CountMessage case class)
    • Upon receiving message, add to the counter value (stored as a private class variable). Print out the counter value.
      • Message to send: N/A

References

Cassandra.Link

Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was to not only fill the gap of Planet Cassandra but to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.

We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!