
Apache Cassandra Lunch #59: Functions in Cassandra

In Apache Cassandra Lunch #59: Functions in Cassandra, we discussed the functions that can be used inside of the Cassandra database. The live recording of Cassandra Lunch, which includes a more in-depth discussion and a demo, is embedded below in case you were not able to attend live. If you would like to attend Apache Cassandra Lunch live, it is hosted every Wednesday at 12 PM EST. Register here now!

Types of Functions in Cassandra

Functions in Cassandra transform one or more column values into a new value. We can split them into four categories along two distinctions. The first distinction is between Cassandra native functions and user-defined functions (UDFs): native functions are built into Cassandra, while UDFs are created by the user. The second distinction is between functions and aggregations. Functions act on individual rows, taking in one or more columns to produce a single result column. Aggregations run a function over each of the rows in a user's selection and then combine those results into a single value. There are Cassandra native aggregations, such as count, min, max, sum, and avg, as well as user-defined aggregations. Functions can be used in INSERT, UPDATE, and SELECT statements.
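The difference between a function and an aggregation can be sketched in plain Python. This is a toy model, not Cassandra code: the rows are hypothetical dictionaries, and the aggregate follows the shape of a user-defined aggregate, where a state function is applied to each row starting from an initial state and an optional final function turns the last state into the result.

```python
def apply_function(fn, rows, *columns):
    """Function: produce one result value for each row."""
    return [fn(*(row[c] for c in columns)) for row in rows]


def apply_aggregate(state_fn, initial_state, rows, column, final_fn=None):
    """Aggregation: fold a state function over the rows into a single value."""
    state = initial_state
    for row in rows:
        state = state_fn(state, row[column])
    return final_fn(state) if final_fn is not None else state


# Hypothetical rows selected from a table with a 'price' column.
rows = [{"price": 10}, {"price": 20}, {"price": 30}]

# A function: double each price, one output per row.
doubled = apply_function(lambda p: p * 2, rows, "price")


# An aggregation: average price, one output for the whole selection.
def avg_state(state, value):
    count, total = state
    return (count + 1, total + value)


def avg_final(state):
    count, total = state
    return total / count if count else None


average = apply_aggregate(avg_state, (0, 0), rows, "price", avg_final)
print(doubled)  # [20, 40, 60]
print(average)  # 20.0
```

Keeping a (count, total) pair as the state, rather than a running average, is what lets the final function compute an exact average at the end.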

Native Functions in Cassandra

Cassandra’s built-in functions include:

  • Blob conversion functions
  • UUID and Timeuuid functions
  • Token function
  • WRITETIME function
  • TTL function

Blob Conversion

In Cassandra, the blob type stores arbitrary bytes; CQL displays blob values, and accepts blob literals, as hexadecimal strings prefixed with 0x. Cassandra has built-in functions for converting any of its native types into blobs and back. These functions take the form typeAsBlob(value) and blobAsType(value), where type is replaced with the native type being converted to or from; for example, intAsBlob(42) returns 0x0000002a. Blobs are normally used to store small to medium-sized binary data, such as images and audio.
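To make the typeAsBlob and blobAsType pairs more concrete, the Python sketch below mimics a few of them for the int, bigint, and text types, assuming the standard CQL encodings (big-endian two's complement integers and UTF-8 text). The Python function names are hypothetical stand-ins for intAsBlob, blobAsInt, bigintAsBlob, textAsBlob, and blobAsText; results are shown as 0x-prefixed hex, the way cqlsh displays blobs.

```python
import struct


def _hex_body(blob):
    """Strip the 0x prefix that CQL uses for blob literals."""
    return blob[2:] if blob.lower().startswith("0x") else blob


def int_as_blob(value):
    # CQL int: 32-bit signed integer, big-endian.
    return "0x" + struct.pack(">i", value).hex()


def blob_as_int(blob):
    return struct.unpack(">i", bytes.fromhex(_hex_body(blob)))[0]


def bigint_as_blob(value):
    # CQL bigint: 64-bit signed integer, big-endian.
    return "0x" + struct.pack(">q", value).hex()


def text_as_blob(value):
    # CQL text: UTF-8 encoded bytes.
    return "0x" + value.encode("utf-8").hex()


def blob_as_text(blob):
    return bytes.fromhex(_hex_body(blob)).decode("utf-8")


print(int_as_blob(42))            # 0x0000002a
print(bigint_as_blob(42))         # 0x000000000000002a
print(text_as_blob("hi"))         # 0x6869
print(blob_as_int("0xffffffff"))  # -1
```

The same value produces blobs of different lengths depending on its type, which is why the blob-to-type function has to name the type it expects.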

UUID, Timeuuid, and Time Conversion

Cassandra's native time and UUID functions overlap somewhat because of the timeuuid type: a timeuuid is a version 1 (time-based) UUID, so part of its bits encode a timestamp. The uuid() function generates a random (version 4) UUID. The now() function generates a new, unique timeuuid for the current date and time. The dateOf(value) and unixTimestampOf(value) functions extract the time from a timeuuid, returning it as a timestamp or as milliseconds since the Unix epoch, respectively; newer versions of Cassandra deprecate them in favor of toTimestamp(value) and toUnixTimestamp(value). The minTimeuuid(date) and maxTimeuuid(date) functions return the smallest and largest possible timeuuid for a given date. Their results are not unique, so they should not be inserted as keys, but they are useful for selecting rows whose timeuuid falls between two dates. Cassandra also includes functions such as toDate and toTimestamp for converting between timeuuids, dates, and timestamps.
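The relationship between a timeuuid and its embedded time can be illustrated with Python's standard uuid module, which also works with version 1 UUIDs. The helper names below are hypothetical, and min_timeuuid and max_timeuuid here simply use the lowest and highest clock sequence and node values. Cassandra uses its own fixed bit patterns for those lower bits, so the exact UUIDs will differ from what the CQL functions return.

```python
import uuid

# 100-nanosecond intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_from_millis(millis, clock_seq=0, node=0):
    """Build a version 1 UUID whose embedded time is `millis` ms after the Unix epoch."""
    ts = millis * 10_000 + UUID_EPOCH_OFFSET  # 60-bit count of 100 ns intervals
    return uuid.UUID(fields=(
        ts & 0xFFFFFFFF,                   # time_low
        (ts >> 32) & 0xFFFF,               # time_mid
        ((ts >> 48) & 0x0FFF) | 0x1000,    # time_hi, with version 1 in the top 4 bits
        ((clock_seq >> 8) & 0x3F) | 0x80,  # clock_seq_hi, with the RFC 4122 variant
        clock_seq & 0xFF,                  # clock_seq_low
        node,                              # 48-bit node
    ))


def unix_millis_of(timeuuid):
    """Recover milliseconds since the Unix epoch, like unixTimestampOf/toUnixTimestamp."""
    return (timeuuid.time - UUID_EPOCH_OFFSET) // 10_000


def min_timeuuid(millis):
    # Lowest clock sequence and node for this instant (illustrative bit pattern only).
    return timeuuid_from_millis(millis, clock_seq=0, node=0)


def max_timeuuid(millis):
    # Highest clock sequence and node for this instant (illustrative bit pattern only).
    return timeuuid_from_millis(millis, clock_seq=0x3FFF, node=0xFFFFFFFFFFFF)


ms = 1_600_000_000_000
print(unix_millis_of(timeuuid_from_millis(ms)) == ms)  # True
print(unix_millis_of(uuid.uuid1()))  # roughly the current time in milliseconds
```

Because every UUID built for the same instant lies between min_timeuuid and max_timeuuid, those two bounds can bracket a time range without knowing the random-looking lower bits of the stored values.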

Token function

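The token() function returns the token that the cluster's partitioner computes for a row's partition key; Cassandra uses this token to decide which nodes store the row. It works with every partitioner, but the type of the result depends on which one is in use: the default Murmur3Partitioner produces a bigint, the RandomPartitioner a varint, and the ByteOrderedPartitioner a blob. With the default partitioner, tokens are hashes, so rows are ordered by token rather than by partition key value, and token(k) > token(42) does not imply k > 42. The ByteOrderedPartitioner instead orders rows by the raw bytes of the key, so token order matches key order there. With any partitioner, token() is the standard way to scan a table's partitions range by range, for example by paging with WHERE token(id) > token(?).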
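The idea of a token deciding which node stores a row can be sketched as a lookup on a sorted ring. This Python sketch assumes a hypothetical three-node cluster using the Murmur3Partitioner's signed 64-bit token range, one token per node, and no replication. It does not compute real tokens; it only shows how a token such as the one returned by token() maps to the node that owns it.

```python
import bisect

# Murmur3Partitioner tokens are signed 64-bit integers.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def owner_of(token, ring):
    """Return the node that owns `token`.

    `ring` is a list of (node_token, node_name) pairs sorted by token. Each node
    owns the range (previous node's token, its own token], and tokens above the
    highest node token wrap around to the node with the lowest token.
    """
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]


# Hypothetical three-node ring with evenly spaced tokens.
ring = [
    (MIN_TOKEN, "node-a"),
    (-3074457345618258603, "node-b"),
    (3074457345618258602, "node-c"),
]

print(owner_of(0, ring))          # node-c
print(owner_of(MAX_TOKEN, ring))  # node-a (wraps around the ring)
```

Real clusters usually assign many tokens per node (virtual nodes) and place additional replicas on the nodes that follow on the ring, but the lookup for the primary owner follows the same pattern.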

Writetime

The WRITETIME function returns the timestamp at which a particular cell was written, expressed in microseconds since the Unix epoch. It can be applied to regular columns but not to primary key columns.
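Because WRITETIME returns a raw microsecond count, it is often converted to a readable date on the client side. The Python sketch below assumes a hypothetical value returned by a query such as SELECT WRITETIME(some_column) FROM some_table; the helper name is also hypothetical.

```python
from datetime import datetime, timezone


def writetime_to_datetime(micros):
    """Convert a WRITETIME value (microseconds since the Unix epoch) to a UTC datetime."""
    # Integer division avoids the rounding errors of dividing by 1e6 as a float.
    seconds, remainder = divmod(micros, 1_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=remainder)


print(writetime_to_datetime(1_600_000_000_123_456))
# 2020-09-13 12:26:40.123456+00:00
```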

TTL

The TTL function returns the remaining time to live, in seconds, of a particular cell, or null if the cell has no TTL. Cassandra uses this value to expire data: once the TTL has passed, Cassandra treats the cell as deleted and marks it with a tombstone. After gc_grace_seconds, which defaults to 10 days, the tombstone becomes eligible for removal, and Cassandra purges it during a subsequent compaction.
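The expiry timeline described above is simple arithmetic. The sketch below assumes times are Unix seconds and that the table uses the default gc_grace_seconds of 864,000 (10 days); the function names are hypothetical. remaining_ttl approximates what the TTL function reports for a cell, and expiry_timeline gives the earliest point at which a compaction could purge the expired data.

```python
# Default gc_grace_seconds for a table: 864,000 seconds (10 days).
DEFAULT_GC_GRACE_SECONDS = 864_000


def remaining_ttl(written_at, ttl, now):
    """Seconds left before the cell expires, or None once it has expired."""
    remaining = written_at + ttl - now
    return remaining if remaining > 0 else None


def expiry_timeline(written_at, ttl, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Return (expires_at, earliest_purge_at) as Unix times in seconds."""
    expires_at = written_at + ttl
    # The expired cell acts as a tombstone until gc_grace_seconds have passed;
    # only after that can a compaction drop it.
    return expires_at, expires_at + gc_grace_seconds


# Hypothetical cell written at Unix time 1,600,000,000 with USING TTL 3600 (one hour).
print(remaining_ttl(1_600_000_000, 3600, now=1_600_001_000))  # 2600
print(expiry_timeline(1_600_000_000, 3600))  # (1600003600, 1600867600)
```

The purge time is only the earliest possible one: the data actually disappears from disk when a compaction that covers it runs after that point.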

User Defined Functions in Cassandra

User-defined functions allow Cassandra to execute user-provided code. Out of the box, Cassandra supports writing functions in Java and JavaScript, although UDFs are disabled by default and must be enabled in cassandra.yaml. Support for other languages can be added by placing the appropriate JAR files on the classpath. These languages run through existing JVM integrations: Python works via Jython and Ruby via JRuby, while Scala already runs on the JVM.

UDFs exist at the keyspace level as part of the Cassandra schema, so they replicate to every node in the cluster along with the keyspace and table definitions and other schema objects such as user-defined types. Functions can take and return user-defined types, collections, and tuples. The CREATE FUNCTION statement accepts either OR REPLACE, which replaces an existing function with the same signature, or IF NOT EXISTS, which leaves an existing function in place; without either clause, the statement fails if a function with that signature already exists.
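The effect of the OR REPLACE and IF NOT EXISTS clauses can be modelled with a small in-memory registry. This Python class is purely illustrative and not part of any Cassandra API or driver; it assumes, as CQL does, that a function is identified by its name together with its argument types, so functions with the same name but different argument types can coexist.

```python
class KeyspaceFunctions:
    """Toy model of the functions defined in one keyspace, keyed by signature."""

    def __init__(self):
        self._functions = {}

    def create_function(self, name, arg_types, body, or_replace=False, if_not_exists=False):
        """Mimic CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS]; return True if stored."""
        if or_replace and if_not_exists:
            raise ValueError("OR REPLACE and IF NOT EXISTS cannot be used together")
        signature = (name, tuple(arg_types))
        if signature in self._functions:
            if if_not_exists:
                return False  # keep the existing function; the statement is a no-op
            if not or_replace:
                raise ValueError(f"function {name}{tuple(arg_types)} already exists")
        self._functions[signature] = body
        return True

    def body_of(self, name, arg_types):
        return self._functions[(name, tuple(arg_types))]


ks = KeyspaceFunctions()
ks.create_function("double_it", ["int"], "return x * 2;")
ks.create_function("double_it", ["int"], "return x + x;", if_not_exists=True)  # no-op
print(ks.body_of("double_it", ["int"]))  # return x * 2;
ks.create_function("double_it", ["int"], "return x + x;", or_replace=True)  # replaced
print(ks.body_of("double_it", ["int"]))  # return x + x;
ks.create_function("double_it", ["bigint"], "return x * 2;")  # separate overload
```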

Cassandra.Link

Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was not only to fill the gap left by Planet Cassandra but also to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.

We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!