Apache Cassandra Lunch #23: Lucene Based Indexes on Cassandra

In this blog, we will cover packaged and DIY methods for Lucene-based indexes on Cassandra, as well as some pros and cons of using them. The live webinar recording of Apache Cassandra Lunch #23 is also embedded below in case you were not able to attend live. If you would like to attend Apache Cassandra Lunch live, it is hosted every Wednesday at 12 PM EST. Register at this link now!

In Apache Cassandra Lunch #23, we covered Lucene-based indexes on Cassandra, including packaged and DIY methods and their pros and cons. If you want to watch the live version of Apache Cassandra Lunch #23, which includes a more in-depth discussion, you can find it embedded below. Also, check out the rest of the Cassandra Lunches you may have missed on our YouTube page linked here. Don’t forget to subscribe while you’re there so you can keep up to date with all of the upcoming Cassandra Lunches as well as our other content!

Packaged

DIY

  • Event -> CQRS -> Cassandra + Index
    • Write
      • Event / Command goes into an Event Source repository (Kafka, SQL table, etc.)
      • Command Processor processes it into CQL / Elasticsearch, Solr, or Amazon … Algolia
    • Request
      • Event / Command goes into an Event Source repository (Kafka, SQL table, etc.)
      • Command Processor goes to the index, finds the data, goes to Cassandra, gets the data, and returns it.
      • Query the index
  • Cassandra -> Batch -> Index
    • Writes to Cassandra
    • Every now and then – Index to ???
    • Cassandra + Spark
  • Cassandra Triggers -> Index
  • Cassandra CDC -> Index
    • CDC -> Kafka Connect -> Lucene Index (Elastic/Solr/etc.)
    • CDC -> Kafka -> Indexer / Kafka Consumer -> Lucene Index (Elastic/Solr/etc.)
  • Serverless Function -> Cassandra + Index
  • Apache NiFi (Lucene) -> NiFi Processor -> Cassandra + Index
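
To make the first DIY pattern (Event -> CQRS -> Cassandra + Index) more concrete, here is a minimal sketch of the write and request paths. Everything in it is a hypothetical stand-in: `InMemoryTable` plays the role of a Cassandra table, `InMemoryIndex` plays the role of a Lucene-backed index such as Elasticsearch or Solr, and a plain list stands in for the event source (Kafka, SQL table, etc.). It is meant to show the flow of data, not how any of those systems are actually called.

```python
from collections import defaultdict


class InMemoryTable:
    """Hypothetical stand-in for a Cassandra table keyed by primary key."""

    def __init__(self):
        self.rows = {}

    def upsert(self, key, row):
        self.rows[key] = row

    def get(self, key):
        return self.rows.get(key)


class InMemoryIndex:
    """Hypothetical stand-in for a Lucene-backed index (Elasticsearch, Solr, etc.)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of primary keys

    def index(self, key, text):
        # Naive whitespace tokenizer; a real index would run an analyzer here.
        # Note: re-indexing a changed row does not remove its old terms.
        for term in text.lower().split():
            self.postings[term].add(key)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


class CommandProcessor:
    """Applies write events to both stores and serves requests via the index."""

    def __init__(self, table, index):
        self.table = table
        self.index = index

    def handle_write(self, event):
        # Write path: the same event is applied to the table and to the index.
        self.table.upsert(event["id"], event)
        self.index.index(event["id"], event["body"])

    def handle_request(self, term):
        # Request path: find keys in the index, then fetch the rows by key.
        keys = self.index.search(term)
        return [self.table.get(key) for key in keys]


event_log = []  # stand-in for the event source repository
processor = CommandProcessor(InMemoryTable(), InMemoryIndex())

for event in [
    {"id": 1, "body": "Lucene based indexes on Cassandra"},
    {"id": 2, "body": "Kafka Connect and CDC"},
    {"id": 3, "body": "Solr and Lucene search"},
]:
    event_log.append(event)
    processor.handle_write(event)

results = processor.handle_request("lucene")
print([row["id"] for row in results])
```

Because the table and the index are written separately, a failure between the two writes leaves them out of sync, which is where the consistency concern listed under Cons comes from.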

Pros

  • extremely rich search capabilities
  • geospatial
  • synonym
  • fuzzy search
  • typo tolerance
  • stemming
  • packaged elastic/Solr/Lucene -> shorter latency
  • separate index -> better separation of concerns and speed
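
As a rough illustration of what stemming and typo-tolerant (fuzzy) matching mean in the list above, the sketch below uses only the Python standard library. It is not how Lucene implements either feature: the `stem` function is a deliberately naive suffix stripper, `fuzzy_terms` relies on `difflib` similarity rather than Lucene's edit-distance queries, and the `vocabulary` list is a made-up set of indexed terms.

```python
import difflib


def stem(word):
    """Naive suffix stripping so related word forms map to one term."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def fuzzy_terms(query, vocabulary, cutoff=0.8):
    """Return indexed terms that closely resemble the (possibly misspelled) query."""
    return difflib.get_close_matches(query.lower(), vocabulary, n=3, cutoff=cutoff)


# Hypothetical set of terms already present in the index.
vocabulary = ["cassandra", "kafka", "solr", "index"]

print(fuzzy_terms("casandra", vocabulary))
print({stem(w) for w in ["indexing", "indexed", "indexes"]})
```

A query for the misspelled "casandra" still resolves to the indexed term, and the three word forms collapse to a single stem, which is the kind of matching a plain CQL equality lookup does not offer.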

Cons

  • pure Lucene -> reinventing the wheel of what Solr/Elasticsearch already provide
  • external Elastic/Solr/?? -> longer latency between finding the data and getting the data
  • packaged elastic/Solr/Lucene -> don’t expect it to solve all your problems
  • consistency issues (if DIY)
  • Lucene is memory heavy
  • Lucene is disk heavy
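
One way to keep an eye on the consistency issue in a DIY setup is a periodic reconciliation pass that compares the keys stored in Cassandra with the keys present in the index. The sketch below assumes both key sets have already been collected by some other means (how they are read from Cassandra or from the index is not shown), and the function name `find_drift` is hypothetical.

```python
def find_drift(table_keys, index_keys):
    """Compare primary keys in the table with document keys in the index."""
    table_keys = set(table_keys)
    index_keys = set(index_keys)
    return {
        # Rows written to the table that never reached the index.
        "missing_from_index": sorted(table_keys - index_keys),
        # Index entries whose source row no longer exists in the table.
        "orphaned_in_index": sorted(index_keys - table_keys),
    }


drift = find_drift(table_keys=[1, 2, 3], index_keys=[2, 3, 4])
print(drift)
```

Keys reported as missing could be re-indexed and orphaned keys deleted from the index. Note that this only catches missing or extra documents, not documents whose indexed content is stale.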

As mentioned above, the live recording of Apache Cassandra Lunch #23: Lucene Based Indexes on Cassandra is embedded below. Also, check out our YouTube page for more videos and the Cassandra Lunch playlist here! If you want to attend Cassandra Lunch live, it is hosted weekly on Wednesdays at 12 PM EST. You can register at this link now!

Additional Resources

Cassandra.Link

Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was not only to fill the gap left by Planet Cassandra, but also to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.

We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!