Cassandra is an alternative to MySQL, Oracle, or other database managers for workloads with very large numbers of queries. More and more organizations use it, but it is a recent product that has not yet been tested against the full variety of uses it may meet. Setbacks can be expected when adapting it to a new application.
It is suited to fully distributed, highly scalable databases, and therefore to the heavy traffic of sites such as Facebook (which developed it), with millions of queries per hour.
The distributed model stores information across many different servers. There is no central master: the nodes act as peers, and any of them can accept a request.
Written in Java, it integrates more easily into a server environment built on that language.
Cassandra is based on the non-relational BigTable data model, created by Google and used for the index of its search engine, combined with the distribution design of Dynamo, the storage system from Amazon.
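To give an idea of how a Dynamo-style design can spread rows over servers without a lookup table, here is a minimal Python sketch of consistent hashing. The node names, the HashRing class and the choice of MD5 are assumptions made for illustration; this is not Cassandra's actual partitioner code.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring: each node owns the arc that ends at its token."""

    def __init__(self, nodes):
        # Sort nodes by token so that lookups can use binary search.
        self._ring = sorted((self._token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    @staticmethod
    def _token(value):
        # Hypothetical choice: MD5 digest read as one large integer.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # First node whose token is >= the key's token, wrapping past the end.
        i = bisect.bisect_left(self._tokens, self._token(key))
        return self._ring[i % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")  # always the same node for the same key
```

With this layout, adding a server only takes over keys from one neighbour on the ring; the other keys stay where they were.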
It was open sourced by Facebook in 2008 and then taken up by the Apache Software Foundation.
The name Cassandra comes from Greek mythology: she is a princess who has the power to predict the future but is fated never to be believed. The logo, a woman's eye, refers to the idea of vision. Presumably the developers expected not to be believed about the future of this system.
SQL or not SQL?
Cassandra is part of NoSQL, the movement that aims to simplify databases by removing the relational aspect.
Tables no longer have a predetermined, fixed schema (even one that can be altered later); they can grow horizontally (in columns) as well as vertically (in rows, that is, records).
NoSQL actually means Not Only SQL, so the name is not a rejection of the query language. Cassandra itself is queried with CQL, a language whose syntax resembles SQL.
Cassandra vs. MySQL
Cassandra is schemaless and has no tables in the relational sense. The number of columns can vary from one row to another. MariaDB (an alternative to MySQL) has implemented dynamic columns that allow the same thing, but using them means leaving standard SQL commands behind.
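As a rough mental model of rows whose columns vary, the sketch below uses plain Python dictionaries in place of real storage. The row keys, column names and the columns_of helper are made up for the example; none of this is Cassandra's API.

```python
# Each row key maps to its own set of column name -> value pairs.
users = {
    "alice": {"email": "alice@example.com", "city": "Paris"},
    "bob": {"email": "bob@example.com", "twitter": "@bob", "age": "31"},
}


def columns_of(table, row_key):
    """Return the sorted column names stored for one row."""
    return sorted(table[row_key])


# Adding a column to one row does not touch any other row.
users["alice"]["phone"] = "555-0100"
```

Here "alice" ends up with three columns and "bob" with three different ones, with no shared column list declared anywhere.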
Here is a benchmark comparison provided by Apache:
- Writing: MySQL: 300 ms. Cassandra: 0.12 ms.
- Reading: MySQL: 350 ms. Cassandra: 15 ms.
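To put these figures in proportion, the short Python sketch below computes the ratio between the two systems for each operation. It only reuses the four numbers quoted above; the dictionary layout and the speedup function are illustrative assumptions.

```python
# Latencies in milliseconds, copied from the figures quoted above.
latency_ms = {
    "write": {"mysql": 300.0, "cassandra": 0.12},
    "read": {"mysql": 350.0, "cassandra": 15.0},
}


def speedup(operation):
    """How many times lower the Cassandra latency is for one operation."""
    figures = latency_ms[operation]
    return figures["mysql"] / figures["cassandra"]


write_ratio = round(speedup("write"))    # about 2500
read_ratio = round(speedup("read"), 1)   # about 23.3
```

On these figures the gap is far wider for writes (about 2500 times) than for reads (about 23 times).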
Differences in features:
- Number of columns: MySQL: 4096. Cassandra: 2 billion.
Cassandra is less reliable than MySQL, and it has a smaller community and therefore less support. Fewer tools exist to help run it, such as graphical interfaces or managers like phpMyAdmin.
Facebook originated Cassandra, although the project was later handed over to Apache. Facebook has since replaced it with HBase, which runs on Hadoop.
See Software powering Facebook.
Twitter does not so far use Cassandra to store tweets, because that would require rewriting its system, but it does use it for statistical data and geolocation.
Complaining of slowness in MySQL, Digg decided to move its data management entirely to Cassandra.
See Why Digg replaced MySQL.
Netflix, the streaming TV company, prefers to forgo the benefits of a relational database in exchange for the scalability of Cassandra.