This post is for Java developers who want to get their feet wet with Cassandra. It is the first in a series of three in which I describe Cassandra's data model from the angle of a typical Java developer. By contributing a Java-ish view of the data model, I hope to complement the existing descriptions.
The second post in this series will briefly describe how to install and configure Cassandra. The third post will provide several hands-on examples for Cassandra with Java.
I have been toying around with Cassandra for quite some time now. Of all the NoSQL databases I have seen (and there are quite a few already, as Michael pointed out to me earlier), Cassandra seems to me the most promising one, for reasons that are definitely worth discussing but are beyond the scope of this post.
Data Model
Cassandra's data model has been described more than once. In contrast to those descriptions, I will follow a more Java-ish view, which I find the easiest and most powerful to work with. I start by describing Cassandra's data model as nested hash maps.
The way in which data gets stored in key/value based databases like Cassandra strongly resembles the use of ordinary hash maps. To recall, hash maps store data for a (unique) key. The same key is later used to retrieve the data from the hash map. For example, in order to map string keys to byte arrays you would write in Java
Map<String, byte[]> map = new HashMap<String, byte[]>();
This principle stays the same with Cassandra. However, in Cassandra you do not have a single hash map but up to three layers of nested hash maps! What does that mean? Imagine you don't store your values in a single byte array for each key, but again in a hash map, like
Map<String, Map<byte[], byte[]>> map = new HashMap<String, Map<byte[], byte[]>>();
This way you partition the data you want to store into key/value pairs that are first put into the data hash map. The data hash map is then inserted into the higher-order hash map under a given key string. Similarly, to retrieve a value, you provide the key string, get the data hash map, and extract from it the value you are interested in.
Let us further assume that we don't want to store the key/value pairs as two individual values, but coupled in a class called “Column” so that our data model would look like this:
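The store-and-retrieve steps just described can be sketched as a small runnable class. The class name `NestedMapSketch`, the helpers `put`/`get`, and the use of `ByteBuffer` instead of `byte[]` for the inner keys are assumptions of this sketch, not anything Cassandra prescribes. Java arrays are compared by identity, so a plain `byte[]` key could not be found again with a freshly created array.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class NestedMapSketch {
    // Outer map: key string -> data hash map. Inner map: name -> value.
    // ByteBuffer keys are an assumption of this sketch; they compare by content.
    static final Map<String, Map<ByteBuffer, byte[]>> store = new HashMap<>();

    static ByteBuffer key(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    /** Puts a key/value pair into the data hash map stored under rowKey. */
    static void put(String rowKey, String name, String value) {
        store.computeIfAbsent(rowKey, k -> new HashMap<>())
             .put(key(name), value.getBytes(StandardCharsets.UTF_8));
    }

    /** Fetches the data hash map for rowKey, then extracts the value; null if absent. */
    static String get(String rowKey, String name) {
        Map<ByteBuffer, byte[]> data = store.get(rowKey);
        if (data == null) {
            return null;
        }
        byte[] value = data.get(key(name));
        return value == null ? null : new String(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        put("row1", "color", "blue");
        System.out.println(get("row1", "color"));
        System.out.println(get("row1", "missing"));
    }
}
```

The two-step lookup in `get` mirrors the description: first the key string selects the data hash map, then the name selects the value inside it.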
Map<String, Map<byte[], Column>> map = new HashMap<String, Map<byte[], Column>>();
Where Column is defined as:
class Column {
    byte[] name;
    byte[] value;
    long timestamp;

    public Column(byte[] name, byte[] value) {
        this.name = name;
        this.value = value;
        // creation time in milliseconds
        this.timestamp = System.currentTimeMillis();
    }
}
This is already pretty close to what is called a Column Family in Cassandra. Resist the temptation to read too much into the name “Column”. Also ignore the timestamp, which Cassandra uses to avoid data inconsistency and which need not bother us here.
Before we go on, let us look at a concrete example of how you would work with this kind of data structure. Let us assume we want to store the profile data of a single user for some imaginary social networking website.
/* data model to store user profiles */
Map<String, Map<byte[], Column>> user = new HashMap<String, Map<byte[], Column>>();

/* create a user 'schabby' */
user.put("schabby", new HashMap<byte[], Column>());

/* fill in some profile data for user 'schabby' */
Column age = new Column("age".getBytes(), new byte[]{ 27 });
user.get("schabby").put(age.name, age);

Column realName = new Column("real name".getBytes(), "Johannes Schaback".getBytes());
user.get("schabby").put(realName.name, realName);

Column nationality = new Column("nationality".getBytes(), "German".getBytes());
user.get("schabby").put(nationality.name, nationality);
Again, do not get confused by the use of byte arrays where normal strings would make more sense. This is to resemble the Cassandra data model as closely as possible. You will later realize that it’s actually quite nifty to keep the inner hash map byte-based, at the price of manually converting everything to byte arrays. One caveat if you actually run this code: Java compares byte[] keys by identity, not by content, so treat the lookups here as conceptual.
If we want to retrieve values from our data structure, we would do the following:
byte age = user.get("schabby").get("age".getBytes()).value[0];
String realName = new String(user.get("schabby").get("real name".getBytes()).value);
String nationality = new String(user.get("schabby").get("nationality".getBytes()).value);
And that is it. There is not much more conceptual stuff to understand in order to use Cassandra, so we are now ready to map this structure onto Cassandra terminology.
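For readers who want to run the profile example, here is a minimal sketch under one assumption of mine: the inner map is keyed by `ByteBuffer.wrap(name)` rather than by the raw `byte[]`, because Java arrays do not implement content-based `equals`/`hashCode`. The names `ProfileSketch`, `buildUsers`, `column` and `byteArrayLookupMisses` are hypothetical helpers for illustration only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class ProfileSketch {

    /** Same shape as the Column class in the post. */
    static final class Column {
        final byte[] name;
        final byte[] value;
        final long timestamp;

        Column(byte[] name, byte[] value) {
            this.name = name;
            this.value = value;
            this.timestamp = System.currentTimeMillis();
        }
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Builds the row for 'schabby'; ByteBuffer keys are this sketch's assumption. */
    static Map<String, Map<ByteBuffer, Column>> buildUsers() {
        Map<String, Map<ByteBuffer, Column>> user = new HashMap<>();
        Map<ByteBuffer, Column> row = new HashMap<>();
        user.put("schabby", row);

        Column age = new Column(bytes("age"), new byte[] { 27 });
        row.put(ByteBuffer.wrap(age.name), age);
        Column realName = new Column(bytes("real name"), bytes("Johannes Schaback"));
        row.put(ByteBuffer.wrap(realName.name), realName);
        return user;
    }

    /** Looks a column up by the content of its name, not by array identity. */
    static Column column(Map<String, Map<ByteBuffer, Column>> user, String rowKey, String name) {
        return user.get(rowKey).get(ByteBuffer.wrap(bytes(name)));
    }

    /** Demonstrates the pitfall: a fresh byte[] never equals the stored key. */
    static boolean byteArrayLookupMisses() {
        Map<byte[], String> m = new HashMap<>();
        m.put(bytes("age"), "27");
        return m.get(bytes("age")) == null;
    }

    public static void main(String[] args) {
        Map<String, Map<ByteBuffer, Column>> user = buildUsers();
        System.out.println(column(user, "schabby", "age").value[0]);
        System.out.println(new String(column(user, "schabby", "real name").value, StandardCharsets.UTF_8));
        System.out.println(byteArrayLookupMisses());
    }
}
```

The data stays byte-based as in the post; only the key wrapper changes, so the conceptual model is untouched.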
Column Family
Cassandra structures its data model in keyspaces, Column Families (CF), Columns and SuperColumns.
A keyspace is a namespace to group Column Families and can be compared to a schema or single database in the SQL world. A keyspace contains one or more Column Families.
A Column Family can be seen as a multidimensional hash map like the one in our example above. In the SQL analogy, you may see a Column Family as a single table that belongs to a schema, but this comparison will not take you far. It is really a dynamically growing and shrinking hash map rather than a table with fixed columns. Still, in Cassandra's terminology you speak of rows when you refer to the hash map that you get for a key string.
Rows are accessed by string keys, and each row, which can be seen as a “data hash map”, has several columns. Each column within a row is a bundled pair of a byte array key (a.k.a. name) and its byte array data field (a.k.a. value), very similar to our example.
Depending on your configuration, you can let Cassandra apply a sorting scheme to impose an order on the columns in a row. This enables range queries over columns. For example, imagine a telephone book from which you want to retrieve all names starting with “Smi”. In Java terms, this is comparable to using SortedMap instead of Map, but we stuck to Map here for simplicity.
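To make the SortedMap comparison concrete, here is a small sketch of the telephone book query in plain Java. It only illustrates the idea of a range query over sorted keys with `TreeMap.subMap`; it is not Cassandra's API. The names `PhoneBookSketch`, `upperBound` and `withPrefix` and the sample entries are made up for the illustration.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class PhoneBookSketch {

    /**
     * Returns an exclusive upper bound for all strings starting with prefix,
     * e.g. "Smi" -> "Smj". Assumes a non-empty prefix whose last character
     * is not Character.MAX_VALUE.
     */
    static String upperBound(String prefix) {
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    /** View of all entries whose key starts with prefix (lower bound inclusive). */
    static SortedMap<String, String> withPrefix(SortedMap<String, String> book, String prefix) {
        return book.subMap(prefix, upperBound(prefix));
    }

    public static void main(String[] args) {
        SortedMap<String, String> book = new TreeMap<>();
        book.put("Jones", "555-0100");
        book.put("Smiley", "555-0101");
        book.put("Smith", "555-0102");
        book.put("Smythe", "555-0103");

        // Iterates over Smiley and Smith in sorted order; Jones and Smythe fall outside the range.
        for (String name : withPrefix(book, "Smi").keySet()) {
            System.out.println(name);
        }
    }
}
```

Because the keys are kept in order, the range is located without scanning every entry, which is the property that makes slicing over sorted columns attractive.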
SuperColumns
The cool thing about Cassandra is its support for an additional hash map layer. This additional layer is added to the Column layer and enables you to store and access your data as a hash map in a hash map in a hash map, or in other words, as a three-dimensional hash map. This additional hash map is called a SuperColumn (SC).
In our Java-like example, a Column Family with SuperColumns looks like
Map<String, Map<byte[], SuperColumn>> superColumn = new HashMap<String, Map<byte[], SuperColumn>>();
where SuperColumn is again a hash map over columns like
class SuperColumn extends HashMap<byte[], Column> { }
Again, I want to point out that the actual SuperColumn definition in Cassandra is different and that this explanatory definition is not too accurate, but nicely serves the illustration purpose.
Similar to normal Columns, the values within a SuperColumn are also stored in an order that depends on your configuration, which lets you cut slices out of your SuperColumns.
To continue our social networking site example, let us look at how SuperColumns are used to store the friends and relations of the user ‘schabby’.
/* create ColumnFamily with SuperColumns */
Map<String, Map<byte[], SuperColumn>> columnFamily = new HashMap<String, Map<byte[], SuperColumn>>();

/* prepare the SuperColumn map for 'schabby' */
columnFamily.put("schabby", new HashMap<byte[], SuperColumn>());

/* create SC to store friend info */
SuperColumn friends = new SuperColumn();

/* fill in some friends */
Column friend1 = new Column("friend_1".getBytes(), "Merry".getBytes());
friends.put(friend1.name, friend1);

Column friend2 = new Column("friend_2".getBytes(), "Robert".getBytes());
friends.put(friend2.name, friend2);

Column friend3 = new Column("friend_3".getBytes(), "Susan".getBytes());
friends.put(friend3.name, friend3);

/* finally store SC in Column Family */
columnFamily.get("schabby").put("friends".getBytes(), friends);
We are free to create another SuperColumn in the same Column Family to store other list-like data for ‘schabby’, for example his inbox.
/* ... continued example */
SuperColumn inbox = new SuperColumn();

/* add two mails to inbox */
Column mail1 = new Column("Hi Schabby".getBytes(), "I hope you are well! Cheers, Nick".getBytes());
inbox.put(mail1.name, mail1);

Column mail2 = new Column("Welcome".getBytes(), "some message body".getBytes());
inbox.put(mail2.name, mail2);

columnFamily.get("schabby").put("inbox".getBytes(), inbox);
Retrieving the mails from the inbox is straightforward:
/* continued example */
SuperColumn storedInbox = columnFamily.get("schabby").get("inbox".getBytes());
for (byte[] subject : storedInbox.keySet()) {
    // each entry is a Column; its value holds the mail body
    String body = new String(storedInbox.get(subject).value);
    // do something with subject/body
}
And that is it. I hope this improved your understanding of Cassandra's data model. It’s not that difficult all in all, especially once you start using it.
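Putting the three layers together, the following sketch stores the friends list and cuts a slice out of it. It rests on assumptions of mine: names are wrapped in `ByteBuffer` so that lookups compare by content, the innermost map is a `TreeMap` to mimic sorted columns, and the class name `SuperColumnSketch` and the methods `put`/`slice` are hypothetical. It models the idea in plain Java and does not talk to Cassandra.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class SuperColumnSketch {
    // row key -> super column name -> (sorted) column name -> value
    final Map<String, Map<ByteBuffer, NavigableMap<ByteBuffer, byte[]>>> columnFamily = new HashMap<>();

    static ByteBuffer key(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    /** Stores a value, creating the row and the super column on demand. */
    void put(String row, String superColumn, String name, String value) {
        columnFamily.computeIfAbsent(row, r -> new HashMap<>())
                    .computeIfAbsent(key(superColumn), sc -> new TreeMap<>())
                    .put(key(name), value.getBytes(StandardCharsets.UTF_8));
    }

    /** Values of all columns named from..to (both inclusive, from <= to), in name order. */
    List<String> slice(String row, String superColumn, String from, String to) {
        List<String> values = new ArrayList<>();
        Map<ByteBuffer, NavigableMap<ByteBuffer, byte[]>> superColumns = columnFamily.get(row);
        if (superColumns == null) {
            return values;
        }
        NavigableMap<ByteBuffer, byte[]> columns = superColumns.get(key(superColumn));
        if (columns == null) {
            return values;
        }
        for (byte[] v : columns.subMap(key(from), true, key(to), true).values()) {
            values.add(new String(v, StandardCharsets.UTF_8));
        }
        return values;
    }

    public static void main(String[] args) {
        SuperColumnSketch cf = new SuperColumnSketch();
        cf.put("schabby", "friends", "friend_1", "Merry");
        cf.put("schabby", "friends", "friend_2", "Robert");
        cf.put("schabby", "friends", "friend_3", "Susan");
        System.out.println(cf.slice("schabby", "friends", "friend_1", "friend_2"));
    }
}
```

The slice works because the innermost map keeps its column names sorted; with an unsorted HashMap you could only fetch columns one by one by name.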
Please leave some comments for corrections and feedback.

Great post! Looking forward to part 2!
The one thing I would add is that the columns and supercolumns are all sorted by name — in java terms, SortedMaps — so you can also ask for “slices” of columns as well as accessing by name. This allows treating them as lists, as well as dictionaries.
Hi Jonathan, oh yes, thanks! I will add that!
is there a way in cassandra to grab all keys and iterate over them to get their individual values. similar to
a. select * that we do in sqls or
b. users.keySet();
Very very nice post .. nice addition to wtf post. Clarified things for me
Really great post!
I am diligently following your posts and Cassandra overall.
I have left a comment for you on post 2. Please reply back.
Also, looking forward for your post 3.
Thanks
Hi! Sorry for answering so late. The notification mails ended up in my spam folder. Sorry!
As for your question: with the current state of development, you can only do range queries if you use an order-preserving partitioner (not the random partitioner, which is the default). If that's the case, you can check out the Thrift method get_key_range.
Otherwise you need to keep track of your keys yourself, for example in a meta-CF. However, I hope that this feature will be implemented soon.
Johannes
Hi, I am stuck on how to insert records into a keyspace in Cassandra. Can you compare it with MySQL, for example a keyspace being like a schema in MySQL?
I am from an SQL background and unable to understand this Cassandra Column Family.
How can I insert data into a column family like the one below?
UserList = {
    John: {
        username: "john",
        email: "john@blah.com",
    },
    Smith: {
        username: "ieure",
        email: "ieure@example.com",
        age: "66",
    }
}
How can I do this?
Excellent article… explained very lay(java)man terms. appreciate it!
Thanks for a wonderful post, l ve been looking for such information, I will join jour rss feed now.
Great Article.
Great post for a Java user starting with Cassandra, like me
Thanks!
Thanks for great post, wait more detail in chapter 2.
I have long time search java client for cassandra…
I am a beginner with Cassandra, please help me understand the following.
In your article you mentioned
Map<String, Map>
where :
String – is the Key
Map – is the collection of columns
So a super column should be
Map<String,Map<superbyte[],<Map>>>
where:
String – is the key
superbyte[] – is the super column name
Map – Is the collection of columns under supercolumn
Please correct me if I am wrong.
If my understanding is right, the SuperColumn class in your article should be
class SuperColumn
{
    byte[] SuperColumnName;
    Map Columns;
    SuperColumn(byte[] p_SuperName, Map p_Columns)
    {
        this.SuperColumnName = p_SuperName;
        this.Columns = p_Columns;
    }
}
Please advise….
Very nice post. It is what I missed in order to see if Cassandra is what I am looking for. Straightforward, simple put, without noise, and on the developer’s language. Thank you
The best article ever! Do you know when will you come up with the third post providing several hands-on examples for Cassandra with Java? I am really looking forward to the third one as I may have to use it in my project soon.
Hello!
Nice posting! I run into problems with the .toBytes() method that seems not to exist under Java 1.6. *-/ But using the .getBytes() method goes also. But making the content as visible as a String value was something that I couldn’t manage.
I tried a hash map tutorial, but couldn’t get around my problems. At the end, I replaced all byte[] objects to String and removed the getBytes() calls. Now, I could see the contents of the hash map.
Hey Bud,
Very Impressive Comp Science Jeek Blog which brings clarity.
Interested in writing Client/Query Lang for Cassandra ?
Another Comp Science Pal
Hi there,
thanks for your kind comment! As for your inquiry I am afraid I have to pass, although I definitely would love to contribute to a decent query lang. Something close to JSON would make most sense probably. I am too much involved in my job atm so that I would not have the time to spend enough time on the matter like it deserves. But thanks for considering me though!
Cheers
Johannes
Thanks for this great post, buddy!
Thnx, its yet best explanation of cassandra data model I found
Nice post.. Was very helpful.
Thanks a ton for sharing such a lovely article. So easy to read and understand. Looking forward to the next one… Thanks once again!
Excellent post. Very helpful.