This post addresses Java developers who want to get their feet wet with Cassandra. It is the first in a series of three posts in which I describe Cassandra's data model as seen from the angle of a typical Java developer. By offering a Java-ish view of the data model, I hope to complement the existing descriptions of it.

The second post in this series will briefly describe how to install and configure Cassandra. The third post will provide several hands-on examples for Cassandra with Java.

I have been toying around with Cassandra for quite some time now. Of all the NoSQL databases I have seen (and there are quite a few already, as Michael pointed out to me earlier), Cassandra seems the most promising to me, for reasons that are definitely worth discussing but are beyond the scope of this post.

Data Model

Cassandra's data model has been described more than once. In contrast to the existing descriptions, I will try to follow a more Java-ish view, which I find easiest and most powerful to work with. I start by describing Cassandra's data model as nested hash maps.

The way data gets stored in key/value-based databases like Cassandra strongly resembles the use of ordinary hash maps. To recall, a hash map stores data under a (unique) key, and the same key is later used to retrieve the data from the hash map. For example, to map string keys to byte arrays you would write in Java

Map<String, byte[]> map = new HashMap<String, byte[]>();

This principle stays the same with Cassandra. However, in Cassandra you do not have a single hash map but up to three layers of nested hash maps! What does that mean? Imagine you don't store your values in a single byte array for each key, but again in a hash map, like

Map<String, Map<byte[], byte[]>> map = new HashMap<String, Map<byte[], byte[]>>();

This way you would partition the data you want to store as key/value pairs that are first filled in the data hash map. The data hash map then gets inserted in the higher-order hash map for a given key string. Similarly, to retrieve a value, you would provide the key string and get the data hash map from which you would extract the value you are interested in.
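To make this store-and-retrieve path concrete, here is a minimal runnable sketch of such a two-level map. It is a simplification, not Cassandra code: for readability it uses String keys in the inner map instead of byte arrays, and the names NestedMapDemo, put and get are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the two-level layout: outer key -> inner "data" map.
// Simplification: the inner map uses String keys instead of byte[].
class NestedMapDemo {

    // outer key (e.g. "schabby") -> inner data map (name -> value)
    static final Map<String, Map<String, byte[]>> store = new HashMap<>();

    // Store a value: find (or create) the inner map for the outer key, then put into it.
    static void put(String key, String name, byte[] value) {
        store.computeIfAbsent(key, k -> new HashMap<>()).put(name, value);
    }

    // Retrieve a value: look up the inner map first, then the value; null if absent.
    static byte[] get(String key, String name) {
        Map<String, byte[]> data = store.get(key);
        return data == null ? null : data.get(name);
    }

    public static void main(String[] args) {
        put("schabby", "nationality", "German".getBytes());
        System.out.println(new String(get("schabby", "nationality"))); // prints German
    }
}
```

computeIfAbsent creates the inner data map on first use, so a caller never has to check whether the outer entry exists before writing to it.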

Let us further assume that we don't want to store the key/value pairs as two individual values, but bundled together in a class called “Column”, so that our data model looks like this:

Map<String, Map<byte[], Column>> map = new HashMap<String, Map<byte[], Column>>();

Where Column is defined as:

class Column {
    byte[] name;
    byte[] value;
    long timestamp;

    public Column(byte[] name, byte[] value) {
         this.name = name;
         this.value = value;
         this.timestamp = System.currentTimeMillis();
    }
}

This is already pretty close to what is called a Column Family in Cassandra. Resist the temptation to read too much into the name “Column”; it is not a column of a relational table. Also ignore the timestamp, which Cassandra uses to resolve conflicting writes to the same column and which need not bother us here.
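Although we ignore the timestamp from here on, a tiny sketch shows what it is for. When two versions of the same column meet (say, on two replicas), Cassandra keeps the one with the higher timestamp, a rule often called “last write wins”. The class and the reconcile method below are hypothetical, and the timestamps are passed in explicitly instead of calling System.currentTimeMillis() so that the outcome is deterministic.

```java
// Hypothetical sketch of timestamp-based conflict resolution ("last write wins").
class LastWriteWinsDemo {

    static final class Column {
        final byte[] name;
        final byte[] value;
        final long timestamp;

        Column(byte[] name, byte[] value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Return the version with the newer timestamp. Ties keep 'a' in this sketch;
    // Cassandra has its own tie-breaking rule, which is not modeled here.
    static Column reconcile(Column a, Column b) {
        return b.timestamp > a.timestamp ? b : a;
    }

    public static void main(String[] args) {
        Column older = new Column("age".getBytes(), new byte[]{27}, 1000L);
        Column newer = new Column("age".getBytes(), new byte[]{28}, 2000L);
        System.out.println(reconcile(older, newer).value[0]); // prints 28
    }
}
```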

Before we go on, let us look at a concrete example of how you would work with this kind of data structure. Let us assume we want to store the profile data of a single user for some imaginary social networking website.

/* data model to store user profiles */
Map<String, Map<byte[], Column>> user = new HashMap<String, Map<byte[], Column>>();

/* create a user 'schabby' */
user.put("schabby", new HashMap<byte[], Column>());

/* fill in some profile data for user 'schabby' */
Column age = new Column("age".getBytes(), new byte[]{ 27 });
user.get("schabby").put(age.name, age);

Column realName = new Column("real name".getBytes(), "Johannes Schaback".getBytes());
user.get("schabby").put(realName.name, realName);

Column nationality = new Column("nationality".getBytes(), "German".getBytes());
user.get("schabby").put(nationality.name, nationality);

Again, do not get confused by the use of byte arrays where normal strings would make more sense. This is to resemble the Cassandra data model as closely as possible. You will later realize that it is actually quite nifty to keep the inner hash map byte-based, at the price of manually converting everything to byte arrays.
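Since the byte conversion is manual, it helps to wrap it in a few helpers. The sketch below is one possible way to do it, not part of any Cassandra API; the class and method names are made up. It pins the charset to UTF-8 instead of relying on the platform default that the no-argument String.getBytes() uses.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers for converting values to and from byte[].
class ByteConversions {

    // Strings: always use an explicit charset so encoding does not depend on the platform.
    static byte[] encodeString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decodeString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    // Longs: 8 bytes in big-endian order (ByteBuffer's default byte order).
    static byte[] encodeLong(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    static long decodeLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    public static void main(String[] args) {
        System.out.println(decodeString(encodeString("real name"))); // prints real name
        System.out.println(decodeLong(encodeLong(27L)));             // prints 27
    }
}
```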

If we want to retrieve values from our data structure, we would do the following:

byte userAge = user.get("schabby").get("age".getBytes()).value[0];
String userRealName = new String(user.get("schabby").get("real name".getBytes()).value);
String userNationality = new String(user.get("schabby").get("nationality".getBytes()).value);
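One caveat if you try these retrieval lines in plain Java: arrays inherit equals() and hashCode() from Object, so a HashMap keyed by byte[] compares keys by identity. Each lookup creates a fresh array that will not match the one used in put(), so get() returns null and the lookups would throw a NullPointerException. The small demo below (class and method names are illustrative) shows the problem and one common workaround, wrapping the keys in java.nio.ByteBuffer, whose equals() and hashCode() are based on the buffer's remaining content.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Demonstrates identity-based byte[] keys versus content-based ByteBuffer keys.
class ByteKeyDemo {

    // Look up "age" with a fresh array in a map keyed by raw byte[].
    static boolean rawArrayLookupFinds() {
        Map<byte[], byte[]> raw = new HashMap<>();
        raw.put("age".getBytes(), new byte[]{27});
        return raw.get("age".getBytes()) != null; // different array instance, so no match
    }

    // The same lookup with ByteBuffer-wrapped keys.
    static boolean wrappedLookupFinds() {
        Map<ByteBuffer, byte[]> wrapped = new HashMap<>();
        wrapped.put(ByteBuffer.wrap("age".getBytes()), new byte[]{27});
        return wrapped.get(ByteBuffer.wrap("age".getBytes())) != null;
    }

    public static void main(String[] args) {
        System.out.println(rawArrayLookupFinds()); // prints false
        System.out.println(wrappedLookupFinds());  // prints true
    }
}
```

If you go the ByteBuffer route, do not read from a buffer or otherwise move its position while it is used as a key, since that changes its equals() and hashCode().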

And this is it. There is not much more to understand conceptually in order to use Cassandra. So we are now ready to map this structure onto Cassandra terminology.

Column Family

Cassandra structures its data model in keyspaces, Column Families (CF), Columns and SuperColumns.

A keyspace is a namespace to group Column Families and can be compared to a schema or single database in the SQL world. A keyspace contains one or more Column Families.

A Column Family can be seen as a multidimensional hash map like the one in our example above. In the SQL analogy, you may see a Column Family as a single table that belongs to a schema; however, this comparison will not take you far. It is really more of a dynamically growing and shrinking hash map than a table with fixed columns. Still, in Cassandra's terminology you speak of rows when you refer to the hash map that you get for a key string.

Rows are accessed by string keys, and each row, which can be seen as a “data hash map”, has several columns. Each column within a row is a bundled pair of a byte array key (a.k.a. name) and its byte array data field (a.k.a. value), very similar to our example.
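Putting the layers together, a rough sketch of the whole hierarchy (keyspace, Column Family, row, column) might look like this. It is an illustration under simplifying assumptions: Strings stand in for byte arrays, timestamps are left out, and KeyspaceSketch, insert and getRow are hypothetical names. Note that rows in the same Column Family can carry completely different columns.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Rough sketch of one keyspace: column family -> row key -> column name -> value.
class KeyspaceSketch {

    // column family name -> (row key -> (column name -> column value))
    final Map<String, Map<String, SortedMap<String, String>>> columnFamilies = new HashMap<>();

    // Create the column family and row on first use, then set the column.
    void insert(String columnFamily, String rowKey, String columnName, String value) {
        columnFamilies
            .computeIfAbsent(columnFamily, cf -> new HashMap<>())
            .computeIfAbsent(rowKey, k -> new TreeMap<>())
            .put(columnName, value);
    }

    // Returns the whole row, or an empty map if the column family or row does not exist.
    SortedMap<String, String> getRow(String columnFamily, String rowKey) {
        return columnFamilies.getOrDefault(columnFamily, Map.of())
                             .getOrDefault(rowKey, new TreeMap<>());
    }

    public static void main(String[] args) {
        KeyspaceSketch keyspace = new KeyspaceSketch();
        keyspace.insert("Users", "schabby", "nationality", "German");
        keyspace.insert("Users", "schabby", "age", "27");
        keyspace.insert("Users", "merry", "email", "unknown");
        System.out.println(keyspace.getRow("Users", "schabby")); // prints {age=27, nationality=German}
    }
}
```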

Depending on your configuration, you can let Cassandra apply a sorting scheme to impose an order on the columns in a row. This lets you query ranges of columns. For example, imagine a telephone book from which you want to retrieve all names starting with “Smi”. In Java terms, this could be compared to using SortedMap instead of Map. We stuck to Map here for simplicity.
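Here is the telephone book example as a small sketch with java.util.TreeMap, a SortedMap implementation. The names and numbers are made up, and String keys stand in for byte arrays. Because the keys are sorted, “all names starting with Smi” becomes a range scan from the prefix up to the prefix followed by the largest char value.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Prefix query over sorted keys, sketched with a TreeMap.
class PhoneBookSlice {

    // All entries whose key starts with prefix: the half-open range from prefix
    // to prefix + Character.MAX_VALUE (valid as long as keys never contain that char).
    static SortedMap<String, String> withPrefix(SortedMap<String, String> book, String prefix) {
        return book.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        SortedMap<String, String> book = new TreeMap<>();
        book.put("Adams", "555-0100");
        book.put("Smith", "555-0101");
        book.put("Smithers", "555-0102");
        book.put("Smyth", "555-0103");
        System.out.println(withPrefix(book, "Smi").keySet()); // prints [Smith, Smithers]
    }
}
```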


The cool thing about Cassandra is its support for an additional hash map layer. This layer sits on top of the Column layer and enables you to store and access your data as a hash map in a hash map in a hash map, or in other words, as a three-dimensional hash map. This additional hash map is called a SuperColumn (SC).

In our Java-like example, a Column Family with SuperColumns looks like

Map<String, Map<byte[], SuperColumn>> columnFamily 
     = new HashMap<String, Map<byte[], SuperColumn>>();

where SuperColumn is again a hash map over columns like

class SuperColumn extends HashMap<byte[], Column> {}

Again, I want to point out that the actual SuperColumn definition in Cassandra is different and that this explanatory definition is not entirely accurate, but it serves the purpose of illustration nicely.
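For a version that actually behaves as intended in plain Java, the sketch below nests three maps (row key, SuperColumn name, column name) and uses TreeMaps with a byte-wise comparator. That comparator solves two problems at once: it sorts the byte[] keys, and it compares them by content rather than by identity. The class name, the unsigned byte ordering and the reliance on Arrays.compareUnsigned (available since Java 9) are my assumptions, not a description of Cassandra's internals.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch: row key -> SuperColumn name -> column name -> value.
// Assumption: keys are ordered by unsigned byte-wise comparison (Java 9+).
class SuperColumnSketch {

    static final Comparator<byte[]> BYTE_ORDER = Arrays::compareUnsigned;

    // row key -> (super column name -> (column name -> value))
    final Map<String, SortedMap<byte[], SortedMap<byte[], byte[]>>> columnFamily = new HashMap<>();

    void insert(String rowKey, String superColumn, String column, String value) {
        columnFamily
            .computeIfAbsent(rowKey, k -> new TreeMap<>(BYTE_ORDER))
            .computeIfAbsent(utf8(superColumn), sc -> new TreeMap<>(BYTE_ORDER))
            .put(utf8(column), utf8(value));
    }

    // Walk down the three levels; null if any level is missing.
    String get(String rowKey, String superColumn, String column) {
        SortedMap<byte[], SortedMap<byte[], byte[]>> row = columnFamily.get(rowKey);
        if (row == null) return null;
        SortedMap<byte[], byte[]> sc = row.get(utf8(superColumn));
        if (sc == null) return null;
        byte[] value = sc.get(utf8(column));
        return value == null ? null : new String(value, StandardCharsets.UTF_8);
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        SuperColumnSketch cf = new SuperColumnSketch();
        cf.insert("schabby", "friends", "friend_1", "Merry");
        cf.insert("schabby", "friends", "friend_2", "Robert");
        cf.insert("schabby", "inbox", "Welcome", "some message body");
        System.out.println(cf.get("schabby", "friends", "friend_2")); // prints Robert
    }
}
```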

Similar to normal Columns, the values within a SuperColumn are also stored in an order that depends on your configuration, which lets you cut out slices from your SuperColumns.
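A slice can be sketched on top of java.util.NavigableMap. The hypothetical slice method below returns the column names between start and finish (both inclusive), optionally in reverse order, and stops after count entries. The parameter set loosely mirrors the start/finish/reversed/count fields of Cassandra's slice ranges, but the code is only an illustration; in particular, subMap throws an IllegalArgumentException if start and finish are given in the wrong order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical slice over sorted column names.
class SliceSketch {

    static List<String> slice(NavigableMap<String, String> columns,
                              String start, String finish, boolean reversed, int count) {
        NavigableMap<String, String> range = reversed
            ? columns.subMap(finish, true, start, true).descendingMap() // start is the upper bound when reversed
            : columns.subMap(start, true, finish, true);
        List<String> names = new ArrayList<>();
        for (String name : range.keySet()) {
            if (names.size() == count) break; // stop after count entries
            names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> friends = new TreeMap<>();
        friends.put("friend_1", "Merry");
        friends.put("friend_2", "Robert");
        friends.put("friend_3", "Susan");
        System.out.println(slice(friends, "friend_1", "friend_2", false, 10)); // prints [friend_1, friend_2]
        System.out.println(slice(friends, "friend_3", "friend_1", true, 2));   // prints [friend_3, friend_2]
    }
}
```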

To continue our social networking site example, let us look at how SuperColumns can be used to store the friends and relations of the user ‘schabby’.

/* create ColumnFamily with SuperColumns */
Map<String, Map<byte[], SuperColumn>> columnFamily = new HashMap<String, Map<byte[], SuperColumn>>();

/* prepare a row for 'schabby' that will hold his SuperColumns */
columnFamily.put("schabby", new HashMap<byte[], SuperColumn>());

/* create SC to store friend info */
SuperColumn friends = new SuperColumn();

/* fill in some friends */
Column friend1 = new Column("friend_1".getBytes(), "Merry".getBytes());
friends.put(friend1.name, friend1);

Column friend2 = new Column("friend_2".getBytes(), "Robert".getBytes());
friends.put(friend2.name, friend2);

Column friend3 = new Column("friend_3".getBytes(), "Susan".getBytes());
friends.put(friend3.name, friend3);

/* finally store SC in Column Family */
columnFamily.get("schabby").put("friends".getBytes(), friends);

We are free to create another SuperColumn in the same Column Family to store other list-like data for ‘schabby’, for example his inbox.

/* ... continued example */

SuperColumn inbox = new SuperColumn();

/* add two mails to inbox */
Column mail1 = new Column("Hi Schabby".getBytes(), "I hope you are well! Cheers, Nick".getBytes());
inbox.put(mail1.name, mail1);

Column mail2 = new Column("Welcome".getBytes(), "some message body".getBytes());
inbox.put(mail2.name, mail2);

columnFamily.get("schabby").put("inbox".getBytes(), inbox);

Retrieving the mails from the inbox is straightforward:

/* continued example */

SuperColumn storedInbox = columnFamily.get("schabby").get("inbox".getBytes());

for (byte[] subject : storedInbox.keySet()) {
   String body = new String(storedInbox.get(subject).value);
   // do something with subject/body
}

And this is it. I hope this shed some light on Cassandra's data model. All in all it is not that difficult, especially once you start using it.

Please leave some comments for corrections and feedback.



Michael · November 7, 2009 at 1:56 pm

Great post! Looking forward to part 2!

Jonathan Ellis · November 8, 2009 at 3:43 pm

The one thing I would add is that the columns and supercolumns are all sorted by name — in java terms, SortedMaps — so you can also ask for “slices” of columns as well as accessing by name. This allows treating them as lists, as well as dictionaries.

    schabby · November 9, 2009 at 11:14 am

    Hi Jonathan, oh yes, thanks! I will add that!

Dravid · January 10, 2010 at 1:20 am

Is there a way in Cassandra to grab all keys and iterate over them to get their individual values, similar to

a. select * as we do in SQL, or
b. users.keySet();

Very, very nice post... a nice addition to the WTF post. Clarified things for me.

    schabby · January 16, 2010 at 10:34 pm

    Hi! Sorry for answering so late. The notification mails ended up in my spam folder. Sorry!

    As for your question: in the current state of development, you can only do range queries if you use an order-preserving partitioner (not the random partitioner, which is the default). If that's the case, you can check out the Thrift method get_key_range.

    Otherwise you need to keep track of your keys yourself, for example in a meta-CF. However, I hope that this feature will be implemented soon.


Aatish · January 11, 2010 at 8:32 pm

Really great post!

I am diligently following your posts and Cassandra overall.
I have left a comment for you on post 2. Please reply back.

Also, looking forward to your post 3.


Mehar Chaitanya · January 28, 2010 at 10:59 am

Hi, I am stuck on how to insert records into a keyspace in Cassandra. Can you compare it with MySQL, e.g. is a keyspace like a schema in MySQL?

I am from an SQL background and unable to understand this Cassandra Column Family.

How can I insert data into a column family like the one below?

UserList = {
  John: {
    username: "john",
    email: ""
  },
  Smith: {
    username: "ieure",
    email: "",
    age: "66"
  }
}

How can I do this?

ElangovanS · March 15, 2010 at 7:58 pm

Excellent article… explained in very lay(java)man terms. Appreciate it!

Mike · March 22, 2010 at 2:35 pm

Thanks for a wonderful post, I've been looking for such information. I will join your RSS feed now.

Nick · April 10, 2010 at 3:37 pm

Great Article.

Sagar · May 23, 2010 at 2:42 pm

Great post for a Java user starting with Cassandra, like me 🙂 Thanks!

Jonathan · October 15, 2010 at 5:40 am

Thanks for the great post; waiting for more detail in part 2.
I have been searching for a Java client for Cassandra for a long time…

Prakash · December 8, 2010 at 2:31 am

I am a beginner to Cassandra, please help me understand the following.

In your article you mentioned

Map<String, Map<byte[], Column>>
where :

String – is the Key
Map – is the collection of columns

So a super column should be

Map<String, Map<byte[], Map<byte[], Column>>>

String – is the key
byte[] – is the super column name
Map – is the collection of columns under the super column

Please correct me if I am wrong.

If my understanding is right, the SuperColumn class in your article should be

class SuperColumn {
    byte[] SuperColumnName;
    Map<byte[], Column> Columns;

    SuperColumn(byte[] p_SuperName, Map<byte[], Column> p_Columns) {
        this.SuperColumnName = p_SuperName;
        this.Columns = p_Columns;
    }
}


Please advise…

nicolae caralicea · December 9, 2010 at 4:01 am

Very nice post. It is what I was missing in order to see if Cassandra is what I am looking for. Straightforward, simply put, without noise, and in the developer's language. Thank you

TS75 · January 12, 2011 at 1:11 am

The best article ever! Do you know when will you come up with the third post providing several hands-on examples for Cassandra with Java? I am really looking forward to the third one as I may have to use it in my project soon.

Ekrem SABAN · March 25, 2011 at 1:25 pm


Nice posting! I ran into problems with the .toBytes() method, which does not seem to exist in Java 1.6 *-/ but the .getBytes() method works as well. However, I couldn't manage to make the content visible as String values.

I tried a hash map tutorial, but couldn't get around my problems. In the end, I replaced all byte[] objects with String and removed the getBytes() calls. Now I could see the contents of the hash map. 🙂

Cass Bud · April 11, 2011 at 3:41 pm

Hey Bud,
Very Impressive Comp Science Geek Blog which brings clarity.
Interested in writing Client/Query Lang for Cassandra ?

Another Comp Science Pal

    schabby · April 14, 2011 at 9:22 am

    Hi there,

    thanks for your kind comment! As for your inquiry, I am afraid I have to pass, although I would definitely love to contribute to a decent query language. Something close to JSON would probably make the most sense. I am too busy with my job at the moment to give the matter the time it deserves. But thanks for considering me!



donneo · May 5, 2011 at 11:56 am

Thanks for this great post, buddy!

Agito · June 4, 2011 at 8:03 pm

Thanks, it's the best explanation of the Cassandra data model I have found so far 🙂

Karthic · July 22, 2011 at 9:42 pm

Nice post. Was very helpful.

Soumendu · August 12, 2011 at 7:26 am

Thanks a ton for sharing such a lovely article. So easy to read and understand. Looking forward to the next one… Thanks once again!

Sunil Sodah · August 27, 2011 at 12:05 am

Excellent post. Very helpful.

alonso · February 8, 2012 at 12:40 pm

very nice post. Thx!!!

kitty · May 4, 2012 at 11:35 am

Very helpful post. Thanks a lot 🙂

Suku · June 6, 2012 at 4:55 pm

Nice Post…

Looking forward to more such tips. Thanks 🙂

Sergey · August 28, 2012 at 8:57 pm

great explanation for modeling OLAP storages in DFS.

Qiu Ping · September 20, 2012 at 5:48 pm

Your article is quite helpful for me on understanding the basic concepts of cassandra. thanks.

Tom · March 14, 2014 at 6:28 am

Very nice post! Exactly what I was looking for to get my feet wet with Cassandra. Thanks once again!

Ian · April 15, 2014 at 4:31 pm

Thanks, do you have an ETA for parts 2 and 3?

hari · June 12, 2014 at 1:49 am

This helps a lot. Thanks.
