This is the second post in my little “Cassandra – Getting Started” series; it covers the installation and basic configuration of Cassandra. Cassandra is extremely easy to set up, especially compared to HBase. All you have to do is download, extract, edit a single XML file and run. But let us take it step by step.
You can download Cassandra directly from its (her?) website. At the time of writing, version 0.4.1 was the most recent stable release. Note that you need Java 6 to run Cassandra; I assume here that it is properly installed.
After extracting Cassandra to some folder (on my Windows box I placed it directly in D:\cassandra), the only file you need to edit is conf/storage-conf.xml. While Cassandra is engineered to run on a large number of machines in a network, we start it here as a single node with the default parameter set, so most of the settings are fine for now.
If you are not on a Unix-like system, you need to update the folders where Cassandra is supposed to store its data. If you are using Windows (like me), find the following lines in conf/storage-conf.xml and change the paths to something sensible:
<CommitLogDirectory>/var/lib/cassandra/commitlog</CommitLogDirectory>
<DataFileDirectories>
    <DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
</DataFileDirectories>
<CalloutLocation>/var/lib/cassandra/callouts</CalloutLocation>
<BootstrapFileDirectory>/var/lib/cassandra/bootstrap</BootstrapFileDirectory>
<StagingFileDirectory>/var/lib/cassandra/staging</StagingFileDirectory>
For example, these are my settings:
<CommitLogDirectory>D:/cassandra/data/commitlog</CommitLogDirectory>
<DataFileDirectories>
    <DataFileDirectory>D:/cassandra/data/data</DataFileDirectory>
</DataFileDirectories>
<CalloutLocation>D:/cassandra/data/callouts</CalloutLocation>
<BootstrapFileDirectory>D:/cassandra/data/bootstrap</BootstrapFileDirectory>
<StagingFileDirectory>D:/cassandra/data/staging</StagingFileDirectory>
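If you would rather script this change than edit the file by hand, the substitution can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function name relocate_storage_dirs is made up for this post, it works on the file's text rather than on a parsed XML tree (so comments and layout stay untouched), and it only knows the five path elements shown above.

```python
import re

# Element names taken from the storage-conf.xml excerpt in this post.
PATH_TAGS = (
    "CommitLogDirectory",
    "DataFileDirectory",
    "CalloutLocation",
    "BootstrapFileDirectory",
    "StagingFileDirectory",
)


def relocate_storage_dirs(xml_text, new_base, old_base="/var/lib/cassandra"):
    """Return xml_text with old_base replaced by new_base inside the path elements.

    Hypothetical helper: only the elements listed in PATH_TAGS are touched.
    """
    tags = "|".join(PATH_TAGS)
    pattern = re.compile(
        r"(<(%s)>\s*)%s(/[^<]*</\2>)" % (tags, re.escape(old_base))
    )
    # A function is used as the replacement so that backslashes in new_base
    # (for example D:\cassandra) are not interpreted as regex escapes.
    return pattern.sub(lambda m: m.group(1) + new_base + m.group(3), xml_text)


if __name__ == "__main__":
    sample = (
        "<CommitLogDirectory>/var/lib/cassandra/commitlog</CommitLogDirectory>\n"
        "<DataFileDirectories>\n"
        "    <DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>\n"
        "</DataFileDirectories>\n"
    )
    print(relocate_storage_dirs(sample, "D:/cassandra/data"))
```

To apply it to a real installation you would read conf/storage-conf.xml into a string, pass it through the function, and write the result back, preferably after making a backup copy of the original file.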
Let’s take Cassandra for a spin and check that she starts up correctly. On Mac OS, Linux, etc., simply change to Cassandra’s bin directory and run ./cassandra. As an aside for the impatient: I start Cassandra with sudo to avoid trouble with Cassandra’s system.log.
Windows users who use the plain command line (meaning not Cygwin), however, cannot start it just like that. cassandra.bat didn’t work for me on my Vista box when executed with bin as the current working directory (probably because the batch file then sets the CASSANDRA_HOME environment variable incorrectly). But it works perfectly if you call bin\cassandra.bat from Cassandra’s main directory, one level above bin. So if you are on Windows, change to the directory where you extracted Cassandra and execute bin\cassandra.bat.
Cassandra’s output on startup will look similar to this (here on Mac OS):
Schabbys-MacBook-Pro:bin johannes$ sudo ./cassandra
Schabbys-MacBook-Pro:bin johannes$ Listening for transport dt_socket at address: 8888
DEBUG - Loading settings from ./../conf/storage-conf.xml
DEBUG - Syncing log with a period of 1000
DEBUG - opening keyspace Keyspace1
DEBUG - adding Super1 as 0
DEBUG - adding Standard2 as 1
DEBUG - adding Standard1 as 2
DEBUG - adding StandardByUUID1 as 3
DEBUG - adding LocationInfo as 4
DEBUG - adding HintsColumnFamily as 5
DEBUG - opening keyspace system
DEBUG - INDEX LOAD TIME for /Users/johannes/cassandra/data/system/LocationInfo-1-Data.db: 0 ms.
DEBUG - INDEX LOAD TIME for /Users/johannes/cassandra/data/system/LocationInfo-2-Data.db: 0 ms.
DEBUG - INDEX LOAD TIME for /Users/johannes/cassandra/data/system/LocationInfo-3-Data.db: 0 ms.
INFO - Replaying /Users/johannes/cassandra/commitlog/CommitLog-1257980407451.log
DEBUG - Replaying /Users/johannes/cassandra/commitlog/CommitLog-1257980407451.log starting at 117
DEBUG - Reading mutation at 117
DEBUG - replaying mutation for system.L: {ColumnFamily(LocationInfo [Generation,])}
INFO - Flushing Memtable(LocationInfo)@228828460
DEBUG - Submitting LocationInfo for compaction
INFO - Completed flushing Memtable(LocationInfo)@228828460
INFO - Compacting [/Users/johannes/cassandra/data/system/LocationInfo-1-Data.db,/Users/johannes/cassandra/data/system/LocationInfo-2-Data.db,/Users/johannes/cassandra/data/system/LocationInfo-3-Data.db,/Users/johannes/cassandra/data/system/LocationInfo-4-Data.db]
DEBUG - index size for bloom filter calc for file : /Users/johannes/cassandra/data/system/LocationInfo-1-Data.db : 256
DEBUG - index size for bloom filter calc for file : /Users/johannes/cassandra/data/system/LocationInfo-2-Data.db : 512
DEBUG - index size for bloom filter calc for file : /Users/johannes/cassandra/data/system/LocationInfo-3-Data.db : 768
DEBUG - index size for bloom filter calc for file : /Users/johannes/cassandra/data/system/LocationInfo-4-Data.db : 1024
DEBUG - Expected bloom filter size : 1024
INFO - Compacted to /Users/johannes/cassandra/data/system/LocationInfo-5-Data.db. 0/255 bytes for 0/1 keys read/written. Time: 150ms.
DEBUG - collecting Generation:false:4@3
DEBUG - collecting Token:false:16@0
INFO - Saved Token found: 160533723849634883377008460059010504450
DEBUG - Starting to listen on 127.0.0.1:7001
DEBUG - Binding thrift service to localhost:9160
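The last log line says that the Thrift service is bound to localhost:9160. A quick way to confirm that the node really accepts connections is a plain TCP connect. The following is a minimal sketch, assuming the default port from the log above; the helper name port_is_open is made up for this post, and a successful connect only shows that something is listening, not that it is a healthy Cassandra.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


if __name__ == "__main__":
    # 9160 is the Thrift port reported in the startup log above.
    if port_is_open("localhost", 9160):
        print("Thrift port 9160 is accepting connections")
    else:
        print("Nothing is listening on localhost:9160")
```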
I think that’s it. Leave a comment if you run into trouble, or check the nice “If Something Goes Wrong” page in the Cassandra Wiki.

C:\cassandra>bin\cassandra.bat
C:\cassandra>t@REM
‘t@REM’ is not recognized as an internal or external command,
operable program or batch file.
Drive already SUBSTed
Starting Cassandra Server
Listening for transport dt_socket at address: 8888
DEBUG – Loading settings from C:\cassandra\conf\storage-conf.xml
DEBUG – Syncing log with a period of 1000
DEBUG – opening keyspace Keyspace1
DEBUG – adding Super1 as 0
DEBUG – adding Standard2 as 1
DEBUG – adding Standard1 as 2
DEBUG – adding StandardByUUID1 as 3
DEBUG – adding LocationInfo as 4
DEBUG – adding HintsColumnFamily as 5
DEBUG – Starting CFS Standard2
DEBUG – Starting CFS Super1
DEBUG – Starting CFS Standard1
DEBUG – Starting CFS StandardByUUID1
DEBUG – opening keyspace system
DEBUG – Starting CFS LocationInfo
DEBUG – INDEX LOAD TIME for C:\cassandra\data\data\system\LocationInfo-1-Data.db: 23 ms.
DEBUG – INDEX LOAD TIME for C:\cassandra\data\data\system\LocationInfo-2-Data.db: 2 ms.
DEBUG – Starting CFS HintsColumnFamily
DEBUG – collecting Generation:false:4@1
DEBUG – collecting Token:false:16@0
INFO – Saved Token found: 84884484173418133679406654443742525516
ERROR – Fatal exception in thread Thread[main,5,main]
java.lang.AssertionError: 0:0:0:0:0:0:0:1
    at org.apache.cassandra.net.EndPoint.<init>(EndPoint.java:64)
    at org.apache.cassandra.net.EndPoint.<init>(EndPoint.java:49)
    at org.apache.cassandra.service.StorageService.start(StorageService.java:275)
    at org.apache.cassandra.service.CassandraServer.start(CassandraServer.java:72)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:95)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:167)
INFO – LocationInfo has reached its threshold; switching in a fresh Memtable
INFO – Enqueuing flush of Memtable(LocationInfo)@18129670
INFO – Flushing Memtable(LocationInfo)@18129670
DEBUG – discard completed log segments for CommitLogContext(file='C:/cassandra/data/commitlog\CommitLog-1262985925433.log', position=253), column family 4. CFIDs are Keyspace1: TableMetadata(Standard2: 1, Super1: 0, Standard1: 2, StandardByUUID1: 3, }), system: TableMetadata(LocationInfo: 4, HintsColumnFamily: 5, }), }
DEBUG – Marking replay position 253 on commit log C:/cassandra/data/commitlog\CommitLog-1262985925433.log
INFO – Completed flushing C:\cassandra\data\data\system\LocationInfo-3-Data.db
hey,
I am trying to install Cassandra on a Windows machine. I followed your instructions and I am getting this:
“C:\cassandra>bin\cassandra.bat
Drive already SUBSTed
Starting Cassandra Server
The filename, directory name, or volume label syntax is incorrect.”
Can you tell me why I am getting this “filename, directory name, or volume label syntax is incorrect” error?
Thanks,
Aatish
Hey, I got it!
The reason I was getting the error “The filename, directory name, or volume label syntax is incorrect.” is that my JAVA_HOME had an unnecessary ‘;’ (a semicolon) in it. When I removed it, the error went away.
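A stray semicolon like this is easy to miss in the environment variable dialog. The check can be sketched as a small Python script; it is only an illustration, the function name java_home_problems is made up here, and it looks for just two things (a semicolon and surrounding whitespace) rather than validating the Java installation itself.

```python
import os


def java_home_problems(value):
    """Return a list of problems found in a JAVA_HOME string (empty list if none)."""
    if not value:
        return ["JAVA_HOME is empty or not set"]
    problems = []
    if value != value.strip():
        problems.append("leading or trailing whitespace")
    if ";" in value:
        # JAVA_HOME should name one directory; ';' is the PATH list separator on Windows.
        problems.append("contains ';'")
    return problems


if __name__ == "__main__":
    found = java_home_problems(os.environ.get("JAVA_HOME", ""))
    for problem in found:
        print("JAVA_HOME:", problem)
    if not found:
        print("JAVA_HOME looks fine")
```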
Hi, great that you got it sorted out!
Sorry for not answering sooner!
Johannes
Hi, I am getting this error when starting Cassandra on Windows…
Please help.
D:\cassandra>bin\cassandra
Drive already SUBSTed
The system cannot find the drive specified.
Starting Cassandra Server
Listening for transport dt_socket at address: 8888
Exception in thread “main” java.lang.NoClassDefFoundError: org/apache/cassandra/service/CassandraDaemon
I would like to share the detailed steps to set up Cassandra on Windows – check my blog at
“http://blog.csdn.net/goodxp”.
You can also see how to set up Ruby 1.9.1 with it there.
Dirty fix but working.
nice post! very useful!
Hi. Where is the promised example?
Here is a nice Cassandra client: http://github.com/rantav/hector
Hi, I’m waiting for the Java example too.
Can you email me if you publish something about it?
Is it Thrift? Something like… I describe my “schema” and then run Thrift, which provides the Java-related APIs?
bye
Worked like a charm on Vista 32bit. Thank you!
I just added 2 lines in cassandra.bat to set my env variables:
set JAVA_HOME=C:\Program Files\Java\jre6
set CASSANDRA_HOME=c:\sand\cassandra
Hi,
I need to use Cassandra in my application. The application has one table with a lot of data, and most of the application uses that table. Because of the amount of data, the application’s performance degrades while that table is in Oracle.
So I have decided to use Cassandra for that one table and keep all other tables in Oracle. A lot of business logic depends on that table.
Now my question is: can I use Cassandra for a table that carries a lot of business logic?
I am unable to implement a lot of WHERE clauses against Cassandra.
Is there any supporting tool for using Cassandra efficiently?
Please let me know…
I am in a hurry.
Thanks in advance
By Mallik
Hmm, hard to say. Cassandra may not be the silver bullet. Simple reads and writes are generally faster than on a comparable SQL database, but complicated queries need to be rewritten in your application logic, especially if you use higher-order operations such as joins, grouping, sorting, etc. And there is no tool that translates SQL queries into the corresponding application code. So you have to consider, for each SQL query, whether it is easy to migrate to Cassandra.
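To give an idea of what “rewritten in your application logic” means, here is a minimal sketch in Python. It does not use any Cassandra client: two plain dictionaries keyed by row key stand in for data fetched by key, and all names and values are invented for the illustration. The function does by hand what a SQL JOIN with GROUP BY and ORDER BY would do.

```python
from collections import defaultdict

# Hypothetical data: two "column families" modelled as dicts keyed by row key.
users = {
    "u1": {"name": "Ann", "city": "Berlin"},
    "u2": {"name": "Bob", "city": "Munich"},
}
orders = {
    "o1": {"user": "u1", "total": 20},
    "o2": {"user": "u1", "total": 5},
    "o3": {"user": "u2", "total": 7},
}


def totals_per_city(users, orders):
    """Sum order totals per user city, sorted by city name."""
    sums = defaultdict(int)
    for order in orders.values():
        user = users.get(order["user"])  # the "join": one key lookup per order
        if user is not None:
            sums[user["city"]] += order["total"]  # the "group by"
    return sorted(sums.items())  # the "order by"


if __name__ == "__main__":
    print(totals_per_city(users, orders))  # [('Berlin', 25), ('Munich', 7)]
```

Every query of this kind needs its own piece of code like this, which is why each SQL statement has to be looked at individually.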
The best article again! Do you know when will you come up with the third post providing several hands-on examples for Cassandra with Java? I am really looking forward to the third one as I may have to use it in my project soon.
Hi! Thanks for your comment! I am afraid I am a bit snowed under with work, so we will have to wait for the third part to get done. I am really sorry, because I was really looking forward to continuing to work on it :(
Johannes
I am getting this message:
The system cannot find the specified path. Any ideas?
Hi,
The XML file you mentioned is not in the conf directory of Cassandra, so this post is outdated. Can you please update it with the correct info? Otherwise it is not very useful as of now.