Cassandra is a popular distributed NoSQL database, known for its scalability, speed, and robustness. The steps documented in this post are basic, and you should tune them for a production-grade cluster; they are, however, good enough to stand up a small cluster quickly and explore Cassandra's capabilities.
Basic Cluster Configuration:
Step 1: Setting up on a single node.
Replace the download URL with your closest mirror.
Here is a sample command for version 2.0.5; it downloads the archive, extracts it, and renames the folder:
wget http://mirrors.gigenet.com/apache/cassandra/2.0.5/apache-cassandra-2.0.5-bin.tar.gz && tar xvzf apache-cassandra-2.0.5-bin.tar.gz && mv apache-cassandra-2.0.5 cassandra25_node1_dc1
Step 2 (Optional): Edit the configuration to change the following settings to suit your environment.
conf/cassandra.yaml
data_file_directories:
- /home/cassandra/data
commitlog_directory: /home/cassandra/data/commitlog
saved_caches_directory: /home/cassandra/saved_caches
conf/log4j-server.properties
log4j.appender.R.File: /home/cassandra/system.log
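Cassandra needs to be able to write to the directories configured above. Here is a minimal Python sketch that creates the same layout under a base directory and reports whether each path is writable. The helper name prepare_dirs is hypothetical, and a temporary directory stands in for /home/cassandra so the sketch can be run safely anywhere.

```python
import os
import tempfile


def prepare_dirs(base):
    """Create the data, commitlog and saved_caches directories under base.

    Hypothetical helper: it mirrors the layout used in this post and
    returns a mapping of cassandra.yaml setting name to path.
    """
    paths = {
        "data_file_directories": os.path.join(base, "data"),
        "commitlog_directory": os.path.join(base, "data", "commitlog"),
        "saved_caches_directory": os.path.join(base, "saved_caches"),
    }
    for path in paths.values():
        # exist_ok=True makes the call safe to repeat on an existing layout.
        os.makedirs(path, exist_ok=True)
    return paths


if __name__ == "__main__":
    # Assumption: a temporary directory replaces /home/cassandra for the demo.
    with tempfile.TemporaryDirectory() as base:
        for setting, path in prepare_dirs(base).items():
            print(setting, "->", path, "writable:", os.access(path, os.W_OK))
```

To use it for real, pass the base directory you configured (for example /home/cassandra) and run it as the user that will start Cassandra.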
Repeat Steps 1 and 2 on another machine (physical or virtual).
At this point we have a basic setup configured, and you should be able to launch each node independently with the command below (-f keeps Cassandra in the foreground). The nodes are not yet clustered, however, and cannot communicate with each other.
./bin/cassandra -f
Step 3: Cluster the nodes
We need to make a few more changes to the configuration file to let the nodes form a cluster.
conf/cassandra.yaml
cluster_name - Provide a logical name for your cluster; it must be the same on every node. E.g.
cluster_name: 'hari_cassandra_ring'
seeds - For a Cassandra node to join a cluster, it has to know about at least one other node in the datacenter; this is called a "seed" node in the Cassandra config file. The value can be a comma-separated list of servers. The documentation suggests avoiding a chicken-and-egg reference when defining the seed node:
http://wiki.apache.org/cassandra/GettingStarted
E.g.
seeds: "192.168.0.119"
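Since the seeds value is a single comma-separated string, it is easy to introduce a typo when listing several servers. The sketch below builds the value from a list and rejects anything that is not a valid IP address. The function name build_seeds is hypothetical, and restricting the input to IP addresses is an assumption of this sketch (it would reject hostnames).

```python
import ipaddress


def build_seeds(addresses):
    """Return the comma-separated string for the seeds setting.

    Hypothetical helper; it accepts IP addresses only (an assumption made
    for this sketch).
    """
    if not addresses:
        raise ValueError("at least one seed address is required")
    # ipaddress.ip_address raises ValueError for anything that is not an IP.
    cleaned = [str(ipaddress.ip_address(a.strip())) for a in addresses]
    # dict.fromkeys drops duplicates while preserving the original order.
    return ",".join(dict.fromkeys(cleaned))


print('seeds: "%s"' % build_seeds(["192.168.0.119"]))
```

Using the same seed list on every node keeps the configuration easy to reason about.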
listen_address - This should be a private address that other nodes connect to for inter-node communication. For a simple configuration we can set it to the IP address or hostname of the node.
listen_address: 192.168.0.108
rpc_address - This is the RPC interface that clients connect to. For a basic configuration we will set it to the same value as listen_address.
rpc_address: 192.168.0.108
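Before starting the nodes it can help to compare the settings side by side. Here is a rough sketch that flags a few common mistakes: a cluster_name that differs between nodes, a listen_address shared by two nodes, and a listen_address set to a loopback address. The function check_cluster_config and the dictionary layout are hypothetical; the values are typed in by hand here rather than read from cassandra.yaml.

```python
LOOPBACK = {"localhost", "127.0.0.1"}


def check_cluster_config(nodes):
    """Return a list of human-readable problems found in the node settings.

    Hypothetical helper: each node is a dict with the keys cluster_name,
    listen_address and rpc_address.
    """
    problems = []
    names = sorted({n["cluster_name"] for n in nodes})
    if len(names) > 1:
        problems.append("cluster_name differs across nodes: " + ", ".join(names))
    addresses = [n["listen_address"] for n in nodes]
    if len(set(addresses)) != len(addresses):
        problems.append("listen_address is used by more than one node")
    for n in nodes:
        if n["listen_address"] in LOOPBACK:
            problems.append("listen_address is a loopback address: " + n["listen_address"])
    return problems


# Values taken from the examples in this post.
nodes = [
    {"cluster_name": "hari_cassandra_ring",
     "listen_address": "192.168.0.108", "rpc_address": "192.168.0.108"},
    {"cluster_name": "hari_cassandra_ring",
     "listen_address": "192.168.0.119", "rpc_address": "192.168.0.119"},
]
print(check_cluster_config(nodes) or "no problems found")
```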
initial_token - This is another important aspect of cluster configuration and governs how load is distributed across nodes. For the purpose of this demo I will leave it blank, which lets Cassandra assign tokens itself (the nodetool output in the last step shows 256 tokens per node, the virtual-node default). Refer to the Cassandra documentation for how to define it based on the number of nodes in the data center:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configGenTokens_c.html
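If you do decide to set initial_token by hand, the tokens are usually spread evenly over the partitioner's range. The sketch below follows the calculation described in the DataStax page linked above, assuming the Murmur3Partitioner (token range -2^63 to 2^63-1) and a single data center with virtual nodes disabled. The function name murmur3_tokens is hypothetical.

```python
def murmur3_tokens(node_count):
    """Return evenly spaced initial_token values, one per node.

    Assumes the Murmur3Partitioner; other partitioners use a different
    token range and need a different formula.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Split the 2**64 wide token space into node_count equal slices,
    # then shift so the first token sits at the bottom of the range.
    return [((2**64 // node_count) * i) - 2**63 for i in range(node_count)]


for index, token in enumerate(murmur3_tokens(2)):
    print("node %d: initial_token: %d" % (index + 1, token))
```

For the two-node setup in this post the values would be -9223372036854775808 and 0.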
Step 4: Test cluster setup
As you bring up more nodes, you should see messages similar to the following, indicating the handshake between cluster nodes.
INFO 22:06:20,974 Handshaking version with /192.168.0.108
INFO 22:06:23,023 Node /192.168.0.108 is now part of the cluster
INFO 22:06:23,047 Handshaking version with /192.168.0.108
INFO 22:06:23,061 InetAddress /192.168.0.108 is now UP
INFO 22:06:23,207 InetAddress /192.168.0.108 is now DOWN
INFO 22:06:23,212 Handshaking version with /192.168.0.108
INFO 22:06:24,037 InetAddress /192.168.0.108 is now UP
INFO 22:06:53,449 [Stream #6e422a30-99e4-11e3-858d-e535fdb952e8] Received streaming plan for Bootstrap
INFO 22:06:53,590 [Stream #6e422a30-99e4-11e3-858d-e535fdb952e8] Session with /192.168.0.108 is complete
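A node may flip between UP and DOWN a few times while it joins, as in the messages above, so what matters is the last state reported for each address. Here is a small sketch that extracts it from log lines; the function name last_known_state is hypothetical, and the regular expression only assumes the "InetAddress /<ip> is now UP|DOWN" wording seen in this output.

```python
import re

# Matches lines such as "INFO 22:06:23,061 InetAddress /192.168.0.108 is now UP".
STATE_RE = re.compile(r"InetAddress /(\S+) is now (UP|DOWN)")


def last_known_state(log_lines):
    """Return a dict mapping node address to its most recent UP/DOWN state."""
    state = {}
    for line in log_lines:
        match = STATE_RE.search(line)
        if match:
            # Later lines overwrite earlier ones, so the last report wins.
            state[match.group(1)] = match.group(2)
    return state


sample = [
    "INFO 22:06:23,061 InetAddress /192.168.0.108 is now UP",
    "INFO 22:06:23,207 InetAddress /192.168.0.108 is now DOWN",
    "INFO 22:06:24,037 InetAddress /192.168.0.108 is now UP",
]
print(last_known_state(sample))  # {'192.168.0.108': 'UP'}
```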
Another way to check cluster and node status is the nodetool command:
./cassandra25_node1_dc1/bin/nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load      Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.0.108  68.61 KB  256     100.0%            005d1cea-aa68-41b0-9a75-0051dd431930  rack1
UN  192.168.0.119  73.14 KB  256     100.0%            8ca40713-2eb5-44df-8a52-6cd838a492e3  rack1
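In the output above, the two-letter code at the start of each row combines status (U for Up, D for Down) and state (N, L, J, M for Normal, Leaving, Joining, Moving), so a healthy node reads UN. If you want to check this from a script, a minimal sketch could look like the following. The function name parse_nodetool_status is hypothetical, and the parsing assumes the column layout shown in this post.

```python
SAMPLE = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 192.168.0.108 68.61 KB 256 100.0% 005d1cea-aa68-41b0-9a75-0051dd431930 rack1
UN 192.168.0.119 73.14 KB 256 100.0% 8ca40713-2eb5-44df-8a52-6cd838a492e3 rack1
"""


def parse_nodetool_status(text):
    """Return one dict per node row found in nodetool status output."""
    nodes = []
    for line in text.splitlines():
        parts = line.split()
        # Node rows start with a two-letter code: U/D followed by N/L/J/M.
        if (len(parts) >= 2 and len(parts[0]) == 2
                and parts[0][0] in "UD" and parts[0][1] in "NLJM"):
            nodes.append({"code": parts[0], "address": parts[1], "rack": parts[-1]})
    return nodes


nodes = parse_nodetool_status(SAMPLE)
for node in nodes:
    print(node["address"], node["code"], node["rack"])
print("all nodes up and normal:", all(n["code"] == "UN" for n in nodes))
```

Here the text is pasted in as SAMPLE; in practice you could capture it with subprocess.run and the path to your own nodetool binary.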