Cassandra Database in a Nutshell

A Brief History (a.k.a. How Cassandra Came to Be)

Once upon a time (2008, to be precise), a few folks at Facebook had a problem: they needed a database that could handle massive amounts of data while staying available at all times.

Traditional relational databases were throwing tantrums whenever things got big. So, Facebook engineers Avinash Lakshman (one of the guys behind Amazon Dynamo) and Prashant Malik decided to Frankenstein their own solution.

They took inspiration from Dynamo’s distributed model and Bigtable’s storage engine and BOOM! Cassandra was born.

Facebook used it for their inbox search, but soon, they open-sourced it because, well, they had bigger fish to fry. In 2009, the Apache Foundation picked it up, and Cassandra officially became a thing. Since then, it’s been making waves in big data, powering companies like Netflix, Apple, and Uber.

What Makes Cassandra Special?

Decentralization – No single point of failure. Every node in the cluster is equal, like a true democracy (but one that actually works).
Scalability – Horizontal scaling, baby! Need more capacity? Just add more nodes.
High Availability – Thanks to its peer-to-peer design, it just won’t quit.
Write-Optimized – Cassandra laughs in the face of heavy write workloads.
Tunable Consistency – Choose between strong and eventual consistency, depending on how much you like living on the edge.

Installing Cassandra

Alright, enough talk. Let’s get Cassandra up and running.

Step 1: Install Java

Cassandra runs on Java, so make sure you’ve got Java installed:

1
java -version

If not, install it (for example, on Ubuntu):

1
sudo apt update && sudo apt install openjdk-11-jdk

Step 2: Install Cassandra

For Ubuntu:

1
2
3
4
echo "deb http://www.apache.org/dist/cassandra/debian 311x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.list
curl https://downloads.apache.org/cassandra/KEYS | sudo apt-key add -
sudo apt update
sudo apt install cassandra

Start it up:

1
sudo systemctl start cassandra

And check if it’s running:

1
nodetool status

Boom! You’ve got a Cassandra node up and running.

Working with Cassandra: CQL Basics

Cassandra isn’t SQL, but it has its own query language: Cassandra Query Language (CQL). Think of it as SQL’s rebellious cousin.

First, fire up the Cassandra shell:

1
cqlsh

Now, let’s create a keyspace (Cassandra’s equivalent of a database):

1
2
CREATE KEYSPACE my_keyspace
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

Use it:

1
USE my_keyspace;

Creating a Table

1
2
3
4
5
6
CREATE TABLE users (
    id UUID PRIMARY KEY,
    name text,
    email text,
    age int
);

Inserting Data

1
2
INSERT INTO users (id, name, email, age)
VALUES (uuid(), 'Alice', 'alice@example.com', 30);

Querying Data

1
SELECT * FROM users;

See? Not too scary.

Distributed Awesomeness: Setting Up a Cluster

A single node is cool and all, but Cassandra really shines in a cluster. Here’s how to set one up.

Edit cassandra.yaml (usually in /etc/cassandra/):
- Change cluster_name to something cool like "TheGrid"
- Set seed_provider to include a seed node (e.g., the IP of your first node)
- Set listen_address and rpc_address to the node’s IP
Start Cassandra on each node.
Check cluster status:
1
nodetool status

If you see a bunch of happy nodes, congratulations! You’ve got a Cassandra cluster.

When to Use Cassandra

Cassandra is awesome, but it’s not a silver bullet. Here’s when it makes sense:

You have a massive amount of data – If you’re working with terabytes or petabytes, Cassandra’s got your back.
You need high availability – Perfect for mission-critical apps that must never go down.
You love writing more than reading – Heavy write workloads? Cassandra eats them for breakfast.
You need horizontal scaling – Just keep adding nodes!

When not to use Cassandra:

You need complex joins and transactions (seriously, stick with SQL for that).
You have small-scale, simple data needs.

Key Ideas

Concept	Summary
History	Built at Facebook, inspired by Dynamo & Bigtable
Features	Decentralized, highly available, scalable
CQL Basics	SQL-like, but not quite SQL
Installing	Requires Java, simple installation
Cluster Setup	Requires multiple nodes, config changes
When to Use	Big data, high availability, heavy writes