Featured image of post Snowflake in a Nutshell

Snowflake in a Nutshell

Snowflake and Modern Data Warehousing

Snowflake Structure Still Mystifies Physicists


What Even is Snowflake?

Snowflake is a cloud-based data warehouse that does all the heavy lifting for you.

Unlike traditional databases that need tons of manual tuning and babysitting,

Snowflake is fully managed. This means:

  • No hardware to maintain (goodbye, server closets!).
  • No need to endlessly optimize queries (SQL just works!).
  • Infinite scalability (or close enough, anyway).
  • Pay-for-what-you-use pricing (so you don’t have to sell a kidney to afford it).

It runs on AWS, Azure, and Google Cloud, so you’re not tied down to a single cloud provider.


What Makes Snowflake So Special?

Great question. Here are some killer features that make Snowflake stand out from the crowd:

1. Separation of Storage and Compute

Most traditional databases are like those all-you-can-eat buffets where the kitchen and dining area are permanently stuck together.

Snowflake, on the other hand, separates storage (where your data lives) from compute (where the actual processing happens).

This means you can scale each one independently.

Need more CPU power?

Scale up compute.

Need more space?

Just expand storage.

2. Zero Copy Cloning

Ever wanted to make an exact copy of your database without actually copying the data?

Snowflake can do that instantly using metadata magic.

It’s like duplicating a cake without actually baking another one (science needs to catch up on this feature in real life).

3. Automatic Scaling and Concurrency

Have you ever had a database slow to a crawl because too many people were running queries at the same time?

Snowflake auto-scales and isolates workloads, so everyone gets their fair share of computing power without stepping on each other’s toes.

4. Data Sharing Without ETL Nightmares

Sharing data between companies usually involves painful ETL processes, duplicate storage, and security headaches.

Snowflake makes it easy to share data in real time without making a bunch of copies.

5. SQL Support

You don’t need to learn some new, obscure query language. Snowflake speaks SQL fluently.

How Does Snowflake Compare to Other Modern Data Warehouses?

Alright, so Snowflake is great, but what about its competition? Let’s do a quick rundown.

Google BigQuery

  • Serverless, pay-per-query pricing.
  • Super fast but can get expensive for frequent queries.
  • Integrated deeply with Google Cloud services.

Amazon Redshift

  • Strong AWS integration.
  • Requires a bit more tuning and maintenance.
  • Cheaper than Snowflake for consistent workloads.

Azure Synapse Analytics

  • Microsoft’s data warehouse offering.
  • Tight integration with Power BI and Azure services.
  • Good for companies deep into the Microsoft ecosystem.

Each of these has its strengths, but Snowflake’s ease of use and flexibility make it a top choice for many businesses.


Who Should Use Snowflake?

  • Startups: Because paying only for what you use is a blessing when you’re trying to stretch every dollar.
  • Enterprises: Because handling petabytes of data without crying is a big win.
  • Data Scientists: Because running complex queries on massive datasets should not feel like medieval torture.
  • Anyone Migrating from Traditional Data Warehouses: If your current setup makes you want to throw your laptop out the window, Snowflake is worth a look.

How Snowflake Works

How Traditional SQL Databases Work

Before we talk about how Snowflake does its thing, let’s recap how a normal SQL database (like SQL Server) operates:

  1. Monolithic Architecture – SQL databases are usually one big block of storage and compute. Everything is tightly coupled, meaning if you need more computing power, you’re also probably adding more storage (and vice versa).
  2. OLTP & OLAP Struggles – Traditional databases are great for transactional (OLTP) workloads but often struggle with analytical (OLAP) workloads that require heavy lifting across large datasets.
  3. Concurrency Bottlenecks – If too many queries run at once, they fight for resources. You can try indexing, partitioning, and caching, but at some point, performance takes a hit.
  4. Scaling is Hard – Scaling a traditional database means adding more hardware (vertical scaling) or sharding data across multiple servers (horizontal scaling). Both methods have limitations and headaches.
  5. ETL Overhead – Extract, Transform, Load (ETL) processes are necessary to move data from OLTP databases to analytical warehouses. It’s slow, painful, and sometimes even traumatic.

Now let’s see how Snowflake turns all of this on its head.


Snowflake’s Architecture

Snowflake is not just another SQL database.

It’s a fully cloud-native data warehouse that’s built differently from the ground up. Here’s what makes it unique:

1. Separation of Storage, Compute, and Services

Unlike SQL Server, where everything is bundled together, Snowflake splits its architecture into three layers:

  • Storage Layer – Snowflake stores data in a columnar format in cloud object storage (AWS S3, Azure Blob Storage, or Google Cloud Storage). This means cheap, scalable storage that’s separate from computing power.
  • Compute Layer (Virtual Warehouses) – Queries don’t run on a single big monolithic server. Instead, Snowflake uses virtual warehouses, which are independent compute clusters that can be resized or scaled dynamically.
  • Cloud Services Layer – This handles metadata, authentication, and query optimization. Snowflake’s smart query optimizer ensures efficient execution without requiring manual indexing or tuning.

2. Automatic Scaling & Concurrency

  • Snowflake can scale compute resources automatically based on demand.
  • Multiple users can run queries at the same time without competing for resources. Each virtual warehouse operates independently, so heavy analytical workloads don’t slow down transactional ones.

3. Zero Copy Cloning & Time Travel

  • Want to clone an entire database in seconds? Snowflake does it without duplicating data. It just references existing storage .
  • Need to undo a mistake? Snowflake’s Time Travel feature lets you query past versions of your data.

4. Built-in Data Sharing

  • Sharing data across organizations in traditional databases means sending CSVs or setting up complex ETL pipelines. In Snowflake, you can share live data instantly, without moving or duplicating it.

Snowflake vs. Traditional SQL Databases: Pros and Cons

Now that we understand how Snowflake works, let’s do a head-to-head comparison.

FeatureSnowflakeTraditional SQL Database
Storage & Compute Separation✅ Yes❌ No
Scalability✅ Scales automatically⚠️ Manual scaling required
Concurrency Handling✅ Independent compute clusters❌ Queries compete for resources
Indexing & Optimization✅ Automatic optimization❌ Requires manual tuning
ETL Complexity✅ Minimal ETL❌ Heavy ETL processes needed
Backup & Cloning✅ Zero-copy cloning & time travel❌ Manual backup required
Data Sharing✅ Real-time sharing❌ Requires data export
Cloud-Native✅ 100% Cloud❌ Mostly on-premise
Cost⚠️ Pay-per-use (can be expensive)✅ Fixed pricing (cheaper for predictable workloads)
Latency for Small Queries⚠️ Can have cold start latency✅ Always on

When to Use Snowflake (and When Not To)

When Snowflake is a Great Choice

  • You need big data analytics with frequent complex queries.
  • You don’t want to manage infrastructure and need fully managed scaling.
  • You want easy data sharing and minimal ETL headaches.
  • Your workloads are unpredictable, and you need pay-per-use pricing.

When a Traditional SQL Database is Better

  • You have OLTP workloads (frequent small transactions rather than large analytical queries).
  • Your data size is small, and you don’t need Snowflake’s scalability.
  • You prefer fixed pricing over variable cloud costs.
  • You require low-latency queries (Snowflake’s cold start times can be a minor drawback).

Snowflake Code Examples

1. Creating a Database and Schema

Before anything else, you need a place to store your data.

1
2
3
4
5
CREATE DATABASE company_db;
USE DATABASE company_db;

CREATE SCHEMA sales_data;
USE SCHEMA sales_data;

Snowflake keeps things organized with databases and schemas, so it’s best to plan your structure ahead of time.


2. Creating a Table

Now, let’s create a simple table for storing customer data.

1
2
3
4
5
6
7
CREATE TABLE customers (
    customer_id INT AUTOINCREMENT PRIMARY KEY,
    first_name STRING,
    last_name STRING,
    email STRING,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Snowflake supports auto-incrementing primary keys, string-based identifiers, and timestamp defaults. Pretty neat!


3. Inserting Data

Let’s put some actual data in that table.

1
2
3
4
INSERT INTO customers (first_name, last_name, email) VALUES
    ('John', 'Doe', 'john.doe@example.com'),
    ('Jane', 'Smith', 'jane.smith@example.com'),
    ('Alice', 'Johnson', 'alice.johnson@example.com');

Simple, SQL-standard inserts. Nothing fancy, just reliable data entry.


4. Querying Data

Time to fetch what we just inserted!

1
SELECT * FROM customers;

Want only certain columns?

1
SELECT first_name, email FROM customers WHERE last_name = 'Doe';

You can also use Snowflake’s time travel feature to query historical data. Example:

1
SELECT * FROM customers AT(TIMESTAMP => '2024-01-01 12:00:00');

Mind-blown? Yeah, Snowflake can do that.


5. Creating a View

Need to simplify complex queries? Use a view.

1
2
3
4
CREATE VIEW active_customers AS
SELECT customer_id, first_name, email 
FROM customers
WHERE created_at > CURRENT_DATE - INTERVAL '30 days';

Now you can simply SELECT * FROM active_customers; instead of writing that whole query every time.


6. Using Window Functions

Need analytics? Snowflake handles window functions like a champ.

1
2
3
SELECT customer_id, first_name, created_at,
       ROW_NUMBER() OVER (ORDER BY created_at DESC) AS customer_rank
FROM customers;

This assigns a row number to each customer based on when they signed up. Perfect for ranking and analytics.


7. Handling JSON Data

Snowflake loves semi-structured data like JSON.

1
2
3
4
5
6
7
8
9
CREATE TABLE orders (
    order_id INT,
    customer_id INT,
    order_details VARIANT
);

INSERT INTO orders VALUES 
    (1, 1, '{"item": "Laptop", "price": 1200, "quantity": 1}'),
    (2, 2, '{"item": "Phone", "price": 800, "quantity": 2}');

Now you can query JSON fields directly:

1
2
SELECT order_details:item::STRING AS item_name
FROM orders;

No need for complex transformations. Snowflake just gets JSON.


8. Optimizing Performance with Clustering

Want faster queries? Define clustering keys.

1
ALTER TABLE customers CLUSTER BY (created_at);

This helps Snowflake automatically optimize how data is stored, leading to better query performance over time.


9. Managing Users & Roles

Let’s create a new user and assign them permissions.

1
2
3
CREATE USER analyst PASSWORD='SecurePass123';
GRANT USAGE ON DATABASE company_db TO analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA sales_data TO analyst;

Snowflake’s RBAC (Role-Based Access Control) makes it easy to manage security at scale.


10. Automating Tasks with Snowflake Tasks

Want to run jobs on a schedule? Snowflake Tasks can help.

1
2
3
4
5
6
CREATE TASK daily_customer_backup
    WAREHOUSE = COMPUTE_WH
    SCHEDULE = 'USING CRON 0 2 * * * UTC'
AS
COPY INTO @backup_stage/customers
FROM (SELECT * FROM customers);

This backs up your customers table daily at 2 AM UTC. No need for external cron jobs!


Can You Run Snowflake On-Prem?

Nope. Snowflake is 100% cloud-native and does not support on-premise deployments. Unlike traditional databases like SQL Server, PostgreSQL, or Oracle, Snowflake is built exclusively for the cloud and runs on:

  • AWS
  • Azure
  • Google Cloud

There is no option to install Snowflake on your own servers or data center. If you’re looking for an on-premises alternative, you’d need to consider solutions like:

  • Amazon Redshift with AWS Outposts (Hybrid Cloud)
  • Google BigQuery Omni (Multi-cloud)
  • Oracle Exadata (On-prem data warehouse)
  • Microsoft Synapse Analytics (SQL Data Warehouse)

However, Snowflake does offer Snowflake Private Link and Snowflake on Azure Government Cloud, which provide some level of controlled network access and security for highly regulated industries.



Is Snowflake Expensive?

It depends on how you use it! Snowflake can be cheaper than traditional on-prem databases because you only pay for what you use. However, it can get expensive if:

  • Queries are inefficient.
  • Warehouses are left running unnecessarily.
  • Large amounts of compute resources are used without optimization.

References

  1. Snowflake Official Site
  2. AWS Redshift vs Snowflake
  3. Google BigQuery Documentation
  4. Azure Synapse Analytics
  5. Snowflake SQL Documentation
  6. Snowflake JSON Processing
  7. Snowflake Performance Tuning
  8. Snowflake User Roles