Snowflake Tutorial: Data Analytics for Beginners
What is Snowflake?
Snowflake is a cloud data platform focused on storing and analyzing information. The platform eliminates the need for physical servers or software installations, operating entirely as a managed service where the provider handles all infrastructure concerns.
Three major cloud environments support Snowflake deployments:
- Amazon Web Services (AWS)
- Microsoft Azure
- Google Cloud Platform (GCP)
How Snowflake works differently
What sets Snowflake apart is how it treats storage and computation as separate, independent components. Traditional database systems link these elements together tightly. Increasing processing capacity in those systems requires expanding storage capacity simultaneously, even when additional storage serves no purpose. Snowflake breaks this connection. Storage grows or shrinks based on data volume. Computation adjusts based on query demands. Each component operates and scales on its own terms.
What this architecture enables
Several practical advantages emerge from this architectural approach:
- Independent scaling: Adjust storage size or compute power without affecting the other, and without system downtime
- Efficient processing: Built-in optimization techniques and distributed query execution reduce wait times
- Cost control: Billing occurs only for actual storage consumption and active compute usage
- Instant duplication: Copy entire databases or individual tables in seconds without creating redundant data files (see the example after this list)
- Direct data access: Grant other Snowflake users access to live data without transferring copies
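As an illustration of instant duplication, below is a minimal sketch using Snowflake's zero-copy CLONE syntax; the database and table names are hypothetical:
-- Clone a table in seconds; no underlying data files are copied
CREATE TABLE sales_db.public.orders_backup CLONE sales_db.public.orders;
-- The same syntax works for an entire database
CREATE DATABASE sales_db_dev CLONE sales_db;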
When to use Snowflake
Three core scenarios demonstrate where Snowflake provides the most value:
- Consolidated data storage: Build a single location where information from various sources comes together in queryable form
- Reporting and visualization: Feed current data into dashboards and standard reports that drive business decisions
- Deep analysis: Run sophisticated calculations, identify patterns over time, and build statistical models
The platform works equally well with structured information organized in rows and columns, and semi-structured formats like JSON, Avro, or Parquet files. This flexibility removes the need to maintain separate systems for different data types.
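For example, JSON documents can be loaded into a VARIANT column and queried with SQL path notation. The sketch below assumes a hypothetical raw_events table and hypothetical JSON field names:
-- Hypothetical table holding raw JSON documents
CREATE TABLE raw_events (payload VARIANT);
-- Query nested JSON fields directly, casting them to SQL types
SELECT
  payload:customer.id::INTEGER AS customer_id,
  payload:event_type::STRING   AS event_type
FROM raw_events;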
The next section examines how Snowflake’s internal structure enables these capabilities for Snowflake data analytics tasks.
Snowflake architecture
Snowflake uses a hybrid design that combines the benefits of two traditional database models.
- Shared-disk architectures allow all processing nodes to access the same data, simplifying management.
- Shared-nothing architectures distribute data across independent nodes, enabling better performance.
Snowflake merges these approaches through three distinct layers that work together while scaling independently.

Storage layer
All data resides in cloud object storage provided by the underlying platform (Amazon S3, Azure Blob Storage, or Google Cloud Storage). Snowflake automatically handles data organization without requiring manual intervention.
Key characteristics of the storage layer:
- Columnar format: Data is stored in columns rather than rows, making analytical queries faster
- Micro-partitions: Large tables are automatically divided into smaller chunks (50-500 MB of uncompressed data each)
- Compression: Each micro-partition is compressed based on its content type
- Independent scaling: Storage capacity expands or contracts separately from compute resources
Data files remain invisible to users. All interactions happen through SQL queries rather than direct file access.
Compute layer
Processing happens through virtual warehouses, which are clusters of compute resources. Each virtual warehouse operates independently and can run queries without affecting other warehouses.
Virtual warehouse features:
- MPP processing: Massively parallel processing distributes query work across multiple nodes
- Flexible sizing: Warehouses range from X-Small (1 credit per hour) to 6X-Large (512 credits per hour)
- Auto-suspend: Warehouses shut down automatically after a specified idle period
- Auto-resume: Warehouses restart automatically when new queries arrive
- Concurrent execution: Multiple warehouses can query the same data simultaneously
Compute resources operate on a pay-per-second model. Charges stop when a warehouse suspends, even though the underlying data remains accessible.
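To make this concrete, a warehouse can be resized or suspended on demand with SQL; a short sketch, assuming a hypothetical warehouse named analytics_wh:
-- Scale compute up without touching storage
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'MEDIUM';
-- Suspend it manually; credit charges stop while it is suspended
ALTER WAREHOUSE analytics_wh SUSPEND;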
Services layer
This coordination layer manages all activities across Snowflake. Running continuously in the background, it handles essential operations without direct user interaction.
Core services include:
- Authentication: Validates user credentials and manages access sessions
- Query optimization: Analyzes SQL statements and determines the most efficient execution path
- Metadata management: Tracks table structures, data statistics, and query history (see the example after this list)
- Security controls: Enforces role-based permissions and encryption policies
- Transaction management: Ensures data consistency across concurrent operations
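As one example, the query history tracked by this layer can be inspected with SQL; a sketch using the QUERY_HISTORY table function from INFORMATION_SCHEMA (a database must be selected in the worksheet context for the function to resolve):
-- Review recent queries recorded by the services layer
SELECT query_id, query_text, total_elapsed_time
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
ORDER BY start_time DESC
LIMIT 10;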
Why this architecture matters
The separation of layers delivers practical advantages for data work. Storage costs remain fixed based on volume, while compute costs vary with actual usage. Teams can run multiple workloads simultaneously without resource conflicts. One warehouse handling reporting queries will not slow down another warehouse running data transformations.
The platform manages optimization tasks automatically. Micro-partitions reorganize as data changes. Query plans adjust based on current statistics. Metadata updates happen in real time. These background processes eliminate the need for database administrators to perform manual tuning.
Multiple teams can access the same datasets concurrently without performance degradation. The architecture supports this through its combination of centralized storage and distributed processing.
Let’s now move from architecture concepts to hands-on implementation. Setting up your Snowflake environment is the next step in learning Snowflake data analytics.
Getting started with Snowflake
Snowflake provides Snowsight, a modern web interface for all Snowflake data analytics operations. Access Snowsight by navigating to https://app.snowflake.com and signing in with account credentials. The interface presents an organized layout designed for efficient data work.
Navigating the Snowsight interface
The new Snowsight navigation menu organizes features into logical categories:
- Work with data:
- Projects: Access Worksheets, Notebooks, Streamlit apps, and Dashboards
- Ingestion: Load data using connectors and tools
- Transformation: Manage dynamic tables and tasks
- AI & ML: Use Cortex AI and ML features
- Monitoring: View query history and performance
- Horizon Catalog:
- Catalog: Browse databases and data products (replaces the old “Data” menu)
- Data Sharing: Share data across accounts
- Governance & Security: Manage access controls
- Manage:
- Compute: Manage warehouses and resource monitors
- Admin: Configure users, roles, and account settings
Most analytical work happens in Projects > Worksheets, where SQL queries are written and executed.
Understanding worksheet context
Each worksheet requires four context settings before running queries. These appear as dropdowns in the worksheet toolbar:
- Role: Controls access permissions (e.g., ACCOUNTADMIN, SYSADMIN, PUBLIC)
- Warehouse: Determines which compute cluster executes queries
- Database: Sets the default database
- Schema: Sets the default schema within the database
Setting proper context prevents “object does not exist” errors.
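Context can also be set directly inside a worksheet with SQL; a minimal sketch, assuming hypothetical warehouse, database, and schema names:
-- Set the worksheet context explicitly
USE ROLE SYSADMIN;
USE WAREHOUSE analytics_wh;
USE DATABASE sales_data;
USE SCHEMA raw;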
Creating a virtual warehouse
Warehouses provide compute power for queries. Create one through the UI or SQL:
UI Method:
- Navigate to Manage > Compute
- Select + Warehouse
- Enter a warehouse name, choose the Small size, and set auto-suspend to 5 minutes
- Click Create Warehouse
This creates a Small warehouse that suspends after 5 minutes of inactivity and resumes automatically when queries arrive.
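SQL Method (a minimal sketch that mirrors the settings above; the warehouse name analytics_wh is hypothetical):
-- Create a Small warehouse that suspends after 5 minutes of inactivity
CREATE WAREHOUSE analytics_wh
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND = 300          -- idle seconds before suspending
  AUTO_RESUME = TRUE          -- restart automatically when a query arrives
  INITIALLY_SUSPENDED = TRUE; -- consume no credits until first use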
Auto-suspend and auto-resume
- Auto-suspend: Automatically stops the warehouse after idle time. Recommended setting: 300-600 seconds (5-10 minutes). Suspended warehouses consume no credits.
- Auto-resume: Automatically starts suspended warehouses when queries arrive. Keep this enabled for seamless operation.
These settings control costs while maintaining availability. Data remains accessible even when warehouses are suspended.
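Both options can be adjusted after creation as well; a brief sketch, assuming the warehouse created earlier:
-- Lengthen the idle timeout to 10 minutes and keep auto-resume on
ALTER WAREHOUSE analytics_wh SET
  AUTO_SUSPEND = 600
  AUTO_RESUME = TRUE;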
With your Snowflake data analytics environment configured, the next step is loading data into Snowflake.
Loading data for Snowflake data analytics
Snowflake offers multiple approaches for loading data. The Snowsight interface provides a streamlined process through the Ingestion menu, while SQL commands offer programmatic control. This section demonstrates both methods using sales transaction data.
Creating database and schema
Start by organizing data with a proper database structure:
CREATE DATABASE sales_data;
CREATE SCHEMA sales_data.raw;
Note: The database can also be created from Catalog > Databases in the UI; in that case, run only the CREATE SCHEMA statement.
Using the load data wizard
Snowsight provides a guided interface for loading files directly into tables.
Step 1: Access the Load Data interface
- Navigate to Ingestion > Add Data in the left menu. Select Load data into a Table. The Load Data wizard appears.
Step 2: Select or create a target table
- Choose an existing table or create a new one. For new tables:
- Enter table name
- Select database and schema
- Let Snowflake infer schema from the file
Step 3: Upload files
- Add files using one of these methods:
- Drag and drop files directly
- Click Browse to select from the local system
- Select Add from stage for files already staged
The interface accepts up to 250 files at once, with each file up to 250 MB.

Step 4: Configure file format
- Snowflake automatically detects the file format using INFER_SCHEMA. Review and adjust:
- File type (CSV, JSON, Parquet, etc.)
- Delimiter settings
- Header row handling
- Date and time formats
Step 5: Set error handling
- Choose what happens when errors occur:
- Abort statement: Stop loading if any error occurs (default)
- Continue: Skip bad rows, load the rest
- Skip file: Abandon entire file on error
Step 6: Select loading method
- Append: Add new data to the existing table
- Replace: Remove existing data, insert new data
Step 7: Execute load
- Click Load to start the process. Snowflake displays:
- Number of files processed
- Rows parsed
- Rows loaded
- Any errors encountered
Loading data with SQL
For automated workflows or larger datasets, SQL provides more control.
Create table structure:
CREATE TABLE sales_data.raw.transactions (
  transaction_id INTEGER,
  transaction_date DATE,
  customer_id INTEGER,
  product_name VARCHAR(100),
  quantity INTEGER,
  unit_price DECIMAL(10,2)
);
Create internal stage:
CREATE STAGE sales_data.raw.csv_stage;
Upload file to stage (using SnowSQL command line):
PUT file:///path/to/sales_data.csv @sales_data.raw.csv_stage;
Create file format:
CREATE FILE FORMAT sales_data.raw.csv_format
  TYPE = 'CSV'
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '"';
Execute COPY command:
COPY INTO sales_data.raw.transactions
  FROM @sales_data.raw.csv_stage
  FILE_FORMAT = (FORMAT_NAME = 'sales_data.raw.csv_format')
  ON_ERROR = 'CONTINUE';
This loads data from staged files into the table, skipping problematic rows.
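To see which rows would be rejected before committing a load, the COPY command also supports a validation mode; a brief sketch using the same stage and file format defined above:
-- Dry run: report parsing errors without inserting any rows
COPY INTO sales_data.raw.transactions
  FROM @sales_data.raw.csv_stage
  FILE_FORMAT = (FORMAT_NAME = 'sales_data.raw.csv_format')
  VALIDATION_MODE = RETURN_ERRORS;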
Verifying loaded data
Confirm successful loading:
-- Check row count
SELECT COUNT(*) FROM sales_data.raw.transactions;

-- View sample data
SELECT * FROM sales_data.raw.transactions LIMIT 10;
Now that the data is loaded, let’s see how to extract insights using Snowflake data analytics queries.
Running analytics queries in Snowflake
Snowflake data analytics capabilities shine through SQL queries executed in Worksheets. Navigate to Projects > Worksheets to access the query editor. This section demonstrates common analytical patterns for extracting insights from data.
Basic aggregations
Aggregation functions summarize data across rows.
- Calculate sales metrics:
SELECT
  COUNT(*) AS total_transactions,
  SUM(unit_price * quantity) AS total_revenue,
  AVG(unit_price * quantity) AS average_order_value
FROM sales_data.raw.transactions;
- Group data to see patterns:
SELECT
  product_name,
  SUM(quantity) AS units_sold,
  SUM(unit_price * quantity) AS product_revenue
FROM sales_data.raw.transactions
GROUP BY product_name
ORDER BY product_revenue DESC;
This reveals top-performing products by revenue.
Time-based analytics
Analyze trends over time using date functions:
SELECT
  DATE_TRUNC('MONTH', transaction_date) AS month,
  COUNT(*) AS monthly_transactions,
  SUM(unit_price * quantity) AS monthly_revenue
FROM sales_data.raw.transactions
GROUP BY month
ORDER BY month;
DATE_TRUNC groups dates into monthly periods, revealing seasonal patterns and growth trends.
Window functions
Window functions perform calculations across related rows without collapsing results. Calculate running totals:
SELECT
  transaction_date,
  unit_price * quantity AS daily_revenue,
  SUM(unit_price * quantity) OVER (ORDER BY transaction_date) AS cumulative_revenue
FROM sales_data.raw.transactions
ORDER BY transaction_date;
The OVER clause defines the calculation window. This shows revenue accumulation over time.
Ranking analysis
Identify top performers with ranking functions:
SELECT
  product_name,
  SUM(unit_price * quantity) AS revenue,
  RANK() OVER (ORDER BY SUM(unit_price * quantity) DESC) AS revenue_rank
FROM sales_data.raw.transactions
GROUP BY product_name;
RANK() assigns positions based on revenue values.
Month-over-month growth
Compare performance across time periods:
SELECT
  DATE_TRUNC('MONTH', transaction_date) AS month,
  SUM(unit_price * quantity) AS current_revenue,
  LAG(SUM(unit_price * quantity)) OVER (ORDER BY DATE_TRUNC('MONTH', transaction_date)) AS previous_revenue,
  ((SUM(unit_price * quantity) - LAG(SUM(unit_price * quantity)) OVER (ORDER BY DATE_TRUNC('MONTH', transaction_date)))
    / LAG(SUM(unit_price * quantity)) OVER (ORDER BY DATE_TRUNC('MONTH', transaction_date))) * 100 AS growth_percent
FROM sales_data.raw.transactions
GROUP BY month
ORDER BY month;
LAG() accesses previous row values, enabling period-over-period comparisons.
Joining tables
Combine multiple data sources for comprehensive analysis. When working with multiple related tables, joins connect data through common keys.
Here’s a conceptual example showing how to analyze customer lifetime value (note: this requires a customers table):
-- Example structure (for reference only)
-- This assumes you have created a customers table with customer details
SELECT
  c.customer_name,
  COUNT(t.transaction_id) AS total_purchases,
  SUM(t.unit_price * t.quantity) AS lifetime_value
FROM sales_data.raw.transactions t
JOIN sales_data.raw.customers c
  ON t.customer_id = c.customer_id
GROUP BY c.customer_name
ORDER BY lifetime_value DESC
LIMIT 10;
For a working example with just the transactions table, analyze customer purchase patterns:
SELECT
  customer_id,
  COUNT(transaction_id) AS total_purchases,
  SUM(unit_price * quantity) AS total_spent,
  AVG(unit_price * quantity) AS avg_order_value
FROM sales_data.raw.transactions
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 10;
This identifies highest-value customers based on total spending.
These query patterns form the foundation for Snowflake data analytics work. Worksheets in Snowsight support collaboration through sharing, folder organization, and query history tracking. Results can be visualized in charts or exported to dashboards for ongoing monitoring.
The platform handles complex analytical queries efficiently, making it suitable for organizations of any size.
Conclusion
This Snowflake tutorial covered the essential concepts for getting started with Snowflake data analytics:
- Cloud-native platform separating storage and compute for independent scaling
- Three-layer architecture: storage, compute (virtual warehouses), and services
- Snowsight interface with Projects, Catalog, and Ingestion navigation
- Virtual warehouses for query processing with flexible sizing options
- Data loading through the Snowsight wizard and SQL COPY commands
- SQL-based analytics, including aggregations, window functions, and time-based analysis
The platform provides a complete environment for storing, processing, and analyzing data without infrastructure management.
If you want to learn more about Snowflake and gain hands-on experience, check out the Data Analytics with Snowflake skill path. It provides structured learning to build practical expertise with the platform.
Frequently asked questions
1. Is Snowflake useful for data analysts?
Yes. Snowflake uses standard SQL that most analysts already know. The Snowsight interface provides an accessible environment for writing queries, creating reports, and analyzing data without requiring infrastructure management or programming skills beyond SQL.
2. Is Snowflake used for ETL?
Snowflake works as both an ETL destination and processing engine. Data can be loaded into Snowflake and transformed using SQL. Many organizations use ELT (Extract, Load, Transform) instead, where raw data loads first and transforms within Snowflake using tools like dbt.
3. Does Snowflake only use SQL?
SQL is the primary interface, but Snowflake supports additional languages through Snowpark. Python, Java, and Scala code can run directly within Snowflake for data processing and machine learning. REST APIs enable programmatic access, and the platform integrates with various BI tools.
4. What is Snowflake used for?
Snowflake serves as a centralized data warehouse consolidating information from multiple sources. Organizations use it for business intelligence reporting, data analysis, machine learning, and secure data sharing between business units or external partners without copying data.