Snowflake Tutorial: Data Analytics for Beginners
What is Snowflake?
Snowflake is a cloud data platform focused on storing and analyzing information. The platform eliminates the need for physical servers or software installations, operating entirely as a managed service where the provider handles all infrastructure concerns.
Three major cloud environments support Snowflake deployments:
- Amazon Web Services (AWS)
- Microsoft Azure
- Google Cloud Platform (GCP)
How Snowflake works differently
What sets Snowflake apart is how it treats storage and computation as separate, independent components. Traditional database systems link these elements together tightly. Increasing processing capacity in those systems requires expanding storage capacity simultaneously, even when additional storage serves no purpose. Snowflake breaks this connection. Storage grows or shrinks based on data volume. Computation adjusts based on query demands. Each component operates and scales on its own terms.
What this architecture enables
Several practical advantages emerge from this architectural approach:
- Independent scaling: Adjust storage size or compute power without affecting the other, and without system downtime
- Efficient processing: Built-in optimization techniques and distributed query execution reduce wait times
- Cost control: Billing occurs only for actual storage consumption and active compute usage
- Instant duplication: Copy entire databases or individual tables in seconds without creating redundant data files (see the example after this list)
- Direct data access: Grant other Snowflake users access to live data without transferring copies
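As an illustration of instant duplication, below is a minimal sketch using Snowflake's zero-copy CLONE syntax; the database and table names are hypothetical:
-- Clone a table in seconds; no underlying data files are copied
CREATE TABLE sales_db.public.orders_backup CLONE sales_db.public.orders;
-- The same syntax works for an entire database
CREATE DATABASE sales_db_dev CLONE sales_db;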
When to use Snowflake
Three core scenarios demonstrate where Snowflake provides the most value:
- Consolidated data storage: Build a single location where information from various sources comes together in queryable form
- Reporting and visualization: Feed current data into dashboards and standard reports that drive business decisions
- Deep analysis: Run sophisticated calculations, identify patterns over time, and build statistical models
The platform works equally well with structured information organized in rows and columns, and semi-structured formats like JSON, Avro, or Parquet files. This flexibility removes the need to maintain separate systems for different data types.
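For example, JSON documents can be loaded into a VARIANT column and queried with SQL path notation. The sketch below assumes a hypothetical raw_events table and hypothetical JSON field names:
-- Hypothetical table holding raw JSON documents
CREATE TABLE raw_events (payload VARIANT);
-- Query nested JSON fields directly, casting them to SQL types
SELECT
  payload:customer.id::INTEGER AS customer_id,
  payload:event_type::STRING   AS event_type
FROM raw_events;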
The next section examines how Snowflake’s internal structure enables these capabilities for Snowflake data analytics tasks.
Snowflake architecture
Snowflake uses a hybrid design that combines the benefits of two traditional database models.
- Shared-disk architectures allow all processing nodes to access the same data, simplifying management.
- Shared-nothing architectures distribute data across independent nodes, enabling better performance.
Snowflake merges these approaches through three distinct layers that work together while scaling independently.

Storage layer
All data resides in cloud object storage provided by the underlying platform (Amazon S3, Azure Blob Storage, or Google Cloud Storage). Snowflake automatically handles data organization without requiring manual intervention.
Key characteristics of the storage layer:
- Columnar format: Data is stored in columns rather than rows, making analytical queries faster
- Micro-partitions: Large tables are automatically divided into smaller chunks (50-500 MB of uncompressed data each)
- Compression: Each micro-partition is compressed based on its content type
- Independent scaling: Storage capacity expands or contracts separately from compute resources
Data files remain invisible to users. All interactions happen through SQL queries rather than direct file access.
Compute layer
Processing happens through virtual warehouses, which are clusters of compute resources. Each virtual warehouse operates independently and can run queries without affecting other warehouses.
Virtual warehouse features:
- MPP processing: Massively parallel processing distributes query work across multiple nodes
- Flexible sizing: Warehouses range from X-Small (1 credit per hour) to 6X-Large (512 credits per hour)
- Auto-suspend: Warehouses shut down automatically after a specified idle period
- Auto-resume: Warehouses restart automatically when new queries arrive
- Concurrent execution: Multiple warehouses can query the same data simultaneously
Compute resources operate on a pay-per-second model. Charges stop when a warehouse suspends, even though the underlying data remains accessible.
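To make this concrete, a warehouse can be resized or suspended on demand with SQL; a short sketch, assuming a hypothetical warehouse named analytics_wh:
-- Scale compute up without touching storage
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'MEDIUM';
-- Suspend it manually; credit charges stop while it is suspended
ALTER WAREHOUSE analytics_wh SUSPEND;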
Services layer
This coordination layer manages all activities across Snowflake. Running continuously in the background, it handles essential operations without direct user interaction.
Core services include:
- Authentication: Validates user credentials and manages access sessions
- Query optimization: Analyzes SQL statements and determines the most efficient execution path
- Metadata management: Tracks table structures, data statistics, and query history (see the example after this list)
- Security controls: Enforces role-based permissions and encryption policies
- Transaction management: Ensures data consistency across concurrent operations
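As one example, the query history tracked by this layer can be inspected with SQL; a sketch using the QUERY_HISTORY table function from INFORMATION_SCHEMA (a database must be selected in the worksheet context for the function to resolve):
-- Review recent queries recorded by the services layer
SELECT query_id, query_text, total_elapsed_time
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
ORDER BY start_time DESC
LIMIT 10;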
Why this architecture matters
The separation of layers delivers practical advantages for data work. Storage costs remain fixed based on volume, while compute costs vary with actual usage. Teams can run multiple workloads simultaneously without resource conflicts. One warehouse handling reporting queries will not slow down another warehouse running data transformations.
The platform manages optimization tasks automatically. Micro-partitions reorganize as data changes. Query plans adjust based on current statistics. Metadata updates happen in real time. These background processes eliminate the need for database administrators to perform manual tuning.
Multiple teams can access the same datasets concurrently without performance degradation. The architecture supports this through its combination of centralized storage and distributed processing.
Let’s now move from architecture concepts to hands-on implementation. Setting up your Snowflake environment is the next step in learning Snowflake data analytics.
Getting started with Snowflake
Snowflake provides Snowsight, a modern web interface for all Snowflake data analytics operations. Access Snowsight by navigating to https://app.snowflake.com and signing in with account credentials. The interface presents an organized layout designed for efficient data work.
Navigating the Snowsight interface
The new Snowsight navigation menu organizes features into logical categories:
- Work with data:
- Projects: Access Worksheets, Notebooks, Streamlit apps, and Dashboards
- Ingestion: Load data using connectors and tools
- Transformation: Manage dynamic tables and tasks
- AI & ML: Use Cortex AI and ML features
- Monitoring: View query history and performance
- Horizon Catalog:
- Catalog: Browse databases and data products (replaces the old “Data” menu)
- Data Sharing: Share data across accounts
- Governance & Security: Manage access controls
- Manage:
- Compute: Manage warehouses and resource monitors
- Admin: Configure users, roles, and account settings
Most analytical work happens in Projects > Worksheets, where SQL queries are written and executed.
Understanding worksheet context
Each worksheet requires four context settings before running queries. These appear as dropdowns in the worksheet toolbar:
- Role: Controls access permissions (e.g., ACCOUNTADMIN, SYSADMIN, PUBLIC)
- Warehouse: Determines which compute cluster executes queries
- Database: Sets the default database
- Schema: Sets the default schema within the database
Setting proper context prevents “object does not exist” errors.
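Context can also be set directly inside a worksheet with SQL; a minimal sketch, assuming hypothetical warehouse, database, and schema names:
-- Set the worksheet context explicitly
USE ROLE SYSADMIN;
USE WAREHOUSE analytics_wh;
USE DATABASE sales_data;
USE SCHEMA raw;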
Creating a virtual warehouse
Warehouses provide compute power for queries. Create one through the UI or SQL:
UI Method:
- Navigate to Manage > Compute
- Select + Warehouse
- Enter a warehouse name, choose the Small size, and set auto-suspend to 5 minutes
- Click Create Warehouse
This creates a Small warehouse that suspends after 5 minutes of inactivity and resumes automatically when queries arrive.
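SQL Method (a minimal sketch that mirrors the settings above; the warehouse name analytics_wh is hypothetical):
-- Create a Small warehouse that suspends after 5 minutes of inactivity
CREATE WAREHOUSE analytics_wh
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND = 300          -- idle seconds before suspending
  AUTO_RESUME = TRUE          -- restart automatically when a query arrives
  INITIALLY_SUSPENDED = TRUE; -- consume no credits until first use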
Auto-suspend and auto-resume
- Auto-suspend: Automatically stops the warehouse after idle time. Recommended setting: 300-600 seconds (5-10 minutes). Suspended warehouses consume no credits.
- Auto-resume: Automatically starts suspended warehouses when queries arrive. Keep this enabled for seamless operation.
These settings control costs while maintaining availability. Data remains accessible even when warehouses are suspended.
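Both options can be adjusted after creation as well; a brief sketch, assuming the warehouse created earlier:
-- Lengthen the idle timeout to 10 minutes and keep auto-resume on
ALTER WAREHOUSE analytics_wh SET
  AUTO_SUSPEND = 600
  AUTO_RESUME = TRUE;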
With your Snowflake data analytics environment configured, the next step is loading data into Snowflake.
Loading data for Snowflake data analytics
Snowflake offers multiple approaches for loading data. The Snowsight interface provides a streamlined process through the Ingestion menu, while SQL commands offer programmatic control. This section demonstrates both methods using sales transaction data.
Creating database and schema
Start by organizing data with a proper database structure:
CREATE DATABASE sales_data;
CREATE SCHEMA sales_data.raw;
Note: The database can also be created from Catalog > Databases in the UI; in that case, run only the CREATE SCHEMA statement.
Using the load data wizard
Snowsight provides a guided interface for loading files directly into tables.
Step 1: Access the Load Data interface
- Navigate to Ingestion > Add Data in the left menu. Select Load data into a Table. The Load Data wizard appears.
Step 2: Select or create a target table
- Choose an existing table or create a new one. For new tables:
- Enter table name
- Select database and schema
- Let Snowflake infer schema from the file
Step 3: Upload files
- Add files using one of these methods:
- Drag and drop files directly
- Click Browse to select from the local system
- Select Add from stage for files already staged
The interface accepts up to 250 files at once, with each file up to 250 MB.

Step 4: Configure file format
- Snowflake automatically detects the file format using INFER_SCHEMA. Review and adjust:
- File type (CSV, JSON, Parquet, etc.)
- Delimiter settings
- Header row handling
- Date and time formats
Step 5: Set error handling
- Choose what happens when errors occur:
- Abort statement: Stop loading if any error occurs (default)
- Continue: Skip bad rows, load the rest
- Skip file: Abandon entire file on error
Step 6: Select loading method
- Append: Add new data to the existing table
- Replace: Remove existing data, insert new data
Step 7: Execute load
- Click Load to start the process. Snowflake displays:
- Number of files processed
- Rows parsed
- Rows loaded
- Any errors encountered
Loading data with SQL
For automated workflows or larger datasets, SQL provides more control.
Create table structure:
CREATE TABLE sales_data.raw.transactions (
  transaction_id INTEGER,
  transaction_date DATE,
  customer_id INTEGER,
  product_name VARCHAR(100),
  quantity INTEGER,
  unit_price DECIMAL(10,2)
);
Create internal stage:
CREATE STAGE sales_data.raw.csv_stage;
Upload file to stage (using SnowSQL command line):
PUT file:///path/to/sales_data.csv @sales_data.raw.csv_stage;
Create file format:
CREATE FILE FORMAT sales_data.raw.csv_format
  TYPE = 'CSV'
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '"';
Execute COPY command:
COPY INTO sales_data.raw.transactions
  FROM @sales_data.raw.csv_stage
  FILE_FORMAT = (FORMAT_NAME = 'sales_data.raw.csv_format')
  ON_ERROR = 'CONTINUE';
This loads data from staged files into the table, skipping problematic rows.
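To see which rows would be rejected before committing a load, the COPY command also supports a validation mode; a brief sketch using the same stage and file format defined above:
-- Dry run: report parsing errors without inserting any rows
COPY INTO sales_data.raw.transactions
  FROM @sales_data.raw.csv_stage
  FILE_FORMAT = (FORMAT_NAME = 'sales_data.raw.csv_format')
  VALIDATION_MODE = RETURN_ERRORS;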
Verifying loaded data
Confirm successful loading:
-- Check row count
SELECT COUNT(*) FROM sales_data.raw.transactions;

-- View sample data
SELECT * FROM sales_data.raw.transactions LIMIT 10;
Now that the data is loaded, let’s see how to extract insights using Snowflake data analytics queries.
Running analytics queries in Snowflake
Snowflake data analytics capabilities shine through SQL queries executed in Worksheets. Navigate to Projects > Worksheets to access the query editor. This section demonstrates common analytical patterns for extracting insights from data.
Basic aggregations
Aggregation functions summarize data across rows.
- Calculate sales metrics:
SELECT
  COUNT(*) AS total_transactions,
  SUM(unit_price * quantity) AS total_revenue,
  AVG(unit_price * quantity) AS average_order_value
FROM sales_data.raw.transactions;
- Group data to see patterns:
SELECT
  product_name,
  SUM(quantity) AS units_sold,
  SUM(unit_price * quantity) AS product_revenue
FROM sales_data.raw.transactions
GROUP BY product_name
ORDER BY product_revenue DESC;
This reveals top-performing products by revenue.
Time-based analytics
Analyze trends over time using date functions:
SELECT
  DATE_TRUNC('MONTH', transaction_date) AS month,
  COUNT(*) AS monthly_transactions,
  SUM(unit_price * quantity) AS monthly_revenue
FROM sales_data.raw.transactions
GROUP BY month
ORDER BY month;
DATE_TRUNC groups dates into monthly periods, revealing seasonal patterns and growth trends.
Window functions
Window functions perform calculations across related rows without collapsing results. Calculate running totals:
SELECT
  transaction_date,
  unit_price * quantity AS daily_revenue,
  SUM(unit_price * quantity) OVER (ORDER BY transaction_date) AS cumulative_revenue
FROM sales_data.raw.transactions
ORDER BY transaction_date;
The OVER clause defines the calculation window. This shows revenue accumulation over time.
Ranking analysis
Identify top performers with ranking functions:
SELECT
  product_name,
  SUM(unit_price * quantity) AS revenue,
  RANK() OVER (ORDER BY SUM(unit_price * quantity) DESC) AS revenue_rank
FROM sales_data.raw.transactions
GROUP BY product_name;
RANK() assigns positions based on revenue values.
Month-over-month growth
Compare performance across time periods:
SELECT
  DATE_TRUNC('MONTH', transaction_date) AS month,
  SUM(unit_price * quantity) AS current_revenue,
  LAG(SUM(unit_price * quantity)) OVER (ORDER BY DATE_TRUNC('MONTH', transaction_date)) AS previous_revenue,
  ((SUM(unit_price * quantity) - LAG(SUM(unit_price * quantity)) OVER (ORDER BY DATE_TRUNC('MONTH', transaction_date)))
    / LAG(SUM(unit_price * quantity)) OVER (ORDER BY DATE_TRUNC('MONTH', transaction_date))) * 100 AS growth_percent
FROM sales_data.raw.transactions
GROUP BY month
ORDER BY month;
LAG() accesses previous row values, enabling period-over-period comparisons.
Joining tables
Combine multiple data sources for comprehensive analysis. When working with multiple related tables, joins connect data through common keys.
Here’s a conceptual example showing how to analyze customer lifetime value (note: this requires a customers table):
-- Example structure (for reference only)
-- This assumes you have created a customers table with customer details
SELECT
  c.customer_name,
  COUNT(t.transaction_id) AS total_purchases,
  SUM(t.unit_price * t.quantity) AS lifetime_value
FROM sales_data.raw.transactions t
JOIN sales_data.raw.customers c
  ON t.customer_id = c.customer_id
GROUP BY c.customer_name
ORDER BY lifetime_value DESC
LIMIT 10;
For a working example with just the transactions table, analyze customer purchase patterns:
SELECT
  customer_id,
  COUNT(transaction_id) AS total_purchases,
  SUM(unit_price * quantity) AS total_spent,
  AVG(unit_price * quantity) AS avg_order_value
FROM sales_data.raw.transactions
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 10;
This identifies highest-value customers based on total spending.
These query patterns form the foundation for Snowflake data analytics work. Worksheets in Snowsight support collaboration through sharing, folder organization, and query history tracking. Results can be visualized in charts or exported to dashboards for ongoing monitoring.
The platform handles complex analytical queries efficiently, making it suitable for organizations of any size.
Conclusion
This Snowflake tutorial covered the essential concepts for getting started with Snowflake data analytics:
- Cloud-native platform separating storage and compute for independent scaling
- Three-layer architecture: storage, compute (virtual warehouses), and services
- Snowsight interface with Projects, Catalog, and Ingestion navigation
- Virtual warehouses for query processing with flexible sizing options
- Data loading through the Snowsight wizard and SQL COPY commands
- SQL-based analytics, including aggregations, window functions, and time-based analysis
The platform provides a complete environment for storing, processing, and analyzing data without infrastructure management.
If you want to learn more about Snowflake and gain hands-on experience, check out the Data Analytics with Snowflake skill path. It provides structured learning to build practical expertise with the platform.
Frequently asked questions
1. Is Snowflake useful for data analysts?
Yes. Snowflake uses standard SQL that most analysts already know. The Snowsight interface provides an accessible environment for writing queries, creating reports, and analyzing data without requiring infrastructure management or programming skills beyond SQL.
2. Is Snowflake used for ETL?
Snowflake works as both an ETL destination and processing engine. Data can be loaded into Snowflake and transformed using SQL. Many organizations use ELT (Extract, Load, Transform) instead, where raw data loads first and transforms within Snowflake using tools like dbt.
3. Does Snowflake only use SQL?
SQL is the primary interface, but Snowflake supports additional languages through Snowpark. Python, Java, and Scala code can run directly within Snowflake for data processing and machine learning. REST APIs enable programmatic access, and the platform integrates with various BI tools.
4. What is Snowflake used for?
Snowflake serves as a centralized data warehouse consolidating information from multiple sources. Organizations use it for business intelligence reporting, data analysis, machine learning, and secure data sharing between business units or external partners without copying data.