Analyze Sales Data in DuckDB
A hands-on Code Lab demonstrating how quickly DuckDB can be installed and used to consolidate and analyze a realistic sales dataset (accounts, opportunities, orders, inventory). Learners install DuckDB in seconds, then query CSV, Excel, and Parquet files, and even remote HTTPS sources, together in a single SQL workflow. The lab focuses on answering common business questions (revenue trends, top products, customer activity) while practicing lightweight validation, type correction, joins across heterogeneous sources, and materializing curated summaries.
Challenge
Step 1: Introduction
Overview
This lab explores the variety of ways you can query data using DuckDB. Rather than focusing on a single format or source, you'll work across CSV, Excel, Parquet, local files, HTTP endpoints, and cloud object storage, using the same familiar SQL syntax.
DuckDB meets your data where it lives, so you won't need to move or transform files before querying them.
> Note: This lab uses DuckDB v1.2.2.
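As a quick illustration of that idea, DuckDB can query a file in place with a single SQL statement. The path below is a hypothetical example, not one of the lab's data files:

```sql
-- Query a Parquet file directly where it sits; no import step needed
-- (the file path is a hypothetical placeholder)
SELECT COUNT(*) FROM 'some/path/events.parquet';
```

You'll use the same pattern against the lab's real CSV and Excel files in the steps that follow.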
Working in This Lab
For a smoother workflow, use `scratch_pad.sql` in the file tree to write and refine your SQL queries.
Prerequisites
Before starting this lab, you should have:
- SQL fundamentals: Experience writing basic queries (`SELECT`, `WHERE`, `JOIN`, `GROUP BY`, `ORDER BY`).
- Data concepts: Familiarity with tabular data (rows, columns, primary/foreign keys) and common data types (integers, strings, dates).
- Command-line skills: Comfort navigating a terminal.
- File formats: Prior exposure to CSV and Excel file formats.
- DuckDB: No prior experience necessary; the lab teaches DuckDB from the ground up.
Challenge
Step 2: Start DuckDB
Install DuckDB
Goal: Start the DuckDB CLI and load the extensions required to query all data sources used in this lab.
DuckDB extensions add support for data sources and formats that are not built into the core engine. In this lab, you’ll use extensions to read remote files, Excel workbooks, and cloud-style object storage.
1. Start the DuckDB CLI from the command line:

   ```shell
   duckdb
   ```

2. Install and load the required extensions inside DuckDB:

   ```sql
   -- Remote file access over HTTPS
   INSTALL httpfs;
   LOAD httpfs;

   -- Excel reading via st_read()
   INSTALL spatial;
   LOAD spatial;

   -- Alternative Excel reader (DuckDB 1.x+)
   INSTALL excel;
   LOAD excel;
   ```

3. Confirm all three extensions are active before moving on:

   ```sql
   SELECT extension_name, loaded, installed
   FROM duckdb_extensions()
   WHERE extension_name IN ('httpfs', 'spatial', 'excel');
   ```

   When loaded correctly, the query should return:

   | extension_name | loaded | installed |
   |---|---|---|
   | httpfs | true | true |
   | spatial | true | true |
   | excel | true | true |
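Since the lab pins a specific DuckDB version, it can also be worth confirming that the CLI you started matches it. `version()` is a built-in DuckDB function:

```sql
-- Report the running DuckDB version (should match the version noted in Step 1)
SELECT version();
```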
Step 3: Query Raw Files
Inspect the Data Sources
Goal: Understand what data files exist, where they live, and what each source contains.
Note: If your terminal does not already have a DuckDB session, type `duckdb` to start one.
1. Preview the accounts CSV file using `read_csv_auto`:

   ```sql
   SELECT * FROM read_csv_auto('data/salesforce/accounts.csv') LIMIT 5;
   ```

   Your results should look like this:

   | account_id | account_name | industry | created_date |
   |---|---|---|---|
   | A00001 | Account 1 | Retail | 2021-03-09 |
   | A00002 | Account 2 | Finance | 2022-03-13 |
   | A00003 | Account 3 | Retail | 2020-10-01 |
   | A00004 | Account 4 | Finance | 2022-08-01 |
   | A00005 | Account 5 | Finance | 2020-03-13 |

2. Preview the opportunities CSV file using `read_csv_auto`:

   ```sql
   SELECT * FROM read_csv_auto('data/salesforce/opportunities.csv') LIMIT 5;
   ```

   Your results should look like this:

   | id | account_id | amount | close_date | stage |
   |---|---|---|---|---|
   | O0000001 | A00387 | 194560.19 | 2021-10-17 | Closed Lost |
   | O0000002 | A01794 | 17925.62 | 2021-01-09 | Closed Lost |
   | O0000003 | A03871 | 182264.19 | 2022-12-04 | Proposal |
   | O0000004 | A02510 | 197361.29 | 2020-05-17 | Negotiation |
   | O0000005 | A00385 | 151152.83 | 2023-10-02 | Closed Won |

3. Preview the orders Excel source using `st_read`:

   ```sql
   SELECT * FROM st_read('data/oltp/orders.xlsx') LIMIT 5;
   ```

   Your results should look like this:

   | order_id | account_id | order_total | order_date |
   |---|---|---|---|
   | 1000000 | A02164 | 3293.55 | 2021-06-17 00:00:00 |
   | 1000001 | A02533 | 1738.11 | 2021-02-15 00:00:00 |
   | 1000002 | A01857 | 0.0 | 2024-02-18 00:00:00 |
   | 1000003 | A02719 | 21417.56 | 2022-08-12 00:00:00 |
   | 1000004 | A03232 | 4770.22 | 2023-12-05 00:00:00 |

4. Preview the inventory Excel source using `st_read`:

   ```sql
   SELECT * FROM st_read('data/oltp/inventory.xlsx') LIMIT 5;
   ```

   Your results should look like this:

   | sku | description | quantity | last_stocked |
   |---|---|---|---|
   | P000001 | Product 1 | 913 | 2023-04-29 00:00:00 |
   | P000002 | Product 2 | 133 | 2023-09-25 00:00:00 |
   | P000003 | Product 3 | 582 | 2024-01-24 00:00:00 |
   | P000004 | Product 4 | 680 | 2023-07-10 00:00:00 |
   | P000005 | Product 5 | 451 | 2023-05-25 00:00:00 |
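If you want column names and inferred types without scanning sample rows, DuckDB's `DESCRIBE` works on these file reads too. A sketch using the same accounts file:

```sql
-- Show the inferred schema of the accounts CSV without returning data rows
DESCRIBE SELECT * FROM read_csv_auto('data/salesforce/accounts.csv');
```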
Create and Query a DuckDB Table
Goal: Verify DuckDB's inferred schema and confirm data types are correct. This acts as a gatekeeping check to ensure inferred types won't cause silent data issues downstream.
1. Materialize a CSV source to a temporary table:

   ```sql
   CREATE TEMP TABLE t_accounts AS
   SELECT * FROM read_csv_auto('data/salesforce/accounts.csv');
   ```

2. Inspect the inferred schema with `PRAGMA table_info`:

   ```sql
   PRAGMA table_info('t_accounts');
   ```

3. Use `typeof()` to inspect column types on sample rows:

   ```sql
   SELECT typeof(account_id), typeof(created_date), typeof(industry)
   FROM t_accounts
   LIMIT 5;
   ```

   You should see the following:

   | typeof(account_id) | typeof(created_date) | typeof(industry) |
   |---|---|---|
   | VARCHAR | DATE | VARCHAR |
   | VARCHAR | DATE | VARCHAR |
   | VARCHAR | DATE | VARCHAR |
   | VARCHAR | DATE | VARCHAR |
   | VARCHAR | DATE | VARCHAR |
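If inference ever guesses wrong, you can override column types at read time instead of casting afterward. A sketch against the same accounts file; `types` is a real `read_csv` parameter, but the override shown here is only illustrative (inference already gets this column right):

```sql
-- Force created_date to DATE explicitly, leaving the other columns to inference
CREATE OR REPLACE TEMP TABLE t_accounts AS
SELECT * FROM read_csv('data/salesforce/accounts.csv',
                       types = {'created_date': 'DATE'});
```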
Step 4: Create and Query DuckDB Tables
Goal: Load each raw source into a table so the rest of the lab can reference a clean name instead of a file path.
Direct file queries are useful for quick inspection. Creating tables gives you stable names to use in joins, reports, and repeated analysis.
1. Create a table for each source. In a Terminal with an active DuckDB session, run the following commands:

   ```sql
   INSTALL spatial;
   LOAD spatial;

   CREATE TABLE accounts AS
   SELECT * FROM read_csv_auto('data/salesforce/accounts.csv');

   CREATE TABLE opportunities AS
   SELECT * FROM read_csv_auto('data/salesforce/opportunities.csv');

   CREATE TABLE orders AS
   SELECT * FROM st_read('data/oltp/orders.xlsx');

   CREATE TABLE inventory AS
   SELECT * FROM st_read('data/oltp/inventory.xlsx');
   ```

2. Ensure all tables are accessible in the same DuckDB session:

   ```sql
   SHOW TABLES;
   ```

3. Confirm all tables are registered and return rows:

   ```sql
   SELECT 'orders' AS table_name, COUNT(*) AS row_count FROM orders
   UNION ALL
   SELECT 'accounts', COUNT(*) FROM accounts
   UNION ALL
   SELECT 'inventory', COUNT(*) FROM inventory
   UNION ALL
   SELECT 'opportunities', COUNT(*) FROM opportunities;
   ```

   You should see the following results:

   | table_name | row_count |
   |---|---|
   | orders | 50000 |
   | accounts | 5000 |
   | inventory | 500 |
   | opportunities | 15000 |
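With stable table names in place, the business questions from the lab overview reduce to ordinary joins. For example, revenue by industry, using the column names shown in the Step 3 previews:

```sql
-- Total order revenue per industry, highest first
SELECT a.industry,
       SUM(o.order_total) AS total_revenue
FROM orders AS o
JOIN accounts AS a USING (account_id)
GROUP BY a.industry
ORDER BY total_revenue DESC;
```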
Step 5: Write Your First Report
Goal: Create a CSV file from a DuckDB query result.
You can use DuckDB to output query results directly to CSV, Parquet, or Excel files using the following syntax:

```sql
COPY ( QUERY ) TO 'File.csv';
```

The `COPY ... TO` command writes query results to a file, making it useful for reports, exports, and downstream analysis.

Note: By default, DuckDB writes files to the current directory where the DuckDB session starts. In this lab, that directory is `/home/ps-user/workspace/`.
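As a concrete sketch of that syntax, a spend-per-account report could be exported like this, reusing the `orders` table from Step 4 (the output file name is illustrative):

```sql
-- Write aggregated results straight to a CSV file with a header row
COPY (
    SELECT account_id,
           SUM(order_total) AS total_spend
    FROM orders
    GROUP BY account_id
    ORDER BY total_spend DESC
) TO 'top_accounts.csv' (HEADER, DELIMITER ',');
```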
Step 6: Persist Aggregations to File
Goal: Save curated results so they can be reused or shared without rerunning raw source queries.
Aggregating data into a summary table gives you a reusable result set for reporting and downstream analysis. Instead of rerunning joins against the raw source tables, you can query the curated table directly.
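A minimal sketch of that pattern, reusing the tables created in Step 4 (the summary table name is illustrative):

```sql
-- Materialize a curated summary once...
CREATE TABLE revenue_by_industry AS
SELECT a.industry,
       SUM(o.order_total) AS total_revenue,
       COUNT(*) AS order_count
FROM orders AS o
JOIN accounts AS a USING (account_id)
GROUP BY a.industry;

-- ...then query it directly, without repeating the join
SELECT * FROM revenue_by_industry ORDER BY total_revenue DESC;
```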
Step 7: Access Remote Data with DuckDB
Goal: Use DuckDB's `httpfs` extension to read data from an HTTP endpoint hosted by a Flask sample application and from object storage. With `httpfs`, DuckDB can query remote files using the same SQL patterns you used for local files.

### Conclusion

In this lab you learned:

- Installing and activating DuckDB extensions - powerful abstractions that unlock extra functionality, such as querying HTTP endpoints and S3 storage.
- Querying multiple file types and writing results to dedicated files - one of DuckDB's key advantages: you can start querying raw files with minimal setup or infrastructure.
- Reading raw files into persisted DuckDB tables, stored inside a `.duckdb` database file.
- Querying S3 objects and HTTPS-hosted files directly from DuckDB.
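As a recap, the remote-access pattern from Step 7 has the same shape as the local reads earlier in the lab. The URLs below are hypothetical placeholders, not the lab's actual endpoints:

```sql
-- Read a CSV served over HTTPS (requires the httpfs extension)
SELECT * FROM read_csv_auto('https://example.com/data/sample.csv') LIMIT 5;

-- Read a Parquet object from S3-style storage
SELECT COUNT(*) FROM read_parquet('s3://example-bucket/sample.parquet');
```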
If you enjoyed this content, consider exploring these DuckDB projects and features:
- Community Extensions - A list of open-source, community-developed extensions for DuckDB.
- DuckDB WASM - A live, in-browser example of DuckDB running via WebAssembly.
- DuckLake - DuckDB's lakehouse extension.