- Lab
-
Libraries: If you want this lab, consider one of these libraries.
- Data
Analyze Trends with Aggregates in T-SQL
In this Code Lab, you will perform advanced data aggregation and analysis in SQL Server using only executable T-SQL queries. You will start by practicing core aggregate functions such as COUNT, SUM, AVG, MIN, and MAX to summarize data. Next, you will group your data using GROUP BY and filter aggregated results with the HAVING clause. Then, you will combine grouping, filtering, and sorting to create targeted reports. After that, you will work with window functions to calculate running totals and rankings while preserving row-level detail. Finally, you will build a multi-step query using Common Table Expressions (CTEs) to uncover top-performing products within underperforming regions.
Lab Info
Table of Contents
-
Challenge
Step 1: Foundations of Aggregation
Step 1: Foundations of Aggregation
Raw datasets can contain thousands of rows, which makes them difficult to interpret at a glance. Aggregate functions provide a way to summarize this data into meaningful insights. Instead of analyzing individual rows, you can calculate totals, averages, minimums, maximums, or distinct counts.
In this step, you will practice using core aggregate functions against the
dbo.SalesOrderHeadertable. These functions will allow you to answer essential questions such as: How many rows are in a table? How many orders have a shipping date? How much revenue has been generated? What is the average order value?
By the end of this step, you’ll be comfortable using aggregate functions to summarize raw sales data into high-level metrics.
What You’ll Learn in This Step
You’ll learn how to:
- Use
COUNT(*)to return the total number of rows in a table - Use
COUNT(column)to exclude nulls from a count - Use
COUNT(DISTINCT column)to measure unique values - Use
SUM()andAVG()to calculate totals and averages - Use
MIN()andMAX()to identify boundary values in a column
info> You will write all of your T-SQL answers directly in the
1-step-one.sqlfile provided in this environment.When you are ready to run the query, simply press Run in the Terminal tab.
This will automatically execute the contents of1-step-one.sqlagainst the database and display the results.
You do not need to type any commands in the terminal manually.
Optional: How to Log Into the Database Manually
You don’t need to do this for this lab, as it will not be able to be validated. However, if you’d like to explore connecting to the database directly using the
mssql-cli, here’s how to do it using thesqlcmdtool.
Database Login and Basics
To connect to the SQL Server manually, run this command in the terminal:
sqlcmd -S localhost -U SA -P 'P@ssword1'
#### What Happens Next-
Once connected, you’ll see a prompt that looks like this:
1>This means SQL Server is ready to accept commands.
-
Every query you type must end with
GOon its own line. Example:SELECT @@VERSION; GO -
To exit, type:
QUIT
Again, this is purely optional and not needed to complete the tasks, it’s just here if you’d like to explore running SQL queries directly.
### UnderstandingCOUNT(*): Counting All RowsThe
COUNT()function in T-SQL is used to return the number of rows in a result set. When used with an asterisk,COUNT(*), it counts every row in the table, regardless of whether any columns contain null values.This function is the simplest way to measure table size and verify that rows exist. It can be used with or without a
WHEREclause.Syntax
SELECT COUNT(*) FROM table_name; GOCOUNT(*), counts all rows in the tableFROM table_name, specifies the table to queryGO, signals the end of the batch for execution in SQL Server
In this task, you’ll use
COUNT(*)against thedbo.SalesOrderHeadertable to determine the total number of orders in the database. ### UsingCOUNT(column): Count Only Non-NULL ValuesThe
COUNT()function can count all rows or only those with a non-null value in a specific column. When written asCOUNT(column_name), SQL Server ignores rows wherecolumn_nameisNULL, and counts only rows with actual values.Syntax
SELECT COUNT(column_name) FROM table_name; GOCOUNT(column_name), returns the number of rows where the column is not nullGO, ends the batch for execution in SQL Server
In this task, you will use the
ShipDatefield fromdbo.SalesOrderHeader.ShipDateis aDATETIMEcolumn, and some orders may have a null value if a shipment has not been recorded. UsingCOUNT(ShipDate)returns the count of orders that have a non-null shipping date. ### UsingCOUNT(DISTINCT column): Counting Unique ValuesThe
DISTINCTkeyword removes duplicate values from a result set. When combined with theCOUNT()function, it allows you to count the number of unique, non-null values in a column.This pattern is helpful when you need to measure uniqueness, such as the number of unique customers or territories represented in a dataset.
Syntax
SELECT COUNT(DISTINCT column_name) FROM table_name; GODISTINCT column_name, ensures duplicate values are eliminated before countingCOUNT(...), counts the distinct, non-null valuesGO, ends the batch for execution in SQL Server
In this task, you will use
COUNT(DISTINCT CustomerID)on thedbo.SalesOrderHeadertable. TheCustomerIDcolumn is anINTrepresenting the customer who placed each order. Using this query shows how many unique customers exist in the data. ### UsingSUM(): Calculating TotalsThe
SUM()function in T-SQL returns the total of all non-null values in a numeric column. It adds together the values row by row and produces a single aggregated result.This is most often used for calculating revenue, quantities, or any cumulative metric.
Syntax
SELECT SUM(column_name) FROM table_name; GOSUM(column_name), returns the total of all non-null values in the columnGO, marks the end of the batch in SQL Server
In this task, you will use
SUM(TotalDue)on thedbo.SalesOrderHeadertable. TheTotalDuecolumn is aDECIMALfield representing the total amount due for each order. Summing this column calculates the overall revenue across all orders in the dataset. ### UsingAVG(): Calculating AveragesThe
AVG()function in T-SQL calculates the arithmetic mean of all non-null values in a numeric column. It adds all the values together, then divides the total by the number of non-null entries.This function is used to measure typical values, such as the average transaction size or the mean quantity sold.
Syntax
SELECT AVG(column_name) FROM table_name; GOAVG(column_name), calculates the mean value of the column, ignoring nullsGO, ends the batch for execution in SQL Server
In this task, you will use
AVG(TotalDue)on thedbo.SalesOrderHeadertable. TheTotalDuecolumn is aDECIMALthat stores the total amount due for each order. ApplyingAVG()will return the average order value across all orders. ### UsingMIN()andMAX(): Finding Boundary ValuesThe
MIN()andMAX()functions in T-SQL return the lowest and highest non-null values from a column, respectively. They are often used on numeric or date columns to find boundaries in the data.Both functions return a single value, and they can be combined in the same query to report both extremes.
Syntax
SELECT MIN(column_name), MAX(column_name) FROM table_name; GOMIN(column_name), returns the smallest non-null valueMAX(column_name), returns the largest non-null valueGO, ends the batch for execution in SQL Server
In this task, you will use
MIN()andMAX()on theOrderDatecolumn of thedbo.SalesOrderHeadertable.OrderDateis aDATETIMEfield that stores when each order was placed. By applying these functions, you can identify the earliest and latest order dates in the dataset. - Use
-
Challenge
Step 2: Dynamic Summaries with Conditional Logic
Step 2: Conditional Aggregation with CASE
Sometimes, not all rows in a dataset should be treated the same. Instead of returning a single overall total, you may need to separate values into categories and calculate different results depending on conditions. T-SQL provides this flexibility through the
CASEexpression.By embedding
CASEinside aggregate functions likeSUM()orCOUNT(), you can build conditional aggregations. This allows you to calculate totals for categories such as sales in different territories or counts of rows that meet certain criteria, all in one query.
In this step, you’ll work with conditional aggregation to:
- Use
CASEexpressions inside aggregate functions - Write conditional
SUM()queries - Write conditional
COUNT()queries - Compare subtotals across different categories
By the end of this step, you will be able to apply logic to aggregations and create queries that separate data into meaningful conditional summaries.
What You’ll Learn in This Step
You’ll learn how to:
- Write
CASEexpressions insideSUM()andCOUNT() - Apply conditional logic to aggregate calculations
- Return multiple category-based results in a single query
info> You will write all of your T-SQL answers directly in the
2-step-two.sqlfile provided in this environment.When you are ready to run the query, simply press Run in the Terminal tab.
This will automatically execute the contents of2-step-two.sqlagainst the database and display the results.
You do not need to type any commands in the terminal manually.
### UsingCASEwithSUM(): Conditional TotalsThe
CASEexpression in T-SQL provides row-level conditional logic, similar to anIFstatement. When used inside an aggregate function such asSUM(), it allows you to total values only for rows that meet specific conditions.This makes it possible to calculate multiple subtotals in a single query, without having to write separate queries for each condition.
Syntax
SELECT SUM(CASE WHEN condition THEN value ELSE 0 END) FROM table_name; GOCASE WHEN condition THEN value ELSE 0 END, evaluates each row and returns the chosen value if the condition is true, otherwise returns 0SUM(...), adds together all returned valuesGO, ends the batch for execution
In this task, you will compare sales across regions by using
SUM()withCASE. Thedbo.SalesOrderHeadertable containsTotalDuevalues, and it links to thedbo.SalesTerritorytable throughTerritoryID. By joining these tables, you can calculate total revenue forNorth AmericaandEuropein the same query. ### UsingCASEwithCOUNT(): Conditional CountsThe
COUNT()function can be combined with aCASEexpression to count only the rows that meet a specific condition. By returning a non-null value inside theCASEexpression, the row will be included in the count. If theCASEreturnsNULL, the row will not be counted.This allows you to create multiple conditional counts in a single query.
Syntax
SELECT COUNT(CASE WHEN condition THEN 1 END), COUNT(CASE WHEN other_condition THEN 1 END) FROM table_name; GOCASE WHEN condition THEN 1 END, returns 1 for matching rows and null otherwiseCOUNT(...), counts all non-null values, effectively counting only the rows that meet the conditionGO, ends the batch for execution
In this task, you will use the
DiscountAmountcolumn in thedbo.SalesOrderDetailtable to count order lines that include a discount and order lines that do not. - Use
-
Challenge
Step 3: Multi-level Grouping and Reporting Structures
Step 3: Multi-level Grouping and Reporting Structures
When you use
GROUP BY, aggregate functions likeSUM()orCOUNT()return totals for each group. However, in many cases, you need to filter those grouped results after aggregation. This is done with theHAVINGclause.The
HAVINGclause allows you to filter entire groups based on aggregate conditions, such as keeping only territories with revenue above a threshold or categories that meet a minimum number of orders. When combined withWHERE, you can apply conditions both before and after aggregation.
In this step, you’ll work with grouped queries to:
- Filter results using
HAVING - Apply thresholds to totals and counts
- Combine
WHEREandHAVINGto manage data at both the row level and the group level
By the end of this step, you’ll be able to create multi-level filtering logic that produces reports focused only on meaningful groups.
What You’ll Learn in This Step
You’ll learn how to:
- Filter grouped results using
HAVING - Apply conditions to aggregates like totals and counts
- Combine
WHEREandHAVINGfor more precise reporting
info> You will write all of your T-SQL answers directly in the
3-step-three.sqlfile provided in this environment.When you are ready to run the query, simply press Run in the Terminal tab.
This will automatically execute the contents of3-step-three.sqlagainst the database and display the results.
You do not need to type any commands in the terminal manually.
### Filtering Groups withHAVINGThe
HAVINGclause filters groups after aggregation is applied. WhileWHEREremoves rows before grouping,HAVINGis used to remove entire groups that do not meet specified aggregate conditions.This makes it possible to show only those groups that exceed a threshold.
Syntax
SELECT column_name, AGGREGATE_FUNCTION(other_column) FROM table_name GROUP BY column_name HAVING AGGREGATE_FUNCTION(other_column) condition; GOGROUP BY, creates groups for aggregationHAVING, applies conditions to the aggregate resultsGO, ends the batch for execution
In this task, you will calculate sales totals per territory, then filter so only territories where total sales are greater than 1,000,000 are returned. ### Combining
WHEREandHAVINGWHEREandHAVINGcan be used together.WHEREfilters rows before they are groupedHAVINGfilters the groups after aggregation
This allows precise filtering at both the row and group level.
Syntax
SELECT column_name, AGGREGATE_FUNCTION(other_column) FROM table_name WHERE row_condition GROUP BY column_name HAVING AGGREGATE_FUNCTION(other_column) condition; GOIn this task, you will use
WHEREto restrict rows by year, then useHAVINGto return only those groups with sufficient order counts. - Filter results using
-
Challenge
Step 4: Post-aggregation Filtering Techniques
Step 4: Post-aggregation Filtering Techniques
Once you’ve grouped and aggregated data, the next step is to create polished reports that highlight meaningful results. This often requires applying additional filtering and ordering so the output is focused, relevant, and easy to interpret.
The
HAVINGclause can filter aggregated groups, while theORDER BYclause can sort results in ascending or descending order. By combining these withTOP, you can highlight the most important categories or territories.
In this step, you’ll refine grouped queries to:
- Filter categories by sales totals using
HAVING - Sort results in descending order by revenue
- Use
TOPwithORDER BYto display the highest-performing groups
By the end of this step, you’ll be able to produce polished, ordered reports that highlight key trends across categories and territories.
What You’ll Learn in This Step
You’ll learn how to:
- Apply
HAVINGto filter grouped results - Use
ORDER BYto sort aggregated data - Combine
TOPwithORDER BYto highlight top results
info> You will write all of your T-SQL answers directly in the
4-step-four.sqlfile provided in this environment.When you are ready to run the query, simply press Run in the Terminal tab.
This will automatically execute the contents of4-step-four.sqlagainst the database and display the results.
You do not need to type any commands in the terminal manually.
### CombiningGROUP BY,HAVING, andORDER BYAfter grouping data, you can apply filtering and sorting to refine your results.
GROUP BYgroups rows into categories.HAVINGfilters out groups that do not meet a threshold.ORDER BYsorts the aggregated results, usually in descending order to show the highest values first.
These three clauses are often used together to create ranked and focused reports.
Syntax
SELECT group_column, AGGREGATE_FUNCTION(value_column) FROM table_name GROUP BY group_column HAVING AGGREGATE_FUNCTION(value_column) condition ORDER BY AGGREGATE_FUNCTION(value_column) DESC; GOIn this task, you will group sales by product category, filter to include only categories with total sales greater than 200,000, and then order the results in descending order of sales totals. ### Using
TOPwithORDER BYfor Ranked ResultsThe
TOPkeyword limits the number of rows returned by a query. When combined withORDER BY, it allows you to highlight the top results based on a specific metric.This is especially useful for ranking scenarios, such as displaying the top-performing territories by revenue.
Syntax
SELECT TOP (n) group_column, AGGREGATE_FUNCTION(value_column) FROM table_name GROUP BY group_column ORDER BY AGGREGATE_FUNCTION(value_column) DESC; GOTOP (n), limits results to the firstnrowsGROUP BY, groups data by the specified columnORDER BY ... DESC, sorts results from highest to lowestGO, ends the batch for execution
In this task, you will calculate sales totals by territory, then return only the top 5 highest-revenue territories using
TOPandORDER BY. - Filter categories by sales totals using
-
Challenge
Step 5: Window Functions for Comparative Analysis
Step 5: Window Functions for Comparative Analysis
Aggregate functions summarize data across groups, but they remove row-level detail. Window functions extend this idea by applying aggregate and ranking logic across partitions of rows while still returning individual rows. This allows you to compare each row against group totals, assign rankings, or calculate running values.
Window functions are written with an
OVER()clause, which can include:ORDER BYto define row orderPARTITION BYto divide rows into groups
In this step, you’ll use window functions to:
- Assign row numbers within groups
- Rank products by sales totals
- Calculate running totals across all orders
- Reset running totals by partitioning data by year
By the end of this step, you’ll be able to apply ranking, ordering, and cumulative calculations that preserve row-level detail while incorporating group metrics.
What You’ll Learn in This Step
You’ll learn how to:
- Use
ROW_NUMBER()withPARTITION BYandORDER BY - Apply
RANK()to assign rankings within partitions - Calculate cumulative totals with
SUM() OVER(ORDER BY ...) - Reset running totals using
PARTITION BYin window functions
info> You will write all of your T-SQL answers directly in the
5-step-five.sqlfile provided in this environment.When you are ready to run the query, simply press Run in the Terminal tab.
This will automatically execute the contents of5-step-five.sqlagainst the database and display the results.
You do not need to type any commands in the terminal manually.
### UsingROW_NUMBER()withPARTITION BYThe
ROW_NUMBER()function assigns a unique sequential number to each row within a partition. The numbering is determined by theORDER BYclause insideOVER().This function is useful for creating ordered lists, identifying top records, or referencing row positions within a group.
Syntax
SELECT ROW_NUMBER() OVER(PARTITION BY column_name ORDER BY value_column DESC) AS RowNum FROM table_name; GOROW_NUMBER(), assigns sequential numbers starting from 1PARTITION BY, divides rows into groups (the numbering restarts for each group)ORDER BY, defines the order of rows within each groupGO, ends the batch for execution
In this task, you will assign row numbers to orders within each territory. The results will be partitioned by
TerritoryIDand ordered byTotalDuein descending order. ### UsingRANK()withPARTITION BYThe
RANK()function assigns a ranking number to rows within a partition, based on the specifiedORDER BY. Rows with the same value receive the same rank, but the next rank will be skipped. For example, if two rows share rank 1, the following row will be ranked 3.This makes
RANK()ideal for comparing items within categories, such as ranking products by sales within each territory.Syntax
SELECT column_name, RANK() OVER(PARTITION BY group_column ORDER BY value_column DESC) AS Rank FROM table_name; GORANK(), assigns ranking values starting at 1PARTITION BY, restarts the ranking for each groupORDER BY, determines the order of ranking within each partitionGO, ends the batch for execution
In this task, you will calculate total sales per product and rank products by sales within each territory using
PARTITION BY TerritoryID. ### Calculating Running Totals withSUM() OVER(ORDER BY ...)A running total is a cumulative sum that adds values in sequence. In T-SQL, you can calculate running totals using
SUM()as a window function withOVER(ORDER BY ...).This allows you to track cumulative metrics, such as sales over time, without collapsing the rows into groups.
Syntax
SELECT column_name, SUM(value_column) OVER(ORDER BY column_name) AS RunningTotal FROM table_name; GOSUM(value_column), performs the aggregationOVER(ORDER BY column_name), defines the order for the cumulative calculationGO, ends the batch for execution
In this task, you will calculate a running total of sales across all orders by ordering results by
OrderDate. ### Resetting Running Totals withPARTITION BYWindow functions can be partitioned so that calculations restart for each group. By adding
PARTITION BYinside theOVER()clause, you can reset rankings, running totals, or other calculations whenever the partition value changes.When combined with a running total, this ensures each group starts its cumulative calculation from zero.
Syntax
SELECT column_name, SUM(value_column) OVER(PARTITION BY group_column ORDER BY order_column) AS RunningTotal FROM table_name; GOPARTITION BY group_column, restarts the calculation for each groupORDER BY, defines the order of rows within each partitionSUM(value_column), accumulates totals within each partitionGO, ends the batch for execution
In this task, you will calculate a running total of sales that resets for each year using
PARTITION BY YEAR(OrderDate). -
Challenge
Step 6: Synthesizing Insights Through Complex Queries
Step 6: Synthesizing Insights Through Complex Queries
Individually, aggregate functions, grouping, filtering, and window functions provide powerful insights. Combined, they allow you to design advanced queries that deliver multi-layered analysis. Common Table Expressions (CTEs) provide a way to structure these complex queries so they remain readable and reusable.
A CTE is defined with the
WITHclause and acts like a temporary result set. You can use it to prepare aggregated data, then apply additional calculations such as ranking or percentages in the outer query.
In this step, you’ll bring together everything you’ve learned to:
- Build a CTE that joins multiple tables and calculates grouped sales data
- Apply
WHEREandHAVINGfor multi-level filtering - Use ranking functions like
DENSE_RANK()to order categories by sales - Calculate percentages using windowed functions to compare categories within territories
By the end of this step, you’ll be able to construct advanced queries that combine aggregation, grouping, filtering, and window functions into a single solution.
What You’ll Learn in This Step
You’ll learn how to:
- Write a Common Table Expression (CTE) with
WITH - Aggregate and filter sales data across multiple tables
- Apply ranking with
DENSE_RANK() - Calculate percentages of group totals using window functions
info> You will write all of your T-SQL answers directly in the
6-step-six.sqlfile provided in this environment.When you are ready to run the query, simply press Run in the Terminal tab.
This will automatically execute the contents of6-step-six.sqlagainst the database and display the results.
You do not need to type any commands in the terminal manually.
### Using Common Table Expressions (CTEs)A Common Table Expression (CTE) is a temporary result set defined using the
WITHclause. CTEs are useful for breaking down complex queries into smaller, manageable steps. You can join multiple tables, aggregate data, and filter results inside the CTE, then select from it in the outer query.Syntax
WITH cte_name AS ( SELECT column_name, AGGREGATE_FUNCTION(value_column) FROM table_name WHERE condition GROUP BY column_name HAVING AGGREGATE_FUNCTION(value_column) condition ) SELECT * FROM cte_name; GOWITH cte_name AS (...), defines the CTE- Inside the parentheses, you can use joins, grouping, filtering, and aggregation
- The outer
SELECTretrieves results from the CTE GO, ends the batch for execution
In this task, you will create a CTE that joins orders, order details, products, and territories. You will group by both territory and product category, calculate sales totals and order counts, and filter for orders in 2023 and categories with at least 10 orders. ### Applying Rankings and Percentages to Aggregated Results
Once a CTE has been created to aggregate sales data, you can apply additional calculations in the outer query to compare categories and highlight top performers.
Two common techniques are:
DENSE_RANK()→ Assigns rankings without gaps, restarting for each partition. This allows you to order categories by sales within each territory.- Percentage of totals → Using a windowed
SUM()withPARTITION BYto calculate each category’s share of the overall total in its territory.
Syntax
SELECT column_name, DENSE_RANK() OVER(PARTITION BY group_column ORDER BY value_column DESC) AS Rank, value_column * 1.0 / SUM(value_column) OVER(PARTITION BY group_column) AS Percentage FROM cte_name; GODENSE_RANK(), assigns a rank to each row in a partition without skipping ranksPARTITION BY, defines the group in which ranks or percentages are calculatedSUM(value_column) OVER(PARTITION BY group_column), computes the group total to use in a percentage calculation- Multiplying by
1.0ensures correct decimal division GO, ends the batch for execution
In this task, you will apply ranking and percentage calculations to the results of your aggregated CTE, ordering by territory and sales rank. ### Conclusion
In this lab, you practiced core T-SQL techniques for aggregating and filtering data:
- Calculated metrics using functions like
COUNT(),SUM(),AVG(),MIN(), andMAX() - Applied conditional logic inside aggregate functions
- Grouped data using
GROUP BYand filtered results withHAVING - Used
ROW_NUMBER(),RANK(), andDENSE_RANK()to assign row-level orderings - Created running totals and percentage calculations using window functions
- Built Common Table Expressions (CTEs) to structure multi-step queries
These skills support data analysis tasks such as summarizing trends, ranking results, and segmenting data. Review each concept as needed to reinforce your understanding.
About the author
Real skill practice before real-world application
Hands-on Labs are real environments created by industry experts to help you learn. These environments help you gain knowledge and experience, practice without compromising your system, test without risk, destroy without fear, and let you learn from your mistakes. Hands-on Labs: practice your skills before delivering in the real world.
Learn by doing
Engage hands-on with the tools and technologies you’re learning. You pick the skill, we provide the credentials and environment.
Follow your guide
All labs have detailed instructions and objectives, guiding you through the learning process and ensuring you understand every step.
Turn time into mastery
On average, you retain 75% more of your learning if you take time to practice. Hands-on labs set you up for success to make those skills stick.