Introduction to Decision Trees
In this lab, you’ll practice creating a decision tree. When you’re finished, you’ll have a basic decision tree and a fundamental understanding of use cases for them.
Challenge
Introduction to Decision Trees
Welcome to the "Introduction to Decision Trees" lab! This lab is designed to give you a foundational understanding of Decision Trees, including use cases, interpretability, and the advantages and disadvantages of decision trees.
A Decision Tree is a supervised learning algorithm used for both classification and regression. In this lab you will focus on a classification decision tree, which models decisions and their possible consequences in a tree-like structure. Decision trees split data into smaller subsets based on feature conditions, making them easy to interpret while effectively capturing patterns in the data.
Learning Objectives
- Understand use cases of decision trees
- Learn how to implement and train a decision tree, and measure its performance metrics
- Understand ethical considerations and interpretability of decision tree models
Challenge
Creation of Classification Data
For data creation in this lab you will use the code provided below, since synthetic data generation is not the key focus of this lab. The generated data contains features for a classification problem and aligns well with decision trees and random forests, which excel at classification problems requiring high interpretability.
```python
from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=500,      # Number of samples
    n_features=10,      # Number of features
    n_informative=8,    # Number of informative features
    n_redundant=2,      # Number of redundant features
    n_classes=2,        # Binary classification
    random_state=42
)
```
Once the synthetic data is generated, ensure it is properly split into train and test sets with a random seed established.
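The split step might look like the following minimal sketch using scikit-learn's `train_test_split`; the 80/20 split ratio and variable names (`X_train`, `X_test`, and so on) are illustrative choices, not lab requirements. The data generation is repeated here so the snippet runs standalone.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Regenerate the synthetic classification data from the previous step.
X, y = make_classification(n_samples=500, n_features=10, n_informative=8,
                           n_redundant=2, n_classes=2, random_state=42)

# Hold out 20% of the samples for testing; fix the seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```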
-
Challenge
Setting Up Decision Tree Models
For this step you will determine the hyperparameters for the decision tree model.
For initializing your decision tree model you will be using the DecisionTreeClassifier from the scikit-learn library. For this basic demonstration you will only be using 3 key parameters:

criterion: the function used to measure the quality of a split. Three options are available for classification:
- gini: Gini impurity measures the probability of misclassifying a randomly chosen element from the dataset if it were randomly labeled according to the class distribution.
- Faster to compute than entropy because it avoids logarithmic calculations.
- Prefers larger class splits, leading to a more balanced tree.
- entropy: derived from Shannon's information theory, quantifies the amount of disorder (uncertainty) in a dataset.
- Measures "purity" more rigorously than Gini.
- More computationally expensive due to the logarithm.
- Can result in deeper trees since it may favor splits with multiple smaller classes.
- log_loss (or binary cross-entropy): primarily used in probabilistic classification, where predictions are given as probabilities rather than hard class labels.
- Penalizes incorrect probabilistic predictions more heavily.
- Unlike Gini and entropy, log loss works with predicted probabilities rather than discrete class labels.

max_depth: the maximum depth of the tree. There are several other hyperparameters that limit tree depth, but for this example we will simply set max_depth. Choosing a proper depth for your decision tree is important: a shallow tree will underfit the data, while a deep tree will overfit it and fail to generalize. Within this lab a max_depth under 6 is acceptable; feel free to adjust it to change the visualization of the tree. Note: in practice, the best way to determine the optimal depth is Grid Search with Cross-Validation.

random_state: any integer value you desire, to ensure repeatability within training.
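Putting the three parameters together, initialization might look like the sketch below; the specific values (gini, a depth of 4, seed 42) are illustrative choices within the ranges discussed above, not required settings.

```python
from sklearn.tree import DecisionTreeClassifier

# Initialize the classifier with the three key hyperparameters.
dt_model = DecisionTreeClassifier(
    criterion="gini",   # impurity measure: "gini", "entropy", or "log_loss"
    max_depth=4,        # keep the tree shallow enough to avoid overfitting
    random_state=42,    # fixed seed for repeatable training
)
```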
Challenge
Training and Evaluating Decision Trees
Now that your data is collected and the model's hyperparameters are established, you can train the model and generate predictions with the two lines provided below.
```python
dt_model.fit(X_train, y_train)
y_pred = dt_model.predict(X_test)
```
Key Measurements:
Decision trees can be measured with accuracy and the classification report functionality.
- Precision – How many predicted positives are actually positive?
- Recall – How many actual positives were correctly predicted?
- F1-score – Balance between precision and recall.
- Accuracy – Overall correctness (not always the best metric for imbalanced datasets).
- Support – The number of instances per class.
- Macro vs. Weighted Average – Choose based on whether you want equal importance or class-weighted evaluation.
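These measurements can be produced with scikit-learn's accuracy_score and classification_report, as in the end-to-end sketch below; it regenerates the data and model from the earlier steps so it runs standalone, with the same illustrative hyperparameter values.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

# Recreate the data and model from the previous steps.
X, y = make_classification(n_samples=500, n_features=10, n_informative=8,
                           n_redundant=2, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
dt_model = DecisionTreeClassifier(criterion="gini", max_depth=4,
                                  random_state=42)

# Train, predict, and report the key measurements.
dt_model.fit(X_train, y_train)
y_pred = dt_model.predict(X_test)

print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(classification_report(y_test, y_pred))  # precision, recall, F1, support
```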
Below is a simple list of general scenarios explaining which metric should be valued most and why.
- When False Positives are Costly (e.g., spam detection, fraud detection)
- Key Metric: Precision
- Reason: Avoids mislabeling negatives as positives.
- When False Negatives are Costly (e.g., medical diagnosis, security alerts)
- Key Metric: Recall
- Reason: Ensures most actual positive cases are correctly identified.
- When Both False Positives and False Negatives Matter Equally (e.g., general classification tasks)
- Key Metric: F1-score
- Reason: Balances precision and recall for a fair evaluation.
- When Dealing with an Imbalanced Dataset (e.g., fraud detection with few fraud cases)
- Key Metric: Weighted Average
- Reason: Accounts for class distribution to prevent bias toward majority classes.
- When All Classes Should Be Treated Equally (e.g., multi-class problems where each class is equally important)
- Key Metric: Macro Average
- Reason: Gives equal importance to each class, regardless of frequency.
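When you need the macro or weighted averages programmatically rather than as printed text, classification_report accepts an output_dict=True parameter that returns the same figures as a nested dictionary. The sketch below assumes the same illustrative data and model as the earlier steps.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Recreate the data and model from the previous steps.
X, y = make_classification(n_samples=500, n_features=10, n_informative=8,
                           n_redundant=2, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
model = DecisionTreeClassifier(max_depth=4, random_state=42)
model.fit(X_train, y_train)

# output_dict=True returns the report as a nested dictionary.
report = classification_report(y_test, model.predict(X_test), output_dict=True)
print(report["macro avg"]["f1-score"])     # each class weighted equally
print(report["weighted avg"]["f1-score"])  # weighted by class support
```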
Challenge
Visualizing and Interpreting Decision Trees
To visualize the tree you can use the code below. Properly named features increase readability and help you understand and reproduce the decisions made by the model.
```python
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

plt.figure(figsize=(12, 6))
plot_tree(
    dt_model,
    feature_names=[f'Feature {i}' for i in range(X.shape[1])],
    class_names=['Class 0', 'Class 1'],
    filled=True
)
plt.title("Decision Tree Visualization")
plt.show()
```
Feel free to alter max_depth and adjust other hyperparameters to see how the visualization and weights of the tree change.
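If you want to inspect the learned rules without a plotting backend, scikit-learn also provides export_text in sklearn.tree, which prints the splits as indented text. The sketch below regenerates the illustrative data and model from the earlier steps so it runs standalone.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Recreate the data and a shallow model from the previous steps.
X, y = make_classification(n_samples=500, n_features=10, n_informative=8,
                           n_redundant=2, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)

# Export the tree as plain text; each indented line is a split or a leaf.
rules = export_text(model,
                    feature_names=[f"Feature {i}" for i in range(X.shape[1])])
print(rules)
```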