Guided: Rewriting Git History
It’s common for developers to accidentally commit code that contains sensitive data to Git repositories. In this Guided Code Lab, you will learn how to remove this information from previous Git commits!

Introduction to Git
In this lab, you will be working with Git, a tool for distributed version control of changes made to files. Git allows you to create a local repository of files that you can work with and provides tracking of those files. Git is commonly used to track changes to source code among software development teams.
In the following steps, you will learn:
- How to identify sensitive data in a Git repository
- How to perform automated scanning of Git repositories for sensitive information
- How data can be leaked via commit history
- How to remove all traces of a file from a repository and its commit history
- How to extract a subdirectory from an existing repository and establish it as a standalone repository
- The risks and benefits of rewriting Git history
Identifying Sensitive Information in Git
In this step, you will learn how to view the contents of a Git repository and locate potentially sensitive information that has been committed to the `Sensitive_Information` repository.
The `git show` command is used to display various types of objects in a Git repository, such as commits, tags, and trees. When used with a specific file path, `git show` can display the content and metadata of that file at a particular commit or version.
First, you will need to acquire the hash of the latest Git commit in the `Sensitive_Information` repository. Next, you will generate a list of all the files that are inside of the commit.
There are two different methods you could use:
- You could use the Terminal to run the `git show` command, which will immediately give you the hash.
info> To exit the command, simply press the q key.
- You can find this information by going to the `Sensitive_information/.git/refs/heads` directory and examining the `master` file.
Using the command line is usually a lot more efficient. However, all of the information extracted by using the `git show` command can be hard to read. The `--name-only` flag used with the `git show` command instructs Git to display only the names of the files that were affected by the commit being shown, without showing the actual content changes. This is useful when you're only interested in knowing which files were modified, added, or deleted in a particular commit, rather than seeing the detailed content changes within each file.
```
git show --name-only <commit_hash>
```
Now that you have a list of files that are committed to the repo, you can inspect their contents for sensitive data. Sensitive data can take many forms, but typically anything that should remain private that is committed to a repo is considered a data leak. Some common examples include IP addresses, password information, and personal user data. Similar to using the `git show` command with the hash, you can also use it with a filename.
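The inspection steps above can be tried end to end in a throwaway repository. This is a minimal sketch, not the lab's own repository: the file names, contents, and user identity are hypothetical, and `git rev-parse HEAD` is used here as a pager-free way to read the latest hash.

```shell
set -eu

# Throwaway repository; file names and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Lab User"
git config user.email "lab@example.com"

printf 'hello\n' > README.md
printf 'server_ip=192.0.2.10\n' > config.txt
git add README.md config.txt
git commit -q -m "Initial commit"

# Hash of the latest commit, without opening a pager.
commit_hash=$(git rev-parse HEAD)
echo "Latest commit: $commit_hash"

# Names of the files affected by that commit; --format= hides the commit header.
files=$(git show --name-only --format= "$commit_hash")
echo "$files"

# Contents of a single file as stored in that commit.
contents=$(git show "$commit_hash:config.txt")
echo "$contents"
```

The `<commit_hash>:<path>` form is the part that prints a file's stored contents, which is what makes a manual review for IP addresses or passwords possible.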
Automated Scanning for Sensitive Data in Git
In a real-world scenario, manually sifting through each file in search of sensitive data can be time-consuming and inefficient. It's much better to use an automated approach. For this, you will use the `git log` command with the `-S` flag.
Using the `-S` flag in `git log` allows you to quickly identify commits that added or removed occurrences of a specified keyword in file content. This lets you focus your investigation on confirmed matches instead of reading every file by hand.
```
git log -S <keyword>
```
With this command, you'll receive the hash of each commit in which the keyword (here, `IP`) was added or removed, along with details about the user who made the commit and the date and time of the commit. This method allows for a faster and more precise scan of Git repositories for sensitive information than the `git show` command.
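To see which commits the `-S` search reports, the sketch below builds a small throwaway history and searches it for `IP`. The file name, contents, and commit messages are hypothetical; only the commit that introduces the string should be listed, because the later commit leaves the number of occurrences unchanged.

```shell
set -eu

# Throwaway repository; file names and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Lab User"
git config user.email "lab@example.com"

printf 'port=8080\n' > settings.txt
git add settings.txt
git commit -q -m "Add settings"

# This commit introduces the keyword.
printf 'port=8080\nIP=192.0.2.10\n' > settings.txt
git commit -q -am "Add server address"

# This commit edits the file but does not add or remove the keyword.
printf 'port=9090\nIP=192.0.2.10\n' > settings.txt
git commit -q -am "Change port"

# -S reports commits that changed the number of occurrences of the string.
matches=$(git log -S"IP" --format='%h %s')
echo "$matches"
```

Note that the search is case-sensitive by default, so a keyword such as `ip` would not match this file.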
Leaking Data through Git Commit History
It's important to understand that removing sensitive data from a Git repository is not as simple as just deleting it. Even if you delete the file and commit the change, the sensitive data can still be leaked via the commit history, which records all commits in a particular repository. To demonstrate this, you will go through the process of deleting a file and verifying that you are still able to access its contents via the commit history.
For the following tasks, make sure to navigate to the `Insecure_Deletion` repository in the Terminal. If you run the `git log` command, you will see that there is one commit that contains two files. Once the file containing sensitive data has been deleted and the change committed to the repository, running the `git log` command again shows two hashes within the commit history. Using this information, it's possible to retrieve the contents of the deleted file.
In this next task, you will see how easily data can be leaked via the commit history. This illustrates the importance of not only removing files from a Git repository, but also rewriting the Git history. By doing so, previous commits cannot be accessed and used to view sensitive data that should have already been removed.
In the upcoming step, you will learn how to remove a file and rewrite the repository's commit history to ensure all data is properly removed.
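The leak described above can be reproduced in a few commands. This is a minimal sketch in a throwaway repository with hypothetical file names and contents: the file is deleted and the deletion committed, yet the first commit still serves the full contents.

```shell
set -eu

# Throwaway repository; file names and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Lab User"
git config user.email "lab@example.com"

printf 'public notes\n' > notes.txt
printf 'password=example-only\n' > secrets.txt
git add notes.txt secrets.txt
git commit -q -m "Add project files"
first_hash=$(git rev-parse HEAD)

# Delete the file and commit the deletion.
git rm -q secrets.txt
git commit -q -m "Remove secrets file"

# The working directory no longer contains the file...
ls

# ...but the history now has two commits, and the first one still holds it.
git log --format='%h %s'
leaked=$(git show "$first_hash:secrets.txt")
echo "Recovered from history: $leaked"
```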
Removing a File Using the `git filter-branch` Command
Now for this step, you are going to remove all traces of a file using the `git filter-branch` command.
`git filter-branch` is a powerful tool in Git used for rewriting history by applying custom filters to the commits in a repository. It allows you to modify the repository's commit history in various ways, such as removing sensitive data, splitting or merging directories, or altering commit messages. With `git filter-branch`, you can apply different filters to each commit, including modifying the commit message, changing file contents, or entirely removing commits from the history.
However, it's essential to use `git filter-branch` with caution, especially in shared repositories, as it rewrites history and can potentially disrupt collaboration if not used carefully.
The `git filter-branch` command with the `--tree-filter` flag checks out each commit in the repository, applies a specified command to it, and records the result as a new commit. The `-f` flag passed to `git filter-branch` itself forces the rewrite to start even if a backup from an earlier run already exists under `refs/original/`.
The `rm -rf` command is used to forcibly remove files and directories. The `-r` flag stands for recursive, meaning it removes directories and their contents recursively, and the `-f` flag stands for force, which suppresses prompts and ignores files that do not exist. The latter matters here because the file is not present in every commit, and a failing filter command would abort the whole rewrite.
git filter-branch -f --tree-filter 'rm -rf <filename>'
For the tasks in this step, navigate to the `Filtered_Repo` repository. Now, you will practice the secure way of removing a file from a Git repository. To verify that the change was made correctly, you can use one of two methods.
- You can run the `ls` command to check the directory itself.
- You can use the `git show` method to obtain the new hash and then use the `git show --name-only <hash value>` command to list all of the files in the current repository.
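The removal and both verification methods can be rehearsed safely outside the lab. The sketch below is a minimal example in a throwaway repository; the file names and contents are hypothetical, and `FILTER_BRANCH_SQUELCH_WARNING` is set only to silence the advisory banner that newer Git versions print before running `git filter-branch`.

```shell
set -eu

# Throwaway repository; file names and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Lab User"
git config user.email "lab@example.com"

printf 'public notes\n' > notes.txt
printf 'password=example-only\n' > secrets.txt
git add notes.txt secrets.txt
git commit -q -m "Add project files"

printf 'more notes\n' >> notes.txt
git commit -q -am "Update notes"

# Silence the advisory banner newer Git versions print for filter-branch.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Rewrite every commit reachable from HEAD, deleting secrets.txt in each one.
git filter-branch -f --tree-filter 'rm -rf secrets.txt' > /dev/null

# Check 1: the file is gone from the working directory.
ls

# Check 2: no commit in the rewritten history touches the file.
remaining=$(git log --format=%H -- secrets.txt)
echo "Commits still touching secrets.txt: ${remaining:-none}"
```

One caveat worth knowing: `git filter-branch` keeps the pre-rewrite commits reachable under `refs/original/`, so the old objects remain in the local repository until those refs and the reflog are cleared and Git garbage-collects them.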
Splitting a Subdirectory into a New Repo
For this final step, you will split a subdirectory out into its own repository. This is a common use case when a repository begins to grow so large that it isn't practical to have it all in one place. In these situations, it's helpful to know how to automate the process of splitting a subdirectory into its own repository.
For this task, navigate to the `Split_Repo` repository. You will utilize an open-source script called `git-filter-repo`, which has been added to the folder for easy access.
`git-filter-repo` is a tool that provides advanced filtering capabilities for Git repositories. It allows users to rewrite history, remove sensitive data, split repositories, and perform various other repository manipulations with ease.
You can use this script to turn the subdirectory, `Logs`, into its own repository. To do so, you will utilize the following command:
python3 git-filter-repo --subdirectory-filter <subdirectory> --force
The `--subdirectory-filter` option is a feature provided by the `git-filter-repo` tool. It filters the repository history to include only the commits and files related to a specific subdirectory and treats that subdirectory as the new root of the repository. This means that after applying this filter, only the history and files within the specified subdirectory will remain in the repository.
The `--force` flag tells `git-filter-repo` to rewrite history even when the repository does not look like a fresh clone, a safety check the tool otherwise enforces because the rewrite cannot be undone.
To see the outcome, run the `ls` command or use the `git show` method you used in the previous step. You will be able to see that the repository now only contains the contents of the `Logs` folder, which is the `access.log` file.
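The effect of a subdirectory split can be previewed without the `git-filter-repo` script. The sketch below deliberately swaps in Git's built-in `git filter-branch --subdirectory-filter`, which performs a comparable rewrite, so that it runs with a stock Git installation; it is a stand-in for illustration, not the command used in this lab. The repository layout and file contents are hypothetical.

```shell
set -eu

# Throwaway repository; layout and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Lab User"
git config user.email "lab@example.com"

mkdir Logs
printf 'GET /index.html 200\n' > Logs/access.log
printf 'project readme\n' > README.md
git add README.md Logs/access.log
git commit -q -m "Add readme and access log"

printf 'GET /about.html 200\n' >> Logs/access.log
git commit -q -am "Append to access log"

# Silence the advisory banner newer Git versions print for filter-branch.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Stand-in for git-filter-repo: keep only the history of Logs/ and make it the root.
git filter-branch -f --subdirectory-filter Logs > /dev/null

# The repository now tracks only what used to live inside Logs/.
tracked=$(git ls-files)
echo "$tracked"
```

After the rewrite, `access.log` sits at the top level of the repository and `README.md` is no longer tracked, which mirrors the outcome described for the `Logs` folder above.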
Risks and Benefits of Rewriting Git History
Congratulations on completing this lab! You have covered how to rewrite Git history to remove sensitive information from a Git repository. Before you use your skills in the real world, it's important for you to weigh the risks and benefits.
Here are some of the common risks of rewriting Git history:
- Loss of Data: If changes are made to Git history without proper backups or without proper authorization, important files and data can be lost forever.
- Loss of Accountability: Changing the commit history can obscure who made which changes in the past, which can lead to a loss of accountability among the development team.
- Wasted Time: If the rewritten history is not communicated to other developers and shared promptly, it could result in developers working on outdated repositories and files, leading to wasted time and effort.
Here are some of the common benefits:
- Removing Sensitive Information: It's common for developers to accidentally commit files that contain sensitive information to a repository. In this case, from a security point of view, it's important that developers understand how to remove these files.
- Better Organization: You can rewrite history to clean up and improve the quality of your commit history. You can also remove unnecessary files and consolidate all of your files under one central repository.
- Remove Mistakes: If a developer added a file by accident, you can easily remove it from your Git history.
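The data-loss risk listed above is commonly reduced by taking a full copy of the repository before any rewrite. The sketch below shows one possible approach using `git clone --mirror`; the backup location, file names, and contents are hypothetical, and this is an illustration rather than a procedure prescribed by the lab.

```shell
set -eu

# Throwaway repository and backup location; names and contents are hypothetical.
repo=$(mktemp -d)
backup=$(mktemp -d)/backup.git
cd "$repo"
git init -q
git config user.name "Lab User"
git config user.email "lab@example.com"

printf 'public notes\n' > notes.txt
printf 'password=example-only\n' > secrets.txt
git add notes.txt secrets.txt
git commit -q -m "Add project files"
before=$(git rev-parse HEAD)

# Full copy of every ref and object, kept outside the working repository.
git clone -q --mirror "$repo" "$backup"

# Rewrite history in the working repository only.
export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch -f --tree-filter 'rm -rf secrets.txt' > /dev/null
after=$(git rev-parse HEAD)

# The backup still points at the original, unrewritten commit.
backup_head=$(git --git-dir="$backup" rev-parse HEAD)
echo "before:  $before"
echo "after:   $after"
echo "backup:  $backup_head"
```

Because the backup still contains the sensitive file, it would need to be stored securely and deleted once the rewritten repository has been verified.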