- Lab
-
Libraries: If you want this lab, consider one of these libraries.
- AI
ChatGPT for Querying Data
In this lab, you’ll practice integrating AI into a data analyst workflow. When you’re finished, you’ll have a model that generates SQL queries.
Lab Info
Table of Contents
-
Challenge
Set up the GPT API
- Before delving into the lab you will need to set up the GPT API. At the top of the lab, copy the API Key.
- Once you have copied the key, you will open the
ChatGPT_for_Querying_Datanotebook.- You will be working in this notebook for the remainder of the lab.
- In its second cell replace
<TODO_ENTER_API_KEY_HERE>with your API key. (Make sure you keep the API key a string.) - Run all the cells in the corresponding section. (Run them in order, up to and including the
ask_ai()call.)
The system prompt is already set up and specified for SQLite, which is what will work with your mock data. You could easily swap out SQLite for any other type of database language and the principles of the lab would stay the same.
Note:
- You at any time can see the
ChatGPT_for_Querying_Data_Solutionsnotebook for one set of solutions and their results. - The prompts will produce slightly different queries, even with the exact same prompt. In testing, the queries were always correct, but could have benign differences like different alias names, or extra or fewer non-critical columns.
-
Challenge
Create SQL data
Run the cells in the corresponding section to create the data.
The synthetic data created will be example data containing
customers,products, andorders. This is a very basic schema of tables containing information related to customer sales. Table displays are provided for readability of these datasets, but when prompting the model you will directly provide a prompting schema. -
Challenge
## Create the prompt for the database schema
Run the cells in the corresponding section to create the prompt for the database schema, and to create a simple helper function.
The prompt is very similar to a basic schema with a few added features:
- It has additional text describing what to expect in each table.
- Also, it contains a
Relationshipssection which provides information on how tables should be joined. - A the end, it has some
logic-based formulas to ensure you can keep your prompts high-level and consistent.
You could enhance this schema by providing a couple examples of data-formating to enhance the models accuracy when generating queries.
Finally there is a helper function called
runto make your sql read a bit more naturally when running, instead of using the pandas call. Bonus: Lots of information was defined that will then feed into your prompts. This allows the AI to make better queries, roughly mimicking a RAG (Retrieval Augmented Generation). You will be augmenting the generation to meet your database schema by providing the structure of the database in each prompt. Ideally, you could put this into your system prompt, allowing for dynamic, easy query updates when using a specific AI for creating SQL prompts. -
Challenge
## Create and run GPT-assisted queries
Grab five orders
Do the following in the corresponding sub-section in the notebook (Grab five orders in this case)
- Add the following in as the argument to
ask_ai:f"can you give me a query that only returns 5 orders from this database schema {Database_schema}"- Feel free to experiment a bit with other prompts!
- Copy the resulting output.
- First extract just the SQL query from the output.
- You'll then need to also format it, replacing any
\nwith actual newlines. (For this one, after extraction there very likley won't be any\ncharacters, but there will be in subsequent queries.) - As mentioned in Step 1, you can view the solution notebook if needed.
- Paste the formatted output into the
runcall.- Paste it within the
""""s.
- Paste it within the
Note: You will perform these types of actions for each section here!
This query is a good start, as it's simple, returning five orders from the
ordertable. There are a couple key things to keep in mind:- Make sure you limit the scope of your request. The word
onlyhelps ensure this. - The
{database_schema}bit, which in this case is the entire prompt string you defined earlier, ensures the model understands the database structure.
Prompting an AI model is an iterative process, so if you run into any issues throughout these queries, make sure to revise your input to the model.
- Output will vary a bit, but during testing, they were always correct, as per the note at the end of the first Step in this lab.
Group products, calculate revenue, and order descending
(Do the following in the corresponding sub-section in the notebook.)
-
Perform actions similar to what was done in the first section.
Determine what argument to add to
ask_aito perform the following query:- List the
revenue. This is calculated byquantity * price. - The query should group by
product_name. - Order by
revenuefrom highest to lowest. - Tip: But, remember you defined
revenuealready in your database scheme, along with joins (viaRelationships). So, your prompt doesn't need to specify such granular things.
- List the
Note: Keep in mind the following for this and the remaining sections:
- Include the
{Database_schema}so the model understands your database structure. - As mentioned earlier, you can view the solution notebook if needed.
Group by category and calculate profits
-
Perform actions similar to what was done earlier.
Determine what argument to add to
ask_aito perform the following query:- Prompt the model for the profits by
category. - For each
category, the query should return total profit. - Tip: LLM's or large language models such as GPT work best when explicitly talked to like a person. Try phrasing your needs for prompts like you're asking a junior data analyst.
- Prompt the model for the profits by
Feel free to experiment by adding
quantity,price, orcost.
Calculate revenue by month
-
Perform actions similar to what was done earlier.
Determine what argument to add to
ask_aito perform the following query:- Prompt the AI for a basic query that grabs
revenuebymonth - Order it by
month.
- Prompt the AI for a basic query that grabs
--- ### Group by customer and calculate revenue
-
Perform actions similar to what was done earlier.
Determine what argument to add to
ask_aito perform the following query:- Prompt the model for
revenuebycustomer - Order it from highest to lowest.
- Prompt the model for
This query will require all tables to be joined together, but the prompting doesn't need to specify this! The model already knows the relationships between all the tables from your
Database_schema.
- Add the following in as the argument to
About the author
Real skill practice before real-world application
Hands-on Labs are real environments created by industry experts to help you learn. These environments help you gain knowledge and experience, practice without compromising your system, test without risk, destroy without fear, and let you learn from your mistakes. Hands-on Labs: practice your skills before delivering in the real world.
Learn by doing
Engage hands-on with the tools and technologies you’re learning. You pick the skill, we provide the credentials and environment.
Follow your guide
All labs have detailed instructions and objectives, guiding you through the learning process and ensuring you understand every step.
Turn time into mastery
On average, you retain 75% more of your learning if you take time to practice. Hands-on labs set you up for success to make those skills stick.