Since the early 1970s, database systems have allowed us to store and manipulate information using a simple language known as SQL, or structured query language. Although the widespread use of complex web applications and the Internet of Things have increased the popularity of NoSQL, relational databases remain a valuable asset for companies and organizations of all sizes.
Although we will explain how to query information stored in a MySQL relational database, the examples below can be performed on other systems with minor (if any) modifications. If you would like to use the same database, feel free to utilize this script to create and populate it.
In this guide we will use a database called library whose entity-relation diagram is detailed in Fig. 1, and where each table stores the following:
authors, publishers, customers, and librarians: unique identifiers for each record and the corresponding names.
The unique identifiers in the above tables act as primary keys. Particularly, in the books table, we are not using the ISBN for that purpose since we may have multiple copies of the same book and edition.
In its most simple form, DQL consists of a
SELECT statement against an entire single table. Other examples include selecting specific fields from one or more tables, sorting and grouping results, creating column and table aliases, and using filters to narrow down the results of a search. We will explore all these cases (and more) through a series of examples. Let's dive in!
Note: Before you execute the queries below, you need to tell the database engine you intend to use the library database with
USE library;. The semicolon at the end is required to finish the statement in MySQL but optional in other database systems such as Microsoft SQL Server.
To display all the information of the books in the store, you can do:
1SELECT * FROM books;
where the star is a wildcard that represents all the columns in the given table. You can view the result in Fig. 2, where the output has been truncated for brevity:
This approach, when used without restrictions or filters, may impact the query performance negatively. More often than not, it is wiser to specify the list of fields you are interested in.
At this point, retrieving all fields from the books table may not make much sense since we are still not in a position to derive meaning from the author_id and publisher_id columns. Thus, if you wish to see the list of names and ISBNs, you can do:
1SELECT book_name, book_isbn FROM books;
1SELECT DISTINCT book_name, book_isbn FROM books;
While the first query returns all the book records and gives you a rough idea of the number of copies of each book, the second one will allow you to identify all the distinct book name and ISBN combinations, as illustrated in Fig. 3:
As you can see above, there are 10 different titles in the books table.
Aliases are alternate names for tables and columns that come in handy when writing more complex queries or presenting the results, respectively.
To create an alias, we will use the
AS keyword followed by its name. The query below shows an alias called b for the books table. This means that we can simply use b each time we need to refer to books. Also, instead of book_name and book_isbn we want the output to display Title and ISBN as headers instead (see Fig. 4):
1SELECT DISTINCT 2b.book_name AS Title, 3b.book_isbn AS ISBN 4FROM books AS b;
In the above query, it is not strictly necessary to use the table alias as a prefix for the columns since we are using only one table. However, it is good practice to do so since it will be required as we include more tables in our queries.
While the benefits of this technique may not seem obvious at first, they will become evident as the query complexity grows.
There's at least a couple of good reasons why we use integer numbers as foreign keys to reference unique identifiers from other tables.
First, it allows us to avoid duplicate information and gives us more control over our data. From a data integrity perspective, it makes more sense to store the authors' actual names in authors, respectively, than doing the same in books.
Second, storing integers is - generally speaking - less expensive than a full string, in terms of disk usage. According to the MySQL official docs, storing an integer requires only four bytes, as opposed to strings where L + 1 bytes are needed (with L being the length of the string). Thus, referencing authors by their IDs (1, 2, 3, or 4) in books is preferred over duplicating the names that are already available in authors, as you can see in Fig. 5:
With that said, let us leverage the
JOIN keyword to query the books and authors tables simultaneously and retrieve the list of titles with their respective authors, as observed in Fig. 6:
1SELECT DISTINCT 2b.book_name AS Title, 3a.author_name AS Author 4FROM books AS b JOIN authors AS a 5ON b.author_id = a.author_id;
a and b are aliases for authors and books, respectively.
Title and Author are aliases for book_name and author_name, respectively.
JOIN (or its equivalent
INNER JOIN) lets us manipulate a and b using basic set theory. This particular type of
JOIN will return the records where the author_id exists on both tables.
FROMkeyword. This distinction is important when using other types of
RIGHT JOIN). Fig. 7 shows what to expect in these two cases.
LEFT JOIN and
RIGHT JOIN return all records common to both sets, and those only present on the left and right tables, respectively.
It is important to note that you can join as many tables as necessary. For example, the output of the following query (as shown in Fig. 8) will indicate who borrowed which book from each librarian and when:
1SELECT 2c.customer_name AS Customer, 3b.book_name AS Title, 4l.loan_date AS Date, 5lb.librarian_name AS Librarian 6FROM customers c JOIN loans l 7ON c.customer_id = l.customer_id 8JOIN books b ON b.book_id = l.book_id 9JOIN librarians lb ON lb.librarian_id = l.librarian_id;
The above result allows us to identify a customer (James Benson) who has borrowed three copies of the same book (Pilgrim Souls) from the same librarian (Julia Roosevelt) on the same date. This means it might be time for an audit.
By now you should be comfortable using
JOINs and creating aliases. Otherwise, please review the previous examples before you proceed.
Now that we have spotted a customer who has borrowed multiple copies of the same book, wouldn't it be interesting to list all the books that have been loaned to him?
To accomplish that goal, we can append the
WHERE clause to the above query and indicate a condition that we expect the result set to meet. In this case, we are interested in all the records where the customer name is James Benson (see Fig. 9):
1SELECT 2c.customer_name AS Customer, 3b.book_name AS Title, 4l.loan_date AS Date, 5lb.librarian_name AS Librarian 6FROM customers c JOIN loans l 7ON c.customer_id = l.customer_id 8JOIN books b ON b.book_id = l.book_id 9JOIN librarians lb ON lb.librarian_id = l.librarian_id 10WHERE c.customer_name = 'James Benson';
If you are unsure about how the customer name is actually written but are certain it contains the letters son somewhere, you can use
LIKE as follows:
1SELECT 2c.customer_name AS Customer, 3b.book_name AS Title, 4l.loan_date AS Date, 5lb.librarian_name AS Librarian 6FROM customers c JOIN loans l 7ON c.customer_id = l.customer_id 8JOIN books b ON b.book_id = l.book_id 9JOIN librarians lb ON lb.librarian_id = l.librarian_id 10WHERE c.customer_name LIKE '%son%';
LIKE clause, when followed by the text enclosed within percent signs and single quotes, will return the records where the customer name contains son - regardless of its location in the string. If you want to restrict the results to the records where son is found exactly at the beginning or the end of the string, use
LIKE followed by
You can add as many conditions and as much logic to the
WHERE clause as needed. Suppose you want to list all copies of second editions by author Cay S. Horstmann:
1SELECT 2b.book_name AS Title, 3a.author_name AS Author 4FROM books AS b JOIN authors AS a 5ON b.author_id = a.author_id 6WHERE b.book_edition = 2 7AND a.author_name = 'Cay S. Horstmann';
As you can see, it is possible to build on top of previous conditions by using boolean algebra. In this example, we use
AND to filter the results to second editions of the given author only.
Another query will help us illustrate this concept further:
1SELECT 2b.book_name AS Title, 3c.customer_name AS Customer, 4c.customer_address AS Address, 5l.loan_date AS Date 6FROM books AS b JOIN loans AS l 7ON b.book_id = l.book_id 8JOIN customers AS c ON c.customer_id = l.customer_id 9JOIN authors AS a ON b.author_id = a.author_id 10WHERE b.book_edition = 1 11AND (a.author_name = 'Cay S. Horstmann' 12OR a.author_name = 'Barbara Abercrombie');
As observed in Fig. 10, the above query returns the list of customers who have borrowed first edition books of authors Cay S. Horstmann or Barbara Abercrombie. Note that the parentheses surrounding the author names are necessary to preserve the operator precedence.
The previous query can be improved by sorting the results in ascending (default) or descending order on a given field very easily. To sort by descending loan date, all we need to do is append the
ORDER BY clause as follows:
1SELECT 2b.book_name AS Title, 3c.customer_name AS Customer, 4c.customer_address AS Address, 5l.loan_date AS Date 6FROM books AS b JOIN loans AS l 7ON b.book_id = l.book_id 8JOIN customers AS c ON c.customer_id = l.customer_id 9JOIN authors AS a ON b.author_id = a.author_id 10WHERE b.book_edition = 1 11AND (a.author_name = 'Cay S. Horstmann' 12OR a.author_name = 'Barbara Abercrombie') 13ORDER BY l.loan_date DESC;
To sort in ascending order, remove the
DESC keyword. Alternatively, you can add further sorting criteria using a comma-separated list of field names. For example,
1SELECT 2b.book_name AS Title, 3c.customer_name AS Customer, 4c.customer_address AS Address, 5l.loan_date AS Date 6FROM books AS b JOIN loans AS l 7ON b.book_id = l.book_id 8JOIN customers AS c ON c.customer_id = l.customer_id 9JOIN authors AS a ON b.author_id = a.author_id 10WHERE b.book_edition = 1 11AND (a.author_name = 'Cay S. Horstmann' 12OR a.author_name = 'Barbara Abercrombie') 13ORDER BY l.loan_date DESC, b.book_name DESC;
will produce essentially the same result as before but it will sort first by loan date and then by book names - both in descending order.
For reporting purposes, it may be necessary to count how many books have been borrowed by each customer. To achieve that objective we can use the
COUNT aggregate function and the
GROUP BY clause. When used in conjunction, they allow us to group by a given field and count the number of records in each group.
1SELECT COUNT(l.customer_id) AS Loans, 2c.customer_name AS Customer 3FROM loans AS l JOIN customers AS c 4ON l.customer_id = c.customer_id 5GROUP BY l.customer_id 6ORDER BY COUNT(l.customer_id) DESC;
Other aggregate functions included in the ANSI SQL standard are
SUM, often used with numeric fields.
As a rule of thumb, you can group by one (or more) of the fields to which an aggregate function has been applied.
In a previous example, we explained the use of
WHERE to filter the results of a query. Unfortunately, it cannot be used on an aggregated field. Instead, you can narrow down the search using the
HAVING clause followed by the aggregated field and the desired condition:
1SELECT COUNT(l.customer_id) AS Loans, 2c.customer_name AS Customer 3FROM loans AS l JOIN customers AS c 4ON l.customer_id = c.customer_id 5GROUP BY l.customer_id 6HAVING COUNT(l.customer_id) > 2 7ORDER BY COUNT(l.customer_id) DESC;
You can think of
HAVINGas the equivalent of
WHEREfor use on aggregate columns.
Note that the above query is almost the same as before, except we have plugged in the line
1HAVING COUNT(l.customer_id) > 2
after grouping by customer ID, to only return those clients who have borrowed more than two books. Strictly speaking, we are talking about the customers whose ID appears more than twice in the loans table.
In this guide we have used several examples to illustrate how to retrieve information from a relational database. The data query language skills we have discussed are of paramount importance to anyone working with data due to its potential to drive decision making.