In an SQL query, data can be filtered in the WHERE
clause or the ON
clause of a join. This guide will examine the difference between the two in PostgreSQL.
In order to examine the difference between placing a filter condition in the ON
clause and the WHERE
clause, two tables have to be created that can be joined together. Otherwise the ON
clause cannot be used.
The following code will create and populate two tables, one with employee data and another with department data, that can be joined together based on department number:
1DROP TABLE IF EXISTS emp;
2DROP TABLE IF EXISTS dept;
3
4CREATE TABLE dept(
5 deptno INT,
6 dname VARCHAR(14),
7 loc VARCHAR(13),
8 CONSTRAINT pk_dept PRIMARY KEY(deptno)
9);
10
11INSERT INTO dept
12 (deptno, dname, loc)
13VALUES
14 (10, 'ACCOUNTING', 'NEW YORK'),
15 (20, 'RESEARCH', 'DALLAS'),
16 (30, 'SALES', 'CHICAGO'),
17 (40, 'OPERATIONS', 'BOSTON')
18;
19
20CREATE TABLE emp(
21 empno INT,
22 ename VARCHAR(10),
23 job VARCHAR(9),
24 mgr INT,
25 hiredate DATE,
26 sal NUMERIC(7,2),
27 comm NUMERIC(7,2),
28 deptno INT,
29 CONSTRAINT pk_emp PRIMARY KEY(empno),
30 CONSTRAINT fk_deptno FOREIGN KEY(deptno) REFERENCES dept(deptno)
31);
32
33ALTER TABLE emp ADD CONSTRAINT fk_mgr FOREIGN KEY(mgr) REFERENCES emp(empno);
34
35INSERT INTO emp
36 (empno, ename, job, mgr, hiredate, sal, comm, deptno)
37VALUES
38 (7839, 'KING', 'PRESIDENT', NULL, '1981-11-17', 5000, NULL, 10),
39 (7698, 'BLAKE', 'MANAGER', 7839, '1981-05-01', 2850, NULL, 30),
40 (7782, 'CLARK', 'MANAGER', 7839, '1981-06-09', 2450, NULL, 10),
41 (7566, 'JONES', 'MANAGER', 7839, '1981-04-02', 2975, NULL, 20),
42 (7788, 'SCOTT', 'ANALYST', 7566, '1987-07-13', 3000, NULL, 20),
43 (7902, 'FORD', 'ANALYST', 7566, '1981-12-03', 3000, NULL, 20)
44;
45
46COMMIT;
Table emp
:
empno | ename | job | mgr | hiredate | sal | comm | deptno |
---|---|---|---|---|---|---|---|
7839 | KING | PRESIDENT | 1981-11-17 | 5000.00 | 10 | ||
7698 | BLAKE | MANAGER | 7839 | 1981-05-01 | 2850.00 | 30 | |
7782 | CLARK | MANAGER | 7839 | 1981-06-09 | 2450.00 | 10 | |
7566 | JONES | MANAGER | 7839 | 1981-04-02 | 2975.00 | 20 | |
7788 | SCOTT | ANALYST | 7566 | 0001-07-13 | 3000.00 | 20 | |
7902 | FORD | ANALYST | 7566 | 1981-12-03 | 3000.00 | 20 |
Table dept
:
deptno | dname | loc |
---|---|---|
10 | ACCOUNTING | NEW YORK |
20 | RESEARCH | DALLAS |
30 | SALES | CHICAGO |
40 | OPERATIONS | BOSTON |
In an inner join, adding a filter condition in the ON
clause has the same affect on a query result.
Suppose there is a business requirement to return employee and department data only when there is a matching department number in both tables, and that the result should only contain rows where the department name is ACCOUNTING
. The user wants to display all employee attributes along with department name and location.
The following three queries could be written to satisfy that requirement:
1SELECT emp.*, dept.dname, dept.loc
2FROM emp
3INNER JOIN dept
4 ON emp.deptno = dept.deptno
5 AND dept.dname = 'ACCOUNTING'
6;
1SELECT emp.*, dept.dname, dept.loc
2FROM emp
3INNER JOIN dept
4 ON emp.deptno = dept.deptno
5WHERE
6 dept.dname = 'ACCOUNTING'
7;
1SELECT emp.*, dept.dname, dept.loc
2FROM emp, dept
3WHERE
4 emp.deptno = dept.deptno
5 AND dept.dname = 'ACCOUNTING'
6;
empno | ename | job | mgr | hiredate | sal | comm | deptno | dname | loc |
---|---|---|---|---|---|---|---|---|---|
7839 | KING | PRESIDENT | 1981-11-17 | 5000.00 | 10 | ACCOUNTING | NEW YORK | ||
7782 | CLARK | MANAGER | 7839 | 1981-06-09 | 2450.00 | 10 | ACCOUNTING | NEW YORK |
All three queries return the exact same result regardless of whether the filter and join condition are placed in the ON
clause or the WHERE
clause.
As far as performance goes, it makes no difference whether the filter condition is placed in the ON
clause or the WHERE
in PostgreSQL.
In terms of readability though, especially in complex queries that have multiple joins, it is easier to spot join conditions when they are placed in the ON
clause and filter conditions when they are placed in the WHERE
clause. In the example queries above, the second query is written this way.
In a left join, placing a filter condition in the ON
clause will affect a query result differently.
Suppose there is a business requirement to return all employees from the emp
table and look up department information only for employees that are in the ACCOUNTING
department. The user wants to display all employee attributes along with department name and location.
This requirement could be satisfied by placing a filter condition in the ON
clause so that the dept
table is filtered on dept.dname = 'ACCOUNTING'
before being left joined to emp
:
1SELECT emp.*, dept.dname, dept.loc
2FROM emp
3LEFT JOIN dept
4 ON emp.deptno = dept.deptno
5 AND dept.dname = 'ACCOUNTING'
6;
empno | ename | job | mgr | hiredate | sal | comm | deptno | dname | loc |
---|---|---|---|---|---|---|---|---|---|
7839 | KING | PRESIDENT | 1981-11-17 | 5000.00 | 10 | ACCOUNTING | NEW YORK | ||
7698 | BLAKE | MANAGER | 7839 | 1981-05-01 | 2850.00 | 30 | |||
7782 | CLARK | MANAGER | 7839 | 1981-06-09 | 2450.00 | 10 | ACCOUNTING | NEW YORK | |
7566 | JONES | MANAGER | 7839 | 1981-04-02 | 2975.00 | 20 | |||
7788 | SCOTT | ANALYST | 7566 | 0001-07-13 | 3000.00 | 20 | |||
7902 | FORD | ANALYST | 7566 | 1981-12-03 | 3000.00 | 20 |
NULL
entries are placed in dname
and loc
wherever a match could not be made to the dept
table based on department number.
Suppose now that there is a business requirement to look up department information for all employees and then based on that lookup to return only rows were the employee is in the ACCOUNTING
department. The user wants the same attributes displayed as before.
This time, to satisfy the requirement the filter condition in the query should come after the join since you'd want to first look up department information and then filter based on the results of that lookup:
1SELECT emp.*, dept.dname, dept.loc
2FROM emp
3LEFT JOIN dept
4 ON emp.deptno = dept.deptno
5WHERE dept.dname = 'ACCOUNTING'
6;
empno | ename | job | mgr | hiredate | sal | comm | deptno | dname | loc |
---|---|---|---|---|---|---|---|---|---|
7839 | KING | PRESIDENT | 1981-11-17 | 5000.00 | 10 | ACCOUNTING | NEW YORK | ||
7782 | CLARK | MANAGER | 7839 | 1981-06-09 | 2450.00 | 10 | ACCOUNTING | NEW YORK |
This result should look familiar. The inner join queries from the previous section returned the same result. In fact, this query is logically equivalent to doing an inner join. There first has to be a match on department number between emp
and dept
tables to be able to filter on dname = 'ACCOUNTING'
.
It should be apparent now that the two seemingly similar left join queries have two completely different results. In the first example a filter condition was placed in the ON
clause to filter the lookup table dept
before the join. In the second example a filter condition was placed in the WHERE
clause to filter rows after the join based on the lookup table column dname
. This was effectively an inner join in disguise.
In a left join, understanding business requirements is key when deciding whether to place a filter condition in the ON
clause or the WHERE
clause.
In SQL, using the ON
clause versus the WHERE
clause to filter data in an outer join will cause the filter to be applied either before or after joining occurs and will lead to different results. In an inner join, whether a filter condition is placed in the ON
clause or the WHERE
clause does not impact a query result, although for readability placing join conditions in the ON
clause and filter conditions in the WHERE
is a good idea.