Author avatar

Jacek Trociński

Using ON Versus WHERE Clauses to Combine and Filter Data in PostgreSQL Joins

Jacek Trociński

  • May 19, 2020
  • 10 Min read
  • 795 Views
  • May 19, 2020
  • 10 Min read
  • 795 Views
Data
Data Storage
Relational Databases
PostgreSQL

Introduction

In an SQL query, data can be filtered in the WHERE clause or the ON clause of a join. This guide will examine the difference between the two in PostgreSQL.

Setup

In order to examine the difference between placing a filter condition in the ON clause and the WHERE clause, two tables have to be created that can be joined together. Otherwise the ON clause cannot be used.

The following code will create and populate two tables, one with employee data and another with department data, that can be joined together based on department number:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
DROP TABLE IF EXISTS emp;
DROP TABLE IF EXISTS dept;

CREATE TABLE dept(
	deptno     INT,  
	dname      VARCHAR(14),  
	loc        VARCHAR(13),  
	CONSTRAINT pk_dept PRIMARY KEY(deptno)  
);

INSERT INTO dept
	(deptno, dname, loc)
VALUES
	(10, 'ACCOUNTING', 'NEW YORK'),
	(20, 'RESEARCH',   'DALLAS'),
	(30, 'SALES',      'CHICAGO'),
	(40, 'OPERATIONS', 'BOSTON')
;

CREATE TABLE emp(
	empno    INT,  
	ename    VARCHAR(10),  
	job      VARCHAR(9),  
	mgr      INT,  
	hiredate DATE,  
	sal      NUMERIC(7,2),  
	comm     NUMERIC(7,2),
	deptno   INT,
	CONSTRAINT pk_emp PRIMARY KEY(empno),  
	CONSTRAINT fk_deptno FOREIGN KEY(deptno) REFERENCES dept(deptno)
);

ALTER TABLE emp ADD CONSTRAINT fk_mgr FOREIGN KEY(mgr) REFERENCES emp(empno);

INSERT INTO emp
	(empno, ename, job, mgr, hiredate, sal, comm, deptno)
VALUES
	(7839, 'KING',  'PRESIDENT', NULL, '1981-11-17', 5000, NULL, 10),
	(7698, 'BLAKE', 'MANAGER',   7839, '1981-05-01', 2850, NULL, 30),
	(7782, 'CLARK', 'MANAGER',   7839, '1981-06-09', 2450, NULL, 10),
	(7566, 'JONES', 'MANAGER',   7839, '1981-04-02', 2975, NULL, 20),
	(7788, 'SCOTT', 'ANALYST',   7566, '1987-07-13', 3000, NULL, 20),
	(7902, 'FORD',  'ANALYST',   7566, '1981-12-03', 3000, NULL, 20)
;

COMMIT;
sql

Table emp:

empnoenamejobmgrhiredatesalcommdeptno
7839KINGPRESIDENT1981-11-175000.0010
7698BLAKEMANAGER78391981-05-012850.0030
7782CLARKMANAGER78391981-06-092450.0010
7566JONESMANAGER78391981-04-022975.0020
7788SCOTTANALYST75660001-07-133000.0020
7902FORDANALYST75661981-12-033000.0020

Table dept:

deptnodnameloc
10ACCOUNTINGNEW YORK
20RESEARCHDALLAS
30SALESCHICAGO
40OPERATIONSBOSTON

Using ON Versus WHERE Clauses in an Inner Join

In an inner join, adding a filter condition in the ON clause has the same affect on a query result.

Suppose there is a business requirement to return employee and department data only when there is a matching department number in both tables, and that the result should only contain rows where the department name is ACCOUNTING. The user wants to display all employee attributes along with department name and location.

The following three queries could be written to satisfy that requirement:

1
2
3
4
5
6
SELECT emp.*, dept.dname, dept.loc
FROM emp
INNER JOIN dept
	ON emp.deptno = dept.deptno
	AND dept.dname = 'ACCOUNTING'
;
sql
1
2
3
4
5
6
7
SELECT emp.*, dept.dname, dept.loc
FROM emp
INNER JOIN dept
	ON emp.deptno = dept.deptno
WHERE
	dept.dname = 'ACCOUNTING'
;
sql
1
2
3
4
5
6
SELECT emp.*, dept.dname, dept.loc
FROM emp, dept
WHERE
	emp.deptno = dept.deptno
	AND dept.dname = 'ACCOUNTING'
;
sql
empnoenamejobmgrhiredatesalcommdeptnodnameloc
7839KINGPRESIDENT1981-11-175000.0010ACCOUNTINGNEW YORK
7782CLARKMANAGER78391981-06-092450.0010ACCOUNTINGNEW YORK

All three queries return the exact same result regardless of whether the filter and join condition are placed in the ON clause or the WHERE clause.

As far as performance goes, it makes no difference whether the filter condition is placed in the ON clause or the WHERE in PostgreSQL.

In terms of readability though, especially in complex queries that have multiple joins, it is easier to spot join conditions when they are placed in the ON clause and filter conditions when they are placed in the WHERE clause. In the example queries above, the second query is written this way.

Using ON Versus WHERE Clauses in a Left Join

In a left join, placing a filter condition in the ON clause will affect a query result differently.

Suppose there is a business requirement to return all employees from the emp table and look up department information only for employees that are in the ACCOUNTING department. The user wants to display all employee attributes along with department name and location.

This requirement could be satisfied by placing a filter condition in the ON clause so that the dept table is filtered on dept.dname = 'ACCOUNTING' before being left joined to emp:

1
2
3
4
5
6
SELECT emp.*, dept.dname, dept.loc
FROM emp
LEFT JOIN dept
	ON emp.deptno = dept.deptno
	AND dept.dname = 'ACCOUNTING'
;
sql
empnoenamejobmgrhiredatesalcommdeptnodnameloc
7839KINGPRESIDENT1981-11-175000.0010ACCOUNTINGNEW YORK
7698BLAKEMANAGER78391981-05-012850.0030
7782CLARKMANAGER78391981-06-092450.0010ACCOUNTINGNEW YORK
7566JONESMANAGER78391981-04-022975.0020
7788SCOTTANALYST75660001-07-133000.0020
7902FORDANALYST75661981-12-033000.0020

NULL entries are placed in dname and loc wherever a match could not be made to the dept table based on department number.

Suppose now that there is a business requirement to look up department information for all employees and then based on that lookup to return only rows were the employee is in the ACCOUNTING department. The user wants the same attributes displayed as before.

This time, to satisfy the requirement the filter condition in the query should come after the join since you'd want to first look up department information and then filter based on the results of that lookup:

1
2
3
4
5
6
SELECT emp.*, dept.dname, dept.loc
FROM emp
LEFT JOIN dept
	ON emp.deptno = dept.deptno
WHERE dept.dname = 'ACCOUNTING'
;
sql
empnoenamejobmgrhiredatesalcommdeptnodnameloc
7839KINGPRESIDENT1981-11-175000.0010ACCOUNTINGNEW YORK
7782CLARKMANAGER78391981-06-092450.0010ACCOUNTINGNEW YORK

This result should look familiar. The inner join queries from the previous section returned the same result. In fact, this query is logically equivalent to doing an inner join. There first has to be a match on department number between emp and dept tables to be able to filter on dname = 'ACCOUNTING'.

It should be apparent now that the two seemingly similar left join queries have two completely different results. In the first example a filter condition was placed in the ON clause to filter the lookup table dept before the join. In the second example a filter condition was placed in the WHERE clause to filter rows after the join based on the lookup table column dname. This was effectively an inner join in disguise.

In a left join, understanding business requirements is key when deciding whether to place a filter condition in the ON clause or the WHERE clause.

Conclusion

In SQL, using the ON clause versus the WHERE clause to filter data in an outer join will cause the filter to be applied either before or after joining occurs and will lead to different results. In an inner join, whether a filter condition is placed in the ON clause or the WHERE clause does not impact a query result, although for readability placing join conditions in the ON clause and filter conditions in the WHERE is a good idea.

10