Skip to content

Contact sales

By filling out this form and clicking submit, you acknowledge our privacy policy.

How to Analyze Employee Data Using SQL

In this SQL tutorial, you'll learn how to leverage SQL in analyzing employee data by running common analytic queries against HRIS table.

May 11, 2019 • 3 Minute Read

Introduction

When it comes to how to analyze employee data, there are few more complex data structures to work with. Probably the biggest challenge that comes from working with an employee data set is how unassuming it may seem to start. Let's take a typical employee table from a Human Resources Information System (HRIS) and see how we can run some common analytical queries against it.

Here is a table structure for how our data is stored. This is typical of many HRIS systems.

Our Table

Table Name: emps

EmployeeIDHireDateRehireDateTermDateManagerIDName
12004-06-171900-01-013000-01-01-1Washington
22004-07-011900-01-013000-01-011Adams
32008-05-011900-01-012008-07-012Rosevelt
42007-01-012009-01-012008-01-013Carter

Queries

How Many People Were Active on a Given Day (Ex. 2005-07-01)?

If we were answering these questions independently it the following SQL would suffice:

      select count(1)
from emps
where HireDate <= '2005-07-01'
and TermDate > '2005-07-01'
    

Rinse and repeat this for every date. But what about employees that have been rehired?

First, based on this data, we must assume that when a person is rehired, their TermDate remains the same and does not reset to 3000-01-01 as if they were active.

      select count(1)
from emps
where (HireDate >= '2005-07-01'
and TermDate < '2005-07-01')
or (RehireDate >= '2005-07-01'
and (
    TermDate < '2005-07-01'
  or
    TermDate <= RehireDate
)
    

The Plot Thickens

This is all easy enough, but what if we wanted to create a view that has the running total of active employees for every day so we could trend it over time and ask more complex questions like Year over Year (YoY) growth?

We would need to include another table which has nothing but dates. In Data Warehousing we typically create this as a Date dimension. Here is a basic structure for us to work with.

Table Name: Dates

DateDayMonthYear
2005-01-01152005
2005-01-02252005
etc...

We'll use this as our starting point since it won't be constrained by the employee status at the time or dates in our emp table. From there, we'll do a left join to our emp table to get the count of active employees similar to how we did it manually above. Here's the SQL to do this:

      select
DayDate, count(hires.EmployeeId) as ActiveCount
from
    dates d
-- get hires
left join
 (select 'active' as Status, * from emps) hires
on
 ( d.Date >= hires.HireDate and d.Date < hires.TermDate) -- normal active
or
  (d.Date
      between (case when hires.RehireDate = '1900-01-01' then null else hires.RehireDate end)
      and hires.TermDate)
where
  d.Date between '2004-06-17' and now() -- use the function to pull the current date here
    

Alas, We Have Trend Data!

The results of our query above will give us a daily count of Active employees to trend over time. This is just the first step in analyzing employee data of course, but a solid foundation to start and a good example of why having a date dimension in your data warehouse is important.

Ben Sullins

Ben S.

Data Geek

More about this author