Pluralsight Logo
Author avatar

Ben Sullins

Author badge Author

How to analyze employee data using SQL

Ben Sullins

Author BadgeAuthor
  • Jun 11, 2016
  • 4 Min read
  • 376 Views
  • Jun 11, 2016
  • 4 Min read
  • 376 Views
SQL

Introduction

When it comes to how to analyze employee data, there are fewer more complex data structures to work with. Probably the biggest challenge with working with an employee data set is how unassuming it may seem to start. Let's take a typical employee table from a Human Resources Information System (HRIS) and see how we can run some common analytical queries against it.

Here is a table structure for how our data is stored. This is typical of many HRIS systems I've seen in my days so we'll stick with this as an example.

Our Table

Table Name: emps

EmployeeIDHireDateRehireDateTermDateManagerIDName
12004-06-171900-01-013000-01-01-1Washington
22004-07-011900-01-013000-01-011Adams
32008-05-011900-01-012008-07-012Rosevelt
42007-01-012009-01-012008-01-013Carter

Queries

How many people were active on a given day (ex. 2005-07-01)?

If we were answering these questions independently it the following SQL would suffice:

1
2
3
4
select count(1)
from emps
where HireDate <= '2005-07-01'
and TermDate > '2005-07-01'

Rinse and repeat this for every date. But what about employees that have been rehired?

First based on this data, we must assume that when a person is rehired, their TermDate remains the same and does not reset to 3000-01-01 as if they were active.

1
2
3
4
5
6
7
8
9
10
select count(1)
from emps
where (HireDate >= '2005-07-01'
and TermDate < '2005-07-01')
or (RehireDate >= '2005-07-01'
and (
    TermDate < '2005-07-01'
  or
    TermDate <= RehireDate
)

The plot thickens

This is all easy enough, but what if we wanted to create a view that has the running total of active employees for every day so we could trend it over time and ask more complex questions like Year over Year (YoY) growth?

Now we need to include another table which has nothing but dates. In Data Warehousing we typically create this as a Date dimension. Here is a basic structure for us to work with.

Table Name: dates

DateDayMonthYear
2005-01-01152005
2005-01-02252005
etc...

We'll use this as our starting point since it won't be constrained by the employee status at the time or dates in our emp table. From there we'll do a left join to our emp table to get the count of active employees similar to how we did it manually above. Here's the SQL to do this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
select
DayDate, count(hires.EmployeeId) as ActiveCount
from
    dates d
-- get hires
left join
 (select 'active' as Status, * from emps) hires
on
 ( d.Date >= hires.HireDate and d.Date < hires.TermDate) -- normal active
or
  (d.Date
      between (case when hires.RehireDate = '1900-01-01' then null else hires.RehireDate end)
      and hires.TermDate)
where
  d.Date between '2004-06-17' and now() -- use the function to pull the current date here

Alas, we have trend data!

The results of our query above will give us a daily count of Active employees to trend over time. This is just the first step in analyzing employee data of course, but a solid foundation to start and a good example of why having a date dimension in your data warehouse is important.

0