09 01 2014
by Inostix

4 Challenges with Predictive Employee Turnover Analytics – HR Analytics

Introduction

Employee churn is a major problem for many firms these days. Great talent is scarce, hard to keep and in high demand. Given the well-known direct relationship between happy employees and happy customers, it is of utmost importance to understand the drivers of employee dissatisfaction. Predictive analytics can be a core strategic tool here, helping to foster employee engagement and to set up well-targeted employee retention campaigns. In this blog article, Prof. Bart Baesens from KU Leuven University (Belgium) zooms in on some of the challenges that accompany this.

Analytics is a term often used interchangeably with data science, data mining, knowledge discovery, etc. All essentially refer to extracting useful business patterns or mathematical decision models from a preprocessed data set. In the context of employee churn, one typically starts from a historical data set of staff turnover. Table 1 gives an example of this.





Name   Age  Degree  (unlabeled)  Received promotion  Partner changed job  Churn Risk
Bart   38   PhD     1000         No                  No                   Yes
John   60   MSc     2000         Yes                 Yes                  No
Sarah  22   BSc     800          Yes                 No                   No

Table 1, Data set for predicting employee churn

It is important to distinguish between predictor variables and the target variable. The target variable indicates whether a particular employee, described by certain predictor values, has churned or not. Churn can be defined either as leaving the organization or as reducing the appointment (e.g. from full-time to part-time). The predictor variables contain information that could potentially be related to the target variable, in this case employee churn. Examples of such predictor variables are socio-demographic information (age, marital status, …), income-related information, promotion-related information, departmental information (e.g. some departments may have high churn rates), engagement survey information (individual responses), career development information, training data, selection information (e.g. assessment data), etc.
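To make the distinction concrete, here is a minimal Python sketch that stores the three rows of Table 1 as records and separates the predictors from the target. The field names (`age`, `degree`, `received_promotion`, `partner_changed_job`, `churned`) and the helper `split_predictors_target` are illustrative assumptions, not names from any particular HR system. The fourth column of Table 1 (1000, 2000, 800) is omitted here.

```python
# Rows of Table 1 as plain records. Field names are illustrative assumptions.
TARGET = "churned"

employees = [
    {"name": "Bart", "age": 38, "degree": "PhD",
     "received_promotion": False, "partner_changed_job": False, "churned": True},
    {"name": "John", "age": 60, "degree": "MSc",
     "received_promotion": True, "partner_changed_job": True, "churned": False},
    {"name": "Sarah", "age": 22, "degree": "BSc",
     "received_promotion": True, "partner_changed_job": False, "churned": False},
]


def split_predictors_target(record, target=TARGET, id_field="name"):
    """Return (predictors, target_value) for one employee record.

    The identifier is dropped as well: a name says nothing about churn.
    """
    predictors = {k: v for k, v in record.items() if k not in (target, id_field)}
    return predictors, record[target]


X = []  # one dict of predictor values per employee
y = []  # one churn outcome per employee
for record in employees:
    predictors, outcome = split_predictors_target(record)
    X.append(predictors)
    y.append(outcome)
```

Keeping the target in a separate list makes it harder to leak the outcome into the predictors by accident when a model is trained later on.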

Challenge #1: Data Quality

A first key challenge is to collect the right variables and make sure they have the necessary data quality. This is motivated by the GIGO (garbage in, garbage out) principle: bad data can never yield good employee churn models. The purpose of predictive analytics is then to build an analytical model that relates the predictor variables to the target variable. Note that the analytical technique itself determines which predictor variables are related to churn and which are not. Different underlying statistical techniques can be used for this purpose, e.g. linear/logistic regression, decision trees, neural networks, random forests, etc. Figure 1 gives an example of a decision tree to predict employee churn.

Figure 1, Decision tree for predicting employee churn
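How a technique can decide by itself which predictors matter can be sketched with information gain, one common splitting criterion used by ID3-style decision tree learners. The article does not say which algorithm produced the tree in Figure 1, so the sketch below is only an illustration of the general idea. The toy rows are hypothetical and invented for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature, target="churned"):
    """Entropy reduction obtained by splitting `rows` on a categorical feature."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical toy data, invented purely for illustration.
rows = [
    {"received_promotion": "no",  "partner_changed_job": "no",  "churned": "yes"},
    {"received_promotion": "no",  "partner_changed_job": "yes", "churned": "yes"},
    {"received_promotion": "no",  "partner_changed_job": "no",  "churned": "yes"},
    {"received_promotion": "yes", "partner_changed_job": "yes", "churned": "no"},
    {"received_promotion": "yes", "partner_changed_job": "no",  "churned": "no"},
    {"received_promotion": "yes", "partner_changed_job": "yes", "churned": "yes"},
]

features = ["received_promotion", "partner_changed_job"]
gains = {f: information_gain(rows, f) for f in features}

# The predictor with the highest gain would become the first split of the tree.
best_feature = max(gains, key=gains.get)
```

In this toy data set, splitting on `received_promotion` separates churners from stayers better than `partner_changed_job`, so a tree learner using this criterion would pick it as the root split. A predictor with a gain near zero would simply not appear in the tree.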

Building analytical models is a labor-intensive process involving different steps. Figure 2 provides a high-level overview of this process. A first step involves the collection of historical data, whereby both the predictor and target variables are defined. This is a data set of historically observed employee churn behavior from which the analytical models will be learned. It is then split into a training set (a subset of the historical employee churn data) for model development (e.g. building a decision tree) and a test set (another subset of the employee churn data) for independent model performance measurement (validation of the churn model). The resulting analytical model is then applied to current employee data to identify the employees with the highest churn risk.

Figure 2, Building an analytical employee churn model
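The process in Figure 2 can be sketched in a few lines of plain Python: split the history, learn on the training set, measure on the test set, then score current staff. The "one rule" learner below is a deliberately simple stand-in for a real technique such as a decision tree. The data, the employee names and the function names (`train_test_split`, `fit_one_rule`, `accuracy`) are hypothetical and defined here only for illustration.

```python
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the historical rows and hold out a fraction as test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_one_rule(train, feature, target="churned"):
    """For each value of `feature`, predict the majority outcome seen in training."""
    by_value = {}
    for row in train:
        by_value.setdefault(row[feature], Counter())[row[target]] += 1
    # Fallback for feature values never seen in training: overall majority class.
    fallback = Counter(row[target] for row in train).most_common(1)[0][0]
    rule = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
    return lambda row: rule.get(row[feature], fallback)


def accuracy(model, rows, target="churned"):
    """Share of rows for which the prediction equals the observed outcome."""
    return sum(model(row) == row[target] for row in rows) / len(rows)


# Hypothetical history of 20 employees, invented for illustration.
history = (
    [{"received_promotion": False, "churned": True}] * 8
    + [{"received_promotion": False, "churned": False}] * 2
    + [{"received_promotion": True, "churned": True}] * 2
    + [{"received_promotion": True, "churned": False}] * 8
)

train, test = train_test_split(history)             # step 1: split the history
model = fit_one_rule(train, "received_promotion")   # step 2: learn on training set
test_accuracy = accuracy(model, test)               # step 3: validate on test set

# Step 4: apply the model to current employees (names are placeholders).
current = [
    {"name": "Employee A", "received_promotion": False},
    {"name": "Employee B", "received_promotion": True},
]
at_risk = [e["name"] for e in current if model(e)]
```

The important design point is that `test` is never seen by `fit_one_rule`, so `test_accuracy` is an independent estimate of how the model behaves on new data.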

Challenge #2: Satisfying 3 Analytical Performance Criteria

Analytical models for employee churn prediction should satisfy several criteria.  

  • First of all, they should be accurate and make statistical sense by capturing relationships in the data in the best mathematical way possible.
  • Second, given the strategic impact of these models, it is also of key importance that the HR manager can easily understand them in order to fully grasp the drivers of employee dissatisfaction. Complex, black-box mathematical models (e.g. neural networks, random forests) are hence less desirable; simple, easy-to-use and actionable analytical models should be preferred. The decision tree presented in Figure 1 is a clear example of this. It is compact and transparent, giving a clear explanation of why certain employees churn and others don’t.
  • Third, next to statistical performance and comprehensibility, the analytical models should also be operationally efficient. Employees leave the organization on a regular basis, so diagnosing early symptoms of employee dissatisfaction becomes very important. To this end, it should be possible to run the analytical models at least monthly or quarterly.

Challenge #3: Making the Outcomes Actionable

It is also important to note that the whole exercise doesn’t stop once an analytical model has been built. Quite the contrary, that’s when it all starts. The predictions of the analytical models should then be used in employee retention campaigns. A distinction can be made here between high-performing employees, whom the company wants to keep at all times and who should therefore be targeted as effectively as possible, and employees who are not included in any campaign because, for example, they perform below standard and it is actually beneficial to the firm if they leave.
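A minimal sketch of this targeting step, assuming each employee already has a predicted churn risk between 0 and 1 and a performance rating on a 1 to 5 scale. Both scales, the thresholds and the function name `select_retention_targets` are hypothetical choices made for this illustration.

```python
def select_retention_targets(employees, risk_threshold=0.6, min_performance=4):
    """Pick high performers with high predicted churn risk, riskiest first."""
    targets = [
        e for e in employees
        if e["performance"] >= min_performance and e["churn_risk"] >= risk_threshold
    ]
    return sorted(targets, key=lambda e: e["churn_risk"], reverse=True)


# Hypothetical scored employees, invented for illustration.
scored = [
    {"name": "A", "performance": 5, "churn_risk": 0.82},
    {"name": "B", "performance": 2, "churn_risk": 0.91},  # below standard: not targeted
    {"name": "C", "performance": 4, "churn_risk": 0.35},  # low risk: no action needed
    {"name": "D", "performance": 4, "churn_risk": 0.67},
]

targets = select_retention_targets(scored)
target_names = [e["name"] for e in targets]  # ["A", "D"]
```

Sorting by risk lets a campaign with a limited budget start with the employees most likely to leave.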

Challenge #4: Monitoring the Quality of the Predictive Model

Employee churn is a dynamic phenomenon, and the drivers of employee dissatisfaction may change over time. It is also related to the state of the macro-economy: during economic upturns, staff turnover may be higher than during recessions. Hence, employee churn models should be continuously backtested, whereby predictions made ex ante are compared with the outcomes observed ex post to diagnose whether the analytical model still makes sense. This typically involves defining and monitoring a set of key performance indicators (KPIs) that reflect the model’s performance. Action plans also need to be developed that clearly articulate what to do when the employee churn models start to underperform.
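Backtesting can be sketched as a per-period comparison of predictions and observed outcomes. The example below uses accuracy as the single KPI and a fixed threshold as the trigger for an action plan. The period labels, the data and the 0.75 threshold are hypothetical, and in practice other KPIs could be monitored alongside accuracy.

```python
def backtest(periods, min_accuracy=0.75):
    """Compare ex-ante predictions with ex-post outcomes, period by period."""
    report = []
    for label, predicted, observed in periods:
        hits = sum(p == o for p, o in zip(predicted, observed))
        acc = hits / len(observed)
        report.append({
            "period": label,
            "accuracy": acc,
            # Flag the period when the KPI drops below the agreed threshold.
            "needs_action": acc < min_accuracy,
        })
    return report


# Hypothetical data: (period, predicted churn, observed churn) for 8 employees.
periods = [
    ("period 1",
     [True, False, False, True, False, False, True, False],
     [True, False, False, True, False, False, False, False]),
    ("period 2",
     [True, False, False, True, False, False, True, False],
     [False, False, True, True, False, True, False, False]),
]

report = backtest(periods)
flagged = [r["period"] for r in report if r["needs_action"]]
```

Here the first period scores 7 out of 8 correct and passes, while the second scores 4 out of 8 and is flagged, which is the point at which a predefined action plan (for example, retraining the model on more recent data) would kick in.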


To summarize this blog article, I would like to restate its key takeaways as follows:

  • Data is a key ingredient of any analytical employee churn model. It should be relevant for detecting employee churn and be of high quality.
  • Analytical employee churn models should satisfy various performance criteria, such as statistical performance, comprehensibility, and operational efficiency.
  • The analytical models should be integrated with employee retention campaigns.
  • The analytical models should be continuously backtested, and action plans should be available in case the models start to underperform.

About the author

Professor Dr. Bart Baesens holds a master’s degree in Business Engineering (option: Management Informatics) and a PhD in Applied Economic Sciences from KU Leuven University (Belgium). He is currently an associate professor at KU Leuven, and a guest lecturer at the University of Southampton (United Kingdom). He has done extensive research on data mining and its applications. His findings have been published in well-known international journals (e.g. Machine Learning, Management Science, IEEE Transactions on Neural Networks, IEEE Transactions on Knowledge and Data Engineering, IEEE Transactions on Evolutionary Computation, Journal of Machine Learning Research, …) and presented at top international conferences. He is also co-author of the book Credit Risk Management: Basic Concepts, published in 2008. He regularly tutors, advises and provides consulting support to international firms with respect to their data mining, predictive analytics, CRM, and credit risk management policy. In that context, he is academic advisor to the HR Analytics start-up iNostix. Read more on Prof. Baesens on Dataminingapps.

See also our other blog post: 7 Benefits of Predictive Retention Modeling.

Interested in using predictive HR analytics as a key component in your HR strategy? Get in touch with co-founders Luk Smeyers or Dr. Jeroen Delmotte for more information ([email protected]). Or follow us on Twitter and/or Facebook for exciting international articles on HR analytics.
