Comparing Predictive Machine Learning Algorithms in Fit for Work Occupational Health Assessments

Saul Charapaqui-Miranda, Katherine Arapa-Apaza, Moises Meza-Rodriguez, Horacio Chacon-Torrico

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Some studies have tried to develop predictors for fitness for work (FFW). This study assessed whether factors used in occupational medical practice could predict an individual's fit-for-work result. We used a Peruvian occupational medical examination dataset of 33347 participants, from which we obtained a reduced dataset of 2650. It was split into two subsets, a training dataset and a test dataset. Using the training dataset, logistic regression, decision tree, random forest, and support vector machine models were fitted, and the important variables of each model were identified. Hyperparameter tuning was an important part of fitting these non-parametric models. The Area Under the Curve (AUC) metric was used for model selection with a 5-fold cross-validation approach. The results show logistic regression to be the most powerful predictor (AUC = 60.44%, Accuracy = 68.05%). The Random Forest analysis of the most important variables in the fitness-for-work evaluation is also worth noting. Thus, the best model was logistic regression. This also reveals that the criteria associated with the workplace and the occupational clinical criteria have a low level of prediction. Further studies should be done with imbalanced data and bigger datasets in order to obtain more robust models.
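The model-selection step described in the abstract (scoring candidates by AUC averaged over 5 folds) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the logistic regression is a small hand-written gradient-descent version, and the candidates differ only in a hypothetical L2 penalty, whereas the study compared four algorithm families on real examination records. The helper names (`auc`, `k_fold_indices`, `fit_logistic`, `cv_auc`) are not from the paper.

```python
import math
import random


def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Count positive/negative pairs ranked correctly; ties count one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def predict(w, x):
    """Predicted probability of class 1; w[0] is the bias term."""
    z = w[0] + sum(wj * v for wj, v in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, l2=0.0, lr=0.1, epochs=200):
    """Logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        for j in range(len(w)):
            penalty = l2 * w[j] if j > 0 else 0.0  # bias is not penalised
            w[j] -= lr * (grad[j] / len(X) + penalty)
    return w


def cv_auc(X, y, l2, k=5):
    """Mean AUC over k held-out folds for one candidate setting."""
    fold_scores = []
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train], l2=l2)
        fold_scores.append(
            auc([y[i] for i in held_out], [predict(w, X[i]) for i in held_out])
        )
    return sum(fold_scores) / k


# Synthetic stand-in data: two features, binary outcome (not the study's data).
rng = random.Random(42)
X, y = [], []
for _ in range(300):
    x = [rng.gauss(0, 1), rng.gauss(0, 1)]
    p = 1.0 / (1.0 + math.exp(-(1.5 * x[0] - 0.5 * x[1])))
    X.append(x)
    y.append(1 if rng.random() < p else 0)

# Score each hypothetical candidate and keep the one with the highest mean AUC.
results = {l2: cv_auc(X, y, l2) for l2 in (0.0, 0.1, 1.0)}
best_l2 = max(results, key=results.get)
print(results, "best l2:", best_l2)
```

In practice the same loop would typically be delegated to a library routine; the hand-written version is used here only to keep each step (fold split, fit, held-out AUC, averaging) visible.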

Original language: English
Title of host publication: Information Management and Big Data - 6th International Conference, SIMBig 2019, Proceedings
Editors: Juan Antonio Lossio-Ventura, Nelly Condori-Fernandez, Jorge Carlos Valverde-Rebaza
Publisher: Springer
Pages: 218-225
Number of pages: 8
ISBN (Print): 9783030461393
State: Published - 2020
Event: 6th International Conference on Information Management and Big Data, SIMBig 2019 - Lima, Peru
Duration: 21 Aug 2019 - 23 Aug 2019

Publication series

Name: Communications in Computer and Information Science
Volume: 1070 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 6th International Conference on Information Management and Big Data, SIMBig 2019
Country/Territory: Peru
City: Lima
Period: 21/08/19 - 23/08/19

Keywords

  • Data science
  • Machine learning
  • Occupational health
