Comparing Predictive Machine Learning Algorithms in Fit for Work Occupational Health Assessments

Saul Charapaqui-Miranda, Katherine Arapa-Apaza, Moises Meza-Rodriguez, Horacio Chacon-Torrico

Research output: Chapter in book/report/conference proceeding; Conference contribution; peer-reviewed

Abstract

Some studies have tried to develop predictors for fitness for work (FFW). This study assessed whether factors used in occupational medical practice could predict an individual's fit-for-work result. We used a Peruvian occupational medical examination dataset of 33,347 participants, from which we obtained a reduced dataset of 2,650. It was split into two subsets, a training dataset and a test dataset. Using the training dataset, logistic regression, decision tree, random forest, and support vector machine models were fitted, and the important variables of each model were identified. Hyperparameter tuning was an important part of fitting the non-parametric models. The Area Under the Curve (AUC) metric was used for model selection with a 5-fold cross-validation approach. The results show that logistic regression was the most powerful predictor (AUC = 60.44%, accuracy = 68.05%) and was therefore the best model. The analysis of the most important variables for fitness-for-work evaluation obtained with the random forest approach is also worth noting. These results also reveal that the criteria associated with the workplace and the occupational clinical criteria have a low level of prediction. Further studies should address imbalanced data in order to process larger datasets and, consequently, obtain more robust models.
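The model-selection step described in the abstract (mean AUC over 5-fold cross-validation, highest score wins) can be sketched as follows. This is a minimal, standard-library-only illustration and not the authors' implementation: the abstract does not state the tooling used, and the names `roc_auc`, `k_fold_indices`, `cross_validated_auc`, and `select_model` are hypothetical. It also assumes the rows were shuffled beforehand so that every fold contains both classes.

```python
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) for k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def cross_validated_auc(fit, X, y, k=5):
    """Mean validation AUC over k folds.

    `fit(X_train, y_train)` is a hypothetical callable that trains a model
    and returns a function mapping one sample to a score.
    """
    fold_aucs = []
    for train, val in k_fold_indices(len(X), k):
        score = fit([X[i] for i in train], [y[i] for i in train])
        fold_aucs.append(roc_auc([y[i] for i in val], [score(X[i]) for i in val]))
    return mean(fold_aucs)


def select_model(candidates, X, y, k=5):
    """Return the candidate name with the highest mean cross-validated AUC, plus all scores."""
    results = {name: cross_validated_auc(fit, X, y, k) for name, fit in candidates.items()}
    best = max(results, key=results.get)
    return best, results
```

Here `candidates` would map a model name (for example, logistic regression or random forest) to its training callable. In this sketch the held-out test split plays no part in selection and is kept for the final evaluation of the chosen model.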

Original language: English
Title of host publication: Information Management and Big Data - 6th International Conference, SIMBig 2019, Proceedings
Editors: Juan Antonio Lossio-Ventura, Nelly Condori-Fernandez, Jorge Carlos Valverde-Rebaza
Publisher: Springer
Pages: 218-225
Number of pages: 8
ISBN (print): 9783030461393
DOI
Status: Published - 2020
Event: 6th International Conference on Information Management and Big Data, SIMBig 2019 - Lima, Peru
Duration: 21 Aug 2019 to 23 Aug 2019

Publication series

Name: Communications in Computer and Information Science
Volume: 1070 CCIS
ISSN (print): 1865-0929
ISSN (electronic): 1865-0937

Conference

Conference: 6th International Conference on Information Management and Big Data, SIMBig 2019
Country/Territory: Peru
City: Lima
Period: 21/08/19 to 23/08/19
