Data Modelling

The problem at hand is a binary classification task: we attempt to predict whether a person will seek treatment, based on the surrounding corporate environment. The target variable is the survey question "Have you ever sought treatment for a mental health disorder from a mental health professional?". For each model, the following steps are performed:

Splitting the data into training and testing set
Handling target class imbalance
Training the model with grid search for fetching best performing hyperparameters
Evaluation of model
Checking fairness of model
Interpretation of model
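
The first three steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: it assumes scikit-learn, uses synthetic data from `make_classification` as a stand-in for the survey features, handles imbalance with `class_weight="balanced"` (resampling would be another option), and the parameter grid is purely illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the survey features and the treatment target
# (hypothetical data, roughly 70% negative / 30% positive).
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.7, 0.3], random_state=0
)

# Step 1: stratified split, so train and test keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 2: handle class imbalance by reweighting classes (an assumption here).
model = RandomForestClassifier(class_weight="balanced", random_state=0)

# Step 3: grid search with cross-validation over an illustrative grid.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(model, param_grid, scoring="f1", cv=3)
search.fit(X_train, y_train)

# The refitted best model is then used for evaluation on the held-out set.
best_model = search.best_estimator_
test_predictions = best_model.predict(X_test)
print(search.best_params_)
```

Scoring the search on F1 rather than accuracy is one reasonable choice when the classes are imbalanced, since accuracy can look good for a model that mostly predicts the majority class.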

For this use case, we use only ensemble models, since they can perform as well as, and sometimes better than, the individual models from which they are built. The following models are used:

Random Forests

To check how good the models are, we need to evaluate them on suitable metrics. Since ours is a binary classification problem, we evaluate the models on the following metrics:

Sensitivity (True Positive Rate)
Specificity (True Negative Rate)
Precision
Recall
F1 score
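
As a small worked example, the metrics listed above can all be derived from the four confusion-matrix counts. The sketch below is plain Python with hypothetical helper names and made-up labels, where 1 is assumed to mean "sought treatment"; note that recall is the same quantity as sensitivity.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Compute sensitivity/recall, specificity, precision and F1 score."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "recall": sensitivity,  # recall is another name for sensitivity
        "f1": f1,
    }


# Made-up labels for illustration: 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here 3 of the 4 positives are found (sensitivity 0.75) and 5 of the 6 negatives are correctly rejected (specificity about 0.83).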

