Article Title [English]
Objective: Risk classification of insurance customers based on observable characteristics can significantly help insurers mitigate losses, segment their customers, and prevent adverse selection. This paper studies losses incurred in motor third-party liability (TPL) insurance and predicts customers' risk of loss.
Methodology: Four supervised learning algorithms, namely decision tree, support vector machine (SVM), naïve Bayes, and neural network, are used to discover hidden patterns in the data and classify TPL insurance customers. The imbalanced dataset was the main challenge in applying machine learning and data mining techniques and is discussed throughout the article.
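The abstract does not state how the class imbalance was handled. One common remedy is random oversampling of the minority (loss) class in the training split only. The sketch below assumes that technique, which is not necessarily the authors' method; it uses only the Python standard library, and the function name `random_oversample` and the feature values are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Resample every class with replacement up to the size of the largest class.

    Apply this to the training split only; the test split should keep its
    original, imbalanced class distribution so that scores stay realistic.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Indices of the rows that belong to this class.
        idx = [i for i, y in enumerate(labels) if y == cls]
        # Append (target - n) extra copies drawn with replacement.
        for _ in range(target - n):
            i = rng.choice(idx)
            out_rows.append(rows[i])
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical (car type, plate type) rows; label 1 = financial loss occurred.
rows = [("sedan", "private"), ("truck", "public"), ("sedan", "private"),
        ("bus", "public"), ("pickup", "private")]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
# Both classes now have 4 rows each.
```

Oversampling keeps every original observation, whereas undersampling the majority class would discard data; with more than 400,000 observations either choice may be reasonable, and class-weighted training is a third option.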
Findings: The dataset contains more than 400,000 observations spanning five years from an Iranian insurance company. It has five variables: four independent variables (car type, car group, plate type, and car age) and one binary dependent variable (financial loss). Comparing model performance, the decision tree performs best (F1=0.72±1).
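Because the loss class is rare, accuracy alone is a misleading yardstick, which is why a metric such as F1 is used to compare models. F1 is the harmonic mean of precision P and recall R for the positive (loss) class, F1 = 2PR / (P + R). The sketch below, with hypothetical labels at a 5% loss rate, shows two classifiers with identical accuracy but very different F1; the function name `binary_metrics` is illustrative.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, F1) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Hypothetical test set: 95 no-loss customers followed by 5 loss customers.
y_true = [0] * 95 + [1] * 5

# Classifier A always predicts "no loss".
always_zero = [0] * 100
# Classifier B raises 4 false alarms, then catches 4 of the 5 losses.
model_b = [1] * 4 + [0] * 91 + [1] * 4 + [0] * 1

acc_a, _, _, f1_a = binary_metrics(y_true, always_zero)
acc_b, p_b, r_b, f1_b = binary_metrics(y_true, model_b)
# Both have accuracy 0.95, but F1 is 0.0 for A and about 0.615 for B.
```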
Conclusions: The model ranks the independent features by importance as follows: car type, plate type, car age, and car group. The findings also suggest that more accurate prediction of claims and high-risk customers requires additional features describing driver characteristics.
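The abstract does not say how the feature priority was derived from the decision tree. Tree learners in the ID3/C4.5 family split each node on the feature with the largest information gain, so ranking categorical features by their information gain with respect to the loss label is one plausible proxy. The sketch below assumes that measure; it is not presented as the authors' procedure, and the feature names and toy rows are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on a categorical feature."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    # Weighted entropy of the child nodes after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels, names):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = [(name, information_gain([r[j] for r in rows], labels))
             for j, name in enumerate(names)]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: car type separates losses perfectly, plate type does not.
names = ["car_type", "plate_type"]
rows = [("truck", "public"), ("truck", "private"),
        ("sedan", "public"), ("sedan", "private")]
labels = [1, 1, 0, 0]
ranking = rank_features(rows, labels, names)
# ranking == [("car_type", 1.0), ("plate_type", 0.0)]
```

Note that the importance reported by a fully grown tree usually aggregates impurity reductions over all splits, not just the root, so its ranking can differ from this single-split view.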
JEL Classification: G22, G17, F47