Analysis of Mortality Models with Covariates Missing at Random

Article type: Research article

Authors

1 Assistant Professor, Department of Actuarial Science, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran. (Corresponding Author).

2 MSc in Actuarial Science, Department of Actuarial Science, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran.

10.22056/jir.2021.278996.2872

Abstract

Objective: This study models mortality in a pension plan using data with missing values and differing access to covariate information. It analyzes the structure of different models in detail, estimates them, and finally examines the financial implications for different mortality experiences that contain missing data.
Methodology: This paper deals with a pension plan in which each individual's future lifetime is modeled with parametric survival models incorporating covariates that may be missing for some individuals. Parameters are estimated by maximum likelihood, and an algorithm is proposed to carry out the estimation as effectively as possible.
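To make the estimation problem concrete, here is a minimal sketch of an observed-data log-likelihood of the kind described above. It is not the paper's algorithm. It assumes a Gompertz hazard mu(y; z) = exp(alpha + delta*z + beta*y) with a single binary covariate z, a Bernoulli model P(z = 1) = pi for that covariate, and covariates missing at random, so that the missingness mechanism can be ignored; all function names, data and parameter values are hypothetical. A life whose covariate is observed contributes its survival log-likelihood plus log P(z); a life whose covariate is missing contributes the log of the pi-weighted mixture over both possible values of z.

```python
import numpy as np


def loglik_given_z(alpha, delta, beta, x, t, d, z):
    """Log-likelihood of survival data for lives with covariate value z.

    Assumed Gompertz hazard: mu(y; z) = exp(alpha + delta*z + beta*y).
    x: age at entry, t: years observed, d: 1 if death observed, 0 if censored.
    """
    log_mu_exit = alpha + delta * z + beta * (x + t)
    # Integrated hazard between ages x and x + t (closed form for Gompertz).
    cum_hazard = np.exp(alpha + delta * z) * (np.exp(beta * (x + t)) - np.exp(beta * x)) / beta
    return d * log_mu_exit - cum_hazard


def observed_data_loglik(params, x, t, d, z):
    """Total log-likelihood when z may be missing (coded as np.nan).

    Assumes z ~ Bernoulli(pi) and missingness at random (ignorable).
    """
    alpha, delta, beta, pi = params
    ll1 = loglik_given_z(alpha, delta, beta, x, t, d, 1.0) + np.log(pi)
    ll0 = loglik_given_z(alpha, delta, beta, x, t, d, 0.0) + np.log1p(-pi)
    missing = np.isnan(z)
    # Observed covariate: keep the branch that matches the recorded value.
    ll_observed = np.where(z == 1.0, ll1, ll0)
    # Missing covariate: mix the likelihood over both values (log-sum-exp).
    ll_missing = np.logaddexp(ll1, ll0)
    return float(np.sum(np.where(missing, ll_missing, ll_observed)))


# Hypothetical toy data: entry age, years observed, death indicator, covariate.
x = np.array([65.0, 70.0, 72.0, 80.0])
t = np.array([5.0, 3.0, 2.5, 4.0])
d = np.array([0, 1, 0, 1])
z = np.array([1.0, np.nan, 0.0, np.nan])

total = observed_data_loglik((-11.0, -0.2, 0.11, 0.4), x, t, d, z)
print(total)
```

In practice the negative of this function would be handed to a numerical optimizer (for example `scipy.optimize.minimize`) to obtain maximum likelihood estimates; the mixture term for lives with missing covariates is what makes direct maximization harder than in the complete-data case.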
Findings: The results show that when data are missing, the statistical model is not always identifiable by maximum likelihood, and combining data from two or more experiences can remove obstacles to identifiability.
Conclusion: The methods proposed in this paper can be useful to actuaries when calculating financial quantities of interest based on annuity factors. They can combine different datasets with equal or similar mortality experience, increase the sample size and reduce parameter risk, and thus lead to lower capital requirements. Socio-economic variables, including benefit level and demographic-geographic characteristics, receive more attention when interest rates are low.
JEL Classification: C13, C24, C51


Article Title [English]

Analysis of Mortality Models with Covariates Missing at Random

Authors [English]

  • Shirin Shoaee 1
  • Reyhaneh Fathi 2
1 Assistant Professor of Department of Actuarial Science, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran. (Corresponding Author).
2 MSc in Actuarial Science, Department of Actuarial Science, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran.
Abstract [English]

Objective: Demographic indicators such as mortality rates play a key role in health, financial and pension policy, so the accuracy of mathematical models for estimating mortality rates is an important concern. One task of actuaries is to build a mortality model suited to the available data: one that estimates mortality at different ages, captures longevity, and makes use of the various information recorded on pension plan members. Actuaries analyzing real data may face missing values, which can arise for many reasons, such as unanswered questions or censoring, and which can compromise the accuracy of the analysis. The purpose of this study is to model mortality in a pension plan. Data are assumed to be available at the individual level, including date of birth, date of joining the plan, date on which observation ended, and the reason it ended (usually death or right censoring). Information on covariates such as gender, benefit or pension size, demographic geography or health status is also assumed to be available. More precisely, the study aims to model mortality in a pension plan when covariate information is partly missing, to analyze the structure of different models carefully, to estimate them, and finally to investigate the financial implications for different mortality experiences that contain missing data.
Methodology: This article considers a pension plan in which each member's future lifetime is modeled with parametric survival models that incorporate covariates, some of which may be missing for some individuals. Parameters are estimated by likelihood-based techniques, and an algorithm is proposed to carry out this estimation as effectively as possible. A necessary property for judging the adequacy of a statistical model, especially when the data contain missing values, is identifiability. A model that is not identifiable is not full rank (it is parameter redundant) and is therefore not suitable for the data; checking identifiability requires computing a Jacobian (derivative) matrix and its rank. As noted, maximum likelihood can be used to analyze mortality models when values are missing. In such cases, fitting the model often involves estimation error, which can be reduced by modeling a larger population. For this reason, data from several pension plans are often combined, provided the combined population remains homogeneous. The proposed method can also be useful for calculating financial quantities based on annuity factors: combining datasets with equal or similar mortality experience increases the sample size and reduces parameter risk, which in turn lowers the capital requirement. Socio-economic variables such as benefit level and the geographical characteristics of the population also receive more attention when interest rates are low.
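The identifiability check can be illustrated with a deliberately small toy model in the spirit of the Jacobian-rank approach: compute the matrix of derivatives of a summary of the model with respect to its parameters and compare its rank with the number of parameters. Here the summary is assumed to be the hazard mu(y; z) = exp(alpha + delta*z + beta*y) at each (age, covariate) cell present in the data; the model, the cells and the parameter values are all hypothetical. A dataset in which every life has z = 1 cannot separate alpha from delta, and one in which every life has z = 0 carries no information on delta, so each has rank 2 < 3; pooling the two experiences restores full rank.

```python
import numpy as np


def hazard_jacobian(alpha, delta, beta, ages, z_values):
    """Derivatives of mu(y; z) = exp(alpha + delta*z + beta*y) w.r.t. (alpha, delta, beta).

    One row per (z, age) cell assumed to be present in the data.
    """
    rows = []
    for z in z_values:
        for age in ages:
            mu = np.exp(alpha + delta * z + beta * age)
            rows.append([mu, mu * z, mu * age])
    return np.array(rows)


ages = np.arange(60, 91)          # hypothetical observed ages
theta = (-10.0, 0.3, 0.1)         # hypothetical (alpha, delta, beta)

rank_a = np.linalg.matrix_rank(hazard_jacobian(*theta, ages, z_values=[1]))
rank_b = np.linalg.matrix_rank(hazard_jacobian(*theta, ages, z_values=[0]))
rank_pooled = np.linalg.matrix_rank(hazard_jacobian(*theta, ages, z_values=[0, 1]))
print(rank_a, rank_b, rank_pooled)  # 2 2 3: only the pooled data are full rank
```

The numerical rank test is only a stand-in: a symbolic derivative matrix avoids tolerance questions, and a realistic check would use a summary that fully determines the likelihood rather than this toy hazard grid.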
 
Findings: First, complete data on pension plan members, consisting of survival time and covariates for each individual, are analyzed and modeled, with parameters estimated by maximum likelihood. When data are missing, however, maximum likelihood estimation is no longer straightforward. In that case, the maximum likelihood estimates are computed with the proposed algorithm, and the identifiability of the parameters is then checked to evaluate the proposed model structure and algorithm. Furthermore, the financial effects, in particular annuity factors and mis-estimation risk capital requirements, are calculated for the mortality experience that includes the largest set of covariates and compared with those for the individual segments when data are missing. It is also shown that when two covariates are never observed together, the model is not identifiable from the data.
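Mis-estimation risk capital can be quantified in several ways. The following is a minimal sketch of one simulation approach, assumed here rather than taken from the paper: draw parameter vectors from a multivariate normal approximation to the sampling distribution of the maximum likelihood estimates, recompute the annuity factor for each draw, and take the capital requirement as the 99.5% quantile of the simulated factors divided by the best-estimate factor, minus one. The Gompertz parameters, their covariance matrix, the age of 65 and the 3% interest rate are all hypothetical.

```python
import numpy as np


def annuity_due(alpha, beta, age=65, rate=0.03, max_age=120):
    """Whole-life annuity-due factors under an assumed Gompertz hazard exp(alpha + beta*y).

    alpha and beta may be scalars or 1-D arrays (one factor per parameter pair).
    """
    alpha = np.atleast_1d(alpha)[:, None]
    beta = np.atleast_1d(beta)[:, None]
    k = np.arange(0, max_age - age + 1)
    # k-year survival probabilities from `age`, closed form for Gompertz.
    kpx = np.exp(-np.exp(alpha) * (np.exp(beta * (age + k)) - np.exp(beta * age)) / beta)
    discount = (1.0 + rate) ** -k
    return np.sum(discount * kpx, axis=1)


# Hypothetical MLEs and standard errors; alpha and beta are strongly negatively correlated.
theta_hat = np.array([-11.0, 0.11])
sd_alpha, sd_beta, rho = 0.3, 0.004, -0.95
cov = np.array([[sd_alpha**2, rho * sd_alpha * sd_beta],
                [rho * sd_alpha * sd_beta, sd_beta**2]])

rng = np.random.default_rng(2021)
draws = rng.multivariate_normal(theta_hat, cov, size=10_000)

best_estimate = annuity_due(theta_hat[0], theta_hat[1])[0]
simulated = annuity_due(draws[:, 0], draws[:, 1])
capital = np.quantile(simulated, 0.995) / best_estimate - 1.0
print(f"best estimate {best_estimate:.3f}, capital {capital:.2%}")
```

Pooling experiences shrinks the covariance matrix of the estimates, which narrows the simulated distribution and lowers the capital figure; this is the mechanism behind the link between larger samples and lower capital requirements.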
 
Conclusion: If data are missing, the statistical model is not always identifiable by maximum likelihood, and combining data from two or more experiences can remove the identifiability problem. The methods proposed in this paper can help actuaries calculate financial quantities based on annuity factors. They allow datasets with equal or similar mortality experience to be combined, increasing the sample size and reducing parameter risk, and thereby reducing capital requirements. Socio-economic variables such as benefit level and the geographical characteristics of the population receive more attention when interest rates are low.
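One way to read the remark about low interest rates can be checked in a toy setting. The sketch below assumes Gompertz hazards exp(alpha + delta*z + beta*y) for two hypothetical groups that differ only through delta (for example, a higher-mortality group with z = 1), uses made-up parameter values, and compares the relative gap between the groups' annuity factors at 1% and 5% interest. Because discounting at a lower rate puts more weight on late payments, where the survival gap between the groups is widest, the relative gap comes out larger at the lower rate.

```python
import numpy as np


def annuity_due(alpha, beta, age, rate, max_age=120):
    """Whole-life annuity-due factor under an assumed Gompertz hazard exp(alpha + beta*y)."""
    k = np.arange(0, max_age - age + 1)
    # k-year survival probabilities from `age`, closed form for Gompertz.
    kpx = np.exp(-np.exp(alpha) * (np.exp(beta * (age + k)) - np.exp(beta * age)) / beta)
    return float(np.sum((1.0 + rate) ** -k * kpx))


ALPHA, BETA, DELTA = -11.0, 0.11, 0.3   # hypothetical; DELTA > 0 raises mortality for z = 1


def relative_gap(rate, age=65):
    """Relative excess of the z = 0 annuity factor over the z = 1 factor."""
    a_low_mortality = annuity_due(ALPHA, BETA, age, rate)
    a_high_mortality = annuity_due(ALPHA + DELTA, BETA, age, rate)
    return a_low_mortality / a_high_mortality - 1.0


gap_1pct, gap_5pct = relative_gap(0.01), relative_gap(0.05)
print(f"gap at 1%: {gap_1pct:.2%}, gap at 5%: {gap_5pct:.2%}")
```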
JEL-Classification: C13, C24, C51

Keywords [English]

  • Capital Requirement
  • Parameter Redundancy
  • Full Rank
  • Likelihood Contribution
  • Identifiability
  • Missing At Random
  • Mortality Model