Abstract

I scoured the web for a fun dataset that would be relatable to real life. I had a couple of close brushes with death over the summer, including my grandma passing away, which got me interested in when I might die myself. That led to the question of which variables come into play when estimating life expectancy. The Kaggle dataset combines data from the WHO and many other sources into a somewhat comprehensive set of potential covariates recorded alongside life expectancy in various parts of the world. In this statistical analysis, I look at the various features and trim them down to a model that estimates the life expectancy of an individual, no matter their place in the world, with a handful of variables! The final model trimmed 20 potential Xi's down to an easy-to-manage 5 covariates while only suffering a ~2% decrease in R^2 from the full model to the reduced one. This reduced model uses Adult_mortality, Schooling, Incidents_HIV, Alcohol_consumption, and Hepatitis_B to estimate average longevity within the population. The final model explains 95.44% of the variance in Y using these 5 covariates.

Introduction

This project started when I found the Life Expectancy (WHO) Fixed dataset on Kaggle. Having always wondered to what degree each variable in life contributes to the overall estimation of life expectancy throughout the world, I set to work. The dataset had variables such as: "Country"- which of the 179 countries the data was collected from, "Region"- one of 9 geographic regions each country fell into, "Year"- the year the data were originally collected from the countries (from 2000-2015), "Infant_deaths"- infant deaths per 1000 population, "Under_five_deaths"- deaths of children under age 5 per 1000 population, "Adult_mortality"- death rate of adults per 1000 population, "Alcohol_consumption"- liters of pure alcohol per capita 15+ years old, "Hepatitis_B"- % coverage of vaccine in 1 year olds, "Measles"- % coverage of vaccine in 1 year olds, "BMI"- weight in kg/height in m^2, "Polio"- % coverage of vaccine in 1 year olds, "Diphtheria"- % coverage of vaccine in 1 year olds, "Incidents_HIV"- incidents of HIV per 1000 population aged 15-49, "GDP_per_capita"- GDP per capita in USD, "Population_mln"- population in millions, "Thinness_ten_nineteen_years"- 2 standard deviations below average BMI in 10-19 year olds, "Thinness_five_nine_years"- 2 standard deviations below average BMI in 5-9 year olds, "Schooling"- average years spent in formal education for 25+ year olds, "Economy_status_Developed"- developed country, "Economy_status_Developing"- developing country. Armed with these data points, all I had to do was build out and then interpret the model. Through the exploratory portion I found that many of the countries with the highest life expectancies were already developed and considered part of the 'West' culturally.

Using the above-mentioned data points as Xi values, I hoped to build a somewhat accurate model of the relationship between these factors and Yi, "Life_expectancy": the average age an adult could expect to live to given their background and country or region, while controlling for other sources of variance through the model's assumptions and coefficients!

Methods

The Initial Full Model used all variables (except Economy_status_Developing) to create a prediction.

Yi - "Life_expectancy"- average life expectancy of an adult

Xi1 - "Country"-  Which of the 179 countries the data was collected from for i-th observation
Xi2 - "Region"- one of 9 geographic regions each country fell into for i-th observation 
Xi3 - "Year"- The Years the data were collected originally from the countries (from 2000-2015) for i-th observation

Xi4 - "Infant_deaths"- infant deaths per 1000 population for i-th observation
Xi5 - "Under_five_deaths"- death of children under age of 5 per 1000 population for i-th observation 

Xi6 - "Adult_mortality" - death rate of adults per 1000 population for i-th observation
Xi7 - "Alcohol_consumption"- Liters of pure alcohol per capita 15+ years old for i-th observation 
Xi8 - "Hepatitis_B"-% coverage of vaccine in 1 year olds for i-th observation 
Xi9 - "Measles"-% coverage of vaccine in 1 year olds for i-th observation 
Xi10 -  "BMI"- Weight in kg/height in m^2 for i-th observation
Xi11 - "Polio"-% coverage of vaccine in 1 year olds for i-th observation
Xi12 - "Diphtheria"- % coverage of vaccine in 1 year olds for i-th observation 
Xi13 - "Incidents_HIV"- incidents of HIV per 1000 pop aged 15-49 for i-th observation
Xi14 - "GDP_per_capita"- GDP per Capita in USD for i-th observation  
Xi15 - "Population_mln"- Population in millions for i-th observation
Xi16 - "Thinness_ten_nineteen_years"- 2 standard deviations below average BMI in 10-19 year olds for i-th observation

Xi17 -  "Thinness_five_nine_years"- 2 standard deviations below average BMI in 5-9 year olds for i-th observation
Xi18 - "Schooling"- Average years spent in formal education for 25+ year olds for i-th observation

Xi19 - "Economy_status_Developed"- developed country for i-th observation

Once I had the initial results, I removed the Economy_status_Developing column, as it was redundant with the binary Economy_status_Developed feature. Then I narrowed down the variables, primarily by checking for multicollinearity. To find the potential problem variables I could have produced a pairs plot across all variables; however, with this many variables the panels were too small to read and not very useful. As I was not interested in parsing each plot individually, I instead used a heatmap of the correlations, which was not very helpful either. VIF was not possible because some of the variables had perfect multicollinearity, and with the plots already being fickle before I had even started exploring the data, I was somewhat limited by working on a laptop. So I set up a bit of a black box: I hardcoded a rule that pulled any variables with a correlation coefficient higher than 0.7 into their own data frame, then removed Life_expectancy from the data set. The variables it pulled were "Infant_deaths", "Under_five_deaths", "Polio", "Diphtheria", and "Thinness_ten_nineteen_years", and it made sense that these were highly correlated with one another. Comparing these columns to the heatmap and doing some research made the multicollinearity clear: infant deaths are related to deaths under five; Polio and Diphtheria are routine vaccines, and a child not vaccinated for one typically was not vaccinated for the other either, which says more about healthcare in the particular country than anything; and Thinness_ten_nineteen_years (a BMI-based measure for 10-19 year olds) directly overlaps with BMI and the other thinness measure. Once I removed these variables and analyzed the data at the region level rather than the country level, I was able to use more useful tools such as the Variance Inflation Factor, paired scatterplots, and model selection with the regsubsets function from the leaps package.
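The 0.7 correlation screen described above can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the data are synthetic (just the column names are borrowed from the dataset), and `toy`, `cutoff`, `flagged_pairs`, and `flagged_vars` are made-up names, not the objects used in the original notebook.

```r
# Synthetic stand-in data: column names mirror the dataset, the values do not.
set.seed(1)
n <- 200
infant <- rnorm(n, mean = 30, sd = 10)
toy <- data.frame(
  Infant_deaths       = infant,
  Under_five_deaths   = infant * 1.4 + rnorm(n, sd = 2),  # built to track Infant_deaths
  Schooling           = rnorm(n, mean = 8, sd = 3),
  Alcohol_consumption = rnorm(n, mean = 5, sd = 2)
)

# Pairwise Pearson correlations between the candidate predictors only
# (the response is deliberately left out of this screen).
cor_mat <- cor(toy)

# Look at each pair once (upper triangle) and flag those above the cutoff.
cutoff <- 0.7
hits <- which(abs(cor_mat) > cutoff & upper.tri(cor_mat), arr.ind = TRUE)
flagged_pairs <- data.frame(
  var1 = rownames(cor_mat)[hits[, "row"]],
  var2 = colnames(cor_mat)[hits[, "col"]],
  r    = cor_mat[hits],
  stringsAsFactors = FALSE
)
print(flagged_pairs)

# Every variable that appears in at least one flagged pair.
flagged_vars <- unique(c(flagged_pairs$var1, flagged_pairs$var2))
print(flagged_vars)
```

A screen like this only catches pairwise relationships; a variable that is a combination of several others can slip past it, which is one reason to follow up with VIF once the perfect collinearity is gone.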
Using VIF, I found the Region variable (which separates the world into 9 distinct areas) to be causing multicollinearity issues as well. Year was also not indicative of the length of an average person's life, as it only records when the data was collected. With these 12 covariates swiftly removed, I was able to move on to the actual model-building portion.
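For reference, the Variance Inflation Factor of a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on all the other predictors. The sketch below computes it in base R on synthetic data; `vif_base` is a hypothetical helper written for this illustration, not a function from any package and not necessarily how VIF was computed in the original analysis.

```r
# Synthetic stand-in data: column names mirror the dataset, the values do not.
set.seed(2)
n <- 300
toy <- data.frame(
  Adult_mortality = rnorm(n, mean = 190, sd = 60),
  Schooling       = rnorm(n, mean = 8, sd = 3),
  Polio           = rnorm(n, mean = 85, sd = 10)
)
toy$Diphtheria <- toy$Polio + rnorm(n, sd = 2)  # built to nearly duplicate Polio

# Hypothetical helper: VIF_j = 1 / (1 - R_j^2), where R_j^2 is from
# regressing predictor j on every other predictor in the data frame.
vif_base <- function(df) {
  sapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    r2 <- summary(lm(reformulate(others, response = v), data = df))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_base(toy)
print(round(vifs, 2))
```

When one predictor is an exact linear combination of the others, its R^2 is 1 and the VIF is undefined (division by zero), which is consistent with VIF not being usable until the perfectly collinear columns were removed.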

The Final Model used for estimation purposes cut the variables down to a base of 5 Xi.

Life_expectancy ~ Adult_mortality + Schooling + Incidents_HIV + Alcohol_consumption + Hepatitis_B

Yi - "Life_expectancy"- average life expectancy of an adult

Xi1 - "Adult_mortality" - death rate of adults per 1000 population for i-th observation
Xi2 - "Schooling"- Average years spent in formal education for 25+ year olds for i-th observation

Xi3 - "Incidents_HIV"- incidents of HIV per 1000 pop aged 15-49 for i-th observation
Xi4 - "Alcohol_consumption"- Liters of pure alcohol per capita 15+ years old for i-th observation 
Xi5 - "Hepatitis_B"-% coverage of vaccine in 1 year olds for i-th observation 
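A model of this form can be fit and summarized in R along the following lines. This is a sketch on synthetic data: the response is generated from made-up coefficients, so none of the numbers it prints correspond to the real results, and `toy`, `final_fit`, `ci_99`, and `new_obs` are illustrative names.

```r
# Synthetic stand-in data: column names mirror the dataset, the values do not.
set.seed(4)
n <- 400
toy <- data.frame(
  Adult_mortality     = runif(n, 50, 500),
  Schooling           = runif(n, 1, 14),
  Incidents_HIV       = rexp(n, rate = 1),
  Alcohol_consumption = runif(n, 0, 15),
  Hepatitis_B         = runif(n, 30, 99)
)
# Made-up coefficients, used only to generate a response for the example.
toy$Life_expectancy <- 70 - 0.05 * toy$Adult_mortality +
  0.6 * toy$Schooling + rnorm(n, sd = 2)

# Same formula as the final model.
final_fit <- lm(
  Life_expectancy ~ Adult_mortality + Schooling + Incidents_HIV +
    Alcohol_consumption + Hepatitis_B,
  data = toy
)

# 99% confidence intervals for the coefficients (alpha = 0.01).
ci_99 <- confint(final_fit, level = 0.99)
print(ci_99)

# Estimated mean life expectancy, with a 99% interval, for one
# hypothetical covariate profile.
new_obs <- data.frame(
  Adult_mortality = 150, Schooling = 10, Incidents_HIV = 0.1,
  Alcohol_consumption = 5, Hepatitis_B = 90
)
pred <- predict(final_fit, newdata = new_obs,
                interval = "confidence", level = 0.99)
print(pred)
```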

Results

The final fitted model output is below. I used a 99% confidence level (alpha = 0.01) because there is a ton of variance in the data, yet a large portion of the life expectancies clump together around ~60-75 no matter the Region or other factors. With the null hypothesis H0: the coefficients on all Xin equal 0 (β1 = ... = β5 = 0) and the alternative hypothesis Ha: at least one coefficient is nonzero (βn ≠ 0), we were able to reject the null at 99% confidence even with the fully reduced final model. As the output shows, the p-value was WELL below 0.01, and the F statistic only had to exceed a critical value of 3.026, which further supported rejecting the null hypothesis. The 99% CIs for all coefficients are also below, and the estimates fall well within range. As you probably suspected while reading through the list of variables, adult mortality in a given country is a KEY indicator of the lifespan of any given individual. Many factors go into adult mortality rates, but it is an overwhelmingly comprehensive measure of how well a country is taking care of its people and of the assumed lifespan of an individual within it, accounting for 93.84% of the variance in Yi no matter the model. Next is Schooling with 5.15%. Then the occurrence of HIV within a population, which gives a good look at safe sex and personal health practices and the availability of contraception, accounts for 0.568% of the variance in Yi. Liters of alcohol consumed, or how well entertained a population is, accounts for 0.267% of the variance in Yi. And finally the Hepatitis B vaccination rate accounts for 0.174% of the variance, which can also be read as the availability of preventative health services for the population.
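The overall F test and a per-covariate breakdown of variance can be reproduced in R roughly as sketched here. Two assumptions are worth flagging: the data are synthetic with made-up coefficients, and the breakdown uses sequential (Type I) sums of squares from `anova()`, which depend on the order the terms enter the formula. Whether the percentages quoted above were computed this way is not stated in the write-up.

```r
# Critical value for the overall F test at alpha = 0.01 with p = 5
# predictors and 2864 observations (df = p and n - p - 1).
alpha <- 0.01
p <- 5
n_obs <- 2864
f_crit <- qf(1 - alpha, df1 = p, df2 = n_obs - p - 1)
print(f_crit)

# Synthetic two-predictor example for the variance breakdown.
set.seed(3)
n <- 500
toy <- data.frame(
  Adult_mortality = rnorm(n, mean = 190, sd = 60),
  Schooling       = rnorm(n, mean = 8, sd = 3)
)
# Made-up coefficients, used only to generate a response for the example.
toy$Life_expectancy <- 85 - 0.08 * toy$Adult_mortality +
  0.5 * toy$Schooling + rnorm(n, sd = 2)

fit <- lm(Life_expectancy ~ Adult_mortality + Schooling, data = toy)

# Observed overall F statistic of the toy fit (compare it with the
# critical value for the toy model's own degrees of freedom).
f_obs <- summary(fit)$fstatistic[["value"]]
print(f_obs)

# Sequential sums of squares, expressed as a percentage of the total.
tab <- anova(fit)
ss <- tab[["Sum Sq"]]
names(ss) <- rownames(tab)
share_of_total <- 100 * ss / sum(ss)
print(round(share_of_total, 2))
```

In this decomposition the shares of the predictors add up to 100 times R^2, and the "Residuals" row holds the unexplained remainder.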


This dataset was pretty interesting, and I wanted to do a bit of exploratory analysis on it first to get a direction for building the model. So first I cleaned up the dataset further: with 20 variables available to predict Y across 2,864 data points, I had to narrow things down as much as possible before using any model-building or feature-selection techniques.

I removed the developing column as it presented the same binary info as the developed country column and transformed Region into a factor to force R to dummy code the 9 regions for me.
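A small sketch of those two cleanup steps, on a made-up seven-row data frame (`toy` and `design` are illustrative names, and the rows are invented for the example):

```r
# Made-up rows that only borrow the column names from the dataset.
toy <- data.frame(
  Region = c("Africa", "Asia", "European Union", "Asia",
             "Africa", "European Union", "Asia"),
  Economy_status_Developed  = c(0, 0, 1, 1, 0, 1, 0),
  Economy_status_Developing = c(1, 1, 0, 0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# The two status flags always sum to 1, so the second one adds no
# information beyond the first; keeping both next to an intercept
# would make the design matrix singular.
stopifnot(all(toy$Economy_status_Developed +
              toy$Economy_status_Developing == 1))
toy$Economy_status_Developing <- NULL

# Converting Region to a factor lets R dummy code it automatically.
toy$Region <- factor(toy$Region)

# model.matrix() shows the columns lm() would build: one dummy per
# region except the first level, which becomes the baseline.
design <- model.matrix(~ Region + Economy_status_Developed, data = toy)
print(colnames(design))
```

With the default treatment contrasts, a factor with k levels produces k - 1 dummy columns, so the 9 regions become 8 columns and the first level alphabetically serves as the reference.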

Then I began trying to build the model out after visualizing some of the Life_expectancy trends, such as developed countries having higher values and a heavy left skew with life expectancy peaking around 70-75 globally, and after checking the completeness of the dataset to prevent imputation errors.

Call:
lm(formula = Life_expectancy ~ Country + Region + Year + Infant_deaths + 
    Under_five_deaths + Adult_mortality + Alcohol_consumption + 
    Hepatitis_B + Measles + BMI + Polio + Diphtheria + Incidents_HIV + 
    GDP_per_capita + Population_mln + Thinness_ten_nineteen_years + 
    Thinness_five_nine_years + Schooling + Economy_status_Developed, 
    data = life)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.0594 -0.2239 -0.0128  0.2094  5.2254 

Coefficients: (9 not defined because of singularities)
                                        Estimate Std. Error t value Pr(>|t|)
(Intercept)                           -2.030e+02  1.248e+01 -16.262  < 2e-16
CountryAlbania                         5.694e+00  3.867e-01  14.724  < 2e-16
CountryAlgeria                         4.940e+00  3.054e-01  16.177  < 2e-16
CountryAngola                         -3.614e+00  2.175e-01 -16.616  < 2e-16
CountryAntigua and Barbuda             7.061e+00  3.875e-01  18.222  < 2e-16
CountryArgentina                       6.694e+00  4.236e-01  15.805  < 2e-16
CountryArmenia                         5.511e+00  4.080e-01  13.507  < 2e-16
CountryAustralia                       8.968e+00  5.758e-01  15.575  < 2e-16
CountryAustria                         8.062e+00  4.823e-01  16.716  < 2e-16
CountryAzerbaijan                      3.928e+00  4.018e-01   9.776  < 2e-16
CountryBahamas, The                    5.655e+00  5.111e-01  11.064  < 2e-16
CountryBahrain                         3.817e+00  3.872e-01   9.857  < 2e-16
CountryBangladesh                      1.113e+00  2.704e-01   4.116 3.97e-05
CountryBarbados                        9.306e+00  4.477e-01  20.786  < 2e-16
CountryBelarus                         5.817e+00  4.251e-01  13.684  < 2e-16
CountryBelgium                         8.061e+00  4.875e-01  16.535  < 2e-16
CountryBelize                          6.318e+00  4.819e-01  13.112  < 2e-16
CountryBenin                          -1.233e+00  2.092e-01  -5.895 4.22e-09
CountryBhutan                          3.151e+00  2.099e-01  15.009  < 2e-16
CountryBolivia                         3.111e+00  3.272e-01   9.508  < 2e-16
CountryBosnia and Herzegovina          5.597e+00  3.433e-01  16.302  < 2e-16
CountryBotswana                        9.806e-01  3.575e-01   2.743 0.006138
CountryBrazil                          5.612e+00  3.771e-01  14.881  < 2e-16
CountryBrunei Darussalam               3.318e+00  4.514e-01   7.351 2.61e-13
CountryBulgaria                        5.587e+00  4.049e-01  13.800  < 2e-16
CountryBurkina Faso                   -2.520e+00  2.599e-01  -9.696  < 2e-16
CountryBurundi                        -2.656e+00  2.384e-01 -11.139  < 2e-16
CountryCabo Verde                      2.101e+00  2.649e-01   7.930 3.18e-15
CountryCambodia                       -3.525e-01  2.080e-01  -1.695 0.090181
CountryCameroon                       -2.524e-01  2.322e-01  -1.087 0.277207
CountryCanada                          9.169e+00  5.386e-01  17.024  < 2e-16
CountryCentral African Republic       -1.745e+00  2.235e-01  -7.809 8.22e-15
CountryChad                           -2.455e+00  2.481e-01  -9.896  < 2e-16
CountryChile                           8.566e+00  4.497e-01  19.051  < 2e-16
CountryChina                           2.631e+00  1.757e+00   1.498 0.134357
CountryColombia                        7.318e+00  3.254e-01  22.490  < 2e-16
CountryComoros                         3.750e-01  2.101e-01   1.785 0.074382
CountryCongo, Dem. Rep.               -2.517e+00  2.157e-01 -11.669  < 2e-16
CountryCongo, Rep.                    -2.148e-01  2.250e-01  -0.955 0.339679
CountryCosta Rica                      8.398e+00  4.021e-01  20.885  < 2e-16
CountryCote d'Ivoire                  -2.638e-01  2.153e-01  -1.225 0.220520
CountryCroatia                         6.292e+00  4.384e-01  14.351  < 2e-16
CountryCuba                            7.540e+00  3.821e-01  19.733  < 2e-16
CountryCyprus                          7.248e+00  4.844e-01  14.964  < 2e-16
CountryCzechia                         7.231e+00  4.841e-01  14.937  < 2e-16
CountryDenmark                         6.661e+00  5.369e-01  12.407  < 2e-16
CountryDjibouti                       -9.279e-02  2.302e-01  -0.403 0.686894
CountryDominican Republic              5.716e+00  3.311e-01  17.263  < 2e-16
CountryEcuador                         6.901e+00  3.661e-01  18.847  < 2e-16
CountryEgypt, Arab Rep.                4.600e+00  4.525e-01  10.167  < 2e-16
CountryEl Salvador                     5.732e+00  3.694e-01  15.518  < 2e-16
CountryEquatorial Guinea              -8.483e-01  2.665e-01  -3.183 0.001474
CountryEritrea                        -7.392e-01  2.743e-01  -2.695 0.007086
CountryEstonia                         6.837e+00  4.646e-01  14.716  < 2e-16
CountryEswatini                       -3.575e+00  4.791e-01  -7.461 1.15e-13
CountryEthiopia                       -7.916e-01  2.588e-01  -3.059 0.002242
CountryFiji                            1.999e+00  4.160e-01   4.807 1.62e-06
CountryFinland                         8.029e+00  5.113e-01  15.703  < 2e-16
CountryFrance                          9.347e+00  4.545e-01  20.563  < 2e-16
CountryGabon                           6.503e-01  2.860e-01   2.274 0.023056
CountryGambia, The                    -1.818e+00  2.226e-01  -8.167 4.84e-16
CountryGeorgia                         5.508e+00  4.412e-01  12.485  < 2e-16
CountryGermany                         8.244e+00  5.281e-01  15.611  < 2e-16
CountryGhana                          -8.546e-01  2.299e-01  -3.717 0.000206
CountryGreece                          8.723e+00  4.617e-01  18.894  < 2e-16
CountryGrenada                         5.678e+00  3.737e-01  15.197  < 2e-16
CountryGuatemala                       5.264e+00  3.089e-01  17.042  < 2e-16
CountryGuinea                         -2.174e+00  2.020e-01 -10.765  < 2e-16
CountryGuinea-Bissau                  -4.292e+00  2.137e-01 -20.081  < 2e-16
CountryGuyana                          3.714e+00  3.322e-01  11.180  < 2e-16
CountryHaiti                           4.362e-01  2.434e-01   1.792 0.073194
CountryHonduras                        5.595e+00  3.168e-01  17.659  < 2e-16
CountryHungary                         6.724e+00  4.529e-01  14.847  < 2e-16
CountryIceland                         8.515e+00  5.241e-01  16.247  < 2e-16
CountryIndia                           1.201e+00  1.598e+00   0.752 0.452362
CountryIndonesia                       1.950e+00  3.573e-01   5.459 5.23e-08
CountryIran, Islamic Rep.              4.321e+00  3.410e-01  12.670  < 2e-16
CountryIraq                            4.097e+00  4.059e-01  10.095  < 2e-16
CountryIreland                         7.894e+00  5.641e-01  13.995  < 2e-16
CountryIsrael                          9.012e+00  5.201e-01  17.325  < 2e-16
CountryItaly                           8.762e+00  4.401e-01  19.909  < 2e-16
CountryJamaica                         6.719e+00  3.924e-01  17.123  < 2e-16
CountryJapan                           8.825e+00  4.345e-01  20.311  < 2e-16
CountryJordan                          5.494e+00  4.912e-01  11.183  < 2e-16
CountryKazakhstan                      5.000e+00  4.084e-01  12.241  < 2e-16
CountryKenya                          -5.759e-01  2.219e-01  -2.595 0.009510
CountryKiribati                        3.976e+00  5.072e-01   7.838 6.55e-15
CountryKuwait                          3.054e+00  5.550e-01   5.503 4.09e-08
CountryKyrgyz Republic                 4.932e+00  3.660e-01  13.476  < 2e-16
CountryLao PDR                         3.858e-01  2.161e-01   1.786 0.074271
CountryLatvia                          6.806e+00  4.412e-01  15.427  < 2e-16
CountryLebanon                         7.025e+00  4.017e-01  17.490  < 2e-16
CountryLesotho                        -2.819e+00  3.763e-01  -7.492 9.19e-14
CountryLiberia                        -4.736e-01  2.380e-01  -1.990 0.046692
CountryLibya                           4.087e+00  4.189e-01   9.756  < 2e-16
CountryLithuania                       7.145e+00  4.499e-01  15.881  < 2e-16
CountryLuxembourg                      6.303e+00  7.731e-01   8.153 5.41e-16
CountryMadagascar                      1.302e-01  2.415e-01   0.539 0.590038
CountryMalawi                         -9.598e-01  2.331e-01  -4.118 3.93e-05
CountryMalaysia                        4.649e+00  3.437e-01  13.527  < 2e-16
CountryMaldives                        3.619e+00  2.734e-01  13.237  < 2e-16
CountryMali                           -3.656e+00  2.203e-01 -16.594  < 2e-16
CountryMalta                           8.423e+00  4.639e-01  18.157  < 2e-16
CountryMauritania                      6.668e-01  2.381e-01   2.801 0.005136
CountryMauritius                       4.908e+00  3.217e-01  15.255  < 2e-16
CountryMexico                          6.821e+00  4.306e-01  15.840  < 2e-16
CountryMicronesia, Fed. Sts.           1.587e+00  4.904e-01   3.237 0.001224
CountryMoldova                         4.677e+00  4.143e-01  11.289  < 2e-16
CountryMongolia                        3.221e+00  3.435e-01   9.378  < 2e-16
CountryMontenegro                      5.990e+00  4.171e-01  14.363  < 2e-16
CountryMorocco                         3.633e+00  2.910e-01  12.485  < 2e-16
CountryMozambique                     -2.513e+00  2.132e-01 -11.784  < 2e-16
CountryMyanmar                        -9.138e-01  1.980e-01  -4.615 4.12e-06
CountryNamibia                        -6.691e-01  2.709e-01  -2.470 0.013560
CountryNepal                           6.133e-01  2.087e-01   2.938 0.003331
CountryNetherlands                     7.522e+00  5.022e-01  14.979  < 2e-16
CountryNew Zealand                     9.163e+00  5.421e-01  16.902  < 2e-16
CountryNicaragua                       5.663e+00  3.517e-01  16.102  < 2e-16
CountryNiger                          -2.616e+00  2.933e-01  -8.918  < 2e-16
CountryNigeria                        -2.359e+00  2.793e-01  -8.445  < 2e-16
CountryNorth Macedonia                 4.920e+00  3.826e-01  12.857  < 2e-16
CountryNorway                          7.456e+00  6.467e-01  11.529  < 2e-16
CountryOman                            4.609e+00  3.844e-01  11.989  < 2e-16
CountryPakistan                        1.102e+00  2.873e-01   3.835 0.000128
CountryPanama                          8.037e+00  3.956e-01  20.316  < 2e-16
CountryPapua New Guinea               -7.103e-01  2.582e-01  -2.751 0.005986
CountryParaguay                        5.427e+00  3.365e-01  16.125  < 2e-16
CountryPeru                            5.928e+00  3.504e-01  16.916  < 2e-16
CountryPhilippines                     4.326e+00  2.881e-01  15.017  < 2e-16
CountryPoland                          7.206e+00  4.317e-01  16.693  < 2e-16
CountryPortugal                        7.707e+00  3.988e-01  19.325  < 2e-16
CountryQatar                           6.504e+00  6.388e-01  10.182  < 2e-16
CountryRomania                         5.615e+00  3.965e-01  14.160  < 2e-16
CountryRussian Federation              5.453e+00  4.387e-01  12.428  < 2e-16
CountryRwanda                         -6.380e-01  2.513e-01  -2.539 0.011174
CountrySamoa                           5.919e+00  6.322e-01   9.362  < 2e-16
CountrySao Tome and Principe           1.231e+00  2.526e-01   4.873 1.16e-06
CountrySaudi Arabia                    4.200e+00  4.756e-01   8.831  < 2e-16
CountrySenegal                        -5.244e-01  2.101e-01  -2.496 0.012637
CountrySerbia                          5.262e+00  3.942e-01  13.347  < 2e-16
CountrySeychelles                      5.067e+00  3.924e-01  12.915  < 2e-16
CountrySierra Leone                   -1.847e+00  2.292e-01  -8.055 1.18e-15
CountrySingapore                       6.620e+00  4.605e-01  14.377  < 2e-16
CountrySlovak Republic                 6.088e+00  4.348e-01  14.003  < 2e-16
CountrySlovenia                        8.062e+00  4.603e-01  17.515  < 2e-16
CountrySolomon Islands                 3.879e+00  3.039e-01  12.765  < 2e-16
CountrySomalia                        -7.450e-01  2.092e-01  -3.562 0.000375
CountrySouth Africa                    3.524e+00  4.236e-01   8.318  < 2e-16
CountrySpain                           9.240e+00  4.378e-01  21.103  < 2e-16
CountrySri Lanka                       5.106e+00  3.378e-01  15.114  < 2e-16
CountrySt. Lucia                       8.232e+00  4.700e-01  17.514  < 2e-16
CountrySt. Vincent and the Grenadines  4.992e+00  3.748e-01  13.318  < 2e-16
CountrySuriname                        4.244e+00  3.458e-01  12.273  < 2e-16
CountrySweden                          8.180e+00  5.313e-01  15.397  < 2e-16
CountrySwitzerland                     7.845e+00  6.503e-01  12.064  < 2e-16
CountrySyrian Arab Republic            6.000e+00  3.935e-01  15.251  < 2e-16
CountryTajikistan                      1.318e+00  3.596e-01   3.666 0.000252
CountryTanzania                       -7.783e-01  2.252e-01  -3.456 0.000558
CountryThailand                        5.500e+00  2.812e-01  19.556  < 2e-16
CountryTimor-Leste                    -6.180e-02  2.301e-01  -0.269 0.788320
CountryTogo                           -2.133e+00  2.047e-01 -10.424  < 2e-16
CountryTonga                           4.654e+00  6.688e-01   6.959 4.29e-12
CountryTrinidad and Tobago             5.753e+00  4.265e-01  13.491  < 2e-16
CountryTunisia                         4.951e+00  3.340e-01  14.824  < 2e-16
CountryTurkiye                         5.427e+00  3.972e-01  13.662  < 2e-16
CountryTurkmenistan                    2.443e+00  3.633e-01   6.724 2.15e-11
CountryUganda                         -7.137e-01  2.593e-01  -2.753 0.005946
CountryUkraine                         5.967e+00  4.054e-01  14.717  < 2e-16
CountryUnited Arab Emirates            4.693e+00  5.600e-01   8.380  < 2e-16
CountryUnited Kingdom                  8.280e+00  5.484e-01  15.100  < 2e-16
CountryUnited States                   8.232e+00  7.246e-01  11.360  < 2e-16
CountryUruguay                         7.016e+00  4.039e-01  17.370  < 2e-16
CountryUzbekistan                      3.392e+00  3.719e-01   9.121  < 2e-16
CountryVanuatu                         2.007e+00  3.312e-01   6.059 1.56e-09
CountryVenezuela, RB                   5.717e+00  3.888e-01  14.706  < 2e-16
CountryVietnam                         4.589e+00  2.860e-01  16.042  < 2e-16
CountryYemen, Rep.                     1.723e+00  2.083e-01   8.272  < 2e-16
CountryZambia                         -1.042e+00  2.605e-01  -4.000 6.50e-05
CountryZimbabwe                        1.304e-01  2.972e-01   0.439 0.660929
RegionAsia                                    NA         NA      NA       NA
RegionCentral America and Caribbean           NA         NA      NA       NA
RegionEuropean Union                          NA         NA      NA       NA
RegionMiddle East                             NA         NA      NA       NA
RegionNorth America                           NA         NA      NA       NA
RegionOceania                                 NA         NA      NA       NA
RegionRest of Europe                          NA         NA      NA       NA
RegionSouth America                           NA         NA      NA       NA
Year                                   1.435e-01  6.719e-03  21.352  < 2e-16
Infant_deaths                         -8.570e-03  7.114e-03  -1.205 0.228456
Under_five_deaths                     -4.410e-02  3.835e-03 -11.499  < 2e-16
Adult_mortality                       -4.163e-02  6.555e-04 -63.503  < 2e-16
Alcohol_consumption                   -2.697e-02  1.197e-02  -2.253 0.024326
Hepatitis_B                            1.869e-03  1.375e-03   1.359 0.174179
Measles                                2.034e-03  1.374e-03   1.480 0.139041
BMI                                   -4.152e-01  6.403e-02  -6.484 1.06e-10
Polio                                  7.760e-04  2.736e-03   0.284 0.776735
Diphtheria                             8.859e-03  2.746e-03   3.226 0.001270
Incidents_HIV                          1.555e-01  2.458e-02   6.323 2.99e-10
GDP_per_capita                         2.896e-05  6.078e-06   4.766 1.98e-06
Population_mln                         1.954e-04  1.369e-03   0.143 0.886510
Thinness_ten_nineteen_years           -1.400e-02  7.219e-03  -1.940 0.052538
Thinness_five_nine_years              -1.285e-02  7.160e-03  -1.794 0.072870
Schooling                             -9.822e-02  2.869e-02  -3.423 0.000628
Economy_status_Developed                      NA         NA      NA       NA
                                         
(Intercept)                           ***
CountryAlbania                        ***
CountryAlgeria                        ***
CountryAngola                         ***
CountryAntigua and Barbuda            ***
CountryArgentina                      ***
CountryArmenia                        ***
CountryAustralia                      ***
CountryAustria                        ***
CountryAzerbaijan                     ***
CountryBahamas, The                   ***
CountryBahrain                        ***
CountryBangladesh                     ***
CountryBarbados                       ***
CountryBelarus                        ***
CountryBelgium                        ***
CountryBelize                         ***
CountryBenin                          ***
CountryBhutan                         ***
CountryBolivia                        ***
CountryBosnia and Herzegovina         ***
CountryBotswana                       ** 
CountryBrazil                         ***
CountryBrunei Darussalam              ***
CountryBulgaria                       ***
CountryBurkina Faso                   ***
CountryBurundi                        ***
CountryCabo Verde                     ***
CountryCambodia                       .  
CountryCameroon                          
CountryCanada                         ***
CountryCentral African Republic       ***
CountryChad                           ***
CountryChile                          ***
CountryChina                             
CountryColombia                       ***
CountryComoros                        .  
CountryCongo, Dem. Rep.               ***
CountryCongo, Rep.                       
CountryCosta Rica                     ***
CountryCote d'Ivoire                     
CountryCroatia                        ***
CountryCuba                           ***
CountryCyprus                         ***
CountryCzechia                        ***
CountryDenmark                        ***
CountryDjibouti                          
CountryDominican Republic             ***
CountryEcuador                        ***
CountryEgypt, Arab Rep.               ***
CountryEl Salvador                    ***
CountryEquatorial Guinea              ** 
CountryEritrea                        ** 
CountryEstonia                        ***
CountryEswatini                       ***
CountryEthiopia                       ** 
CountryFiji                           ***
CountryFinland                        ***
CountryFrance                         ***
CountryGabon                          *  
CountryGambia, The                    ***
CountryGeorgia                        ***
CountryGermany                        ***
CountryGhana                          ***
CountryGreece                         ***
CountryGrenada                        ***
CountryGuatemala                      ***
CountryGuinea                         ***
CountryGuinea-Bissau                  ***
CountryGuyana                         ***
CountryHaiti                          .  
CountryHonduras                       ***
CountryHungary                        ***
CountryIceland                        ***
CountryIndia                             
CountryIndonesia                      ***
CountryIran, Islamic Rep.             ***
CountryIraq                           ***
CountryIreland                        ***
CountryIsrael                         ***
CountryItaly                          ***
CountryJamaica                        ***
CountryJapan                          ***
CountryJordan                         ***
CountryKazakhstan                     ***
CountryKenya                          ** 
CountryKiribati                       ***
CountryKuwait                         ***
CountryKyrgyz Republic                ***
CountryLao PDR                        .  
CountryLatvia                         ***
CountryLebanon                        ***
CountryLesotho                        ***
CountryLiberia                        *  
CountryLibya                          ***
CountryLithuania                      ***
CountryLuxembourg                     ***
CountryMadagascar                        
CountryMalawi                         ***
CountryMalaysia                       ***
CountryMaldives                       ***
CountryMali                           ***
CountryMalta                          ***
CountryMauritania                     ** 
CountryMauritius                      ***
CountryMexico                         ***
CountryMicronesia, Fed. Sts.          ** 
CountryMoldova                        ***
CountryMongolia                       ***
CountryMontenegro                     ***
CountryMorocco                        ***
CountryMozambique                     ***
CountryMyanmar                        ***
CountryNamibia                        *  
CountryNepal                          ** 
CountryNetherlands                    ***
CountryNew Zealand                    ***
CountryNicaragua                      ***
CountryNiger                          ***
CountryNigeria                        ***
CountryNorth Macedonia                ***
CountryNorway                         ***
CountryOman                           ***
CountryPakistan                       ***
CountryPanama                         ***
CountryPapua New Guinea               ** 
CountryParaguay                       ***
CountryPeru                           ***
CountryPhilippines                    ***
CountryPoland                         ***
CountryPortugal                       ***
CountryQatar                          ***
CountryRomania                        ***
CountryRussian Federation             ***
CountryRwanda                         *  
CountrySamoa                          ***
CountrySao Tome and Principe          ***
CountrySaudi Arabia                   ***
CountrySenegal                        *  
CountrySerbia                         ***
CountrySeychelles                     ***
CountrySierra Leone                   ***
CountrySingapore                      ***
CountrySlovak Republic                ***
CountrySlovenia                       ***
CountrySolomon Islands                ***
CountrySomalia                        ***
CountrySouth Africa                   ***
CountrySpain                          ***
CountrySri Lanka                      ***
CountrySt. Lucia                      ***
CountrySt. Vincent and the Grenadines ***
CountrySuriname                       ***
CountrySweden                         ***
CountrySwitzerland                    ***
CountrySyrian Arab Republic           ***
CountryTajikistan                     ***
CountryTanzania                       ***
CountryThailand                       ***
CountryTimor-Leste                       
CountryTogo                           ***
CountryTonga                          ***
CountryTrinidad and Tobago            ***
CountryTunisia                        ***
CountryTurkiye                        ***
CountryTurkmenistan                   ***
CountryUganda                         ** 
CountryUkraine                        ***
CountryUnited Arab Emirates           ***
CountryUnited Kingdom                 ***
CountryUnited States                  ***
CountryUruguay                        ***
CountryUzbekistan                     ***
CountryVanuatu                        ***
CountryVenezuela, RB                  ***
CountryVietnam                        ***
CountryYemen, Rep.                    ***
CountryZambia                         ***
CountryZimbabwe                          
RegionAsia                               
RegionCentral America and Caribbean      
RegionEuropean Union                     
RegionMiddle East                        
RegionNorth America                      
RegionOceania                            
RegionRest of Europe                     
RegionSouth America                      
Year                                  ***
Infant_deaths                            
Under_five_deaths                     ***
Adult_mortality                       ***
Alcohol_consumption                   *  
Hepatitis_B                              
Measles                                  
BMI                                   ***
Polio                                    
Diphtheria                            ** 
Incidents_HIV                         ***
GDP_per_capita                        ***
Population_mln                           
Thinness_ten_nineteen_years           .  
Thinness_five_nine_years              .  
Schooling                             ***
Economy_status_Developed                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.486 on 2669 degrees of freedom
Multiple R-squared:  0.9975,	Adjusted R-squared:  0.9973 
F-statistic:  5513 on 194 and 2669 DF,  p-value: < 2.2e-16
A anova: 18 × 5
                               Df       Sum Sq      Mean Sq      F value       Pr(>F)
                            <int>        <dbl>        <dbl>        <dbl>        <dbl>
Country                       178 2.409342e+05 1.353563e+03 5.730201e+03 0.000000e+00
Year                            1 7.699873e+03 7.699873e+03 3.259680e+04 0.000000e+00
Infant_deaths                   1 2.070251e+03 2.070251e+03 8.764246e+03 0.000000e+00
Under_five_deaths               1 4.835294e+02 4.835294e+02 2.046983e+03 0.000000e+00
Adult_mortality                 1 1.408345e+03 1.408345e+03 5.962118e+03 0.000000e+00
Alcohol_consumption             1 1.714765e+00 1.714765e+00 7.259320e+00 7.097574e-03
Hepatitis_B                     1 5.448582e+00 5.448582e+00 2.306614e+01 1.651718e-06
Measles                         1 7.636172e-01 7.636172e-01 3.232713e+00 7.229398e-02
BMI                             1 1.343557e+01 1.343557e+01 5.687844e+01 6.316136e-14
Polio                           1 4.908968e+00 4.908968e+00 2.078173e+01 5.378995e-06
Diphtheria                      1 3.135116e+00 3.135116e+00 1.327227e+01 2.744917e-04
Incidents_HIV                   1 8.630708e+00 8.630708e+00 3.653742e+01 1.706632e-09
GDP_per_capita                  1 4.760226e+00 4.760226e+00 2.015204e+01 7.455780e-06
Population_mln                  1 7.435504e-02 7.435504e-02 3.147762e-01 5.748111e-01
Thinness_ten_nineteen_years     1 3.637577e+00 3.637577e+00 1.539940e+01 8.921391e-05
Thinness_five_nine_years        1 7.228397e-01 7.228397e-01 3.060085e+00 8.035240e-02
Schooling                       1 2.768456e+00 2.768456e+00 1.172004e+01 6.276284e-04
Residuals                    2669 6.304593e+02 2.362156e-01           NA           NA
A matrix: 18 × 18 of type dbl
(rows and columns in the same order; values are whitespace-separated)
Columns: Year Infant_deaths Under_five_deaths Adult_mortality Alcohol_consumption Hepatitis_B Measles BMI Polio Diphtheria Incidents_HIV GDP_per_capita Population_mln Thinness_ten_nineteen_years Thinness_five_nine_years Schooling Economy_status_Developed Life_expectancy
Year 1.0000000000 -0.17240170 -0.176392621 -0.15865958 -0.0006105222 0.17682407 0.08594472 0.1614225 0.13985840 0.14514293 -0.08174257 0.04099817 0.015157618 -0.04490053 -0.04803775 0.15053937 0.00000000 0.17435894
Infant_deaths -0.1724016970 1.00000000 0.985651346 0.79466086 -0.4545261472 -0.51256224 -0.52628201 -0.6619883 -0.74079046 -0.72187465 0.34945826 -0.51228611 0.007621990 0.49119174 0.47763934 -0.78851253 -0.47586620 -0.92003192
Under_five_deaths -0.1763926213 0.98565135 1.000000000 0.80236112 -0.4093673971 -0.50742741 -0.51297174 -0.6652550 -0.74298347 -0.72535503 0.36961773 -0.46968167 -0.005234231 0.46697846 0.45075570 -0.77319598 -0.42713418 -0.92041913
Adult_mortality -0.1586595781 0.79466086 0.802361123 1.00000000 -0.2447937555 -0.34488221 -0.41615254 -0.5228655 -0.52422554 -0.51380270 0.69911938 -0.51012141 -0.053847680 0.38214030 0.37979229 -0.58103548 -0.42937477 -0.94536036
Alcohol_consumption -0.0006105222 -0.45452615 -0.409367397 -0.24479376 1.0000000000 0.16843582 0.31860293 0.2840319 0.30192623 0.29901592 -0.03411801 0.44396595 -0.039118659 -0.44636618 -0.43302972 0.61572804 0.67036609 0.39915911
Hepatitis_B 0.1768240714 -0.51256224 -0.507427407 -0.34488221 0.1684358238 1.00000000 0.42916779 0.3454209 0.72434526 0.76178009 -0.07578195 0.15937504 -0.082396398 -0.20845350 -0.21379442 0.34764345 0.11353405 0.41780443
Measles 0.0859447214 -0.52628201 -0.512971742 -0.41615254 0.3186029309 0.42916779 1.00000000 0.4163214 0.51409629 0.49405877 -0.15058000 0.31372372 -0.098221891 -0.34070533 -0.36696995 0.49839128 0.29869329 0.49001859
BMI 0.1614224541 -0.66198827 -0.665255042 -0.52286551 0.2840319455 0.34542091 0.41632141 1.0000000 0.45720604 0.42650090 -0.16114208 0.33617960 -0.166482004 -0.59648328 -0.59911219 0.63547517 0.24328705 0.59842332
Polio 0.1398583960 -0.74079046 -0.742983474 -0.52422554 0.3019262324 0.72434526 0.51409629 0.4572060 1.00000000 0.95317790 -0.14795220 0.31378567 -0.033485888 -0.31268545 -0.30699811 0.55276511 0.28326012 0.64121746
Diphtheria 0.1451429275 -0.72187465 -0.725355032 -0.51380270 0.2990159210 0.76178009 0.49405877 0.4265009 0.95317790 1.00000000 -0.14693191 0.31332094 -0.027335977 -0.30446625 -0.29559745 0.53562097 0.28941718 0.62754139
Incidents_HIV -0.0817425731 0.34945826 0.369617726 0.69911938 -0.0341180147 -0.07578195 -0.15058000 -0.1611421 -0.14795220 -0.14693191 1.00000000 -0.16958972 -0.058039708 0.18876454 0.19384734 -0.20124620 -0.17563524 -0.55302746
GDP_per_capita 0.0409981721 -0.51228611 -0.469681668 -0.51012141 0.4439659537 0.15937504 0.31372372 0.3361796 0.31378567 0.31332094 -0.16958972 1.00000000 -0.040838867 -0.37526974 -0.38103211 0.58062592 0.66754691 0.58308972
Population_mln 0.0151576184 0.00762199 -0.005234231 -0.05384768 -0.0391186595 -0.08239640 -0.09822189 -0.1664820 -0.03348589 -0.02733598 -0.05803971 -0.04083887 1.000000000 0.25632201 0.25848584 -0.03356182 -0.03530183 0.02629788
Thinness_ten_nineteen_years -0.0449005325 0.49119174 0.466978458 0.38214030 -0.4463661789 -0.20845350 -0.34070533 -0.5964833 -0.31268545 -0.30446625 0.18876454 -0.37526974 0.256322009 1.00000000 0.93875710 -0.57148516 -0.41609766 -0.46782450
Thinness_five_nine_years -0.0480377469 0.47763934 0.450755699 0.37979229 -0.4330297154 -0.21379442 -0.36696995 -0.5991122 -0.30699811 -0.29559745 0.19384734 -0.38103211 0.258485836 0.93875710 1.00000000 -0.55137635 -0.41486734 -0.45816623
Schooling 0.1505393684 -0.78851253 -0.773195983 -0.58103548 0.6157280403 0.34764345 0.49839128 0.6354752 0.55276511 0.53562097 -0.20124620 0.58062592 -0.033561816 -0.57148516 -0.55137635 1.00000000 0.59943940 0.73248447
Economy_status_Developed 0.0000000000 -0.47586620 -0.427134181 -0.42937477 0.6703660889 0.11353405 0.29869329 0.2432870 0.28326012 0.28941718 -0.17563524 0.66754691 -0.035301833 -0.41609766 -0.41486734 0.59943940 1.00000000 0.52379098
Life_expectancy 0.1743589433 -0.92003192 -0.920419134 -0.94536036 0.3991591076 0.41780443 0.49001859 0.5984233 0.64121746 0.62754139 -0.55302746 0.58308972 0.026297880 -0.46782450 -0.45816623 0.73248447 0.52379098 1.00000000

Then, to narrow down the reduced model, I had to decide how much correlation between the variables was too much. Since pair plots of 20 variables would not be readable, I instead removed Country and Year first. Year is just the time the data was collected and should have little real predictive power on another dataset, where values from outside the date range could break the model. Then I removed Country. Country-level granularity would have been useful for a long-term, in-depth project, but for a project of a couple of weeks Region conveys much the same information, especially when coupled with the other data collected by WHO. Each row is already a set of measurements tied to a country, so the Country label itself is redundant.

The heatmap was confusing, but there are a lot of solid colors within the graph, so there must be strong correlations between the variables. This is where I took a more sweeping approach and hard-coded anything above a 0.7 correlation as too collinear for my project scope.

  1. 2
  2. 3
  3. 9
  4. 10
  5. 14
  1. 'Infant_deaths'
  2. 'Under_five_deaths'
  3. 'Polio'
  4. 'Diphtheria'
  5. 'Thinness_ten_nineteen_years'

Given that these features had correlations above 0.7 with other variables, the removals make sense. Infant_deaths directly affects the number of deaths under five. If a country was not well vaccinated against polio, an almost extinct disease, it most likely would not have preventative care available for diphtheria either, and this information is better reflected by a more common vaccine such as Hepatitis B. Thinness from 10 to 19 would probably be explained by BMI outright or by the continuation of thinness in the younger population. Starvation in the younger population carries a higher incidence of long-term health effects, as malnutrition during development can affect hormones through puberty, brain growth, learning, lifestyle prioritization in Maslow's pyramid, and potential disability, all of which would stunt life expectancy (NCBI, 3).
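As a rough illustration of the 0.7 cutoff, here is a minimal sketch in base R. It runs on R's built-in `mtcars` data rather than the life expectancy data, and `drop_high_cor` is a hypothetical helper written for this example, not necessarily the function the notebook used. It repeatedly finds the most correlated pair and drops the member that is, on average, more correlated with everything else.

```r
# Hypothetical helper: greedily drop one variable from each pair whose
# absolute correlation exceeds `cutoff`.
drop_high_cor <- function(df, cutoff = 0.7) {
  cm <- abs(cor(df))
  diag(cm) <- 0                      # ignore self-correlations
  dropped <- character(0)
  while (max(cm) > cutoff) {
    # locate the most correlated remaining pair
    idx  <- which(cm == max(cm), arr.ind = TRUE)[1, ]
    pair <- rownames(cm)[idx]
    # drop whichever member is more correlated with the rest on average
    worst   <- pair[which.max(rowMeans(cm[pair, , drop = FALSE]))]
    dropped <- c(dropped, worst)
    keep    <- setdiff(rownames(cm), worst)
    cm      <- cm[keep, keep, drop = FALSE]
  }
  dropped
}

# Stand-in data: every mtcars column except the response `mpg`
predictors <- mtcars[, setdiff(names(mtcars), "mpg")]
to_drop    <- drop_high_cor(predictors, cutoff = 0.7)
reduced    <- predictors[, setdiff(names(predictors), to_drop), drop = FALSE]
print(to_drop)
```

Which member of a pair gets dropped is a design choice; a different tie-breaking rule can leave a different (equally valid) set of survivors.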

Call:
lm(formula = Life_expectancy ~ ., data = life_nocor)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.0277 -1.2763  0.0598  1.2788  9.2552 

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)    
(Intercept)               3.911e+01  1.682e+01   2.325   0.0202 *  
Year                      1.736e-02  8.401e-03   2.066   0.0389 *  
Adult_mortality          -7.052e-02  7.155e-04 -98.560   <2e-16 ***
Alcohol_consumption       1.416e-01  1.437e-02   9.855   <2e-16 ***
Hepatitis_B               2.685e-02  2.773e-03   9.683   <2e-16 ***
Measles                   4.836e-03  2.492e-03   1.940   0.0525 .  
BMI                       2.841e-02  2.752e-02   1.032   0.3020    
Incidents_HIV             3.838e-01  2.562e-02  14.980   <2e-16 ***
GDP_per_capita            7.853e-06  3.303e-06   2.378   0.0175 *  
Population_mln            4.591e-05  2.937e-04   0.156   0.8758    
Thinness_five_nine_years -3.387e-03  1.170e-02  -0.289   0.7723    
Schooling                 5.220e-01  2.214e-02  23.573   <2e-16 ***
Economy_status_Developed  1.272e-01  1.576e-01   0.807   0.4199    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.006 on 2851 degrees of freedom
Multiple R-squared:  0.9547,	Adjusted R-squared:  0.9545 
F-statistic:  5009 on 12 and 2851 DF,  p-value: < 2.2e-16
A matrix: 13 × 2 of type dbl
                                 0.5 %        99.5 %
(Intercept)              -4.252978e+00  8.247596e+01
Year                     -4.298335e-03  3.901067e-02
Adult_mortality          -7.236575e-02 -6.867719e-02
Alcohol_consumption       1.045465e-01  1.786086e-01
Hepatitis_B               1.970273e-02  3.399666e-02
Measles                  -1.588618e-03  1.125969e-02
BMI                      -4.252833e-02  9.934511e-02
Incidents_HIV             3.177369e-01  4.498011e-01
GDP_per_capita           -6.591380e-07  1.636592e-05
Population_mln           -7.111973e-04  8.030276e-04
Thinness_five_nine_years -3.355376e-02  2.677985e-02
Schooling                 4.649087e-01  5.790576e-01
Economy_status_Developed -2.790990e-01  5.334208e-01
1.91121470498489
Year                      1.0678269348341
Adult_mortality           4.81133232608706
Alcohol_consumption       2.3292571693254
Hepatitis_B               1.40001787949871
Measles                   1.53934169307944
BMI                       2.59460750860656
Incidents_HIV             2.64889458720439
GDP_per_capita            2.22623310236142
Population_mln            1.14390513757467
Thinness_five_nine_years  1.99632484687208
Schooling                 3.51011901647122
Economy_status_Developed  2.90026350069789
  1. 'Adult_mortality'
  2. 'Alcohol_consumption'
  3. 'Hepatitis_B'
  4. 'Measles'
  5. 'BMI'
  6. 'Incidents_HIV'
  7. 'GDP_per_capita'
  8. 'Population_mln'
  9. 'Thinness_five_nine_years'
  10. 'Schooling'
  11. 'Economy_status_Developed'
  12. 'Life_expectancy'

Then I ran a Variance Inflation Factor (VIF) check to make sure I was removing the appropriate variables. After removing the highly correlated variables in the previous step I could actually run the VIF, and found that the majority of the values fell below the removal threshold of 10. The exception was Region, which had high multicollinearity with the other variables at 40.608.

So although Region seemed like a strong variable, much of the other information already drew the same lines and borders in the data, which made Region redundant, and I removed it as well. Nothing would be gained from knowing the Country was Malaysia and the Region was Asia if we already know all of the health outcomes and measures for the data point.
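For reference, the VIF of a numeric predictor is 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all of the others. The sketch below computes it by hand in base R on the built-in `mtcars` data; `vif_manual` is a hypothetical helper for illustration, and the chosen columns are stand-ins. For a factor such as Region, packages typically report a generalized VIF instead, which this sketch does not cover.

```r
# Hypothetical helper: VIF_j = 1 / (1 - R_j^2) for each numeric column,
# where R_j^2 is from regressing column j on all of the other columns.
vif_manual <- function(df) {
  sapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    r2 <- summary(lm(reformulate(others, response = v), data = df))$r.squared
    1 / (1 - r2)
  })
}

# Stand-in predictors from mtcars
X    <- mtcars[, c("wt", "hp", "disp", "qsec", "drat")]
vifs <- vif_manual(X)
print(round(vifs, 2))

# Candidates for removal under the threshold of 10 used in the text
print(names(vifs)[vifs > 10])
```

The same numbers fall out of the diagonal of the inverted correlation matrix, `diag(solve(cor(X)))`, which is a handy cross-check.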


I used a Q-Q plot to check the residuals. The tail leaves the fit line a bit at the bottom and holds steady the rest of the way to the top. This could reflect a bit of the left skew that showed up in the first exploratory pass, as Life_expectancy throughout the world should be around 70-75.

To be sure about the fit of the data, I plotted the studentized residuals against the fitted values and found a good distribution of values about the origin. There were a few values in the extremes of +/- 4 standard deviations for the studentized residuals, so I ran a cursory check and found 34 values outside the +/- 3 standard deviation range. With almost 3,000 data points, that is only about 1% of the observations, so I believe the model fits the data well. But to be sure, I computed the hat matrix to pull the leverage values and check for any overpowering outliers. Hat values closer to 1 indicate high leverage.

Then I ran Cook's distance on the values to be sure the high-leverage points fall below the 10th and 20th percentile cutoffs and are not over-leveraging the dataset. The Cook's distances came out alright, and no values were outliers with too much influence on my model.
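The three checks described above can be sketched with base R's `stats` functions. This is only an illustration on the built-in `mtcars` data; the `2 * p / n` leverage cutoff and the 50% Cook's distance cutoff are common rules of thumb assumed here, not thresholds taken from the notebook.

```r
# Stand-in model on mtcars (the notebook's model uses the life data)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
n <- nobs(fit)
p <- length(coef(fit))          # number of estimated coefficients

r_stud <- rstudent(fit)         # externally studentized residuals
h      <- hatvalues(fit)        # leverage: diagonal of the hat matrix
d      <- cooks.distance(fit)   # Cook's distance per observation

# Observations more than 3 standard deviations out
outliers <- which(abs(r_stud) > 3)

# Assumed rule of thumb: leverage above twice the average hat value
high_leverage <- which(h > 2 * p / n)

# Percentile of each Cook's distance in the F(p, n - p) distribution;
# values under roughly 10-20% are usually read as little influence,
# values near or above 50% as a major influence.
cook_pct    <- pf(d, p, n - p)
influential <- which(cook_pct > 0.5)

print(outliers)
print(high_leverage)
print(influential)
```

`which()` on a named vector returns both the row name and the position, which is why such output often shows each index twice.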

  61  168  172  262  297  342  384  392  550  562  620  682  723  849  932  939
 940  996 1135 1176 1470 1546 1827 1971 1979 2136 2194 2203 2254 2416 2516 2556
2590 2600 2749 2828 2830 2860
   3   33   41   61   85   94  122  141  172  173  187  220  238  279  296  391 
   3   33   41   61   85   94  122  141  172  173  187  220  238  279  296  391 
 403  405  422  485  528  565  580  610  611  636  640  642  650  682  683  690 
 403  405  422  485  528  565  580  610  611  636  640  642  650  682  683  690 
 692  761  767  771  794  805  819  829  849  858  866  889  929  934  939  974 
 692  761  767  771  794  805  819  829  849  858  866  889  929  934  939  974 
 978 1015 1019 1048 1055 1073 1074 1093 1129 1140 1217 1219 1223 1237 1269 1276 
 978 1015 1019 1048 1055 1073 1074 1093 1129 1140 1217 1219 1223 1237 1269 1276 
1281 1286 1375 1394 1401 1403 1407 1428 1440 1445 1449 1463 1506 1518 1520 1567 
1281 1286 1375 1394 1401 1403 1407 1428 1440 1445 1449 1463 1506 1518 1520 1567 
1573 1576 1585 1592 1614 1624 1661 1687 1740 1756 1767 1786 1796 1799 1806 1813 
1573 1576 1585 1592 1614 1624 1661 1687 1740 1756 1767 1786 1796 1799 1806 1813 
1842 1880 1887 1906 1918 1939 1951 1961 1967 1968 1986 2025 2044 2059 2081 2084 
1842 1880 1887 1906 1918 1939 1951 1961 1967 1968 1986 2025 2044 2059 2081 2084 
2087 2119 2129 2160 2186 2203 2204 2257 2260 2263 2276 2282 2295 2298 2310 2318 
2087 2119 2129 2160 2186 2203 2204 2257 2260 2263 2276 2282 2295 2298 2310 2318 
2325 2371 2378 2411 2427 2460 2466 2484 2516 2521 2527 2529 2560 2586 2625 2660 
2325 2371 2378 2411 2427 2460 2466 2484 2516 2521 2527 2529 2560 2586 2625 2660 
2679 2680 2745 2749 2761 2788 2801 2819 2862 
2679 2680 2745 2749 2761 2788 2801 2819 2862 

Now that I finally had the model trimmed and checked to be viable, I was able to use the all-powerful regsubsets function (from the leaps package) to rank my variables by how much they contribute. To this point I had ruthlessly removed 12 Xi's to limit complexity while keeping the highest R^2 available. We moved from 0.998 to 0.962 in the reduced model, so how much further can we go? So far it seems like the model is suffering from overfitting to the dataset.
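To show what an exhaustive subset search does under the hood, here is a small base R sketch on the built-in `mtcars` data. `best_subsets` is a hypothetical helper written for this example and is not the regsubsets implementation: for each model size it fits every combination of predictors and keeps the one with the lowest residual sum of squares.

```r
# Hypothetical helper: brute-force best-subset search by RSS.
best_subsets <- function(response, predictors, data,
                         max_size = length(predictors)) {
  rows <- lapply(seq_len(max_size), function(k) {
    # every combination of k predictors
    combos <- combn(predictors, k, simplify = FALSE)
    rss <- sapply(combos, function(vars) {
      deviance(lm(reformulate(vars, response = response), data = data))
    })
    best <- combos[[which.min(rss)]]
    fit  <- lm(reformulate(best, response = response), data = data)
    data.frame(size      = k,
               variables = paste(best, collapse = " + "),
               adj_r2    = summary(fit)$adj.r.squared,
               bic       = BIC(fit))
  })
  do.call(rbind, rows)
}

# Stand-in predictors from mtcars
preds  <- c("wt", "hp", "disp", "qsec", "drat")
result <- best_subsets("mpg", preds, mtcars)
print(result)
```

Brute force is fine for a handful of predictors, but the number of models doubles with every added variable, which is why dedicated packages use smarter search strategies.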

Subset selection object
Call: regsubsets.formula(Life_expectancy ~ Adult_mortality + Alcohol_consumption + 
    Hepatitis_B + Measles + BMI + Incidents_HIV + GDP_per_capita + 
    Population_mln + Thinness_five_nine_years + Schooling + Economy_status_Developed, 
    data = life_nocor)
11 Variables  (and intercept)
                         Forced in Forced out
Adult_mortality              FALSE      FALSE
Alcohol_consumption          FALSE      FALSE
Hepatitis_B                  FALSE      FALSE
Measles                      FALSE      FALSE
BMI                          FALSE      FALSE
Incidents_HIV                FALSE      FALSE
GDP_per_capita               FALSE      FALSE
Population_mln               FALSE      FALSE
Thinness_five_nine_years     FALSE      FALSE
Schooling                    FALSE      FALSE
Economy_status_Developed     FALSE      FALSE
1 subsets of each size up to 8
Selection Algorithm: exhaustive
         Adult_mortality Alcohol_consumption Hepatitis_B Measles BMI
1  ( 1 ) "*"             " "                 " "         " "     " "
2  ( 1 ) "*"             " "                 " "         " "     " "
3  ( 1 ) "*"             " "                 " "         " "     " "
4  ( 1 ) "*"             "*"                 " "         " "     " "
5  ( 1 ) "*"             "*"                 "*"         " "     " "
6  ( 1 ) "*"             "*"                 "*"         " "     " "
7  ( 1 ) "*"             "*"                 "*"         "*"     " "
8  ( 1 ) "*"             "*"                 "*"         "*"     "*"
         Incidents_HIV GDP_per_capita Population_mln Thinness_five_nine_years
1  ( 1 ) " "           " "            " "            " "                     
2  ( 1 ) " "           " "            " "            " "                     
3  ( 1 ) "*"           " "            " "            " "                     
4  ( 1 ) "*"           " "            " "            " "                     
5  ( 1 ) "*"           " "            " "            " "                     
6  ( 1 ) "*"           "*"            " "            " "                     
7  ( 1 ) "*"           "*"            " "            " "                     
8  ( 1 ) "*"           "*"            " "            " "                     
         Schooling Economy_status_Developed
1  ( 1 ) " "       " "                     
2  ( 1 ) "*"       " "                     
3  ( 1 ) "*"       " "                     
4  ( 1 ) "*"       " "                     
5  ( 1 ) "*"       " "                     
6  ( 1 ) "*"       " "                     
7  ( 1 ) "*"       " "                     
8  ( 1 ) "*"       " "                     
  1. 'which'
  2. 'rsq'
  3. 'rss'
  4. 'adjr2'
  5. 'cp'
  6. 'bic'
  7. 'outmat'
  8. 'obj'
Mallow's Cp: 4.040904 
PRESS: 11584.94 
AIC: 12129.17 
BIC: 12212.6 
R-squared: 0.9547192 

The regsubsets had spoken and printed the 8 covariates in the appropriate order. But before I reorganized the function, I ran optimizations using Mallows' Cp, AIC, BIC, R^2, and RSS to find how many variables I really needed. As seen above in the results, the two main variables were simply Adult_mortality and Schooling. But the other factors pulled their weight as well!
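For anyone wanting to reproduce these criteria by hand, here is a minimal base R sketch on the built-in `mtcars` data. `model_criteria` is a hypothetical helper for illustration, and it assumes the usual textbook definitions: Mallows' Cp uses the full model's mean squared error, and PRESS uses the leave-one-out shortcut based on the hat values.

```r
# Hypothetical helper: selection criteria for a candidate model `fit`,
# judged against the largest model under consideration, `full`.
model_criteria <- function(fit, full) {
  n <- nobs(fit)
  p <- length(coef(fit))               # parameters, including the intercept
  mse_full <- summary(full)$sigma^2    # MSE of the full model
  c(cp    = deviance(fit) / mse_full - (n - 2 * p),             # Mallows' Cp
    press = sum((residuals(fit) / (1 - hatvalues(fit)))^2),     # PRESS
    aic   = AIC(fit),
    bic   = BIC(fit),
    r2    = summary(fit)$r.squared)
}

# Stand-in models on mtcars
full <- lm(mpg ~ wt + hp + disp + qsec + drat, data = mtcars)
sub  <- lm(mpg ~ wt + hp, data = mtcars)

print(round(model_criteria(sub, full), 3))
print(round(model_criteria(full, full), 3))
```

A Cp close to the number of parameters suggests little bias from the dropped variables, and by construction the full model's Cp equals its own parameter count.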


The functions told me that Adult_mortality is doing the bulk of the work, but if I added 4 other covariates from the list I would retain the majority of the predictive power of the original full model. So, I reorganized the variables in the order recommended by regsubsets above and came back with the complete final model.

Call:
lm(formula = Life_expectancy ~ Adult_mortality + Schooling + 
    Incidents_HIV + Alcohol_consumption + Hepatitis_B, data = life)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.1193 -1.2899  0.0812  1.2435  9.4031 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)         74.9456604  0.3089373  242.59   <2e-16 ***
Adult_mortality     -0.0717077  0.0006147 -116.66   <2e-16 ***
Schooling            0.5578440  0.0191709   29.10   <2e-16 ***
Incidents_HIV        0.4071339  0.0240272   16.95   <2e-16 ***
Alcohol_consumption  0.1522213  0.0121663   12.51   <2e-16 ***
Hepatitis_B          0.0277708  0.0025997   10.68   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.01 on 2858 degrees of freedom
Multiple R-squared:  0.9544,	Adjusted R-squared:  0.9544 
F-statistic: 1.197e+04 on 5 and 2858 DF,  p-value: < 2.2e-16
Call:
lm(formula = Life_expectancy ~ Country + Region + Year + Infant_deaths + 
    Under_five_deaths + Adult_mortality + Alcohol_consumption + 
    Hepatitis_B + Measles + BMI + Polio + Diphtheria + Incidents_HIV + 
    GDP_per_capita + Population_mln + Thinness_ten_nineteen_years + 
    Thinness_five_nine_years + Schooling + Economy_status_Developed, 
    data = life)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.0594 -0.2239 -0.0128  0.2094  5.2254 

Coefficients: (9 not defined because of singularities)
                                        Estimate Std. Error t value Pr(>|t|)
(Intercept)                           -2.030e+02  1.248e+01 -16.262  < 2e-16
CountryAlbania                         5.694e+00  3.867e-01  14.724  < 2e-16
CountryAlgeria                         4.940e+00  3.054e-01  16.177  < 2e-16
CountryAngola                         -3.614e+00  2.175e-01 -16.616  < 2e-16
CountryAntigua and Barbuda             7.061e+00  3.875e-01  18.222  < 2e-16
CountryArgentina                       6.694e+00  4.236e-01  15.805  < 2e-16
CountryArmenia                         5.511e+00  4.080e-01  13.507  < 2e-16
CountryAustralia                       8.968e+00  5.758e-01  15.575  < 2e-16
CountryAustria                         8.062e+00  4.823e-01  16.716  < 2e-16
CountryAzerbaijan                      3.928e+00  4.018e-01   9.776  < 2e-16
CountryBahamas, The                    5.655e+00  5.111e-01  11.064  < 2e-16
CountryBahrain                         3.817e+00  3.872e-01   9.857  < 2e-16
CountryBangladesh                      1.113e+00  2.704e-01   4.116 3.97e-05
CountryBarbados                        9.306e+00  4.477e-01  20.786  < 2e-16
CountryBelarus                         5.817e+00  4.251e-01  13.684  < 2e-16
CountryBelgium                         8.061e+00  4.875e-01  16.535  < 2e-16
CountryBelize                          6.318e+00  4.819e-01  13.112  < 2e-16
CountryBenin                          -1.233e+00  2.092e-01  -5.895 4.22e-09
CountryBhutan                          3.151e+00  2.099e-01  15.009  < 2e-16
CountryBolivia                         3.111e+00  3.272e-01   9.508  < 2e-16
CountryBosnia and Herzegovina          5.597e+00  3.433e-01  16.302  < 2e-16
CountryBotswana                        9.806e-01  3.575e-01   2.743 0.006138
CountryBrazil                          5.612e+00  3.771e-01  14.881  < 2e-16
CountryBrunei Darussalam               3.318e+00  4.514e-01   7.351 2.61e-13
CountryBulgaria                        5.587e+00  4.049e-01  13.800  < 2e-16
CountryBurkina Faso                   -2.520e+00  2.599e-01  -9.696  < 2e-16
CountryBurundi                        -2.656e+00  2.384e-01 -11.139  < 2e-16
CountryCabo Verde                      2.101e+00  2.649e-01   7.930 3.18e-15
CountryCambodia                       -3.525e-01  2.080e-01  -1.695 0.090181
CountryCameroon                       -2.524e-01  2.322e-01  -1.087 0.277207
CountryCanada                          9.169e+00  5.386e-01  17.024  < 2e-16
CountryCentral African Republic       -1.745e+00  2.235e-01  -7.809 8.22e-15
CountryChad                           -2.455e+00  2.481e-01  -9.896  < 2e-16
CountryChile                           8.566e+00  4.497e-01  19.051  < 2e-16
CountryChina                           2.631e+00  1.757e+00   1.498 0.134357
CountryColombia                        7.318e+00  3.254e-01  22.490  < 2e-16
CountryComoros                         3.750e-01  2.101e-01   1.785 0.074382
CountryCongo, Dem. Rep.               -2.517e+00  2.157e-01 -11.669  < 2e-16
CountryCongo, Rep.                    -2.148e-01  2.250e-01  -0.955 0.339679
CountryCosta Rica                      8.398e+00  4.021e-01  20.885  < 2e-16
CountryCote d'Ivoire                  -2.638e-01  2.153e-01  -1.225 0.220520
CountryCroatia                         6.292e+00  4.384e-01  14.351  < 2e-16
CountryCuba                            7.540e+00  3.821e-01  19.733  < 2e-16
CountryCyprus                          7.248e+00  4.844e-01  14.964  < 2e-16
CountryCzechia                         7.231e+00  4.841e-01  14.937  < 2e-16
CountryDenmark                         6.661e+00  5.369e-01  12.407  < 2e-16
CountryDjibouti                       -9.279e-02  2.302e-01  -0.403 0.686894
CountryDominican Republic              5.716e+00  3.311e-01  17.263  < 2e-16
CountryEcuador                         6.901e+00  3.661e-01  18.847  < 2e-16
CountryEgypt, Arab Rep.                4.600e+00  4.525e-01  10.167  < 2e-16
CountryEl Salvador                     5.732e+00  3.694e-01  15.518  < 2e-16
CountryEquatorial Guinea              -8.483e-01  2.665e-01  -3.183 0.001474
CountryEritrea                        -7.392e-01  2.743e-01  -2.695 0.007086
CountryEstonia                         6.837e+00  4.646e-01  14.716  < 2e-16
CountryEswatini                       -3.575e+00  4.791e-01  -7.461 1.15e-13
CountryEthiopia                       -7.916e-01  2.588e-01  -3.059 0.002242
CountryFiji                            1.999e+00  4.160e-01   4.807 1.62e-06
CountryFinland                         8.029e+00  5.113e-01  15.703  < 2e-16
CountryFrance                          9.347e+00  4.545e-01  20.563  < 2e-16
CountryGabon                           6.503e-01  2.860e-01   2.274 0.023056
CountryGambia, The                    -1.818e+00  2.226e-01  -8.167 4.84e-16
CountryGeorgia                         5.508e+00  4.412e-01  12.485  < 2e-16
CountryGermany                         8.244e+00  5.281e-01  15.611  < 2e-16
CountryGhana                          -8.546e-01  2.299e-01  -3.717 0.000206
CountryGreece                          8.723e+00  4.617e-01  18.894  < 2e-16
CountryGrenada                         5.678e+00  3.737e-01  15.197  < 2e-16
CountryGuatemala                       5.264e+00  3.089e-01  17.042  < 2e-16
CountryGuinea                         -2.174e+00  2.020e-01 -10.765  < 2e-16
CountryGuinea-Bissau                  -4.292e+00  2.137e-01 -20.081  < 2e-16
CountryGuyana                          3.714e+00  3.322e-01  11.180  < 2e-16
CountryHaiti                           4.362e-01  2.434e-01   1.792 0.073194
CountryHonduras                        5.595e+00  3.168e-01  17.659  < 2e-16
CountryHungary                         6.724e+00  4.529e-01  14.847  < 2e-16
CountryIceland                         8.515e+00  5.241e-01  16.247  < 2e-16
CountryIndia                           1.201e+00  1.598e+00   0.752 0.452362
CountryIndonesia                       1.950e+00  3.573e-01   5.459 5.23e-08
CountryIran, Islamic Rep.              4.321e+00  3.410e-01  12.670  < 2e-16
CountryIraq                            4.097e+00  4.059e-01  10.095  < 2e-16
CountryIreland                         7.894e+00  5.641e-01  13.995  < 2e-16
CountryIsrael                          9.012e+00  5.201e-01  17.325  < 2e-16
CountryItaly                           8.762e+00  4.401e-01  19.909  < 2e-16
CountryJamaica                         6.719e+00  3.924e-01  17.123  < 2e-16
CountryJapan                           8.825e+00  4.345e-01  20.311  < 2e-16
CountryJordan                          5.494e+00  4.912e-01  11.183  < 2e-16
CountryKazakhstan                      5.000e+00  4.084e-01  12.241  < 2e-16
CountryKenya                          -5.759e-01  2.219e-01  -2.595 0.009510
CountryKiribati                        3.976e+00  5.072e-01   7.838 6.55e-15
CountryKuwait                          3.054e+00  5.550e-01   5.503 4.09e-08
CountryKyrgyz Republic                 4.932e+00  3.660e-01  13.476  < 2e-16
CountryLao PDR                         3.858e-01  2.161e-01   1.786 0.074271
CountryLatvia                          6.806e+00  4.412e-01  15.427  < 2e-16
CountryLebanon                         7.025e+00  4.017e-01  17.490  < 2e-16
CountryLesotho                        -2.819e+00  3.763e-01  -7.492 9.19e-14
CountryLiberia                        -4.736e-01  2.380e-01  -1.990 0.046692
CountryLibya                           4.087e+00  4.189e-01   9.756  < 2e-16
CountryLithuania                       7.145e+00  4.499e-01  15.881  < 2e-16
CountryLuxembourg                      6.303e+00  7.731e-01   8.153 5.41e-16
CountryMadagascar                      1.302e-01  2.415e-01   0.539 0.590038
CountryMalawi                         -9.598e-01  2.331e-01  -4.118 3.93e-05
CountryMalaysia                        4.649e+00  3.437e-01  13.527  < 2e-16
CountryMaldives                        3.619e+00  2.734e-01  13.237  < 2e-16
CountryMali                           -3.656e+00  2.203e-01 -16.594  < 2e-16
CountryMalta                           8.423e+00  4.639e-01  18.157  < 2e-16
CountryMauritania                      6.668e-01  2.381e-01   2.801 0.005136
CountryMauritius                       4.908e+00  3.217e-01  15.255  < 2e-16
CountryMexico                          6.821e+00  4.306e-01  15.840  < 2e-16
CountryMicronesia, Fed. Sts.           1.587e+00  4.904e-01   3.237 0.001224
CountryMoldova                         4.677e+00  4.143e-01  11.289  < 2e-16
CountryMongolia                        3.221e+00  3.435e-01   9.378  < 2e-16
CountryMontenegro                      5.990e+00  4.171e-01  14.363  < 2e-16
CountryMorocco                         3.633e+00  2.910e-01  12.485  < 2e-16
CountryMozambique                     -2.513e+00  2.132e-01 -11.784  < 2e-16
CountryMyanmar                        -9.138e-01  1.980e-01  -4.615 4.12e-06
CountryNamibia                        -6.691e-01  2.709e-01  -2.470 0.013560
CountryNepal                           6.133e-01  2.087e-01   2.938 0.003331
CountryNetherlands                     7.522e+00  5.022e-01  14.979  < 2e-16
CountryNew Zealand                     9.163e+00  5.421e-01  16.902  < 2e-16
CountryNicaragua                       5.663e+00  3.517e-01  16.102  < 2e-16
CountryNiger                          -2.616e+00  2.933e-01  -8.918  < 2e-16
CountryNigeria                        -2.359e+00  2.793e-01  -8.445  < 2e-16
CountryNorth Macedonia                 4.920e+00  3.826e-01  12.857  < 2e-16
CountryNorway                          7.456e+00  6.467e-01  11.529  < 2e-16
CountryOman                            4.609e+00  3.844e-01  11.989  < 2e-16
CountryPakistan                        1.102e+00  2.873e-01   3.835 0.000128
CountryPanama                          8.037e+00  3.956e-01  20.316  < 2e-16
CountryPapua New Guinea               -7.103e-01  2.582e-01  -2.751 0.005986
CountryParaguay                        5.427e+00  3.365e-01  16.125  < 2e-16
CountryPeru                            5.928e+00  3.504e-01  16.916  < 2e-16
CountryPhilippines                     4.326e+00  2.881e-01  15.017  < 2e-16
CountryPoland                          7.206e+00  4.317e-01  16.693  < 2e-16
CountryPortugal                        7.707e+00  3.988e-01  19.325  < 2e-16
CountryQatar                           6.504e+00  6.388e-01  10.182  < 2e-16
CountryRomania                         5.615e+00  3.965e-01  14.160  < 2e-16
CountryRussian Federation              5.453e+00  4.387e-01  12.428  < 2e-16
CountryRwanda                         -6.380e-01  2.513e-01  -2.539 0.011174
CountrySamoa                           5.919e+00  6.322e-01   9.362  < 2e-16
CountrySao Tome and Principe           1.231e+00  2.526e-01   4.873 1.16e-06
CountrySaudi Arabia                    4.200e+00  4.756e-01   8.831  < 2e-16
CountrySenegal                        -5.244e-01  2.101e-01  -2.496 0.012637
CountrySerbia                          5.262e+00  3.942e-01  13.347  < 2e-16
CountrySeychelles                      5.067e+00  3.924e-01  12.915  < 2e-16
CountrySierra Leone                   -1.847e+00  2.292e-01  -8.055 1.18e-15
CountrySingapore                       6.620e+00  4.605e-01  14.377  < 2e-16
CountrySlovak Republic                 6.088e+00  4.348e-01  14.003  < 2e-16
CountrySlovenia                        8.062e+00  4.603e-01  17.515  < 2e-16
CountrySolomon Islands                 3.879e+00  3.039e-01  12.765  < 2e-16
CountrySomalia                        -7.450e-01  2.092e-01  -3.562 0.000375
CountrySouth Africa                    3.524e+00  4.236e-01   8.318  < 2e-16
CountrySpain                           9.240e+00  4.378e-01  21.103  < 2e-16
CountrySri Lanka                       5.106e+00  3.378e-01  15.114  < 2e-16
CountrySt. Lucia                       8.232e+00  4.700e-01  17.514  < 2e-16
CountrySt. Vincent and the Grenadines  4.992e+00  3.748e-01  13.318  < 2e-16
CountrySuriname                        4.244e+00  3.458e-01  12.273  < 2e-16
CountrySweden                          8.180e+00  5.313e-01  15.397  < 2e-16
CountrySwitzerland                     7.845e+00  6.503e-01  12.064  < 2e-16
CountrySyrian Arab Republic            6.000e+00  3.935e-01  15.251  < 2e-16
CountryTajikistan                      1.318e+00  3.596e-01   3.666 0.000252
CountryTanzania                       -7.783e-01  2.252e-01  -3.456 0.000558
CountryThailand                        5.500e+00  2.812e-01  19.556  < 2e-16
CountryTimor-Leste                    -6.180e-02  2.301e-01  -0.269 0.788320
CountryTogo                           -2.133e+00  2.047e-01 -10.424  < 2e-16
CountryTonga                           4.654e+00  6.688e-01   6.959 4.29e-12
CountryTrinidad and Tobago             5.753e+00  4.265e-01  13.491  < 2e-16
CountryTunisia                         4.951e+00  3.340e-01  14.824  < 2e-16
CountryTurkiye                         5.427e+00  3.972e-01  13.662  < 2e-16
CountryTurkmenistan                    2.443e+00  3.633e-01   6.724 2.15e-11
CountryUganda                         -7.137e-01  2.593e-01  -2.753 0.005946
CountryUkraine                         5.967e+00  4.054e-01  14.717  < 2e-16
CountryUnited Arab Emirates            4.693e+00  5.600e-01   8.380  < 2e-16
CountryUnited Kingdom                  8.280e+00  5.484e-01  15.100  < 2e-16
CountryUnited States                   8.232e+00  7.246e-01  11.360  < 2e-16
CountryUruguay                         7.016e+00  4.039e-01  17.370  < 2e-16
CountryUzbekistan                      3.392e+00  3.719e-01   9.121  < 2e-16
CountryVanuatu                         2.007e+00  3.312e-01   6.059 1.56e-09
CountryVenezuela, RB                   5.717e+00  3.888e-01  14.706  < 2e-16
CountryVietnam                         4.589e+00  2.860e-01  16.042  < 2e-16
CountryYemen, Rep.                     1.723e+00  2.083e-01   8.272  < 2e-16
CountryZambia                         -1.042e+00  2.605e-01  -4.000 6.50e-05
CountryZimbabwe                        1.304e-01  2.972e-01   0.439 0.660929
RegionAsia                                    NA         NA      NA       NA
RegionCentral America and Caribbean           NA         NA      NA       NA
RegionEuropean Union                          NA         NA      NA       NA
RegionMiddle East                             NA         NA      NA       NA
RegionNorth America                           NA         NA      NA       NA
RegionOceania                                 NA         NA      NA       NA
RegionRest of Europe                          NA         NA      NA       NA
RegionSouth America                           NA         NA      NA       NA
Year                                   1.435e-01  6.719e-03  21.352  < 2e-16
Infant_deaths                         -8.570e-03  7.114e-03  -1.205 0.228456
Under_five_deaths                     -4.410e-02  3.835e-03 -11.499  < 2e-16
Adult_mortality                       -4.163e-02  6.555e-04 -63.503  < 2e-16
Alcohol_consumption                   -2.697e-02  1.197e-02  -2.253 0.024326
Hepatitis_B                            1.869e-03  1.375e-03   1.359 0.174179
Measles                                2.034e-03  1.374e-03   1.480 0.139041
BMI                                   -4.152e-01  6.403e-02  -6.484 1.06e-10
Polio                                  7.760e-04  2.736e-03   0.284 0.776735
Diphtheria                             8.859e-03  2.746e-03   3.226 0.001270
Incidents_HIV                          1.555e-01  2.458e-02   6.323 2.99e-10
GDP_per_capita                         2.896e-05  6.078e-06   4.766 1.98e-06
Population_mln                         1.954e-04  1.369e-03   0.143 0.886510
Thinness_ten_nineteen_years           -1.400e-02  7.219e-03  -1.940 0.052538
Thinness_five_nine_years              -1.285e-02  7.160e-03  -1.794 0.072870
Schooling                             -9.822e-02  2.869e-02  -3.423 0.000628
Economy_status_Developed                      NA         NA      NA       NA
                                         
(Intercept)                           ***
CountryAlbania                        ***
CountryAlgeria                        ***
CountryAngola                         ***
CountryAntigua and Barbuda            ***
CountryArgentina                      ***
CountryArmenia                        ***
CountryAustralia                      ***
CountryAustria                        ***
CountryAzerbaijan                     ***
CountryBahamas, The                   ***
CountryBahrain                        ***
CountryBangladesh                     ***
CountryBarbados                       ***
CountryBelarus                        ***
CountryBelgium                        ***
CountryBelize                         ***
CountryBenin                          ***
CountryBhutan                         ***
CountryBolivia                        ***
CountryBosnia and Herzegovina         ***
CountryBotswana                       ** 
CountryBrazil                         ***
CountryBrunei Darussalam              ***
CountryBulgaria                       ***
CountryBurkina Faso                   ***
CountryBurundi                        ***
CountryCabo Verde                     ***
CountryCambodia                       .  
CountryCameroon                          
CountryCanada                         ***
CountryCentral African Republic       ***
CountryChad                           ***
CountryChile                          ***
CountryChina                             
CountryColombia                       ***
CountryComoros                        .  
CountryCongo, Dem. Rep.               ***
CountryCongo, Rep.                       
CountryCosta Rica                     ***
CountryCote d'Ivoire                     
CountryCroatia                        ***
CountryCuba                           ***
CountryCyprus                         ***
CountryCzechia                        ***
CountryDenmark                        ***
CountryDjibouti                          
CountryDominican Republic             ***
CountryEcuador                        ***
CountryEgypt, Arab Rep.               ***
CountryEl Salvador                    ***
CountryEquatorial Guinea              ** 
CountryEritrea                        ** 
CountryEstonia                        ***
CountryEswatini                       ***
CountryEthiopia                       ** 
CountryFiji                           ***
CountryFinland                        ***
CountryFrance                         ***
CountryGabon                          *  
CountryGambia, The                    ***
CountryGeorgia                        ***
CountryGermany                        ***
CountryGhana                          ***
CountryGreece                         ***
CountryGrenada                        ***
CountryGuatemala                      ***
CountryGuinea                         ***
CountryGuinea-Bissau                  ***
CountryGuyana                         ***
CountryHaiti                          .  
CountryHonduras                       ***
CountryHungary                        ***
CountryIceland                        ***
CountryIndia                             
CountryIndonesia                      ***
CountryIran, Islamic Rep.             ***
CountryIraq                           ***
CountryIreland                        ***
CountryIsrael                         ***
CountryItaly                          ***
CountryJamaica                        ***
CountryJapan                          ***
CountryJordan                         ***
CountryKazakhstan                     ***
CountryKenya                          ** 
CountryKiribati                       ***
CountryKuwait                         ***
CountryKyrgyz Republic                ***
CountryLao PDR                        .  
CountryLatvia                         ***
CountryLebanon                        ***
CountryLesotho                        ***
CountryLiberia                        *  
CountryLibya                          ***
CountryLithuania                      ***
CountryLuxembourg                     ***
CountryMadagascar                        
CountryMalawi                         ***
CountryMalaysia                       ***
CountryMaldives                       ***
CountryMali                           ***
CountryMalta                          ***
CountryMauritania                     ** 
CountryMauritius                      ***
CountryMexico                         ***
CountryMicronesia, Fed. Sts.          ** 
CountryMoldova                        ***
CountryMongolia                       ***
CountryMontenegro                     ***
CountryMorocco                        ***
CountryMozambique                     ***
CountryMyanmar                        ***
CountryNamibia                        *  
CountryNepal                          ** 
CountryNetherlands                    ***
CountryNew Zealand                    ***
CountryNicaragua                      ***
CountryNiger                          ***
CountryNigeria                        ***
CountryNorth Macedonia                ***
CountryNorway                         ***
CountryOman                           ***
CountryPakistan                       ***
CountryPanama                         ***
CountryPapua New Guinea               ** 
CountryParaguay                       ***
CountryPeru                           ***
CountryPhilippines                    ***
CountryPoland                         ***
CountryPortugal                       ***
CountryQatar                          ***
CountryRomania                        ***
CountryRussian Federation             ***
CountryRwanda                         *  
CountrySamoa                          ***
CountrySao Tome and Principe          ***
CountrySaudi Arabia                   ***
CountrySenegal                        *  
CountrySerbia                         ***
CountrySeychelles                     ***
CountrySierra Leone                   ***
CountrySingapore                      ***
CountrySlovak Republic                ***
CountrySlovenia                       ***
CountrySolomon Islands                ***
CountrySomalia                        ***
CountrySouth Africa                   ***
CountrySpain                          ***
CountrySri Lanka                      ***
CountrySt. Lucia                      ***
CountrySt. Vincent and the Grenadines ***
CountrySuriname                       ***
CountrySweden                         ***
CountrySwitzerland                    ***
CountrySyrian Arab Republic           ***
CountryTajikistan                     ***
CountryTanzania                       ***
CountryThailand                       ***
CountryTimor-Leste                       
CountryTogo                           ***
CountryTonga                          ***
CountryTrinidad and Tobago            ***
CountryTunisia                        ***
CountryTurkiye                        ***
CountryTurkmenistan                   ***
CountryUganda                         ** 
CountryUkraine                        ***
CountryUnited Arab Emirates           ***
CountryUnited Kingdom                 ***
CountryUnited States                  ***
CountryUruguay                        ***
CountryUzbekistan                     ***
CountryVanuatu                        ***
CountryVenezuela, RB                  ***
CountryVietnam                        ***
CountryYemen, Rep.                    ***
CountryZambia                         ***
CountryZimbabwe                          
RegionAsia                               
RegionCentral America and Caribbean      
RegionEuropean Union                     
RegionMiddle East                        
RegionNorth America                      
RegionOceania                            
RegionRest of Europe                     
RegionSouth America                      
Year                                  ***
Infant_deaths                            
Under_five_deaths                     ***
Adult_mortality                       ***
Alcohol_consumption                   *  
Hepatitis_B                              
Measles                                  
BMI                                   ***
Polio                                    
Diphtheria                            ** 
Incidents_HIV                         ***
GDP_per_capita                        ***
Population_mln                           
Thinness_ten_nineteen_years           .  
Thinness_five_nine_years              .  
Schooling                             ***
Economy_status_Developed                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.486 on 2669 degrees of freedom
Multiple R-squared:  0.9975,	Adjusted R-squared:  0.9973 
F-statistic:  5513 on 194 and 2669 DF,  p-value: < 2.2e-16

The final model retained an R^2 value of 0.9544, which is still excellent, but a value that high may partly reflect overfitting to this dataset. To check this, and to guard against misuse of my life expectancy predictions in the future, I ran the comparison between the full and final models again. This time I split the data into training and test sets, refit the full, initial reduced, and final reduced models on the training data, and compared their MSEs on the test data.
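A minimal sketch of this kind of hold-out comparison in R is shown here. It uses simulated data rather than the Kaggle file, so the data frame `sim`, its coefficients, and the seed are assumptions for illustration only. The 70/30 split is also an assumption, although it is consistent with the 2004 training rows (1998 residual degrees of freedom plus 6 coefficients) out of 2864 total rows implied by the summaries in this section.

```r
# Simulated stand-in for the life expectancy data (hypothetical values).
set.seed(42)
n <- 500
sim <- data.frame(
  Adult_mortality = runif(n, 50, 500),
  Schooling       = runif(n, 1, 14)
)
sim$Life_expectancy <- 75 - 0.07 * sim$Adult_mortality +
  0.5 * sim$Schooling + rnorm(n, sd = 2)

# Hold out 30% of the rows as a test set.
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

# Fit on the training rows only.
fit <- lm(Life_expectancy ~ Adult_mortality + Schooling, data = train)

# Mean squared error on rows the model has not seen.
test_mse <- mean((test$Life_expectancy - predict(fit, newdata = test))^2)
test_mse
```

Computing the MSE on `test` rather than `train` is what makes the comparison a check on overfitting: a model that only memorized the training rows would score well on `train` and poorly here.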

Warning message in predict.lm(full_life_model, newdata = test):
“prediction from a rank-deficient fit may be misleading”
0.194649291303277
4.10439974811491
Call:
lm(formula = Life_expectancy ~ Adult_mortality + Schooling + 
    Incidents_HIV + Alcohol_consumption + Hepatitis_B, data = train)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.1298 -1.3151  0.0925  1.2884  9.5692 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)         75.1840217  0.3685935 203.975   <2e-16 ***
Adult_mortality     -0.0718642  0.0007395 -97.180   <2e-16 ***
Schooling            0.5475069  0.0232085  23.591   <2e-16 ***
Incidents_HIV        0.3971186  0.0286688  13.852   <2e-16 ***
Alcohol_consumption  0.1585384  0.0146170  10.846   <2e-16 ***
Hepatitis_B          0.0260564  0.0030714   8.483   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.033 on 1998 degrees of freedom
Multiple R-squared:  0.9539,	Adjusted R-squared:  0.9538 
F-statistic:  8276 on 5 and 1998 DF,  p-value: < 2.2e-16
0.194649291303277
4.10439974811491
4.12033168515795

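The full model's MSE was 0.1946, which is either remarkably accurate or a sign of overfitting, since the model follows the data almost exactly. With a separate coefficient for nearly every country, it has a great deal of freedom to do so.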

The initial reduced model had an MSE of 4.1044, which is much higher than the full model but still believable. Because MSE is measured in squared years, this corresponds to a typical prediction error of about 2 years (the square root of 4.1044 is roughly 2.03), which is still a useful estimate of Life_expectancy.

The final model's MSE was 4.1203, nearly identical to the initial reduced model's 4.1044, which is strong performance for a model using only 5 of the original 20 variables. Fit on the training data alone, it has an R^2 of 0.9539, so it explains about 95.39% of the variance in Yi = Life expectancy. The model therefore stays fairly accurate even on data it has not previously seen!
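To make the final model concrete, the sketch below plugs the coefficients from the reduced-model summary into a single prediction and converts the test MSE into years. The input values for the country-year are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the reduced-model summary (fit on the training data).
coefs <- c(
  "(Intercept)"       = 75.1840217,
  Adult_mortality     = -0.0718642,
  Schooling           = 0.5475069,
  Incidents_HIV       = 0.3971186,
  Alcohol_consumption = 0.1585384,
  Hepatitis_B         = 0.0260564
)

# Hypothetical country-year; the leading 1 multiplies the intercept.
x <- c(1, Adult_mortality = 100, Schooling = 10, Incidents_HIV = 0.1,
       Alcohol_consumption = 5, Hepatitis_B = 90)

predicted <- sum(coefs * x)        # about 76.65 years
rmse <- sqrt(4.12033168515795)     # about 2.03 years
c(predicted = predicted, rmse = rmse)
```

The root of the MSE (about 2.03) lines up with the residual standard error of 2.033 reported in the summary, which suggests the model predicts unseen rows about as well as it fits the training rows.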

A matrix: 6 × 2 of type dbl
                          0.5 %      99.5 %
(Intercept)         74.23368010 76.13436339
Adult_mortality     -0.07377079 -0.06995753
Schooling            0.48766859  0.60734528
Incidents_HIV        0.32320198  0.47103516
Alcohol_consumption  0.12085163  0.19622517
Hepatitis_B          0.01813737  0.03397552
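The 0.5 % and 99.5 % columns bound a 99% confidence interval for each coefficient. As a hedged sketch of where such bounds come from, the snippet below rebuilds the Schooling interval by hand from the estimate, standard error, and residual degrees of freedom printed in the reduced-model summary. A table like this is presumably what `confint(model, level = 0.99)` returns, but that call is an assumption since the notebook code is not shown here.

```r
est <- 0.5475069   # Schooling estimate from the summary
se  <- 0.0232085   # its standard error
df  <- 1998        # residual degrees of freedom

# Two-sided 99% critical value of the t distribution.
t_crit <- qt(0.995, df)

# Estimate plus or minus critical value times standard error.
ci <- est + c(-1, 1) * t_crit * se
ci
```

The result agrees with the Schooling row of the matrix to about four decimal places; the small remaining difference comes from the rounding of the printed estimate and standard error.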
3.02364187807839
171019
0.938398657459113
0.0515147439758156
0.00567773171402008
0.00267221770680451
0.00173664914424713
  1. 'Country'
  2. 'Region'
  3. 'Year'
  4. 'Infant_deaths'
  5. 'Under_five_deaths'
  6. 'Adult_mortality'
  7. 'Alcohol_consumption'
  8. 'Hepatitis_B'
  9. 'Measles'
  10. 'BMI'
  11. 'Polio'
  12. 'Diphtheria'
  13. 'Incidents_HIV'
  14. 'GDP_per_capita'
  15. 'Population_mln'
  16. 'Thinness_ten_nineteen_years'
  17. 'Thinness_five_nine_years'
  18. 'Schooling'
  19. 'Economy_status_Developed'
  20. 'Life_expectancy'

To double-check the model, I then ran partial residual plots (prplots) for the remaining X variables and found the points well distributed along the line with tight grouping. The data did have some outliers, but as the hat values above show, no observation exerted outsized leverage on the fit.
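For readers unfamiliar with hat values, here is a small sketch of a leverage check in R on simulated data. The data frame `d`, the planted extreme point, and the 2p/n cutoff are assumptions for illustration; the cutoff is a common rule of thumb, not necessarily the criterion used in this analysis.

```r
set.seed(1)
d <- data.frame(x = c(rnorm(49), 8))   # last point sits far from the rest
d$y <- 2 + 3 * d$x + rnorm(50)

fit <- lm(y ~ x, data = d)
h <- hatvalues(fit)                    # leverage of each observation

p <- length(coef(fit))                 # number of fitted coefficients
n <- nrow(d)

# Flag observations whose leverage exceeds twice the average (p / n).
high_leverage <- which(h > 2 * p / n)
high_leverage
```

Hat values always sum to the number of coefficients, so the average leverage is p / n. Points well above that average are worth inspecting, though high leverage alone does not mean a point distorts the fit.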

References

1. https://www.who.int/data/gho/data/themes/mortality-and-global-health-estimates

2. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4435622/

3. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8517826/