Life Expectancy and GDP

By Nicholas Quisler

Introduction

This project investigates possible correlations between a country's economic output, measured as gross domestic product (GDP, in USD), and the life expectancy at birth of its citizens.

Here are a few questions this project seeks to answer:

Has life expectancy increased over time in the six nations?
Has GDP increased over time in the six nations?
Is there a correlation between GDP and life expectancy of a country?
What is the average life expectancy in these nations?
What is the distribution of that life expectancy?

Data sources

GDP Source: World Bank national accounts data, and OECD National Accounts data files.
Life expectancy Data Source: World Health Organization

Import Python Modules

We start by importing the preliminary Python modules:

In [1]:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
%matplotlib inline

            

Load the Data

To look for connections between GDP and life expectancy, load the dataset into a DataFrame so that it can be visualized.

Here all_data.csv is read into a DataFrame called df, followed by a quick inspection using .head() to check its contents.

In [2]:

df = pd.read_csv('all_data.csv')
df.head()

Out[2]:

	Country	Year	Life expectancy at birth (years)	GDP
0	Chile	2000	77.3	7.786093e+10
1	Chile	2001	77.3	7.097992e+10
2	Chile	2002	77.8	6.973681e+10
3	Chile	2003	77.9	7.564346e+10
4	Chile	2004	78.0	9.921039e+10

Inspect the Data

Here we inspect the data to make sure each column was imported with the correct data type and contains no missing or null values.

In [3]:

df.dtypes

            

Out[3]:

Country                              object
Year                                  int64
Life expectancy at birth (years)    float64
GDP                                 float64
dtype: object

In [4]:

df.count()

            

Out[4]:

Country                             96
Year                                96
Life expectancy at birth (years)    96
GDP                                 96
dtype: int64

In [5]:

df.describe()

            

Out[5]:

	Year	Life expectancy at birth (years)	GDP
count	96.000000	96.000000	9.600000e+01
mean	2007.500000	72.789583	3.880499e+12
std	4.633971	10.672882	5.197561e+12
min	2000.000000	44.300000	4.415703e+09
25%	2003.750000	74.475000	1.733018e+11
50%	2007.500000	76.750000	1.280220e+12
75%	2011.250000	78.900000	4.067510e+12
max	2015.000000	81.000000	1.810000e+13

Each column has the expected data type, every column has the same non-null count of 96, and the summary statistics look plausible, so we can conclude there are no missing or null values.
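The absence of missing values can also be checked directly instead of being inferred from the counts. The following is a minimal sketch, assuming pandas is installed; the `sample` frame is a stand-in for `df` that contains only the five rows printed in Out[2], and the names `missing_per_column` and `no_missing` are illustrative.

```python
import pandas as pd

# Stand-in for df: the five rows printed by df.head() in Out[2].
sample = pd.DataFrame({
    "Country": ["Chile"] * 5,
    "Year": [2000, 2001, 2002, 2003, 2004],
    "Life expectancy at birth (years)": [77.3, 77.3, 77.8, 77.9, 78.0],
    "GDP": [7.786093e10, 7.097992e10, 6.973681e10, 7.564346e10, 9.921039e10],
})

# isna() flags missing entries; summing per column counts them.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# True only when no column contains a missing value.
no_missing = bool((missing_per_column == 0).all())
print(no_missing)
```

On the full dataset the same two calls would be applied to `df` itself.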

Explore the Data

We want to know about the countries and years represented in the dataset and their respective sample sizes.

In [6]:

df1 = df.groupby(["Country"])["Country"].count()
df1

Out[6]:

Country
Chile                       16
China                       16
Germany                     16
Mexico                      16
United States of America    16
Zimbabwe                    16
Name: Country, dtype: int64

In [7]:

df2 = df.groupby(["Year"])["Year"].count()
df2

Out[7]:

Year
2000    6
2001    6
2002    6
2003    6
2004    6
2005    6
2006    6
2007    6
2008    6
2009    6
2010    6
2011    6
2012    6
2013    6
2014    6
2015    6
Name: Year, dtype: int64

Six countries are represented: Chile, China, Germany, Mexico, the United States of America, and Zimbabwe. Each country has one data point per year from 2000 to 2015, for a total of 96 data points.
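The two counts above show 16 rows per country and 6 rows per year, which suggests, but does not strictly prove, that every country-year pair appears exactly once. A small sketch of a direct check follows, assuming pandas; the helper name `has_one_row_per_country_year` is hypothetical, and the `sample` frame holds only the Chile rows visible in Out[2].

```python
import pandas as pd

# Stand-in for df: the Country and Year values of the rows shown in Out[2].
sample = pd.DataFrame({
    "Country": ["Chile"] * 5,
    "Year": [2000, 2001, 2002, 2003, 2004],
})


def has_one_row_per_country_year(frame):
    """Return True when no (Country, Year) pair is repeated."""
    repeated = frame.duplicated(subset=["Country", "Year"])
    return not bool(repeated.any())


print(has_one_row_per_country_year(sample))

# A deliberately duplicated row makes the check fail.
with_duplicate = pd.concat([sample, sample.iloc[[0]]], ignore_index=True)
print(has_one_row_per_country_year(with_duplicate))
```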

Clean up the Column Names

The Life expectancy at birth (years) column name is long and, unlike the other column names, spans several words. It is easier to proceed with its abbreviation, LEAB. The rename function is used to make future coding easier.

In [8]:

df = df.rename({"Life expectancy at birth (years)": "LEAB"}, axis='columns')
df.head()

Out[8]:

	Country	Year	LEAB	GDP
0	Chile	2000	77.3	7.786093e+10
1	Chile	2001	77.3	7.097992e+10
2	Chile	2002	77.8	6.973681e+10
3	Chile	2003	77.9	7.564346e+10
4	Chile	2004	78.0	9.921039e+10

Exploratory Plots

Histograms

Below, the distribution of LEAB is shown. The data is strongly left-skewed: most of the values sit on the right-hand side, with a long tail toward lower life expectancies. A further look might also identify different modes or smaller groupings of distributions within the range.

In [9]:

ax1 = sns.histplot(x='LEAB', data=df, stat="percent", binwidth=1)
ax1.set_xlabel('Life Expectancy at Birth (years)');

Next, the distribution of GDP is examined. It is strongly right-skewed, with most of the values on the left-hand side, almost the opposite of what was observed in the LEAB column.

In [10]:

ax2 = sns.histplot(x='GDP', data=df, stat="percent", binwidth=0.05e13)
ax2.set_xlabel('GDP in Trillions of U.S. Dollars');

The previous plots did not break up the data by countries, so the next task will be to find the average LEAB and GDP by country.

Bar Plots

In [11]:

dfMeans = df.drop("Year", axis=1).groupby("Country").mean().reset_index()
dfMeans

Out[11]:

	Country	LEAB	GDP
0	Chile	78.94375	1.697888e+11
1	China	74.26250	4.957714e+12
2	Germany	79.65625	3.094776e+12
3	Mexico	75.71875	9.766506e+11
4	United States of America	78.06250	1.407500e+13
5	Zimbabwe	50.09375	9.062580e+09

Now that the data is broken down by Country and the average values for LEAB and GDP have been computed, bar plots of the mean values for each variable are created below.

The first plot shows life expectancy. All of the countries except Zimbabwe have values in the mid-to-high 70s. Zimbabwe's much lower values probably explain the left skew seen in the LEAB histogram.

In [12]:

ax3 = sns.barplot(y="Country", x="LEAB", data=dfMeans)
ax3.set_xlabel('Mean Life Expectancy at Birth (years)');
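The direction of the skew described for the histograms can be quantified with the sample skewness, where a negative value indicates a left skew and a positive value a right skew. The sketch below assumes pandas and uses only the six country means from Out[11], so it illustrates the sign rather than reproducing the skewness of all 96 observations; on the full data one would call `.skew()` on the columns of `df`.

```python
import pandas as pd

# Country-level means copied from the dfMeans table (Out[11]).
means = pd.DataFrame({
    "Country": ["Chile", "China", "Germany", "Mexico",
                "United States of America", "Zimbabwe"],
    "LEAB": [78.94375, 74.26250, 79.65625, 75.71875, 78.06250, 50.09375],
    "GDP": [1.697888e11, 4.957714e12, 3.094776e12,
            9.766506e11, 1.407500e13, 9.062580e09],
})

# Series.skew() returns the sample skewness of the column.
leab_skew = means["LEAB"].skew()
gdp_skew = means["GDP"].skew()

print("LEAB skewness:", leab_skew)  # negative: tail toward low values
print("GDP skewness:", gdp_skew)    # positive: tail toward high values
```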

For average GDP by country, the US has a much higher value than the rest of the countries. In this bar plot, Zimbabwe is not even visible, and Chile is just barely seen. China, Germany and Mexico are closer to one another than any of them is to the US.

In [13]:

ax4 = sns.barplot(y="Country", x="GDP", data=dfMeans)
ax4.set_xlabel('Mean GDP (Trillions of U.S. Dollars)');
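The GDP column is stored in U.S. dollars, so an axis labelled in trillions is easier to read if the values are rescaled before plotting. One possible approach, sketched here under the assumption that pandas is available, adds a derived column; the name `GDP_trillions` is hypothetical and the values come from Out[11].

```python
import pandas as pd

# Mean GDP per country in U.S. dollars, copied from Out[11].
means = pd.DataFrame({
    "Country": ["Chile", "China", "Germany", "Mexico",
                "United States of America", "Zimbabwe"],
    "GDP": [1.697888e11, 4.957714e12, 3.094776e12,
            9.766506e11, 1.407500e13, 9.062580e09],
})

# One trillion is 1e12, so dividing converts dollars to trillions of dollars.
means["GDP_trillions"] = means["GDP"] / 1e12

print(means[["Country", "GDP_trillions"]].round(3))
```

Plotting the derived column instead of GDP would make the tick values match a "Trillions of U.S. Dollars" label without relying on the axis offset notation.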

Strip Plots

Another way to show distributions is the strip plot. Strip plots are useful because they show the density of dots around each value as well as the overall distribution.

In the GDP plot, Chile and Zimbabwe each have a dense cluster of dots, showing how many data points fall close to the same value. This detail would be lost in a box plot, unless the reader is very adept at reading data visualizations.

In [14]:

ax5 = sns.stripplot(x="GDP", y="Country", data=df, hue='Country', alpha=0.5)
sns.stripplot(x='GDP', y='Country', data=dfMeans, hue='Country', palette='dark:black', marker='s')
ax5.set_xlabel('GDP (Trillions of U.S. Dollars)')
plt.legend([],[], frameon=False);

            

The LEAB plot shows most of the countries except Zimbabwe have a fairly consistent life expectancy.

In [15]:

ax6 = sns.stripplot(x="LEAB", y="Country", data=df, hue='Country', alpha=0.5)
sns.stripplot(x='LEAB', y='Country', data=dfMeans, hue='Country', palette='dark:black', marker='s')
ax6.set_xlabel('Life Expectancy at Birth (years)')
plt.legend([],[], frameon=False);

            

Line Charts

Next, line charts are used to explore LEAB and GDP over the years. Below, the countries are separated by color, and one can see that every country increased its life expectancy between 2000 and 2015. Zimbabwe saw the greatest increase, after a dip around 2004.

In [16]:

ax7 = sns.lineplot(x='Year', y="LEAB", data=df, hue='Country')
ax7.set_ylabel('Life Expectancy at Birth (years)')
plt.legend();
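The increase read off the line chart can also be computed as the difference between each country's last and first observation. This is a rough sketch, assuming pandas and the renamed LEAB column; the helper `change_by_country` is hypothetical, and the `sample` frame contains only the Chile rows from Out[8], so it covers 2000 to 2004 rather than the full period.

```python
import pandas as pd

# Stand-in for df: the five rows shown in Out[8].
sample = pd.DataFrame({
    "Country": ["Chile"] * 5,
    "Year": [2000, 2001, 2002, 2003, 2004],
    "LEAB": [77.3, 77.3, 77.8, 77.9, 78.0],
    "GDP": [7.786093e10, 7.097992e10, 6.973681e10, 7.564346e10, 9.921039e10],
})


def change_by_country(frame, column):
    """Last-year value minus first-year value of `column`, per country."""
    ordered = frame.sort_values("Year")
    grouped = ordered.groupby("Country")[column]
    return grouped.last() - grouped.first()


leab_change = change_by_country(sample, "LEAB")
print(leab_change.round(2))
```

Applied to the full `df`, the same helper would return one value per country, which could then be sorted to see which country gained the most.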

            

The faceted line charts by country give a closer look. In the individual plots, each country has its own y-axis, which makes it easier to compare the shape of LEAB over the years even though the scales differ. This view makes it easier to see that Chile and Mexico had dips in their life expectancy around the same time, which could be looked into further.

In [17]:

ax8 = sns.relplot(
    data=df, x="Year", y="LEAB", col="Country", hue='Country',
    kind="line", col_wrap=3, facet_kws={'sharey': False}
)
ax8.set_ylabels('Life Expectancy at Birth (years)');

            

The chart below looks at GDP over the years. It shows that China's GDP grew from roughly one trillion dollars to more than ten trillion dollars over the time span. The rest of the countries did not see increases of this magnitude.

In [18]:

ax9 = sns.lineplot(x='Year', y="GDP", data=df, hue='Country')
ax9.set_ylabel('GDP (Trillions of U.S. Dollars)')
plt.legend();

            

Much like the breakdown of LEAB by country, the plot below breaks out GDP by country. It is apparent that all of the countries have seen increases. In the chart above, the other countries' GDP growth looked modest compared to China and the US, but every country did experience growth relative to the year 2000. This type of plot proves useful because many of these nuances were lost when the y-axis was shared among the countries. Also, the seemingly linear changes were in reality not as smooth for some of the countries.

In [19]:

ax10 = sns.relplot(
    data=df, x="Year", y="GDP", col="Country", hue='Country',
    kind="line", col_wrap=3, facet_kws={'sharey': False}
)
ax10.set_ylabels('GDP (Trillions of U.S. Dollars)');

            

Is there a correlation between GDP and life expectancy of a country?

Scatter Plots

The next two charts explore the relationship between GDP and LEAB. The chart below echoes the previous charts: GDP for Zimbabwe stays flat while its life expectancy goes up. The other countries exhibit a rise in life expectancy as GDP goes up. The US and China seem to have very similar slopes in their relationship between GDP and life expectancy.

In [20]:

ax11 = sns.scatterplot(x='LEAB', y="GDP", data=df, hue='Country')
ax11.set_xlabel('Life Expectancy at Birth (years)')
ax11.set_ylabel('GDP (Trillions of U.S. Dollars)')
plt.legend();

            

As in the previous plots, the countries are broken out into individual scatter plots by facet. Most countries, such as the US, Mexico and Zimbabwe, have roughly linear relationships between GDP and life expectancy. China, on the other hand, has a slightly exponential curve, and Chile's looks a bit logarithmic. In general, though, GDP and life expectancy increase together, exhibiting a positive correlation.

In [21]:

ax12 = sns.relplot(
    data=df, x="LEAB", y="GDP", col="Country", hue='Country',
    kind="scatter", col_wrap=3, facet_kws={'sharex': False, 'sharey': False}
)
ax12.set_xlabels('Life Expectancy at Birth (years)')
ax12.set_ylabels('GDP (Trillions of U.S. Dollars)');
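The visual impression of a positive relationship can be backed by a number, for example the Pearson correlation coefficient computed separately for each country. The sketch below assumes pandas; `correlation_by_country` is a hypothetical helper, and the `sample` frame holds only the five Chile rows from Out[8], so the printed coefficient describes 2000 to 2004 only and is not a result for the full dataset.

```python
import pandas as pd

# Stand-in for df: the five rows shown in Out[8].
sample = pd.DataFrame({
    "Country": ["Chile"] * 5,
    "Year": [2000, 2001, 2002, 2003, 2004],
    "LEAB": [77.3, 77.3, 77.8, 77.9, 78.0],
    "GDP": [7.786093e10, 7.097992e10, 6.973681e10, 7.564346e10, 9.921039e10],
})


def correlation_by_country(frame):
    """Pearson correlation between LEAB and GDP for each country."""
    return pd.Series({
        country: group["LEAB"].corr(group["GDP"])
        for country, group in frame.groupby("Country")
    })


correlations = correlation_by_country(sample)
print(correlations)
```

Pearson correlation measures linear association, so for the curved relationships noted for China and Chile a rank-based alternative such as `corr(method="spearman")` may be worth comparing.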

            

Conclusions

This project was able to make quite a few data visualizations with the data even though there were only 96 rows and 4 columns.

The project was also able to answer some of the questions posed in the beginning:

Has life expectancy increased over time in the six nations?
- Yes, with Zimbabwe having the greatest increase.
Has GDP increased over time in the six nations?
- GDP has also increased for all countries in our list, especially for China.
Is there a correlation between GDP and life expectancy of a country?
- Yes, there is a positive correlation between GDP and life expectancy for the countries in our list.
What is the average life expectancy in these nations?
- Average life expectancy was in the mid to high 70s for every country except Zimbabwe, where it was about 50.
What is the distribution of that life expectancy?
- The life expectancy had a leftward skew, and most of the observations were on the right side.

Important Notes

GDP appeared to drop around the 2008 recession in every country except China and Zimbabwe. Although China's GDP series shows no dip, China did feel the effects of the recession. In response, the Chinese government implemented a stimulus program, and the amount of money Chinese banks loaned to households and firms roughly doubled. [1]
Zimbabwe experienced a drop in GDP and a similar decrease in LEAB between roughly 2000 and 2008. Many factors may have contributed, including its politics. [2]
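Year-over-year drops in GDP like the ones mentioned in these notes can be located programmatically with a percentage change per country. This is a minimal sketch, assuming pandas; `years_with_gdp_drop` is a hypothetical helper, and the `sample` frame contains only the Chile rows from Out[2], so it covers 2000 to 2004 and does not reach the 2008 recession.

```python
import pandas as pd

# Stand-in for df: Country, Year and GDP from the rows shown in Out[2].
sample = pd.DataFrame({
    "Country": ["Chile"] * 5,
    "Year": [2000, 2001, 2002, 2003, 2004],
    "GDP": [7.786093e10, 7.097992e10, 6.973681e10, 7.564346e10, 9.921039e10],
})


def years_with_gdp_drop(frame):
    """Return the (Country, Year) rows where GDP fell from the previous year."""
    ordered = frame.sort_values(["Country", "Year"])
    # pct_change is computed within each country, so the first year is NaN.
    change = ordered.groupby("Country")["GDP"].pct_change()
    return ordered.loc[change < 0, ["Country", "Year"]]


drops = years_with_gdp_drop(sample)
print(drops)
```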

Links

[1] Cong, Lin and Gao, Haoyu and Ponticelli, Jacopo and Yang, Xiaoguang, Credit Allocation under Economic Stimulus: Evidence from China (November 1, 2018). Chicago Booth Research Paper No. 17-19, Available at SSRN: https://ssrn.com/abstract=2862101 or http://dx.doi.org/10.2139/ssrn.2862101

[2] Moyo, Nicky and Besada, Hany, Zimbabwe in Crisis: Mugabe's Policies and Failures (October 18, 2008). The Centre for International Governance Innovation Technical Paper No. 38, Available at SSRN: https://ssrn.com/abstract=1286683 or http://dx.doi.org/10.2139/ssrn.1286683
