
Data Exploration

An overview of our raw data exploration and initial visualization before producing our final results. 

Raw Data

Canadian Greenhouse Gas Emissions

The greenhouse gas emission data, in carbon dioxide equivalent (CO2e), was categorized by province and sector for each year from 1990 to 2020. Our predictor variables are the province and sector the emissions come from, both of which are categorical, and the response variable is the estimated megatonnes of CO2e released. For ease of analysis, each data point (each row in the table) was assigned a unique ID. Additionally, we used a province-year ID (PY-ID) to organize the data in R more efficiently.
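As a rough illustration of the two identifiers described above, the sketch below tags each row with a running ID and a province-year key. It is written in Python rather than the R used for the project, and the column names, the `PROVINCE-YEAR` key format, and the sample rows are assumptions made for the example, not the project's actual layout.

```python
# Hypothetical sample rows; the values are placeholders, not project data.
rows = [
    {"province": "AB", "sector": "Oil and Gas", "year": 1990},
    {"province": "AB", "sector": "Transport", "year": 1990},
    {"province": "BC", "sector": "Oil and Gas", "year": 1991},
]


def add_ids(rows):
    """Return copies of the rows with a unique row ID and a province-year ID."""
    tagged = []
    for i, row in enumerate(rows, start=1):
        new_row = dict(row)
        new_row["ID"] = i  # unique for every row
        # Assumed key format: shared by all sectors of one province in one year.
        new_row["PY_ID"] = f"{row['province']}-{row['year']}"
        tagged.append(new_row)
    return tagged


tagged_rows = add_ids(rows)
for r in tagged_rows:
    print(r["ID"], r["PY_ID"], r["sector"])
```

Because every sector row of a given province and year shares the same key, a key of this kind can be used to line the emission rows up with tables that have one row per province and year.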


Table 1. A sample of the raw data for Canadian greenhouse gas emissions by province, sector, and year, where each row has a unique ID and a province-year ID for R functionality. Province/Territory codes: AB: Alberta, BC: British Columbia.


Gross Domestic Product and Population 

In our analysis of Canadian emissions, we wanted to evaluate whether any changes we observe in emissions correlate with changes in the productivity of the economy. In this analysis, gross domestic product (GDP) acts as a potential predictor variable for the amount of greenhouse gas emissions.


Population acts as another potential influence on greenhouse gas emissions, both Canadian and global. An increase or decrease in emissions may correlate with the number of people present rather than with a change in any other predictor variable. In our analysis we want to compare population trends to emission trends, and also calculate emissions per person to reduce the noise in emission trends. Also, since provinces and countries vary in size, emissions need to be displayed per person to present an unbiased comparison.
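The per-person figure described above is a simple ratio. A minimal Python sketch follows (the project itself used R); the function name, the choice of tonnes per person as the output unit, and the numbers are illustrative assumptions, not project values.

```python
def per_capita(emissions_mt, population):
    """Convert total emissions in megatonnes CO2e to tonnes CO2e per person."""
    if population <= 0:
        raise ValueError("population must be positive")
    return emissions_mt * 1_000_000 / population


# Placeholder regions: one large and one small, to show why totals mislead.
large_region = per_capita(emissions_mt=150.0, population=15_000_000)
small_region = per_capita(emissions_mt=1.5, population=100_000)
print(large_region, small_region)
```

In this made-up example the smaller region emits a hundredth of the total yet more per person, which is the kind of size effect the per-capita comparison is meant to remove.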

Table 2. A sample of the raw data for gross domestic product by province and year, where each row has a unique province-year ID for R functionality. The GDP data used only ranges from 1997 to 2019, so NA (Not Available) marks existing data gaps. Province/Territory codes: AB: Alberta.
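One way to picture how the province-year ID and the NA gaps fit together is a lookup that keeps every emission row and leaves GDP empty where no value exists. The sketch below is in Python rather than R, uses `None` as a stand-in for R's NA, and relies on hypothetical field names and placeholder values.

```python
# Hypothetical emission rows keyed by an assumed "PROVINCE-YEAR" ID.
emissions = [
    {"PY_ID": "AB-1996", "emissions_mt": 200.0},
    {"PY_ID": "AB-1997", "emissions_mt": 205.0},
]

# Hypothetical GDP lookup; it has no entry before 1997.
gdp_by_py_id = {"AB-1997": 100_000.0}


def attach_gdp(emissions, gdp_by_py_id):
    """Keep every emission row and attach GDP, or None where GDP is missing."""
    return [{**row, "gdp": gdp_by_py_id.get(row["PY_ID"])} for row in emissions]


merged = attach_gdp(emissions, gdp_by_py_id)
for row in merged:
    print(row)
```

Keeping the unmatched rows, rather than dropping them, preserves the full emission series while making the years without GDP explicit.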

Table 3. A sample of the raw data table of population (in number of people) for each country. The country code gives the name of the country and the year of the population estimate, and was used for R functionality. A similar format was used for province and territory population data.


Global Emissions

The greenhouse gas emission data, in carbon dioxide equivalent, was categorized by country for each year from 1990 to 2020. Our predictor variable is the "CODE" of the country, which splits countries into four groups: Annex I that signed, non-Annex I that signed, Annex I that did not sign, and non-Annex I that did not sign. "Sign", in this context, means to commit to the Kyoto Protocol for its full duration, until the 2012 goal. The response variable is the amount of CO2 released per capita.
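The four groups above are the combinations of two yes/no attributes. The Python sketch below (the project used R) shows that mapping; the function name and the label strings are hypothetical, and the actual values stored in the CODE column are not reproduced here.

```python
from itertools import product


def kyoto_group(annex_i, signed):
    """Map two booleans (Annex I status, signed status) to one of four labels."""
    annex = "Annex I" if annex_i else "non-Annex I"
    status = "signed" if signed else "did not sign"
    return f"{annex}, {status}"


# Enumerate every combination to confirm there are exactly four groups.
groups = {kyoto_group(a, s) for a, s in product([True, False], repeat=2)}
for label in sorted(groups):
    print(label)
```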


Table 4. A sample of the raw data table of carbon dioxide emissions per capita. The ID, ANNEX, and SIGNED columns were used for R functionality and additional exploration. 


Exploratory Graphics

Emissions in Canada

To begin our analysis, we created line plots for each province and territory displaying all sectors over time (Figure 4). Three outliers were initially found by looking for points that deviated drastically from, and disrupted, the otherwise linear emission trend. These points were reviewed against the original dataset, and the cause was determined to be data entry errors, which were corrected.
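The outliers above were found by eye on the line plots. As a programmatic stand-in for that visual check, and not the method used in the project, the Python sketch below flags points that sit far from the median of their nearest neighbours. The window size, the 50% tolerance, and the sample series are all assumptions.

```python
from statistics import median


def flag_spikes(values, window=2, rel_tol=0.5):
    """Return indices of points far from the median of their neighbours.

    Each point is compared with the median of up to `window` points on
    either side (the point itself is excluded).
    """
    flagged = []
    for i, value in enumerate(values):
        lo = max(0, i - window)
        hi = min(len(values), i + window + 1)
        neighbours = values[lo:i] + values[i + 1:hi]
        if not neighbours:
            continue
        expected = median(neighbours)
        if expected != 0 and abs(value - expected) / abs(expected) > rel_tol:
            flagged.append(i)
    return flagged


# Placeholder series with one value that looks like a misplaced decimal.
series = [10.0, 10.5, 11.0, 110.0, 12.0, 12.5, 13.0]
suspects = flag_spikes(series)
print(suspects)
```

A check like this only produces candidates. Each flagged point would still need to be reviewed against the original dataset, as was done here, and points near the start or end of a series have fewer neighbours and are screened less reliably.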

Northwest Territories (NW Territories) and Nunavut both have later start dates, because they were established as separate territories in 1999. The emissions in the electricity sector end early in these graphs because of NA values in the data from the Canada Energy Regulator website (ECCC 2023).



Figure 4. Line plots of greenhouse gas emissions of CO2e per capita for each province and territory, after the correction of outliers. Each colour represents a different sector. 

We also analysed each sector by province to see if there were any interesting trends. Each province and territory had different dominant industries, but most differences were small, except in agriculture and oil and gas. Alberta and Saskatchewan had much higher emissions from oil and gas, because this industry is a main driver of their economies. Saskatchewan also had higher agriculture emissions, as agriculture is a dominant industry in the province. Transportation emissions were higher for the territories because these communities rely on airlines to receive goods and services (Government of Northwest Territories n.d.). There were no major outliers or unusual trends in the electricity sector. In the industries and manufacturing sector, Nunavut experiences a spike in emissions. It is unclear why this occurs, but it is unlikely to be related to any emission policy and is more likely related to the creation of the territory.


Figure 4. Line plots of greenhouse gas emissions of CO2e per capita for each sector, after the correction of outliers. Each colour represents a different Province. Provincial codes: AB is Alberta, BC is British Columbia, MB is Manitoba, NB is New Brunswick, NL is Newfoundland and Labrador, NS is Nova Scotia, NU is Nunavut, NWT is Northwest Territories, ON is Ontario, PEI is Prince Edward Island, SK is Saskatchewan, YK is Yukon. 

Because we created the data table for province and sector emissions ourselves, it required thorough checks for data errors. We created a boxplot of the combined greenhouse gas emissions by province to check for further potential outliers and to characterize our data (Figure 5). We compared emissions per capita to reduce population size bias. Many outliers were present; however, these outliers reflected differences in dominant industries. For example, Alberta is a major producer in the oil and gas industry, so it makes sense that its oil and gas values appear as outliers, because Alberta's oil and gas emissions are much higher than those of its other sectors. The data is therefore skewed to the right, but there are no errors to be corrected.


Figure 5. Exploratory boxplot of greenhouse gas emissions per capita by province. The data is skewed to the right and has many outliers; however, these were determined not to be errors and instead reflect the non-normal characteristics of this data. Provincial codes: AB is Alberta, BC is British Columbia, MB is Manitoba, NB is New Brunswick, NL is Newfoundland and Labrador, NS is Nova Scotia, NU is Nunavut, NWT is Northwest Territories, ON is Ontario, PEI is Prince Edward Island, SK is Saskatchewan, YK is Yukon.
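Boxplots conventionally mark as outliers the points lying more than 1.5 times the interquartile range beyond the quartiles. The Python sketch below applies that rule to placeholder values. It is an illustration rather than the project's R code, and the quartile method used here (`method="inclusive"`) is an assumption that may give slightly different fences from R's boxplot.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Placeholder per-capita values for one province: several small sectors
# and one dominant sector, which gives a right-skewed distribution.
sector_values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 30]
outliers = iqr_outliers(sector_values)
print(outliers)
```

In this made-up data the flagged value is a real, much larger sector rather than a mistake, which mirrors the conclusion drawn from Figure 5 that outliers under this rule are not necessarily errors.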

Global Data Exploration

The Canadian population and emissions data from Statistics Canada were compared to the Canadian subset of the global population and emissions data to verify both datasets. They were found to be almost identical. No outliers or errors were found for any country.
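A cross-check of this kind can be expressed as the largest relative difference between two series over the years they share. The Python sketch below illustrates the idea; the function name and the numbers are placeholders rather than values from either dataset, and it is not the project's R code.

```python
def max_relative_difference(series_a, series_b):
    """Largest relative gap between two {year: value} series on shared years."""
    shared_years = sorted(set(series_a) & set(series_b))
    if not shared_years:
        raise ValueError("the two series share no years")
    return max(
        abs(series_a[year] - series_b[year]) / abs(series_b[year])
        for year in shared_years
        if series_b[year] != 0  # skip years where the ratio is undefined
    )


# Placeholder values standing in for the national and global sources.
national = {1990: 27.5, 1991: 28.0}
global_subset = {1990: 27.6, 1991: 28.0, 1992: 28.4}
gap = max_relative_difference(national, global_subset)
print(f"{gap:.4f}")
```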


Every country was first plotted on a single line graph for an initial analysis (Figure 6). The data was displayed as emissions per capita to make the countries comparable, and on a log scale so that countries with very different emission levels could be viewed relative to one another. Some countries do not appear until after 1990, because they did not exist before then. The zero values therefore represent the year a country was established and its record of emissions was first collected.

 

 


Figure 6. Exploratory line graph of greenhouse gas emissions per capita for all countries, on a log scale for comparability. A legend is not provided because the sheer number of countries made them indiscernible from one another even when a legend was included.
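A log scale spaces values by their ratio rather than their difference, which is why it helps when countries differ by orders of magnitude. It also cannot place a value of zero, since the logarithm of zero is undefined. The Python sketch below shows one common way to handle this, and not necessarily what was done for Figure 6: zero or negative values are mapped to `None` so that a plotting routine can skip them. The sample values are placeholders.

```python
import math


def log10_or_none(values):
    """Return log10 of each positive value, and None for zero or negative."""
    return [math.log10(v) if v > 0 else None for v in values]


# Placeholder per-capita series: a zero in the first year, then growth.
per_capita_series = [0.0, 0.1, 1.0, 10.0]
log_series = log10_or_none(per_capita_series)
print(log_series)
```

After the transform, each tenfold increase becomes one equal step, so low-emitting and high-emitting countries can share an axis.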

Data Shortcomings

Unfortunately, the GDP data we collected only covers 1997 to 2019, while the emission data spans 1990 to 2020. This prevents us from relating the drop in emissions in 2020, which followed the economic crash from the COVID-19 pandemic, to GDP.

Another shortcoming is that only three policies are analysed. Policy success may vary within each category; however, we limited the size of our analysis to suit the scale of this project, and because many national policies do not fit the time scale or concepts we are addressing.
