Trial Lesson Plan: Data Analysis of India Temperature Data

Here is a lesson plan and tutorial for teaching or learning how to turn raw data into code, and that code into interactive graphs, using a dataset of long-term temperature records for India.

Introduction

The dataset contains temperature data recorded in India from 1901 to 2011. India observes three distinct seasons: summer, the rainy season (monsoon) and winter. This data can be used to study how temperature patterns in these seasons have changed and to identify when the first signs of those changes appeared.

The Python code in this lesson performs an exploratory analysis of the temperature data to summarize its main characteristics and find any trends.

About the Data

The dataset contains temperature data recorded in India from 1901 to 2011. Besides the year, its columns hold the annual temperature and the temperatures for January to February, March to May, June to September, and October to December. The dataset is freely available on the Open Government Data (OGD) Platform India.
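A minimal sketch of loading the dataset with pandas, assuming it has been downloaded as a CSV file. The column labels below follow the names used in this lesson and are an assumption; the headers in the downloaded file may be spelled differently, so adjust the list to match.

```python
import pandas as pd

# Column labels as used in this lesson (an assumption; the downloaded
# file may spell its headers differently, so adjust this list to match).
EXPECTED_COLUMNS = ["Year", "Annual", "Jan-Feb", "Mar-May", "Jun-Sept", "Oct-Dec"]


def load_temperatures(source):
    """Read the temperature CSV (a path or file-like object) into a DataFrame."""
    df = pd.read_csv(source)
    # Remove stray spaces around header names before checking them
    df.columns = df.columns.str.strip()
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    # Keep the expected columns in a fixed order
    return df[EXPECTED_COLUMNS]
```

For example, load_temperatures("temperatures.csv") would read a local copy of the data; the file name is hypothetical.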

 

About the Lesson Plan

  • Grade Level: High school, Undergraduate
  • Discipline: Environmental Sciences, Data Science
  • Topic(s) in Discipline: Basics of Data Analysis, Global Warming
  • Climate Topic: Introduction to Climate Change, Climate Variability Record
  • Location: India
  • Language(s): English

Suggested Questions

  1. What is climate change? What are the causes of global warming?
  2. What are the trends in the temperature of India over the past 100 years?
  3. Can you find a trendline for the given temperature records?

Algorithm

Here we look in depth at the code used in a Jupyter Notebook to plot interactive graphs with the Plotly library.

The Python code first imports the libraries needed to plot interactive graphs in a Jupyter Notebook. Once these libraries are loaded, we can prepare the arrays that store the values to be plotted.

We use the NumPy library to create arrays and store the values in them; these arrays then supply the x-axis and y-axis data for the plot. Once the input arrays have been passed to the figure stored in the variable “fig”, we give the whole plot a title and then give titles to the x and y axes. Calling .show() displays the plotted figure.

Plotly is used to plot interactive line graphs for the Annual, January-February, March-May, June-September and October-December temperature columns.

 

Python Code

Exploratory Data Analysis

This dataset consists of 111 entries and 6 columns: Year, Annual, Jan-Feb, Mar-May, Jun-Sept and Oct-Dec. There are no null or missing values, and all columns except Year have the float data type. We divide the exploratory analysis of this dataset into two parts: intuition about the dataset and data visualization.
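The structural checks described above can be reproduced with a few pandas calls; df.info() prints similar information in one go, while this sketch returns the numbers as a dictionary. The helper summarize_structure and the demo values are hypothetical; on the full dataset the lesson reports 111 rows, 6 columns and no missing values.

```python
import pandas as pd


def summarize_structure(df):
    """Report row and column counts, missing values and float columns."""
    return {
        "n_rows": df.shape[0],
        "n_cols": df.shape[1],
        # isnull() marks missing cells; summing twice counts them all
        "missing": int(df.isnull().sum().sum()),
        "float_cols": df.select_dtypes(include="float64").columns.tolist(),
    }


# Made-up three-row frame with the lesson's column names
demo = pd.DataFrame({
    "Year": [1901, 1902, 1903],
    "Annual": [29.1, 29.3, 29.0],
    "Jan-Feb": [23.9, 24.6, 24.1],
    "Mar-May": [31.2, 31.5, 30.9],
    "Jun-Sept": [31.3, 31.4, 31.2],
    "Oct-Dec": [27.4, 27.8, 27.2],
})
info = summarize_structure(demo)
```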

Intuition:

This part discusses in detail the basic intuition that can be gained by applying statistical functions from the pandas and NumPy libraries to the dataset. The describe() function reports summary statistics for each column: the count, mean, standard deviation, minimum, quartiles (the 50% quartile is the median) and maximum. The main observations are as follows.
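A short sketch of the describe() step on a made-up frame. The median appears as the row labelled "50%", so comparing the "mean" and "50%" rows checks, column by column, whether the mean lies below the median. The demo values are hypothetical.

```python
import pandas as pd

# Made-up values, not from the dataset
demo = pd.DataFrame({
    "Year": [1901, 1902, 1903],
    "Annual": [29.0, 29.4, 29.5],
    "Jan-Feb": [23.0, 24.0, 27.5],
})

# Summary statistics for the temperature columns only
stats = demo.drop(columns="Year").describe()

# True where a column's mean is below its median (the "50%" row)
mean_below_median = stats.loc["mean"] < stats.loc["50%"]
```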

Key Observations:

  1. The mean of each column is lower than its median (the 50th percentile).
  2. There is also a noticeably large gap between the 75th percentile and the maximum values of the Jan-Feb and Mar-May columns.
  3. Observations 1 and 2 suggest that the dataset contains outliers we should look out for.
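The lesson does not say how outliers should be identified. One common choice, swapped in here as an assumption, is the interquartile range (IQR) rule: values more than 1.5 times the IQR below the first quartile or above the third quartile are flagged. The helper name and the sample values are hypothetical.

```python
import pandas as pd


def iqr_outliers(series, k=1.5):
    """Return values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return series[(series < lower) | (series > upper)]


# Made-up Jan-Feb values, indexed by year, with one unusually high reading
jan_feb = pd.Series([24.1, 24.3, 24.5, 24.6, 24.8, 27.4],
                    index=[1901, 1902, 1903, 1904, 1905, 1906])
outliers = iqr_outliers(jan_feb)
```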

Data Visualization

1. Annual Temperature Line Plot: This line plot shows the annual temperature from 1901 to 2011. The temperature lies between 29 and 30 degrees Celsius. The line shows significant drops in 1920, 1950 and 2000, and significant rises in 1940, 1959 and 1999. In 1921 and 1961 the temperature was slightly below average, while in 2001 it spiked to 29.99 degrees Celsius (almost 30 degrees).

The main observation for the annual plot is that there has been a notable rise in temperature since 1981.
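Suggested question 3 asks for a trendline. One simple option, assumed here rather than taken from the lesson, is a least-squares straight line fitted with np.polyfit; its slope gives the warming rate in degrees per year. The series below is synthetic, built to rise by exactly 0.01 °C per year, so the fitted slope can be checked.

```python
import numpy as np


def linear_trend(years, temps):
    """Fit temp = slope * year + intercept by least squares."""
    # np.polyfit returns coefficients from the highest degree down
    slope, intercept = np.polyfit(years, temps, deg=1)
    return slope, intercept


years = np.arange(1901, 1911)
temps = 29.0 + 0.01 * (years - 1901)  # synthetic series, +0.01 °C per year
slope, intercept = linear_trend(years, temps)
per_century = slope * 100  # warming rate in °C per 100 years
```

On the real data, fitting only the rows with Year >= 1981 and comparing that slope with the full-period slope is one way to quantify the rise noted above.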


2. January-February Line Plot: This line plot shows temperatures for January and February. Temperatures for this time of year mostly lie between 23 and 27 degrees Celsius, with a median of 24.51 degrees Celsius. Temperature rises were seen in 1902, 1946, 1966 and 2006, while drops were seen in 1905, 1968 and 2008.

The highest temperature over the period was 27.44 degrees Celsius in 2006, and the lowest was 22.25 degrees Celsius in 1905.
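The highest and lowest values and their years can be looked up with pandas. This sketch compares against the column's max (or min) rather than using idxmax(), because idxmax() returns only the first matching row and would hide ties such as the June – September maximum reached in both 1987 and 2009. The helper name and demo values are hypothetical.

```python
import pandas as pd


def years_at(df, column, which="max"):
    """Return the column's max (or min) and every Year that reached it."""
    target = df[column].max() if which == "max" else df[column].min()
    years = df.loc[df[column] == target, "Year"].tolist()
    return target, years


# Made-up values with a tie for the highest reading
demo = pd.DataFrame({"Year": [1901, 1902, 1903, 1904],
                     "Jan-Feb": [24.2, 22.9, 25.1, 25.1]})
highest = years_at(demo, "Jan-Feb", "max")
lowest = years_at(demo, "Jan-Feb", "min")
```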

3. March – May Line Plot: This line plot shows temperatures for March, April and May, the hottest time of the year. Values mostly lie between 31 and 33 degrees Celsius, with a median of 31.46 degrees Celsius. The most frequent values in this column are 30.84, 31.17 and 31.89 degrees Celsius.

Temperature rises were seen in 1985, 2002 and 2011, while drops were seen in 1907, 1917, 1926, 1933 and 1957. The highest temperature over the period was 33.46 degrees Celsius in 2010, and the lowest was 29.92 degrees Celsius in 1907.
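The median and the most frequent values can be computed as below. Series.mode() returns every value that shares the highest count, so when several values tie, as reported for March – May, they come back as a list. The sample values are made up.

```python
import pandas as pd

# Made-up Mar-May readings, not from the dataset
mar_may = pd.Series([31.0, 31.2, 31.0, 31.5, 31.2, 32.1])

most_frequent = mar_may.mode().tolist()  # all values tied for the top count
median = mar_may.median()                # middle value (mean of the middle two here)
```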

4. June – September Line Plot: This line plot shows temperatures for June, July, August and September. Temperatures for this time of year mostly lie between 31.2 and 31.5 degrees Celsius. Temperature rises were seen in 1919, 1945, 1970, 1985 and 2011, while drops were seen in 1920, 1938, 1959 and 1979.

The highest temperature over the period was 32.24 degrees Celsius, reached in both 1987 and 2009, and the lowest was 30.2 degrees Celsius in 1956.

5. October – December Line Plot: This graph uses temperature readings for October, November and December, the last months of the year and the winter period in India. Temperatures mostly lie between 27 and 28 degrees Celsius. The highest recorded temperature is 28.5 degrees Celsius in 1995, and the lowest is 25.7 degrees Celsius in 1917. Spikes in temperature were recorded in 1922, 1941, 1955, 1999 and 2001, whereas drops were seen in 1919, 1961 and 2000.
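The lesson does not state how the rises and drops above were picked out; they appear to have been read from the plots. One way to make this reproducible, offered as an assumption, is to flag years whose value changed from the previous year by more than a chosen threshold. The helper name, the threshold and the demo values are hypothetical.

```python
import pandas as pd


def spikes_and_drops(df, column, threshold):
    """Years whose value rose or fell by more than `threshold` since the prior year."""
    change = df[column].diff()  # this year's value minus last year's (NaN for the first)
    spikes = df.loc[change > threshold, "Year"].tolist()
    drops = df.loc[change < -threshold, "Year"].tolist()
    return spikes, drops


# Made-up Oct-Dec values
demo = pd.DataFrame({"Year": [1901, 1902, 1903, 1904, 1905],
                     "Oct-Dec": [27.2, 27.3, 28.1, 27.4, 27.5]})
spikes, drops = spikes_and_drops(demo, "Oct-Dec", threshold=0.5)
```

A larger threshold keeps only the most dramatic year-to-year changes; the value is a judgment call rather than part of the lesson.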

Learning outcomes for Data Science:

  1. Learn basic exploratory analysis of a dataset: finding the number of records and columns, checking for null values, etc.
  2. Learn how to calculate the mean, standard deviation, and minimum and maximum values of various columns.
  3. Learn how to prepare a dataset and plot graphs using Python.
  4. Learn to use Python libraries such as NumPy, pandas and Matplotlib.
  5. Learn how to plot interactive graphs with the Plotly library.

Learning outcomes for climate change:

The tools in this lesson plan will enable students to:

  1. learn about climate change and global warming with the help of the analysis of India's long term temperature records
  2. use Python functions to calculate and describe temperature anomalies in India from the beginning of the 20th century to recent times (1901-2011)
  3. discuss how these changes suggest that the planet has warmed significantly since the beginning of the industrial age
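Temperature anomalies are usually expressed as departures from the mean over a reference (baseline) period. This sketch assumes a user-chosen baseline; a 30-year window is a common convention, and the two-year baseline and demo values here are only for illustration. The helper name is hypothetical.

```python
import pandas as pd


def anomalies(df, column, start, end):
    """Departure of each year's value from the mean over the baseline years."""
    in_baseline = df["Year"].between(start, end)  # inclusive on both ends
    baseline_mean = df.loc[in_baseline, column].mean()
    return df[column] - baseline_mean


# Made-up annual values
demo = pd.DataFrame({"Year": [1901, 1902, 1903, 1904],
                     "Annual": [29.0, 29.2, 29.5, 29.7]})
anom = anomalies(demo, "Annual", 1901, 1902)
```

Positive anomalies mean a year was warmer than the baseline average, negative ones cooler; plotting them with the line-plot helper idea used earlier makes long-term warming easier to see than raw temperatures.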