Lesson Plan: Exploratory Data Analysis using India Temperature Data

This lesson plan helps high school and undergraduate teachers of Mathematics or Data Science teach Exploratory Data Analysis using a dataset of long-term temperature records in India.

Introduction

Exploratory Data Analysis (EDA) is the process of examining a dataset to understand its main characteristics and extract insights from it.

The dataset used in this lesson plan contains temperature records for India from 1901 to 2011. India has three distinct seasons: summer, the rainy season (monsoon), and winter. The data can be used to study how temperature patterns have changed in each season and when those changes first began to appear. The Python code given below can be used to perform an exploratory analysis of the temperature data, summarize its main characteristics, and look for trends.

Contents

  1. Video Lecture: "Exploratory Data Analysis" by Prof. Patrick Meyer, Curry School of Education, University of Virginia, USA. The lecture is an introduction to exploratory data analysis that includes a discussion of descriptive statistics, graphs, outliers, and robust statistics.
  2. Classroom/Laboratory activity (30 min): A classroom or computer lab activity that includes Python code to practice Exploratory Data Analysis using a dataset of temperature records (in degrees Celsius) for India from 1901 to 2011.

Suggested Questions

  1. Why is exploratory data analysis important in data science?
  2. What is climate change? What are the causes of global warming?
  3. What are the trends in the temperature of India over the past 100 years?
  4. Can you find a trend for the given temperature records?

About the Lesson Plan

  • Grade Level: High school, Undergraduate
  • Discipline: Data Science, Mathematics, Environmental Sciences
  • Topic(s) in Discipline: Basics of Data Analysis, Exploratory Data Analysis, Global Warming
  • Climate Topic: Introduction to Climate Change, Climate Variability Record, Long-term climate records
  • Location: India
  • Language(s): English

About the Data

The dataset contains temperature records for India from 1901 to 2011. The columns are the annual temperature and the seasonal temperatures for January to February, March to May, June to September, and October to December. The dataset is available for free access at the Open Government Platform India.
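As a minimal sketch of what loading such a file might look like, the snippet below parses a few rows of CSV text with Python's standard csv module. The column names (YEAR, ANNUAL, JAN-FEB, and so on) are assumptions about the downloaded file, and the numbers are made-up placeholders rather than real observations. The helper name load_temperature_rows is hypothetical.

```python
import csv
import io

# Tiny stand-in for the downloaded CSV file.
# ASSUMPTION: the column names below; check them against the real file.
# The numbers are made-up placeholders, NOT real observations.
SAMPLE_CSV = """YEAR,ANNUAL,JAN-FEB,MAR-MAY,JUN-SEP,OCT-DEC
1901,29.0,24.0,31.0,31.0,27.0
1902,29.5,25.0,32.0,31.5,27.5
1903,28.5,23.5,30.5,31.0,26.5
"""

def load_temperature_rows(text):
    """Parse CSV text into a list of dicts with an int year and float temperatures."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {"YEAR": int(record["YEAR"])}
        for name, value in record.items():
            if name != "YEAR":
                row[name] = float(value)
        rows.append(row)
    return rows

rows = load_temperature_rows(SAMPLE_CSV)
columns = list(rows[0].keys())
print(f"{len(rows)} records, {len(columns)} columns: {columns}")
```

With the real file, the same function could be given the text read from disk. Students who already know pandas may prefer pandas.read_csv for this step.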

Video: Exploratory Data Analysis

Python Code

Exploratory Data Analysis

Here is a step-by-step guide to using this lesson plan in the classroom/laboratory. We have suggested these steps as a possible plan of action. You may customize the lesson plan according to your preferences and requirements.

Step 1: Introduction to Exploratory Data Analysis

Use the video lecture "Exploratory Data Analysis" by Prof. Patrick Meyer, Curry School of Education, University of Virginia, USA. The lecture is an introduction to exploratory data analysis that includes a discussion of descriptive statistics, graphs, outliers, and robust statistics.

Step 2: Hands-on Classroom/Computer Lab Activity

Conduct an activity to practice coding for Exploratory Data Analysis using Python. Guide the students through the code so that they understand the steps involved: importing the necessary libraries, loading the data, and plotting interactive graphs.

The adjoining Python code first imports the libraries needed to plot interactive graphs with Plotly in a Jupyter Notebook. Once the libraries are loaded, the NumPy library is used to create arrays that store the values to be plotted; these arrays supply the X-axis and Y-axis of each plot. Plotly is then used to draw interactive line graphs for the Annual, January-February, March-May, June-September, and October-December temperature columns.
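The steps described above might look roughly like the following sketch. It is not the lesson's own code: the helper name make_line_figure is hypothetical, the temperature values are made-up placeholders, and the Plotly import is placed inside the function so that the arrays can be prepared even where Plotly is not installed.

```python
import numpy as np

# Placeholder values only; in class, fill these arrays from the dataset columns.
years = np.arange(1901, 1906)                      # X-axis: 1901 to 1905
annual = np.array([29.0, 29.5, 28.5, 28.5, 28.0])  # Y-axis: made-up temperatures

def make_line_figure(x, y, title):
    """Build an interactive Plotly line chart (requires the plotly package)."""
    import plotly.graph_objects as go  # imported lazily; only needed for plotting
    fig = go.Figure(go.Scatter(x=x, y=y, mode="lines+markers", name=title))
    fig.update_layout(title=title, xaxis_title="Year",
                      yaxis_title="Temperature (°C)")
    return fig

# In a Jupyter Notebook cell:
# make_line_figure(years, annual, "Annual temperature").show()
```

The same function can be called once per column (Annual, January-February, and so on) to produce one interactive plot for each.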

Step 3: Data Analysis and Suggested Discussion

Once the code runs without errors, encourage the students to analyze the data using the plots. The analysis can be run for the annual temperature and for the seasonal temperatures in the intervals January-February, March-May, June-September, and October-December. Help the students to note the changes in the average, highest, and lowest temperature values over the period of 111 years.
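To check what the plots suggest, students could compute the average, highest, and lowest values directly. The sketch below uses only the Python standard library; the function name summarize and the readings are illustrative placeholders, not values from the dataset.

```python
from statistics import mean

# Made-up placeholder readings keyed by year (NOT real data).
annual_by_year = {1901: 28.9, 1902: 29.2, 1903: 28.5, 1904: 28.4, 1905: 28.3}

def summarize(series):
    """Return the mean plus the (year, value) pairs of the highest and lowest readings."""
    hottest_year = max(series, key=series.get)
    coldest_year = min(series, key=series.get)
    return {
        "mean": round(mean(series.values()), 2),
        "highest": (hottest_year, series[hottest_year]),
        "lowest": (coldest_year, series[coldest_year]),
    }

summary = summarize(annual_by_year)
print(summary)
```

Running the same function on each seasonal column gives a quick table of figures to compare against the line plots.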

A detailed Data Analysis and Discussion is suggested below.

Data Analysis and Suggested Discussion

We have suggested these points as a possible plan of action for conducting a discussion in your class. You may customize the lesson plan according to your preferences and requirements.

1. Annual Temperature Line Plot: The line plot of annual temperatures over 110 years (1901-2011) shows that the temperature ranges from 28 to 30 degrees Celsius. The line shows significant drops in 1920, 1950, and 2000, and significant rises in 1940, 1959, and 1999. In 1921 and 1961, the temperature was slightly below average, whereas in 2001 it spiked to 29.99 degrees Celsius (almost 30 degrees Celsius).

Help your students to note that there has been a notable rise in temperature since 1981.
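One simple way to test a claim like this is to compare the mean temperature before a chosen year with the mean from that year onward. The sketch below illustrates the idea on a synthetic series that is flat until 1980 and then rises; the series and the function name period_means are assumptions for illustration, not the real data or the lesson's code.

```python
from statistics import mean

def period_means(series, split_year):
    """Mean of the readings before split_year and of those from split_year onward."""
    before = [v for year, v in series.items() if year < split_year]
    after = [v for year, v in series.items() if year >= split_year]
    return mean(before), mean(after)

# Synthetic placeholder series (NOT real data): flat until 1980, then rising.
synthetic = {year: 29.0 + max(0, year - 1980) * 0.02 for year in range(1901, 2012)}

before_1981, from_1981 = period_means(synthetic, 1981)
print(f"Mean before 1981: {before_1981:.2f} °C, from 1981: {from_1981:.2f} °C")
```

With the real annual column in place of the synthetic series, a clearly higher second mean would support the observation from the plot.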

Interactive Plots

2. January-February Line Plot: The line plot of temperature for January and February shows that the temperature ranges from 23 to 27 degrees Celsius at this time of the year. The median value is 24.51 degrees Celsius. Temperature rises were seen in 1902, 1946, 1966, and 2006, whereas drops were seen in 1905, 1968, and 2008.

Suggest that your students note that the highest temperature recorded over the period was 27.44 degrees Celsius in 2006 and the lowest was 22.25 degrees Celsius in 1905.

3. March-May Line Plot: The adjoining line plot shows the temperature for March, April, and May. For this interval, the temperature ranges from 31 to 33 degrees Celsius, making it the hottest time of the year. The median temperature is 31.46 degrees Celsius. The most frequent values in the dataset are 30.84, 31.17, and 31.89 degrees Celsius.

Temperature rises were seen in 1985, 2002, and 2011, whereas drops were seen in 1907, 1917, 1926, 1933, and 1957. The highest temperature recorded over the period was 33.46 degrees Celsius in 2010, and the lowest was 29.92 degrees Celsius in 1907.

Help the students to compare the median, lowest, and highest values of the temperature for other time intervals.
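A comparison like this can be sketched with the standard statistics module, which provides median and multimode (the most frequent value or values). The readings below are made-up placeholders for two seasons, and describe_season is a hypothetical helper name.

```python
from statistics import median, multimode

# Made-up placeholder readings per season (NOT real data).
seasons = {
    "JAN-FEB": [24.1, 24.5, 24.5, 25.0, 23.8],
    "MAR-MAY": [31.2, 31.5, 30.8, 31.5, 32.0],
}

def describe_season(values):
    """Median, lowest, highest, and most frequent value(s) of one season."""
    return {
        "median": median(values),
        "lowest": min(values),
        "highest": max(values),
        "most_frequent": multimode(values),
    }

comparison = {name: describe_season(values) for name, values in seasons.items()}
for name, stats in comparison.items():
    print(name, stats)
```

multimode returns a list because several values can tie for the highest frequency, as the three most frequent March-May values in the text suggest.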

4. June-September Line Plot: The line plot shows the temperature for June, July, August, and September. The temperature ranges from 31.2 to 31.5 degrees Celsius at this time of the year. Temperature rises were seen in 1919, 1945, 1970, 1985, and 2011, whereas drops were seen in 1920, 1938, 1959, and 1979.

The highest temperature recorded over the period was 32.24 degrees Celsius, in both 1987 and 2009, and the lowest was 30.2 degrees Celsius in 1956.

5. October-December Line Plot: The adjoining plot shows the temperature readings for October, November, and December, which is the winter period in India. The temperature ranges from 27 to 28 degrees Celsius for this interval. The highest recorded temperature is 28.5 degrees Celsius in 1995, while the lowest is 25.7 degrees Celsius in 1917. Spikes in temperature were recorded in 1922, 1941, 1955, 1999, and 2001, whereas drops were seen in 1919, 1961, and 2000.

Help the students to note the change in the lowest values of the temperature over the period of 111 years.
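One possible way to make this change visible is to group the readings by decade and take the lowest value in each group. The sketch below does this with a plain dictionary; the readings are made-up placeholders and lowest_by_decade is a hypothetical helper name.

```python
from collections import defaultdict

def lowest_by_decade(series):
    """Map each decade's starting year to the lowest reading in that decade."""
    lows = defaultdict(lambda: float("inf"))
    for year, value in series.items():
        decade = year // 10 * 10          # e.g. 1917 -> 1910
        lows[decade] = min(lows[decade], value)
    return dict(sorted(lows.items()))

# Made-up placeholder readings keyed by year (NOT real data).
readings = {1911: 26.1, 1915: 25.9, 1922: 26.4, 1927: 26.2, 1933: 26.6}

decade_lows = lowest_by_decade(readings)
print(decade_lows)
```

If the decade lows climb over time in the real data, that indicates that even the coolest years are becoming warmer.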

Learning outcomes for Data Science:

  1. Learn about basic Exploratory Analysis of a dataset to find the number of records and columns, identify obvious errors, understand patterns within the data, detect outliers or anomalous events, and find interesting relations among the variables.
  2. Learn how to calculate the mean, standard deviation, and minimum and maximum values of various columns.
  3. Learn about the overall distribution of data over a given range.
  4. Learn how to prepare a dataset and plot graphs using Python.
  5. Learn the use of libraries like NumPy, Pandas and Matplotlib for Python.
  6. Learn how to plot interactive graphs on a platform called 'Plotly'.
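Several of these outcomes (counting records, computing the mean and standard deviation, detecting outliers) can be illustrated in a few lines. The sketch below uses the common 1.5 x IQR rule for outliers, which is one convention among several and is not prescribed by this lesson plan. The readings are made-up placeholders with one deliberately planted extreme value, and iqr_outliers is a hypothetical helper name.

```python
from statistics import mean, quantiles, stdev

def iqr_outliers(series):
    """Return {year: value} for readings more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = quantiles(series.values(), n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return {year: v for year, v in series.items() if v < low or v > high}

# Made-up placeholder readings (NOT real data); 31.5 is planted as an outlier.
readings = {2001: 28.8, 2002: 28.9, 2003: 29.0, 2004: 29.0,
            2005: 29.1, 2006: 29.1, 2007: 29.2, 2008: 31.5}

print(f"{len(readings)} records, mean {mean(readings.values()):.2f}, "
      f"standard deviation {stdev(readings.values()):.2f}")
outliers = iqr_outliers(readings)
print("Outliers:", outliers)
```

A flagged value is only a candidate for closer inspection; it may be a data-entry error or a genuinely unusual year.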

Learning outcomes for climate change:

The tools in this lesson plan will enable students to:

  1. learn about climate change and global warming with the help of the analysis of India's long-term temperature records
  2. use Python functions to calculate and describe temperature trends in India from the beginning of the 20th century to recent times (1901-2011)
  3. discuss how these changes suggest that the planet has warmed significantly since the beginning of the industrial age
  4. discuss the overall trends of temperature in summer, monsoon, and winter seasons in India
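To describe a temperature trend with a single number, students could fit a straight line through the yearly values and read off its slope. The sketch below implements ordinary least squares by hand so that each step is visible; the function name linear_trend and the synthetic series (a steady rise of 0.01 °C per year) are assumptions for illustration, not results from the dataset.

```python
def linear_trend(years, temps):
    """Ordinary least-squares slope (°C per year) and intercept of temps against years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(temps) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, temps))
    sxx = sum((x - mean_x) ** 2 for x in years)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic placeholder series (NOT real data): rises by 0.01 °C every year.
years = list(range(1901, 2012))
temps = [28.5 + 0.01 * (year - 1901) for year in years]

slope, intercept = linear_trend(years, temps)
print(f"Trend: {slope * 10:.2f} °C per decade")
```

Applying the same function to the annual column and to each seasonal column lets students compare how fast summer, monsoon, and winter temperatures have changed.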