As an **undergraduate Mathematics** or **Data Science** teacher, you can use this set of computer-based tools to teach **Introductory Predictive Analysis**, specifically Exploratory Data Analysis with Linear Regression.

### Introduction

This lesson plan will help you teach **Introductory Predictive Analysis** through an **Exploratory Data Analysis with Linear Regression** assignment. The lesson plan includes **a hands-on computer-based classroom activity** conducted on a dataset of the annual temperature records of Mumbai, a coastal city in Western India, for the period 1842 to 2019. The activity includes hands-on **Python code** and **a set of inquiry-based questions** that will enable your students to apply their understanding of **scatter plots, trendlines, moving averages, heatmaps, correlation coefficients, linear regression, and regression equations**.

Thus, the use of this lesson plan allows you to integrate the teaching of a climate science topic with a core topic in **Mathematics, Statistics, and Data Science**.

### Questions

Use this lesson plan to help your students find answers to:

- Use an example to describe the time series analysis of a dataset spanning 177 years.
- Use an example to describe exploratory data analysis with linear regression analysis.
- What are heatmaps and correlation analysis for a given dataset?
- Use regression analyses to describe how the seasonal and annual temperatures of Mumbai have changed over time.
- Discuss reasons for changes in temperature patterns and the impact of climate change on temperatures of various cities in the world.

### Location of Mumbai in India

### About the Lesson Plan

Grade Level | Undergraduate |

Discipline | Mathematics, Data Science |

Topic(s) in Discipline | Scatter Plots, Correlation Coefficients, Regression Equations, Linear Regression, OLS and LOWESS Trendlines, Heatmap |

Climate Topic | Climate and the Atmosphere, Climate Variability Record |

Location | Global |

Language(s) | English |

Access | Online, Offline |

Computer Skills Required | Intermediate |

Approximate Time Required | 60-90 min |

### Contents

Teaching Module (25 min) | A teaching module that explains the basics of scatter plots, correlation coefficients, regression equations, and linear regression |

Video micro-lecture (14 min) | A video micro-lecture that introduces simple linear regression |

Classroom/Laboratory activity (30 min) | A classroom activity with Python code for applying Exploratory Data Analysis and Linear Regression to a dataset of the annual and seasonal temperatures of Mumbai for the period 1842 to 2019 |

### Video

Here is a step-by-step guide to using this lesson plan in the classroom/laboratory. We have suggested these steps as a possible plan of action. You may customize the lesson plan according to your preferences and requirements.

**Step 1: Topic introduction and discussion:**

1. Use the teaching module, ‘Introduction-Linear Regression and Correlation’ by OpenStax™, Rice University (for High School level) or ‘Chapter-3: Linear Regression’ provided by Ramesh Sridharan, Massachusetts Institute of Technology (for Undergraduate level), to introduce these topics of basic statistics.

2. Navigate to the sub-sections within the module that cover the basics of scatter plots, correlation coefficients, regression equations, and linear regression.

3. Use the in-built practice exercises and quizzes to evaluate your students’ understanding of the topics.

Find Linear Regression Teaching Module PDF here

**Step 2: Develop the topic further:**

Use the video micro-lecture, ‘Introduction to Simple Linear Regression’ by dataminingincae, INCAE Business School, for a basic introduction to simple linear regression and terms such as dependent variable, independent variable, regression line, and regression coefficients.

**Step 3: Extend understanding by practicing Hands-on Python code:**

1. Use the provided dataset mumbai-temp-data.csv and the associated Python Notebook for Exploratory Data Analysis with Linear Regression.

2. The raw data was collected at the Colaba meteorological station for the period 1842 to 2019, a span of 177 years. However, the usable data covers 1878 to 2019, which is still a span of 141 years. Temperatures for all twelve months are available for each of these years.

Data Source: Colaba Meteorological Station, Mumbai, India

3. Use the Python Notebook and Dataset to:

**Part 1: Load and prepare the data for use**

- Read the dataset into a DataFrame.
- Inspect the basics of the dataset, such as its dimensions, data types, and memory usage.
- Draw scatter plots of the annual temperature and of the seasonal temperatures for January-February, March-May, June-September, and October-December.
- Use the NumPy library to convert the DataFrame to a NumPy array for use in the later steps.
- Calculate moving averages to obtain a smoother curve when plotting the readings. The moving averages are calculated over a five-year window for all the seasonal columns.

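
The steps of Part 1 can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`Year`, `Annual`, `Oct-Dec`) and the synthetic values are assumptions, because the layout of mumbai-temp-data.csv is not described here. With the real file, the synthetic DataFrame would be replaced by a `pd.read_csv` call.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for mumbai-temp-data.csv. The real column names are
# not given in this lesson plan, so "Year", "Annual" and "Oct-Dec" are assumed.
rng = np.random.default_rng(0)
years = np.arange(1878, 2020)
df = pd.DataFrame({
    "Year": years,
    "Annual": 26.5 + 0.008 * (years - 1878) + rng.normal(0, 0.3, years.size),
    "Oct-Dec": 26.0 + 0.010 * (years - 1878) + rng.normal(0, 0.4, years.size),
})
# With the real file this would be: df = pd.read_csv("mumbai-temp-data.csv")

# Basic facts about the dataset: dimensions, data types, memory usage.
print(df.shape)
print(df.dtypes)
df.info(memory_usage="deep")

# Five-year moving average of each temperature column.
# The first four rows of each new column are NaN (the window is incomplete).
for col in ["Annual", "Oct-Dec"]:
    df[f"{col}_MA5"] = df[col].rolling(window=5).mean()

# Convert the DataFrame to a NumPy array for later numerical steps.
data = df.to_numpy()
print(data.shape)
```

A scatter plot of `df["Year"]` against `df["Annual"]` or `df["Annual_MA5"]` can then be drawn with any plotting library the class already uses.
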
**Part 2: Exploratory Data Analysis: Know your data**

- Handling missing values: A first look at the dataset shows that some rows have missing values in the moving-average columns. Drop those rows first, which reduces the data from 143 rows to 138 rows.
- Basic statistical functions: Next, compute basic summary statistics to get an intuition for where to look in the data. Find the mean, median, minimum, and maximum of each column to check for outliers. The 75th percentile and the maximum of the moving-average columns are close to each other, so no further outlier detection or removal is needed.
- Null values: The null values in the moving-average columns were removed so that they do not affect later steps. In addition, the data contained 10 missing values coded as -999, which were imputed with the averages of their columns.

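
The cleaning steps of Part 2 can be sketched as follows. The toy DataFrame, its `Jan` column, and its values are hypothetical and only illustrate the technique: sentinel values of -999 are treated as missing and imputed with the column mean, rows left incomplete by the moving average are dropped, and summary statistics are inspected.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column name and values are illustrative only.
df = pd.DataFrame({
    "Year": range(2000, 2010),
    "Jan": [24.1, 24.3, -999, 24.0, 24.6, 24.4, 24.8, -999, 24.5, 24.7],
})

# Treat the -999 sentinel as missing, then impute with the column mean
# (the mean is computed over the non-missing values only).
df["Jan"] = df["Jan"].replace(-999, np.nan)
df["Jan"] = df["Jan"].fillna(df["Jan"].mean())

# A five-year moving average leaves NaN in the first four rows; drop them.
df["Jan_MA5"] = df["Jan"].rolling(window=5).mean()
clean = df.dropna().reset_index(drop=True)

# Summary statistics: compare the 75% quantile with the maximum
# as a quick check for outliers.
summary = clean[["Jan", "Jan_MA5"]].describe()
print(summary.loc[["mean", "50%", "min", "75%", "max"]])
```

Imputing before computing the moving average keeps the sentinel values from distorting the smoothed columns.
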
**Part 3: Heatmaps for correlations**

- Let the students create visualizations of the feature correlations, i.e. heatmaps. The first heatmap covers all 12 months, and the second shows the correlations between the moving-average columns of all seasons. Comparing the two helps show the difference that taking moving averages makes.
- The first heatmap shows that February, May, September, October, November, and December have a high positive correlation with the annual column and contribute the most to it. The second heatmap shows a very high positive correlation between the October-December moving average and the annual temperature column. The October-December column therefore contributes the most to the annual column, which suggests that the rise in annual temperature is driven more by the October-December (winter) months than by the summer months. No pair of columns has a negative correlation, so no inverse relationship exists among them.

**Part 4: Trendlines**

- Let the students plot annual and seasonal temperature scatter plots using the five-year moving-average columns.
- Let them divide the data into 50-year intervals to understand the trend in each interval.
- Encourage the students to analyze the trendlines and discuss their observations.
- Discuss the data ranges for the various seasons.

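
The computations behind Parts 3 and 4 can be sketched as follows: a Pearson correlation matrix (the numbers a heatmap displays) and an ordinary least squares trendline for each 50-year interval. The data are synthetic, and the column names and interval boundaries are assumptions chosen for illustration. The last interval is shorter than 50 years because the series ends in 2019.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; column names are assumptions, not the real schema.
rng = np.random.default_rng(1)
years = np.arange(1878, 2020)
trend = 0.008 * (years - 1878)
df = pd.DataFrame({
    "Year": years,
    "Annual": 26.5 + trend + rng.normal(0, 0.2, years.size),
    "Mar-May": 28.5 + trend + rng.normal(0, 0.4, years.size),
    "Oct-Dec": 26.0 + trend + rng.normal(0, 0.3, years.size),
})

# Part 3: Pearson correlation matrix of the temperature columns.
# A heatmap is a colour-coded rendering of this matrix.
corr = df[["Annual", "Mar-May", "Oct-Dec"]].corr()
print(corr.round(2))

# Part 4: fit an OLS trendline (degree-1 polynomial) per interval.
edges = [1878, 1928, 1978, 2020]  # hypothetical interval boundaries
slopes = {}
for start, end in zip(edges[:-1], edges[1:]):
    part = df[(df["Year"] >= start) & (df["Year"] < end)]
    slope, intercept = np.polyfit(
        part["Year"].to_numpy(), part["Annual"].to_numpy(), deg=1
    )
    slopes[f"{start}-{end - 1}"] = slope
    print(f"{start}-{end - 1}: slope = {slope:.4f} degrees per year")
```

If seaborn is available, passing `corr` to `seaborn.heatmap` is one way to draw the heatmap itself. Comparing the per-interval slopes gives students a concrete basis for discussing how the trend changes over time.
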
**Part 5: Linear Regression for Predictions**

- Find the regression coefficients for simple linear regression.
- Plot the scatter plot and the regression line given by the fitted coefficients.
- Calculate the RMSE (root-mean-squared error) values.
- Discuss how well the regression line describes the data points for the total time period and for every 50 years.
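
The regression steps of Part 5 can be sketched with the closed-form least squares formulas for a line y = b0 + b1·x, where b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄. The data below are synthetic and the variable names are illustrative; this is one possible way to compute the coefficients and RMSE, not necessarily the method used in the notebook.

```python
import numpy as np

# Synthetic yearly temperatures (illustrative values, not the Mumbai record).
rng = np.random.default_rng(2)
x = np.arange(1878, 2020, dtype=float)                     # independent variable: year
y = 26.5 + 0.008 * (x - 1878) + rng.normal(0, 0.3, x.size)  # dependent variable

# Closed-form simple linear regression coefficients.
x_mean, y_mean = x.mean(), y.mean()
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)  # slope
b0 = y_mean - b1 * x_mean                                             # intercept

# Predictions along the regression line and the root-mean-squared error.
y_pred = b0 + b1 * x
rmse = np.sqrt(np.mean((y - y_pred) ** 2))

print(f"slope = {b1:.4f}, intercept = {b0:.2f}, RMSE = {rmse:.3f}")
```

Repeating the same calculation on each 50-year subset gives the per-interval coefficients and RMSE values that the discussion in the last bullet calls for.
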

4. Encourage your students to answer topical questions by applying their understanding of scatter plots, correlation coefficients, heatmap, moving averages, trendlines, and linear regression.

5. Use the regression analyses performed to initiate a discussion on the increase in temperatures from 1980 to 2020 due to anthropogenic activities, which is one major reason behind global climate change.

Suggested questions/assignments for learning evaluation:

- Use an example to describe the time series analysis of a dataset spanning 177 years.
- Use an example to describe exploratory data analysis with linear regression analysis.
- What are heatmaps and correlation analysis for a given dataset?
- Use regression analyses to describe how the seasonal and annual temperatures of Mumbai have changed over time.
- Discuss reasons for changes in temperature patterns and the impact of climate change on temperatures of various cities in the world.
- Try applying similar Python code to long-term temperature data from other cities to explore the temperature trends over the past and current centuries.

The tools in this lesson plan will enable students to:

- learn about trendlines, linear regression, and correlation
- understand linear regression equations and related terms such as correlation coefficients
- use linear regression analyses to describe the temperature rise from the twentieth century to recent times (1900-2019)
- discuss how these changes suggest that the planet has experienced a significant increase in temperature over the last 50 years

1 | Teaching Module, “Introduction- Linear Regression and Correlation” | Provided by OpenStax™, Rice University |

2 | Teaching Module, “Chapter 3: Linear Regression” | Provided by Ramesh Sridharan, MIT from ‘Statistics for Research Projects’ |

3 | Video micro-lecture, ‘Introduction to Simple Linear Regression’ | by dataminingincae, INCAE Business School |

4 | Dataset of Mumbai temperature records | Colaba Meteorological Station, Mumbai, India |

5 | Images | Ontheworldmap |

6 | Python code | Ashan Virdikar and Naveen Anupoju, India |

### Python Notebook
