This lesson plan will help you teach **Introductory Predictive Analysis** through an **Exploratory Data Analysis with Linear Regression** assignment. The lesson plan includes **a hands-on, computer-based classroom activity** conducted on a dataset of the annual temperature records of Mumbai, a coastal city in Western India, for the period 1842 to 2019. This activity includes hands-on **Python code** and **a set of inquiry-based questions** that will enable your students to apply their understanding of **scatter plots, trendlines, moving averages, heatmaps, correlation coefficients, linear regression, and regression equations.**

Thus, the use of this lesson plan allows you to integrate the teaching of a climate science topic with a core topic in **Mathematics, Statistics, and Data Science.**

**The tools in this lesson plan will enable students to:**

- Learn about trendlines, linear regression, and correlation
- Understand linear regression equations and related terms such as correlation coefficients
- Use linear regression analyses to describe the temperature rise from the twentieth century to recent times (1900-2019)
- Discuss how these changes suggest that the planet has faced a significant increase in temperature over the last 50 years

About

Grade Level | Undergraduate |

Discipline | Mathematics and Statistics, Earth Sciences |

Topic(s) in Discipline | Introduction to Statistics, Data Science, Trend Analysis, Computer Programming, Recent Climate Change, Scatter Plots, Correlation Coefficients, Regression Equations, Linear Regression, OLS and LOWESS Trendlines, Heatmap |

Climate Topic | Climate Variability Record |

Location | Global, Asia, India |

Language(s) | English |

Access | Online / Offline |

Approximate Time Required | 60-90 min |

Step-by-Step User Guide

Here is a step-by-step guide to using this lesson plan in the classroom/laboratory. We have suggested these steps as a possible plan of action. You may customize the lesson plan according to your preferences and requirements.

Teaching Module (25 min)

- Use the teaching module, ‘Introduction- Linear Regression and Correlation’ by OpenStax™, Rice University (for High School level), or ‘Chapter 3: Linear Regression’ provided by Ramesh Sridharan, Massachusetts Institute of Technology (for Undergraduate level), to introduce these topics of basic statistics.
- Navigate to the sub-sections within the module to cover the basics of scatter plots, correlation coefficients, regression equations, and linear regression.
- Use the in-built practice exercises and quizzes to evaluate your students’ understanding of the topics.

Video micro-lecture (15 min)

Use the video micro-lecture, ‘Introduction to Simple Linear Regression’ by dataminingincae, INCAE Business School, for a basic introduction to Simple Linear Regression and terms like dependent variable, independent variable, regression line, and regression coefficients.

Classroom/Laboratory activity (30 min)

- Use the provided Dataset mumbai-temp-data.csv and associated Python Notebook for Exploratory Data Analysis with Linear Regression.
- The raw data was collected at the Colaba meteorological station for the period 1842 to 2019, a span of 177 years. However, the usable data covers 1878 to 2019, which still spans 141 years. Temperatures for all twelve months are available for each of these years.
- Use the Python Notebook and Dataset to:

**Part 1: Load and prepare the data for use**

- Read the dataset into a DataFrame.
- Review the basics of the dataset, such as its dimensions, data types, and memory usage.
- Plot scatter plots of the annual temperature and of the seasonal temperatures for January-February, March-May, June-September, and October-December.
- Use the NumPy library to convert the DataFrame to a NumPy array for use in later steps.
- Calculate moving averages to obtain a smoother curve when plotting the readings. The moving averages are calculated over a five-year window for all the seasonal columns.
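The Part 1 steps can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`YEAR`, `JAN_FEB`, and so on) are assumptions, and a small synthetic DataFrame stands in for `mumbai-temp-data.csv` so that the sketch runs on its own.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for mumbai-temp-data.csv. The column names are
# hypothetical; check the real file's headers before adapting this.
rng = np.random.default_rng(0)
years = np.arange(1878, 1898)
df = pd.DataFrame({
    "YEAR": years,
    "JAN_FEB": 24.0 + rng.normal(0, 0.3, years.size),
    "MAR_MAY": 28.5 + rng.normal(0, 0.3, years.size),
    "JUN_SEP": 27.5 + rng.normal(0, 0.3, years.size),
    "OCT_DEC": 26.5 + rng.normal(0, 0.3, years.size),
})
# With the real file this would be: df = pd.read_csv("mumbai-temp-data.csv")

print(df.shape)               # dimensions as (rows, columns)
print(df.dtypes)              # data type of each column
df.info(memory_usage="deep")  # summary that includes memory usage

# Five-year moving average for every seasonal column.
seasonal = ["JAN_FEB", "MAR_MAY", "JUN_SEP", "OCT_DEC"]
for col in seasonal:
    df[f"{col}_MA5"] = df[col].rolling(window=5).mean()

# Convert the DataFrame to a NumPy array for later numerical steps.
values = df.to_numpy()
```

A trailing five-year window leaves the first four rows of each moving-average column empty, which is why missing values appear in those columns in Part 2.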

**Part 2: Exploratory Data Analysis: Know your data**

- Handling missing values: A first look at the dataset shows that some rows have missing values in the moving-average columns. Drop these rows first, which reduces the data from 143 rows to 138 rows.
- Basic statistical functions: Next, look at some basic summary statistics to build intuition about where to look in the data. Find the mean, median, minimum, and maximum values of each column to check for outliers. The difference between the 75th percentile and the maximum of the moving-average columns is small, so no further outlier identification or removal is needed.
- The null values in the moving-average columns were removed so that they do not affect later steps. In addition, a total of 10 missing values (coded as -999) were present in the data and had to be imputed using the averages of their columns.
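One possible way to carry out these cleaning steps is sketched below. The `-999` sentinel and the column-mean imputation come from the description above; the tiny DataFrame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical ten-year sample with two -999 sentinel values.
df = pd.DataFrame({
    "YEAR": np.arange(2000, 2010),
    "JAN": [24.1, -999.0, 24.5, 24.3, 24.7, 24.4, 24.6, 24.8, -999.0, 24.9],
})

# Treat the -999 sentinel as missing, then impute with the column mean
# (the mean is computed over the non-missing values only).
df["JAN"] = df["JAN"].replace(-999.0, np.nan)
df["JAN"] = df["JAN"].fillna(df["JAN"].mean())

# A five-year moving average leaves leading rows empty; drop those rows.
df["JAN_MA5"] = df["JAN"].rolling(window=5).mean()
clean = df.dropna().reset_index(drop=True)

# Summary statistics: compare the 75% row with the max row to spot outliers.
summary = clean.describe()
print(summary.loc[["mean", "50%", "min", "75%", "max"]])
```

Replacing the sentinel before computing any averages matters: a single -999 left in place would drag both the column mean and every moving-average window that contains it far below any realistic temperature.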

**Part 3: Heatmaps for correlations**

Now let the students create visualizations of feature correlations, i.e., heatmaps. The first heatmap covers all 12 months, and the second shows the correlations between the moving-average columns of all seasons. Comparing the two helps determine the difference made by taking moving averages.

- The first heatmap shows that February, May, September, October, November, and December have high positive correlations with the annual column and contribute the most to it.
- The second heatmap shows a very high positive correlation between the October-December moving average and the annual temperature column. The Oct-Dec column thus contributes the most to the annual column, which suggests that the rise in annual temperature is driven more by the Oct-Dec (winter) months than by the summer months.
- None of the columns are negatively correlated with one another, i.e., no inverse relationship exists between any pair of columns.
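A correlation heatmap of this kind might be produced as sketched here. The sketch uses pandas for the Pearson correlation matrix and plain Matplotlib for the plot, which is an assumption; the notebook may use a different plotting library. The data and column names are synthetic.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit in a notebook
import matplotlib.pyplot as plt

# Synthetic seasonal series sharing a warming trend (illustrative only).
rng = np.random.default_rng(1)
n = 60
trend = np.linspace(0, 1, n)
df = pd.DataFrame({
    "JAN_FEB": 24 + 1.0 * trend + rng.normal(0, 0.2, n),
    "MAR_MAY": 28 + 0.5 * trend + rng.normal(0, 0.2, n),
    "OCT_DEC": 26 + 1.5 * trend + rng.normal(0, 0.2, n),
})
df["ANNUAL"] = df.mean(axis=1)  # stand-in for the annual column

# Pairwise Pearson correlation coefficients between all columns.
corr = df.corr()

# Draw the matrix as a heatmap and annotate each cell with its value.
fig, ax = plt.subplots(figsize=(5, 4))
im = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
for i in range(corr.shape[0]):
    for j in range(corr.shape[1]):
        ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")
fig.colorbar(im, ax=ax, label="Pearson r")
fig.tight_layout()
```

When discussing the result with students, it is worth noting that an annual mean is computed from the monthly or seasonal values, so some positive correlation between each of them and the annual column is expected by construction.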

**Part 4: Trendlines**

- Let the students plot annual and seasonal temperature scatter plots using the data columns for the 5-year moving averages.
- Let them divide the data into 50-year intervals to understand the trend within each interval.
- Encourage the students to analyze the trendlines and discuss their observations.
- Discuss the data ranges for the various seasons.
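The 50-year interval comparison can be sketched as below. This version fits an ordinary least-squares line per interval with `np.polyfit`, which is one of several ways to obtain a trendline and not necessarily what the notebook does; the helper `interval_trends` and the synthetic series are hypothetical.

```python
import numpy as np

# Synthetic annual series with a small upward trend (illustrative only).
rng = np.random.default_rng(2)
years = np.arange(1880, 2020)
temps = 26.0 + 0.005 * (years - 1880) + rng.normal(0, 0.2, years.size)

def interval_trends(years, temps, span=50):
    """Fit a least-squares line to each `span`-year block of the series.

    Returns a list of (first_year, last_year, slope) tuples, where the
    slope is in temperature units per year.
    """
    results = []
    for start in range(int(years.min()), int(years.max()) + 1, span):
        mask = (years >= start) & (years < start + span)
        if mask.sum() < 2:
            continue  # a line needs at least two points
        slope, intercept = np.polyfit(years[mask], temps[mask], deg=1)
        results.append((start, int(years[mask].max()), float(slope)))
    return results

for first, last, slope in interval_trends(years, temps):
    print(f"{first}-{last}: {slope * 10:+.3f} deg C per decade")
```

Expressing the slope per decade makes the intervals easier to compare in discussion. The last interval is shorter than 50 years whenever the record length is not a multiple of the span, so its slope rests on fewer points.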

**Part 5: Linear Regression for Predictions**

- Find the regression coefficients for Simple Linear Regression.
- Plot the scatter plot and the regression line using the estimated coefficients.
- Discuss how well the regression line describes the data points for the total time period and for each 50-year interval.
- Encourage your students to answer topical questions by applying their understanding of scatter plots, correlation coefficients, heatmap, moving averages, trendlines, and linear regression.
- Use the regression analyses performed to initiate a discussion on the increase in temperatures from 1980 to 2020 due to anthropogenic activities, which is one major reason behind global climate change.
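For reference, the regression coefficients in Part 5 can be computed directly from the standard least-squares formulas: for a line ŷ = a + b·x, the slope is b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and the intercept is a = ȳ − b·x̄. The sketch below is a minimal illustration with made-up temperature values; the function name `simple_linear_regression` is hypothetical and not taken from the notebook.

```python
import numpy as np

def simple_linear_regression(x, y):
    """Return (slope, intercept, r) for the least-squares line y = a + b*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    sxx = np.sum((x - x_mean) ** 2)             # spread of x
    syy = np.sum((y - y_mean) ** 2)             # spread of y
    sxy = np.sum((x - x_mean) * (y - y_mean))   # co-variation of x and y
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    r = sxy / np.sqrt(sxx * syy)                # Pearson correlation coefficient
    return slope, intercept, r

# Made-up annual temperatures for five years (not from the real dataset).
years = np.array([2015, 2016, 2017, 2018, 2019])
temps = np.array([27.0, 27.2, 27.1, 27.4, 27.5])

slope, intercept, r = simple_linear_regression(years, temps)
predicted = intercept + slope * years  # points on the regression line

print(f"slope = {slope:.3f} per year, intercept = {intercept:.2f}, r = {r:.3f}")
```

The correlation coefficient `r` gives students a quick numerical answer to how well the line describes the points: values near +1 or -1 indicate a tight linear relationship, while values near 0 indicate a weak one.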

Questions

**Use this lesson plan to help your students find answers to:**

- Use an example to describe the time series analysis of a dataset spanning 177 years.
- Use an example to describe exploratory data analysis with linear regression analysis.
- What are heatmaps, and how is correlation analysis performed on a given dataset?
- Use regression analyses to describe how the seasonal and annual temperatures of Mumbai have changed over time.
- Discuss reasons for changes in temperature patterns and the impact of climate change on temperatures of various cities in the world.

Credits

1 | Teaching Module, ‘Introduction- Linear Regression and Correlation’ | Provided by OpenStax™, Rice University |

2 | Teaching Module, ‘Chapter 3: Linear Regression’ | Provided by Ramesh Sridharan, MIT, from ‘Statistics for Research Projects’ |

3 | Video micro-lecture, ‘Introduction to Simple Linear Regression’ | by dataminingincae, INCAE Business School |

4 | Dataset of Mumbai temperature records | Colaba Meteorological Station, Mumbai, India |

5 | Python code | Ashan Virdikar and Naveen Anupoju, India |

Review

TROP ICSU is a project of the International Union of Biological Sciences and Centre for Sustainability, Environment and Climate Change, FLAME University.