Text Book of Correlations and Regression: A. K. Sharma
Regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables. Simple linear regression involves data that roughly follow a straight line in two dimensions. You will also study correlation, which measures how strong the relationship is. The variable x is the independent variable, and y is the dependent variable.
Correlation & Regression: Concepts with Illustrative examples
Chapter 7: Correlation and Simple Linear Regression
Correlation and linear regression are the most commonly used techniques for investigating the relationship between two quantitative variables. The goal of a correlation analysis is to see whether two measurement variables covary and to quantify the strength of the relationship between them, whereas regression expresses the relationship in the form of an equation. For example, for students taking a Maths test and an English test, we could use correlation to determine whether students who are good at Maths tend to be good at English as well, and regression to determine whether the marks in English can be predicted from given marks in Maths. The starting point is to draw a scatter plot, with one variable on the X-axis and the other on the Y-axis, to get a feel for the relationship, if any, that the data suggest. The closer the points are to a straight line, the stronger the linear relationship between the two variables.
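The Maths and English example can be made concrete with a short Python sketch. It computes the Pearson correlation coefficient r, which quantifies the strength of the linear relationship, and the least-squares line, which expresses the relationship as an equation. The marks below are invented purely for illustration, and the function names `pearson_r` and `least_squares_line` are assumptions of this sketch rather than anything taken from the texts quoted here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical marks for seven students (invented for illustration)
maths = [35, 48, 52, 61, 70, 78, 85]
english = [40, 45, 58, 55, 68, 72, 80]

r = pearson_r(maths, english)
slope, intercept = least_squares_line(maths, english)

# Predicted English mark for a student scoring 65 in Maths
predicted = intercept + slope * 65

print(f"r = {r:.3f}")
print(f"English = {intercept:.2f} + {slope:.3f} * Maths")
print(f"Predicted English mark for Maths = 65: {predicted:.1f}")
```

Correlation treats the two variables symmetrically, so swapping `maths` and `english` leaves r unchanged. The regression line does not: it is fitted to predict the second argument from the first.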
In many studies, we measure more than one variable for each individual. For example: the measures of height for individual human subjects, paired with their corresponding measures of weight; the number of hours that individual students in a statistics course spend studying prior to an exam, paired with their corresponding measures of performance on the exam; the amount of class time that individual students in a statistics course spend snoozing and daydreaming prior to an exam, paired with their corresponding measures of performance on the exam; and so on.
For example, here are two graphs. For the first, I dusted off the elliptical machine in our basement and measured my pulse after one minute of ellipticizing at various speeds. For the second graph, I dusted off some data from McDonald: I collected the amphipod crustacean Platorchestia platensis on a beach near Stony Brook, Long Island, in April, removed and counted the number of eggs each female was carrying, then freeze-dried and weighed the mothers. There are three things you can do with this kind of data. For the exercise data, you'd want to know whether pulse rate was significantly higher at higher speeds. For the amphipod data, you'd want to know whether bigger females had more eggs or fewer eggs than smaller amphipods, which is neither biologically obvious nor obvious from the graph.
In Lesson 11 we examined relationships between two categorical variables with the chi-square test of independence. In this lesson, we will examine the relationships between two quantitative variables with correlation and simple linear regression. We will review some of the same concepts again, and we will see how we can test for the statistical significance of a correlation or regression slope using the t distribution. In addition to reading Section 9. Let's review.
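As a rough sketch of the significance test mentioned above, the standard test statistic for a correlation r from n paired observations is t = r * sqrt(n - 2) / sqrt(1 - r^2), which is compared against a t distribution with n - 2 degrees of freedom. The function name `correlation_t_statistic` and the example values are assumptions made for illustration. The sketch stops at the statistic, since turning it into a p-value needs a t table or a statistics library.

```python
import math


def correlation_t_statistic(r, n):
    """Return (t, df) for testing H0: population correlation = 0.

    r is the sample Pearson correlation and n the number of paired observations.
    """
    if n < 3:
        raise ValueError("need at least 3 paired observations")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df


# Hypothetical values: r = 0.6 observed in a sample of 18 pairs
t, df = correlation_t_statistic(0.6, 18)
print(f"t = {t:.3f} on {df} degrees of freedom")
```

In simple linear regression the t test for the slope gives the same t value as this test for the correlation, so one computation covers both.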