To summarize the relationship between two variables, X and Y, we often use the least squares method. This method selects the line that minimizes the sum of squared residuals (SSR). Residuals, also known as errors, are the vertical distances between each data point and the line: the gap between the observed Y value and the value the line predicts for the same X. The SSR captures the total magnitude of these errors by squaring each one (which prevents negative and positive errors from canceling each other out) and summing the results. The closer the line is to the data points, the smaller the errors and the SSR, and thus the better the line summarizes the relationship between X and Y.
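To make the SSR calculation concrete, here is a minimal Python sketch. The data points and the function name sum_squared_residuals are hypothetical, chosen only for illustration; they are not the data behind the scatterplot in this lesson.

```python
def sum_squared_residuals(xs, ys, intercept, slope):
    """Return the SSR of the line y = intercept + slope * x."""
    total = 0.0
    for x, y in zip(xs, ys):
        predicted = intercept + slope * x   # value the line gives at this x
        residual = y - predicted            # signed vertical gap (the error)
        total += residual ** 2              # squaring removes the sign
    return total


# Hypothetical observations, for illustration only.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

# The line y = 2x passes through three of the four points exactly
# and misses the third point by 1, so its SSR is 1.0.
ssr_close = sum_squared_residuals(xs, ys, intercept=0.0, slope=2.0)

# The flat line y = 0 is far from every point, so its SSR is much larger.
ssr_far = sum_squared_residuals(xs, ys, intercept=0.0, slope=0.0)

print(ssr_close, ssr_far)
```

The line that sits closer to the points produces the smaller SSR, which is the comparison the steps in this lesson carry out visually.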
Let's see how different lines fit the data to build an intuition about the least squares method:
STEP 1: The scatterplot below shows the relationship between X and Y. In theory, we could draw an infinite number of lines on this scatterplot, but some lines will summarize the relationship between X and Y better than others. Intuitively, we know that the line of best fit should be as close as possible to the data points, and thus have the smallest possible errors. Let's start by examining Line #1 (which is selected by default). Check the box to display both the line and its errors. This line does not fit the data well. It has a negative slope while X and Y clearly have a positive relationship, and as a result the errors are very large. In this case, SSR = 704.
STEP 2: Now, select Line #2 from the dropdown menu. This line fits the data better because it produces smaller errors (shorter vertical distances). Here, the SSR = 452. However, this is still not the best-fitting line, as the SSR can be reduced further.
STEP 3: Finally, select Line #3. The errors associated with Line #3 are the smallest among the three lines. Here, the SSR = 297, the lowest of the three options. Line #3 is also the line the least squares method would choose as the line of best fit, because no other line that could be drawn through these data has a smaller SSR.
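The comparison in the three steps above can also be sketched in code. The example below assumes a small hypothetical dataset and two hypothetical candidate lines (not the lesson's data or its Lines #1 to #3), and it uses the standard closed-form formulas for a single predictor to find the least squares line: the slope is the sum of cross-deviations divided by the sum of squared X deviations, and the line passes through the point of means.

```python
def ssr(xs, ys, intercept, slope):
    """Sum of squared residuals for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line that minimizes the SSR."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # line passes through the means
    return intercept, slope


# Hypothetical observations, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Hypothetical candidate lines, given as (intercept, slope).
candidates = {
    "negative slope": (7.0, -1.0),
    "flat line at mean of Y": (4.0, 0.0),
}

best_intercept, best_slope = least_squares_line(xs, ys)
best_ssr = ssr(xs, ys, best_intercept, best_slope)

for name, (intercept, slope) in candidates.items():
    print(name, ssr(xs, ys, intercept, slope))
print("least squares line", best_ssr)
```

On this hypothetical data, the line with the wrong-signed slope has the largest SSR, the flat line does better, and the least squares line has the smallest SSR of the three, mirroring the progression from Line #1 to Line #3.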
Note: A scatterplot enables us to visualize the relationship between two variables by plotting one variable against the other in two-dimensional space. Each dot represents an observation. The errors are the vertical distances between the dots and the line.