Calculating Simple Linear Regression

Simple linear regression is a procedure that provides an estimate of the value of a dependent variable (outcome) based on the value of an independent variable (predictor). Knowing that estimate with some degree of accuracy, we can use regression analysis to predict the value of one variable if we know the value of the other variable (Cohen & Cohen, 1983). The regression equation is a mathematical expression of the influence that a predictor has on a dependent variable, based on some theoretical framework. For example, in Exercise 14, Figure 14-1 illustrates the linear relationship between gestational age and birth weight. As shown in the scatterplot, there is a strong positive relationship between the two variables. Advanced gestational ages predict higher birth weights.

A regression equation can be generated with a data set containing subjects’ x and y values. Once this equation is generated, it can be used to predict future subjects’ y values, given only their x values. In simple or bivariate regression, predictions are made in cases with two variables. The score on variable y (dependent variable, or outcome) is predicted from the same subject’s known score on variable x (independent variable, or predictor).
Research Designs Appropriate for Simple Linear Regression

Research designs that may utilize simple linear regression include any associational design (Gliner et al., 2009). The variables involved in the design are attributional, meaning the variables are characteristics of the participant, such as health status, blood pressure, gender, diagnosis, or ethnicity. Regardless of the nature of variables, the dependent variable submitted to simple linear regression must be measured as continuous, at the interval or ratio level.
Statistical Formula and Assumptions

Use of simple linear regression involves the following assumptions (Zar, 2010):

1. Normal distribution of the dependent (y) variable

2. Linear relationship between x and y

3. Independent observations

4. No (or little) multicollinearity

5. Homoscedasticity

Data that are homoscedastic are evenly dispersed both above and below the regression line, which indicates a linear relationship on a scatterplot. Homoscedasticity reflects equal variance of both variables. In other words, for every value of x, the distribution of y values should have equal variability. If the data for the predictor and dependent variable are not homoscedastic, inferences made during significance testing could be invalid (Cohen & Cohen, 1983; Zar, 2010). Visual examples of homoscedasticity and heteroscedasticity are presented in Exercise 30.
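
As a rough illustration of the equal-variability idea, the following Python sketch compares the spread of residuals (observed y minus predicted y) in the lower and upper halves of the x range. The data, the fitted line, and the function name `residual_spread_by_half` are all hypothetical and invented for this example. This is an informal check under those assumptions, not a formal test of homoscedasticity.

```python
from statistics import pvariance


def residual_spread_by_half(x, y, b, a):
    """Return the residual variance for the lower and upper halves of x."""
    # Pair each x with its residual (observed y minus predicted y), sorted by x
    residuals = sorted((xi, yi - (b * xi + a)) for xi, yi in zip(x, y))
    mid = len(residuals) // 2
    lower = [r for _, r in residuals[:mid]]
    upper = [r for _, r in residuals[mid:]]
    return pvariance(lower), pvariance(upper)


# Hypothetical data in which the scatter of y widens as x increases
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 9.0, 13.5, 12.0, 18.0]
b, a = 2.0, 0.0  # assumed slope and intercept, chosen for illustration

lower_var, upper_var = residual_spread_by_half(x, y, b, a)
print(f"lower half: {lower_var:.3f}, upper half: {upper_var:.3f}")
```

A much larger variance in one half than in the other, as in this invented data set, suggests heteroscedasticity. Roughly equal values are consistent with homoscedasticity.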

In simple linear regression, the dependent variable is continuous, and the predictor can be any scale of measurement; however, if the predictor is nominal, it must be correctly coded. Once the data are ready, the parameters a and b are computed to obtain a regression equation. To understand the mathematical process, recall the algebraic equation for a straight line:

$$y = bx + a$$

where

y = the dependent variable (outcome)

x = the independent variable (predictor)

b = the slope of the line

a = the y-intercept (the point where the regression line intersects the y-axis)
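
To make the roles of the slope and intercept concrete, here is a minimal Python sketch of the straight-line equation. The function name `predict` and the values of b and a are hypothetical, chosen only for illustration.

```python
def predict(x, b, a):
    """Return the predicted y for a given x on the line y = bx + a."""
    return b * x + a


# Hypothetical slope and y-intercept
b, a = 2.0, 1.0

y_at_zero = predict(0, b, a)   # at x = 0 the line is at the y-intercept, a
y_at_three = predict(3, b, a)
y_at_four = predict(4, b, a)   # one unit more of x changes y by the slope, b
print(y_at_zero, y_at_three, y_at_four)
```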

No single regression line can predict every y value from every x value with complete accuracy. In fact, you could draw an infinite number of lines through the scattered paired values (Zar, 2010). However, the purpose of the regression equation is to develop the line that allows the highest degree of prediction possible: the line of best fit. The procedure for developing the line of best fit is the method of least squares. The formulas for the slope (b) and y-intercept (a) of the regression equation are computed as follows. Note that once b is calculated, that value is inserted into the formula for a.

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^{2} - \left(\sum x\right)^{2}}$$

$$a = \frac{\sum y - b\sum x}{n}$$
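
The two least-squares formulas can be sketched in Python as follows. The function name `least_squares_line` and the sample data are assumptions made for illustration. The data were invented so that the points fall exactly on a line, which makes the result easy to verify by hand.

```python
def least_squares_line(x, y):
    """Return (b, a): the slope and y-intercept of the least-squares line."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of values")
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    # Slope first, because the intercept formula uses b
    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = (sum_y - b * sum_x) / n
    return b, a


# Hypothetical data, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

b, a = least_squares_line(x, y)
print(f"y = {b}x + {a}")
```

With these invented values, the sums are n = 5, Σx = 15, Σy = 35, Σxy = 125, and Σx² = 55, so b = (625 − 525) / (275 − 225) = 2 and a = (35 − 30) / 5 = 1.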
Hand Calculations

This example uses data collected from a study of students enrolled in a registered nurse to bachelor of science in nursing (RN to BSN) program (Mancini, Ashwill, & Cipher, 2014). The predictor in this example is the number of academic degrees obtained by the student prior to enrollment, and the dependent variable is the number of months it took the student to complete the RN to BSN program. The null hypothesis is “Number of degrees does not predict the number of months until completion of an RN to BSN program.”

The data are presented in Table 29-1. A simulated subset of 20 students was selected for this example so that the computations would be small and manageable. In actuality, studies involving linear regression need to be adequately powered (Aberson, 2010; Cohen, 1988). Observe that the data in Table 29-1 are arranged in columns that correspond to the elements of the formula. The summed values in the last row of Table 29-1 are inserted into the appropriate place in the formula for b.
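
The column arrangement described above (x, y, x², and xy, with a row of sums at the bottom) can be sketched in Python as follows. The four (x, y) pairs are invented for illustration and are not the Table 29-1 data. Only the layout and the way the sums feed the formulas are meant to carry over.

```python
# Hypothetical (x, y) pairs, invented for illustration; NOT the Table 29-1 data
pairs = [(0, 10), (1, 12), (2, 15), (1, 11)]

# One row per subject: x, y, x squared, and the product xy
rows = [(x, y, x ** 2, x * y) for x, y in pairs]

print(f"{'x':>4} {'y':>4} {'x^2':>4} {'xy':>4}")
for row in rows:
    print(" ".join(f"{value:>4}" for value in row))

# Last row of the table: the sum of each column
sum_x, sum_y, sum_x2, sum_xy = (sum(column) for column in zip(*rows))
n = len(rows)
print(" ".join(f"{value:>4}" for value in (sum_x, sum_y, sum_x2, sum_xy)))

# Insert the sums into the least-squares formulas for b, then a
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n
print(f"b = {b}, a = {a}")
```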

TABLE 29-1

ENROLLMENT GPA AND MONTHS TO COMPLETION IN AN RN TO BSN
