How does car mileage vary for various car models?

Variation in gasoline mileage among makes and models of automobiles is influenced substantially by the weight and horsepower of the vehicle. The data you will analyze are provided by the U.S. Environmental Protection Agency. The variables are:

VOL: Cubic feet of cab space

HP: Engine horsepower

MPG: Average miles per gallon

SP: Top speed (mph)

WT: Vehicle weight (100 lb)

Data <- read.table("car_milage.txt", header=TRUE)

In the analysis below, we will investigate the association between the independent variables and average miles per gallon (the response variable) using multiple linear regression, with a focus on variable selection.

Question 1: Exploratory Data Analysis

– Using a scatterplot, describe the relationship between mileage and the four independent variables. Describe the general trend (direction and form). Based on this analysis, would you suggest that there is a linear relationship between mileage and the four independent variables? If not, what transformation of the response variable would you suggest?

Question 2: Fitting the Linear Regression Model

– Fit a linear regression to evaluate the relationship between mileage and the four independent variables. If you suggested a transformation in Question 1, use that transformation. Also include a second-order term for the weight predictor and transform horsepower using the logarithmic transformation. Why did I suggest these two model revisions? Write down the equation for the regression line and interpret the estimated values of the parameters in the context of the problem (include the standard errors in your interpretation).

Question 3: Variable Selection

Are all predictors statistically significantly associated with the response variable? Using three different criteria (including Mallows' Cp and BIC), select a best sub-model. To search through the models, try (i) all possible models, (ii) forward stepwise, and (iii) backward stepwise. Summarize your findings. Compare the results with variable selection using the lasso.

Question 4: Checking the Assumptions of the Model

Plot the relevant residual plots to check the model assumptions for a model selected in the previous question. Enumerate the assumptions and describe the graphical techniques you used. Interpret the displays with respect to the assumptions of the linear regression model. In other words, comment on whether there are any apparent departures from the assumptions of the linear regression model. Are there any extreme outliers in the data or residuals?

Question 1:

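library(car)  # provides scatterplotMatrix(); argument names are kept as in the source and may differ in newer car releases

scatterplotMatrix(~MPG+HP+SP+VOL+WT, reg.line=lm, smooth=FALSE, spread=FALSE, span=0.5, diagonal='none', data=Data)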

The scatterplot matrix suggests that there is a nonlinear relationship between MPG and HP and between MPG and SP. Thus we may include squared terms to improve the adequacy of the model.
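One way to back up a visual impression of curvature is to test whether a squared term improves on a straight-line fit. The sketch below uses simulated data (the data frame `Sim` and its generating formula are assumptions made for illustration, not the EPA data) and only base R functions.

```r
# Hypothetical data: simulated for illustration, NOT the EPA data set.
set.seed(1)
n   <- 82
HP  <- runif(n, 50, 300)
# Simulated curved relationship: MPG falls off like a power of HP.
MPG <- 400 * HP^(-0.55) * exp(rnorm(n, sd = 0.05))
Sim <- data.frame(MPG = MPG, HP = HP)

# Scatterplot matrix in base R (no add-on package needed).
pairs(Sim, main = "Simulated MPG vs HP")

# Straight-line fit versus a fit with an added squared term.
fit_lin  <- lm(MPG ~ HP, data = Sim)
fit_quad <- lm(MPG ~ HP + I(HP^2), data = Sim)

# Partial F test: a small p-value indicates the squared term is needed.
cmp    <- anova(fit_lin, fit_quad)
p_quad <- cmp[["Pr(>F)"]][2]
print(cmp)
```

With the real data, the same comparison could be run with `Data` in place of `Sim`, one predictor at a time.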

Question 2:

New variables are created using the following transformations:

Data$Ln_HP <- with(Data, log(HP))

Data$HP2 <- with(Data, HP^2)

Data$SP2 <- with(Data, SP^2)

Data$WT2 <- with(Data, WT^2)
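As an alternative to storing transformed columns, the transformations can be written directly in the model formula with `log()` and `I()`. The sketch below, on simulated data (the data frame `Sim` and its coefficients are assumptions for illustration only), shows that both approaches give the same fitted coefficients.

```r
# Hypothetical data: simulated for illustration, NOT the EPA data set.
set.seed(2)
Sim <- data.frame(HP = runif(40, 50, 300), WT = runif(40, 15, 55))
Sim$MPG <- 120 - 15 * log(Sim$HP) - 0.5 * Sim$WT + rnorm(40)

# Approach 1: precompute transformed columns, as in the text.
Sim$Ln_HP <- log(Sim$HP)
Sim$WT2   <- Sim$WT^2
fit_cols  <- lm(MPG ~ Ln_HP + WT + WT2, data = Sim)

# Approach 2: transform inside the formula; I() protects ^ from
# being read as a formula operator.
fit_inline <- lm(MPG ~ log(HP) + WT + I(WT^2), data = Sim)

# The two fits differ only in the coefficient names.
same_fit <- isTRUE(all.equal(unname(coef(fit_cols)), unname(coef(fit_inline))))
print(same_fit)
```

The inline form keeps the data frame unchanged and makes the transformation visible in the model summary; the precomputed form is what the stepwise output in this solution uses.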

Regression model with HP, SP, VOL and WT as explanatory variables:

Call:
lm(formula = MPG ~ HP + SP + VOL + WT, data = Data)

Residuals:
    Min      1Q  Median      3Q     Max
-9.0108 -2.7731  0.2733  1.8362 11.9854

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 192.43775   23.53161   8.178 4.62e-12 ***
HP            0.39221    0.08141   4.818 7.13e-06 ***
SP           -1.29482    0.24477  -5.290 1.11e-06 ***
VOL          -0.01565    0.02283  -0.685    0.495
WT           -1.85980    0.21336  -8.717 4.22e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.653 on 77 degrees of freedom
Multiple R-squared:  0.8733,  Adjusted R-squared:  0.8667
F-statistic: 132.7 on 4 and 77 DF,  p-value: < 2.2e-16

The estimated regression model is

MPG = 192.4377 + 0.3922*HP - 1.2948*SP - 0.0156*VOL - 1.8598*WT

This model explains 87.33% of the variability in MPG. The t tests for the significance of the regression coefficients suggest that all variables except VOL are highly significant. The residual standard error for this model is 3.653.
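The figures in the summary output can be cross-checked by hand: each t value is the estimate divided by its standard error, and the adjusted R-squared follows from R-squared and the degrees of freedom. The sketch below copies the estimates and standard errors from the output above. The 95% confidence intervals are an added illustration and were not part of the original solution.

```r
# Estimates and standard errors copied from the summary output in the text.
est <- c(Intercept = 192.43775, HP = 0.39221, SP = -1.29482,
         VOL = -0.01565, WT = -1.85980)
se  <- c(23.53161, 0.08141, 0.24477, 0.02283, 0.21336)

df_resid <- 77
n <- df_resid + length(est)  # observations = residual df + coefficients

# t statistics and two-sided p-values.
t_val <- est / se
p_val <- 2 * pt(abs(t_val), df = df_resid, lower.tail = FALSE)

# 95% confidence intervals (illustrative addition).
t_crit <- qt(0.975, df = df_resid)
ci95 <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)

# Adjusted R-squared from the reported multiple R-squared.
r2     <- 0.8733
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

print(round(cbind(t_val, p_val, ci95), 4))
cat("n =", n, " adjusted R-squared =", round(adj_r2, 4), "\n")
```

An interval for VOL that contains zero is consistent with its non-significant t test, while the WT interval gives a range for the change in MPG per additional 100 lb with the other predictors held fixed.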

Question 3:

library(RcmdrMisc)  # provides stepwise()

RegModel <- lm(MPG ~ HP + Ln_HP + HP2 + SP + SP2 + VOL + WT + WT2, data=Data)

summary(RegModel)

stepwise(RegModel, direction='backward', criterion='BIC')

Direction:  backward
Criterion:  BIC

Start:  AIC=216.8
MPG ~ HP + Ln_HP + HP2 + SP + SP2 + VOL + WT + WT2

        Df Sum of Sq    RSS    AIC
- SP     1     8.206 719.41 213.33
- WT     1    14.549 725.75 214.05
- SP2    1    14.872 726.07 214.09
- VOL    1    15.795 726.99 214.19
- WT2    1    16.628 727.83 214.29
<none>               711.20 216.80
- Ln_HP  1    44.914 756.11 217.41
- HP     1    72.260 783.46 220.33
- HP2    1    91.945 803.14 222.36

Step:  AIC=213.33
MPG ~ HP + Ln_HP + HP2 + SP2 + VOL + WT + WT2

        Df Sum of Sq     RSS    AIC
- WT2    1      9.78  729.18 210.03
- WT     1     12.21  731.61 210.31
- VOL    1     15.75  735.16 210.70
- SP2    1     20.58  739.98 211.24
<none>                719.41 213.33
- HP     1     89.46  808.87 218.54
- HP2    1    165.97  885.38 225.95
- Ln_HP  1    318.17 1037.57 …
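Although the column is labelled AIC, the values in the stepwise output are consistent with the BIC penalty. For a linear model the criterion used by R's step functions is n*log(RSS/n) + k*p, where p is the number of coefficients and k = log(n) for BIC. The sketch below recomputes three of the printed values from their RSS. The sample size n = 82 is inferred from the 77 residual degrees of freedom plus 5 coefficients in the first model, and `bic_step` is a helper name introduced here for illustration.

```r
n <- 82        # inferred: 77 residual df + 5 coefficients in the first model
k <- log(n)    # BIC penalty per coefficient

# Criterion as computed for lm fits by extractAIC(): n*log(RSS/n) + k*p.
bic_step <- function(rss, n_coef) n * log(rss / n) + k * n_coef

full     <- bic_step(711.20, 9)  # intercept + 8 predictors
drop_sp  <- bic_step(719.41, 8)  # after dropping SP
drop_wt2 <- bic_step(729.18, 7)  # after also dropping WT2

print(round(c(full = full, drop_sp = drop_sp, drop_wt2 = drop_wt2), 2))
```

Each backward step removes the term whose deletion gives the smallest criterion value, so SP is dropped first (213.33 < 216.80) and WT2 is the candidate at the next step (210.03 < 213.33).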

The expert examines multiple regression analysis for the average miles per gallon.