CNC | C | C++ | Assembly | Python | R | Rust | Arduino | Solidworks | Embedded Systems

# R - Analysis of Covariance

We use Regression analysis to create models which describe the effect of variation in predictor variables on the response variable. Sometimes, if we have a categorical variable with values like Yes/No or Male/Female etc. The simple regression analysis gives multiple results for each value of the categorical variable. In such scenario, we can study the effect of the categorical variable by using it along with the predictor variable and comparing the regression lines for each level of the categorical variable. Such an analysis is termed as Analysis of Covariance also called as ANCOVA.

## Example

Consider the R built in data set mtcars. In it we observer that the field "am" represents the type of transmission (auto or manual). It is a categorical variable with values 0 and 1. The miles per gallon value(mpg) of a car can also depend on it besides the value of horse power("hp").
We study the effect of the value of "am" on the regression between "mpg" and "hp". It is done by using the aov() function followed by the anova() function to compare the multiple regressions.

## Input Data

Create a data frame containing the fields "mpg", "hp" and "am" from the data set mtcars. Here we take "mpg" as the response variable, "hp" as the predictor variable and "am" as the categorical variable.
input <- mtcars[,c("am","mpg","hp")]
When we execute the above code, it produces the following result −
am   mpg   hp
Mazda RX4          1    21.0  110
Mazda RX4 Wag      1    21.0  110
Datsun 710         1    22.8   93
Hornet 4 Drive     0    21.4  110
Valiant            0    18.1  105

## ANCOVA Analysis

We create a regression model taking "hp" as the predictor variable and "mpg" as the response variable taking into account the interaction between "am" and "hp".

### Model with interaction between categorical variable and predictor variable

# Get the dataset.
input <- mtcars

# Create the regression model.
result <- aov(mpg~hp*am,data = input)
print(summary(result))
When we execute the above code, it produces the following result −
Df Sum Sq Mean Sq F value   Pr(>F)
hp           1  678.4   678.4  77.391 1.50e-09 ***
am           1  202.2   202.2  23.072 4.75e-05 ***
hp:am        1    0.0     0.0   0.001    0.981
Residuals   28  245.4     8.8
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
This result shows that both horse power and transmission type has significant effect on miles per gallon as the p value in both cases is less than 0.05. But the interaction between these two variables is not significant as the p-value is more than 0.05.

### Model without interaction between categorical variable and predictor variable

# Get the dataset.
input <- mtcars

# Create the regression model.
result <- aov(mpg~hp+am,data = input)
print(summary(result))
When we execute the above code, it produces the following result −
Df  Sum Sq  Mean Sq   F value   Pr(>F)
hp           1  678.4   678.4   80.15 7.63e-10 ***
am           1  202.2   202.2   23.89 3.46e-05 ***
Residuals   29  245.4     8.5
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
This result shows that both horse power and transmission type has significant effect on miles per gallon as the p value in both cases is less than 0.05.

## Comparing Two Models

Now we can compare the two models to conclude if the interaction of the variables is truly in-significant. For this we use the anova() function.
# Get the dataset.
input <- mtcars

# Create the regression models.
result1 <- aov(mpg~hp*am,data = input)
result2 <- aov(mpg~hp+am,data = input)

# Compare the two models.
print(anova(result1,result2))
When we execute the above code, it produces the following result −
Model 1: mpg ~ hp * am
Model 2: mpg ~ hp + am
Res.Df    RSS Df  Sum of Sq     F Pr(>F)
1     28 245.43
2     29 245.44 -1 -0.0052515 6e-04 0.9806
As the p-value is greater than 0.05 we conclude that the interaction between horse power and transmission type is not significant. So the mileage per gallon will depend in a similar manner on the horse power of the car in both auto and manual transmission mode.

1. R - Overview
2. R - Environment Setup
3. R - Basic Syntax
4. R - Data Types
5. R - Variables
6. R - Operators
7. R - Decision Making
8. R - Loops
9. R - Functions
10. R - Strings
11. R - Vectors
12. R - Matrices
13. R - Arrays
14. R - Factors
15. R - Data Frames
16. R - Packages
17. R - Data Reshaping
18. R - CSV Files
19. R - Excel Files
20. R - Binary Files
21. R - XML Files
22. R - JSON Files
23. R - Web Data
24. R - Database
25. R - Pie Charts
26. R - Bar Charts
27. R - Boxplots
28. R - Histograms
29. R - Line Graphs
30. R - Scatterplots
31. R - Mean, Median and Mode
32. R - Linear Regression
33. R - Multiple Regression
34. R - Logistic Regression
35. R - Normal Distribution
36. R - Binomial Distribution
37. R - Poisson Regression
38. R - Analysis of Covariance
39. R - Time Series Analysis
40. R - Nonlinear Least Square
41. R - Decision Tree
42. R - Random Forest
43. R - Survival Analysis
44. R - Chi Square Tests