Programming Tutorials

CNC | C | C++ | Assembly | Python | R | Rust | Arduino | Solidworks | Embedded Systems

Loading...

R - Random Forest

In the random forest approach, a large number of decision trees are created. Every observation is fed into every decision tree. The most common outcome for each observation is used as the final output. A new observation is fed into all the trees and taking a majority vote for each classification model.
An error estimate is made for the cases which were not used while building the tree. That is called an OOB (Out-of-bag) error estimate which is mentioned as a percentage.
The R package "randomForest" is used to create random forests.

Install R Package

Use the below command in R console to install the package. You also have to install the dependent packages if any.
install.packages("randomForest)
The package "randomForest" has the function randomForest() which is used to create and analyze random forests.

Syntax

The basic syntax for creating a random forest in R is −
randomForest(formula, data)
Following is the description of the parameters used −
  • formula is a formula describing the predictor and response variables.
  • data is the name of the data set used.

Input Data

We will use the R in-built data set named readingSkills to create a decision tree. It describes the score of someone's readingSkills if we know the variables "age","shoesize","score" and whether the person is a native speaker.
Here is the sample data.
# Load the party package. It will automatically load other required packages.
library(party)

# Print some records from data set readingSkills.
print(head(readingSkills))
When we execute the above code, it produces the following result and chart −
  nativeSpeaker   age   shoeSize      score
1           yes     5   24.83189   32.29385
2           yes     6   25.95238   36.63105
3            no    11   30.42170   49.60593
4           yes     7   28.66450   40.28456
5           yes    11   31.88207   55.46085
6           yes    10   30.07843   52.83124
Loading required package: methods
Loading required package: grid
...............................
...............................

Example

We will use the randomForest() function to create the decision tree and see it's graph.
# Load the party package. It will automatically load other required packages.
library(party)
library(randomForest)

# Create the forest.
output.forest <- randomForest(nativeSpeaker ~ age + shoeSize + score, 
           data = readingSkills)

# View the forest results.
print(output.forest) 

# Importance of each predictor.
print(importance(fit,type = 2)) 
When we execute the above code, it produces the following result −
Call:
 randomForest(formula = nativeSpeaker ~ age + shoeSize + score,     
                 data = readingSkills)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 1

        OOB estimate of  error rate: 1%
Confusion matrix:
    no yes class.error
no  99   1        0.01
yes  1  99        0.01
         MeanDecreaseGini
age              13.95406
shoeSize         18.91006
score            56.73051

Conclusion

From the random forest shown above we can conclude that the shoesize and score are the important factors deciding if someone is a native speaker or not. Also the model has only 1% error which means we can predict with 99% accuracy.

Table of contents: 
1. R - Overview
2. R - Environment Setup
3. R - Basic Syntax
4. R - Data Types
5. R - Variables
6. R - Operators
7. R - Decision Making
8. R - Loops
9. R - Functions
10. R - Strings
11. R - Vectors
12. R - Matrices
13. R - Arrays
14. R - Factors
15. R - Data Frames
16. R - Packages
17. R - Data Reshaping
18. R - CSV Files
19. R - Excel Files
20. R - Binary Files
21. R - XML Files
22. R - JSON Files
23. R - Web Data
24. R - Database
25. R - Pie Charts
26. R - Bar Charts
27. R - Boxplots
28. R - Histograms
29. R - Line Graphs
30. R - Scatterplots
31. R - Mean, Median and Mode
32. R - Linear Regression
33. R - Multiple Regression
34. R - Logistic Regression
35. R - Normal Distribution
36. R - Binomial Distribution
37. R - Poisson Regression
38. R - Analysis of Covariance
39. R - Time Series Analysis
40. R - Nonlinear Least Square
41. R - Decision Tree
42. R - Random Forest
43. R - Survival Analysis
44. R - Chi Square Tests

logoblog

No comments:

Post a Comment

Loading...