Programming Tutorials

CNC | C | C++ | Assembly | Python | R | Rust | Arduino | Solidworks | Embedded Systems

Loading...

R - Factors

Factors are the data objects which are used to categorize the data and store it as levels. They can store both strings and integers. They are useful in the columns which have a limited number of unique values. Like "Male, "Female" and True, False etc. They are useful in data analysis for statistical modeling.
Factors are created using the factor () function by taking a vector as input.

Example

# Create a vector as input.
data <- c("East","West","East","North","North","East","West","West","West","East","North")

print(data)
print(is.factor(data))

# Apply the factor function.
factor_data <- factor(data)

print(factor_data)
print(is.factor(factor_data))
When we execute the above code, it produces the following result −
 [1] "East"  "West"  "East"  "North" "North" "East"  "West"  "West"  "West"  "East" "North"
[1] FALSE
 [1] East  West  East  North North East  West  West  West  East  North
Levels: East North West
[1] TRUE

Factors in Data Frame

On creating any data frame with a column of text data, R treats the text column as categorical data and creates factors on it.
# Create the vectors for data frame.
height <- c(132,151,162,139,166,147,122)
weight <- c(48,49,66,53,67,52,40)
gender <- c("male","male","female","female","male","female","male")

# Create the data frame.
input_data <- data.frame(height,weight,gender)
print(input_data)

# Test if the gender column is a factor.
print(is.factor(input_data$gender))

# Print the gender column so see the levels.
print(input_data$gender)
When we execute the above code, it produces the following result −
  height weight gender
1    132     48   male
2    151     49   male
3    162     66 female
4    139     53 female
5    166     67   male
6    147     52 female
7    122     40   male
[1] TRUE
[1] male   male   female female male   female male  
Levels: female male

Changing the Order of Levels

The order of the levels in a factor can be changed by applying the factor function again with new order of the levels.
data <- c("East","West","East","North","North","East","West","West","West","East","North")
# Create the factors
factor_data <- factor(data)
print(factor_data)

# Apply the factor function with required order of the level.
new_order_data <- factor(factor_data,levels = c("East","West","North"))
print(new_order_data)
When we execute the above code, it produces the following result −
 [1] East  West  East  North North East  West  West  West  East  North
Levels: East North West
 [1] East  West  East  North North East  West  West  West  East  North
Levels: East West North

Generating Factor Levels

We can generate factor levels by using the gl() function. It takes two integers as input which indicates how many levels and how many times each level.

Syntax

gl(n, k, labels)
Following is the description of the parameters used −
  • n is a integer giving the number of levels.
  • k is a integer giving the number of replications.
  • labels is a vector of labels for the resulting factor levels.

Example

v <- gl(3, 4, labels = c("Tampa", "Seattle","Boston"))
print(v)
When we execute the above code, it produces the following result −
Tampa   Tampa   Tampa   Tampa   Seattle Seattle Seattle Seattle Boston 
[10] Boston  Boston  Boston 
Levels: Tampa Seattle Boston


Table of contents: 
1. R - Overview
2. R - Environment Setup
3. R - Basic Syntax
4. R - Data Types
5. R - Variables
6. R - Operators
7. R - Decision Making
8. R - Loops
9. R - Functions
10. R - Strings
11. R - Vectors
12. R - Matrices
13. R - Arrays
14. R - Factors
15. R - Data Frames
16. R - Packages
17. R - Data Reshaping
18. R - CSV Files
19. R - Excel Files
20. R - Binary Files
21. R - XML Files
22. R - JSON Files
23. R - Web Data
24. R - Database
25. R - Pie Charts
26. R - Bar Charts
27. R - Boxplots
28. R - Histograms
29. R - Line Graphs
30. R - Scatterplots
31. R - Mean, Median and Mode
32. R - Linear Regression
33. R - Multiple Regression
34. R - Logistic Regression
35. R - Normal Distribution
36. R - Binomial Distribution
37. R - Poisson Regression
38. R - Analysis of Covariance
39. R - Time Series Analysis
40. R - Nonlinear Least Square
41. R - Decision Tree
42. R - Random Forest
43. R - Survival Analysis
44. R - Chi Square Tests

logoblog

No comments:

Post a Comment

Loading...