I have been trying to run a two-sided t-test in R but keep running into an error. Below are my process, dataset details, and script from RStudio. I used a dataset called LungCapacity that I downloaded from this website: https://www.statslectures.com/r-scripts-datasets.
# Imported data set into RStudio.
# Ran a summary report to see the data and class.
summary(LungCapData)
# Here I could see that the Smoke column is a character, so I converted it to a factor.
LungCapData$Smoke <- factor(LungCapData$Smoke)
# On checking the summary, I see it's converted to a factor with yes and no.
# I want to run a t-test between lung capacity and smoking.
t.test(LungCapData$LungCap, LungCapData$Smoke, alternative = c("two.sided"), mu = 0, var.equal = FALSE, conf.level = 0.95, paired = FALSE)
Error in var(y) : Calling var(x) on a factor x is defunct.
  Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
In addition: Warning message:
In mean.default(y) : argument is not numeric or logical: returning NA
You're very close; you just need to call t.test with a formula.
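A minimal sketch of the formula call is shown here. So that it runs on its own, it builds a small made-up data frame as a stand-in for LungCapData; the values are hypothetical and are not taken from the real dataset. Only the column names LungCap and Smoke come from the question.

```r
# Hypothetical stand-in for LungCapData (made-up values, not the real dataset)
set.seed(1)
LungCapData <- data.frame(
  LungCap = c(rnorm(20, mean = 8, sd = 2), rnorm(20, mean = 9, sd = 2)),
  Smoke   = factor(rep(c("no", "yes"), each = 20))
)

# Formula interface: compare LungCap between the two levels of Smoke.
# With the defaults this is a two-sided Welch test (var.equal = FALSE).
result <- t.test(LungCap ~ Smoke, data = LungCapData)
print(result)
```

With the real data already loaded, only the t.test() line should be needed.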
With your current approach, you're trying to compare LungCapData$LungCap, which is a numeric vector, to LungCapData$Smoke, which is a factor.
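One way to see the mismatch is to inspect the class of each column. The sketch below uses a tiny made-up data frame in place of the real LungCapData, so the values themselves are hypothetical.

```r
# Hypothetical stand-in for LungCapData (made-up values)
LungCapData <- data.frame(
  LungCap = c(6.5, 10.1, 9.6, 11.1, 4.8, 6.2),
  Smoke   = factor(c("no", "yes", "no", "no", "no", "yes"))
)

lungcap_class <- class(LungCapData$LungCap)  # "numeric"
smoke_class   <- class(LungCapData$Smoke)    # "factor"
smoke_levels  <- levels(LungCapData$Smoke)   # "no" "yes"

print(lungcap_class)
print(smoke_class)
print(smoke_levels)
```

Passing both columns as x and y asks t.test to take the mean and variance of a factor, which is what triggers the error in the question.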
Instead, you want to instruct t.test to compare the values of LungCapData$LungCap when grouping by LungCapData$Smoke. That is achieved with a formula.
LungCap ~ Smoke says that LungCap should depend on Smoke. When you use a formula, you also need to supply the data argument so that R knows where to find those columns.
When you try to convert LungCapData$Smoke to numeric, you get the wrong result because you're just getting the factor level indices, which have no biological significance.
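A short illustration of what that conversion returns, using a made-up factor rather than the real Smoke column:

```r
# Made-up factor with the same two levels as the Smoke column
smoke <- factor(c("no", "yes", "yes", "no"))

codes <- as.numeric(smoke)  # 1 2 2 1: positions in levels(smoke), not measurements
lvls  <- levels(smoke)      # "no" "yes"

print(codes)
print(lvls)
```

The 1s and 2s only record which level each observation belongs to, so a t-test of lung capacity against them would not be comparing smokers with non-smokers.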
The other way is to subset LungCapData$LungCap yourself, but that's a lot more typing.
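A sketch of the subsetting approach, again on a made-up stand-in for LungCapData. The names non_smokers and smokers are introduced here for illustration only. The comparison at the end checks that the result matches the formula interface.

```r
# Hypothetical stand-in for LungCapData (made-up values, not the real dataset)
set.seed(1)
LungCapData <- data.frame(
  LungCap = c(rnorm(20, mean = 8, sd = 2), rnorm(20, mean = 9, sd = 2)),
  Smoke   = factor(rep(c("no", "yes"), each = 20))
)

# Split the numeric column by group manually
non_smokers <- LungCapData$LungCap[LungCapData$Smoke == "no"]
smokers     <- LungCapData$LungCap[LungCapData$Smoke == "yes"]

# Two numeric vectors: this is the form the default t.test method expects
by_subset  <- t.test(non_smokers, smokers)
by_formula <- t.test(LungCap ~ Smoke, data = LungCapData)

# Both routes should give the same p-value
same_p <- isTRUE(all.equal(by_subset$p.value, by_formula$p.value))
print(same_p)
```

The groups are passed in the same order as the factor levels ("no", then "yes") so that the sign of the t statistic agrees with the formula version.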
As specified in the OP, t.test() attempts to compare two vectors, expecting them to be numeric. Instead, use the formula version of t.test().
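As a sketch, the options from the question can be carried over to the formula version. The data frame below is a made-up stand-in for LungCapData. The paired argument is left out on purpose, since this is an independent two-sample test and the default already covers it.

```r
# Hypothetical stand-in for LungCapData (made-up values, not the real dataset)
set.seed(1)
LungCapData <- data.frame(
  LungCap = c(rnorm(20, mean = 8, sd = 2), rnorm(20, mean = 9, sd = 2)),
  Smoke   = factor(rep(c("no", "yes"), each = 20))
)

# Formula version with the question's options spelled out explicitly
result <- t.test(LungCap ~ Smoke, data = LungCapData,
                 alternative = "two.sided", mu = 0,
                 var.equal = FALSE, conf.level = 0.95)

print(result$p.value)   # p-value of the two-sided test
print(result$conf.int)  # 95% confidence interval for the difference in means
```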