I have been trying to run a two-sided t-test in R but keep running into an error. Below are my process, the dataset details, and my script from RStudio. I used a dataset called LungCapacity that I downloaded from this website: https://www.statslectures.com/r-scripts-datasets.
# Imported the data set into RStudio.
# Ran a summary report to see the data and class.
summary(LungCapData)
# Here I could see that the Smoke column is a character, so I converted it to a factor.
LungCapData$Smoke <- factor(LungCapData$Smoke)
# On checking the summary again, I see it has been converted to a factor with levels yes and no.
# I want to run a t-test between lung capacity and smoking.
t.test(LungCapData$LungCap, LungCapData$Smoke, alternative = c("two.sided"), mu=0, var.equal = FALSE, conf.level = 0.95, paired = FALSE)
Now, when I run it, I get the following error.
Error in var(y) : Calling var(x) on a factor x is defunct.
Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
In addition: Warning message:
In mean.default(y) : argument is not numeric or logical: returning NA
I tried converting the Smoke variable from Yes and No to 1 and 0. The code runs, but the result is not correct. What am I doing wrong?

You're very close; you just need to call t.test with a formula.

With your current approach, you're trying to compare LungCapData$LungCap, which is a numeric vector, with LungCapData$Smoke, which is a factor. Instead, you want to instruct t.test to compare LungCapData$LungCap when grouping by LungCapData$Smoke. That is achieved with a formula. The formula LungCap ~ Smoke says that LungCap should depend on Smoke. When you use a formula, you also need to supply data = so that t.test knows where to find those columns.

When you try to convert LungCapData$Smoke to numeric, you get the wrong result because you're just getting the factor level indices, which have no biological significance. You are essentially asking whether the mean of the level codes assigned to the factor differs from the mean lung capacity.
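
Since the original data isn't reproduced here, below is a minimal sketch of the formula call on a made-up stand-in. The data frame toy and every value in it are invented for illustration; only the column names LungCap and Smoke come from the question. It also shows what as.numeric() actually returns for a factor.

```r
# Toy stand-in for LungCapData; the numbers are invented, not the real dataset.
set.seed(1)
toy <- data.frame(
  LungCap = c(rnorm(20, mean = 8, sd = 2), rnorm(20, mean = 9, sd = 2)),
  Smoke   = factor(rep(c("no", "yes"), each = 20))
)

# Formula interface: compare mean LungCap between the two levels of Smoke.
# paired is left out: the groups are independent, and unpaired is the default.
res <- t.test(LungCap ~ Smoke, data = toy,
              alternative = "two.sided", mu = 0,
              var.equal = FALSE, conf.level = 0.95)
print(res)

# as.numeric() on a factor returns the level indices (1 for "no", 2 for "yes"),
# not measurements, which is why a test against those codes is meaningless.
codes <- as.numeric(toy$Smoke)
print(table(codes))
```

With the real data, the same call should work after swapping toy for LungCapData.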
The other way is to subset LungCapData$LungCap yourself, but that's a lot more typing.

As specified in the OP, t.test() attempts to compare two vectors, expecting both to be numeric. Instead, use the formula version of t.test().
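
To tie the two suggestions together, here is a rough sketch showing that subsetting by hand and the formula version run the same Welch test. The data frame toy is again invented for illustration and assumes the factor levels are "no" and "yes"; adjust the labels to whatever levels(LungCapData$Smoke) reports.

```r
# Toy stand-in for LungCapData; the numbers are invented, not the real dataset.
set.seed(1)
toy <- data.frame(
  LungCap = c(rnorm(20, mean = 8, sd = 2), rnorm(20, mean = 9, sd = 2)),
  Smoke   = factor(rep(c("no", "yes"), each = 20))
)

# Manual route: pull out the numeric values for each group and pass two vectors.
by_hand <- t.test(toy$LungCap[toy$Smoke == "no"],
                  toy$LungCap[toy$Smoke == "yes"])

# Formula route: let t.test() do the grouping by Smoke.
by_formula <- t.test(LungCap ~ Smoke, data = toy)

# Both comparisons should be TRUE: same t statistic and same p-value.
same_stat <- isTRUE(all.equal(unname(by_hand$statistic),
                              unname(by_formula$statistic)))
same_p <- isTRUE(all.equal(by_hand$p.value, by_formula$p.value))
print(c(same_stat = same_stat, same_p = same_p))
```

The formula version is shorter and keeps the group labels in the printed output, which is why it is usually the more convenient choice.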