# Error message when running a t-test in R because of a character variable

I have been trying to run a two-sided t-test in R but keep running into an error. Below are my process, the dataset details, and my script from RStudio. I used a dataset called LungCapacity that I downloaded from this website: https://www.statslectures.com/r-scripts-datasets.

``````
# Imported the data set into RStudio.

# Ran a summary report to see the data and the class of each column.
summary(LungCapData)

# Here I could see that the Smoke column is a character, so I converted it to a factor.
LungCapData$Smoke <- factor(LungCapData$Smoke)

# On checking the summary again, I see it has been converted to a factor with levels "yes" and "no".

# I want to run a t-test between lung capacity and smoking.
t.test(LungCapData$LungCap, LungCapData$Smoke, alternative = c("two.sided"), mu = 0, var.equal = FALSE, conf.level = 0.95, paired = FALSE)
``````

``````
Error in var(y) : Calling var(x) on a factor x is defunct.
Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
In mean.default(y) : argument is not numeric or logical: returning NA
``````

Doyle

You're very close; you just need to call `t.test` with a formula:

``````
# `paired` is left out here: FALSE is already the default, and recent
# R versions appear to reject it in the formula interface.
t.test(LungCap ~ Smoke, data = LungCapData,
       alternative = "two.sided", mu = 0, var.equal = FALSE,
       conf.level = 0.95)

#   Welch Two Sample t-test
#
# data:  LungCap by Smoke
# t = -3.6498, df = 117.72, p-value = 0.0003927
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#  -1.3501778 -0.4003548
# sample estimates:
#  mean in group no mean in group yes
#          7.770188          8.645455
``````

With your current approach, you're trying to compare `LungCapData$LungCap`, which is a numeric vector:

``````
LungCapData$LungCap[1:10]
# [1]  6.475 10.125  9.550 11.125  4.800  6.225  4.950  7.325  8.875  6.800
``````

with `LungCapData$Smoke`, which is a factor:

``````
LungCapData$Smoke[1:10]
# [1] no  yes no  no  no  no  no  no  no  no
``````

Instead, you want to tell `t.test` to compare the values of `LungCapData$LungCap` between the groups defined by `LungCapData$Smoke`. That is achieved with a formula.

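The formula `LungCap ~ Smoke` says that `LungCap` is the response and `Smoke` is the grouping variable it depends on. When you write the formula with bare column names like this, you also need to supply `data =` so that R knows which data frame holds those columns.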
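As a minimal sketch of the formula interface, here is the same kind of call on a small made-up data frame. The name `toy` and its values are hypothetical and only stand in for `LungCapData`; they are not taken from the real dataset.

```r
# Hypothetical toy data, NOT values from LungCapData
toy <- data.frame(
  score = c(5.1, 6.3, 5.8, 6.0, 7.9, 8.4, 7.2, 8.8),
  group = factor(rep(c("no", "yes"), each = 4))
)

# Numeric response on the left of ~, two-level grouping variable on the right
res <- t.test(score ~ group, data = toy)

# The result is a list-like "htest" object, so individual pieces can be pulled out
res$estimate   # per-group means, in factor-level order ("no", then "yes")
res$p.value    # p-value of the Welch two-sample test
```

The group on the right-hand side must have exactly two levels for a two-sample test.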

If you try to convert `LungCapData$Smoke` to numeric, you get a misleading result, because you're just getting the factor level indices, which have no biological significance.

``````
as.numeric(LungCapData$Smoke)[1:10]
# [1] 1 2 1 1 1 1 1 1 1 1
``````
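To illustrate why those indices carry no meaning, the sketch below uses a short hypothetical factor (`smoke` and `smoke_rev` are made-up names, not columns of the dataset). The same labels map to different integers as soon as the level order changes.

```r
# Hypothetical factor; by default the levels are sorted alphabetically
smoke <- factor(c("no", "yes", "no", "no"))
levels(smoke)          # "no" "yes"
as.integer(smoke)      # 1 2 1 1

# Same labels, different level order: the integer codes change
smoke_rev <- factor(c("no", "yes", "no", "no"), levels = c("yes", "no"))
as.integer(smoke_rev)  # 2 1 2 2

# The labels themselves are unchanged
identical(as.character(smoke), as.character(smoke_rev))  # TRUE
```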

The other way is to subset `LungCapData$LungCap` yourself, but that's a lot more typing:

``````
t.test(LungCapData$LungCap[LungCapData$Smoke == "yes"],
       LungCapData$LungCap[LungCapData$Smoke == "no"],
       alternative = "two.sided", mu = 0, var.equal = FALSE,
       conf.level = 0.95, paired = FALSE)
``````
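One detail to watch with manual subsetting is the order of the two groups. The sketch below, on hypothetical toy data (the names `toy2`, `by_formula` and `by_subset` are made up for illustration), suggests that passing "yes" first gives the same p-value as the formula call but a t statistic with the opposite sign, since the formula interface takes the groups in factor-level order ("no" first).

```r
# Hypothetical toy data, NOT values from LungCapData
toy2 <- data.frame(
  score = c(4.9, 6.1, 5.5, 6.4, 7.6, 8.1, 7.0, 9.0),
  group = factor(rep(c("no", "yes"), each = 4))
)

# Formula interface: groups are compared in level order, "no" minus "yes"
by_formula <- t.test(score ~ group, data = toy2)

# Manual subsetting with "yes" first: "yes" minus "no"
by_subset <- t.test(toy2$score[toy2$group == "yes"],
                    toy2$score[toy2$group == "no"])

# Same test, mirrored direction
all.equal(unname(by_formula$statistic), -unname(by_subset$statistic))
all.equal(by_formula$p.value, by_subset$p.value)
```

For a two-sided test the conclusion is the same either way; only the sign of the statistic and of the confidence interval bounds is flipped.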
zamet

As the error in the original post shows, calling `t.test()` with two vectors compares those vectors directly and expects both of them to be numeric.

Instead, use the formula version of `t.test()`.

``````
data <- read.table(file = "./data/LungCapData.txt", header = TRUE)
t.test(LungCap ~ Smoke, data = data)
``````

...and the output:

``````
> t.test(LungCap ~ Smoke, data = data)

    Welch Two Sample t-test

data:  LungCap by Smoke
t = -3.6498, df = 117.72, p-value = 0.0003927
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.3501778 -0.4003548
sample estimates:
 mean in group no mean in group yes
         7.770188          8.645455
``````
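As far as I can tell, the formula method turns the grouping column into a factor internally, so a character column does not need to be converted first. The sketch below checks this on hypothetical toy data (`toy_chr` and `toy_fct` are made-up names, and the values are not from the real dataset).

```r
# Hypothetical toy data with the grouping column kept as character
toy_chr <- data.frame(
  score = c(5.2, 6.0, 5.7, 6.6, 7.4, 8.3, 7.1, 8.9),
  group = rep(c("no", "yes"), each = 4),
  stringsAsFactors = FALSE
)
class(toy_chr$group)   # "character"

# Formula interface on the character grouping column
res_chr <- t.test(score ~ group, data = toy_chr)

# Same data with the column explicitly converted to a factor
toy_fct <- transform(toy_chr, group = factor(group))
res_fct <- t.test(score ~ group, data = toy_fct)

# Both calls should give the same test result
all.equal(res_chr$p.value, res_fct$p.value)
```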