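I'm new to R and am using it to process RNA-seq data. I'm working with a reference data frame exported from BioMart, and once GO terms are included the rows are badly organised: each gene is repeated for every combination of transcript and GO term (see below).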
head(goZref)
Gene.stable.ID Transcript.stable.ID Protein.stable.ID
1 ENSDARG00000063344 ENSDART00000131829 ENSDARP00000123357
2 ENSDARG00000063344 ENSDART00000131829 ENSDARP00000123357
3 ENSDARG00000063344 ENSDART00000144883 ENSDARP00000114467
4 ENSDARG00000063344 ENSDART00000144883 ENSDARP00000114467
5 ENSDARG00000097685 ENSDART00000156963 ENSDARP00000128236
6 ENSDARG00000097685 ENSDART00000156963 ENSDARP00000128236
Gene.description Gene.name WikiGene.name
1 family with sequence similarity 162 member A [Source:NCBI gene;Acc:336363] fam162a fam162a
2 family with sequence similarity 162 member A [Source:NCBI gene;Acc:336363] fam162a fam162a
3 family with sequence similarity 162 member A [Source:NCBI gene;Acc:336363] fam162a fam162a
4 family with sequence similarity 162 member A [Source:NCBI gene;Acc:336363] fam162a fam162a
5 si:ch211-235i11.3 [Source:ZFIN;Acc:ZDB-GENE-131125-9] si:ch211-235i11.3 LOC101885363
6 si:ch211-235i11.3 [Source:ZFIN;Acc:ZDB-GENE-131125-9] si:ch211-235i11.3 LOC101885363
GO.term.name
1 membrane
2 integral component of membrane
3 membrane
4 integral component of membrane
5 nucleic acid binding
6 RNA polymerase II regulatory region sequence-specific DNA binding
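A minimal base-R sketch, assuming a toy data frame rebuilt from the six rows printed above (only WikiGene.name and GO.term.name are kept). It shows why match() is not enough: it returns only the first matching row. A logical comparison keeps every matching row, unique() drops the repeats caused by multiple transcripts, and paste(..., collapse = ", ") joins the terms into one string. The genes vector here is a hypothetical stand-in for the real list of ~100 names.

```r
# Toy reconstruction of the six rows printed by head(goZref) above
# (only the two columns used here are kept).
goZref <- data.frame(
  WikiGene.name = c(rep("fam162a", 4), rep("LOC101885363", 2)),
  GO.term.name  = c("membrane", "integral component of membrane",
                    "membrane", "integral component of membrane",
                    "nucleic acid binding",
                    "RNA polymerase II regulatory region sequence-specific DNA binding"),
  stringsAsFactors = FALSE
)

# match() stops at the first matching row, so only one GO term comes back.
first_term <- goZref$GO.term.name[match("fam162a", goZref$WikiGene.name)]

# A logical comparison keeps every matching row; unique() drops the repeats
# from multiple transcripts, and collapse = joins the vector into one string.
# (paste(x, sep = ",") would return x element-wise, not a single string.)
fam_terms <- paste(unique(goZref$GO.term.name[goZref$WikiGene.name == "fam162a"]),
                   collapse = ", ")

# The same lookup over a vector of gene names (hypothetical `genes`).
genes <- c("fam162a", "LOC101885363")
go_for <- vapply(genes, function(g) {
  paste(unique(goZref$GO.term.name[goZref$WikiGene.name == g]), collapse = ", ")
}, character(1))
print(go_for)
```

vapply() names its result by the input strings, so go_for["fam162a"] gives the joined terms for that gene; a gene with no reference rows would get an empty string.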
I want to annotate a data frame of genes of interest (the gene names are in a character vector called genes here), but I'm struggling to automate this given all the repetition and duplicated rows in the reference. I've tried using match, but because it only returns the first match, I miss the other rows. For instance, I would like to search for "fam162a" and get something like "membrane, integral component of membrane", and then automate this for a list of 100 gene names. subset is useful for returning all rows that share the same gene name, and I've tried passing it to ddply, but I don't really know what I'm doing and got stuck here:
test<- ddply(.data = goZref, .variables = genes, for (x in genes) {
+ paste(unique(subset(goZref, WikiGene.name==x, select= Go.term.name)), sep = ",")})
Error in parse(text = x) : <text>:1:12: unexpected symbol
1: si:dkey-224k5.13
^
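The error most likely comes from the .variables argument: ddply() expects the names of the columns to split by (here that would be WikiGene.name), not the values to look up, so plyr tries to parse each gene name as an R expression and fails on a name such as si:dkey-224k5.13. Its third argument should also be a function applied to each piece rather than a for loop, and the column is GO.term.name, not Go.term.name. Below is a sketch of one base-R alternative (not plyr), assuming the same column names as head(goZref), a toy copy of the rows shown, and a hypothetical genes vector: filter with %in%, collapse per gene with aggregate(), then merge() the result onto the genes of interest.

```r
# Toy stand-in for goZref (same column names as the BioMart export above).
goZref <- data.frame(
  WikiGene.name = c(rep("fam162a", 4), rep("LOC101885363", 2)),
  GO.term.name  = c("membrane", "integral component of membrane",
                    "membrane", "integral component of membrane",
                    "nucleic acid binding",
                    "RNA polymerase II regulatory region sequence-specific DNA binding"),
  stringsAsFactors = FALSE
)
# Hypothetical genes of interest; the last one is absent from the toy reference.
genes <- c("fam162a", "LOC101885363", "si:dkey-224k5.13")

# 1. Keep only reference rows for the genes of interest, without duplicates.
hits <- unique(goZref[goZref$WikiGene.name %in% genes,
                      c("WikiGene.name", "GO.term.name")])

# 2. One row per gene, with its GO terms joined into a single string.
go_by_gene <- aggregate(GO.term.name ~ WikiGene.name, data = hits,
                        FUN = function(x) paste(unique(x), collapse = ", "))

# 3. Attach to a data frame of the genes of interest; all.x = TRUE keeps
#    genes with no reference rows (their GO.term.name becomes NA).
annotated <- merge(data.frame(gene = genes, stringsAsFactors = FALSE),
                   go_by_gene, by.x = "gene", by.y = "WikiGene.name",
                   all.x = TRUE)
print(annotated)
```

Because the gene names are only ever compared as strings, names containing ":" or "-" cause no parsing problems here. Note that merge() sorts rows by the key by default, so the output order may differ from the order of genes.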
Any help is appreciated!