Select IDs/rows in a DataFrame based on n-grams

I have the following dataset:

ID       Text
12     Coolest fan we’ve ever seen.
12     SHARE this with anyone you know who can use this tip!
31     Time for a Royal Celebration! Save the date.
54     The way to a sports fan’s heart? Behind-the-scenes content from their favourite teams.
419    Start asking your questions now for tomorrow’s LIVE Q&A on careers you can do without going to university.
451    Save the date, we’re hosting a fabulous & fun meetup at Coffee Bar Bryant on 9/20. Stay tuned

I have used n-grams to analyze the text and the frequency of words/phrases.

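from nltk import ngrams

# df is the DataFrame shown above, with columns ID and Text
text = df.Text.tolist()

list_n = []

for i in text:
    # Build all trigrams from the whitespace-separated tokens of each text
    n_grams = ngrams(i.split(), 3)

    for grams in n_grams:
        list_n.append(grams)

list_n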
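
A list like list_n only collects the trigrams; to get frequencies, one option is to count them with collections.Counter. The following is a minimal sketch rather than the code actually used here: `tokenize` and `trigrams` are hypothetical stdlib helpers (the latter standing in for nltk's ngrams with n=3), and the two sample texts are copied from the dataset above. Lowercasing and dropping punctuation is an assumption, made so that "date." and "date," count as the same token.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: lowercase and keep only word characters
    return re.findall(r"\w+", text.lower())

def trigrams(tokens):
    # Hypothetical stdlib stand-in for nltk.ngrams(tokens, 3)
    return zip(tokens, tokens[1:], tokens[2:])

texts = [
    "Time for a Royal Celebration! Save the date.",
    "Save the date, we’re hosting a fabulous & fun meetup at Coffee Bar Bryant on 9/20. Stay tuned",
]

# Count every trigram across all texts
counts = Counter(g for t in texts for g in trigrams(tokenize(t)))
print(counts[("save", "the", "date")])
```

With plain whitespace splitting, as in the loop above, ("Save", "the", "date.") and ("Save", "the", "date,") would be counted as two different trigrams.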

Since I am interested in finding which texts use a particular word or sequence of words, I need to associate each text (i.e. its ID) with the n-grams it contains. For example, I want to find the texts that contain "Save the date", i.e. ID=31 and ID=451. To find the n-grams that contain a single word, I have been using this:

def ngram_filter(col, word, n):
    tokens = col.split()
    all_ngrams = ngrams(tokens, n)
    filtered_ngrams = [x for x in all_ngrams if word in x]
    return filtered_ngrams

However, I do not know how to find the ID associated with each matching text, or how to extend the function above to match a sequence of several words instead of a single word.
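
One possible way to express the association described above is to keep each ID paired with its text and test whether the target phrase occurs among that text's n-grams. This is a sketch under stated assumptions, not a definitive solution: `tokenize`, `ngrams_of` and `ids_with_phrase` are hypothetical helper names, `ngrams_of` is a stdlib stand-in for nltk's ngrams, and matching is assumed to be case-insensitive with punctuation ignored.

```python
import re

def tokenize(text):
    # Assumption: lowercase and keep only word characters,
    # so "date." and "date," both become "date"
    return re.findall(r"\w+", text.lower())

def ngrams_of(tokens, n):
    # Hypothetical stdlib equivalent of nltk.ngrams(tokens, n)
    return list(zip(*(tokens[i:] for i in range(n))))

def ids_with_phrase(rows, phrase):
    """Return IDs (in order, without duplicates) whose text contains `phrase` as an n-gram."""
    target = tuple(tokenize(phrase))
    found = []
    for row_id, text in rows:
        if target in ngrams_of(tokenize(text), len(target)) and row_id not in found:
            found.append(row_id)
    return found

# (ID, Text) pairs copied from the dataset above
rows = [
    (12, "Coolest fan we’ve ever seen."),
    (12, "SHARE this with anyone you know who can use this tip!"),
    (31, "Time for a Royal Celebration! Save the date."),
    (54, "The way to a sports fan’s heart? Behind-the-scenes content from their favourite teams."),
    (419, "Start asking your questions now for tomorrow’s LIVE Q&A on careers you can do without going to university."),
    (451, "Save the date, we’re hosting a fabulous & fun meetup at Coffee Bar Bryant on 9/20. Stay tuned"),
]

matches = ids_with_phrase(rows, "Save the date")
print(matches)  # [31, 451]
```

With the DataFrame from the question, the pairs could be produced with zip(df.ID, df.Text). Because the phrase length sets n, the same function handles a single word or a longer sequence.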

How should I approach this? Any ideas?

Feel free to change the tags if needed. Thanks.