# Applying a custom function to a large list takes very long

I have a list of 48,000 words, and for each word I am trying to find the 4 words closest to it (fewer if 4 aren't present). I am using the `difflib` module for this.

I had 2 ways to do this in mind: get the 4 closest matches using `difflib.get_close_matches()`, or build the Cartesian product of the word list with itself and compute a score for each pair in the product.

``````
import random, string, itertools, difflib
from functools import partial

N = 10  # demo size; the real list has 48,000 words
random.seed(123)
words = [''.join(random.choice(string.ascii_lowercase) for i in range(5)) for j in range(N)]
``````

1: I created a function that returns the score for each pair after building the Cartesian product. Afterwards I can group on the first element and pick the top n as needed.

``````
def fun(x):
    return difflib.SequenceMatcher(None, *x).ratio()

products = list(itertools.product(words, words))
scores = list(map(fun, products))
``````
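The grouping/top-n step described above isn't shown, so here is one way it might look, a sketch under the same setup (`top_matches` is a name I've made up, and the small `words` list is just for illustration):

``````python
import difflib, itertools

words = ["apple", "apply", "angle", "ample"]

def fun(x):
    return difflib.SequenceMatcher(None, *x).ratio()

products = list(itertools.product(words, words))
scores = list(map(fun, products))

# group scores by first word, then keep the 4 highest-scoring partners
top_matches = {}
for (a, b), s in zip(products, scores):
    top_matches.setdefault(a, []).append((s, b))
for a in top_matches:
    top_matches[a] = [b for s, b in sorted(top_matches[a], reverse=True)[:4]]
``````

Note that each word matches itself with a score of 1.0, so it appears first in its own group unless self-pairs are filtered out.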

2: A function that directly gives the best n (= 4) matches

``````
f = partial(difflib.get_close_matches, possibilities=words, n=4, cutoff=0.4)
matches = list(map(f, words))  # gives up to 4 matches per word, if present
``````
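Since the full Cartesian product in attempt 1 scores every pair with the relatively expensive `ratio()`, one standard `difflib` optimization (the same pruning `get_close_matches` applies internally) is to reject pairs with the cheap upper bounds `real_quick_ratio()` and `quick_ratio()` first. A sketch, where `close_pairs` is a name I've made up:

``````python
import difflib

def close_pairs(words, cutoff=0.4):
    """Return (a, b, score) for every ordered pair whose similarity reaches cutoff."""
    s = difflib.SequenceMatcher(None)
    out = []
    for b in words:
        s.set_seq2(b)  # difflib caches information about seq2, so fix it in the outer loop
        for a in words:
            s.set_seq1(a)
            # cheap upper bounds first; the expensive ratio() runs only for survivors
            if (s.real_quick_ratio() >= cutoff
                    and s.quick_ratio() >= cutoff
                    and s.ratio() >= cutoff):
                out.append((a, b, s.ratio()))
    return out
``````

Reusing one `SequenceMatcher` with `set_seq1`/`set_seq2` also avoids re-doing the per-string preprocessing on every pair.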

For multiprocessing, save the first function (`fun`) from attempt 1 in a .py file and import it (so the worker processes can pickle it):

``````
import multiprocessing
import itertools
import fun  # fun.py contains fun() from attempt 1

if __name__ == '__main__':
    pool = multiprocessing.Pool(8)
    products = list(itertools.product(words, words))  # same Cartesian product as attempt 1
    scores_mlt = pool.map(fun.fun, products)  # pool.map already returns a list
``````

Using the same `f` from attempt 2, but with the pool:

``````
close_matches = pool.map(f, words)
``````
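With many small tasks (one per word), inter-process overhead can eat the parallel gains; `Pool.map` takes a `chunksize` argument that batches tasks per worker. A minimal self-contained sketch of attempt 2 with a pool, assuming a tiny stand-in `words` list and a hypothetical helper name `best4` (the callable must live at module level so workers can pickle it):

``````python
import difflib
import multiprocessing

words = ["apple", "apply", "angle", "maple"]  # stand-in for the 48,000-word list

def best4(w):
    # module-level so worker processes can pickle/import it
    return difflib.get_close_matches(w, words, n=4, cutoff=0.4)

if __name__ == '__main__':
    with multiprocessing.Pool(8) as pool:
        # chunksize batches tasks, cutting per-call IPC overhead
        close_matches = pool.map(best4, words, chunksize=64)
``````

Either way, each call still compares one word against all 48,000 possibilities, so the total work remains quadratic; the pool only divides it across cores.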