Union of sublists with common elements

Suppose I have the following nested list:

L = [['John','Sayyed'], ['John' , 'Simon'] ,['bush','trump'],
     ['Sam','Suri','NewYork'],['Suri','Orlando','Canada']]

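How can I group these sublists by taking the union of each sublist with any other sublist that shares at least one element with it? For the example above, the result should be: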

[['John','Sayyed','Simon'] ,['bush','trump'],
 ['Sam','Suri','NewYork','Orlando','Canada']]

Thus the first two sublists are joined because they share 'John'. Could someone please share their thoughts?

Comments
  • 布水风
    布水风 replied

    To merge two lists:

    merge = lambda l1, l2: l1 + [ x for x in l2 if x not in l1 ]
    

    To be more efficient, build a set from l1 and use it for the membership test.
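    A minimal sketch of that suggestion, assuming the items are hashable. The name `merge_with_set` is made up for illustration, and unlike the lambda above it also skips duplicates that occur inside the second list:

```python
def merge_with_set(l1, l2):
    """Return l1 followed by the items of l2 not already present, keeping order."""
    seen = set(l1)      # constant-time membership tests instead of scanning l1
    merged = list(l1)   # copy, so the caller's list is not mutated
    for x in l2:
        if x not in seen:
            seen.add(x)          # remember x so repeats in l2 are skipped too
            merged.append(x)
    return merged


print(merge_with_set(['John', 'Sayyed'], ['John', 'Simon']))
# ['John', 'Sayyed', 'Simon']
```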

  • kqui
    kqui replied

    If order matters and the lists are large, you can use the following two-step approach:

     l = [['john', 'sayyid'], ['john', 'simon'], ['b', 't']]
    
     def join(l1, l2):
         mset = set(l1)   # set for fast membership tests
         result = l1[:]   # shallow copy, so l1 itself is not modified
         for each in l2:
             if each not in mset:
                 result.append(each)
         return result
    

    To merge them within the main list, you can pop the original sublists by index and insert the joined result:

    l1 = l.pop(0)
    l2 = l.pop(0)
    l.insert(0, join(l1, l2))
    >>> l
    [['john', 'sayyid', 'simon'], ['b', 't']]
    
  • 老婆吃米线
    老婆吃米线 replied

    A simple approach:

    L = [['John','Sayyed'], [ 'John' , 'Simon'] ,['bush','trump']]
    L[0].extend([x for x in L[1] if x not in L[0]])
    L.pop(1)
    print(L) 
    

    See:

    List Comprehensions

    Append vs Extend

  • 嘘!安静
    嘘!安静 replied

    You can use the function connected_components in networkx:

    import networkx as nx

    L = [['John','Sayyed'], ['John' , 'Simon'] ,['bush','trump'],
         ['Sam','Suri','NewYork'],['Suri','Orlando','Canada']]

    G = nx.Graph()

    for i in L:
        # Graph.add_path is gone in recent networkx releases; use nx.add_path
        nx.add_path(G, i)

    lst = list(nx.connected_components(G))
    print(lst)
    

    Output:

    [{'John', 'Sayyed', 'Simon'},
     {'bush', 'trump'},
     {'Canada', 'NewYork', 'Orlando', 'Sam', 'Suri'}]
    
  • ut_quo
    ut_quo replied

    nx.connected_components

    You can use networkx for that. Generate a graph, and add your list as the graph edges using add_edges_from. Then use connected_components, which will precisely give you a list of sets of the connected components in the graph:

    import networkx as nx 
    
    L = [['John','Sayyed'], ['John' , 'Simon'] ,['bush','trump']]
    
    G=nx.Graph()
    G.add_edges_from(L)
    list(nx.connected_components(G))
    
    [{'John', 'Sayyed', 'Simon'}, {'bush', 'trump'}]
    

    Sublists with more than two items

    If some sublists have more than two elements, you can take all the length-2 combinations from each sublist and use these as the network edges:

    from itertools import combinations, chain
    
    L = [['John','Sayyed'], [ 'John' , 'Simon'] ,['bush','trump'],
         ['Sam','Suri','NewYork'],['Suri','Orlando','Canada']]
    
    L2_nested = [list(combinations(l,2)) for l in L]
    L2 = list(chain.from_iterable(L2_nested))
    #[('John', 'Sayyed'), ('John', 'Simon'), ('bush', 'trump'), ('Sam', 'Suri')...
    
    G=nx.Graph()
    G.add_edges_from(L2)
    list(nx.connected_components(G))
    
    [{'John', 'Sayyed', 'Simon'},
    {'bush', 'trump'},
    {'Canada', 'NewYork', 'Orlando', 'Sam', 'Suri'}]
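    As a side note on the edge construction above: for connectivity alone, linking consecutive items of each sublist should be enough, which needs far fewer edges than all pairs. The comparison below is only a sketch using the standard library; the four-item `sub` list is an example made up for illustration:

```python
from itertools import combinations

sub = ['Sam', 'Suri', 'NewYork', 'Orlando']

all_pairs = list(combinations(sub, 2))   # every unordered pair: n*(n-1)/2 edges
chain_pairs = list(zip(sub, sub[1:]))    # consecutive items only: n-1 edges

print(len(all_pairs), len(chain_pairs))  # 6 3
print(chain_pairs)
# [('Sam', 'Suri'), ('Suri', 'NewYork'), ('NewYork', 'Orlando')]
```

    Either edge list puts all four names in one connected component. Note that a sublist with a single item produces no edges under either scheme, so such an item would have to be added to the graph as a node separately.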
    

    We can also visualize these connected components with nx.draw:

    pos = nx.spring_layout(G, scale=20)
    nx.draw(G, pos, node_color='lightblue', node_size=500, with_labels=True)
    



    Details

    More detailed explanation on connected components:

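    In graph theory, a connected component (or just component) of an undirected graph is a subgraph in which any two vertices are connected to each other by paths, and which is connected to no additional vertices in the supergraph.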

    So essentially, this code creates a graph whose edges come from the list, where each edge consists of two values u, v that become the nodes connected by that edge.

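    Hence, taking the union of each sublist with any sublist that shares an element with it can be translated into a graph theory problem: finding all nodes that are reachable from one another through existing paths.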
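    The same reachability idea can also be sketched without networkx by using a disjoint-set (union-find) structure. This is a swapped-in technique, not what the answer above uses. The function name `connected_groups` is hypothetical and items are assumed to be hashable:

```python
def connected_groups(sublists):
    """Return the connected components of the items as a list of sets."""
    parent = {}  # maps each item to its parent in the disjoint-set forest

    def find(x):
        parent.setdefault(x, x)            # first sighting: x is its own root
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra                # attach b's tree under a's root

    for sub in sublists:
        for item in sub:
            find(item)                     # register every item, even singletons
        for item in sub[1:]:
            union(sub[0], item)            # link the whole sublist together

    groups = {}
    for item in parent:                    # dicts preserve insertion order
        groups.setdefault(find(item), set()).add(item)
    return list(groups.values())


L = [['John', 'Sayyed'], ['John', 'Simon'], ['bush', 'trump'],
     ['Sam', 'Suri', 'NewYork'], ['Suri', 'Orlando', 'Canada']]
print(connected_groups(L))
```

    For the example list this yields three sets, matching the components reported by nx.connected_components above. The order of items inside each set is not guaranteed.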