R programming: looping over values to create kmeans() clusterings with different values of k

I have the following code:

for (i in 1:5) {

  print(i)

  iris_cluster[i]<- kmeans(iris_data[1:4], i, nstart = 10)
}

The documentation for kmeans() is here: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/kmeans

But when I run it, I get the following error:

Error in `[<-.data.frame`(`*tmp*`, i, value = list(cluster = c(`1` = 1L, : replacement element 2 is a matrix/data frame of 1 row, need 150

I am using the well-known iris dataset that ships with R.

I want to create five data frames:

iris_cluster1
iris_cluster2
iris_cluster3
iris_cluster4
iris_cluster5
Comments
  • 花亦殘 replied:

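    The error comes from `[<-.data.frame`: iris_cluster is a data frame, so iris_cluster[i] <- ... tries to store the components of the kmeans result as columns, and the 1-row 'centers' matrix (element 2) cannot fill 150 rows. If the dataset is 'iris', we can instead create a list with lapply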

    lst1 <- lapply(1:5, function(i) kmeans(iris[1:4], i, nstart = 10))
    names(lst1) <- paste0("iris_cluster", 1:5)
    

    and use list2env if we need separate objects in the global env (not recommended)

    list2env(lst1, .GlobalEnv)
    iris_cluster1
    #K-means clustering with 1 clusters of sizes 150
    
    #Cluster means:
    #  Sepal.Length Sepal.Width Petal.Length Petal.Width
    #1     5.843333    3.057333        3.758    1.199333
    
    #Clustering vector:
    #  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
     #[73] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    #[145] 1 1 1 1 1 1
    
    #Within cluster sum of squares by cluster:
    #[1] 681.3706
    # (between_SS / total_SS =   0.0 %)
    
    #Available components:
    
    #[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss" "betweenss"    "size"         "iter"         "ifault"      
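
    Since list2env is not recommended, here is a minimal sketch of staying with a list. It assumes the built-in iris data, and that the original iris_cluster can simply be replaced by a pre-allocated list; the seed and the name fit3 are illustrative only and are not part of the original code.

    ```r
    set.seed(42)  # assumption: fixed seed so repeated runs give the same fits

    # Pre-allocate a plain list instead of assigning into a data frame
    iris_cluster <- vector("list", 5)
    for (i in 1:5) {
      # [[i]] stores the whole kmeans object in slot i
      iris_cluster[[i]] <- kmeans(iris[1:4], centers = i, nstart = 10)
    }
    names(iris_cluster) <- paste0("iris_cluster", 1:5)

    # Access a fit by name or by position instead of creating global objects
    fit3 <- iris_cluster[["iris_cluster3"]]
    fit3$size
    identical(fit3, iris_cluster[[3]])
    ```

    This keeps the for loop from the question; the only real changes are the list container and the double brackets.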
    

    If we check the structure of one of the elements with str(), it is a named list whose components are vectors or matrices. The components can be extracted with $ or [[

    str(iris_cluster1)
    #List of 9
    # $ cluster     : int [1:150] 1 1 1 1 1 1 1 1 1 1 ...
    # $ centers     : num [1, 1:4] 5.84 3.06 3.76 1.2
    #  ..- attr(*, "dimnames")=List of 2
    #  .. ..$ : chr "1"
    #  .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
    # $ totss       : num 681
    # $ withinss    : num 681
    # $ tot.withinss: num 681
    # $ betweenss   : num 6.82e-13
    # $ size        : int 150
    # $ iter        : int 1
    # $ ifault      : NULL
    # - attr(*, "class")= chr "kmeans"
    

    From a single element, 'withinss' can be extracted as

    iris_cluster1$withinss
    #[1] 681.3706
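
    The question asks for five data frames, while kmeans() returns a list. One possible way to get data frames, sketched here under the assumption that "data frame" means the iris rows plus their assigned cluster, is to bind the 'cluster' component onto the data. The names df_list and cluster, and the seed, are made up for this example.

    ```r
    set.seed(42)  # assumption: fixed seed for repeatable fits
    lst1 <- lapply(1:5, function(i) kmeans(iris[1:4], i, nstart = 10))
    names(lst1) <- paste0("iris_cluster", 1:5)

    # One data frame per k: the original rows plus the assigned cluster label
    df_list <- lapply(lst1, function(fit) cbind(iris, cluster = fit$cluster))

    head(df_list$iris_cluster3)

    # Cross-tabulate the k = 3 cluster labels against the known species
    table(df_list$iris_cluster3$cluster, df_list$iris_cluster3$Species)
    ```

    The result is still a named list, so df_list$iris_cluster1 through df_list$iris_cluster5 play the role of the five requested objects.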
    

    To pull a component out of every fit, we can loop over the list with lapply/sapply. Because 'withinss' has a different length for each k, either unlist the result or stack it into a two-column data.frame so that the cluster name is returned as well. From there, the 'values' column can be extracted with either $ or [[

    stack(lapply(lst1, `[[`, 'withinss'))[2:1]
    #          ind     values
    #1  iris_cluster1 681.370600
    #2  iris_cluster2  28.552075
    #3  iris_cluster2 123.795876
    #4  iris_cluster3  23.879474
    #5  iris_cluster3  15.151000
    #6  iris_cluster3  39.820968
    #7  iris_cluster4  18.703437
    #8  iris_cluster4  15.151000
    #9  iris_cluster4   9.749286
    #10 iris_cluster4  13.624750
    #11 iris_cluster5  15.151000
    #12 iris_cluster5   9.228889
    #13 iris_cluster5   4.655000
    #14 iris_cluster5   5.462500
    #15 iris_cluster5  11.963784
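
    When the component has length one per fit, such as 'tot.withinss', sapply already returns a flat named vector and no stacking is needed. A minimal sketch, again assuming the built-in iris data; the seed and the name tot_wss are illustrative only.

    ```r
    set.seed(42)  # assumption: fixed seed for repeatable fits
    lst1 <- lapply(1:5, function(i) kmeans(iris[1:4], i, nstart = 10))
    names(lst1) <- paste0("iris_cluster", 1:5)

    # Total within-cluster sum of squares, one number per value of k
    tot_wss <- sapply(lst1, function(fit) fit$tot.withinss)
    tot_wss

    # Sanity check: each total equals the sum of that fit's per-cluster values
    per_fit_sum <- sapply(lst1, function(fit) sum(fit$withinss))
    all.equal(unname(tot_wss), unname(per_fit_sum))
    ```

    A vector like this is what is usually plotted against k when comparing the fits.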