Computing the dissimilarity between each pair of taxa from a data frame in R

From a contingency matrix, we can compute the dissimilarity between each pair of rows and then convert the output to a data.frame.

For example, using the Bray-Curtis distance, we get:

# Generate matrix -------------------------------------------------------------
set.seed(1)
ex <- matrix(data = round(runif(100000), 1), nrow = 1000, ncol = 100)
rownames(ex) <- paste0("row", 1:nrow(ex))
colnames(ex) <- paste0("col", 1:ncol(ex))
ex[1:5, 1:5]
     col1 col2 col3 col4 col5
row1  0.3  0.5  0.9  0.8  0.2
row2  0.4  0.7  1.0  0.5  0.5
row3  0.6  0.4  0.9  0.2  0.0
row4  0.9  1.0  0.4  0.4  0.5
row5  0.2  0.1  0.2  0.8  0.9

# Dissimilarity ---------------------------------------------------------------
# Example of Bray-Curtis
library(ecodist)
bray <- bcdist(ex, rmzero = FALSE)
bray <- as.matrix(bray)
bray[upper.tri(bray)] <- NA
diag(bray) <- NA

# Convert distance matrix into data.frame
bray <- reshape2::melt(bray, varnames = c("id1", "id2"))
# Remove NAs
bray <- bray[complete.cases(bray), ]

head(bray)
   id1  id2     value
2 row2 row1 0.2767599
3 row3 row1 0.3541247
4 row4 row1 0.3588235
5 row5 row1 0.3935618
6 row6 row1 0.2948328
7 row7 row1 0.4045643
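
For reference, the quantity behind each value can be sketched in base R. This assumes the usual Bray-Curtis definition (sum of absolute differences divided by the sum of all values), which is what I understand bcdist to compute; the helper name bray_pair and the toy vectors are hypothetical and not taken from ex.

```r
# Bray-Curtis dissimilarity between two numeric vectors:
# sum of absolute differences divided by the sum of all values.
# (Assumed definition; bray_pair is a hypothetical helper name.)
bray_pair <- function(x, y) {
  sum(abs(x - y)) / sum(x + y)
}

# Toy rows (hypothetical values, not taken from `ex`)
a <- c(1, 2, 3)
b <- c(3, 2, 1)

# (2 + 0 + 2) / (4 + 4 + 4) = 4 / 12
bray_pair(a, b)
```

Applying the same function to two rows of the matrix, e.g. bray_pair(ex["row2", ], ex["row1", ]), should be comparable to the first value shown above if the assumption about bcdist holds.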

Now, I would like to know whether the same output bray (i.e. a data frame with 3 columns) can be obtained from a long-format data frame as input. For example, if we convert the example matrix above as follows:

# From a data.frame -----------------------------------------------------------
ex_df <- reshape2::melt(ex)
colnames(ex_df) <- c("row", "col", "value")

is it possible to get the same bray output containing the Bray-Curtis dissimilarity between each pair of rows? I bet efficient dplyr or data.table solutions exist.

Comments
  • 忧郁逗比

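    Does this achieve what you are after? Basically, it just rearranges the long-format data into a matrix-like data frame and computes BC from it. I imagine your actual dataset is in long format.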

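    library(tidyverse)
    library(ecodist)  # provides bcdist()

    BC_dist <- ex_df %>%
      # Long -> wide: one line per "row" id, one column per "col" id
      spread(key = col, value = value) %>%
      # Move the ids into row names so that only numeric columns remain
      column_to_rownames("row") %>%
      bcdist(rmzero = FALSE)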
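
Note that BC_dist above is a dist object, not yet the 3-column data frame the question asks for. The remaining step could be sketched as follows. To stay dependency-free, this sketch uses base R only, a small hypothetical long_df in place of ex_df, and a hand-written Bray-Curtis loop (assuming the usual definition, sum of absolute differences over sum of values) in place of bcdist.

```r
# Small long-format example (hypothetical values standing in for ex_df)
long_df <- data.frame(
  row   = rep(c("row1", "row2", "row3"), each = 3),
  col   = rep(c("col1", "col2", "col3"), times = 3),
  value = c(1, 2, 3,
            3, 2, 1,
            0, 1, 1)
)

# Long -> wide matrix: one line per "row" id, one column per "col" id
wide <- with(long_df, tapply(value, list(row, col), sum))

# Pairwise Bray-Curtis: sum|x - y| / sum(x + y) for every pair of rows
n <- nrow(wide)
bc <- matrix(NA_real_, n, n, dimnames = list(rownames(wide), rownames(wide)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    bc[i, j] <- sum(abs(wide[i, ] - wide[j, ])) / sum(wide[i, ] + wide[j, ])
  }
}

# Keep the strict lower triangle and return id1, id2, value
keep <- which(lower.tri(bc), arr.ind = TRUE)
bray_long <- data.frame(
  id1   = rownames(bc)[keep[, "row"]],
  id2   = colnames(bc)[keep[, "col"]],
  value = bc[keep]
)
bray_long
```

With ecodist installed, as.matrix(BC_dist) could take the place of the loop, and the lower-triangle step would then apply unchanged. The double loop is only meant for illustration and would be slow on 1000 rows.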