# Computing the dissimilarity between each pair of taxa in a data frame in R

Starting from a contingency `matrix`, we can compute the dissimilarity between each pair of rows and then convert the output to a `data.frame`.

``````
# Generate matrix -------------------------------------------------------------
set.seed(1)
ex <- matrix(data = round(runif(100000), 1), nrow = 1000, ncol = 100)
rownames(ex) <- paste0("row", 1:nrow(ex))
colnames(ex) <- paste0("col", 1:ncol(ex))
ex[1:5, 1:5]
#>      col1 col2 col3 col4 col5
#> row1  0.3  0.5  0.9  0.8  0.2
#> row2  0.4  0.7  1.0  0.5  0.5
#> row3  0.6  0.4  0.9  0.2  0.0
#> row4  0.9  1.0  0.4  0.4  0.5
#> row5  0.2  0.1  0.2  0.8  0.9

# Dissimilarity ---------------------------------------------------------------
# Example of Bray-Curtis
library(ecodist)
bray <- bcdist(ex, rmzero = FALSE)
bray <- as.matrix(bray)
bray[upper.tri(bray)] <- NA
diag(bray) <- NA

# Convert distance matrix into data.frame
bray <- reshape2::melt(bray, varnames = c("id1", "id2"))
# Remove NAs
bray <- bray[complete.cases(bray), ]

head(bray)
#>    id1  id2     value
#> 2 row2 row1 0.2767599
#> 3 row3 row1 0.3541247
#> 4 row4 row1 0.3588235
#> 5 row5 row1 0.3935618
#> 6 row6 row1 0.2948328
#> 7 row7 row1 0.4045643
``````

Now I would like to know whether the same `bray` output (i.e. a `data.frame` with 3 columns) can be obtained from a long-format `data.frame` as input. For example, if we convert the example `matrix` provided above with:

``````
# From a data.frame -----------------------------------------------------------
ex_df <- reshape2::melt(ex)
colnames(ex_df) <- c("row", "col", "value")
``````

is it possible to get the same `bray` output containing the Bray-Curtis dissimilarity between each pair of rows? I bet efficient `dplyr` or `data.table` solutions exist.

• 忧郁逗比 replied

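Does this do what you are after? Basically, it just rearranges the long-format data into a matrix-like data frame and computes Bray-Curtis from that. I imagine your actual dataset is in long format.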

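``````
library(tidyverse)
library(ecodist)  # provides bcdist()

# Reshape long -> wide (one row per id), then compute Bray-Curtis on the rows
BC_dist <- ex_df %>%
  spread(2, 3) %>%
  column_to_rownames("row") %>%
  bcdist(rmzero = FALSE)
``````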
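
The pipeline above stops at a distance object, while the question asks for a three-column data frame. As a rough sketch of the remaining step, the block below goes from a long-format data frame to `id1` / `id2` / `value` using base R only. It is not the answerer's method: `toy_df` and `bray_long` are hypothetical names, the data are made up, and Bray-Curtis is computed by hand as sum(|x_i - x_j|) / sum(x_i + x_j) instead of calling `bcdist()`.

``````r
# Hypothetical toy input in long format (not the question's `ex_df`)
toy_df <- data.frame(
  row   = rep(c("row1", "row2", "row3"), each = 3),
  col   = rep(c("col1", "col2", "col3"), times = 3),
  value = c(1, 2, 3,  2, 2, 2,  0, 4, 1)
)

# Long -> wide: xtabs() sums `value` for every (row, col) combination
wide <- unclass(xtabs(value ~ row + col, data = toy_df))

# Bray-Curtis by hand: Manhattan distance over the sum of both row totals
num <- as.matrix(dist(wide, method = "manhattan"))
den <- outer(rowSums(wide), rowSums(wide), "+")
bc  <- num / den

# Keep the strict lower triangle and flatten it into three columns
idx <- which(lower.tri(bc), arr.ind = TRUE)
bray_long <- data.frame(
  id1   = rownames(bc)[idx[, 1]],
  id2   = colnames(bc)[idx[, 2]],
  value = bc[idx]
)
bray_long
``````

Note that `xtabs()` orders the ids alphabetically, so with labels such as `row10` the row order can differ from the original matrix. The pairs and their values are unaffected, but which id lands in `id1` versus `id2` may change.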