I am trying to analyse a data frame using hierarchical clustering with the hclust function in R. I want to test a large number of plausible distance metrics along with the full set of clustering methods. So far, using expand.grid, I have got:
hyperparams = expand.grid(Meths=c("ward.D","ward.D2","single","complete","average","mcquitty","median","centroid"), Dists=c("euclidean", "maximum", "manhattan", "canberra", "binary","minkowski"))
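For reference, this grid has 8 × 6 = 48 rows, one per method/distance pair. expand.grid() defaults to stringsAsFactors = TRUE, so both columns come back as factors. Here is a small sketch of the same call with stringsAsFactors = FALSE, which keeps them as plain character vectors. That is optional, but arguably easier to pass on to dist() and hclust():

```r
# Same grid as above, but keep the columns as character vectors
meths <- c("ward.D", "ward.D2", "single", "complete",
           "average", "mcquitty", "median", "centroid")
dists <- c("euclidean", "maximum", "manhattan",
           "canberra", "binary", "minkowski")
hyperparams <- expand.grid(Meths = meths, Dists = dists,
                           stringsAsFactors = FALSE)

nrow(hyperparams)     # 48 = 8 methods x 6 distances
head(hyperparams, 3)  # Meths varies fastest, Dists slowest
```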
The problem is that the dist function, whose result is passed to hclust, takes a further parameter p when used with the Minkowski distance specifically. For example, with iris you could perform the HCA yourself using Minkowski distance with p=3/2 like this:
contingtab = table(iris$Species, cutree(hclust(dist(iris[,1:4],method="minkowski",p=3/2),method="complete"),3))
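The one-liner above can be wrapped in a small helper so that the linkage method, the distance and p become arguments. This is only a sketch: run_hca is a hypothetical name. It relies on dist() already having a p argument (default 2), which is only used by the Minkowski distance, so it can be passed unconditionally:

```r
# Hypothetical helper: cluster the numeric iris columns and cross-tabulate
# the k-cluster solution against the true species labels
run_hca <- function(meth, dist_method, p = 2, k = 3,
                    x = iris[, 1:4], truth = iris$Species) {
  d  <- dist(x, method = dist_method, p = p)  # p only matters for "minkowski"
  hc <- hclust(d, method = meth)
  table(truth, cutree(hc, k))
}

# Same analysis as the one-liner: Minkowski, p = 3/2, complete linkage
contingtab <- run_hca("complete", "minkowski", p = 3/2)
contingtab
```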
I get an error if I simply include "minkowski, p=3/2" in my list for Dists. However, if the method is anything other than "minkowski", no p parameter should be included.
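From my reading of dist()'s source, the method string is matched against the six known names with pmatch(). A combined string such as "minkowski, p=3/2" therefore matches nothing, and dist() stops with an "invalid distance method" error. Passing p alongside a non-Minkowski method, by contrast, appears to be harmless. A quick base-R check of both points:

```r
x <- iris[, 1:4]

# dist() matches 'method' against its known names with pmatch(),
# so a combined label is not a valid method and triggers an error
res <- tryCatch(dist(x, method = "minkowski, p=3/2"),
                error = function(e) conditionMessage(e))
print(res)

# p is accepted alongside other methods but does not change the result
# (compare values only: the stored "call" attribute differs)
d_euc   <- dist(x, method = "euclidean")
d_euc_p <- dist(x, method = "euclidean", p = 3/2)
identical(as.vector(d_euc), as.vector(d_euc_p))

# Minkowski with p = 2 agrees with Euclidean up to rounding
isTRUE(all.equal(as.vector(dist(x, method = "minkowski", p = 2)),
                 as.vector(d_euc)))
```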
I would like to pass in a vector of p values that I'll write beforehand (maybe something like c(5/4, 3/2, 7/4, 9/4)) and have these used as the different p options for the Minkowski distance when I use expand.grid. Ideally, when hyperparams is viewed, it should also be clear which value of p has been used for each minkowski entry, i.e. they should be labelled. For example, where running my code for hyperparams currently gives just one minkowski entry under Dists for each of the methods in Meths, supplying the p vector c(5/4, 3/2, 7/4, 9/4) would instead give 4 Minkowski rows per method: "minkowski, p=5/4", "minkowski, p=3/2", "minkowski, p=7/4" and "minkowski, p=9/4" (or something like that, as long as the p values are clear). Any ideas?
(Note: please don't suggest any packages; base R only!)
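One possible base-R approach, sketched under the assumption that the p values are supplied as a named numeric vector whose names serve as the labels. First build a small table of distance options: one row per plain distance plus one row per p value. Then cross it with the methods by indexing, since expand.grid() on its own only forms the full product of its inputs. The names dist_opts, p_vals, Label and results are my own choices, not anything hclust() or dist() expect:

```r
meths <- c("ward.D", "ward.D2", "single", "complete",
           "average", "mcquitty", "median", "centroid")
plain_dists <- c("euclidean", "maximum", "manhattan", "canberra", "binary")

# p values written beforehand; the names are used only for labelling
p_vals <- c("5/4" = 5/4, "3/2" = 3/2, "7/4" = 7/4, "9/4" = 9/4)

# One row per distance option: plain distances get p = NA
dist_opts <- data.frame(
  Dists = c(plain_dists, rep("minkowski", length(p_vals))),
  p     = c(rep(NA_real_, length(plain_dists)), unname(p_vals)),
  Label = c(plain_dists, paste0("minkowski, p=", names(p_vals))),
  stringsAsFactors = FALSE
)

# Cross every method with every distance option (Meths varies fastest)
idx <- expand.grid(Meths = meths, opt = seq_len(nrow(dist_opts)),
                   stringsAsFactors = FALSE)
hyperparams <- data.frame(Meths = idx$Meths, dist_opts[idx$opt, ],
                          stringsAsFactors = FALSE)
rownames(hyperparams) <- NULL

# Run every combination on iris; p falls back to dist()'s default of 2
results <- lapply(seq_len(nrow(hyperparams)), function(i) {
  row <- hyperparams[i, ]
  p   <- if (is.na(row$p)) 2 else row$p
  d   <- dist(iris[, 1:4], method = row$Dists, p = p)
  table(iris$Species, cutree(hclust(d, method = row$Meths), 3))
})
names(results) <- paste(hyperparams$Meths, hyperparams$Label, sep = " | ")
```

With four p values this gives 8 × 9 = 72 rows. The p column is NA for every non-Minkowski row, so View(hyperparams) shows at a glance which p each minkowski row uses, and Label carries the readable "minkowski, p=3/2" form.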