Retrieve the 100 samples closest to the centroid of each cluster in R

First we need a reproducible example of your data:

set.seed(42)                      # make the random example reproducible
x <- matrix(runif(150), 50, 3)    # 50 observations of 3 variables
kmeans.x <- kmeans(x, 10)         # k-means with 10 clusters
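
Two parts of the k-means result matter here: kmeans.x$centers, a matrix with one row per centroid, and kmeans.x$cluster, the cluster number assigned to each observation. A quick check of their shapes, re-creating the same example data:

```r
set.seed(42)
x <- matrix(runif(150), 50, 3)
kmeans.x <- kmeans(x, 10)

dim(kmeans.x$centers)     # 10 3: one row per cluster, one column per variable
length(kmeans.x$cluster)  # 50: one cluster label per observation in x
table(kmeans.x$cluster)   # how many observations fell into each cluster
```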

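Now you want to find the observations in the original data x that are closest to the centroids, which kmeans() stores in kmeans.x$centers. We use the get.knnx() function from the FNN package. To keep the output short, we get only the 5 closest observations for each of the 10 clusters; for the 100 samples in your question, use k = 100 (k can be at most the number of rows in your data).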
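
For intuition, or if installing FNN is not an option, the same query can be sketched as a brute-force Euclidean search in base R. get.knnx() uses a k-d tree search by default, so this loop is slower on large data, but it should return the same neighbors apart from the order of tied distances. The helper name knn_to_centers() is hypothetical and only mimics the nn.index / nn.dist layout.

```r
set.seed(42)
x <- matrix(runif(150), 50, 3)
kmeans.x <- kmeans(x, 10)

# Hypothetical helper: for each row of `centers`, return the row indices and
# Euclidean distances of the k closest rows of `data`, nearest first.
knn_to_centers <- function(data, centers, k) {
  stopifnot(k <= nrow(data))
  nn.index <- matrix(NA_integer_, nrow(centers), k)
  nn.dist  <- matrix(NA_real_,    nrow(centers), k)
  for (j in seq_len(nrow(centers))) {
    # t(data) has one column per observation; centroid j is recycled down each column
    d <- sqrt(colSums((t(data) - centers[j, ])^2))
    o <- order(d)[seq_len(k)]
    nn.index[j, ] <- o
    nn.dist[j, ]  <- d[o]
  }
  list(nn.index = nn.index, nn.dist = nn.dist)
}

y.base <- knn_to_centers(x, kmeans.x$centers, 5)
y.base$nn.index[1, ]
```

With FNN installed, comparing y.base$nn.dist against get.knnx(x, kmeans.x$centers, 5)$nn.dist using all.equal() is a quick sanity check.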

library(FNN)
y <- get.knnx(x, kmeans.x$centers, 5)   # data = x, query = centroids, k = 5
str(y)
# List of 2
#  $ nn.index: int [1:10, 1:5] 42 40 50 22 39 47 11 7 8 16 ...
#  $ nn.dist : num [1:10, 1:5] 0.1237 0.0669 0.1316 0.1194 0.1253 ...
y$nn.index[1, ]
# [1] 42 38  3 22 43
idx1 <- sort(y$nn.index[1, ])   # sort the row numbers for easier reading
cbind(idx1, x[idx1, ])
#      idx1                          
# [1,]    3 0.28614 0.3984854 0.21657
# [2,]   22 0.13871 0.1404791 0.41064
# [3,]   38 0.20766 0.0899805 0.11372
# [4,]   42 0.43577 0.0002389 0.08026
# [5,]   43 0.03743 0.2085700 0.46407

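The row indices of the nearest neighbors are stored in nn.index, one row per centroid and ordered from nearest to farthest, so for the first cluster the 5 closest observations are rows 42, 38, 3, 22 and 43. The corresponding distances are in nn.dist.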
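
One caveat: get.knnx() searches all of x, so the observations nearest to a centroid are not guaranteed to have been assigned to that cluster. If you only want each cluster's own members, ranked by distance to their centroid, a base-R sketch could look like this. closest_members() is a hypothetical helper; a cluster with fewer than n members simply returns all of them.

```r
set.seed(42)
x <- matrix(runif(150), 50, 3)
kmeans.x <- kmeans(x, 10)

# Hypothetical helper: for each cluster j, the row indices of (at most) n members
# of cluster j, ordered by Euclidean distance to centroid j, nearest first.
closest_members <- function(data, km, n) {
  lapply(seq_len(nrow(km$centers)), function(j) {
    members <- which(km$cluster == j)
    # subtract centroid j from every member row; squared distance gives the same ranking
    d2 <- rowSums(sweep(data[members, , drop = FALSE], 2, km$centers[j, ])^2)
    members[order(d2)][seq_len(min(n, length(members)))]
  })
}

# Use n = 100 for the original question; 5 keeps the toy output short.
res <- closest_members(x, kmeans.x, 5)
res[[1]]
```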
