First we need a reproducible example of your data:
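set.seed(42)                      # make runif() and kmeans() reproducible
x <- matrix(runif(150), 50, 3)    # 50 observations of 3 variables
kmeans.x <- kmeans(x, 10)         # partition x into 10 clusters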
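Now you want to find the observations in the original data x that are closest to the centroids computed by kmeans() and stored in kmeans.x$centers. We use the get.knnx() function from package FNN, which searches the rows of its first argument for the k nearest neighbours of each row of its second argument. We will just get the 5 closest observations for each of the 10 clusters.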
library(FNN)
# search the rows of x for the 5 nearest neighbours of each of the 10 centroids
y <- get.knnx(x, kmeans.x$centers, k = 5)
str(y)
# List of 2
# $ nn.index: int [1:10, 1:5] 42 40 50 22 39 47 11 7 8 16 ...
# $ nn.dist : num [1:10, 1:5] 0.1237 0.0669 0.1316 0.1194 0.1253 ...
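To make clear what get.knnx() returns, here is a brute-force sketch in base R. It assumes plain Euclidean distance and is a swapped-in technique, not how FNN searches (FNN offers faster tree-based algorithms); nearest_k() is a hypothetical helper, not part of FNN. For each centroid it computes the distance to every row of the data and keeps the k smallest, giving a matrix shaped like y$nn.index (one row per centroid).

```r
# Hypothetical helper: brute-force k-nearest-neighbour search (Euclidean).
# Returns a nrow(centers) x k matrix of row indices into `data`,
# ordered from nearest to farthest, like nn.index from FNN::get.knnx().
nearest_k <- function(data, centers, k) {
  td <- t(data)  # columns are observations, so subtracting a centroid recycles per variable
  idx <- apply(centers, 1, function(ctr) {
    d <- sqrt(colSums((td - ctr)^2))  # distance from this centroid to every observation
    order(d)[seq_len(k)]              # indices of the k smallest distances
  })
  # apply() stacks one centroid per column (or returns a vector when k = 1);
  # refill row-wise so each row corresponds to one centroid
  matrix(idx, nrow = nrow(centers), ncol = k, byrow = TRUE)
}
```

With the data above, nearest_k(x, kmeans.x$centers, 5) should give the same indices as y$nn.index, barring exact ties in distance. The brute-force version computes every distance, so it is only practical for small data sets.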
y$nn.index[1, ]
# [1] 42 38 3 22 43
# sort the first cluster's 5 neighbour indices and show those rows of x
idx1 <- sort(y$nn.index[1, ])
cbind(idx1, x[idx1, ])
# idx1
# [1,] 3 0.28614 0.3984854 0.21657
# [2,] 22 0.13871 0.1404791 0.41064
# [3,] 38 0.20766 0.0899805 0.11372
# [4,] 42 0.43577 0.0002389 0.08026
# [5,] 43 0.03743 0.2085700 0.46407
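The row indices of the nearest neighbours are stored in nn.index, one row per cluster with the columns ordered from nearest to farthest; the matching distances are in nn.dist. So for the first cluster, the 5 closest observations are 42, 38, 3, 22 and 43.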
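If you want the neighbours of all 10 clusters at once, one option is to flatten nn.index and nn.dist into a long table. The sketch below assumes only that its input is a list with two equally sized matrices named nn.index and nn.dist, as get.knnx() returns; neighbors_table() is a hypothetical helper, not an FNN function.

```r
# Hypothetical helper: turn get.knnx()-style output into one row per
# (cluster, neighbour) pair, keeping the nearest-to-farthest rank.
neighbors_table <- function(knn) {
  n <- nrow(knn$nn.index)  # number of centroids / clusters
  k <- ncol(knn$nn.index)  # neighbours per cluster
  data.frame(
    cluster = rep(seq_len(n), each = k),
    rank    = rep(seq_len(k), times = n),
    obs     = as.vector(t(knn$nn.index)),  # t() so values are read row by row
    dist    = as.vector(t(knn$nn.dist))
  )
}
```

For example, head(neighbors_table(y)) would list the first cluster's neighbours followed by the second cluster's, each with its distance to the centroid.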