Hierarchical clustering with coordinates and non-spatial parameters

A potential solution, marking as the answer for now:

First, rescale the variables that you want to include in your distance matrix. Here I give the coordinate variables (x_cent and y_cent) a larger weight by rescaling them to the range 0 to 10, while tot_pop is rescaled to the range 0 to 1.

dat$x_cent <- scales::rescale(dat$x_cent, to = c(0, 10))
dat$y_cent <- scales::rescale(dat$y_cent, to = c(0, 10))
dat$tot_pop <- scales::rescale(dat$tot_pop, to = c(0, 1))
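
To see what the rescaling does, here is a minimal base-R sketch of the same min-max transformation on made-up values. The helper rescale_to and the data frame toy are hypothetical names used only for illustration; the helper mirrors what scales::rescale(x, to = c(lo, hi)) computes for finite input with a non-zero range.

```r
# Hypothetical helper: min-max rescaling of x onto [lo, hi]
rescale_to <- function(x, lo, hi) {
  (x - min(x)) / (max(x) - min(x)) * (hi - lo) + lo
}

# Made-up data, for illustration only
toy <- data.frame(
  x_cent  = c(100, 150, 200),
  y_cent  = c(10, 30, 20),
  tot_pop = c(500, 2500, 1500)
)

toy$x_cent  <- rescale_to(toy$x_cent, 0, 10)
toy$y_cent  <- rescale_to(toy$y_cent, 0, 10)
toy$tot_pop <- rescale_to(toy$tot_pop, 0, 1)

# Each coordinate now spans 0 to 10, while tot_pop spans 0 to 1
sapply(toy, range)
```

After this step the coordinates can differ by up to 10 units between two rows, but tot_pop by at most 1.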

Second, subset the data to include only the covariates used to calculate distance (note that this overwrites dat, so keep a copy first if you need the other columns later):

dat <- dat[, c("x_cent", "y_cent", "tot_pop")]

Next, calculate the distance matrix:

dist <- distances::distances(as.data.frame(dat))
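
As a rough illustration of why the different ranges act as a weight, the sketch below computes Euclidean distances between three made-up, already rescaled points. It uses base R's stats::dist() as a stand-in for distances::distances(), and the data frame pts is hypothetical.

```r
# Made-up, already rescaled observations (illustration only)
pts <- data.frame(
  x_cent  = c(0, 0, 10),
  y_cent  = c(0, 0, 0),
  tot_pop = c(0, 1, 0)
)
rownames(pts) <- c("a", "b", "c")

# Euclidean distances; stats::dist() stands in for distances::distances()
d <- as.matrix(stats::dist(pts))

d["a", "b"]  # a and b share a location but differ maximally in population
d["a", "c"]  # a and c share a population but sit at opposite ends of the x range
```

The largest possible population difference puts two points 1 unit apart, while the largest possible difference in one coordinate puts them 10 units apart, so location dominates the clustering.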

Calculate clusters using the scclust package and append the cluster labels to the dataset. This package allows you to constrain cluster size: here size_constraint = 10 requires at least 10 observations per cluster.

clust <- scclust::hierarchical_clustering(distances = dist, size_constraint = 10)
final <- dplyr::bind_cols(dat, clust)  # the unnamed cluster vector becomes column `...4` (dat has three columns)
final <- dplyr::rename(final, block = `...4`)
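
If scclust is not installed, the general pattern (cluster on the distance matrix, then attach one label per row) can be sketched with base R. Note that this swaps in a different technique: stats::hclust() with stats::cutree() does not enforce any size constraint, and the data below are randomly generated for illustration only.

```r
set.seed(1)  # made-up data, for illustration only
toy <- data.frame(
  x_cent  = runif(30, 0, 10),
  y_cent  = runif(30, 0, 10),
  tot_pop = runif(30, 0, 1)
)

# Stand-in for scclust: agglomerative clustering from base R.
# Unlike scclust::hierarchical_clustering(), this has no size constraint.
tree   <- stats::hclust(stats::dist(toy), method = "complete")
labels <- stats::cutree(tree, k = 3)  # one integer label per row, in row order

# Labels are in the same row order as the data, so they can be attached directly
toy$block <- labels
table(toy$block)  # cluster sizes are whatever the tree gives, not a fixed minimum
```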

You can see how many observations exist per cluster:

investigate_cluster <- dplyr::count(final, block, name = "count")

head(investigate_cluster)

# A tibble: 6 x 2
  block     count
  <scclust> <int>
1 0            10
2 1            10
3 2            10
4 3            10
5 4            10
6 5            10
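
A quick way to confirm that every cluster meets the minimum size is sketched below in base R. The label vector block here is made up for illustration; with the real data you would use final$block, and min_size is assumed to match the value passed as size_constraint.

```r
# Made-up cluster labels (illustration only): clusters of 10, 12 and 11 rows
block <- rep(c(0L, 1L, 2L), times = c(10, 12, 11))

min_size <- 10          # assumed to equal the size_constraint used above
sizes <- table(block)   # observations per cluster

all(sizes >= min_size)  # TRUE when every cluster meets the minimum
```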

And easily visualize your clusters:

library(ggplot2)  # provides ggplot(), aes(), geom_point(), theme_bw() and theme()
ggplot(final, mapping = aes(x = x_cent, y = y_cent, color = factor(block))) +
  geom_point() +
  ggConvexHull::geom_convexhull(alpha = .5, aes(fill = factor(block))) +
  theme_bw() +
  theme(legend.position = "none")
