3

I have over 60,000 building polygons footprints in shapefile format (polygonZ) where I need to replace the z values (for each ring) for all the polygon vertices, so that all the Z values for the polygon are the same. The issue is that we have 3D polygons that have not been edited in 3D so some rings have correct Z values for all the vertices while other rings have a few erroneous Z values which need to be corrected.

I am able to do it via a for loop but it takes a good 45 minutes to complete as it needs to iterate through all the polygons.

I was wondering if anyone has any ideas on how to speed this up?

I did try the following, thinking I could replace all the values at once via indexing but it didn't work. I assume its due to there being a different number of vertices per polygon compared to the "new values" I want to use as replacement.

> myshape$geometry[][[1]][[1]][[1]][,2] <- new_values[]

Error in myshape$geometry[][[1]][[1]][[1]][, 2] <- new_values[] : 
  number of items to replace is not a multiple of replacement length

Here is a working example using stock R data. I am replacing the X instead of Z for this example.

library(sf)
library(tictoc) 

# load test data
myshape <- st_read(system.file("shape/nc.shp", package = "sf"))

# recast to polygon (my actual data are polygons, not multipolygons)
myshape <- st_cast(myshape, "POLYGON")

# make some random data
new_values <- runif(nrow(myshape))

# replace some values
tic()
for (row in 1:nrow(myshape)) {
  # replace
  myshape$geometry[row][[1]][[1]][,2] <- new_values[row]
}
toc()

edit: clarified that each ring will have the same Z value, recast to polygons (my actual data are polygons), provided background on my problem and also fixed an error in my for loop.

3
  • 1
    for (row in nrow(myshape)) will only loop over nrow(myshape), not 1 to nrow(myshape) ! Try 1:nrow(myshape) instead. Otherwise this looks fine (although its modifying X, not Z, but you prob know that...)
    – Spacedman
    Commented May 15 at 14:41
  • Actually this doesn't work for general multipolygon geometries since you are only changing the first ring of a multipolygon geometry. You really need to iterate over all myshape$geometry[row][[1]][[ring]][[part]][, 2] for all values of row, ring and part within features.
    – Spacedman
    Commented May 15 at 16:46
  • Thanks for catching the 1:nrows(myshape)! I edited the original post now and hopefully clarified I need to update all the values for each ring.
    – nola
    Commented May 16 at 7:23

2 Answers 2

1

There's something going on that maybe someone else can explain--possibly the whole geometry column is being overwritten with every iteration using the for loop. A Map approach is orders of magnitude faster with a larger sf object:

library(sf)

# load test data
myshape <- st_read(system.file("shape/nc.shp", package = "sf"))[sample(100, 1e4, 1),]

# make some random data
new_values <- runif(1e4)
myshape$geometry2 <- myshape$geometry3 <- myshape$geometry

system.time({
  for (row in 1:nrow(myshape)) {
    # replace
    myshape$geometry2[row][[1]][[1]][[1]][,2] <- new_values[row]
  }
})
#>    user  system elapsed 
#>  105.19    0.39  105.64

system.time({
  f <- function(g, x) {
    g[[1]][[1]][,2] <- x
    g
  }
  
  myshape$geometry3[] <- Map(f, myshape$geometry, new_values)
})
#>    user  system elapsed 
#>    0.07    0.00    0.06

identical(myshape$geometry2, myshape$geometry3)
#> [1] TRUE
7
  • 1
    Note my comment above about multipolygon geometries applies here. eg check myshape$geometry[4] which only has its Y set for the first ring.
    – Spacedman
    Commented May 15 at 16:47
  • Just following the OPs code. I can update it once the OP confirms the target is not just the first ring.
    – jblood94
    Commented May 15 at 18:34
  • I tried a simple matrix and the loop was "happy" : foo <- matrix(nr=100,nc=10000) floop <- function(x,y) { for(jrow in 1:nrow(x)) x[jrow,] <- y return(invisible(x)) } system.time(floop(foo,1:1e4)) system.time(floop(foo,1:1e4)) user system elapsed 0.034 0.010 0.042 Commented May 16 at 1:41
  • @jblood94 THANK YOU so much for teaching me about the map() function! My script that took 45 mins to complete is now finished in just 26 seconds as I had some other loops that were doing the same thing. Replaced them with map() and its lightening fast now.
    – nola
    Commented May 16 at 8:27
  • 1
    @Norrislam, I don't think it is Map per se that is especially fast. A for loop will typically be at least as fast as Map. As I mentioned to @CarlWitthoft, I suspect the poor performance has something to do with the structure of myshape$geometry and how the data are being accessed in the for loop.
    – jblood94
    Commented May 16 at 10:19
1

This scales relatively well. The repex is based on the nc.shp data, see below for a larger example. The workflow:

  1. Create new_values df with z values and associated feature ID e.g. nc$NAME
  2. st_cast() myshape to polygon, assign pid to individual polygons, st_cast() to points, join XYZ values and create geometry, rebuild multipolygons
library(sf)
library(sfheaders)
library(dplyr)
library(ggplot2)

# Load test data
myshape <- st_read(system.file("shape/nc.shp", package = "sf"))

# Make some random z coordinate data with associated feature ID e.g. myshape$NAME
set.seed(1)
new_values <- data.frame(NAME = myshape$NAME,
                         z = runif(nrow(myshape)))

# Create output with z values assigned
sf_poly <- myshape %>%
  # Split multipolygons and assign unique id to each polygon for 
  st_cast("POLYGON") %>%
  mutate(pid = 1:n(), .by = NAME) %>%
  # Cast to point, retrieve lon/lat values, left_join() z values
  st_cast("POINT") %>%
  mutate(x = st_coordinates(.)[,1],
         y = st_coordinates(.)[,2]) %>%
  left_join(., new_values, by = "NAME") %>%
  # Drop original geometry, create new XYZ geometry, assign CRS
  st_drop_geometry() %>%
  sf_pt(., keep = TRUE) %>%
  rename(geometryz = geometry) %>%
  st_set_crs(st_crs(myshape)) %>%
  # Rebuild multipolygons in two steps, otherwise non-contiguous features will be joined
  summarise(geometryz = st_combine(geometryz), .by = c(NAME, pid)) %>%
  st_cast("POLYGON") %>%
  summarise(geometryz = st_combine(geometryz), .by = NAME) %>%
  st_cast("MULTIPOLYGON")

result

Result from larger dataset on low-spec PC:
Using this data, with 60,324 polygons:

# Set extent
sfc <- st_sfc(st_polygon(list(rbind(c(0,0), c(20,0), c(20,20), c(0,0)))))

# Make polygon sf
sf_grid <- st_make_grid(sfc, cellsize = .088, square = FALSE) %>%
  st_as_sf() %>%
  mutate(NAME = paste0("poly", sprintf("%02d", 1:n()))) %>%
  st_set_crs(st_crs(myshape)) %>%
  rename(geometry = x)

It took ~4.63 minutes. Not entirely representative, as sf_grid has no multipolygon features, but a decent time reduction nonetheless.

3
  • WOW! this is some amazing R wizardry here! It will take me a bit to understand and implement the code for my script.
    – nola
    Commented May 16 at 8:15
  • @Norrislam - I've updated the code with annotations, and removed some unneeded steps.
    – L Tyrone
    Commented May 17 at 3:50
  • thanks! I am still new to R and trying to wrap my brain around the dplyr pipeline so great to see a solution using this.
    – nola
    Commented May 17 at 10:37

Not the answer you're looking for? Browse other questions tagged or ask your own question.