Streaming CityJSON datasets

Presentation given at the 3D Geoinfo 2024 conference in Vigo (Spain) on 2024-07-02.

The paper with more details is open access: https://doi.org/10.5194/isprs-archives-XLVIII-4-W11-2024-57-2024

The software to convert CityJSON <=> CityJSONSeq is cjseq: https://github.com/cityjson/cjseq

Hugo Ledoux

July 02, 2024

Transcript

  1. Streaming CityJSON datasets

    3DGeoInfo 2024 | Vigo, Spain | 2024-07-02
    Hugo Ledoux (TU Delft), Balázs Dukai (3DGI.nl), Gina Stavropoulou (TU Delft)
  2. Streaming datasets: conveyor belt idea for unlimited data

    “A stream is a sequence of potentially unlimited data elements made available over time”
    “items on a conveyor belt being processed one at a time rather than in large batches”
    “Normal functions [designed for batch data] cannot operate on streams”
  4. Streaming datasets: conveyor belt idea for unlimited data

    Diagram labels: Batch data, 1-by-1, Processing Unit

    Examples:
    1. calculate volume
    2. convert to glTF
    3. repair geometry
    4. add/remove attributes
    5. filter with bbox
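The conveyor-belt idea can be illustrated with a minimal Python sketch of example 5 (filter with bbox). The item layout (`"id"`, `"xy"`) and the names `source` and `filter_bbox` are hypothetical and chosen only for illustration; the point is that the processing unit sees one item at a time and never needs the whole dataset.

```python
def source():
    """Hypothetical stream: items arrive one by one, possibly without end."""
    yield {"id": "a", "xy": (1.0, 1.0)}
    yield {"id": "b", "xy": (50.0, 50.0)}
    yield {"id": "c", "xy": (2.0, 3.0)}


def filter_bbox(items, bbox):
    """Processing unit: pass on only the items whose point lies inside bbox."""
    minx, miny, maxx, maxy = bbox
    for item in items:  # one item in memory at a time
        x, y = item["xy"]
        if minx <= x <= maxx and miny <= y <= maxy:
            yield item


# Consume the stream; each item is handled and then discarded.
kept = [item["id"] for item in filter_bbox(source(), (0.0, 0.0, 10.0, 10.0))]
```

A batch function would instead need the complete list of items before it could start, which is exactly what an unlimited stream cannot provide.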
  5. The problem: CityJSON v1.0 could not stream files

    CityJSON file:

    {
      "type": "CityJSON",
      "version": "2.0",
      "metadata": {…},
      "transform": {…},
      "CityObjects": {
        "id-1": {
          "type": "Building",
          "attributes": { "owner": "Elvis Presley" },
          "geometry": [
            {
              "type": "MultiSurface",
              "boundaries": [ [[0, 3, 2, 1]], [[4, 5, 6, 7]], [[0, 1, 5, 4]] ]
            }
          ]
        },
        "id-2": {
          "type": "Building",
          "attributes": { "owner": "Jan Smit" },
          "geometry": [
            {
              "type": "MultiSurface",
              "boundaries": [ [[21, 24, 32, 16]], [[14, 53, 44, 77]], [[3, 13, 95, 4]] ]
            }
          ]
        },
        "id-3": {…},
        …
        "id-2868": {…}
      },
      "vertices": [
        [217989,242969,2494], [216100,242849,2494], [217779,238630,2494], [219649,238840,2494],
        [216100,242849,0], [217989,242969,0], [219649,238840,0], [217779,238630,0],
        [685389,280840,2320], [686259,278969,2320], [691769,281539,2320], [690909,283400,2320],
        [685389,280840,0], [690909,283400,0], [691769,281539,0], [686259,278969,0],
        [437607,387571,14595], [434595,374537,14595], [441375,372995,14595], [444399,386119,14595],
        [438311,387552,14595], [437639,387710,14595], [437639,387710,0], [444399,386119,0],
        [441375,372995,0], [434595,374537,0], [437436,386830,14595], [437436,386830,14435],
        [434595,374537,14435], [438311,387552,0], [441375,372995,14505], [444399,386119,14505],
        [437607,387571,15200], [437639,387710,15200], [437639,387710,15040], [437607,387571,15040],
        [437436,386830,15200], [437436,386830,15040]
      ]
    }

    Could be several million "vertices"!

    CityJSON v1.0 had no solution for streaming, besides advising people to create small files (which does not work in practice at all…)
  6. CityJSONSeq: decompose a file into its features

    CityJSON file = (metadata + geom templates) + (1st Building) + (2nd Building)

    CityJSON file:

    {
      "type": "CityJSON",
      "version": "2.0",
      "metadata": {…},
      "transform": {…},
      "CityObjects": {
        "id-1": {
          "type": "Building",
          "geometry": [
            {
              "type": "MultiSurface",
              "boundaries": [ [[0, 3, 2, 1]], [[4, 5, 6, 7]], [[0, 1, 5, 4]] ]
            }
          ]
        },
        "id-2": {
          "type": "Building",
          "attributes": { "owner": "Jan Smit" },
          "geometry": [
            {
              "type": "MultiSurface",
              "boundaries": [ [[21, 24, 32, 16]], [[14, 53, 44, 77]], [[3, 13, 95, 4]] ]
            }
          ]
        }
      },
      "vertices": [
        [231, 23212, 110], [1111, 3211, 120], … [3111, 911, 990], [151, 5211, 420]
      ]
    }

    Metadata + geom templates:

    {
      "type": "CityJSON",
      "version": "2.0",
      "metadata": {…},
      "transform": {…},
      "CityObjects": {},
      "vertices": []
    }

    1st Building:

    {
      "type": "CityJSONFeature",
      "CityObjects": {
        "id-1": {
          "id": "id-1",
          "type": "Building",
          "geometry": [
            {
              "type": "MultiSurface",
              "boundaries": [ [[0, 3, 2, 1]], [[4, 5, 6, 7]], [[0, 1, 5, 4]] ]
            }
          ]
        }
      },
      "vertices": [ [231, 23212, 110], [1111, 3211, 120], ... ]
    }

    2nd Building:

    {
      "type": "CityJSONFeature",
      "CityObjects": {
        "id-2": {
          "id": "id-2",
          "type": "Building",
          "attributes": { "owner": "Jan Smit" },
          "geometry": [
            {
              "type": "MultiSurface",
              "boundaries": [ [[0, 2, 7, 11]], [[4, 15, 6, 7]], [[0, 9, 4, 14]] ]
            }
          ]
        }
      },
      "vertices": [ [432, 232, 231], [987, 236, 220], ... ]
    }
  7. Automatic conversion with the software cjseq

    🔁
    • open-source
    • cross-platform
    • fast (written in Rust): ~5s for a 580MB file
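To make the decomposition concrete, here is a minimal Python sketch of the idea, not of how cjseq itself is implemented. It assumes that every city object becomes its own feature (no parent/child grouping), that only `"boundaries"` reference vertices (appearance and geometry templates are ignored), and it places `"id"` at the top level of each feature as the CityJSON 2.0 specification does. The function names are hypothetical.

```python
import json


def remap(boundaries, mapping, local_vertices, global_vertices):
    """Recursively replace global vertex indices by feature-local ones."""
    if isinstance(boundaries, int):
        if boundaries not in mapping:
            # First time this vertex is seen: copy it into the local list.
            mapping[boundaries] = len(local_vertices)
            local_vertices.append(global_vertices[boundaries])
        return mapping[boundaries]
    return [remap(b, mapping, local_vertices, global_vertices) for b in boundaries]


def to_cityjsonseq_lines(cityjson):
    """Yield the header line, then one CityJSONFeature line per city object."""
    header = {k: v for k, v in cityjson.items() if k not in ("CityObjects", "vertices")}
    header["CityObjects"] = {}
    header["vertices"] = []
    yield json.dumps(header, separators=(",", ":"))
    for co_id, co in cityjson["CityObjects"].items():
        mapping, local_vertices = {}, []
        new_co = dict(co)
        new_co["geometry"] = [
            {**g, "boundaries": remap(g["boundaries"], mapping, local_vertices, cityjson["vertices"])}
            for g in co.get("geometry", [])
        ]
        feature = {
            "type": "CityJSONFeature",
            "id": co_id,
            "CityObjects": {co_id: new_co},
            "vertices": local_vertices,
        }
        yield json.dumps(feature, separators=(",", ":"))


# Tiny made-up dataset: two buildings with one surface each.
city = {
    "type": "CityJSON",
    "version": "2.0",
    "transform": {"scale": [0.001, 0.001, 0.001], "translate": [0.0, 0.0, 0.0]},
    "CityObjects": {
        "id-1": {"type": "Building",
                 "geometry": [{"type": "MultiSurface", "boundaries": [[[0, 3, 2, 1]]]}]},
        "id-2": {"type": "Building", "attributes": {"owner": "Jan Smit"},
                 "geometry": [{"type": "MultiSurface", "boundaries": [[[4, 5, 6, 7]]]}]},
    },
    "vertices": [[0, 0, 0], [10, 0, 0], [10, 10, 0], [0, 10, 0],
                 [20, 0, 0], [30, 0, 0], [30, 10, 0], [20, 10, 0]],
}
lines = list(to_cityjsonseq_lines(city))
```

Each feature carries only the vertices it uses, renumbered from 0, so a consumer can process a line without knowing anything about the other features.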
  8. Real-world datasets used for experiments

    Table 1. The datasets used for the benchmark.

                                              |------- size of file -------|  |---------- vertices ----------|
    dataset       CityObjects    app.(a)   CityJSON  CityJSONSeq  compr.(b)  total      largest(c)  shared(d)
    3DBAG         1110 bldgs     -         6.7 MB    5.9 MB       12%        82 509     4112        0.1%
    3DBV          71 634 misc    -         378 MB    317 MB       16%        4 110 319  116 670     21.0%
    Helsinki      77 231 bldgs   -         572 MB    412 MB       28%        3 038 576  2202        0.0%
    Helsinki tex  77 231 bldgs   tex       713 MB    644 MB       10%        3 038 576  2202        0.0%
    Ingolstadt    55 bldgs       -         4.8 MB    3.8 MB       25%        87 972     12 800      0.0%
    Montréal      294 bldgs      tex       5.4 MB    4.6 MB       15%        31 585     3393        2.0%
    NYC           23 777 bldgs   -         105 MB    95 MB        10%        1 035 804  2608        0.8%
    Railway       50 misc        tex+mat   4.3 MB    4.0 MB       8%         73 554     14 966      0.4%
    Rotterdam     853 bldgs      tex       2.6 MB    2.7 MB       -4%        22 246     631         20.0%
    Vienna        307 bldgs      -         5.4 MB    4.8 MB       11%        47 220     2025        0.0%
    Zürich        52 834 bldgs   -         279 MB    247 MB       11%        3 472 989  4069        2.6%

    (a) appearance: ‘tex’ is textures stored; ‘mat’ is material stored
    (b) compression factor is (size(CityJSON) - size(CityJSONSeq)) / size(CityJSON)
    (c) number of vertices in the largest feature of the stream
    (d) percentage of vertices that are used to represent different city objects
  9. Filesize ==> 10%-15% compression factor⁉

    Table 1. The datasets used for the benchmark. (Same table as on slide 8.)
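As a worked check, footnote (b) of Table 1 can be read as the relative reduction in file size. This reading is an interpretation of the flattened footnote, but it reproduces the percentages in the table, for example for the Helsinki and Rotterdam rows:

```latex
\[
\text{compression factor}
  = \frac{\text{size(CityJSON)} - \text{size(CityJSONSeq)}}{\text{size(CityJSON)}}
\]
\[
\text{Helsinki: } \frac{572 - 412}{572} = \frac{160}{572} \approx 0.28 = 28\%
\qquad
\text{Rotterdam: } \frac{2.6 - 2.7}{2.6} = \frac{-0.1}{2.6} \approx -0.04 = -4\%
\]
```

A negative value, as for Rotterdam, means the CityJSONSeq file is slightly larger than the CityJSON file.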
  10. Filesize ==> 10%-15% compression factor⁉

    CityJSON file:

    {
      "type": "CityJSON",
      "version": "2.0",
      "metadata": {…},
      "transform": {…},
      "CityObjects": {
        "id-1": {
          "type": "Building",
          "geometry": [
            {
              "type": "MultiSurface",
              "boundaries": [ [[1023120, 1123443, 12122, 223441]], [[24344, 34425, 345346, 2343437]], [[10, 1121, 55566, 435734]], … ]
            }
          ]
        },
        "id-2": {
          "type": "Building",
          "attributes": { "owner": "Jan Smit" },
          "geometry": [
            {
              "type": "MultiSurface",
              "boundaries": [ [[212323, 723434324, 334342, 13534346]], [[2352514, 53353523, 435354, 75457]], … [[3542353, 1352353, 946465, 446]] ]
            }
          ]
        }
      },
      …
    }

    CityJSONSeq features (smaller vertex IDs!):

    {
      "type": "CityJSONFeature",
      "CityObjects": {
        "id-1": {
          "type": "Building",
          "geometry": [
            {
              "type": "MultiSurface",
              "boundaries": [ [[8, 34, 12, 2]], [[8, 3, 45, 2]], [[10, 1, 5, 43]], … ]
            }
          ]
        }
      },
      "vertices": […]
    }

    {
      "type": "CityJSONFeature",
      "CityObjects": {
        "id-2": {
          "type": "Building",
          "attributes": { "owner": "Jan Smit" },
          "geometry": [
            {
              "type": "MultiSurface",
              "boundaries": [ [[8, 34, 12, 2]], [[8, 3, 45, 2]], [[10, 1, 5, 43]], … ]
            }
          ]
        }
      },
      "vertices": […]
    }
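The effect of smaller vertex IDs on file size can be checked with a few lines of Python. This is only an illustration: the two surfaces are taken from the slide, and the helper `local_index` is a hypothetical name. Indices are renumbered in order of first appearance, as a feature-local vertex list would require.

```python
import json

# Two surfaces with indices into a large, global vertex list (values from the slide).
global_boundaries = [
    [[1023120, 1123443, 12122, 223441]],
    [[24344, 34425, 345346, 2343437]],
]

mapping = {}


def local_index(global_idx):
    """Return the feature-local index, assigning the next free one on first use."""
    return mapping.setdefault(global_idx, len(mapping))


local_boundaries = [
    [[local_index(i) for i in ring] for ring in surface]
    for surface in global_boundaries
]

# Compare the number of characters needed to serialise each version.
global_size = len(json.dumps(global_boundaries, separators=(",", ":")))
local_size = len(json.dumps(local_boundaries, separators=(",", ":")))
```

Here `local_size` is smaller than `global_size` because each index needs fewer digits. Working against this saving, a vertex shared by several city objects has to be repeated in every feature that uses it.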
  11. Processing time + RAM

    Table 2. Comparison of the processing time and maximum RAM usage for processing CityJSON and CityJSONSeq […] set size (RSS) is used, which is the portion of main memory occupied by the Python script.

                  |---- RAM used (MB) ----|  |-------- time (s) ---------|
    dataset       CityJSON  CityJSONSeq      CityJSON  CityJSONSeq  diff
    3DBAG         76.9      16.1             0.10      0.07         1.4X
    3DBV          4101.8    123.8            10.95     3.59         3.1X
    Helsinki      3743.1    15.0             13.39     2.74         4.9X
    Helsinki tex  5004.8    19.1             29.60     4.72         6.3X
    Ingolstadt    65.5      21.3             0.08      0.06         1.3X
    Montréal      79.3      20.8             0.11      0.07         1.6X
    NYC           949.5     16.0             1.78      0.70         2.5X
    Railway       69.6      29.6             0.09      0.07         1.3X
    Rotterdam     42.4      14.6             0.04      0.04         1.0X
    Vienna        60.1      15.7             0.06      0.05         1.2X
    Zürich        2793.1    16.3             6.05      2.00         3.0X

    Text visible on the slide (cut off at the edges):

    “[…]tions that manipulate 3D city models. When a city model is stored in its entirety in one CityJSON object, we need to deserialise the whole CityJSON object into memory in order to access the "transform" and "vertices" properties for instance. With a CityJSONSeq file, we can read the file line by line, processing and discarding the city objects one by one (and thus never have in memory more than the city object itself and the […]”

    “[…] print and that it takes an order of magnitude le[…] (for some local operations) makes it an attrac[…] CityJSON for several use-cases. It should be noticed that the CityJSON spec[…] prescribe the storage of CityJSONSeq, only CityJSONSeq stream. In practice, CityJSON[…] in a variety of ways, for instance in a single file[…] separate file, in a database, etc. The optimal st[…] pends on the implementing application. As a […]”
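The line-by-line reading described in the excerpt can be sketched in Python as follows. This is not the benchmark script from the paper: the sample data is made up, the function names are hypothetical, and the sketch assumes the first line holds the header with `"transform"` and every following line holds one CityJSONFeature with a top-level `"id"`.

```python
import io
import json


def stream_features(lines):
    """Yield (transform, feature) pairs from an iterable of CityJSONSeq lines."""
    iterator = iter(lines)
    header = json.loads(next(iterator))  # first line: the CityJSON header
    transform = header["transform"]
    for line in iterator:
        if line.strip():  # skip empty lines
            yield transform, json.loads(line)  # one feature in memory at a time


def max_height(transform, feature):
    """Highest real-world z of a feature: z = z_int * scale_z + translate_z."""
    scale_z = transform["scale"][2]
    translate_z = transform["translate"][2]
    return max(v[2] * scale_z + translate_z for v in feature["vertices"])


# Made-up CityJSONSeq content; a real file would be opened with open(path).
sample = io.StringIO(
    '{"type":"CityJSON","version":"2.0","transform":{"scale":[0.001,0.001,0.001],'
    '"translate":[1000.0,2000.0,0.0]},"CityObjects":{},"vertices":[]}\n'
    '{"type":"CityJSONFeature","id":"id-1","CityObjects":{"id-1":{"type":"Building"}},'
    '"vertices":[[0,0,0],[1000,0,2500]]}\n'
    '{"type":"CityJSONFeature","id":"id-2","CityObjects":{"id-2":{"type":"Building"}},'
    '"vertices":[[0,0,0],[500,500,12000]]}\n'
)

heights = {}
for transform, feature in stream_features(sample):
    heights[feature["id"]] = max_height(transform, feature)
```

Only the header's `"transform"` and the current feature are held in memory, which is consistent with the roughly constant RAM use of the CityJSONSeq column in Table 2.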