Opened 9 months ago
Last modified 9 months ago
#23671 needinfo enhancement
parallelize layer loading on session restore
Reported by: | anonymous | Owned by: | anonymous |
---|---|---|---|
Priority: | minor | Milestone: | |
Component: | Core | Version: | tested |
Keywords: | | Cc: | |
Description
When opening a saved session, layers (or at least OSM data layers) are loaded sequentially. This increases load time quite a bit when opening a session with multiple, large datasets. Ideally they could be parsed/loaded in parallel.
My use case involves large auto-generated .osm files built from GPX tracks, where each node additionally carries the video timestamp. This allows fast lookups of the detailed video, which is not something I can do with "raw GPX". Likewise, classic GPS-tagged JPEGs become unwieldy. I split the auto-generated .osm file by year, but that didn't improve load times because the layers are still loaded sequentially.
I'm aware this usage is not exactly common, so feel free to wontfix.
Thanks
Stefan
Change History (6)
comment:1 by , 9 months ago
Owner: | changed from | to
---|---|
Status: | new → needinfo |
comment:2 by , 9 months ago
Sure, here's one of the auto-generated OSM files, XZ compressed (unpacks to 66 MB):
https://www.breunig.xyz/share/2024-05-14/videos_2022.osm.xz
The session file looks something like this, though you'll likely need to adjust the absolute file paths and copy&paste the OSM file:
<?xml version="1.0" encoding="utf-8"?>
<josm-session version="0.1">
  <projection>
    <projection-choice>
      <id>core:mercator</id>
      <parameters/>
    </projection-choice>
    <code>EPSG:3857</code>
  </projection>
  <layers>
    <layer index="1" name="videos_2024.osm" type="osm-data" version="0.1" visible="false">
      <file>file:/tmp/videos_2024.osm</file>
    </layer>
    <layer index="2" name="videos_2023.osm" type="osm-data" version="0.1" visible="false">
      <file>file:/tmp/videos_2023.osm</file>
    </layer>
    <layer index="3" name="videos_2022.osm" type="osm-data" version="0.1" visible="false">
      <file>file:/tmp/videos_2022.osm</file>
    </layer>
    <layer index="4" name="videos_2021.osm" type="osm-data" version="0.1" visible="false">
      <file>file:/tmp/videos_2021.osm</file>
    </layer>
    <layer index="5" name="OpenStreetMap Carto (Standard)" type="imagery" version="0.1" visible="true">
      <max-zoom>19</max-zoom>
      <valid-georeference>true</valid-georeference>
      <modTileFeatures>true</modTileFeatures>
      <transparent>true</transparent>
      <minimumTileExpire>86400</minimumTileExpire>
      <name>OpenStreetMap Carto (Standard)</name>
      <id>standard</id>
      <type>tms</type>
      <url>https://{switch:a,b,c}.tile.openstreetmap.org/{zoom}/{x}/{y}.png</url>
      <attribution-text>© OpenStreetMap contributors</attribution-text>
      <attribution-url>https://www.openstreetmap.org/</attribution-url>
      <permission-reference-url>https://wiki.osmfoundation.org/wiki/Terms_of_Use</permission-reference-url>
      <category>osmbasedmap</category>
      <show-errors>true</show-errors>
      <automatic-downloading>true</automatic-downloading>
      <automatically-change-resolution>true</automatically-change-resolution>
    </layer>
  </layers>
</josm-session>
comment:3 by , 9 months ago
Regarding the bottleneck: given that all the video .osm files are around 60 MB each and JOSM is using only ~2 GB of the 4 GB heap I'm giving it (-Xmx4096m), disk IO seems an odd explanation. Yes, this laptop has slow memory transfer speeds, but even so, 4 × 60 MB plus tons of overhead doesn't add up to that much. If this doesn't happen for you, maybe it's an oddity of my machine. If so, I'd be grateful for pointers on how to profile Java apps/JOSM these days.
If you don't mind sharing a sample file, I can do some profiling. I might try to generate a sample, but it will be different from your real-world use case.
Realistically, it is probably disk IO that will be the bottleneck. If it isn't, then profiling may lead me to making optimizations that don't require multiple threads. For example, we might be doing something that is "cheap" once, but is expensive if it happens hundreds of times.
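One rough way to check whether disk IO or parsing dominates, before reaching for a full profiler, is to time the two steps separately. The sketch below is only an illustration: `LoadTiming` and `countElements` are hypothetical names, and the JDK's StAX parser stands in for JOSM's own OSM reader, which does considerably more work per element.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper, not part of JOSM: times raw file reading separately from XML parsing. */
class LoadTiming {

    /** Counts the start tags of an in-memory XML document using the JDK's StAX parser. */
    static long countElements(byte[] xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new ByteArrayInputStream(xml));
            long elements = 0;
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        elements++;
                    }
                }
            } finally {
                reader.close();
            }
            return elements;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of(args[0]);              // e.g. /tmp/videos_2022.osm

        long t0 = System.nanoTime();
        byte[] bytes = Files.readAllBytes(file);   // disk IO only
        long t1 = System.nanoTime();
        long elements = countElements(bytes);      // CPU-bound parsing only
        long t2 = System.nanoTime();

        System.out.printf("read  %d bytes in %d ms%n", bytes.length, (t1 - t0) / 1_000_000);
        System.out.printf("parse %d elements in %d ms%n", elements, (t2 - t1) / 1_000_000);
    }
}
```

Note that a second run will usually be served from the operating system's file cache, so only the first run after a reboot says much about the disk.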
One complication is our id generators for "new" objects: every new object gets a fresh unique id generated on load, even if it already had a "unique" id. I don't know whether that is threadsafe (it probably is), but the synchronization costs may make parallelization worthless.
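To make the thread-safety question concrete, here is a minimal sketch of an id generator backed by an `AtomicLong`. `NewIdGenerator` is a hypothetical stand-in, and JOSM's real generator may be implemented differently. The sketch only shows that such a counter hands out unique ids under concurrent use; it does not measure how much the contention costs.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for an id generator for "new" objects; JOSM's real class may differ. */
class NewIdGenerator {
    private final AtomicLong counter = new AtomicLong(0);

    /** Returns the next negative id (-1, -2, ...); safe to call from any thread. */
    long next() {
        return counter.decrementAndGet();
    }

    /** Draws ids from several threads and returns how many distinct ids were handed out. */
    static int distinctIds(int threads, int idsPerThread) {
        NewIdGenerator generator = new NewIdGenerator();
        Set<Long> seen = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < idsPerThread; i++) {
                    seen.add(generator.next());
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // If every id is unique, this prints threads * idsPerThread.
        System.out.println(distinctIds(4, 100_000));
    }
}
```

A single shared counter like this is lock-free, but all threads still compete for the same memory location. If that ever showed up in a profile, one option would be to let each loader thread reserve a block of ids at once.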
I don't think it is worthwhile to try to parallelize layer loading on session restore, but I would want to profile something where it could make a difference.
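For anyone who wants to experiment, the requested behaviour could be prototyped along these lines. This is only a sketch under stated assumptions: `ParallelLayerLoading` and `loadAll` are hypothetical names, the `loader` function stands in for whatever actually parses one layer, and none of it reflects how JOSM's session reader is structured. The results come back in the same order as the inputs, so layer order in the session would be preserved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical sketch, not JOSM code: loads several sources in parallel, keeping their order. */
class ParallelLayerLoading {

    /** Applies loader to every source on a fixed thread pool; results keep the order of sources. */
    static <S, L> List<L> loadAll(List<S> sources, Function<S, L> loader, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<L>> tasks = new ArrayList<>();
            for (S source : sources) {
                tasks.add(() -> loader.apply(source));
            }
            List<L> layers = new ArrayList<>();
            // invokeAll returns its futures in the same order as the submitted tasks.
            for (Future<L> future : pool.invokeAll(tasks)) {
                layers.add(future.get());
            }
            return layers;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading layers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a layer failed to load", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> files = List.of("videos_2024.osm", "videos_2023.osm",
                "videos_2022.osm", "videos_2021.osm");
        // Placeholder loader: a real one would parse the file into a data layer.
        List<String> layers = loadAll(files, name -> "layer for " + name, 4);
        layers.forEach(System.out::println);
    }
}
```

Timing this against a plain sequential loop with the same loader would show whether the extra threads help at all on a given machine.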