Modify

Opened 7 months ago

Last modified 6 months ago

#23671 needinfo enhancement

parallelize layer loading on session restore

Reported by: anonymous Owned by: anonymous
Priority: minor Milestone:
Component: Core Version: tested
Keywords: Cc:

Description

When opening a saved session, layers (or at least OSM data layers) are loaded sequentially. This increases load time quite a bit when opening a session with multiple, large datasets. Ideally they could be parsed/loaded in parallel.

My use case are large auto-generated .osm files made up from GPX tracks, where the nodes additionally contain the video timestamp. This allows fast lookups of the detailed video, which is not something I can do with "raw GPX". Likewise, classic GPS-tagged JPEGs become unwieldy. I split the auto-generated .osm by year, but that didn't improve the load times due to the loading happening sequentially.

I'm aware this usage is not exactly common, so feel free to wontfix.

Thanks
Stefan

Attachments (0)

Change History (6)

comment:1 by taylor.smock, 6 months ago

Owner: changed from team to anonymous
Status: newneedinfo

If you don't mind sharing a sample file, I can do some profiling. I might try to generate a sample, but it will be different from your real-world use case.

Realistically, it is probably disk IO that will be the bottleneck. If it isn't, then profiling may lead me to making optimizations that don't require multiple threads. For example, we might be doing something that is "cheap" once, but is expensive if it happens hundreds of times.

We do have an issue in that we have id generators for "new" objects (and all new objects get a new unique id generated on load, even if they already had a "unique" id), and I don't know if that is threadsafe (it probably is), but the synchronization costs may make parallelization worthless.

I don't think it is worthwhile to try to parallelize layer loading on session restore, but I would want to profile something where it could make a difference.

comment:2 by Stefan, 6 months ago

Sure, here's one of the auto-generated OSM files, XZ compressed (unpacks to 66 MB):
https://www.breunig.xyz/share/2024-05-14/videos_2022.osm.xz

The session file looks something like this, though you'll likely need to adjust the absolute file paths and copy&paste the OSM file:

<?xml version="1.0" encoding="utf-8"?>
  <josm-session version="0.1">
    <projection>
      <projection-choice>
        <id>core:mercator</id>
        <parameters/>
      </projection-choice>
      <code>EPSG:3857</code>
    </projection>
  <layer index="1" name="videos_2024.osm" type="osm-data" version="0.1" visible="false">
    <file>file:/tmp/videos_2024.osm</file>
  </layer>
  <layer index="2" name="videos_2023.osm" type="osm-data" version="0.1" visible="false">
    <file>file:/tmp/videos_2023.osm</file>
  </layer>
  <layer index="3" name="videos_2022.osm" type="osm-data" version="0.1" visible="false">
    <file>file:/tmp/videos_2022.osm</file>
  </layer>
  <layer index="4" name="videos_2021.osm" type="osm-data" version="0.1" visible="false">
    <file>file:/tmp/videos_2021.osm</file>
  </layer>
      <layer index="5" name="OpenStreetMap Carto (Standard)" type="imagery" version="0.1" visible="true">
        <max-zoom>19</max-zoom>
        <valid-georeference>true</valid-georeference>
        <modTileFeatures>true</modTileFeatures>
        <transparent>true</transparent>
        <minimumTileExpire>86400</minimumTileExpire>
        <name>OpenStreetMap Carto (Standard)</name>
        <id>standard</id>
        <type>tms</type>
        <url>https://{switch:a,b,c}.tile.openstreetmap.org/{zoom}/{x}/{y}.png</url>
        <attribution-text>© OpenStreetMap contributors</attribution-text>
        <attribution-url>https://www.openstreetmap.org/</attribution-url>
        <permission-reference-url>https://wiki.osmfoundation.org/wiki/Terms_of_Use</permission-reference-url>
        <category>osmbasedmap</category>
        <show-errors>true</show-errors>
        <automatic-downloading>true</automatic-downloading>
        <automatically-change-resolution>true</automatically-change-resolution>
    </layer>
    </layers>
  </josm-session>

comment:3 by anonymous, 6 months ago

Regarding the bottleneck, given all video.osm files are around 60 MB, and JOSM using just ~2GB of the -Xmx 4096m I'm giving it, disk IO seems odd. Yes, this laptop has slow memory transfer speeds, but even with that 4*60MB with tons of overhead doesn't add up to that much. If this doesn't happen for you, maybe it's an oddity with my machine. If so, I'd be grateful for pointers on how to profile Java apps/JOSM these days.

comment:4 by taylor.smock, 6 months ago

In 19079/josm:

See #23671: Significantly improve performance of Utils.isBlank and Utils.isStripEmpty along with reducing intern cost when loading files

From Utils, isBlank and isStripEmpty are functionally similar; both take a
string and check for the same whitespace characters. Specifically, the code flow
for isBlank calls strip, which uses DEFAULT_STRIP as additional characters
to remove. strip eventually calls isStrippedChar to determine whether or not
to keep a character. isStrippedChar checks DEFAULT_STRIP always, so the
returns from Utils.isBlank and Utils.isStripEmpty should always be the same.

Since the two functions are the same, we change isBlank to point at
isStripEmpty, and isStripEmpty now calls String.isBlank prior to checking
every character. String.isBlank is a built-in method, and is thus more likely
to be well-optimized by the JVM. In the event that it returns false, we want to
avoid any additional work, so we use a traditional for loop instead of a stream;
we will usually know within a character or two whether or not the string is empty.
With a stream, we first have to construct the stream, then iterate through it.

In AbstractReader, we create a map of strings to interned strings to avoid
calling String.intern too often -- despite being something we want to do fairly
frequently. Part of the (current) problem with String.intern is that it uses a
ConcurrentHashTable, so it "costs" more to use. The HashMap (even with the
cost of the lambdas) is ~25% of the CPU cost of interning, given sufficiently
duplicated data.

comment:5 by stoecker, 6 months ago

Deprecate or remove isBlank(). No reason to keep it.

Last edited 6 months ago by stoecker (previous) (diff)

comment:6 by taylor.smock, 6 months ago

In 19080/josm:

See #23671: Deprecate Utils#isBlank and replace instances of it with Utils#isStripEmpty

As noted in r19079, the two functions were identical in behavior.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as needinfo The owner will remain anonymous.
as The resolution will be set. Next status will be 'closed'.
to The owner will be changed from anonymous to the specified user. Next status will be 'new'.
as duplicate The resolution will be set to duplicate. Next status will be 'closed'. The specified ticket will be cross-referenced with this ticket.
The owner will remain anonymous. Next status will be 'assigned'.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.