Modify

Opened 3 years ago

Closed 4 months ago

Last modified 4 months ago

#20405 closed enhancement (fixed)

[Patch] History browser for complex relations requires lots of memory

Reported by: GerdP Owned by: GerdP
Priority: normal Milestone: 24.01
Component: Core Version:
Keywords: template_report history performance Cc: Don-vip

Description

What steps will reproduce the problem?

  1. Browse history of r2795128 (EuroVelo 3 - Pilgrim's Route - part Germany), a relation with > 1200 versions and > 4600 members
  2. Open e.g. VisualVm to create and analyse a heapdump

What is the expected result?

No (additional) instances for RelationMemberData as the dialog doesn't show any member specific data

What happens instead?

5.345.959(!) instances of RelationMemberData which require ~244 MB
The single instance of HistoryRelation retains ~300MB.

Please provide any additional information below. Attach a screenshot if possible.

I think it would be better to wait for the use to actually select the members tab before we create instances of RelationMemberData. Should not take too long when the corresponding changeset files were downloaded and cached.

URL:https://josm.openstreetmap.de/svn/trunk
Repository:UUID: 0c6e7542-c601-0410-84e7-c038aed88b3b
Last:Changed Date: 2020-12-28 22:03:23 +0100 (Mon, 28 Dec 2020)
Build-Date:2020-12-30 02:30:55
Revision:17428
Relative:URL: ^/trunk

Identification: JOSM/1.5 (17428 en) Windows 10 64-Bit
OS Build number: Windows 10 Home 2004 (19041)
Memory Usage: 1414 MB / 3641 MB (1083 MB allocated, but free)
Java version: 1.8.0_221-b11, Oracle Corporation, Java HotSpot(TM) 64-Bit Server VM
Look and Feel: com.sun.java.swing.plaf.windows.WindowsLookAndFeel
Screen: \Display0 1920×1080 (scaling 1.00×1.00)
Maximum Screen Size: 1920×1080
Best cursor sizes: 16×16→32×32, 32×32→32×32
VM arguments: [-XX:StartFlightRecording=name=MyRecording2,settings=d:\dbg\gerd.jfc, -XX:FlightRecorderOptions=defaultrecording=true,dumponexit=true,dumponexitpath=e:\ld\perf_20210119_072729.jfr]

Plugins:
+ ColumbusCSV (35640)
+ buildings_tools (35669)
+ o5m (35640)
+ pbf (35650)
+ poly (35640)
+ reltoolbox (35640)
+ reverter (35688)
+ undelete (35640)
+ utilsplugin2 (35682)

Validator rules:
+ c:\josm\core\resources\data\validator\geometry.mapcss

Attachments (1)

20405.patch (1.4 KB ) - added by GerdP 4 months ago.
simple patch that implements a cache for members in history relation in the parser, very effective when relations have many versions as most members are the same in each version

Download all attachments as: .zip

Change History (23)

in reply to:  description comment:1 by Don-vip, 3 years ago

Keywords: performance added

Replying to GerdP:

5.345.959(!) instances of RelationMemberData which require ~244 MB
The single instance of HistoryRelation retains ~300MB.

wow

I think it would be better to wait for the use to actually select the members tab before we create instances of RelationMemberData. Should not take too long when the corresponding changeset files were downloaded and cached.

Indeed!

comment:2 by GerdP, 3 years ago

I guess it is a major change and I'll probably not have time to work on this soon. Maybe a GSOC project?

comment:3 by GerdP, 4 months ago

I guess it should be rather simple to reduce the number of instances of RelationMemberData with a cache, similar to String.intern().
But it seems the major issue now is that the server will block the user who downloads such a large history for 300 seconds or so:

2024-01-23 21:45:29.270 SEVERE: org.openstreetmap.josm.io.OsmApiException: ResponseCode=509, Error Header=<You have downloaded too much data. Please try again in 289 seconds.>

No idea if that can be avoided on the JOSM side.

by GerdP, 4 months ago

Attachment: 20405.patch added

simple patch that implements a cache for members in history relation in the parser, very effective when relations have many versions as most members are the same in each version

comment:4 by GerdP, 4 months ago

For a rather normal relation like Hunteradweg r151754 (current version is 290 and has 516 members)

  • 98977 instances of RelationMemberData without patch
  • 743 instances of RelationMemberData with patch

comment:5 by GerdP, 4 months ago

Milestone: 24.01
Owner: changed from team to GerdP
Status: newassigned
Summary: History browser for complex relations requires lots of memory[Patch] History browser for complex relations requires lots of memory

comment:6 by GerdP, 4 months ago

Resolution: fixed
Status: assignedclosed

In 18956/josm:

fix #20405: History browser for complex relations requires lots of memory

  • avoid to create equal instances of RelationMemberData. This is simple and very effective when relations have many versions as most members are the same in each version

comment:7 by taylor.smock, 4 months ago

@GerdP: Stupid question: Why not memberCache.computeIfAbsent(member, Function.identity());? You don't have to change anything, but it seems that is (semantically) what you were trying to do.

comment:8 by GerdP, 4 months ago

Simple answer: I've never seen Function.identity(). Have to learn what it means.

comment:9 by taylor.smock, 4 months ago

Function.identity() is effectively just an object -> object lambda.

EDIT: I used it in my question since I thought it would be simpler/easier to read than memberCache.computeIfAbsent(member, m -> m);

Last edited 4 months ago by taylor.smock (previous) (diff)

comment:10 by GerdP, 4 months ago

OK, like this? It is fewer code but I wouldn't understand it without a debugger.

  • src/org/openstreetmap/josm/io/AbstractParser.java

     
    66import java.time.Instant;
    77import java.util.HashMap;
    88import java.util.Map;
     9import java.util.function.Function;
    910
    1011import org.openstreetmap.josm.data.coor.LatLon;
    1112import org.openstreetmap.josm.data.osm.OsmPrimitiveType;
     
    185186        String role = getMandatoryAttributeString(atts, "role");
    186187        RelationMemberData member = new RelationMemberData(role, type, ref);
    187188        // see #20405: cache equal instances of members
    188         RelationMemberData cachedMember = memberCache .putIfAbsent(member, member);
    189         if (cachedMember == null)
    190             cachedMember = member;
    191         ((HistoryRelation) currentPrimitive).addMember(cachedMember);
     189        ((HistoryRelation) currentPrimitive).addMember(memberCache.computeIfAbsent(member, Function.identity()));
    192190    }
    193191
    194192    protected final boolean doStartElement(String qName, Attributes atts) throws SAXException {

comment:11 by taylor.smock, 4 months ago

You could do it like that, or

  • src/org/openstreetmap/josm/io/AbstractParser.java

     
    66import java.time.Instant;
    77import java.util.HashMap;
    88import java.util.Map;
     9import java.util.function.Function;
    910
    1011import org.openstreetmap.josm.data.coor.LatLon;
    1112import org.openstreetmap.josm.data.osm.OsmPrimitiveType;
     
    185186        String role = getMandatoryAttributeString(atts, "role");
    186187        RelationMemberData member = new RelationMemberData(role, type, ref);
    187188        // see #20405: cache equal instances of members
    188         RelationMemberData cachedMember = memberCache .putIfAbsent(member, member);
    189         if (cachedMember == null)
    190             cachedMember = member;
     189        RelationMemberData cachedMember = memberCache.computeIfAbsent(member, m -> m);
    191190        ((HistoryRelation) currentPrimitive).addMember(cachedMember);
    192191    }
    193192
    194193    protected final boolean doStartElement(String qName, Attributes atts) throws SAXException {

comment:12 by GerdP, 4 months ago

OK, I think that's indeed easier to read.

comment:13 by GerdP, 4 months ago

In 18957/josm:

see #20405: simplify code

comment:14 by taylor.smock, 4 months ago

OK, I think that's indeed easier to read.

I wasn't originally asking about it for readability -- I assumed you had profiled it and it wasn't as efficient with large relations, so you were deliberately avoiding it.

So I would have been happy with a source code comment like Using Map#putIfAbsent instead of Map#computeIfAbsent since the lambda takes up <u> memory allocations and <x> cpu cycles when profiled against <y> relation with <z> members.

comment:15 by GerdP, 4 months ago

No, didn't think about cpu cycles at all. Most of the time is spent waiting for the download...

comment:16 by taylor.smock, 4 months ago

Most of the time is spent waiting for the download...

You are probably right. I tend to profile a lot of random things. Some of which don't need to be profiled. But some of which can become problematic with larger datasets.

Browse history of r2795128 (EuroVelo 3 - Pilgrim's Route - part Germany), a relation with > 1200 versions and > 4600 members

5k members with 1.5k revisions would have (I would guess) at most 7.5 million calls to that method (I'm assuming we call it on every revision for every member). Probably a lot less, depending upon how that relation grew.

Anyway, profiling it:

  • Most of the time in handleMember is spent in Map#computeIfAbsent, but most of that is due to the current hashCode implementation of RelationMemberData. This also does a bunch of memory allocations.
    • CPU time was ~2% of total
    • Memory allocations were ~8% of total
  • OsmPrimitiveType#fromApiTypeName took up another 6.3% of the memory allocations and 1.5% of the CPU cycles; the memory allocations would be easy to fix by having an internal copy of the OsmPrimitiveType#values.
  • The G1 garbage collector took up 12% of the CPU cycles.

With all that said, the performance wasn't too bad, so probably not something we need to fix.

comment:17 by GerdP, 4 months ago

My only concern when I created this ticket was the heap memory. I think at that time the history relation was kept after the history dialog was closed. That was changed with r17471, but in a rather ugly way.
I think it would be good to cache changeset files (of closed changesets) so that they are not downloaded again and again when you open the history for a relation multiple times.

comment:18 by taylor.smock, 4 months ago

That would probably be a good idea -- I ran into rate limiting when testing.

Are you thinking in-memory or cached to disk? I'd do the latter instead of the former or a mix of both. And closed changesets cannot change, so caching them would be "safe" (which is probably why you specified "closed changesets").

I don't think CachedFile would be a good choice (it pollutes the preferences); a JCS cache would be better, I think.
Maybe we should write a generic cache for network requests? Specifically so that we can reuse it in different places for network requests that will not change/will not change often.

comment:19 by GerdP, 4 months ago

Yes, I had something like a disk cache in mind, but I wouldn't be able to write the code. See comment:2

comment:20 by skyper, 4 months ago

See also #21563.

A JCS cache would be really nice. In comment 4 on #21563 mmd wrote about the option to use multi_fetch with version numbers which might be considered, as well.

comment:21 by GerdP, 4 months ago

I just wonder how to treat redactions in this case. Does a redaction on the server side also mean that the content or existence of a changeset is changed?

comment:22 by taylor.smock, 4 months ago

I didn't think about redactions; I'd want to have a comparison of the before/after headers.

But a HEAD request isn't going to get us anywhere -- there isn't a Last-Modified field in the response headers anyway.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain GerdP.
as The resolution will be set.
The resolution will be deleted. Next status will be 'reopened'.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.