#20405 closed enhancement (fixed)
[Patch] History browser for complex relations requires lots of memory
Reported by: | GerdP | Owned by: | GerdP |
---|---|---|---|
Priority: | normal | Milestone: | 24.01 |
Component: | Core | Version: | |
Keywords: | template_report history performance | Cc: | Don-vip |
Description
What steps will reproduce the problem?
- Browse history of r2795128 (EuroVelo 3 - Pilgrim's Route - part Germany), a relation with > 1200 versions and > 4600 members
- Open e.g. VisualVm to create and analyse a heapdump
What is the expected result?
No (additional) instances for RelationMemberData as the dialog doesn't show any member specific data
What happens instead?
5.345.959(!) instances of RelationMemberData
which require ~244 MB
The single instance of HistoryRelation
retains ~300MB.
Please provide any additional information below. Attach a screenshot if possible.
I think it would be better to wait for the use to actually select the members tab before we create instances of RelationMemberData
. Should not take too long when the corresponding changeset files were downloaded and cached.
URL:https://josm.openstreetmap.de/svn/trunk Repository:UUID: 0c6e7542-c601-0410-84e7-c038aed88b3b Last:Changed Date: 2020-12-28 22:03:23 +0100 (Mon, 28 Dec 2020) Build-Date:2020-12-30 02:30:55 Revision:17428 Relative:URL: ^/trunk Identification: JOSM/1.5 (17428 en) Windows 10 64-Bit OS Build number: Windows 10 Home 2004 (19041) Memory Usage: 1414 MB / 3641 MB (1083 MB allocated, but free) Java version: 1.8.0_221-b11, Oracle Corporation, Java HotSpot(TM) 64-Bit Server VM Look and Feel: com.sun.java.swing.plaf.windows.WindowsLookAndFeel Screen: \Display0 1920×1080 (scaling 1.00×1.00) Maximum Screen Size: 1920×1080 Best cursor sizes: 16×16→32×32, 32×32→32×32 VM arguments: [-XX:StartFlightRecording=name=MyRecording2,settings=d:\dbg\gerd.jfc, -XX:FlightRecorderOptions=defaultrecording=true,dumponexit=true,dumponexitpath=e:\ld\perf_20210119_072729.jfr] Plugins: + ColumbusCSV (35640) + buildings_tools (35669) + o5m (35640) + pbf (35650) + poly (35640) + reltoolbox (35640) + reverter (35688) + undelete (35640) + utilsplugin2 (35682) Validator rules: + c:\josm\core\resources\data\validator\geometry.mapcss
Attachments (1)
Change History (23)
comment:1 by , 4 years ago
Keywords: | performance added |
---|
comment:2 by , 4 years ago
I guess it is a major change and I'll probably not have time to work on this soon. Maybe a GSOC project?
comment:3 by , 11 months ago
I guess it should be rather simple to reduce the number of instances of RelationMemberData with a cache, similar to String.intern().
But it seems the major issue now is that the server will block the user who downloads such a large history for 300 seconds or so:
2024-01-23 21:45:29.270 SEVERE: org.openstreetmap.josm.io.OsmApiException: ResponseCode=509, Error Header=<You have downloaded too much data. Please try again in 289 seconds.>
No idea if that can be avoided on the JOSM side.
by , 11 months ago
Attachment: | 20405.patch added |
---|
simple patch that implements a cache for members in history relation in the parser, very effective when relations have many versions as most members are the same in each version
comment:4 by , 11 months ago
For a rather normal relation like Hunteradweg r151754 (current version is 290 and has 516 members)
- 98977 instances of RelationMemberData without patch
- 743 instances of RelationMemberData with patch
comment:5 by , 11 months ago
Milestone: | → 24.01 |
---|---|
Owner: | changed from | to
Status: | new → assigned |
Summary: | History browser for complex relations requires lots of memory → [Patch] History browser for complex relations requires lots of memory |
comment:7 by , 11 months ago
@GerdP: Stupid question: Why not memberCache.computeIfAbsent(member, Function.identity());
? You don't have to change anything, but it seems that is (semantically) what you were trying to do.
comment:8 by , 11 months ago
Simple answer: I've never seen Function.identity(). Have to learn what it means.
comment:9 by , 11 months ago
Function.identity()
is effectively just an object -> object
lambda.
EDIT: I used it in my question since I thought it would be simpler/easier to read than memberCache.computeIfAbsent(member, m -> m);
comment:10 by , 11 months ago
OK, like this? It is fewer code but I wouldn't understand it without a debugger.
-
src/org/openstreetmap/josm/io/AbstractParser.java
6 6 import java.time.Instant; 7 7 import java.util.HashMap; 8 8 import java.util.Map; 9 import java.util.function.Function; 9 10 10 11 import org.openstreetmap.josm.data.coor.LatLon; 11 12 import org.openstreetmap.josm.data.osm.OsmPrimitiveType; … … 185 186 String role = getMandatoryAttributeString(atts, "role"); 186 187 RelationMemberData member = new RelationMemberData(role, type, ref); 187 188 // see #20405: cache equal instances of members 188 RelationMemberData cachedMember = memberCache .putIfAbsent(member, member); 189 if (cachedMember == null) 190 cachedMember = member; 191 ((HistoryRelation) currentPrimitive).addMember(cachedMember); 189 ((HistoryRelation) currentPrimitive).addMember(memberCache.computeIfAbsent(member, Function.identity())); 192 190 } 193 191 194 192 protected final boolean doStartElement(String qName, Attributes atts) throws SAXException {
comment:11 by , 11 months ago
You could do it like that, or
-
src/org/openstreetmap/josm/io/AbstractParser.java
6 6 import java.time.Instant; 7 7 import java.util.HashMap; 8 8 import java.util.Map; 9 import java.util.function.Function; 9 10 10 11 import org.openstreetmap.josm.data.coor.LatLon; 11 12 import org.openstreetmap.josm.data.osm.OsmPrimitiveType; … … 185 186 String role = getMandatoryAttributeString(atts, "role"); 186 187 RelationMemberData member = new RelationMemberData(role, type, ref); 187 188 // see #20405: cache equal instances of members 188 RelationMemberData cachedMember = memberCache .putIfAbsent(member, member); 189 if (cachedMember == null) 190 cachedMember = member; 189 RelationMemberData cachedMember = memberCache.computeIfAbsent(member, m -> m); 191 190 ((HistoryRelation) currentPrimitive).addMember(cachedMember); 192 191 } 193 192 194 193 protected final boolean doStartElement(String qName, Attributes atts) throws SAXException {
comment:14 by , 11 months ago
OK, I think that's indeed easier to read.
I wasn't originally asking about it for readability -- I assumed you had profiled it and it wasn't as efficient with large relations, so you were deliberately avoiding it.
So I would have been happy with a source code comment like Using Map#putIfAbsent instead of Map#computeIfAbsent since the lambda takes up <u> memory allocations and <x> cpu cycles when profiled against <y> relation with <z> members
.
comment:15 by , 11 months ago
No, didn't think about cpu cycles at all. Most of the time is spent waiting for the download...
comment:16 by , 11 months ago
Most of the time is spent waiting for the download...
You are probably right. I tend to profile a lot of random things. Some of which don't need to be profiled. But some of which can become problematic with larger datasets.
Browse history of r2795128 (EuroVelo 3 - Pilgrim's Route - part Germany), a relation with > 1200 versions and > 4600 members
5k members with 1.5k revisions would have (I would guess) at most 7.5 million calls to that method (I'm assuming we call it on every revision for every member). Probably a lot less, depending upon how that relation grew.
Anyway, profiling it:
- Most of the time in
handleMember
is spent inMap#computeIfAbsent
, but most of that is due to the currenthashCode
implementation ofRelationMemberData
. This also does a bunch of memory allocations.- CPU time was ~2% of total
- Memory allocations were ~8% of total
OsmPrimitiveType#fromApiTypeName
took up another 6.3% of the memory allocations and 1.5% of the CPU cycles; the memory allocations would be easy to fix by having an internal copy of theOsmPrimitiveType#values
.- The
G1
garbage collector took up 12% of the CPU cycles.
With all that said, the performance wasn't too bad, so probably not something we need to fix.
comment:17 by , 11 months ago
My only concern when I created this ticket was the heap memory. I think at that time the history relation was kept after the history dialog was closed. That was changed with r17471, but in a rather ugly way.
I think it would be good to cache changeset files (of closed changesets) so that they are not downloaded again and again when you open the history for a relation multiple times.
comment:18 by , 11 months ago
That would probably be a good idea -- I ran into rate limiting when testing.
Are you thinking in-memory or cached to disk? I'd do the latter instead of the former or a mix of both. And closed changesets cannot change, so caching them would be "safe" (which is probably why you specified "closed changesets").
I don't think CachedFile
would be a good choice (it pollutes the preferences); a JCS cache would be better, I think.
Maybe we should write a generic cache for network requests? Specifically so that we can reuse it in different places for network requests that will not change/will not change often.
comment:19 by , 11 months ago
Yes, I had something like a disk cache in mind, but I wouldn't be able to write the code. See comment:2
comment:20 by , 11 months ago
See also #21563.
A JCS cache would be really nice. In comment 4 on #21563 mmd wrote about the option to use multi_fetch with version numbers which might be considered, as well.
comment:21 by , 11 months ago
I just wonder how to treat redactions in this case. Does a redaction on the server side also mean that the content or existence of a changeset is changed?
comment:22 by , 11 months ago
I didn't think about redactions; I'd want to have a comparison of the before/after headers.
But a HEAD
request isn't going to get us anywhere -- there isn't a Last-Modified field in the response headers anyway.
Replying to GerdP:
wow
Indeed!