Opened 2 years ago
Last modified 14 months ago
#22614 new defect
[WIP patch] Combine ways is very slow
Reported by: | GerdP | Owned by: | team |
---|---|---|---|
Priority: | minor | Milestone: | |
Component: | Core | Version: | |
Keywords: | template_report performance | Cc: | gaben, stoecker |
Description
What steps will reproduce the problem?
- load attached file
- select all ways
- press C to combine them
What is the expected result?
A popup that says that the ways cannot be combined because several nodes have too many connections
What happens instead?
JOSM keeps one CPU core busy in method org.openstreetmap.josm.data.osm.NodeGraph.buildSpanningPath()
Please provide any additional information below. Attach a screenshot if possible.
Relative:URL: ^/trunk Repository:UUID: 0c6e7542-c601-0410-84e7-c038aed88b3b Last:Changed Date: 2022-12-08 17:09:09 +0100 (Thu, 08 Dec 2022) Revision:18612 Build-Date:2022-12-11 02:30:55 URL:https://josm.openstreetmap.de/svn/trunk Identification: JOSM/1.5 (18612 en) Windows 10 64-Bit OS Build number: Windows 10 Home 2009 (19044) Memory Usage: 335 MB / 1972 MB (206 MB allocated, but free) Java version: 17.0.4+8-LTS, Azul Systems, Inc., OpenJDK 64-Bit Server VM Look and Feel: com.sun.java.swing.plaf.windows.WindowsLookAndFeel Screen: \Display0 1920×1080 (scaling 1.00×1.00) Maximum Screen Size: 1920×1080 Best cursor sizes: 16×16→32×32, 32×32→32×32 System property file.encoding: Cp1252 System property sun.jnu.encoding: Cp1252 Locale info: en_DE Numbers with default locale: 1234567890 -> 1234567890 VM arguments: [-Djpackage.app-version=1.5.18531, --add-modules=java.scripting,java.sql,javafx.controls,javafx.media,javafx.swing,javafx.web, --add-exports=java.base/sun.security.action=ALL-UNNAMED, --add-exports=java.desktop/com.sun.imageio.plugins.jpeg=ALL-UNNAMED, --add-exports=java.desktop/com.sun.imageio.spi=ALL-UNNAMED, --add-opens=java.base/java.lang=ALL-UNNAMED, --add-opens=java.base/java.nio=ALL-UNNAMED, --add-opens=java.base/jdk.internal.loader=ALL-UNNAMED, --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED, --add-opens=java.desktop/javax.imageio.spi=ALL-UNNAMED, --add-opens=java.desktop/javax.swing.text.html=ALL-UNNAMED, --add-opens=java.prefs/java.util.prefs=ALL-UNNAMED, -Djpackage.app-path=%UserProfile%\AppData\Local\JOSM\HWConsole.exe] Dataset consistency test: No problems found Plugins: + OpeningHoursEditor (35924) + apache-commons (36034) + buildings_tools (36011) + contourmerge (v0.1.9) + ejml (35924) + geotools (36028) + jackson (36034) + jaxb (35952) + jts (36004) + o5m (35893) + opendata (36025) + pbf (36034) + poly (35976) + reltoolbox (35976) + reverter (36043) + undelete (36011) + utilsplugin2 (36011) Validator rules: + c:\josm\core\resources\data\validator\combinations.mapcss + c:\josm\core\resources\data\validator\geometry.mapcss + c:\josm\core\resources\data\validator\unnecessary.mapcss + d:\java_tools\JOSM\mygeometry.mapcss + https://josm.openstreetmap.de/josmfile?page=Rules/GermanySpecific&zip=1 Last errors/warnings: - 00000.572 W: extended font config - overriding 'filename.Myanmar_Text=mmrtext.ttf' with 'MMRTEXT.TTF' - 00000.574 W: extended font config - overriding 'filename.Mongolian_Baiti=monbaiti.ttf' with 'MONBAITI.TTF' - 00000.972 E: java.security.KeyStoreException: Windows-ROOT not found. Cause: java.security.NoSuchAlgorithmException: Windows-ROOT KeyStore not available
Attachments (8)
Change History (31)
by , 2 years ago
Attachment: | slow_combine.osm.xz added |
---|
comment:1 by , 2 years ago
Keywords: | performance added |
---|---|
Priority: | normal → minor |
comment:2 by , 2 years ago
I took a look, and the CPU cycles appears to be dominated in HashMap get operations and HashSet contains/remove/add methods. There is also a bit from creating/adding to an ArrayDeque. Most of the contains/remove code is in hashCode
, which is also the major creator of new objects.
Some optimizations I can see:
- Optimize NodePair hashCode -- it looks like this is hit frequently on that code path. This currently takes ~15% of the CPU cycles. Maybe
a.hashCode() * 31 + b.hashCode()
? This will also save an array allocation. - Marking NodePair as
final
with an intent to change it to avalue
/record
class might help from a memory perspective -- most of the memory (over an hour profiling, I was kind of ignoring it) was innew NodePair
from NodePair#swap, followed by HashSet#add, getOutboundPairs, and Deque#addAll. Note: I had changed the implementation ofhashCode
to be31 * a.hashCode() + b.hashCode()
, which might be why I didn't see hashCode as a major issue in the hour-long run.
But what I think is going on here is a n2 (or worse) problem. Either that, or we are magically going in circles.
In any case, it should bail at some point (see nodes at 52.9903756, 8.2774942
).
comment:3 by , 2 years ago
I don't think that it will help much to optimize the code. I am pretty sure the loop doesn't end with this input.
I didn't invest much time but it seems that the algo has no special treatment for nodes with exactly two connected ways and those for nodes with more. I am not sure if the code is meant to combine segments so that they form p-shaped ways? Else any node with 3+ ways would be a clear signal to stop.
comment:4 by , 2 years ago
It is true that it won't help the endless loop problem to optimize the code. I was mostly looking at the optimization paths as something to look at later, once we fix the endless loop problem.
Anyway, it appears to have something to do with the number of nodes. Or maybe a specific node.
Simplify Way | Nodes removed | Time spent |
---|---|---|
100 | 4362 | 11ms |
10 | 3589 | 14ms |
9.5 | 3565 | 13ms |
9 | 3535 | 14ms |
8.9 | 3531 | 17ms |
8.8 | 3521 | 16ms |
8.7 | 3513 | 15ms |
8.6 | 3504 | 14ms |
8.5 | 3496 | failed |
comment:5 by , 2 years ago
I'll see if I can provide a much smaller sample that shows the loop. Maybe the algorithm that we use requires something that is not assured.
by , 2 years ago
Attachment: | 22614-smaller1.osm added |
---|
much smaller dataset which still seems to loop forever
by , 2 years ago
Attachment: | 22614-smaller2-finishes.osm added |
---|
this even smaller one finishes with the expected message, but also requires a few seconds on my PC
by , 2 years ago
Attachment: | 22614.patch added |
---|
Could it be that simple? Passes all unit tests and improves speed drastically
comment:6 by , 2 years ago
That really makes me wonder why we have dupCheck.remove
in the first place. (see r15574 for removeLast -- regression test was added, see r15554 for initial version -- no test added). You'd be the best person to know. :)
We should probably test the patch against attachment:Brandenburg.osm:ticket:18368.
comment:7 by , 2 years ago
My patch works fine with the brandenburg.osm data, but it doesn't work with the combineways_good.osm example. So, no, it's not that simple.
There is a very simply check that we don't perform yet: If there are more than two terminal nodes (my complex example contains 4) we can stop, but of course this doesn't solve the general performance problem. I'll see if I can find a better algo to build the spanning graph.
comment:8 by , 2 years ago
Cc: | added |
---|
comment:9 by , 2 years ago
If I got that right the problem is NP-hard, so no matter how fast we implement the algo we can always hit the situation that the action will not finish in a reasonable time. So, I think we should use the PleaseWaitProgressMonitor
so that the user can cancel the action.
Maybe always, maybe only when there are at least n nodes with more than two attached segments.
comment:10 by , 2 years ago
The fun thing is that if we simplify the ways in slow_combine.osm.xz, it works. So there has got to be something else we are missing.
I'd almost modify NodeGraph such that it builds the NodePairs with only nodes that are first/last nodes, or have more than 1 parent way.
But adding a ProgressMonitor
is probably a better option.
comment:11 by , 2 years ago
Sure, the current implementation wastes lots of time with nodes where only one connection is possible, so execution time and memory consumption time depends on the number of nodes in the ways instead of the complexity of the simplified graph. OTOH it is unlikely that anybody really needs the result, I jsut stumbled over this
I just tried to implement a solution with ProgressMonitor
but I don't seem to remember how to submit a task AND wait until it is either canceled of finished.
comment:12 by , 2 years ago
The problem is that a user could accidentally freeze JOSM, potentially losing some work.
I just tried to implement a solution with
ProgressMonitor
but I don't seem to remember how to submit a task AND wait until it is either canceled of finished.
It looks like you want to use PleaseWaitRunnable
.
comment:13 by , 2 years ago
Yes, but I failed to find out how to submit the task and wait for the result / cancelation.
by , 2 years ago
Attachment: | 22614_faster1.patch added |
---|
comment:14 by , 2 years ago
With this rather simply patch the performance problems mentioned in this ticket and also #22651 are solved or at least very much improved.
The change in NodePair.toString()
just helps while debugging data with new nodes, it is not required.
The unit test testTicket18367NeedsSplit
doesn't pass, I think it checks the wrong thing. The data requires a split but it doesn't require a reversed edge, so the test should be changed.
I no longer like the code in joinWithMultipolygonCode()
, this was only added to solve performance problems but it also allows to combine ways with overlapping segments and - according to #18368 - this should be allowed and it is tested with another unit test.
I'll try to find a better solution in NodeGraph to handle overlapping segments.
comment:15 by , 2 years ago
Cc: | added |
---|
@all:
I don't see a way to handle overlapping way segments with the current implementation in NodePair
and NodeGraph
which are also used in some plugins (opendata
, merge-overlap
). These classes are designed to handle overlapping segments as equal, therefore it is not possible to build a path containing overlapping segments . Only the code in CombineWayAction.joinWithMultipolygonCode(Collection<Way> ways)
allows this and I didn't really intend this, it was just a side effect.
Question is if I can simply remove that code so that CombineWayAction
never combines ways with overlapping segments? It would allow to remove a lot of code.
The alternative would be to write new classes and unit tests with different data structures, maybe with settings so that the user can decide wether actions should refuse to create invalid OSM data or not.
comment:16 by , 2 years ago
Hmm, combining overlapping ways is a long standing issue as far as I know. In the early days it was necessary because areas have used overlaps to make holes.
I think self-overlapping ways don't have any use in OSM nowadays, so we probably don't need to support them anymore. But when we reject to support them we must prevent a proper UI, so that the user gets feedback why combining is rejected.
E.g. I sometimes join boundaries into single ways for export and it is a nightmare to find the reasons why it fails. Usually I got step by step and when an error occurs do smaller steps.
What I probably would like in case of an error is that combine combines what it can and leaves the remains, so that error tracking gets easier, because majority of stuff was done.
comment:17 by , 2 years ago
Maybe a compromise could be to support overlapping ways only in special cases, e.g. when selecting two ways and not more?
comment:18 by , 2 years ago
so that the user gets feedback why combining is rejected.
I also think this should be done. After writing the last comment I understood that the code sometimes stops too early, so another patch will follow. Maybe we should add an info like "ways are not all connected" if NodeGraph.isConnected() returns false, or "too many terminal nodes" or "way would contain overlapping segments".
What I probably would like in case of an error is that combine combines what it can and leaves the remains, so that error tracking gets easier, because majority of stuff was done.
I used this trick: Create a temporary type=route relation with all the selected ways and use the sort in the relation editor. Helps to find the gaps (with jump to next gap). I wouldn't want to duplicate all that code in CombineWaysAction
.
Maybe a compromise could be to support overlapping ways only in special cases, e.g. when selecting two ways and not more?
You probably think of the very simple case that 2 (or more) ways can be connected without splitting them? That would be code similar to the one for multipolygons, but without the restriction that they must form closed rings.
comment:19 by , 2 years ago
You probably think of the very simple case that 2 (or more) ways can be connected without splitting them? That would be code similar to the one for multipolygons, but without the restriction that they must form closed rings.
I mean for 2 ways you only need to check whether one of the end-points matches the end-point of the other way. Everything else can be ignored. That may result in strange results, but that way there is a chance to produce such strange results if they are needed.
comment:20 by , 2 years ago
Hmm, I wonder if the code in NodeGraph.buildSpanningPath()
tries to implement Fleury's algorithm to find a Eulerian path https://en.wikipedia.org/wiki/Eulerian_path#Fleury's_algorithm without using recursion. It probably should, but the chosen data structures seem to be wrong. I have to dig deeper...
by , 2 years ago
Attachment: | 22614_check_eulerian.patch added |
---|
comment:21 by , 2 years ago
Just some findings:
The patch implements code to check if the graph has an Eulerian path or Circuit. This fast and neccessary check was missing in the unpatched code, but I don't know if this will avoid the loop.
This page https://www.geeksforgeeks.org/eulerian-path-undirected-graph/ claims to contain efficient code to find the path, including java code.
I've not yet made up my mind whether we have to support directed graphs, and the mixture of directed and undirected which we use in createNearlyUndirectedGraphFromNodeWays()
is not handled in the theory. I think we have to get rid of that method first.
by , 19 months ago
Attachment: | 22614-alt-algo.patch added |
---|
WIP patch for alternative algorithm to find spanning path
comment:22 by , 19 months ago
I've last worked on this patch in January, so it is not really work in progress, but maybe it is useful in combination with #21881. The problem is that it sometimes finds different results compared to the original algo and it is hard to say which result is better. I think it would be best to find a graph that doesn't require any reversing of way segments.
comment:23 by , 14 months ago
Summary: | Combine ways is very slow → [WIP patch] Combine ways is very slow |
---|
(I'm just putting the patch existence in the title)
There seems to be something special with this data, I tried with some smaller sub sets of the ways and saw the expected immediate response.
No real problem for me, I was just experimenting with the data to produce a GPX file with all the ways I recently cycled...