Modify

Opened 14 years ago

Last modified 3 years ago

#4626 new enhancement

parallelize validator checks

Reported by: Matt Toups <mtoups@…> Owned by: team
Priority: normal Milestone:
Component: Core validator Version: latest
Keywords: performance, java7 Cc: Don-vip, GerdP, skyper

Description (last modified by Don-vip)

Hi, I run JOSM on a quad-core system and I want to validate some very large pieces of OSM data. Some checks with validator run for a long time, and only one CPU core is in use while the check runs.

It would be nice if checks could be run in parallel to utilize the other idle CPU cores on my system. (Perhaps even some "fix" operations could be parallelized for performance gains.)

Thanks!

Attachments (2)

4626_alpha.patch (6.7 KB ) - added by Don-vip 11 years ago.
Some work in progress I had some time ago. Solution may be need to be redesigned to implement UI stuff
4626.patch (7.7 KB ) - added by simon04 3 years ago.

Download all attachments as: .zip

Change History (29)

by Don-vip, 11 years ago

Attachment: 4626_alpha.patch added

Some work in progress I had some time ago. Solution may be need to be redesigned to implement UI stuff

comment:1 by Don-vip, 11 years ago

Description: modified (diff)

This could also be interesting to see if tests taking a long time to perform could benefit from the new Fork/Join framework introduced in Java 7.

comment:2 by Don-vip, 11 years ago

Keywords: java7 added

comment:3 by Don-vip, 10 years ago

Milestone: 14.05

comment:4 by skyper, 10 years ago

Cc: skyper added

comment:5 by Don-vip, 10 years ago

Milestone: 14.0514.06

Too complicated/risky for this release.

comment:6 by Don-vip, 10 years ago

Milestone: 14.0614.07

Move all tickets for which no work has been done yet to next milestone

comment:7 by Don-vip, 10 years ago

Milestone: 14.0714.08

Move some tickets to next milestone

comment:8 by Don-vip, 10 years ago

In 7423/josm:

see #9680, see #4626 - multithreaded multipolygon creation

comment:9 by Don-vip, 10 years ago

Milestone: 14.08

comment:10 by anonymous, 6 years ago

Currently, MapCSS validation takes way too long on large datasets (few hundred MB). Is it possible to run the check as a multithreaded process?
My 4 core, 8 threaded CPU used 15% while running MapCSS validation. But the speedup welcomed on the other checks as well.

comment:11 by simon04, 3 years ago

In 17615/josm:

see #4626 - Extract PerformanceTestUtils.getNeubrandenburgDataSet

comment:12 by simon04, 3 years ago

In 17616/josm:

see #4626 - Extract class ValidationTask

comment:13 by simon04, 3 years ago

In 17617/josm:

see #4626 - Add ValidationTaskPerformanceTest

comment:14 by simon04, 3 years ago

In 17618/josm:

see #4626 - PerformanceTestUtils.PerformanceTestTimer: use Stopwatch

comment:15 by simon04, 3 years ago

In 17619/josm:

see #4626 - Extract class MapCSSTagCheckerRule

comment:16 by simon04, 3 years ago

In 17620/josm:

see #4626 - Extract interface MapCSSTagCheckerFixCommand

comment:18 by simon04, 3 years ago

In 17625/josm:

see #4626 - Fix StyleCacheTest (cannot use PerformanceTestUtils)

comment:19 by simon04, 3 years ago

Cc: Don-vip GerdP added

Running all tests (that is, subclasses of org.openstreetmap.josm.data.validation.Test) in parallel yields only a

@@ ValidationTaskPerformanceTest
-[8307, 7321, 7285, 7293, 7280, 7410, 7424, 7298] ms
+[7016, 6383, 6421, 6369, 6307, 6121, 6156, 6384] ms

Running MapCSSTagChecker parallel on all primitives yields a nice performance gain, but sporadically triggers some ConcurrentModificationException (the various HashMap instances make a parallel usage dangerous/impossible)

+[3876, 3400, 3000, 3045, 2938, 2919, 3027, 3222] ms

Running MapCSSTagChecker parallel on the MapCSSTagChecker#checks yields a smaller speed-up:

+[2950, 3691, 3129, 4401, 5933, 4363, 3287, 4877] ms

by simon04, 3 years ago

Attachment: 4626.patch added

comment:20 by GerdP, 3 years ago

Some tests may require lots of memory per primitive, esp. the geometry tests for overlaps or insidness. Please make sure that you test this as well.

comment:22 by GerdP, 3 years ago

Performance improvemant with 4626.patch really looks good. So far I've not seen a ConcurrentModificationException. Did you find a fix for that already?
A few things that I noticed:

  • Cancel button doesn't work
  • tests seem to be executed in random order? Might cause trouble because we have some logic that expects the current order
  • Progress monitor doesn't show the MapCSSTests at all and "blames" a random - typically fast - java test for quite a while, probably when a complex geometry test takes long
  • I see wrong results when I use upload and my data contains new objects which should produce warnings, e.g. with two new water areas inside the same water area I get only one warning

comment:23 by GerdP, 3 years ago

Reg. upload:
This seems to help, maybe you have an idea how to avoid the code duplication?

  • src/org/openstreetmap/josm/actions/upload/ValidateUploadHook.java

     
    1919import org.openstreetmap.josm.data.validation.Severity;
    2020import org.openstreetmap.josm.data.validation.Test;
    2121import org.openstreetmap.josm.data.validation.TestError;
     22import org.openstreetmap.josm.data.validation.tests.MapCSSTagChecker;
    2223import org.openstreetmap.josm.data.validation.util.AggregatePrimitivesVisitor;
    2324import org.openstreetmap.josm.gui.ExtendedDialog;
    2425import org.openstreetmap.josm.gui.MainApplication;
     
    6364            test.setBeforeUpload(true);
    6465            test.setPartialSelection(true);
    6566            test.startTest(null);
    66             test.visit(selection);
     67            if (test instanceof MapCSSTagChecker) {
     68                // VARIANT 1
     69                selection.parallelStream()
     70                        .filter(test::isPrimitiveUsable)
     71                        .forEach(p -> p.accept(test));
     72            } else {
     73                test.visit(selection);
     74            }
     75
    6776            test.endTest();
    6877            if (ValidatorPrefHelper.PREF_OTHER.get() && ValidatorPrefHelper.PREF_OTHER_UPLOAD.get()) {
    6978                errors.addAll(test.getErrors());

I see a potential problem regarding the HashSet surrounding in MapCSSTagChecker:
With the default validator rules there is only geometry.mapcss that requires this, but if a user adds his own spatial tests in another *.mapcss file we'll see problems with concurrent updates. Same is probably true for mpAreaCache.

comment:24 by GerdP, 3 years ago

Ah, stop. The new code simply skips the test of the surrounding objects with a partial selection, so this requires more work.

in reply to:  22 comment:25 by simon04, 3 years ago

Replying to GerdP:

So far I've not seen a ConcurrentModificationException. Did you find a fix for that already?

The synchronized (errors) accommodates for the exceptions I've seen while testing. Unsynchronised access to HashMap is a ticking time-bomb, as have shown various bugs in the past.

A few things that I noticed:

  • Cancel button doesn't work
  • Progress monitor doesn't show the MapCSSTests at all and "blames" a random - typically fast - java test for quite a while, probably when a complex geometry test takes long

I've deliberately left out those aspects for the proof-of-concept.

  • tests seem to be executed in random order? Might cause trouble because we have some logic that expects the current order

Yeah. If there are inter-dependencies between different tests, parallelising the validator will never work unless we ensure those dependencies via e.g. Future or CompletableFuture.

Sigh, in principle everything sounds easy, but JOSM has accumulated various peculiarities over 15 years...

comment:26 by GerdP, 3 years ago

Reg. the order I might be wrong. There was a patch that required this but I think it wasn't committed yet.

comment:27 by GerdP, 3 years ago

Found it, see last patch for #17528

Modify Ticket

Change Properties
Set your email in Preferences
Action
as new The owner will remain team.
as The resolution will be set. Next status will be 'closed'.
to The owner will be changed from team to the specified user.
Next status will be 'needinfo'. The owner will be changed from team to Matt Toups <mtoups@…>.
as duplicate The resolution will be set to duplicate. Next status will be 'closed'. The specified ticket will be cross-referenced with this ticket.
The owner will be changed from team to anonymous. Next status will be 'assigned'.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.