Modify

Opened 3 years ago

Closed 3 months ago

Last modified 3 months ago

#20425 closed defect (fixed)

[Patch] Duplicate Relation test is too lazy/too aggressive

Reported by: GerdP Owned by: team
Priority: normal Milestone: 24.02
Component: Core validator Version:
Keywords: Cc:

Description

Follow up of #20393, #20424 and #19956
The test that produces message "Duplicated relations" is a quite misleading. With the current implementation (r17476)

  • it ignores all tag keys returned by AbstractPrimitive.getUninterestingKeys() (e.g. description, note, fixme, comment)
  • it ignores members which appear multiple times (eg. two times in one relation, only once in the other)
  • it ignores the order of members (wich is good for multipolygons but not for route relations)

It's too lazy because

  • it will flag two multipolygon relations with the same tags and the same geometry if they use different ways, but only when those different ways have the same nodes in the same order and the same tags, so I am not sure if the test really tries to find the equal geometries.
  • it ignores relations with identical member lists when any of the members isn't downloaded (#20242)

The major problem: It offers an autofix which also ignores these differences and might remove the "better" version of a relation, e.g. it might keep the one with the fixme tag.
I wonder if I should try to improve the code or better simply remove the autofix or the complete code.

Attachments (4)

dup-bus-route.osm (1.6 KB ) - added by GerdP 3 months ago.
two bus routes, same tags, different order of members
20425.patch (19.3 KB ) - added by GerdP 3 months ago.
20425.2.patch (19.3 KB ) - added by GerdP 3 months ago.
add missing check for empty list
20425.3.patch (19.4 KB ) - added by GerdP 3 months ago.

Download all attachments as: .zip

Change History (12)

comment:1 by skyper, 3 years ago

In order to delete one of the relations you need to take a look at the history of both. Please, remove the autofix.

comment:2 by GerdP, 3 months ago

Looking at this again. According to #5642 the test was written to find duplicated multipolygons in imports. My thoughts:

  1. The test should be changed to produce only one error for a given pair set of relations, either code 1901 "Duplicated relations" or code 1902 "Relations with same members" or maybe another new message, see below. Edit: This can be tricky, so I consider it less important.
  2. The test that produces the informational message "Relations with same members" should ignore the order of the members and also the number of occurences. It should only compare roles and member types and unique ids and not there geometry. This allows to compare incomplete relations.
  3. Relations with different tags but identical member lists should have a different text, maybe "Relations with identical member lists". In this case order and number of occurences of members should matter.
  4. A message "Duplicated relations" requires knowledge about the meaning of the order of the members. If the order matters it should be compared, else ignored, but different numbers of the same member should never be ignored. I think tags like source=* or fixme=* should not be ignored here, but those which we call "discardable" can be ignored.
  5. The current test tries to compare the geometry, this requires a pair of complete relations. It seems reasonable to ignore the "uninteresting" tags like source in this case and also to offer an autofix which removes the newer relation. I am just not happy with the message "Duplicated relations". I'd prefer to have something like "Relations with equal meaning describe identical geometry". I wonder if this geometry test should better be done in MultipolygonTest?
  6. Regarding autofix: Besides the special case above I think only two relations with identical list of members and equals tags can be autofixed by removing the newer relation. Or do we have to ignore "discardable" tags here?
Last edited 3 months ago by GerdP (previous) (diff)

by GerdP, 3 months ago

Attachment: dup-bus-route.osm added

two bus routes, same tags, different order of members

comment:3 by GerdP, 3 months ago

@skyper: Please try attached dup-bus-route.osm. Current code in tested r18969 reports a duplicate relation although the order of the members is different (reversed). I wonder how to improve this. I think two (or more) route relations with the same tags are more or less always problematic unless they appear in completely different areas?
Do we have other relation types where the order of the members does matter?

in reply to:  3 comment:4 by skyper, 3 months ago

Replying to GerdP:

I wonder how to improve this.

Maybe add more detailed warnings when the order matters:

  • identical (same tags and same order)
  • duplicate (same tags but different order)
  • similar (different tags but same members in same order)
  • same members (same members but different tags and order)?

I think two (or more) route relations with the same tags are more or less always problematic unless they appear in completely different areas?

Yes, but the tags could be incomplete, e.g. most route should have an operator and a ref (symbol or name) which should make the difference in most cases. You example is the same. It could be two incompletely mapped relations (missing the stops) going in opposite directions. Theoretically you could have a two relations completely mapped using only stop_position as "stop" and same tags but even than they should have at least one tag describing the difference.
Currently, we do not count all tags as description, FIXME, note plus some more are ignored which needs to be thought about, again, and which could lead to more categories above if only "uninteresting" tags are different.

Do we have other relation types where the order of the members does matter?

From current presets in core there is additionally only type=waterway. Though, I have to say that I lost track of all used/proposed relations.

by GerdP, 3 months ago

Attachment: 20425.patch added

comment:5 by GerdP, 3 months ago

Summary: Duplicate Relation test is too lazy/too aggressive[Patch] Duplicate Relation test is too lazy/too aggressive

by GerdP, 3 months ago

Attachment: 20425.2.patch added

add missing check for empty list

comment:6 by GerdP, 3 months ago

Ths patch has a performance issue with large numbers of relations. So work in progress...
The patch was simply wrong so that super.endTest() was not called. That means the progress bar is not updated while further tests are running.
I keep forgetting that I should not use if ... return; in endTest()
Fixed with 20425.3.patch

Last edited 3 months ago by GerdP (previous) (diff)

by GerdP, 3 months ago

Attachment: 20425.3.patch added

comment:7 by GerdP, 3 months ago

Resolution: fixed
Status: newclosed

In 18976/josm:

fix #20424, fix #20425: Duplicate relations not detected with incomplete members, Duplicate Relation test is too lazy/too aggressive

  • add code to compare just members so that it doesn't matter if they are complete
  • change original test which compares geometry to use a list instead of a set to store members. That means that the order of members is taken into account as well as duplicated members
  • before comparing tags only the "discardable" tag keys are removed, previously all "uninteresting" keys were removed, so fixme=* or note=* was ignored
  • if two relations have the same members in the same order but different tags the new message informational message "Identical members" is produced.

comment:8 by GerdP, 3 months ago

Milestone: 24.02

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain team.
as The resolution will be set.
The resolution will be deleted. Next status will be 'reopened'.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.