Opened 6 years ago
Closed 6 years ago
#17521 closed enhancement (fixed)
Complain about invisible characters (unicode bidi control) in tags
Reported by: | mkoniecz | Owned by: | team |
---|---|---|---|
Priority: | normal | Milestone: | 19.03 |
Component: | Core validator | Version: | |
Keywords: | template_report unicode bidi control character | Cc: |
Description
What steps will reproduce the problem?
- Load https://www.openstreetmap.org/way/40545876/history in version 4
- Run validator
What is the expected result?
Validator complains about invisible characters and offers to remove them.
What happens instead?
Unknown property value - Value 'ground' for key 'surface' is unknown, maybe 'ground' is meant? (1)
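It does complain, but in a way that is extremely confusing: the reported value and the suggested value look identical because the extra character is invisible.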
Please provide any additional information below. Attach a screenshot if possible.
This specific issue should soon be gone as a result of https://wiki.openstreetmap.org/wiki/Mechanical_Edits/Mateusz_Konieczny_-_bot_account/elimination_of_nonprintable_characters_at_start_or_end_of_tags, but this one malformed 'ground' value alone was present 6k times.
URL:https://josm.openstreetmap.de/svn/trunk
Repository:UUID: 0c6e7542-c601-0410-84e7-c038aed88b3b
Last:Changed Date: 2019-03-24 22:30:59 +0100 (Sun, 24 Mar 2019)
Build-Date:2019-03-25 02:30:52
Revision:14927
Relative:URL: ^/trunk

Identification: JOSM/1.5 (14927 en) Linux Ubuntu 16.04.6 LTS
Memory Usage: 392 MB / 869 MB (166 MB allocated, but free)
Java version: 1.8.0_201-b09, Oracle Corporation, Java HotSpot(TM) 64-Bit Server VM
Screen: :0.0 1920x1080
Maximum Screen Size: 1920x1080
Dataset consistency test: No problems found

Plugins:
+ OpeningHoursEditor (34867)
+ buildings_tools (34904)
+ continuosDownload (82)
+ imagery_offset_db (34867)
+ measurement (34867)
+ reverter (34946)
+ todo (30306)

Validator rules:
+ ${HOME}/Desktop/tmp/unnecessary.validator.mapcss

Last errors/warnings:
- W: Invalid jar file ''<josm.userdata>/plugins/reverter.jar.new'' (exists: false, canRead: false)
- W: No configuration settings found. Using hardcoded default values for all pools.
- W: java.net.SocketException: Socket closed
- E: java.net.SocketException: Socket closed
Attachments (2)
Change History (15)
comment:1 by , 6 years ago
Description: | modified (diff) |
---|
comment:2 by , 6 years ago
Description: | modified (diff) |
---|
by , 6 years ago
Attachment: | sample.osm added |
---|
comment:3 by , 6 years ago
comment:4 by , 6 years ago
Do you have a hint how to detect those characters in Java? In this case it is 0x202c. I assume there are more?
comment:5 by , 6 years ago
comment:6 by , 6 years ago
Not really. I've already started to code a similar method `containsNonPrintable()`, but so far I have found no general rule to detect characters that are not displayed. None of the code snippets I have found so far return true for the sample.
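One likely reason the usual snippets miss the sample is that U+202C is not an ISO control character, so `Character.isISOControl` returns false for it; Java reports it as Unicode general category Cf (`Character.FORMAT`) instead. The following is only a minimal sketch of a category-based check. The class and method names are hypothetical and are not the `containsNonPrintable()` method mentioned above or any JOSM code.

```java
// Hypothetical sketch, not JOSM code: flags characters whose Unicode
// general category is Cf (format) or Cc (control).
final class InvisibleCharSketch {

    // Returns true if the string contains at least one format or control code point.
    static boolean containsFormatOrControl(String s) {
        return s.codePoints().anyMatch(cp -> {
            int type = Character.getType(cp);
            return type == Character.FORMAT || type == Character.CONTROL;
        });
    }
}
```

Note that category Cf is broader than the bidi controls: it also covers characters such as the zero-width joiner, which can be legitimate in some scripts, so a validator would probably want to warn rather than remove them blindly.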
comment:7 by , 6 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
Closed as duplicate of #15645.
comment:8 by , 6 years ago
by , 6 years ago
comment:10 by , 6 years ago
Keywords: | unicode bidi control character added |
---|---|
Summary: | Complain about invisible characters in tags → Complain about invisible characters (unicode bidi control) in tags |
It's a Unicode bidi control character: https://en.wikipedia.org/wiki/Unicode_control_characters#Bidirectional_text_control
ASCII control characters are already detected (that's effectively tracked in #15645).
comment:11 by , 6 years ago
JDK implementation (in sun.text.bidi.BidiBase) is:

    static boolean IsBidiControlChar(int c) {
        /* check for range 0x200c to 0x200f (ZWNJ, ZWJ, LRM, RLM) or
           0x202a to 0x202e (LRE, RLE, PDF, LRO, RLO) */
        return (((c & 0xfffffffc) == 0x200c) ||
                ((c >= 0x202a) && (c <= 0x202e)));
    }
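As an illustration of how the range test above could be applied to a tag value, here is a minimal sketch that detects and strips these characters. It copies the range test rather than calling the internal `sun.text.bidi` class. The class and method names are hypothetical, and this is not necessarily how the validator fix was implemented.

```java
// Hypothetical sketch, not JOSM code: applies the quoted range test to whole strings.
final class BidiControlSketch {

    // Same range test as the JDK snippet: U+200C..U+200F and U+202A..U+202E.
    static boolean isBidiControlChar(int c) {
        return ((c & 0xfffffffc) == 0x200c) || (c >= 0x202a && c <= 0x202e);
    }

    // True if any code point of the value falls in those ranges.
    static boolean containsBidiControl(String value) {
        return value.codePoints().anyMatch(BidiControlSketch::isBidiControlChar);
    }

    // Returns the value with all code points in those ranges removed.
    static String stripBidiControl(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        value.codePoints()
             .filter(cp -> !isBidiControlChar(cp))
             .forEach(sb::appendCodePoint);
        return sb.toString();
    }
}
```

For the sample in this ticket, `stripBidiControl` would turn the value with the trailing 0x202c into plain `ground`. The ranges do not include the newer bidi isolate characters (U+2066 to U+2069), so a complete check may need more than this test.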
comment:12 by , 6 years ago
Milestone: | → 19.03 |
---|
Thanks for finding this. I've already noticed these strange tags in taginfo but wasn't able to find one in OSM.
I've attached a sample file based on the wrong tag in the mentioned way.