Opened 8 months ago

Closed 7 months ago

#23235 closed defect (fixed)

Correct url is reported with invalid path

Reported by: GerdP
Owned by: team
Priority: normal
Milestone: 23.11
Component: Core validator
Version:
Keywords: template_report
Cc:

Description (last modified by GerdP)

What steps will reproduce the problem?

  1. Run validator on https://www.openstreetmap.org/way/771715139 which has
    url=https://www.nabu-weyhe.de/projekte-und-themen/eulen-falken-und-deren-nistkästen/trafotürme
    

What is the expected result?

No warning, the link seems to work fine when I open it in JOSM

What happens instead?

URL validator - 'url': URL contains an invalid path: /projekte-und-themen/eulen-falken-und-deren-nistkästen/trafotürme (1)

Please provide any additional information below. Attach a screenshot if possible.

I assume the German umlauts are the reason?

Relative:URL: ^/trunk
Repository:UUID: 0c6e7542-c601-0410-84e7-c038aed88b3b
Last:Changed Date: 2023-08-29 13:38:40 +0200 (Tue, 29 Aug 2023)
Revision:18822
Build-Date:2023-08-30 01:30:57
URL:https://josm.openstreetmap.de/svn/trunk

Identification: JOSM/1.5 (18822 en) Windows 10 64-Bit
OS Build number: Windows 10 Home 2009 (19045)
Memory Usage: 1334 MB / 2016 MB (744 MB allocated, but free)
Java version: 17.0.5+8-LTS, Azul Systems, Inc., OpenJDK 64-Bit Server VM
Look and Feel: com.sun.java.swing.plaf.windows.WindowsLookAndFeel
Screen: \Display0 1920×1080 (scaling 1.50×1.50)
Maximum Screen Size: 1920×1080
Best cursor sizes: 16×16→48×48, 32×32→48×48
System property file.encoding: Cp1252
System property sun.jnu.encoding: Cp1252
Locale info: en_DE
Numbers with default locale: 1234567890 -> 1234567890
VM arguments: [-Djpackage.app-version=1.5.18622, --add-modules=java.scripting,java.sql,javafx.controls,javafx.media,javafx.swing,javafx.web, --add-exports=java.base/sun.security.action=ALL-UNNAMED, --add-exports=java.desktop/com.sun.imageio.plugins.jpeg=ALL-UNNAMED, --add-exports=java.desktop/com.sun.imageio.spi=ALL-UNNAMED, --add-opens=java.base/java.lang=ALL-UNNAMED, --add-opens=java.base/java.nio=ALL-UNNAMED, --add-opens=java.base/jdk.internal.loader=ALL-UNNAMED, --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED, --add-opens=java.desktop/javax.imageio.spi=ALL-UNNAMED, --add-opens=java.desktop/javax.swing.text.html=ALL-UNNAMED, --add-opens=java.prefs/java.util.prefs=ALL-UNNAMED, -Djpackage.app-path=%UserProfile%\AppData\Local\JOSM\HWConsole.exe]
Dataset consistency test: No problems found

Plugins:
+ OpeningHoursEditor (36126)
+ buildings_tools (36134)
+ measurement (36126)
+ o5m (36126)
+ poly (36126)
+ reverter (36126)
+ undelete (36126)
+ utilsplugin2 (36134)

Tagging presets:
+ d:\josm\core\resources\data\defaultpresets.xml

Last errors/warnings:
- 00001.233 W: extended font config - overriding 'filename.Myanmar_Text=mmrtext.ttf' with 'MMRTEXT.TTF'
- 00001.237 W: extended font config - overriding 'filename.Mongolian_Baiti=monbaiti.ttf' with 'MONBAITI.TTF'
- 00003.467 E: java.security.KeyStoreException: Windows-ROOT not found. Cause: java.security.NoSuchAlgorithmException: Windows-ROOT KeyStore not available
- 01623.706 W: java.net.SocketTimeoutException: Read timed out. Cause: java.net.SocketTimeoutException: Read timed out

Attachments (0)

Change History (11)

comment:1 by GerdP, 8 months ago

Description: modified (diff)

comment:2 by taylor.smock, 8 months ago

Looks like it. source:trunk/src/org/openstreetmap/josm/data/validation/routines/UrlValidator.java#L332.

The regex we are using is ^(/[-\w:@&?=+,.!/~*'%$_;\(\)]*)?$. We can probably fix it by adding Pattern.UNICODE_CHARACTER_CLASS to the pattern compilation.
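To illustrate what the flag changes, here is a minimal standalone sketch. It is not the actual UrlValidator code: the class name PathRegexDemo and its fields are hypothetical, and only the regex string is taken from the comment above. Without the flag, \w covers only [a-zA-Z_0-9], so the umlauts fail to match; with Pattern.UNICODE_CHARACTER_CLASS, \w also covers Unicode letters.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex itself comes from the ticket.
public class PathRegexDemo {
    // Path regex as quoted above, escaped for a Java string literal.
    public static final String PATH_REGEX = "^(/[-\\w:@&?=+,.!/~*'%$_;\\(\\)]*)?$";

    // Default compilation: \w matches ASCII word characters only.
    public static final Pattern ASCII_ONLY = Pattern.compile(PATH_REGEX);

    // With the flag: \w also matches Unicode letters and digits.
    public static final Pattern UNICODE_AWARE =
            Pattern.compile(PATH_REGEX, Pattern.UNICODE_CHARACTER_CLASS);

    public static void main(String[] args) {
        // \u00e4 is a-umlaut, \u00fc is u-umlaut (escaped to avoid source-encoding issues).
        String path = "/projekte-und-themen/eulen-falken-und-deren-nistk\u00e4sten/trafot\u00fcrme";
        System.out.println(ASCII_ONLY.matcher(path).matches());    // false
        System.out.println(UNICODE_AWARE.matcher(path).matches()); // true
    }
}
```

The percent-encoded form of the same path matches under both patterns, since % and the hex digits are already in the character class.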

comment:3 by taylor.smock, 8 months ago

Resolution: fixed
Status: new → closed

In 18869/josm:

Fix #23235: Allow unicode characters in URL paths

comment:4 by taylor.smock, 8 months ago

Milestone: 23.10

comment:5 by stoecker, 8 months ago

Resolution: fixed
Status: closed → reopened

I'm not sure if this is really a good idea. UTF-8 support is a bit strange, but the REAL URL is
"https://www.nabu-weyhe.de/projekte-und-themen/eulen-falken-und-deren-nistk%C3%A4sten/trafot%C3%BCrme"

You get this when you paste the URL into e.g. Firefox, submit it, and then copy the contents of the URL line.

Many systems convert Unicode to such a representation, but it is not certain that all of them do, and not every server really uses UTF-8.

I'm not sure if we want to encourage using the UTF-8 form of the URL instead of the technically correct one.

I'd rather suggest using the URL-encoded form and only displaying the Unicode form (like e.g. Firefox does).
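As a rough sketch of the conversion described here, java.net.URI can produce the percent-encoded form from the Unicode one: toASCIIString() escapes non-ASCII characters as UTF-8 octets. The class and method names below are hypothetical and not part of JOSM; whether UTF-8 is what a given server expects is exactly the open question raised above.

```java
import java.net.URI;

// Hypothetical helper; not part of JOSM.
public class PercentEncodeDemo {
    // Returns the URL with all non-ASCII characters percent-encoded as UTF-8.
    // Existing percent-escapes are left untouched.
    public static String toAsciiForm(String url) {
        return URI.create(url).toASCIIString();
    }

    public static void main(String[] args) {
        String unicodeUrl = "https://www.nabu-weyhe.de/projekte-und-themen/"
                + "eulen-falken-und-deren-nistk\u00e4sten/trafot\u00fcrme";
        // Prints the same URL as quoted above, with %C3%A4 and %C3%BC.
        System.out.println(toAsciiForm(unicodeUrl));
    }
}
```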

comment:6 by taylor.smock, 8 months ago

I took a quick look at the RFCs before I made the change -- I didn't see anything which restricted the path to ASCII. But it does look like browsers automatically convert it to ASCII.

comment:7 by stoecker, 8 months ago

https://datatracker.ietf.org/doc/html/rfc3986#section-1.2.1

"A URI is a sequence of characters from a very limited set: the letters of the basic Latin alphabet, digits, and a few special characters."

" Percent-encoded octets (Section 2.1) may be used within a URI to represent characters outside the range of the US-ASCII coded character set if this representation is allowed by the scheme or by the protocol element in which the URI is referenced."

So while UTF-8 usually works nowadays and most recent software can handle it, it will be non-standard, as it's no longer a URI. Depending on the software you'll also get different calls to the server. Some software will send it as is, some will correctly percent-encode it, and some may convert the charset and send it e.g. in ISO-8859-1, which will then be wrong.
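The charset problem can be made concrete with a small illustrative sketch. The helper below is hypothetical and deliberately simplistic: it only escapes bytes outside US-ASCII and ignores reserved characters. It shows that the same path segment yields different octets on the wire depending on which charset the client picks.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not how any particular client is implemented.
public class CharsetEncodingDemo {
    // Percent-encodes every byte >= 0x80 of the string in the given charset.
    // ASCII bytes are copied through unchanged.
    public static String encodeNonAscii(String segment, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : segment.getBytes(charset)) {
            int v = b & 0xFF;
            if (v < 0x80) {
                sb.append((char) v);
            } else {
                sb.append(String.format("%%%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String segment = "trafot\u00fcrme"; // \u00fc is u-umlaut
        System.out.println(encodeNonAscii(segment, StandardCharsets.UTF_8));      // trafot%C3%BCrme
        System.out.println(encodeNonAscii(segment, StandardCharsets.ISO_8859_1)); // trafot%FCrme
    }
}
```

A server that expects UTF-8 would not decode the second form back to the original path segment.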

comment:8 by taylor.smock, 7 months ago

Milestone: 23.10 → 23.11

Ticket retargeted after milestone deleted

comment:9 by stoecker, 7 months ago

Actually I still think that should be reverted. Maybe the warning could be adapted to say that "proper URL encoding must be used", perhaps with a short hint on how to get it, i.e. paste the URL into the browser's URL line and copy the result.

comment:10 by stoecker, 7 months ago

Ticket #23287 has been marked as a duplicate of this ticket.

comment:11 by taylor.smock, 7 months ago

Resolution: fixed
Status: reopened → closed

In 18896/josm:

Fix #23235: Revert r18869

The URI specification only allows for ascii characters.
See https://datatracker.ietf.org/doc/html/rfc3986#section-1.2.1.
