#18542 closed enhancement (fixed)
Obtain tag2link rules from Wikidata and OSM Sophox
Reported by: | simon04 | Owned by: | Don-vip |
---|---|---|---|
Priority: | normal | Milestone: | 20.01 |
Component: | Core tag2link | Version: | |
Keywords: | wikidata privacy | Cc: | nyurik, stoecker |
Description (last modified by )
Wikidata links items and properties to OSM keys and tags via https://www.wikidata.org/wiki/Property:P1282
Wikidata maintains a URL formatter for various properties via https://www.wikidata.org/wiki/Property:P1630
We can combine these two pieces of information – https://w.wiki/FD6 – to augment JOSM's tag2link capabilities.
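To illustrate how a combined rule turns into a link, here is a minimal sketch. It assumes a rule is simply "OSM key (from P1282) → formatter URL (from P1630)" and that the formatter uses `$1` as the value placeholder, as Wikidata formatter URLs do. The rule table, the example key and the choice to percent-encode the value are hypothetical and not taken from JOSM's code.

```python
from typing import Optional
from urllib.parse import quote


def apply_formatter(formatter_url: str, value: str) -> str:
    """Substitute a tag value into a formatter URL that uses "$1" as placeholder."""
    if "$1" not in formatter_url:
        raise ValueError(f"no $1 placeholder in {formatter_url!r}")
    # Percent-encoding the value is an assumption of this sketch.
    return formatter_url.replace("$1", quote(value, safe=""))


# Hypothetical rule table: OSM key (from P1282) -> formatter URL (from P1630).
RULES = {"example:id": "https://example.org/items/$1"}


def link_for(key: str, value: str) -> Optional[str]:
    """Return the link for a tag, or None if no rule exists for its key."""
    formatter = RULES.get(key)
    return apply_formatter(formatter, value) if formatter else None


print(link_for("example:id", "Q 42"))  # https://example.org/items/Q%2042
```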
Relates to #17842.
Attachments (0)
Change History (19)
comment:1 by , 5 years ago
Description: | modified (diff) |
---|---|
Keywords: | wikidata added |
comment:2 by , 5 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:6 by , 5 years ago
The following rules are used:
- internal rules for basic tags
- rules from Wikidata based on OSM tag or key (https://www.wikidata.org/wiki/Property:P1282); formatter URL (https://www.wikidata.org/wiki/Property:P1630); third-party formatter URL (https://www.wikidata.org/wiki/Property:P3303); see source:trunk/data/tag2link.wikidata.sparql for the query
- rules from OSM Sophox based on permanent key ID (https://wiki.openstreetmap.org/wiki/Property:P16); formatter URL (https://wiki.openstreetmap.org/wiki/Property:P8); see source:trunk/data/tag2link.sophox.sparql for the query
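Both SPARQL endpoints can return results in the standard SPARQL 1.1 JSON results format (`head`/`results`/`bindings`). The sketch below turns such a response into a key → formatter-URL table. The variable names `key` and `formatter` are hypothetical (the actual names in the `.sparql` files are not shown here), and stripping a leading `Key:` is an assumption meant to cover both P1282-style values such as `Key:natural` and bare key IDs from P16.

```python
import json
from collections import defaultdict


def rules_from_sparql_json(payload: str, key_var: str = "key",
                           formatter_var: str = "formatter") -> dict[str, list[str]]:
    """Turn a SPARQL 1.1 JSON result into {osm_key: [formatter URLs]}."""
    data = json.loads(payload)
    rules: dict[str, list[str]] = defaultdict(list)
    for row in data["results"]["bindings"]:
        if key_var not in row or formatter_var not in row:
            continue  # OPTIONAL variables may be unbound in a row
        key = row[key_var]["value"]
        if key.startswith("Key:"):
            key = key[len("Key:"):]  # assumed P1282 form "Key:<key>"
        url = row[formatter_var]["value"]
        if url not in rules[key]:
            rules[key].append(url)  # keep order, drop duplicate formatters
    return dict(rules)


sample = json.dumps({"head": {"vars": ["key", "formatter"]}, "results": {"bindings": [
    {"key": {"type": "literal", "value": "Key:example:id"},
     "formatter": {"type": "literal", "value": "https://example.org/$1"}},
    {"key": {"type": "literal", "value": "example:id"},
     "formatter": {"type": "literal", "value": "https://example.org/$1"}},
    {"key": {"type": "literal", "value": "unbound"}},
]}})
print(rules_from_sparql_json(sample))  # {'example:id': ['https://example.org/$1']}
```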
comment:7 by , 5 years ago
Summary: | Obtain tag2link rules from Wikidata → Obtain tag2link rules from Wikidata and OSM Sophox |
---|
comment:8 by , 5 years ago
Cc: | added |
---|
follow-up: 10 comment:9 by , 5 years ago
Cc: | added |
---|
Replying to 13901#comment:13 stoecker:
I don't much like that this change causes permanent web accesses to non-JOSM servers for elements that are not user-selected. This way we're giving telemetry on user actions to providers we have no control over.
I assume you are referring to this change? Users cannot be tracked on their individual actions: on JOSM startup, one query each is made to https://query.wikidata.org/sparql and https://sophox.org/sparql, and the results are cached.
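The "query once at startup, then cache" behaviour can be sketched as follows. The cache file location, the 24-hour maximum age and the `fetch` callback are hypothetical stand-ins; JOSM's actual caching code is not part of this ticket.

```python
import json
import os
import tempfile
import time
from typing import Callable


def load_rules_once(cache_file: str, fetch: Callable[[], dict],
                    max_age_s: float = 24 * 3600) -> dict:
    """Return cached rules if the cache file is fresh, else fetch and store them."""
    try:
        if time.time() - os.path.getmtime(cache_file) < max_age_s:
            with open(cache_file, encoding="utf-8") as f:
                return json.load(f)
    except (OSError, ValueError):
        pass  # missing or corrupt cache file: fall through to a fresh fetch
    rules = fetch()
    with open(cache_file, "w", encoding="utf-8") as f:
        json.dump(rules, f)
    return rules


# Usage with a stand-in fetcher that counts how often the network would be hit.
calls = []


def fake_fetch() -> dict:
    calls.append(1)
    return {"example:id": ["https://example.org/$1"]}


cache_path = os.path.join(tempfile.mkdtemp(), "tag2link-rules.json")
first = load_rules_once(cache_path, fake_fetch)
second = load_rules_once(cache_path, fake_fetch)
print(len(calls))  # 1: the second call is served from the cache file
```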
comment:10 by , 5 years ago
Replying to simon04:
Replying to 13901#comment:13 stoecker:
I don't much like that this change causes permanent web accesses to non-JOSM servers for elements that are not user-selected. This way we're giving telemetry on user actions to providers we have no control over.
I assume you are referring to this change? Users cannot be tracked on their individual actions: on JOSM startup, one query each is made to https://query.wikidata.org/sparql and https://sophox.org/sparql, and the results are cached.
I know that this is not a major issue, but experience from recent years shows that small issues can turn out to have a much larger impact. I'd prefer to cache these files via the JOSM server, which reduces the impact. I'll set up a proxy for this purpose.
For example, it's much easier to restrict JOSM when there is only one server to block in a firewall.
follow-up: 16 comment:11 by , 5 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
I set up a caching proxy on the JOSM server:
https://josm.openstreetmap.de/remote/wikidata-sparql
https://josm.openstreetmap.de/remote/sophox-sparql
That works fine for the first link, but I can't get Sophox to work, either through the cache or directly.
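Routing the startup queries through the JOSM proxies listed above could look like the sketch below. It only builds the request (nothing is sent) using a standard SPARQL protocol GET with a `query` parameter. That the proxies forward this query string unchanged to the upstream endpoint is an assumption, as is the helper name `sparql_request`.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Upstream SPARQL endpoint -> caching proxy on the JOSM server (URLs from this ticket).
PROXIES = {
    "https://query.wikidata.org/sparql": "https://josm.openstreetmap.de/remote/wikidata-sparql",
    "https://sophox.org/sparql": "https://josm.openstreetmap.de/remote/sophox-sparql",
}


def sparql_request(endpoint: str, query: str, use_proxy: bool = True) -> Request:
    """Build (but do not send) a GET request for a SPARQL query."""
    base = PROXIES.get(endpoint, endpoint) if use_proxy else endpoint
    url = base + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL 1.1 JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_request("https://query.wikidata.org/sparql", "SELECT * WHERE { }")
print(req.full_url)
```

Keeping the upstream URL as the dictionary key means the proxy can be switched off (`use_proxy=False`) without touching the callers.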
follow-up: 19 comment:12 by , 5 years ago
Replying to stoecker:
@stoecker, sorry, I just saw this ticket. What issues are you having with Sophox? Could you paste the specific SPARQL query you are running against it? Also, please join https://osmus-slack.herokuapp.com/ (OSM Slack); we can discuss it there in the #sophox or #josm channels. (Ping nyurik.) Thanks!
follow-up: 14 comment:13 by , 5 years ago
I have been trying to wrap my head around the goal and approach of this ticket, and am still highly confused.
If the goal is to get the right value->URL formatter in tag2link, the easiest way is to follow the same scheme that iD uses:
1. convert the key to a "sitelink" by prefixing "Key:" and replacing underscores with spaces:
   ("Key:" + key).replace('_', ' ')
   See iD code.
2. call
   https://wiki.openstreetmap.org/w/api.php?action=wbgetentities&...
   to get the data items for the sitelinks. Pass all sitelinks in a single call rather than calling the API multiple times. See iD code.
3. check whether a P8 property claim is defined on each of the resulting data items, and if so, use it.
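Steps 1 and 2 can be sketched as a single URL builder. The parameters mirror the `wbgetentities` query quoted later in this ticket (`origin=*` is left out since it only matters for browser CORS requests); the helper names are hypothetical, and note that Python's `str.replace` replaces every underscore, not just the first.

```python
from urllib.parse import urlencode

API = "https://wiki.openstreetmap.org/w/api.php"


def key_to_sitelink(key: str) -> str:
    # Step 1: prefix "Key:" and turn every underscore into a space.
    return ("Key:" + key).replace("_", " ")


def wbgetentities_url(keys: list[str]) -> str:
    # Step 2: one request for all sitelinks, joined with "|".
    params = {
        "action": "wbgetentities",
        "format": "json",
        "languagefallback": "1",
        "languages": "en",
        "sites": "wiki",
        "titles": "|".join(key_to_sitelink(k) for k in keys),
    }
    return API + "?" + urlencode(params)


print(wbgetentities_url(["natural", "opening_hours"]))
```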
We already have formatters defined on some of the keys (list), and it would be very easy to add more.
Lastly, I believe the first two steps should be moved into JOSM core (at least eventually), because the same steps will be useful for any other kind of data item access, e.g. getting key/tag documentation.
follow-up: 15 comment:14 by , 5 years ago
Replying to nyurik:
I have been trying to wrap my head around the goal and approach of this ticket, and am still highly confused.
Instead of running steps 1 to 3 on every possible tag, all URL formatters related to OSM keys are obtained on JOSM startup from both Wikidata and the OSM Wiki Wikibase, using the queries mentioned in comment:6.
follow-up: 17 comment:15 by , 5 years ago
Replying to simon04:
Instead of running steps 1 to 3 on every possible tag, all URL formatters related to OSM keys are obtained on JOSM startup from both Wikidata and the OSM Wiki Wikibase, using the queries mentioned in comment:6.
You don't need to run them individually. Instead, you can take all the keys you see (either all keys on a single object, or all keys in all objects) and process them at once. Step (2) lets you fetch up to 50 (or 100, I think) items in one call.
Benefits of this approach:
- the same code will let you get other key/tag metadata, such as key/tag documentation, or even simple validation rules like a regex; see the Key:population example (at the bottom).
- you rely on only a single data source, the OSM wiki, without any additional query service (i.e. wikidata.org, query.wikidata.org, or sophox.org).
On the other hand, you could use a single Sophox query to get the same data too; I can write a simple query for you that returns all available formatters. The only downside is that sophox.org is a bit less stable than the OSM wiki itself, but it allows richer querying of the same data.
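The batching suggested in comment:15 (collect all keys seen, then request them in groups) could be sketched like this. The batch size of 50 reflects the "up to 50 or 100" estimate in that comment and is an assumption, not a verified API limit; both helper names are hypothetical.

```python
from typing import Iterator


def distinct_keys(objects: list[dict[str, str]]) -> list[str]:
    """All keys seen on the given objects, deduplicated, in first-seen order."""
    seen: dict[str, None] = {}
    for tags in objects:
        for key in tags:
            seen.setdefault(key, None)
    return list(seen)


def batches(items: list[str], size: int = 50) -> Iterator[list[str]]:
    """Split items into chunks of at most `size` (50 is an assumed API limit)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


objects = [{"natural": "peak", "name": "X"}, {"name": "Y", "ele": "100"}]
print(distinct_keys(objects))  # ['natural', 'name', 'ele']
```

Each batch would then become one `wbgetentities` request, so the number of requests grows with the number of distinct keys divided by the batch size rather than with the number of tags.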
comment:16 by , 5 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
comment:17 by , 5 years ago
Replying to nyurik:
You don't need to run them individually. Instead, you can take all the keys you see (either all keys on a single object, or all keys in all objects) and process them at once. Step (2) lets you fetch up to 50 (or 100, I think) items in one call.
I'm not convinced, nor do I want to rewrite working code and get swept up in the Wikibase internals. This query https://wiki.openstreetmap.org/w/api.php?action=wbgetentities&format=json&languagefallback=1&languages=en&origin=*&sites=wiki&titles=Key%3Anatural%7CTag%3Anatural%3Dpeak (for one key and one tag only) returns 21 KB of data exposing all the Wikibase internals (mainsnak, snaktype, datavalue), all of which has to be parsed again. If anyone else is up for looking into wbgetentities, please open a separate ticket and attach a patch.
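For anyone picking this up, extracting a P8 formatter from a `wbgetentities` response means walking the Wikibase structures named above (`claims` → `mainsnak` → `snaktype`/`datavalue`). The sketch below assumes the standard Wikibase JSON layout; the sample payload and item ID are made up for illustration, and skipping deprecated statements is a design choice of this sketch.

```python
import json


def formatters_from_wbgetentities(payload: str, prop: str = "P8") -> dict[str, str]:
    """Map sitelink title (e.g. "Key:natural") to the first usable value of `prop`."""
    result: dict[str, str] = {}
    for entity in json.loads(payload).get("entities", {}).values():
        title = entity.get("sitelinks", {}).get("wiki", {}).get("title")
        if "missing" in entity or not title:
            continue  # requested title has no data item
        for claim in entity.get("claims", {}).get(prop, []):
            snak = claim.get("mainsnak", {})
            # Skip deprecated statements and "novalue"/"somevalue" snaks.
            if claim.get("rank") == "deprecated" or snak.get("snaktype") != "value":
                continue
            result[title] = snak["datavalue"]["value"]
            break
    return result


# Hypothetical, heavily trimmed response: one item with P8, one missing title.
sample = json.dumps({"entities": {
    "Q1": {"sitelinks": {"wiki": {"site": "wiki", "title": "Key:example"}},
           "claims": {"P8": [{"rank": "normal", "mainsnak": {
               "snaktype": "value", "property": "P8",
               "datavalue": {"value": "https://example.org/$1", "type": "string"}}}]}},
    "-1": {"site": "wiki", "title": "Key:nothing", "missing": ""},
}})
print(formatters_from_wbgetentities(sample))  # {'Key:example': 'https://example.org/$1'}
```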
comment:18 by , 5 years ago
Keywords: | privacy added |
---|
comment:19 by , 5 years ago
Replying to nyurik:
@stoecker, sorry, I just saw this ticket. What issues are you having with Sophox? Could you paste the specific SPARQL query you are running against it? Also, please join https://osmus-slack.herokuapp.com/ (OSM Slack); we can discuss it there in the #sophox or #josm channels. (Ping nyurik.) Thanks!
After setting up caching in #18599, I see the following result for Sophox: "Query string present but no explicit expiration time". Do you have any influence over the server that sends the data? If so, it would be good if the server sent an Expires header with its SPARQL responses, so that caching works. That would also reduce the load on the server.
For example, like Wikipedia, add something like Cache-Control: public, max-age=300 or any other valid time. Probably more important is the Last-Modified header.
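The quoted message appears to come from a shared cache (it matches Apache mod_cache's wording) that refuses to store responses for URLs with a query string unless they carry an explicit lifetime. The sketch below is a simplified approximation of that check: it reads `max-age` from `Cache-Control`, falls back to `Expires`, and reports `None` when neither is present. Other directives (`no-store`, `s-maxage`, ...) and `Last-Modified`-based heuristics are deliberately ignored; the function name is hypothetical.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def explicit_lifetime(headers: dict[str, str], now: datetime) -> Optional[float]:
    """Seconds a response may be cached, or None when it has no explicit expiration."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    m = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", h.get("cache-control", ""), re.I)
    if m:
        return float(m.group(1))  # max-age takes precedence over Expires
    if "expires" in h:
        try:
            return max(0.0, (parsedate_to_datetime(h["expires"]) - now).total_seconds())
        except (TypeError, ValueError):
            return 0.0  # an unparsable Expires counts as already expired
    return None


now = datetime(2020, 1, 1, tzinfo=timezone.utc)
print(explicit_lifetime({"Cache-Control": "public, max-age=300"}, now))  # 300.0
print(explicit_lifetime({"Last-Modified": "Tue, 31 Dec 2019 00:00:00 GMT"}, now))  # None
```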
In 15677/josm: