#22991 closed enhancement (fixed)
[PATCH] Improve precision for boundaries + add subdivisions of Indonesia + add autonomous regions of various countries
Reported by: | westnordost | Owned by: | team |
---|---|---|---|
Priority: | normal | Milestone: | 23.06 |
Component: | Core | Version: | |
Keywords: | boundaries | Cc: |
Description (last modified by )
Changes overview:
- Revised all borders to fulfill the "Villages and major roads should be on the right side of the border" guideline in terms of precision.
- Added regions (islands) of Indonesia
- Added autonomous / self-administered subdivisions
For each change, the plus in boundaries.osm size is noted.
(I would have liked to show a visual (geospatial) diff but I have not found a way to properly do that. The geojson-diff ruby gem used by github when looking at a geojson diff may work, but at least on github it tells me that the diff is too large to be displayed. I tried something with QGIS, but I also failed with that - haven't used QGIS before, though.)
1. Revised all borders to fulfill the guideline in terms of precision (+691 KB)
"Villages and major roads should be on the right side of the border". This is the guideline I followed with earlier contributions. Yes, this was an extreme amount of work 😰
Most changes were necessary in Africa, India and China because these have become somewhat less blank in the years since this file was first created; hence, more villages are visible when tracing the boundaries from the OSM map.
E.g. IIRC in the Congo, an area the size of Sicily (+towns) was on the wrong side of the border. In India and China, also thousands of villages were on the wrong side of the province border.
Despite the greatly increased size (= increased precision), the precision in China and India may still not fully meet the cited guideline, even in places where the villages are already mapped, because the borders are sometimes so bonkers that they go straight through populated areas. See e.g. the border between Madhya Pradesh and Uttar Pradesh: https://www.openstreetmap.org/relation/1950071#map=11/25.2847/78.9220
(For some borders, especially in India, I also doubt that what is mapped in OSM is actually correct/precise because sometimes they have almost the same shape as a river that is mapped half a kilometer in another direction and things like that. So, no need for this data to be so precise either, then.)
I expect the file to grow somewhat further when Africa, India and China get even less blank over time. In retrospect, the cited guideline is a high goal to aim for. At the same time, having every road on the right side of the border would be even better. After all, it also matters near the border on which side you have to drive, and in which language the POIs and street names are, for example.
But at least the density of (named) roads is somewhat higher in populated areas (villages etc.), so I think this is a good compromise.
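The guideline can be checked mechanically for a given simplified boundary. Below is a minimal sketch, assuming each village is reduced to a single (lon, lat) point and the boundary to one closed ring. The function names and the sample coordinates are made up for illustration and are not part of JOSM or boundaries.osm:

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (lon, lat)


def point_in_polygon(point: Point, ring: List[Point]) -> bool:
    """Ray-casting test: True if the point lies inside the closed ring."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def villages_on_wrong_side(villages: Dict[str, Point], ring: List[Point]) -> List[str]:
    """Names of villages that belong to the region but fall outside the ring."""
    return sorted(name for name, pos in villages.items()
                  if not point_in_polygon(pos, ring))


# Hypothetical simplified boundary (a unit square) and two villages of that region.
SQUARE = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
VILLAGES = {"A": (0.5, 0.5), "B": (1.5, 0.5)}

if __name__ == "__main__":
    print(villages_on_wrong_side(VILLAGES, SQUARE))
```

Village B belongs to the region but lies outside the simplified ring, so it is reported. A real check would also have to handle multipolygons and inner rings, which this sketch ignores.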
2. Added regions (islands) of Indonesia (+7 KB)
Indonesia is a big and populous country with significant cultural and language differences. (Indonesia is one of the most linguistically diverse countries in the world, with 500+ languages spoken, IIRC.) Also, it is very cheap to add these, because they are islands (= little added geometry).
Added Indonesian subdivisions:
3. Added autonomous / self-administered subdivisions (+178 KB)
Autonomous regions have a great deal of autonomy and hence have the power to have different legislation for things that concern OpenStreetMap, such as traffic regulations (e.g. Scotland), have different official languages (e.g. Basque Country) or open data in general.
In general, I think this is a useful, consistent criterion for which country subdivisions to include in this file: apart from the subdivisions of large (federated) countries (US, CA, AU, IN, CN, ID, ...), include those subdivisions that have (a certain degree of) autonomy, such as the republics within Russia.
Source: https://en.wikipedia.org/wiki/List_of_autonomous_areas_by_country
The following were omitted because there are no ISO 3166 codes for the self-administered zones / autonomous districts / counties that the source lists as autonomous: Bolivia, Myanmar, India, China, Somalia.
Antigua and Barbuda (+1 KB):
- Barbuda
Bosnia and Herzegovina (+24 KB):
(all regions are autonomous)
- Federation of Bosnia and Herzegovina
- Republika Srpska
- Brčko District
Comoros (+1 KB):
(all regions are autonomous)
- Anjouan
- Grande Comore
- Mohéli
Fiji (+1 KB):
- Rotuma (1)
Georgia (+5 KB):
Greece (+1 KB):
- Monastic community of Mount Athos
Indonesia (+7 KB):
- Yogyakarta (a sultanate within the Republic of Indonesia)
- Aceh (Sharia law is official law)
- Papua (1)
- West Papua (1)
- Highland Papua (1)
- Central Papua (1)
- South Papua (1)
Iraq (+12 KB):
- Kurdistan Region (1)
Italy (+10 KB):
Mauritius (+1 KB):
- Rodrigues
Moldova (+7 KB):
- Administrative-Territorial Units of the Left Bank of the Dniester (1), (2) (as Transnistria)
- Bender (1), (2) (as Transnistria)
- Gagauzia (1)
Nicaragua (+3 KB):
- North Caribbean Coast Autonomous Region
- South Caribbean Coast Autonomous Region
Pakistan (+5 KB):
- Azad Kashmir
Papua New Guinea (+1 KB):
- Bougainville (intends to become independent in a few years)
Philippines (+6 KB):
- Bangsamoro (Muslim-majority region)
Portugal (+1 KB):
- Azores
- Madeira
Saint Kitts and Nevis (+1 KB):
- Nevis
- Saint Kitts
São Tomé and Principe (+1 KB):
- Principe
Serbia (+4 KB):
- Vojvodina (1)
South Korea (+1 KB):
- Jeju Province
Spain (+68 KB):
(all regions are autonomous)
- Andalusia
- Aragon (1)
- Asturias (1)
- Balearic Islands (1)
- Basque Country (1)
- Canary Islands
- Cantabria
- Castile and León
- Castilla-La Mancha
- Catalonia (1)
- Extremadura
- Galicia (1)
- La Rioja
- Madrid
- Murcia
- Navarre (1)
- Valencia (1)
Tajikistan (+5 KB):
- Badakhshan Mountainous Autonomous Region
Tanzania (+2 KB):
(actually, Zanzibar+Pemba are autonomous as one but there is no ISO code for all of the subdivisions combined)
Trinidad and Tobago (+1 KB):
- Tobago
United Kingdom (+7 KB):
- Wales (1)
- England (not an autonomous region, but the only subdivision that was still missing to have them all)
Uzbekistan (+2 KB):
- Karakalpakstan (1)
(1) different official languages
(2) currently independent state with limited international recognition
Attachments (1)
Change History (12)
by , 22 months ago
Attachment: | boundaries_new.osm added |
---|
comment:1 by , 22 months ago
Description: | modified (diff) |
---|
comment:2 by , 22 months ago
Continuing from the discussion in #22835, for your information:
The generated file sizes from this data for the countryboundaries library are:
File | Size | Zipped |
---|---|---|
boundaries60x30.ser | 298 KB | 180 KB |
boundaries180x90.ser | 518 KB | 226 KB |
comment:3 by , 22 months ago
Milestone: | → 23.06 |
---|
This brings up the in-memory representation from 3.52 MiB to 4.68 MiB. I think this will be survivable for most people.
comment:4 by , 22 months ago
FML. I'm going to have to go through it and renumber everything. Please use the dataset from JOSM Preferences -> Advanced Preferences -> More... -> Edit boundaries in the future! (You do need to start JOSM with --debug).
follow-up: 7 comment:5 by , 22 months ago
Sorry, I did, but only for the first session. I worked on it for more than a week or so (on and off). For the subsequent sessions, I loaded the saved file.
How do you renumber all the nodes?
comment:6 by , 22 months ago
According to my back-of-the-napkin calculation, renumbering all the ids to start at -1 will lead to a file size decrease of about 120 KB.
comment:7 by , 22 months ago
Replying to westnordost:
> Sorry, I did, but only for the first session. I worked on it for more than a week or so (on and off). For the subsequent sessions, I loaded the saved file.
That makes it a bit harder. We cannot use the id from the file for various reasons (which isn't good). I don't know what you could have done, since it was multiple edit sessions.
> How do you renumber all the nodes?
This time I'm just doing osmium renumber --start-id=-1 ~/Downloads/boundaries_new.osm -o resources/data/boundaries.osm --overwrite. I don't want to do that in the future, as it is going to effectively reset the ids. This also decreased the raw size of the file from 2930071 bytes to 2642628 bytes (-287443 bytes).
EDIT: It looks like part of that size difference might be due to trimming a space right before the />.
Anyway, max ids:
- Node: -25351 (down from -140719)
- Way: -578 (down from -112911)
- Relation: -55 (down from -99820)
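For illustration, the effect of the renumbering can be sketched in Python. This is not how osmium implements it; it is a simplified stand-in that assumes the file holds only node, way and relation elements with nd/member references, and that each element type is numbered independently from -1 downwards (as the max ids above suggest). The function name renumber and the sample XML are hypothetical:

```python
import xml.etree.ElementTree as ET

KINDS = ("node", "way", "relation")


def renumber(root: ET.Element) -> None:
    """Rewrite ids in place so each element type counts down from -1."""
    mapping = {kind: {} for kind in KINDS}

    # First pass: decide the new id for every element, in document order.
    for kind in KINDS:
        for elem in root.iter(kind):
            mapping[kind][elem.get("id")] = str(-(len(mapping[kind]) + 1))

    # Second pass: apply the new ids and fix up every reference.
    for kind in KINDS:
        for elem in root.iter(kind):
            elem.set("id", mapping[kind][elem.get("id")])
    for nd in root.iter("nd"):
        ref = nd.get("ref")
        nd.set("ref", mapping["node"].get(ref, ref))
    for member in root.iter("member"):
        ref = member.get("ref")
        # Unknown member types or dangling refs are left unchanged.
        member.set("ref", mapping.get(member.get("type"), {}).get(ref, ref))


# Made-up sample with large negative ids, as left behind by several edit sessions.
SAMPLE = (
    "<osm version='0.6'>"
    "<node id='-140719' lat='0' lon='0'/>"
    "<node id='-140720' lat='0' lon='1'/>"
    "<way id='-112911'><nd ref='-140719'/><nd ref='-140720'/></way>"
    "<relation id='-99820'><member type='way' ref='-112911' role='outer'/></relation>"
    "</osm>"
)

if __name__ == "__main__":
    tree_root = ET.fromstring(SAMPLE)
    renumber(tree_root)
    print(ET.tostring(tree_root, encoding="unicode"))
```

The size decrease comes from the shorter id and ref attribute values: every id that shrinks from six digits to a few digits saves bytes at its definition and at each place it is referenced.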
comment:8 by , 22 months ago
Hm, nice. Double of what I estimated. (Now, remove the indentation and newlines ;-) )
comment:9 by , 22 months ago
> Now, remove the indentation and newlines
Yeah, no (at least not the source). Maybe in a processing step prior to putting it in the jar, but I don't think it has a significant enough decrease to justify the time.
It is a pretty big decrease from a raw file standpoint (down to 2277203 bytes, -652868 bytes or ~637 KiB), but the compressed difference isn't that much (469510 down to 455420, -14090, or ~14 kb).
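The gap between raw and compressed savings can be reproduced on synthetic data. The following is a rough sketch, not the measurement used above: it assumes the jar entry is DEFLATE-compressed (approximated here with zlib), and the helper names and the generated sample file are made up for illustration:

```python
import zlib


def strip_layout(xml_text: str) -> str:
    """Drop indentation and newlines; everything else is left untouched."""
    return "".join(line.strip() for line in xml_text.splitlines())


def deflated_size(text: str) -> int:
    """Size in bytes after DEFLATE at the highest compression level."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def compare(xml_text: str) -> dict:
    """Bytes saved by stripping layout, before and after compression."""
    stripped = strip_layout(xml_text)
    return {
        "raw_saved": len(xml_text.encode("utf-8")) - len(stripped.encode("utf-8")),
        "packed_saved": deflated_size(xml_text) - deflated_size(stripped),
    }


def make_sample(count: int) -> str:
    """Generate an indented, OSM-like document with `count` nodes (synthetic)."""
    lines = ["<osm version='0.6'>"]
    for i in range(1, count + 1):
        lat = (i * 37 % 18000) / 100 - 90
        lon = (i * 91 % 36000) / 100 - 180
        lines.append(f"  <node id='-{i}' lat='{lat:.2f}' lon='{lon:.2f}'/>")
    lines.append("</osm>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(compare(make_sample(1000)))
```

Indentation and newlines are highly repetitive, so the compressor already encodes them cheaply. Removing them shrinks the raw file far more than the compressed one.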
new boundaries.osm file