#22835 closed enhancement (fixed)
[PATCH] add republics + autonomous okrugs/oblasts for Russia
Reported by: | westnordost | Owned by: | team |
---|---|---|---|
Priority: | normal | Milestone: | 23.04 |
Component: | Core | Version: | |
Keywords: | boundaries | Cc: |
Description
Rationale:
Some of the federal subjects of Russia have a certain amount of autonomy. Nominally, each of the republics has its own constitution and its own legislature.
This means that these regions may have different traffic legislation and other rules relevant to OpenStreetMap. For example, each of them also has its own official languages alongside Russian. See https://github.com/streetcomplete/countrymetadata/blob/master/data/officialLanguages.yml#L305-L330
Guideline followed in terms of precision:
Settlements and at least the more important roads must be on the right side of the border.
More information:
Attachments (1)
Change History (18)
comment:1 by , 2 years ago
comment:3 by , 2 years ago
Ah, indeed there was: something at the 180th meridian. In fact, the current version of the boundaries file does have issues at the 180th meridian: a slim strip near the 180th meridian in northeastern Russia would count as not being within Russia at all, because the coordinates were not placed exactly at longitude -180 / +180. I fixed this in the attached file.
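A ring that is meant to end on the antimeridian but stops a tiny bit short of it leaves exactly this kind of sliver: points between its last vertex and ±180 fall outside every polygon. Below is a minimal sketch of one way to catch that, assuming vertices stored as `{lon, lat}` double pairs and rings that are already split at the antimeridian. The class `AntimeridianSnap` and the tolerance value are hypothetical, and this is not necessarily how the attached file was fixed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: snaps longitudes within a tolerance of the antimeridian to exactly -180 or +180. */
final class AntimeridianSnap {

    // Hypothetical tolerance in degrees, chosen only for illustration.
    static final double TOLERANCE_DEG = 1e-4;

    /** Returns lon unchanged unless it lies within TOLERANCE_DEG of +180 or -180 (input assumed in [-180, 180]). */
    static double snapLongitude(double lon) {
        if (180.0 - lon < TOLERANCE_DEG) return 180.0;
        if (lon + 180.0 < TOLERANCE_DEG) return -180.0;
        return lon;
    }

    /** Applies snapLongitude to every vertex of a ring given as {lon, lat} pairs; latitudes are untouched. */
    static List<double[]> snapRing(List<double[]> ring) {
        List<double[]> out = new ArrayList<>(ring.size());
        for (double[] p : ring) {
            out.add(new double[] {snapLongitude(p[0]), p[1]});
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(snapLongitude(179.99995));  // 180.0
        System.out.println(snapLongitude(-179.99999)); // -180.0
        System.out.println(snapLongitude(179.5));      // 179.5 (too far away, left as-is)
    }
}
```

The tolerance has to stay well below the precision the boundaries are drawn at; a large value would start moving genuine border vertices onto the meridian.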
by , 2 years ago
Attachment: | boundaries.osm added |
---|
new boundaries.osm file - I found it may be easier to supply the whole file rather than a .patch file
comment:4 by , 2 years ago
Replying to westnordost:
> Is there a problem with my patch?
I haven't looked. I've just been horrible about looking at patches.
comment:5 by , 2 years ago
OK. One semi-major problem:
The file is 8% larger than it was previously.
Current boundaries.osm: 1958203 (336696) bytes
New boundaries.osm: 2117422 (352771) bytes
Difference: 159219 (16075) bytes
% Increase: 8.1% (4.8%)
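As a quick check, the difference and percentage figures can be recomputed from the byte counts. This is a minimal sketch; the parenthesized numbers are assumed here to be compressed sizes, which the comment does not state explicitly.

```java
/** Recomputes the size difference and percentage increase quoted for boundaries.osm. */
final class SizeDelta {

    /** Percentage increase from oldBytes to newBytes. */
    static double percentIncrease(long oldBytes, long newBytes) {
        return 100.0 * (newBytes - oldBytes) / oldBytes;
    }

    public static void main(String[] args) {
        long oldRaw = 1_958_203, newRaw = 2_117_422;
        // Parenthesized figures from the comment, assumed to be compressed sizes.
        long oldPacked = 336_696, newPacked = 352_771;
        System.out.printf("raw:    +%d bytes (%.1f%%)%n", newRaw - oldRaw, percentIncrease(oldRaw, newRaw));
        System.out.printf("packed: +%d bytes (%.1f%%)%n", newPacked - oldPacked, percentIncrease(oldPacked, newPacked));
    }
}
```

Run as-is, it prints +159219 bytes (8.1%) for the raw file and +16075 bytes (4.8%) for the parenthesized sizes.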
With that said, since they may have significantly different laws/mapping practices from the rest of Russia, I'm inclined to merge it. I'm just starting to feel that we should look into splitting the boundaries.osm file into subfiles. I'll probably revert some of the changes, since I don't think they should be part of this change (specifically, the fiddling with Russia's border).
follow-up: 7 comment:6 by , 2 years ago
Into subfiles? Hmm, what good is that for? Subfiles or not, the file is included (zipped) in the distribution, so it doesn't really matter if it is in one file or several. A slightly larger file size doesn't seem to be too relevant to me, as zipped it is still somewhat small (IMO).
Also, my Java library https://github.com/westnordost/countryboundaries uses this file as a source (which is why I contribute to this file so often). Better to have one such source file everyone contributes to than everyone doing his own thing. For me / for the use in that library, it would be most inconvenient if the information was scattered throughout several files.
If you are either...
- concerned about the file size in the distribution or
- speed of point-in-polygon queries with that many polygons around
... you may want to consider using this library, too. It is (AFAIK) currently used by StreetComplete and Vespucci. The upcoming version will use a revised serialization format that further shrinks the distribution file to 200 to 900 KB while containing the same data as the .osm file. The bigger the file, the faster a point-in-country query will be on average, so which size to use depends on how frequently JOSM needs to look up the country a point is located in.
Now, anyway, for countryboundaries, it is actually beneficial if the boundaries do not span half the ocean but keep closer to the coast even if that means that there are a few more vertices. It makes the file size even smaller (and queries in the ocean faster 🤷♀️).
So if we either go separate ways (i.e. I fork the boundaries.osm file) or JOSM starts to use the library too, I'd actually go all-in with the border fiddling on oceans.
I've not been doing that so far because I know JOSM / the OSM data model has different requirements here. For Russia, I only did it insofar as the newly added republics etc. allowed it, if I remember correctly.
comment:7 by , 2 years ago
Replying to westnordost:
> Into subfiles? Hmm, what good is that for? Subfiles or not, the file is included (zipped) in the distribution, so it doesn't really matter if it is in one file or several. A slightly larger file size doesn't seem to be too relevant to me, as zipped it is still somewhat small (IMO).
I was thinking about memory usage: the file is loaded into memory by JOSM. If we avoided loading subregions until we really need them, we could save some memory. I did just look, and I think it won't be as much of an issue as I thought, since we have a tiling system for it.
I also did a memory dump, and it looks like memory usage probably won't be an issue:
- 3,568,832 bytes (retained, original)
- 3,695,188 bytes (retained, new)
I think most users can handle an additional ~0.13 MB of memory usage.
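For illustration, the idea of only loading subregions when they are actually needed could look roughly like the following minimal sketch. `LazyRegionData` and its loader function are hypothetical names made up for this example; they are not JOSM's API, and the region id used is just an ISO 3166-2 style example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical sketch: data for a subregion (e.g. one sub-file per country) is
 * only loaded the first time it is requested, and cached afterwards.
 */
final class LazyRegionData<T> {
    private final Map<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> loader; // must not return null (computeIfAbsent would not cache it)

    LazyRegionData(Function<String, T> loader) {
        this.loader = loader;
    }

    /** Loads the data for regionId on first access; later calls return the cached value. */
    T get(String regionId) {
        return cache.computeIfAbsent(regionId, loader);
    }

    /** Number of regions loaded so far. */
    int loadedCount() {
        return cache.size();
    }

    public static void main(String[] args) {
        // Stand-in loader; a real one would parse a (hypothetical) per-region file.
        LazyRegionData<String> data = new LazyRegionData<>(id -> "polygons of " + id);
        System.out.println(data.loadedCount()); // 0: nothing loaded yet
        data.get("RU-TA");
        data.get("RU-TA");
        System.out.println(data.loadedCount()); // 1: loaded once, then served from the cache
    }
}
```

The trade-off is that the first query touching a region pays the loading cost, which matters if lookups happen on the UI thread.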
It looks like your library has a raster approach of some kind, which is a bit different from what is currently used in JOSM.
> Also, my Java library https://github.com/westnordost/countryboundaries uses this file as a source (which is why I contribute to this file so often). Better to have one such source file everyone contributes to than everyone doing his own thing. For me / for the use in that library, it would be most inconvenient if the information was scattered throughout several files.
I'll give up on the splitting idea then -- I'd rather have a single source for everyone, than have multiple different locations that need to be updated.
> If you are either...
> - concerned about the file size in the distribution or
> - speed of point-in-polygon queries with that many polygons around
> ... you may want to consider using this library, too. [...snip...]
I, personally, am open to it. But I'd rather discuss that in a different ticket with other maintainers, who may not like adding another dependency to JOSM.
> Now, anyway, for countryboundaries, it is actually beneficial if the boundaries do not span half the ocean but keep closer to the coast even if that means that there are a few more vertices. It makes the file size even smaller (and queries in the ocean faster 🤷♀️).
That is fair. We do have two conflicting approaches to how the data is used, which makes it hard to optimize for size for both.
comment:9 by , 2 years ago
Milestone: | → 23.04 |
---|
follow-up: 11 comment:10 by , 2 years ago
JOSM only uses 3.5 MB of memory? Even right after a fresh start without any data loaded, that sounds awfully little. I would have expected the presets alone to take several MB, not counting all the UI.
Yes, the approach is to split the data up *beforehand* into a raster. That data is then saved in a serialized format and can be loaded into memory again, so the splitting does not need to be done on every load. Querying is then very fast, because in most cases it is a simple lookup in a two-dimensional array. Point-in-polygon checks are only needed in border regions, and even then only against the very few, very small polygons in that particular raster cell.
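A minimal sketch of the raster idea described above, assuming an equirectangular grid over the whole world where each cell either lies entirely within one country or stores the few polygons clipped to it. `RasterLookup`, `Cell` and `Area` are hypothetical names, not the countryboundaries API, and the precomputation step (clipping polygons to cells) is left out. The point-in-polygon check is a standard even-odd ray casting test. Requires Java 16+ for records.

```java
import java.util.List;

/** Hypothetical sketch of a raster-based point-in-country lookup; not the countryboundaries API. */
final class RasterLookup {

    /** A polygon (single ring of {lon, lat} vertices) belonging to one country or region. */
    record Area(String countryId, double[][] ring) {}

    /** Either wholly inside one country (wholeCountryId != null) or holding the polygons clipped to it. */
    record Cell(String wholeCountryId, List<Area> borderAreas) {}

    private final Cell[][] cells; // cells[y][x], row 0 at the north pole
    private final int width, height;

    RasterLookup(Cell[][] cells) {
        this.cells = cells;
        this.height = cells.length;
        this.width = cells[0].length;
    }

    /** Returns the first country id containing (lon, lat), or null if none. */
    String countryAt(double lon, double lat) {
        int x = Math.min(width - 1, (int) ((lon + 180.0) / 360.0 * width));
        int y = Math.min(height - 1, (int) ((90.0 - lat) / 180.0 * height));
        Cell cell = cells[y][x];
        if (cell.wholeCountryId() != null) return cell.wholeCountryId(); // common case: array lookup only
        for (Area a : cell.borderAreas()) {                                // border cell: few small polygons
            if (contains(a.ring(), lon, lat)) return a.countryId();
        }
        return null;
    }

    /** Even-odd ray casting point-in-polygon test. */
    static boolean contains(double[][] ring, double x, double y) {
        boolean inside = false;
        for (int i = 0, j = ring.length - 1; i < ring.length; j = i++) {
            double xi = ring[i][0], yi = ring[i][1], xj = ring[j][0], yj = ring[j][1];
            if ((yi > y) != (yj > y) && x < (xj - xi) * (y - yi) / (yj - yi) + xi) {
                inside = !inside;
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        double[][] square = {{0, -45}, {90, -45}, {90, 45}, {0, 45}};
        Cell[][] cells = {{
            new Cell("A", List.of()),                      // western half: wholly inside country A
            new Cell(null, List.of(new Area("B", square))) // eastern half: border cell
        }};
        RasterLookup lookup = new RasterLookup(cells);
        System.out.println(lookup.countryAt(-50, 0)); // A
        System.out.println(lookup.countryAt(10, 0));  // B
        System.out.println(lookup.countryAt(120, 0)); // null
    }
}
```

This sketch returns only the first match. With nested areas such as Russia and one of its republics, a point lies in several areas at once, so a real lookup would likely need to return all of them.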
The library is reasonably small (11 KB) and has no dependencies.
comment:11 by , 2 years ago
Replying to westnordost:
> JOSM only uses 3.5 MB of memory? Even right after a fresh start without any data loaded, that sounds awfully little. I would have expected the presets alone to take several MB, not counting all the UI.
The loaded territories file takes ~3.5 MB of memory. I just looked at the objects that hold the data for the lookup table (GeoPropertyIndex).
comment:13 by , 2 years ago
I just took a look at the data directory of your library, and the sizes of the files are:
- boundaries180x90.ser: 859 KB
- boundaries360x180.ser: 1.74 MB (xz compressed this to 325 KB)
- boundaries60x30.ser: 444 KB
By comparison,
- boundaries.osm: 2 MB
- boundaries.osm.xz: 185 KB
- boundaries.osm.bz: 269 KB
- boundaries.osm.gz: 347 KB
- boundaries.osm.zip: 347 KB
Besides parsing time, is there any particular advantage to your ser files?
I think I'm more inclined to keep our current method, since it means that we will never accidentally delete the source file.
comment:14 by , 2 years ago
For the next major version I was looking into reducing the size of that data (by ~50%). I will be doing the release soon; it is currently on a branch:
https://github.com/westnordost/countryboundaries/tree/new-ser-format
(The main change is that each coordinate is now saved in 2x2 bytes instead of 2x4 bytes. This is possible without loss of precision because the positions are now interpreted as relative to the raster cell, not the whole world.)
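A minimal sketch of cell-relative encoding, assuming the offset of a coordinate within its raster cell is quantized linearly into an unsigned 16-bit value. `CellRelativeCoords` is a hypothetical name, and this is not necessarily how the new format encodes positions. With 65535 steps across a 1° cell (the 360x180 raster), one step is 1/65535°, roughly 1.7 m at the equator; for the 6° cells of the 60x30 raster it would be roughly 10 m.

```java
/** Hypothetical sketch: encode a coordinate as an unsigned 16-bit offset within its raster cell. */
final class CellRelativeCoords {

    static final int STEPS = 0xFFFF; // 65535 quantization steps across one cell

    /** Encodes value (degrees) within [cellMin, cellMin + cellSize] as an unsigned 16-bit number stored in a short. */
    static short encode(double value, double cellMin, double cellSize) {
        double t = (value - cellMin) / cellSize; // 0.0 .. 1.0 inside the cell
        int q = (int) Math.round(t * STEPS);
        if (q < 0 || q > STEPS) throw new IllegalArgumentException("value outside cell: " + value);
        return (short) q; // values above 32767 wrap to negative shorts; decode reads them back as unsigned
    }

    /** Decodes a short produced by encode back to degrees. */
    static double decode(short encoded, double cellMin, double cellSize) {
        int q = Short.toUnsignedInt(encoded);
        return cellMin + (double) q / STEPS * cellSize;
    }

    public static void main(String[] args) {
        // 360x180 raster: each cell spans 1 x 1 degree; this one covers lon 49..50.
        double lon = 49.123456;
        short s = encode(lon, 49.0, 1.0);
        double back = decode(s, 49.0, 1.0);
        System.out.println(Math.abs(back - lon) <= 0.5 / STEPS); // true: error is at most half a step
    }
}
```

Two shorts per vertex instead of two 32-bit values is where the halving of the coordinate data comes from; the decoder only needs to know which cell a vertex belongs to.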
---
The main advantage of countryboundaries is speed for the query. This is why I created it. StreetComplete needs to do a lot of point-in-country checks.
Side effects are a smaller unpacked file size and a format that is faster to parse (no XML parsing, just a byte stream read directly into the data structures), and therefore a faster loading time.
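To illustrate why a binary byte stream loads faster than XML, here is a minimal sketch that writes and reads one ring as fixed-width fields using `DataOutputStream` / `DataInputStream`. The layout (`[int vertexCount]` followed by `[short x][short y]` per vertex) and the class name `RingCodec` are hypothetical; this is not the actual countryboundaries file format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical binary layout for one ring: [int vertexCount] then vertexCount x [short x][short y]. */
final class RingCodec {

    /** Serializes parallel coordinate arrays (xs.length == ys.length assumed) into big-endian bytes. */
    static byte[] write(short[] xs, short[] ys) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(xs.length);
            for (int i = 0; i < xs.length; i++) {
                out.writeShort(xs[i]);
                out.writeShort(ys[i]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with in-memory streams
        }
        return bytes.toByteArray();
    }

    /** Reads the ring back: result[0] holds the x values, result[1] the y values. */
    static short[][] read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int n = in.readInt();
            short[] xs = new short[n];
            short[] ys = new short[n];
            for (int i = 0; i < n; i++) {
                // Fixed-width fields: no tokenizing and no text-to-number conversion, unlike XML.
                xs[i] = in.readShort();
                ys[i] = in.readShort();
            }
            return new short[][] {xs, ys};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        short[] xs = {0, 100, (short) 65535};
        short[] ys = {0, 200, 300};
        byte[] data = write(xs, ys);
        System.out.println(data.length); // 16: 4-byte count + 3 vertices * 4 bytes
        short[][] ring = read(data);
        System.out.println(ring[0][2] == xs[2] && ring[1][1] == 200); // true
    }
}
```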
The next major version should also have a considerably smaller packed size than the .osm format. But well, 500 KB more or less doesn't make such a difference nowadays.
comment:15 by , 2 years ago
boundaries360x180.zip is 252KB then.
boundaries60x30.zip is 140KB then.
Other formats are probably irrelevant since a JAR is a ZIP.
comment:16 by , 2 years ago
Regarding accidentally deleting the source file: if you use the countryboundaries library, it would make more sense to move that source file into its own repo (if you want to maintain ownership / shared ownership) or probably into the countryboundaries repo (if I should manage PRs) anyway.
comment:17 by , 2 years ago
I released countryboundaries 2.0 now. https://github.com/westnordost/countryboundaries/releases/tag/v2.0
It should appear on mvnrepository soon.
Thinking about this... some states/provinces of other countries come to mind where a second language is co-official too. Off the top of my head: South Tyrol (Italy), Wales (GB), Galicia + Basque Country + Catalonia ... (Spain). But I'd add these in a separate ticket, and of course only these regions, not all states of these countries.