
Opened 10 years ago

Last modified 16 months ago

#10101 new enhancement

Improve JOSM performances with large datasets

Reported by: ggeldenhuis
Owned by: team
Priority: normal
Component: Core
Keywords: performance

Description

Hi
I have a large dataset that I am busy importing. It is a shapefile that has had the following actions performed on it:

  • Simplified ways in ArcGIS (its choice of simplification methods is better)
  • Removed all but one field
  • Re-projected for OSM/JOSM compatibility.

The shapefile was then imported into JOSM (7214) and saved as an OSM file.

My workflow is as follows:

  • Open OSM file
  • Select data to import
  • Remove tags (Slow)
  • Copy data to clipboard (Slow)
  • Create new layer
  • Download from OSM
  • Copy data
  • Do corrections and further way simplification
  • Upload to OSM
  • Switch back to OSM file layer
  • Delete the data that was previously selected (SLOW: it takes 2-3 hours at 100% CPU usage)

JOSM keeps working, but certain actions, specifically the ones marked as slow, take a very long time to complete.

I am using JOSM 7214 and the latest Java 7 available for Mac.
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)

JOSM is started as follows: java -Xms4096m -jar Downloads/josm-latest.jar
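Note that -Xms sets the JVM's initial heap size, while -Xmx sets its maximum; the command above passes only -Xms4096m. A minimal sketch for checking which heap limit a JVM actually received, using the standard Runtime API (the class name HeapCheck is arbitrary):

```java
// Minimal sketch: report the heap limits of the running JVM.
final class HeapCheck {
    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most the heap may grow to (governed by -Xmx or the JVM default)
        System.out.println("max heap:     " + toMiB(rt.maxMemory()) + " MiB");
        // totalMemory(): heap currently reserved; initially close to the -Xms value
        System.out.println("current heap: " + toMiB(rt.totalMemory()) + " MiB");
        // freeMemory(): unused part of the currently reserved heap
        System.out.println("free in heap: " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

Compile it and run it with the same memory flags as JOSM (e.g. java -Xms4096m HeapCheck), then compare the reported maximum with what you intended.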

I can share the original shapefile and/or the converted OSM file if it would help.

Attachments (1)

Screenshot20230601.png (941.6 KB) - added by anonymous 18 months ago.
PBF of Malaysia when it is just opened


Change History (22)

comment:1 by stoecker, 10 years ago

What does "large" mean?

comment:2 by Don-vip, 10 years ago

You can't expect the same performance with a million (or more) objects as with a nominal dataset.

If you really want to operate on very large files, you should first disable the styled rendering, for example by switching on wireframe mode (Ctrl+W); that should speed things up a little.

comment:3 by ggeldenhuis, 10 years ago

Large means a 23 MB OSM file. The problem is that even if I work with subsets of the data and try to delete, say, 20,000 points, I run into this CPU-bound problem where JOSM takes a very long time to respond after the deletion.

It's not a million data points, and it would be unreasonable to expect the same performance for that, but I feel that working with 20k-50k points at a time should work.

comment:4 by stoecker, 10 years ago

I had no problem importing a dataset containing all residential streets in Germany (it took nearly 20 GB of memory), but it took several minutes to react to user actions.

Changing the software to be really responsive for large datasets is a lot of work. It means partial data handling, caching of display status and many, many optimizations in display handling and data processing, so that only the minimum amount of data is touched for each action. I've seen that even commercial software like ArcGIS has problems with really large datasets.

None of the core developers is able to do that at the moment.

comment:5 by ggeldenhuis, 10 years ago

Thank you for the response.
I am having problems with a much smaller dataset than 20 GB, which is why I took the time to report it. I appreciate (or at least try to) the additional effort that would be required to handle large datasets better. That being said, I gave JOSM 4 GB of RAM and I have a MacBook Pro with an SSD, so it should be really fast for the most part. Accessing the same dataset in ArcGIS on a Windows VM on the same laptop is a lot faster.

I appreciate all the effort that goes into developing JOSM, as it is my tool of choice, and it is a shame that more money is not being spent on developing it. I would love to see the same level of investment in JOSM as in iD...

Changing to wireframe mode did almost nothing for me speed-wise. There are a huge number of relations in the original dataset, which might be contributing to the slowness.

comment:6 by stoecker, 10 years ago

If you can get funding for paid development, you can ask, e.g., my company to let me improve JOSM. But it's not cheap. :-) I don't know about the other main devs, but they could probably do the same. Expect something in the range of several man-months for such improvements.

comment:7 by malenki, 10 years ago

To speed up your workflow, it might help to use processing tools like osmfilter and osmosis.
Additionally, you could just split your data into smaller chunks.
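Splitting by area is one simple way to chunk the data. A minimal sketch, assuming nodes are available as plain (id, lat, lon) records and a fixed cell size in degrees; Node and GridSplitter are hypothetical names, and this says nothing about how osmfilter or osmosis work internally:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical node record for illustration; not a JOSM or osmosis type.
record Node(long id, double lat, double lon) {}

final class GridSplitter {
    /**
     * Groups nodes into square lat/lon cells of the given size in degrees.
     * The key "row:col" identifies a cell; each cell could be saved and imported separately.
     */
    static Map<String, List<Node>> split(List<Node> nodes, double cellDeg) {
        Map<String, List<Node>> cells = new TreeMap<>();
        for (Node n : nodes) {
            // floor() so that negative coordinates land in the correct cell
            long row = (long) Math.floor(n.lat() / cellDeg);
            long col = (long) Math.floor(n.lon() / cellDeg);
            cells.computeIfAbsent(row + ":" + col, k -> new ArrayList<>()).add(n);
        }
        return cells;
    }
}
```

Ways and relations that cross cell borders need extra handling (for example, assigning each one to the cell of its first node), which this sketch leaves out.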

Out of curiosity: which import are you working on?

comment:8 by Don-vip, 10 years ago

Keywords: performance added
Summary: changed from "Working on a large dataset makes JOSM VERY unresponsive" to "Improve JOSM performances with large datasets"
Type: changed from defect to enhancement

comment:9 by Don-vip, 10 years ago

Ticket #10676 has been marked as a duplicate of this ticket.

by anonymous, 18 months ago

Attachment: Screenshot20230601.png added

PBF of Malaysia when it is just opened

comment:10 by ag, 18 months ago

Hi, I know this is old, but I'm experiencing the same issues when opening large .osm or .osm.pbf files.

The simplest way to reproduce this issue is to download a country-level extract that is, say, larger than 100 MB as PBF, with millions of objects, from https://download.geofabrik.de/ (e.g. Malaysia, Singapore).
Parsing takes a while, but it is manageable.
But when it comes to rendering at the lowest zoom level, JOSM attempts to display all of those millions of details at once (I've attached a screenshot to the ticket), so the app freezes and stops responding for 5-10 minutes between clicks.
It also requires running with java -Xmx8192M, probably more for larger files, to display it.

Is there a way, or can something be done, so that at the lowest zoom levels only the boundaries and major roads (e.g. highways) are rendered? Things do improve somewhat when zoomed in to higher zoom levels, say, displaying just a town or part of a city.

Wireframe mode, as suggested, cannot be selected before the PBF map is rendered, and the moment the map is first rendered, a single click causes the app to freeze for the next 5-10 minutes. Can wireframe mode be enabled prior to loading the file, e.g. via some sticky preference?

Something I tried that partially works but is still bad: I used filtering to disable everything except the administrative boundaries. That slightly improved things, but it is still extremely sluggish, with minutes of waiting between clicks.

comment:11 by ag, 18 months ago

I'd just like to say that disabling the Potlatch 2 style, leaving the default OSM style, and enabling wireframe mode helps somewhat at the lowest zoom level, i.e. right after opening the file.
However, enabling wireframe mode requires first opening an OSM map and setting the flag; it then stays sticky, and I can open the large .osm.pbf file (>100 MB). Rendering still takes one to a few minutes, but that is better than freezing for 5-10 minutes on a single click.

I hope wireframe mode can be selected up front as a workaround for opening large (country-level) maps over 100 MB in size.

comment:12 by taylor.smock, 18 months ago

@ag: I've got a WIP patch for rendering OSM data to tiles. See #11487.

It won't render the map any faster (in fact, the first paint will be a bit slower), but it will let the UI respond to events, so you can zoom in/out and pan as it renders the tiles. I would greatly appreciate some feedback on that, if you don't mind applying the patch to JOSM source code and compiling the jar yourself.
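The general idea behind tile-based painting can be sketched as follows. This is not the #11487 patch: TileKey, TileCache and the renderer function are hypothetical, and the sketch assumes each tile can be painted independently on a background thread and cached until the data changes. The point is that the UI thread only ever picks up finished tiles and never waits for painting.

```java
import java.awt.image.BufferedImage;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.function.Function;

// Hypothetical key identifying one tile at a given zoom level.
record TileKey(int zoom, int x, int y) {}

final class TileCache {
    private final Map<TileKey, CompletableFuture<BufferedImage>> tiles = new ConcurrentHashMap<>();
    private final Function<TileKey, BufferedImage> renderer; // hypothetical: paints the data inside one tile
    private final ExecutorService pool;                      // background threads doing the painting

    TileCache(Function<TileKey, BufferedImage> renderer, ExecutorService pool) {
        this.renderer = renderer;
        this.pool = pool;
    }

    /**
     * Called from the UI thread while painting. Never blocks: returns the finished tile
     * if available; otherwise schedules rendering (only once per key) and returns null
     * so the caller can draw a placeholder and repaint later.
     */
    BufferedImage getIfReady(TileKey key) {
        CompletableFuture<BufferedImage> f = tiles.computeIfAbsent(
                key, k -> CompletableFuture.supplyAsync(() -> renderer.apply(k), pool));
        return f.isDone() && !f.isCompletedExceptionally() ? f.join() : null;
    }

    /** Drops all cached tiles, e.g. after the data was edited. */
    void invalidateAll() {
        tiles.clear();
    }
}
```

A caller would draw a placeholder for every null tile and repaint once pending tiles finish; edits would invalidate the cache (or, more selectively, only the affected tiles).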

> Can wireframe mode be enabled prior to loading the file, e.g. via some sticky preference?

Set the advanced preference mappaint.renderer-class-name to org.openstreetmap.josm.data.osm.visitor.paint.WireframeMapRenderer.

comment:13 by ag, 18 months ago

Hi smock,
Thanks very much, I'll take a look at the 'advanced preference' settings.

On a different note, I think it may be necessary for JOSM to use an R-tree or, preferably, an R*-tree implementation
https://github.com/search?q=rtree+language%3AJava&type=repositories&l=Java

when opening large .osm.pbf files (from tens of MB to more than 100 MB), i.e. to create and use an R-tree or R*-tree index.
Even this alone probably isn't adequate; it would likely also need code, alternative indexes and filtering to exclude objects from being rendered at low zoom levels. For example, it doesn't make sense to render millions of details at zoom level 0 or 1 (i.e. fully zoomed out) for a country-level map, down to a single shop, signpost (POI), traffic light or postbox in a large urban sprawl, where those things are probably smaller than a single pixel when displayed.
Perhaps at low zoom, administrative borders, main highways and perhaps large mountain ranges are adequate and displayable for country-level maps.
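A minimal sketch of such a zoom-dependent rule, assuming slippy-map zoom numbers (0 = whole world) and a hand-picked tag list. LowZoomFilter, DETAIL_ZOOM and the chosen tags are illustrative assumptions, not JOSM settings; JOSM map styles can express zoom-dependent rules in MapCSS.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical rule set for illustration only.
final class LowZoomFilter {
    /** highway=* values treated as "major" enough to draw at country scale (assumption). */
    private static final Set<String> MAJOR_HIGHWAYS = Set.of("motorway", "trunk", "primary");

    /** Zoom level from which everything is drawn (assumption). */
    static final int DETAIL_ZOOM = 12;

    /** Returns true if a feature with these tags should be painted at the given zoom level. */
    static boolean isVisible(Map<String, String> tags, int zoom) {
        if (zoom >= DETAIL_ZOOM) {
            return true; // zoomed in far enough: draw everything
        }
        if ("administrative".equals(tags.get("boundary"))) {
            return true; // country and state borders
        }
        String highway = tags.get("highway");
        return highway != null && MAJOR_HIGHWAYS.contains(highway);
    }
}
```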

I think it would also take profiling the code to see where the bottlenecks are; if it is simply querying, indexes would help. But it is probably more complex, in the sense that it could require deciding what gets displayed at which zoom levels.

These would likely need quite a lot of time and effort to develop.

comment:14 by stoecker, 18 months ago

Nearly everything you're talking about has already been in place for a long time; otherwise JOSM wouldn't be responsive at all, even for much smaller datasets.

in reply to:  13 ; comment:15 by taylor.smock, 18 months ago

Replying to ag:

> Hi smock,
> Thanks very much, I'll take a look at the 'advanced preference' settings.
>
> On a different note, I think it may be necessary for JOSM to use an R-tree or, preferably, an R*-tree implementation
> https://github.com/search?q=rtree+language%3AJava&type=repositories&l=Java
>
> when opening large .osm.pbf files (from tens of MB to more than 100 MB), i.e. to create and use an R-tree or R*-tree index.

The problems with large PBF files are as follows:

  • Memory (we load the entire file into memory). We've done some optimizations for memory in the past. For example, this is why we have a bunch of int flags in AbstractPrimitive, why tags are stored as an array instead of a map, and so on.
  • Paint speed. This is what #11487 addresses, by caching the painted area.
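To illustrate the two memory techniques mentioned above, a simplified sketch: boolean states packed into one int, and tags kept in a flat key/value array instead of a map. CompactPrimitive and the flag names are hypothetical; this is not JOSM's AbstractPrimitive.

```java
import java.util.Arrays;

// Simplified illustration of two memory-saving techniques; not JOSM code.
final class CompactPrimitive {
    // Several boolean states packed into the bits of one int field.
    static final int FLAG_MODIFIED = 1;       // bit 0
    static final int FLAG_DELETED = 1 << 1;   // bit 1
    static final int FLAG_SELECTED = 1 << 2;  // bit 2
    private int flags;

    // Tags kept in one flat array laid out as [k0, v0, k1, v1, ...] instead of a HashMap,
    // which avoids a hash table plus one entry object per tag.
    private String[] tags = new String[0];

    void setFlag(int flag, boolean on) {
        flags = on ? (flags | flag) : (flags & ~flag);
    }

    boolean isModified() { return (flags & FLAG_MODIFIED) != 0; }
    boolean isDeleted()  { return (flags & FLAG_DELETED) != 0; }
    boolean isSelected() { return (flags & FLAG_SELECTED) != 0; }

    /** Adds or replaces a tag; a linear scan is cheap because objects usually have few tags. */
    void put(String key, String value) {
        for (int i = 0; i < tags.length; i += 2) {
            if (tags[i].equals(key)) {
                tags[i + 1] = value;
                return;
            }
        }
        tags = Arrays.copyOf(tags, tags.length + 2);
        tags[tags.length - 2] = key;
        tags[tags.length - 1] = value;
    }

    /** Returns the value for key, or null if the tag is absent. */
    String get(String key) {
        for (int i = 0; i < tags.length; i += 2) {
            if (tags[i].equals(key)) {
                return tags[i + 1];
            }
        }
        return null;
    }
}
```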

> Perhaps at low zoom, administrative borders, main highways and perhaps large mountain ranges are adequate and displayable for country-level maps.

I've thought about this in the past. This would require a mapcss extension (to set the priority of the paint). The problem here is that we would have to do some kind of progressive paint, where the most important objects are painted first, instead of last (which is what currently happens). It might be possible to do this by counting the number of nodes that will be painted, and if node count > x, only paint objects of priority y or higher. This would clash with the tiling approach, since (by definition) a tile will have a fraction of the objects that the full window would have. The other issue with this approach is that a user could get differing renders based off of how they pan.
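The node-count threshold idea could look roughly like this, assuming every object already carries a numeric paint priority (which, as noted, would need a MapCSS extension). Drawable and PriorityPainter are hypothetical names. The tiling caveat applies directly: each tile computes its own total, so the same object may pass in one tile and be dropped in the neighbouring one.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical drawable: how many nodes it contributes in view, and its paint priority.
record Drawable(String name, int nodeCount, int priority) {}

final class PriorityPainter {
    /**
     * If the total node count exceeds maxNodes, keeps only objects with priority >= minPriority.
     * The result is ordered most important first, so a progressive painter could stop early.
     */
    static List<Drawable> select(List<Drawable> visible, int maxNodes, int minPriority) {
        long total = 0;
        for (Drawable d : visible) {
            total += d.nodeCount();
        }
        List<Drawable> out = new ArrayList<>();
        for (Drawable d : visible) {
            if (total <= maxNodes || d.priority() >= minPriority) {
                out.add(d);
            }
        }
        out.sort(Comparator.comparingInt(Drawable::priority).reversed());
        return out;
    }
}
```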

> I think it would also take profiling the code to see where the bottlenecks are; if it is simply querying, indexes would help. But it is probably more complex, in the sense that it could require deciding what gets displayed at which zoom levels.

The problem isn't (largely) querying; the problems are as follows (from my recollections of various profile runs):

  • Java Garbage Collection
  • Faithfully painting ways -- we paint every node in the visible area for each way; we could try to bin nodes into pixels. When I've tried to do this in the past, there were some rendering issues. This would have reduced the GC pressure.

> These would likely need quite a lot of time and effort to develop.

Yep. That is why they haven't happened.

in reply to:  15 ; comment:16 by stoecker, 18 months ago

Replying to taylor.smock:

>> Perhaps at low zoom, administrative borders, main highways and perhaps large mountain ranges are adequate and displayable for country-level maps.
>
> I've thought about this in the past. This would require a mapcss extension (to set the priority of the paint). The problem here is that we would have to do some kind of progressive paint, where the most important objects are painted first, instead of last (which is what currently happens). It might be possible to do this by counting the number of nodes that will be painted, and if node count > x, only paint objects of priority y or higher. This would clash with the tiling approach, since (by definition) a tile will have a fraction of the objects that the full window would have. The other issue with this approach is that a user could get differing renders based off of how they pan.

These are two different things. He described zoom-level dependent feature painting. We have that. You're talking about a priority within the same zoom level. That would bring a lot of issues if done automatically. If done via MapCSS, it would need lots of manual work for only a few use cases.

> we paint every node in the visible area for each way;

I'm pretty sure that's untrue. Nodes aren't painted anymore at lower zoom levels.

in reply to:  16 ; comment:17 by taylor.smock, 18 months ago

Replying to stoecker:

>> we paint every node in the visible area for each way;
>
> I'm pretty sure that's untrue. Nodes aren't painted anymore at lower zoom levels.

I should have been more specific: each node of a specific way that is in the current viewport is used to draw the way.

As an example, if a way has 20 nodes (let's number them 1-20 for ease of reference), and nodes 1 and 20 are outside the paint area, nodes 2-19 are used for painting the way. Going further, let us say that nodes 5-10 differ in location by an infinitesimal amount, such that they would be drawn in the same pixel. The current renderer would still draw the way using nodes 2-19, instead of nodes 2-5, 11-19.
The problem here is that this causes unnecessary array copies when creating the path to render for the way (IIRC), which in turn increases the amount of time spent in GC.
I'd have to profile it again, but I think that is what my conclusion was.
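A minimal sketch of the pixel-binning idea, assuming the way's nodes are already projected to screen coordinates in two double arrays (PixelBinning and compact are hypothetical names, not JOSM code). It drops each point that lands in the same pixel as the previously kept point and compacts the arrays in place, so no extra arrays are allocated. The usage below reproduces the example above: of nodes 2-19, nodes 6-10 are dropped and 2-5, 11-19 are kept.

```java
final class PixelBinning {
    /**
     * Compacts a way's projected screen coordinates in place, dropping each point that falls
     * in the same pixel as the previously kept point. The last point is always kept so the
     * way still ends where it should. Returns the number of points to draw; entries at
     * indices >= the returned count are stale and must be ignored.
     */
    static int compact(double[] xs, double[] ys, int n) {
        int kept = 0;
        for (int i = 0; i < n; i++) {
            boolean samePixel = kept > 0
                    && Math.floor(xs[i]) == Math.floor(xs[kept - 1])
                    && Math.floor(ys[i]) == Math.floor(ys[kept - 1]);
            if (!samePixel || i == n - 1) {
                // kept <= i always holds, so this never overwrites an unread point
                xs[kept] = xs[i];
                ys[kept] = ys[i];
                kept++;
            }
        }
        return kept;
    }
}
```

Comparing only against the last kept point makes this a single O(n) pass with no allocation, which is the property that matters for GC pressure.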

in reply to:  17 comment:18 by stoecker, 18 months ago

Replying to taylor.smock:

> Replying to stoecker:
>
>>> we paint every node in the visible area for each way;
>>
>> I'm pretty sure that's untrue. Nodes aren't painted anymore at lower zoom levels.
>
> I should have been more specific: each node of a specific way that is in the current viewport is used to draw the way.
>
> As an example, if a way has 20 nodes (let's number them 1-20 for ease of reference), and nodes 1 and 20 are outside the paint area, nodes 2-19 are used for painting the way. Going further, let us say that nodes 5-10 differ in location by an infinitesimal amount, such that they would be drawn in the same pixel. The current renderer would still draw the way using nodes 2-19, instead of nodes 2-5, 11-19.
> The problem here is that this causes unnecessary array copies when creating the path to render for the way (IIRC), which in turn increases the amount of time spent in GC.
> I'd have to profile it again, but I think that is what my conclusion was.

It's probably non-trivial to fix this, but not too complicated. For a plot graphic I once wrote a pre-processor which reworked the graphs so that, instead of drawing the original data, it drew only an adapted subset that resulted in the same output. Essentially, as you say, I dropped any data drawn on the same point. That sped up drawing a lot: the cost of pre-processing was small, but the gain from not drawing useless lines was large. For JOSM it's a bit more complex - my plot had only one line style...

in reply to:  15 comment:19 by ag, 18 months ago

Hi smock,

Thanks for your response.

Replying to taylor.smock:

> Replying to ag:
>
>> Hi smock,
>> Thanks very much, I'll take a look at the 'advanced preference' settings.
>>
>> On a different note, I think it may be necessary for JOSM to use an R-tree or, preferably, an R*-tree implementation
>> https://github.com/search?q=rtree+language%3AJava&type=repositories&l=Java
>>
>> when opening large .osm.pbf files (from tens of MB to more than 100 MB), i.e. to create and use an R-tree or R*-tree index.
>
> The problems with large PBF files are as follows:
>
>   • Memory (we load the entire file into memory). We've done some optimizations for memory in the past. For example, this is why we have a bunch of int flags in AbstractPrimitive, why tags are stored as an array instead of a map, and so on.
>   • Paint speed. This is what #11487 addresses, by caching the painted area.

Loading the PBF into memory currently has a downside: when a large PBF is loaded, e.g. Malaysia at about 140-180 MB, it takes an argument like -Xmx8192M to cope with it. This is probably OK for desktop systems with lots of memory. I guess for now we'll make do until better ways can be developed.

I think that even if we use an R-tree index, the index itself could easily be larger than the PBF file.
I'm not sure, though; this is just a wild guess. But an R-tree index would probably help by allowing the data to be kept in external storage rather than in memory, which would likely reduce the memory footprint significantly.
My guess is that R-tree indexes could take very significant CPU processing to build, possibly O(N^3) or higher, so with, say, 100 million objects that would be (100 million)^3 or more operations to build the R-tree index.
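For comparison, a common way to build an R-tree in bulk, Sort-Tile-Recursive (STR) packing, is dominated by sorting and runs in O(N log N) time. The following is a simplified sketch for points only, showing one packing level; Pt and StrPacker are hypothetical names, and a complete index would repeat the packing on the leaves' bounding boxes to build the upper levels.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical 2D point (e.g. a node's projected position).
record Pt(double x, double y) {}

final class StrPacker {
    /**
     * Packs points into leaves of at most `capacity` entries using Sort-Tile-Recursive:
     * sort by x, cut into vertical slices, sort each slice by y, then cut slices into leaves.
     * The sorts dominate, giving O(N log N) for this level.
     */
    static List<List<Pt>> packLeaves(List<Pt> points, int capacity) {
        int n = points.size();
        List<List<Pt>> leaves = new ArrayList<>();
        if (n == 0) {
            return leaves;
        }
        int leafCount = (n + capacity - 1) / capacity;            // ceil(n / capacity)
        int sliceCount = (int) Math.ceil(Math.sqrt(leafCount));  // about sqrt(leaves) slices
        int sliceSize = sliceCount * capacity;                    // points per vertical slice

        Pt[] byX = points.toArray(new Pt[0]);
        Arrays.sort(byX, Comparator.comparingDouble(Pt::x));
        for (int s = 0; s < n; s += sliceSize) {
            Pt[] slice = Arrays.copyOfRange(byX, s, Math.min(s + sliceSize, n));
            Arrays.sort(slice, Comparator.comparingDouble(Pt::y));
            for (int i = 0; i < slice.length; i += capacity) {
                leaves.add(Arrays.asList(Arrays.copyOfRange(slice, i, Math.min(i + capacity, slice.length))));
            }
        }
        return leaves;
    }
}
```

Each upper level has roughly 1/capacity as many entries as the one below, so the total work across all levels stays within O(N log N).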

One thought is to use PostGIS as the database, but would that require a significant rewrite of the JOSM code?
http://postgis.net/
I've not reviewed the JOSM code, so this is just an enquiry.

>> Perhaps at low zoom, administrative borders, main highways and perhaps large mountain ranges are adequate and displayable for country-level maps.
>
> I've thought about this in the past. This would require a mapcss extension (to set the priority of the paint). The problem here is that we would have to do some kind of progressive paint, where the most important objects are painted first, instead of last (which is what currently happens). It might be possible to do this by counting the number of nodes that will be painted, and if node count > x, only paint objects of priority y or higher. This would clash with the tiling approach, since (by definition) a tile will have a fraction of the objects that the full window would have. The other issue with this approach is that a user could get differing renders based off of how they pan.
>
>> I think it would also take profiling the code to see where the bottlenecks are; if it is simply querying, indexes would help. But it is probably more complex, in the sense that it could require deciding what gets displayed at which zoom levels.
>
> The problem isn't (largely) querying; the problems are as follows (from my recollections of various profile runs):
>
>   • Java Garbage Collection
>   • Faithfully painting ways -- we paint every node in the visible area for each way; we could try to bin nodes into pixels. When I've tried to do this in the past, there were some rendering issues. This would have reduced the GC pressure.
>
>> These would likely need quite a lot of time and effort to develop.
>
> Yep. That is why they haven't happened.

As I've not reviewed the JOSM code, I'm making some wild guesses. The issue as it happened: when rendering the 140 MB PBF file (Malaysia), parsing took some time but was manageable. The freeze happens when JOSM attempts to render the full set of details, all 140 MB of it with literally millions of objects, in that single frame/window; a single click then takes 5-10 minutes to respond.

My thought is that during rendering, JOSM would need to deliberately omit objects and relations that are too small to be displayed. For example, at low zoom levels it would omit everything except the things you can see from 20,000 feet: administrative boundaries (e.g. countries, states), major highways, (mountain ranges?) and so on.
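The "too small to display" idea could look roughly like this. BBox and SizeCulling are hypothetical names; the sketch assumes each object's bounding box is known in projected map units and the current scale is given in map units per screen pixel.

```java
// Hypothetical bounding box in projected map units (e.g. metres in a Mercator projection).
record BBox(double minX, double minY, double maxX, double maxY) {}

final class SizeCulling {
    /**
     * Returns true if an object with this bounding box would span at least minPixels
     * in width or height on screen. unitsPerPixel is the current map scale; it grows
     * as the user zooms out, so more objects fall below the threshold.
     */
    static boolean worthPainting(BBox box, double unitsPerPixel, double minPixels) {
        double widthPx = (box.maxX() - box.minX()) / unitsPerPixel;
        double heightPx = (box.maxY() - box.minY()) / unitsPerPixel;
        return Math.max(widthPx, heightPx) >= minPixels;
    }
}
```

Point features such as POIs have a zero-size box and would always be culled by this rule, so in practice it would be combined with zoom- or tag-based rules rather than used on its own.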

Evidently, when zoomed into a small area, the response is significantly better; I'd guess that is simply because objects outside the displayed bounding box are not rendered. Still laggy, but manageable.

And as mentioned, I've not seen the code, so I'm just speculating.
Either way, I'd guess significant effort would be needed to tune and possibly redesign things.

comment:20 by taylor.smock, 16 months ago

Ticket #16251 has been marked as a duplicate of this ticket.

comment:21 by taylor.smock, 16 months ago

Ticket #16931 has been marked as a duplicate of this ticket.
