Opened 7 years ago
Last modified 7 years ago
#15574 new enhancement
[patch] [experimental] make large jpeg loading work through the use of JNI bridge to turbojpeg system library
Reported by: | cmuelle8 | Owned by: | team |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Core image mapping | Version: | latest |
Keywords: | jpeg geoimage turbojpeg large images libjpeg-turbo jni | Cc: |
Description (last modified by )
Image loading in JOSM is currently done with
Image img = Toolkit.getDefaultToolkit().createImage(file.getPath()); [..] tracker.addImage(img, 1);
This may cause issues when trying to load images with large dimensions, if the heap space cannot hold the buffer of a full decode of the image. JPEG loading however may be done in a way that the native decoding library decodes and scales the image on the fly (before the pixels are being sent to a BufferedImage).
JDK8 and JDK9 already use jpeg or turbojpeg system libraries in the background to do the heavy lifting, but there does not seem to be a proper way in those JDKs to delegate the scaling wish down to the JNI level. The imageio framework has canSetSourceRenderSize()
to check if a plugin can handle a custom source render size, but this is currently (JDK7, JDK8, JDK9) not implemented for the JPEG decoder, see
- https://docs.oracle.com/javase/9/docs/api/javax/imageio/ImageReadParam.html#canSetSourceRenderSize--
- https://docs.oracle.com/javase/9/docs/api/index.html?javax/imageio/plugins/jpeg/JPEGImageReadParam.html
One solution is to employ the JNI bridge to turbojpeg system library.
- https://libjpeg-turbo.org/Documentation/Documentation
- https://cdn.rawgit.com/libjpeg-turbo/libjpeg-turbo/dev/java/doc/index.html
Problems (caveats) with this approach are
- libjpeg-turbo on the host system must have been compiled with the
--with-java
swith to./configure
- debian/ubuntu precompiled versions do not have java enabled by default, see debian/rules
- the library needs to be found and readable by the JVM to load (general JNI problem)
- as the loaded image will be a scaled version, zooming in 1:1 will not be possible
- this may be overcome by reloading subregions of the source image from file as needed
in principle this should be possible using this decompress variant instead of the convenience function- not true currently: this variant merely enables to write to another offset than (0,0) in the dest buffer
- while partial jpeg decoding was added in turbojpeg 1.5 it has not been promoted to the callable jni functions yet
- i.e. turbojpeg-jni.c lacks sth. like
TJDecompressor_decompress_partial
that employs jpeg_crop_scanline(..) and jpeg_skip_scanlines(..) - once this is added to the jni interface of turbojpeg it might be feasible to load the visible subregions (resulting from zoom or drag operations) dynamically from a large jpeg file as needed
Test data (a jpeg image with 30.000x21.952 pixels) may be found under
Trying to load this file without the proposed patch results in
- in
java.lang.OutOfMemoryError: Java heap space
or - if JOSM is started with
-Xmx4G
(or a variant) to supply more heap space- in a terminated JVM and a segfaulting Xorg (libpthread related)
- serious bug related to OOM conditions in openjdk (?)
The proposed patch makes an effort to determine the presumably available memory before loading an image. It reads width and height of an image without loading it and then checks
- if the image will presumably fit into half of the avail memory to the JVM loading is defered to the default toolkit as usual
- otherwise an instance of TJDecoder is used to load a scaled version (that will also not employ more than half of the available JVM heap space memory)
Not taking more than half of the memory available is especially necessary for the bilinear scaling to work as it duplicates the image buffer (worst case) in best fit mode.
In general, because of the deliberate checking for possible OOM conditions, JOSM will be from now on less likely to hit such exceptions if large images are loaded (this applies to all image formats, not just JPEG, but for JPEGs the code will make a second attempt in loading the image, instead of giving up).
Future work:
Considering that bilinear scaling may be turned off in prefs, the constraint of not using more than half the available memory may be loosened. However, a dynamic check depending on the state bilin* options are in, might need more work, i.e. if that option is changed during runtime and an image is loaded already employing more than half of the available memory, it may seem strange to do an expensive reload of the whole image, just because this scaling option changed. So for now, this check is not dependent on the scaling option used for ImageDisplay.
Pledge for further testing:
This patch had limited testing on a single dev machine and a single OS.
It has not been tested, whether libjpeg-turbo is properly loaded under Windows or MacOS, e.g.
Attachments (6)
Change History (32)
comment:1 by , 7 years ago
Description: | modified (diff) |
---|
comment:2 by , 7 years ago
Description: | modified (diff) |
---|
comment:3 by , 7 years ago
Description: | modified (diff) |
---|
comment:5 by , 7 years ago
Description: | modified (diff) |
---|
clarified situation on partial jpeg decoding with turbojpeg, implemented in C, but not exported to jni atm..
comment:6 by , 7 years ago
The patch is quite big, and I don't really like depending on a new library for a feature probably rarely useful. How did you find this, and why would you need it?
comment:7 by , 7 years ago
How did I find this? .. That's a weird question, I'm uncomfortable with it somehow.
Why would that matter? All the information contained herein is present on the net.
If you would have searched the net in that direction, I'm pretty sure
you would have come up with a similar solution.
Why would I/anyone need it? Well, imaging technology is advancing.
A lot of mobile phones around these days produce images between 13 to 23 MP
and there is no reason to believe that technology will stop at this point.
And I have not started to talk about professional digital camera equipment yet.
Another issue is that the current image loading is not RAM constraint safe,
the patch also contains some improvement on that side. So I'd rather say,
if we can support it at low cost, why would you think we should not support it?
Please also keep in mind that people may use JOSM on computers with as little
as half a gig of ram. In these cases scaled loading will also be of great help,
which means the patch does not necessarily target the example given in the
description, but scale down to other use cases and/or environments.
I agree with you however, that the current state is still dissatisfying, in
my mind largely because of missing JNI binding support in the standard
jpeg libs that are packaged for distributions (debian, kubuntu, windows(?))
As for the size of the patch:
- we do not pull in a library, just a /binding/ to a (very standard) library
- turbojpeg itself is not exotic at all, its used as the standard jpeg decoding library on lots and lots of boxes
- we do not need the encoding part, just the decoder, so it will maybe shrink by half of the size, when compiled this won't add much to JOSM
- actually it's rather small patch compared to others (twelvemonkey lib e.g.)
As I added [experimental] to the title, I do not expect you to apply it
currently. It's here as a work-in-progress and for people needing this
right now (that are advanced enough to pick, apply and compile this them-
selves).
comment:8 by , 7 years ago
stripped some non-needed bindings from patch:
encoder, yuvimage handling parts, deprecated methods
you can further reduce the size by stripping documentation,
but, while you could argue that it is available online or in the
turbojpeg sources, i do not recommend doing so.
btw: this patch does no harm, if turbojpeg is not found or not
bindable by jni, it silently skips the 2nd image loading attempt
comment:10 by , 7 years ago
@Don-vip: I've extracted some useful parts of the larger patch for you to apply independently of the broader topic of this ticket:
- check presumably available RAM before image loads
- the ImageObserver is used to obtain width and height of an image, before it is fully loaded with the tracker
- this might have some issues with exotic image formats that do not provide WIDTH and HEIGHT without a full decode
-> read: does the JDK allow such image formats at all? If yes, does this have some relevance to the
geoimage component? Presumably not, but I've not done an in depth check, maybe you have some more insight.
- do not use expensive bilinear scaling when the image is dragged
- gracefully fall-back to default scaler if allocating duplicate buffer for bilinear op fails
- refactors ImageDisplay.java somewhat, so the turbojpeg fragments (or other lib) can be plugged/patched at a later time
- and finally, add some javadoc, as requested by you in an aftermath to ticket:15476#comment:22
by , 7 years ago
Attachment: | josm-large-jpeg-scaled-loading-through-jni-turbojpeg-to-workaround-oom-experimental.patch added |
---|
josm-large-jpeg-scaled-loading-through-jni-turbojpeg-to-workaround-oom-experimental.patch (update with fix for ticket:15625)
by , 7 years ago
Attachment: | josm-image-loading_check-ram-constraints-to-prevent-oom-exceptions_do-not-use-expensive-scaling-while-dragging.patch added |
---|
light version; does ram constraint checking as proposed with the larger patch; adds some javadoc; does not use bilinear scaling when image is dragged; [main diff to larger patch: strips 2nd loading attempt if loading fails due to ram constraints] (update with fix to ticket:15625)
comment:11 by , 7 years ago
.. patches are based on r13174, but checked to apply cleanly to trunk/src/org/openstreetmap/josm/gui/layer/geoimage/ImageDisplay.java?rev=13186
follow-up: 15 comment:13 by , 7 years ago
Thanks for converting the Logging.info() call to use the common placeholder format, before applying.
Do you know if DevelopersGuide/StyleGuide is up to date? E.g. do you run your eclipse setup for josm like this? I'd like to pretty format patch submissions in the future, since I've seen you ran through the hazzle of checkstyling each time. Unless you like doing this, that is ;-)
comment:14 by , 7 years ago
@Don-vip: I've noticed on jenkins four warnings classified high priority (BAD STYLE) for ImageDisplay.java - to make findbugs happy, please use this follow-up patch. In theory we should not have more than one ImageDisplay instance, but it may have some relevance if this class is ever used more often than once.
EDIT: added fixes to findbug rants introduced with r13191, see https://josm.openstreetmap.de/jenkins/job/JOSM/3595/findbugsResult/new/. There are also warnings about the usage of System.gc()
, do you think its ok to work with @SupressFBWarning ..
in these cases to ignore them? They should be called at times, when performance is degraded anyway due to error or caught outOfMem condition.
EDIT2: include proper checks in the event image dimensions cannot be read (because of file format or other condition), this ensures that the thread will not linger around waiting forever. if lots of such images are loaded and switched between these lingering threads may aggregate, so not waiting forever and properly let them finish is somewhat important.
EDIT3: add repaint call if reading width and height timed out and run() method returns early
EDIT4: superceded by attachment:josm-store-width-and-height-info-in-ImageEntry.patch
comment:15 by , 7 years ago
Replying to cmuelle8:
Do you know if DevelopersGuide/StyleGuide is up to date?
It looks like up to date.
E.g. do you run your eclipse setup for josm like this? I'd like to pretty format patch submissions in the future, since I've seen you ran through the hazzle of checkstyling each time. Unless you like doing this, that is ;-)
Yes. With the Checkstyle plugin it's fast to spot violations and fix them during the code review.
follow-up: 17 comment:16 by , 7 years ago
The patch in comment:14 should fix most of the findbugs warnings, except those related to System.gc()
usage in exceptional events. Its debatable how to deal with them:
- leave them as is for devs to occasionally stumble upon when checking reports
- do not use
System.gc()
even in/after OutOfMemory conditions- e.g. does the JVM make gc() attempts on its own anyway after such conditions?
- why-is-it-bad-practice-to-call-system-gc mentions reasons why findbugs may warn (should not apply here)
- use SupressFBWarnings
follow-up: 18 comment:17 by , 7 years ago
Replying to cmuelle8:
- leave them as is for devs to occasionally stumble upon when checking reports
Not the best choice, but we have a couple of issues like this.
- do not use
System.gc()
even in/after OutOfMemory conditions
Is it really needed after all? If we remove it, does your use case still work?
- use SupressFBWarnings
No, because JOSM would require findbugs to compile.
comment:18 by , 7 years ago
Replying to Don-vip:
Replying to cmuelle8:
Is it really needed after all? If we remove it, does your use case still work?
Applies to turbojpeg-enabled code:
Seems to work without. If the user switches images around a lot and the JVM has not had the chance to do GC yet, a reload of a big image will gracefully load at a lower resolution. Which actually is better performance-wise in such conditions. If the user actually would want to wait for a higher resolution to be loadable it is suboptimal, but this seems unlikely in the event images are switched a lot in a small amount of time. Reducing the click frequency will have the JVM garbage collect rather quick, so that higher resolution loading is automagically re-enabled.
Applies to both loaders (default toolkit and turbojpeg):
So I've removed the explicit GC call after all.
To fix the remaining findbugs warnings the updated patch in comment:14 should do. I've not used checkstyle yet, so you might have to still reformat some bits. Also the ImageObserver is used with a timeout now, which is important for images that width/height cannot be read from.
Greetings
by , 7 years ago
Attachment: | josm-store-width-and-height-info-in-ImageEntry.patch added |
---|
patch against r13193; based on the patch in comment:14; additionally refactors ImageDisplay to use ImageEntry instead; stores width and height info while metadata of images are read; might break plugin code
by , 7 years ago
Attachment: | josm-large-jpeg-scaled-loading-through-jni-turbojpeg-to-workaround-oom-experimental__r3.patch added |
---|
patch against r13191 + josm-store-width-and-height-info-in-ImageEntry.patch applied; deprecates the two oldest patches; of limited use until (a) distros support jni in turbojpeg lib; atm debian, ubuntu, etc do not and (b) binding classes for turbojpeg support partial image decoding
by , 7 years ago
Attachment: | josm-turbojpeg-decompressor-JNI-glue.patch added |
---|
binding classes for turbojpeg, with a custom loader in TJLoader.java and a modified TJDecompressor.java to not depend on the portions that handle intermediate YUV images; lacks partial image decoding as upstream does atm
follow-up: 20 comment:19 by , 7 years ago
@Don-vip:
- do you know if there is a way to deprecate attachments in trac tickets? without overwriting them individually?
- In DevelopersGuide/StyleGuide
the instructions for calling checkstyle without ant from CLI do not work, as it depends on(just updated the wiki page regarding this)build2/org/openstreetmap/josm/TopLevelJavadocCheck.class
, working command is# build2 might not be present until 'ant checkstyle-compile' was run svn diff --summarize | awk '{ print $2 }' | grep "^[a-z]" | xargs \ java -classpath "build2:tools/checkstyle/checkstyle-all.jar" \ com.puppycrawl.tools.checkstyle.Main -c tools/checkstyle/josm_checks.xml
- "correct indentation" when file is saved is not advisable (example below), if you find the time please update the relevant screenshots
-
src/org/openstreetmap/josm/gui/layer/geoimage/ImageDisplay.java
550 552 visibleRect.checkPointInside(p); 551 553 VisRect selectedRect = new VisRect( 552 554 p.x < mousePointInImg.x ? p.x : mousePointInImg.x, 553 p.y < mousePointInImg.y ? p.y : mousePointInImg.y,554 p.x < mousePointInImg.x ? mousePointInImg.x - p.x : p.x - mousePointInImg.x,555 p.y < mousePointInImg.y ? mousePointInImg.y - p.y : p.y - mousePointInImg.y,556 visibleRect);555 p.y < mousePointInImg.y ? p.y : mousePointInImg.y, 556 p.x < mousePointInImg.x ? mousePointInImg.x - p.x : p.x - mousePointInImg.x, 557 p.y < mousePointInImg.y ? mousePointInImg.y - p.y : p.y - mousePointInImg.y, 558 visibleRect); 557 559 selectedRect.checkRectSize(); 558 560 selectedRect.checkRectPos(); 559 561 ImageDisplay.this.selectedRect = selectedRect;
-
comment:20 by , 7 years ago
Replying to cmuelle8:
@Don-vip:
- do you know if there is a way to deprecate attachments in trac tickets? without overwriting them individually?
You can't. What we can do:
- delete them
- version them manually in the filename (v1, v2, etc.)
- In DevelopersGuide/StyleGuide
the instructions for calling checkstyle without ant from CLI do not work
Thanks for updating the wiki :)
- "correct indentation" when file is saved is not advisable (example below), if you find the time please update
Time is a critical resource, I hardly find enough time to review patches :) But at least for this, everyone can help.
follow-up: 22 comment:21 by , 7 years ago
Here is an ant job for checkstyle running on the files changed that will give clickable output to the eclipse console.
It is the command I've updated the wiki with, appended by a call to sed
to produce a format recognized by eclipse console.
You can more easily navigate to the source files from the console output of checkstyle this way.
-
build.xml
808 808 encoding="UTF-8" classpath="tools/checkstyle/checkstyle-all.jar"> 809 809 </javac> 810 810 </target> 811 <target name="checkstyle-changed" depends="checkstyle-compile"> 812 <exec append="false" executable="bash" failifexecutionfails="true"> 813 <arg value="-c"/> 814 <arg value="svn diff --summarize | grep -o '\(src\|test\).*' | xargs java -cp 'tools/checkstyle/checkstyle-all.jar:${checkstyle-build.dir}' com.puppycrawl.tools.checkstyle.Main -c tools/checkstyle/josm_checks.xml | sed -e 's:\([^ ]*\) [^:]*/\([^:/]*.java\:[^:]*\):(\2)\1:'"/> 815 </exec> 816 </target> 811 817 <target name="checkstyle" depends="checkstyle-compile"> 812 818 <taskdef resource="com/puppycrawl/tools/checkstyle/ant/checkstyle-ant-task.properties" 813 819 classpath="tools/checkstyle/checkstyle-all.jar:${checkstyle-build.dir}"/>
follow-up: 24 comment:22 by , 7 years ago
Replying to cmuelle8:
Here is an ant job for checkstyle running on the files changed that will give clickable output to the eclipse console.
It works only for Linux. I am converting it to Windows too:
<target name="checkstyle-changed" depends="checkstyle-compile"> <exec append="false" osfamily="unix" executable="bash" failifexecutionfails="true"> <arg value="-c"/> <arg value="svn status -q --ignore-externals src test | grep -o '\(src\|test\)/.*' | xargs java -cp '${checkstyle.dir}/checkstyle-all.jar:${checkstyle-build.dir}' com.puppycrawl.tools.checkstyle.Main -c ${checkstyle.dir}/josm_checks.xml | sed -e 's:\([^ ]*\) [^:]*/\([^:/]*.java\:[^:]*\):(\2)\1:'"/> </exec> <exec append="false" osfamily="windows" executable="powershell" failifexecutionfails="true"> <arg value="/c"/> <arg value="svn status -q --ignore-externals src test | ForEach-Object {java -cp '${checkstyle.dir}/checkstyle-all.jar;${checkstyle-build.dir}' com.puppycrawl.tools.checkstyle.Main -c ${checkstyle.dir}/josm_checks.xml $_.split(' ')[7]}"/> </exec> </target>
but have some trouble to understand what your sed
s:\([^ ]*\) [^:]*/\([^:/]*.java\:[^:]*\):(\2)\1:
command does, could you please explain it to me with an example? Thanks.
comment:24 by , 7 years ago
Replying to Don-vip:
but have some trouble to understand what your sed
s:\([^ ]*\) [^:]*/\([^:/]*.java\:[^:]*\):(\2)\1:
command does, could you please explain it to me with an example? Thanks.
sure:
- a colon is used to delimit search expression from replacment
- this is configurable in sed: whichever character follows the leading
s
, is used - to use a colon in the search expression, we need to quote it using backslash, i.e.
\:
is not a delimiter
- this is configurable in sed: whichever character follows the leading
- \( \) are used to build capture groups (note the backslashes, without those braces are literals)
- the first capture group captures words like WARN or INFO (that are printed space delimited before the full file path)
- the second captures the file name followed by the line number (separated by a colon),
[^:]*/
matches the directory, but that is not used in the replacement expression
- the replacement expresion is
(\2)\1
, which is a back reference to the second captured group with added, literal braces, followed by the first capture group (to not loose the checkstyle classification in the output (like WARN, INFO, etc.)
comment:25 by , 7 years ago
.. eclipse, in the default configuration, only recognizes expressions of the form (%n:%l)
, e.g. (Filename.java:15)
, when translating console output to hyperlinks, which was the reason to run the streaming editor sed
on the checkstyle output.
.. another way might be to tinker with eclipse preferences (iirc the search pattern is configurable somewhere), so that it recognizes the native checkstyle output.
.. doing a regex replacement however has the advantage for us that its easier to transport (i.e. we don't have to worry about getting the correct setting pushed via .settings/ folder in the git repo). then, prefs files of eclipse might see changes, which is another point to assume that this would be less future proof, and prefering to use standard shell utilities
Have you checked it works on Mac OS? FreeBSD? ;-)
by , 7 years ago
Attachment: | josm-enable-large-jpeg-loading__if-memory-can-hold-scaled-version-only__by-using-jni-turbojpeg__r4.patch added |
---|
use try-with-resources statement, patch against r13271; obsoletes first four patches of this ticket; applies in combination with josm-turbojpeg-decompressor-JNI-glue.patch
Some added files (TJCompressor.java, TJTransofrm.java, etc.) may not actually be needed, eventually a subset to do the decoding will do (and hence shrink the overall patch size).