Opened 4 years ago
Last modified 4 years ago
#20433 assigned task
Imagery Integration tests
Reported by: | GerdP | Owned by: | Don-vip |
---|---|---|---|
Priority: | normal | Milestone: | Longterm |
Component: | Unit tests | Version: | |
Keywords: | imagery jenkins | Cc: | Don-vip, stoecker |
Description
IIGTR the job is submitted every 6 hours but its runtime is > 7 hours.
Attachments (1)
Change History (47)
comment:1 by , 4 years ago
follow-up: 9 comment:2 by , 4 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
It's a combination of:
- regression from JUnit 5 migration which disabled the parallel execution
- the cumulative increasing number of timeout errors
I changed the cron to once per 24h, and still have the parallel execution on my radar.
Any help fixing imagery sources that time out will reduce the duration of this test and provide working entries to our users.
@Gerd what does "IIGTR" mean?
comment:3 by , 4 years ago
Component: | Core → Unit tests |
---|
follow-up: 6 comment:5 by , 4 years ago
reg. help: I have no clue about all the parameters used in the WMS/TMS definitions or how the results of the test help to find what has to be changed :(
Isn't that something that has to be done by those who created the wiki entries?
comment:6 by , 4 years ago
Replying to GerdP:
Isn't that something that has to be done by those who created the wiki entries?
You can't expect people who submitted a source several years ago to monitor it daily for production issues, change of url, broken certificates and so on.
comment:7 by , 4 years ago
As for clues... Well for each error you have to dig: has the layer name changed, has the server url changed, has the server been decommissioned, are they blocking German IP addresses, and so on.
comment:8 by , 4 years ago
You can't expect people who submitted a source several years ago to monitor it daily for production issues, change of url, broken certificates and so on.
My idea is something like an automatic filtering of the wrong entries so that JOSM can still show the entry but with a flag that it is probably not working because of errors in the wiki definition.
Well for each error you have to dig.
Does that mean you contact someone and ask or do you have tools to find that out?
follow-up: 10 comment:9 by , 4 years ago
Replying to Don-vip:
- regression from JUnit 5 migration which disabled the parallel execution
Oops. I'll see if I can fix that.
comment:10 by , 4 years ago
Replying to taylor.smock:
Oops. I'll see if I can fix that.
It's OK, I should just have to configure the job to add required properties. It's pure Jenkins configuration, I can do it without modifying anything in the source tree I think.
comment:11 by , 4 years ago
There is a JUnit annotation for running parallel tests.
Unfortunately, it is currently considered experimental.
Anyway, I'll post a patch for it (I finished up the work for the patch about the time you posted).
It's fairly tiny. The hard part (for me) is extracting it from the rest of the stuff I've been modifying for #16567, and then ensuring it applies cleanly (for you).
by , 4 years ago
Attachment: | 20433.patch added |
---|
Enable parallel test execution and annotate ImageryPreferenceTestIT#testImageryEntryValidity with @Execution(ExecutionMode.CONCURRENT)
follow-up: 13 comment:12 by , 4 years ago
Doesn't setting the property at this location enable parallel mode for ALL tests?
comment:13 by , 4 years ago
Replying to Don-vip:
Doesn't setting the property at this location enable parallel mode for ALL tests?
-Djunit.jupiter.execution.parallel.enabled=true just allows the @Execution annotations to be used.
If I added -Djunit.jupiter.execution.parallel.mode.default=concurrent, then yes.
Link to docs: https://junit.org/junit5/docs/5.7.0/user-guide/#writing-tests-parallel-execution (if you want to look at all the various config options).
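For reference, the same two settings can also be kept in a junit-platform.properties file on the test classpath instead of being passed as -D flags. This is only a sketch of that alternative; nothing in this ticket says that JOSM's build uses such a file.

```properties
# Allow parallel execution at all (still marked experimental in JUnit 5.7).
junit.jupiter.execution.parallel.enabled=true
# Keep every test sequential unless it opts in with @Execution(ExecutionMode.CONCURRENT).
# same_thread is the documented default, so this line is only here for clarity.
junit.jupiter.execution.parallel.mode.default=same_thread
```

Changing the second value to concurrent is what would switch ALL tests to parallel mode.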
comment:14 by , 4 years ago
And of course I totally forgot to mention that I committed r17474 which is the main reason for the recent duration increase.
follow-up: 17 comment:15 by , 4 years ago
@Taylor ok thank you, I didn't know about the annotation and thought the properties were the only way to go.
comment:17 by , 4 years ago
Replying to Don-vip:
@Taylor ok thank you, I didn't know about the annotation and thought the properties were the only way to go.
No problem. There are a lot of new things in JUnit 5, and I am pretty certain I know less than half of the new features. In this case, I assumed that JUnit wouldn't force tests to be all parallel or all sequential. And to be fair, a good portion is marked as experimental (specifically, in this case, the concurrent execution).
Let me know how it worked on the server -- it took ~260 minutes on my machine with and without the patch.
comment:18 by , 4 years ago
Test still takes 7 hours as follows:
- ImageryPreferenceTestIT.AR-ign-wms => 3h39
- ImageryPreferenceTestIT.AR-Mapa-Educativo-wms => 1h00
- 20 tests take more than 1 min
- 20 tests take between 15s and 1 min
comment:20 by , 4 years ago
Zalitoar made a great PR to fix Argentina sources: https://github.com/osmlab/editor-layer-index/pull/1058/files
comment:21 by , 4 years ago
Milestone: | → Longterm |
---|
comment:23 by , 4 years ago
Several tests are failing because of invalid bounding boxes (see #20354).
Are the BBOX values hard coded or calculated?
comment:25 by , 4 years ago
Yes the Argentina entries make everything hang. You can help me by fixing them. See the ELI PR above.
comment:26 by , 4 years ago
Keywords: | imagery jenkins added |
---|---|
Summary: | Is Jenkins job "JOSM-Imagery-Integration" OK? → Imagery Integration tests |
follow-up: 29 comment:28 by , 4 years ago
Review if the changes are OK and update JOSM wiki accordingly.
follow-up: 33 comment:29 by , 4 years ago
Replying to Don-vip:
Review if the changes are OK and update JOSM wiki accordingly.
I fear you are ten steps ahead. I have zil knowledge about this stuff, I just use background images in JOSM and I understand that this is about the entries that I see in the corresponding JOSM menu.
So far I've cloned https://github.com/osmlab/editor-layer-index.git and I found https://josm.openstreetmap.de/wiki/Maps#Documentation and started to read, but got lost in all the details.
I failed to download the patch in the "PR" as text, GitHub seems to hide that somehow? I don't know how to use the PR in my local clone because I don't know how to use git / GitHub etc.
Do I 1st have to learn git to help with this or can I skip this somehow?
follow-up: 31 comment:30 by , 4 years ago
I failed to download the patch in the "PR" as text, GitHub seems to hide that somehow?
If you have a pull URL like https://github.com/osmlab/editor-layer-index/pull/1058/files add a .patch or .diff behind the number like https://github.com/osmlab/editor-layer-index/pull/1058.patch :-)
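The URL rewrite described above can be sketched in a few lines of Java. The class and method names are made up for illustration, and the sketch only handles a trailing /files segment, nothing else.

```java
class PullPatchUrl {
    /** Turns a GitHub pull request URL (optionally ending in /files) into its .patch URL. */
    static String toPatchUrl(String pullUrl) {
        // Drop a trailing "/files" so that the URL ends with the pull request number.
        String base = pullUrl.endsWith("/files")
                ? pullUrl.substring(0, pullUrl.length() - "/files".length())
                : pullUrl;
        return base + ".patch";
    }

    public static void main(String[] args) {
        System.out.println(toPatchUrl("https://github.com/osmlab/editor-layer-index/pull/1058/files"));
        // prints https://github.com/osmlab/editor-layer-index/pull/1058.patch
    }
}
```

Replacing ".patch" with ".diff" in the return statement gives the other variant mentioned above.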
follow-up: 32 comment:31 by , 4 years ago
Replying to stoecker:
I failed to download the patch in the "PR" as text, GitHub seems to hide that somehow?
If you have a pull URL like https://github.com/osmlab/editor-layer-index/pull/1058/files add a .patch or .diff behind the number like https://github.com/osmlab/editor-layer-index/pull/1058.patch :-)
Oh, so very obvious ;)
comment:32 by , 4 years ago
Replying to GerdP:
Replying to stoecker:
I failed to download the patch in the "PR" as text, GitHub seems to hide that somehow?
If you have a pull URL like https://github.com/osmlab/editor-layer-index/pull/1058/files add a .patch or .diff behind the number like https://github.com/osmlab/editor-layer-index/pull/1058.patch :-)
Oh, so very obvious ;)
There probably is a hidden text or button in the UI somewhere, which links to that as well. I have no idea where...
comment:33 by , 4 years ago
Replying to GerdP:
Replying to Don-vip:
Review if the changes are OK and update JOSM wiki accordingly.
I fear you are ten steps ahead. I have zil knowledge about this stuff
Do I 1st have to learn git to help with this or can I skip this somehow?
This has nothing to do with Git or GitHub. You just have to look at the changes, review them, report them to JOSM wiki when they're OK. I've disabled the job until we fix at least Argentina sources as it makes Jenkins hang everyday.
follow-up: 35 comment:34 by , 4 years ago
OK, I guess I am able to transform the changes in the PR to the syntax used in the JOSM wiki. I still have no clue what actions the review includes.
follow-up: 36 comment:35 by , 4 years ago
Replying to GerdP:
OK, I guess I am able to transform the changes in the PR to the syntax used in the JOSM wiki.
Not necessary. Call "ant imageryindexdownload". You'll get the recent ELI file in our XML format.
I still have no clue what actions the review includes.
Essentially check if the changes improve the situation or are incorrect. Sadly we can't rely on any changes from ELI to be correct.
Any changes which improve the situation should be copied. Everything else added to ignore list.
comment:36 by , 4 years ago
Replying to stoecker:
Replying to GerdP:
OK, I guess I am able to transform the changes in the PR to the syntax used in the JOSM wiki.
Not necessary. Call "ant imageryindexdownload". You'll get the recent ELI file in our XML format.
My understanding is that the PR was not yet applied. I may be wrong with that.
I still have no clue what actions the review includes.
Essentially check if the changes improve the situation or are incorrect. Sadly we can't rely on any changes from ELI to be correct.
Any changes which improve the situation should be copied. Everything else added to ignore list.
Well, that's my problem. I don't know what changes I should look for and how to decide if new is better than old. E.g. I see that the PR removes a line containing "EPSG:4326",
How do I know if this is a good idea or not? It also adds new entries. What has to be done to verify those?
comment:37 by , 4 years ago
Put the projection changes aside unless they are absolutely needed (by testing the entry in JOSM). They made a lot of changes in this area and I need to update the integration tests to test the projections better. Focus on URLs. The timeouts we observe on Jenkins likely come from a bad URL.
comment:38 by , 4 years ago
OK, but don't wait for me. This morning I tried a few things with image layers in AR and got all kinds of messages that I don't yet understand. No progress so far.
comment:39 by , 4 years ago
It looks like there are several problems in the test code:
- no parallel execution
- more than 10'000 retries on the same server, even when the server returns 403 or, even worse, runs into a timeout
Here is a short analysis of the last run.
10'045 requests for http://mapa.educacion.gob.ar/geoserver/wms always returning 403 (taking one hour):
[junitlauncher] 2021-03-08 00:14:09.839 INFO: GET http://mapa.educacion.gob.ar/geoserver/wms?FORMAT=image/png&TRANSPARENT=TRUE&VERSION=1.1.1&SERVICE=WMS&REQUEST=GetMap&LAYERS=publico:analfabetismo_depto_2010&STYLES=&SRS=EPSG:2393&WIDTH=512&HEIGHT=512&BBOX=3486281.9505065,-30183771.1349102,23523790.2932958,-10146262.7921210 -> HTTP/1.1 403 (531 ms; 1018 B)
...
[junitlauncher] 2021-03-08 01:15:32.266 INFO: GET http://mapa.educacion.gob.ar/geoserver/wms?FORMAT=image/png&TRANSPARENT=TRUE&VERSION=1.1.1&SERVICE=WMS&REQUEST=GetMap&LAYERS=publico:analfabetismo_depto_2010&STYLES=&SRS=EPSG:30791&WIDTH=512&HEIGHT=512&BBOX=-12080989.0463769,-6480670.3377719,-12061421.1671358,-6461102.4585309 -> HTTP/1.1 403 (253 ms; 1018 B)
203 requests for http://geoadmin.agroindustria.gob.ar, each ending with a "Read timed out" after 5 minutes (!):
[junitlauncher] 2021-03-08 01:17:00.271 INFO: Skipping unsupported image format utfgrid
[junitlauncher] 2021-03-08 01:22:05.026 WARNING: java.net.SocketTimeoutException: Read timed out. Cause: java.net.SocketTimeoutException: Read timed out
[junitlauncher] 2021-03-08 01:22:05.024 INFO: GET http://geoadmin.agroindustria.gob.ar:443/geoserver/wms?FORMAT=image/png&TRANSPARENT=TRUE&VERSION=1.1.1&SERVICE=WMS&REQUEST=GetMap&LAYERS=spearfish&STYLES=&SRS=EPSG:4131&WIDTH=512&HEIGHT=512&BBOX=-102.6664550,-270.0080852,257.3335450,89.9919148 -> !!! (5 min 0 s)
[junitlauncher] java.net.SocketTimeoutException: Read timed out removed call stack
[junitlauncher] 2021-03-08 01:27:05.374 INFO: GET http://geoadmin.agroindustria.gob.ar:443/geoserver/wms?FORMAT=image/png&TRANSPARENT=TRUE&VERSION=1.1.1&SERVICE=WMS&REQUEST=GetMap&LAYERS=spearfish&STYLES=&SRS=EPSG:5463&WIDTH=512&HEIGHT=512&BBOX=486459.2657183,-30187847.7236765,40561475.9512968,9887168.9619020 -> !!! (5 min 0 s)
[junitlauncher] 2021-03-08 01:27:05.375 WARNING: java.net.SocketTimeoutException: Read timed out. Cause: java.net.SocketTimeoutException: Read timed out
[junitlauncher] java.net.SocketTimeoutException: Read timed out removed call stack
[junitlauncher] 2021-03-08 01:32:05.727 INFO: GET http://geoadmin.agroindustria.gob.ar:443/geoserver/wms?FORMAT=image/png&TRANSPARENT=TRUE&VERSION=1.1.1&SERVICE=WMS&REQUEST=GetMap&LAYERS=spearfish&STYLES=&SRS=EPSG:6794&WIDTH=512&HEIGHT=512&BBOX=-9412103.8675231,-33695717.9308253,30662912.8180554,6379298.7547532 -> !!! (5 min 0 s)
[junitlauncher] 2021-03-08 01:32:05.727 WARNING: java.net.SocketTimeoutException: Read timed out. Cause: java.net.SocketTimeoutException: Read timed out
[junitlauncher] java.net.SocketTimeoutException: Read timed out removed call stack
[junitlauncher] 2021-03-08 01:37:06.087 WARNING: java.net.SocketTimeoutException: Read timed out. Cause: java.net.SocketTimeoutException: Read timed out
[junitlauncher] java.net.SocketTimeoutException: Read timed out removed call stack
...
[junitlauncher] 2021-03-08 18:13:16.875 WARNING: java.net.SocketTimeoutException: Read timed out. Cause: java.net.SocketTimeoutException: Read timed out
[junitlauncher] java.net.SocketTimeoutException: Read timed out removed call stack
[junitlauncher] 2021-03-08 18:13:16.874 INFO: GET http://geoadmin.agroindustria.gob.ar:443/geoserver/wms?FORMAT=image/png&TRANSPARENT=TRUE&VERSION=1.1.1&SERVICE=WMS&REQUEST=GetMap&LAYERS=spearfish&STYLES=&SRS=EPSG:3673&WIDTH=512&HEIGHT=512&BBOX=-11782348.4487659,-25634747.7019831,28292668.2368126,14440268.9835954 -> !!! (5 min 0 s)
If there are also 10'000 calls to be expected, the test will run another 34 days!
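The 34-day figure follows from multiplying the request count by the 5-minute read timeout. A small sketch of that arithmetic, assuming every one of the 10'000 requests waits for the full timeout (the class and method names are only illustrative):

```java
import java.time.Duration;

class TimeoutEstimate {
    /** Total wall-clock time if every request waits for the full read timeout. */
    static Duration worstCase(long requests, Duration readTimeout) {
        return readTimeout.multipliedBy(requests);
    }

    public static void main(String[] args) {
        Duration total = worstCase(10_000, Duration.ofMinutes(5));
        // 50'000 minutes in total, split into whole days and the remaining hours.
        System.out.println(total.toDays() + " days, " + (total.toHours() % 24) + " hours");
        // prints 34 days, 17 hours
    }
}
```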
comment:40 by , 4 years ago
@mdk: What input did you use for your analysis?
I tried to find the wiki entries which might produce the 403 messages. I updated the "default entries" in JOSM to refresh the cached file mirror_https___josm.openstreetmap.de_maps. It contains only one URL that starts with mapa.educacion.gob.ar/geoserver, but neither the bounds nor the projections from the log in the comment above appear in the file, so I wonder what data the unit test is testing?
<url><![CDATA[http://mapa.educacion.gob.ar/geoserver/ows?service=wms&version=1.1.1&request=GetCapabilities]]></url>
<entry>
    <name>Educational map (WMS)</name>
    <name lang="es">Mapa Educativo (WMS)</name>
    <id>Mapa-Educativo-wms</id>
    <category>map</category>
    <type>wms_endpoint</type>
    <url><![CDATA[http://mapa.educacion.gob.ar/geoserver/ows?service=wms&version=1.1.1&request=GetCapabilities]]></url>
    <permission-ref>https://datos.gob.ar/acerca/seccion/Marco%20legal</permission-ref>
    <projections>
        <code>CRS:84</code>
        <code>EPSG:4326</code>
        <code>EPSG:3857</code>
    </projections>
The files in editor-layer-index don't contain the url. I really get frustrated here because I have no clue where to start.
What I am missing is something like the "steps to reproduce" in the TRAC tickets. I don't know what the problem is, I don't know how to reproduce it.
comment:41 by , 4 years ago
I just took a look at the console output of the failed (aborted) JOSM-Imagery-integration Jenkins job: https://josm.openstreetmap.de/jenkins/job/JOSM-Imagery-Integration/2642/jdk=JDK8/consoleFull
I don't know where the test gets the URLs from.
comment:42 by , 4 years ago
The URL in https://josm.openstreetmap.de/wiki/Maps/Argentina is http://mapa.educacion.gob.ar/geoserver/ows?service=wms&version=1.1.1&request=GetCapabilities
As far as I understood, request=GetCapabilities returns an XML with all possible layers and projections. In this case there are about 6673 <SRS> elements with different projections. Maybe the test code has a generic method to generate all possible URLs from this XML.
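To illustrate how such a count can be obtained, here is a minimal sketch that counts <SRS> elements in a capabilities document with the JDK's DOM parser. It runs on a tiny inline sample, not on the real server response, and it is not the code used by the JOSM test. A real WMS 1.1.1 response usually carries a DOCTYPE that the parser may try to fetch, which this sketch does not handle.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class SrsCounter {
    /** Counts <SRS> elements anywhere in a WMS 1.1.1 capabilities document. */
    static int countSrs(String capabilitiesXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(capabilitiesXml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("SRS").getLength();
        } catch (Exception e) {
            // Parsing problems are wrapped so callers need no checked exception handling.
            throw new IllegalStateException("Cannot parse capabilities XML", e);
        }
    }

    public static void main(String[] args) {
        // Hand-written sample with three projections on one layer.
        String sample = "<WMT_MS_Capabilities version=\"1.1.1\"><Capability><Layer>"
                + "<SRS>EPSG:4326</SRS><SRS>EPSG:3857</SRS><SRS>EPSG:2393</SRS>"
                + "</Layer></Capability></WMT_MS_Capabilities>";
        System.out.println(countSrs(sample)); // prints 3
    }
}
```

If the test really issues one GetMap request per <SRS> element and per layer, the request count grows with the product of both, which would fit the more than 10'000 requests seen in the log.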
comment:43 by , 4 years ago
Ah, thanks, that helps to understand a bit more. Now, what should happen after the first "server return 403"?
comment:44 by , 4 years ago
My simple approach would be to remove the wiki entry "Educational map (WMS)" from https://josm.openstreetmap.de/wiki/Maps/Argentina#EducationalmapWMS
comment:45 by , 4 years ago
Here is some information about the ows in the URL:
OWS is not a protocol. It's a stand-in term for (I believe) OGC Web Service - basically it means it's an endpoint that could be hosting any of the OGC services. It's commonly seen on GeoServer endpoints.
So for example:
http://www.example.com/geoserver/wms - would in theory be an endpoint for just WMS.
Whereas
http://www.example.com/geoserver/ows - would be an endpoint that could serve any of WMS, WFS, WCS, WMTS.
The problem is that another server could fail at any time. In case of a timeout the test could easily run for a month this way. So I think the assumed generic method should be changed. If a server fails with 403 or a timeout, the code should stop generating further tests.
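A rough sketch of what such a fail-fast loop could look like. This is not the actual test code: the class, the method, and the TIMEOUT marker are hypothetical, and the HTTP call is replaced by a function argument so the example runs without a network.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

class FailFastProbe {
    /** Hypothetical marker value standing in for a read timeout. */
    static final int TIMEOUT = -1;

    /**
     * Requests each projection in turn, but gives up on the whole server as soon
     * as one request returns 403 or times out. Returns the projections actually tried.
     */
    static List<String> probe(List<String> projections, ToIntFunction<String> request) {
        List<String> tried = new ArrayList<>();
        for (String srs : projections) {
            tried.add(srs);
            int status = request.applyAsInt(srs);
            if (status == 403 || status == TIMEOUT) {
                break; // the server refuses or hangs: further requests only waste time
            }
        }
        return tried;
    }

    public static void main(String[] args) {
        List<String> projections = Arrays.asList("EPSG:2393", "EPSG:30791", "EPSG:4326");
        // Stand-in for a real HTTP call: this fake server answers 403 to everything.
        System.out.println(probe(projections, srs -> 403)); // prints [EPSG:2393]
    }
}
```

With this shape, a server that always answers 403 costs one request instead of 10'000, and a hanging server costs one 5-minute timeout.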
As a workaround these two servers should be commented out:
- https://josm.openstreetmap.de/wiki/Maps/Argentina#EducationalmapWMS
- https://josm.openstreetmap.de/wiki/Maps/Argentina#MinistryofAgroindustryWMS
But maybe there are more tests which may fail this way afterwards.
Perhaps we should think about using a "normal" URL instead of the generic approach. Also, we should check whether EducationalmapWMS returns 403 "forbidden" because we hit the server with 10k+ requests each time the test runs. An admin of this site could block our test server because of abuse.
Probably should be reduced to once every 24h?