Modify

Opened 4 years ago

Last modified 4 years ago

#20433 assigned task

Imagery Integration tests

Reported by: GerdP Owned by: Don-vip
Priority: normal Milestone: Longterm
Component: Unit tests Version:
Keywords: imagery jenkins Cc: Don-vip, stoecker

Description

IIGTR the job is submitted every 6 hours but its runtime is > 7 hours.

Attachments (1)

20433.patch (1.9 KB ) - added by taylor.smock 4 years ago.
Enable parallel test execution and annotate ImageryPreferenceTestIT#testImageryEntryValidity with @Execution(ExecutionMode.CONCURRENT)

Download all attachments as: .zip

Change History (47)

comment:1 by stoecker, 4 years ago

Probably should be reduced to once every 24h?

comment:2 by Don-vip, 4 years ago

Owner: changed from team to Don-vip
Status: newassigned

It's a combination of:

  • regression from JUnit 5 migration which disabled the parallel execution
  • the cumulative increasing number of timeout errors

I changed the cron to once per 24h, and still have the parallel execution on my radar.

Any help to fix imagery sources which timeout will help to reduce the duration of this test and provide working entries to our users.

@Gerd what does "IIGTR" mean?

Last edited 4 years ago by Don-vip (previous) (diff)

comment:3 by Don-vip, 4 years ago

Component: CoreUnit tests

comment:4 by GerdP, 4 years ago

IIGTR: If I got that right

comment:5 by GerdP, 4 years ago

reg. help: I have no clue about all the parameters used in the WMS/TMS definitions or how the results of the test help to find what has to be changed :(
Isn't that something that has to be done by those who created the wiki entries?

in reply to:  5 comment:6 by Don-vip, 4 years ago

Replying to GerdP:

Isn't that something that has to be done by those who created the wiki entries?

You can't expect people who submitted a source several years ago to monitor it daily for production issues, change of url, broken certificates and so on.

comment:7 by Don-vip, 4 years ago

As for clues... Well for each error you have to dig: has the layer name changed, has the server url changed, has the server been decommissioned, are they blocking German IP addresses, and so on.

comment:8 by GerdP, 4 years ago

You can't expect people who submitted a source several years ago to monitor it daily for production issues, change of url, broken certificates and so on.

My idea is something like an automatic filtering of the wrong entries so that JOSM can still show the entry but with a flag that it is probably not working because of errors in the wiki definition.

Well for each error you have to dig.

Does that mean you contact someone and ask or do you have tools to find that out?

in reply to:  2 ; comment:9 by taylor.smock, 4 years ago

Replying to Don-vip:

  • regression from JUnit 5 migration which disabled the parallel execution

Oops. I'll see if I can fix that.

in reply to:  9 comment:10 by Don-vip, 4 years ago

Replying to taylor.smock:

Oops. I'll see if I can fix that.

It's OK, I should just have to configure the job to add required properties. It's pure Jenkins configuration, I can do it without modifying anything in the source tree I think.

comment:11 by taylor.smock, 4 years ago

There is a JUnit annotation for running parallel tests.

Unfortunately, it is currently considered experimental.

Anyway, I'll post a patch for it (I finished up the work for the patch about the time you posted).

Its fairly tiny. The hard part (for me) is extracting it from the rest of the stuff I've been modifying for #16567, and then ensuring it applies cleanly (for you).

by taylor.smock, 4 years ago

Attachment: 20433.patch added

Enable parallel test execution and annotate ImageryPreferenceTestIT#testImageryEntryValidity with @Execution(ExecutionMode.CONCURRENT)

comment:12 by Don-vip, 4 years ago

Doesn't setting the property at this location enable parallel mode for ALL tests?

Last edited 4 years ago by Don-vip (previous) (diff)

in reply to:  12 comment:13 by taylor.smock, 4 years ago

Replying to Don-vip:

Doesn't setting the property at this location enable parallel mode for ALL tests?

-Djunit.jupiter.execution.parallel.enabled=true just allows the @Execution annotations to be used.

If I added -Djunit.jupiter.execution.parallel.mode.default=concurrent, then yes.

Link to docs: https://junit.org/junit5/docs/5.7.0/user-guide/#writing-tests-parallel-execution (if you want to look at all the various config options).

Last edited 4 years ago by taylor.smock (previous) (diff)

comment:14 by Don-vip, 4 years ago

And of course I totally forgot to mention that I committed r17474 which is the main reason for the recent duration increase.

comment:15 by Don-vip, 4 years ago

@Taylor ok thank you, I didn't know about the annotation and thought the properties were the only way to go.

comment:16 by Don-vip, 4 years ago

In 17478/josm:

see #16567 - see #20433 - restore parallel execution of imagery integration test

in reply to:  15 comment:17 by taylor.smock, 4 years ago

Replying to Don-vip:

@Taylor ok thank you, I didn't know about the annotation and thought the properties were the only way to go.

No problem. There is a lot of new things in JUnit 5, and I am pretty certain I know less than half of the new features. In this case, I assumed that JUnit wouldn't force tests to be all parallel or all sequential. And to be fair, a good portion is marked as experimental (specifically, in this case, the concurrent execution).

Let me know how it worked on the server -- it took ~260 minutes on my machine with and without the patch.

comment:18 by Don-vip, 4 years ago

Test still takes 7 hours as follows:

  • ImageryPreferenceTestIT.AR-ign-wms => 3h39
  • ImageryPreferenceTestIT.AR-Mapa-Educativo-wms => 1h00
  • 20 tests take more than 1 min
  • 20 tests take between 15s and 1 min

comment:19 by Don-vip, 4 years ago

In 17517/josm:

see #20433 - don't loop over all tile sources if we face maximum allowed server time

comment:20 by Don-vip, 4 years ago

Zalitoar made a great PR to fix Argentina sources: https://github.com/osmlab/editor-layer-index/pull/1058/files

comment:21 by Don-vip, 4 years ago

Milestone: Longterm

comment:22 by Don-vip, 4 years ago

Ticket #20546 has been marked as a duplicate of this ticket.

comment:23 by mdk, 4 years ago

Several tests are failing because of invalid bounding boxes (see #20354).
Are the BBOX values hard coded or calculated?

comment:24 by GerdP, 4 years ago

The job seems to block anything else on Jenkins?

comment:25 by Don-vip, 4 years ago

Yes the Argentina entries make everything hang. You can help me by fixing them. See the ELI PR above.

comment:26 by Don-vip, 4 years ago

Keywords: imagery jenkins added
Summary: Is Jenkins job "JOSM-Imagery-Integration" OK?Imagery Integration tests

comment:27 by GerdP, 4 years ago

How can I help with this PR?

comment:28 by Don-vip, 4 years ago

Review if the changes are OK and update JOSM wiki accordingly.

in reply to:  28 ; comment:29 by GerdP, 4 years ago

Replying to Don-vip:

Review if the changes are OK and update JOSM wiki accordingly.

I fear you are ten steps ahead. I have zil knowledge about this stuff, I just use background images in JOSM and I understand that this is about the entries that I see in the corresponding JOSM menu.
So far I've cloned ​https://github.com/osmlab/editor-layer-index.git and I found https://josm.openstreetmap.de/wiki/Maps#Documentation and started to read, but got lost in all the details.

I failed to download the patch in the "PR" as text, GitHub seems to hide that somehow? I don't know how to use the PR in my local clone cause I don't know how to use git / GitHub etc.
Do I 1st have to learn git to help with this or can I skip this somehow?

Last edited 4 years ago by stoecker (previous) (diff)

comment:30 by stoecker, 4 years ago

I failed to download the patch in the "PR" as text, GitHub seems to hide that somehow?

If you have a pull URL like ​https://github.com/osmlab/editor-layer-index/pull/1058/files add a .patch or .diff behind the number like ​https://github.com/osmlab/editor-layer-index/pull/1058.patch :-)

in reply to:  30 ; comment:31 by GerdP, 4 years ago

Replying to stoecker:

I failed to download the patch in the "PR" as text, GitHub seems to hide that somehow?

If you have a pull URL like ​https://github.com/osmlab/editor-layer-index/pull/1058/files add a .patch or .diff behind the number like ​https://github.com/osmlab/editor-layer-index/pull/1058.patch :-)

Oh, so very obvious ;)

in reply to:  31 comment:32 by stoecker, 4 years ago

Replying to GerdP:

Replying to stoecker:

I failed to download the patch in the "PR" as text, GitHub seems to hide that somehow?

If you have a pull URL like ​https://github.com/osmlab/editor-layer-index/pull/1058/files add a .patch or .diff behind the number like ​https://github.com/osmlab/editor-layer-index/pull/1058.patch :-)

Oh, so very obvious ;)

There probably is a hidden text or button in the UI somewhere, which links to that as well. I have no idea where...

in reply to:  29 comment:33 by Don-vip, 4 years ago

Replying to GerdP:

Replying to Don-vip:

Review if the changes are OK and update JOSM wiki accordingly.

I fear you are ten steps ahead. I have zil knowledge about this stuff
Do I 1st have to learn git to help with this or can I skip this somehow?

This has nothing to do with Git or GitHub. You just have to look at the changes, review them, report them to JOSM wiki when they're OK. I've disabled the job until we fix at least Argentina sources as it makes Jenkins hang everyday.

comment:34 by GerdP, 4 years ago

OK, I guess I am able to transform the changes in the PR to the syntax used in the JOSM wiki. I still have no clue what actions the review includes.

in reply to:  34 ; comment:35 by stoecker, 4 years ago

Replying to GerdP:

OK, I guess I am able to transform the changes in the PR to the syntax used in the JOSM wiki.

Not necessary. Call "ant imageryindexdownload". You'll get the recent ELI file in our XML format.

I still have no clue what actions the review includes.

Essentially check if the changes improve the situation or are incorrect. Sadly we can't rely on any changes from ELI to be correct.

Any changes which improve the situation should be copied. Everything else added to ignore list.

in reply to:  35 comment:36 by GerdP, 4 years ago

Replying to stoecker:

Replying to GerdP:

OK, I guess I am able to transform the changes in the PR to the syntax used in the JOSM wiki.

Not necessary. Call "ant imageryindexdownload". You'll get the recent ELI file in our XML format.

My understanding is that the PR was not yet applied. I may be wrong with that.

I still have no clue what actions the review includes.

Essentially check if the changes improve the situation or are incorrect. Sadly we can't rely on any changes from ELI to be correct.

Any changes which improve the situation should be copied. Everything else added to ignore list.

Well, that's my problem. I don't know what changes I should look for and how to decide if new is better than old. E.g. I see that the PR removes a line containing "EPSG:4326",
How do I know if this is a good idea or not? It also adds new entries. What has to be done to verify those?

comment:37 by Don-vip, 4 years ago

Put the projection changes aside unless they are absolutely needed (by testing the entry in JOSM). They made a lot of changes in this area and I need to update the integration tests to test the projections better. Focus on URLs. The timeout we observe on Jenkins likely come from a bad URL.

comment:38 by GerdP, 4 years ago

OK, but don't wait for me. This morning I tried a few things image layers in AR and got all kinds of messages that I don't yet understand. No progress so far..

comment:39 by mdk, 4 years ago

It looks like there several problems in the test code:

  • no parallel execution
  • more than 10'000 retries on the same server, even when the server return 403 or even worst run into a timeout

Here is a short analyse of the last run.

10'045 requests for http://mapa.educacion.gob.ar/geoserver/wms always returning 403 (taking one hour):

[junitlauncher] 2021-03-08 00:14:09.839 INFO: GET http://mapa.educacion.gob.ar/geoserver/wms?FORMAT=image/png&TRANSPARENT=TRUE&VERSION=1.1.1&SERVICE=WMS&REQUEST=GetMap&LAYERS=publico:analfabetismo_depto_2010&STYLES=&SRS=EPSG:2393&WIDTH=512&HEIGHT=512&BBOX=3486281.9505065,-30183771.1349102,23523790.2932958,-10146262.7921210 -> HTTP/1.1 403 (531 ms; 1018 B)
...
[junitlauncher] 2021-03-08 01:15:32.266 INFO: GET http://mapa.educacion.gob.ar/geoserver/wms?FORMAT=image/png&TRANSPARENT=TRUE&VERSION=1.1.1&SERVICE=WMS&REQUEST=GetMap&LAYERS=publico:analfabetismo_depto_2010&STYLES=&SRS=EPSG:30791&WIDTH=512&HEIGHT=512&BBOX=-12080989.0463769,-6480670.3377719,-12061421.1671358,-6461102.4585309 -> HTTP/1.1 403 (253 ms; 1018 B)

203 request for http://geoadmin.agroindustria.gob.ar, ending with a "Read time out" after 5 minutes - for each request(!)

[junitlauncher] 2021-03-08 01:17:00.271 INFO: Skipping unsupported image format utfgrid
[junitlauncher] 2021-03-08 01:22:05.026 WARNING: java.net.SocketTimeoutException: Read timed out. Cause: java.net.SocketTimeoutException: Read timed out
[junitlauncher] 2021-03-08 01:22:05.024 INFO: GET http://geoadmin.agroindustria.gob.ar:443/geoserver/wms?FORMAT=image/png&TRANSPARENT=TRUE&VERSION=1.1.1&SERVICE=WMS&REQUEST=GetMap&LAYERS=spearfish&STYLES=&SRS=EPSG:4131&WIDTH=512&HEIGHT=512&BBOX=-102.6664550,-270.0080852,257.3335450,89.9919148 -> !!! (5 min 0 s)
[junitlauncher] java.net.SocketTimeoutException: Read timed out
removed call stack
[junitlauncher] 2021-03-08 01:27:05.374 INFO: GET http://geoadmin.agroindustria.gob.ar:443/geoserver/wms?FORMAT=image/png&TRANSPARENT=TRUE&VERSION=1.1.1&SERVICE=WMS&REQUEST=GetMap&LAYERS=spearfish&STYLES=&SRS=EPSG:5463&WIDTH=512&HEIGHT=512&BBOX=486459.2657183,-30187847.7236765,40561475.9512968,9887168.9619020 -> !!! (5 min 0 s)
[junitlauncher] 2021-03-08 01:27:05.375 WARNING: java.net.SocketTimeoutException: Read timed out. Cause: java.net.SocketTimeoutException: Read timed out
[junitlauncher] java.net.SocketTimeoutException: Read timed out
removed call stack
[junitlauncher] 2021-03-08 01:32:05.727 INFO: GET http://geoadmin.agroindustria.gob.ar:443/geoserver/wms?FORMAT=image/png&TRANSPARENT=TRUE&VERSION=1.1.1&SERVICE=WMS&REQUEST=GetMap&LAYERS=spearfish&STYLES=&SRS=EPSG:6794&WIDTH=512&HEIGHT=512&BBOX=-9412103.8675231,-33695717.9308253,30662912.8180554,6379298.7547532 -> !!! (5 min 0 s)
[junitlauncher] 2021-03-08 01:32:05.727 WARNING: java.net.SocketTimeoutException: Read timed out. Cause: java.net.SocketTimeoutException: Read timed out
[junitlauncher] java.net.SocketTimeoutException: Read timed out
removed call stack
[junitlauncher] 2021-03-08 01:37:06.087 WARNING: java.net.SocketTimeoutException: Read timed out. Cause: java.net.SocketTimeoutException: Read timed out
[junitlauncher] java.net.SocketTimeoutException: Read timed out
removed call stack
...
[junitlauncher] 2021-03-08 18:13:16.875 WARNING: java.net.SocketTimeoutException: Read timed out. Cause: java.net.SocketTimeoutException: Read timed out
[junitlauncher] java.net.SocketTimeoutException: Read timed out
removed call stack
[junitlauncher] 2021-03-08 18:13:16.874 INFO: GET http://geoadmin.agroindustria.gob.ar:443/geoserver/wms?FORMAT=image/png&TRANSPARENT=TRUE&VERSION=1.1.1&SERVICE=WMS&REQUEST=GetMap&LAYERS=spearfish&STYLES=&SRS=EPSG:3673&WIDTH=512&HEIGHT=512&BBOX=-11782348.4487659,-25634747.7019831,28292668.2368126,14440268.9835954 -> !!! (5 min 0 s)

If there are also 10'000 calls to be expected, the test will run another 34 days!

comment:40 by GerdP, 4 years ago

@mdk: What input did you use for your analyses?
I tried to find the wiki entries which might produce the 403 messages. I updated the "default entries" in JOSM to refresh cached file mirror_https___josm.openstreetmap.de_maps. It contains only one url that starts with mapa.educacion.gob.ar/geoserver but neither the bounds nor the projections from the log in comment:appear in the file, so I wonder what data the unit test is testing?

<url><![CDATA[http://mapa.educacion.gob.ar/geoserver/ows?service=wms&version=1.1.1&request=GetCapabilities]]></url>
<entry>
<name>Educational map (WMS)</name>
<name lang="es">Mapa Educativo (WMS)</name>
<id>Mapa-Educativo-wms</id>
<category>map</category>
<type>wms_endpoint</type>
<url><![CDATA[http://mapa.educacion.gob.ar/geoserver/ows?service=wms&version=1.1.1&request=GetCapabilities]]></url>
<permission-ref>https://datos.gob.ar/acerca/seccion/Marco%20legal</permission-ref>
<projections>
<code>CRS:84</code>
<code>EPSG:4326</code>
<code>EPSG:3857</code>
</projections>

The files in editor-layer-index don't contain the url. I really get frustrated here because I have no clue where to start.

What I am missing is something like the "steps to reproduce" in the TRAC tickets. I don't know what the problem is, I don't know how to reproduce it.

comment:41 by mdk, 4 years ago

I just take a look at the console output of the failed (aborted) JOSM-Imagery-integration Jenkins job: https://josm.openstreetmap.de/jenkins/job/JOSM-Imagery-Integration/2642/jdk=JDK8/consoleFull
I don't know where the test get gets the URLs from.

Last edited 4 years ago by mdk (previous) (diff)

comment:42 by mdk, 4 years ago

The URL in https://josm.openstreetmap.de/wiki/Maps/Argentina is http://mapa.educacion.gob.ar/geoserver/ows?service=wms&version=1.1.1&request=GetCapabilities

As far as I understood request=GetCapabilities returns an XML with all possible layers and projections. In this case there are about 6673 <SRS> elements with different projections. Maybe the test code has a generic method to generate all possible URLs from this XML.

comment:43 by GerdP, 4 years ago

Ah, thanks, that helps to understand a bit more. Now, what should happen after the first "server return 403"?

comment:44 by GerdP, 4 years ago

My simple approach would be to remove the wiki entry "Educational map (WMS)" from https://josm.openstreetmap.de/wiki/Maps/Argentina#EducationalmapWMS

comment:45 by mdk, 4 years ago

Here some information about the ows in the URL:

OWS is not a protocol. It's a stand-in term for (I believe) OGC Web Service - basically it means its an endpoint that could be hosting any of the OGC services. It's commonly seen on GeoServer endpoints.
So for example:
http://www.example.com/geoserver/wms - would in theory be an endpoint for just WMS.
Whereas
http://www.example.com/geoserver/ows - would be an endpoint that could serve any of WMS, WFS, WCS, WMTS.

The problem is, that anytime another server could fail. In case of a timeout the test could easily run a month this way. So I think the assumed generic method should be changed. If a server fails with 403 or timeout, the code should stop generation further tests.

As a workaround these two servers should be comment out:

But maybe there are more tests which may fail this way afterwards.

Perhaps we should think about using a "normal" URL instead of the generic approach. Also we should check if the EducationalmapWMS return 403 "forbidden" because we treat the server with 10k+ requests each time the test is running. An admin of this site could block our test server URL because of abuse.

Last edited 4 years ago by mdk (previous) (diff)

comment:46 by simon04, 4 years ago

JOSM-Integration tests are hanging on Jenkins (again)...

Modify Ticket

Change Properties
Set your email in Preferences
Action
as assigned The owner will remain Don-vip.
as The resolution will be set. Next status will be 'closed'.
to The owner will be changed from Don-vip to the specified user. Next status will be 'new'.
Next status will be 'needinfo'. The owner will be changed from Don-vip to GerdP.
as duplicate The resolution will be set to duplicate. Next status will be 'closed'. The specified ticket will be cross-referenced with this ticket.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.