Bug 1428

Summary: Performance regression in context makeCurrent() logic from 2.3.2 to 2.4.0
Product: [JogAmp] Jogl Reporter: Kevin Nittler <kevin>
Component: openglAssignee: Sven Gothel <sgothel>
Status: RESOLVED WORKSFORME    
Severity: normal CC: gouessej
Priority: P4    
Version: 2.4.0   
Hardware: All   
OS: all   
Type: DEFECT SCM Refs:
Workaround: ---
Attachments: results
TestSharedContextListNEWT2 - 2.3.2 results.log
TestSharedContextListNEWT2 - 2.4.0 results.log
TestSharedContextListNEWT2 - 2.3.2 profiling.png
TestSharedContextListNEWT2 - 2.4.0 profiling.png

Description Kevin Nittler 2021-04-07 20:05:00 CEST
Created attachment 850 [details]
results

I have an application that can have an arbitrary number of NewtCanvasAWT's, each with an associated Animator thread and GLWindow context. These contexts are all shared with a permanent "master" context that is created as a dummy GLAutoDrawable from createDummyAutoDrawable().

This system works very well in 2.3.2 even with many contexts. However, in 2.4.0 I am getting massive, intermittent hangs when the library tries to make each Animator's context current.

This change in behavior can be seen clearly in the TestSharedContextListNEWT2 junit test (at least in the case of my Windows 10 x64 platform).

The "TestSharedContextListNEWT2 - 2.X.X results.log" files show my results from the junit test in the corresponding build as well as my system info. It can be seen that 2.4.0 takes 4.135 sec to complete vs 2.773 sec for 2.2.3. Visually, the animation of the gears stutters in 2.4.0.

In the "TestSharedContextListNEWT2 - 2.X.X profiling.png" files, I extend the animation time of TestSharedContextListNEWT2 to gather some profiling information. It can be seen that 2.4.0 is spending about three times as long in the makeCurrent() logic as in 2.3.2.

I have not tested on platforms other than Windows x64. Please let me know if there is other information needed or other testing I can do.

This issue goes away when the number of contexts/Animators is reduced to one. Has something changed with the management of multiple contexts in 2.4.0? Possibly something that is resulting in more calls to makeCurrent(), or more inter-thread blocking while doing so?
Comment 1 Kevin Nittler 2021-04-07 20:07:09 CEST
Created attachment 851 [details]
TestSharedContextListNEWT2 - 2.3.2 results.log
Comment 2 Kevin Nittler 2021-04-07 20:07:39 CEST
Created attachment 852 [details]
TestSharedContextListNEWT2 - 2.4.0 results.log
Comment 3 Kevin Nittler 2021-04-07 20:08:05 CEST
Created attachment 853 [details]
TestSharedContextListNEWT2 - 2.3.2 profiling.png
Comment 4 Kevin Nittler 2021-04-07 20:08:29 CEST
Created attachment 854 [details]
TestSharedContextListNEWT2 - 2.4.0 profiling.png
Comment 5 Julien Gouesse 2021-04-07 22:40:25 CEST
It would be fine if you could test several release candidates to help us to find approximately when the regression was introduced.
Comment 6 Kevin Nittler 2021-04-07 22:44:50 CEST
(In reply to Julien Gouesse from comment #5)
Will do.
Comment 7 Kevin Nittler 2021-04-08 20:47:30 CEST
The regression occurs in gluegen commit 330dad069dee5a0cc0480cf5cd9052000004223a.

I have jogl built to commit 52cc68870604e406a7225d23f563ae035299dadc when testing this.
Comment 8 Julien Gouesse 2021-04-09 18:22:16 CEST
Are you really sure this commit causes this bug?
https://jogamp.org/cgit/gluegen.git/commit/?id=330dad069dee5a0cc0480cf5cd9052000004223a
Comment 9 Kevin Nittler 2021-04-09 20:15:25 CEST
(In reply to Julien Gouesse from comment #8)

Yes. I was very surprised to. I have been looking more into it.

I can revert all of the changes to the previous commit except for build.xml and Platform.java and still have the regression. These two files were responsible for the gluegen-rt.dll -> gluegen_rt.dll change. This implies the .dll name change is itself causing the issue. I do not understand why.

To reiterate, it doesn't appear to have anything to do with these aspects of the commit:

- Recognize new Java9+ version string as of JEP 223
- Added 'JNI_OnLoad_gluegen_rt' to recognize statical linked JNI code
- Added JNI_VERSION_1_8 to jni/jni.h
Comment 10 Kevin Nittler 2021-04-09 22:51:33 CEST
I have further confirm that the .dll name is the issue.

I can build the latest version of gluegen and jogl with the single change of 

private static final String libBaseName = "gluegen-rt";

in Platform.java (instead of "gluegen_rt"), and then manually rename the gluegen .dll that is produced by the build script from gluegen_rt.dll to gluegen-rt.dll. This fixes the issue.

I do not have enough experience with Java library loading to understand what is happening here. Maybe there are special considerations that occur when the .jar and .dll names match exactly?

I do not think it is JEP 223 related since I had the problem even when I removed JVM_JNI8.c, which I believe should prevent the static linking.
Comment 11 Julien Gouesse 2023-01-16 12:16:53 CET
Is it still reproducible with Java 17?
Comment 12 Sven Gothel 2023-01-24 08:52:49 CET
Hi Kevin,

I browsed through GLContext* changes, nothing much to see there.
Then you also claim in comment 7 - comment 10 
that it seems to be a 'weird sideeffect' of some sorts?

Glad we have a long test history with our CI, 
I do not see a performance regression here

<https://jogamp.org/chuck/job/jogl/label=windows-x86_64/lastCompletedBuild/testReport/com.jogamp.opengl.test.junit.jogl.acore/TestSharedContextListNEWT2/history/>

Can you reproduce this w/ Java17 and
https://jogamp.org/deployment/archive/master/gluegen_950-joal_668-jogl_1514-jocl_1154/

?

Please reopen for version 2.5.0 if persisting.
Comment 13 Kevin Nittler 2023-11-02 18:18:26 CET
(In reply to Sven Gothel from comment #12)

I upgraded my application to 2.5.0 and Bellsoft JDK/JRE 17.0.9.

The issue is no longer present. I still have no idea what was happening, but it is resolved.

Thank you all for your work on this project.