1428 – Performance regression in context makeCurrent() logic from 2.3.2 to 2.4.0

Bug 1428 - Performance regression in context makeCurrent() logic from 2.3.2 to 2.4.0

Summary: Performance regression in context makeCurrent() logic from 2.3.2 to 2.4.0

Status:	RESOLVED WORKSFORME

Alias:	None

Product:	Jogl
Classification:	JogAmp
Component:	opengl (show other bugs)
Version:	2.4.0
Hardware:	All all

Importance:	P4 normal
Assignee:	Sven Gothel

URL:

Depends on:
Blocks:

Reported:	2021-04-07 20:05 CEST by Kevin Nittler
Modified:	2023-11-02 18:18 CET (History)
CC List:	1 user (show)

See Also:
Type:	DEFECT
SCM Refs:
Workaround:	---

Attachments
results (417.08 KB, application/x-7z-compressed) 2021-04-07 20:05 CEST, Kevin Nittler	Details
TestSharedContextListNEWT2 - 2.3.2 results.log (27.09 KB, text/plain) 2021-04-07 20:07 CEST, Kevin Nittler	Details
TestSharedContextListNEWT2 - 2.4.0 results.log (24.96 KB, text/plain) 2021-04-07 20:07 CEST, Kevin Nittler	Details
TestSharedContextListNEWT2 - 2.3.2 profiling.png (220.68 KB, image/png) 2021-04-07 20:08 CEST, Kevin Nittler	Details
TestSharedContextListNEWT2 - 2.4.0 profiling.png (219.79 KB, image/png) 2021-04-07 20:08 CEST, Kevin Nittler	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Kevin Nittler 2021-04-07 20:05:00 CEST

Created attachment 850 [details]
results

I have an application that can have an arbitrary number of NewtCanvasAWT's, each with an associated Animator thread and GLWindow context. These contexts are all shared with a permanent "master" context that is created as a dummy GLAutoDrawable from createDummyAutoDrawable().

This system works very well in 2.3.2 even with many contexts. However, in 2.4.0 I am getting massive, intermittent hangs when the library tries to make each Animator's context current.

This change in behavior can be seen clearly in the TestSharedContextListNEWT2 junit test (at least in the case of my Windows 10 x64 platform).

The "TestSharedContextListNEWT2 - 2.X.X results.log" files show my results from the junit test in the corresponding build as well as my system info. It can be seen that 2.4.0 takes 4.135 sec to complete vs 2.773 sec for 2.2.3. Visually, the animation of the gears stutters in 2.4.0.

In the "TestSharedContextListNEWT2 - 2.X.X profiling.png" files, I extend the animation time of TestSharedContextListNEWT2 to gather some profiling information. It can be seen that 2.4.0 is spending about three times as long in the makeCurrent() logic as in 2.3.2.

I have not tested on platforms other than Windows x64. Please let me know if there is other information needed or other testing I can do.

This issue goes away when the number of contexts/Animators is reduced to one. Has something changed with the management of multiple contexts in 2.4.0? Possibly something that is resulting in more calls to makeCurrent(), or more inter-thread blocking while doing so?

Comment 1 Kevin Nittler 2021-04-07 20:07:09 CEST

Created attachment 851 [details]
TestSharedContextListNEWT2 - 2.3.2 results.log

Comment 2 Kevin Nittler 2021-04-07 20:07:39 CEST

Created attachment 852 [details]
TestSharedContextListNEWT2 - 2.4.0 results.log

Comment 3 Kevin Nittler 2021-04-07 20:08:05 CEST

Created attachment 853 [details]
TestSharedContextListNEWT2 - 2.3.2 profiling.png

Comment 4 Kevin Nittler 2021-04-07 20:08:29 CEST

Created attachment 854 [details]
TestSharedContextListNEWT2 - 2.4.0 profiling.png

Comment 5 Julien Gouesse 2021-04-07 22:40:25 CEST

It would be fine if you could test several release candidates to help us to find approximately when the regression was introduced.

Comment 6 Kevin Nittler 2021-04-07 22:44:50 CEST

(In reply to Julien Gouesse from comment #5)
Will do.

Comment 7 Kevin Nittler 2021-04-08 20:47:30 CEST

The regression occurs in gluegen commit 330dad069dee5a0cc0480cf5cd9052000004223a.

I have jogl built to commit 52cc68870604e406a7225d23f563ae035299dadc when testing this.

Comment 8 Julien Gouesse 2021-04-09 18:22:16 CEST

Are you really sure this commit causes this bug?
https://jogamp.org/cgit/gluegen.git/commit/?id=330dad069dee5a0cc0480cf5cd9052000004223a

Comment 9 Kevin Nittler 2021-04-09 20:15:25 CEST

(In reply to Julien Gouesse from comment #8)

Yes. I was very surprised to. I have been looking more into it.

I can revert all of the changes to the previous commit except for build.xml and Platform.java and still have the regression. These two files were responsible for the gluegen-rt.dll -> gluegen_rt.dll change. This implies the .dll name change is itself causing the issue. I do not understand why.

To reiterate, it doesn't appear to have anything to do with these aspects of the commit:

- Recognize new Java9+ version string as of JEP 223
- Added 'JNI_OnLoad_gluegen_rt' to recognize statical linked JNI code
- Added JNI_VERSION_1_8 to jni/jni.h

Comment 10 Kevin Nittler 2021-04-09 22:51:33 CEST

I have further confirm that the .dll name is the issue.

I can build the latest version of gluegen and jogl with the single change of 

private static final String libBaseName = "gluegen-rt";

in Platform.java (instead of "gluegen_rt"), and then manually rename the gluegen .dll that is produced by the build script from gluegen_rt.dll to gluegen-rt.dll. This fixes the issue.

I do not have enough experience with Java library loading to understand what is happening here. Maybe there are special considerations that occur when the .jar and .dll names match exactly?

I do not think it is JEP 223 related since I had the problem even when I removed JVM_JNI8.c, which I believe should prevent the static linking.

Comment 11 Julien Gouesse 2023-01-16 12:16:53 CET

Is it still reproducible with Java 17?

Comment 12 Sven Gothel 2023-01-24 08:52:49 CET

Hi Kevin,

I browsed through GLContext* changes, nothing much to see there.
Then you also claim in comment 7 - comment 10 
that it seems to be a 'weird sideeffect' of some sorts?

Glad we have a long test history with our CI, 
I do not see a performance regression here

<https://jogamp.org/chuck/job/jogl/label=windows-x86_64/lastCompletedBuild/testReport/com.jogamp.opengl.test.junit.jogl.acore/TestSharedContextListNEWT2/history/>

Can you reproduce this w/ Java17 and
https://jogamp.org/deployment/archive/master/gluegen_950-joal_668-jogl_1514-jocl_1154/

?

Please reopen for version 2.5.0 if persisting.

Comment 13 Kevin Nittler 2023-11-02 18:18:26 CET

(In reply to Sven Gothel from comment #12)

I upgraded my application to 2.5.0 and Bellsoft JDK/JRE 17.0.9.

The issue is no longer present. I still have no idea what was happening, but it is resolved.

Thank you all for your work on this project.