Bug 1398 - OpenGL context operations must be executed on the main thread
Summary: OpenGL context operations must be executed on the main thread
Status: IN_PROGRESS
Alias: None
Product: Jogl
Classification: JogAmp
Component: macosx (show other bugs)
Version: 2.4.0
Hardware: All macosx
: P4 major
Assignee: Sven Gothel
URL:
Depends on:
Blocks:
 
Reported: 2019-10-22 17:48 CEST by jani
Modified: 2020-02-22 18:33 CET (History)
2 users (show)

See Also:
Type: DEFECT
SCM Refs:
4fad4869d4a929739c830f6ce3ac171d8dd5427a f4f92cdc0eb89c62070a865601527097e6d5cc72 ff780fc11602fb79a7ce1dcf879fdaeb865b9fa8 78b96b89a68ff35969aea83de294cd3cc1178f26
Workaround: ---


Attachments
Moves OpenGL context manipulations to main thread (2.33 KB, patch)
2019-10-22 17:56 CEST, jani
Details | Diff
Corrected patch to also work on macOS 10.8 (2.09 KB, patch)
2019-11-19 22:14 CET, jani
Details | Diff
Test case (11.58 KB, patch)
2020-02-07 22:25 CET, jani
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description jani 2019-10-22 17:48:20 CEST
Crashed Thread:        4

Exception Type:        EXC_BAD_INSTRUCTION (SIGABRT)
Exception Codes:       0x0000000000000001, 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

Application Specific Information:
-[NSOpenGLContext setView:] must be called from the main thread.

Thread 4 Crashed:
0   libsystem_kernel.dylib        	0x00007fff6eec547a __pthread_kill + 10
1   libsystem_pthread.dylib       	0x00007fff6ef82707 pthread_kill + 384
2   libsystem_c.dylib             	0x00007fff6ee4da08 abort + 120
3   libjvm.dylib                  	0x000000011fd1daa1 os::abort(bool) + 25
4   libjvm.dylib                  	0x000000011fe45e96 VMError::report_and_die() + 2304
5   libjvm.dylib                  	0x000000011fd1f6e6 JVM_handle_bsd_signal + 1131
6   libjvm.dylib                  	0x000000011fd1b92b signalHandler(int, __siginfo*, void*) + 47
7   libsystem_platform.dylib      	0x00007fff6ef77b1d _sigtramp + 29
8   com.apple.GeForceGLDriver     	0x00007fff3259720c 0x7fff32299000 + 3138060
9   libjogl_desktop.dylib         	0x0000000134ca3474 createContext + 388
10  libjogl_desktop.dylib         	0x0000000134d0a8c5 Java_jogamp_opengl_macosx_cgl_CGL_createContext0__JJZJZLjava_lang_Object_2I + 149


This also occurs in updateContext and setContextView.
Comment 1 jani 2019-10-22 17:50:47 CEST
BTW, this occurs on MacOS Catalina (10.15).
Comment 2 jani 2019-10-22 17:56:19 CEST
Created attachment 824 [details]
Moves OpenGL context manipulations to main thread
Comment 3 jani 2019-11-19 22:14:50 CET
Created attachment 825 [details]
Corrected patch to also work on macOS 10.8

On macOS 10.8 resources may get released prior to the code block executing on the main thread causing objc_msgSend to a deallocated instance.
Comment 4 Sven Gothel 2019-11-29 02:53:24 CET
Thank you. 
AFAIK I have already resolved this issue in one of my latest commits.
I will check this soon.
Earmarked for version 2.4.0 ofc.
Comment 5 Sven Gothel 2019-12-08 08:37:17 CET
Duplicate of Bug 1393

AFAIK fixes related to main-thread in Bug 1393 comment 2,
or commit b12a80e386b12d9d8fa63cf07124f8da989dcd04

    Bug 1393: Run orderFront0(=setVisible) async off-thread on AppKit after sync AppKit NSWindow creation
    
    MacOS 10.14.6 + OpenJDK11U produces occasional freezes on AppKit Main Thread
    
    Latest manual tests after resolving Bug 1389
    disclosed a few occasional freezes using NEWT + Java11.
    
    These are related to probable AWT changes since Java8,
    as these do not occur with Java8.
    
    Fix: Spun off orderFront0(=setVisible) async off-thread on AppKit after sync AppKit NSWindow creation.
    
    This fix also aligns the macos createWindow code with the new simplified ios implementation,
    see commit 004c67c73a0309158c30929cd0d6513e23f34803

commit e33aa16904d8abddaeceb1374ffa45bd45a96210

and

commit 7e76df3a05b7eb2404cb4584ee0b34ea287eb9bf (validation)

Please reopen 1393 (if I closed it and bugs persist)

*** This bug has been marked as a duplicate of bug 1393 ***
Comment 6 jani 2020-01-06 18:13:34 CET
I have just reproduced this issue after pulling the latest master and typecasting all (except junit) position(int), limit​(int), flip​(), clear​(), mark​(), reset​(), and rewind​() NIO buffer calls I could find. I'm currently building with JDK9 but a need the software to run on JRE8 hence the typecast requirement.

Applying the attached patch fixes the issue.
Comment 7 Sven Gothel 2020-02-07 07:08:09 CET
(In reply to jani from comment #6)

It would help, if you could provide a unit test case
reproducing this issue.
As of today, this bug doesn't show up in our unit tests,
as they all pass on latest MacOS update.

Regardless, let me double check on your patch

1) NSOpenGLContext* createContext(NSOpenGLContext* share, NSView* view, ..
Java: CGL.createContext(long shareContext, long view, ..

[ctx setView:view]; -> main thread blocking

2) void setContextView(NSOpenGLContext* ctx, NSView* view)
Java: CGL.setContextView(long ctx, long view)

[ctx setView:view]; -> main thread blocking

3) void updateContext(NSOpenGLContext* ctx)
Java: CGL.updateContext(long ctx)

[ctx update]; -> main thread blocking

+++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++

So this change would render all three method blocking until done,
potentially introducing a deadlock.
None of these methods were blocking before.

Now let me check their caller...

3) Java: CGL.updateContext(long ctx)

3.1) Caller 'jogamp.opengl.macosx.cgl.MacOSXCGLContext.drawableUpdatedNotify()'
I added a 'useAppKit' flag in 'jogamp.opengl.macosx.cgl.MacOSXCGLContext'
for testing months ago. 
Due to issues running on 'RunOnMainThread blocking', 
I have kept this to false.

I would need to retest also against your unit test.

Sideeffects to spawning of methods to the MainThread are always sensitive in nature, prone to cause a deadlock.

To provide a reliable resolution, I very often need to allow the caller to accept a delayed async transaction, i.e. not expecting the result yet and hence perform the action on MainThread _without_ blocking.

See evolution of Bug 1393, disclosing some of the sensitivity in this domain.

NOTICE: Same is true for 'jogamp.opengl.macosx.cgl.MacOSXCGLContext.getUpdateHandle()'
which also acts on the 'useAppKit' flag.

+++

2) Java: CGL.setContextView(long ctx, long view)

2.1) Caller 'jogamp.opengl.macosx.cgl.MacOSXCGLContext.NSOpenGLImpl.associateDrawable(boolean)'

CGL.setContextView(long ctx, long view) only called if 'null==backingLayerHost',
aka older onscreen Java-AWT on MacOS?

Good news this method potentially can be issued deferred on MainThread, non-blocking. This is already done for the 

However, I don't see the use case for onscreen Java-AWT,
so only a NEWT onscreen case might be related.

Would like to see your unit test.

+++

1) Java: CGL.createContext(long shareContext, long view, ..

1.1) Caller: 'jogamp.opengl.GLContextImpl.createContextARB(long, boolean)'

Probably as hard to resolve as 3.1)

+++++++++++++++++++++++

Bottom line, it might be OK to at least see to a non blocking resolution.

Information I need to have first:

A) Unit test exposing this issue.

B) Detailed description of running system exposing this issue.

C) Original application 'suffering' having this issue.

Please note that my systems unit tests all pass on Catalina with latest XCode 11.
I also tested with property 'nativewindow.debug.OSXUtil.MainThreadChecker' set,
i.e. running using MacOS MainThreadChecker.
While some soft issues may still occur, no crash has been experienced.
See Bug 1393 comment 2

But I am sure we will resolve this issue for all parties involved if committed.
Comment 8 jani 2020-02-07 22:25:01 CET
Created attachment 838 [details]
Test case

System info:

Model: MacBookPro11,3, BootROM 157.0.0.0.0, 4 processors, Quad-Core Intel Core i7, 2.5 GHz, 16 GB, SMC 2.19f12
Graphics: kHW_IntelIrisProItem, Intel Iris Pro, spdisplays_builtin
Graphics: kHW_NVidiaGeForceGTX750MItem, NVIDIA GeForce GT 750M, spdisplays_pcie_device, 2 GB
Memory Module: BANK 0/DIMM0, 8 GB, DDR3, 1600 MHz, 0x80AD, 0x484D54343147533641465238412D50422020
Memory Module: BANK 1/DIMM0, 8 GB, DDR3, 1600 MHz, 0x80AD, 0x484D54343147533641465238412D50422020
AirPort: spairport_wireless_card_type_airport_extreme (0x14E4, 0x134), Broadcom BCM43xx 1.0 (7.77.105.1 AirPortDriverBrcmNIC-1429)
Bluetooth: Version 7.0.0f8, 3 services, 27 devices, 1 incoming serial ports
Network Service: Wi-Fi, AirPort, en0
Serial ATA Device: APPLE SSD SM0512F, 500.28 GB
USB Device: USB 3.0 Bus
USB Device: Apple Internal Keyboard / Trackpad
USB Device: BRCM20702 Hub
USB Device: Bluetooth USB Host Controller
Thunderbolt Bus: MacBook Pro, Apple Inc., 17.1


Command line:

./make.sh && ./Bug1398macOSContextOpsOnMainThread /Library/Java/JavaVirtualMachines/adoptopenjdk11.jdk/Contents/Home/lib/jli/libjli.dylib


The attachment contains a diff for a portion of the native launcher code from the "suffering" application and a simple Java class to reproduce the issue. There is no need for the JFrame/GLCanvas etc...only thing needed is:

static {
        GLProfile.initSingleton();
}


I also pulled the repos today before testing to make sure I was up-to-date.
Comment 9 Sven Gothel 2020-02-08 12:29:21 CET
(In reply to jani from comment #8)
Excellent, thank for the test case and description jani.

So I assume this test case reliably reproduces the issue
or is it sporadic?

I will test hopefully be able to test this case soon.

We have to see if this is just a corner case of using XCode < 11
or not.

However, I agree with the general sentiment that an overall solution 
for the setView and createContext should be found.
I continue thinking about such and will tackle it from the java side.

Goal is to have these items async on MainThread if anyhow possible,
similar to the calayer invocations.
This guarantees a dead-lock free implementation.

I also understand the setView case for non calayer invocation now,
so all use case question should be resolved by now.
Comment 10 jani 2020-02-08 14:27:45 CET
It is repeatable every time. If I build with [ctx setView] it will always crash, if I push it to main thread it always works and I have not seen any deadlock issues. I've had a patched version of JOGL bundled with a product since Nov 2019 and have not heard of issues so far.

Here is my build setup:

Xcode 11.1
Build version 11A1027

openjdk version "11.0.5" 2019-10-15
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.5+10)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.5+10, mixed mode)


Let me know if there is anything else I can do or help with. Maybe I can take a look at how the CALayer calls are handled and see if I can figure how to make my patch async :D
Comment 11 Sven Gothel 2020-02-08 19:50:38 CET
(In reply to jani from comment #10)
> It is repeatable every time. If I build with [ctx setView] it will always
> crash, 

Good that your test case crashes reliably, which eases validation

> if I push it to main thread it always works and I have not seen any
> deadlock issues. 
> I've had a patched version of JOGL bundled with a product since Nov 2019 and 
> have not heard of issues so far.

That is hardly a qualifying statement.

All unit tests passed as-is <https://jogamp.org/chuck/job/jogl/1503/label=macosx-10_6-x86_64-nvidia/testReport/>

Plus validation from a application maintainer testing on MacOS.

As I wrote in comment 7, I already ran into a deadlock with a similar change.

Further I would like to identify the difference which causes the crash
versus our normal working use-case.

+++

> Let me know if there is anything else I can do or help with. Maybe I can take a 
> look at how the CALayer calls are handled and see if I can figure how to make 
> my patch async :D

I will comment on this next week.
Comment 12 Sven Gothel 2020-02-22 18:33:13 CET
Thank you Jani for elaborating on this issue here.

I finally was able to reproduce the issue and figured out
the culprit introducing this crash
within '[NSOpenGLContext setView:]'.

XCode 11 _with_ using its default SDK 'macosx10.15' introduces this crash,
while using SDK 'macosx10.11' does not (even using XCode 11).

Below a list of commits adding this native test.

I will follow up with a fix.

Thank you.

+++

commit 4fad4869d4a929739c830f6ce3ac171d8dd5427a

    Bug 1398: Importing Jani's native test, attempting to reproduce the crash using XCode 11 and JOGL 2.3.2 as well as current tip
    
    This change imports 'jani@nexcus.com' patch as
    reported and provided in Bug 1398.
    
    I can not execute this patch properly,
    as received a error message regarding missing Info.plist at start.


commit f4f92cdc0eb89c62070a865601527097e6d5cc72

    Bug 1398: Fixing native test allowing its execution using JOGL 2.3.2 and latest tip using XCode 11
    
    Changes to test
    - Using own 'NSApplicationMain' entry to avoid the 'missing Info.plist' message!
    
    - Configurable CLASSPATH and LIBPATH at compile time.
      Note that the java.library.path is now hardcoded as well.
    
    - Don't close stderr in test, just fflush
    
    - Don't close the JVM after launch via 'die(env)', let it run.
    
    - Java: Add GLEventListener RedSquareES2 to see something in action @ 30fps
    
    - Java: GLCanvase visible bounds


commit ff780fc11602fb79a7ce1dcf879fdaeb865b9fa8

    Bug 1398: Refine test case: Make classpath and libpath runtime configurable + show JOGL version


commit 78b96b89a68ff35969aea83de294cd3cc1178f26

    Bug 1398: Crash only occurs @ -[NSOpenGLContext setView:] when using XCode 11 _and_ its default SDK 'macosx10.15'
    
    This patch demonstrates that using the SDK 'macosx10.11' does not cause the crash @ -[NSOpenGLContext setView:].
    
    SDK 'macosx10.15' enforces Apple's own Cargo Cult of 'main-thread' by throwing a SIGILL signal (or SIGABRT)
    - essentially an exception.
    
    This surely renders our code officially invalid due to this policy,
    i.e. we are not allowed to issue [* setView] on any non main-thread.
    
    +++
    
    The crash occurs independently of used Java version on Java 8 - 11,
    as well as on JogAmp 2.3.2 - current master tip.
    
    +++
    
    The initial remedy to issue said action on the main-thread in a blocking/wait
    manner has the risk to deadlock, due to
    
    1) [NSOpenGLContext setView:] itself using a mutex (Thanks to Ken Harris's analysis)
    
    and
    
    (2) in case where we are 'thread hopping':
    - [main-thread] Event like 'window ready' -> kick off action on EDT-thread *blocking*
    - [EDT-thread] Create stuff incl OpenGLContext -> kick off setView on main-thread *blocking*
    
    This has to be further investigated.
    
    This crash finally has been reliably reproduced now.