Bug 705

Summary: JVM crashes when System.exit(0) is called w/ Mesa X11 SW Driver < 8.0
Product: [JogAmp] Jogl Reporter: Martin Hegedus <martin.hegedus>
Component: coreAssignee: Sven Gothel <sgothel>
Status: RESOLVED FIXED    
Severity: normal CC: gouessej, sgothel
Priority: P3    
Version: 2   
Hardware: pc_x86_32   
OS: linux   
Type: --- SCM Refs:
dc8d3bbbba7910b3e2453a0d7d37d2bf8a05f4b6 (Martin's native test 1) 4d7c527756f0410f31915e86815732ba912c1bee (Martin's native test 2) 5b47372590ec715647ebbd75d70c41ec7a64485a (Martin's fix) ff25c711fe4fa627004c57e0d308d06759156290 (cleanup) cea002636c4c8e1cf41288662fffc26bd24f90ac .. 94d3960757db292eeefb7acdfa1220b6b7168b98
Workaround: TRUE
Attachments: SimpleTest.java
test ouput
JVM error log for test.sh crash
test_dbg.sh output
JVM error log for test_dbg.sh crash
output (std and err) from SimpleTest
JVM error log for SimpleTest crash
output from SimpleTest with JOGL 2.0-bmanual-20111124
Version 2 of SimplTest
modified to create only one window
displayMultiple using glXCreateNewContext
patch for xdisplayclose mods

Description Martin Hegedus 2013-03-15 20:33:36 CET
Created attachment 423 [details]
SimpleTest.java

The attached SimpleTest program crashes the JVM when System.exit(0) is called.  The programs test.sh and test_dbg.sh also crash the JVM.  /deployment/archive/master/gluegen_646-joal_408-jogl_930-jocl_756/archive/jogamp-all-platforms.7z was used.  An older version of jogl, JOGL implementation version 2.0-bmanual-20111124, did not crash JVM.  (Note, I compiled the older version [2.0-bmanual-20111124] locally but used the provided build for the new one [2.0-b646-20130313].)

Attachments:
1) SimpleTest.java
Comment 1 Martin Hegedus 2013-03-15 20:35:35 CET
Ooops, hit return while still adding detail.  :(  Sorry about that.  Still adding more
Comment 2 Martin Hegedus 2013-03-15 20:43:19 CET
Created attachment 424 [details]
test ouput
Comment 3 Martin Hegedus 2013-03-15 20:44:18 CET
Created attachment 425 [details]
JVM error log for test.sh crash
Comment 4 Martin Hegedus 2013-03-15 20:45:56 CET
Created attachment 426 [details]
test_dbg.sh output
Comment 5 Martin Hegedus 2013-03-15 20:46:53 CET
Created attachment 427 [details]
JVM error log for test_dbg.sh crash
Comment 6 Martin Hegedus 2013-03-15 20:50:12 CET
Created attachment 428 [details]
output (std and err) from SimpleTest
Comment 7 Martin Hegedus 2013-03-15 20:51:40 CET
Created attachment 429 [details]
JVM error log for SimpleTest crash
Comment 8 Martin Hegedus 2013-03-15 20:57:52 CET
Created attachment 430 [details]
output from SimpleTest with JOGL 2.0-bmanual-20111124
Comment 9 Martin Hegedus 2013-03-15 20:59:08 CET
OK, finished adding attachments.  Let me know if you have any questions.
Comment 10 Martin Hegedus 2013-03-15 23:19:05 CET
I retrieved the latest version using
git clone git://jogamp.org/srv/scm/gluegen.git gluegen
git clone git://jogamp.org/srv/scm/jogl.git jogl

and compiled it with ant.

It seems to have compiled successfully, however various programs running under the test program junit.run also cause the JVM to crash.  SimpleTest still has the same behavior as when running under the latest aggregated autobuild.  If I comment out e.getWindow().dispose() then the JVM will not crash when System.exit(0) is called.
Comment 11 Sven Gothel 2013-03-20 09:14:17 CET
jogamp@jogamp05:~/projects/JOGL/jogl-bugs/705-JVMCrash$ java SimpleTest 
#SimpleTest:  Calling: GLProfile.getDefault()
libEGL warning: unsupported platform Windows
libEGL warning: unsupported platform Windows
#SimpleTest:  Calling: new GLCapabilities(glp)
#SimpleTest:  Calling: new GLCanvas(caps)
#SimpleTest:  Calling: new Frame("Test")
#SimpleTest:  Calling: frame.setSize(300, 300)
#SimpleTest:  Calling: addWindowListener(new WindowAdapter()
#SimpleTest:  Calling: canvas.addGLEventListener(new SimpleTest())
#SimpleTest:  Calling: frame.add(canvas)
#SimpleTest:  Calling: frame.setVisible(true)
#SimpleTest:  Calling: e.getWindow().dispose()
#SimpleTest:  Calling: System.exit(0)

Tested on Linux w/ NV and Mesa driver and latest aggregated build.

GL_RENDERER   = Mesa DRI Intel(R) Sandybridge Desktop 
GL_VERSION    = 3.0 Mesa 9.0
GL_VENDOR     = Intel Open Source Technology Center

No crash.

Note that AWT windowClosing() on a Frame
may only need frame.dispose() being called.

However, Julien has experienced a System.exit() might be 
required for JNLP Webstart - I don't know.

We test this w/  'TestAWT02WindowClosing'
in our unit tests.

Regardless whether System.exit() is recommended 
to be called or not - a crash shall not happen.
Maybe I need to run the test w/ an older Mesa driver.
Comment 12 Sven Gothel 2013-03-20 10:04:42 CET
Ok, tested on a vbox debian7:

GL_RENDERER   = Gallium 0.4 on llvmpipe (LLVM 0x209)
GL_VERSION    = 2.1 Mesa 8.0.5
GL_VENDOR     = VMware, Inc.

No crash w/ tip d9d0abc0974f42dbc1c7c91bc004b92729b8ae8a

http://jogamp.org/deployment/archive/master/gluegen_652-joal_413-jogl_936-jocl_761/

Please retry .. it's odd that the test behave so different.
Comment 13 Sven Gothel 2013-03-20 10:10:24 CET
-> normal, since not reproducible here
Comment 14 Martin Hegedus 2013-03-21 19:55:46 CET
I've test this some more and will shortly upload the latest SimpleTest program.  I have still not been able to resolve this.  However, here are some things I've noticed.

1)  The error is occurring the second time glXDestroyContext is called.  Therefore if two frames are created and disposed of, the error will occur on the second call to frame.dispose().
2)  Shared resource uses the SharedResourceRunner thread to access the glX routines.  This may cause a problem if all gl calls should be on the AWT-EventQueue thread.
3)  GLProfile.getDefault() and GLProfile.shutdown() can not be called from the AWT thread if the glx calls are being forced onto the AWT thread because it will block.  The thread chain would be AWT-EventQueue-0 -> SharedResourceRunner -> AWT-EventQueue-0
4)  If a Threading class method is called before GLProfile.getDefault() then a working thread is created for the gl calls instead of using the awt thread.  This is because the static routine in ThreadingImpl calls GLProfile.isAWTAvailable().  This is not necessarily a bug, just something to be aware of.
5)  For my system, it seems that calling glProfile.shutdown() before System.exit(0) works the best.  For some reason if glProfile.shutdown() is called by the system exit hook then things occasionally hang.
6)  If I create and destroy two frames rather than one, then both the older and never version of jogl crash the JVM.
7)  If I don't allow GLCanvas's display method to be called then the program terminates OK.  It seems the act of calling makeCurrent and/or release inside GLDrawableHelper.invokeGLImpl sets up the issue for glXDestoryContext to crash.  I gather that is because makeCurrent creates the context the first time it is called.

I'll try digging around some more and updating Mesa.
Comment 15 Martin Hegedus 2013-03-21 20:00:43 CET
Created attachment 434 [details]
Version 2 of SimplTest

This version will crash the older and newer versions of jogl because two frames are being created and disposed.
Comment 16 Martin Hegedus 2013-03-28 19:29:08 CET
I was unable to dedicate the time and risk to updating my Mesa since the OS for the computer is too old.

So I dug around some more in jogl

Overall, the error revolves around using XCloseDisplay and glXDestroyContext in conjunction.  If I set markAllDisplaysUnclosable to true in X11Util then the JVM does not crash.  Or, if glXDestroyContext is not called then the JVM does not crash.  Looking at the order these methods are called, things seem OK.  In other words, glXDestroyContext is always called before XCloseDisplay.  And they are always done on the same thread.  Except to say that the shared resource is working on the main-SharedResourceRunner thread and frames are working on the AWT-EventQueue-0 thread.

Any thoughts on how to proceed?  Is there a simple test code which tests this?  Is there a recommended way of setting markAllDisplaysUnclosable to true?

I looked at the TestAWT02WindowClosing program, but it does not create and destroy contexts.
Comment 17 Martin Hegedus 2013-03-28 20:04:58 CET
Oh, I do want to mention that I did compile displayMultiple02.c and it seems to execute successfully.
Comment 18 Sven Gothel 2013-03-28 21:15:20 CET
(In reply to comment #17)
> Oh, I do want to mention that I did compile displayMultiple02.c and it seems
> to execute successfully.

I see - you followed my path of horrors of buggy GLX impl. regarding to multithreading :)

So:
  GL_VENDOR      Brian Paul
  GL_RENDERER    Mesa X11
  GL_VERSION     2.1 Mesa 7.0

is buggy like AMD's GLX hack - you claim ?

Since you are already pretty familiar w/ our stuff,
maybe you like to add such quirk into GLRendererQuirks ?

Maybe 'DontCloseX11DisplayConnection' 
to be enabled w/ AMD as detected in X11Util and Mesa 7.0 ?

Now we would need to know (or guess) the Mesa version range for this.
Yes, this comes pretty late (X11 display connection already created),
however .. hopefully its soon enough at least when kicked off by the 
shared-resource-runner who queries all profiles.

Since this seems to be a bug or related to such an old Mesa version,
I would like to keep this shortcut .. short :) .. i.e. not loosing too much time on it.
Can you do it ?
Comment 19 Martin Hegedus 2013-03-28 21:47:23 CET
There seems to be more to this.

I changed displayMultiple02 so it uses glXCreateNewContext instead of glXCreateContext and it now crashes.  So the connection seems involve XOpenDisplay, XCloseDisplay, glxCreateNewContext, glxMakeCurrent, and glXDestroyContext.

Then I took the second XOpenDisplay out or the original displayMultiple02 (i.e. I only open one window in displayMultiple02's testOrder) and I get the following X11 error the second time testOrder is called:

X Error of failed request:  BadRequest (invalid request code or no such operation)
  Major opcode of failed request:  0 ()
  Serial number of failed request:  15
  Current serial number in output stream:  15


Is it confirmed that the following can be done?
disp1 = XOpenDisplay(NULL);
...do stuff..
XCloseDisplay(disp1);
disp1 = XOpenDisplay(NULL);
...do stuff..
XCloseDisplay(disp1);

I'll upload my modified version of displaymultiple02
Comment 20 Martin Hegedus 2013-03-28 21:59:50 CET
Created attachment 446 [details]
modified to create only one window

Modified so testOrder creates only one window.  This generates the following output


-xlock    (XLockDisplay): 0
Normal order: Create #1
Normal order: Destroy #1.0
Normal order: Destroy #1.X
Normal order: Success - no bug
Reverse order: Create #1
X Error of failed request:  BadRequest (invalid request code or no such operation)
  Major opcode of failed request:  0 ()
  Serial number of failed request:  15
  Current serial number in output stream:  15
Comment 21 Martin Hegedus 2013-03-28 23:31:44 CET
Created attachment 447 [details]
displayMultiple using glXCreateNewContext

This is the version of displayMultiple02 which uses glXCreateNewContext instead of glXCreateContext.  The output is ass follows:

-xlock    (XLockDisplay): 0
Normal order: Create #1
Normal order: Create #2
Normal order: Destroy #1.0
Normal order: Destroy #1.X
Normal order: Destroy #2.0
Normal order: Destroy #2.1
Normal order: Destroy #2.2
Segmentation fault
Comment 22 Martin Hegedus 2013-03-28 23:55:32 CET
Just added my version of displayMonitor02 which uses glXCreateNewContext.

In summary
displayMonitor02 which uses glXCreateContext for 2X2 windows works.
displayMonitor02_mch which uses glXCreateContext for 2x1 window gives X11 error
displayMonitor02_new_mch which uses glXCreateNewContext for 2x2 windows seg faults.

If I take out the calls to glXCreateContext, glXMakeCurrent, and glXDestroyContext in displayMonitor02_mch, it works.  I assume the same is true for displayMonitor02_new_mch.

If I comment out XCloseDisplay in any of the above programs, the program executes successfully.

In regards to JOGL.

Should I put in a DontCloseX11DisplayConnection that can be activated at the command line with a -D...?  I'm not sure if the error experienced here is a X11, Mesa, driver, or whatever issue.  So maybe a catch all is appropriate.
Comment 23 Sven Gothel 2013-04-05 01:42:01 CEST
(In reply to comment #22)
> 
> Should I put in a DontCloseX11DisplayConnection that can be activated at the
> command line with a -D...?  I'm not sure if the error experienced here is a
> X11, Mesa, driver, or whatever issue.  So maybe a catch all is appropriate.

Any updates about your patch, Martin ? I thought you wanted to offer a git patch 
or pull request ?
Comment 24 Martin Hegedus 2013-04-05 02:08:30 CEST
Sven,  sent you an email about that two days ago, after your suggestion about unified diff.  I wanted to try once more to push the changes onto a branch using git and got an error.  Didn't hear back from you so thought you were busy.  If this git push thing doesn't work I'll upload a unified git diff file.

Here's what I've done in regards to trying to push changes to git

>git clone git://jogamp.org/srv/scm/jogl.git jogl
>cd jogl
>git branch mesa7_quirk_closedpy
>git checkout mesa7_quirk_closedpy
...modify files...
>git config --global user.email "martin.hegedus@gmail.com"
>git commit -a -m "XDisplayClose mod"
[mesa7_quirk_closedpy eb948fa] XDisplayClose mod
 4 files changed, 121 insertions(+), 6 deletions(-)
>git branch
  master
* mesa7_quirk_closedpy
>git remote -v
origin  git://jogamp.org/srv/scm/jogl.git (fetch)
origin  git://jogamp.org/srv/scm/jogl.git (push)
>git push origin mesa7_quirk_closedpy
fatal: The remote end hung up unexpectedly
Comment 25 Martin Hegedus 2013-04-05 02:39:17 CEST
Created attachment 452 [details]
patch for xdisplayclose mods

Patch was created with:
git format-patch master

I made changes to GLContextImpl.java, GLProfile.java, X11Util.java, GLRendererQuirks.java in regards to adding a DontCloseX11DisplayConnection quirk and a nativewindow.debug.X11Util.HasX11CloseDisplayBug debug property.  The quirk is a little hacky since I had to add it after the calls to initProfilesForDevice(defaultEGLDevice) and initProfilesForDevice(defaultDesktopDevice) in GLProfile since GLRendererQuirks were not set up till after that.  I wanted to add it before the first call to XOpenDisplay but could not figure out how to access glGetString without going through an initialized GLContext.  Also, the workaround can be broken if the shared resource does not use the Mesa X11 driver but later a display is opened with the X11 driver.  I assume this could happen if a remote display is chosen.  Further coding, if the need arises, may be required.
Comment 26 Sven Gothel 2013-04-06 06:27:58 CEST
(In reply to comment #25)
> Created attachment 452 [details]
> patch for xdisplayclose mods
> 
> Patch was created with:
> git format-patch master
> 
> I made changes to GLContextImpl.java, GLProfile.java, X11Util.java,
> GLRendererQuirks.java in regards to adding a DontCloseX11DisplayConnection
> quirk and a nativewindow.debug.X11Util.HasX11CloseDisplayBug debug property.
> The quirk is a little hacky since I had to add it after the calls to
> initProfilesForDevice(defaultEGLDevice) and
> initProfilesForDevice(defaultDesktopDevice) in GLProfile since
> GLRendererQuirks were not set up till after that.  I wanted to add it before
> the first call to XOpenDisplay but could not figure out how to access
> glGetString without going through an initialized GLContext.  Also, the
> workaround can be broken if the shared resource does not use the Mesa X11
> driver but later a display is opened with the X11 driver.  I assume this
> could happen if a remote display is chosen.  Further coding, if the need
> arises, may be required.

Sorry, didn't get the patch file via email - bit now it's here and I will review later. Thank you!
Comment 27 Sven Gothel 2013-04-14 06:45:40 CEST
I guess we can move the 'early' quirk detection from GLProfile to the X11 SharedResourceImpl,
which will free GLProfile from such quirks ..

Other than that .. I will patch-in your changes:
  - w/o date+name in source code (this belongs to a git commit)
  - checking whether we can reuse other 'detection' code

I will put your name in the git commit message as orig. author, if that is OK w/ you.
I will also try to commit w/ you as author .. and clean up afterwards .. lets see.

Good job, hopefully I can do that tomorrow/Monday. Thank you.
Comment 28 Martin Hegedus 2013-04-14 06:55:12 CEST
Cool and thank you.  I am glad I was able to contribute.
Comment 29 Sven Gothel 2013-04-15 03:36:26 CEST
Tested on a vbox instance w/   [kubuntu 8.04.1] and confirmed your reported behavior:

Works: Mesa glx/dri drivers!
  GL_VENDOR      Mesa project: www.mesa3d.org
  GL_RENDERER    Mesa GLX Indirect
  GL_VERSION     1.4 (2.1 Mesa 7.0.3-rc2)

Crash: Mesa swx11 drivers!
  GL_VENDOR      Brian Paul
  GL_RENDERER    Mesa X11
  GL_VERSION     2.1 Mesa 7.0.3-rc2
Comment 30 Sven Gothel 2013-04-16 06:25:57 CEST
.. see git logs

Note:

Validated that Quirk DontCloseX11Display is not required w/ Mesa >= 8.0
    
Tested w/ kubuntu 12.04, Mesa 8.0.4 (glx/dri/gallium and X11 SW rasterizer)