Bug 480

Summary: etc/test.bat causes HotSpot crash on b271 and later (Win XP 32-bit, ATI)
Product: [JogAmp] Jogl
Reporter: Wade Walker <wwalker3>
Component: opengl
Assignee: Sven Gothel <sgothel>
Status: VERIFIED FIXED
Severity: enhancement
CC: carlo.salinari, odimond
Priority: ---    
Version: 2   
Hardware: All   
OS: all   
Type: ---
SCM Refs: ab93183b90e83b9aebc29031c7b88b9a3dc58ff5
Workaround: ---
Attachments:
    etc\test.bat log file
    HotSpot error file
    etc\test.bat log file, this time with debug info and proper MIME type
    HotSpot error file with proper MIME type
    HotSpot error file for OneTriangleAWT test case from tutorials page

Description Wade Walker 2011-03-06 17:15:11 CET
Created attachment 234
etc\test.bat log file

All JOGL builds starting at b271 cause a HotSpot crash when I run etc\test.bat. I get similar results when trying to run unit tests.
Comment 1 Wade Walker 2011-03-06 17:17:19 CET
Created attachment 235
HotSpot error file

This is from Oracle JDK 1.6.0_24.
Comment 2 Wade Walker 2011-03-06 17:20:44 CET
The system is Win XP Pro SP3, 32-bit. The graphics card is ATI Mobility Radeon x300 at driver version 8.162.0.0 (8/3/2005), which is the latest version.
Comment 3 Wade Walker 2011-03-06 18:03:53 CET
Created attachment 236
etc\test.bat log file, this time with debug info and proper MIME type
Comment 4 Wade Walker 2011-03-06 18:04:50 CET
Created attachment 237
HotSpot error file with proper MIME type
Comment 5 Wade Walker 2011-03-09 16:11:16 CET
I noticed last night that when you turn on debug flags, the HotSpot crash goes away, and there's just an NPE in the debug log.
Comment 6 Wade Walker 2011-03-10 01:48:12 CET
This bug was caused by commit 8adc04788a6d9dd44de5a4636b46d14dbb70b799 (GLCapabilities enhancements: Choosing, All-Available, Data Handling (X11, WGL and EGL)). It's a huge commit that affects 42 files, so it may be difficult for me to find the cause by looking at the commit diffs :)

I checked the commit right before it, and that one was fine. Commit 8adc04788a6d9dd44de5a4636b46d14dbb70b799 itself freezes rather than crashing; when I apply the next commit (Fix WindowsDummyWGLDrawable: onscreen && !pbuffer, a one-line fix), I see the HotSpot crash.
Comment 7 Wade Walker 2011-03-13 00:08:07 CET
This looks like a threading problem. When JOGL calls a native WGL function like wglGetPixelFormatAttribivARB, it has the correct address, and calls the function properly. But the first thing the function in the ATI DLL does is try to get another function pointer from a jump table like this:

69346b70:   mov %fs:0xbf0,%eax
69346b76:   mov 0x431f8(%eax),%eax
69346b7c:   jmp *0x758(%eax)

%fs:0xbf0 is an offset inside the Thread Information Block (TIB). The range from 0x714 to 0xbf4 is reserved for GL. After the first instruction, %eax is zero, which implies that GL isn't set up properly on this thread (otherwise, there would be some memory address stored in %fs:0xbf0).

I'll keep looking to try to find the root cause.
Comment 8 Wade Walker 2011-03-15 01:53:20 CET
It turns out the problem is that no GL context is current on the thread. Apparently wglMakeCurrent() is what writes the data into the reserved GL area of the Thread Information Block, which other GL functions rely on later. Because that data is still zero when the WGL function is called, the call crashes.
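
A minimal model of this failure mode in plain Java, as a sketch only. It assumes the driver keeps a per-thread pointer that is filled in when a context is made current and is followed without a null check. All names here (GlThreadSlotModel, ContextData, DispatchTable, queryFails) are hypothetical and come from neither JOGL, WGL, nor the ATI driver; the Java NullPointerException merely stands in for the native access violation.

```java
import java.util.function.IntSupplier;

// Hypothetical model only: none of these names come from JOGL, WGL, or the ATI driver.
final class GlThreadSlotModel {
    /** Stands in for the driver's function table (the target of "jmp *0x758(%eax)"). */
    static final class DispatchTable {
        final IntSupplier getPixelFormatAttrib = () -> 32; // arbitrary placeholder result
    }

    /** Stands in for the per-context data that the thread slot points to. */
    static final class ContextData {
        final DispatchTable table = new DispatchTable();
    }

    // Stands in for the reserved GL slot at %fs:0xbf0; empty until a context is made current.
    private static final ThreadLocal<ContextData> SLOT = new ThreadLocal<>();

    static void makeCurrent(ContextData ctx) { SLOT.set(ctx); }

    static void release() { SLOT.remove(); }

    // Like the driver stub: read the slot, follow it to the table, call through it, no null check.
    static int queryPixelFormatAttrib() {
        return SLOT.get().table.getPixelFormatAttrib.getAsInt();
    }

    // True if the query fails because the slot is empty on the calling thread.
    static boolean queryFails() {
        try {
            queryPixelFormatAttrib();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("before makeCurrent, query fails: " + queryFails());
        makeCurrent(new ContextData());
        System.out.println("after makeCurrent, query fails: " + queryFails());
        release();
    }
}
```

Because the slot is per-thread, making a context current on one thread does nothing for a query issued from another thread, which is consistent with the crash appearing on whichever thread reaches the WGL call first.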

When I insert a call to GLProfile.initSingleton( false ) and turn on debugging flags, it makes a GL context current on the thread as a side effect, and this crash goes away.

This problem also shows up in AWT programs that call awt.GLCanvas.chooseGraphicsConfiguration() indirectly as a result of calling frame.setVisible( true ). I'll attach a stack trace for that error too.
Comment 9 Wade Walker 2011-03-15 01:56:17 CET
Created attachment 242
HotSpot error file for OneTriangleAWT test case from tutorials page

Same error, different path down to a WGL function with no GL context set on the main thread.
Comment 10 Sven Gothel 2011-03-20 18:52:22 CET
*** Bug 469 has been marked as a duplicate of this bug. ***
Comment 11 Sven Gothel 2011-03-21 07:17:36 CET
commit ab93183b90e83b9aebc29031c7b88b9a3dc58ff5
Author: Sven Gothel <sgothel@jausoft.com>
Date:   Mon Mar 21 07:13:45 2011 +0100

    Fix Bug #480 (attempt) - ATI + WinXP: make context current for ARB PFD queries/selection
    
    TODO: Validate if bug is actually relates to the 'old' ATI Windows driver for old GPU's like X300 etc
    and unrelated to the actual Windows version !
    
    Also ensure that the no pixelformat is being set on external context/HDC.

http://jogamp.org/git/?p=jogl.git;a=commit;h=ab93183b90e83b9aebc29031c7b88b9a3dc58ff5

Thanks to Wade for allowing me to use his machine - plus his bug triage!

Maybe we need to verify whether this bug also happens on Windows versions newer than WinXP 32-bit
and/or is related to a specific range of ATI Windows drivers/GPUs!
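
The approach named in the commit message above, making a context current around the ARB pixel-format queries, can be sketched in plain Java. This is a model under stated assumptions, not the code from the commit: CurrentContextGuard, withContext, and queryPixelFormatCount are hypothetical names, the string stands in for a real context handle, and the returned count is an arbitrary placeholder.

```java
import java.util.function.Supplier;

// Hypothetical model; names are illustrative and are not JOGL or WGL APIs.
final class CurrentContextGuard {
    // Stands in for the thread's "current GL context" state.
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static boolean isCurrent() {
        return CURRENT.get() != null;
    }

    // Runs a query with a context current on this thread, then restores the previous state.
    static <T> T withContext(String context, Supplier<T> query) {
        String previous = CURRENT.get();
        CURRENT.set(context);
        try {
            return query.get();
        } finally {
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        }
    }

    // Stands in for a pixel-format query that is only safe with a current context.
    static int queryPixelFormatCount() {
        if (!isCurrent()) {
            throw new IllegalStateException("no GL context current on this thread");
        }
        return 3; // arbitrary placeholder value
    }

    public static void main(String[] args) {
        int n = withContext("dummy-context", CurrentContextGuard::queryPixelFormatCount);
        System.out.println("formats=" + n + ", currentAfter=" + isCurrent());
    }
}
```

Restoring the previous state in a finally block keeps the guard from leaving a context current on a thread that did not have one before the query, even if the query throws.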
Comment 12 Carlo Salinari 2011-03-21 17:51:40 CET
I've just revived an old Dell with an ATI X600 (RV370). Does it qualify for verifying the fix? How should I proceed?
Comment 13 Wade Walker 2011-03-21 20:57:40 CET
I think your old system should be great to verify this fix. All you need to do is download the latest development build from http://jogamp.org/deployment/autobuilds/master/ (check with the build server at https://jogamp.org/chuck/ to make sure Sven's change went in).

Once you download the build, unzip it, cd into the dir above etc, jar, and lib, then type etc\test.bat and see if you get a crash :)

Also, please check the output of test.bat (it goes into a log file as well as to the stdout) to make sure it reports your card correctly (I've seen it report Microsoft's GL driver sometimes, but haven't had time to recheck after this fix yet).
Comment 14 Carlo Salinari 2011-03-22 17:54:26 CET
Verified, works fine under both Win XP 32 and Win 7 32.

GL_VENDOR is ok. I was seeing Microsoft as well, because I ran the test before installing the ATI drivers, and the test output is appended to the log file. Maybe the same happened to you.