Bug 1192

Summary: JamVM crash in Jam_CallVoidMethod called from Java_jogamp_newt_driver_x11_DisplayDriver_DispatchMessages0
Product: [JogAmp] Newt Reporter: Xerxes Rånby <xerxes>
Component: x11Assignee: Sven Gothel <sgothel>
Status: RESOLVED INVALID    
Severity: normal CC: xerxes
Priority: ---    
Version: 2.3.2   
Hardware: embedded_arm   
OS: linux   
Type: --- SCM Refs:
576ba0b8334dae8ab8b0b2686422d450c7f360da
Workaround: ---
Attachments: jamvm --enable-tracedll log
not-us.log

Description Xerxes Rånby 2015-08-11 16:18:59 CEST
java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-1~deb7u1+rpi1)
JamVM (build 2.0.0, inline-threaded interpreter with stack-caching)

Hardware: Quad Core ARMv7 Raspberry Pi 2
OS: Rasbian with /opt/vc GPU driver

Reproducer:
startx
#open a terminal and run
wget http://labb.zafena.se/jogamp/vc4/libgdx-demo-pax-britannica-1.6.3-armhf.jar
java -jamvm -jar libgdx-demo-pax-britannica-1.6.3-armhf.jar 

(gdb) bt
#0  0x76d973b8 in executeJava () at interp.c:2416
#1  0x76d7e0a8 in executeMethodVaList (ob=ob@entry=0x68a3f878, 
    class=0x6877d2c8, mb=0x1718a58, jargs=...) at execute.c:129
#2  0x76d7f39c in Jam_CallVoidMethod (env=<optimized out>, 
    obj=<optimized out>, methodID=0x1718a58) at jni.c:1184
#3  0x5fa4ab20 in Java_jogamp_newt_driver_x11_DisplayDriver_DispatchMessages0
    ()
   from /tmp/jogamp_0001/file_cache/jln6586747877065205123/jln1540222421010353978/libnewt.so
#4  0x76da198c in done () at callNativeEABIHardARM.S:231
#5  0x76da198c in done () at callNativeEABIHardARM.S:231

This bug appears to be confined to using JamVM, for example running using Zero do not reproduce the bug.

The bug do not trigger all the time, thus the testcase may run using jamvm.
Usually the bug triggers early before the main window is displayed.

However many of the JNI signatures in the comments of the src/newt/native/X11Display.c is wrong.
This find calls for a thorough investigation if all the arguments to the many JNI methods called from the DispatchMessages0 uses the right argument types. Using the wrong type may cause Java -> JNI or JNI -> Java truncation depending on the JVM implementation.
Comment 1 Xerxes Rånby 2015-08-11 16:52:18 CEST
One test run emitted this interesting ArrayIndexOutOfBoundsException
with jogamp.nativewindow.jawt.JAWTUtil in the backtrace.

pi@raspberrypi ~ $ java -jamvm -jar libgdx-demo-pax-britannica-1.6.3-armhf.jar 
java.lang.ArrayIndexOutOfBoundsException: 692
	at sun.awt.XSettings$Update.getCARD8(XSettings.java:140)
	at sun.awt.XSettings$Update.updateOne(XSettings.java:249)
	at sun.awt.XSettings$Update.update(XSettings.java:229)
	at sun.awt.XSettings.update(XSettings.java:56)
	at sun.awt.X11.XAWTXSettings.getUpdatedSettings(XAWTXSettings.java:140)
	at sun.awt.X11.XAWTXSettings.initPerScreenXSettings(XAWTXSettings.java:96)
	at sun.awt.X11.XAWTXSettings.initXSettings(XAWTXSettings.java:61)
	at sun.awt.X11.XAWTXSettings.<init>(XAWTXSettings.java:53)
	at sun.awt.X11.XToolkit.loadXSettings(XToolkit.java:1504)
	at sun.awt.X11.XToolkit.initXSettingsIfNeeded(XToolkit.java:1475)
	at sun.awt.X11.XToolkit.lazilyLoadDesktopProperty(XToolkit.java:1445)
	at java.awt.Toolkit.getDesktopProperty(Toolkit.java:1826)
	at jogamp.nativewindow.jawt.JAWTUtil$3.run(JAWTUtil.java:433)
	at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:302)
	at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:745)
	at java.awt.EventQueue.access$300(EventQueue.java:103)
	at java.awt.EventQueue$3.run(EventQueue.java:706)
	at java.awt.EventQueue$3.run(EventQueue.java:704)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
	at java.awt.EventQueue.dispatchEvent(EventQueue.java:715)
	at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:242)
	at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:161)
	at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:150)
	at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:146)
	at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:138)
	at java.awt.EventDispatchThread.run(EventDispatchThread.java:91)
glGetError 0x500
glGetError 0x500
glGetError 0x500
Segmentation fault

gdb java core
warning: Could not load shared library symbols for 7 libraries, e.g. /tmp/jogamp_0001/file_cache/jln6586747877065205123/jln1540222421010353978/libgluegen-rt.so.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
Core was generated by `java -jamvm -jar libgdx-demo-pax-britannica-1.6.3-armhf.jar'.
Program terminated with signal 11, Segmentation fault.
#0  0x76d973b8 in executeJava () at interp.c:2416
2416	interp.c: No such file or directory.
(gdb) bt
#0  0x76d973b8 in executeJava () at interp.c:2416
#1  0x76d7e0a8 in executeMethodVaList (ob=ob@entry=0x68a3f878, 
    class=0x6877d2c8, mb=0x1718a58, jargs=...) at execute.c:129
#2  0x76d7f39c in Jam_CallVoidMethod (env=<optimized out>, 
    obj=<optimized out>, methodID=0x1718a58) at jni.c:1184
#3  0x5fa4ab20 in ?? ()
#4  0x5fa4ab20 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Comment 2 Sven Gothel 2015-08-11 17:18:58 CEST
(In reply to comment #0)
> However many of the JNI signatures in the comments of the
> src/newt/native/X11Display.c is wrong.

Yes .. that reminds me of a security critical code review,
where one said: don't you ever add comments!

> This find calls for a thorough investigation if all the arguments to the
> many JNI methods called from the DispatchMessages0 uses the right argument
> types. Using the wrong type may cause Java -> JNI or JNI -> Java truncation
> depending on the JVM implementation.

The JNI methods themselves are validated by the compiler,
since they are compared to the javah generated headers. Check.

The JNI -> Java method callbacks may be an issue here,
since they use anonymous var-args. 
We had an issue w/ those already, see Bug 1166.

I don't know how we can validate those automatically.

Further, I don't know why such mistakes won't break i586 machines,
which are being tested w/ our jenkins setup.

We shall consider adding an arm32 jenkins node
only using reliable drivers, i.e. software rendering and such
w/o binary blobs.

(all .. as discussed in IIRC)
Comment 3 Sven Gothel 2015-08-11 17:20:09 CEST
> glGetError 0x500
> Segmentation fault
> 
> gdb java core
> warning: Could not load shared library symbols for 7 libraries, e.g.
> /tmp/jogamp_0001/file_cache/jln6586747877065205123/jln1540222421010353978/
> libgluegen-rt.so.
> Use the "info sharedlibrary" command to see the complete listing.
> Do you need "set solib-search-path" or "set sysroot"?
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
> Core was generated by `java -jamvm -jar
> libgdx-demo-pax-britannica-1.6.3-armhf.jar'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x76d973b8 in executeJava () at interp.c:2416
> 2416	interp.c: No such file or directory.
> (gdb) bt
> #0  0x76d973b8 in executeJava () at interp.c:2416
> #1  0x76d7e0a8 in executeMethodVaList (ob=ob@entry=0x68a3f878, 
>     class=0x6877d2c8, mb=0x1718a58, jargs=...) at execute.c:129
> #2  0x76d7f39c in Jam_CallVoidMethod (env=<optimized out>, 
>     obj=<optimized out>, methodID=0x1718a58) at jni.c:1184
> #3  0x5fa4ab20 in ?? ()
> #4  0x5fa4ab20 in ?? ()
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)


-> methodID=0x1718a58

We may need to add DBG_PRINTF for all our method-ids ..
Comment 4 Sven Gothel 2015-08-11 17:24:23 CEST
(In reply to comment #2)
> We shall consider adding an arm32 jenkins node
> only using reliable drivers, i.e. software rendering and such
> w/o binary blobs.

.. and a 'good known' JVM of course.

In short: testing w/ good known, working components
to validate our code base only.
Comment 5 Sven Gothel 2015-08-11 17:34:42 CEST
(In reply to comment #3)
> > (gdb) bt
> > #0  0x76d973b8 in executeJava () at interp.c:2416
> > #1  0x76d7e0a8 in executeMethodVaList (ob=ob@entry=0x68a3f878, 
> >     class=0x6877d2c8, mb=0x1718a58, jargs=...) at execute.c:129
> > #2  0x76d7f39c in Jam_CallVoidMethod (env=<optimized out>, 
> >     obj=<optimized out>, methodID=0x1718a58) at jni.c:1184
> > #3  0x5fa4ab20 in ?? ()
> > #4  0x5fa4ab20 in ?? ()
> > Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> 
> 
> -> methodID=0x1718a58
> 
> We may need to add DBG_PRINTF for all our method-ids ..

Also .. you might want to use a debug build, i.e. not stripping symbols:
    -Dc.compiler.debug=true \
    -Djavacdebuglevel="source,lines,vars" \

This should disclose at least:
> > #3  0x5fa4ab20 in ?? ()
Comment 6 Xerxes Rånby 2015-08-13 17:13:21 CEST
Created attachment 718 [details]
jamvm --enable-tracedll log

Recompiled and used javmvm with --enable-tracedll
trace log:
<DLL: Looking up dispatch_eglChooseConfig0 internally>
<DLL: Calling JNI method com/jogamp/opengl/egl/EGL.dispatch_eglChooseConfig0(JLjava/lang/Object;ILjava/lang/Object;IILjava/lang/Object;IJ)Z>
<DLL: Looking up dispatch_eglGetConfigAttributes internally>
<DLL: Calling JNI method com/jogamp/opengl/egl/EGL.dispatch_eglGetConfigAttributes(JJILjava/lang/Object;ILjava/lang/Object;IJ)V>
<DLL: Looking up dispatch_eglGetConfigAttrib0 internally>
<DLL: Calling JNI method

indicate that the crash can triggers early after EGL
dispatch_eglGetConfigAttrib0

it appears the the heap is corrupt
below shows that a invalid pointer crash in interp.c:2477
that happened after jamvm is returning from a
void run() method
the pointer to the old frame on the heap appears to have been overwritten.


Core was generated by `java -jam -jar ../libgdx-demo-pax-britannica-1.6.3-armhf.jar'.
Program terminated with signal 11, Segmentation fault.
#0  executeJava () at interp.c:2477
2477	    if(frame->mb == NULL) {
(gdb) list
2472	
2473	methodReturn:
2474	    /* Set interpreter state to previous frame */
2475	    frame = frame->prev;
2476	
2477	    if(frame->mb == NULL) {
2478	        /* The previous frame is a dummy frame - this indicates
2479	           top of this Java invocation. */
2480	        return ostack;
2481	    }
(gdb) p frame
$1 = (Frame *) 0x1

(gdb) frame 1
#1  0x76d3789c in executeMethodVaList (ob=0x68a32f98, class=0x67cc57c8, mb=0xc7ac78, jargs=...) at execute.c:129
129	        executeJava();

(gdb) p mb->name
$1 = 0x76d647ac "run"
(gdb) p mb->type 
$2 = 0x76d651e0 "()V"

(gdb) p class
$3 = (Class *) 0x67cc57c8
(gdb) p class->
class  lock   
(gdb) p class->lock 
$4 = 0
(gdb) p class->class 
$5 = (Class *) 0x67cc20e0
Comment 7 Xerxes Rånby 2015-08-13 17:58:04 CEST
managed to capture the crash in gdb, this time we got the full method signature

<DLL: Looking up dispatch_eglGetConfigAttrib0 internally>
<DLL: Calling JNI method com/jogamp/opengl/egl/EGL.dispatch_eglGetConfigAttrib0(JJILjava/lang/Object;IJ)Z>

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x6287c470 (LWP 17709)]
0x630dd848 in ?? ()
(gdb) bt
#0  0x630dd848 in ?? ()
#1  0x76ec1e5c in __GI___libc_free (mem=<optimized out>) at malloc.c:3738
#2  0x67d76bc8 in ?? ()
#3  0x67d76bc8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) list
1	arena.c: No such file or directory.

arena.c is part of glibc malloc
Comment 8 Xerxes Rånby 2015-08-13 18:39:14 CEST
Created attachment 719 [details]
not-us.log

when re-running the testcase several times in gdb
i managed to hit a classloader error using jamvm
even before loading any jogl classes.

thus something starts to malfunction before using any jogamp classes or natives.
Comment 9 Xerxes Rånby 2015-08-13 18:40:20 CEST
closing the bug as invalid since we can observe malfunction before any jogamp classes or natives has been loaded.
Comment 10 Xerxes Rånby 2015-08-13 19:00:53 CEST
branch to merge: https://github.com/xranby/jogl/tree/Bug1192

Bug 1192: NEWT native/X11Display.c Cleanup JNI signatures in comments
Comment 11 Sven Gothel 2015-08-18 03:41:50 CEST
576ba0b8334dae8ab8b0b2686422d450c7f360da
 NEWT native/X11Display.c Cleanup JNI signatures in comments