Bug 1166 - JVMUtil_NewDirectByteBufferCopy corrupts the heap commonly seen during GLX initialization on ARM 32
Status: RESOLVED FIXED
Alias: None
Product: Gluegen
Classification: JogAmp
Component: core
Version: 2.3.2
Hardware: All all
Importance: P3 normal
Assignee: Sven Gothel
 
Reported: 2015-06-28 00:22 CEST by Xerxes Rånby
Modified: 2015-09-27 01:30 CEST (History)
Type: DEFECT
SCM Refs:
e424c28f869269f5a22c22ef017230346b22847a f6a5ac4473135bbc4bc1a5f537e060df45eb4824 6ecc869eea932ac77dd6d4604eb205a8a659f83d a3701528aa4be01924c983ce74e2efeaba0e58bc
Workaround: ---


Attachments
ODROID-C runtime version check test.log and debug test_dbg.log (98.62 KB, application/gzip)
2015-06-28 00:24 CEST, Xerxes Rånby
hotspot crashlog on ODROID-C (58.39 KB, text/plain)
2015-06-28 03:31 CEST, Xerxes Rånby
jogl-debug-of-glx-CustomJavaCode-glx-CustomCCode.patch (4.22 KB, patch)
2015-07-07 17:33 CEST, Xerxes Rånby
patch to fix 1166 according to comment 11 solution B: commit-842ead6-Fix-B.patch (1.29 KB, patch)
2015-07-08 15:43 CEST, Xerxes Rånby
patch to fix 1166 according to solution C: commit-e424c28-Fix-C.patch (1.72 KB, patch)
2015-07-08 22:20 CEST, Xerxes Rånby

Description Xerxes Rånby 2015-06-28 00:22:23 CEST
JOGL can crash when it continues to initialize a display through libGL/X11 even though mesa3d failed to initialize a display on the GPU.

Backtraces were generated with gdb on two ODROID single-board ARM systems. Both systems use Mali GPUs for which mesa3d has no drivers.
On both systems JOGL crashed when it tried to use the display via xrender/glx after mesa3d had failed to initialize the GPU using libGL.

Ubuntu 14.04 on a factory installed ODROID-C Rev 3:
libGL error: MESA-LOADER: malformed or no PCI ID
libGL error: dlopen /usr/lib/arm-linux-gnueabihf/dri/mali_drm_dri.so failed (/usr/lib/arm-linux-gnueabihf/dri/mali_drm_dri.so: cannot open shared object file: No such file or directory)
libGL error: dlopen ${ORIGIN}/dri/mali_drm_dri.so failed (${ORIGIN}/dri/mali_drm_dri.so: cannot open shared object file: No such file or directory)
libGL error: dlopen /usr/lib/dri/mali_drm_dri.so failed (/usr/lib/dri/mali_drm_dri.so: cannot open shared object file: No such file or directory)
libGL error: unable to load driver: mali_drm_dri.so
libGL error: driver pointer missing
libGL error: failed to load driver: mali_drm
# A fatal error has been detected by the Java Runtime Environment:
..
# V  [libjvm.so+0x343158]  Klass::search_secondary_supers(Klass*) const+0x10
...
#3  0xa4ccd36c in Java_jogamp_nativewindow_x11_X11Lib_XRenderFindVisualFormat1 ()
   from /tmp/jogamp_0000/file_cache/jln4489495869428337977/jln7893739557114247018/libnativewindow_x11.so 


Ubuntu 14.04 on a fresh ODROID-XU3 Lite:
libGL error: MESA-LOADER: malformed or no PCI ID
libGL error: dlopen /usr/lib/arm-linux-gnueabihf/dri/exynos_dri.so failed (/usr/lib/arm-linux-gnueabihf/dri/exynos_dri.so: cannot open shared object file: No such file or directory)
libGL error: dlopen ${ORIGIN}/dri/exynos_dri.so failed (${ORIGIN}/dri/exynos_dri.so: cannot open shared object file: No such file or directory)
libGL error: dlopen /usr/lib/dri/exynos_dri.so failed (/usr/lib/dri/exynos_dri.so: cannot open shared object file: No such file or directory)
libGL error: unable to load driver: exynos_dri.so
libGL error: driver pointer missing
libGL error: failed to load driver: exynos
Backtrace:
(gdb) bt
*** Error in `/usr/bin/java': free(): corrupted unsorted chunks: 0xa180e7f8 *** 
#4  0xb6ee7022 in malloc_printerr (action=1, str=0xb6f61a9c "free(): corrupted unsorted chunks", ptr=<optimized out>) at malloc.c:4996
#5  0xb6ee7a48 in _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3840
#6  0xa27429d6 in XFree () from /usr/lib/arm-linux-gnueabihf/libX11.so.6
#7  0xa1fac368 in Java_jogamp_opengl_x11_glx_GLX_dispatch_1glXChooseVisual ()
   from /tmp/jogamp_0000/file_cache/jln4091126271820588169/jln3781205514237873832/libjogl_desktop.so

attached ODROID-C test_dbg.log
Comment 1 Xerxes Rånby 2015-06-28 00:24:27 CEST
Created attachment 699 [details]
ODROID-C runtime version check test.log and debug test_dbg.log
Comment 2 Xerxes Rånby 2015-06-28 03:22:25 CEST
complete backtrace for the ODROID-C

(gdb) bt             
#0  0xb6815158 in Klass::search_secondary_supers(Klass*) const ()
   from /usr/lib/jvm/java-8-oracle/jre/lib/arm/client/libjvm.so
#1  0xb6794444 in jni_IsInstanceOf ()
   from /usr/lib/jvm/java-8-oracle/jre/lib/arm/client/libjvm.so
#2  0xb6798250 in jni_GetDirectBufferAddress ()
   from /usr/lib/jvm/java-8-oracle/jre/lib/arm/client/libjvm.so
#3  0xa4ccd36c in Java_jogamp_nativewindow_x11_X11Lib_XRenderFindVisualFormat1
    ()
   from /tmp/jogamp_0000/file_cache/jln2618029001281501706/jln7934769121838554378/libnativewindow_x11.so

line 313 of jogl/src/nativewindow/native/x11/Xmisc.c reveals that the data for jobject xRenderPictFormat is corrupt: the jobject passed is *something other* than a java.nio.Buffer!
https://github.com/sgothel/jogl/blob/master/src/nativewindow/native/x11/Xmisc.c#L313
Comment 3 Xerxes Rånby 2015-06-28 03:31:31 CEST
Created attachment 700 [details]
hotspot crashlog on ODROID-C

the crash log indicates that the Oracle JVM shipped with the ODROID is incompatible with the Java 8 JNI spec.
http://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#GetDirectBufferAddress
GetDirectBufferAddress is supposed to return NULL and not crash if the given object is not a direct java.nio.Buffer.
Comment 4 Xerxes Rånby 2015-06-28 03:48:06 CEST
Running with OpenJDK 7 instead of JDK 8 on the ODROID-C reproduces the same crash reported for the ODROID-XU3.

odroid@odroid:~$ gdb --args /usr/lib/jvm/java-7-openjdk-armhf/bin/java -jar libgdx-demo-pax-britannica-1.6.3-SNAPSHOT.jar 
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
...
[New Thread 0x93c06b40 (LWP 21189)]
libGL error: MESA-LOADER: malformed or no PCI ID
libGL error: dlopen /usr/lib/arm-linux-gnueabihf/dri/mali_drm_dri.so failed (/usr/lib/arm-linux-gnueabihf/dri/mali_drm_dri.so: cannot open shared object file: No such file or directory)
libGL error: dlopen ${ORIGIN}/dri/mali_drm_dri.so failed (${ORIGIN}/dri/mali_drm_dri.so: cannot open shared object file: No such file or directory)
libGL error: dlopen /usr/lib/dri/mali_drm_dri.so failed (/usr/lib/dri/mali_drm_dri.so: cannot open shared object file: No such file or directory)
libGL error: unable to load driver: mali_drm_dri.so
libGL error: driver pointer missing
libGL error: failed to load driver: mali_drm
*** Error in `/usr/lib/jvm/java-7-openjdk-armhf/bin/java': corrupted double-linked list: 0x94822d18 ***

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x929bf460 (LWP 21186)]
__libc_do_syscall ()
    at ../ports/sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:44
44	../ports/sysdeps/unix/sysv/linux/arm/libc-do-syscall.S: No such file or directory.
(gdb) bt
#0  __libc_do_syscall ()
    at ../ports/sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:44
#1  0xb6eecf0e in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#2  0xb6eef766 in __GI_abort () at abort.c:89
#3  0xb6f13474 in __libc_message (do_abort=<optimized out>, 
    fmt=0xb6f94904 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/posix/libc_fatal.c:175
#4  0xb6f1a022 in malloc_printerr (action=1, 
    str=0xb6f94948 "corrupted double-linked list", ptr=<optimized out>)
    at malloc.c:4996
#5  0xb6f1ad16 in _int_free (av=0x94800010, p=<optimized out>, have_lock=0)
    at malloc.c:3996
#6  0x9329d9d6 in XFree () from /usr/lib/arm-linux-gnueabihf/libX11.so.6
#7  0x922f0ad4 in Java_jogamp_opengl_x11_glx_GLX_dispatch_1glXChooseFBConfig
    ()
   from /tmp/jogamp_0000/file_cache/jln5089986396162481917/jln1759574444477786384/libjogl_desktop.so
#8  0xb6bd6be0 in .fast_no_args ()
   from /usr/lib/jvm/java-7-openjdk-armhf/jre/lib/arm/server/libjvm.so
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
Comment 5 Xerxes Rånby 2015-06-28 04:07:58 CEST
(In reply to comment #4)
> running using openjdk 7 instead of jdk 8 on the ODROID-C reproduces the same
> crash reported for ODROID-XU3 
> 

actually not the same backtrace compared to the ODROID-XU3
ODROID-C  took the path using Java_jogamp_opengl_x11_glx_GLX_dispatch_1glXChooseFBConfig

ODROID-XU3 took the path using Java_jogamp_opengl_x11_glx_GLX_dispatch_1glXChooseVisual

..
> #4  0xb6f1a022 in malloc_printerr (action=1, 
>     str=0xb6f94948 "corrupted double-linked list", ptr=<optimized out>)
>     at malloc.c:4996
> #5  0xb6f1ad16 in _int_free (av=0x94800010, p=<optimized out>, have_lock=0)
>     at malloc.c:3996
> #6  0x9329d9d6 in XFree () from /usr/lib/arm-linux-gnueabihf/libX11.so.6
> #7  0x922f0ad4 in Java_jogamp_opengl_x11_glx_GLX_dispatch_1glXChooseFBConfig
>     ()
>    from
> /tmp/jogamp_0000/file_cache/jln5089986396162481917/jln1759574444477786384/
> libjogl_desktop.so
> #8  0xb6bd6be0 in .fast_no_args ()
>    from /usr/lib/jvm/java-7-openjdk-armhf/jre/lib/arm/server/libjvm.so
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> (gdb)

the source code for Java_jogamp_opengl_x11_glx_GLX_dispatch_1glXChooseFBConfig is at:
https://github.com/sgothel/jogl/blob/master/make/config/jogl/glx-CustomCCode.c#L70
Comment 6 Sven Gothel 2015-07-03 11:30:47 CEST
(In reply to comment #4)
> [New Thread 0x93c06b40 (LWP 21189)]
> libGL error: MESA-LOADER: malformed or no PCI ID
> libGL error: dlopen /usr/lib/arm-linux-gnueabihf/dri/mali_drm_dri.so failed
> (/usr/lib/arm-linux-gnueabihf/dri/mali_drm_dri.so: cannot open shared object
> file: No such file or directory)
> libGL error: dlopen ${ORIGIN}/dri/mali_drm_dri.so failed
> (${ORIGIN}/dri/mali_drm_dri.so: cannot open shared object file: No such file
> or directory)
> libGL error: dlopen /usr/lib/dri/mali_drm_dri.so failed
> (/usr/lib/dri/mali_drm_dri.so: cannot open shared object file: No such file
> or directory)
> libGL error: unable to load driver: mali_drm_dri.so
> libGL error: driver pointer missing
> libGL error: failed to load driver: mali_drm
> *** Error in `/usr/lib/jvm/java-7-openjdk-armhf/bin/java': corrupted
> double-linked list: 0x94822d18 ***
> 
> Program received signal SIGABRT, Aborted.
> [Switching to Thread 0x929bf460 (LWP 21186)]
> __libc_do_syscall ()
>     at ../ports/sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:44
> 44	../ports/sysdeps/unix/sysv/linux/arm/libc-do-syscall.S: No such file or
> directory.
> (gdb) bt
> #0  __libc_do_syscall ()
>     at ../ports/sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:44
> #1  0xb6eecf0e in __GI_raise (sig=sig@entry=6)
>     at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> #2  0xb6eef766 in __GI_abort () at abort.c:89
> #3  0xb6f13474 in __libc_message (do_abort=<optimized out>, 
>     fmt=0xb6f94904 "*** Error in `%s': %s: 0x%s ***\n")
>     at ../sysdeps/posix/libc_fatal.c:175
> #4  0xb6f1a022 in malloc_printerr (action=1, 
>     str=0xb6f94948 "corrupted double-linked list", ptr=<optimized out>)
>     at malloc.c:4996
> #5  0xb6f1ad16 in _int_free (av=0x94800010, p=<optimized out>, have_lock=0)
>     at malloc.c:3996
> #6  0x9329d9d6 in XFree () from /usr/lib/arm-linux-gnueabihf/libX11.so.6
> #7  0x922f0ad4 in Java_jogamp_opengl_x11_glx_GLX_dispatch_1glXChooseFBConfig
>     ()
>    from
> /tmp/jogamp_0000/file_cache/jln5089986396162481917/jln1759574444477786384/
> libjogl_desktop.so
> #8  0xb6bd6be0 in .fast_no_args ()
>    from /usr/lib/jvm/java-7-openjdk-armhf/jre/lib/arm/server/libjvm.so
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> (gdb)

[1] Mesa fails to init due to failing to load mali-drm, then 
[2] Java_jogamp_opengl_x11_glx_GLX_dispatch_1glXChooseFBConfig 
    crashes within XFree(_res)

    _res = glXChooseFBConfig(...);
    ...
    if (NULL == _res) return NULL;
    ...
    *** XFree(_res); *** crash ***

Hence this must be a driver issue .. or corrupt heap, etc.
Comment 7 Sven Gothel 2015-07-03 11:39:55 CEST
(11:27:15 AM) sgothel: seems like the result is != NULL .. and XFree crashes
(11:27:30 AM) sgothel: this could only mean that result is invalid .. -> driver error
(11:28:44 AM) xranby: sgothel: yes the crash logs do not make sense... for example in https://jogamp.org/bugzilla/show_bug.cgi?id=1166#c2 and  https://jogamp.org/bugzilla/show_bug.cgi?id=1166#c3    we can trigger a crash in hotspot, for a function that shall accept any object  thus it looks to me that the heap is all corrupt
(11:30:13 AM) xranby: after mesa3d have printed the debug output then we crash later in  A) hotspot B) GLX C) libxrender    if i rerun the runtime version check its kind of random where the crash ends up
(11:30:27 AM) xranby: i can reproduce the bug repeatedly while running the runtime version check
(11:30:53 AM) xranby: sh etc/test.sh
(11:31:06 AM) xranby: i fail to reproduce the bug when enable debugging
(11:31:10 AM) sgothel: how about using Mesa software renderer ?
(11:31:12 AM) xranby: sh etc/test_dbg.sh
(11:31:38 AM) xranby: sgothel: i expected mesa 3d to list doftware rendering like it did before  and not make jogl crash
(11:31:42 AM) sgothel: so even a JDK issue on that particular platform ?
(11:31:43 AM) xranby: list software rendering
(11:32:12 AM) sgothel: export LIBGL_ALWAYS_SOFTWARE=true
(11:32:19 AM) xranby: i can try that
(11:32:28 AM) sgothel: export LIBGL_DEBUG=verbose 
(11:32:28 AM) sgothel: export MESA_DEBUG=true 
(11:32:56 AM) sgothel: but well, if glxgears works .. w/ that driver .
(11:33:21 AM) xranby: glxgears also output the mesa debug log warning
(11:33:31 AM) sgothel: hmm 
(11:33:35 AM) xranby: yet it do not crash
(11:33:42 AM) xranby: it runs
(11:33:52 AM) sgothel: 'glxgears -info' ?
(11:33:56 AM) sgothel: which renderer ?
(11:34:16 AM) xranby: out of my head it said software rendering
(11:34:21 AM) xranby: i am at work right now
(11:34:23 AM) sgothel: I assume not the mali one then? since it complained? right .. 
(11:34:31 AM) xranby: i have the dev board at home that can generate the crash
(11:34:35 AM) sgothel: ah
(11:34:41 AM) xranby: its a mali gpu
(11:34:48 AM) xranby: and mesa3d do not have a driver for this gpu
(11:35:18 AM) sgothel: sound more like a config hell .. Ubuntu I read .. well :)
(11:35:39 AM) xranby: if you only use the arm mali driver then opengl es works fine
(11:35:43 AM) sgothel: I remember those 'blobs' using Mesa libGL dispatcher things .. ahem
(11:36:06 AM) sgothel: so .. then just do that, since all other things are probably not supported by that blob
(11:36:27 AM) sgothel: only Ubuntu tries to impl. libGL this way .. we already had this discussion a while ago .. AFAIK
(11:36:53 AM) sgothel: use Debian .. or 'rm libGL.*' 
(11:37:08 AM) xranby: i will try understand why it is crashing a bit more.. before i give up and claim it is simply a libgl misconfiguration error
(11:37:39 AM) xranby: it puzzles me why i cant trigger the crash when running the jogamp test_dbg.sh
(11:37:52 AM) xranby: and can trigger the crash when running the test.sh
(11:38:41 AM) sgothel: sure .. the debug code results in a different heap .. so the existing bug may not trigger a crash here
(11:39:11 AM) sgothel: valgrind or other heap-safe measures may allow it to crash properly and reliably
Comment 8 Xerxes Rånby 2015-07-07 17:25:48 CEST
http://jogamp.org/log/irc/jogamp_20150703050623.html#l111
20150703 09:41:19 <sgothel> Q: why is the glX* call returning non NULL then ?
20150703 09:41:26 <sgothel> what is the resulting count ?
20150703 09:42:04 <sgothel> ^^ count = _nitems_ptr[0];
20150703 09:43:40 <xranby> i can add some debuging code to https://github.com/sgothel/jogl/blob/master/make/config/jogl/glx-CustomCCode.c#L87 and check later
20150703 09:44:21 <sgothel> if _res is != NULL, but not pointing to a valid memory region, the error is propagated to the JVM via JVMUtil_NewDirectByteBufferCopy (and yes, mind the memove in the code, i.e. _res is 'GLXFBConfig *' <- an array)


A:

The glX* call returns PointerBuffer:AbstractBuffer[direct true, hasArray false, capacity 0, position 0, elementSize 4, buffer[capacity 0, lim 0, pos 0]]

count is 1 inside the first call to glXChooseFBConfig done at X11GLXGraphicsConfigurationFactory.java:234
count is 96 inside the second call to glXChooseFBConfig done at

I find it odd that gluegen's JVMUtil_NewDirectByteBufferCopy generated an empty PointerBuffer with lim 0 even though the calculated capacities passed to the function were 4 and 384!

debug output below:

libGL error: failed to load driver: mali_drm
libGL: OpenDriver: trying /usr/lib/arm-linux-gnueabihf/dri/tls/swrast_dri.so
libGL: OpenDriver: trying /usr/lib/arm-linux-gnueabihf/dri/swrast_dri.so
libGL: Can't open configuration file /home/odroid/.drirc: No such file or directory.
[Dynamic-linking native method jogamp.opengl.x11.glx.GLX.dispatch_glXQueryVersion0 ... JNI]
java.lang.Exception: Stack trace
	at java.lang.Thread.dumpStack(Thread.java:1329)
	at jogamp.opengl.x11.glx.GLX.glXChooseFBConfig(GLX.java:903)
	at jogamp.opengl.x11.glx.X11GLXGraphicsConfigurationFactory.chooseGraphicsConfigurationFBConfig(X11GLXGraphicsConfigurationFactory.java:303)
	at jogamp.opengl.x11.glx.X11GLXGraphicsConfigurationFactory.chooseGraphicsConfigurationStatic(X11GLXGraphicsConfigurationFactory.java:234)
	at jogamp.opengl.x11.glx.X11GLXDrawableFactory.createMutableSurfaceImpl(X11GLXDrawableFactory.java:524)
	at jogamp.opengl.x11.glx.X11GLXDrawableFactory.createDummySurfaceImpl(X11GLXDrawableFactory.java:535)
	at jogamp.opengl.x11.glx.X11GLXDrawableFactory$SharedResourceImplementation.createSharedResource(X11GLXDrawableFactory.java:283)
	at jogamp.opengl.SharedResourceRunner.run(SharedResourceRunner.java:297)
	at java.lang.Thread.run(Thread.java:745)
[Dynamic-linking native method jogamp.opengl.x11.glx.GLX.dispatch_glXChooseFBConfig ... JNI]
before glXChooseFBConfig.0: Count 1
after glXChooseFBConfig.X: Count 1
copy bytes 4
_res!=null java.nio.DirectByteBuffer[pos=0 lim=0 cap=0]
native orderjava.nio.DirectByteBuffer[pos=0 lim=0 cap=0]
wrapPointerBuffer:AbstractBuffer[direct true, hasArray false, capacity 0, position 0, elementSize 4, buffer[capacity 0, lim 0, pos 0]]
java.lang.Exception: Stack trace
	at java.lang.Thread.dumpStack(Thread.java:1329)
	at jogamp.opengl.x11.glx.GLX.glXChooseFBConfig(GLX.java:903)
	at jogamp.opengl.x11.glx.X11GLXGraphicsConfigurationFactory.chooseGraphicsConfigurationFBConfig(X11GLXGraphicsConfigurationFactory.java:334)
	at jogamp.opengl.x11.glx.X11GLXGraphicsConfigurationFactory.chooseGraphicsConfigurationStatic(X11GLXGraphicsConfigurationFactory.java:234)
	at jogamp.opengl.x11.glx.X11GLXDrawableFactory.createMutableSurfaceImpl(X11GLXDrawableFactory.java:524)
	at jogamp.opengl.x11.glx.X11GLXDrawableFactory.createDummySurfaceImpl(X11GLXDrawableFactory.java:535)
	at jogamp.opengl.x11.glx.X11GLXDrawableFactory$SharedResourceImplementation.createSharedResource(X11GLXDrawableFactory.java:283)
	at jogamp.opengl.SharedResourceRunner.run(SharedResourceRunner.java:297)
	at java.lang.Thread.run(Thread.java:745)
before glXChooseFBConfig.0: Count 96
after glXChooseFBConfig.X: Count 96
copy bytes 384
_res!=null java.nio.DirectByteBuffer[pos=0 lim=0 cap=0]
native orderjava.nio.DirectByteBuffer[pos=0 lim=0 cap=0]
wrapPointerBuffer:AbstractBuffer[direct true, hasArray false, capacity 0, position 0, elementSize 4, buffer[capacity 0, lim 0, pos 0]]
X11GLXGraphicsConfiguration.chooseGraphicsConfigurationFBConfig: Failed glXChooseFBConfig (X11GraphicsScreen[X11GraphicsDevice[type .x11, connection :0, unitID 0, handle 0xffffffffa6006640, owner true, ResourceToolkitLock[obj 0x15f5239, isOwner true, <87f835, da3204>[count 2, qsz 0, owner <main-SharedResourceRunner>]]], idx 0],GLCaps[rgba 8/8/8/0, opaque, accum-rgba 0/0/0/0, dp/st/ms 16/0/0, dbl, mono  , hw, GLProfile[GL2/GL2.sw], on-scr[.]]): PointerBuffer:AbstractBuffer[direct true, hasArray false, capacity 0, position 0, elementSize 4, buffer[capacity 0, lim 0, pos 0]], 96
*** Error in 'java': malloc(): memory corruption (fast): 0xa6013ff0 ***
Aborted
Comment 9 Xerxes Rånby 2015-07-07 17:33:55 CEST
Created attachment 702 [details]
jogl-debug-of-glx-CustomJavaCode-glx-CustomCCode.patch

attached patch used to generate the debug output in comment 8 in combination with the following environment and runtime flags.

export LIBGL_DEBUG=verbose
export MESA_DEBUG=true

extra flags added to etc/test.sh
D_ARGS="-Dnativewindow.debug.GrapicsConfiguration"
java -verbose:jni
Comment 10 Xerxes Rånby 2015-07-08 14:40:42 CEST
We have a problem passing a jlong from JNI to Java that triggers
on line 2561 of gluegen's JavaEmitter when running on 32-bit ARM.

capacity is 4 and 384 (count 1 and 96) inside JVMUtil_NewDirectByteBufferCopy when called from glx-CustomCCode.c line 108

gluegen/src/java/com/jogamp/gluegen/JavaEmitter.java
JVMUtil_NewDirectByteBufferCopy 
https://jogamp.org/git/?p=gluegen.git;a=blob;f=src/java/com/jogamp/gluegen/JavaEmitter.java;h=60cd3f4b3d23ec1dd3b046b567f2ba5705a89210;hb=HEAD#l2561

2561          "    jbyteBuffer  = (*env)->CallStaticObjectMethod(env, clazzBuffers, cstrBuffersNew, capacity);\n"+
2562          "    byteBufferPtr = (*env)->GetDirectBufferAddress(env, jbyteBuffer);\n"+
2563          "    memcpy(byteBufferPtr, source_address, capacity);\n"+

but when we reach the static Java function
com/jogamp/common/nio/Buffers newDirectByteBuffer
the numElements argument has been changed to 0

thus the returned DirectByteBuffer is of size 0
and the memcpy on line 2563 corrupts the heap by writing past the buffer's capacity
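
A minimal sketch of the failure mode described in this comment, in plain C with no JNI involved. FakeBuffer, fake_buffer_init, fake_buffer_free and copy_checked are hypothetical names used only for illustration; they are not gluegen code. The point is that a copy of `capacity` bytes is only safe when the destination really holds that many bytes, so the sketch compares the requested length against the size the buffer was actually created with. In real JNI code the destination size could presumably be queried with GetDirectBufferCapacity before the memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a direct buffer: storage plus its real capacity. */
typedef struct {
    unsigned char *data;
    size_t capacity;
} FakeBuffer;

/* Creates a buffer of 'capacity' bytes. At least one byte is reserved so that
 * 'data' is a valid pointer even for capacity 0. Returns 0 on success. */
int fake_buffer_init(FakeBuffer *buf, size_t capacity) {
    assert(buf != NULL);
    buf->data = calloc(capacity > 0 ? capacity : 1, 1);
    buf->capacity = (buf->data != NULL) ? capacity : 0;
    return (buf->data != NULL) ? 0 : -1;
}

void fake_buffer_free(FakeBuffer *buf) {
    assert(buf != NULL);
    free(buf->data);
    buf->data = NULL;
    buf->capacity = 0;
}

/* Copies n bytes into dst only if they fit.
 * Returns 0 on success, -1 if the copy would write past dst's capacity. */
int copy_checked(FakeBuffer *dst, const void *src, size_t n) {
    assert(dst != NULL && src != NULL);
    if (n > dst->capacity) {
        return -1;  /* this is the case that corrupted the heap in this bug */
    }
    memcpy(dst->data, src, n);
    return 0;
}
```

With a buffer that ended up with capacity 0 and a requested copy of 384 bytes, copy_checked refuses the copy instead of overwriting neighbouring heap memory.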
Comment 11 Xerxes Rånby 2015-07-08 15:03:13 CEST
this is a type conversion bug that can happen because:

capacity in gluegen JavaEmitter is a jlong
we pass this jlong using JNI to a java method that takes a jint

We may solve this bug by:
A) Update gluegen com.jogamp.common.nio.Buffers to operate using longs instead of ints and update the clazzNameBuffersStaticNewCstrSignature in JavaEmitter to use (J)Ljava/nio/ByteBuffer;
OR
B) Correctly typecast the jlong to a jint before we pass it over JNI
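
One plausible way to picture why the Java side could see 0 on 32-bit ARM is sketched below. This is an assumption for illustration, not something established in this report: it models a variadic argument area as 4-byte slots in which 64-bit values must start on an 8-byte boundary, so a 64-bit capacity pushed after three pointer-sized arguments skips one slot. A callee that expects a 32-bit int in that position then reads the skipped padding slot. ArgArea, push32, push64 and callee_reads_int are hypothetical names, and the padding is zeroed only because this model zeroes it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { ARG_SLOTS = 8 };

/* Hypothetical model of a 32-bit argument area made of 4-byte slots. */
typedef struct {
    uint32_t slot[ARG_SLOTS];
    size_t next;  /* index of the next free slot */
} ArgArea;

static void push32(ArgArea *a, uint32_t v) {
    assert(a->next < ARG_SLOTS);
    a->slot[a->next++] = v;
}

/* 64-bit values start on an even slot (8-byte alignment),
 * which may leave one padding slot unused. */
static void push64(ArgArea *a, uint64_t v) {
    if (a->next % 2 != 0) {
        a->next++;  /* skip the padding slot */
    }
    assert(a->next + 2 <= ARG_SLOTS);
    memcpy(&a->slot[a->next], &v, sizeof v);
    a->next += 2;
}

/* Returns what a callee expecting a 32-bit int as the fourth argument reads.
 * If cast_to_int_first is non-zero the caller narrows the value before
 * passing it (solution B), otherwise it passes the 64-bit value as is. */
uint32_t callee_reads_int(uint64_t capacity, int cast_to_int_first) {
    ArgArea a;
    memset(&a, 0, sizeof a);  /* padding reads as 0 in this model only */
    push32(&a, 0x1000u);      /* stands in for env */
    push32(&a, 0x2000u);      /* stands in for the class */
    push32(&a, 0x3000u);      /* stands in for the method id */
    if (cast_to_int_first) {
        push32(&a, (uint32_t)capacity);
    } else {
        push64(&a, capacity);
    }
    return a.slot[3];         /* the slot the callee treats as its int */
}
```

In this model callee_reads_int(384, 0) yields 0 while callee_reads_int(384, 1) yields 384, which matches the symptom reported above; on a real system the padding slot could hold any value.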
Comment 12 Xerxes Rånby 2015-07-08 15:43:10 CEST
Created attachment 704 [details]
patch to fix 1166 according to comment 11 solution B: commit-842ead6-Fix-B.patch
Comment 13 Xerxes Rånby 2015-07-08 20:49:57 CEST
(In reply to comment #12)
> Created attachment 704 [details]
> patch to fix 1166 according to comment 11 solution B:
> commit-842ead6-Fix-B.patch

(In reply to comment #11)
> this is a type conversion bug that can happen because:
> 
> capacity in gluegen JavaEmitter is a jlong
> we pass this jlong using JNI to a java method that takes a jint
> 
> We may solve this bug by:
> A) Update gluegen com.jogamp.common.nio.Buffers to operate using long
> instead of int's and update the clazzNameBuffersStaticNewCstrSignature in
> JavaEmitter to use (J)Ljava/nio/ByteBuffer;
> OR
> B) Correctly typecast the jlong to a jint before we pass it over JNI

There is no point in pursuing solution A to support allocating buffers beyond the int range (2 GiB) using longs, because
java.nio.ByteBuffer allocateDirect only takes an int
http://docs.oracle.com/javase/8/docs/api/java/nio/ByteBuffer.html#allocateDirect-int-

The attached B patch is thus the right solution.
Comment 14 Xerxes Rånby 2015-07-08 22:20:40 CEST
Created attachment 705 [details]
patch to fix 1166 according to solution C: commit-e424c28-Fix-C.patch

C)  Fix JavaEmitter JVMUtil_NewDirectByteBufferCopy
    to only use jint for capacity
    
    Prevents jlong to jint truncation
    when capacity is passed from jni to java.

    com.jogamp.common.nio.Buffers newDirectByteBuffer
    and the underlying java.nio.ByteBuffer allocateDirect
    only work with capacities of int size.
Comment 15 Sven Gothel 2015-07-16 03:20:46 CEST
(In reply to comment #14)
> Created attachment 705 [details]
> patch to fix 1166 according to solution C: commit-e424c28-Fix-C.patch
> 
> C)  Fix JavaEmitter JVMUtil_NewDirectByteBufferCopy
>     to only use jint for capacity
>     
>     Prevents jlong to jint truncation
>     when capacity is passed from jni to java.
> 
>     com.jogamp.common.nio.Buffers newDirectByteBuffer
>     and the underlying java.nio.ByteBuffer allocateDirect
>     only work with capacities of int size.

EXCELLENT! Great find - proper solution!

We could enhance this patch adding 
a test of the value range, i.e. capacity <= MAX_INT.

Here we would need to pass 'size_t capacity' 
and test its value range before casting to 'jint'.
In case of a failure .. well, we could only bail out (FatalError).
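
A minimal sketch of the range test proposed here, written as a standalone C helper without JNI headers. capacity_to_int32 is a hypothetical name, and int32_t stands in for jint (a 32-bit signed type); the actual generated code may look different. The helper only reports whether the value fits, leaving it to the caller to decide how to bail out, for example via the FatalError route mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts a size_t capacity to a 32-bit signed value (the width of jint).
 * Returns 1 and stores the result in *out if the value fits,
 * returns 0 and leaves *out untouched if it does not. */
int capacity_to_int32(size_t capacity, int32_t *out) {
    assert(out != NULL);
    if (capacity > (size_t)INT32_MAX) {
        return 0;  /* would not survive the cast, caller must bail out */
    }
    *out = (int32_t)capacity;
    return 1;
}
```

A call such as capacity_to_int32(count * sizeof(Structure), &n) would then gate the cast before the value is handed to the Java method.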
Comment 16 Sven Gothel 2015-07-16 04:51:15 CEST
(In reply to comment #15)
> (In reply to comment #14)
> > Created attachment 705 [details]
> > patch to fix 1166 according to solution C: commit-e424c28-Fix-C.patch
> > 
> > C)  Fix JavaEmitter JVMUtil_NewDirectByteBufferCopy
> >     to only use jint for capacity
> >     
> >     Prevents jlong to jint truncation
> >     when capacity is passed from jni to java.
> > 
> >     com.jogamp.common.nio.Buffers newDirectByteBuffer
> >     and the underlying java.nio.ByteBuffer allocateDirect
> >     only work with capacities of int size.
> 
> EXCELLENT! Great find - proper solution!
> 
> We could enhance this patch adding 
> a test of the value range, i.e. capacity <= MAX_INT.
> 
> Here we would need to pass 'size_t capacity' 
> and test its value range before casting to 'jint'.
> In case of a failure .. well, we could only bail out (FatalError).

commit f6a5ac4473135bbc4bc1a5f537e060df45eb4824
    Since JVMUtil_NewDirectByteBufferCopy is being called w/ 'size_t'
    values, e.g. 'count * sizeof(Structure)',
    we shall validate whether 'capacity' is valid, i.e. <= MAX_INT.
    
    After validation, 'capacity' is being cast to 'jint' before
    being passed to the java method.


commit a3701528aa4be01924c983ce74e2efeaba0e58bc
    - Perform a NULL check on Buffers.newDirectByteBuffer(..) result.
    - Only copy memory if capacity > 0, incl fetching direct buffer address
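
The two guards listed in the last commit can be sketched in plain C as follows. This is an illustration under stated assumptions, not the committed code: alloc_fn is a hypothetical hook standing in for the call to Buffers.newDirectByteBuffer(..), and new_buffer_copy, alloc_ok and alloc_fail are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator hook standing in for the buffer factory call;
 * it may return NULL. */
typedef void *(*alloc_fn)(size_t capacity);

/* Returns a new buffer holding a copy of 'source', or NULL if the
 * allocation failed. The copy is skipped entirely when capacity is 0. */
void *new_buffer_copy(alloc_fn alloc, const void *source, size_t capacity) {
    assert(alloc != NULL);
    void *buffer = alloc(capacity);
    if (buffer == NULL) {
        return NULL;               /* NULL check on the allocation result */
    }
    if (capacity > 0) {            /* only touch memory if there is data */
        assert(source != NULL);
        memcpy(buffer, source, capacity);
    }
    return buffer;
}

/* Two sample allocators for trying the function out. */
void *alloc_ok(size_t capacity) {
    return malloc(capacity > 0 ? capacity : 1);
}

void *alloc_fail(size_t capacity) {
    (void)capacity;
    return NULL;
}
```

Passing alloc_fail shows the NULL path, and passing a capacity of 0 shows that the source pointer is never dereferenced in that case.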