Bug 1185

Summary: jvm crashes in [libc.so.6+0x31334] getenv+0xc4 on armv6hf w/ NEWT
Product: [JogAmp] Jogl Reporter: Gottfried Haider <gottfried.haider>
Component: core Assignee: Sven Gothel <sgothel>
Status: RESOLVED INVALID    
Severity: normal CC: gottfried.haider, sgothel, xerxes
Priority: ---    
Version: 2.3.2   
Hardware: embedded_arm   
OS: linux   
Type: --- SCM Refs:
Workaround: ---
Attachments: Crash log 1/7
Crash log 2/7
Crash log supposedly with debug infos

Description Gottfried Haider 2015-07-28 13:09:34 CEST
Created attachment 710 [details]
Crash log 1/7

I am seeing crashes that happen randomly, roughly 8% of the time, when I start a Processing sketch that uses NEWT and the 3D renderer.

The crash location and the registers are very much the same each time this happens.

Could this be related to JOGL? I searched the source tree for occurrences of "getenv", and it seems to be used in connection with attaching threads to the JVM, which is a place where I could see something going wrong.

Please let me know if there's anything I can do to help debug this further.

I was testing this image on the Raspberry Pi 2: http://sukzessiv.net/~gohai/vc4-buildbot/build/20150727-1947-vc4-image.zip
Comment 1 Gottfried Haider 2015-07-28 13:09:59 CEST
Created attachment 711 [details]
Crash log 2/7
Comment 2 Sven Gothel 2015-07-28 14:34:04 CEST
Which version of jogamp are you using?

Using Xerxes WIP regarding Bug 1178 on top as well?
Please mention it - and add Bug 1178 to the dependencies here,
as I did for Bug 1183.

+++

- You need to compile all of jogamp w/ native and java debug code enabled:
  Use ant commandline defines:
    -Dc.compiler.debug=true \
    -Djavacdebuglevel="source,lines,vars" \

See 'scripts/make.jogl.all.linux-armv6hf-cross.sh'
in gluegen/make, joal/make and jogl/make.

Only then are we able to see a proper native stack trace,
allowing us to determine the culprit.

+++

It seems you are including all of jogamp's native jar files
for all platforms in your CLASSPATH?
Please do not include any native jar of jogamp in your CLASSPATH,
since they will be determined at runtime.
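
A quick way to check for this, as a minimal sketch only: the helper below scans the java.class.path property for jar names containing "-natives-", assuming the usual JogAmp naming such as jogl-all-natives-linux-armv6hf.jar. The class and method names are hypothetical and not part of JogAmp. It is written to compile on JDK7.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JogAmp.
class NativeJarCheck {
    /** Returns the classpath entries whose file name looks like a native jar. */
    static List<String> findNativeJars(String classPath) {
        List<String> hits = new ArrayList<String>();
        if (classPath == null || classPath.isEmpty()) {
            return hits;
        }
        // Split on the platform separator (":" on Linux, ";" on Windows).
        for (String entry : classPath.split(Pattern.quote(File.pathSeparator))) {
            String name = new File(entry).getName();
            // Assumption: native jars follow the "<module>-natives-<os>-<arch>.jar" pattern.
            if (name.endsWith(".jar") && name.contains("-natives-")) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        for (String jar : findNativeJars(System.getProperty("java.class.path"))) {
            System.out.println("native jar on CLASSPATH: " + jar);
        }
    }
}
```

Running it with the same CLASSPATH as the failing application prints one line per native jar found; no output means none were detected under the naming assumption above.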

+++

I also see 'eclipse' in your stack trace.
If you run a test to prove a failure in jogamp,
please try to run the smallest possible configuration and test.

+++

Thank you!
Comment 3 Gottfried Haider 2015-07-29 10:06:40 CEST
Thank you for looking into this, Sven!

+++

I was using the autobuild from July 25, so I believe it's what was then in your master branch, without Bug 1183? (This was definitely using mesa and the new Gallium3d driver vc4.)

+++

I shall recompile both today and will update this bug when I have more details.

+++

Will look into the CLASSPATH situation, thanks for pointing that out!

+++

The "eclipse" from the stack trace seems to have come from deep within the JVM - I believe those are the full paths from the machine where our JVM was built, somewhere in Oracle-land ;)

Best
Comment 4 Gottfried Haider 2015-07-29 17:49:42 CEST
I'll attach a log from a build of Processing that included JOGL & Gluegen built with the command line arguments for ant you suggested. It didn't seem to have made a difference unfortunately, or perhaps I am just not seeing it.

I also tried launching java through the debugger (gdb --args), but this dies early on from what I believe is a different segmentation fault.

Any ideas? Do you want me to send you some binary JOGL files, so that we can be sure that the debug information was included?

Thanks!
Comment 5 Gottfried Haider 2015-07-29 17:50:07 CEST
Created attachment 714 [details]
Crash log supposedly with debug infos
Comment 6 Gottfried Haider 2015-07-30 15:19:08 CEST
I got some more data today. Uploaded it here: http://sukzessiv.net/~gohai/vc4-buildbot/crash

hs_err_pid4124.log happens first (that's the SIGSEGV relating to [libc.so.6+0x31334]  getenv+0xc4).

Later I managed to capture a core dump, and looked at it through gdb and jstack and jmap. I believe this is not exactly the same state as when the SIGSEGV happens - e.g. I don't see a thread 4124 in the latter, and thread 4125 is already doing generateTrace().

But I don't see a lot of references to JOGL either - so perhaps this is not at all related? I'll see if I can get the JVM to not handle the signal itself (-Xrs), but if someone else wants to take a look this would be greatly appreciated.

(The core file was running the java from JDK7 for Arm, r51.)
Comment 7 Sven Gothel 2015-08-04 15:33:49 CEST
This crash is confined to the Raspi ARM?
This crash is confined to Eric's 'alpha' OpenGL driver?

Platform -> embedded_arm

See Bug 1183 comment 7.

If I dig through these issues, I need to know a reliable recipe
for reproduction ofc ..

It would be nice if this could be reproduced w/ our unit tests
or a new unit test! Same for Bug 1183.
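
One way to turn the "roughly 8%" observation into a repeatable number, as a sketch only: the hypothetical harness below launches a given command repeatedly and counts the runs that exit with a non-zero status. It assumes the test program exits on its own (a sketch that keeps a window open would need its own timeout or auto-quit) and that a JVM crash shows up as a non-zero exit code. CrashRateProbe and countFailedRuns are illustrative names, not part of JogAmp or Processing. It uses only JDK7 APIs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical harness, not part of JogAmp or Processing.
class CrashRateProbe {
    /** Runs the command the given number of times; returns how many runs exited non-zero. */
    static int countFailedRuns(List<String> command, int runs)
            throws IOException, InterruptedException {
        int failures = 0;
        byte[] buf = new byte[4096];
        for (int i = 0; i < runs; i++) {
            ProcessBuilder pb = new ProcessBuilder(command);
            pb.redirectErrorStream(true); // merge stderr into stdout
            Process p = pb.start();
            InputStream out = p.getInputStream();
            // Drain and discard the child's output so it never blocks on a full pipe.
            while (out.read(buf) != -1) {
                // intentionally empty
            }
            out.close();
            if (p.waitFor() != 0) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: CrashRateProbe <runs> <command> [args...]");
            return;
        }
        int runs = Integer.parseInt(args[0]);
        List<String> command = Arrays.asList(args).subList(1, args.length);
        int failed = countFailedRuns(command, runs);
        System.out.println(failed + " of " + runs + " runs exited abnormally");
    }
}
```

Pointing it at the smallest test that still crashes, once per driver configuration, would give comparable failure counts. Any hs_err_pid*.log files are written by the crashing JVM itself, not by this harness.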
Comment 8 Sven Gothel 2015-08-05 14:11:28 CEST
I checked your files on your server, thank you!

We do call Java's rt.jar 'System.getenv(String)' a few times
from within GlueGen at launch - yes.

This call is not supposed to crash, of course.

As you mentioned (comment 6), I don't see anything JogAmp related 
in your crash report - hence I assume it is not our turf.

- Can this crash be reproduced more reliably w/ a simple test case?
  (i.e. w/o Processing?)

- Will it crash similarly w/ a simple X11/GL FB/software driver as well?

- Will it crash similarly w/ the BCM GL driver as well?

Please validate and let me know.

In case the above questions cannot be answered,
I will not be able to work on this issue.

Maybe the answers will point to the buggy alpha driver?
Comment 9 Gottfried Haider 2015-08-06 16:15:55 CEST
Thanks for looking into it, Sven! I will try a couple of steps over the next few days to see if I can somehow narrow this down a bit.
Comment 10 Gottfried Haider 2015-08-09 15:41:42 CEST
Closing. I haven't seen that particular getenv crash today when running current non-debug autobuild binaries. I had a different crash that I attribute to the Mesa driver (following up with the driver author), but since nothing really points at JOGL anymore..