Bug 1508 - Segfault when unpacking .so from JAR
Summary: Segfault when unpacking .so from JAR
Alias: None
Product: Gluegen
Classification: JogAmp
Component: core
Version: 2.5.0
Hardware: All linux
Importance: P4 normal
Assignee: Sven Gothel
Depends on:
Reported: 2024-05-17 12:55 CEST by Owen Riddy
Modified: 2024-05-17 12:55 CEST

See Also:
SCM Refs:
Workaround: ---


Description Owen Riddy 2024-05-17 12:55:36 CEST

JOGL crashes the JVM when a standalone jar is used with the shared libraries stored in some.jar!/natives/linux-amd64/lib*.so. I posted on the forum about a month ago with an initial assessment of the failure: https://forum.jogamp.org/Segfault-loading-native-libraries-td4043450.html. Then I went on holiday and my graphics card died, so it took a little longer to get back to this than I had hoped. I now believe this is a duplicate of #1046, but I don't have access to the .jar used in #1046 (and I have spent hours investigating to figure out what is happening), so I hope I will be forgiven for raising a new bug. I don't really understand the Linux dynamic linker ld.so, so there is also a bit of speculation here; please forgive any inaccuracies:

1) All the .so detection code is working fine. The .so files are fine. Gluegen's loading logic is also fine. The crash is deterministic but hard to debug because much of it happens outside Java. The bug appears in 2.3.2, 2.5.0 and an arbitrary master branch build (commit hash d9604cf4687 from ~3 months ago).

2) When ld.so loads a dynamic library it can handle pointers in two ways: either it loads the library into memory and patches all the pointers in RAM, or it stores a "start of library" address and uses the pointers as offsets. It works out which mode it is operating in by whether the memory is writable-only (assumes the first method) or not (assumes the second). I think this crash happens because libgluegen_rt.so enters a state where the library is set up the second way, but the memory is writable and glibc gets befuddled. The evidence for this was gathered by analysing a crash log with `coredumpctl` and `coredumpctl debug`.

3) I believe the trigger in Gluegen (evidence: lots of printline debugging, thinking hard and `strace`) is that we copy libgluegen_rt.so to disk, System.load() it, then truncate it and re-write it a little later. Possibly this is an interaction with mmap, since the loaded library is mapped from the very file that gets truncated? I'm not sure.

4) I believe this bug can be fixed by adjusting com/jogamp/common/util/JarUtil.java so that extract doesn't overwrite existing files. I added this a little bit before the OutputStream was created (around line 628) and the crash went away:

else if (destFile.exists()) {
  // Skip entries that were already extracted, so a loaded .so is never truncated and re-written.
  System.out.println("Crash be gone!");
}
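
For anyone who wants to try the same idea outside of Gluegen, here is a minimal standalone sketch of "extract only if the file is not there yet". It is not the JarUtil code: the class `ExtractIfAbsent` and the method `extractIfAbsent` are names I made up for illustration, and it simply relies on `Files.copy` refusing to overwrite an existing target when `REPLACE_EXISTING` is not passed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of Gluegen's JarUtil.
final class ExtractIfAbsent {

    /**
     * Copies the stream to dest unless dest already exists.
     * Returns true if the file was written, false if it was left untouched.
     */
    static boolean extractIfAbsent(InputStream in, Path dest) throws IOException {
        Path parent = dest.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        try {
            // Without StandardCopyOption.REPLACE_EXISTING, Files.copy fails
            // instead of truncating a file that is already there.
            Files.copy(in, dest);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false; // keep the existing (possibly already loaded) file as-is
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("natives-demo");
        Path lib = dir.resolve("libexample.so"); // placeholder name, not a real library

        byte[] first = "first copy".getBytes(StandardCharsets.UTF_8);
        byte[] second = "second copy".getBytes(StandardCharsets.UTF_8);

        System.out.println(extractIfAbsent(new ByteArrayInputStream(first), lib));
        System.out.println(extractIfAbsent(new ByteArrayInputStream(second), lib));
        System.out.println(new String(Files.readAllBytes(lib), StandardCharsets.UTF_8));
    }
}
```

The obvious downside is that a stale file left over from an older version would also be kept, so a real fix would probably need some way of telling "already extracted in this run" apart from "left over from before".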

5) Bug #1046 attracted the comment "The provided test jar file does not comply w/ our supported models .. (separate, fat, 'tagged').". I believe that, but I don't understand why: I thought this layout was a so-called "fat jar". In that case, I would politely ask that the file truncation stop, since that would make the obvious approach work for Clojure devs.
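
If the file really does have to be re-written, one alternative to truncating it in place would be to write the new copy to a temporary file in the same directory and rename it over the old one. On Linux a rename swaps the directory entry and leaves the old file's contents alone, so anything still mapped from the old file should keep working. This is only a sketch under the assumption that my diagnosis in point 3 is right; `AtomicReplace` and `replaceAtomically` are hypothetical names, not Gluegen API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration only; not part of Gluegen.
final class AtomicReplace {

    /** Writes the stream to a temp file next to dest, then renames it over dest. */
    static void replaceAtomically(InputStream in, Path dest) throws IOException {
        Path dir = dest.toAbsolutePath().getParent();
        Files.createDirectories(dir);
        // Same directory as dest, so the final move stays on one file system.
        Path tmp = Files.createTempFile(dir, dest.getFileName().toString(), ".tmp");
        try {
            // Only the temp file is overwritten here; dest is not touched yet.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            // On Linux this is a rename(), which replaces an existing dest.
            // Whether an existing target is replaced is platform-specific
            // according to the Files.move documentation.
            Files.move(tmp, dest, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("natives-demo");
        Path lib = dir.resolve("libexample.so"); // placeholder name, not a real library

        Files.write(lib, "old".getBytes(StandardCharsets.UTF_8));
        replaceAtomically(new ByteArrayInputStream("new".getBytes(StandardCharsets.UTF_8)), lib);
        System.out.println(new String(Files.readAllBytes(lib), StandardCharsets.UTF_8));
    }
}
```

I have not tried this inside Gluegen itself, so treat it as a direction to look in.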