Bug 1136

Summary: Crash in clBuildProgram
Product: [JogAmp] Jocl
Reporter: Wade Walker <wwalker3>
Component: opencl
Assignee: Wade Walker <wwalker3>
Status: UNCONFIRMED ---
Severity: normal
CC: bram.leenhouwers, sgothel, wwalker3
Priority: ---
Version: 1
Hardware: pc_x86_64
OS: linux
Type: ---
SCM Refs:
Workaround: ---
Attachments: A kernel that crashes the driver in 340.65

Description Wade Walker 2015-02-27 21:17:28 CET
From forum entry at http://forum.jogamp.org/crash-in-Java-com-jogamp-opencl-llb-impl-CLImpl-clBuildProgram0-td4034077.html:

I am facing a jocl crash issue with clBuildProgram.
Important info: I am using 64-bit RHEL6 and the latest NVIDIA driver for the K20 (340.65).

When I build a kernel using jocl, the JVM crashes.
I have recompiled jocl to investigate where this comes from, and I'm a bit stuck now, so I turn to you guys.
It seems that the issue comes from the clBuildProgram function call in the JNI file.

Four interesting points:

1/ Building the same kernel with a plain old C program does not crash (same parameters passed to the driver).
2/ Building the same kernel with a plain old C program that uses dlopen and dlsym, just as jocl does, does not crash either.
3/ The crash occurs only on 340.65. I have been able to downgrade to an older version (304.xx), and jocl works just fine there.
4/ 340.65 works fine on Windows.

Here is the stack before the crash:

#172 0x00007f10fdfca5c7 in NvCliCompileProgram () from /usr/lib64/libnvidia-compiler.so.340.65 
#173 0x00007f11271c20ac in ?? () from /usr/lib64/libnvidia-opencl.so.1 
#174 0x00007f11271b64c5 in ?? () from /usr/lib64/libnvidia-opencl.so.1 
#175 0x00007f112c3bb8c9 in Java_com_jogamp_opencl_llb_impl_CLImpl_clBuildProgram0 () 

Any ideas on why this could happen? Maybe it's a bug on the driver side, but if that's the case, I would like to be able to send NVIDIA a reproduction case without jocl in the middle.
Comment 1 bram.leenhouwers 2015-03-03 11:25:41 CET
Created attachment 691 [details]
A kernel that crashes the driver in 340.65
Comment 2 bram.leenhouwers 2015-03-03 11:29:02 CET
This does not seem to be tied to the kernel being compiled.
I did, however, attach a test kernel that crashes the driver, so that we have the same reproduction steps.

I also tried 331.38 (the version I see you use in the Jenkins jobs).
The behavior is a bit different: the JVM does not crash instantly but becomes unstable, and killing the Java process afterwards sometimes crashes the machine.
Comment 3 Wade Walker 2015-03-25 03:13:46 CET
Sorry this took me some time to get around to :)

I created a JUnit test for this and ran it on my home machine (64-bit Windows 8) without seeing the problem. I can try it on Ubuntu 14 (my other main machine) and see if that gives any different results.

However, one thing that might be causing this problem is the limit returned by querying CL_DEVICE_MAX_PARAMETER_SIZE. It's supposed to be guaranteed to be at least 256 bytes, which you shouldn't be hitting, but you might be very close to the limit if the OpenCL compiler is interpreting your "0" literals as 64-bit types (writing 0.0f would ensure they are floats). You might try using 0.0f or reducing the number of arguments to see if the problem goes away.
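
As a rough way to check how close a kernel is to that limit, the following Java sketch adds up the bytes its arguments would occupy and compares the total with a given maximum. The class KernelArgBudget, its methods, and the argument list are hypothetical and are not taken from the attached kernel. The 8-byte pointer size is an assumption for a 64-bit device, and the real limit should come from querying CL_DEVICE_MAX_PARAMETER_SIZE rather than from the constant used here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: estimates the total size of a kernel's arguments.
public class KernelArgBudget {

    /** Sums the byte sizes of all kernel arguments. */
    public static long totalBytes(Map<String, Integer> argSizes) {
        long total = 0;
        for (int size : argSizes.values()) {
            total += size;
        }
        return total;
    }

    /** Returns true if the arguments fit within the given parameter-size limit. */
    public static boolean fits(Map<String, Integer> argSizes, long maxParameterSize) {
        return totalBytes(argSizes) <= maxParameterSize;
    }

    public static void main(String[] args) {
        // Illustrative argument list only; sizes are in bytes.
        Map<String, Integer> argSizes = new LinkedHashMap<>();
        argSizes.put("input", 8);   // __global float* (assumed 8-byte pointer)
        argSizes.put("output", 8);  // __global float* (assumed 8-byte pointer)
        argSizes.put("scale", 4);   // float
        argSizes.put("count", 4);   // int

        long limit = 256; // the minimum quoted above; query the device for the real value
        System.out.println("Total argument bytes: " + totalBytes(argSizes));
        System.out.println("Fits within " + limit + " bytes: " + fits(argSizes, limit));
    }
}
```

If the total is well below the device's limit, the argument size is unlikely to be the cause and the other suggestions are worth trying first.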

You might also check your locale; sometimes that has weird effects on shader/kernel compilers. Try setting it to something boring like en_US.UTF-8 and see if that makes a difference.
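
A quick way to see which locale the Java process is running under is a small diagnostic like the one below. It is only a sketch: the class LocaleCheck and its methods are hypothetical, and it reports the JVM's view (environment variables and default locale), which may not match the locale state the native driver actually uses. A comma decimal separator is flagged because that is a common way for a locale to confuse a compiler that parses float literals.

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical diagnostic: prints locale settings visible to the JVM.
public class LocaleCheck {

    /** Returns true if the locale formats decimals with a comma (e.g. 1,5). */
    public static boolean usesCommaDecimal(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getDecimalSeparator() == ',';
    }

    /** Formats an environment variable for display, marking unset ones. */
    public static String describe(String name, String value) {
        return name + "=" + (value == null ? "(unset)" : value);
    }

    public static void main(String[] args) {
        // Environment variables that commonly determine the native locale on Linux.
        for (String name : new String[] {"LC_ALL", "LC_NUMERIC", "LANG"}) {
            System.out.println(describe(name, System.getenv(name)));
        }

        Locale jvmLocale = Locale.getDefault();
        System.out.println("JVM default locale: " + jvmLocale);
        if (usesCommaDecimal(jvmLocale)) {
            System.out.println("Decimal separator is ','; consider retrying with LC_ALL=en_US.UTF-8");
        }
    }
}
```

Running it in the same shell, and with the same environment, as the crashing application shows whether the session uses something other than en_US.UTF-8.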