Bug 606 - New AWT threading implementation breaks OpenGL in Processing 2.0 (OSX)
Summary: New AWT threading implementation breaks OpenGL in Processing 2.0 (OSX)
Status: RESOLVED FIXED
Alias: None
Product: Jogl
Classification: JogAmp
Component: awt (show other bugs)
Version: 2
Hardware: pc_all macosx
: --- normal
Assignee: Sven Gothel
URL:
Depends on:
Blocks:
 
Reported: 2012-07-14 09:58 CEST by ac
Modified: 2012-08-02 09:21 CEST (History)
0 users

See Also:
Type: ---
SCM Refs:
4b5a0f6557d7152ec770bc13ad3c494449de0529
Workaround: ---


Attachments
AWT Processing error log (88.19 KB, text/plain)
2012-07-14 10:08 CEST, ac
Details

Note You need to log in before you can comment on or make changes to this bug.
Description ac 2012-07-14 09:58:15 CEST
OpenGL sketches run from within the Processing 2.0 environment on Mac OSX (tested on 10.6.8 and 10.7.3) give a null-pointer exception:

java.lang.NullPointerException 
        at processing.mode.java.runner.Runner.findException(Runner.java:652) 
        at processing.mode.java.runner.Runner.reportException(Runner.java:597) 
        at processing.mode.java.runner.Runner.exception(Runner.java:537) 
        at processing.mode.java.runner.EventThread.exceptionEvent(EventThread.java:367) 
        at processing.mode.java.runner.EventThread.handleEvent(EventThread.java:255) 
        at processing.mode.java.runner.EventThread.run(EventThread.java:89) 
Exception in thread "Animation Thread" java.lang.NullPointerException 
        at javax.media.opengl.awt.GLCanvas.validateGLDrawable(GLCanvas.java:555) 
        at javax.media.opengl.awt.GLCanvas.display(GLCanvas.java:398) 
        at processing.opengl.PGL.requestDraw(Unknown Source) 
        at processing.opengl.PGraphicsOpenGL.requestDraw(Unknown Source) 
        at processing.core.PApplet.run(PApplet.java:1860) 
        at java.lang.Thread.run(Thread.java:680) 

using JOGL jars compiled since these commits:

http://jogamp.org/git/?p=jogl.git;a=commit;h=3ed491213f8f7f05d7b9866b50d764370d8ff5f6

http://jogamp.org/git/?p=gluegen.git;a=commit;h=3d527ea538c9e9897f86a0f6bdae0cab44d239c3

which introduced changes in the AWT threading. 

The null pointer exception doesn't occur all the time, but roughly in 1 out of 5 runs. When Processing is set to use NEWT, then there is no error. The full debug log is attached.

To reproduce this error, you need to checkout the latest revision from the development trunk of Processing 2.0:

http://code.google.com/p/processing/source/checkout

build, and then replace the JOGL jars in <app folder>/Contents/Resources/Java/modes/java/libraries/opengl/library/

with the ones build from the jogl/gluegen commits mentioned earlier.

Any OpenGL sketch should trigger the error, for example:

void setup() {
  size(400,400,P3D);  
}

void draw() {
  background(100);
  ellipse(mouseX, mouseY, 100, 100);
}
Comment 1 ac 2012-07-14 10:08:14 CEST
Created attachment 363 [details]
AWT Processing error log
Comment 2 Sven Gothel 2012-07-21 03:28:30 CEST
something is very very odd w/ your logfile!

the stack-trace doesn't match with the source code revision 
as stated in it:
   jogl.build.number=699
   jogl.build.id=2012-03-08_15-56-10
   jogl.build.branch=master
   jogl.build.commit=d28e1b139f14b10b9e22750ac44dbc18f08a0d34

I even dl the build and checked w/ git .. it's the correct sha1 / source,
but no line numbers (e.g. within GLCanvas match).

Exception in thread "Animation Thread" main: setRealized: MacOSXPbufferCGLDrawable false -> true
java.lang.NullPointerException
	at javax.media.opengl.awt.GLCanvas.validateGLDrawable(GLCanvas.java:555)
	at javax.media.opengl.awt.GLCanvas.display(GLCanvas.java:398)

But source says:
 555         // before native peer is valid: X11
 556         disableBackgroundErase();
 557 
 558         // issues getGraphicsConfiguration() and creates the native peer
 559         super.addNotify();
.. 
and:
 395   public boolean isRealized() {
 396       drawableSync.lock();
 397       try {
 398         return ( null != drawable ) ? drawable.isRealized() : false;
 399       } finally {
 400         drawableSync.unlock();
 401       }
 402   }

also the mentioned sha1 / build contains the locking code.

is it possible that you have 'edited' the file ?

Or is it too later for me tonight ?

+++

Also .. I could not run the svn checkedout processing :(
I cloned it, ran 'ant run' to build/run .. OK, but then no GLEventListener found.
I guess the classpath isn't setup ?
Me even tried to build a dist .. but no jogl jar's referenced this way.
Pls advise .. since I need to double check the locking code.
Comment 3 Sven Gothel 2012-07-21 03:36:37 CEST
(In reply to comment #2)
> something is very very odd w/ your logfile!
> 
> the stack-trace doesn't match with the source code revision 
> as stated in it:
>    jogl.build.number=699
>    jogl.build.id=2012-03-08_15-56-10
>    jogl.build.branch=master
>    jogl.build.commit=d28e1b139f14b10b9e22750ac44dbc18f08a0d34
<http://jogamp.org/git/?p=jogl.git;a=commit;h=d28e1b139f14b10b9e22750ac44dbc18f08a0d34>

> 
> I even dl the build and checked w/ git .. it's the correct sha1 / source,
> but no line numbers (e.g. within GLCanvas match).
> 
> Exception in thread "Animation Thread" main: setRealized:
> MacOSXPbufferCGLDrawable false -> true
> java.lang.NullPointerException
>     at javax.media.opengl.awt.GLCanvas.validateGLDrawable(GLCanvas.java:555)
>     at javax.media.opengl.awt.GLCanvas.display(GLCanvas.java:398)
> 

<http://jogamp.org/git/?p=jogl.git;a=blob;f=src/jogl/classes/javax/media/opengl/awt/GLCanvas.java;h=d8d9ddf6b0dcc921a819b2ca028341013d7dfe48;hb=efa70cd39e1a2ac18c3e8660f8d57e4569b19018#l555>

<http://jogamp.org/git/?p=jogl.git;a=blob;f=src/jogl/classes/javax/media/opengl/awt/GLCanvas.java;h=d8d9ddf6b0dcc921a819b2ca028341013d7dfe48;hb=efa70cd39e1a2ac18c3e8660f8d57e4569b19018#l398>
Comment 4 ac 2012-07-21 16:50:36 CEST
Sorry, maybe I did something wrong when "downgrading" to the earlier build?

Since I originally checked out the latest revision for both jogl and gluegen, I tried older revisions by running the following command:

git reset --hard <commit id>

For instance, in order to test the build that I think created the problem, I did:

git reset --hard 3ed491213f8f7f05d7b9866b50d764370d8ff5f6

Is this the correct way to do it?

With regards to the Processing testing, it turns out that Ben Fry started just yesterday to move the opengl library directly into the core packages, so the most recent revisions from the trunk might not be fully functional yet. I just asked him if the porting is completed or if he is still working on it. I will let you know.
Comment 5 Sven Gothel 2012-07-21 22:59:49 CEST
(In reply to comment #4)
> Sorry, maybe I did something wrong when "downgrading" to the earlier build?
> 
> Since I originally checked out the latest revision for both jogl and gluegen, I
> tried older revisions by running the following command:
> 
> git reset --hard <commit id>
> 
> For instance, in order to test the build that I think created the problem, I
> did:
> 
> git reset --hard 3ed491213f8f7f05d7b9866b50d764370d8ff5f6
> 
> Is this the correct way to do it?
> 

This should work, but you probably need to delete all builds ('rm -rf build*').

What I do to bisect things [manually] is to create branches at commit points,
check them out to use it, delete the build folder, build and test.

> With regards to the Processing testing, it turns out that Ben Fry started just
> yesterday to move the opengl library directly into the core packages, so the
> most recent revisions from the trunk might not be fully functional yet. I just
> asked him if the porting is completed or if he is still working on it. I will
> let you know.

Thx.

So .. 2 hours went to a full review of all GLAutoDrawable locking code (AWT, SWT, NEWT, ..)
considering code changes and remarks:
  3ed491213f8f7f05d7b9866b50d764370d8ff5f6
  1a91ec5c8b6fd9d9db7bc115569c369fe7b38e9b
  3334a924309a9361a448d69bc707d4cce416b430
  4f27bcecf7484dc041551f52a5c49e2884cb3867

It seems necessary to have proper locking employed for all
semantic actions changing, creating and destroying drawable/context.
To avoid deadlock, we have to ensure the locked code segment will not spawn 
off to another thread, ofc.
Other read-only methods at least shall utilize a safe local copy of a volatile
if further use to produce the result is necessary.

I have implemented the above pattern (all GLAutoDrawable*) and currently 
testing it. Will ping you when pushed, so you can test - which would be very helpful.

  - create
Comment 6 Sven Gothel 2012-07-22 04:20:08 CEST
http://jogamp.org/git/?p=jogl.git;a=commit;h=4b5a0f6557d7152ec770bc13ad3c494449de0529

    Considering code changes and remarks:
      3ed491213f8f7f05d7b9866b50d764370d8ff5f6
      1a91ec5c8b6fd9d9db7bc115569c369fe7b38e9b
      3334a924309a9361a448d69bc707d4cce416b430
      4f27bcecf7484dc041551f52a5c49e2884cb3867
    
    It seems necessary to have
    
    - recursive locking employed for all semantic actions which changes drawable & context 
      (and the Window resource)
    
    - to avoid deadlock, we have to ensure the locked code segment will not spawn
      off to another thread, or a thread holds the lock, spawns of an action requiring the lock. .. sure
    
    - other read-only methods (flags, ..) shall at least utilize a safe local copy of a volatile field
      if further use to produce the result is necessary.
    
    - flags like sendReshape require to be volatile to guarantee it's being processed
    
    Patch impacts: AWT/SWT GLCanvas, GLAutoDrawableBase [and it's specializations]
    and hopefully closes any loopholes of missing a cache hit, etc.
    
    If you review this and find optimizations, i.e. removing a lock due to semantics etc,
    don't hold back and discuss it, please.
Comment 7 ac 2012-07-24 23:38:45 CEST
Appears to work very nicely. The crash is now gone. Thanks!
Comment 8 ac 2012-07-27 19:58:28 CEST
I found an issue that seems to be triggered by the changes of this commit: the application's framerate won't go over 60fps, even it trying explicitly from the code to make it run faster...
Comment 9 Sven Gothel 2012-08-02 09:21:07 CEST
(In reply to comment #8)
> I found an issue that seems to be triggered by the changes of this commit: the
> application's framerate won't go over 60fps, even it trying explicitly from the
> code to make it run faster...

Can you elaborate on this ?
  - platform
  - test case 
  - .. ?