Bug 628

Summary: NewtCanvasSWT Hangs Sporadically During Window Resizing
Product: [JogAmp] Newt
Reporter: Rob Hatcherson <rob.hatcherson>
Component: x11
Assignee: Sven Gothel <sgothel>
Status: RESOLVED FIXED
Severity: major
Priority: ---
Version: 1
Hardware: pc_x86_32
OS: linux
Type: ---
SCM Refs:
  jogl a349db5086a7be7dd80fc2ad29a8a4b55f343e01
  jogl f24844c5e6c57a43df79224f2d3a89e9720726f7
  jogl b6fa407d4bf19ef9fe387454b5eeca68853532b9
  jogl 8cf694c1424277e6358039a964ecd75c54cf9af9
  jogl 17dd761d7c2b224f0505a399bf4ecb18634e9250
Workaround: ---
Attachments: test.sh output plus standalone SWT app that demos the issue

Description Rob Hatcherson 2012-10-11 19:42:49 CEST
Created attachment 375
test.sh output plus standalone SWT app that demos the issue

Fedora 12 (2.6.32.26-175.fc12.i686.PAE)
Intel/32
JRE/JDK 7u7
GeForce 9800 GT + NVIDIA driver 304.37
JogAmp/JOGL custom build from the 10/09/2012 repositories (RC11 in progress; is there a better or preferred way to report this?)

test.sh output is included in the attached zip file.

The version of SWT in use is the one that came with the Eclipse 4 "Juno" installation.  The relevant jar is named: org.eclipse.swt.gtk.linux.x86_3.100.1.v4234e.jar
I'm not sure how this maps to the standalone SWT version numbers (e.g. 3.8, 4.2).


There is a trail of prior discussion leading up to this bug report at:
http://forum.jogamp.org/Reshape-Problem-Using-NewtCanvasSWT-tp4026385.html


This problem was originally observed in an Eclipse 4 RCP app. If a NewtCanvasSWT and its associated GLWindow are constructed and placed as the view associated with a "part" (essentially an RCP UI tab), everything comes up and appears to work. Initially we had some event delivery issues, as discussed in the forum topic noted above, but those were fixed quickly by Sven.

However, one issue remains: if the corner of the window is grabbed with the mouse and moved around rapidly such that many reshapes of the NewtCanvasSWT are stimulated, eventually the UI will freeze up and ~5 seconds later a locking exception will be thrown.  An example backtrace is included in the forum discussion above and repeated here for convenience:

Exception in thread "main-SWTDisplay-.x11_:0.0-1-EDT-1" java.lang.RuntimeException: Waited 5000ms for: <14e5b0e, 13c0223>[count 1, qsz 0, owner <main>] - <main-SWTDisplay-.x11_:0.0-1-EDT-1>
        at jogamp.common.util.locks.RecursiveLockImpl01Unfairish.lock(RecursiveLockImpl01Unfairish.java:197)
        at com.jogamp.newt.opengl.GLWindow.display(GLWindow.java:539)
        at jogamp.opengl.GLAutoDrawableBase.defaultWindowResizedOp(GLAutoDrawableBase.java:128)
        at com.jogamp.newt.opengl.GLWindow.access$100(GLWindow.java:94)
        at com.jogamp.newt.opengl.GLWindow$1.windowResized(GLWindow.java:112)
        at jogamp.newt.WindowImpl.consumeWindowEvent(WindowImpl.java:2344)
        at jogamp.newt.WindowImpl.sendWindowEvent(WindowImpl.java:2287)
        at jogamp.newt.WindowImpl.sizeChanged(WindowImpl.java:2431)
        at jogamp.newt.driver.x11.DisplayDriver.DispatchMessages0(Native Method)
        at jogamp.newt.driver.x11.DisplayDriver.dispatchMessagesNative(DisplayDriver.java:106)
        at jogamp.newt.DisplayImpl.dispatchMessages(DisplayImpl.java:442)
        at com.jogamp.newt.swt.SWTEDTUtil$1.run(SWTEDTUtil.java:57)
        at com.jogamp.newt.swt.SWTEDTUtil$NewtEventDispatchThread.run(SWTEDTUtil.java:239) 


This problem also happens in a standalone SWT application, an example of which is in the attached zip file.  I didn't include anything to build or run this app since your jar locations will be different.

The problem is harder to reproduce in the standalone SWT app than in the Eclipse RCP app. When dragging the corner of an Eclipse RCP window with the mouse, the lockup can typically be triggered within 5 to 10 seconds. In the standalone SWT app it sometimes takes a minute or so of dragging before the lockup occurs.

To avoid wrist and finger fatigue the app has a "Start" button that will do the resizes from code.

I was not able to find a way to reproduce this problem deterministically enough to include in a regular test case. The presentation is typical of concurrency issues such as races: sometimes it fails quickly, and sometimes not. It is the worst kind of software issue, since you can never be quite sure you have fixed it unless the cause is clear.
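One common way to turn a sporadic hang into a failing test is to repeat the stimulating operation many times under a watchdog and treat any iteration that exceeds a timeout as a hang. The sketch below is plain Java with no JOGL or SWT calls; `HangWatchdogSketch` and `runUntilHang` are hypothetical names, and in a real test the `step` would presumably wrap one programmatic resize of the shell hosting the NewtCanvasSWT (an assumption, not what the attached app does verbatim). The watchdog only detects a hang; it does not make the underlying race deterministic.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HangWatchdogSketch {
    // Runs `step` up to `iterations` times on a worker thread. Returns the index of the
    // first iteration that did not finish within timeoutMs, or -1 if none hung.
    static int runUntilHang(Runnable step, int iterations, long timeoutMs) throws InterruptedException {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "stress-worker");
            t.setDaemon(true); // a hung worker must not keep the JVM alive
            return t;
        });
        try {
            for (int i = 0; i < iterations; i++) {
                Future<?> f = worker.submit(step);
                try {
                    f.get(timeoutMs, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    f.cancel(true); // best effort: interrupt the stuck step
                    return i;
                } catch (ExecutionException e) {
                    throw new IllegalStateException("iteration " + i + " failed", e.getCause());
                }
            }
            return -1;
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Simulated step that "hangs" (sleeps far past the timeout) on its fourth call.
        Runnable flakyStep = () -> {
            if (calls[0]++ == 3) {
                try {
                    Thread.sleep(5_000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };
        System.out.println(runUntilHang(flakyStep, 10, 200)); // prints 3
    }
}
```

The timeout should be well above the normal duration of one step, so that a slow but healthy iteration is not misreported as a hang.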
Comment 1 Rob Hatcherson 2012-10-11 20:04:14 CEST
On second thought, should this be filed as a NEWT bug (?).
Comment 2 Sven Gothel 2012-10-31 17:14:30 CET
(In reply to comment #1)
> On second thought, should this be filed as a NEWT bug (?).

Yes, done :)

Any progress?

I have not had time to test this issue yet; maybe in the next few weeks.
Comment 3 Rob Hatcherson 2012-10-31 17:44:03 CET
This one I have not looked at.  It has the feel of something that you could fix a lot quicker than you could field the pile of questions I'd probably have about your exclusive access policies as I tried to figure out what's going on.

Regardless, if time permits at some point soon then I'll take a closer look.  If I don't get anywhere then we're no worse off.
Comment 4 Sven Gothel 2012-10-31 20:13:30 CET
(In reply to comment #3)
> This one I have not looked at.  It has the feel of something that you could fix
> a lot quicker than you could field the pile of questions I'd probably have
> about your exclusive access policies as I tried to figure out what's going on.

What are 'exclusive access policies'?

> 
> Regardless, if time permits at some point soon then I'll take a closer look. 
> If I don't get anywhere then we're no worse off.

It's OK. Maybe next week I can at least validate and try to reproduce the issue.
If it is reproducible, I agree: I may be able to handle it 'quicker'.

So, let's say until next week on this issue.
Comment 5 Rob Hatcherson 2012-10-31 20:36:42 CET
> What are 'exclusive access policies'?

As in an understanding about what resources need to be protected with locks and why.  "Critical sections" might have been a better term to use.

Judging by the backtrace there is some of that going on here.  It is often a recipe for disaster when a less-familiar-with-the-project-yet-generally-not-too-afraid-of-changing-stuff individual (such as myself in this case :-) ) comes along and starts swinging a bat around in code whose motivations are not well understood.

As mentioned in one of my previous blurbs, on my machine at least the test app was able to reproduce the issue reliably, but not deterministically.  So... hopefully if you are patient with the test app it eventually will reward you with a lockup.

Not sure why it locks up so much more quickly when SWT is running in an RCP app. In that case I can pretty much get the lockup within 5 seconds. The standalone SWT test app, on the other hand, may require up to a minute. But the SWT test app is so much simpler than the RCP test app that it seemed like the one I should provide. Plus I needed to be sure RCP itself wasn't causing the problem, though clearly its presence has an effect.
Comment 6 Sven Gothel 2012-11-22 04:30:10 CET
Reproduced; currently analyzing the deadlock, which is due to running a
mutable NEWT Window operation off its EDT.
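The failure mode described here can be modelled in plain Java: one thread mutates the window while holding its lock and then waits for the EDT, while the EDT's resize handler needs that same lock, so it can only time out (in the trace above, after 5000 ms). Handing the whole mutation to the EDT avoids this, because the lock is reentrant for its owner. This is a simplified sketch under stated assumptions: `ReentrantLock` and a single-thread executor stand in for NEWT's recursive lock and EDT, and `EdtDeadlockSketch` and its methods are hypothetical names, not JOGL API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class EdtDeadlockSketch {
    // Stand-in for the window lock (assumption: modelled with ReentrantLock, not JOGL's lock class).
    static final ReentrantLock windowLock = new ReentrantLock();

    // Stand-in for the NEWT EDT: one daemon thread that runs tasks in order.
    static final ExecutorService edt = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "sketch-EDT");
        t.setDaemon(true);
        return t;
    });

    // What the EDT does on a resize event: it needs the window lock to redraw,
    // and gives up after timeoutMs instead of waiting forever.
    static boolean edtResizeHandler(long timeoutMs) throws InterruptedException {
        if (windowLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            try {
                return true; // redraw would happen here
            } finally {
                windowLock.unlock();
            }
        }
        return false; // timed out: the situation reported in this bug
    }

    // Problematic: hold the window lock on the caller's thread (the mutation itself
    // is omitted), then block waiting for the EDT, which needs that same lock.
    static boolean resizeOffEdt(long timeoutMs) throws Exception {
        windowLock.lock();
        try {
            Callable<Boolean> onEdt = () -> edtResizeHandler(timeoutMs);
            Future<Boolean> result = edt.submit(onEdt);
            return result.get();
        } finally {
            windowLock.unlock();
        }
    }

    // Remedy: run the whole mutation on the EDT, so the only thread taking the lock
    // is the EDT itself and the reentrant acquire in the handler succeeds at once.
    static boolean resizeOnEdt(long timeoutMs) throws Exception {
        Callable<Boolean> mutation = () -> {
            windowLock.lock();
            try {
                return edtResizeHandler(timeoutMs);
            } finally {
                windowLock.unlock();
            }
        };
        return edt.submit(mutation).get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("off EDT: " + resizeOffEdt(200)); // prints "off EDT: false" after ~200 ms
        System.out.println("on EDT: " + resizeOnEdt(200));   // prints "on EDT: true"
    }
}
```

In a true deadlock neither side has a timeout; the bounded `tryLock` here only makes the stall observable, much as the 5000 ms limit in the reported trace turned the freeze into an exception.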
Comment 8 Sven Gothel 2012-11-27 01:58:01 CET
The following JOGL commits complement the SWT/AWT deadlock fix:
  b6fa407d4bf19ef9fe387454b5eeca68853532b9
  8cf694c1424277e6358039a964ecd75c54cf9af9
  17dd761d7c2b224f0505a399bf4ecb18634e9250

Commit 8cf694c1424277e6358039a964ecd75c54cf9af9
also allows usage of SWT 4.3.0
Comment 9 Rob Hatcherson 2012-11-29 19:55:34 CET
I've been unable to reproduce the deadlock after the fix, so that part is good.  However, assuming all my builds are good, the problem discussed at the forum link below appears to have come back:

http://forum.jogamp.org/Reshape-Problem-Using-NewtCanvasSWT-td4026385.html


Not exactly sure of the right way to report this with git, but the last commit I see in my project from Sven's master using "git log" says:

commit f25b5c973150252af5c5fbf4ca87b03e2e9aee32
Author: Sven Gothel <sgothel@jausoft.com>
Date:   Tue Nov 27 19:04:20 2012 +0100
Comment 10 Sven Gothel 2012-12-02 04:59:33 CET
(In reply to comment #9)
> I've been unable to reproduce the deadlock after the fix, so that part is good.
>  However, assuming all my builds are good, the problem discussed at the forum
> link below appears to have come back:
> 
> http://forum.jogamp.org/Reshape-Problem-Using-NewtCanvasSWT-td4026385.html
> 
> 
> Not exactly sure of the right way to report this with git, but the last commit
> I see in my project from Sven's master using "git log" says:
> 
> commit f25b5c973150252af5c5fbf4ca87b03e2e9aee32
> Author: Sven Gothel <sgothel@jausoft.com>
> Date:   Tue Nov 27 19:04:20 2012 +0100

Well, I don't know; this just passed all SWT unit tests, including:
  <http://jogamp.org/git/?p=jogl.git;a=blob;f=src/test/com/jogamp/opengl/test/junit/jogl/swt/TestNewtCanvasSWTBug628ResizeDeadlock.java;h=a0874e609dfd28da3c220b65948a8cd6943f076d;hb=HEAD>

If you still have trouble, please try to adapt this unit test to reproduce the freeze.
Comment 11 Rob Hatcherson 2012-12-02 05:08:22 CET
Hi Sven,

Just to be sure I was clear... I was *not* having problems with the freeze anymore; that all worked great after your fix for this bug.

However, an earlier problem in which expose, resize, and similar events were not delivered seems to have reappeared. You had fixed this problem a while back, so it looks like a regression (?). The forum post I mentioned discussed this issue. I can't remember whether we filed a bug on it.

Are you saying that you aren't seeing the event delivery issue in your latest build, and/or you have a test case for it that suggests the problem is not there in a build from your current master?
Comment 12 Sven Gothel 2012-12-04 07:16:46 CET
(In reply to comment #11)
> Hi Sven,
> 
> Just to be sure I was clear... I was *not* having problems with the freeze
> anymore; that all worked great after your fix for this bug.
Ah, OK, thanks a lot. I was confused by the many SWT issues :)

> 
> However, an earlier problem where expose, resize, etc events would not get
> delivered seems to have reappeared.  You had fixed this problem a while back,
> so it looks like a regression (?).  The forum post I mentioned discussed this
> issue.  Can't remember if we filed bug on it.

No bug report, but I fixed the 'sendReshape' logic in the SWT GLCanvas updateSizeCheck().
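For readers unfamiliar with the pattern, a size check of this kind typically compares the current client area with a cached one and raises a pending-reshape flag that the next display pass consumes. The sketch below assumes that behaviour; it is not the actual GLCanvas code, and `ReshapeFlagSketch`, `reshapeCount`, and the plain integer width/height parameters are hypothetical simplifications. If the flag is never raised, or is cleared before a display pass sees it, listeners miss reshape events, which would be consistent with the symptom described above.

```java
public class ReshapeFlagSketch {
    private int lastWidth = -1, lastHeight = -1; // cached client-area size (-1 = unknown)
    private boolean sendReshape = false;         // pending reshape for the next display pass
    private int reshapeCount = 0;                // stands in for calls to a reshape listener

    // Compare the current size with the cached size; mark a reshape as pending on change.
    void updateSizeCheck(int width, int height) {
        if (width != lastWidth || height != lastHeight) {
            lastWidth = width;
            lastHeight = height;
            sendReshape = true;
        }
    }

    // One display pass: deliver any pending reshape before drawing.
    void display(int width, int height) {
        updateSizeCheck(width, height);
        if (sendReshape) {
            sendReshape = false;
            reshapeCount++; // a real canvas would notify its GL listeners here
        }
        // ... drawing would happen here ...
    }

    int reshapeCount() { return reshapeCount; }

    public static void main(String[] args) {
        ReshapeFlagSketch canvas = new ReshapeFlagSketch();
        canvas.display(640, 480); // first pass: size differs from the sentinel, reshape
        canvas.display(640, 480); // unchanged size: no reshape
        canvas.display(800, 600); // resized: reshape
        System.out.println(canvas.reshapeCount()); // prints 2
    }
}
```

Keeping the comparison and the flag together means a reshape is delivered at most once per actual size change, no matter how many display passes occur in between.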

> 
> Are you saying that you aren't seeing the event delivery issue in your latest
> build, and/or you have a test case for it that suggests the problem is not
> there in a build from your current master?

I did overhaul our SWT GLCanvas, see:
  <http://jogamp.org/git/?p=jogl.git;&a=commit&h=7cb6cf2a9708d3f4e06f2215eb0d06b00fa6cd15>

Tested manually here on X11 and Windows. Jenkins is busy now; let's see.
Comment 13 Rob Hatcherson 2012-12-06 19:20:13 CET
I just now (12/6/2012, noon local time) sync'd up with your master and tried this again, and at the moment everything appears to be working correctly. No deadlock, and reshapes are happening when they should. This is most excellent.

FWIW the last commit I see from you in my clone is 7a6f6b7a5b028e918a843de9fe16c38da75edba9.