Bug 616

Summary: Remove XInitThreads() (and hence XLockDisplay/XUnlockDisplay) - Make X11 Display usage lock free.
Product: [JogAmp] Jogl Reporter: Sven Gothel <sgothel>
Component: x11Assignee: Sven Gothel <sgothel>
Status: RESOLVED FIXED    
Severity: enhancement CC: alexander.lex, andres.colubri, bforrester, dp, gouessej, wwalker3, xerxes
Priority: P1    
Version: 2   
Hardware: All   
OS: all   
Type: --- SCM Refs:
jogl fbe331f013608eb31ff0d8675f4e4c9881c9c48b jogl 21b85e647f1e661c8e5e49caa91c564a3d041df2 jogl f7ca4df7b4e6b2acbdb941f67f6d7058db134d91
Workaround: ---
Bug Depends on: 623    
Bug Blocks: 603    

Description Sven Gothel 2012-09-18 06:51:35 CEST
Remove XInitThreads() (and hence XLockDisplay/XUnlockDisplay) - Make X11 Display usage lock free.

It is known that calling  XInitThreads() may cause confusion w/ other toolkits
like GDK/GTK and if running within other [Java] IDEs where it is unclear
what the X11 locking policy is - or whether [X11] code did already run.
In case of the latter, a call to XInitThreads() is a violation of the spec.

Basics:

X11 is network based, using fast local transfers or other slow media.
A X11 display connection (DPY) communicates w/ a device (X server)
which may be local (fast) or remote (slow).
One site can have multiple connection names, e.g. X server etc.

There is a DPY limit, 255 connections are allowed _to_ one X11 server (?)

1 DPY gives you one communication queue to the server, 
each is intrinsically thread safe if used by one thread at a time.

If sharing a DPY one would either need to use 
  [1] XLockDisplay/XUnlockDisplay, which requires XInitThreads to work
  [2] Application level locking

Depending on the impl. [1] is quite fast today (XCB backend).

A bad example of [2] is AWT, i.e. using one global lock for the whole 
toolkit (on X11). In general it would be a very bad idea to use 
one DPY for the whole 'location'. Stuttering and other blocking 
artifacts would be experienced .. as you are used w/ the 
one-for-all EDT architecture as experienced w/ AWT, Cocoa, SWT, ..

Since the goal already achieved is to use one 'display' w/ multiple
rendering instances w/o blocking, we can either use:
  [a] one DPY for each thread and use case
  [b] X11 locking

The master branch uses a mix of [a] and [b] already,
where a shift towards [a] happened a while ago in the offscreen
surface and drawable implementation.

Now lets have a view back in history ..

- In the beginning .. JOGL1, followed more the EDT approach
  utilizing the ThreadUtil. Of course, at this time AWT and Swing
  were the only drawable surfaces available.
  You could disable the EDT funneling via properties, they still exist.
  However, since AWT is not implemented to be thread safe,
  but single threaded - your mileage may vary (YMMV).

- On AWT we lock surface and the so called JAWT global lock.
  Hence rendering in multiple AWT surfaces (GLCanvas) 
  from multiple threads will serve you a stuttering experience.
  You will also notice a lag of input, since the global lock
  synchronizes both, the input event queue and the surfaces - all resources.  

- We created NEWT and hence AWT constraints were no more required to follow.
  We were asked which threading model NEWT follows.
  The simple answer was none at all, i.e. it should be thread safe.
  In the end - due to conveniences, the NEWT EDT semantic was born. 
  NEWT's EDT intend was to only dispatch events from the native queue.
  Nevertheless we were required to also use it to handle 
  the native windowing lifecycle actions like create / destroy windows.

  In the end .. as it is of today, you can use a NEWT surface
  to render into from any thread, while locking it's surface and DPY.
  Since we used [b], a shared DPY w/ XInitThreads, we are thread-safe
  and resource efficient. Still .. having multiple NEWT windows 
  operating on one shared DPY may reduce some responsiveness.

  This is the balance to discuss here.

The current DPY situation regarding locking, open and close ..

- NVidia plays nice w/ open and closing of display connections

- AMD has an open/close order bug, i.e. DPY must be closed in the order 
  they were opened, hence X11Util funnels all DPY open and close ops. 
  If using AMD, we never close the DPY, but reuse them (not too bad).
  At JVM shutdown we close them in the proper order,
  which avoids leaks and a SEGV at JVM shutdown .. you may 
  have been bothered with it.

- XCB backend introduced another hack, it's locking bug forced us 
  to have 1 DPY per EDT and one per 'renderer'.
  Read: Both make up one NEWT X11 Display, so a NEWT X11 Display 
  opens _two_ DPY.


++++

The new branch x11_no_xinitthreads, test it!

- One good unit test here demonstrating the difference of 
  the master branch [b] and the new x11_no_xinitthreads branch [a]
  is TestInitConcurrentNEWT, which opens multiple NEWT windows
  each in it's own thread and renders them.
  With [b] you will notice that it takes a little time
  until all windows become established, where with [a]
  realization of all windows happens almost at the same time
  even though it takes longer until they all show up at once.
  The total time delta is almost zero, a few milliseconds
  pro [a]. However, resource costs for [a] explodes a little bit.
     20 NEWT Windows w/ DPY: [b] == 2 DPYs, [a] == 40 DPYs

  Of course, this is not the regular use case ..

As mentioned above, it is to discuss which solution is preferable.

PRO x11_no_xinitthreads branch:

  - more clean and less complex implementation

  - easier to maintain

  - neither requiring not bothering w/ XInitThreads, 
    nor XLockDisplay/XUnlockDisplay

  - no questionable locking parts in implementation

  - may solve issues folks had w/ IDEs like Eclipse ?
    E.g. Wade and others reported different behavior 
    when calling JOGL's initialization at a different time.

  - multiple NEWT windows running [each in their thread]
    on one display run more in a separate fashion.
    No interference nor blocking from other same 'display' NEWT windows    

PRO master branch (old style):

  - it's tested and used today

  - uses less DPY resources

  - much harder to hit the DPY limit of 255 connections to one server


The educated gut feeling will favor 'x11_no_xinitthreads'.

There has been quite some change in the NEWT X11 Display domain:
  NEWT: Changed Lifecycle of Display/Screen 
    <http://jogamp.org/git/?p=jogl.git&a=search&h=HEAD&st=commit&s=Changed+Lifecycle+of+Display%2FScreen>

also searching for 'X11 Display' reveals above mentioned change sets:
    <http://jogamp.org/git/?p=jogl.git&a=search&h=HEAD&st=commit&s=X11+Display>

My conclusion so far - if you could test the new branch and:
  - it works well
  - even fixes maybe a bug (Eclipse, IDEs, .. Wade ?)
  - the DPY limit doesn't bother you
  - no other concerns ..

then we could move over and clean up accordingly.

Sorry for my very long writeup, but since the matter is quite complex
and important to our X11 port I believe being more verbose and detailed
is hurting less as if being too brief.
Comment 1 Sven Gothel 2012-09-18 06:59:08 CEST
Added people to CC list who could be helping making a decision ..
Comment 2 Sven Gothel 2012-09-18 07:00:36 CEST
See branch x11_no_xinitthreads.
Comment 3 Sven Gothel 2012-09-18 08:16:45 CEST
I have tested the new branch on OpenIndiana x86_64 NV
and  Linux x86_64 (NV, AMD, Intel) with our elaborate unit tests.

So far .. no downsides. Ofc some NEWT Display lifecycle tests fail.
No deadlock or whatsoever.

The total test time is similar, a few milliseconds faster than the master branch,
but this delta could be within the measurement inaccuracy.
Comment 4 Julien Gouesse 2012-09-18 10:28:19 CEST
Hi

(In reply to comment #0)
> Sorry for my very long writeup, but since the matter is quite complex
> and important to our X11 port I believe being more verbose and detailed
> is hurting less as if being too brief.
No, it's fine for me, thank you very much.

(In reply to comment #3)
> So far .. no downsides. Ofc some NEWT Display lifecycle tests fail.
> No deadlock or whatsoever.
> 
Which one?

Can you explain a bit why the new branch uses more display resources? Is it because you don't reuse some handles?

I can give it a try but I have no heavy test case, a first person shooter using a single NEWT window will probably not exhibit problematic behaviours (but I can use some other Ardor3D testcases).

If we really know what we are doing, we should do this change in JOGL 2.0, not in another major version. If it is too risky, we should postpone it to the next minor release.
Comment 5 Sven Gothel 2012-09-18 11:49:24 CEST
(In reply to comment #4)
> Hi
> 
> (In reply to comment #3)
> > So far .. no downsides. Ofc some NEWT Display lifecycle tests fail.
> > No deadlock or whatsoever.
> > 
> Which one?
com.jogamp.opengl.test.junit.newt.TestDisplayLifecycle02NEWT
com.jogamp.opengl.test.junit.newt.TestWindows01NEWT

Both check whether the NEWT Display is being reused .. etc. 
So  breakage here is expected.

> 
> Can you explain a bit why the new branch uses more display resources? Is it
> because you don't reuse some handles?

B/c [1] we don't reuse the NEWT Display itself, hence each NEWT window will 
at least have one more DPY for the surface - we may can reuse the EDT and it's DPY,
if we like that idea. The latter would reduce the costs of DPY by 50%
and reproduce current behavior of having one EDT w/ it's DPY for one 
physical connection.

> 
> I can give it a try but I have no heavy test case, a first person shooter using
> a single NEWT window will probably not exhibit problematic behaviours (but I
> can use some other Ardor3D testcases).
That would be something, yes. Maybe some troubles disappear ?

> 
> If we really know what we are doing, we should do this change in JOGL 2.0, not
> in another major version. If it is too risky, we should postpone it to the next
> minor release.
Agreed. That is why I am so verbose here .. and trying to get make a risk assessment.
Highly likely this approach will remove currently existing bugs appearing within 'environments'.
Sure this lock free approach will cost a few DPYs .. but as you mentions, this is not 
your everyday scenario (w/ 100 windows open -> 101 or 200 DPY .. etc).
Comment 6 Julien Gouesse 2012-09-18 12:30:27 CEST
(In reply to comment #5)
> That would be something, yes. Maybe some troubles disappear ?
> 
Is there an aggregated build of this branch anywhere? I would like to test all canvases of Ardor3D with this branch, some test cases with multiple canvases (especially the one with 6 canvases). I can work a bit on that during this night or tomorrow evening.
Comment 7 Wade Walker 2012-09-20 02:24:52 CEST
I think it will be great if we can get rid of XInitThreads(). That call makes it very difficult to use JOGL safely inside an Eclipse RCP app, since Eclipse itself is drawing stuff (or at least making XWindows calls) before any JOGL code can be invoked. In most app frameworks, it's probably very difficult to insure that JOGL's static initializer is called before any part of the framework does anything with XWindows.

Also, as others have pointed out in the forum, there's no documentation saying that it's safe to call XInitThreads() more than once, so it's probably best not to call it at all.

I will test the new branch soon, I just have to travel for work next week so I can't test it right away.
Comment 8 Julien Gouesse 2012-09-20 10:19:42 CEST
(In reply to comment #7)
> 
> I will test the new branch soon, I just have to travel for work next week so I
> can't test it right away.

Please let me know if you plan to test it under Cent OS Linux too.
Comment 9 Brandon Forrester 2012-09-20 14:35:30 CEST
I'm happy to report that we tested it on our code base and it seems to fix the threading errors we were receiving. Should we expect to see this branch merged into rc11 ??
Comment 10 Brandon Forrester 2012-09-20 14:37:06 CEST
(In reply to comment #9)
> I'm happy to report that we tested it on our code base and it seems to fix the
> threading errors we were receiving. Should we expect to see this branch merged
> into rc11 ??

Tested under Ubuntu 10.x and 12.x 64 bit..
Comment 11 Julien Gouesse 2012-09-20 14:47:35 CEST
(In reply to comment #9)
> Should we expect to see this branch merged
> into rc11 ??

We have to test it a bit more and Sven will have the last word.
Comment 12 Sven Gothel 2012-09-20 14:57:08 CEST
I agree w/ all of you .. i.e. a bit more testing and favor the merge
due to simplified X11 locking, read: X11 lock-free.
Plus, it removes the XInitThreads() issue completly .. great.

Yes, it will be in RC11 .. more testing will be done while merging it, I will keep another branch alive
with xinitthreads .. so we can compare, just in case.

Thank you for testing!

~Sven
Comment 13 Sven Gothel 2012-09-27 17:40:46 CEST
Commit fbe331f013608eb31ff0d8675f4e4c9881c9c48b
  <http://jogamp.org/git/?p=jogl.git;a=commit;h=fbe331f013608eb31ff0d8675f4e4c9881c9c48b>:

+++

Fix Bug 616: X11: Remove XInitThreads() dependency while cleaning up device locking, resulting in a native-lock-free impl.

The X11 implementation details of NativeWindow and NEWT used the X11 implicit locking facility 
XLockDisplay/XUnlockDisplay, enabled via XInitThreads().
The latter useage is complicated within an unsure environment where the initialization point of JOGL
is unknown, but XInitThreads() requires to be called once and before any other X11 calls.

The solution is simple and thorough, replace native X11 locking w/ 'application level' locking.
Following this pattern actually cleans up a pretty messy part of X11 NativeWindow and NEWT,
since the generalization of platform independent locking simplifies code.
Simply using our RecursiveLock also speeds up locking, since it doesn't require JNI calls down to X11 anymore.
It allows us to get rid of X11ToolkitLock and X11JAWTToolkitLock.

Using the RecursiveLock also allows us to remove the shortcut of explicitly createing 
a NullToolkitLocked device for 'private' display connections. 
All devices use proper locking as claimed in their toolkit util 'requiresToolkitLock()' in X11Util, OSXUtil, ..

Further more a bug has been fixed of X11ErrorHandler usage, i.e. we need to keep our handler alive at all times
due to async X11 messaging behavior. This allows to remove the redundant code in X11/NEWT.

The AbstractGraphicsDevice lifecycle has been fixed as well, i.e. called when closing NEWT's Display 
for all driver implementations.

On the NEWT side the Display's AbstractGraphicsDevice semantics has been clarified, 
i.e. it's usage for EDT and lifecycle operations.
Hence the X11 Display 2nd device for rendering operations has been moved to X11 Window
where it belongs - and the X11 Display's default device used for EDT/lifecycle-ops as it should be.
This allows running X11/NEWT properly with the default usage, where the Display instance 
and hence the EDT thread is shared with many Screen and Window.
Rendering using NEWT Window is decoupled from it's shared Display lock
via it's own native X11 display.

Lock free AbstractGraphicsDevice impl. (Windows, OSX, ..) don't require any attention in this regard
since they use NullToolkitLock.

Tests:
======
This implementation has been tested manually with Mesa3d (soft, Intel), ATI and Nvidia 
on X11, Windows and OSX w/o any regressions found in any unit test.

Issues on ATI:
==============
Only on ATI w/o a composite renderer the unit tests expose a driver or WM bug where XCB 
claims a lack of locking. Setting env. var 'LIBXCB_ALLOW_SLOPPY_LOCK=true' is one workaround
if users refuse to enable compositing. We may investigate this issue in more detail later on.
Comment 14 Sven Gothel 2012-09-29 13:08:20 CEST
Reopened due to Bug 623 [dependency].
Comment 16 Sven Gothel 2012-09-30 20:58:13 CEST
Using workaround of Bug 623 remedy ..
Comment 17 Sven Gothel 2012-10-02 07:39:54 CEST
Added fix 21b85e647f1e661c8e5e49caa91c564a3d041df2

    Fix commit fbe331f013608eb31ff0d8675f4e4c9881c9c48b (Bug 616 - Remove XInitThreads())
    
    X11Util/Native: Fix X11Util_initialize0() arguments were wrong and code still invoked XInitThreads() .. woops;
                    Added missing included "jogamp_nativewindow_x11_X11Util.h" incl. it's generation via javah,
                    which was the culprit of not detecting it at compile time.
                    This is a fix for commit fbe331f013608eb31ff0d8675f4e4c9881c9c48b
Comment 18 Sven Gothel 2012-10-02 17:57:43 CEST
Fixed regression of getDefaultToolkitLock() usage, commit f7ca4df7b4e6b2acbdb941f67f6d7058db134d91