Bug 1183

Summary: NullPointerException at jogamp.newt.driver.x11.ScreenDriver.collectNativeMonitorModesAndDevicesImpl(ScreenDriver.java:124) upon startup
Product: [JogAmp] Newt
Reporter: Gottfried Haider <gottfried.haider>
Component: x11
Assignee: Sven Gothel <sgothel>
Status: RESOLVED FIXED
Severity: normal
CC: gottfried.haider, xerxes, xerxes
Priority: ---
Version: 2.3.2
Hardware: embedded_arm
OS: all
Type: ---
SCM Refs:
jogl 4ff4f735c421ef343a8c447fa699c01444bd5e9b
jogl 217d8b78a3d70d9be59d4537c7565118dfe1e277
jogl a213c39fa9d741d519df56bc4d4abb86113985f4
Workaround: ---
Attachments:
TestInitScreen.java
commit-15a7e36-cleanup-javadoc-signature-comments.patch

Description Gottfried Haider 2015-07-27 14:28:25 CEST
Using Eric Anholt's vc4 mesa driver on the Raspberry Pi with jogl 2.3.1 and Xerxes' fixes on top, I often get the following NullPointerException upon startup of a Processing sketch that uses the 3D renderer:

java.lang.NullPointerException
	at jogamp.newt.driver.x11.ScreenDriver.collectNativeMonitorModesAndDevicesImpl(ScreenDriver.java:124)
	at jogamp.newt.ScreenImpl.collectNativeMonitorModes(ScreenImpl.java:630)
	at jogamp.newt.ScreenImpl.initMonitorState(ScreenImpl.java:566)
	at jogamp.newt.ScreenImpl.createNative(ScreenImpl.java:196)
	at jogamp.newt.ScreenImpl.addReference(ScreenImpl.java:235)
	at processing.opengl.PSurfaceJOGL.initScreen(PSurfaceJOGL.java:126)
	at processing.opengl.PSurfaceJOGL.initFrame(PSurfaceJOGL.java:105)
	at processing.core.PApplet.initSurface(PApplet.java:10243)
	at processing.core.PApplet.runSketch(PApplet.java:10129)
	at processing.core.PApplet.main(PApplet.java:9898)

Any hint as to how to debug this further would be greatly appreciated.

Thanks
gohai
Comment 1 Xerxes Rånby 2015-07-28 00:02:03 CEST
I recall having seen this exception before and remember it being caused by the static initialization of
GraphicsConfigurationFactory.initSingleton();
not having run yet.

In most JOGL applications
GraphicsConfigurationFactory.initSingleton();
is triggered during the NativeWindowFactory static block
before you create a NewtWindow.

The Processing 3 codebase initializes the NEWT Screen early, before reaching the line where it creates the NEWT Window.
https://github.com/processing/processing/blob/master/core/src/processing/opengl/PSurfaceJOGL.java#L100-L123

I will try to reproduce the bug and check whether I can remedy it by placing a
GraphicsConfigurationFactory.initSingleton();
call inside the X11 ScreenDriver static block, since the ScreenDriver depends on GraphicsConfigurationFactory having been initialized.
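
The idea of declaring the dependency inside the dependent class's own static block can be sketched in plain Java as follows. This is only an illustration of the pattern: ConfigFactoryRegistry and X11ScreenSketch are hypothetical stand-ins, not the real GraphicsConfigurationFactory or ScreenDriver, and the registered value is made up.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a factory registry with an idempotent initializer. */
final class ConfigFactoryRegistry {
    private static final Map<String, String> factories = new HashMap<>();
    private static boolean initialized = false;

    /** Safe to call from any number of static blocks; only the first call registers. */
    static synchronized void initSingleton() {
        if (initialized) {
            return;
        }
        initialized = true;
        factories.put("x11", "X11ConfigFactory"); // made-up registration
    }

    static synchronized String lookup(final String type) {
        return factories.get(type);
    }
}

/** Hypothetical stand-in for a driver class that needs the registry. */
final class X11ScreenSketch {
    static {
        // State the dependency where it is needed instead of relying on
        // some other class having been loaded first.
        ConfigFactoryRegistry.initSingleton();
    }

    static String describe() {
        return "screen using " + ConfigFactoryRegistry.lookup("x11");
    }
}

final class StaticInitOrderDemo {
    public static void main(final String[] args) {
        // No explicit initSingleton() call here: the static block above does it.
        System.out.println(X11ScreenSketch.describe());
    }
}
```

Because the initializer is idempotent, an additional call from another class's static block does no harm; the first caller wins and later calls return immediately.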
Comment 2 Xerxes Rånby 2015-07-28 00:38:58 CEST
If you are running on a Raspberry Pi system with both the /opt/vc driver and the /usr/lib/... mesa3d driver present, then
GraphicsConfigurationFactory.initSingleton()
will skip registering X11 because it thinks it is on a bcm.vc.iv system.

http://jogamp.org/git/?p=jogl.git;a=blob;f=src/nativewindow/classes/com/jogamp/nativewindow/GraphicsConfigurationFactory.java;hb=HEAD#l121

A workaround that allows your program to create both X11 and bcm.vc.iv NEWT windows at the same time is to run the following code early in your program.

GraphicsConfigurationFactory.initSingleton();
try {
    ReflectionUtil.callStaticMethod("jogamp.nativewindow.x11.X11GraphicsConfigurationFactory",
                                    "registerFactory", null, null, GraphicsConfigurationFactory.class.getClassLoader());
} catch (final Exception e) {
    throw new RuntimeException(e);
}
Comment 3 Xerxes Rånby 2015-07-28 02:38:30 CEST
I can reproduce this bug using the
20150726-1747-vc4-image.zip system image from
http://sukzessiv.net/~gohai/vc4-buildbot/build/
when running the YellowTail processing 3 demo
with the X11 mesa3d driver installed and
without the /opt/vc driver.

It does not trigger every time. If I re-run the demo, it launches fine.

The hypothesis in comment #2, that both the /opt/vc and the /usr/lib mesa3d drivers have to be installed at the same time to trigger this bug, is not true.

We need to ensure the static initialization order as suggested in comment #1 and as discussed in jogamp irc:
http://jogamp.org/log/irc/jogamp_20150727135426.html#l74
Comment 4 Gottfried Haider 2015-07-28 12:52:50 CEST
This seems to happen less frequently for me on http://sukzessiv.net/~gohai/vc4-buildbot/build/20150727-1947-vc4-image.zip, which has a recent snapshot of JOGL, but it still happens.

The Processing demo that made it trigger two out of three tries for me with this build was: Demos/Tests/SpecsTest.

Thanks for looking into this!
Comment 5 Sven Gothel 2015-07-28 14:17:01 CEST
"Xerxes' fixes on top", referring to Xerxes' WIP regarding Bug 1178.
Comment 6 Gottfried Haider 2015-07-29 10:56:57 CEST
Sorry for not being specific enough.. I was referring to the JavaEmitter's JVMUtil_NewDirectByteBufferCopy one. Also seeing this with autobuild from July 25.
Comment 7 Sven Gothel 2015-08-04 15:30:24 CEST
This crash is confined to the Raspi ARM?
This crash is confined to Eric's 'alpha' OpenGL driver?
Comment 8 Sven Gothel 2015-08-04 15:34:05 CEST
(In reply to comment #7)
> This crash is confined to the Raspi ARM?
> This crash is confined to Eric's 'alpha' OpenGL driver?

If I dig through these issues, I need to know a reliable recipe
for reproduction, of course ..

It would be nice, if this can be reproduced w/ our unit tests
or a new unit test! Same for Bug 1185.
Comment 9 Xerxes Rånby 2015-08-05 10:57:04 CEST
Created attachment 715
TestInitScreen.java

The attached test program TestInitScreen.java reproduces this bug in about one of three runs on a
Raspberry Pi 1 B+ (the "old" one using an armv6 CPU).

Steps to reproduce:
Boot the Pi to X11. I was using Eric Anholt's experimental X11 glamor driver when I managed to reproduce this.

javac -cp jogamp-fat.jar:. TestInitScreen.java
java -cp jogamp-fat.jar:. TestInitScreen

Output:
pi@raspberrypi ~/2.3.2 $ java -cp jogamp-fat.jar:. TestInitScreen 
Exception in thread "main" java.lang.NullPointerException
	at jogamp.newt.driver.x11.ScreenDriver.collectNativeMonitorModesAndDevicesImpl(ScreenDriver.java:124)
	at jogamp.newt.ScreenImpl.collectNativeMonitorModes(ScreenImpl.java:630)
	at jogamp.newt.ScreenImpl.initMonitorState(ScreenImpl.java:566)
	at jogamp.newt.ScreenImpl.createNative(ScreenImpl.java:196)
	at jogamp.newt.ScreenImpl.addReference(ScreenImpl.java:235)
	at TestInitScreen.testInitScreenBug1183(TestInitScreen.java:42)
	at TestInitScreen.main(TestInitScreen.java:46)
X11Util.Display: Shutdown (JVM shutdown: true, open (no close attempt): 1/1, reusable (open, marked uncloseable): 0, pending (open in creation order): 1)
X11Util: Open X11 Display Connections: 1
X11Util: Open[0]: NamedX11Display[:0.0, 0xb61d20, refCount 1, unCloseable false]
Comment 10 Xerxes Rånby 2015-08-05 11:24:02 CEST
(In reply to comment #7)
> This crash is confined to the Raspi ARM?
> This crash is confined to Eric's 'alpha' OpenGL driver?

It can be reproduced both with the "regular" framebuffer X11 driver found on the default Raspbian system
and
with Eric Anholt's glamor X11 driver.


It is *very* easy to reproduce this bug using the "regular" X11 framebuffer driver + mesa3d software rasterizer using a Raspbian system if you disable the bcm.vc.iv driver by moving the
/opt/vc to /opt/vc.off

and then run the test program:

javac -cp jogamp-fat.jar:. TestInitScreen.java
java -cp jogamp-fat.jar:. TestInitScreen
Comment 11 Xerxes Rånby 2015-08-05 12:35:09 CEST
(In reply to comment #10)
> (In reply to comment #7)
> > This crash is confined to the Raspi ARM?
> > This crash is confined to Eric's 'alpha' OpenGL driver?
> 
> It can be reproduced with both the "regular" framebuffer X11 driver found on
> the default Raspbian system
> and
> by using Eric Anholt's glamour X11 driver.
> 
> 
> It is *very* easy to reproduce this bug using the "regular" X11 framebuffer
> driver + mesa3d software rasterizer using a Raspbian system if you disable
> the bcm.vc.iv driver by moving the
> /opt/vc to /opt/vc.off
> 
> and then run the test program:
> 
> javac -cp jogamp-fat.jar:. TestInitScreen.java
> java -cp jogamp-fat.jar:. TestInitScreen

It's easy to reproduce this on a Raspberry Pi 2 (quad-core armv7) as well as on the Raspberry Pi 1 (single-core armv6) when using the X11 framebuffer driver + mesa3d software rasterizer.
The reproduction rate is near 100% when you move /opt/vc to /opt/vc.off
Comment 12 Sven Gothel 2015-08-05 13:46:03 CEST
jogl 4ff4f735c421ef343a8c447fa699c01444bd5e9b
    Bug 1183: Handle NULL return values for native RandR13 calls, 
    which suppose to return 'int[]'
    
    Certain native RandR13 functions shall return 'int[]',
    but will bail out early in case of an error or lack of resources.
    
    The latter is true for getMonitorDeviceIds(..)
    where 'XRRScreenResources->ncrtc <= 0', causing return value NULL.
    
    This patch handles the NULL return values,
    however, the root cause of having 'XRRScreenResources->ncrtc <= 0'
    _occasionally_ lies within the Raspi X11 driver I am afraid?!
Comment 13 Sven Gothel 2015-08-05 13:50:24 CEST
(In reply to comment #12)
> jogl 4ff4f735c421ef343a8c447fa699c01444bd5e9b
>     Bug 1183: Handle NULL return values for native RandR13 calls, 
>     which suppose to return 'int[]'
>     
>     Certain native RandR13 functions shall return 'int[]',
>     but will bail out early in case of an error or lack of resources.
>     
>     The latter is true for getMonitorDeviceIds(..)
>     where 'XRRScreenResources->ncrtc <= 0', causing return value NULL.
>     
>     This patch handles the NULL return values,
>     however, the root cause of having 'XRRScreenResources->ncrtc <= 0'
>     _occasionally_ lies within the Raspi X11 driver I am afraid?!

It should fall back to a fake entry using the virtual size within NEWT.

If you can confirm this .. please set this bug to validated, thx!
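
As an illustration of the null handling and fallback described here, a minimal sketch in plain Java follows. It is not the code from the commit: MonitorFallbackSketch, describeMonitors and FALLBACK_ID are hypothetical names, and the only behaviour taken from this report is that a NULL result from the native query is tolerated and replaced by one fake entry sized from the virtual screen.

```java
import java.util.ArrayList;
import java.util.List;

final class MonitorFallbackSketch {
    /** Hypothetical id for the single fake monitor entry. */
    static final int FALLBACK_ID = 0;

    /**
     * @param nativeIds result of a native query; may be null or empty
     *                  when the server reports no CRTCs
     * @return one description per monitor; never empty
     */
    static List<String> describeMonitors(final int[] nativeIds,
                                         final int virtualWidth,
                                         final int virtualHeight) {
        final List<String> out = new ArrayList<>();
        if (nativeIds == null || nativeIds.length == 0) {
            // Nothing usable came back: fall back to one fake monitor
            // that covers the whole virtual screen.
            out.add("fake monitor " + FALLBACK_ID + " covering "
                    + virtualWidth + "x" + virtualHeight);
            return out;
        }
        for (final int id : nativeIds) {
            out.add("monitor " + id);
        }
        return out;
    }

    public static void main(final String[] args) {
        System.out.println(describeMonitors(null, 1920, 1080));
        System.out.println(describeMonitors(new int[] { 63, 64 }, 1920, 1080));
    }
}
```

The point of the guard is that callers never dereference a null array and always see at least one monitor, whatever the native layer returned.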
Comment 14 Xerxes Rånby 2015-08-05 15:00:44 CEST
I can confirm that I can no longer see the exception when
jogl 4ff4f735c421ef343a8c447fa699c01444bd5e9b
is applied.
Comment 15 Xerxes Rånby 2015-08-05 23:45:26 CEST
(In reply to comment #12)
> jogl 4ff4f735c421ef343a8c447fa699c01444bd5e9b
>     Bug 1183: Handle NULL return values for native RandR13 calls, 
>     which suppose to return 'int[]'
>     
>     Certain native RandR13 functions shall return 'int[]',
>     but will bail out early in case of an error or lack of resources.
>     
>     The latter is true for getMonitorDeviceIds(..)
>     where 'XRRScreenResources->ncrtc <= 0', causing return value NULL.
>     
>     This patch handles the NULL return values,
>     however, the root cause of having 'XRRScreenResources->ncrtc <= 0'
>     _occasionally_ lies within the Raspi X11 driver I am afraid?!

I found this upstream xorg bug, which includes a possible workaround!
It is possible that this bug only triggers when using RandR in combination with an xorg driver that is not yet fully randr1.2 compatible.

https://bugs.freedesktop.org/show_bug.cgi?id=20270 - 
XRRGetScreenResourcesCurrent (or XRRGetScreenResources) reports empty data until XRRGetScreenSizeRange is called on non randr1.2 drivers
Comment 16 Xerxes Rånby 2015-08-05 23:59:34 CEST
Created attachment 716
commit-15a7e36-cleanup-javadoc-signature-comments.patch

Spotted some javadoc and comment typos while reading RandR code related to this bug.
Feel free to merge using the patch attached.

Bug 1183: Cleanup x11/RandR javadoc and native/X11RandR13 signature comments.
Comment 17 Gottfried Haider 2015-08-06 17:02:20 CEST
Thanks Sven & Xerxes!

I am still a bit confused why this only hits occasionally and not for every invocation.. might be totally unrelated, but Processing just merged a fix related to screen device parsing (https://github.com/gohai/processing/commit/844501686ce509c21b9856d1accae7f41e729060) - testing in a bit with the JOGL fix and this on top.
Comment 18 Gottfried Haider 2015-08-09 15:39:29 CEST
I was testing with the new Mesa driver today, and instead of the NullPointerException I am now seeing the error below. Is this indeed the same root cause, outside of JOGL's codebase, now manifesting itself like this? Thanks!

FATAL ERROR in native method: Nativewindow X11 IOError: Display 0x3dc5ee0 (:0): Resource temporarily unavailable
Nativewindow X11 IOError: Display 0x3dc5ee0 (:0): Resource temporarily unavailable
Nativewindow X11 IOError: Display 0x3dc5ee0 (:0): Resource temporarily unavailable
	at jogamp.newt.driver.x11.WindowDriver.CreateWindow0(Native Method)
	at jogamp.newt.driver.x11.WindowDriver.CreateWindow(WindowDriver.java:445)
	at jogamp.newt.driver.x11.WindowDriver.createNativeImpl(WindowDriver.java:130)
	at jogamp.newt.WindowImpl.createNative(WindowImpl.java:460)
	at jogamp.newt.WindowImpl.setVisibleActionImpl(WindowImpl.java:975)
	at jogamp.newt.WindowImpl$VisibleAction.run(WindowImpl.java:1026)
	at com.jogamp.common.util.RunnableTask.run(RunnableTask.java:150)
	- locked <0x65a60628> (a java.lang.Object)
	at jogamp.newt.DefaultEDTUtil$NEDT.run(DefaultEDTUtil.java:372)
Comment 19 Xerxes Rånby 2015-08-09 18:21:53 CEST
(In reply to comment #18)
> I was testing with the new Mesa driver today, and instead of the
> NullPointerException I am now seeing the error below. Is this indeed the
> same root cause - outside of JOGL's codebase - not manifesting itself like
> this? Thanks!
> 
> FATAL ERROR in native method: Nativewindow X11 IOError: Display 0x3dc5ee0
> (:0): Resource temporarily unavailable
> Nativewindow X11 IOError: Display 0x3dc5ee0 (:0): Resource temporarily
> unavailable
> 	at jogamp.newt.driver.x11.WindowDriver.CreateWindow0(Native Method)
...
> 	at jogamp.newt.DefaultEDTUtil$NEDT.run(DefaultEDTUtil.java:372)

This is a different issue.
I have filed a new bug, Bug 1187,
https://jogamp.org/bugzilla/show_bug.cgi?id=1187
to track it.
Comment 20 Xerxes Rånby 2015-08-10 17:11:11 CEST
jogl 217d8b78a3d70d9be59d4537c7565118dfe1e277
    Bug 1183: Cleanup x11/RandR javadoc and native/X11RandR13 signature comments
Comment 21 Xerxes Rånby 2015-08-16 02:31:59 CEST
(In reply to comment #15)
> (In reply to comment #12)
> > jogl 4ff4f735c421ef343a8c447fa699c01444bd5e9b
> >     Bug 1183: Handle NULL return values for native RandR13 calls, 
> >     which suppose to return 'int[]'
> >     
> >     Certain native RandR13 functions shall return 'int[]',
> >     but will bail out early in case of an error or lack of resources.
> >     
> >     The latter is true for getMonitorDeviceIds(..)
> >     where 'XRRScreenResources->ncrtc <= 0', causing return value NULL.
> >     
> >     This patch handles the NULL return values,
> >     however, the root cause of having 'XRRScreenResources->ncrtc <= 0'
> >     _occasionally_ lies within the Raspi X11 driver I am afraid?!
> 
> I found this upstream xorg bug, that includes a possible workaround!
> It is a possibility that this bug only trigger when using RandR in
> combination with an xorg driver that are not yet fully randr1.2 compatible.
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=20270 - 
> XRRGetScreenResourcesCurrent (or XRRGetScreenResources) reports empty data
> until XRRGetScreenSizeRange is called on non randr1.2 drivers

I have seen this bug trigger on an x86_64 system as well, using jogl 2.3.1
on a laptop with an Intel i916 + X.Org X Server 1.15.1.

Thus the root cause of having 'XRRScreenResources->ncrtc <= 0'
_occasionally_ lies generally within all xorg drivers.

This bug has made the bug-report rounds in all Linux distributions, yet no one has fixed xorg; instead, workarounds were added to all applications using xrandr, such as gtk, gdm...

Example:
Frederic Crozat proposed one workaround fix for this bug to gtk
https://bugzilla.gnome.org/show_bug.cgi?id=572387#c2
https://bugzilla.gnome.org/attachment.cgi?id=129071&action=diff

The reason the xrandr tool always works seems to be that it always calls XRRGetScreenSizeRange before calling XRRGetScreenResourcesCurrent or XRRGetScreenResources.

We want to add this workaround.
Having XRRScreenResources->ncrtc always contain valid data would make jogl better.
Comment 22 Xerxes Rånby 2015-08-16 04:47:04 CEST
(In reply to comment #21)

Branch ready to merge: https://github.com/xranby/jogl/tree/Bug1183
https://github.com/xranby/jogl/commit/a213c39fa9d741d519df56bc4d4abb86113985f4

    Bug 1183
    XRRGetScreenResourcesCurrent (or XRRGetScreenResources)
    _occasionally_ reports empty data
    unless XRRGetScreenSizeRange has been called once.
Comment 23 Sven Gothel 2015-08-18 03:44:57 CEST
commit a213c39fa9d741d519df56bc4d4abb86113985f4

XRRGetScreenResourcesCurrent (or XRRGetScreenResources)
_occasionally_ reports empty data
unless XRRGetScreenSizeRange has been called once.
Hence call XRRGetScreenSizeRange first!
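
The call ordering can be illustrated with a toy model in Java. The real calls are Xlib/XRandR C functions and the committed change is not reproduced here; RandRQueries, LazyRandRServer and PrimedQuerySketch are hypothetical, and the fake server is assumed to report no CRTCs deterministically until the size range has been queried once, whereas the behaviour reported above is only occasional.

```java
/** Hypothetical stand-in for the two native queries; not a JOGL or Xlib API. */
interface RandRQueries {
    /** Models XRRGetScreenSizeRange: returns {minW, minH, maxW, maxH}. */
    int[] screenSizeRange();

    /** Models reading the CRTC ids out of XRRScreenResources. */
    int[] crtcIds();
}

/** Toy server that reports no CRTCs until the size range was queried once. */
final class LazyRandRServer implements RandRQueries {
    private boolean primed = false;

    @Override
    public int[] screenSizeRange() {
        primed = true;
        return new int[] { 320, 200, 8192, 8192 }; // made-up limits
    }

    @Override
    public int[] crtcIds() {
        return primed ? new int[] { 63 } : new int[0]; // made-up CRTC id
    }
}

final class PrimedQuerySketch {
    /** Always issues the size-range query first, then reads the CRTC ids. */
    static int[] crtcIdsPrimed(final RandRQueries randr) {
        randr.screenSizeRange(); // result unused; the call itself is the workaround
        return randr.crtcIds();
    }

    public static void main(final String[] args) {
        System.out.println("unprimed: " + new LazyRandRServer().crtcIds().length + " CRTC(s)");
        System.out.println("primed:   " + crtcIdsPrimed(new LazyRandRServer()).length + " CRTC(s)");
    }
}
```

Routing every resource query through one helper such as crtcIdsPrimed keeps the ordering in a single place, so no caller can forget the priming call.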