Bug 1167

Summary: Heavy performance issue
Product: [JogAmp] Jogl Reporter: Giuseppe Barbieri <elect86>
Component: core Assignee: Sven Gothel <sgothel>
Status: RESOLVED WORKSFORME    
Severity: minor CC: gouessej, sgothel
Priority: P5    
Version: 2.4.0   
Hardware: pc_x86_64   
OS: all   
Type: DEFECT SCM Refs:
Workaround: ---
Attachments: test.log
Program without NV resident buffers
Program with NV resident buffers

Description Giuseppe Barbieri 2015-06-29 16:54:36 CEST
I ported a simple example from the NVIDIA GL samples that shows how to use bindless VBOs, uniforms and textures.

I am seeing a heavy performance problem: my code runs at 175 fps versus 450 fps for the C code.

I added timers to measure how long my display() takes and it looks fine, so the problem is somewhere else.
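
A minimal sketch of the kind of timer described above, assuming timestamps come from System.nanoTime(). The FrameTimer class and its method names are hypothetical; they are not part of JOGL or of the reporter's code. It tracks the time spent inside display() separately from the interval between consecutive display() calls, so a short display() time combined with a long frame interval suggests the cost lies outside display().

```java
/** Hypothetical helper: separates time inside display() from the full frame interval. */
final class FrameTimer {
    private boolean started = false; // true once begin() has been called at least once
    private long frameStart;         // timestamp of the current display() entry, in ns
    private long displayNanos = 0;   // total time spent inside display()
    private long intervalNanos = 0;  // total time between consecutive display() entries
    private long displays = 0;
    private long intervals = 0;

    /** Call first in display(), passing System.nanoTime(). */
    void begin(long nowNanos) {
        if (started) {
            intervalNanos += nowNanos - frameStart;
            intervals++;
        }
        started = true;
        frameStart = nowNanos;
    }

    /** Call last in display(), passing System.nanoTime(). */
    void end(long nowNanos) {
        displayNanos += nowNanos - frameStart;
        displays++;
    }

    /** Average time spent inside display(), in milliseconds. */
    double avgDisplayMillis() {
        return displays == 0 ? 0.0 : displayNanos / 1e6 / displays;
    }

    /** Average interval between display() entries, in milliseconds. */
    double avgFrameMillis() {
        return intervals == 0 ? 0.0 : intervalNanos / 1e6 / intervals;
    }

    /** Frames per second derived from the average frame interval. */
    double fps() {
        double ms = avgFrameMillis();
        return ms == 0.0 ? 0.0 : 1000.0 / ms;
    }
}
```

In a GLEventListener, one would call timer.begin(System.nanoTime()) as the first statement of display() and timer.end(System.nanoTime()) as the last, then compare avgDisplayMillis() with avgFrameMillis().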

You can clone from here

https://github.com/elect86/NvGlSamples
Comment 2 Julien Gouesse 2015-07-09 11:51:03 CEST
It's not JOGL related, there is nothing showing that this performance problem comes from JOGL itself and you confirmed it:
http://jogamp.org/log/irc/jogamp_20150709050624.html#l70

The problem might come from NvInputTransformer or another class in your own code. Please use a profiler to find out. I'm going to close this bug report.

Reopen it if and only if a profiler shows an excessive GPU or CPU time consumption in JOGL.
Comment 3 Julien Gouesse 2015-07-13 16:48:27 CEST
Giuseppe, where can we find these Nvidia GL samples? The problem is that your code involves lots of classes, it's difficult to know what it implies.

It seems to be a problem in JOGL when the buffers are resident on the GPU:
http://forum.jogamp.org/Bindless-vertex-array-tp4034343p4034862.html
Comment 4 Giuseppe Barbieri 2015-07-16 21:24:53 CEST
Created attachment 706 [details]
test.log
Comment 5 Giuseppe Barbieri 2015-07-16 21:32:26 CEST
Comment out everything in display(). Also comment out the part where the resident VBO and IBO buffers are created in Mesh.update.

Run it; performance looks fine, I hit 5k fps.

Now uncomment the resident VBO/IBO creation and you will see how much performance suffers.
Comment 6 Sven Gothel 2015-07-28 14:20:37 CEST
Please provide a unit test for JOGL for this case,
so we can validate this.
Best case: One unit test run w/o the performance loss,
and one proving the performance loss due to some GL functionality
you mentioned.
Thank you.
Comment 7 Giuseppe Barbieri 2015-07-28 14:37:47 CEST
Created attachment 712 [details]
Program without NV resident buffers
Comment 8 Giuseppe Barbieri 2015-07-28 14:38:51 CEST
Created attachment 713 [details]
Program with NV resident buffers
Comment 9 Giuseppe Barbieri 2015-07-28 14:43:11 CEST
I commented out most of the code; right now there is almost nothing in display() except color/depth clearing, transformations and shader binding/unbinding, with no draw calls at all.

What matters now is only the declaration of the resident NV buffers in the Mesh.update() method.

Program "without" has them commented out, so no resident NV buffer is created. This way I hit more than 5k fps.

If I uncomment them (program "with"), I hit 600 fps, about 10 times less, just from having them declared, nothing else.
Comment 10 Sven Gothel 2015-07-28 17:16:29 CEST
Please provide 'a unit test for JOGL' 
as requested and described in comment 6!

This includes either a git email patch or git pull request,
so I can nicely pull and merge it.

Thank you!
Comment 11 Sven Gothel 2015-07-28 17:18:29 CEST
(In reply to comment #10)
> Please provide 'a unit test for JOGL' 
> as requested and described in comment 6!
> 
> This includes either a git email patch or git pull request,
> so I can nicely pull and merge it.
> 
> Thank you!

ZIP files that require me to reshape them manually
to suit our JOGL unit tests are _not_ acceptable here,
especially since you are starting to become a JogAmp developer
who contributes by editing the wiki, fixing bugs, etc.
Comment 12 Sven Gothel 2015-07-28 17:19:28 CEST
The content of attachment 712 [details] has been deleted by
    Sven Gothel <sgothel@jausoft.com>
who provided the following reason:

Provide unit test via git email-patch or git pull request.

The token used to delete this attachment was generated at 2015-07-28 17:19:02 CEST.
Comment 13 Sven Gothel 2015-07-28 17:19:39 CEST
The content of attachment 713 [details] has been deleted by
    Sven Gothel <sgothel@jausoft.com>
who provided the following reason:

Provide unit test via git email-patch or git pull request.

The token used to delete this attachment was generated at 2015-07-28 17:19:36 CEST.
Comment 14 Sven Gothel 2015-08-19 13:13:44 CEST
Please reopen this bug when a unit test as requested is provided
and the issue is valid and exists.
Comment 15 Giuseppe Barbieri 2016-01-19 14:13:14 CET
Hi Sven,

I created a minimal test case to reproduce the bug.

You can find it here: https://github.com/elect86/joglBug/blob/master/src/bug1167/Bug1167.java

If you uncomment the two **INTERESTING** sections, fps drops from 9k to 400 (GTX 770).


Anyway, I am afraid this test case won't be accepted, will it?

You said to provide it via git email-patch or git pull request. I don't know what you mean by "git email-patch". As for the latter, do you mean a pull request directly against JOGL on GitHub?
Comment 16 Giuseppe Barbieri 2016-01-21 09:42:06 CET
I have probably found the issue, I'll be right back!

SO EXCITED!!!
Comment 17 Julien Gouesse 2016-01-21 09:53:07 CET
(In reply to Giuseppe Barbieri from comment #16)
> I have probably found the issue, I'll be right back!
> 
> SO EXCITED!!!

Can you please be more precise about your findings? What is the culprit?
Comment 18 Giuseppe Barbieri 2016-01-21 10:01:52 CET
Briefly, if you run this code thousands of times

    gl4.glCreateBuffers(1, vertexBuffer, 0);

    // Stick the data for the vertices and indices in their respective buffers
    ByteBuffer verticesBuffer = GLBuffers.newDirectByteBuffer(512);
    gl4.glNamedBufferData(vertexBuffer[0],
            verticesBuffer.capacity(), verticesBuffer.rewind(), GL_STATIC_DRAW);

    gl4.glBindBuffer(GL_ARRAY_BUFFER, vertexBuffer[0]);
    gl4.glGetBufferParameterui64vNV(GL_ARRAY_BUFFER, GL_BUFFER_GPU_ADDRESS_NV, vertexBufferGPUPtr, 0);
    gl4.glGetBufferParameteriv(GL_ARRAY_BUFFER, GL_BUFFER_SIZE, vertexBufferSize, 0);
    gl4.glMakeBufferResidentNV(GL_ARRAY_BUFFER, GL_READ_ONLY);
    gl4.glBindBuffer(GL_ARRAY_BUFFER, 0);

always using the same single array element vertexBuffer[0], you get the slowdown. If you instead expand the array and index it properly, the slowdown is gone.

Anyway, this does *not* reproduce in C or LWJGL, but that is something I'll investigate later.

Now I need to re-enable and re-validate all the other parts of the sample and make sure everything works.
Comment 19 Giuseppe Barbieri 2016-01-22 17:16:51 CET
Some relatively good news and some bad news here.


If I generate the vertex and index buffers one by one (inside the two for loops), I get at most ~600 fps.
I can improve it slightly: if I generate them outside the loops, I get at most ~770 fps.

Code:
https://github.com/elect86/joglBug/blob/master/src/bug1167/jogl.java


But we are still far away from the other platforms.

I built exactly the same program, with the same init(), the same calls and the same render() containing just a clearBufferiv, on LWJGL and in C as well. The former runs at 14k+ fps, the latter at an insane 20k+.

There must be something wrong. I struggled with this bug for more than half a year thinking it was a bug in the new extensions, but it's not.

What do you say, guys? :(
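
To put the figures above on a common scale, frame rates can be converted to time per frame. The sketch below is only an illustration under that assumption; the FrameCost class and its method names are hypothetical. With the numbers quoted in this comment (770 fps for JOGL, 14k fps for LWJGL, 20k fps for C), the JOGL run spends roughly 1.2 ms more per frame than either of the other two. That is a fixed per-frame cost, which the arithmetic alone does not attribute to any particular cause.

```java
/** Hypothetical helper: converts frame rates into per-frame cost. */
final class FrameCost {
    /** Milliseconds spent per frame at the given frame rate. */
    static double millisPerFrame(double fps) {
        return 1000.0 / fps;
    }

    /** Extra milliseconds per frame of the slower run compared with the faster run. */
    static double extraMillisPerFrame(double slowFps, double fastFps) {
        return millisPerFrame(slowFps) - millisPerFrame(fastFps);
    }

    public static void main(String[] args) {
        // Figures as reported in this comment; all are approximate.
        double jogl = 770.0, lwjgl = 14000.0, c = 20000.0;
        System.out.printf("JOGL:  %.3f ms/frame%n", millisPerFrame(jogl));
        System.out.printf("LWJGL: %.3f ms/frame%n", millisPerFrame(lwjgl));
        System.out.printf("C:     %.3f ms/frame%n", millisPerFrame(c));
        System.out.printf("JOGL overhead vs LWJGL: %.3f ms/frame%n",
                extraMillisPerFrame(jogl, lwjgl));
        System.out.printf("JOGL overhead vs C:     %.3f ms/frame%n",
                extraMillisPerFrame(jogl, c));
    }
}
```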
Comment 20 Sven Gothel 2018-01-15 07:34:45 CET
I glanced over the code (the native C code is n/a, though);
the only semantic difference seems to be that your JOGL code
uses GL context switches.
Therefore the NV GL driver penalizes the switch, which is known behavior for this vendor.

Our 'setExclusiveContextThread(..)' could remedy the situation.

If anybody is willing to test this solution,
please report back, and if it does not work, re-open this issue.