<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://jogamp.org/bugzilla/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.2"
          urlbase="https://jogamp.org/bugzilla/"
          
          maintainer="sgothel@jausoft.com"
>

    <bug>
          <bug_id>1167</bug_id>
          
          <creation_ts>2015-06-29 16:54:36 +0200</creation_ts>
          <short_desc>Heavy performance issue</short_desc>
          <delta_ts>2019-03-27 04:16:23 +0100</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>3</classification_id>
          <classification>JogAmp</classification>
          <product>Jogl</product>
          <component>core</component>
          <version>2.4.0</version>
          <rep_platform>pc_x86_64</rep_platform>
          <op_sys>all</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>WORKSFORME</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P5</priority>
          <bug_severity>minor</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Giuseppe Barbieri">elect86</reporter>
          <assigned_to name="Sven Gothel">sgothel</assigned_to>
          <cc>gouessej</cc>
    
    <cc>sgothel</cc>
          
          <cf_type>DEFECT</cf_type>
          <cf_scm_refs></cf_scm_refs>
          <cf_workaround>---</cf_workaround>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>4740</commentid>
    <comment_count>0</comment_count>
    <who name="Giuseppe Barbieri">elect86</who>
    <bug_when>2015-06-29 16:54:36 +0200</bug_when>
<thetext>I ported a simple example from the NVIDIA GL samples that shows how to use bindless VBOs, uniforms, and textures.

I am seeing a heavy performance problem: my code runs at 175 fps versus 450 fps for the C code.

I added timers to measure how long my display() takes, and that looks fine, so the problem is somewhere else.

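A minimal sketch of the kind of timer meant here, assuming plain System.nanoTime() measurements around the body of display(). FrameTimer and its methods are hypothetical names for illustration, not code from the repository linked below. A timer like this only covers CPU time spent inside the timed region, so work done outside display() (for example buffer swaps or making the GL context current and releasing it) would not show up in it.

```java
/** Hypothetical helper: accumulates per-frame durations and derives an average. */
final class FrameTimer {
    private long totalNanos; // sum of all recorded frame durations
    private int frames;      // number of recorded frames

    /** Records one frame that took the given number of nanoseconds. */
    void record(long frameNanos) {
        totalNanos += frameNanos;
        frames++;
    }

    /** Average time per frame in milliseconds, or 0 if nothing was recorded. */
    double averageMillis() {
        return frames == 0 ? 0.0 : totalNanos / 1.0e6 / frames;
    }

    /** Frames per second implied by the recorded durations, or 0 if none. */
    double fps() {
        return totalNanos == 0 ? 0.0 : frames * 1.0e9 / totalNanos;
    }

    public static void main(String[] args) {
        FrameTimer timer = new FrameTimer();
        for (int i = 0; i != 100; i++) {
            long start = System.nanoTime();
            Math.sqrt(i); // placeholder for the body of display()
            timer.record(System.nanoTime() - start);
        }
        System.out.println(timer.averageMillis() + " ms/frame, " + timer.fps() + " fps");
    }
}
```

If the fps derived this way is far higher than the fps actually observed on screen, the time is being lost outside the measured region.
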
You can clone from here

https://github.com/elect86/NvGlSamples</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>4741</commentid>
    <comment_count>1</comment_count>
    <who name="Giuseppe Barbieri">elect86</who>
    <bug_when>2015-06-29 16:55:49 +0200</bug_when>
    <thetext>http://forum.jogamp.org/Bindless-vertex-array-tp4034343p4034456.html</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>4770</commentid>
    <comment_count>2</comment_count>
    <who name="Julien Gouesse">gouessej</who>
    <bug_when>2015-07-09 11:51:03 +0200</bug_when>
<thetext>This is not JOGL related: nothing shows that this performance problem comes from JOGL itself, and you confirmed it:
http://jogamp.org/log/irc/jogamp_20150709050624.html#l70

The problem might come from NvInputTransformer or another class in your own code. Please use a profiler instead. I&apos;m going to close this bug report.

Reopen it if and only if a profiler shows excessive GPU or CPU time consumption in JOGL.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>4777</commentid>
    <comment_count>3</comment_count>
    <who name="Julien Gouesse">gouessej</who>
    <bug_when>2015-07-13 16:48:27 +0200</bug_when>
<thetext>Giuseppe, where can we find these NVIDIA GL samples? The problem is that your code involves lots of classes, so it is difficult to tell what it implies.

It seems to be a problem in JOGL when the buffers are resident on the GPU:
http://forum.jogamp.org/Bindless-vertex-array-tp4034343p4034862.html</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>4781</commentid>
    <comment_count>4</comment_count>
      <attachid>706</attachid>
    <who name="Giuseppe Barbieri">elect86</who>
    <bug_when>2015-07-16 21:24:53 +0200</bug_when>
    <thetext>Created attachment 706
test.log</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>4782</commentid>
    <comment_count>5</comment_count>
    <who name="Giuseppe Barbieri">elect86</who>
    <bug_when>2015-07-16 21:32:26 +0200</bug_when>
<thetext>Comment out everything in display(). Also comment out the part of Mesh.update that creates the resident VBO and IBO buffers.

Run it: performance looks fine, I hit 5k fps.

Now uncomment (re-enable) the resident VBO/IBO creation and you will see how much performance suffers.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>4837</commentid>
    <comment_count>6</comment_count>
    <who name="Sven Gothel">sgothel</who>
    <bug_when>2015-07-28 14:20:37 +0200</bug_when>
    <thetext>Please provide a unit test for JOGL for this case,
so we can validate this.
Best case: One unit test run w/o the performance loss,
and one proving the performance loss due to some GL functionality
you mentioned.
Thank you.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>4839</commentid>
    <comment_count>7</comment_count>
      <attachid>712</attachid>
    <who name="Giuseppe Barbieri">elect86</who>
    <bug_when>2015-07-28 14:37:47 +0200</bug_when>
    <thetext>Created attachment 712
Program without NV resident buffers</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>4840</commentid>
    <comment_count>8</comment_count>
      <attachid>713</attachid>
    <who name="Giuseppe Barbieri">elect86</who>
    <bug_when>2015-07-28 14:38:51 +0200</bug_when>
    <thetext>Created attachment 713
Program with NV resident buffers</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>4842</commentid>
    <comment_count>9</comment_count>
    <who name="Giuseppe Barbieri">elect86</who>
    <bug_when>2015-07-28 14:43:11 +0200</bug_when>
<thetext>I commented out most of the code. Right now there is almost nothing in display() except color/depth clearing, transformations, and shader binding/unbinding, with no draw calls at all.

All that matters now is the declaration of the resident NV buffers in the Mesh.update() method.

Program &quot;without&quot; has them commented out, so no resident NV buffer is created. That way I hit more than 5k fps.

If I uncomment them (program &quot;with&quot;), I hit 600 fps, almost 10 times less, just from having them declared, nothing else.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>4845</commentid>
    <comment_count>10</comment_count>
    <who name="Sven Gothel">sgothel</who>
    <bug_when>2015-07-28 17:16:29 +0200</bug_when>
    <thetext>Please provide &apos;a unit test for JOGL&apos; 
as requested and described in comment 6!

This includes either a git email patch or git pull request,
so I can nicely pull and merge it.

Thank you!</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>4846</commentid>
    <comment_count>11</comment_count>
    <who name="Sven Gothel">sgothel</who>
    <bug_when>2015-07-28 17:18:29 +0200</bug_when>
    <thetext>(In reply to comment #10)
&gt; Please provide &apos;a unit test for JOGL&apos; 
&gt; as requested and described in comment 6!
&gt; 
&gt; This includes either a git email patch or git pull request,
&gt; so I can nicely pull and merge it.
&gt; 
&gt; Thank you!

ZIP files that require manual effort from me to reshape them
to suit our JOGL unit tests are _not_ acceptable here,
especially since you are starting to become a JogAmp developer
who contributes by editing the wiki, fixing bugs, etc.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>4847</commentid>
    <comment_count>12</comment_count>
    <who name="Sven Gothel">sgothel</who>
    <bug_when>2015-07-28 17:19:28 +0200</bug_when>
    <thetext>The content of attachment 712 has been deleted by
    Sven Gothel &lt;sgothel@jausoft.com&gt;
who provided the following reason:

Provide unit test via git email-patch or git pull request.

The token used to delete this attachment was generated at 2015-07-28 17:19:02 CEST.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>4848</commentid>
    <comment_count>13</comment_count>
    <who name="Sven Gothel">sgothel</who>
    <bug_when>2015-07-28 17:19:39 +0200</bug_when>
    <thetext>The content of attachment 713 has been deleted by
    Sven Gothel &lt;sgothel@jausoft.com&gt;
who provided the following reason:

Provide unit test via git email-patch or git pull request.

The token used to delete this attachment was generated at 2015-07-28 17:19:36 CEST.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>4971</commentid>
    <comment_count>14</comment_count>
    <who name="Sven Gothel">sgothel</who>
    <bug_when>2015-08-19 13:13:44 +0200</bug_when>
    <thetext>Please reopen this bug when a unit test as requested is provided
and the issue is valid and exists.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>5515</commentid>
    <comment_count>15</comment_count>
    <who name="Giuseppe Barbieri">elect86</who>
    <bug_when>2016-01-19 14:13:14 +0100</bug_when>
    <thetext>Hi Sven,

I created a minimal test case to reproduce the bug.

You can find it here: https://github.com/elect86/joglBug/blob/master/src/bug1167/Bug1167.java

If you uncomment (enable) the two **INTERESTING** sections, the frame rate drops from 9k to 400 fps (GTX 770).


Anyway I am afraid this test case won&apos;t be accepted, will it?

You said to provide it via git email-patch or git pull request. I don&apos;t know what you mean by &quot;git email-patch&quot;. As for the latter, do you mean a pull request directly against the JOGL repository on GitHub?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>5517</commentid>
    <comment_count>16</comment_count>
    <who name="Giuseppe Barbieri">elect86</who>
    <bug_when>2016-01-21 09:42:06 +0100</bug_when>
<thetext>Probably I found the issue, I&apos;ll be right back!

SO EXCITED!!!</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>5518</commentid>
    <comment_count>17</comment_count>
    <who name="Julien Gouesse">gouessej</who>
    <bug_when>2016-01-21 09:53:07 +0100</bug_when>
    <thetext>(In reply to Giuseppe Barbieri from comment #16)
&gt; Probably I found the issue, I&apos;ll be right back!
&gt; 
&gt; SO EXCITED!!!

Could you please be more precise about your findings? What is the culprit?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>5519</commentid>
    <comment_count>18</comment_count>
    <who name="Giuseppe Barbieri">elect86</who>
    <bug_when>2016-01-21 10:01:52 +0100</bug_when>
<thetext>Briefly, if you run this code thousands of times

    // Create the buffer object and fill it with (empty) vertex data
    gl4.glCreateBuffers(1, vertexBuffer, 0);
    ByteBuffer verticesBuffer = GLBuffers.newDirectByteBuffer(512);
    gl4.glNamedBufferData(vertexBuffer[0],
            verticesBuffer.capacity(), verticesBuffer.rewind(), GL_STATIC_DRAW);

    // Query the GPU address and size, then make the buffer resident
    gl4.glBindBuffer(GL_ARRAY_BUFFER, vertexBuffer[0]);
    gl4.glGetBufferParameterui64vNV(GL_ARRAY_BUFFER, GL_BUFFER_GPU_ADDRESS_NV, vertexBufferGPUPtr, 0);
    gl4.glGetBufferParameteriv(GL_ARRAY_BUFFER, GL_BUFFER_SIZE, vertexBufferSize, 0);
    gl4.glMakeBufferResidentNV(GL_ARRAY_BUFFER, GL_READ_ONLY);
    gl4.glBindBuffer(GL_ARRAY_BUFFER, 0);

always using the same single array slot vertexBuffer[0], you get the slowdown. If you instead expand the array and index it properly, the slowdown is gone.

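To make the two variants concrete, here is a rough sketch that involves no GL at all. createBuffers is a hypothetical stand-in that only mimics the (count, array, offset) shape of the gl4.glCreateBuffers call above, and BufferNames, reuseSingleSlot and onePerMesh are made-up names. It shows only the bookkeeping difference (one reused slot versus one slot per mesh); it is an assumption that this matches the change described, and it does not by itself explain the slowdown.

```java
/** Hypothetical illustration of buffer-name bookkeeping; no GL calls are made. */
final class BufferNames {
    private static int nextName = 1; // stand-in for names a driver would hand out

    /** Stand-in with the same (n, array, offset) shape as glCreateBuffers. */
    static void createBuffers(int n, int[] names, int offset) {
        for (int i = 0; i != n; i++) {
            names[offset + i] = nextName++;
        }
    }

    /** Variant A: every iteration overwrites slot 0, so earlier names are lost. */
    static int[] reuseSingleSlot(int meshCount) {
        int[] vertexBuffer = new int[1];
        for (int m = 0; m != meshCount; m++) {
            createBuffers(1, vertexBuffer, 0);
        }
        return vertexBuffer;
    }

    /** Variant B: the array is expanded to one slot per mesh and indexed by m. */
    static int[] onePerMesh(int meshCount) {
        int[] vertexBuffer = new int[meshCount];
        for (int m = 0; m != meshCount; m++) {
            createBuffers(1, vertexBuffer, m);
        }
        return vertexBuffer;
    }

    public static void main(String[] args) {
        System.out.println("names kept when reusing one slot: " + reuseSingleSlot(4).length);
        System.out.println("names kept with one slot per mesh: " + onePerMesh(4).length);
    }
}
```

In variant A only the last created name is still reachable from the array, while variant B keeps every name so each buffer can be bound or deleted later.
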
Anyway, this does *not* reproduce on C or LWJGL, but that is something I&apos;ll investigate later.

Now I need to re-enable and re-validate all the other parts of the sample and make sure everything works.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>5520</commentid>
    <comment_count>19</comment_count>
    <who name="Giuseppe Barbieri">elect86</who>
    <bug_when>2016-01-22 17:16:51 +0100</bug_when>
<thetext>Relatively good news and bad news here.


If I generate the vertex and index buffers one by one (inside the two for loops), I get at most ~600 fps.
If I generate them outside the loops instead, it improves slightly, to at most ~770 fps.

Code:
https://github.com/elect86/joglBug/blob/master/src/bug1167/jogl.java


But we are still far away from the other platforms.

I built exactly the same program, with the same init(), the same calls, and the same render() containing just a clearBufferiv, on LWJGL and in C as well. The former runs at 14k+ fps, the latter at an insane 20k+.

There must be something wrong. I have struggled with this bug for more than half a year, thinking it was a bug in the new extensions, but it is not.

What do you think, guys? :(</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>6065</commentid>
    <comment_count>20</comment_count>
    <who name="Sven Gothel">sgothel</who>
    <bug_when>2018-01-15 07:34:45 +0100</bug_when>
<thetext>I glanced over the code, though the native C code is not available.
The only semantic difference seems to be that your JOGL code
uses GL context switches.
The NV GL driver therefore penalizes the switch, which is known behavior for this vendor.

Our &apos;setExclusiveContextThread(..)&apos; could remedy the situation.

If anybody is willing to test this solution,
please report back, and if it does not work, re-open this issue.</thetext>
  </long_desc>
      
          <attachment
              isobsolete="0"
              ispatch="0"
              isprivate="0"
          >
            <attachid>706</attachid>
            <date>2015-07-16 21:24:53 +0200</date>
            <delta_ts>2015-07-16 21:24:53 +0200</delta_ts>
            <desc>test.log</desc>
            <filename>test.log</filename>
            <type>text/x-log</type>
            <size>292</size>
            <attacher name="Giuseppe Barbieri">elect86</attacher>
            
              <data encoding="base64">L3Vzci9iaW4vamF2YQpqYXZhIHZlcnNpb24gIjEuOC4wXzQ1IgpKYXZhKFRNKSBTRSBSdW50aW1l
IEVudmlyb25tZW50IChidWlsZCAxLjguMF80NS1iMTQpCkphdmEgSG90U3BvdChUTSkgNjQtQml0
IFNlcnZlciBWTSAoYnVpbGQgMjUuNDUtYjAyLCBtaXhlZCBtb2RlKQpMSUJYQ0JfQUxMT1dfU0xP
UFBZX0xPQ0s6CkxJQkdMX0RSSVZFUlNfUEFUSDoKTElCR0xfREVCVUc6CmphdmEKRXJyb3I6IENv
dWxkIG5vdCBmaW5kIG9yIGxvYWQgbWFpbiBjbGFzcyBjb20uam9nYW1wLm5ld3Qub3BlbmdsLkdM
V2luZG93Cg==
</data>

          </attachment>
          <attachment
              isobsolete="1"
              ispatch="0"
              isprivate="0"
          >
            <attachid>712</attachid>
            <date>2015-07-28 14:37:47 +0200</date>
            <delta_ts>2015-07-28 14:37:47 +0200</delta_ts>
            <desc>Program without NV resident buffers</desc>
            <filename>without.zip</filename>
            <type>text/plain</type>
            <size>0</size>
            <attacher name="Giuseppe Barbieri">elect86</attacher>
            
              <data encoding="base64"></data>

          </attachment>
          <attachment
              isobsolete="1"
              ispatch="0"
              isprivate="0"
          >
            <attachid>713</attachid>
            <date>2015-07-28 14:38:51 +0200</date>
            <delta_ts>2015-07-28 14:38:51 +0200</delta_ts>
            <desc>Program with NV resident buffers</desc>
            <filename>with.zip</filename>
            <type>text/plain</type>
            <size>0</size>
            <attacher name="Giuseppe Barbieri">elect86</attacher>
            
              <data encoding="base64"></data>

          </attachment>
      

    </bug>

</bugzilla>