#jogamp @ irc.freenode.net - 20150217 05:05:32 (UTC)


20150217 05:05:32 -jogamp- Previous @ http://jogamp.org/log/irc/jogamp_20150216050532.html
20150217 05:05:32 -jogamp- This channel is logged @ http://jogamp.org/log/irc/jogamp_20150217050532.html
20150217 07:23:53 * monsieur_max (~maxime@anon) has joined #jogamp
20150217 07:36:45 * monsieur_max (~maxime@anon) Quit (Read error: Connection reset by peer)
20150217 08:19:09 * monsieur_max (~maxime@anon) has joined #jogamp
20150217 08:34:07 * eclesia (~husky@anon) has joined #jogamp
20150217 08:34:12 <eclesia> good morning
20150217 09:49:04 * doev (~doev@anon) has joined #jogamp
20150217 16:35:37 * doev (~doev@anon) Quit (Ping timeout: 256 seconds)
20150217 17:04:07 * eclesia (~husky@anon) has left #jogamp
20150217 18:05:36 * monsieur_max (~maxime@anon) Quit (Quit: Leaving.)
20150217 18:39:48 * monsieur_max (~maxime@anon) has joined #jogamp
20150217 19:02:41 * zzuegg (~zzuegg___@anon) Quit (Ping timeout: 244 seconds)
20150217 19:03:21 * zzuegg (~zzuegg___@anon) has joined #jogamp
20150217 21:28:52 * phao (~phao@anon) has joined #jogamp
20150217 21:28:54 <phao> Hi...
20150217 21:29:14 <phao> When writing java... what would someone do to write code that takes advantage of CPU caches?
20150217 21:30:25 <sgothel> much the same as w/ C++, i.e. small'ish methods that can be inlined ([static] final), local final references for often used vars ..
20150217 21:37:54 <phao> I just got some funny numbers...
20150217 21:37:57 <phao> check this out...
20150217 21:38:17 <phao> http://pastie.org/private/c7kiuwuafhchn29mpoew
20150217 21:38:24 <phao> Idk how meaningful that is...
20150217 21:38:30 <phao> (but wait a moment, there is more to paste)
20150217 21:38:45 <phao> that was the java code
20150217 21:38:48 <phao> this is the c++ code => http://pastie.org/private/uejm8wkvkwcwbtkhkia3iw
20150217 21:39:15 <phao> This is the output of the java program => http://pastie.org/private/b6evoa6ioj7nztzh3vsw
20150217 21:39:46 <phao> For the C++ program http://pastie.org/private/5uc8axnpvkpvg3j9nianxa
20150217 21:40:16 <phao> Which is kind of funny, right? Or isn't it?
20150217 21:40:41 <phao> There are more runs of the java program to let the jit-ish stuff kick in... for C++ I don't think I need that.
20150217 21:41:05 <phao> But anyway... the java program is about 3-4 times faster by the last runs.
20150217 21:41:48 <phao> Idk how to make an array-of-structs'ish version in java... my AoS in C++ has about the same performance characteristics
20150217 21:43:02 <phao> Any clues on why the java version is faster?
20150217 21:43:55 <phao> Idk if it should be or not... I can only guess why (my guess would be on the grounds of *maybe* some conservative aliasing considerations by GCC/g++)?
20150217 21:45:12 <sgothel> let's say a HotSpot [J]VM w/ runtime optimization does exactly that .. optimizes code at runtime (the hotspots)
20150217 21:45:22 <phao> =)
20150217 21:45:25 <phao> Ok...
20150217 21:45:32 <sgothel> hence virtual function pointers are directly replaced .. etc
20150217 21:45:39 <sgothel> inlines .. etc
20150217 21:46:08 <sgothel> be aware of Object overhead .. hence I use simple plain float[] within our math utils
20150217 21:46:40 <sgothel> lots of 'flying objects', i.e. temp. create/destroy, could congest the GC
20150217 21:47:56 <sgothel> on the other hand, all 'local vars' w/o allocation (read: new Object(..)) .. shall be within the most local inner block allowing for better optimization .. and final, if possible
20150217 21:48:23 <sgothel> local vars == automatic vars -> stack
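
The advice above (plain float[] instead of objects, no temporary allocations, final locals in the innermost block) could look roughly like the following. This is a minimal sketch under assumptions: the class name VecMath and the method cross are hypothetical illustrations, not code from the JogAmp math utils.

```java
// Hypothetical helper illustrating allocation-free vector math on plain float[].
final class VecMath {
    private VecMath() {}

    /**
     * Writes the cross product a x b into the caller-supplied result array
     * and returns it, so no temporary object is created per call.
     */
    static float[] cross(final float[] result, final float[] a, final float[] b) {
        // Copy the inputs into final locals first: they live on the stack
        // (or in registers), and result may safely alias a or b.
        final float ax = a[0], ay = a[1], az = a[2];
        final float bx = b[0], by = b[1], bz = b[2];
        result[0] = ay * bz - az * by;
        result[1] = az * bx - ax * bz;
        result[2] = ax * by - ay * bx;
        return result;
    }

    public static void main(final String[] args) {
        final float[] out = new float[3]; // allocated once, reused for every call
        cross(out, new float[] { 1f, 0f, 0f }, new float[] { 0f, 1f, 0f });
        System.out.println(out[0] + " " + out[1] + " " + out[2]);
    }
}
```

Passing the result array in keeps the hot path free of new, which is the point made about 'flying objects' and the GC.
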
20150217 21:49:41 <phao> Right...
20150217 21:49:56 <phao> But the answer turned out to be that g++ wasn't generating simd instructions
20150217 21:50:15 <phao> clang does, and the C++ code with clang runs about 2x faster than the java code
20150217 21:50:26 <phao> secs => 0.0147673
20150217 21:51:21 <sgothel> hehe .. sure
20150217 21:51:36 <phao> hehehe
20150217 21:51:42 <sgothel> i.e. llvm offers more machine magic
20150217 21:51:47 <sgothel> so use OpenCL :)
20150217 21:51:51 <phao> maybe GCC was being conservative about aliasing stuff
20150217 21:52:20 <phao> afaik aliasing issues can prevent optimizations related to using simd instructions... idk though =(
20150217 21:52:41 <phao> sgothel, Btw... in your computer graphics programming
20150217 21:52:46 <sgothel> gcc has 'runtime optimization' sort of .. statically .. AFAIK
20150217 21:52:51 <phao> do you do that struct-of-arrays like that?
20150217 21:53:07 <sgothel> some options .. w/ profiling .. hmm
20150217 21:53:17 <sgothel> struct-of-arrays like what ? sorry
20150217 21:53:41 <phao> Yeah... like that Vector3fArray class.
20150217 21:53:49 <phao> http://pastie.org/private/c7kiuwuafhchn29mpoew
20150217 21:54:29 <phao> instead of having an array of vector3f'ish types (e.g. class Vector3f { float x, y, z; }, ... Vector3f[] vecs = new Vector3f[size];)
20150217 21:54:32 <sgothel> didn't look, sorry. well - just simple float[] for now, and semantic de-/encoding in a 'manager class'
20150217 21:55:01 <phao> you'd have 3 arrays of floats, one for the xs, another for the ys, another for the zs.
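
The two layouts being contrasted here can be sketched in Java as follows. The names Vector3f and Vector3fArray come from the chat, but the bodies below are assumptions for illustration and not the code behind the pastie links; LayoutDemo and its methods are hypothetical.

```java
// Array-of-structs element: every Vector3f is a separate heap object,
// so a Vector3f[] is really an array of references.
final class Vector3f {
    float x, y, z;

    Vector3f(final float x, final float y, final float z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }
}

// Struct-of-arrays: one primitive array per component, each a single
// contiguous block of floats.
final class Vector3fArray {
    final float[] xs, ys, zs;

    Vector3fArray(final int size) {
        xs = new float[size];
        ys = new float[size];
        zs = new float[size];
    }
}

final class LayoutDemo {
    // Sum of squared lengths, array-of-structs version.
    static float sumSquaredAoS(final Vector3f[] vecs) {
        float sum = 0f;
        for (final Vector3f v : vecs) {
            sum += v.x * v.x + v.y * v.y + v.z * v.z;
        }
        return sum;
    }

    // Same computation, struct-of-arrays version.
    static float sumSquaredSoA(final Vector3fArray vecs) {
        final float[] xs = vecs.xs, ys = vecs.ys, zs = vecs.zs; // local final references
        float sum = 0f;
        for (int i = 0; i < xs.length; i++) {
            sum += xs[i] * xs[i] + ys[i] * ys[i] + zs[i] * zs[i];
        }
        return sum;
    }

    public static void main(final String[] args) {
        final int n = 4;
        final Vector3f[] aos = new Vector3f[n];
        final Vector3fArray soa = new Vector3fArray(n);
        for (int i = 0; i < n; i++) {
            aos[i] = new Vector3f(1f, 2f, 3f);
            soa.xs[i] = 1f;
            soa.ys[i] = 2f;
            soa.zs[i] = 3f;
        }
        System.out.println(sumSquaredAoS(aos) + " vs " + sumSquaredSoA(soa));
    }
}
```

Both methods compute the same value; only the memory layout differs, which is what a timing comparison like the one discussed here would measure.
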
20150217 21:55:02 <sgothel> we have a Vec class for graph .. but only for the math heavy calculations ..
20150217 21:55:21 <sgothel> interleave 'em .. etc
20150217 21:56:39 <sgothel> problem w/ high-level CPU Object -> GPU float stream sync .. if it can be avoided .. avoid it, sure.
20150217 21:56:57 <phao> hmm...
20150217 21:57:03 <phao> I wish I understood more about performance.
20150217 21:57:05 <sgothel> always specific to problem .. ofc .. no generic answer is possible
20150217 21:57:12 <phao> Right...
20150217 21:57:22 <phao> performance solutions are very context/problem specific.
20150217 21:57:52 <sgothel> i.e. you have a 10k float VBO .. compiled from 10k/3 Vec3 objects .. that would be heavy duty on the CPU :) throw-away objects .. copy copy copy
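
The 'simple float[] plus manager class' idea, with interleaved components, might be sketched like this. It is an assumption-laden illustration: InterleavedVec3 and all of its methods are hypothetical names, and no JOGL API is used.

```java
import java.nio.FloatBuffer;

// Hypothetical 'manager class': the data lives in one interleaved float[]
// (x0 y0 z0 x1 y1 z1 ...), and the class only encodes/decodes the semantics.
final class InterleavedVec3 {
    private static final int COMPONENTS = 3;
    private final float[] data;

    InterleavedVec3(final int count) {
        data = new float[count * COMPONENTS];
    }

    int count() {
        return data.length / COMPONENTS;
    }

    void set(final int i, final float x, final float y, final float z) {
        final int o = i * COMPONENTS;
        data[o] = x;
        data[o + 1] = y;
        data[o + 2] = z;
    }

    float x(final int i) { return data[i * COMPONENTS]; }
    float y(final int i) { return data[i * COMPONENTS + 1]; }
    float z(final int i) { return data[i * COMPONENTS + 2]; }

    // Scales every vector in place without creating per-vector objects.
    void scale(final float s) {
        for (int i = 0; i < data.length; i++) {
            data[i] *= s;
        }
    }

    // A buffer view sharing the same array: no Vec3 objects are created
    // and nothing is copied when the stream is handed on.
    FloatBuffer asBuffer() {
        return FloatBuffer.wrap(data);
    }

    public static void main(final String[] args) {
        final InterleavedVec3 v = new InterleavedVec3(2);
        v.set(0, 1f, 2f, 3f);
        v.set(1, 4f, 5f, 6f);
        v.scale(2f);
        System.out.println(v.asBuffer().remaining() + " floats in the stream");
    }
}
```

Because the storage already has the shape of the float stream, there is no per-frame conversion from throw-away objects, which is the 'copy copy copy' cost described above.
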
20150217 22:02:29 <phao> sgothel, the whole array-of-structs vs. struct-of-arrays difference is impressive, actually
20150217 22:02:33 <phao> for example...
20150217 22:02:48 <phao> http://pastie.org/private/pkzvvihgbpt7dc0xtbxlzw
20150217 22:02:55 <phao> for example, here is a version of the c++ code
20150217 22:03:03 <phao> but using arrays of structures.
20150217 22:03:19 <phao> the previous one, http://pastie.org/private/uejm8wkvkwcwbtkhkia3iw , uses a struct with arrays
20150217 22:03:28 <phao> clang can make the struct-of-arrays 5x faster
20150217 22:03:33 <phao> than the array-of-structs one.
20150217 22:03:41 <phao> Afaik it's because of the use of simd instructions.
20150217 22:03:53 <phao> secs => 0.0505954
20150217 22:03:53 <phao> (for array of structs)
20150217 22:04:01 <phao> secs => 0.0167742 (for structs of arrays)
20150217 22:04:17 <phao> not really 5x faster, however hehehe
20150217 22:04:42 <phao> about 3x
20150217 22:05:19 <phao> it varies though... it got to 4.4x faster
20150217 22:05:35 <phao> there are some articles published by intel people recommending using structs of arrays in performance intensive code.
20150217 22:05:46 <phao> I mean, the struct of arrays java version beats the array of structs C++ version
20150217 22:05:55 <phao> both compiled/optimized by clang/llvm and gcc/g++
20150217 22:06:30 <sgothel> nice nice, make a blog post w/ your findings maybe ?
20150217 22:06:51 <phao> hehehehe, I'm just mumbling in here. I don't understand much about this
20150217 22:07:10 <phao> I guess I'm taking lots of your time. Sorry.
20150217 22:25:25 * monsieur_max (~maxime@anon) Quit (Quit: Leaving.)
20150217 22:27:57 <phao> btw sgothel
20150217 22:28:19 <phao> do you know of some open source java code that you know was written to take advantage of CPU caches?
20150217 22:29:21 <sgothel> nope, sorry
20150217 22:29:49 <sgothel> just memory utils, i.e. cache things in RAM .. etc
20150217 22:30:26 <phao> Ok...
20150217 22:30:53 <sgothel> you could use GlueGen for struct mapping .. streaming
20150217 22:31:05 <phao> Ok.
20150217 22:31:10 <sgothel> data .. java<->c .. and do your simd processing there .. or use opencl .. etc etc :)
20150217 22:31:22 <phao> Ok.
20150217 22:31:31 <phao> I'll check it out.
20150217 22:32:27 <phao> Doesn't it bother you that Java doesn't really let you control memory layout "kind of well"? The Go language is also GC'ed, but it lets you have arrays of record types laid out contiguously in memory...
20150217 22:32:51 <sgothel> oh .. you can do it w/ JVM as well
20150217 22:32:56 <phao> Can you?
20150217 22:33:07 <phao> Any documents I can read on that?
20150217 22:33:08 <sgothel> use NIO .. be wise and conservative w/ allocation etc
20150217 22:33:31 <phao> Ahhhh Like using the ByteBuffer class?
20150217 22:33:41 <phao> And laying out floats, doubles, longs, ints, ... on top of a byte array?
20150217 22:33:44 <sgothel> or use simple arrays .. w/o a guarantee ofc, however JVM should do it 'wise'
20150217 22:33:49 <sgothel> yup
20150217 22:34:00 <phao> Ok... I'll check that out. Thanks.
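
Laying out records contiguously on top of an NIO ByteBuffer, as discussed above, could look like the sketch below. The record shape (three floats plus an int id), the class name PackedRecords and the offsets are assumptions chosen for illustration; only standard java.nio calls are used, and this is not GlueGen's generated code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical fixed-size record { float x, y, z; int id; } stored back to
// back in one direct ByteBuffer, i.e. a contiguous 'array of structs'.
final class PackedRecords {
    private static final int OFF_X = 0, OFF_Y = 4, OFF_Z = 8, OFF_ID = 12;
    static final int STRIDE = 16; // bytes per record

    private final ByteBuffer buf;

    PackedRecords(final int count) {
        // One allocation for all records, in the platform's native byte order.
        buf = ByteBuffer.allocateDirect(count * STRIDE).order(ByteOrder.nativeOrder());
    }

    int sizeInBytes() {
        return buf.capacity();
    }

    void set(final int i, final float x, final float y, final float z, final int id) {
        final int base = i * STRIDE;
        // Absolute puts: they do not move the buffer's position.
        buf.putFloat(base + OFF_X, x);
        buf.putFloat(base + OFF_Y, y);
        buf.putFloat(base + OFF_Z, z);
        buf.putInt(base + OFF_ID, id);
    }

    float x(final int i) { return buf.getFloat(i * STRIDE + OFF_X); }
    float y(final int i) { return buf.getFloat(i * STRIDE + OFF_Y); }
    float z(final int i) { return buf.getFloat(i * STRIDE + OFF_Z); }
    int id(final int i) { return buf.getInt(i * STRIDE + OFF_ID); }

    public static void main(final String[] args) {
        final PackedRecords recs = new PackedRecords(3);
        recs.set(1, 1.5f, 2.5f, 3.5f, 42);
        System.out.println(recs.id(1) + " @ " + recs.x(1) + "," + recs.y(1) + "," + recs.z(1));
    }
}
```

Allocating once and reusing the buffer follows the 'be wise and conservative w/ allocation' advice, since direct buffers are comparatively expensive to create.
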
20150217 22:34:06 <sgothel> i.e. GlueGen has a file mmap for it .. etc
20150217 22:34:14 <phao> I see.
20150217 22:34:33 <phao> Is that the 'technique' behind that struct mapping you told me GlueGen does?
20150217 22:35:02 <sgothel> generating java accessors for native structs .. yes
20150217 23:35:00 * phao (~phao@anon) has left #jogamp
20150218 01:16:18 <sgothel> if anybody gets the 'glNext' spec .. please post or PM .. thx :)
20150218 05:05:32 -jogamp- Continue @ http://jogamp.org/log/irc/jogamp_20150218050532.html