The local work size should be the minimum of two limits: the device's general limit, and the limit for the specific kernel being run. Currently it's set to the hardware max, which fails on CPU OpenCL devices on Mac. To fix, replace this:

    int groupSize = queue2.getDevice().getMaxWorkItemSizes()[0];

with this:

    int maxWorkItemSize = queue2.getDevice().getMaxWorkItemSizes()[0];
    int kernelWorkGroupSize = (int)vectorAddKernel2.getWorkGroupSize( queue2.getDevice() );
    int groupSize = Math.min( maxWorkItemSize, kernelWorkGroupSize );

I'll do some more testing and submit a patch for this.
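For anyone who wants to check the arithmetic without an OpenCL device, here is a minimal sketch of the same min() rule in plain Java. The class name WorkSizeSketch, both helper methods, and the numbers 1024, 128 and 1000 are made up for illustration and are not taken from the JOCL sources. The round-up step assumes the global work size has to be a multiple of the local work size, as OpenCL 1.x requires when a local size is given explicitly.

```java
public class WorkSizeSketch {

    /** Largest local work size accepted by both the device and the kernel. */
    static int localWorkSize(int maxWorkItemSize, int kernelWorkGroupSize) {
        return Math.min(maxWorkItemSize, kernelWorkGroupSize);
    }

    /** Rounds globalSize up to the next multiple of groupSize. */
    static int roundUp(int groupSize, int globalSize) {
        int remainder = globalSize % groupSize;
        return remainder == 0 ? globalSize : globalSize + groupSize - remainder;
    }

    public static void main(String[] args) {
        // Illustrative values only: device limit 1024, kernel limit 128.
        int groupSize = localWorkSize(1024, 128);
        // Pad a 1000-element problem so it divides evenly into work groups.
        int globalSize = roundUp(groupSize, 1000);
        System.out.println(groupSize + " " + globalSize); // prints "128 1024"
    }
}
```

If the kernel limit comes back smaller than the device limit, groupSize follows the kernel limit, which is the case the original one-line version got wrong.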
Fixes are available now in https://github.com/WadeWalker/jocl/compare/fix_jocl_bugs_959_960_963_964. I still need to test them on Windows and Linux though, so they're not ready to merge yet.