959 – CLCommandQueueTest.concurrencyTest() doesn't set the local work size correctly for put1DRangeKernel()

Status:	RESOLVED FIXED

Alias:	None

Product:	Jocl
Classification:	JogAmp
Component:	opencl
Version:	1
Hardware:	All all

Importance:	--- normal
Assignee:	Wade Walker

URL:

Depends on:
Blocks:

Reported:	2014-02-05 20:46 CET by Wade Walker
Modified:	2014-02-12 03:02 CET
CC List:	1 user

See Also:
Type:	---
SCM Refs:	0874fa955c0401dba9f54816a9654bb4380abed8
Workaround:	---

Description Wade Walker 2014-02-05 20:46:32 CET

The local work size should be the smaller of two limits: the maximum the hardware allows in general, and the maximum possible for the specific kernel being run. Currently it's set to the hardware max alone, which fails on CPU OpenCL devices on Mac.

To fix, replace this:

int groupSize = queue2.getDevice().getMaxWorkItemSizes()[0];

with this:

int maxWorkItemSize = queue2.getDevice().getMaxWorkItemSizes()[0];
int kernelWorkGroupSize = (int)vectorAddKernel2.getWorkGroupSize( queue2.getDevice() );
int groupSize = Math.min( maxWorkItemSize, kernelWorkGroupSize );

I'll do some more testing and submit a patch for this.
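
For illustration, the selection rule can be tried out without an OpenCL device. This is only a minimal sketch, not the submitted patch: the class name WorkSizeSketch, the helper chooseLocalWorkSize, and the sample limits (1024, 1, 4096) are hypothetical stand-ins for the values that getMaxWorkItemSizes()[0] and getWorkGroupSize() would return at runtime.

```java
// Hypothetical, device-free sketch of the local work size rule described above.
final class WorkSizeSketch {

    /**
     * Returns the largest local work size accepted by both the device
     * (its max work-item size in dimension 0) and the specific kernel.
     * The comparison is done in long so the narrowing cast happens only
     * after the smaller value has been chosen.
     */
    static int chooseLocalWorkSize(int deviceMaxWorkItemSize, long kernelWorkGroupSize) {
        return (int) Math.min((long) deviceMaxWorkItemSize, kernelWorkGroupSize);
    }

    public static void main(String[] args) {
        // Assumed example: device allows 1024, but the kernel allows only 1.
        System.out.println(chooseLocalWorkSize(1024, 1L));    // prints 1

        // Assumed example: kernel limit exceeds the device limit.
        System.out.println(chooseLocalWorkSize(1024, 4096L)); // prints 1024
    }
}
```

With the first pair of values, using the device maximum alone would request a work group of 1024 for a kernel that can only run with 1, which is the kind of mismatch this bug describes.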

Comment 1 Wade Walker 2014-02-08 21:10:50 CET

Fixes are available now in https://github.com/WadeWalker/jocl/compare/fix_jocl_bugs_959_960_963_964. I still need to test them on Windows and Linux though, so they're not ready to merge yet.