OpenCL SDK (Khronos Group) for cross-platform GPU/CPU parallel computing in C and C++. Use when writing OpenCL kernels, managing devices/contexts/queues, allocating and transferring buffers or images, building and executing programs, or using the C++ wrapper (opencl.hpp / cl::CommandQueue, cl::Buffer, cl::KernelFunctor). Covers OpenCL C API, C++ bindings, and SDK utility libraries (OpenCLUtils, OpenCLSDK).

Updated 2/22/2026

OpenCL SDK

Version: v2025.07.23 (Khronos Group OpenCL-SDK)
Language: C (OpenCL 1.0–3.0) / C++ (opencl.hpp wrapper)
License: Apache-2.0
Repo: https://github.com/KhronosGroup/OpenCL-SDK

Overview

OpenCL (Open Computing Language) is a framework for parallel programming across heterogeneous platforms — GPUs, CPUs, FPGAs, and DSPs — from a single API. The SDK bundles:

  • OpenCL-Headers — C headers (<CL/cl.h>, <CL/cl_ext.h>)
  • OpenCL-CLHPP — C++ wrapper (<CL/opencl.hpp>)
  • OpenCL-ICD-Loader — runtime dispatch to installed platform drivers
  • OpenCLUtils / OpenCLSDK — utility libraries (<CL/Utils/>, <CL/SDK/>)

Quick Start (C)

Kernel file saxpy.cl:

__kernel void saxpy(float a, __global float *x, __global float *y) {
    int i = get_global_id(0);
    y[i] = fma(a, x[i], y[i]);
}
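
To check the kernel's output, it helps to have a host-side reference to compare against. The sketch below is a plain-C equivalent of the saxpy kernel; saxpy_ref and max_abs_diff are hypothetical helper names, not part of the SDK. Because the device uses fma (a single rounding) and the host loop may round twice, comparing with a small tolerance is safer than testing for exact equality.

```c
#include <stddef.h>

/* Host-side reference for the saxpy kernel: y[i] = a * x[i] + y[i]. */
static void saxpy_ref(float a, const float *x, float *y, size_t n) {
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Largest absolute element-wise difference between two arrays. */
static float max_abs_diff(const float *p, const float *q, size_t n) {
    float worst = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float d = p[i] - q[i];
        if (d < 0.0f) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

A typical use is to run saxpy_ref on a copy of the input, read the device result back, and require max_abs_diff to stay below a tolerance chosen for the data.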

Host:

#include <CL/cl.h>

cl_int err;                       // status code; check it after every call
cl_platform_id plat;
cl_device_id dev;
err = clGetPlatformIDs(1, &plat, NULL);                              // first platform
err = clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);   // its default device
cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);
cl_command_queue q = clCreateCommandQueueWithProperties(ctx, dev, NULL, &err);  // in-order queue
// ... load source, clCreateProgramWithSource, clBuildProgram,
//     clCreateKernel, clSetKernelArg, clEnqueueNDRangeKernel,
//     clEnqueueReadBuffer, clReleaseXxx ...

Quick Start (C++)

#define CL_HPP_ENABLE_EXCEPTIONS
#define CL_HPP_TARGET_OPENCL_VERSION 200
#include <CL/opencl.hpp>

cl::Context ctx{CL_DEVICE_TYPE_DEFAULT};
cl::Device  dev = ctx.getInfo<CL_CONTEXT_DEVICES>()[0];
cl::CommandQueue queue{ctx, dev};
cl::Program prog{ctx, source_string};
prog.build(dev);
auto saxpy = cl::KernelFunctor<cl_float, cl::Buffer, cl::Buffer>(prog, "saxpy");
saxpy(cl::EnqueueArgs{queue, cl::NDRange{N}}, a, buf_x, buf_y);

Core Concepts

  • Work-item — one parallel execution unit; maps to one GPU thread
  • Work-group — block of work-items sharing local memory and barriers
  • NDRange — N-dimensional index space (up to 3D); defines total parallelism
  • Context — owns devices, memory objects, programs, and queues
  • Command Queue — in-order or out-of-order stream of commands to one device
  • Memory object — buffer (linear) or image (typed, sampled); device-side
  • Kernel — a __kernel function compiled from OpenCL C source or SPIR-V
  • Event — synchronization token returned by every enqueue command
  • Address spaces — __global (buffers), __local (shared within a work-group), __constant (read-only), __private (per work-item)
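
The relationship between NDRange, work-groups, and work-items can be sketched with plain arithmetic. For a 1-D range with a zero global offset, the value returned by get_global_id(0) equals get_group_id(0) * get_local_size(0) + get_local_id(0). The helper names below are hypothetical and only emulate that indexing on the host.

```c
#include <stddef.h>

/* 1-D global index of a work-item, assuming a zero global offset:
   work-group index times work-group size, plus the index in the group. */
static size_t global_id_1d(size_t group_id, size_t local_size, size_t local_id) {
    return group_id * local_size + local_id;
}

/* Number of work-groups in a 1-D NDRange whose global size is an
   exact multiple of the work-group size. */
static size_t num_groups_1d(size_t global_size, size_t local_size) {
    return global_size / local_size;
}
```

For example, with a work-group size of 64, the work-item with local index 5 in group 3 has global index 197, and a global size of 1024 yields 16 groups.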

API Reference

| Domain | Reference File | Key Functions / Types |
| --- | --- | --- |
| Platform & Device | references/api-platform-device.md | clGetPlatformIDs, clGetDeviceIDs, clGetDeviceInfo, cl_util_get_device |
| Context & Queue | references/api-context-queue.md | clCreateContext, clCreateCommandQueueWithProperties, clFlush, clFinish |
| Memory Objects | references/api-memory.md | clCreateBuffer, clCreateImage, clEnqueueRead/WriteBuffer, clEnqueueMapBuffer, SVM |
| Programs & Kernels | references/api-program-kernel.md | clCreateProgramWithSource, clBuildProgram, clCreateKernel, clSetKernelArg |
| Execution & Events | references/api-execution.md | clEnqueueNDRangeKernel, clWaitForEvents, clSetEventCallback, profiling |
| C++ Wrapper | references/api-cpp-wrapper.md | cl::Context, cl::Buffer, cl::KernelFunctor, cl::EnqueueArgs, exceptions |
| Workflows | references/workflows.md | Quick-start, vector add, image blur, async events, binary caching, error handling |

Common Workflows

See references/workflows.md for complete, runnable examples:

  • Vector add (C) — minimal host+kernel from scratch
  • SAXPY (C++) — KernelFunctor pattern with RAII
  • Device enumeration — iterate all platforms and devices
  • Image blur — 2D image creation, read_imageui / write_imageui
  • Async events — non-blocking enqueue chains
  • Binary caching — save/restore compiled programs
  • Error handling — C goto pattern vs. C++ exceptions

SDK Utility Libraries

Include <CL/Utils/Utils.h> (C) or <CL/Utils/Utils.hpp> (C++) and link OpenCLUtils / OpenCLUtilsCpp.

| Header | API |
| --- | --- |
| <CL/Utils/Context.h> | cl_util_get_device, cl_util_get_context, cl_util_print_device_info |
| <CL/Utils/File.h> | cl_util_read_text_file, cl_util_read_exe_relative_text_file, cl_util_write_binaries |
| <CL/Utils/Error.h> | OCLERROR_RET, OCLERROR_PAR, MEM_CHECK macros, cl_util_print_error |
| <CL/Utils/Event.h> | cl_util_get_event_duration |
| <CL/Utils/Device.hpp> | cl::util::supports_extension, cl::util::supports_feature |

SDK Library (samples only, not installed): <CL/SDK/CLI.h>, <CL/SDK/Random.h>, <CL/SDK/Image.h>.

Key Considerations

Release everything: Pair every clCreate* call with the corresponding clRelease*. Leaked buffers or kernels keep holding device memory, and nothing reports the leak until a later allocation fails.

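Blocking vs. non-blocking transfers: clEnqueueReadBuffer(..., CL_TRUE, ...) blocks the host thread until the data has been copied. Pass CL_FALSE and wait on the returned event to overlap transfers with other work. Call clFlush on a queue before commands in a different queue wait on its events; otherwise the dependency may never be submitted.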

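Local work-group size: In OpenCL 1.x the local size must evenly divide the global work size in each dimension; OpenCL 2.0+ devices may accept non-uniform work-groups, but that support is optional in 3.0. Query CL_KERNEL_WORK_GROUP_SIZE (via clGetKernelWorkGroupInfo) for the maximum and CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE for a size the device schedules efficiently. Passing NULL lets the runtime choose (portable, not always optimal).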
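
When the element count is not a multiple of the chosen local size, a common approach is to round the global size up and have the kernel skip the padding work-items. A minimal sketch, assuming a hypothetical helper name round_up_global and a kernel that takes the element count n as an extra argument (the saxpy kernel above does not):

```c
#include <stddef.h>

/* Smallest multiple of `local` that is >= n (local must be non-zero).
   Pass the result as the global work size. The kernel then needs a
   guard such as `if (get_global_id(0) >= n) return;` so the extra
   work-items in the last group do not touch memory out of bounds. */
static size_t round_up_global(size_t n, size_t local) {
    return ((n + local - 1) / local) * local;
}
```

With n = 1000 and a local size of 64, the global size becomes 1024, and the last 24 work-items exit through the guard.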

Build log on failure: When compilation fails, clBuildProgram returns CL_BUILD_PROGRAM_FAILURE. Query CL_PROGRAM_BUILD_LOG with clGetProgramBuildInfo to get the compiler's error messages. The SDK's cl_util_build_program does this automatically.

Image format validation: Not all cl_image_format combinations are supported on every device. Call clGetSupportedImageFormats before creating images.

Event callbacks must not block: Callbacks registered via clSetEventCallback are invoked from a runtime thread. Never call clFinish or clWaitForEvents inside a callback.

C++ exceptions: Enable with #define CL_HPP_ENABLE_EXCEPTIONS before including <CL/opencl.hpp>. Without it, check cl_int error parameters manually.

OpenCL version targeting: Set CL_HPP_TARGET_OPENCL_VERSION (e.g., 300, 200, 120) to control which API surface is available in the C++ wrapper. OpenCL 2.0 deprecated clCreateCommandQueue in favor of clCreateCommandQueueWithProperties; use the newer call on 2.0+ platforms and fall back to clCreateCommandQueue only when targeting 1.x.

SVM requires OpenCL 2.0+: Shared Virtual Memory (clSVMAlloc) is usable only when the device reports a non-zero CL_DEVICE_SVM_CAPABILITIES. Query it with clGetDeviceInfo before use; SVM is optional in OpenCL 3.0.

Metadata

License: unknown
Version: -
Updated: 2/22/2026
Publisher: datathings

Tags

api, ci-cd, github