
[DO NOT REVIEW] cuda.core: add Array, TextureObject, SurfaceObject, MipmappedArray (#467) #2095

Draft
rparolin wants to merge 11 commits into NVIDIA:main from rparolin:feature/cuda-core-texture-surface-467

Conversation

@rparolin
Collaborator

Summary

Implements #467 (TextureObject and SurfaceObject) plus the supporting stack on top of cuda.core:

  • Array + ArrayFormat — opaque, hardware-laid-out GPU allocations for texture/surface backing.
  • MipmappedArray — wraps CUmipmappedArray with non-owning Array level views (parent kept alive via _parent_ref).
  • TextureObject + TextureDescriptor — bindless texture handle with the full sampling state surface (filter mode, read mode, address modes, border color, mipmap clamps, anisotropy, sRGB, seamless cubemap).
  • SurfaceObject — bindless surface handle for kernel-side typed load/store. Requires Array(surface_load_store=True).
  • ResourceDescriptor — four factories (from_array, from_mipmapped_array, from_linear, from_pitch2d) covering all texture-eligible variants of CUDA_RESOURCE_DESC.

What's included

  • Full public API exported from cuda.core.
  • Documentation under docs/source/api.rst (Textures and surfaces section).
  • End-to-end example at examples/texture_sample.py — allocates a 2D Array, plants a known pattern, builds a TextureObject with LINEAR filtering and CLAMP addressing, samples at texel-center and half-integer coordinates from a kernel, and verifies the sampling math (POINT-exact values plus bilinear blends).
  • 62 unit tests covering happy paths, validation (negative paths for every raise site), boundary cases, and lifetime / keepalive chains.
  • Shared _get_current_context_ptr / _get_current_device_id helpers in cuda_utils (9+ duplicate sites in the rest of cuda.core can adopt these in a follow-up).

Test plan

  • pixi run test-core — full suite passes (3326 passed, 199 skipped, 2 xfailed).
  • pixi run docs-core — docs build clean (1 pre-existing warning unrelated to this PR).
  • python examples/texture_sample.py — runs end-to-end, verifies expected sampling output.
  • CI green on NVIDIA runners.
  • Reviewer spot-checks the lifetime model on MipmappedArray.get_level (non-owning Array + _parent_ref strong ref to parent).

Not in this PR

  • Layered / cubemap / sparse Array variants (documented as NotImplementedError-deferred).
  • cuTexObject / cuSurfObject introspection via cuTexObjectGetResourceDesc etc. (round-tripping the descriptors).
  • Consolidating the duplicate _get_current_* helpers in _tensor_map.pyx, _graph_node.pyx, etc. to the new cuda_utils versions.

Filed as draft pending owner review.

🤖 Generated with Claude Code

rparolin and others added 10 commits May 13, 2026 13:51
Introduce a Pythonic wrapper around CUarray as a prerequisite for
TextureObject / SurfaceObject support. This initial slice covers plain
1D/2D/3D allocations via cuArrayCreate / cuArray3DCreate, with an opt-in
surface_load_store flag for binding as a SurfaceObject. Layered, cubemap,
sparse, and texture-gather variants are intentionally deferred.

_from_handle is provided for graphics-interop borrowing and queries shape,
format, and channel count from the driver.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Full-array async copies between an Array and either a Buffer or any
buffer-protocol host object (numpy, bytes, bytearray, array.array).
Implemented as a single cuMemcpy3DAsync path so 1D/2D/3D arrays share
one code path.

Also exposes a size_bytes property used to size matching host or device
buffers.
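
A minimal sketch of how a single 3D-copy path can serve 1D, 2D, and 3D arrays: missing trailing dimensions are padded with 1 so every array becomes a (possibly degenerate) 3D box. The helper names `copy_extent` and `size_bytes`, the `(width, height, depth)` shape order, and the explicit `element_size` argument are assumptions made for illustration; they are not taken from the PR's source.

```python
from math import prod


def copy_extent(shape, element_size, num_channels):
    """Return (width_in_bytes, height, depth) for a full-array 3D copy.

    Hypothetical helper: assumes shape is ordered (width, height, depth)
    and element_size is the byte size of one channel.
    """
    if not 1 <= len(shape) <= 3:
        raise ValueError(f"shape must have 1 to 3 dimensions, got {len(shape)}")
    if any(d <= 0 for d in shape):
        raise ValueError(f"every dimension must be positive, got {tuple(shape)}")
    # Pad missing trailing dimensions with 1 so 1D and 2D arrays
    # are treated as degenerate 3D boxes.
    width, height, depth = tuple(shape) + (1,) * (3 - len(shape))
    return width * element_size * num_channels, height, depth


def size_bytes(shape, element_size, num_channels):
    """Total bytes a matching host or device buffer would need."""
    return prod(copy_extent(shape, element_size, num_channels))
```

For example, under these assumptions a 64x32 array of two-channel 4-byte elements needs `size_bytes((64, 32), 4, 2)` bytes in the matching buffer.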

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…IA#467)

Wraps cuTexObjectCreate with a Pythonic descriptor pair:

- ResourceDescriptor.from_array(array) is the only resource kind supported
  in this initial slice; from_linear and from_pitch2d will follow once
  Buffer carries format/channel metadata.
- TextureDescriptor mirrors CUDA_TEXTURE_DESC: per-axis AddressMode,
  FilterMode, ReadMode, normalized coords, sRGB, border color, mipmap
  params, anisotropy.
- TextureObject holds a strong ref to the ResourceDescriptor (and
  transitively the backing Array) for the lifetime of the handle to
  prevent dangling-pointer kernel launches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Completes the second half of NVIDIA#467 alongside the existing TextureObject:

- SurfaceObject wraps cuSurfObjectCreate / cuSurfObjectDestroy. Unlike a
  texture it has no sampling state (no filter mode, no addressing, no
  normalization); kernels read and write through it with integer pixel
  coordinates.
- Track CUDA_ARRAY3D_SURFACE_LDST on Array as a new surface_load_store
  property, populated in both Array.from_descriptor and
  Array._from_handle. SurfaceObject.from_array validates this upfront
  rather than letting the driver surface CUDA_ERROR_INVALID_VALUE late.
- Add a convenience SurfaceObject.from_array shortcut next to
  from_descriptor so the common case skips building a ResourceDescriptor
  by hand.

Covered by tests/test_texture_surface.py (14 tests: array shape/format/
flag plumbing, texture + surface creation, surface_load_store validation,
unsupported-resource-kind guard).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Widens the texture-resource surface to cover the two Buffer-backed
variants from CUDA_RESOURCE_DESC:

- ResourceDescriptor.from_linear(buffer, *, format, num_channels,
  size_bytes=None) wraps a Buffer as a typed 1D fetch. Defaults
  size_bytes to buffer.size; validates against it.
- ResourceDescriptor.from_pitch2d(buffer, *, format, num_channels,
  width, height, pitch_bytes) wraps a Buffer as a row-pitched 2D
  image. Validates pitch_bytes >= width * element_size and
  pitch_bytes * height <= buffer.size; the driver enforces its own
  CU_DEVICE_ATTRIBUTE_TEXTURE_PITCH_ALIGNMENT on top.
- TextureObject.from_descriptor handles the three resType branches
  (ARRAY, LINEAR, PITCH2D); SurfaceObject continues to require an
  array-backed resource.
- ResourceDescriptor gains format/num_channels read-only properties
  (None for array-backed) and a kind-aware __repr__.
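
The two pitch checks described for from_pitch2d can be sketched in plain Python as follows. The function name `validate_pitch2d` and its flat argument list are hypothetical, and `element_size` here is assumed to mean the byte size of one full texel (all channels); the real factory takes a Buffer plus format metadata, and the driver's pitch-alignment requirement is not modeled.

```python
def validate_pitch2d(buffer_size, width, height, pitch_bytes, element_size):
    """Raise ValueError if a row-pitched 2D view does not fit the buffer.

    Hypothetical helper mirroring the two checks in the commit message.
    """
    if width <= 0 or height <= 0:
        raise ValueError(f"width and height must be positive, got {width}x{height}")
    row_bytes = width * element_size
    # Each row must be able to hold `width` texels.
    if pitch_bytes < row_bytes:
        raise ValueError(
            f"pitch_bytes={pitch_bytes} is smaller than one row ({row_bytes} bytes)"
        )
    # All rows, including padding, must lie inside the buffer.
    if pitch_bytes * height > buffer_size:
        raise ValueError(
            f"pitch_bytes * height = {pitch_bytes * height} exceeds "
            f"buffer size {buffer_size}"
        )
```

Checking these bounds up front turns what would otherwise be a late driver error into a descriptive ValueError at descriptor-construction time.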

Tests: 9 new (linear/pitch2D creation, validation paths, surface
rejection of non-array resources) on top of the existing 14. Full
test-core suite green (3287 passed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wire the newly public Array, ArrayFormat, TextureObject, SurfaceObject,
ResourceDescriptor, TextureDescriptor, AddressMode, FilterMode, and
ReadMode symbols into the cuda.core Sphinx reference under a new
"Textures and surfaces" section in api.rst. No source docstring changes;
documentation is rendered via the existing autosummary templates and the
enum_documenter extension.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…VIDIA#467)

Introduces a MipmappedArray cdef class wrapping CUmipmappedArray with the
same lifetime model as Array (close/__dealloc__/context-manager). Levels
are obtained via get_level(L), which returns a non-owning Array that
holds a strong ref back to the parent MipmappedArray via a new
Array._parent_ref slot, ensuring level views cannot outlive the
underlying storage. Surfaces continue to require a single-Array backing;
the existing kind != "array" check in SurfaceObject.from_descriptor
naturally rejects mipmapped resources (covered by a new test).
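
The keepalive idea behind get_level can be illustrated without any CUDA code. In this sketch, `Parent` and `LevelView` are stand-in classes (not the PR's cdef classes); only the `_parent_ref` attribute name is borrowed from the commit message. The view holds a strong reference to its parent, so the parent cannot be collected while any view is alive.

```python
import gc
import weakref


class Parent:
    """Stand-in for an owning object such as a mipmapped allocation."""

    def get_level(self, level):
        return LevelView(self, level)


class LevelView:
    """Stand-in for a non-owning view that keeps its parent alive."""

    def __init__(self, parent, level):
        self._parent_ref = parent  # strong reference back to the owner
        self.level = level


parent = Parent()
probe = weakref.ref(parent)  # lets us observe when the parent is freed
view = parent.get_level(0)

del parent
gc.collect()
alive_while_view_exists = probe() is not None

del view
gc.collect()
released_after_view = probe() is None
```

A test in this style only shows that the reference chain holds; it does not exercise driver-level destruction order, which the real classes also have to get right.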

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
End-to-end example that builds a 2D Array with a known pattern, binds it as
a bindless TextureObject with LINEAR/CLAMP/non-normalized sampling, and
launches a kernel that samples both texel-center and half-integer
coordinates. Verifies POINT-exact returns at texel centers and analytical
bilinear blends at half-pixel positions.
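
For reference, the sampling math being verified can be sketched on the host. This is a pure-Python model of non-normalized LINEAR filtering with CLAMP addressing, following the convention in the CUDA C++ Programming Guide that texel i is centered at coordinate i + 0.5; it is not the code from texture_sample.py, the function names are hypothetical, and it ignores the reduced fixed-point precision the hardware uses for the interpolation weights.

```python
import math


def clamp_index(i, n):
    """Clamp a texel index into [0, n - 1] (CLAMP addressing)."""
    return min(max(i, 0), n - 1)


def sample_linear_clamp(texels, x, y):
    """Bilinearly sample a 2D list of floats at non-normalized (x, y)."""
    height, width = len(texels), len(texels[0])
    # Shift so that texel centers land on integer positions.
    xb, yb = x - 0.5, y - 0.5
    i, j = math.floor(xb), math.floor(yb)
    a, b = xb - i, yb - j  # fractional blend weights in [0, 1)

    def fetch(col, row):
        return texels[clamp_index(row, height)][clamp_index(col, width)]

    top = (1 - a) * fetch(i, j) + a * fetch(i + 1, j)
    bottom = (1 - a) * fetch(i, j + 1) + a * fetch(i + 1, j + 1)
    return (1 - b) * top + b * bottom


texels = [[0.0, 10.0], [20.0, 30.0]]
center_value = sample_linear_clamp(texels, 0.5, 0.5)   # exactly texel (0, 0)
blend_value = sample_linear_clamp(texels, 1.0, 1.0)    # equal mix of all four
```

Under this convention a sample at a texel center returns that texel unchanged, and a sample midway between four centers returns their average, which is the kind of analytical value a kernel result can be compared against.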

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Safety and correctness:
- Validate buffer sizes against array extent in Array.copy_from/copy_to;
  undersized host or device Buffer inputs were previously silent stomps
  via cuMemcpy3DAsync. Both branches now raise ValueError before issuing
  the copy.
- Zero the underlying handle BEFORE calling cuXxxDestroy in close() for
  Array, MipmappedArray, TextureObject, SurfaceObject. Prevents a
  double-destroy via __dealloc__ if the driver call raises.
- ResourceDescriptor.from_linear: require size_bytes >= element_size and
  size_bytes % element_size == 0; previously accepted zero and arbitrary
  non-multiples.
- Reject bool in num_channels across Array, MipmappedArray, and the two
  Buffer-backed ResourceDescriptor factories (True was silently treated
  as 1 channel).
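
The "zero the handle before destroying" rule can be sketched in plain Python. `Handle` and `failing_destroy` are hypothetical stand-ins for a wrapper class and a driver destroy call; the point is only the ordering inside `close()`.

```python
class Handle:
    """Stand-in wrapper that owns a raw integer handle."""

    def __init__(self, raw, destroy):
        self._raw = raw
        self._destroy = destroy

    def close(self):
        # Clear the stored handle first, so a failing destroy call
        # cannot lead to a second destroy attempt later.
        raw, self._raw = self._raw, 0
        if raw:
            self._destroy(raw)

    def __del__(self):
        self.close()


calls = []


def failing_destroy(raw):
    calls.append(raw)
    raise RuntimeError("simulated driver error")


handle = Handle(42, failing_destroy)
try:
    handle.close()
except RuntimeError:
    pass
handle.close()  # no-op: the handle was already cleared
```

If `close()` instead cleared the handle only after a successful destroy, the finalizer would retry the destroy on a handle whose state is unknown.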

API polish:
- Rename TextureObject.from_descriptor params resource_desc/texture_desc
  to resource/texture_descriptor so they match the .resource and
  .texture_descriptor properties; same rename in SurfaceObject. Both
  factories are now keyword-only, consistent with Array.from_descriptor
  and MipmappedArray.from_descriptor.
- Add four ResourceDescriptor properties (size_bytes, width, height,
  pitch_bytes) so values shown in __repr__ are reachable programmatically.
- Add MipmappedArray to docs/source/api.rst (was exported but unlinked).
- Align error message style across new files: type(x).__name__ instead of
  type(x); include got <type> in three previously-bare TypeErrors in
  TextureObject.from_descriptor.

Refactor:
- Extract _get_current_context_ptr and _get_current_device_id to
  cuda_utils.{pxd,pyx} and share across all four new files (was
  duplicated four times). Generic error message keeps the helper
  reusable for the 9+ remaining duplicate sites in cuda.core.
- Hoist the buffer-protocol path in _fill_linear_endpoint into a new
  _fill_host_endpoint helper. Original function becomes a thin
  Buffer-vs-host router.
- Type Array._format and MipmappedArray._format as cydriver.CUarray_format
  instead of int (was a comment-typed int; now C-level type-checked).
- Drop unused `field` import from _texture.pyx.

Tests (+28, total 62 in this file):
- Undersized host/device buffer rejection in Array.copy_from/copy_to.
- ResourceDescriptor.from_linear rejects size_bytes=0 and non-multiples.
- _normalize_address_modes unit tests now make explicit assertions
  instead of only smoke-testing TextureObject creation.
- Negative-path coverage for Array.from_descriptor (bad format, non-
  iterable shape, zero dim), MipmappedArray.from_descriptor, all
  TextureObject.from_descriptor validation branches (filter_mode,
  read_mode, mipmap_filter_mode, max_anisotropy, border_color length),
  address-mode normalization (scalar non-AddressMode, empty/4-entry
  tuples, mixed-type entries), ResourceDescriptor.from_pitch2d, and
  copy_from/copy_to non-Stream rejection.
- TextureObject and SurfaceObject keepalive lifetime tests verifying
  the _source_ref chain holds after gc.collect() (mirrors the existing
  MipmappedArray level keepalive test).
- copy_from must not mutate the source buffer (round-trip test now
  also asserts list(src) is unchanged).
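
The from_linear size rules covered by these tests can be sketched as a small standalone check. `resolve_linear_size` is a hypothetical helper, not the PR's implementation; it assumes `element_size` is the byte size of one full element and that an omitted size_bytes falls back to the buffer size, as the earlier commit message describes.

```python
def resolve_linear_size(buffer_size, element_size, size_bytes=None):
    """Return the validated byte size of a typed 1D view over a buffer."""
    if size_bytes is None:
        size_bytes = buffer_size  # default to the whole buffer
    if isinstance(size_bytes, bool) or not isinstance(size_bytes, int):
        raise TypeError(f"size_bytes must be an int, got {type(size_bytes).__name__}")
    if size_bytes < element_size:
        raise ValueError(f"size_bytes={size_bytes} holds less than one element")
    if size_bytes % element_size != 0:
        raise ValueError(
            f"size_bytes={size_bytes} is not a multiple of element_size={element_size}"
        )
    if size_bytes > buffer_size:
        raise ValueError(f"size_bytes={size_bytes} exceeds buffer size {buffer_size}")
    return size_bytes
```

A negative-path test then reduces to asserting that zero, a non-multiple, and an oversized value each raise ValueError.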

Example:
- texture_sample.py uses `with` blocks for Array and TextureObject so
  the user-facing demo shows the idiomatic context-manager pattern
  rather than manual try/finally.

Full cuda_core suite: 3326 passed, 199 skipped, 2 xfailed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rparolin rparolin added this to the cuda.core v1.1.0 milestone May 15, 2026
@rparolin rparolin added feature New feature or request cuda.core Everything related to the cuda.core module labels May 15, 2026
@copy-pr-bot
Contributor

copy-pr-bot Bot commented May 15, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Member

@leofang leofang left a comment


We should have a design review, as both @mdboom and @Andy-Jost suggested in past meetings. At the very least, the API surface should be sketched in the issue before a PR is opened. Having major feature PRs vibed without design discussion makes it very easy to merge regrettable changes.

Array is in particular a terrible name.

@rparolin rparolin changed the title cuda.core: add Array, TextureObject, SurfaceObject, MipmappedArray (#467) [DO NOT REVIEW] cuda.core: add Array, TextureObject, SurfaceObject, MipmappedArray (#467) May 15, 2026
These graphical examples demonstrate the new Array, TextureObject,
SurfaceObject, MipmappedArray, and ResourceDescriptor APIs in increasing
order of complexity. All use the existing GraphicsResource + GL PBO
pattern for display (matching gl_interop_plasma.py); CI is gated on
has_display so headless runners skip them.

Minimum-API examples:
- gl_interop_image_show.py    Hello-world for the stack: 64x64 Array,
                              TextureObject, key F toggles POINT/LINEAR.
                              Read this file first.
- gl_interop_texture_filter.py POINT vs LINEAR side-by-side on one Array
                              with two TextureObjects; mouse pan/zoom,
                              key M cycles AddressMode.

Simulation examples (Array + SurfaceObject + TextureObject ping-pong):
- gl_interop_reaction_diffusion.py Gray-Scott with FLOAT32 x 2 channels;
                              LINEAR + WRAP for toroidal diffusion.
- gl_interop_lenia.py         Continuous-state CA with bell-curve
                              convolution; FLOAT32 x 1 channel.
- gl_interop_fire.py          Canonical Doom fire (37-color indexed
                              palette, UINT8 intensity 0..36, gather
                              equivalent of the original scatter
                              algorithm); exercises ArrayFormat.UINT8.
- gl_interop_ocean.py         Animated Gerstner-wave ocean with normal
                              mapping via finite-difference texture
                              reads and Phong + Fresnel shading.

Visualization examples:
- gl_interop_mandelbrot.py    Real-time deep-zoom using a 1D Array as
                              a color LUT (TextureObject for palette
                              lookup, not simulation).
- gl_interop_mipmap_lod.py    Procedural mipmap pyramid built with a
                              SurfaceObject per level; trilinear
                              sampling via tex2DLod and TextureDescriptor
                              mipmap fields.
- gl_interop_sdf_volume.py    3D ray-marched gyroid via a 128^3 Array,
                              surf3Dwrite for bake, tex3D for trilinear
                              SDF sampling. Only example exercising the
                              3D side of the API.

Every public symbol added in this PR is exercised by at least one of
these examples.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
