diff --git a/README.md b/README.md index 6da895bbb9b..0a986bc10b0 100644 --- a/README.md +++ b/README.md @@ -5,8 +5,8 @@ CUDA Python is the home for accessing NVIDIA’s CUDA platform from Python. It c * [cuda.core](https://nvidia.github.io/cuda-python/cuda-core/latest): Pythonic access to CUDA Runtime and other core functionality * [cuda.bindings](https://nvidia.github.io/cuda-python/cuda-bindings/latest): Low-level Python bindings to CUDA C APIs * [cuda.pathfinder](https://nvidia.github.io/cuda-python/cuda-pathfinder/latest): Utilities for locating CUDA components installed in the user's Python environment -* [cuda.coop](https://nvidia.github.io/cccl/python/coop): A Python module providing CCCL's reusable block-wide and warp-wide *device* primitives for use within Numba CUDA kernels -* [cuda.compute](https://nvidia.github.io/cccl/python/compute): A Python module for easy access to CCCL's highly efficient and customizable parallel algorithms, like `sort`, `scan`, `reduce`, `transform`, etc. that are callable on the *host* +* [cuda.coop](https://nvidia.github.io/cccl/unstable/python/coop.html): A Python module providing CCCL's reusable block-wide and warp-wide *device* primitives for use within Numba CUDA kernels +* [cuda.compute](https://nvidia.github.io/cccl/unstable/python/compute/index.html): A Python module for easy access to CCCL's highly efficient and customizable parallel algorithms, like `sort`, `scan`, `reduce`, `transform`, etc. 
that are callable on the *host* * [numba.cuda](https://nvidia.github.io/numba-cuda/): A Python DSL that exposes CUDA **SIMT** programming model and compiles a restricted subset of Python code into CUDA kernels and device functions * [cuda.tile](https://docs.nvidia.com/cuda/cutile-python/): A new Python DSL that exposes CUDA **Tile** programming model and allows users to write NumPy-like code in CUDA kernels * [nvmath-python](https://docs.nvidia.com/cuda/nvmath-python/latest): Pythonic access to NVIDIA CPU & GPU Math Libraries, with [*host*](https://docs.nvidia.com/cuda/nvmath-python/latest/overview.html#host-apis), [*device*](https://docs.nvidia.com/cuda/nvmath-python/latest/overview.html#device-apis), and [*distributed*](https://docs.nvidia.com/cuda/nvmath-python/latest/distributed-apis/index.html) APIs. It also provides low-level Python bindings to host C APIs ([nvmath.bindings](https://docs.nvidia.com/cuda/nvmath-python/latest/bindings/index.html)). @@ -44,4 +44,6 @@ The list of available interfaces is: * NVRTC * nvJitLink * NVVM +* nvFatbin * cuFile +* NVML diff --git a/cuda_core/cuda/core/experimental/__init__.pxd b/cuda_core/cuda/core/experimental/__init__.pxd deleted file mode 100644 index d8b3a2dc32c..00000000000 --- a/cuda_core/cuda/core/experimental/__init__.pxd +++ /dev/null @@ -1,3 +0,0 @@ -# SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -# -# SPDX-License-Identifier: Apache-2.0 diff --git a/cuda_core/cuda/core/experimental/__init__.py b/cuda_core/cuda/core/experimental/__init__.py deleted file mode 100644 index 08c3e33ce18..00000000000 --- a/cuda_core/cuda/core/experimental/__init__.py +++ /dev/null @@ -1,69 +0,0 @@ -# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -# -# SPDX-License-Identifier: Apache-2.0 - -""" -Backward compatibility stubs for cuda.core.experimental namespace. 
- -This module provides forwarding stubs that import from the new cuda.core.* -locations and emit deprecation warnings. Users should migrate to importing -directly from cuda.core instead of cuda.core.experimental. - -The experimental namespace will be removed in v1.0.0. - -""" - - -def _warn_deprecated(): - """Emit a deprecation warning for using the experimental namespace. - - Note: This warning is only when the experimental module is first imported. - Subsequent accesses to attributes (like utils, Device, etc.) do not trigger - additional warnings since they are already set in the module namespace. - """ - import warnings - - warnings.warn( - "The cuda.core.experimental namespace is deprecated. " - "Please import directly from cuda.core instead. " - "For example, use 'from cuda.core import Device' instead of " - "'from cuda.core.experimental import Device'. " - "The experimental namespace will be removed in v1.0.0.", - DeprecationWarning, - stacklevel=3, - ) - - -# Import from new locations and re-export -_warn_deprecated() - - -from cuda.core import system, utils - -# Make utils accessible as a submodule for backward compatibility -__import__("sys").modules[__spec__.name + ".utils"] = utils - - -from cuda.core._device import Device -from cuda.core._event import Event, EventOptions -from cuda.core._launch_config import LaunchConfig -from cuda.core._launcher import launch -from cuda.core._layout import _StridedLayout -from cuda.core._linker import Linker, LinkerOptions -from cuda.core._memory import ( - Buffer, - DeviceMemoryResource, - DeviceMemoryResourceOptions, - GraphMemoryResource, - LegacyPinnedMemoryResource, - ManagedMemoryResource, - ManagedMemoryResourceOptions, - MemoryResource, - PinnedMemoryResource, - PinnedMemoryResourceOptions, - VirtualMemoryResource, - VirtualMemoryResourceOptions, -) -from cuda.core._module import Kernel, ObjectCode -from cuda.core._program import Program, ProgramOptions -from cuda.core._stream import Stream, StreamOptions diff 
--git a/cuda_core/cuda/core/utils/_program_cache/_keys.py b/cuda_core/cuda/core/utils/_program_cache/_keys.py index dda07039e32..fbb5ef3f890 100644 --- a/cuda_core/cuda/core/utils/_program_cache/_keys.py +++ b/cuda_core/cuda/core/utils/_program_cache/_keys.py @@ -670,7 +670,7 @@ def make_program_cache_key( Returns ------- bytes - A 32-byte blake2b digest suitable for use as a cache key. + An opaque bytes digest suitable for use as a cache key. Raises ------ diff --git a/cuda_core/docs/nv-versions.json b/cuda_core/docs/nv-versions.json index d55ec26f53f..0d0aa6276d9 100644 --- a/cuda_core/docs/nv-versions.json +++ b/cuda_core/docs/nv-versions.json @@ -3,6 +3,10 @@ "version": "latest", "url": "https://nvidia.github.io/cuda-python/cuda-core/latest/" }, + { + "version": "1.0.0", + "url": "https://nvidia.github.io/cuda-python/cuda-core/1.0.0/" + }, { "version": "0.7.0", "url": "https://nvidia.github.io/cuda-python/cuda-core/0.7.0/" diff --git a/cuda_core/docs/source/api.rst b/cuda_core/docs/source/api.rst index 6c0019279cf..0a88a5bd4b6 100644 --- a/cuda_core/docs/source/api.rst +++ b/cuda_core/docs/source/api.rst @@ -6,11 +6,10 @@ ``cuda.core`` API Reference =========================== -This is the main API reference for ``cuda.core``. The package has not yet -reached version 1.0.0, and APIs may change between minor versions, possibly -without deprecation warnings. Once version 1.0.0 is released, APIs will -be considered stable and will follow semantic versioning with appropriate -deprecation periods for breaking changes. +This is the main API reference for ``cuda.core``. As of version 1.0.0, all +APIs are considered stable and follow `Semantic Versioning `_ +with appropriate deprecation periods for breaking changes. See the +:doc:`support policy ` for details. Devices and execution @@ -261,46 +260,6 @@ execution. checkpoint.Process -CUDA system information and NVIDIA Management Library (NVML) ------------------------------------------------------------- - -.. 
note:: - ``cuda.core.system`` support requires ``cuda_bindings`` 12.9.6 or later, or 13.2.0 or later. - -Basic functions -``````````````` - -.. autosummary:: - :toctree: generated/ - - system.get_user_mode_driver_version - system.get_kernel_mode_driver_version - system.get_driver_branch - system.get_num_devices - system.get_nvml_version - system.get_process_name - system.get_topology_common_ancestor - system.get_p2p_status - -Events -`````` - -.. autosummary:: - :toctree: generated/ - - system.register_events - -Types -````` - -.. autosummary:: - :toctree: generated/ - - :template: autosummary/cyclass.rst - - system.Device - system.NvlinkInfo - Utility functions ----------------- diff --git a/cuda_core/docs/source/api_nvml.rst b/cuda_core/docs/source/api_nvml.rst new file mode 100644 index 00000000000..078f8ac6d67 --- /dev/null +++ b/cuda_core/docs/source/api_nvml.rst @@ -0,0 +1,44 @@ +.. SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +.. SPDX-License-Identifier: Apache-2.0 + +.. module:: cuda.core.system + +CUDA system information and NVIDIA Management Library (NVML) +============================================================ + +.. note:: + ``cuda.core.system`` support requires ``cuda_bindings`` 12.9.6 or later, or 13.2.0 or later. + +Basic functions +--------------- + +.. autosummary:: + :toctree: generated/ + + get_user_mode_driver_version + get_kernel_mode_driver_version + get_driver_branch + get_num_devices + get_nvml_version + get_process_name + get_topology_common_ancestor + get_p2p_status + +Events +------ + +.. autosummary:: + :toctree: generated/ + + register_events + +Types +----- + +.. 
autosummary:: + :toctree: generated/ + + :template: autosummary/cyclass.rst + + Device + NvlinkInfo diff --git a/cuda_core/docs/source/index.rst b/cuda_core/docs/source/index.rst index 3bf962d7251..9a266e20949 100644 --- a/cuda_core/docs/source/index.rst +++ b/cuda_core/docs/source/index.rst @@ -15,12 +15,14 @@ Welcome to the documentation for ``cuda.core``. install interoperability api + api_nvml environment_variables contribute .. toctree:: :maxdepth: 1 + support conduct license diff --git a/cuda_core/docs/source/install.rst b/cuda_core/docs/source/install.rst index 90e2a1b5b17..33a46a8c84e 100644 --- a/cuda_core/docs/source/install.rst +++ b/cuda_core/docs/source/install.rst @@ -32,7 +32,7 @@ dependencies are as follows: Free-threading Build Support ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -As of cuda-core 0.4.0, **experimental** packages for the `free-threaded interpreter`_ are shipped. +Starting with ``cuda-core`` 0.4.0, **experimental** packages for the `free-threaded interpreter`_ are shipped. 1. Support for these builds is best effort, due to heavy use of `built-in modules that are known to be thread-unsafe`_, such as ``ctypes``. diff --git a/cuda_core/docs/source/release/1.0.0-notes.rst b/cuda_core/docs/source/release/1.0.0-notes.rst index 1a9a67c8614..714dc48ff62 100644 --- a/cuda_core/docs/source/release/1.0.0-notes.rst +++ b/cuda_core/docs/source/release/1.0.0-notes.rst @@ -10,21 +10,96 @@ Highlights ---------- -- TBD +- First stable release of ``cuda.core``! As of version 1.0.0, all + APIs are considered stable and follow Semantic Versioning (SemVer) + with appropriate deprecation periods for breaking changes. See the + :doc:`support policy ` for details. +- Added green context support (CUDA 12.4+). New types :class:`Context`, + :class:`ContextOptions`, :class:`SMResource`, :class:`SMResourceOptions`, + :class:`WorkqueueResource`, and :class:`WorkqueueResourceOptions` enable GPU + SM and workqueue resource partitioning. 
Create green contexts via + :meth:`Device.create_context`, then use :meth:`Context.create_stream` and + :attr:`Context.resources` to work within the partitioned resources. + (`#1976 `__) +- Added the :mod:`cuda.core.checkpoint` module for CUDA process checkpointing, + including string process state queries, lock/checkpoint/restore/unlock + operations, and GPU UUID remapping support for restore. + (`#1343 `__) New features ------------ -- Added the :mod:`cuda.core.checkpoint` module for CUDA process checkpointing, - including string process state queries, lock/checkpoint/restore/unlock - operations, and GPU UUID remapping support for restore. - (`#1343 `__) +- :meth:`Program.compile` now accepts an optional ``cache=`` keyword argument + for avoiding recompilation of identical source + options + target. Two + concrete implementations of the :class:`~utils.ProgramCacheResource` ABC are + provided: :class:`~utils.InMemoryProgramCache` (thread-safe, single-process + LRU) and :class:`~utils.FileStreamProgramCache` (disk-backed, cross-process + safe, LRU-evicting). A standalone :func:`~utils.make_program_cache_key` + function is exposed for callers who need to incorporate additional content + (e.g. headers or PCH files) into the cache key. + (`#1912 `__) +- Changes to the :mod:`cuda.core.system` module for NVIDIA Management Library (NVML) + access: + + - :attr:`system.Device.mig` for querying and setting MIG mode, enumerating + MIG device instances, and navigating parent/child relationships. + (`#1916 `__) + - :attr:`system.Device.compute_running_processes` for querying running compute + processes on a device, returning :class:`~system.ProcessInfo` objects with + PID, GPU memory usage, and MIG instance IDs. + (`#1917 `__) + - :meth:`system.Device.get_nvlink` for querying NVLink version and state per + link, and :attr:`system.Device.utilization` returning current GPU and memory + utilization rates. 
+ (`#1918 `__) + +- Enums are now available in places where a small number of string values are + accepted or returned. You may continue to use the string values, or use + enumerations for better linting and type-checking. + (`#2016 `__) + The new enums are: + + - :class:`cuda.core.typing.CompilerBackendType` + - :class:`cuda.core.typing.GraphConditionalType` + - :class:`cuda.core.typing.GraphMemoryType` + - :class:`cuda.core.typing.ManagedMemoryLocationType` + - :class:`cuda.core.typing.ObjectCodeFormatType` + - :class:`cuda.core.typing.PCHStatusType` + - :class:`cuda.core.typing.SourceCodeType` + - :class:`cuda.core.typing.VirtualMemoryAccessType` + - :class:`cuda.core.typing.VirtualMemoryAllocationType` + - :class:`cuda.core.typing.VirtualMemoryGranularityType` + - :class:`cuda.core.typing.VirtualMemoryHandleType` + - :class:`cuda.core.typing.VirtualMemoryLocationType` Breaking changes ---------------- +- :class:`~utils.StridedMemoryView` now provides a fast path for ``torch.Tensor`` + objects via PyTorch's AOT Inductor (AOTI) stable C ABI. When a ``torch.Tensor`` + is passed to any ``from_*`` classmethod (``from_dlpack``, + ``from_cuda_array_interface``, ``from_array_interface``, or + ``from_any_interface``), tensor metadata is read directly from the underlying + C struct, bypassing the DLPack and CUDA Array Interface protocol overhead. + This yields ~7–20x faster ``StridedMemoryView`` construction for PyTorch + tensors (depending on whether stream ordering is required). Proper CUDA stream + ordering is established between PyTorch's current stream and the consumer + stream, matching the DLPack synchronization contract. + Requires PyTorch >= 2.3. 
+ + This is a *behavioral* breaking change: because the AOTI tensor bridge reads + raw metadata without re-enacting PyTorch's export guardrails, tensors that + PyTorch would reject at the DLPack boundary (notably ``requires_grad``, + conjugated, non-strided/sparse, and wrong-current-device CUDA tensors) are + now accepted. This is intentional — ``StridedMemoryView`` is designed for + low-level interop where those checks are not needed. + (`#749 `__) +- Removed the deprecated ``cuda.core.experimental`` namespace. All public APIs + have been available under ``cuda.core`` since v0.5.0. Code that imports from + ``cuda.core.experimental`` must be updated to import from ``cuda.core`` + instead. - Graph types are no longer re-exported from the top-level ``cuda.core`` namespace; they must be imported from :mod:`cuda.core.graph`. The affected symbols are :class:`~graph.Graph`, :class:`~graph.GraphBuilder`, @@ -32,8 +107,7 @@ Breaking changes :class:`~graph.GraphDebugPrintOptions`, and :class:`~graph.GraphDefinition`. Update ``from cuda.core import GraphBuilder`` to ``from cuda.core.graph import GraphBuilder`` (and similarly for the other - symbols). The same symbols are also no longer forwarded through the - deprecated ``cuda.core.experimental`` namespace. + symbols). - Removed the ``GraphAllocOptions`` dataclass and the ``AllocNode.options`` property. Its fields are now keyword-only parameters on :meth:`graph.GraphDefinition.allocate` and @@ -171,8 +245,69 @@ Breaking changes - :obj:`cuda.core.typing.DevicePointerT` -> :obj:`cuda.core.typing.DevicePointerType` - :obj:`cuda.core.typing.IsStreamT` -> :obj:`cuda.core.typing.IsStreamType` -- :func:`args_viewable_as_strided_memory` and :class:`StridedMemoryView` are now - longer at the top-level in :mod:`cuda.core`. 
They are available publicly from the +- Renamed and converted multiple :class:`~system.Device` properties and methods + for naming consistency + (`#1946 `__): + + On :class:`~system.Device`: + + - ``is_c2c_mode_enabled`` -> ``is_c2c_enabled`` + - ``persistence_mode_enabled`` -> ``is_persistence_mode_enabled`` + - ``clock(clock_type)`` -> ``get_clock(clock_type)`` + - ``get_auto_boosted_clocks_enabled()`` -> ``is_auto_boosted_clocks_enabled`` + (method -> property) + - ``get_current_clock_event_reasons()`` -> ``current_clock_event_reasons`` + (method -> property) + - ``get_supported_clock_event_reasons()`` -> ``supported_clock_event_reasons`` + (method -> property) + - ``display_mode`` -> ``is_display_connected`` + - ``display_active`` -> ``is_display_active`` + - ``fan(fan=0)`` -> ``get_fan(fan=0)`` + - ``get_supported_pstates()`` -> ``supported_pstates`` + (method -> property) + + On ``PciInfo``: + + - ``get_max_pcie_link_generation()`` -> ``link_generation`` (method -> property) + - ``get_gpu_max_pcie_link_generation()`` -> ``max_link_generation`` + (method -> property) + - ``get_max_pcie_link_width()`` -> ``max_link_width`` (method -> property) + - ``get_current_pcie_link_generation()`` -> ``current_link_generation`` + (method -> property) + - ``get_current_pcie_link_width()`` -> ``current_link_width`` + (method -> property) + - ``get_pcie_throughput(counter)`` -> ``get_throughput(counter)`` + - ``get_pcie_replay_counter()`` -> ``replay_counter`` (method -> property) + + On ``Temperature``: + + - ``sensor(sensor=...)`` -> ``get_sensor(sensor=...)`` + - ``threshold(threshold_type)`` -> ``get_threshold(threshold_type)`` + - ``thermal_settings(sensor_index)`` -> ``get_thermal_settings(sensor_index)`` + + On ``FanInfo``: + + - ``set_default_fan_speed()`` -> ``set_default_speed()`` + +- Re-wrapped NVML enums as human-readable ``StrEnum`` subclasses instead of raw + integer re-exports from ``cuda.bindings.nvml``. These are available in + ``cuda.core.system.typing``. 
+ (`#2014 `__) +- Removed 18 helper/data-container classes from ``cuda.core.system.__all__``: + ``BAR1MemoryInfo``, ``ClockInfo``, ``ClockOffsets``, ``CoolerInfo``, + ``DeviceAttributes``, ``DeviceEvents``, ``EventData``, ``FanInfo``, + ``FieldValue``, ``FieldValues``, ``GpuDynamicPstatesInfo``, + ``GpuDynamicPstatesUtilization``, ``InforomInfo``, ``PciInfo``, + ``RepairStatus``, ``Temperature``, ``ThermalSensor``, ``ThermalSettings``. + These classes are still returned by :class:`~system.Device` properties and + methods but should not be directly instantiated by users. + (`#1942 `__) +- :attr:`system.Device.uuid` now returns the full NVML UUID with prefix + (e.g. ``GPU-...``). Use :attr:`system.Device.uuid_without_prefix` for + the previous behavior. + (`#1916 `__) +- :func:`args_viewable_as_strided_memory` and :class:`StridedMemoryView` were accidentally + exposed at the top-level in :mod:`cuda.core`. They are available publicly from the :mod:`cuda.core.utils` module. (`#2028 `__) @@ -182,36 +317,43 @@ Breaking changes NVML) and :func:`system.get_kernel_mode_driver_version` (requires NVML). Each returns a ``tuple[int, ...]``. + Fixes and enhancements ----------------------- -- :class:`~utils.StridedMemoryView` now provides a fast path for ``torch.Tensor`` - objects via PyTorch's AOT Inductor (AOTI) stable C ABI. When a ``torch.Tensor`` - is passed to any ``from_*`` classmethod (``from_dlpack``, - ``from_cuda_array_interface``, ``from_array_interface``, or - ``from_any_interface``), tensor metadata is read directly from the underlying - C struct, bypassing the DLPack and CUDA Array Interface protocol overhead. - This yields ~7-20x faster ``StridedMemoryView`` construction for PyTorch - tensors (depending on whether stream ordering is required). Proper CUDA stream ordering is established between PyTorch's current - stream and the consumer stream, matching the DLPack synchronization contract. - Requires PyTorch >= 2.3. 
- (`#749 `__) - -- Enums are not available in places where a small number of string values are - accepted or returned. You may continue to use the string values, or use - enumerations for better linting and type-checking. - (`#2016 `__) - The new enums are: - - - :class:`cuda.core.typing.CompilerBackendType` - - :class:`cuda.core.typing.GraphConditionalType` - - :class:`cuda.core.typing.GraphMemoryType` - - :class:`cuda.core.typing.ManagedMemoryLocationType` - - :class:`cuda.core.typing.ObjectCodeFormatType` - - :class:`cuda.core.typing.PCHStatusType` - - :class:`cuda.core.typing.SourceCodeType` - - :class:`cuda.core.typing.VirtualMemoryAccessType` - - :class:`cuda.core.typing.VirtualMemoryAllocationType` - - :class:`cuda.core.typing.VirtualMemoryGranularityType` - - :class:`cuda.core.typing.VirtualMemoryHandleType` - - :class:`cuda.core.typing.VirtualMemoryLocationType` +- Fixed :attr:`Buffer.is_managed` returning ``False`` for pool-allocated managed + memory (:class:`ManagedMemoryResource`), which caused DLPack interop to + misclassify managed buffers as ``kDLCUDAHost``. The fix queries both the + driver pointer attribute and the memory resource. + (`#1924 `__) +- :attr:`system.Device.arch` now returns ``UNKNOWN`` instead of raising + ``ValueError`` when NVML reports an architecture not yet in the enum. + (`#1937 `__) +- :meth:`system.Device.get_field_values` and + :meth:`system.Device.clear_field_values` with an empty list no longer raise + ``InvalidArgumentError``. + (`#1982 `__) +- :class:`Linker` error and info log retrieval now properly checks return codes + from nvJitLink, raising exceptions on failure instead of silently ignoring + errors. + (`#1993 `__) +- Fixed a potential crash when NVML event set creation failed on Windows, due to + ``__dealloc__`` freeing an uninitialized handle. + (`#1992 `__) +- CUDA Runtime error messages are now more reliable, especially on Windows + where the runtime DLL name table could disagree with the installed bindings. 
+ (`#2003 `__) +- Graph kernel nodes now prevent Python kernel-argument objects from being + garbage-collected before the graph executes. Previously, objects passed as + kernel arguments (e.g. a :class:`Buffer`) could be freed if the only Python + reference was through the launch call, causing the graph to operate on stale + device pointers. + (`#2041 `__) +- Fixed a potential crash in ``DeviceEvents.__dealloc__`` when ``__init__`` + raised before the NVML event set was created, due to freeing an uninitialized + handle. + (`#2047 `__) +- Linux release wheels are now stripped of debug symbols, significantly reducing + package size. Debug builds are now supported via + ``--config-settings=debug=true``. + (`#1890 `__) diff --git a/cuda_core/docs/source/support.rst b/cuda_core/docs/source/support.rst new file mode 100644 index 00000000000..3a6548ce204 --- /dev/null +++ b/cuda_core/docs/source/support.rst @@ -0,0 +1,87 @@ +.. SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +.. SPDX-License-Identifier: Apache-2.0 + +.. _cuda-core-support: + +``cuda.core`` Support Policy +============================ + +Versioning Scheme +----------------- + +``cuda.core`` follows `Semantic Versioning (SemVer) `_ with the version +format ``major.minor.patch``: + +- **Major**: Bumped when a new CUDA major release is out and support for the oldest CUDA major + version is dropped. Breaking API changes only happen at major-version boundaries. +- **Minor**: Bumped when new, backward-compatible features are added, or when a new Python feature + release is out and the oldest supported Python version reaches EOL. +- **Patch**: Bumped for bug fixes and backward-compatible maintenance updates. + +Unlike ``cuda.bindings``, the ``cuda.core`` version is *not* aligned with the CUDA Toolkit version. +Consult the table below or the :doc:`release notes ` to determine which CUDA versions are +supported by a given ``cuda.core`` release. 
+ +Project Lifecycle & Release Cadence +*********************************** + +- ``cuda.core`` follows its own release cadence, independent of CUDA Toolkit releases, as long as + SemVer guarantees are maintained. + + - We currently aim for bimonthly releases, though this is subject to change. + +- Major version releases are aligned to CUDA major version releases. +- New features may be delivered in minor releases at any time — not gated by the CUDA Toolkit + release schedule. +- Patch releases can be made on an as-needed basis, subject to urgency and the team's bandwidth. +- We currently do not plan to maintain multiple releases, nor to backport new features or bug fixes. +- Deprecation notices will be issued for at least one (1) minor release before the actual removal + happens. + +CUDA Version Support +-------------------- + +``cuda.core`` is actively maintained to support the two (2) most recent CUDA major versions. For +example, ``cuda.core`` 1.x supports CUDA 12 and 13. + +In particular, all CUDA minor versions within the two supported major releases +(12.x, 13.x) are covered by the same ``cuda-core`` package. + +When a new CUDA major version is released and support for the oldest major version is dropped, +``cuda.core`` will release a new major version (e.g., 1.x → 2.0.0). + +.. list-table:: CUDA Version Support Matrix + :header-rows: 1 + + * - ``cuda.core`` version + - Supported CUDA versions + * - 1.x + - 12, 13 + +As with any CUDA library, certain features may impose additional requirements on the minimum +``cuda-bindings``, CUDA library, or CUDA driver versions. Refer to the individual module +documentation for details. + +Python Version Support +---------------------- + +``cuda.core`` supports all Python versions following the `CPython EOL schedule +`_. As of writing, Python 3.10 – 3.14 are supported. 
+ +When a new Python feature version is released and the oldest supported version reaches EOL, +``cuda.core`` will bump its minor version accordingly. + +Free-threading Build Support +---------------------------- + +Starting with ``cuda-core`` 0.4.0, packages for the `free-threaded interpreter +`_ are shipped to PyPI and conda-forge. +This support is currently *experimental*. + +For now, you are responsible for making sure that calls into the underlying CUDA libraries +are thread-safe. This is subject to change. + +---- + +The NVIDIA CUDA Python team reserves the right to amend the above support policy. Any major changes, +however, will be announced to users in advance. diff --git a/cuda_core/tests/test_experimental_backward_compat.py b/cuda_core/tests/test_experimental_backward_compat.py deleted file mode 100644 index 98af4a9557a..00000000000 --- a/cuda_core/tests/test_experimental_backward_compat.py +++ /dev/null @@ -1,124 +0,0 @@ -# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -# -# SPDX-License-Identifier: Apache-2.0 - -""" -Tests for backward compatibility of cuda.core.experimental namespace. - -These tests verify that the experimental namespace forwarding stubs work -correctly and emit appropriate deprecation warnings. - -Note: This test function is assumed to be the only function importing -cuda.core.experimental in the test suite to avoid race conditions when -tests run in parallel. -""" - -import sys - -import pytest - - -def test_experimental_backward_compatibility(): - """Test backward compatibility of cuda.core.experimental namespace. - - This single test function combines all experimental namespace tests to - avoid race conditions when tests run in parallel. All tests that need to - verify deprecation warnings or module state should be in this function. 
- """ - # Defensive: ensure module is not cached (handles case where it might - # already be imported by other tests or conftest) - if "cuda.core.experimental" in sys.modules: - del sys.modules["cuda.core.experimental"] - - # Test 1: Main module import - should emit deprecation warning - with pytest.deprecated_call(): - import cuda.core.experimental - - # Test that symbols are accessible - assert hasattr(cuda.core.experimental, "Device") - assert hasattr(cuda.core.experimental, "Stream") - assert hasattr(cuda.core.experimental, "Buffer") - assert hasattr(cuda.core.experimental, "system") - - # Test 2: Direct imports - should emit deprecation warning - # Clear cached module again to ensure warning is emitted - del sys.modules["cuda.core.experimental"] - - with pytest.deprecated_call(): - from cuda.core.experimental import ( - Buffer, - Device, - Stream, - ) - - # Verify objects are usable - assert Device is not None - assert Stream is not None - assert Buffer is not None - - # Test 3: Symbols are the same objects as core - import cuda.core - - # Compare classes/types - assert cuda.core.experimental.Device is cuda.core.Device - assert cuda.core.experimental.Stream is cuda.core.Stream - assert cuda.core.experimental.Buffer is cuda.core.Buffer - assert cuda.core.experimental.MemoryResource is cuda.core.MemoryResource - assert cuda.core.experimental.Program is cuda.core.Program - assert cuda.core.experimental.Kernel is cuda.core.Kernel - assert cuda.core.experimental.ObjectCode is cuda.core.ObjectCode - assert cuda.core.experimental.Event is cuda.core.Event - assert cuda.core.experimental.Linker is cuda.core.Linker - - # Compare singletons - assert cuda.core.experimental.system is cuda.core.system - - # Test 4: Utils module works - # Note: The deprecation warning is only emitted once at import time when - # cuda.core.experimental is first imported. 
Accessing utils or importing - # from utils does not trigger additional warnings since utils is already - # set as an attribute in the module namespace. - assert hasattr(cuda.core.experimental, "utils") - assert cuda.core.experimental.utils is not None - - # Should have expected utilities (no warning on import from utils submodule) - from cuda.core.experimental.utils import StridedMemoryView, args_viewable_as_strided_memory - - assert StridedMemoryView is not None - assert args_viewable_as_strided_memory is not None - - # Test 5: Options classes are accessible - assert hasattr(cuda.core.experimental, "EventOptions") - assert hasattr(cuda.core.experimental, "StreamOptions") - assert hasattr(cuda.core.experimental, "LaunchConfig") - assert hasattr(cuda.core.experimental, "ProgramOptions") - assert hasattr(cuda.core.experimental, "LinkerOptions") - assert hasattr(cuda.core.experimental, "DeviceMemoryResourceOptions") - assert hasattr(cuda.core.experimental, "VirtualMemoryResourceOptions") - - # Verify they're the same objects - assert cuda.core.experimental.EventOptions is cuda.core.EventOptions - assert cuda.core.experimental.StreamOptions is cuda.core.StreamOptions - assert cuda.core.experimental.LaunchConfig is cuda.core.LaunchConfig - - # Test 6: Memory-related classes are accessible - assert hasattr(cuda.core.experimental, "MemoryResource") - assert hasattr(cuda.core.experimental, "DeviceMemoryResource") - assert hasattr(cuda.core.experimental, "LegacyPinnedMemoryResource") - assert hasattr(cuda.core.experimental, "VirtualMemoryResource") - assert hasattr(cuda.core.experimental, "GraphMemoryResource") - - # Verify they're the same objects - assert cuda.core.experimental.MemoryResource is cuda.core.MemoryResource - assert cuda.core.experimental.DeviceMemoryResource is cuda.core.DeviceMemoryResource - - # Test 7: Objects can be instantiated through experimental namespace - # (No deprecation warning expected since module is already imported) - device = 
cuda.core.experimental.Device() - - assert device is not None - - # Verify it's the same type - from cuda.core import Device as CoreDevice - - assert isinstance(device, CoreDevice) diff --git a/cuda_python/DESCRIPTION.rst b/cuda_python/DESCRIPTION.rst index 6120a568023..90bf5c127a4 100644 --- a/cuda_python/DESCRIPTION.rst +++ b/cuda_python/DESCRIPTION.rst @@ -10,8 +10,8 @@ CUDA Python is the home for accessing NVIDIA's CUDA platform from Python. It con * `cuda.core `_: Pythonic access to CUDA Runtime and other core functionality * `cuda.bindings `_: Low-level Python bindings to CUDA C APIs * `cuda.pathfinder `_: Utilities for locating CUDA components installed in the user's Python environment -* `cuda.coop `_: A Python module providing CCCL's reusable block-wide and warp-wide *device* primitives for use within Numba CUDA kernels -* `cuda.compute `_: A Python module for easy access to CCCL's highly efficient and customizable parallel algorithms, like ``sort``, ``scan``, ``reduce``, ``transform``, etc. that are callable on the *host* +* `cuda.coop `_: A Python module providing CCCL's reusable block-wide and warp-wide *device* primitives for use within Numba CUDA kernels +* `cuda.compute `_: A Python module for easy access to CCCL's highly efficient and customizable parallel algorithms, like ``sort``, ``scan``, ``reduce``, ``transform``, etc. that are callable on the *host* * `numba.cuda `_: A Python DSL that exposes CUDA **SIMT** programming model and compiles a restricted subset of Python code into CUDA kernels and device functions * `cuda.tile `_: A new Python DSL that exposes CUDA **Tile** programming model and allows users to write NumPy-like code in CUDA kernels * `nvmath-python `_: Pythonic access to NVIDIA CPU & GPU Math Libraries, with `host `_, `device `_, and `distributed `_ APIs. It also provides low-level Python bindings to host C APIs (`nvmath.bindings `_). 
@@ -52,4 +52,6 @@ The list of available interfaces is: * NVRTC * nvJitLink * NVVM +* nvFatbin * cuFile +* NVML diff --git a/cuda_python/docs/source/index.rst b/cuda_python/docs/source/index.rst index 7aad94ef9c4..458a7a03229 100644 --- a/cuda_python/docs/source/index.rst +++ b/cuda_python/docs/source/index.rst @@ -20,8 +20,8 @@ multiple components: - `CUPTI Python`_: Python APIs for creation of profiling tools that target CUDA Python applications via the CUDA Profiling Tools Interface (CUPTI) - `Accelerated Computing Hub`_: Open-source learning materials related to GPU computing. You will find user guides, tutorials, and other works freely available for all learners interested in GPU computing. -.. _cuda.coop: https://nvidia.github.io/cccl/python/coop -.. _cuda.compute: https://nvidia.github.io/cccl/python/compute +.. _cuda.coop: https://nvidia.github.io/cccl/unstable/python/coop.html +.. _cuda.compute: https://nvidia.github.io/cccl/unstable/python/compute/index.html .. _numba.cuda: https://nvidia.github.io/numba-cuda/ .. _cuda.tile: https://docs.nvidia.com/cuda/cutile-python/ .. _nvmath-python: https://docs.nvidia.com/cuda/nvmath-python/latest @@ -50,8 +50,8 @@ be available, please refer to the `cuda.bindings`_ documentation for installatio cuda.core cuda.bindings cuda.pathfinder - cuda.coop - cuda.compute + cuda.coop + cuda.compute numba.cuda cuda.tile nvmath-python
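The release notes above describe new enums in ``cuda.core.typing`` that can be used interchangeably with the small sets of string values the APIs already accept. A minimal, self-contained sketch of that string-or-enum acceptance pattern (the class name, member names, and values here are illustrative assumptions, not the actual ``cuda.core.typing`` definitions):

```python
import enum


class SourceCodeType(str, enum.Enum):
    # Illustrative stand-in for one of the new cuda.core.typing enums;
    # the member names and values are assumptions, not the real definition.
    CPP = "c++"
    PTX = "ptx"


def normalize_code_type(value):
    # Because the enum mixes in str, a plain string and the corresponding
    # enum member both resolve to the same member via value lookup.
    return SourceCodeType(value)


# Callers may keep passing plain strings...
assert normalize_code_type("c++") is SourceCodeType.CPP
# ...or pass enum members for better linting and type-checking.
assert normalize_code_type(SourceCodeType.PTX) is SourceCodeType.PTX
```

Mixing ``str`` into the enum also keeps comparisons like ``SourceCodeType.PTX == "ptx"`` working, so existing string-based call sites remain valid.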