  1. GitHub - ROCm/tensorcast

    A "cast" is the conversion of a tensor from one datatype to another. A conversion can include compressed tensors that pack values and scaling information (actual cast) or regular torch tensors …

  2. GitHub - ROCm/TransformerEngine

    Currently, we have integrated Triton kernels for cast_transpose and cast_transpose_bgrad, which are commonly used in fp8 training, and also rmsnorm kernels. This feature is still experimental as it …

  3. [BF16] GPU Implementation · Issue #3519 · ROCm/AMDMIGraphX

    Oct 9, 2024 · Idea: cast FP32/FP16 to BF16. Casting differs by source type. FP32 to BF16: truncate the last 16 bits of the mantissa; the exponent stays the same. FP16 to BF16: a more involved process - …

  4. gfx1151 encounters invalid data cast · Issue #1136 · ROCm/TheRock

    Jul 28, 2025 · When I use comfyui and play with fox moving video using wan2.1, an invalid value cast appears in the VAE process. Checkpoint files will always be loaded safely. Total VRAM 98304 MB, total …

  5. hipMallocPitch requires cast to void** · Issue #2477 · ROCm/hip - GitHub

    Feb 8, 2022 · Hello, this is a minor issue, but it is quite inconvenient -- one needs to cast the first parameter of hipMallocPitch to void** in HIP. In CUDA (cudaMallocPitch), the cast is not needed.

  6. [FP8] add autocast_fp8 pass for the Python API #2447 - GitHub

    Nov 16, 2023 · umangyadav mentioned this on Dec 1, 2023 [FP8] Update ONNX operator parsing for CastLike/Cast operation to account for Saturate #2497

  7. Undefined behaviour in f8_utils (reinterpret_cast) #1439 - GitHub

    Aug 5, 2024 · In the file f8_utils.hpp I found several occurrences of reinterpret casts such as: * (reinterpret_cast<const Y*> (&retval)) From my (shallow) understanding of the code, this is undefined …

  8. [Fp16] MIOpen integration or layout transpose issue with FP16 ... - GitHub

    Nov 1, 2023 · From the frameworks perspective, it would be best if the cast kernel clamped into finite range for forward but not backward convolutions. If that is too hard, then it should not clamp at all.

  9. [gfx1030] [ROCM 5.2.3] [rocm-arch]MIOpen (HIP): Error [Do ... - GitHub

    Sep 17, 2022 · > `(HIP): Error [Do] 'amd_comgr_do_action(kind, handle, in.GetHandle(), out.GetHandle())' AMD_COMGR_ACTION_COMPILE_SOURCE_TO_BC: ERROR (1) …

  10. Support the Castlike ONNX Operator #2151 - GitHub

    Sep 5, 2023