rocm-kernel-examples: Custom HIP/ROCm kernel examples (matrix ops, attention, flash-attention), optimized for the AMD CDNA architecture