requires cuda, hip-runtime-amd, nccl, rccl