The HGMMA_WAIT instruction stalls at issue while inuse_tensor is set. The flag is set by the preceding HGMMA instruction. Currently, inuse_tensor is never reset to zero.
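
The interaction described above can be illustrated with a minimal toy model. This is only a sketch under stated assumptions: `WarpState`, `try_issue`, and `cycles_until_issue` are hypothetical names invented for illustration, not part of any actual code base, and the model contains only the two behaviours the note mentions (HGMMA sets the flag, HGMMA_WAIT stalls while it is set). In this toy model, with no code path that clears the flag, a HGMMA_WAIT that follows a HGMMA does not issue within the cycle bound.

```python
from dataclasses import dataclass


@dataclass
class WarpState:
    # Hypothetical stand-in for the inuse_tensor flag named in the note.
    inuse_tensor: bool = False


def try_issue(state: WarpState, opcode: str) -> bool:
    """Return True if the instruction issues this cycle, False if it stalls."""
    if opcode == "HGMMA":
        # HGMMA marks the tensor unit as in use when it issues.
        state.inuse_tensor = True
        return True
    if opcode == "HGMMA_WAIT":
        # HGMMA_WAIT stalls at issue while the flag is set.
        return not state.inuse_tensor
    return True


def cycles_until_issue(state: WarpState, opcode: str, max_cycles: int = 1000):
    """Retry the instruction each cycle; return the cycle it issued, or None."""
    for cycle in range(max_cycles):
        if try_issue(state, opcode):
            return cycle
    return None  # never issued within the bound


state = WarpState()
issued_hgmma = try_issue(state, "HGMMA")
# Nothing in this model clears inuse_tensor, so the wait keeps stalling.
stalled = cycles_until_issue(state, "HGMMA_WAIT")
```

In this sketch the missing piece is any step that sets `state.inuse_tensor = False` (for example, when the modeled HGMMA completes). Where that reset belongs in the real pipeline is not stated in the note and is left open here.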