This scheduling logic distributes warps evenly across *all* cores, rather than filling the first cores to capacity before moving on to the next. This is necessary because the cores within a cluster are assumed to receive equal workloads.
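To make the contrast concrete, here is a minimal sketch of the two placement policies. It is an illustration only, not the actual scheduler: the function names `distribute_evenly` and `fill_first`, and the `warps_per_core` capacity parameter, are hypothetical and do not come from the original code.

```python
def distribute_evenly(num_warps: int, num_cores: int) -> list[int]:
    """Spread warps over all cores so per-core counts differ by at most one.

    Hypothetical helper: returns the number of warps assigned to each core.
    """
    if num_cores <= 0 or num_warps < 0:
        raise ValueError("num_cores must be positive and num_warps non-negative")
    base, extra = divmod(num_warps, num_cores)
    # The first `extra` cores take one additional warp to absorb the remainder.
    return [base + (1 if core < extra else 0) for core in range(num_cores)]


def fill_first(num_warps: int, num_cores: int, warps_per_core: int) -> list[int]:
    """Fill each core to capacity before moving to the next (for contrast).

    `warps_per_core` is an assumed per-core capacity, not a value from the source.
    """
    if num_cores <= 0 or num_warps < 0 or warps_per_core <= 0:
        raise ValueError("invalid arguments")
    counts = []
    remaining = num_warps
    for _ in range(num_cores):
        take = min(remaining, warps_per_core)
        counts.append(take)
        remaining -= take
    if remaining > 0:
        raise ValueError("not enough capacity for all warps")
    return counts


if __name__ == "__main__":
    print(distribute_evenly(10, 4))  # [3, 3, 2, 2]
    print(fill_first(10, 4, 4))      # [4, 4, 2, 0]
```

With 10 warps on 4 cores, the even policy keeps every core busy with 2 or 3 warps, while the fill-first policy leaves the last core idle, which would violate the equal-workload assumption.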