> Modern GPUs will go: huh, it sure would be cool if we just shifted the threads about to produce two non divergent warps, and bam divergence solved at the hardware level
Could you share a source for this? Shader Execution Reordering (SER) is available for ray tracing, but it is not a general-purpose feature that can be used in generic compute shaders.
> Divergent threads can have a better throughput than you'd expect on a modern GPU, as they get more capable at handling this. Divergence isn't bad, its just something you have to manage - and hardware architectures are rapidly improving here
I would strongly advise against relying on this. GPUs are most efficient when neighboring threads within a warp access neighboring data and follow largely the same code path. Even across warps, data locality is highly desirable.
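
To put rough numbers on that, here is a toy model in Python, not a measurement of real hardware. It assumes a 32-thread warp, that a warp executes one serialized pass per distinct branch its threads take, and that a load costs one transaction per distinct 128-byte segment touched. The function names and the 128-byte granularity are assumptions made for illustration; actual behavior varies by architecture.

```python
WARP_SIZE = 32
SEGMENT_BYTES = 128  # assumed memory transaction granularity (toy model)

def branch_passes(branch_ids):
    """Serialized passes for one warp: one per distinct branch taken."""
    return len(set(branch_ids))

def memory_transactions(byte_addresses):
    """Number of distinct 128-byte segments touched by one warp's loads."""
    return len({addr // SEGMENT_BYTES for addr in byte_addresses})

lanes = range(WARP_SIZE)

# Control flow: every thread takes the same branch vs. odd/even lanes split.
uniform = [0 for _ in lanes]
divergent = [lane % 2 for lane in lanes]

# Memory: neighboring 4-byte elements vs. one element every 128 bytes.
coalesced = [4 * lane for lane in lanes]
strided = [SEGMENT_BYTES * lane for lane in lanes]

print(branch_passes(uniform), branch_passes(divergent))
print(memory_transactions(coalesced), memory_transactions(strided))
```

In this simplified model the odd/even split doubles the passes for the branch, and the strided access pattern needs 32 transactions where the neighboring one needs a single transaction.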