:/ That's not what async is meant to do. Some computations have to be made serially, because each one depends on the result of the one before it. Async makes no difference whatsoever there (and you are correct that Nvidia is extremely efficient in that case, no argument). Other computations can be completed asynchronously, meaning they don't depend on the computation ahead of them. If, like Maxwell, a chip does not possess the ability to do async compute, all of those computations that could have run alongside other work will be done serially instead. No matter how perfectly you program for efficiency, the async-capable chip will always outperform on that kind of work; just look at programs that efficiently take advantage of hyper-threading (a loosely similar idea on CPUs). As developers become more proficient at applying async, there will be larger and larger disparities between chips of differing async capability.
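
To make the serial vs. independent distinction concrete, here's a rough sketch in Python. It uses CPU threads as a stand-in, so it's only an analogy and not GPU async compute; the names `step`, `dependent_chain` and `independent_batch` are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def step(x):
    # Stand-in for one unit of work (hypothetical workload).
    return x * 2 + 1


def dependent_chain(x, n):
    # Each step needs the previous result, so the steps can only
    # run one after another. Extra execution units don't help here.
    for _ in range(n):
        x = step(x)
    return x


def independent_batch(inputs, workers=4):
    # No input depends on another's result, so the work can be
    # spread across workers and overlapped.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(step, inputs))


if __name__ == "__main__":
    print(dependent_chain(1, 3))         # 1 -> 3 -> 7 -> 15
    print(independent_batch([1, 2, 3]))  # [3, 5, 7]
```

The first function is the case where async buys you nothing; the second is the case where hardware that can overlap work has something to gain.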