Like haste said, asynchronous compute only works well when there are currently unutilized resources for it to fill. Since Nvidia GPUs tend to be more efficient and run at higher utilization, it stands to reason that they would benefit less from asynchronous compute. You see the same thing within AMD's own lineup: Navi does not benefit from asynchronous compute nearly as much as Vega, because Navi has higher efficiency/utilization. The fact that Turing GPUs take a small performance hit, though, tells me that Nvidia needs to tune the drivers a bit more so they can make better decisions, based on available resources, about whether to run the work in parallel or serially.
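
To put rough numbers on that argument, here is a toy Python model. It is my own simplification, not how any real driver or GPU scheduler works: the function names, the millisecond figures, and the fixed `overhead_ms` scheduling cost are all made up for illustration. It assumes compute work can be overlapped perfectly into whatever capacity the graphics work leaves idle, and anything that does not fit runs afterwards.

```python
def frame_time(graphics_ms, compute_ms, utilization, async_on, overhead_ms=0.0):
    """Toy frame-time model (hypothetical, for illustration only).

    utilization: fraction of the GPU kept busy by graphics work (0.0 to 1.0).
    overhead_ms: assumed fixed cost of scheduling work asynchronously.
    """
    if not async_on:
        # Serial mode: compute runs only after graphics has finished.
        return graphics_ms + compute_ms

    # Capacity left idle while the graphics work is running.
    idle_capacity_ms = (1.0 - utilization) * graphics_ms
    # Compute work that does not fit into the idle capacity still runs afterwards.
    leftover_ms = max(0.0, compute_ms - idle_capacity_ms)
    return graphics_ms + leftover_ms + overhead_ms


def async_gain_percent(graphics_ms, compute_ms, utilization, overhead_ms=0.0):
    """Percent frame-time reduction from async compute; negative means a slowdown."""
    serial = frame_time(graphics_ms, compute_ms, utilization, async_on=False)
    overlapped = frame_time(graphics_ms, compute_ms, utilization,
                            async_on=True, overhead_ms=overhead_ms)
    return 100.0 * (serial - overlapped) / serial


if __name__ == "__main__":
    # Made-up workload: 10 ms of graphics, 2 ms of compute, 0.2 ms async overhead.
    for utilization in (0.7, 0.9, 1.0):
        gain = async_gain_percent(10.0, 2.0, utilization, overhead_ms=0.2)
        print(f"utilization {utilization:.0%}: async gain {gain:+.1f}%")
```

In this sketch the gain shrinks as utilization goes up, and once there is no idle capacity left, the assumed overhead turns it into a small loss. That is the same pattern as Vega vs. Navi and the small Turing hit, though the real causes on actual hardware may well differ.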