It is strange: I have everything in MSI-X except USB, and still I have 350k interrupts on core 0, while the other 7 cores only have 500-7000. I tried setting interrupt affinity in regedit, but it doesn't work. Even LatencyMon support said these settings can be overridden at the driver/hardware level. This doesn't make sense; I would expect interrupts to be spread evenly across all cores. My PC isn't even that old - an i7 3700 and B75M chipset, which isn't the best, but still. I am getting a new PC, so I will test it there too. But it is strange. It is not like it is 2005 anymore... I would expect this to work properly...
Spreading interrupts across cores in MSI mode is active only in server editions, and mostly for network controllers. And be aware that an interrupt is handled in two stages: first the Interrupt Service Routine (ISR) is invoked in response to the interrupt from the device, and then the ISR schedules the main part of the interrupt handling as a Deferred Procedure Call (DPC). The two stages can execute on different cores. PS: Also note that interrupts can only be spread across cores when a device and its driver use multiple MSIs. But in client editions of Windows we usually do not see devices with multiple MSIs. To check whether a device uses multiple MSIs, go to Device Manager, switch the view to "Resources by type", and expand the IRQ root node - if you see multiple nodes for the same device, that device uses multiple MSIs. PPS: I forgot to ask a question - why do you think that having 350k interrupts on core 0 is a problem?
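For reference, the per-device switch that tools like MSI util flip lives under the device's instance key in the registry. A sketch of the layout (the device instance path is a placeholder - it varies per device):

```
; HKLM\SYSTEM\CurrentControlSet\Enum\PCI\<device-instance>\Device Parameters\
;   Interrupt Management\MessageSignaledInterruptProperties
;
;   MSISupported       (DWORD) = 1  ; device runs in MSI mode, 0 = line-based
;   MessageNumberLimit (DWORD)      ; caps the number of messages; only
;                                   ; matters when the device uses multiple MSIs
```

Back up the key before editing, and note that a driver reinstall can rewrite these values.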
There is a utility by MS, "interrupt_affinity_policy_tool.msi", that supposedly lets you move interrupts to different cores, which would spread out the workload you see in LatencyMon. I never messed with it and don't know if all hardware is supported.
One key reason for this is that core 0 is the core least likely to ever be in a sleep state, so interrupts with a realtime or high priority fire on this core to keep the transitional delay as small as possible.
It is obviously a problem: since all interrupts are handled on core 0, interrupts are handled more slowly. I have good DPC latency, but I could still use improvement. I read that on a multicore system the OS should spread interrupts across multiple cores for better performance, but keep one core per driver, since bouncing a driver between cores can cause cache misses, long read times, and degraded performance. But there is no reason it shouldn't use multiple cores for handling interrupts... I said I have everything in MSI/MSI-X mode, so drivers should use multiple cores. However they don't!!! The biggest DPC latency source is the GPU, for example. On the NVIDIA forums, an NVIDIA dev said: NVIDIA specifies MSI-X support and the number of channels in its driver code; the registry settings are only for older devices that couldn't handle more channels; and the registry setting shouldn't matter. Yet if I set the GPU to MSI mode in the registry I get less input lag, although all interrupts are still handled on core 0. And for some reason MSI util shows only MSI mode as supported... By default my GPU is in line-based interrupt mode, and if LatencyMon shows correct results, my GPU driver doesn't use multiple cores for interrupt handling - that's terrible. So either NVIDIA is lying, or LatencyMon shows incorrect results; I have checked all cores in LatencyMon's settings. "astyanax": No CPU is at 100% load all the time... "edkiefer": Yeah, and that interrupt affinity tool, or setting it manually under the device ID with affinity policy "DevicePolicy" = 3 etc., doesn't do anything; LatencyMon support said drivers and hardware can override this setting. Already tried it, didn't help... EDIT: Wait a minute. After setting the GPU manually to MSI mode (because MSI util lists only MSI mode under "MSI supported" - strange) and using that utility to select core 1, the same amount of DPCs generated by the GPU now shows up on core 1, where previously there was nothing. So it works now - cool!
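For anyone trying the manual registry route: this is the layout the Interrupt Affinity Policy Tool writes, sketched as a .reg fragment. The device instance path is a placeholder, and the values shown (policy 4, core 1) are just an example of pinning a device to one core:

```
Windows Registry Editor Version 5.00

; Substitute your device's actual instance key for <device-instance>.
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\PCI\<device-instance>\Device Parameters\Interrupt Management\Affinity Policy]
; 4 = IrqPolicySpecifiedProcessors: restrict the interrupt to the
; processors named in AssignmentSetOverride.
"DevicePolicy"=dword:00000004
; Bitmask of allowed processors; 0x02 = core 1 only.
"AssignmentSetOverride"=hex:02
```

Note that a DevicePolicy of 3 means "all processors in machine", which is why it may appear to do nothing; pinning to a specific core needs policy 4 plus the mask. As LatencyMon support said, the driver or hardware can still override this.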
??? Even on expensive mobos like ASUS ROG, people have DPC latency issues... And I even read in an article that interrupts should be spread across all cores for better performance, and it is logical. So it is you who doesn't comprehend the importance of this. I spread my interrupts across multiple cores and now have much lower input lag.
But DPCs do not have to be processed on core 0. They can be scheduled to any core. Have you checked the spread of DPCs across cores? PS: You can also use this utility https://forums.guru3d.com/threads/windows-power-plan-settings-explorer-utility.416058/ to set the Interrupt Steering Mode. PS: I guess the "Target Load" setting sets the threshold of core load for a core to be selected as a target for interrupts, but whether it means "higher than" or "lower than" is unclear. "Less than" seems more logical to me.
Also, you misunderstand multi-core and MSI mode. Here is the usage: https://docs.microsoft.com/en-us/wi.../network/introduction-to-receive-side-scaling When a device uses multiple MSIs, it can route each MSI to a separate core. When all devices use one MSI each (one MSI per device), you can only isolate them from each other by assigning them (affinity) to different cores. PS: Overall, MSI mode just improves the first stage of interrupt handling - the ISR. I mean, a legacy-mode interrupt can be assigned to different cores too.
Do you even read my posts??? I said in my first post that everything is on core 0 and that I had checked all cores in LatencyMon's settings. Only after I manually switched the GPU to MSI mode did interrupt affinity start working. And it is strange, because the NVIDIA dev said MSI support is specified in NVIDIA's driver code, so it should work by default. But it seems not to: I get lower latency when I switch the GPU to MSI, but if it already ran in MSI, the latency shouldn't change. Also, why does MSI util say under "MSI supported" that my GPU supports only MSI and not MSI-X? I have an NVIDIA 780 on PCIe gen 3; I checked with GPU-Z, it is working.
Be aware that any installation of NV drivers will revert the GPU to legacy mode. I have read your post, but I just do not see the difference between MSI mode and legacy mode in terms of CPU affinity. They should behave the same. If you are sure MSI mode changes things, then obviously there are factors I am not aware of. As for "MSI" vs "MSI-X": they are the same mode and differ only in some numbers, so do not bother with that. I have seen "MSI-X" support only on NVMe, RAID, and USB 3.1 controllers.
Because I said I have 350k interrupts on core 0 and the rest are unused, but whatever. That it changes to legacy after a driver install is strange; usually after a driver installation it changes from legacy to MSI, like with a NIC, for example. In that link about RSS you sent me, it said MSI and MSI-X have some benefits and are better than single-line interrupts, and that up to 2048 message channels can be supported. I still don't understand how it is possible that I get much less input lag if I switch the GPU to MSI, while the NVIDIA dev said the registry setting shouldn't matter and that NVIDIA specifies this in its driver code. If that were true, there should be no difference. I should test it with LatencyMon with and without MSI. EDIT: So I did 2 tests, one with MSI on and one with it off. I had exactly the same things open - the loads will never be identical, but big differences should still show - and I played a replay from the game, so GPU load should be the same. https://imgur.com/a/sAvQUDS The run with MSI on had significantly lower execution times. One thing: when MSI is on, it doesn't show the ISR, so I don't know if that counts in the total execution time, but the DPC was also lower with MSI on.
LatencyMon is not 100% safe proof/evidence. It can have bugs. I say this because, no matter whether the interrupt is in MSI or line mode, the device still produces interrupts (according to workload) and each interrupt consists of one ISR and one DPC. MSI mode just changes the way the CPU receives the interrupt from the device - directly from the PCIe bus, bypassing the legacy dedicated interrupt controller. After the CPU receives the interrupt it invokes the ISR provided by the driver, and the ISR schedules the DPC. The ISR part is more efficient in MSI mode, but it is still invoked in all modes, and the DPC part is exactly the same in all modes. These are the facts stated in the Microsoft books. If you want a more accurate picture/trace, use this method: https://forums.guru3d.com/threads/simple-way-to-trace-dpcs-and-isrs.423884/ Another thought came to me on the way home from work: when you statically assign the GPU's interrupts to another core, you get an improvement because that core is less loaded/occupied by work. But if you play a game that loads all cores to a high percentage, you will lose that improvement. Just a note.
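The method in that thread boils down to an xperf kernel trace. A sketch of the commands, assuming the Windows Performance Toolkit is installed and you run from an elevated prompt (verify the flag and action names against "xperf -help" on your install):

```
rem Start a kernel trace that includes DPC and interrupt events.
xperf -on PROC_THREAD+LOADER+DPC+INTERRUPT

rem ... reproduce the workload (play the replay, etc.) ...

rem Stop the trace and merge it into trace.etl.
xperf -d trace.etl

rem Summarize per-module DPC/ISR time into a text report.
xperf -i trace.etl -o dpcisr.txt -a dpcisr
```

Unlike LatencyMon's sampled view, the ETW trace records every ISR and DPC with its duration and the core it ran on, so it settles arguments like this one.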
Even if that is so, I can feel the difference with MSI on and off. I am very sensitive to input lag; I tested myself with an app that splits the screen into 2 halves - one with lag and one without - and I can tell down to a 6ms difference. Yeah, I know about the ADK, but I never used it; Performance Monitor can also measure DPC latency, sort of. I could do that test later with the ADK. I don't know if you would lose that improvement - why, though? If the CPU were maxed on all cores, interrupts should have priority, and even with all cores maxed, interrupts would be processed in parallel, so it should still be faster than on one core. Also, interrupts should pause whatever work the CPU is doing and take over, so it doesn't make much sense to me. Though I can imagine that constant 100% load on all cores could degrade interrupt handling performance, because scheduling is not perfect. I still play some games that don't max the CPU at all, so it helps. Still, this one baffles me. It is the strangest thing - NVIDIA not supporting MSI mode by default, and other people also saying that manually switching to MSI mode helped them because it reduces input lag a lot. If it worked in MSI by default, there shouldn't be any difference whatsoever!!! The NVIDIA driver still has relatively high execution times, up to 700us. Also, MSI util says the GPU supports only MSI and not MSI-X mode, and under "limit" there is nothing and under "max limit" there is 1; I don't know if these values are optimal. And even an Intel document says MSI-X is better and has lower latency, and I agree.
That's a secret to me and all NV video card users - why NV defaults its drivers to legacy mode. And of course MSI mode is more efficient than legacy mode - no argument here. Some people here have stated that they solved problems like stuttering or audio popping by switching to MSI mode. Now to the limit value(s). The limit value comes into play only when a device uses more than one MSI - which looks like multiple (negative) IRQs for one device in Device Manager ("Resources by type" view) - so that you can limit the number of MSIs for such a device. When you see only one IRQ for the device, it does not use multiple MSIs, hence the limit value has no effect (there is nothing to limit). Update: Found a screenshot of the MSI utility with devices using multiple MSIs - see the multiple entries for the SATA and Solarstorm (that's a NIC) controllers. PS: About interrupt affinity, quoting: Spoiler: Interrupt Affinity. On systems that both support ACPI and contain an APIC, Windows enables driver developers and administrators to somewhat control the processor affinity (selecting the processor or group of processors that receives the interrupt) and the affinity policy (selecting how processors will be chosen and which processors in a group will be chosen). Furthermore, it enables a primitive mechanism of interrupt prioritization based on IRQL selection. Affinity policy is defined according to Table 3-1, and it's configurable through a registry value called InterruptPolicyValue in the Interrupt Management\Affinity Policy key under the device's instance key in the registry. Because of this, it does not require any code to configure - an administrator can add this value to a given driver's key to influence its behavior. Microsoft provides such a tool, called the Interrupt Affinity Policy Tool.
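Since Table 3-1 is not reproduced in the quote above, here are the policy values behind InterruptPolicyValue for reference, as defined by the IRQ_DEVICE_POLICY enumeration in the WDK headers:

```
0  IrqPolicyMachineDefault                     let the OS decide
1  IrqPolicyAllCloseProcessors                 processors close to the device (NUMA)
2  IrqPolicyOneCloseProcessor                  one processor close to the device
3  IrqPolicyAllProcessorsInMachine             any processor in the machine
4  IrqPolicySpecifiedProcessors                only processors in AssignmentSetOverride
5  IrqPolicySpreadMessagesAcrossAllProcessors  spread multiple MSIs across processors
```

Value 5 is the one that actually distributes interrupts, and it only applies to devices with multiple MSIs.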
That's strange, since I thought the whole point of message-signaled interrupts was the use of parallel interrupts. PCIe supports up to 2048 channels, aka MSIs. I am not a hardware expert, and these things are hard to find - if it is even possible to find this information without being a hardware expert or an NVIDIA dev. Unless I find some article or explanation, I won't know how many MSIs is ideal... That a GPU would use only a single one is weird. Why does the user have to solve this? One would expect the devs to correct things like this. Anyway, I am happy now, since affinity is working and MSI helps substantially with input lag. Things like USB not working in MSI are frustrating, because you can't do anything about it; it is up to the driver vendor to fix... Luckily I will be getting a new PC, so... AnandTech even does DPC latency tests for mobos; even if they are relative, they tell you whether a board is complete crap. Some expensive ASUS ROG mobos even had overly high DPC latency. It is kind of a big deal; I can really discern the difference...
The first point of MSI mode is to improve the ISR stage of interrupt handling - https://www.intel.co.za/content/dam...hite-papers/msg-signaled-interrupts-paper.pdf The second point of MSI is to use parallel interrupts. But that is most relevant in heavy server loads. For example, a web server or SQL server with tens of CPUs, where you improve network latency with multiple MSIs for the NIC, each assigned to a separate CPU. Or a file storage server with a RAID controller. In home or office usage the benefits of multiple MSIs are less obvious, and hardware with multiple-MSI support is not cheap. It may even be that Microsoft and OEMs strip the multiple-MSI feature from low-end hardware to keep users from using it in servers. PS: Only USB v2 doesn't work in MSI mode. Newer USB versions do.
When you set "Disable idle" in the processor idle settings under Processor Power Management, then it shouldn't matter, correct?