Microsoft Phases Out 32-Bit Support for Windows 10

Discussion in 'Frontpage news' started by Hilbert Hagedoorn, May 14, 2020.

Page 2 of 3

Astyanax Ancient Guru

Messages: 17,044 | Likes Received: 7,380 | GPU: GTX 1080ti

Richard Nutman said:

> They need to port every MS Windows app to 64bit as well. Every 32bit process is slowing down people's systems.

That's not how any of this works.

In fact, on 64bit processors, using 64bit can mean REDUCED performance, as the data size can exceed the cache.

Astyanax, May 14, 2020

#21

wavetrex Ancient Guru

Messages: 2,465 | Likes Received: 2,579 | GPU: ROG RTX 6090 Ultra

Astyanax said:

> That's not how any of this works.
>
> In fact, on 64bit processors, using 64bit can mean REDUCED performance, as the data size can exceed the cache.

That's not how it works either.

32-bit memory management means copying chunks of data left and right all the time, and every 32-bit program runs in a thin-layer virtualized environment. Virtualization costs cycles, much more than running native 64-bit code (for which memory management can also be simpler, as any quantity of memory can be contiguous and pointers can represent actual memory, instead of having to use TLBs (translation lookaside buffers), which consume energy and compute power).

In NO circumstance does a 64-bit program run slower than a 32-bit one on a 64-bit CPU and OS!

wavetrex, May 14, 2020

#22

Astyanax Ancient Guru

wavetrex said:

> That's not how it works either.

Yeah, actually, it does.
Especially when handwriting SSE and using packed data sets.

wavetrex said:

> In NO circumstance does a 64-bit program run slower than a 32-bit one on a 64-bit CPU and OS!

And you'd be wrong.
Under a number of conditions, the same code with the same optimizations can perform worse in a 64bit binary than in a 32bit one if inline expansion results in data exceeding the cache (cache misses).

This becomes less of an issue as the instruction sets and the CPUs offering them get more advanced, but it can still produce performance differences even on a 9900K.

Many developers are using x64 binaries to ignore the need to optimize memory usage and manage it appropriately. There are games out there now using 10GB of memory that, based on what they are actually loading, could get by with a third of that (stares blankly at AOEE).

Last edited: May 14, 2020

Astyanax, May 14, 2020

#23

Richard Nutman Master Guru

Messages: 268 | Likes Received: 121 | GPU: Sapphire 7800XT

Astyanax said:

> That's not how any of this works.
>
> In fact, on 64bit processors, using 64bit can mean REDUCED performance, as the data size can exceed the cache.

Actually it is. 32bit compiled code cannot make use of all the hardware resources in the chip, so it's like running a crippled cpu whenever the OS schedules a 32bit app to execute.
This means the quantum timeslice it gets is not returned as quickly as it could be, thus making other applications wait longer.

The main reasons for this are:
1. In 32bit mode you cannot use the extra 8 registers that were added to x86_64. This means loops with many variables are constantly juggling them to the stack. This results in more instructions, slower code, a larger executable and more memory accesses.
2. In 32bit mode you only get access to half the SSE/AVX registers: only 8 instead of 16. Again, this results in more memory accesses as values are juggled.
3. Function calls in 64bit mode pass more variables in registers, so they can be significantly quicker and result in fewer push/pop instructions around the call.

It's not just about managing memory. With a 32bit memory space it's quite easy to fragment the virtual memory space such that you cannot make any more allocations, even if you have free physical memory.
With 64bit virtual memory spaces this is effectively eliminated.
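
As a rough illustration of the fragmentation point, here is a toy model in C. It is not how Windows actually manages virtual memory; the function names and the idea of fixed-size "chunks" are made up for this sketch. It only shows that an address space can have plenty of free chunks in total and still have no contiguous run big enough for a request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: each element of `used` is one fixed-size chunk of virtual
 * address space. Returns the length of the longest run of free chunks. */
size_t largest_free_run(const bool *used, size_t n)
{
    size_t best = 0, run = 0;
    for (size_t i = 0; i < n; i++) {
        run = used[i] ? 0 : run + 1;
        if (run > best)
            best = run;
    }
    return best;
}

/* Counts free chunks regardless of where they are. */
size_t total_free(const bool *used, size_t n)
{
    size_t free_chunks = 0;
    for (size_t i = 0; i < n; i++)
        if (!used[i])
            free_chunks++;
    return free_chunks;
}

/* An allocation needs `want` contiguous chunks, so it can fail even when
 * total_free() is much larger than `want`. */
bool can_allocate(const bool *used, size_t n, size_t want)
{
    assert(want > 0);
    return largest_free_run(used, n) >= want;
}
```

With every other chunk in use, half of the space is free but a request for two contiguous chunks still fails. A 64bit process has so much more address space to spread over that it rarely ends up in this situation.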

Also there's no reason the size of your data has to increase, unless you're storing lots of pointers, but there are workarounds for that.
Integer and floating point data don't change size simply because you're in 64bit mode. The exception is that on Linux a "long" is 64 bits, whereas it's still 32 bits on Windows.
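
A quick way to check this on your own machine is a small C helper like the one below. It is only a sketch (the function names are invented here), and the sizes in the comments assume mainstream x86 targets: the usual 64bit Linux data model versus the usual 64bit Windows one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Prints the sizes of common types for whatever target this is built for. */
void print_data_model(void)
{
    /* Fixed-width types are the same size in 32bit and 64bit builds. */
    assert(sizeof(int32_t) == 4);
    assert(sizeof(int64_t) == 8);

    printf("int:    %zu\n", sizeof(int));
    printf("long:   %zu\n", sizeof(long));   /* 8 on 64bit Linux, 4 on 64bit Windows */
    printf("void *: %zu\n", sizeof(void *)); /* 4 in 32bit builds, 8 in 64bit builds */
    printf("float:  %zu\n", sizeof(float));
    printf("double: %zu\n", sizeof(double));
}

/* Returns 1 if `long` is wide enough to hold a pointer on this target
 * (expected to be 0 on 64bit Windows, 1 on 64bit Linux and 32bit builds). */
int long_holds_pointer(void)
{
    return sizeof(long) >= sizeof(void *);
}
```

Building the same file as a 32bit and a 64bit executable and calling `print_data_model()` should show only the pointer size (and, depending on the OS, `long`) changing.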

Writing 32bit code for x86_64 chips is like having a V8 and only running on 4 cylinders. You're throwing performance away for no reason.

Last edited: May 14, 2020

Richard Nutman, May 14, 2020

#24
Astyanax Ancient Guru

Richard Nutman said:

> Actually it is. 32bit compiled code cannot make use of all the hardware resources in the chip, so it's like running a crippled cpu whenever the OS schedules a 32bit app to execute.

Highly optimised 32bit code does not need to use those extra resources, which only serve to slow the processor down and heat it up more when abused.

Richard Nutman said:

> It's not just about managing memory. With a 32bit memory space it's quite easy to fragment the virtual memory space such that you cannot make any more allocations, even if you have free physical memory.

Refer to: mismanagement.

You're not offering valid reasons to use one over the other.

Astyanax, May 14, 2020

#25
Richard Nutman Master Guru

Using more registers does not slow the cpu down or create more heat. It allows the compiler to produce more efficient and simpler/faster code.

I've just given you 3 reasons why 64bit code is superior. There are several more.

Richard Nutman, May 14, 2020

#26
Alessio1989 Ancient Guru

Messages: 2,959 | Likes Received: 1,246 | GPU: .

Having more registers doesn't automatically mean better performance on the same piece of code. For that kind of code, register renaming is a better solution. WOW64 subsystem overhead is typically negligible; upcasting a couple of pointers for some context switches is not an issue at all. Don't forget also that more registers mean a more complex pipeline, while fewer registers mean fewer bits for the source and destination registers in the binary for the same assembly, which means a shorter binary, which means a smaller code area in the executable, which means a lower probability of cache misses. If you do not need to take advantage of the x64 instruction set at all, 32-bit pointers, as Astyanax pointed out, also mean shorter data records, which means better use of the cache.

Last edited: May 14, 2020

Alessio1989, May 14, 2020

#27
Richard Nutman Master Guru

Alessio1989 said:

> Having more registers doesn't automatically mean better performance on the same piece of code. For that kind of code, register renaming is a better solution. WOW64 subsystem overhead is typically negligible; upcasting a couple of pointers for some context switches is not an issue at all. Don't forget also that more registers mean a more complex pipeline, while fewer registers mean fewer bits for the source and destination registers in the binary for the same assembly, which means a shorter binary, which means a smaller code area in the executable, which means a lower probability of cache misses. If you do not need to take advantage of the x64 instruction set at all, 32-bit pointers, as Astyanax pointed out, also mean shorter data records, which means better use of the cache.

You're right, it doesn't automatically mean better performance. But complex loops with lots of variables will invariably benefit.
You don't get access to register renaming, that's something the CPU does internally.
It's the programming interface that is restricted to 8 registers in 32bit mode. Nothing you can do about this.

The complexity of the pipeline doesn't change either. It's the same hardware running the code, you're just not using some parts of it.
Not using some registers doesn't change the size of the executable.

Here is a link with more detail;
https://www.viva64.com/en/a/0030/

Richard Nutman, May 14, 2020

#28
Alessio1989 Ancient Guru

Richard Nutman said:

> You're right, it doesn't automatically mean better performance. But complex loops with lots of variables will invariably benefit.

Complex loops are evil. Having more registers will not change the complexity.

Richard Nutman said:

> You don't get access to register renaming, that's something the CPU does internally.
> It's the programming interface that is restricted to 8 registers in 32bit mode. Nothing you can do about this.

Of course, sir.

Richard Nutman said:

> The complexity of the pipeline doesn't change either. It's the same hardware running the code, you're just not using some parts of it.

Of course, if we only talk about x86. But in general, having more registers doesn't mean a better architecture or better performance.

Richard Nutman said:

> Not using some registers doesn't change the size of the executable.

Using shorter pointers will, as will the binary generated from the assembly.

Richard Nutman said:

> Here is a link with more detail;
> https://www.viva64.com/en/a/0030/

Good points but nothing new for me.

None of this changes the fact that expecting Microsoft to rewrite every 32-bit executable as 64-bit (whatever that is supposed to mean) is just pointless. What modern software does Microsoft not provide on 64-bit? I am not aware of any. The 32-bit bits in Windows are meant for compatibility, and they are meant to stay.

Alessio1989, May 15, 2020

#29
alanm Ancient Guru

Messages: 12,277 | Likes Received: 4,484 | GPU: RTX 4080

Surprised it wasn't sooner. Even Linux distros have dropped 32-bit support already.

alanm, May 15, 2020

#30

Richard Nutman Master Guru

Alessio1989 said:

> Complex loops are evil. Having more registers will not change the complexity.
> Of course, sir.
>
> Of course, if we only talk about x86. But in general, having more registers doesn't mean a better architecture or better performance.
> Using shorter pointers will, as will the binary generated from the assembly.
>
> Good points but nothing new for me.
>
> None of this changes the fact that expecting Microsoft to rewrite every 32-bit executable as 64-bit (whatever that is supposed to mean) is just pointless. What modern software does Microsoft not provide on 64-bit? I am not aware of any. The 32-bit bits in Windows are meant for compatibility, and they are meant to stay.

"Complex loops are evil. Having more register will not change the complexity."

The logic is the same, but the code that implements it can become simpler.
If you have 12 variables and only 8 registers, you have to switch variables out to the stack. This isn't done automatically. The compiler has to generate code to do this.
If you have 16 registers you can hold all the variables in the cpu at once. The result is smaller more efficient code.
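
For anyone who wants to look at this themselves, a function along these lines keeps twelve values live across a loop. It is a made-up example (the name `mix12` and the arithmetic are arbitrary), and what the compiler actually does with it depends on the compiler, version and flags. The idea is to build it once as 32bit and once as 64bit (for example with gcc's `-m32` and `-m64` plus `-O2 -S`) and compare how much stack traffic ends up inside the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel with 12 accumulators that all stay live inside the
 * loop. With only 8 general-purpose registers (32bit x86), a compiler
 * typically has to keep some of them on the stack. */
uint32_t mix12(const uint32_t *in, size_t n)
{
    assert(in != NULL || n == 0);
    uint32_t a = 1, b = 2, c = 3, d = 4, e = 5, f = 6;
    uint32_t g = 7, h = 8, i = 9, j = 10, k = 11, l = 12;

    for (size_t idx = 0; idx < n; idx++) {
        uint32_t x = in[idx];
        a += x;      b ^= a;      c += b;      d ^= c;
        e += d;      f ^= e;      g += f;      h ^= g;
        i += h;      j ^= i;      k += j;      l ^= k;
    }
    /* Combine everything so no accumulator can be optimized away. */
    return a + b + c + d + e + f + g + h + i + j + k + l;
}
```

Note that the loop also needs the input pointer and a counter, so even a 64bit build may not fit absolutely everything in registers; the point is only that it has far more room to work with.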

"Good points but nothing new for me"

But it clearly explains why 64bit code is faster. Are you saying it's wrong?

You don't need to rewrite 32bit applications to be 64bit, you just recompile them in 64bit mode. The compiler will generate more efficient code.

"What modern software does Microsoft not provide on 64-bit? I am not aware"

Pretty much all their development tools and compilers for one thing. The Visual C++ compiler is a 32bit executable. It is not unheard of that it runs out of memory with large source files.

Last edited: May 15, 2020

Richard Nutman, May 15, 2020

#31
Yxskaft Maha Guru

Messages: 1,495 | Likes Received: 124 | GPU: GTX Titan Sli

alanm said:

> Surprised it wasn't sooner. Even Linux distros have dropped 32-bit support already.

Yeah, I agree. One could argue Microsoft should have longer security support since 32-bit users will be stuck with that build, but the average user really shouldn't be affected by this. 32-bit only hardware is just so old at this point.
I was already surprised five years ago that Windows 10 had such low requirements. I've never seen any tests for it, but on paper it should work on 2003 hardware.

Yxskaft, May 15, 2020

#32
Richard Nutman Master Guru

Here is another example;

https://godbolt.org/z/2psR3Y

On the left is some sample matrix multiply code.
The middle pane is compiled using gcc to 64bit code.
The right hand pane is compiled using gcc to 32bit code.

The 64bit code is 134 lines of instructions.
The 32bit code is 168 lines of instructions. You can see a lot more push/pop instructions.

The main loop at L24 has 3 more instructions in the 32bit code.
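
The godbolt link holds the actual sample. The snippet below is not that code, just a comparable naive 4x4 multiply in C for anyone who wants to reproduce the comparison locally; the `Mat4` type and the `mat4_mul` name are placeholders picked for this sketch. Compile it to assembly once as a 64bit target and once as a 32bit target and count the instructions and push/pop pairs yourself.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float m[4][4]; } Mat4;

/* Naive 4x4 matrix multiply: out = a * b. `out` must not alias a or b. */
void mat4_mul(Mat4 *out, const Mat4 *a, const Mat4 *b)
{
    assert(out != a && out != b);
    for (size_t r = 0; r < 4; r++) {
        for (size_t c = 0; c < 4; c++) {
            float sum = 0.0f;
            for (size_t k = 0; k < 4; k++)
                sum += a->m[r][k] * b->m[k][c];
            out->m[r][c] = sum;
        }
    }
}
```

The exact line counts will differ from the ones quoted above depending on compiler version and optimization level, so treat any numbers you get as specific to your setup.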

Richard Nutman, May 15, 2020

#33

mbk1969 Ancient Guru

Messages: 15,646 | Likes Received: 13,648 | GPU: GF RTX 4070

Alessio1989 said:

> Using shorter pointers will, as will the binary generated from the assembly.

Are you sure? Pointers are located mostly in stack memory and in heap memory - they are allocated dynamically, while the code is loaded from the file image. And statically allocated data usually is not that big (in good programs). So having longer pointers increases stack and heap usage but not the executed code itself.

mbk1969, May 15, 2020

#34

Alessio1989 Ancient Guru

mbk1969 said:

> Are you sure? Pointers are located mostly in stack memory and in heap memory - they are allocated dynamically, while the code is loaded from the file image.

You are confusing pointer variables with the memory allocation area. A smaller stack is still better than a bigger stack if the content is the same: fewer cache misses. The only exception I can think of is packed data structures, where you sacrifice data alignment and need to carefully benchmark the trade-off. But x86 doesn't have issues with 32-bit data alignment; on the other hand, SIMD abuse or misuse could result in a lot of wasted space due to data alignment requirements.

mbk1969 said:

> And statically allocated data usually is not that big (in good programs). So having longer pointers increases stack and heap usage but not the executed code itself.

Having 64-bit pointers means needing QWORDs to store them, which means more memory, no matter where they are allocated. The text area is also smaller, and so is the binary.

Richard Nutman said:

> "Complex loops are evil. Having more register will not change the complexity."
>
> The logic is the same, but the code that implements it can become simpler.
> If you have 12 variables and only 8 registers, you have to switch variables out to the stack. This isn't done automatically. The compiler has to generate code to do this.
> If you have 16 registers you can hold all the variables in the cpu at once. The result is smaller more efficient code.
>
> "Good points but nothing new for me"
>
> But it clearly explains why 64bit code is faster. Are you saying it's wrong?
>
> You don't need to rewrite 32bit applications to be 64bit, you just recompile them in 64bit mode. The compiler will generate more efficient code.
>
> "What modern software does Microsoft not provide on 64-bit? I am not aware"
>
> Pretty much all their development tools and compilers for one thing. The Visual C++ compiler is a 32bit executable. It is not unheard of that it runs out of memory with large source files.

CL is already 64-bit, as are the linker, the debugger, and almost all the developer tools of the Windows SDK. What Microsoft still has not ported to 64-bit is the Visual Studio IDE (but not the tools; they are already 64-bit). Yes, that would be really appreciated, but we were talking about Windows. They do not need to port anything to 64-bit: everything is already 64-bit except some executables for legacy technologies which are defunct and never had a 64-bit version (e.g. very old versions of the DirectX runtimes).

Richard Nutman said:

> Here is another example;
>
> https://godbolt.org/z/2psR3Y
>
> On the left is some sample matrix multiply code.
> The middle pane is compiled using gcc to 64bit code.
> The right hand pane is compiled using gcc to 32bit code.
>
> The 64bit code is 134 lines of instructions.
> The 32bit code is 168 lines of instructions. You can see a lot more push/pop instructions.
>
> The main loop at L24 has 3 more instructions in the 32bit code.

Good point, sir, but you know that sample is meaningless for real-life code. There will be more critical code parts in real-life executables (where the mov instructions will result in more overhead than a couple of push/pops to and from the stack and L2 cache), and if you really need to work with matrices you would not use such naive code.

But please, all of you, just remember that I never said 64-bit compiled code is generally slower; I simply stated that it could be slower. Microsoft is retiring the 32-bit-only version of Windows for consumers, and it is doing so for good.

Last edited: May 15, 2020

Alessio1989, May 15, 2020

#35
Richard Nutman Master Guru

Alessio1989 said:

> You are confusing pointer variables with the memory allocation area. A smaller stack is still better than a bigger stack if the content is the same: fewer cache misses. The only exception I can think of is packed data structures, where you sacrifice data alignment and need to carefully benchmark the trade-off. But x86 doesn't have issues with 32-bit data alignment; on the other hand, SIMD abuse or misuse could result in a lot of wasted space due to data alignment requirements.

The alignment requirements for SIMD are the same in 32bit and 64bit. In fact, since 64bit guarantees SSE2, which has more unaligned load instructions, you get more flexibility with alignment in 64bit code.

Alessio1989 said:

> Having 64-bit pointers means needing QWORDs to store them, which means more memory, no matter where they are allocated. The text area is also smaller, and so is the binary.

It's only more memory if you're storing loads of them, and the link I gave shows workarounds. You can just use 32bit indexing instead of storing pointers if they result in a massive increase in memory. Most of the time pointers reside in registers or local variables, and then their size is irrelevant.
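
A minimal sketch of the indexing workaround in C, assuming the nodes live in one array so that a 32bit index is enough to find the next one. The struct and function names are invented for this example and are not taken from the linked article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Linked-list node that stores a real pointer: grows in a 64bit build. */
struct NodePtr {
    struct NodePtr *next;
    int32_t value;
};

/* Same node, but `next` is a 32bit index into a node array.
 * Its size is 8 bytes in both 32bit and 64bit builds. */
#define NO_NODE UINT32_MAX
struct NodeIdx {
    uint32_t next;
    int32_t value;
};

/* Sums a list stored in `pool`, following indices instead of pointers. */
int64_t sum_list(const struct NodeIdx *pool, uint32_t head)
{
    assert(pool != NULL || head == NO_NODE);
    int64_t total = 0;
    for (uint32_t i = head; i != NO_NODE; i = pool[i].next)
        total += pool[i].value;
    return total;
}
```

The trade-off is that every hop costs an extra base-plus-index address calculation and all nodes have to come from the same pool, so this is worth it mainly for large, pointer-heavy structures.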

Alessio1989 said:

> CL is already 64-bit, as are the linker, the debugger, and almost all the developer tools of the Windows SDK. What Microsoft still has not ported to 64-bit is the Visual Studio IDE (but not the tools; they are already 64-bit). Yes, that would be really appreciated, but we were talking about Windows. They do not need to port anything to 64-bit: everything is already 64-bit except some executables for legacy technologies which are defunct and never had a 64-bit version (e.g. very old versions of the DirectX runtimes).

Incorrect. Open Task Manager: any process with a (32) after it is 32bit code.
See here:
https://ibb.co/XknfHwX
This is just one example; there are loads more applications still running in 32bit mode.

Alessio1989 said:

> Good point, sir, but you know that sample is meaningless for real-life code. There will be more critical code parts in real-life executables (where the mov instructions will result in more overhead than a couple of push/pops to and from the stack and L2 cache)

It's an extremely simple example that shows function calls are way more efficient on x64. As code complexity increases, the 64bit app with more registers will handle that complexity more efficiently.
Mov instructions are extremely cheap; they don't even take time to execute if it's register to register. Besides which, there wouldn't be more mov instructions anyway.

Richard Nutman, May 15, 2020

#36
Alessio1989 Ancient Guru

Your IDE is using the 32-bit compiler (inside the Hostx86 folder, I guess) to target 64-bit. To use the 64-bit compiler you need to launch it from Hostx64 and then your target platform folder. https://docs.microsoft.com/en-us/cp...-cpp-toolset-on-the-command-line?view=vs-2019
Yes, that sucks. Visual Studio completely ported to 64-bit would be really nice. At least the debugger is a 64-bit out-of-process program now: https://devblogs.microsoft.com/cppblog/out-of-process-debugger-for-c-in-visual-studio-2019/

Btw, mov/pop/push should run identically if the data is already in a register or on the stack. mov from/to main memory is the critical part.

And yes, SSE2/../4.2 alignment requirements are the same for 32- and 64-bit x86 code, but most (all?) x86 compilers will try to generate SIMD code, especially with /O2, when targeting 64-bit, while that's not true when targeting 32-bit. Abusing SIMD may result in slower code.

But again, 64-bit compiled code is generally faster (especially due to better calling conventions and modern compilers), but not always. And again, the WOW64 performance cost is negligible on modern hardware.

Last edited: May 15, 2020

Alessio1989, May 15, 2020

#37
mbk1969 Ancient Guru

Alessio1989 said:

> You are confusing pointer variables with the memory allocation area.
> Having 64-bit pointers means needing QWORDs to store them, which means more memory, no matter where they are allocated.

Variables (stack or heap) are not allocated in the body of the executable. Only static variables are. Hence your statement "Using shorter pointers will, as will the binary generated from the assembly" is wrong.

Alessio1989 said:

> The text area is also smaller and so is the binary

Which text area? You mean constant strings stored statically? Why are they smaller if they are stored in the same encoding?

mbk1969, May 15, 2020

#38
Richard Nutman Master Guru

Alessio1989 said:

> Your IDE is using the 32-bit compiler (inside the Hostx86 folder, I guess) to target 64-bit. To use the 64-bit compiler you need to launch it from Hostx64 and then your target platform folder. https://docs.microsoft.com/en-us/cp...-cpp-toolset-on-the-command-line?view=vs-2019
>
> Btw, mov/pop/push should run identically if the data is already in a register or on the stack. mov from/to main memory is the critical part.

No, I'm targeting 64bit builds. The compiler is 32bit but can output code for 32 or 64bit targets.
It's good that they have a 64bit compiler, but that's not what is triggered by the VS IDE, it seems.

No, push and pop have higher latency and lower throughput than MOVs.
https://www.agner.org/optimize/instruction_tables.pdf

The point being, if the compiler picks the right register for parameters to functions, it is not replacing a push and pop with a mov; it doesn't need to do the mov at all!

Richard Nutman, May 15, 2020

#39
mbk1969 Ancient Guru

Also note that with the .NET Framework, the same .NET binary executable can run in both 32- and 64-bit environments.

mbk1969, May 15, 2020

#40
