It looks like this is it ...: In the past few weeks, we've seen various fishy rumors about the product specifications of the first discrete GPU using the upcoming 28nm Kepler architecture, the GK104. While we have known parts of the specifications, such as no hot clocks, the doubling of the Streaming Multiprocessor (SM) from 48 to 96 CUDA cores (i.e. Stream Processors) and the 256-bit memory controller, the real specifications are (finally) here... even though our information differs minimally from information originally posted on 3DCenter.org. NVIDIA Kepler GK104 architectural overview: at first look, very similar to GF110, but then you take a deeper look: 1536 Stream Processors instead of 512! First and foremost, in NVIDIA's internal nomenclature, this part should be named GeForce GTX 660 (the company is debating GeForce GTX 660, 670 or 680 - and the final verdict will 99% be GTX 680). This is a 349-399 dollar part which would conventionally replace the 300-dollar "GeForce GTX 560 Ti 2GB", but will offer higher performance than the GTX 580. Significantly higher… and more importantly, not just beating the $449 Radeon HD 7950 3GB, but also endangering the $549 Radeon HD 7970. Yeah, it is that fast. Why? Because we're talking about 1536 CUDA cores divided into four Graphics Processing Clusters (GPC), each of which contains four Streaming Multiprocessors (SM). Given that there are 96 Stream Processors per SM (or CUDA cores; NVIDIA cannot seem to make up its mind what to call them), we can see that, for instance, the entry-level Kepler has a single SM unit with 96 CUDA cores/Stream Processors. Can you say… a mobile GPU part that allegedly taped out ages ago… and just by some accident ended up in a Samsung notebook? Only time will tell for those. The base combinations for NVIDIA's future GPUs are now 96 (1 SM), 384 (1 GPC), 768 (2 GPC), 1536 (4 GPC) and 2304 (6 GPC) CUDA cores/Stream Processors.
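As a quick sanity check on the core counts listed above, here is a minimal Python sketch that rebuilds them from the two rumored figures in the article (96 cores per SM, four SMs per GPC). The function name `cuda_cores` and its parameters are made up for illustration; nothing here is confirmed by NVIDIA.

```python
# Rumored figures taken from the article; not confirmed specifications.
CORES_PER_SM = 96
SMS_PER_GPC = 4


def cuda_cores(gpcs=0, extra_sms=0):
    """Total CUDA cores for a chip with `gpcs` full GPCs plus `extra_sms` standalone SMs."""
    total_sms = gpcs * SMS_PER_GPC + extra_sms
    return total_sms * CORES_PER_SM


# The five base combinations named in the article.
configs = [("1 SM", 0, 1), ("1 GPC", 1, 0), ("2 GPC", 2, 0), ("4 GPC", 4, 0), ("6 GPC", 6, 0)]
for label, gpcs, sms in configs:
    print(f"{label}: {cuda_cores(gpcs, sms)} CUDA cores")
```

The 4 GPC case gives the GK104's 1536 cores, and the 6 GPC case gives the 2304 cores attributed to the big die.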
Given that our sources are telling us the big monolithic die comes with 2304 SP, the question is what can be done with the memory controller. Logic dictates that Kepler can come with the following memory controller configurations: 64-bit, 128-bit, 192-bit, 256-bit, 320-bit and 512-bit. To us, it is most logical that we see 64-bit on the low end, 128-bit in the mainstream, 256-bit on the high end and either 384-bit or 512-bit on the high-end compute side - and on the GeForce GTX 690, but this time as a single monolithic die, instead of the typical mix'n'match of two high-end GPUs. Continuing with the GK104 GPU, the chip has the same amount of fixed-function logic as the competing Tahiti XT - 32 ROPs (Raster OPeration Units) and 128 TMUs (Texture Mapping Units). As you can see in our architectural mockup, the decision to go with a 256-bit memory controller results in 2GB of GDDR5, and this is the only area where NVIDIA really loses to AMD: both the 7950 and 7970 come with 3GB of GDDR5 memory. True, the planned price is estimated at $100 less for NVIDIA boards ($349-399 versus $449/7950 and $549/7970), which should mitigate the paper advantage of the HD 7900 Series. How high can it go? Just like GF110, the GK104 comes in two different versions: the GeForce board will run double-precision at one-sixth rate, while Quadro and Tesla will run at the typical half rate. Just like with AMD Southern Islands, we were told by one source that there is an architectural possibility of full-rate DP (instruction, cache sizes) - but we do not believe in fairy tales. The GPU clock is estimated at 950MHz, but our sources are telling us that there are different clocks running in the lab: 772MHz for a clock-per-clock comparison versus GTX 580, 925MHz for clock-per-clock versus Tahiti XT, while the clock range for the shipping parts is between 950 and 1000MHz.
We were told that NVIDIA did not laugh too much at the Verdetrol performance-enhancing pills and that the company is trying to tweak the BIOS (more importantly, the thermal envelope) in order to get the parts running at 1GHz. If NVIDIA fails, the partners are certain to offer a 1GHz board (just like in the case of Tahiti XT and 3rd-party vendors). The memory is set at 1.25GHz in Quad Data Rate (QDR, i.e. 5GHz "effective"). This 25% boost over GF100/GF110 is something that thrilled NVIDIA engineers, since this is the first time their memory controllers were able to match AMD at a stable default clock frequency. Remember, unlike GDDR3 memory, GDDR5 is "actively driven" and the memory controller does much more than it used to. Given that AMD is actually the company that creates the memory standard, AMD's GPU engineers have a good advantage in terms of just how high they can clock the GDDR5 memory. This clock results in 160GB/s of video memory bandwidth, a drop from the GTX 580 (192.4GB/s), but a big boost over the GTX 560 Ti and its 128.27GB/s (excluding the OEM versions), and just a bit higher than the GTX 560 Ti OEM (GF110 die), GTX 560 Ti 448 Cores LE and GTX 570, all of which have the same GDDR5 memory clock and bandwidth of 152GB/s. All of this results in 2.9 to 3.05 TFLOPS single-precision, i.e. 486-500 GFLOPS double-precision. Quadro and potential Tesla versions of this board will feature unlocked double-precision, meaning an identically clocked board would have around the same amount of DP GFLOPS as the GTX 580 had in single-precision… an impressive boost indeed. In any case, that is higher than what Fermi-based Quadros and Teslas were able to achieve. You won't need to wait too long, as NVIDIA is already starting pre-sale activities and getting ready to counter AMD and its momentum with the Radeon 7700 (Cape Verde, February 15), 7800 (Pitcairn, March 6) and 7900 Series (released). http://www.brightsideofnews.com/new...k1042c-geforce-gtx-670680-specs-leak-out.aspx
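The bandwidth and FLOPS figures quoted in the article can be roughly reproduced from its own numbers. The sketch below is only a back-of-the-envelope check: the helper names are hypothetical, and the factor of 2 FLOPs per core per clock (one fused multiply-add) is an assumption that the article does not state.

```python
def memory_bandwidth_gbs(bus_width_bits, base_clock_ghz, transfers_per_clock=4):
    """Peak memory bandwidth in GB/s: bytes per transfer x clock x transfers per clock."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * base_clock_ghz * transfers_per_clock


def peak_sp_gflops(cores, clock_ghz, flops_per_core_per_clock=2):
    """Peak single-precision GFLOPS; 2 FLOPs/core/clock is an assumption (one FMA)."""
    return cores * clock_ghz * flops_per_core_per_clock


# Rumored GK104 figures from the article: 256-bit bus, 1.25GHz QDR memory, 1536 cores.
print(f"bandwidth: {memory_bandwidth_gbs(256, 1.25):.1f} GB/s")

for clock_ghz in (0.95, 1.0):
    sp = peak_sp_gflops(1536, clock_ghz)
    dp_geforce = sp / 6   # one-sixth rate DP on GeForce, per the article
    dp_unlocked = sp / 2  # half-rate DP on Quadro/Tesla, per the article
    print(f"{clock_ghz:.2f} GHz: SP {sp:.0f}, DP 1/6 {dp_geforce:.0f}, DP 1/2 {dp_unlocked:.0f} GFLOPS")
```

At 950MHz this lands on the article's 160GB/s, roughly 2.9 TFLOPS single-precision and about 486 GFLOPS double-precision. At a full 1GHz the same formula gives slightly more than the article's upper figures of 3.05 TFLOPS and 500 GFLOPS.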
Nope, more like a recap, mostly from that chiphell pic, OBR, S|A and a little FUD on top lol. 670/680gtx both made from GK104 and both with 256bit bus? That's not a high end gpu. And GK104 has max 2300 cores, and not GK100/110 like according to that fake chiphell slide? But if it's true, and it actually needs 1500 cores to barely beat a 512-core Fermi (hot clocks or not), that's very weak.
ROFL, what made u think GK104 is a high end part? Compared to GF110: 3x shaders running at 2/3 speed. I am not sure you can call that weak. What's weak is bandwidth, but that only points to some crazy optimizations, or to purposeful crippling.
This is different than in Fermi. So these 1536 would stand as 768 CUDA cores in Fermi, which is double the amount of the GF114. TMU and ROP counts are identical to Tahiti XT.
We don't know yet how the CUDA cores in Kepler will work ... going from 40 to 28nm doesn't allow pushing 4x more CUDA cores into it (GK100 or 110)... (not the same CUDA cores anyway). The GF100 had 512 SP (all unlocked in GF110) in a 520+mm2 die... (28nm allows something like a 30-40% saving on die space vs 40nm, as hinted by AMD). The CUDA cores should be a lot smaller to be able to pack so many into a die that will stay around 550mm2 for the GK110 (or GK100). And if they are smaller, they are either "less efficient" or don't have the same function as before. I don't know how you arrive at 768 CUDA cores: because the hot clock is dropped, you divide by 2? Or because you think they no longer include FP+Int in 1 SP? Anyway, if those specs are true, it will be really interesting to see what changes Nvidia has made to those SPs.
Try 43% smaller die space, i.e. 57% of the original area - as hinted by AMD (still less than a perfect 28nm shrink, which would give you 50% less die space). For a straight Fermi die shrink on a 340mm2 chip, that would give you 580 old CUDA cores. That is already 13% higher than GTX 580. 340mm2 is HUGE! It should allow a high-performing chip even with the old Fermi arch.
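The die-shrink estimate in the post above can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes the 512-core, roughly 520mm2 Fermi die mentioned earlier in the thread, assumes core count scales linearly with area, and uses a hypothetical helper name.

```python
def shrunk_core_count(old_cores, old_area_mm2, area_scale, new_area_mm2):
    """Cores that fit in `new_area_mm2` if the old design shrinks to `area_scale` of its area.

    Assumes core count scales linearly with die area, which is a simplification.
    """
    shrunk_area_mm2 = old_area_mm2 * area_scale  # area of the old chip on the new node
    return old_cores * new_area_mm2 / shrunk_area_mm2


cores = shrunk_core_count(old_cores=512, old_area_mm2=520, area_scale=0.57, new_area_mm2=340)
print(f"straight Fermi shrink on 340mm2: about {cores:.0f} cores "
      f"({cores / 512 - 1:.0%} more than GTX 580)")
```

With these inputs the result comes out at roughly 587 cores, in line with the post's figure of about 580.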
Can't mods just make a NVIDIA RUMOURS thread in Nvidia section and move anything like this there? Kinda getting annoyed at these now. If there really WAS any news or hints of anything, Hilbert would be ALL OVER IT.
How about you stay on topic. These rumour threads generally descend into the down sticky stuff. Let's stick to the article discussion.