Madshrimps Forum Madness - Intel Nehalem, Bloomfield has 8MB of cache

Intel Nehalem, Bloomfield has 8MB of cache

To everyone's surprise, the future Nehalem processors with four cores and eight threads will have 8MB of cache memory. Yorkfield has 12MB, or should we say two times 6MB, as it is still two dual-core dies stitched together in one package. That works out to 3MB of cache per core in Yorkfield, and it looks like Nehalem will have 8MB.
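The per-core figures above are simple division, and a short sketch makes the comparison explicit. It uses only the sizes quoted in the post; the helper name `cache_per_unit` is made up for illustration, and an even split is an assumption, since a shared cache is not actually carved up per core.

```python
# Cache-per-core arithmetic using only the figures quoted in the post.
# Assumption: the cache is divided evenly, which is an average and not
# a statement about how the hardware allocates lines.

def cache_per_unit(total_mb: float, units: int) -> float:
    """Evenly divide a cache size in MB across cores or threads."""
    return total_mb / units

yorkfield_total_mb = 2 * 6   # two dies with 6MB each
nehalem_total_mb = 8

print(cache_per_unit(yorkfield_total_mb, 4))  # Yorkfield, per core: 3.0
print(cache_per_unit(nehalem_total_mb, 4))    # Nehalem, per core: 2.0
print(cache_per_unit(nehalem_total_mb, 8))    # Nehalem, per thread: 1.0
```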

http://www.fudzilla.com/index.php?op...11&Itemid=35

Sounds about right. Shouldn't be a need for a huge cache with an IMC, but looks like they kept some of the cache for HT use. I'm very curious how big the die is compared to a 45nm Yorkfield... :)

So exactly how will an integrated memory controller help speed? I imagine it will be easier for the CPU to communicate with the RAM, but how much faster will this make it? I know AMD has been doing it, but I don't know that much about it. If you guys could explain or point me in the direction of an explanation, it would be greatly appreciated. Thanks.

Reduced latency, faster data exchange = speedier CPU

The entire point of these insanely large 12MB L2 caches, even 16MB caches, is to keep the CPU fed with data to crunch.
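One common way to reason about "keeping the CPU fed" is the textbook average memory access time formula: hit time plus miss rate times miss penalty. The sketch below is only an illustration of that formula. Every number in it is hypothetical and not a measurement of any real chip, and the function name `average_access_ns` is made up.

```python
# Textbook average memory access time: hit_time + miss_rate * miss_penalty.
# All numbers below are hypothetical, chosen only to show the two levers.

def average_access_ns(hit_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average cost of one memory access in nanoseconds."""
    return hit_ns + miss_rate * miss_penalty_ns

# Hypothetical baseline: small cache, slow trip to main memory.
baseline = average_access_ns(hit_ns=5.0, miss_rate=0.04, miss_penalty_ns=100.0)

# Lever 1: a bigger cache misses less often (lower miss rate).
bigger_cache = average_access_ns(hit_ns=5.0, miss_rate=0.02, miss_penalty_ns=100.0)

# Lever 2: a faster path to memory makes each miss cheaper (lower penalty).
faster_memory = average_access_ns(hit_ns=5.0, miss_rate=0.04, miss_penalty_ns=60.0)

print(baseline, bigger_cache, faster_memory)
```

A huge cache pulls the first lever and an integrated memory controller pulls the second, which is why a chip with an IMC can get away with less cache.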

If you look at AMD K7 versus AMD K8 processor benchmarks, you will see the improvement that integrating the memory controller brings... it is the biggest reason why AMD's K8 easily defeated the Pentium 4.

For example... Imagine you and two other people in a building... You need to complete a project but lack the information. To get the info, you must ask the 2nd person to go talk to the 3rd person to get your info and bring it back to you. If you keep having to ask questions, or you find you don't have all the info, you can see how slow this would make completing your project...

The middle guy has to act as the go-between for you to get the information you need... Until now the middle guy has always been the chipset; the CPU could never talk to or update the memory directly.
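The go-between analogy can be put into numbers with a toy model in which each request and its reply cross every hop on the path. The hop costs below are hypothetical placeholders and not real latencies, and `round_trip_ns` is a made-up helper name.

```python
# Toy model of the "middle guy": each request and its reply cross every hop.
# Hop costs are hypothetical placeholders, not measured latencies.

def round_trip_ns(hop_costs_ns: list[float]) -> float:
    """One request plus its reply: every hop on the path is crossed twice."""
    return 2 * sum(hop_costs_ns)

via_chipset = [20.0, 30.0]  # CPU -> chipset, then chipset -> memory
direct = [30.0]             # CPU -> memory, no go-between

requests = 1000             # "you keep having to ask questions"
print(requests * round_trip_ns(via_chipset))  # total time through the chipset
print(requests * round_trip_ns(direct))       # total time with a direct path
```

The extra hop costs the same on every request, so the penalty grows with the number of trips to memory.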

I don't expect to see a performance boost like with K7->K8; with C2D, reducing memory latencies at FSB400 doesn't help that much.

It was my understanding that the FSB has an inherent latency penalty already, and that aggressive memory prefetchers only hid the latency issue. ;) So increasing the FSB would not compensate for the inherent latency penalty. The FSB is just a general bus that everything in the system, including other processors, uses together; they all share the same FSB. It's just like the PCI bus, where all the cards used the same bus and would hinder each other's performance.

With QPI, not only will there be a dedicated point-to-point interconnect between the CPU and chipset, there will be a dedicated connection directly to the main memory. No more waiting on other cores and other processors to be done with the FSB. Since Nehalem will also be a native quad-core design, there won't be any added penalty for cores passing coherency traffic over the FSB either.
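The difference between a shared bus and dedicated links can be sketched with a deliberately crude model: on a shared bus the agents take turns, while on point-to-point links their transfers overlap. The transfer times are hypothetical, both function names are made up, and real buses pipeline and arbitrate in ways this ignores.

```python
# Crude contention model: shared bus serializes, dedicated links overlap.
# Transfer times are hypothetical; arbitration and pipelining are ignored.

def shared_bus_finish_ns(transfer_times_ns: list[int]) -> int:
    """Agents take turns on one bus, so transfers run back to back."""
    return sum(transfer_times_ns)

def point_to_point_finish_ns(transfer_times_ns: list[int]) -> int:
    """Each agent has its own link, so the slowest transfer sets the finish time."""
    return max(transfer_times_ns)

transfers = [40, 40, 40, 40]  # four agents, one transfer each

print(shared_bus_finish_ns(transfers))      # 160
print(point_to_point_finish_ns(transfers))  # 40
```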

Intel chips still scale far worse than AMD Opterons in dual/quad socket servers, even if they still perform better overall against AMD chips... this could only be from the FSB/lack of IMC. ;)

C2D is far more data hungry than K8 could ever hope to be. Nehalem will be even more so: Intel widened the instruction pipe even further to accommodate 4 cores executing 8 threads... I suspect this may contribute to the gains seen.

Quote:

Originally Posted by Anandtech

Nehalem allows for 33% more micro-ops in flight compared to Penryn (128 micro-ops vs. 96 in Penryn), this increase was achieved by simply increasing the size of the re-order window and other such buffers throughout the pipeline.

With more micro-ops in flight, Nehalem can extract greater instruction level parallelism (ILP) as well as support an increase in micro-ops thanks to each core now handling micro-ops from two threads at once.
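The 33% figure in the quote follows directly from the two buffer sizes it gives. The quick check below also shows what an even split between two threads would look like; that even split is an assumption for illustration, not something the quote states.

```python
# Check the quoted 33% increase from the two micro-op counts in the quote.
penryn_uops = 96
nehalem_uops = 128

increase_pct = (nehalem_uops - penryn_uops) / penryn_uops * 100
print(f"{increase_pct:.1f}% more micro-ops in flight")  # 33.3%

# Assumption: with two threads active, the window is shared evenly.
uops_per_thread = nehalem_uops / 2
print(uops_per_thread)  # 64.0
```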

For server/workstation, the bandwidth of the CSI bus is just huge; combined with the on-die memory controller, Nehalem will certainly offer a nice gain compared to C2D. On the tech side of things, Nehalem is spectacular, but not for home users... probably not enough data to feed it.

We haven't run out of data yet... :) Even single-threaded applications are going to see a performance improvement.

Thought this was interesting: http://blogs.zdnet.com/Ou/?p=1025 Especially for Shanghai's results.

Too bad they forgot to mention clock rates :D

Still, at this moment dual core is still favorable over quads. Unless software changes dramatically over the following half year, I don't think many home users will need it, maybe in 2 years or so... Theoretically, single-threaded apps may be faster, but will you notice the difference compared with the E8x00, which is already doing very well? If so, where?