I came into contact with MS, owner of the famous 'lostcircuits' website, which covers technology in much more depth than we do. Although the following remarks are posted under my nickname here on this forum, all the credit for these insights goes to 'MS'!
Link to discussion:
http://www.lostcircuits.com/forum/vi...php?f=3&t=2609
I will try to simplify the concepts for those who need more explanation; I am drawing some graphs as we speak.
--------------------------------------------------------------------
Quote:
Excerpt from forum post: "This (133x5=666MHz) is the memory running at the highest possible frequency at stock BCLK frequency. But that's not the most interesting part of this screenshot. Please have a look at the frequency of the uncore: 1995MHz. On a triple-channel i7 platform, the minimum uncore frequency would have been 2660MHz, i.e. 2x the memory frequency (1330MHz DDR); on this dual-channel platform, we see that the minimum uncore frequency is set at 1.5x the memory frequency. Why? I don't know. At least, not yet ... |
That one is easy, at least with a few assumptions. In triple-channel mode, the uncore clock has to be twice the memory clock because the combined memory bus width meets an interface between the L3/uncore and the core that we assume to be at least 96 bits wide (1/2 of the 192-bit memory data path). In other words, every triple-channel memory transaction has to be split into two transactions between the uncore and the core, or else the L3 if it is used for prefetch. The only caveat is that we don't know the width of the uncore-core interface on the Core i7 (central queue), and I have not been able to find any conclusive data on this feature. However, if the uncore (or non-core in Intel's parlance) is connected to the core through the assumed 96-bit interface, then a 128-bit memory transaction will take at least 1.33 cycles to transfer. If you make the uncore clock 1.33x that of the memory, you use the entire possible bandwidth of all registers involved, but you end up with alignment issues: the first transaction ends 1/3 into the register, the second one has to start at 1/3 and end at 2/3, and so on, which makes things a bit complicated. It is easier to throw away a bit of frequency and use a 1.5x clock, where you always have full transactions with a boundary at 50% of the register width. You throw away a few bits, but management is much easier that way.
Of all the different features of the architecture, that particular interface is one of the less likely items to be changed, whereas the memory controllers are just modular blocks that can be thrown in or deleted at will.
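For those who want to check the arithmetic in the reply above, here is a minimal Python sketch. The 96-bit interface width is the assumption made in the post, not a confirmed Intel figure, and the function names are made up for illustration only.

```python
from fractions import Fraction

INTERFACE_BITS = 96  # assumed uncore-core width from the post, not confirmed


def min_uncore_ratio(memory_bits, interface_bits=INTERFACE_BITS):
    """Uncore cycles needed to move one memory transaction across the interface."""
    return Fraction(memory_bits, interface_bits)


def unused_bits_per_memory_cycle(memory_bits, ratio, interface_bits=INTERFACE_BITS):
    """Interface capacity per memory cycle minus what the memory delivers."""
    return ratio * interface_bits - memory_bits


triple = min_uncore_ratio(192)  # 3 x 64-bit channels
dual = min_uncore_ratio(128)    # 2 x 64-bit channels
wasted = unused_bits_per_memory_cycle(128, Fraction(3, 2))

print(triple, dual, wasted)  # prints: 2 4/3 16
```

Under these assumptions, triple channel needs a 2x ratio, dual channel needs at least 4/3 (the 1.33x in the post), and running dual channel at 1.5x leaves 16 bits of interface capacity unused per memory cycle, which is the "few bits" being thrown away.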
Quote:
After exchanging ideas with several more knowledgeable people, it seems to me that the limitation of a minimum uncore frequency of twice the memory frequency is more for marketing purposes than a real technical limitation. To explain this, let's have a look at the AMD platform first, because it's also a dual-channel DDR3 platform. AMD representatives acknowledge that, to really put the memory frequency to good use, you need an NB frequency of at least 3x the memory frequency (1.5x the DDR rating). So, with an IMC frequency of 2GHz, you would only need a memory frequency of 667MHz (DDR3-1333). Anything higher would still scale, but less and less, at which point you can ask yourself whether you want to spend the extra money on high-rated memory kits. This 'theory' has been developed, tested and confirmed by Tony of OCZ.
|
Different architectures and interconnect widths will result in different requirements for the interaction of the different components. The AMD interconnect, for all I know, is totally different from the Intel architecture: there you have independent memory controllers, whereas in Intel's approach all three controllers are always doing the same thing (even if the physical addresses may possibly vary). BTW, we found the same thing with respect to memory frequency vs. NB frequency and the resulting performance scaling: 2.4 GHz is the minimum to get DDR3-1600 really going.
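The AMD rule of thumb quoted above (NB clock of at least 3x the memory clock, i.e. 1.5x the DDR rating) is easy to tabulate. This is only a sketch of that rule as stated in the posts; the function name is hypothetical.

```python
def min_nb_clock_mhz(ddr_rating):
    """Rule of thumb from the posts: NB clock >= 3 x memory clock.

    The memory clock is half the DDR rating, so this equals 1.5 x the rating.
    """
    memory_clock = ddr_rating / 2
    return 3 * memory_clock


for rating in (1333, 1600):
    print(f"DDR3-{rating}: NB >= {min_nb_clock_mhz(rating):.0f} MHz")
```

For DDR3-1600 this gives 2400 MHz, which matches the 2.4 GHz figure in the reply; for DDR3-1333 it lands at roughly 2 GHz.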
Quote:
For Intel-based platforms, the story isn't that much different: the memory controller is now integrated in the processor, and its clock frequency can form a bottleneck with high-frequency memory. The novelty of the i7 was the newly introduced third memory channel, which should increase the memory bandwidth significantly. Many tests, however, confirm that the extra channel doesn't have much of an effect in most benchmarks, let alone in daily computing activities. And that is a big problem when trying to sell the product: who wants to pay more for something that doesn't work in the first place? The technique is quite simple: make it look like it works. And that's where the limitation of "uncore >= 2 x memory" kicks in: with an even lower uncore frequency, the added memory channel would have had even less effect than it has now. Less than almost insignificant, that's bad PR. Technically, it seems possible for the uncore to run at a lower ratio than 2:1, but weirdly enough none of the motherboard manufacturers seem to have added this option to their BIOS, although it would help people reach 2000CL7 on air cooling, since the memory overclock is very often limited by the uncore frequency."
(~ http://www.madshrimps.be/vbulletin/f...5-gd80-65278/) |
You are mixing two different things here. One is the synthetic throughput, which really depends on the uncore running at 2x the frequency of the memory; the other is the fact that the actual core is finally starting to saturate with the amount of incoming data, and that the processing units cannot digest the data as fast as it is delivered. That is one main difference from the older Intel architectures, including the Core2, where memory bottlenecks were the biggest problem (even though it wasn't really the memory but the AGTL bus and the fact that every request had to be snooped on a bi-directional bus, leading to only some 50-70% of possible memory utilization).
Quote:
The question is in fact quite simple: am I being too critical, or thinking too much along the lines of conspiracy theories, to believe that manipulating the Uncore frequency is just a marketing tool rather than a matter of technical limitations? The LGA1156 platform shows me that it's perfectly possible to have an Uncore multiplier running lower than 2x the memory frequency, and I'm quite reluctant to believe it's because of the missing third memory channel.
|
Actually, if you do the math, then that's what it comes down to. Intel has vast experience with this type of data buffering from the days of the AGP bus (internally or externally); I have done enough reverse engineering on this feature (which was originally developed by HP, IIRC). A 2:1 ratio is the easiest because you don't have to do any split transactions; instead, you transfer the entire width of the register every time. With a 1.5x ratio it is still easy because you split the register down the middle, so you don't run into the alignment issues that you would have with a 1.33x multiplier, where you would be stuck with 3 segments and have to track where the boundary ends up after each transaction and subsequent refill. This is actually the tidbit that makes me believe that the interconnect is less than 128 bits wide; otherwise the dual-channel memory data could be transferred in a single transaction. But again, most of what I wrote is based on that one assumption of a limited uncore-core bus width.
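To make the alignment argument above concrete, here is a small Python sketch that counts the distinct places inside a register where a transaction can start for a given uncore:memory ratio. The model (each memory transaction gets `ratio` uncore cycles, back to back) is my simplification of the reply, not a description of the actual hardware, and `boundary_offsets` is a hypothetical helper.

```python
from fractions import Fraction


def boundary_offsets(ratio, transactions=12):
    """Distinct start positions inside a register, as fractions of its width.

    Transaction k starts at uncore cycle k * ratio; the fractional part of
    that value is how far into the register the boundary falls.
    """
    offsets = set()
    for k in range(transactions):
        start = k * ratio
        offsets.add(start % 1)
    return sorted(offsets)


for ratio in (Fraction(2), Fraction(3, 2), Fraction(4, 3)):
    positions = ", ".join(str(o) for o in boundary_offsets(ratio))
    print(f"ratio {ratio}: boundaries at {positions}")
```

With a 2x ratio every transaction starts at 0, with 1.5x the boundary alternates between 0 and 1/2, and with 4/3 (the 1.33x case) it cycles through 0, 1/3 and 2/3, which is the three-segment bookkeeping the reply describes.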
Quote:
Also, it seems that the i5 7xx series only has memory ratios up to 2:10 (or 5x), whereas the i7 8xx series has ratios up to 2:12 (6x) ... to feed the 8 threads which are present on the 8xx, but not on the 7xx? In any case: for more multipliers, you need to pay more. Coincidence or marketing strategy?
|
Those ratios will change with the actual release of the parts into the market.
Quote:
In any case, if it's indeed just a marketing tool, and there's no technical limitation regarding Uncore/memory frequency, why has no motherboard manufacturer tried to figure out how to 'crack' the limitation? I mean: most performance enthusiasts are ignorant when it comes to finding the right balance between frequency and timings; 90% just apply the "more equals better" rule and buy $350 2000CL7 memory kits, only to find out their CPU isn't capable of running a 4GHz uncore on air cooling. Having a motherboard that allows users to downclock the uncore would be a smart move marketing-wise.
|
Because if they did, the processor would run into a buffer overflow. Imagine you have one register of 1/2 the bus width and then you force it to cycle at less than 2x the speed. It is like having one big mug of coffee that you are trying to drink, but the mug is too heavy, so you use a small cup as an intermediate carrier. The big mug has twice the volume of the small cup. How many times do you have to empty the small cup if the big mug fills up once every minute?
I'll check on the interconnect width again, I might be wrong but I think I remember having this discussion with some Intel folks.
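The coffee-mug analogy above can be sketched as a tiny backlog simulation in Python. The 192-bit and 96-bit widths are the assumed figures from the earlier reply, fractional drain per memory cycle is a simplification, and `backlog_after` is a hypothetical name.

```python
from fractions import Fraction


def backlog_after(memory_cycles, ratio, memory_bits=192, interface_bits=96):
    """Bits still waiting in the buffer after a number of memory cycles.

    Each memory cycle delivers `memory_bits` (the big mug fills up once);
    in the same time the interface drains `ratio * interface_bits`
    (the small cup is emptied `ratio` times). Widths are assumptions.
    """
    backlog = Fraction(0)
    for _ in range(memory_cycles):
        backlog += memory_bits
        backlog -= min(backlog, ratio * interface_bits)
    return backlog


print(backlog_after(10, Fraction(2)))     # prints: 0
print(backlog_after(10, Fraction(3, 2)))  # prints: 480
```

At 2x the small cup keeps up exactly; at 1.5x a triple-channel load leaves 48 bits behind every memory cycle, so the backlog grows without bound, which is the overflow described in the reply.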