Massman | 13th November 2009 20:34 | Okay. A few minutes after sending the email, I'm pretty sure I've figured out what happened with Gulftown. Well, not certain, but at least I have a concept that works in theory. Quote:
Originally Posted by Massman the interconnect is increased to 144-bit, 48 bit from each channel per uncore clock. But that would mean that the Gulftown uncore-DRAM clock ratio could be dropped even lower than 1,5x, namely to 1,33x. Haven't seen a screenshot of that so far ... and it wouldn't make any sense. | This is an issue we have seen come up with the Lynnfield as well, where the full 128-bit memory bus width could be transferred to the CPU by running the uncore at 1,33x the memory frequency, since "96 x 1,33 = 128". The problem is described underneath: Quote:
Originally Posted by Massman
(Post 242353)
For dual channel configurations, which are 128 bit wide, it's a more complicated problem since there's no easy fix as with triple channel (just x2). Basically, in a perfect world, increasing the uncore frequency by a factor of 1,33 would do the trick: instead of 96 bit per memory clock cycle you would then be able to address 96 x 1,33 = 128 bit, which is the full dual channel bandwidth. The problem, however, is that this would make the register management quite difficult as you can see on the graph underneath:
Basically, the uncore register would have to be aligned at the 1/3 and 2/3 mark. Or, put differently, the system has to keep track of where the first register output ends (1/3 of the second 96 bit series) and where the second output ends (2/3 of the third 96 bit series). It's not technically impossible, but far from an elegant (= efficient) solution. Much easier is to increase the frequency by a factor of 1,5: the only alignment is the one at 1/2, which is just splitting up into two pieces. | The same story can be applied to the Gulftown, assuming that the uncore-CPU interconnect width has indeed been increased to 144-bit. In an attempt to make the memory bandwidth transfer as efficient as possible, we could state that the 192-bit memory data can be transferred as 144+48 bit, or, 3x 192-bit can be transferred as [(64+64+16)+(48+64+32)+(32+64+48)+(16+64+64)], which equals 4 blocks of 144-bit. The problem is alignment ... as you can see: the blocks have to be split up into a lot of pieces, which means setting marks at all those cut-offs. Again, much easier would be to sacrifice a bit of uncore to make things less complicated. So, instead of a 4:3 uncore-memory ratio, we go for a 3:2 uncore-memory ratio.
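To double-check the arithmetic above, here is a small Python sketch that reproduces the splits. It is only a model of the reasoning in this post, not of how the hardware really works: the names `min_ratio` and `pack_blocks` are made up for illustration, and it assumes each memory channel delivers one 64-bit word that is simply streamed, in order, into fixed-width interconnect blocks.

```python
from fractions import Fraction


def min_ratio(memory_bits, link_bits):
    """Smallest uncore:memory clock ratio that moves memory_bits per memory clock
    over a link that carries link_bits per uncore clock."""
    return Fraction(memory_bits, link_bits)


def pack_blocks(word_bits, block_bits, n_words):
    """Stream n_words words of word_bits each into blocks of block_bits.

    Returns one list per block holding the sizes of the pieces it contains;
    every block with more than one piece needs a cut-off mark inside it.
    """
    blocks, current, room = [], [], block_bits
    for _ in range(n_words):
        left = word_bits
        while left > 0:
            piece = min(left, room)   # take as much of the word as still fits
            current.append(piece)
            left -= piece
            room -= piece
            if room == 0:             # block is full, start the next one
                blocks.append(current)
                current, room = [], block_bits
    if current:                       # keep a partially filled last block
        blocks.append(current)
    return blocks


# Lynnfield: 128-bit dual channel data over a 96-bit link
print(min_ratio(128, 96))        # 4/3
print(pack_blocks(128, 96, 3))   # [[96], [32, 64], [64, 32], [96]]

# Gulftown (assumed 144-bit link): 3 x 192 bit = nine 64-bit channel words
print(min_ratio(192, 144))       # 4/3
print(pack_blocks(64, 144, 9))   # [[64, 64, 16], [48, 64, 32], [32, 64, 48], [16, 64, 64]]
```

For the Lynnfield case the cuts land at 32 and 64 bits into a 96-bit block, which are the 1/3 and 2/3 marks from the quoted post. For the Gulftown case the four blocks come out as the same (64+64+16), (48+64+32), (32+64+48), (16+64+64) split written above.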
I will make a similar sketch as I did for the Lynnfield. To be honest, this is really good! It fits the whole concept. It's perfect! |