Introduction

It has been a while since my last article, for which I blame examinations at school and, of course, the Gigabyte Open Overclocking Competition, which eventually led me to the great city of Taipei. However, since I have a few weeks of vacation now, I'm able to present some of the findings of the Madshrimps team regarding the Core i7 platform.
Maybe you remember the X58 motherboard round-up, which was published four months ago already, close to the release of the Core i7 platform. Having 7+ X58 motherboards at home (I was sent a couple more after the round-up) gave me and my colleagues the opportunity to dig deeper into the different aspects of overclocking an i7 processor. In fact, we focused on three different aspects:
BCLK frequency
Memory frequency
The combination of BCLK and memory frequency
In this article, we'll spend time on one setting that seems to give the best solution when running into either stability or performance problems while overclocking the memory. It appears to be one of the key elements in stabilizing the memory at high frequencies.
Back-to-Back CAS Delay

The setting I'm talking about is a memory timing called "Back-to-Back CAS Delay" (referred to as B2B in this article). To explain why this particular timing is so interesting, I need to tell you the story of how we figured all this out, since it's vital to understanding its impact on stability. First of all, this timing seems to be available on Asus motherboards only. Why? We don't know, but on the Asus motherboards it defaults to "0" when set to Auto in the BIOS. But before we dig in deeper: the background story.
To start with, I have to explain a concept called "low-clock challenges", used mainly within the enthusiast community to work out a software/hardware configuration that is as efficient as possible. Basically, you limit the maximum frequency of the CPU and then try to get the best possible score in a certain benchmark, in this case SuperPi 32M. This allows overclockers to compare their tweaking skills and, if necessary, figure out hardware-related performance problems.
Now, when tweaked properly, a 4GHz Core i7 combined with 1GHz CL7 memory (DDR3-2000 CL7) will give you a time of around 8 minutes 50 seconds in the 32M benchmark. However, certain people seem to be able to run at least 10 seconds faster, which could not be explained by software tweaks alone. To keep the story short: it occurred to me that almost all of the fast configurations were Asus-equipped.
On the Madshrimps forums, I wrote quite a lengthy post regarding my performance issues in the 32M low-clock challenge. Initially, I believed the hardware prefetcher options in the BIOS were the cause, since the Asus motherboards were the only ones that exposed them in the BIOS. However, after testing both the Rampage II Gene and the Foxconn Bloodrage (a new BIOS added the prefetcher option), I know that those two prefetchers are not the cause. I presume the best way to read that post is simply to understand that there's a significant difference in clock-for-clock performance between certain motherboards.
(Link: 32M and i7, a motherboard's choice?)
The second part of the story is actually located in another thread, but since the information is spread over numerous forums, I'll give you the short version. Basically, Jody (3oh6, hardwarecanucks.com) and I were testing the Rampage II Gene at the same time, and both our motherboards were incapable of running an Elpida-based memory kit over 930MHz, no matter the voltage or CAS latency. After a few hours (okay, 10+) I found a reasonably stable configuration by playing with the Back-to-Back CAS Delay timing. There are a couple of weird issues with this timing, though.
1) The instability caused by this timing doesn't scale as you'd expect: it's equally unstable set at "0" as it is at "10", whereas you'd expect it to be more unstable at a lower value.
2) The instability itself is not typical of memory instability: instead of getting a BSOD or a reboot, the system simply locks up.
3) The instability seems to be very particular to certain benchmarks. I am perfectly capable of running SuperPi 2M as much as I want, and even copying large files doesn't cause the system to lock up, but after 2 seconds of SuperPi 4M the system hangs.
4) In a dual-channel configuration, there's absolutely no issue whatsoever with this timing: a value of "0" is perfectly stable.
5) The issue cannot be solved by increasing the voltage or loosening the timings. The setting "10" is as unstable at 1.8V Vdimm and 1.6V VQPI/DRAM as it is at 1.65/1.50V.
A fellow overclocker has tested the influence of this timing on 32M performance, and changing the value from 0 to 12 has a negative effect: at loop 4, the benchmark is already 1 second slower. Taking into account that the benchmark has 24 loops in total, that would mean changing this timing costs you about 6 seconds, maybe 7. And, coincidence or not, this is almost exactly what I'm losing compared to the best scores. This "finding" also occurs on other motherboards: initially I was using the DFI LANParty DK X58-T3eH6 for testing purposes and again had times around 8 minutes 50 seconds. But this board gives me perfect 200/2000 stability out of the box ... so I assume the Back-to-Back CAS Delay timing is set to 12 by default on this motherboard to maximize stability.
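To make that extrapolation explicit, here's a minimal sketch of the arithmetic. The 1-second deficit at loop 4 and the 24-loop total come from the test above; the assumption that the penalty accumulates roughly linearly across the loops is mine:

    # Rough extrapolation of the 32M time lost when raising B2B from 0 to 12.
    # Assumes the penalty accumulates roughly linearly over the loops.
    loops_total = 24        # SuperPi 32M runs 24 loops in total
    loops_measured = 4      # at loop 4 ...
    seconds_behind = 1.0    # ... the run was already 1 second slower

    penalty_per_loop = seconds_behind / loops_measured
    total_loss = penalty_per_loop * loops_total
    print(f"Estimated total loss: ~{total_loss:.0f} seconds")   # ~6 seconds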
In short: the Back-to-Back CAS Delay timing has a significant effect on the stability of your memory overclock, but it seems to affect performance as well. On the next page, we try to find out just how much effect the timing has on performance.
Q: "On page 1 it sounds like no matter what setting you use, its not stable."
A: No, what I say on page 1 is that one of the weird characteristics of the issue is that the instability doesn't scale. Raising the timing by one doesn't, by definition, give you a more stable system. I tried everything from 0 up to 10 and every single setting was equally unstable: 2M no problem, 4M crash after 1 or 2 loops. The non-scaling characteristic also shows at the point where it becomes stable: at 11 I couldn't run 4M, at 12 I could run anything.
Q: "So you mean, at some point losening up b2b doesnt improve stability, that memory speed is simply unstable and changing b2b doesnt change that."
A: That's something I forgot to mention in the article, well, at least forgot to mention explicitly. It's indeed true that the instability only shows under load ... and apparently not under all loads. As the article states: 2M was completely stable, copying files as well ... 4M crashed after 2 loops or so.
But changing the B2B value definitely helps to get more stability, no doubt about that. The problem is that it doesn't scale like you'd expect. tRAS, for instance, you can increase by one and make the system more stable. With B2B, that's not the case: it's either fully stable or half 'n' half.
For example:
0 - unstable
4 - unstable
8 - unstable
9 - unstable
10 - stable
11 - stable
So, 0 is just as unstable as 9.
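Because the behavior is a clean cutoff (every value below some threshold is unstable, everything at or above it is stable), you can find the lowest stable value with a binary search instead of stepping through one value at a time. Below is a minimal sketch under that assumption; is_stable() is a hypothetical helper that would set the timing, reboot, and run your load test (e.g. a few loops of SuperPi 4M), and the 0-31 range is an assumption, not a documented limit:

    def find_b2b_threshold(is_stable, lo=0, hi=31):
        # Binary-search the lowest stable B2B value, assuming a clean
        # cutoff: unstable below the threshold, stable at or above it.
        if not is_stable(hi):
            return None               # even the loosest setting fails
        while lo < hi:
            mid = (lo + hi) // 2
            if is_stable(mid):
                hi = mid              # mid works: threshold is mid or lower
            else:
                lo = mid + 1          # mid fails: threshold is above mid
        return lo

    # Simulated run with the threshold at 10, as in the list above:
    print(find_b2b_threshold(lambda b2b: b2b >= 10))   # -> 10

For the values above, that pins down the threshold in a handful of test runs instead of booting through every single value.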
Q: "I don't understand the Performance scaling graph on page two"
A: I have had a colleague ask me about the graph as well, haha. Basically, I calculated the effect of changing each variable in the different tests. The longer the bar, the more effect a certain variable has in that particular test. It's a different representation of the five other performance charts. I thought it would be clearer, but apparently the opposite is true. So, for instance, in the Everest-Copy benchmark, the B2B timing has the most effect.
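The answer doesn't spell out how the "effect" was calculated, so here is one plausible interpretation as a minimal sketch: per benchmark, take the spread between the best and worst score while a single variable is swept. All numbers and variable names below are placeholders, not the article's data:

    # Hypothetical scores per benchmark while sweeping one variable at a time.
    # "Effect" of a variable = spread (max - min) of the scores it produces;
    # a bigger spread means a longer bar in the chart.
    results = {
        "Everest-Copy": {
            "B2B":  [4100, 4350, 4600],   # MB/s at different B2B settings
            "tRAS": [4450, 4500, 4520],
            "CL":   [4400, 4480, 4550],
        },
    }

    for bench, variables in results.items():
        print(bench)
        for name, scores in variables.items():
            effect = max(scores) - min(scores)
            print(f"  {name}: effect = {effect}")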
Q: "I find the title very misleading, i like the article but its not at all what i expected to find with that title"
A: Sorry about that.