32M and i7, a motherboard's choice?
Old 7th May 2009, 20:55   #1
[M] Reviewer
 
Join Date: Nov 2004
Location: Waregem
Posts: 6,466
Massman Freshly Registered

1. Background story

For the last two weeks I've been spending more time than I actually have available on 32M, basically to get a well-tweaked run to add to the list in the low-clock challenge thread over at the OCX benchmark team subforum: OCX SuperPi low-clock challenge...

Despite Sergio's and Sam's efforts to help me with the tweaking part of the 32M run, I just can't seem to hit a decent score. I know, I'd be an idiot if I said "I can't do it, so no one can", but ... the difference is quite big. I know I'm not the best tweaker around, but a difference of 10+ seconds is A LOT, especially since I'm doing every tweak I'm supposed to do.

2. Prelude to Theory

There are a few possible explanations for this issue. First of all, it's possible that I'm over-tweaking my OS and making it way slower. It's also possible that I'm not doing the Copy-Waza correctly or that my RAM is incorrectly tweaked. However, the issue does not seem to be related to the OS (I tried multiple installations, OS versions and languages), the method of tweaking (I got some excellent tips from Sergio and Sam) or this particular motherboard/memory configuration (I tried three different motherboards). So, at the risk of making a complete idiot of myself, I'd say the problem is located somewhere else.

Looking at the results in the low-clock challenge, it seems that one brand of motherboard is hitting the top spots: Asus. Either the P6T-WS, the P6T6-WS or the R2E ... there are only a few people not on an Asus board. And that is weird.

Now, from my own testing, I consider 8 minutes 50 seconds relatively good and 8 minutes 47 seconds fully tweaked. As you can see, there's still a gap of ~10 seconds to be explained, something that in my opinion cannot be explained by an "it's crappy copy-waza" reason. At the moment I'm in the dark, so the theory I'm going to put forward is definitely open to discussion. Please, if I'm wrong, tell me the hard way, and if you've got something better, please elaborate.

3. Core of Theory

On the Asus motherboards, there are two CPU tweaks that can be enabled or disabled: 'Hardware Prefetcher' (HP) and 'Adjacent Cache Line Prefetch' (ACLP). As tested by Linky today, these two have a fairly large effect on the speed at which 32M is calculated at a given frequency.

Without: 9m55.953
With HP: 9m33.75
With ACLP: 9m51.719
With both: 9m28.435
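
To make the size of each effect easier to read, here is a minimal sketch that converts the four times above to seconds and prints the time saved relative to the run with both options disabled. The helper name `to_seconds` and the dictionary layout are illustrative assumptions; only the four times come from the post.

```python
def to_seconds(stamp: str) -> float:
    """Convert a SuperPi time such as '9m55.953' into seconds."""
    minutes, seconds = stamp.split("m")
    return int(minutes) * 60 + float(seconds)


# Linky's four runs, copied from the list above.
runs = {
    "Without": "9m55.953",
    "With HP": "9m33.75",
    "With ACLP": "9m51.719",
    "With both": "9m28.435",
}

baseline = to_seconds(runs["Without"])

# Seconds saved by each setting compared to the run with both options off.
gains = {}
for label, stamp in runs.items():
    saved = baseline - to_seconds(stamp)
    gains[label] = round(saved, 3)
    print(f"{label:<10} {stamp:>9}  saved {saved:6.3f} s ({saved / baseline:.1%})")
```

On these numbers the hardware prefetcher accounts for most of the gain (roughly 22 seconds), with adjacent cache line prefetch adding roughly 4 more on its own.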

A pretty interesting difference, I'd say. Now, knowing that:

1) These settings hardly make a difference in 3D benchmarks
2) SuperPi 32M is more than ever a cache-related benchmark, in which memory tweaking becomes less and less important once the memory frequency goes over 1GHz

I'd say these two settings change how the CPU's cache is managed. In other words: these are two cache-improving features, and they are the reason why Asus motherboards are so much faster.

This theory actually fits most of the results from the low-clock challenge. Most, but not all ... and as you all know, it's the results that don't fit a theory that are the most interesting. Instead of throwing away the theory, let's try to adjust it so the results fit.

4. Problematic scores: Part One
- Sam & kiwi - P6Tdlx,3x1Gb@1000MHz@7-6-6 - 8:49.031
- Onepagebook - P6Tdlx,3x2Gb@1000MHz@8-8-7 - 8:49.360
- DeDaL - P6Tdlx,3x1Gb@1000MHz@7-7-7 - 8:53.500

Three of the bottom results of the low-clock challenge. In itself, nothing strange: someone always has to come last. However, these three results (and more, actually; I couldn't list them all) were set by excellent tweakers, yet they are in the range of what I'm hitting and, more importantly, are light-years away from the 8min40's we see at the top of the ranking. So, either these people lost their skills (it's Asus, so it should be fast) or there's something else going on.

Looking at the screenshots of these three results, one thing should catch your attention: all BIOS versions used are from before 2009. Going over the different BIOS releases on the Asus support site, it seems that all these BIOSes have something in common: no support for the D0 revision. And this actually makes a lot of sense: after all, it was in January '09 that Intel sent information about the new D0 revision to the motherboard manufacturers.

5. Problematic scores: Part Two
- OC_windforce - DFI UT,3x2Gb@1000MHz@7-7-6 - 8:41.344
- clon22 - MSI Eclipse,3x2Gb@1110MHz@8-9-8 - 8:43.172

These two scores don't fit the theory, because they are in fact a lot faster than I have been able to run: 8 and 5 seconds to be precise. Now, the following explanations are even less solid than the one I used in part one of the problematic scores, which is why I placed them in a second category. For the first result, I'd say the D0 revision already has the cache improvements on board and thus the extra boost you'd get from enabling the two prefetchers is non-existent. To come up with this explanation, I remembered what Kevin told me when I blocked his 3DMark05 result on Hwbot: "the D0 revision has instruction set improvements, so the result is better".

For the second result, I'd say the 400MHz extra on the uncore and 100MHz extra on the memory could explain the 4 extra seconds I lose.
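
For reference, a small sketch of the gaps involved. The post does not spell out which of the two personal times from section 2 (8:50 "relatively good", 8:47 "fully tweaked") the quoted differences are measured against, so this simply computes the gap to both. The names `own_runs`, `others` and `gaps` are illustrative assumptions; the times are copied from the post.

```python
def to_seconds(stamp: str) -> float:
    """Convert a time such as '8:41.344' (minutes:seconds) into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


# The two reference times from section 2 of the post.
own_runs = {"relatively good": "8:50", "fully tweaked": "8:47"}

# The two non-Asus results listed in Part Two.
others = {"OC_windforce": "8:41.344", "clon22": "8:43.172"}

# Gap in seconds between each reference time and each listed result.
gaps = {}
for name, stamp in others.items():
    for label, own in own_runs.items():
        gap = to_seconds(own) - to_seconds(stamp)
        gaps[(name, label)] = round(gap, 3)
        print(f"{name:<13} vs {own} ({label}): {gap:.3f} s faster")
```

Which pairing applies changes the gap by three seconds, so it is worth stating the reference run explicitly when comparing results.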

6. Conclusion: Formulating the Theory

So, to sum up:

1) It seems that only Asus motherboards are capable of extremely efficient 32M runs
2) Enabling the two prefetcher options seems to be beneficial for SuperPi, but hardly for 3D
3) These prefetcher boosts are not seen on BIOSes released without D0 support, but are noticed on configurations with both C0/C1 and D0 processors
4) Motherboards with BIOSes that don't have these prefetcher options are slow in combination with a C0/C1, but fast with a D0.

Conclusion: the two prefetcher options give users an advantage of 5~10 seconds over the competition; they contain processor cache-management optimizations that came with the D0 processor and seem to affect the C0/C1 processors only, because these optimizations are already in the D0 revision.

(cross-posting for reference only)
Old 10th May 2009, 01:37   #2
[M] Reviewer
 
Join Date: Dec 2008
Posts: 3,209
leeghoofd Fully Registered

Where is this gain? In the first iterations? I have had issues getting the same time with your RE II. My older Gigabyte (UD5) mobo kicks the RE II's *** in the first iteration by about 3-5 secs...
Old 10th May 2009, 11:12   #3
[M] Reviewer
Massman Freshly Registered

Overall: it's clock-per-clock faster (the 1st iteration doesn't matter; start comparing from the 2nd loop)
Old 11th May 2009, 10:44   #4
[M] Reviewer
 
Join Date: Sep 2005
Posts: 1,887
thorgal Freshly Registered

Very interesting and informative at the same time.

I suggest you post this over at i4-memory as well, or I'll do it if you want me to; maybe George could have a look at this.
Old 11th May 2009, 10:48   #5
[M] Reviewer
Massman Freshly Registered

Maybe George should have a look at the Madshrimps forums
Old 11th May 2009, 10:56   #6
[M] Reviewer
thorgal Freshly Registered

Quote:
Originally Posted by Massman
Maybe George should have a look at the Madshrimps forums

Hehe, we can only ask
Old 15th May 2009, 10:26   #7
[M] Reviewer
Massman Freshly Registered

I'm currently looking into an explanation that uses the back-to-back CAS delay timing as the cause of the performance slow-down. More will follow.