Our first hands-on experience with NVIDIA's Fermi comes from the Sparkle GTX 470, a high-end video card which we compare against a whole bunch of previous and current generation competitors. Is the next generation from NVIDIA worth the money? Read on to find out.
After a very long delay, Nvidia have finally released their 40nm high performance GPUs, with the intention of taking the crown from AMD. The GPU is code-named “Fermi”, fully supports DirectX11 and has an incredible 3200 million transistors.
Most people expected the flagship, GTX 480, to be launched with 512 Shader Processors, but instead it has 480. The 512 Shader variant went to Quadro video cards, which cost an arm and a leg.
The GF100 chip core layout is divided into 4 GPCs (Graphics Processing Clusters); each GPC has four SMs (Streaming Multiprocessors), each of which incorporates 32 Shader Processors (or CUDA cores). This totals 512 Shader Processors, like the Quadro cards have.
The GTX 470, which has 64 of those cores disabled, also has its ROPs, L2 cache and memory controllers cut down.
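The core counts above follow from simple multiplication. Here is a minimal sketch of that arithmetic; it assumes that cores are disabled in whole-SM units (32 at a time), which the text does not state explicitly, and the function name `cores` is purely illustrative.

```python
# Shader Processor (CUDA core) counts implied by the GF100 layout described above.
GPCS = 4            # Graphics Processing Clusters per chip
SMS_PER_GPC = 4     # Streaming Multiprocessors per GPC
CORES_PER_SM = 32   # Shader Processors per SM


def cores(disabled_sms: int = 0) -> int:
    """Return the active core count when `disabled_sms` SMs are switched off.

    Assumption: cores are only ever disabled a whole SM at a time.
    """
    active_sms = GPCS * SMS_PER_GPC - disabled_sms
    return active_sms * CORES_PER_SM


full_chip = cores()    # all 16 SMs active
gtx_480 = cores(1)     # one SM disabled
gtx_470 = cores(2)     # two SMs disabled, i.e. the "64 cores missing"

print(full_chip, gtx_480, gtx_470)  # prints: 512 480 448
```

Under that assumption, the GTX 480 is a full chip minus one SM and the GTX 470 is a full chip minus two.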
With this modular design, it will be easier to create future GPUs with fewer and fewer Shader Processors to cover all the mainstream and entry-level markets.
In the Polymorph Engines shown in the diagram, we can see five stages:
These stages process the information from the Streaming Multiprocessor they are associated with. The data then goes to the Raster Engine, which has three pipeline stages, each passing its data on to the next.
In total, there are 16 Polymorph Engines, one for each SM, and 4 Raster Engines, one for each GPC.
The latest GPUs from Nvidia fully support DirectX11 instructions which include GPGPU (DirectCompute 11), tessellation and improved multi-threading; they also come with Shader Model 5, better shadows and HDR texture compression.
Tessellation, as described on the Unigine website, is a “scalable technology aimed for automatic subdivision of polygons into smaller and finer pieces, so that developers can gain a more detailed look of their games almost free of charge in terms of performance. Thanks to this procedure, the elaboration of the rendered image finally approaches the boundary of veridical visual perception: the virtual reality is vivified at your fingertips delivering engaging gaming experience.”
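To make the "subdivision of polygons into smaller and finer pieces" more concrete, here is a small CPU-side sketch that splits each triangle into four at its edge midpoints. It only illustrates the idea: it is not how the DirectX11 hardware tessellator or Unigine implement it, real tessellation also displaces the new vertices to add detail, and all names in the code are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, ...]
Triangle = Tuple[Point, Point, Point]


def midpoint(a: Point, b: Point) -> Point:
    """Return the point halfway between a and b."""
    return tuple((p + q) / 2.0 for p, q in zip(a, b))


def subdivide(tri: Triangle) -> List[Triangle]:
    """Split one triangle into four smaller ones using its edge midpoints."""
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]


def tessellate(mesh: List[Triangle], levels: int) -> List[Triangle]:
    """Apply the 1-to-4 split `levels` times to every triangle in the mesh."""
    for _ in range(levels):
        mesh = [small for tri in mesh for small in subdivide(tri)]
    return mesh


coarse = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))]
fine = tessellate(coarse, 3)
print(len(coarse), "->", len(fine))  # prints: 1 -> 64
```

Each level multiplies the triangle count by four, which is why doing this work on dedicated GPU hardware instead of storing the finer mesh up front is attractive.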
Here is a modeled house inside the Unigine Heaven benchmark, without and with the tessellation feature enabled:
Multi-threaded rendering is similar to the techniques used on current CPUs. If a shader or an instruction has to be queued up, the process creates a delay. Current GPUs can now process data in a fully threaded manner, which brings better overall performance.
The DirectCompute feature allows access to the GPU for stream computing; it shares a range of computational interfaces with its competitors: OpenCL and CUDA.
3D Vision Surround and Nvidia Surround are the responses to ATI’s Eyefinity. Nvidia Surround allows 3 separate monitors to be used in 3D applications with an SLI setup. This feature is backwards compatible with the GT200 and GF100 series and allows a maximum resolution of 2560x1600 per monitor. 3D Vision Surround offers stereoscopic viewing on 3 monitors at once. To use this feature we also need three 120Hz displays of the same make and model to ensure uniformity; the system must be very powerful, because it needs to process six high-res 1920x1080 images, two for each 120Hz monitor.
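A rough back-of-the-envelope calculation shows why the system has to be so powerful. The sketch below assumes that each 120Hz monitor alternates between the left-eye and right-eye image, so each eye gets 60 images per second; that split is our assumption for illustration, and the variable names are made up.

```python
# Pixel workload for 3D Vision Surround, using the figures quoted above.
MONITORS = 3
EYES = 2                   # one image per eye for stereoscopic viewing
WIDTH, HEIGHT = 1920, 1080
REFRESH_HZ = 120

images_per_stereo_frame = MONITORS * EYES        # the "six images"
pixels_per_image = WIDTH * HEIGHT
pixels_per_stereo_frame = images_per_stereo_frame * pixels_per_image

# Assumption: the 120Hz refresh is shared between both eyes (60 per eye).
stereo_frames_per_second = REFRESH_HZ // EYES
pixels_per_second = pixels_per_stereo_frame * stereo_frames_per_second

print(images_per_stereo_frame)    # prints: 6
print(pixels_per_stereo_frame)    # prints: 12441600
print(pixels_per_second)          # prints: 746496000
```

That is roughly 12.4 million pixels for every stereo frame, six times the load of a single 1920x1080 display.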
About Nvidia PhysX
Delivering physics in games is no easy task. It's an extremely compute-intensive environment based on a unique set of physics algorithms that require tremendous amounts of simultaneous mathematical and logical calculations. This is where NVIDIA® PhysX™ Technology and GeForce® processors come in. NVIDIA PhysX is a powerful physics engine which enables real-time physics in leading edge PC and console games. PhysX software is widely adopted by over 150 games, is used by more than 10,000 registered users and is supported on Sony Playstation 3, Microsoft Xbox 360, Nintendo Wii and PC. In addition, PhysX is designed specifically for hardware acceleration by powerful processors with hundreds of cores. Combined with the tremendous parallel processing capability of the GPU, PhysX will provide an exponential increase in physics processing power and will take gaming to a new level delivering rich, immersive physical gaming environments with features such as:
* Explosions that cause dust and collateral debris
* Characters with complex, jointed geometries for more life-like motion and interaction
* Spectacular new weapons with incredible effects
* Cloth that drapes and tears naturally
* Dense smoke & fog that billow around objects in motion
The only way to get real physics with the scale, sophistication, fidelity and level of interactivity that dramatically alters your entertainment experience will be with one of the millions of NVIDIA PhysX-ready GeForce processors.
Here are some PhysX demos from YouTube:
Fluid demo:
The Great Kulu:
Deformable objects:
NVIDIA® CUDA™ is a general purpose parallel computing architecture that leverages the parallel compute engine in NVIDIA graphics processing units (GPUs) to solve many complex computational problems in a fraction of the time required on a CPU. It includes the CUDA Instruction Set Architecture (ISA) and the parallel compute engine in the GPU. To program to the CUDA™ architecture, developers can, today, use C, one of the most widely used high-level programming languages, which can then be run at great performance on a CUDA™ enabled processor. Other languages will be supported in the future, including FORTRAN and C++.
With over 100 million CUDA-enabled GPUs sold to date, thousands of software developers are already using the free CUDA software development tools to solve problems in a variety of professional and home applications – from video and audio processing and physics simulations, to oil and gas exploration, product design, medical imaging, and scientific research.
Technology features:
* Standard C language for parallel application development on the GPU
* Standard numerical libraries for FFT (Fast Fourier Transform) and BLAS (Basic Linear Algebra Subroutines)
* Dedicated CUDA driver for computing with fast data transfer path between GPU and CPU
* CUDA driver interoperates with OpenGL and DirectX graphics drivers
* Support for Linux 32/64-bit and Windows XP 32/64-bit operating systems
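For readers who have never seen the CUDA programming model, here is a plain Python emulation of how a CUDA-style kernel is written once and then run by many threads, each picking its own array element from its block and thread index. This is only a sketch: it runs sequentially on the CPU, uses no NVIDIA library, and `saxpy_kernel` and `launch` are hypothetical names. The parameters `block_idx`, `block_dim` and `thread_idx` mirror CUDA's built-in `blockIdx`, `blockDim` and `threadIdx`.

```python
def saxpy_kernel(block_idx, block_dim, thread_idx, n, a, x, y):
    """Body executed by every emulated thread: y[i] = a * x[i] + y[i]."""
    i = block_idx * block_dim + thread_idx   # global index of this thread
    if i < n:                                # guard: last block may overshoot
        y[i] = a * x[i] + y[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Emulate a kernel launch by looping over every block and thread.

    On a real GPU these iterations would run in parallel on the CUDA cores.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


n = 10
x = [float(i) for i in range(n)]
y = [1.0] * n

block_dim = 4                                  # threads per block
grid_dim = (n + block_dim - 1) // block_dim    # enough blocks to cover n

launch(saxpy_kernel, grid_dim, block_dim, n, 2.0, x, y)
print(y)
```

The bounds check inside the kernel matters because 3 blocks of 4 threads give 12 emulated threads for only 10 elements.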