Thursday, 10 July 2014

QUADRO FOR DESIGN AND MANUFACTURING

The NVIDIA Quadro K2000 graphics board offers the perfect blend of performance and the latest productivity-enhancing technical innovations at a very reasonable cost for a wide range of leading professional applications. It features outstanding NVIDIA Kepler architecture performance and quad display support to enhance professional productivity and creativity. It provides 2GB of GDDR5 GPU memory, 384 SMX CUDA parallel processing cores, the ability to drive up to four displays simultaneously, and full Shader Model 5 compatibility, all in a single-slot form factor that requires no auxiliary power to deliver full performance.
Designed and built specifically for professional workstations, NVIDIA Quadro GPUs power more than 150 professional applications across a broad range of industries including manufacturing, media and entertainment, sciences, and energy. Professionals trust them to realize their most ambitious visions – whether it's product design, visualization and simulation, or spectacular visual storytelling – and get results to market faster, more profitably, and with superior visual quality.
  • ULTRA-QUIET DESIGN: Silent cooling design enables acoustics lower than 28 dB for an ultra-quiet desktop environment.
  • KEPLER GPU ARCHITECTURE: NVIDIA's Kepler GPU architecture has been designed from the ground up not just for maximum performance in the latest DirectX 11 games, but also for optimal performance per watt. The new SMX streaming multiprocessor is twice as efficient as the prior generation and the new geometry engine draws triangles twice as fast. The result is world-class performance and the highest image quality in an elegant and power-efficient graphics card.
  • NVIDIA CUDA TECHNOLOGY: Parallel-computing architecture that tightly integrates advanced visualization and compute features to significantly accelerate professional workflows. Quadro solutions leverage general-purpose GPU computing using standard programming languages like C/C++ and Fortran, and emerging APIs such as OpenCL and DirectCompute. This broad adoption of CUDA accelerates techniques like ray tracing, video and image processing, and computational fluid dynamics.
  • TWO NEW ANTI-ALIASING MODES: FXAA AND TXAA: Anti-aliasing smooths out jagged edges but can be demanding on frame rates. FXAA is a new anti-aliasing technology that produces beautiful smooth lines with minimal performance impact. And with Kepler-based GPUs, you'll be able to enable FXAA in hundreds of game titles through the NVIDIA Control Panel. The second mode, TXAA, is an in-game option that combines MSAA, temporal filtering, and post-processing for even higher visual fidelity.
  • NVIDIA 3D VISION AND 3D VISION PRO: Advanced active shutter glasses deliver crystal-clear stereoscopic 3D visualization for the most immersive experience. Infrared (3D Vision) or RF (3D Vision Pro) technology enables a range of immersive environments from your desktop workstation to collaborative work spaces. 3D Vision and 3D Vision Pro are sold separately.

QUADRO K2000 QUICK SPECS
CUDA Parallel-Processing Cores: 384
Frame Buffer Memory: 2 GB GDDR5
Max Power Consumption: 51 W
Graphics Bus: PCI Express 2.0 x16
Display Connectors: DVI-I (1), DP 1.2 (2)
Form Factor: 4.376" H x 7.97" L, Single Slot




The Quadro K5000 drives four displays natively, using its 2 x DisplayPort and 2 x DVI ports. The Quadro K2000 and K4000 support three displays natively (2 x DP + 1 x DVI), but can bump this up to four by using the multi-streaming feature of DisplayPort 1.2, where displays are daisy-chained. The Quadro K600 is fixed at two displays (1 x DVI and 1 x DisplayPort).
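
The display limits described above can be written down as a small lookup. The Python sketch below is illustrative only: the table simply restates the figures from the preceding paragraph, nothing is queried from a driver, and the names `DISPLAY_LIMITS` and `max_displays` are hypothetical.

```python
# Display limits as stated in the text above (not read from any driver).
# Each entry: (displays driven natively, maximum with DisplayPort 1.2 multi-streaming)
DISPLAY_LIMITS = {
    "Quadro K5000": (4, 4),  # 2 x DP + 2 x DVI, all native
    "Quadro K4000": (3, 4),  # 2 x DP + 1 x DVI, fourth display via daisy chain
    "Quadro K2000": (3, 4),  # 2 x DP + 1 x DVI, fourth display via daisy chain
    "Quadro K600": (2, 2),   # 1 x DVI + 1 x DP, fixed at two
}


def max_displays(card: str, daisy_chain: bool = False) -> int:
    """Return how many displays a card can drive, per the table above."""
    native, with_multistream = DISPLAY_LIMITS[card]
    return with_multistream if daisy_chain else native
```

For example, `max_displays("Quadro K2000")` gives the native count, while passing `daisy_chain=True` gives the count with DisplayPort 1.2 daisy-chaining.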


Performance


SPECviewperf 11.0

In our tests of professional graphics cards, we first run SPECviewperf, which has become an industry standard for benchmarking graphics workstations. SPECviewperf sends predefined sequences of OpenGL instructions to the graphics card driver, describing rotations of complex models typical of various professional applications. By simulating these fairly primitive operations, the benchmark reports the geometry performance of a graphics card, which is determined by hardware features as well as driver optimizations.
The scripts integrated into SPECviewperf version 11 emulate the user’s work in wireframe mode in the following professional applications (the names of corresponding tests are given in brackets): LightWave (lightwave-01), CATIA (catia-03), EnSight (ensight-04), Maya (maya-03), Pro/ENGINEER (proe-05), SolidWorks (sw-03), Siemens Teamcenter Visualization Mockup (tcvis-02) and Siemens NX (snx-01).
For all its downsides, SPECviewperf 11.0 can show us the effect of full-screen antialiasing on performance. In the diagrams below you can see how the SPECviewperf scores change when we enable various FSAA modes.


The professional cards from AMD and Nvidia behave differently with full-screen antialiasing. Nvidia's cards slow down as the level of FSAA increases, which is just what we might expect. The AMD FirePro series, however, often delivers the same performance irrespective of FSAA level. Moreover, it does not support FSAA levels higher than x16. All of this makes us suspect that the FirePro W series driver shares a large amount of code with the gaming Catalyst driver. In SPECviewperf, the FirePro series behaves like gaming rather than professional cards.

Power Consumption

In this section we want to show you how much power is needed by the complete systems (without the monitor) equipped with the tested graphics cards. We measure the power consumption by means of the Corsair AX1200i power supply. The result is the sum total of the consumption of each system component. The PSU’s efficiency doesn’t affect it.
There are two test modes: idle and high load (FurMark 1.9.2 running in Burn mode in a 1280x720 window). We use FurMark for the high-load mode since, like most professional applications, it is based on OpenGL and is a heavy load indeed.
Comparing professional cards from the same price categories, the Quadro series looks far more economical. The Quadro K5000 configuration needs almost 50 watts less than its FirePro W8000 counterpart. The Quadro K4000 consumes 34 watts less than the FirePro W7000. Nvidia puts an emphasis on this fact, counting the energy efficiency of its Quadro series among its advantages. This is to be expected, as the GPUs of the Quadro cards do not have maximum specs and the cards themselves don’t have two additional power connectors.
The FirePro series, for its part, is close to gaming cards in its design. That’s why its heat dissipation and power consumption are comparable to those of the Radeon HD series, which means that AMD-based workstations are going to be noisier and hotter than their Nvidia-based counterparts.
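
To put the wattage gaps quoted above into perspective, here is a rough worked example in Python. It is only a sketch: the function name `annual_kwh_saved` is hypothetical, and the usage pattern (8 hours a day, 250 days a year, always at the full FurMark-style load) is an assumption rather than a measurement, so real savings would likely be lower.

```python
def annual_kwh_saved(watts_saved: float,
                     hours_per_day: float = 8,
                     days_per_year: int = 250) -> float:
    """Energy saved per year in kWh for a given power difference under load.

    The default hours and days are assumptions, not figures from the tests.
    """
    return watts_saved * hours_per_day * days_per_year / 1000


# "Almost 50 watts" (K5000 vs W8000) is treated as an upper bound of 50 W.
print(annual_kwh_saved(50))  # 100.0
# 34 W is the K4000 vs W7000 difference quoted above.
print(annual_kwh_saved(34))  # 68.0
```

Multiplying the result by a local electricity price gives an approximate yearly cost difference per workstation.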



Benchmark results


3ds Max 2013
For our 3ds Max benchmarks we have three high-poly scenes all running in the Nitrous viewport mode with realistic shading enabled. Our SL500 scene has reflections enabled and multiple light sources. The SDF-1 Scene also has reflections enabled, and a little under 2.8 million polygons. And lastly, the steampunk tank benchmark draws over 1,050 objects totalling 6.9 million polys.


[Chart: 3ds Max 2013 benchmark results]



Back in 2011, 3ds Max greatly favoured Nvidia hardware over AMD GPUs. This time, as a whole, the AMD cards perform as well as, or slightly better than, their Nvidia competitors. Both manufacturers’ current-generation cards also outperform their previous-generation counterparts by significant margins.
Maya 2013
All of the Maya benchmark scenes are run in Viewport 2.0 shading mode with ambient occlusion and shadows enabled. The landing pad scene has relatively low polygon counts, but is shader-intensive; while the troll statue and GTR scene are polygon-intensive.
[Chart: Maya 2013 benchmark results]
As in the 2011 tests, the AMD cards take an overall lead here, albeit by smaller margins, with the current-generation FirePros outperforming the current-generation Quadros in two of the three benchmarks.
Softimage 2013
The Softimage benchmarks consist of a scene pushing 3.6 million polygons with textures enabled, and a scene that is less polygon- and texture-intensive, but running in High Quality display mode with reflections, ambient occlusion and shadows enabled.
[Chart: Softimage 2013 benchmark results]
Like 3ds Max and Maya, Softimage also favours the AMD hardware: by a small margin with the motorcycle scene, and a much larger one in the high-poly scene.
Modo 701
For Modo, we are benchmarking viewport performance with a mid-density car model, and a higher-polygon battleship model. Both models are rendered with the Advanced OpenGL mode.
[Chart: Modo 701 benchmark results]
Here, the banner changes hands, with the Nvidia cards taking a commanding lead over the AMD hardware. However, on every card, frame rates are more than usable in both tests. The differences in display modes make accurate comparisons difficult, but it’s noteworthy that Modo’s frame rates are significantly higher than those of any other 3D application on test, regardless of whether you are using AMD or Nvidia hardware.
LightWave 11.5
Our LightWave benchmark is a moderately complex test model running in the Textured Shaded Solid display.
[Chart: LightWave 11.5 benchmark results]
Like Modo, LightWave runs significantly faster on the Nvidia hardware. However, the only AMD card on which frame rate becomes a practical issue is the previous-generation FirePro V5900.
Cinema 4D R14
Our Cinema 4D benchmarks consist of two scenes, one static and one animated, run in Gouraud Shading mode.
[Chart: Cinema 4D R14 benchmark results]
The AMD cards take a lead in the car scene, but with the animated character, there is no clear winner or loser, with the AMD and Nvidia cards trading blows.
Blender 2.68
Our next display performance benchmark uses the open-source 3D animation package Blender 2.68. Both 3D scenes were tested using the Material viewport shading mode.
[Chart: Blender 2.68 viewport benchmark results]
Here, the Nvidia cards clearly come out on top. On the aircraft scene, the frame rates of the AMD cards are almost unworkable. If you are a Blender user, Nvidia hardware is definitely the way to go.
Mudbox 2013
For our Mudbox test, we have a model subdivided to 12.7 million polygons running in Mudbox’s standard viewport. We are testing both display performance and performance while sculpting.
[Chart: Mudbox 2013 benchmark results]
As with the other Autodesk software (3ds Max, Maya and Softimage), the AMD cards take the lead in terms of viewport performance, albeit by a narrow margin. With sculpting performance, the pattern is reversed. With such high polygon counts, the amount of on-board RAM is also significant: cards with more memory perform noticeably better than those with less.
Mari 2.0
Our Mari benchmark consists of a Fokker DR1 triplane model. The geometry comes in at just over 1.25 million polygons, with large 16,384 x 16,384-pixel painted textures. Again, we measured both display performance and performance while painting.
[Chart: Mari 2.0 benchmark results]
Mari has only just started supporting AMD hardware. The AMD and Nvidia cards trade blows on painting performance, but when it comes to display performance, the Nvidia cards clearly come out on top. If Mari forms a significant part of your workflow, Nvidia is still the best way to go.
UDK 10897
Our last two benchmarks are performed with real-time tools. The first is statistically the most popular development tool for real-time 3D and game development: the Unreal Development Kit, or UDK. Our test scene uses the DirectX 9 and 11 renderers.
[Chart: UDK benchmark results]
There is no clear winner here. The AMD hardware takes a slight lead with the DX11 renderer, while performance on the DX9 renderer is fairly similar between AMD and Nvidia cards.
CryEngine 3
Our second real-time benchmark is performed with another popular games content-creation tool: CryEngine 3. Like UDK, the CryEngine editor uses DirectX 11 as its primary display API.
[Chart: CryEngine 3 benchmark results]
Like UDK, CryEngine performs similarly on AMD and Nvidia hardware, with the AMD cards perhaps slightly ahead.
V-Ray 2.2
The first of our GPU computing benchmarks uses V-Ray RT, V-Ray’s GPU-accelerated interactive preview renderer. It supports OpenCL, so both Nvidia and AMD cards were benchmarked using this API. The first benchmark consists of a 7.6 million poly model with three lights and image-based lighting; the second, a 2.1 million poly scene with three lights. Both were rendered at 1,920 x 1,200 resolution.
[Chart: V-Ray RT benchmark results]
With the 7.6 million polygon troll statue scene, running in OpenCL, the Nvidia hardware takes the lead over the AMD cards by a decent margin. The light cycles scene shows a similar pattern until you get to any of the cards with 2GB of graphics memory or less, all of which perform almost identically – and significantly worse than the other cards. It seems likely that the scene needs more than 2GB of RAM, meaning that below this threshold, the render is dumped to the system’s CPUs.
One thing to note when using V-Ray RT and OpenCL is that the first render will be the slowest. Unlike CUDA, which is precompiled code, OpenCL applications must be compiled the first time you run them on a particular piece of hardware. This can be a lengthy process, taking between one and five hours on the Z820 for each new card. Once compiled, the cached data is saved to your hard drive, so all subsequent renders will be much faster.
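
The compile-once, reuse-afterwards behaviour described above can be illustrated with a small on-disk cache. This Python sketch is not V-Ray RT’s actual mechanism, which the review does not document. It assumes a cache keyed on the device name plus the kernel source, and `cached_build` and `fake_compile` are hypothetical names, with `fake_compile` standing in for a real (slow) OpenCL build step.

```python
import hashlib
from pathlib import Path


def cached_build(source: str, device_name: str, cache_dir: Path, compile_fn):
    """Return (binary, was_cached), compiling only when no cached copy exists.

    The cache key combines the device name and the source, so each new card
    triggers one fresh (slow) compile, and later runs load the saved result.
    """
    key = hashlib.sha256(f"{device_name}\n{source}".encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"{key}.bin"
    if cache_file.exists():
        return cache_file.read_bytes(), True  # cache hit: no compile needed
    binary = compile_fn(source, device_name)  # slow path, first run only
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_bytes(binary)
    return binary, False


def fake_compile(source: str, device_name: str) -> bytes:
    """Hypothetical stand-in for a real OpenCL kernel build."""
    return f"{device_name}:{len(source)}".encode("utf-8")
```

Calling `cached_build` twice with the same source and device name compiles on the first call and reads from disk on the second; changing the device name forces a new compile, which mirrors the per-card recompilation noted above.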
iray 2.1
Next, we have two scenes rendered with iray: the first consisting of just under 4.5 million polygons, and lit using a sunlight system and image-based lighting; the other with just under 1.6 million polygons, and lit using one light and image-based lighting. Both were rendered at 1,920 x 1,200 resolution.
Like V-Ray RT, iray can run on either CPUs or GPUs, but unlike V-Ray RT, is only accelerated on the GPU via CUDA. As a result, we could only run the iray benchmarks on the Nvidia GPUs.
[Chart: iray 2.1 benchmark results]
With the Ferrari scene, the previous-generation Quadro 6000 takes first place. The remaining cards perform in price order, with the current-generation cards slightly outperforming their equivalents from the previous generation. The results are similar for the star destroyer scene, only here the Quadro K4000 outperforms the more expensive previous-generation Quadro 5000.
Octane Render 1.20
Our Octane Render test scene consists of just under 700,000 polygons, lit with a sunlight system, and rendered at 1,000 x 600 resolution. Like iray, Octane is a CUDA-only application and therefore works only with Nvidia GPUs. Developer Otoy says that an OpenCL implementation is in the works, but that OpenCL is “currently not as mature as CUDA”.
[Chart: Octane Render 1.20 benchmark results]
The results of the Octane Render benchmark are similar to those of the iray Ferrari benchmark.
Blender 2.68
The Cycles renderer integrated into Blender supports both CPU and GPU. Although previous releases offered some support for OpenCL, it was experimental, and users had limited success in using AMD hardware. Since then, OpenCL development has been put on hold, with the Blender wiki citing difficulties in compiling the entire rendering kernel. As a result, we’ve only benchmarked the software using CUDA, and therefore only on the Nvidia cards. The test model consists of just over 170,000 polygons, and is rendered at 1,920 x 1,200 resolution.
[Chart: Blender Cycles benchmark results]
Again, the pattern of results is similar to that of the iray Ferrari benchmark and the Octane Render benchmark. Overall, CUDA performance is pretty consistent across different render engines, and across different 3D scenes.
LuxMark 2.0
LuxMark is an OpenCL benchmarking tool based on the open-source renderer LuxRender. There are three levels of complexity to choose from, but for this review, I selected the highest-detail interior scene.
[Chart: LuxMark 2.0 benchmark results]
Here, the AMD hardware comes out on top by quite some margin. All of the AMD cards outperform all of the Nvidia cards, with the exception of the Quadro 6000, which falls in between the FirePro W7000 and W5000.
Cinebench 11.5
As anyone who has read my previous reviews here on CG Channel will know, I am not a big fan of synthetic benchmarks, as they offer no real insight as to how a particular hardware set-up will perform in production. This is not the fault of the engineers who write them: it’s just that there are too many variables to account for.
Having said that, I have had several requests for Cinebench, and 3DMark, so I have included them both here. In the case of Cinebench, we are using only the OpenGL test.
[Chart: Cinebench 11.5 benchmark results]
In general, the AMD cards outperform the Nvidia cards here. However, the performance of the Nvidia cards is unusual, with the K4000 achieving a higher score than the K5000, and the previous-generation Quadro 5000 a higher score than either.
3DMark 11 Basic Edition
Our second synthetic benchmark, 3DMark 11, is geared towards gaming performance. However, it is very GPU-intensive, so it is a reasonable measure of overall 3D performance.
[Chart: 3DMark 11 benchmark results]
As with Cinebench, the AMD hardware takes most of the higher scores here. It’s interesting to note that even the mid-range AMD cards beat the mighty Nvidia Quadro 6000 – again, suggesting that synthetic benchmarks are skewed in ways that do not reflect real production conditions.

SLI – are two cards better than one?

Most people know that using Nvidia’s SLI technology (or AMD’s equivalent, CrossFire) to run two or more graphics cards side by side can increase 3D performance in games. I am often asked if it has the same kind of benefits for DCC applications. Until now my answer has always been, “I’m not sure: I’ve never benchmarked it.” But this time around, I happened to have a pair of Quadro 5000s on-hand to do just that. I compared the performance of a single Quadro 5000 to two cards running under SLI on a range of the previous benchmarks.
[Chart: Quadro 5000 single card vs SLI benchmark results]
While running a second card under SLI does increase viewport performance, the increase in frame rate is small. Speaking personally, I don’t feel it justifies the extra cost or the extra power consumption. However, as you can see from the V-Ray RT, iray, Blender Cycles and LuxMark tests, adding an extra card makes a considerable difference to GPU compute performance.
My conclusion? If you just want better viewport frame rates, you’d be better off buying a more powerful single card. But if you want to do a lot of GPU-accelerated rendering, adding a second card will give you a good-to-significant performance boost.
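
One way to quantify "small" versus "considerable" gains from a second card is to compute the speedup and the per-card scaling efficiency. The Python sketch below shows the arithmetic only. The function names are hypothetical, and no figures from the charts are reproduced here, so you would plug in your own measurements.

```python
def speedup(single: float, dual: float, higher_is_better: bool = True) -> float:
    """Speedup of the two-card setup over one card.

    Use higher_is_better=True for frame rates (fps) and
    higher_is_better=False for render times (seconds).
    """
    return dual / single if higher_is_better else single / dual


def scaling_efficiency(single: float, dual: float, cards: int = 2,
                       higher_is_better: bool = True) -> float:
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    return speedup(single, dual, higher_is_better) / cards
```

A viewport result that barely moves gives an efficiency close to 0.5 with two cards, while a render time that nearly halves gives an efficiency close to 1.0.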

The verdict

As you might expect, both AMD’s and Nvidia’s current-generation cards outperform their previous-generation equivalents. The only set of tests in which a previous-generation card came out on top was the one using CUDA-based renderers, in which the Quadro K5000 was outperformed by the (considerably more expensive) previous-generation Quadro 6000.
Beyond that, there isn’t a clear winner here. The AMD FirePros came out on top more often than Nvidia’s Quadros in viewport display performance, and they are typically less expensive. However, with GPU computing, the Quadro cards take the lead, and while OpenCL is now supported more widely than it was two years ago, there are still a number of applications whose GPU-accelerated functionality requires CUDA – and therefore Nvidia hardware.
As a result, the professional card that works best for you will depend on several factors, including which software packages you use regularly, whether you are looking for viewport or GPU compute performance, and your power draw and price restrictions. However, here are a few of my personal recommendations.
If viewport display performance is your main focus, and you are using any of the Autodesk applications or Cinema 4D, the AMD cards, especially the FirePro W7000 and W8000, are your best bet. Of the two, the W8000 offers better performance, although the much cheaper W7000 almost matches it in several benchmark tests. But if Modo, LightWave, Blender or Mari are significant parts of your production pipeline, an Nvidia card would be the better choice. As with the FirePro W8000 and W7000, the performance difference between the Quadro K5000 and K4000 is slight-to-moderate, but the price difference is greater still, making the K4000 the ideal choice for the Quadro shopper on more of a budget.
If GPU-accelerated rendering is your primary concern, things change a bit. Here, Nvidia’s mighty Quadro 6000 still reigns supreme, albeit at a seriously hefty price point. After the Quadro 6000, I would personally recommend the Quadro K5000. Its 4GB of RAM will cope with most moderately complex scenes, and it offers the best performance in both the OpenCL and CUDA tests behind the 6000. Plus, an Nvidia card gives you the option of those renderers that only support CUDA.
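
The recommendations above can be condensed into a simple lookup. This Python sketch is a loose paraphrase of the verdict, not a buying tool. The function name `recommend`, the `budget` flag, and the way the cards are split between "budget" and "performance" picks are simplifying assumptions.

```python
# Application groupings taken from the verdict above.
AMD_VIEWPORT_APPS = {"3ds Max", "Maya", "Softimage", "Mudbox", "Cinema 4D"}
NVIDIA_VIEWPORT_APPS = {"Modo", "LightWave", "Blender", "Mari"}


def recommend(focus: str, app: str = "", budget: bool = False) -> str:
    """Map a workload to the card suggested in the verdict (simplified)."""
    if focus == "gpu_rendering":
        # The Quadro 6000 leads at a hefty price; the K5000 is the runner-up.
        return "Quadro K5000" if budget else "Quadro 6000"
    if focus == "viewport":
        if app in AMD_VIEWPORT_APPS:
            return "FirePro W7000" if budget else "FirePro W8000"
        if app in NVIDIA_VIEWPORT_APPS:
            return "Quadro K4000" if budget else "Quadro K5000"
    return "no specific recommendation in this review"
```

For instance, `recommend("viewport", "Maya", budget=True)` returns the cheaper FirePro pick, while `recommend("gpu_rendering")` returns the Quadro 6000.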
