Nvidia’s next-gen GPUs set for launch

Mar 25, 2010 — by LinuxDevices Staff — from the LinuxDevices Archive

Nvidia will launch new GeForce GPUs (graphics processing units) tomorrow based on its next-generation “Fermi” architecture, The Wall Street Journal reports. The products are said to include 512 cores and three billion transistors, double the figures for the company's previous chips.

Jerry Dicolo's story for The Wall Street Journal at one point refers to a single product, claiming "the latest GPU is Nvidia's first product using its Fermi design architecture." Elsewhere, however, it refers to "new chips, to be part of Nvidia's GeForce line of GPUs."

In fact, as we review later in this story, Nvidia has already announced two Fermi-based parallel processors for the high-performance computing market, the Tesla C2070 and C2050. We presume that these specialized products, which do not yet appear to be shipping, flew below the Journal's radar.

Tomorrow's announcement most likely involves the formal debut of the long-delayed GF100, a 40nm implementation of the Fermi architecture. According to various reports, the DirectX 11-capable GF100 will be built into two new GPUs, the GeForce GTX 470 and GTX 480. It's said OEMs such as Asus, MSI, and Zotac will ship graphics cards based on these GPUs next month.

ATI released DirectX 11-compatible GPUs in its Radeon product line as long ago as September, gaining a technological advantage. However, manufacturing delays forced both ATI and Nvidia to keep selling last-generation products, giving Nvidia something of a reprieve, according to Dicolo. In the fourth quarter of 2009, Nvidia strengthened its position in the desktop GPU market, raising its share to 64.8 percent from 62.1 percent a year earlier, Jon Peddie Research is quoted as saying.

Background on Fermi

Fermi, announced last October at Nvidia's inaugural GPU Technology Conference in San Jose, amounts to a third generation of products embodying the company's "GPU computing" model. The first generation was the G80 unified graphics/computing architecture, introduced in November 2006 and later embodied in the GeForce 8800, Quadro FX 5600, and Tesla C870 GPU products.

The G80 was the first GPU to replace separate vertex and pixel pipelines with a single unified processor, the first to utilize a scalar thread processor, and the first to support C, according to the company.

The second generation was the GT200, introduced in the GeForce GTX 280, Quadro FX 5800, and Tesla T10 GPUs. GT200 increased the number of streaming processor cores — subsequently referred to as "Cuda" cores — from 128 to 240. It also added "hardware memory access coalescing," improving memory access efficiency, along with double-precision floating point support, Nvidia says.
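Why coalescing improves memory efficiency can be illustrated with a toy model: when a group of threads reads neighboring addresses, the hardware can serve them with a few wide memory transactions, whereas scattered addresses each need their own. The sketch below simply counts distinct memory segments touched. The segment size, thread-group size, and the `transactions` helper are assumptions chosen for illustration, not GT200 specifications.

```python
SEGMENT_BYTES = 128  # assumed transaction size, for illustration only
THREADS = 32         # assumed number of threads issuing a load together
WORD_BYTES = 4       # each thread reads one 32-bit value


def transactions(stride_words: int) -> int:
    """Count distinct segments touched when thread i reads word i * stride_words."""
    segments = {
        (i * stride_words * WORD_BYTES) // SEGMENT_BYTES
        for i in range(THREADS)
    }
    return len(segments)


# Contiguous access (stride 1) versus increasingly scattered access.
for stride in (1, 2, 32):
    print(f"stride {stride:>2}: {transactions(stride)} transaction(s)")
```

In this simplified model, contiguous reads collapse into a single transaction, while a stride of 32 words forces one transaction per thread.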

A comparison of Nvidia GPU generations
Source: Nvidia

Fermi, implemented in a GPU containing more than three billion transistors, more than doubles the number of Cuda cores, organizing them into 16 SMs (streaming multiprocessors) with 32 cores apiece. Sporting up to 6GB of GDDR5 RAM, Fermi is the first product of its type to support ECC (error correcting code), the company says.
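The core-count arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted in this article; the function and variable names are illustrative.

```python
# Core counts quoted in this article for the earlier generations.
G80_CORES = 128
GT200_CORES = 240

# Fermi layout as described: 16 streaming multiprocessors, 32 cores each.
FERMI_SMS = 16
CORES_PER_SM = 32


def total_cores(sm_count: int, cores_per_sm: int) -> int:
    """Total Cuda cores for a GPU built from identical SMs."""
    return sm_count * cores_per_sm


fermi_cores = total_cores(FERMI_SMS, CORES_PER_SM)
print(f"Fermi cores: {fermi_cores}")
print(f"vs GT200: {fermi_cores / GT200_CORES:.2f}x")
print(f"vs G80:   {fermi_cores / G80_CORES:.2f}x")
```

The result, 512 cores, is a little over twice the GT200's 240 and four times the G80's 128.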

In October, Nvidia cited the following additional features for Fermi:

512 Cuda cores feature the new IEEE 754-2008 floating-point standard, surpassing even the most advanced CPUs
8x the peak double-precision arithmetic performance of the GT200
Nvidia Parallel DataCache, a cache hierarchy that speeds up algorithms such as physics solvers, raytracing, and sparse matrix multiplication where data addresses are not known beforehand
Nvidia GigaThread engine, supporting concurrent kernel execution, where different kernels of the same application context can execute on the GPU at the same time (e.g., PhysX fluid and rigid body solvers)

According to Nvidia, Fermi-based products are so powerful that they can now be termed CGPUs (computational graphics processing units), and are suitable for high-performance computing (HPC) applications such as linear algebra, numerical simulation, and quantum chemistry.

At Nvidia's GPU Technology Conference, Oak Ridge National Laboratory (ORNL) announced plans to build a new supercomputer that will employ the Fermi architecture, and also announced it will be creating the Hybrid Multicore Consortium, whose goals "are to work with the developers of major scientific codes to prepare those applications to run on the next generation of supercomputers built using GPUs."

According to Nvidia, Fermi is the first product of its type that supports C++, complementing existing support for C, Fortran, Java, Python, OpenCL and DirectCompute. Fermi also supports Nexus (below), touted as "the world's first fully integrated heterogeneous computing application development environment within Microsoft Visual Studio."

Nvidia's Nexus
Source: Nvidia

Nvidia's C2070 and C2050

Nvidia's C2070 and C2050 are PCI Express x16 cards touted as "transforming a [Windows or Linux] workstation to perform like a small cluster," with up to 630 Gigaflops of performance. They employ Fermi GPUs to run C, C++, OpenCL, DirectCompute, or Fortran code while a workstation's CPU performs other tasks, according to the company.

Nvidia's Tesla C2070/C2050

The C2050 and C2070 are PCI Express Gen2 cards that occupy two slots in a workstation and include 3GB and 6GB of onboard GDDR5 memory, respectively. Nvidia claims the cards' onboard GPUs offer performance that's equivalent to the latest quad-core CPUs, but with 1/20th the power consumption and 1/10th the cost.

According to Nvidia, the C2070 and C2050 offer from 520 to 630 Gigaflops of double-precision performance, allowing applications such as ray tracing, 3D cloud computing, video encoding, database search, data analytics, computer-aided engineering, and virus scanning to be performed "dramatically faster." Four of the boards may be placed into a 1U enclosure that quadruples performance for data center deployments, the company adds.

Nvidia says the boards support the next-generation IEEE 754-2008 double-precision floating point standard. They provide ECC (error correcting code) protection for their DRAM, L1/L2 caches, and shared memory, and they support PCI Express 2.0 for "fast and high-bandwidth communication between CPU and GPU," the company adds.

Specifications provided for the Tesla C2070 and C2050 by Nvidia include:

Memory — 3GB GDDR5 on C2050; 6GB GDDR5 on C2070
Double-precision floating point performance (peak) — 520 GFlops to 630 GFlops
Form factor — Dual-slot PCI Express x16; 9.75 x 4.376 inches
Power consumption — 190 Watts (typical)
Software development tools — Cuda C/C++/Fortran, OpenCL, DirectCompute Toolkits
Supported operating systems — Linux, Windows XP, Windows Vista
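As a rough sanity check on the claim that four boards in a 1U enclosure quadruple performance, the spec-list figures above can be scaled up. This is a back-of-the-envelope sketch that assumes perfectly linear scaling and ignores any enclosure overhead; the names are illustrative.

```python
PEAK_DP_GFLOPS = (520, 630)  # peak double-precision range from the spec list
TYPICAL_WATTS = 190          # typical power per board from the spec list
BOARDS_PER_1U = 4            # four boards per 1U enclosure, per Nvidia


def scale(per_board: tuple, boards: int) -> tuple:
    """Scale a (low, high) per-board range linearly by the number of boards."""
    return tuple(value * boards for value in per_board)


low, high = scale(PEAK_DP_GFLOPS, BOARDS_PER_1U)
total_watts = TYPICAL_WATTS * BOARDS_PER_1U

print(f"Peak double precision per 1U: {low} to {high} GFlops")
print(f"Typical board power per 1U:   {total_watts} W")
print(f"GFlops per watt (per board):  "
      f"{PEAK_DP_GFLOPS[0] / TYPICAL_WATTS:.2f} to "
      f"{PEAK_DP_GFLOPS[1] / TYPICAL_WATTS:.2f}")
```

Under these assumptions, a four-board enclosure would peak at roughly 2.1 to 2.5 Teraflops of double-precision performance while drawing about 760 Watts for the boards alone.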

Availability

According to Nvidia, the Tesla C2070 and C2050 will be available during the second quarter, retailing for approximately $4,000 and $2,500, respectively. More information on the boards may be found on the company's website.

More information on Nexus, as well as overall background on Fermi, including a downloadable white paper, may also be found on the Nvidia website.

Jerry Dicolo's Wall Street Journal article on tomorrow's product launches from Nvidia is available on the Journal's website.

This article was originally published on LinuxDevices.com and has been donated to the open source community by QuinStreet Inc. Please visit LinuxToday.com for up-to-date news and articles about Linux and open source.
