Friday, November 18, 2011

product brief for the NVIDIA® Tesla™

The NVIDIA Tesla™ C2050 and C2070 Computing Processors fuel
the transition to parallel computing and bring the performance of
a small cluster to the desktop.



Based on the next-generation CUDA™ architecture codenamed “Fermi”, the 20-series family of Tesla GPUs supports many “must have” features for technical and enterprise computing, including C++ support, ECC memory for uncompromised accuracy and scalability, and a 7X increase in double-precision performance compared to Tesla 10-series GPUs. The Tesla C2050 and C2070 GPUs are designed to redefine high-performance computing and make supercomputing available to everyone.
Compared to the latest quad-core CPUs, Tesla C2050 and C2070 Computing Processors deliver equivalent supercomputing performance at 1/10th the cost and 1/20th the power consumption.




GPUs powered by the Fermi generation of the CUDA architecture
Delivers cluster performance at 1/10th the cost and 1/20th the power of
CPU-only systems based on the latest quad-core CPUs.
448 CUDA Cores
Delivers up to 515 Gigaflops of double-precision peak performance in each GPU,
enabling a single workstation to deliver a Teraflop or more of performance.
Single precision peak performance is over a Teraflop per GPU.
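
The peak figures above follow from simple arithmetic. The sketch below is a back-of-the-envelope check, not a benchmark. It assumes a 1.15 GHz processor clock (not stated in this brief), one fused multiply-add (counted as 2 floating-point operations) per core per cycle in single precision, and double precision running at half the single-precision rate. The function name is hypothetical.

```python
def peak_gflops(cores, clock_ghz, flops_per_core_per_cycle):
    """Theoretical peak in GFLOPS: cores x clock (GHz) x FLOPs per core per cycle."""
    return cores * clock_ghz * flops_per_core_per_cycle


CUDA_CORES = 448          # from the brief
ASSUMED_CLOCK_GHZ = 1.15  # assumption: not given in the brief

# Single precision: one fused multiply-add per core per cycle = 2 FLOPs (assumption).
single = peak_gflops(CUDA_CORES, ASSUMED_CLOCK_GHZ, 2)
# Double precision: assumed to run at half the single-precision rate.
double = single / 2

print(f"single precision peak: {single:.1f} GFLOPS")
print(f"double precision peak: {double:.1f} GFLOPS")
```

Under these assumptions the result lands at roughly 1030 GFLOPS single precision and 515 GFLOPS double precision, which matches the "over a Teraflop" and "515 Gigaflops" figures quoted above.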
ECC Memory
Meets a critical requirement for computing accuracy and reliability for
workstations. Offers protection of data in memory to enhance data integrity
and reliability for applications. Register files, L1/L2 caches, shared memory,
and DRAM all are ECC protected.
Desktop Cluster Performance
Solves large-scale problems faster than a small server cluster on a single
workstation with multiple GPUs.
Up to 6GB of GDDR5 memory per GPU
Maximizes performance and reduces data transfers by keeping larger data sets in
local memory that is attached directly to the GPU.
NVIDIA Parallel DataCache™
Accelerates algorithms such as physics solvers, ray-tracing, and sparse matrix
multiplication where data addresses are not known beforehand. This includes
a configurable L1 cache per Streaming Multiprocessor block and a unified L2
cache for all of the processor cores.
NVIDIA GigaThread™ Engine
Maximizes throughput with context switching that is 10X faster than the
previous architecture, concurrent kernel execution, and improved thread block
scheduling.
Asynchronous Transfer
Turbocharges system performance by transferring data over the PCIe bus while
the computing cores are crunching other data. Even applications with heavy
data-transfer requirements, such as seismic processing, can maximize the
computing efficiency by transferring data to local memory before it is needed.
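
The benefit of overlapping transfers with computation can be estimated with a simple timing model. This is a sketch under stated assumptions, not a measurement: the data set is split into equal chunks, each chunk has a fixed transfer time and a fixed compute time, enough buffers exist to hold a chunk in flight while another is processed, and result copy-back is ignored. The function names and the example timings are hypothetical.

```python
def serial_time(chunks, transfer_s, compute_s):
    """No overlap: every chunk is transferred, then processed, one after another."""
    return chunks * (transfer_s + compute_s)


def overlapped_time(chunks, transfer_s, compute_s):
    """Overlap: chunk i+1 is transferred while chunk i is being processed.

    The first transfer and the last compute cannot be hidden; in between,
    the slower of the two stages sets the pace.
    """
    if chunks == 0:
        return 0.0
    return transfer_s + (chunks - 1) * max(transfer_s, compute_s) + compute_s


# Hypothetical workload: 8 chunks, 2 s to transfer and 3 s to process each.
n, t, c = 8, 2.0, 3.0
print(f"serial:     {serial_time(n, t, c):.1f} s")
print(f"overlapped: {overlapped_time(n, t, c):.1f} s")
```

In this model the transfer cost is almost entirely hidden whenever compute time per chunk is at least as long as transfer time per chunk, which is the situation the paragraph above describes.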
CUDA programming environment with broad support of programming languages and APIs
Choose C, C++, OpenCL, DirectCompute, or Fortran to express application
parallelism and take advantage of the “Fermi” GPU’s innovative architecture.
The NVIDIA Parallel Nsight tool is available for Microsoft Visual Studio developers.
High-Speed PCIe Gen 2.0 Data Transfer
Maximizes bandwidth between the host system and the Tesla processors.
Enables Tesla systems to work with virtually any PCIe-compliant host system
with an open PCIe x16 slot.
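
To put the PCIe Gen 2.0 x16 link in perspective, its theoretical bandwidth can be worked out from the signaling rate. The sketch below uses the PCIe 2.0 figures of 5 GT/s per lane with 8b/10b encoding; it gives an upper bound per direction and ignores protocol overhead, so real transfers will be slower. The constant and function names are hypothetical, and the 6 GB example simply reuses the largest memory size mentioned in this brief.

```python
GEN2_TRANSFERS_PER_S = 5e9   # PCIe 2.0: 5 GT/s per lane
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b encoding: 8 data bits per 10 bits on the wire
LANES = 16                    # x16 slot, as in the brief


def link_bandwidth_bytes_per_s(lanes):
    """Theoretical one-direction bandwidth of a PCIe Gen 2.0 link in bytes/s."""
    data_bits_per_lane = GEN2_TRANSFERS_PER_S * ENCODING_EFFICIENCY
    return lanes * data_bits_per_lane / 8  # 8 bits per byte


def transfer_seconds(num_bytes, lanes=LANES):
    """Lower bound on the time to move num_bytes across the link in one direction."""
    return num_bytes / link_bandwidth_bytes_per_s(lanes)


bandwidth = link_bandwidth_bytes_per_s(LANES)
print(f"x16 theoretical bandwidth: {bandwidth / 1e9:.1f} GB/s per direction")
print(f"time to move 6 GB: {transfer_seconds(6e9):.2f} s at theoretical peak")
```

Even at the theoretical peak, filling the full GPU memory takes a noticeable fraction of a second, which is why keeping data resident in GPU memory and overlapping transfers with computation both matter.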