What is a TFLOP (Teraflops)

From gaming and AI to scientific research, GPUs handle calculations much faster than general-purpose CPUs. But how do you compare different GPUs and quantify just how powerful one is?

This is where the TFLOPS metric comes in. TFLOPS, which stands for “tera (trillion) floating point operations per second”, gives us a standardized way to measure a GPU’s theoretical peak performance.

In this post, we’ll explain TFLOPS and cover everything you need to know about this key specification. You’ll learn what TFLOPS really measures, how it’s calculated, and how it can be used to compare different GPUs.

What Does TFLOPS Measure?

First, let’s break down the meaning of TFLOPS.

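FLOPS stands for “floating point operations per second”. A floating point operation is a calculation on a floating point number, the format computers use for values with a fractional part. Such numbers are ubiquitous in computer math.

Some common floating point operations include: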

Adding or subtracting two decimal numbers
Multiplying two decimal numbers
Dividing one decimal number by another

More complex operations like trigonometric functions or square roots typically take multiple floating point operations to compute.

TFLOPS simply means a trillion of these floating point operations calculated every second. It’s written as either TFLOP/s or TFLOPS.

So in summary, TFLOPS measures the theoretical maximum number of floating point calculations a GPU can handle per second, determined by its underlying hardware.

Factors That Determine TFLOPS

There are three key factors that shape a GPU’s theoretical TFLOPS capability:

Stream Processors

GPUs contain hundreds or even thousands of small cores, called stream processors by AMD and CUDA cores by Nvidia. (Nvidia groups its CUDA cores into larger blocks called streaming multiprocessors, which are not the same thing.) Each core can perform floating point operations in parallel. More stream processors means more operations handled simultaneously, leading to higher potential TFLOPS. High-end GPUs today contain upwards of 5000 stream processors while mainstream gaming GPUs have around 1500-3000.

Clock Speed

This is the speed at which a GPU’s stream processors operate, measured in MHz or GHz. Higher clock speeds mean the processors can cycle through more operations per second, again increasing potential TFLOPS. Modern GPU clock speeds range from roughly 1 GHz to over 2.5 GHz.

Precision

Floating point calculations can be done at different levels of precision. FP32 uses 32 bits per number while FP64 uses 64 bits, which is slower but offers greater numeric precision. Gaming and graphics workloads rely mostly on 32-bit math, so gaming GPUs are optimized for FP32 and run FP64 at only a small fraction of that rate. Scientific computing, however, often needs 64-bit precision.
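To make the precision difference concrete, here is a minimal sketch using only Python’s standard library. It stores the value 0.1 in FP32 and FP64 and reads it back. The helper name round_trip is our own, chosen for illustration, and the code runs on the CPU, so it shows the number formats themselves rather than GPU behavior.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Store value in the given IEEE 754 format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


x = 0.1
as_fp32 = round_trip(x, "<f")  # 32-bit single precision
as_fp64 = round_trip(x, "<d")  # 64-bit double precision (Python's native float)

print(f"FP32: {as_fp32:.20f}")
print(f"FP64: {as_fp64:.20f}")
print(f"FP32 rounding error: {abs(as_fp32 - x):.3e}")
```

The FP64 round trip returns exactly the value that went in, while the FP32 copy is off by roughly one part in a hundred million. That is negligible for a pixel color, but it can accumulate over the long chains of arithmetic found in scientific simulations.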

Calculating TFLOPS

To calculate a GPU’s theoretical maximum TFLOPS, the formula is:

TFLOPS = (Stream Processors x Clock Speed in GHz x 2) / 1,000

The key components are:

Stream Processors: More processors means more parallel FLOPS
Clock Speed: Faster cycling through operations per second, expressed here in GHz (billions of cycles per second)
x2: Accounts for FMA operations (Fused Multiply-Add), which count as 2 floating point operations per cycle
1,000: Converts from GFLOPS to TFLOPS, since processors x GHz gives billions of operations per second

Let’s use an RTX 3080 as an example. It has 8704 stream processors (CUDA cores, in Nvidia’s terms) and a boost clock speed of 1.71 GHz. Plugging this into the formula gives us:

TFLOPS = (8704 x 1.71 x 2) / 1,000 = 29.8 TFLOPS

So theoretically, the RTX 3080 can hit 29.8 TFLOPS!
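The same calculation can be sketched as a small Python function. It assumes the clock speed is given in GHz, so cores x GHz x 2 yields GFLOPS, and dividing by 1,000 gives TFLOPS. The function name theoretical_tflops and its parameters are our own, chosen for illustration.

```python
def theoretical_tflops(cores: int, clock_ghz: float, flops_per_cycle: int = 2) -> float:
    """Peak TFLOPS from core count and clock speed.

    flops_per_cycle defaults to 2 because one fused multiply-add (FMA)
    counts as two floating point operations.
    """
    gflops = cores * clock_ghz * flops_per_cycle  # billions of operations per second
    return gflops / 1000.0                        # GFLOPS -> TFLOPS


# RTX 3080 figures quoted in this post: 8704 cores, 1.71 GHz boost clock.
rtx_3080 = theoretical_tflops(cores=8704, clock_ghz=1.71)
print(f"RTX 3080: {rtx_3080:.1f} TFLOPS")
```

Swapping in another card’s core count and boost clock gives its theoretical FP32 figure in the same way.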

Real-World TFLOPS vs. Theoretical Maximum TFLOPS

The TFLOPS formula gives the theoretical ceiling for a GPU, but real-world workloads often achieve lower TFLOPS for a few reasons:

Code optimization: No software uses the hardware perfectly efficiently. There is always some overhead.
Precision: Real projects use a mix of FP32 and FP64, with FP64 being slower.
Data bottlenecks: Transferring data to and from the GPU limits real throughput.
Thermals: GPUs may throttle down from max clock speeds to stay within power and heat limits.

That said, the theoretical TFLOPS value serves as a good upper bound and basis for comparison between GPUs. At a given precision, a workload will not meaningfully exceed a GPU’s rated TFLOPS, and most fall well short of it.
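One way to see the gap between achieved and theoretical throughput is to count the operations in a known workload, time it, and divide. The sketch below does this for a small matrix multiplication in pure Python on the CPU. It illustrates the measurement method only and is not a GPU benchmark. The function names, and the 18 TFLOPS sustained figure in the last line, are hypothetical values chosen for illustration.

```python
import time


def matmul_flop_count(n: int) -> int:
    """An n x n matrix product as written below does n^3 multiplies and n^3 adds."""
    return 2 * n ** 3


def measure_achieved_flops(n: int = 60) -> float:
    """Time a naive matrix multiplication and return operations per second."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    c = [[0.0] * n for _ in range(n)]

    start = time.perf_counter()
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(n):
                total += a[i][k] * b[k][j]  # one multiply and one add
            c[i][j] = total
    elapsed = time.perf_counter() - start

    return matmul_flop_count(n) / elapsed


def utilization(achieved_flops: float, peak_flops: float) -> float:
    """Fraction of the theoretical peak that a workload actually sustains."""
    return achieved_flops / peak_flops


achieved = measure_achieved_flops()
print(f"Achieved on this CPU in pure Python: {achieved / 1e6:.1f} MFLOPS")

# Hypothetical: a workload sustaining 18 TFLOPS on a 29.8 TFLOPS GPU.
print(f"Utilization: {utilization(18e12, 29.8e12):.0%}")
```

GPU profiling tools report achieved throughput in a similar spirit: operations actually completed, divided by elapsed time, compared against the rated peak.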

TFLOPS Comparison Between GPUs Over Time

To demonstrate the rapid evolution of GPU performance, here’s a table comparing theoretical FP32 TFLOPS across four generations of Nvidia gaming GPUs:

GPU Generation	Flagship GPU	Theoretical TFLOPS
Pascal	GTX 1080 Ti	11.3
Turing	RTX 2080 Ti	13.4
Ampere	RTX 3090	35.6
Ada Lovelace	RTX 4090	82.6

In about five years, from the GTX 1080 Ti in 2017 to the RTX 4090 in 2022, peak TFLOPS has increased more than 7x, from 11.3 TFLOPS to 82.6 TFLOPS! On paper at least, GPU compute is still growing at a rapid pace.
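The generation-to-generation gains can be worked out with a few lines of Python. This is a sketch under one stated assumption: the RTX 4090 value of 82.6 TFLOPS is computed from its 16,384 CUDA cores and a 2.52 GHz boost clock, which rounds to the 83 TFLOPS quoted later in this post. The other figures are the theoretical FP32 values from the table.

```python
# Theoretical FP32 TFLOPS per flagship. The RTX 4090 entry assumes
# 16,384 cores x 2.52 GHz x 2 / 1,000 = 82.6 (rounded).
flagships = [
    ("GTX 1080 Ti", 11.3),
    ("RTX 2080 Ti", 13.4),
    ("RTX 3090", 35.6),
    ("RTX 4090", 82.6),
]

# Compare each flagship with the one before it.
for (prev_name, prev), (name, cur) in zip(flagships, flagships[1:]):
    print(f"{prev_name} -> {name}: {cur / prev:.2f}x")

overall = flagships[-1][1] / flagships[0][1]
print(f"Overall: {overall:.1f}x")
```

The per-generation ratios make it clear that the growth is uneven: the step from Pascal to Turing is small, while the two steps after it each more than double the theoretical figure.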

We can also look at real games and programs to compare two GPUs in practice. Benchmarks normally report scores or frame rates rather than TFLOPS, so treat the figures below as estimates of effective throughput:

GPU	3DMark Time Spy (TFLOPS)	Fortnite (TFLOPS)
RTX 3060 Ti	17.2	12.1
RX 6700 XT	18.1	13.0

Here the RX 6700 XT comes out moderately ahead in both tests, suggesting stronger general performance. Results like these are a useful reality check on the theoretical TFLOPS figure.

How Many Teraflops is the RTX 4090?

Nvidia’s flagship graphics card of the Ada Lovelace generation, the GeForce RTX 4090, achieves a tremendous peak theoretical performance of 83 teraflops. For comparison, this gives the RTX 4090 more than twice the theoretical throughput of the previous top-end consumer GPU, the RTX 3090, which had 36 teraflops.

With its astronomical 83 TFLOPS of power, the RTX 4090 takes consumer GPUs to new heights. 83 teraflops enable the RTX 4090 to handle extremely demanding workloads like 8K ultra-high-resolution gaming, complex 3D rendering and video editing, scientific simulations, and advanced artificial intelligence research models. On paper, its raw floating-point rate is in the range of the world’s fastest supercomputers of the early 2000s, although those machines were rated at FP64 rather than FP32.

To reach such high teraflop ratings in a consumer-level graphics card, Nvidia built the RTX 4090 on its Ada Lovelace architecture, which on this card provides 16,384 CUDA cores.

The RTX 4090 is also manufactured on Taiwan Semiconductor Manufacturing Company’s (TSMC) 4N process, a custom node built for Nvidia, to pack in more transistors and cores.

This performance comes with a higher power budget: the RTX 4090 carries a 450W board power rating, up from 350W on the RTX 3090. Even so, with over two times the teraflop rating of the previous generation, teraflops per watt improved substantially, and Nvidia delivers 83 TFLOPS within a triple-slot graphics card form factor.

Overall, the GeForce RTX 4090’s remarkable 83 teraflop theoretical rating demonstrates how modern GPUs continue rapidly advancing in performance each generation. In just two years, the peak teraflops have more than doubled from 36 TFLOPS to 83 TFLOPS, enabling new possibilities in high-end gaming, content creation, and computing.

Conclusion

While not a perfect measure, TFLOPS provides a standardized way to quantify and compare the immense computing power of modern GPUs. It reduces the performance of a complex processor down to a single number.

Just remember that real-world TFLOPS depends on workload, precision, and optimizations. But armed with the basics of what TFLOPS measures and how it’s calculated, you can better understand GPU benchmarks and specifications to make informed purchasing decisions!

FAQs on Teraflop

What is a TFLOP in GPU?

A TFLOP or teraflop is a measure of a GPU’s performance and refers to its capability of carrying out one trillion floating point operations per second. It indicates the theoretical maximum computational speed of a GPU.

How much is 10 teraflops?

10 teraflops is equal to 10 trillion floating point operations per second. In the early 2000s this level of computing power was found only in supercomputers; today it is common in consumer hardware. Most consumer graphics cards in 2022 range from 10 to 40 teraflops, with flagship models reaching higher.

What do teraflops do?

Teraflops quantify the speed at which a computer processor can calculate floating point operations. More teraflops means a GPU can process computer graphics, simulations, AI applications and other complex math-heavy code much faster. Higher teraflop GPUs enable real-time photorealistic graphics and visuals.

What GPU has 12 teraflops?

Some examples of GPUs with around 12 teraflops of power include the Nvidia RTX 2080 Super, AMD Radeon RX 6700 XT, and Xbox Series X GPU. 12 teraflops allow these GPUs to handle demanding games at 4K resolution while also providing performance for content creation applications.
