GPU Boost 2.0

The NVIDIA GeForce GTX 680 introduced an important new feature: GPU Boost. The new NVIDIA GeForce GTX Titan goes one step further, expanding it into GPU Boost 2.0. The first version, GPU Boost 1.0, was driven by the maximum power consumption reached in the most demanding modern games; GPU temperature played no special role unless it approached the critical threshold. The maximum clock frequency was determined from the relative voltage. The drawback was obvious: GPU Boost 1.0 could not prevent situations where, even at a non-critical voltage, the temperature rose excessively.


The GeForce GTX Titan now evaluates two parameters: voltage and temperature. That is, the relative voltage (Vref) is determined based on both of them. Of course, boost behavior still depends on the individual GPU sample, since there is variability in chip manufacturing, so every graphics card will differ slightly from the next. But NVIDIA points out that, technically, adding temperature allowed an average of 3-7 percent higher boost clocks. GPU Boost 2.0 could in theory be brought to older video cards, but this is unlikely to happen.


Let's take a closer look at GPU Boost 2.0. Utilities like EVGA Precision Tool or MSI Afterburner already support GPU Boost 2.0. We used EVGA Precision Tool version 4.0.


GPU Boost 2.0 takes temperature into account, and at low temperatures the technology can increase performance more significantly. The target temperature (Ttarget) is set to 80 °C by default.


GPU Boost 2.0 retains all the functions familiar from the first generation of the technology, but additionally makes it possible to set a higher voltage, and therefore higher clock speeds. Overclockers can change these settings: you can enable GPU overvoltage, but be aware of the potential reduction in graphics card lifespan.


Overclockers can raise Vref and Vmax (overvoltaging). Many users wanted this on the GK104, but NVIDIA did not trust either users or manufacturers with such an option. The EVGA GTX 680 Classified card we tested is a perfect example: it carried a special EVGA EVBot module that gave users control over voltages, but NVIDIA urgently demanded that EVGA remove this extra hardware from its cards. With GPU Boost 2.0 and overvoltaging, NVIDIA itself has taken a step in this direction, so video card manufacturers can release several GeForce GTX Titan models, for example standard versions and factory-overclocked versions. Overvoltaging is enabled via a VBIOS switch, that is, explicitly, so that the user is aware of the possible consequences.

Review of the NVIDIA GeForce GTX 780 video card | GeForce Experience and ShadowPlay

GeForce Experience

As PC enthusiasts, we appreciate the combination of different settings that affect the performance and quality of games. The easiest way is to spend a lot of money on a new video card and set all graphics settings to maximum. But when some parameter turns out to be too heavy for the card and it has to be reduced or disabled, you are left with an unpleasant feeling and the realization that the game could work much better.

However, finding the optimal settings is not so easy. Some settings produce better visual results than others, and their impact on performance can vary greatly. GeForce Experience is NVIDIA's attempt to simplify the choice of game settings by comparing your CPU, GPU and resolution against a database of configurations. The second part of the utility helps determine whether driver updates are needed.

It is likely that enthusiasts will continue to choose settings on their own and will perceive the additional program negatively. However, most gamers who want to install the game and immediately begin gameplay without checking drivers and going through various settings will certainly be glad of this opportunity. Either way, NVIDIA's GeForce Experience helps people get the most out of their gaming experience and is therefore a useful utility for PC gaming.

GeForce Experience identified all nine games installed on our test system. Naturally, none of them were running at their default settings, since we had applied specific settings in the interests of testing. But it is still interesting to see how GeForce Experience would change the options we selected.

For Tomb Raider, GeForce Experience wanted to disable TressFX technology, even though the NVIDIA GeForce GTX 780 averaged 40 frames per second with the feature enabled. For some reason the program was unable to detect the Far Cry 3 configuration, although the settings it suggested turned out to be quite high. For unknown reasons, the utility wanted to disable FXAA for Skyrim.

It's nice to get a set of screenshots for each game describing the impact of a given setting on image quality. Of the nine examples we looked at, GeForce Experience came close to what we consider the optimal settings. However, the utility is also biased, favoring NVIDIA-specific features like PhysX (which it pushed to a high level in Borderlands 2) and discouraging AMD-developed features (including TressFX in Tomb Raider). Disabling FXAA in Skyrim makes no sense at all, since the game averages 100 FPS. It is possible that enthusiasts will want to install GeForce Experience once the NVIDIA Shield system starts shipping, since the Game Streaming feature appears to work through this NVIDIA application.

ShadowPlay: Always-on DVR for gaming

WoW fans often record their raids, but this requires a fairly powerful system, Fraps and a lot of disk space.

NVIDIA recently announced a new feature called ShadowPlay that can make the recording process much easier.

When activated, ShadowPlay uses the Kepler GPU's built-in fixed-function NVEnc hardware encoder, which automatically records the last 20 minutes of gameplay; alternatively, you can start and stop recording manually. The technology thus replaces software solutions like Fraps, which place a higher load on the central processor.

For reference: NVEnc only works with H.264 encoding at resolutions up to 4096x4096 pixels. ShadowPlay isn't available yet, but NVIDIA says it will be able to record 1080p video at up to 30 FPS when it launches this summer. We'd like to see higher resolutions, since it was previously stated that the encoder can potentially support them in hardware.
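To get a feel for what a rolling "last 20 minutes" buffer implies in practice, here is a minimal Python sketch of the arithmetic. The 15 Mbit/s bitrate for 1080p30 H.264 is our own illustrative assumption, not an official ShadowPlay figure:

```python
# Rough estimate of the storage footprint of a rolling "last 20 minutes" recorder.
# The 15 Mbit/s bitrate is an assumption for 1080p at 30 FPS, not a ShadowPlay spec.

BITRATE_MBIT_S = 15          # assumed average H.264 bitrate
WINDOW_MINUTES = 20          # ShadowPlay keeps roughly the last 20 minutes

window_seconds = WINDOW_MINUTES * 60
size_gbytes = BITRATE_MBIT_S * window_seconds / 8 / 1024  # Mbit -> MByte -> GByte

print(f"Rolling buffer for {WINDOW_MINUTES} min at {BITRATE_MBIT_S} Mbit/s: "
      f"about {size_gbytes:.1f} GB")
```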

Review of the NVIDIA GeForce GTX 780 video card | GPU Boost 2.0 and possible overclocking problems

GPU Boost 2.0

In our GeForce GTX Titan review we weren't able to conduct comprehensive testing of the second-generation NVIDIA GPU Boost technology, but now we have the NVIDIA GeForce GTX 780 in hand. Here is a short description of this technology:

GPU Boost is an NVIDIA mechanism that changes the performance of video cards depending on the type of task being processed. As you probably know, games place different demands on GPU resources. Historically, the frequency had to be set for the worst-case scenario, so when processing "light" tasks, GPU capacity went to waste. GPU Boost monitors various parameters and raises or lowers frequencies depending on the needs of the application and the current situation.

The first implementation of GPU Boost worked within a fixed power limit (170 W in the case of the GeForce GTX 680). However, the company's engineers found that they can safely exceed this level if the GPU temperature is low enough, so performance can be optimized even further.

In practice, GPU Boost 2.0 differs only in that NVIDIA now raises the frequency based not on the maximum power consumption figure but on a temperature target of 80 degrees Celsius. This means that higher frequencies and voltages will be used until the chip heats up to 80 degrees. Don't forget that the temperature mainly depends on the fan profile and settings: the higher the fan speed, the lower the temperature and, therefore, the higher the GPU Boost clocks (and, unfortunately, the noise level too). The technology still evaluates the situation only once every 100 ms, so NVIDIA still has some work to do in future versions.

Temperature-dependent behavior makes testing even more difficult than with the first version of GPU Boost. Anything that raises or lowers the temperature of the GK110 changes the chip's frequency, so it is quite difficult to achieve consistent results between runs. In laboratory conditions, one can only hope for a stable ambient temperature.

In addition, it is worth noting that you can raise the temperature limit. For example, if you want the NVIDIA GeForce GTX 780 to lower its frequency and voltage only at 85 or 90 degrees Celsius, this can be configured in the settings.

Want the GK110 to stay as far below your chosen temperature limit as possible? The fan curve of the NVIDIA GeForce GTX 780 is fully adjustable, allowing you to customize the duty cycle according to temperature.
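The interaction between the temperature target and the fan curve can be illustrated with a small Python sketch. This is a toy model, not NVIDIA's actual boost algorithm: the 863 MHz base clock is the GTX 780's published figure, while the 13 MHz bin size and the example fan-curve points are assumptions made purely for illustration:

```python
# Toy model of a temperature-target boost controller with a user-defined fan curve.
# Not NVIDIA's real algorithm: it only illustrates that clocks rise until the GPU
# approaches the target temperature (80 °C by default), re-evaluated roughly every
# 100 ms, and that a more aggressive fan curve leaves more boost headroom.

T_TARGET = 80.0          # °C, default GPU Boost 2.0 temperature target
BASE_CLOCK = 863         # MHz, GTX 780 base clock
STEP = 13                # MHz, assumed size of one boost bin

# User-editable fan curve: (temperature °C, fan duty %)
FAN_CURVE = [(30, 20), (60, 40), (80, 70), (95, 100)]

def fan_duty(temp: float) -> float:
    """Linear interpolation over the fan curve points."""
    for (t0, d0), (t1, d1) in zip(FAN_CURVE, FAN_CURVE[1:]):
        if t0 <= temp <= t1:
            return d0 + (d1 - d0) * (temp - t0) / (t1 - t0)
    return FAN_CURVE[-1][1] if temp > FAN_CURVE[-1][0] else FAN_CURVE[0][1]

def boost_step(clock: float, temp: float) -> float:
    """Every ~100 ms: add a bin while below target, drop a bin when above it."""
    if temp < T_TARGET - 1:
        return clock + STEP
    if temp > T_TARGET:
        return max(BASE_CLOCK, clock - STEP)
    return clock

print(fan_duty(70.0))          # ~55% duty at 70 °C with this curve
print(boost_step(1006, 74.0))  # below target -> one more boost bin
```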

Possible overclocking problems

While getting acquainted with the GeForce GTX Titan, company representatives showed us an internal utility that can read the status of various sensors, which simplifies the diagnosis of non-standard card behavior. If the GK110's temperature rises too high during overclocking, this information is recorded in the log even under throttling.

The company now exposes this function through the Precision X application, which triggers a warning "reasons" algorithm if, during overclocking, something occurs that prevents it from continuing effectively. This is a great feature because you no longer have to guess at possible bottlenecks. There is also an OV max limit indicator that lets you know when you have reached the absolute peak of GPU voltage; at that point there is a risk of burning out the card, so you can treat it as a suggestion to back off your overclocking settings.

Review of the NVIDIA GeForce GTX 780 video card | Test stand and benchmarks


Test bench configuration
CPU: Intel Core i7-3770K (Ivy Bridge) 3.5 GHz @ 4.0 GHz (40 × 100 MHz), LGA 1155, 8 MB shared L3 cache, Hyper-Threading on, power-saving features on
Motherboard: Gigabyte Z77X-UD5H (LGA 1155), Z77 Express chipset, BIOS F15q
RAM: G.Skill 16 GB (4 × 4 GB) DDR3-1600, F3-12800CL9Q2-32GBZL @ 9-9-9-24 at 1.5 V
Storage: Crucial m4 SSD, 256 GB, SATA 6 Gb/s
Video cards: Nvidia GeForce GTX 780 3 GB, AMD Radeon HD 7990 6 GB, AMD Radeon HD 7970 GHz Edition 3 GB, Nvidia GeForce GTX 580 1.5 GB, Nvidia GeForce GTX 680 2 GB, Nvidia GeForce GTX Titan 6 GB, Nvidia GeForce GTX 690 4 GB
Power supply: Cooler Master UCP-1000W

System software and drivers
OS: Windows 8 Professional 64-bit
DirectX: DirectX 11
Graphics drivers: AMD Catalyst 13.5 (Beta 2), Nvidia GeForce Release 320.00, Nvidia GeForce Release 320.18 (for GeForce GTX 780)

Getting the correct frame rate value

Observant readers will notice that the figures on the following pages are more modest than in the AMD Radeon HD 7990 review, and there is a reason for this. Previously, we presented synthetic and actual frame rates and then showed the frame-time variations along with dropped and runt frames. The trouble is that this method does not reflect the real experience of using the video card, and it would be unfair to judge AMD based on synthetic frame-time metrics alone.

That's why, along with frame-time fluctuations, we now provide more practical metrics based on dynamic frame rates. The results are not as high, but they are very telling in the games where AMD struggles.

Tests and settings
Battlefield 3: Ultra graphics quality, v-sync off, 2560x1440, DirectX 11, "Going Hunting", 90 seconds, FCAT
Far Cry 3: Ultra graphics quality, DirectX 11, v-sync off, 2560x1440, custom run, 50 seconds, FCAT
Borderlands 2: highest graphics quality, PhysX low, 16x anisotropic filtering, 2560x1440, custom run, FCAT
Hitman: Absolution: Ultra graphics quality, MSAA off, 2560x1440, built-in benchmark, FCAT
The Elder Scrolls V: Skyrim: Ultra graphics quality, FXAA enabled, 2560x1440, custom run, 25 seconds, FCAT
3DMark: Fire Strike benchmark
BioShock Infinite: Ultra graphics quality, DirectX 11, diffuse depth of field, 2560x1440, built-in benchmark, FCAT
Crysis 3: very high graphics quality, MSAA low (2x), high-resolution textures, 2560x1440, custom run, 60 seconds, FCAT
Tomb Raider: Ultimate graphics quality, FXAA on, 16x anisotropic filtering, TressFX Hair, 2560x1440, custom run, 45 seconds, FCAT
LuxMark 2.0: 64-bit binary, version 2.0, Sala scene
SiSoftware Sandra 2013 Professional: Sandra Tech Support (Engineer) 2013.SP1, Cryptography, Financial Analysis performance


GP104 graphics processor specifications
Chip code name: GP104
Production technology: 16 nm FinFET
Number of transistors: 7.2 billion
Core area: 314 mm²
Architecture: Pascal
DirectX hardware support: DirectX 12
Memory bus: 256-bit
GPU frequency: 1607 (1733) MHz
Computing blocks: 20 streaming multiprocessors comprising 2560 scalar ALUs for floating-point calculations within the IEEE 754-2008 standard
Texturing blocks: 160 texture addressing and filtering units with support for FP16 and FP32 components in textures and trilinear and anisotropic filtering for all texture formats
Monitor support: HDMI 2.0b and DisplayPort (DP 1.2 certified, DP 1.3/1.4 Ready)
GeForce GTX 1080 reference graphics card specifications
Core frequency: 1607 (1733) MHz
Number of universal processors: 2560
Number of texture units: 160
Number of blending (ROP) units: 64
Effective memory frequency: 10000 (4×2500) MHz
Memory type: GDDR5X
Memory bus: 256-bit
Memory size: 8 GB
Memory bandwidth: 320 GB/s
Computing performance (FP32): about 9 teraflops
Theoretical peak fill rate: 103 gigapixels/s
Theoretical texture sampling rate: 257 gigatexels/s
Bus: PCI Express 3.0
Connectors: one Dual-Link DVI, one HDMI 2.0b, three DisplayPort
Power consumption: up to 180 W
Auxiliary power: one 8-pin connector
Slots occupied: 2
Recommended price: $599-699 (USA), RUB 54,990 (Russia)

The new GeForce GTX 1080 received a name that is logical for the first solution of a new GeForce series: it differs from its direct predecessor only in the generation number. The new product not only replaces the top solutions in the company's current line-up, but will also remain the flagship of the new series for some time, until a Titan X with an even more powerful GPU arrives. Below it in the hierarchy sits the already announced GeForce GTX 1070, based on a cut-down version of the GP104 chip, which we will consider below.

The recommended prices for the new Nvidia graphics card are $599 and $699 for the regular version and the special Founders Edition (see below), respectively, and this is a pretty good deal considering that the GTX 1080 is ahead not only of the GTX 980 Ti but also of the Titan X. Today the new product is without question the fastest single-chip video card on the market, and at the same time it costs less than the most powerful video cards of the previous generation. For now the GeForce GTX 1080 has essentially no competitor from AMD, so Nvidia was able to set a price that suits it.

The video card in question is based on the GP104 chip with a 256-bit memory bus, but the new GDDR5X memory type runs at a very high effective frequency of 10 GHz, giving a peak bandwidth of 320 GB/s, almost on par with the GTX 980 Ti and its 384-bit bus. The amount of memory installed on a card with such a bus could be 4 or 8 GB, but fitting the smaller amount to such a powerful solution would be unwise in modern conditions, so the GTX 1080 quite logically received 8 GB of memory, enough to run any 3D application at any quality settings for several years to come.
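The 320 GB/s figure follows directly from the bus width and the effective per-pin data rate; a one-line check in Python:

```python
# Peak memory bandwidth of the GTX 1080: 256-bit bus at 10 Gbit/s per pin (GDDR5X).
bus_width_bits = 256
data_rate_gbps = 10            # effective per-pin rate of GDDR5X

bandwidth_gbytes = bus_width_bits * data_rate_gbps / 8
print(bandwidth_gbytes)        # 320.0 GB/s, as quoted in the specifications
```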

The GeForce GTX 1080 PCB is, for obvious reasons, quite different from the company's previous PCBs. The typical power consumption for the new product is 180 W - this is slightly higher than the GTX 980, but noticeably lower than the less productive Titan X and GTX 980 Ti. The reference board has the usual set of connectors for connecting image output devices: one Dual-Link DVI, one HDMI and three DisplayPort.

Founders Edition reference design

Along with the GeForce GTX 1080 announcement in early May, a special edition of the video card called the Founders Edition was introduced, carrying a higher price than regular video cards from the company's partners. Essentially, this edition is the reference design of the card and cooling system, and it is produced by Nvidia itself. One can have different attitudes towards such versions, but the reference design developed by the company's engineers and built from high-quality components has its fans.

But whether they will pay several thousand rubles more for a video card from Nvidia itself is a question that only practice can answer. In any case, at first it will be the reference video cards from Nvidia that go on sale at the higher price, and there will not be much to choose from - this happens with every launch - but the reference GeForce GTX 1080 is different in that it is planned to be sold in this form throughout its entire life cycle, until the release of the next generation of solutions.

Nvidia believes that this edition has its merits even compared to the best designs from its partners. For example, the dual-slot cooler makes it easy to build both relatively compact gaming PCs and multi-chip video systems on the basis of this powerful video card (even though the company does not recommend three- and four-chip operation). The GeForce GTX 1080 Founders Edition has some advantages in the form of an efficient cooler with a vapor chamber and a fan that pushes heated air out of the case - Nvidia's first such solution for a card consuming less than 250 W of power.

Compared to the company's previous reference product designs, the power circuit has been upgraded from four-phase to five-phase. Nvidia also talks about improved components on which the new product is based; electrical noise has also been reduced, allowing for improved voltage stability and overclocking potential. As a result of all the improvements, the energy efficiency of the reference board has increased by 6% compared to the GeForce GTX 980.

And in order to differ in appearance from the "regular" GeForce GTX 1080 models, an unusual "chopped" case design was developed for the Founders Edition. This, however, probably also complicated the shape of the vapor chamber and heatsink (see photo), which may have been one of the reasons for the extra $100 for this special edition. Let us repeat: at the start of sales, buyers will not have much choice, but later they will be able to choose either a solution with a custom design from one of the company's partners, or one made by Nvidia itself.

New generation of Pascal graphics architecture

The GeForce GTX 1080 video card was the company's first solution based on the GP104 chip, which belongs to the new generation of Nvidia's Pascal graphics architecture. Although the new architecture is based on solutions developed at Maxwell, it also has important functional differences, which we will write about later. The main change from a global point of view was the new technological process by which the new graphics processor was made.

The use of the 16 nm FinFET process in the production of GP104 graphics processors at the factories of the Taiwanese company TSMC made it possible to significantly increase the complexity of the chip while maintaining a relatively low area and cost. Compare the transistor counts and die areas of the GP104 and GM204: they are similar in area (the new chip is even slightly smaller physically), but the Pascal architecture chip has a noticeably larger number of transistors and, accordingly, execution units, including those that provide new functionality.

From an architectural point of view, the first gaming Pascal is very similar to similar solutions of the Maxwell architecture, although there are some differences. Like Maxwell, Pascal processors will have different configurations of Graphics Processing Cluster (GPC), Streaming Multiprocessor (SM) and memory controllers. The SM multiprocessor is a highly parallel multiprocessor that schedules and runs warps (groups of 32 instruction threads) on CUDA cores and other execution units in the multiprocessor. You can find detailed information about the design of all these blocks in our reviews of previous Nvidia solutions.

Each of the SM multiprocessors is paired with a PolyMorph Engine, which handles texture sampling, tessellation, transformation, vertex attribute setting, and perspective correction. Unlike previous solutions from the company, the PolyMorph Engine in the GP104 chip also contains a new multi-projection unit, Simultaneous Multi-Projection, which we will talk about below. The combination of an SM multiprocessor with one Polymorph Engine is traditionally called TPC - Texture Processor Cluster for Nvidia.

In total, the GP104 chip in the GeForce GTX 1080 contains four GPC clusters and 20 SM multiprocessors, as well as eight memory controllers combined with 64 ROP units. Each GPC cluster has a dedicated rasterization engine and includes five SM multiprocessors. Each multiprocessor, in turn, consists of 128 CUDA cores, a 256 KB register file, 96 KB shared memory, 48 KB L1 cache and eight TMU texture units. That is, in total the GP104 contains 2560 CUDA cores and 160 TMU units.

Also, the graphics processor on which the GeForce GTX 1080 is based contains eight 32-bit memory controllers (as opposed to the 64-bit controllers used previously), which gives a total 256-bit memory bus. Each memory controller is paired with eight ROP units and 256 KB of L2 cache, so in total the GP104 chip contains 64 ROP units and 2048 KB of second-level cache.
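The totals quoted above follow directly from the per-cluster configuration; the following Python snippet simply multiplies them out:

```python
# Unit counts of the full GP104 as described above; the totals follow from the
# per-cluster figures.
gpc = 4
sm_per_gpc = 5
cuda_per_sm = 128
tmu_per_sm = 8

mem_controllers = 8            # 32-bit each -> 256-bit bus
rop_per_controller = 8
l2_per_controller_kb = 256

print(gpc * sm_per_gpc * cuda_per_sm)          # 2560 CUDA cores
print(gpc * sm_per_gpc * tmu_per_sm)           # 160 TMUs
print(mem_controllers * 32)                    # 256-bit memory bus
print(mem_controllers * rop_per_controller)    # 64 ROPs
print(mem_controllers * l2_per_controller_kb)  # 2048 KB of L2 cache
```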

Thanks to architectural optimizations and a new process technology, the first gaming Pascal became the most energy-efficient GPU of all time. Moreover, there is a contribution to this both from one of the most advanced 16 nm FinFET technological processes, and from the architecture optimizations carried out in Pascal, in comparison with Maxwell. Nvidia was able to increase the clock frequency even more than they expected when switching to a new process technology. The GP104 operates at a higher frequency than a hypothetical GM204 produced using the 16nm process would operate. To do this, Nvidia engineers had to carefully check and optimize all the bottlenecks of previous solutions that prevented acceleration above a certain threshold. As a result, the new GeForce GTX 1080 model operates at more than 40% increased frequency compared to the GeForce GTX 980. But this is not all the changes associated with the GPU frequency.

GPU Boost 3.0 technology

As we well know from previous Nvidia video cards, in their graphics processors they use GPU Boost hardware technology, designed to increase the operating clock speed of the GPU in modes when it has not yet reached the limits of power consumption and heat dissipation. Over the years, this algorithm has undergone many changes, and the Pascal architecture video chip already uses the third generation of this technology - GPU Boost 3.0, the main innovation of which is a finer setting of turbo frequencies, depending on voltage.

If you remember how previous versions of the technology worked, the difference between the base frequency (the guaranteed minimum below which the GPU does not fall, at least in games) and the turbo frequency was fixed: the turbo frequency was always a certain number of megahertz above the base. In GPU Boost 3.0 it became possible to set turbo frequency offsets for each voltage separately. The easiest way to understand this is with an illustration:

On the left is the second version of GPU Boost, on the right the third, which appeared in Pascal. The fixed difference between the base and turbo frequencies did not allow the full capabilities of the GPU to be used; in some cases GPUs of previous generations could run faster at a given voltage, but the fixed turbo offset did not allow it. In GPU Boost 3.0 this capability has appeared, and the turbo frequency can be set for each individual voltage value, squeezing all the juice out of the GPU.
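The difference between a single fixed offset and per-voltage offsets can be sketched in a few lines of Python. All voltage and frequency numbers here are invented for illustration and do not describe any particular chip:

```python
# Illustration of the difference between GPU Boost 2.0 (one fixed offset above the
# base curve) and GPU Boost 3.0 (an individual offset per voltage point).
# The voltage/frequency values are hypothetical.

base_curve = {0.80: 1607, 0.90: 1700, 1.00: 1780, 1.06: 1850}   # V -> MHz (hypothetical)

# GPU Boost 2.0 style: the same +100 MHz applies everywhere.
fixed_offset = {v: f + 100 for v, f in base_curve.items()}

# GPU Boost 3.0 style: each voltage point gets its own offset found by testing.
per_point_offset = {0.80: 130, 0.90: 115, 1.00: 90, 1.06: 60}    # MHz (hypothetical)
tuned_curve = {v: base_curve[v] + per_point_offset[v] for v in base_curve}

print(fixed_offset)   # the fixed offset wastes headroom at low voltage points
print(tuned_curve)    # the per-point curve takes whatever each point allows
```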

Handy utilities are required to control overclocking and set the turbo frequency curve. Nvidia itself does not do this, but helps its partners create similar utilities to make overclocking easier (within reasonable limits, of course). For example, the new functionality of GPU Boost 3.0 has already been revealed in EVGA Precision XOC, which includes a dedicated overclock scanner that automatically finds and sets the non-linear difference between base frequency and turbo frequency for different voltages by running a built-in performance and stability test. As a result, the user gets a turbo frequency curve that perfectly matches the capabilities of a particular chip. Which, moreover, can be modified in any way manually.

As you can see in the screenshot of the utility, in addition to information about the GPU and system, there are also settings for overclocking: Power Target (defines the typical power consumption during overclocking, as a percentage of the standard), GPU Temp Target (maximum allowable core temperature), GPU Clock Offset (exceeding the base frequency for all voltage values), Memory Offset (exceeding the video memory frequency above the default value), Overvoltage (additional ability to increase the voltage).

The Precision XOC utility includes three overclocking modes: Basic, Linear and Manual. In Basic mode, you can set a single value for the frequency offset (a fixed turbo frequency) above the base one, as was the case for previous GPUs. Linear mode allows you to set a linear frequency ramp from the minimum to the maximum voltage value for the GPU. And in Manual mode you can set unique GPU frequency values for each voltage point on the graph.

The utility also includes a special scanner for automatic overclocking. You can either set your own frequency levels or let Precision XOC scan the GPU at all voltages and find the most stable frequencies for each point on the voltage and frequency curve completely automatically. During the scanning process, Precision XOC gradually increases the GPU frequency and checks its operation for stability or artifacts, building an ideal frequency and voltage curve that will be unique for each specific chip.

This scanner can be customized to your own requirements by setting the time period for testing each voltage value, the minimum and maximum frequency being tested, and its step. It is clear that to achieve stable results it would be better to set a small step and a decent testing duration. During the testing process, unstable operation of the video driver and system may be observed, but if the scanner does not freeze, it will restore operation and continue to find optimal frequencies.
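Schematically, such a scanner boils down to a loop over voltage points that raises the frequency until the stability test fails. The sketch below is only an illustration of the idea: run_stability_test is a hypothetical placeholder, and the frequency limits are synthetic, not taken from Precision XOC:

```python
# Schematic automatic overclock scanner in the spirit of the tool described above:
# for every voltage point, raise the frequency in steps until the stability test
# fails, then keep the last stable value.

def run_stability_test(voltage_mv: int, freq_mhz: int) -> bool:
    """Placeholder: a real tool renders a workload and checks for artifacts/hangs."""
    # Purely synthetic stability limit for the sake of the example.
    return freq_mhz <= 1600 + (voltage_mv - 800) * 2

def scan_curve(voltages_mv, start_mhz=1500, step_mhz=25, max_mhz=2200):
    curve = {}
    for v in voltages_mv:
        freq = start_mhz
        while freq + step_mhz <= max_mhz and run_stability_test(v, freq + step_mhz):
            freq += step_mhz            # keep climbing while the test still passes
        curve[v] = freq                 # last frequency confirmed stable
    return curve

print(scan_curve([800, 900, 1000, 1062]))   # voltage (mV) -> stable frequency (MHz)
```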

New GDDR5X video memory type and improved compression

So, the power of the GPU has increased significantly, but the memory bus remains only 256-bit: will memory bandwidth limit overall performance, and what can be done about it? Second-generation HBM memory apparently remains too expensive to manufacture, so other options had to be found. Ever since the introduction of GDDR5 memory in 2009, Nvidia engineers have been exploring the possibilities of new memory types. This work led to the introduction of a new memory standard, GDDR5X - the most complex and advanced standard to date, providing a transfer rate of 10 Gbps.

Nvidia gives an interesting example of how fast this is. Only 100 picoseconds pass between transmitted bits - during this time a beam of light will travel a distance of only one inch (about 2.5 cm). And when using GDDR5X memory, the data transmission and reception circuits must select the value of the transmitted bit in less than half this time, before the next one is sent - this is just so you understand what modern technology has come to.

To achieve such speed, it was necessary to develop a new architecture for the data input/output system, which required several years of joint development with memory chip manufacturers. In addition to the increased data transfer speed, energy efficiency has also increased - GDDR5X memory chips use a lower voltage of 1.35 V and are manufactured using new technologies, which gives the same energy consumption at a 43% higher frequency.

The company's engineers had to rework the data lines between the GPU core and memory chips, paying more attention to preventing signal loss and degradation along the entire path from memory to the GPU and back. Thus, the illustration above shows the captured signal in the form of a large symmetrical “eye”, which indicates good optimization of the entire circuit and the relative ease of capturing data from the signal. Moreover, the changes described above led not only to the possibility of using GDDR5X at 10 GHz, but should also help to obtain high memory bandwidth on future products using more conventional GDDR5 memory.

Well, we got more than 40% increase in bandwidth from using the new memory. But isn't this enough? To further increase the efficiency of memory bandwidth, Nvidia continued to improve the advanced data compression introduced in previous architectures. The memory subsystem in the GeForce GTX 1080 uses improved and several new lossless data compression techniques designed to reduce bandwidth requirements - this is the fourth generation of on-chip compression.

In-memory data compression algorithms bring several positive aspects. Compression reduces the amount of data written to memory, the same applies to data sent from video memory to the second level cache, which improves the efficiency of using the L2 cache, since a compressed tile (a block of several framebuffer pixels) is smaller than an uncompressed one. It also reduces the amount of data sent between different points, such as the TMU and framebuffer.

The data compression pipeline in the GPU uses several algorithms, which are chosen depending on the "compressibility" of the data - the best available algorithm is selected for it. One of the most important is the delta color compression algorithm. This compression technique encodes data as the difference between successive values instead of the data itself. The GPU calculates the difference in color values between pixels in a block (tile) and stores the block as an average color for the entire block plus data on the difference in values for each pixel. For graphics data, this method is usually well suited, since the color within small tiles often does not differ much between pixels.

The GP104 graphics processor in the GeForce GTX 1080 supports more compression algorithms than previous Maxwell architecture chips. Thus, the 2:1 compression algorithm has become more efficient, and two new algorithms have been added: a 4:1 mode, suitable for cases where the differences in pixel colors within a block are very small, and an 8:1 mode, which combines constant 4:1 compression of 2x2 pixel blocks with 2:1 delta compression between blocks. When compression is impossible, the data is simply stored uncompressed.
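A toy Python sketch of the idea behind delta color compression: a tile is stored as an anchor value plus per-pixel deltas, and a mode is chosen by how small the deltas are. The thresholds and mode selection here are invented for illustration and are not GP104's real algorithm:

```python
# Simplified illustration of delta color compression for a tile of pixel values:
# store one anchor value plus per-pixel deltas; the smaller the deltas, the more
# aggressive the compression mode that can be used. Thresholds are hypothetical.

def compress_tile(tile):
    anchor = tile[0]
    deltas = [p - anchor for p in tile]
    span = max(abs(d) for d in deltas)
    if span == 0:
        mode = "8:1 (constant tile)"
    elif span < 4:
        mode = "4:1 (tiny deltas)"
    elif span < 64:
        mode = "2:1 (small deltas)"
    else:
        mode = "uncompressed"
    return mode, anchor, deltas

sky_tile   = [200, 201, 201, 202, 200, 201, 202, 202]   # smooth gradient compresses well
grass_tile = [40, 180, 95, 10, 220, 130, 60, 250]       # noisy detail stays uncompressed

print(compress_tile(sky_tile))
print(compress_tile(grass_tile))
```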

However, in reality the latter happens very rarely. This can be seen in the example screenshots from Project CARS that Nvidia provided to illustrate the increased compression ratio in Pascal. In the illustrations, the framebuffer tiles that the GPU was able to compress are tinted purple, while those that could not be compressed losslessly keep their original color (top: Maxwell, bottom: Pascal).

As you can see, the new compression algorithms in GP104 really work much better than in Maxwell. While the older architecture was also able to compress most of the tiles in the scene, large amounts of grass and trees around the edges, as well as vehicle parts, are not subject to the legacy compression algorithms. But when we put new techniques into work in Pascal, very few areas of the image remained uncompressed - the improved efficiency is obvious.

As a result of improvements in data compression, the GeForce GTX 1080 is able to significantly reduce the amount of data sent per frame. In terms of numbers, improved compression saves an additional 20% of effective memory bandwidth. In addition to the more than 40% increase in memory bandwidth of the GeForce GTX 1080 relative to the GTX 980 due to the use of GDDR5X memory, all together this gives about a 70% increase in effective bandwidth compared to the previous generation model.
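The "about 70%" figure is simply the product of the raw bandwidth gain and the compression gain; a quick check (the GTX 980's 224 GB/s is its published 256-bit GDDR5 bandwidth):

```python
# How the "about 70%" effective-bandwidth figure is obtained: the raw gain from
# GDDR5X multiplied by the gain from better compression.
gtx980_bandwidth = 224            # GB/s (256-bit GDDR5 at 7 Gbit/s)
gtx1080_bandwidth = 320           # GB/s (256-bit GDDR5X at 10 Gbit/s)

raw_gain = gtx1080_bandwidth / gtx980_bandwidth       # ~1.43, i.e. more than 40%
compression_gain = 1.20                               # ~20% saved by 4th-gen compression

print(raw_gain * compression_gain)                    # ~1.71, i.e. roughly +70% effective
```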

Support for asynchronous computing Async Compute

Most modern games use complex calculations in addition to graphics. For example, physics calculations can be carried out not before or after the graphics work but simultaneously with it, since the two are not related and do not depend on each other within a single frame. Other examples are post-processing of already rendered frames and audio processing, which can also be performed in parallel with rendering.

Another prominent example is Asynchronous Time Warp, used in virtual reality systems to adjust the output frame to the player's head movement right before scan-out, interrupting the rendering of the next frame. Such asynchronous loading of the GPU makes it possible to increase the utilization of its execution units.

Such workloads create two new scenarios for using GPUs. The first of these involves overlapping loads, since many types of tasks do not fully utilize the capabilities of the GPUs, and some resources are idle. In such cases, you can simply run two different tasks on the same GPU, separating its execution units for more efficient use - for example, PhysX effects running in conjunction with 3D frame rendering.

To improve this scenario, the Pascal architecture introduced dynamic load balancing. In the previous Maxwell architecture, overlapping workloads were implemented by statically distributing GPU resources between graphics and compute. This approach is effective provided that the balance between the two workloads approximately corresponds to the division of resources and the tasks are completed in the same amount of time. If non-graphical calculations take longer than graphic ones, and both are waiting for the overall work to complete, then part of the GPU will be idle for the remaining time, which will cause a decrease in overall performance and reduce all benefits to nothing. Hardware dynamic load balancing allows you to use freed GPU resources as soon as they become available - we will provide an illustration for understanding.
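A toy timing model shows why dynamic balancing helps when the two workloads finish at different times; the millisecond figures are invented purely for illustration:

```python
# Toy timing model for overlapping graphics and compute on one GPU. With a static
# 50/50 resource split (Maxwell-style), the frame waits for the slower partition
# while the other half idles; with dynamic balancing (Pascal-style), freed units
# pick up the remaining work. All numbers are invented for illustration.

graphics_half_ms = 6.0   # time for the graphics workload on its half of the GPU
compute_half_ms  = 4.0   # time for the compute workload on its half of the GPU

# Static split: the frame ends when the slower partition finishes (compute side idles 2 ms).
static_frame = max(graphics_half_ms, compute_half_ms)                 # 6.0 ms

# Dynamic balancing: when compute finishes, its units join the remaining graphics work,
# so the leftover graphics time is halved.
leftover = graphics_half_ms - compute_half_ms
dynamic_frame = compute_half_ms + leftover / 2                        # 5.0 ms

print(static_frame, dynamic_frame)
```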

There are also tasks with critical execution times, and this is the second scenario for asynchronous computing. For example, the asynchronous time warp algorithm in VR must complete before scan-out, or the frame will be discarded. In such cases the GPU must support very fast interruption of one task and switching to another, so that a less critical task can be removed from the GPU and its resources freed for the critical one - this is called preemption.

A single render command from a game engine can contain hundreds of draw calls, each draw call in turn containing hundreds of triangles to process, each containing hundreds of pixels that need to be calculated and drawn. The traditional GPU approach only interrupts tasks at a high level, and the graphics pipeline is forced to wait for all that work to complete before switching tasks, resulting in very high latencies.

To correct this, the Pascal architecture introduced for the first time the ability to interrupt a task at the pixel level - Pixel Level Preemption. Pascal GPU execution units can continuously monitor the progress of rendering tasks, and when an interrupt is requested, they can stop execution, preserving the context for further completion, quickly switching to another task.

Thread-level interruption and switching for compute operations works similarly to pixel-level interruption for graphics computing. Compute workloads consist of multiple grids, each containing multiple threads. When an interrupt request is received, threads running on the multiprocessor terminate execution. Other blocks save their own state to continue from the same point in the future, and the GPU switches to another task. The entire task switching process takes less than 100 microseconds after the running threads exit.

For gaming workloads, the combination of pixel-level interrupts for graphics workloads and thread-level interruptions for compute workloads gives Pascal GPUs the ability to quickly switch between tasks with minimal downtime. And for computing tasks on CUDA, interruption with minimal granularity is also possible - at the instruction level. In this mode, all threads stop execution at once, immediately switching to another task. This approach requires storing more information about the state of all registers of each thread, but in some non-graphical computing cases it is quite justified.

The use of fast interruption and task switching in graphics and compute workloads was added to the Pascal architecture so that graphics and non-graphics tasks could be interrupted at the level of individual instructions, rather than entire threads, as was the case with Maxwell and Kepler. These technologies can improve the asynchronous execution of various GPU workloads and improve responsiveness when running multiple tasks simultaneously. At the Nvidia event, they showed a demonstration of asynchronous computing using the example of calculating physical effects. If without asynchronous computing the performance was at the level of 77-79 FPS, then with the inclusion of these features the frame rate increased to 93-94 FPS.

We have already given asynchronous time warp in VR as an example of how this functionality can be used in games. The illustration shows this technique with traditional preemption and with fast preemption. In the first case, the asynchronous time warp is performed as late as possible, but still before the display update begins; the work has to be submitted to the GPU a few milliseconds earlier, because without fast preemption there is no way to run it at exactly the right moment, and the GPU sits idle for some time.

With pixel- and thread-level preemption (shown on the right), the moment of interruption can be determined much more precisely, so the asynchronous time warp can be started much later with confidence that it will finish before the display update begins. The GPU, which in the first case sits idle for a while, can then be loaded with additional graphics work.

Simultaneous Multi-Projection technology

The new GP104 GPU now supports new Simultaneous Multi-Projection (SMP) technology, allowing the GPU to render data on modern display systems more efficiently. SMP allows the video chip to simultaneously output data in several projections, which required introducing a new hardware block in the GPU as part of the PolyMorph engine at the end of the geometry pipeline before the rasterization unit. This block is responsible for working with multiple projections for a single geometry stream.

The multi-projection engine processes geometry simultaneously for 16 pre-configured projections that share a projection point (camera); the projections can be independently rotated or tilted. Since each geometry primitive may appear in several projections at once, the SMP engine provides this functionality by allowing the application to instruct the GPU to replicate geometry up to 32 times (16 projections at each of two projection centers) without additional processing.

The entire processing process is hardware accelerated, and since multiprojection works after the geometry engine, it does not need to repeat all the geometry processing stages several times. The resource savings are important when rendering speed is limited by geometry processing performance, such as tessellation, where the same geometric work is performed multiple times for each projection. Accordingly, in the peak case, multiprojection can reduce the need for geometry processing by up to 32 times.
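The peak saving quoted above is straightforward accounting: up to 16 projections per projection center, up to two centers, all fed by a single geometry pass. A short sketch:

```python
# Geometry-pass accounting for Simultaneous Multi-Projection: one geometry pass can
# feed up to 16 projections per projection center and up to 2 centers (e.g. two eyes),
# instead of re-running geometry processing for every view.

views_per_center = 16
projection_centers = 2
total_views = views_per_center * projection_centers     # up to 32 replicated views

geometry_passes_without_smp = total_views               # one full pass per view
geometry_passes_with_smp = 1                            # a single pass, replicated in hardware

print(total_views, geometry_passes_without_smp // geometry_passes_with_smp)  # 32, 32x saving
```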

But why is all this needed? There are some good examples where multi-projection technology can be useful. For example, a multi-monitor system of three displays installed at an angle to each other quite close to the user (surround configuration). In a typical situation, the scene is rendered in one projection, which leads to geometric distortions and incorrect geometry rendering. The correct way is to have three different projections for each of the monitors, according to the angle at which they are positioned.

Using a video card on a chip with Pascal architecture, this can be done in one geometry pass, specifying three different projections, each for its own monitor. And the user will thus be able to change the angle at which the monitors are located to each other not only physically, but also virtually - rotating the projections for the side monitors to get the correct perspective in the 3D scene with a noticeably wider viewing angle (FOV). However, there is a limitation here - for such support, the application must be able to render the scene with a wide FOV and use special SMP API calls to set it. That is, you can’t do this in every game; you need special support.

Either way, the days of a single projection onto a single flat monitor are over: there are now many multi-monitor configurations and curved displays that can also use this technology, not to mention virtual reality systems, which place special lenses between the screens and the user's eyes and require new techniques for projecting a 3D scene into a 2D picture. Many of these technologies and techniques are still in the early stages of development; the key point is that older GPUs cannot use more than one planar projection efficiently - they need several rendering passes, repeated processing of the same geometry, and so on.

Maxwell architecture chips had limited support for Multi-Resolution to help improve efficiency, but Pascal's SMP can do much more. Maxwell could rotate the projection 90 degrees for cube mapping or different projection resolutions, but this was only useful in limited applications such as VXGI.

Other possibilities for using SMP include multi-resolution rendering and single-pass stereo rendering. For example, Multi-Res Shading can be used in games to optimize performance. When applied, a higher resolution is used in the center of the frame, and at the periphery it is reduced to obtain a higher rendering speed.

Single-pass stereo rendering is used in VR, already added to the VRWorks package, and uses multi-projection capabilities to reduce the amount of geometric work required in VR rendering. When this feature is used, the GeForce GTX 1080 GPU processes the scene geometry only once, generating two projections for each eye at once, which halves the geometric load on the GPU, and also reduces losses from driver and OS operation.

An even more advanced method for increasing the efficiency of VR rendering is Lens Matched Shading, which uses multiple projections to simulate the geometric distortions required in VR rendering. This method uses multi-projection to render a 3D scene onto a surface that approximates the lens-corrected rendering for VR headset rendering, avoiding drawing a lot of extra pixels on the periphery that will be discarded. The easiest way to understand the essence of the method is from the illustration - four slightly expanded projections are used in front of each eye (on Pascal you can use 16 projections for each eye - for a more accurate imitation of a curved lens) instead of one:

This approach can save a significant amount of performance. A typical Oculus Rift image is 1.1 megapixels per eye, but because of the difference in projections a 2.1-megapixel source image is rendered for it - 86% more than necessary. Multi-projection, as implemented in the Pascal architecture, makes it possible to reduce the rendered image to 1.4 megapixels, a one-and-a-half-fold saving in pixel processing, and it also saves memory bandwidth.
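The pixel arithmetic behind this example, using the figures quoted above (the source numbers are rounded, so the percentages come out approximate):

```python
# Pixel-count arithmetic for the Lens Matched Shading example above.
displayed_mp = 1.1          # megapixels actually shown per eye
naive_render_mp = 2.1       # megapixels rendered per eye with a single planar projection
lms_render_mp = 1.4         # megapixels per eye with lens-matched multi-projection

# ~90% extra pixels with the naive projection (the article quotes 86%; the inputs are rounded)
print(f"naive overdraw: {naive_render_mp / displayed_mp - 1:.0%}")
print(f"LMS saving:     {naive_render_mp / lms_render_mp:.2f}x")   # ~1.5x fewer pixels shaded
```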

And along with a twofold saving in geometry processing speed due to single-pass stereo rendering, the GeForce GTX 1080 graphics card is capable of providing a significant increase in VR rendering performance, which is very demanding both in terms of geometry processing speed, and even more so in pixel processing.

Improvements in video output and processing units

In addition to the performance and new functionality associated with 3D rendering, it is necessary to maintain a good level of image output capabilities, as well as video decoding and encoding. And the first Pascal architecture GPU did not disappoint - it supports all modern standards in this sense, including hardware decoding of the HEVC format, necessary for watching 4K videos on a PC. Also, future owners of GeForce GTX 1080 video cards will soon be able to enjoy playing 4K video streaming from Netflix and other providers on their systems.

In terms of display output, the GeForce GTX 1080 has support for HDMI 2.0b with HDCP 2.2, as well as DisplayPort. So far the DP 1.2 version has been certified, but the GPU is ready for certification for the newer versions of the standard: DP 1.3 Ready and DP 1.4 Ready. The latter allows 4K displays to be output at a 120Hz refresh rate, and 5K and 8K displays to run at 60Hz using a pair of DisplayPort 1.3 cables. If for the GTX 980 the maximum supported resolution was 5120x3200 at 60 Hz, then for the new GTX 1080 model it increased to 7680x4320 at the same 60 Hz. The reference GeForce GTX 1080 has three DisplayPort outputs, one HDMI 2.0b and one digital Dual-Link DVI.

The new Nvidia video card model also received an improved video data decoding and encoding unit. Thus, the GP104 chip meets the high standards of PlayReady 3.0 (SL3000) for video streaming playback, allowing you to be sure that the playback of high-quality content from reputable providers such as Netflix will be as high quality and energy efficient as possible. Details about support for various video formats during encoding and decoding are given in the table; the new product clearly differs from previous solutions for the better:

An even more interesting new feature is support for so-called High Dynamic Range (HDR) displays, which are about to become widespread on the market. HDR TVs are already on sale in 2016 (with four million HDR TVs expected to be sold within the year alone), and monitors should follow next year. HDR is the biggest breakthrough in display technology in years: the format provides twice as many color tones (75% of the visible spectrum, as opposed to 33% for RGB), brighter displays (1000 nits), greater contrast (10,000:1) and rich colors.

The emergence of the ability to reproduce content with a greater difference in brightness and richer and more saturated colors will bring the image on the screen closer to reality, blacks will become deeper, and bright light will be blinding, as in the real world. Accordingly, users will see more detail in the bright and dark areas of images compared to standard monitors and TVs.

To support HDR displays, the GeForce GTX 1080 has everything it needs: 12-bit color output, support for the BT.2020 and SMPTE 2084 standards, and HDR output over HDMI 2.0b with 10/12-bit color at 4K resolution, which Maxwell also had. In addition, Pascal adds decoding of the HEVC format at 4K resolution, 60 Hz and 10- or 12-bit color, which is used for HDR video, as well as encoding of the same format with the same parameters, but only with 10-bit color, for HDR video recording or streaming. The new product is also ready for the upcoming DisplayPort 1.4 standard for transmitting HDR data over this connector.

By the way, HDR video encoding may be needed in the future in order to transfer such data from a home PC to a SHIELD game console that can play 10-bit HEVC. That is, the user will be able to broadcast the game from a PC in HDR format. Wait, where can I get games with such support? Nvidia continually works with game developers to implement this support, providing them with everything they need (driver support, code examples, etc.) to correctly render HDR images compatible with existing displays.

At the time of the GeForce GTX 1080's release, HDR output is supported by games such as Obduction, The Witness, Lawbreakers, Rise of the Tomb Raider, Paragon, The Talos Principle and Shadow Warrior 2, and this list is expected to grow in the near future.

Changes to SLI multi-chip rendering

There have also been some changes to the proprietary SLI multi-chip rendering technology, although nobody expected this. SLI is used by PC gaming enthusiasts either to push performance to extreme levels by pairing the most powerful single-chip video cards, or to achieve very high frame rates with a couple of mid-range solutions that are sometimes cheaper than one top-end card (a controversial decision, but people do it). With 4K monitors, players have almost no option other than installing a pair of video cards, since even top models often cannot provide comfortable gaming at maximum settings in such conditions.

One of the important components of Nvidia SLI are bridges that connect video cards into a common video subsystem and serve to organize a digital channel for data transfer between them. GeForce video cards traditionally featured dual SLI connectors, which served to connect two or four video cards in 3-Way and 4-Way SLI configurations. Each of the video cards had to connect to each one, since all the GPUs sent the frames they rendered to the main GPU, which is why two interfaces were needed on each of the cards.

Beginning with the GeForce GTX 1080, all Nvidia graphics cards based on the Pascal architecture can link two SLI interfaces together to increase inter-GPU transfer performance, and this new dual-link SLI mode improves performance and visual smoothness on very high-resolution displays and multi-monitor systems.

This mode also required new bridges, called SLI HB. They combine a pair of GeForce GTX 1080 video cards over two SLI channels at once, although the new video cards are also compatible with older bridges. For resolutions of 1920×1080 and 2560×1440 pixels at a refresh rate of 60 Hz, you can use standard bridges, but in more demanding modes (4K, 5K and multi-monitor systems), only new bridges will provide the best results in terms of frame smoothness, although the old ones will work, but somewhat worse.

Also, when using SLI HB bridges, the GeForce GTX 1080 data transfer interface operates at 650 MHz, compared to 400 MHz for conventional SLI bridges on older GPUs. Moreover, for some of the rigid old bridges, a higher data transmission frequency is also available with Pascal architecture video chips. With an increase in the data transfer rate between GPUs via a double SLI interface with an increased operating frequency, smoother frame output on the screen is ensured compared to previous solutions:

It should also be noted that support for multi-chip rendering in DirectX 12 is somewhat different from what was usual before. In the latest version of the graphics API, Microsoft has made many changes related to the operation of such video systems. For software developers, DX12 offers two options for using multiple GPUs: Multi Display Adapter (MDA) and Linked Display Adapter (LDA) modes.

Moreover, the LDA mode has two forms: implicit LDA (which Nvidia uses for SLI) and explicit LDA (where the game developer takes on the task of managing multi-chip rendering). The MDA and explicit LDA modes were introduced into DirectX 12 to give game developers more freedom and options when using multi-chip video systems. The difference between the modes is clearly visible in the following table:

In LDA mode, the memory of each GPU can be linked to the memory of another and displayed as a large total volume, of course, with all the performance limitations when data is taken from “foreign” memory. In MDA mode, each GPU's memory operates separately, and different GPUs cannot directly access data from another GPU's memory. LDA mode is designed for multi-chip systems of similar performance, while MDA mode has fewer restrictions and can work together between discrete and integrated GPUs or discrete solutions with chips from different manufacturers. But this mode also requires more thought and work from developers when programming to work together so that the GPUs can communicate with each other.

By default, an SLI system based on GeForce GTX 1080 boards supports only two GPUs, and three- and four-chip configurations are not officially recommended for use, since in modern games it is becoming increasingly difficult to provide performance gains from adding a third and fourth GPU. For example, many games rely on the capabilities of the system’s central processor when operating multi-chip video systems; new games also increasingly use temporal techniques that use data from previous frames, in which the effective operation of several GPUs at once is simply impossible.

However, operation in other (non-SLI) multi-chip configurations remains possible, such as the MDA or explicit LDA modes in DirectX 12, or a dual-chip SLI system with a dedicated third GPU for PhysX effects. What about records in benchmarks - is Nvidia really abandoning them completely? No, of course not, but since such systems are in demand by only a handful of users worldwide, a special Enthusiast Key was created for these ultra-enthusiasts; it can be downloaded from the Nvidia website and unlocks this feature. To do so, you must first obtain a unique GPU identifier by running a special application, then request the Enthusiast Key on the website and, after downloading it, install it in the system, thereby unlocking 3-Way and 4-Way SLI configurations.

Fast Sync technology

Some changes have occurred in synchronization technologies when displaying information. Looking ahead, nothing new has appeared in G-Sync, nor is Adaptive Sync technology supported. But Nvidia decided to improve the smoothness of the output and synchronization for games that show very high performance when the frame rate is noticeably higher than the refresh rate of the monitor. This is especially important for games that require minimal latency and fast response and which host multiplayer battles and competitions.

Fast Sync is a new alternative to vertical sync that does not have visual artifacts such as image tearing and is not tied to a fixed refresh rate, which increases latency. What is the problem with Vsync in games like Counter-Strike: Global Offensive? This game runs at several hundred frames per second on powerful modern GPUs, and the player has the choice of whether to enable V-sync or not.

In multiplayer games, users most often strive for minimal latency and disable VSync, resulting in clearly visible tearing in the image, which is extremely unpleasant even at high frame rates. If you enable vertical synchronization, the player will experience a significant increase in delays between his actions and the image on the screen when the graphics pipeline slows down to the monitor's refresh rate.

This is how a traditional pipeline works. But Nvidia decided to separate the process of rendering frames from the process of displaying them on the screen using Fast Sync technology. This allows the part of the GPU that renders frames to keep working at full speed as efficiently as possible, storing those frames in a special temporary buffer, the Last Rendered Buffer.

This method allows you to change the way you display the screen and take the best of VSync On and VSync Off modes, achieving low latency but no image artifacts. With Fast Sync there is no frame flow control, the game engine runs in synchronization disabled mode and is not told to wait to render the next one, so the latencies are almost as low as with VSync Off mode. But since Fast Sync independently selects a buffer for output to the screen and displays the entire frame, there are no picture breaks.

Fast Sync uses three different buffers, the first two of which work similarly to double buffering in a classic pipeline. The primary buffer (Front Buffer - FB) is the buffer from which information is displayed on the display, a fully rendered frame. The secondary buffer (Back Buffer - BB) is a buffer that receives information during rendering.

When using vertical synchronization at high frame rates, the game waits until the refresh interval is reached to swap the primary buffer with the secondary buffer to display the entire frame on the screen. This slows down the process, and adding additional buffers like traditional triple buffering will only add to the delay.

With Fast Sync, a third buffer is added, the Last Rendered Buffer (LRB), which stores the frames that have just been rendered in the secondary buffer. The name speaks for itself: it contains a copy of the last fully rendered frame. When the time comes to update the primary buffer, the LRB is transferred to it as a whole, not in parts as from the secondary buffer when vertical synchronization is disabled. Since copying data between buffers is inefficient, they are simply swapped (or renamed, whichever is easier to picture), and the new buffer-swapping logic that appeared in GP104 manages this process.
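To make the three-buffer logic concrete, here is a minimal Python sketch of how a Fast Sync-style scheme could juggle the front, back and last-rendered buffers. The class and method names are our own illustration, not driver code.

```python
# Toy model of Fast Sync buffer handling (illustrative only, not driver code).
# The renderer writes frames as fast as it can; the display takes whichever
# frame was finished most recently at each refresh, so there is no tearing
# and no back-pressure on the game engine.

class FastSyncBuffers:
    def __init__(self):
        self.front = None          # FB: frame currently being scanned out
        self.back = None           # BB: frame being rendered right now
        self.last_rendered = None  # LRB: most recent completed frame

    def render_frame(self, frame_id):
        # The game engine renders into the back buffer without waiting.
        self.back = frame_id
        # When rendering finishes, the back buffer becomes the new LRB
        # (a pointer swap / rename, not a copy).
        self.last_rendered, self.back = self.back, None

    def vblank(self):
        # At each monitor refresh, the newest complete frame is promoted
        # to the front buffer as a whole, so the image never tears.
        if self.last_rendered is not None:
            self.front, self.last_rendered = self.last_rendered, None
        return self.front


if __name__ == "__main__":
    buffers = FastSyncBuffers()
    displayed = []
    frame = 0
    # Simulate a game running roughly 3x faster than the monitor refresh.
    for refresh in range(5):
        for _ in range(3):
            frame += 1
            buffers.render_frame(frame)
        displayed.append(buffers.vblank())
    print(displayed)  # [3, 6, 9, 12, 15]: only the freshest frames are shown
```

The point of the sketch is that the renderer never blocks on the display, yet the display always receives a whole frame.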

In practice, enabling the new Fast Sync method still adds a small amount of delay compared to disabling vertical synchronization altogether, about 8 ms on average, but it displays frames on the monitor in their entirety, without the unpleasant artifacts that tear the image. The new method can be enabled from the Vsync control section of the Nvidia control panel. The default remains application-controlled, and there is no need to enable Fast Sync in every 3D application; it is better to choose this method specifically for games with very high FPS.

Virtual reality technologies Nvidia VRWorks

We've touched on the hot topic of virtual reality more than once in this article, but we've mostly talked about increasing frame rates and ensuring low latency, which are very important for VR. Progress is indeed being made, but so far VR games do not look nearly as impressive as the best of the "regular" modern 3D games. This happens not only because leading game developers are not yet particularly involved in VR applications, but also because VR is far more demanding on frame rate, which rules out the use of many of the usual techniques in such games.

In order to reduce the gap in quality between VR games and regular ones, Nvidia decided to release a whole package of relevant VRWorks technologies, which includes a large number of APIs, libraries, engines and technologies that can significantly improve both the quality and the performance of VR applications. How does this relate to the announcement of the first gaming solution based on Pascal? It's very simple: several of these technologies have been built into it to help increase performance and improve quality, and we have already written about them.

Although VRWorks covers more than just graphics, let's talk about that part first. The set of VRWorks Graphics technologies includes the previously mentioned Lens Matched Shading, which uses the multi-projection feature that appeared in the GeForce GTX 1080. The new product can deliver a 1.5-2x performance increase compared to solutions without such support. We also mentioned other technologies, such as MultiRes Shading, designed for rendering at different resolutions in the center of the frame and at its periphery.

But much more unexpected was the announcement of VRWorks Audio technology, designed for high-quality processing of audio data in 3D scenes, which is especially important in virtual reality systems. In conventional engines, the positioning of sound sources in a virtual environment is calculated quite correctly; if the enemy shoots from the right, then the sound is louder from that side of the audio system, and such a calculation is not too demanding on computing power.

But in reality, sounds go not only to the player, but in all directions and are reflected from various materials, similar to how light rays are reflected. And in reality, we hear these reflections, although not as clearly as direct sound waves. These indirect reflections of sound are usually simulated by special reverb effects, but this is a very primitive approach to the task.

VRWorks Audio treats sound waves much like ray tracing treats light, tracing the path of rays through multiple reflections from objects in a virtual scene. It simulates the propagation of sound waves in the environment, tracking direct and reflected waves depending on their angle of incidence and the properties of reflective materials. In its work, VRWorks Audio relies on the high-performance Nvidia OptiX engine, known from graphics tasks and designed for ray tracing. OptiX can be used for a variety of tasks, such as calculating indirect lighting and baking light maps, and now also for sound wave tracing in VRWorks Audio.
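As a purely conceptual illustration of tracing reflected sound, here is a toy 2D sketch: a single ray bounces around a rectangular room, losing energy at each reflection and with distance. It is in no way a model of the actual OptiX-based implementation; the room size, absorption value and attenuation model are arbitrary assumptions.

```python
import math

# Toy 2D "sound ray" bouncing inside a rectangular room (assumed 10 x 6 m).
# Each wall absorbs part of the energy; the rest keeps travelling, mimicking
# the idea of tracing reflected sound paths up to a fixed bounce count.

ROOM_W, ROOM_H = 10.0, 6.0
ABSORPTION = 0.3      # assumed fraction of energy lost per reflection
MAX_BOUNCES = 12      # the VR Funhouse demo is quoted at up to 12 reflections

def trace_sound_ray(x, y, dx, dy, energy=1.0):
    """Follow one ray; return (path_length, remaining_energy) per bounce."""
    history = []
    path_length = 0.0
    for _ in range(MAX_BOUNCES):
        # Distance to the nearest wall along the current direction.
        tx = (ROOM_W - x) / dx if dx > 0 else (0 - x) / dx if dx < 0 else math.inf
        ty = (ROOM_H - y) / dy if dy > 0 else (0 - y) / dy if dy < 0 else math.inf
        t = min(tx, ty)
        x, y = x + dx * t, y + dy * t
        path_length += t
        # Reflect off whichever wall was hit and apply absorption.
        if tx <= ty:
            dx = -dx
        else:
            dy = -dy
        energy *= (1.0 - ABSORPTION)
        # Simple inverse-distance falloff on top of wall absorption.
        history.append((path_length, energy / (1.0 + path_length)))
    return history

if __name__ == "__main__":
    for bounce, (dist, e) in enumerate(trace_sound_ray(2.0, 3.0, 0.8, 0.6), 1):
        print(f"bounce {bounce:2d}: travelled {dist:5.1f} m, energy {e:.3f}")
```

The real technology traces thousands of such paths per source and folds in material properties, but the bounce-and-attenuate idea is the same.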

Nvidia has built precise sound wave calculations into its VR Funhouse demo, which uses several thousand rays and calculates up to 12 reflections from objects. To see the advantages of the technology with a clear example, watch the accompanying video demonstration.

It is important that Nvidia's approach differs from traditional sound engines, including the hardware-accelerated approach of its main competitor, which uses a dedicated block in the GPU. All these methods provide only accurate positioning of sound sources but do not calculate the reflection of sound waves from objects in the 3D scene, at best approximating it with a reverb effect. Still, ray tracing can be much more realistic, since only this approach accurately simulates various sounds while taking into account the size, shape and materials of objects in the scene. It is hard to say whether a typical player needs this level of accuracy, but one thing is certain: in VR it can add the very realism that is still lacking in regular games.

Well, all we have left to talk about is VR SLI technology, which works in both OpenGL and DirectX. Its principle is extremely simple: a dual-processor video system in a VR application will work in such a way that each eye is allocated a separate GPU, in contrast to AFR rendering, which is common for SLI configurations. This significantly improves overall performance, which is so important for virtual reality systems. Theoretically, more GPUs can be used, but their number must be even.

This approach was required because AFR is not well suited for VR, since with its help the first GPU will draw an even frame for both eyes, and the second - an odd one, which does not reduce latency, which is critical for virtual reality systems. Although the frame rate will be quite high. So with VR SLI, the work on each frame is divided into two GPUs - one works on part of the frame for the left eye, the second - for the right, and then these halves of the frame are combined into a whole.

This division of work between a pair of GPUs results in nearly 2x performance gains, allowing for higher frame rates and lower latency than single-GPU systems. However, using VR SLI requires special support from the application to use this scaling method. But VR SLI technology is already built into such VR demo applications as Valve's The Lab and ILMxLAB's Trials on Tatooine, and this is just the beginning - Nvidia promises other applications will soon appear, as well as implementation of the technology in game engines Unreal Engine 4, Unity and MaxPlay.

Ansel gaming screenshot platform

One of the most interesting software announcements was the release of a technology for capturing high-quality screenshots in games, named Ansel after the famous photographer Ansel Adams. Games have long been not just games, but also a playground for all kinds of creative people. Some modify game scripts, some release high-quality texture packs, and some take beautiful screenshots.

Nvidia decided to help the latter by introducing a new platform for creating (and it really is creating, because this is not such a simple process) high-quality images from games. Nvidia believes Ansel can help create a new kind of contemporary art. There are already quite a few artists who spend most of their lives at a PC creating beautiful screenshots from games, and until now they have not had a convenient tool for this.

Ansel allows you to not only capture an image in a game, but change it as the creator needs. Using this technology, you can move the camera around the scene, rotate and tilt it in any direction in order to obtain the desired composition of the frame. For example, in games like first-person shooters, you can only move the player, you can’t really change anything else, so all the screenshots turn out to be quite monotonous. With a free camera in Ansel, you can go far beyond the limits of the game camera, choosing the angle that is needed for a successful picture, or even capture a full 360-degree stereo image from the desired point, and in high resolution for later viewing in a VR helmet.

Ansel works quite simply: using a special library from Nvidia, the platform is integrated into the game's code. The developer only needs to add a small piece of code to the project to allow the Nvidia video driver to intercept buffer and shader data. Very little work is involved; integrating Ansel into a game takes less than a day. Enabling the feature in The Witness took about 40 lines of code, and in The Witcher 3 about 150.

Ansel will come with an open source SDK. The main thing is that the user receives a standard set of settings with it, allowing him to change the position and angle of the camera, add effects, etc. The Ansel platform works like this: it pauses the game, turns on the free camera and allows you to change the frame to the desired view, recording the result in the form of a regular screenshot, a 360-degree image, a stereo pair, or simply a huge resolution panorama.

The only caveat is that not all games will support all the features of the Ansel game screenshot platform. Some game developers, for one reason or another, do not want to enable a completely free camera in their games - for example, because of the possibility of cheaters using this functionality. Or they want to limit the change in viewing angle for the same reason - so that no one gets an unfair advantage. Well, or so that users don’t see the poor sprites in the background. All these are completely normal desires of game creators.

One of the most interesting features of Ansel is the ability to create screenshots of simply enormous resolution. It does not matter that the game only supports resolutions up to 4K, say, while the user's monitor is Full HD. Using the screenshot platform you can capture a much higher-resolution image, limited mostly by drive capacity and performance. The platform can capture screenshots with a resolution of up to 4.5 gigapixels, stitching them together from 3600 tiles!

It is clear that in such pictures you can see all the details, right down to the text on newspapers lying in the distance, if such a level of detail is in principle provided for in the game - Ansel can also control the level of detail, setting the maximum level to get the best picture quality. But you can also enable supersampling. All this allows you to create images from games that you can safely print on large banners and be confident about their quality.

Interestingly, special hardware-accelerated CUDA code is used to stitch the large images. After all, no video card can render a multi-gigapixel image in one piece, but it can render it in tiles, which then simply need to be combined, taking into account possible differences in lighting, color and so on.
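A rough idea of the stitching step can be conveyed with a few lines of NumPy. This toy version simply places tiles into a large mosaic at their offsets, whereas the real CUDA-based code also blends out lighting and color differences between tiles; the tile size and grid here are arbitrary, far smaller than Ansel's 3600 pieces.

```python
import numpy as np

# Toy tile stitcher: paste a grid of rendered tiles into one large image.
# Real gigapixel stitching (as described above) is done piecewise on the GPU
# and additionally compensates for lighting/color mismatches between tiles.

TILE_W, TILE_H = 320, 200      # assumed tile size for the example
GRID_COLS, GRID_ROWS = 4, 3    # a 4x3 mosaic instead of Ansel's 3600 tiles

def stitch(tiles):
    """tiles[row][col] is a (TILE_H, TILE_W, 3) array; returns the mosaic."""
    mosaic = np.zeros((GRID_ROWS * TILE_H, GRID_COLS * TILE_W, 3), dtype=np.uint8)
    for r in range(GRID_ROWS):
        for c in range(GRID_COLS):
            mosaic[r * TILE_H:(r + 1) * TILE_H,
                   c * TILE_W:(c + 1) * TILE_W] = tiles[r][c]
    return mosaic

if __name__ == "__main__":
    # Fake "rendered" tiles: each tile filled with a distinct gray level.
    tiles = [[np.full((TILE_H, TILE_W, 3), 20 * (r * GRID_COLS + c), np.uint8)
              for c in range(GRID_COLS)] for r in range(GRID_ROWS)]
    print(stitch(tiles).shape)   # (600, 1280, 3)
    # For scale: 4.5 gigapixels split into 3600 tiles is ~1.25 megapixels per tile.
    print(4.5e9 / 3600)          # 1250000.0
```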

After stitching such panoramas, special post-processing, also GPU-accelerated, is applied to the whole frame. And to capture images with increased dynamic range, you can use a special image format, EXR, an open standard from Industrial Light and Magic, in which the color values are stored in 16-bit floating point (FP16) per channel.

This format allows you to change the brightness and dynamic range of the image by post-processing, bringing it to the desired level for each specific display, in the same way as is done with RAW formats from cameras. And for the subsequent use of post-processing filters in image processing programs, this format is very useful, since it contains much more data than conventional image formats.
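Why a 16-bit float per channel matters for post-processing can be shown with plain NumPy: 8-bit storage clips highlights irrecoverably, while FP16 keeps them available for a later exposure adjustment. This is only an illustration of the storage format's headroom, not of the EXR file API; the sample intensities are made up.

```python
import numpy as np

# Compare what survives an exposure pull: 8-bit integer storage clips
# everything above 1.0, while FP16 (as used per channel in EXR) keeps
# the highlight values and lets post-processing bring them back.

scene = np.array([0.05, 0.4, 1.0, 4.0, 16.0])   # linear HDR intensities

ldr = np.clip(scene, 0.0, 1.0)                  # what fits into the 8-bit range
stored_u8 = np.round(ldr * 255).astype(np.uint8)

stored_fp16 = scene.astype(np.float16)          # half-float keeps the range

exposure = 0.25                                 # darken by two stops in post
print("from 8-bit :", stored_u8.astype(np.float32) / 255 * exposure)
print("from FP16  :", stored_fp16.astype(np.float32) * exposure)
# The 8-bit path turns 4.0 and 16.0 into the same flat white (0.25 after
# darkening), while the FP16 path still distinguishes them (1.0 vs 4.0).
```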

But the Ansel platform itself contains a lot of post-processing filters, which is especially important because it has access not only to the final image, but also to all the buffers used by the game when rendering, which can be used for very interesting effects, like depth of field. Ansel has a special post-processing API for this, and any of the effects can be included in a game that supports this platform.

Ansel's post filters include: color curves, color space, transformation, desaturation, brightness/contrast, film grain, bloom, lens flare, anamorphic glare, distortion, heat haze, fisheye, chromatic aberration, tone mapping, lens dirt, light shafts, vignette, gamma correction, convolution, sharpening, edge detection, blur, sepia, denoise, FXAA and others.

As for the appearance of Ansel support in games, you will have to wait a bit until developers implement and test it. But Nvidia promises the imminent appearance of such support in such famous games as The Division, The Witness, Lawbreakers, The Witcher 3, Paragon, Fortnite, Obduction, No Man's Sky, Unreal Tournament and others.

The new 16 nm FinFET process and architecture optimizations allowed the GeForce GTX 1080, based on the GP104 graphics processor, to reach a high clock frequency of 1.6-1.7 GHz even in reference form, while the new generation of GPU Boost technology guarantees operation at the highest possible frequencies in games. Together with the increased number of execution units, these improvements made the new product not just the highest-performing single-chip video card of all time, but also the most energy-efficient solution on the market.

The GeForce GTX 1080 model became the first video card to carry a new type of graphics memory GDDR5X - a new generation of high-speed chips that made it possible to achieve very high data transfer rates. In the case of the GeForce GTX 1080 modification, this type of memory operates at an effective frequency of 10 GHz. Combined with improved information compression algorithms in the framebuffer, this has led to an increase in effective memory bandwidth for this graphics processor by 1.7 times compared to its direct predecessor, the GeForce GTX 980.
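The 1.7x figure can be roughly reconstructed from numbers mentioned in this article, assuming the GeForce GTX 980's standard 7 GHz effective GDDR5 on the same 256-bit bus (224 GB/s): the raw bandwidth grows by about 1.43x, and the improved delta compression adds roughly another 20 percent.

```python
# Rough reconstruction of the ~1.7x effective bandwidth claim (our arithmetic,
# assuming the GTX 980's standard 7 GHz effective GDDR5 on a 256-bit bus).

def bandwidth_gbs(effective_mhz, bus_bits):
    return effective_mhz * 1e6 * (bus_bits // 8) / 1e9   # GB/s

gtx980 = bandwidth_gbs(7000, 256)    # 224.0 GB/s
gtx1080 = bandwidth_gbs(10000, 256)  # 320.0 GB/s

raw_gain = gtx1080 / gtx980          # ~1.43x from GDDR5X alone
compression_gain = 1.2               # ~20% better delta compression (see later in the article)
print(round(raw_gain, 2), round(raw_gain * compression_gain, 2))  # 1.43 1.71
```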

Nvidia wisely decided not to release a radically new architecture on a completely new process technology, so as not to run into unnecessary problems in development and production. Instead, they seriously improved the already good and very efficient Maxwell architecture, adding new features. As a result, production of the new GPUs is going smoothly, and in the case of the GeForce GTX 1080 the engineers have achieved very high frequency potential: factory-overclocked partner versions are expected to reach GPU frequencies of up to 2 GHz! Such impressive frequencies became possible thanks to the refined process technology and the painstaking work of Nvidia engineers in developing the Pascal GPU.

And although Pascal is a direct successor to Maxwell and the two graphics architectures are not fundamentally very different, Nvidia has introduced many changes and improvements: to the display outputs, to the video encoding and decoding engine, to asynchronous execution of different types of calculations on the GPU, and to multi-chip rendering, along with the new Fast Sync synchronization method.

It is impossible not to highlight Simultaneous Multi-Projection technology, which helps improve performance in virtual reality systems, display scenes more correctly on multi-monitor setups, and enables new performance optimization techniques. VR applications will see the greatest speed-up when they support multi-projection, which roughly halves the GPU work spent on geometry and cuts per-pixel work by about a factor of 1.5.

Among the purely software changes, the platform for creating screenshots in games called Ansel stands out - it will be interesting to try it not only for those who play a lot, but also for those simply interested in high-quality 3D graphics. The new product allows you to advance the art of creating and retouching screenshots to a new level. Well, Nvidia simply continues to improve its packages for game developers, such as GameWorks and VRWorks, step by step - for example, the latter has an interesting feature for high-quality sound processing, taking into account numerous reflections of sound waves using hardware ray tracing.

In general, a real leader has entered the market in the form of the Nvidia GeForce GTX 1080, with all the necessary qualities: high performance and broad functionality, as well as support for new features and algorithms. The first buyers of this video card will be able to appreciate many of the mentioned advantages immediately, while other capabilities of the solution will be revealed a little later, once broad software support appears. The main thing is that the GeForce GTX 1080 turned out to be very fast and efficient, and we really hope that Nvidia engineers managed to fix some of the previous problem areas (asynchronous compute, for one).

Graphics accelerator GeForce GTX 1070

Parameter: Value
Chip code name: GP104
Production technology: 16 nm FinFET
Number of transistors: 7.2 billion
Core area: 314 mm²
Architecture: unified, with an array of common processors for stream processing of numerous types of data: vertices, pixels, etc.
DirectX hardware support: DirectX 12, with support for Feature Level 12_1
Memory bus: 256-bit, eight independent 32-bit memory controllers supporting GDDR5 and GDDR5X memory
GPU frequency: 1506 (1683) MHz
Computing blocks: 15 active (out of 20 in the chip) streaming multiprocessors, including 1920 (out of 2560) scalar ALUs for floating-point calculations within the IEEE 754-2008 standard
Texturing blocks: 120 active (out of 160 in the chip) texture addressing and filtering units with support for FP16 and FP32 components in textures and support for trilinear and anisotropic filtering for all texture formats
Raster Operation Blocks (ROPs): 8 wide ROP blocks (64 pixels) with support for various anti-aliasing modes, including programmable, and with FP16 or FP32 frame buffer formats; the blocks consist of an array of configurable ALUs and are responsible for depth generation and comparison, multisampling and blending
Monitor support: integrated support for up to four monitors connected via Dual Link DVI, HDMI 2.0b and DisplayPort 1.2 (1.3/1.4 Ready) interfaces

Specifications of the reference video card GeForce GTX 1070
Parameter: Value
Core frequency: 1506 (1683) MHz
Number of universal processors: 1920
Number of texture blocks: 120
Number of blending blocks: 64
Effective memory frequency: 8000 (4×2000) MHz
Memory type: GDDR5
Memory bus: 256-bit
Memory: 8 GB
Memory bandwidth: 256 GB/s
Compute performance (FP32): about 6.5 teraflops
Theoretical maximum fill rate: 96 gigapixels/s
Theoretical texture sampling rate: 181 gigatexels/s
Bus: PCI Express 3.0
Connectors: one Dual Link DVI, one HDMI and three DisplayPorts
Power consumption: up to 150 W
Additional power: one 8-pin connector
Number of slots occupied in the system case: 2
Recommended price: $379-449 (USA), 34,990 rubles (Russia)
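As a sanity check, the theoretical figures in the table above follow directly from the unit counts and clocks. The short script below reproduces them; it assumes the boost clock was used for FP32 throughput and the base clock for the fill and texture rates, since that matches the listed values most closely.

```python
# Recomputing the GeForce GTX 1070 reference numbers from the table above.
# FP32 assumes 2 operations per ALU per clock (fused multiply-add).

alus, tmus, rops = 1920, 120, 64
base_mhz, boost_mhz = 1506, 1683
mem_effective_mhz, bus_bits = 8000, 256

fp32_tflops = alus * 2 * boost_mhz * 1e6 / 1e12
fill_rate_gpix = rops * base_mhz * 1e6 / 1e9
texel_rate_gtex = tmus * base_mhz * 1e6 / 1e9
bandwidth_gbs = mem_effective_mhz * 1e6 * (bus_bits // 8) / 1e9

print(f"FP32: {fp32_tflops:.2f} TFLOPS")           # ~6.46, listed as "about 6.5"
print(f"Fill rate: {fill_rate_gpix:.1f} Gpix/s")   # ~96.4, listed as 96
print(f"Texel rate: {texel_rate_gtex:.1f} Gtex/s") # ~180.7, listed as 181
print(f"Bandwidth: {bandwidth_gbs:.0f} GB/s")      # 256, listed as 256
```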

The GeForce GTX 1070 also received a logical name, similar to the corresponding solution in the previous GeForce series: it differs from its direct predecessor, the GeForce GTX 970, only in the generation digit. The new product sits one step below the current top solution in the company's line, the GeForce GTX 1080, which has become the temporary flagship of the new series until the release of solutions based on even more powerful GPUs.

Recommended prices for Nvidia's new video card are $379 and $449 for regular versions from Nvidia's partners and for the special Founders Edition, respectively. Compared to the top model this is a very good price, considering that the GTX 1070 trails it by about 25% in the worst case. At the time of announcement and release the GTX 1070 is the best-performing solution in its class. Like the GeForce GTX 1080, the GTX 1070 has no direct competitors from AMD and can only be compared with the Radeon R9 390X and Fury.

For the GeForce GTX 1070 modification of the GP104 graphics processor, Nvidia decided to keep the full 256-bit memory bus, although it uses not the new GDDR5X type but very fast GDDR5 running at a high effective frequency of 8 GHz. The amount of memory installed on a card with such a bus can be 4 or 8 GB, and to ensure maximum performance at high settings and rendering resolutions, the GeForce GTX 1070 was equipped with 8 GB of video memory, like its older sibling. This volume is enough to run any 3D application at maximum quality settings for several years.

Special Edition GeForce GTX 1070 Founders Edition

When the GeForce GTX 1080 was announced in early May, Nvidia also introduced a special edition of the video card called Founders Edition, sold at a higher price than regular cards from the company's partners. The same applies to the new product, and in this article we will again talk about the special Founders Edition of the GeForce GTX 1070. As with the older model, Nvidia decided to release this version of the manufacturer's reference card at a higher price. They argue that many gamers and enthusiasts who buy expensive top-end graphics cards want a product with an appropriately "premium" look and feel.

Accordingly, it is for such users that the GeForce GTX 1070 Founders Edition is being brought to market: it is designed and built by Nvidia engineers from premium materials and components, including an aluminum cover and a low-profile back plate that covers the reverse side of the printed circuit board and is quite popular among enthusiasts.

As you can see from the photographs of the board, the GeForce GTX 1070 Founders Edition inherited exactly the same industrial design inherent in the reference GeForce GTX 1080 Founders Edition. Both models use a radial fan that exhausts heated air outside, which is very useful in both small cases and multi-chip SLI configurations with limited physical space. Blowing heated air outside instead of circulating it inside the case reduces thermal stress, improves overclocking results and extends the life of system components.

Under the cover of the GeForce GTX 1070 reference cooling system is a specially shaped aluminum heatsink with three built-in copper heat pipes that remove heat from the GPU itself; the heat carried away by the pipes is then dissipated by the heatsink's fins. The low-profile metal plate on the back of the board is also designed to improve thermal behavior, and it has a removable section for better airflow between multiple video cards in SLI configurations.

As for the board's power system, the GeForce GTX 1070 Founders Edition has a four-phase power system optimized for a stable energy supply. Nvidia claims that the use of special components in the GTX 1070 Founders Edition has improved power efficiency, stability and reliability compared to the GeForce GTX 970, providing better overclocking performance. In the company's own tests, GeForce GTX 1070 GPUs easily exceeded 1.9 GHz, which is close to the results of the older GTX 1080 model.

The Nvidia GeForce GTX 1070 will be available in retail stores starting June 10. The recommended prices for the GeForce GTX 1070 Founders Edition and partner solutions differ, and this is the key question for this special edition. If Nvidia's partners sell their GeForce GTX 1070 cards starting at $379 (in the US market), the Founders Edition of Nvidia's reference design will cost $449. Are there many enthusiasts willing to pay extra for the, frankly, dubious advantages of the reference version? Time will tell, but we think the reference board is more interesting as an option available at the very start of sales; later on, the point of buying it (and at a higher price, no less!) drops to almost nothing.

It remains to add that the printed circuit board of the reference GeForce GTX 1070 is similar to that of the older video card, and both differ from the design of the company's previous boards. The typical power consumption of the new product is 150 W, which is almost 20% less than that of the GTX 1080 and close to the power consumption of the previous-generation GeForce GTX 970. The Nvidia reference board has the familiar set of connectors for image output devices: one Dual-Link DVI, one HDMI and three DisplayPorts. There is also support for new versions of HDMI and DisplayPort, which we wrote about above in the GTX 1080 review.

Architectural changes

The GeForce GTX 1070 video card is based on the GP104 chip, the first-born of the new generation of Nvidia's Pascal graphics architecture. This architecture is based on solutions developed at Maxwell, but it also has some functional differences, which we wrote about in detail above - in the part dedicated to the top-end GeForce GTX 1080 video card.

The main change in the new architecture was the technological process by which all new GPUs will be made. The use of the 16 nm FinFET process in the production of GP104 made it possible to significantly increase the complexity of the chip while maintaining a relatively low area and cost, and the first Pascal architecture chip has a noticeably larger number of execution units, including those providing new functionality, compared to Maxwell chips of similar positioning.

The design of the GP104 video chip is similar to that of comparable Maxwell architecture solutions, and you can find detailed information about the design of modern GPUs in our reviews of previous Nvidia solutions. Like previous GPUs, chips of the new architecture come in different configurations of Graphics Processing Clusters (GPC), Streaming Multiprocessors (SM) and memory controllers, and the GeForce GTX 1070 has already undergone some changes: part of the chip is disabled and inactive (highlighted in grey in the block diagram):

Although the GP104 GPU includes four GPC clusters and 20 SM multiprocessors, in the version for the GeForce GTX 1070 it received a stripped-down modification with one GPC cluster disabled by hardware. Since each GPC cluster has a dedicated rasterization engine and includes five SM multiprocessors, and each multiprocessor consists of 128 CUDA cores and eight TMUs, this version of GP104 has 1920 CUDA cores and 120 TMUs active out of 2560 stream processors and 160 physically available texture blocks.

The GPU at the heart of the GeForce GTX 1070 contains eight 32-bit memory controllers, giving a total 256-bit memory bus, exactly the same as in the older GTX 1080. The memory subsystem was not cut down in order to keep memory bandwidth sufficiently high, given that the GeForce GTX 1070 uses GDDR5 memory. Each memory controller is associated with eight ROP blocks and 256 KB of second-level cache, so the GP104 chip in this modification also contains 64 ROP blocks and 2048 KB of L2 cache.

Thanks to architectural optimizations and a new process technology, the GP104 GPU is the most power-efficient GPU to date. Nvidia engineers were able to increase the clock speed more than they expected when switching to a new process technology, for which they had to work hard to carefully check and optimize all the bottlenecks of previous solutions that did not allow them to work at higher frequencies. Accordingly, the GeForce GTX 1070 also operates at a very high frequency, more than 40% higher than the reference value for the GeForce GTX 970.

Since the GeForce GTX 1070 is, in essence, just a slightly less powerful GTX 1080 with GDDR5 memory, it supports absolutely all the technologies we described in the previous section. For more details about the Pascal architecture and the technologies it supports, such as the improved video output and processing units, Async Compute support, Simultaneous Multi-Projection technology, the changes to SLI multi-chip rendering and the new Fast Sync method, refer to the GTX 1080 section above.

High-performance GDDR5 memory and its efficient use

We wrote above about the changes in the memory subsystem of the GP104 graphics processor, on which the GeForce GTX 1080 and GTX 1070 are based: the memory controllers in this GPU support both the new GDDR5X video memory type, described in detail in the GTX 1080 review, and the good old GDDR5 memory, familiar to us for several years now.

In order not to lose too much memory bandwidth in the younger GTX 1070 compared to the older GTX 1080, Nvidia left all eight 32-bit memory controllers active, giving it the full 256-bit video memory interface. In addition, the card was equipped with the fastest GDDR5 memory available on the market, with an effective operating frequency of 8 GHz. This provides a memory bandwidth of 256 GB/s, versus 320 GB/s for the older solution; the computing capabilities were reduced by roughly the same proportion, so the balance was maintained.

Don't forget that while peak theoretical throughput is important for GPU performance, you also need to pay attention to how efficiently it's being used. During the rendering process, many different bottlenecks can limit overall performance, preventing all available bandwidth from being used. To minimize these bottlenecks, GPUs use special lossless compression to improve the efficiency of data read and write operations.

The Pascal architecture has already introduced the fourth generation of delta compression of buffer information, allowing the GPU to more efficiently use the available capabilities of the video memory bus. The memory subsystem in the GeForce GTX 1070 and GTX 1080 uses improved old and several new lossless data compression techniques designed to reduce bandwidth requirements. This reduces the amount of data written to memory, improves L2 cache efficiency, and reduces the amount of data sent between different points on the GPU, such as the TMU and framebuffer.
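The idea of lossless delta compression can be illustrated with a simplified model: if the pixels of a tile differ little from a reference value, the differences fit into far fewer bits than the full values, and tiles that do not compress are simply stored raw. This sketch is only a conceptual toy, not Nvidia's actual hardware scheme.

```python
import numpy as np

# Simplified model of delta color compression: store one anchor value per tile
# plus per-pixel deltas. If all deltas fit into a few bits, the tile shrinks;
# otherwise it is kept uncompressed, so the scheme stays lossless.
# (Conceptual toy only, not Nvidia's actual hardware algorithm.)

def compress_tile(tile, delta_bits=4):
    anchor = int(tile.flat[0])
    deltas = tile.astype(np.int32) - anchor
    limit = 2 ** (delta_bits - 1)
    if np.all((deltas >= -limit) & (deltas < limit)):
        compressed_bits = 8 + deltas.size * delta_bits   # anchor + small deltas
        return compressed_bits, True
    return tile.size * 8, False                          # stored raw

if __name__ == "__main__":
    smooth_sky = np.full((8, 8), 200, dtype=np.uint8) + np.arange(64).reshape(8, 8) % 3
    noisy_grass = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
    for name, tile in [("smooth tile", smooth_sky), ("noisy tile", noisy_grass)]:
        bits, ok = compress_tile(tile)
        print(f"{name}: {bits} bits ({'compressed' if ok else 'raw'}), "
              f"ratio {tile.size * 8 / bits:.1f}:1")
```

Smooth regions such as sky compress well, while noisy textures fall back to raw storage, which is exactly why the savings depend on scene content.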

GPU Boost 3.0 and overclocking features

Most Nvidia partners have already announced factory overclocked solutions based on the GeForce GTX 1080 and GTX 1070. And many video card manufacturers are also creating special overclocking utilities that allow you to use the new functionality of GPU Boost 3.0 technology. One example of such utilities is EVGA Precision XOC, which includes an automatic scanner to determine the voltage-frequency curve - in this mode, for each voltage value, by running a stability test, a stable frequency is found at which the GPU provides increased performance. However, this curve can be changed manually.

We know GPU Boost technology well from previous Nvidia video cards. In their GPUs, they use this hardware feature, designed to increase the operating clock speed of the GPU in modes when it has not yet reached the limits of power consumption and heat dissipation. In Pascal GPUs, this algorithm has undergone several changes, the main of which has been a finer setting of turbo frequencies, depending on voltage.

If previously the difference between the base frequency and the turbo frequency was fixed, then in GPU Boost 3.0 it became possible to set turbo frequency offsets for each voltage separately. Now the turbo frequency can be set for each of the individual voltage values, which allows you to fully squeeze out all the overclocking capabilities from the GPU. We wrote about this feature in detail in our GeForce GTX 1080 review, and you can use EVGA Precision XOC and MSI Afterburner utilities to do this.
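The difference between the two approaches boils down to how the offset is applied to the voltage-frequency curve. Below is a minimal sketch with made-up voltage and frequency values, purely to show the offset logic.

```python
# Made-up voltage/frequency points illustrating the offset logic only.
vf_curve = [(800, 1400), (900, 1550), (1000, 1700), (1062, 1800)]  # (mV, MHz)

def boost2_style(curve, offset_mhz):
    """GPU Boost 2.0-like: one fixed offset applied to every point."""
    return [(mv, mhz + offset_mhz) for mv, mhz in curve]

def boost3_style(curve, offsets_mhz):
    """GPU Boost 3.0-like: an individual offset per voltage point."""
    return [(mv, mhz + off) for (mv, mhz), off in zip(curve, offsets_mhz)]

print(boost2_style(vf_curve, 100))
# Each voltage point gets its own headroom, e.g. more at low voltages:
print(boost3_style(vf_curve, [150, 120, 90, 50]))
```

The per-point variant is what the automatic scanner in utilities like EVGA Precision XOC effectively builds for a specific chip.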

Since some details have changed in the overclocking methodology with the release of video cards supporting GPU Boost 3.0, Nvidia had to make additional explanations in the instructions for overclocking new products. There are different overclocking techniques with different variables that affect the final result. A particular method may be better suited for each specific system, but the basics are always roughly the same.

Many overclockers use the Unigine Heaven 4.0 benchmark to check system stability: it loads the GPU well, has flexible settings and can be run in windowed mode alongside an overclocking and monitoring utility such as EVGA Precision or MSI Afterburner. However, such a check is only enough for initial estimates; to firmly confirm the stability of an overclock, it must be checked in several games, because different games load different functional blocks of the GPU: math, texture, geometry. The Heaven 4.0 benchmark is also convenient for overclocking because it has a looped mode, in which it is easy to change overclocking settings, and a built-in benchmark for assessing the speed gain.

Nvidia recommends running Heaven 4.0 and EVGA Precision XOC together when overclocking the new GeForce GTX 1080 and GTX 1070. First, it is advisable to immediately increase the fan speed. For serious overclocking you can set the fan speed straight to 100%, which makes the video card very loud but cools the GPU and the other components as much as possible, keeping the temperature as low as possible and preventing throttling (a drop in frequency when the GPU temperature rises above a certain value).

Next, you need to set the Power Target to maximum as well. This setting will give the GPU the maximum amount of power, raising the power consumption limit and the GPU Temp Target. For some purposes the second value can be unlinked from the Power Target, and these settings can then be adjusted individually, for example to keep the video chip cooler.

The next step is to increase the GPU Clock Offset, which determines how much higher the turbo frequency will be during operation. This value raises the frequency at every voltage point and results in better performance. As always when overclocking, you need to verify stability while increasing the GPU frequency in small steps of 10 to 50 MHz, until you notice stuttering, driver or application errors, or even visual artifacts. When that limit is reached, reduce the frequency by one step and once again verify stability and performance.
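The step-by-step search described above maps naturally onto a small loop. In this sketch, apply_offset and run_stability_test are hypothetical placeholders standing in for whatever the overclocking utility (for example EVGA Precision XOC) and the chosen stress test provide; they are not real APIs.

```python
def find_stable_offset(apply_offset, run_stability_test,
                       step_mhz=25, max_offset_mhz=300):
    """Raise the clock offset step by step and back off one step on failure.

    apply_offset(mhz) and run_stability_test() -> bool are placeholders for
    the overclocking utility and a stress test such as Heaven 4.0 plus a few
    real games; they are not real library calls.
    """
    stable = 0
    for offset in range(step_mhz, max_offset_mhz + 1, step_mhz):
        apply_offset(offset)
        if not run_stability_test():     # artifacts, crashes, driver errors...
            break                        # too far: keep the last good value
        stable = offset
    apply_offset(stable)
    return stable

if __name__ == "__main__":
    # Dummy stand-ins: pretend the card is stable up to +175 MHz.
    current = {"offset": 0}
    apply = lambda mhz: current.update(offset=mhz)
    test = lambda: current["offset"] <= 175
    print(find_stable_offset(apply, test))   # 175
```

The same loop with a larger step_mhz (50-100 MHz) would suit the memory clock search described next.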

In addition to the GPU frequency, you can also increase the video memory frequency (Memory Clock Offset), which is especially important in the case of the GeForce GTX 1070, equipped with GDDR5 memory, which usually overclocks well. The process in the case of memory operating frequency exactly repeats what is done when finding a stable GPU frequency, the only difference is that the steps can be made larger - adding 50-100 MHz at once to the base frequency.

In addition to the steps described above, you can also raise the voltage limit (Overvoltage), because higher GPU frequencies are often achieved at higher voltages, when marginal parts of the GPU get the extra power they need to run stably. The potential downside of increasing this value is the possibility of damaging the video chip and accelerating its failure, so raise the voltage with extreme caution.

Overclocking enthusiasts use slightly different techniques, changing parameters in different orders. For example, some overclockers run separate experiments to find stable GPU and memory frequencies so that the two do not interfere with each other, and only then test the combined overclock of both the video chip and the memory chips, but these are minor details of an individual approach.

Judging by opinions in forums and article comments, some users did not like the new behavior of GPU Boost 3.0, where the GPU frequency first rises very high, often above the turbo frequency, but then, under the influence of rising GPU temperature or power consumption above the set limit, can drop to noticeably lower values. This is simply how the updated algorithm works; you need to get used to the dynamically changing GPU frequency, but it has no negative consequences.

The GeForce GTX 1070 video card became the second model after the GTX 1080 in Nvidia's new line based on Pascal family of graphics processors. The new 16 nm FinFET process technology and architecture optimizations allowed the presented video card to achieve high clock speeds, which is also helped by the new generation of GPU Boost technology. Even despite the reduced number of functional units in the form of stream processors and texture modules, their number remains sufficient for the GTX 1070 to become the most profitable and energy-efficient solution.

Installing GDDR5 memory on the younger of the pair of released Nvidia video card models on the GP104 chip, in contrast to the new GDDR5X type that distinguishes the GTX 1080, does not prevent it from achieving high performance indicators. Firstly, Nvidia decided not to cut the memory bus of the GeForce GTX 1070 model, and secondly, it installed the fastest GDDR5 memory with an effective frequency of 8 GHz, which is only slightly lower than the 10 GHz of the GDDR5X used in the older model. Taking into account the improved delta compression algorithms, the effective memory bandwidth of the GPU has become higher than that of the similar model of the previous generation GeForce GTX 970.

The GeForce GTX 1070 is good because it offers very high performance and support for new features and algorithms at a significantly lower price than the older model announced a little earlier. If only a few enthusiasts can afford to buy a GTX 1080 for 55,000 rubles, then a much wider circle of potential buyers can pay 35,000 rubles for a solution that is only about a quarter slower, with exactly the same capabilities. It is the combination of relatively low price and high performance that made the GeForce GTX 1070 perhaps the most profitable purchase at the time of its release.

Graphics accelerator GeForce GTX 1060

Parameter: Value
Chip code name: GP106
Production technology: 16 nm FinFET
Number of transistors: 4.4 billion
Core area: 200 mm²
Architecture: unified, with an array of common processors for stream processing of numerous types of data: vertices, pixels, etc.
DirectX hardware support: DirectX 12, with support for Feature Level 12_1
Memory bus: 192-bit, six independent 32-bit memory controllers supporting GDDR5 memory
GPU frequency: 1506 (1708) MHz
Computing blocks: 10 streaming multiprocessors, including 1280 scalar ALUs for floating-point calculations within the IEEE 754-2008 standard
Texturing blocks: 80 texture addressing and filtering units with support for FP16 and FP32 components in textures and support for trilinear and anisotropic filtering for all texture formats
Raster Operation Blocks (ROPs): 6 wide ROP blocks (48 pixels) with support for various anti-aliasing modes, including programmable, and with FP16 or FP32 frame buffer formats; the blocks consist of an array of configurable ALUs and are responsible for depth generation and comparison, multisampling and blending
Monitor support: integrated support for up to four monitors connected via Dual Link DVI, HDMI 2.0b and DisplayPort 1.2 (1.3/1.4 Ready) interfaces

GeForce GTX 1060 reference graphics card specifications
Parameter: Value
Core frequency: 1506 (1708) MHz
Number of universal processors: 1280
Number of texture blocks: 80
Number of blending blocks: 48
Effective memory frequency: 8000 (4×2000) MHz
Memory type: GDDR5
Memory bus: 192-bit
Memory: 6 GB
Memory bandwidth: 192 GB/s
Compute performance (FP32): about 4 teraflops
Theoretical maximum fill rate: 72 gigapixels/s
Theoretical texture sampling rate: 121 gigatexels/s
Bus: PCI Express 3.0
Connectors: one Dual Link DVI, one HDMI and three DisplayPorts
Typical power consumption: 120 W
Additional power: one 6-pin connector
Number of slots occupied in the system case: 2
Recommended price: $249 ($299) in the US and 18,990 rubles in Russia

The GeForce GTX 1060 video card also received a name similar to the same solution from the previous GeForce series, differing from the name of its direct predecessor GeForce GTX 960 only in the changed first digit of the generation. The new product in the company's current line is one step lower than the previously released GeForce GTX 1070 solution, which is average in speed in the new series.

Recommended prices for Nvidia's new video card are $249 and $299 for the regular versions from the company's partners and for the special Founders Edition, respectively. Compared to the two older models this is a very favorable price: the new GTX 1060, although inferior to the top-end boards, is also much cheaper than they are. At the time of its announcement, the new product was clearly the best-performing solution in its class and one of the most attractive offers in this price range.

This Pascal family video card came out to counter the fresh solution from the competing company AMD, which launched the Radeon RX 480 a little earlier. You can compare the new Nvidia product with this video card, although not entirely head to head, since they still differ quite noticeably in price. The GeForce GTX 1060 is more expensive ($249-299 versus $199-229), but it is also clearly faster than its competitor.

The GP106 graphics processor has a 192-bit memory bus, so the amount of memory installed on a video card with such a bus can be 3 or 6 GB. A smaller value in modern conditions is frankly not enough, and many game projects, even in Full HD resolution, will run into a lack of video memory, which will seriously affect the smoothness of rendering. To ensure maximum performance of the new solution in high settings, the GeForce GTX 1060 video card model was equipped with 6 GB of video memory, which is enough to run any 3D applications with any quality settings. Moreover, today there is simply no difference between 6 and 8 GB, and such a solution will save some money.

The typical power consumption for the new product is 120 W, which is 20% less than the value for the GTX 1070 and equal to the power consumption of the previous generation GeForce GTX 960 video card, which has much lower performance and capabilities. The reference board has the usual set of connectors for connecting image output devices: one Dual-Link DVI, one HDMI and three DisplayPort. Moreover, there is support for new versions of HDMI and DisplayPort, which we wrote about in the review of the GTX 1080 model.

The length of the GeForce GTX 1060 reference board is 9.8 inches (25 cm), and among the differences from older versions, we separately note that the GeForce GTX 1060 does not support the SLI multi-chip rendering configuration, and does not have a special connector for this. Since the board consumes less energy than older models, one 6-pin PCI-E external power connector was installed on the board for additional power.

GeForce GTX 1060 video cards have been available since the day of the announcement in the form of products from the company's partners: Asus, EVGA, Gainward, Gigabyte, Innovision 3D, MSI, Palit and Zotac. A special GeForce GTX 1060 Founders Edition, produced by Nvidia itself, will also be released in limited quantities; it will be sold at $299 exclusively on the Nvidia website and will not be officially offered in Russia. The Founders Edition features high-quality materials and components, including an aluminum chassis, an efficient cooling system, low-impedance power circuits and custom-designed voltage regulators.

Architectural changes

The GeForce GTX 1060 video card is based on a completely new graphics processor model GP106, which is functionally no different from the firstborn of the Pascal architecture in the form of the GP104 chip, on which the GeForce GTX 1080 and GTX 1070 models described above are based. This architecture is based on solutions developed back in Maxwell, but it also has some functional differences, which we wrote about in detail earlier.

The GP106 video chip is similar in design to the top-end Pascal chip and similar Maxwell architecture solutions, and detailed information about the design of modern GPUs can be found in our reviews of previous Nvidia solutions. Like previous GPUs, the new architecture chips have different configurations of Graphics Processing Cluster (GPC) computing clusters, Streaming Multiprocessors (SM) and memory controllers:

The GP106 graphics processor includes two GPC clusters containing a total of 10 streaming multiprocessors (Streaming Multiprocessor, SM), that is, exactly half of what is available in GP104. As in the older GPU, each multiprocessor contains 128 computational cores, 8 TMU texture units, 256 KB of register memory, 96 KB of shared memory and 48 KB of first-level cache. As a result, the GeForce GTX 1060 contains a total of 1280 processing cores and 80 texture units, half as many as the GTX 1080.

But the memory subsystem of the GeForce GTX 1060 was not halved compared to the top solution; it contains six 32-bit memory controllers, giving a final 192-bit memory bus. With an effective frequency of GDDR5 video memory for the GeForce GTX 1060 equal to 8 GHz, the bandwidth reaches 192 GB/s, which is quite good for a solution in this price segment, especially considering the high efficiency of its use in Pascal. Each memory controller has eight ROP blocks and 256 KB of L2 cache associated with it, so in total the full version of the GP106 GPU contains 48 ROP blocks and 1536 KB of L2 cache.
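Again, the GP106 memory subsystem numbers follow directly from the configuration quoted in this section; a few lines of arithmetic confirm them.

```python
# GeForce GTX 1060 / GP106 memory subsystem, derived from the quoted figures.
controllers = 6
bus_bits = controllers * 32                      # 192-bit bus
bandwidth_gbs = 8000e6 * (bus_bits // 8) / 1e9   # 8 GHz effective GDDR5
rops = controllers * 8                           # 48 ROP blocks
l2_kb = controllers * 256                        # 1536 KB of L2 cache

print(bus_bits, round(bandwidth_gbs), rops, l2_kb)   # 192 192 48 1536
```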

To reduce memory bandwidth requirements and make more efficient use of the Pascal architecture, on-chip lossless compression has been further improved, capable of compressing data in buffers for gains in efficiency and performance. In particular, new delta compression methods with a ratio of 4:1 and 8:1 were added to the chips of the new family, providing an additional 20% in bandwidth efficiency compared to previous solutions of the Maxwell family.

The base frequency of the new GPU is 1506 MHz - the frequency should not fall below this mark in principle. The typical turbo frequency (Boost Clock) is much higher and is equal to 1708 MHz - this is the average value of the real frequency at which the GeForce GTX 1060 graphics chip operates in a wide range of games and 3D applications. The actual Boost frequency depends on the game and the testing conditions.

Like the rest of the Pascal family, the GeForce GTX 1060 not only operates at a high clock speed, providing high performance, but also has a decent overclocking headroom. The first experiments indicate the possibility of achieving frequencies of about 2 GHz. It is not surprising that the company’s partners are also preparing factory overclocked versions of the GTX 1060 video card.

So, the main change in the new architecture was the 16 nm FinFET technological process, the use of which in the production of GP106 made it possible to significantly increase the complexity of the chip while maintaining a relatively low area of 200 mm²; therefore, this Pascal architecture chip has a noticeably larger number of execution units compared to a Maxwell chip of similar positioning produced using the 28 nm process technology.

If the GM206 (GTX 960), with an area of 227 mm², contained 3 billion transistors, 1024 ALUs, 64 TMUs and 32 ROPs on a 128-bit bus, then the new GPU packs 4.4 billion transistors, 1280 ALUs, 80 TMUs and 48 ROPs with a 192-bit bus into 200 mm². Moreover, it runs at roughly a third higher clock speed: 1506 (1708) versus 1126 (1178) MHz. And all this at the same 120 W power consumption! As a result, the GP106 GPU has become one of the most energy-efficient GPUs, alongside the GP104.

New Nvidia technologies

One of the company's most interesting technologies, supported by the GeForce GTX 1060 and the other Pascal family solutions, is Nvidia Simultaneous Multi-Projection. We already wrote about this technology in our GeForce GTX 1080 review; it enables several new techniques for optimizing rendering, in particular projecting a VR image for both eyes simultaneously, greatly increasing the efficiency of GPU usage in virtual reality.

To support SMP, all Pascal family GPUs have a special engine located in the PolyMorph Engine at the end of the geometry pipeline, before the rasterization unit. With its help, the GPU can simultaneously project a geometric primitive onto several projections from a single point, and these projections can be stereo (up to 16 projections per viewpoint, or 32 in total across two stereo viewpoints). This capability allows Pascal GPUs to accurately reproduce curved surfaces for VR rendering and to display correctly on multi-monitor systems.
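Conceptually, the gain comes from submitting geometry once and projecting it with several view-projection matrices in a single pass. The NumPy sketch below shows the "one primitive, many projections" idea with made-up matrices; it is in no way a model of the PolyMorph Engine hardware.

```python
import numpy as np

# One triangle, several projections: the geometry is processed once and then
# projected with each view-projection matrix (made-up yaw angles standing in
# for, say, angled surround monitors or the two halves of a VR headset).

triangle = np.array([[0.0, 0.0, -5.0, 1.0],
                     [1.0, 0.0, -5.0, 1.0],
                     [0.0, 1.0, -5.0, 1.0]])          # homogeneous coordinates

def simple_projection(yaw_degrees, focal=1.5):
    yaw = np.radians(yaw_degrees)
    rot = np.array([[np.cos(yaw), 0, np.sin(yaw), 0],
                    [0, 1, 0, 0],
                    [-np.sin(yaw), 0, np.cos(yaw), 0],
                    [0, 0, 0, 1]])
    proj = np.array([[focal, 0, 0, 0],
                     [0, focal, 0, 0],
                     [0, 0, 1, 0],
                     [0, 0, -1, 0]])                  # simple perspective
    return proj @ rot

# Up to 16 projections per viewpoint are supported; three are enough here.
for yaw in (-20, 0, 20):
    clip = (simple_projection(yaw) @ triangle.T).T
    ndc = clip[:, :2] / clip[:, 3:4]                  # perspective divide
    print(f"yaw {yaw:+d} deg:", np.round(ndc, 2).tolist())
```

On real hardware the triangle is fed through the geometry pipeline once, and the multi-projection engine fans it out to all configured viewports, which is where the resource savings come from.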

It is important that Simultaneous Multi-Projection technology is already being integrated into popular game engines (Unreal Engine and Unity) and games; to date, support has been announced for more than 30 games in development, including such well-known projects as Unreal Tournament, Poolnation VR, Everest VR, Obduction, Adr1ft and Raw Data. Interestingly, although Unreal Tournament is not a VR game, it uses SMP to achieve higher image quality and improve performance.

Another long-awaited technology is Nvidia Ansel, a powerful tool for creating screenshots in games. It allows you to create unusual and very high-quality screenshots with previously unavailable features, saving them at very high resolution, complementing them with various effects, and sharing your creations. Ansel lets you literally construct a screenshot the way an artist wants: place a camera with any parameters anywhere in the scene, apply powerful post-filters to the image, or even take a 360-degree shot for viewing in a virtual reality headset.

Nvidia has standardized the integration of Ansel into games, which comes down to adding a few lines of code. There is no need to wait for the feature to appear in games: you can try Ansel right now in Mirror's Edge: Catalyst, and a little later it will become available in The Witcher 3: Wild Hunt. In addition, many game projects with Ansel support are in development, including Fortnite, Paragon, Unreal Tournament, Obduction, The Witness, Lawbreakers, Tom Clancy's The Division, No Man's Sky and others.

The new GeForce GTX 1060 also supports the Nvidia VRWorks toolkit, which helps developers create impressive virtual reality projects. This package includes many utilities and tools for developers, including VRWorks Audio, which performs very accurate calculations of sound wave reflections from objects in the scene using GPU ray tracing. The package also includes VR-oriented PhysX integration to ensure physically correct behavior of objects in the scene.

One of the most exciting VR games to benefit from VRWorks is VR Funhouse, Nvidia's own virtual reality game, which is available for free on Valve's Steam service. This game is based on the Unreal Engine 4 (Epic Games), and it runs on GeForce GTX 1080, 1070 and 1060 video cards in conjunction with HTC Vive VR headsets. Moreover, the source code of this game will be publicly available, which will allow other developers to use ready-made ideas and code in their VR attractions. Take our word for it, this is one of the most impressive demonstrations of the power of virtual reality.

Thanks also to SMP and VRWorks, the GeForce GTX 1060 graphics processor delivers performance quite sufficient for entry-level virtual reality, and the GPU in question meets the minimum required hardware level, including for SteamVR, making it one of the most attractive purchases for systems with official VR support.

Since the GeForce GTX 1060 model is based on the GP106 chip, which is in no way inferior in capabilities to the GP104 graphics processor, which became the basis for older modifications, it supports absolutely all the technologies we described above.

The GeForce GTX 1060 video card has become the third model in Nvidia's new line, based on Pascal family of graphics processors. The new 16 nm FinFET technological process and architecture optimizations allowed all new video cards to achieve high clock speeds and accommodate a greater number of functional units in the GPU in the form of stream processors, texture modules and others, compared to previous generation video chips. That is why the GTX 1060 model has become the most profitable and energy-efficient solution in its class and in general.

It is especially important that the GeForce GTX 1060 offers fairly high performance and support for new features and algorithms at a significantly lower price compared to older GP104 solutions. The new model's GP106 graphics chip delivers class-leading performance and power efficiency. The GeForce GTX 1060 model is specially designed and is perfect for all modern games at high and maximum graphics settings at a resolution of 1920x1080 and even with full-screen anti-aliasing enabled using various methods (FXAA, MFAA or MSAA).

And for those who want even higher performance or ultra-high-resolution displays, Nvidia has the top-end GeForce GTX 1070 and GTX 1080, which are also very good in terms of performance and energy efficiency. Still, the combination of low price and sufficient performance sets the GeForce GTX 1060 apart from the older solutions. Compared to the competing Radeon RX 480, Nvidia's solution is slightly faster with a less complex and smaller GPU, and has significantly better power efficiency. True, it sells for a bit more, so each video card has its own niche.