General characteristics of 64-bit processors

The advantages of 64-bit processors over their 32-bit counterparts are a larger address space, wider registers, and a larger number of general-purpose registers.

The extended 64-bit address space theoretically allows the processor to work with 16 exabytes (2^64 bytes) of physical memory in a flat memory model. And although modern 64-bit processors can in practice access only 1 terabyte (2^40 bytes) of memory, this figure still far exceeds the capabilities of 32-bit addressing. A larger amount of available memory, in turn, makes it possible to eliminate or greatly reduce the number of extremely slow operations that swap data to and from disk.
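The figures above follow directly from the address width. The short Python sketch below recomputes them; the helper names are illustrative, and the sizes are printed in binary units (1 KiB = 1024 bytes), which is the sense in which "terabyte" and "exabyte" are used in the text.

```python
def address_space_bytes(bits: int) -> int:
    """Number of distinct byte addresses reachable with `bits` address bits."""
    return 2 ** bits


def human_readable(n: int) -> str:
    """Format a byte count using binary (1024-based) units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    i = 0
    while n >= 1024 and i < len(units) - 1:
        n //= 1024
        i += 1
    return f"{n} {units[i]}"


if __name__ == "__main__":
    # 32-bit addressing, the practical 40-bit limit, and the full 64-bit space
    for bits in (32, 40, 64):
        print(bits, "bits:", human_readable(address_space_bytes(bits)))
```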

Increasing the number and width of registers allows the processor to work with larger areas of memory at once, to handle variables and arrays more efficiently, and to pass function arguments in registers instead of on the stack.

It is worth remembering that a real performance gain on a 64-bit processor requires recompiling the program with a 64-bit version of the compiler and taking into account the change in the data model (the new sizes of types). Running an application that has not been adapted for a 64-bit platform can, on the contrary, lead to significant performance losses, depending on the architecture of the processor used.

The greatest performance gains from the move to a 64-bit platform go to applications that manipulate large amounts of data: database management systems, programs for working with digital multimedia, and scientific applications. The performance gain for software of this class can reach hundreds of percent.

A 64-bit extension of the classic 32-bit IA-32 architecture was announced by AMD in 1999 (originally under the name x86-64, now AMD64) and first implemented in the K8 family of processors in 2003. Some time later, Intel introduced its own designation, EM64T (Extended Memory 64-bit Technology). Regardless of the name, the essence of the new architecture is the same: the width of the main internal registers has doubled (from 32 to 64 bits), and the 32-bit x86 instructions have received 64-bit counterparts. In addition, widening the address bus has significantly increased the amount of memory the processor can address.

Features of 64-bit MP architecture


5.1. Intel Itanium 2

The processor was developed from scratch, and in two versions in parallel: one by engineers at Intel and one at Hewlett-Packard. Both chips were naturally based on the same ideas, since they were created jointly, and both were meant to found the same family. What tied them together was a single design philosophy that replaced CISC, namely EPIC (Explicitly Parallel Instruction Computing), and a new architecture, IA-64, which defines the instruction set, the registers, and similar details. An architecture is a changeable thing, however; just recall how much CISC processors such as the 8086 and the i80486 differ from each other, although both are based on 80x86.
It is the same with Merced and McKinley, that is, Itanium and Itanium 2: both are built on the same philosophy but on different variants of the architecture. Much the same happened earlier with the Pentium and the Pentium Pro. Just as those two shared common features, so do these, and EPIC is responsible for that. First of all, this means full-scale superscalar execution, that is, the ability to execute several instructions simultaneously. To that end, the processor naturally contains multiple execution units: for integer operations, floating-point operations, and so on.
Unlike the Pentium and its successors, which analyze the code on their own, EPIC processors rely heavily on the compiler, which must itself analyze the code, find the places where execution can best be parallelized, and pass this information to the processor. That is where the word “explicitly” comes from: the processor does not have to work out what can and cannot be executed in parallel, because the compiler tells it in advance. Add to this powerful mechanisms for branch prediction, speculative execution of code fragments, data prefetching, and the like, all so that the load on the execution units is distributed as evenly as possible.
The register question was solved radically by increasing their number several times over: Itanium has 128 general-purpose registers (Fig. 1), 128 floating-point registers, 8 branch registers, and 64 registers that serve the prediction mechanisms. The reasoning is obvious: this many registers, and genuinely 64-bit ones at that, is enough to hold whatever values are needed by any reasonable number of execution units. Itanium, the first member of the family, has only a few such units: two integer, two for memory operations and four for floating-point operations. Physical memory is addressed with 44-bit addresses, which limits its size to “only” about 17.6 terabytes (2^44 bytes); the floating-point units work with numbers in an 82-bit representation.

Intel abandoned the idea of implementing a 32-bit 80x86 core in hardware, considering it too inefficient a use of chip area. So that Itanium can still execute 80x86 code, a translation system was created that converts 80x86 code to IA-64 on the fly.
Obviously, all other things being equal, such a solution will be slower than a pure x86 processor running at the same frequency. Then again, no one expected Itanium to execute x86 programs quickly; support for that architecture is better seen as a cost of the transition period. The fact remains that this family is not suited to 32-bit workloads, although hardly anyone with full-fledged 64-bit software would use Itanium for such purposes anyway.
In addition, the Itanium itself was largely a pilot project, like the Pentium Pro, so the processor should generally be viewed more as a demonstration of the architecture's capabilities. A telling detail is that the chipset for Itanium, the 460GX, supports only PC100 SDRAM, which says something about the rate at which the processor was expected to process data. On the other hand, the rather slow memory interface is partly compensated by a very large L3 cache of 2 or 4 MB, which runs at the full processor frequency (733 or 800 MHz) and has a bandwidth of up to 12.8 GB/s.
Another task for Itanium was to sort out the compilers, since EPIC processors, as already mentioned, depend on them heavily. Unlike compilers for 80x86 processors, which had comparatively little effect on processor performance, compilers here are full partners of the processor: they supply it with information it cannot work without, and the quality of that information determines how fast the processor executes the program.
Itanium 2 is a far more commercially interesting product. Created by Hewlett-Packard, which had cut its teeth on the 64-bit PA-RISC processors, the chip turned out to be much more advanced. With a slightly smaller L3 cache (1.5 or 3 MB) and a slightly higher frequency (900 MHz or 1 GHz), it delivers one and a half to two times the performance of Itanium on the same tasks. In effect, it is the first true representative of the IA-64 architecture.
Further parallelization is planned in today's most fashionable way: the processor is to move to two physical cores, which should almost double performance at a fairly reasonable price. At the least, the result will be much cheaper than trying to reach the same number of execution units, registers, and so on in a single chip.

5.2. AMD Athlon 64

First of all, we note that the Athlon 64 processor is exactly the 64-bit desktop processor that AMD originally planned to release. Subsequently, in light of the release of high-speed Pentium 4 processors, the appearance of an 800-MHz bus and Hyper-Threading technology in them, AMD urgently decided to target the single-processor Opteron at the desktop market, giving it the name Athlon 64 FX. However, the Athlon 64 FX, due to its server origin, turned out to be expensive and not widely used. It is the Athlon 64 that should truly advance the AMD64 architecture for mass use.
Table 1 below lists the specifications of the Athlon 64 3200+, Athlon 64 FX-51 and Athlon XP 3200+ microprocessors:

Table 1

* Note that the memory in the Athlon 64 and Athlon 64 FX is clocked relative to the core frequency, so the actual memory frequencies in this case are 129.4, 157.1 and 200 MHz.
In fact, the Athlon 64 differs from its older brother, the Athlon 64 FX, only in its memory controller, apart from the shape and size of the package, even though both processors are made from the same dies. The memory controller in the Athlon 64 is single-channel, and this is both its weakness and its advantage compared to the Athlon 64 FX. The weakness is obvious: lower theoretical bandwidth.
Since the Athlon 64 can work with DDR400 memory, the peak throughput of its integrated memory controller is 3.2 GB per second, half that of the Athlon 64 FX. The advantage of the Athlon 64 memory controller is that, unlike the Athlon 64 FX controller, it supports ordinary non-registered memory modules. Such modules are cheaper than registered modules, allow more aggressive timings, and work faster even with the same settings as registered modules. That is, although the Athlon 64 memory controller provides lower bandwidth, the memory subsystem built on it has lower latency, which we will show below.
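The 3.2 GB/s figure can be reproduced with a one-line calculation. The sketch below assumes a 64-bit (8-byte) data bus per memory channel, which is the standard DIMM width but is not stated in the text, and treats DDR400 as 400 million transfers per second; the function name is illustrative.

```python
def peak_bandwidth_mb_s(mega_transfers_per_s: int,
                        bus_bytes: int = 8,
                        channels: int = 1) -> int:
    """Theoretical peak memory bandwidth in MB/s (1 MB = 10**6 bytes).

    bus_bytes=8 is an assumption: a 64-bit data bus per channel.
    """
    return mega_transfers_per_s * bus_bytes * channels


if __name__ == "__main__":
    # Single-channel DDR400 (Athlon 64) versus dual-channel DDR400 (Athlon 64 FX)
    print("Athlon 64:   ", peak_bandwidth_mb_s(400, channels=1), "MB/s")
    print("Athlon 64 FX:", peak_bandwidth_mb_s(400, channels=2), "MB/s")
```

These are theoretical peaks only; they say nothing about latency, which is where the text places the advantage of the single-channel controller.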
In appearance, the AMD Athlon 64 is similar to the Opteron and the Athlon 64 FX.
The differences are only in the markings and in the smaller number of pins on the underside, since Athlon 64 processors are installed in Socket 754 motherboards and are not compatible with the Socket 940 boards designed for CPUs of the Athlon 64 FX and Opteron families.
In addition to the features listed above, the new Athlon 64 processors have one more. These processors support Cool’n’Quiet technology, which came to them from mobile MPs. In essence, Cool’n’Quiet is a variant of the PowerNow! power-saving technology, which has long been used in AMD's mobile MPs and has now finally reached the company's desktop processors. Cool’n’Quiet support is another advantage of the Athlon 64 over the Athlon 64 FX/Opteron, which do not yet have any similar technology. AMD has long paid close attention to reducing the heat dissipation of its desktop processors.
It must be said that the company has long been ahead of Intel here: AMD's top-end processors generate significantly less heat at maximum load than the top-end Pentium 4 models. AMD processors also use technologies that reduce heat generation at low loads.
Even MPs of the Athlon XP family could enter a “standby” state (Halt/Stop Grant) when executing the HALT instruction, which lowered the processor temperature whenever its load was below 100%. Now AMD has gone further: the new Athlon 64 processors feature an even more intelligent heat reduction scheme.
In addition to the Halt/Stop Grant states, the Athlon 64 can lower its clock speed and supply voltage to reduce heat dissipation further. With this technology, the MP clock frequency is controlled by the processor driver, which lowers or raises it according to the processor load. Indeed, if the processor copes fully with its work and its load is well below 100%, its clock frequency can be reduced without affecting the operation of the system as a whole. For example, when idle, when working in office applications, watching videos, defragmenting disks, and in similar tasks, the processor's power is not fully used. It is in such cases that the processor driver switches the Athlon 64 to a lower clock speed. When full performance is required, for example in games, computational problems, or data encoding, the processor frequency rises to its nominal value. This is exactly how Cool’n’Quiet technology works.
In practice it looks like this. Under normal conditions, with minimal load, the processor driver lowers the frequency of the Athlon 64 3200+ from the standard 2 GHz to 800 MHz, and the supply voltage is reduced to 1.3 V. The clock frequency is lowered by reducing the processor multiplier to 4x, which is possible because Athlon 64 3200+ processors ship with a multiplier that is not locked. The processor stays in this mode until its load exceeds 70-80%. In particular, we were able to run disk defragmentation, MP3 audio playback and MPEG-4 video playback at the same time while the processor continued to run at 800 MHz.
When the load at 800 MHz exceeds the permissible limit, the driver switches the MP to the next state, in which the frequency of the Athlon 64 3200+ is 1.8 GHz and the supply voltage is 1.4 V. This is again achieved through a reduced multiplier, this time 9x. Only if the load is still too high in this state does the driver switch the MP to normal mode: 2 GHz at 1.5 V.
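The three states described above can be modeled as a small table plus a stepping rule. The sketch below is only an illustration of that logic, not AMD's driver: the 200 MHz reference clock is inferred from 800 MHz at 4x, the 10x multiplier of the nominal state is inferred from 2 GHz, the 80% upper threshold is taken from the top of the 70-80% range in the text, and the 30% lower threshold and the one-step-at-a-time policy are hypothetical.

```python
from typing import NamedTuple


class PState(NamedTuple):
    multiplier: int
    voltage: float  # volts


BASE_CLOCK_MHZ = 200     # assumption: inferred from 800 MHz at 4x
UP_THRESHOLD = 0.80      # from the text: load above 70-80 % forces a step up
DOWN_THRESHOLD = 0.30    # hypothetical: the text gives no step-down threshold

# States of the Athlon 64 3200+ as described in the text, slowest first
P_STATES = [PState(4, 1.3), PState(9, 1.4), PState(10, 1.5)]


def frequency_mhz(state: PState) -> int:
    """Core frequency = multiplier x reference clock."""
    return state.multiplier * BASE_CLOCK_MHZ


def next_state(index: int, load: float) -> int:
    """Pick the next P-state index from the current index and load (0.0-1.0)."""
    if load > UP_THRESHOLD and index < len(P_STATES) - 1:
        return index + 1
    if load < DOWN_THRESHOLD and index > 0:
        return index - 1
    return index


if __name__ == "__main__":
    index = 0
    for load in (0.10, 0.95, 0.95, 0.50, 0.05):
        index = next_state(index, load)
        state = P_STATES[index]
        print(f"load {load:.0%}: {frequency_mhz(state)} MHz at {state.voltage} V")
```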
Note that in modes with reduced power and frequency, the heat dissipation of the Athlon 64 3200+ processor drops sharply. For comparison, we present Table 4 with the heat dissipation of this processor in the main modes.

Thus, the use of Cool’n’Quiet technology allows you to significantly reduce the processor temperature not only during idle moments, but also during a number of tasks that do not require maximum performance from the MP. What is important is that the performance of the MP in tasks that require processor resources does not decrease at all. As a result, when using cooling systems with variable speed fans, the use of Cool’n’Quiet technology can significantly reduce the noise level.

Currently, the main share of the general-purpose microprocessor market is occupied by 32-bit and 64-bit microprocessors. This chapter describes Intel microprocessors with 32-bit architecture, which make up the IA-32 family (Intel Architecture-32). This architecture forms the basis of the 64-bit x86-64 architecture, known as AMD64 at AMD and EM64T at Intel.

2.1 Composition and functions of registers

Registers are high-speed memory located inside the CPU, designed for fast storage of data and quick access to it by the processor's internal components. For example, when a program loop is optimized for speed, the variables accessed within the loop are kept in processor registers rather than in memory. The set of all such registers is sometimes called ultra-fast (scratchpad) memory.

2.1.1 Main registers

Figure 2.1 shows the structure of the main registers and their names. There are 8 general purpose registers (GPR), 6 segment registers, a flag register, an instruction pointer register, as well as system, debug and test registers.

General purpose registers (GPRs). These registers are mainly used for arithmetic operations and data transfer. Each GPR can be accessed as a 32-bit register or as a 16-bit register, and some can also be accessed as 8-bit registers. For example, the EAX register is 32 bits wide, and its lower 16 bits are called the AX register. The upper 8 bits of AX are called AH, and the lower 8 bits are called AL. The 16-bit parts of the index and pointer registers are usually used only in programs written for real addressing mode, i.e. for MS-DOS or its emulation in Windows.
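The relationship between EAX, AX, AH and AL is simply a matter of bit ranges within one 32-bit value. The Python sketch below models this with masks and shifts; the function names are illustrative and not part of any real API.

```python
def sub_registers(eax: int) -> dict:
    """Split a 32-bit EAX value into the parts addressable as AX, AH and AL."""
    eax &= 0xFFFFFFFF          # keep only 32 bits
    ax = eax & 0xFFFF          # lower 16 bits of EAX
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,  # upper 8 bits of AX
        "AL": ax & 0xFF,         # lower 8 bits of AX
    }


def write_al(eax: int, value: int) -> int:
    """Writing AL changes only bits 0-7 of EAX; the other 24 bits are kept."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


if __name__ == "__main__":
    parts = sub_registers(0x12345678)
    print({name: hex(v) for name, v in parts.items()})
    print(hex(write_al(0x12345678, 0xFF)))
```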

Meaning of register name abbreviations:

The prefix E at the beginning of a register name stands for Extended.

The suffix X at the end of a register name simply denotes a register.

AX – Accumulator.

BX – Base register.

CX – Counter.

DX – Data.

AL – Accumulator low (low half of the AX register).

AH – Accumulator high (high half of register AX).

BL – Base low (low half of the BX register).

BH – Base high (high half of the BX register).

CL – Counter low (low half of the CX register).

CH – Counter high (high half of the CX register).

DL – Data low (low half of the DX register).

DH – Data high (high half of the DX register).

SI – Source index. Contains the current source address.

DI – Destination index. Contains the current receiver address.

BP – Base pointer. Used for random access to data inside the stack.

SP – Stack pointer. Contains a pointer to the top of the stack.

CS – Code segment.

DS – Data segment.

ES – Extra Data segment (additional data segment).

FS – Extended Data segment (additional data segment).

GS – Extended Data segment (additional data segment).

SS – Stack segment.

IP – Instruction pointer (instruction pointer or command counter).

F – Flags (flag register).

GDTR – Global descriptor table register.

IDTR – Interrupt descriptor table register.

TR – Task register.

LDTR – Local descriptor table register.

DR – Debug register.

TR – Test register.

CR – Control register.

Features of using registers. When the processor executes commands, some general-purpose registers have a special purpose.

    The contents of the EAX register are used automatically by multiplication and division instructions. Because this register is usually associated with arithmetic instructions, it is often called the extended accumulator.

    The ECX register is used automatically by the processor as a loop counter.

    The ESP register is used to access data stored on the stack. The stack is a system memory area accessed on the LIFO (last in, first out) principle: the last item written is the first one read. This register is normally never used for ordinary arithmetic or data transfer instructions. It is often called the extended stack pointer. ESP holds the address of the top of the stack, that is, of the most recently pushed item; a PUSH instruction first decrements ESP and then stores the new item at that address.

    The ESI and EDI registers are usually used by the high-speed instructions that copy data from one memory location to another. That is why they are sometimes called the extended source index and extended destination index registers. For a “move block” operation, ESI holds the source address, that is, the address of the beginning of the block (full address DS:ESI), and EDI holds the destination address (full address ES:EDI).

    The EBP register is commonly used by high-level language compilers to access function parameters and local variables allocated on the stack. It should not be used for ordinary arithmetic or for moving data, except in specialized techniques used by experienced programmers. It is often called the extended frame pointer. EBP holds the base address of the current stack frame, relative to which data on the stack is read and written. Function parameters lie at positive offsets from EBP and local variables at negative offsets; the full address of this memory area is given by the SS:EBP register pair.

    The EIP register stores the offset of the instruction address. The full address of the next instruction to be executed is given by the register pair CS:EIP.

    The ESP, EBP, ESI, EDI registers store the data address offset.
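The way ESP tracks the top of the stack can be illustrated with a toy model. The Python sketch below assumes 32-bit (4-byte) stack items and a stack that grows toward lower addresses, as on IA-32; the class name and the starting address 0x1000 are hypothetical.

```python
class Stack32:
    """Toy model of an IA-32 stack with 4-byte items, growing downward."""

    def __init__(self, top: int = 0x1000):
        self.esp = top      # stack pointer: address of the top of the stack
        self.memory = {}    # address -> 32-bit value

    def push(self, value: int) -> None:
        self.esp -= 4                           # stack grows downward
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self) -> int:
        value = self.memory.pop(self.esp)       # read the item at the top
        self.esp += 4
        return value


if __name__ == "__main__":
    s = Stack32()
    s.push(1)
    s.push(2)
    print(hex(s.esp))        # two items pushed: ESP moved down by 8 bytes
    print(s.pop(), s.pop())  # last written, first read
```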

Segment registers. These registers are used as base registers when accessing pre-allocated areas of main memory called segments. There are three types of segments and, correspondingly, of segment registers:

    code (CS): the segment holds only processor instructions, i.e. the machine code of the program;

    data (DS, ES, FS and GS): the segments hold memory areas allocated for program variables and data;

    stack (SS): the segment holds a system memory area called the stack, in which local (temporary) program variables and the parameters passed to functions when they are called are placed.

Segment registers are loaded with segment selectors, which are indexes into either the Global Descriptor Table (GDT) or the Local Descriptor Table (LDT).

The segment register bits contain the following information:

0 – 1. RPL – Requested Privilege Level. Level of privileges requested.

2. TI – Table Indicator. Tells the processor where to look for the descriptor that the selector refers to. If the bit is set, the processor reads the descriptor from the local descriptor table. If the bit is clear, it reads the descriptor from the global descriptor table.

3 – 15. These bits store the index of the descriptor within the global or local descriptor table.
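The bit layout above can be decoded with a few masks and shifts. The Python sketch below is a minimal illustration; the function name and the sample selector values are arbitrary.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit segment selector into its RPL, TI and index fields."""
    selector &= 0xFFFF
    return {
        "rpl": selector & 0x3,                        # bits 0-1
        "table": "LDT" if selector & 0x4 else "GDT",  # bit 2 (TI)
        "index": selector >> 3,                       # bits 3-15
    }


if __name__ == "__main__":
    for sel in (0x0008, 0x0023, 0x000F):
        print(hex(sel), decode_selector(sel))
```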

The CS register does not hold the code segment itself; it must be loaded with the selector of an executable segment. Likewise, the SS register must be loaded with the selector of a writable data segment.

The privilege level set in the CS segment register is the privilege level of the running program and is called the current privilege level (CPL). The segment registers DS, ES, FS and GS are intended for data and must be loaded with selectors of data segments. The privilege level number of each of these segments must be no lower than the current privilege level (CPL), that is, the segment must not be more privileged than the program that accesses it.

Each of the six segment registers has an associated shadow descriptor register. In protected mode, the segment's 32-bit base address, 20-bit limit, and attributes (access rights) are copied from the descriptor tables into the shadow registers.

Instruction pointer register. The EIP register stores the offset, from the beginning of the code segment, of the next instruction to be executed. The processor has several instructions that affect the contents of this register. Changing the address stored in EIP transfers control to a new part of the program.

Flag register EF (EFLAGS). The bits of this register are called flags; they either control how certain CPU instructions are executed or reflect the result of an instruction executed by the ALU. The register flags are listed in Table 2.1. Bits 22 to 31 inclusive are reserved.

Table 2.1 Flag Register

Special processor instructions are provided for testing the bits of this register. A flag is said to be set when its bit is 1 and cleared (reset) when its bit is 0. In addition, the extended EF register contains new flags compared to the F register.

Control flags. The state of the EFLAGS register bits corresponding to the control flags can be changed by the programmer using special processor commands. These flags (DF, IF, TF) control the execution of certain CPU commands:

8. TF – Trap flag. Trace flag (single-step mode). When it is set (TF=1), the internal interrupt INT 1 is raised after each instruction, which halts the computation and makes it possible to inspect the contents of the registers.

9. IF – Interrupt-enable flag. When IF=1, maskable hardware interrupts are enabled; when IF=0, they are disabled.

10. DF – Direction flag. Controls the direction in which string instructions process arrays. When DF=1, the index registers SI and DI used by string instructions are automatically decremented by the operand size in bytes, so strings are processed from end to beginning. When DF=0, they are incremented, and strings are processed from beginning to end.

Status Flags. These flags reflect various characteristics of the result of executing arithmetic and logical CPU instructions:

0. CF – Carry flag. The carry flag is set if an unsigned arithmetic operation produces a number that needs more bits than the result field provides. In shift instructions, CF receives the last bit shifted out of the operand.

2. PF – Parity flag. The parity flag is set when the low byte of the result contains an even number of ones.

4. AF – Auxiliary Carry. Flag of a carry or borrow between the low nibble and the high nibble (from bit 3 into bit 4). Used by instructions that process 8-bit data, most often BCD numbers.

6. ZF – Zero flag. The zero flag is set if an arithmetic or logical operation produces a number that is zero (that is, all bits of the result are 0).

7. SF – Sign flag. The sign flag duplicates the value of the most significant bit of the result. SF=0 for positive result, SF=1 for negative.

11. OF – Overflow flag. The overflow flag is set if a signed arithmetic operation produces a number that needs more bits than the result field provides.
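How the status flags follow from a result is easiest to see on a small case. The Python sketch below computes CF, PF, AF, ZF, SF and OF for an 8-bit addition according to the definitions above; it is a model for illustration only (the function name is hypothetical), not a description of how the ALU is built.

```python
def add8_flags(a: int, b: int):
    """Add two 8-bit values; return (8-bit result, dict of status flags)."""
    a &= 0xFF
    b &= 0xFF
    full = a + b               # result before truncation to 8 bits
    result = full & 0xFF
    flags = {
        "CF": int(full > 0xFF),                           # unsigned carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),       # even number of ones
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),           # carry from bit 3 into bit 4
        "ZF": int(result == 0),                           # all result bits are zero
        "SF": result >> 7,                                # copy of the top bit
        # signed overflow: both operands have the same sign, the result differs
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags


if __name__ == "__main__":
    for a, b in ((0x7F, 0x01), (0xFF, 0x01), (0x10, 0x20)):
        result, flags = add8_flags(a, b)
        print(f"{a:#04x} + {b:#04x} = {result:#04x}", flags)
```

The first pair shows signed overflow without an unsigned carry (127 + 1), and the second shows an unsigned carry without signed overflow (255 + 1 read as -1 + 1).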

Flags added to the EF register:

12-13. IOPL – I/O Privilege Level. I/O privilege level flag. Used in protected mode of microprocessor operation to control access to I/O commands depending on the privilege of the task.

14. NT – Nested task flag. Task nesting flag. Used in protected mode of microprocessor operation to record the fact that one task is nested within another.

16. RF – Resume flag. In debug mode, RF=1 allows an instruction to be restarted after a debug exception. Used in conjunction with the debug breakpoint registers.

17. VM – Virtual Mode flag. Virtual mode flag. A sign that the microprocessor is operating in virtual 8086 mode. 1 – the processor is operating in virtual 8086 mode; 0 – the processor operates in real or protected mode.

18. AC – Alignment Check. Enables alignment checking. Used only at privilege level 3. If AC=1 and AM=1 (AM is a bit in control register CR0), an access to an operand that is not aligned to the appropriate boundary (2, 4 or 8 bytes) raises exception 17. Aligning an operand to a 2-, 4- or 8-byte boundary means that its address is a multiple of 2, 4 or 8, respectively.

19. VIF – Virtual Interrupt Flag. Interrupt enable flag in virtual mode of processor operation.

20. VIP – Virtual Interrupt Pending. Flag of a pending interrupt in virtual mode of processor operation.

21. ID – CPU Identification. Processor identification flag. If software can change this flag, the processor supports the CPUID instruction.
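The alignment rule given for the AC flag reduces to a divisibility test on the operand address. The Python sketch below expresses the condition exactly as stated in the list above (AC=1, AM=1, privilege level 3, address not a multiple of the operand size); the function names and sample addresses are illustrative.

```python
def is_aligned(address: int, size: int) -> bool:
    """An operand is aligned if its address is a multiple of its size."""
    return address % size == 0


def alignment_fault(address: int, size: int, ac: int, am: int, cpl: int) -> bool:
    """True if, under the rule described above, exception 17 would be raised."""
    return ac == 1 and am == 1 and cpl == 3 and not is_aligned(address, size)


if __name__ == "__main__":
    print(is_aligned(0x1004, 4))                           # multiple of 4
    print(alignment_fault(0x1002, 4, ac=1, am=1, cpl=3))   # misaligned, check on
    print(alignment_fault(0x1002, 4, ac=1, am=0, cpl=3))   # masked by AM=0
```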

Control registers CR0–CR4 store processor state indicators that are common to all tasks. The CR0 register has the following bits:

0. PE – Protection Enable. Protected mode flag. If PE=1, then the processor operating mode is protected.

1. MP – Math Present. Flag of the presence of a math coprocessor. If MP=1, a math coprocessor is present.

2. EM – Emulate Numeric Extension. Flag for emulating commands over floating point numbers. When the flag is set, commands for working with real numbers can be emulated in software.

3. TS – Task Switched. Task switching flag. The flag is set after the task is switched.

4. ET – Extension Type. Extension type flag. Set if a 387 or higher arithmetic coprocessor is present.

5. NE – Numeric Error. Coprocessor error reporting flag. If the flag is set, an error in the coprocessor generates exception 16.

16. WP – Write Protect. Write protection flag. If the flag is set, read-only memory pages are write-protected even from operating system kernel (supervisor) code.

18. AM – Alignment Mask. Alignment masking flag. When the flag is set, exception 17 will be generated if unaligned operands are accessed. If the flag is cleared, the exception will be masked.

29. NW – Not Write-Through. Flag for prohibiting write-through.

30. CD – Cache Disable. Disable caching flag.

31. PG – Paging Enable. Memory paging mode flag. When the flag is set, the operating system operates in memory paging mode.

The remaining bits of the CR0 register are reserved.
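The CR0 layout listed above can be turned into a small decoder. The Python sketch below maps bit positions to flag names exactly as given in the list; the sample value 0x80010033 is an arbitrary illustration, not a value taken from the text.

```python
# Bit positions of the CR0 flags, as listed above
CR0_BITS = {
    "PE": 0, "MP": 1, "EM": 2, "TS": 3, "ET": 4, "NE": 5,
    "WP": 16, "AM": 18, "NW": 29, "CD": 30, "PG": 31,
}


def decode_cr0(value: int) -> list:
    """Return the names of the CR0 flags that are set in `value`."""
    return [name for name, bit in CR0_BITS.items() if (value >> bit) & 1]


if __name__ == "__main__":
    # Sample value: protected mode with paging and write protection enabled
    print(decode_cr0(0x80010033))
```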

Register CR1 is reserved. The CR2 register stores the 32-bit linear address at which a page fault occurred. The 20 most significant bits of the CR3 register store the physical base address of the page directory; the register also contains cache control bits. The CR4 register contains bits that enable architectural extensions of the MP. In general, these registers are used in system programming; they set the processor's operating mode (real, protected, etc.), memory paging, and so on.

System address registers. The system registers GDTR and IDTR, which point to the global descriptor table and the interrupt descriptor table, each store a 32-bit base address and a 16-bit table limit. The system segment registers TR (task register) and LDTR (local descriptor table register) hold 16-bit selectors. Each has a corresponding shadow descriptor register, which contains a 32-bit segment base address, a 20-bit limit, and an access rights byte.

Debug registers DR0–DR3 hold 32-bit breakpoint addresses used in debug mode; DR4–DR5 are reserved and not used; DR6 reports breakpoint status; DR7 controls the placement of breakpoints in the program.

Test registers TR belong to the group of model-specific registers; their composition and number depend on the processor type. The 386 MP uses two registers, TR6 and TR7, to test the paging mechanism used by the operating system for memory allocation. Pentium II and higher use twelve registers, TR1–TR12. This group of registers also holds the results of MP and cache memory testing.

AMD64 (also x86-64 or x64) is a 64-bit microprocessor architecture and corresponding instruction set developed by AMD. It is an extension of the x86 architecture with full backward compatibility. The x86-64 instruction set is currently supported by AMD Athlon 64, Athlon 64 FX, Athlon 64 X2, Turion 64, Opteron, and the latest Sempron processors. Interestingly, this instruction set was supported by AMD's main competitor, Intel, under the name EM64T or IA-32e in later models of Pentium 4 processors, as well as in Pentium D, Pentium Extreme Edition, Celeron D, Core 2 Duo and Xeon. Microsoft uses the term x64 to refer to this instruction set.

Operating modes

The architecture's processors support two operating modes: Long mode and Legacy mode (x86 compatibility mode).

Long Mode

“Long” mode is the “native” mode of AMD64 processors. It gives access to all the additional features provided by the AMD64 architecture. To use this mode, you need a 64-bit operating system, such as Windows XP Professional x64 Edition or a 64-bit version of GNU/Linux. This mode runs 64-bit programs and also, for backward compatibility, supports 32-bit code such as 32-bit applications, although 32-bit programs cannot use 64-bit system libraries, and vice versa. To deal with this problem, most 64-bit operating systems provide two sets of the required system files: one for native 64-bit applications and one for 32-bit programs. (Early 32-bit systems, for example Windows 95, used the same technique to run 16-bit programs.)

Legacy Mode

This mode allows the AMD64 processor to execute instructions designed for x86 processors and provides full compatibility with 32/16-bit code and operating systems. In this mode, the processor behaves exactly like an x86 processor such as a Pentium 4, and additional features provided by the AMD64 architecture (such as additional registers) are not available. In this mode, 64-bit programs and operating systems will not work.

Architecture Features

AMD's x86-64 instruction set (later renamed AMD64) is an extension of the Intel IA-32 (x86-32) architecture. Basic distinctive feature AMD64 is support for 16 64-bit general purpose registers (versus 8 and 32-bit in x86-32), 64-bit arithmetic and logical operations over integers and 64-bit virtual addresses.

The x86-64 architecture has:

    16 general purpose integer 64-bit registers (RAX, RBX, RCX, RDX, RBP, RSI, RDI, RSP, R8 - R15),

    8 80-bit floating point registers (ST0 - ST7)

    8 64-bit Multimedia Extensions registers (MM0 - MM7, share space with ST0 - ST7 registers)

    16 128-bit SSE registers (XMM0 - XMM15)

    64-bit RIP instruction pointer and 64-bit RFLAGS flags register (its lower 32 bits are the familiar EFLAGS; the upper 32 bits are reserved)

Opteron (codenamed Sledgehammer or K8) is AMD's first microprocessor based on 64-bit AMD64 technology (also called x86-64). AMD created this processor primarily for use in the server market, so there are Opteron variants for use in systems with 1-16 processors.

In June 2004, the Dawning 4000A, a Chinese supercomputer built on Opteron processors, took tenth place in the Top500 list of supercomputers. By November 2005 it had dropped to 42nd place because more powerful competitors had appeared. In that same November Top500 list, 10% of supercomputers were built on AMD64 Opteron processors. For comparison, 16.2% of supercomputers were built on Intel EM64T Xeon processors.

Two key features

Two important technologies are embodied in the Opteron processor:

    direct (without emulation) support for 32-bit x86 applications without loss of speed

    direct (without emulation) support for 64-bit x86-64 applications (linear addressing of more than 4 GB of RAM)

The first technology is notable for the fact that at the time of the announcement of the Opteron processor, the only 64-bit processor with declared support for 32-bit x86 applications was Intel Itanium. But Itanium ran 32-bit applications with a significant loss of speed.

The second technology is not so noteworthy in itself, since the main suppliers of RISC processors (SPARC, DEC, HP, IBM, MIPS and others) had offered 64-bit solutions for many years. But the combination of these two properties in one product brought Opteron recognition, as it offered an affordable and cost-effective way to run existing x86 applications and then move on to more promising 64-bit computing.

Opteron processors have an integrated DDR SDRAM memory controller. This made it possible to significantly reduce memory access latency and to eliminate the need for a separate northbridge chip on the motherboard.

In May 2005, AMD introduced the first "multi-core" Opteron processor. AMD currently uses the term "multi-core" to refer to "dual-core" processors: each such Opteron processor contains two separate processor cores. This effectively doubles the processing power available per processor socket on motherboards that support these processors.

One of the top AMD processors today is considered to be the Athlon X2 6000+ on the Windsor core for the AM2 socket. This processor contains two Athlon 64 cores combined on one chip by a set of additional logic. The cores share a dual-channel memory controller based on the Athlon 64 stepping E and, depending on the model, have from 512 to 1024 KB of Level 2 cache per core. The Athlon 64 X2 supports the SSE3 instruction set (previously supported only by Intel processors), which allows it to run code optimized for Intel processors at full performance. These improvements are not unique to the Athlon 64 X2; they are also available in Athlon 64 processors built on the Venice and San Diego cores. AMD officially began shipping the Athlon 64 X2 at Computex on June 1, 2005.

The main advantage of dual-core Athlon 64 X2 processors is the ability to split running programs into several simultaneously executing threads. A processor's ability to execute multiple program threads at the same time is called thread-level parallelism (TLP). By placing two cores on a single die, the Athlon 64 X2 has double the TLP of a single-core Athlon 64 at the same clock speed. The need for TLP depends largely on the specific situation, and in some situations it is simply useless. Most programs are written to run in a single thread and therefore cannot use the computing power of the second core. Programs written to run in multiple threads, and so able to use the second core, include many applications for processing music and video.

With two cores, the Athlon 64 X2 has more transistors on the chip. The Athlon 64 X2 processor with 1 MB of L2 cache has 233.2 million transistors, whereas the Athlon 64 had only 114 million. A die of this size requires a finer manufacturing process in order to obtain the required number of working processors from a single silicon wafer. The Athlon 64 X2 is built on the Toledo, Manchester and Windsor cores using 90 nm process technology.

Just recently, AMD officially unveiled its new desktop platform, codenamed AMD Spider.

Spider platform

Spider platform composition

The main component of this platform is a processor from the AMD Phenom line, together with a chipset from the AMD 7-Series family.

AMD Spider Platform: General Specifications

When presenting new technologies to the general public, AMD emphasizes the platform nature of the innovation. The key component of the Spider platform is the multi-core AMD Phenom processor (up to four cores), manufactured on a 65 nm process and designed for motherboards with the Socket AM2+ connector. In addition, the Spider platform includes the new-generation AMD 7-Series chipsets, used to build motherboards with support for CrossFireX and AMD OverDrive technologies, as well as graphics from the ATI Radeon HD 3800 family with support for Microsoft DirectX 10.1.

AMD Spider platform diagram

If we put aside the verbosity of press releases, the main innovation of the AMD Spider platform is a significant increase in performance per watt, mainly due to the energy-efficient design of the 65 nm AMD Phenom processors, the 65 nm AMD 7-Series chipsets and the 55 nm graphics chips of the ATI Radeon HD 3800 family. The platform also supports the energy-saving technologies ATI PowerPlay and Cool'n'Quiet 2.0, along with Microsoft DirectX 10.1, HyperTransport 3.0 and PCI Express 2.0. In particular, Cool'n'Quiet 2.0 reduces the power consumption of AMD Phenom processors with a TDP of 95 W to an average of 32 W in consumer applications and an average of 29 W in commercial applications. At the same time, AMD CoolCore technology, implemented in AMD 7-Series chipsets, lets the processor cores operate at different frequencies and so reduces power consumption, while the average TDP of the chipsets is about 10-12 W.

Another innovation of the AMD Spider platform is its significant scalability, unprecedented for solutions based on AMD processors. Motherboards based on AMD 7-Series chipsets, thanks to ATI CrossFireX technology and support for up to 42 PCI Express lanes, can work with three or four ATI Radeon HD 3800 graphics cards. In terms of microarchitecture, the new 4-core Phenom chips for desktop PCs, based on the Stars architecture (Agena core), are the closest relatives of the new 4-core AMD Opteron server processors based on the Barcelona core.

In complete analogy with the Barcelona core, the Stars architecture features a 128-bit memory controller with support for up to DDR2-1066, which also has the ability to operate in 2-channel 64-bit mode for independent memory write and read operations. The physical address space has increased to 48 bits, and memory support to 256 TB.
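The relationship between address width and addressable memory quoted above can be checked with a few lines of arithmetic. This is only a sketch; the helper name `addressable_bytes` is illustrative, and byte-addressable memory with binary units (1 TB = 2^40 bytes) is assumed.

```python
# Sketch: how many bytes an address of a given width can reach.
# Assumes byte-addressable memory and binary units (1 TB = 2**40 bytes).

GIB = 2 ** 30
TIB = 2 ** 40


def addressable_bytes(address_bits: int) -> int:
    """Return the number of distinct byte addresses for the given width."""
    return 2 ** address_bits


# 48 physical address bits give 256 TB, as stated for the Stars architecture.
print(addressable_bytes(48) // TIB)  # 256
# For comparison, 32 address bits give only 4 GB.
print(addressable_bytes(32) // GIB)  # 4
```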

Each of the four cores of the Phenom processor has its own 64 KB L1 instruction cache and 64 KB L1 data cache, for a total of 512 KB of L1 cache per processor. The total L2 cache is 2 MB, 512 KB for each core. In addition, the Barcelona and Stars architectures include 2 MB of L3 cache. Unlike the L1 and L2 caches, which are private to each core, the L3 cache is dynamically shared across all cores.
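The cache totals above follow directly from the per-core figures. The short calculation below simply restates them; the constant names are illustrative and the numbers are taken from the paragraph above.

```python
# Sketch: cache totals for a 4-core Phenom, using the figures quoted above.
CORES = 4
L1_INSTRUCTION_KB = 64   # per core
L1_DATA_KB = 64          # per core
L2_PER_CORE_KB = 512     # per core
L3_SHARED_KB = 2048      # shared by all cores

l1_total_kb = CORES * (L1_INSTRUCTION_KB + L1_DATA_KB)
l2_total_kb = CORES * L2_PER_CORE_KB
all_cache_kb = l1_total_kb + l2_total_kb + L3_SHARED_KB

print(l1_total_kb)   # 512  (KB of L1 per processor)
print(l2_total_kb)   # 2048 (KB, i.e. 2 MB of L2)
print(all_cache_kb)  # 4608 (KB of on-chip cache in total)
```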

Key features of the new 4-core Phenom processors include:

    a new floating-point scheduler, now supporting 36 new 128-bit operations

    support for 128-bit SSE operations, in addition to the capabilities of the previous 64-bit architecture

    the ability to process two SSE operations and one SSE move per clock

    an instruction fetch buffer of 32 bytes (previously 16 bytes)

    a branch prediction unit with a 512-entry indirect branch predictor

    data cache throughput increased from one 64-bit load per clock to one 128-bit load per clock

    throughput between the L2 data cache and the memory controller increased from one 64-bit load per clock to one 128-bit load per clock

    a HyperTransport 3.0 bus, which raises throughput to 20.8 GB/s

    AMD Virtualization Technology with the Rapid Page Indexing feature
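The 20.8 GB/s HyperTransport 3.0 figure can be reproduced with simple arithmetic. The sketch below assumes a 2.6 GHz link clock, double data rate signalling and a 16-bit link in each direction; these parameters are assumptions, not values stated in the text above. The function name is illustrative.

```python
# Sketch: aggregate HyperTransport link bandwidth.
# Assumed (not from the text): 2600 MHz clock, double data rate, 16-bit link.

def hypertransport_bandwidth_mb_s(clock_mhz: int, link_width_bits: int) -> int:
    """Return aggregate bandwidth (both directions) in MB/s."""
    transfers_per_us = clock_mhz * 2                         # two transfers per clock
    per_direction = transfers_per_us * link_width_bits // 8  # MB/s in one direction
    return per_direction * 2                                 # both directions together


aggregate = hypertransport_bandwidth_mb_s(clock_mhz=2600, link_width_bits=16)
print(aggregate)  # 20800 MB/s, i.e. 20.8 GB/s
```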

According to a source among Taiwanese motherboard manufacturers, AMD recently notified its partners that it intends to begin shipping tri-core Phenom X3 (Toliman) processors in February 2008 rather than in March, as previously planned. Dual-core Kuma processors will appear only at the end of the second quarter of next year.

Let us recall that the first triple-core processors, models 7700 and 7600, will operate at 2.5 GHz and 2.3 GHz, respectively; the heat dissipation of both models is set at 89 W. The clock speeds of the Kuma processors, models 6250 and 6050, have not yet been announced; it is known only that their TDP will be 65 W.

In 1985 Intel released a 32-bit microprocessor that founded the IA-32 family. The development of this family has gone through a number of stages, among which the following stand out: the implementation of a floating-point unit directly on the microprocessor chip (the i486); the introduction in the Pentium MMX of MMX technology, which processes fixed-point data on the SIMD principle (single instruction, multiple data: one instruction stream, many data streams); and the extension of this technology to floating-point numbers (SSE, Streaming SIMD Extensions), which first appeared in the Pentium III. However, the main features of this architecture remain unchanged to this day.

The architecture of a 32-bit microprocessor is significantly different from the architecture of a 16-bit microprocessor. Some of these differences are purely quantitative, others are fundamental.

The main external difference is the increase in the data bus and address bus width to 32 bits. This, in turn, is associated with changes in the bit depth of the internal elements of the microprocessor and in the mechanism for performing certain processes, for example, the formation of a physical address.

The registers of the fixed-point number processing unit became 32-bit. Each of them can be accessed as one double word (32 bits). The lower 16 bits of these registers can be accessed in the same way as in a 16-bit microprocessor.
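The way a 32-bit register overlaps its 16-bit and 8-bit parts can be modelled with bit masks. The sketch below uses EAX and its sub-registers AX, AH and AL as the example; the function names are illustrative, and the code only models the bit layout, not real hardware access.

```python
# Sketch: how EAX overlaps AX, AH and AL (a model of the bit layout only).

def sub_registers(eax: int) -> dict:
    """Split a 32-bit EAX value into the parts visible as AX, AH and AL."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF                 # lower 16 bits
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,       # bits 8..15
        "AL": ax & 0xFF,              # bits 0..7
    }


def write_ax(eax: int, value: int) -> int:
    """Write a 16-bit value to AX; the upper 16 bits of EAX are preserved."""
    return (eax & 0xFFFF0000) | (value & 0xFFFF)


regs = sub_registers(0x12345678)
print(hex(regs["AX"]), hex(regs["AH"]), hex(regs["AL"]))  # 0x5678 0x56 0x78
print(hex(write_ax(0x12345678, 0xABCD)))                  # 0x1234abcd
```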

Both quantitative and qualitative changes occurred in the block of segment registers. To the four registers CS, DS, SS and ES used in real mode, two more have been added: FS and GS. Although the width of the registers in this block remains the same (16 bits each), they are used differently in forming the physical address of RAM. When the microprocessor operates in so-called protected mode, a segment register is used to look up a segment descriptor in the corresponding system table, and the descriptor holds the base address and attributes of the segment. In this case, address generation is performed by the segmentation unit of the memory manager.
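The descriptor lookup described above can be sketched as follows. In protected mode a 16-bit selector holds a table index in bits 3-15, a table indicator (GDT or LDT) in bit 2 and a requested privilege level in bits 0-1. The descriptor table below is a hypothetical toy structure with made-up values; real descriptors are packed 8-byte entries with more attributes.

```python
# Sketch: protected-mode selector decoding and linear address formation.
# The table contents are hypothetical; real descriptors are packed 8-byte entries.

def decode_selector(selector: int) -> dict:
    """Split a 16-bit segment selector into its three fields."""
    selector &= 0xFFFF
    return {
        "index": selector >> 3,                       # descriptor number
        "table": "LDT" if selector & 0x4 else "GDT",  # table indicator bit
        "rpl": selector & 0x3,                        # requested privilege level
    }


def logical_to_linear(selector: int, offset: int, table: dict) -> int:
    """Add the segment base from the descriptor to the offset."""
    descriptor = table[decode_selector(selector)["index"]]
    if offset > descriptor["limit"]:
        raise ValueError("offset exceeds segment limit")
    return (descriptor["base"] + offset) & 0xFFFFFFFF


# Hypothetical descriptor table with a single entry at index 4.
toy_gdt = {4: {"base": 0x00400000, "limit": 0xFFFF}}

print(decode_selector(0x23))                           # index 4, GDT, RPL 3
print(hex(logical_to_linear(0x23, 0x1234, toy_gdt)))   # 0x401234
```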

If, in addition to segments, the memory is also divided into pages, then the final calculation of physical addresses is performed by the page control unit.
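A minimal sketch of the page translation step, assuming the classic 32-bit scheme with 4 KB pages and two table levels (10-bit directory index, 10-bit table index, 12-bit offset). The nested dictionary standing in for the page tables is hypothetical, as are the addresses used.

```python
# Sketch: two-level 32-bit paging with 4 KB pages (10 + 10 + 12 bit split).
# The page tables here are a hypothetical nested dict, not a real memory layout.

def split_linear_address(linear: int) -> tuple:
    """Return (directory index, table index, offset within the page)."""
    linear &= 0xFFFFFFFF
    directory = linear >> 22
    table = (linear >> 12) & 0x3FF
    offset = linear & 0xFFF
    return directory, table, offset


def linear_to_physical(linear: int, page_directory: dict) -> int:
    """Look up the page frame and append the page offset."""
    directory, table, offset = split_linear_address(linear)
    frame_base = page_directory[directory][table]  # KeyError models a page fault
    return frame_base | offset


# Hypothetical mapping: directory entry 1, table entry 1 -> frame at 0x0009A000.
toy_directory = {1: {1: 0x0009A000}}

print(split_linear_address(0x00401234))                    # (1, 1, 564)
print(hex(linear_to_physical(0x00401234, toy_directory)))  # 0x9a234
```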

Beginning with the i486, the microprocessor chip includes a floating-point unit with eight 80-bit registers that hold the signs, mantissas and exponents of such numbers.

The microprocessor chip also houses internal cache memory, which is a specially organized high-speed buffer memory, designed to store the most frequently used information (commands and data). In various microprocessor models, the cache memory ranges from 8 KB to 512 KB.

The microprocessor at the hardware level supports the multi-program operating mode of the computer, that is, the ability to simultaneously have several ready-to-execute programs in memory, which are launched by the operating system in accordance with the algorithms of its functioning or depending on special situations that arise in the operation of external devices.

Inextricably linked with this capability are memory protection controls that provide control over unauthorized interactions between individual programs. These include memory management protection and privilege protection.

The main features of the extended instruction format are the ability to use any of the general purpose registers in any of the addressing modes, as well as the addition of another addressing mode - relative base index with scaling. In this case, the effective address is formed as follows:

EA = (base) + (index) × scale + disp,

where (base) is the value of the base register; (index) is the value of the index register; scale is the scale factor (scale = 1, 2, 4 or 8); disp is the displacement encoded in the instruction itself.

Note that in a 32-bit architecture the effective address is usually called the offset, as distinct from the displacement encoded in the instruction itself.
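The effective address calculation can be sketched in a few lines. The function below is illustrative only: it assumes the x86 encoding, in which the scale factor is one of 1, 2, 4 or 8, and wraps the result to 32 bits.

```python
# Sketch: scaled base-index addressing, EA = base + index * scale + disp.
# Assumes x86 scale factors (1, 2, 4, 8) and 32-bit wrap-around.

VALID_SCALES = (1, 2, 4, 8)


def effective_address(base: int, index: int, scale: int, disp: int) -> int:
    """Compute a 32-bit effective address from its four components."""
    if scale not in VALID_SCALES:
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF


# Element 3 of an array of 4-byte integers that starts 8 bytes past 0x1000.
print(hex(effective_address(base=0x1000, index=3, scale=4, disp=8)))   # 0x1014
# A negative displacement addresses memory below the base.
print(hex(effective_address(base=0x1000, index=0, scale=1, disp=-4)))  # 0xffc
```

The scale factor matches common element sizes, so indexing an array of bytes, words, double words or quad words needs no separate multiplication instruction.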

64-bit PC architecture.

Win64 code combines the main features of 32-bit code, and also includes changes associated with increasing the bit depth. The programmer has at his disposal:

· 64-bit pointers;

· 64-bit data types;

· 32-bit data types;

· Win64 API interface.

Note that 32-bit data types did not disappear as the platform's bit width increased (unlike 16-bit data types in the move to Win32). This is because even in 64-bit applications most variables do not need 8 bytes of memory, so using 64-bit types in such cases would be extremely inefficient. The operating system would have to pad the most significant bits with zeros to bring the data size up to 8 bytes (such data is also very inconvenient to read). This would reduce performance.
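The storage cost of widening every value to 64 bits can be illustrated with the standard `struct` module. This is only a sketch: it packs the same thousand small integers as 32-bit and as 64-bit little-endian values and compares the sizes. The last line reports the pointer width of the running Python interpreter, which depends on the platform.

```python
# Sketch: memory needed for the same values stored as 32-bit vs 64-bit integers.
import struct

values = list(range(1000))  # small numbers that easily fit in 32 bits

packed32 = struct.pack("<1000i", *values)  # 4 bytes per value
packed64 = struct.pack("<1000q", *values)  # 8 bytes per value

print(len(packed32), len(packed64))  # 4000 8000

# Pointer width of the current process (32 or 64, platform dependent).
pointer_bits = struct.calcsize("P") * 8
print(pointer_bits)
```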

32-bit pointers suffered a different fate: they disappeared completely. The reason is that 32-bit pointers limit the amount of addressable memory. For example, one of the main advantages of the flat memory model (the main model for programming 32-bit applications on the NT platform), which uses 32-bit pointers, is the ability to create segments of up to 4 GB. The new 64-bit pointers make it possible to address up to 16 TB of memory (1 TB ≈ 10^12 bytes). Modern business applications have real use for this much memory.
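The limitation of 32-bit pointers can be shown with a small model. The sketch below treats addresses as plain integers and shows what is lost when an address above 4 GB is squeezed into 32 bits; the function names and the sample addresses are illustrative.

```python
# Sketch: what happens to an address above 4 GB when stored in 32 bits.
MASK32 = 0xFFFFFFFF


def fits_in_32_bit_pointer(address: int) -> bool:
    """True if the address can be represented by a 32-bit pointer."""
    return 0 <= address <= MASK32


def truncate_to_32_bits(address: int) -> int:
    """Model storing an address in a 32-bit variable: high bits are lost."""
    return address & MASK32


low_address = 0x7FFF0000       # below 4 GB
high_address = 0x1_0000_2000   # just above 4 GB

print(fits_in_32_bit_pointer(low_address))     # True
print(fits_in_32_bit_pointer(high_address))    # False
print(hex(truncate_to_32_bits(high_address)))  # 0x2000, a different location

# 16 TB corresponds to 44 address bits (binary units assumed).
print(16 * 2 ** 40 == 2 ** 44)                 # True
```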

The functions in the Win64 API have undergone minor changes. Only the names of some of them have been changed to reflect the 64-bit platform. In most cases, only the types of parameters that were arguments to function calls were changed. All other advantages (the ability to eliminate the use of swap files, etc.) are associated either with increased addressing volume or with new data types.

Design of system boards. Form factor

Computer motherboard design

The motherboard (also mainboard, abbreviated MB) is a complex multilayer printed circuit board on which the main components of a personal computer are installed: the central processor, the RAM controller and the RAM itself, the boot ROM, and the controllers of the basic input/output interfaces. As a rule, the motherboard has connectors (slots) for additional controllers, which are usually attached via the USB, PCI and PCI Express buses.

CPU

The system logic set (chipset) is a set of chips that connect the CPU to RAM and to the controllers of peripheral devices. As a rule, modern chipsets are built around two VLSI chips: the “north bridge” and the “south bridge”.

The north bridge (Northbridge), also called the MCH (memory controller hub) or system controller, connects the CPU to components that use high-performance buses: RAM and the graphics controller.

To connect the CPU to the system controller, FSB buses such as HyperTransport and SCI can be used.

Typically, RAM is connected to the system controller. In this case, it contains a memory controller. Thus, the maximum amount of RAM, as well as the bandwidth of the memory bus of a personal computer, usually depends on the type of system controller used. But the current trend is to embed the RAM controller directly into the CPU (for example, the memory controller is integrated into the processor in AMD K8 and Intel Core i7), which simplifies the functions of the system controller and reduces heat dissipation.

PCI Express is used as a bus for connecting a graphics controller on modern motherboards. Previously, common buses (ISA, VLB, PCI) and the AGP bus were used.

The south bridge (Southbridge), also called the ICH (I/O controller hub) or peripheral controller, contains controllers for peripheral devices (hard drive, Ethernet, audio), controllers for the buses that connect peripheral devices (PCI, PCI Express and USB), and controllers for buses serving devices that do not need high bandwidth. The LPC bus, for example, connects the boot ROM as well as the multicontroller (Super I/O), a chip that supports “legacy” low-performance data transfer interfaces: the serial and parallel ports and the keyboard and mouse controller.

As a rule, the north and south bridges are implemented as separate VLSI chips, but single-chip solutions also exist. It is the chipset that determines all the key features of the motherboard and which devices can be connected to it.

Random access memory (RAM) is, in computer science, the part of a computer's memory system that the processor can access in a single operation (jump, move, etc.). It is designed to temporarily store the data and commands the processor needs to perform operations. RAM passes data to the processor directly or through cache memory. Each RAM cell has its own individual address.

RAM can be manufactured as a separate unit or included in the design of a single-chip computer or microcontroller.

Boot ROM - stores software that is executed immediately after turning on the power. Typically, the boot ROM contains the BIOS, but may also contain software that runs within the EFI framework.