To translate from one language to another, programs, like people, need an intermediary: in technical terms, a translator.

Translator: basic concepts

A translator is a program that performs a linguistic transformation of computations. An interpreter is a program whose input is a program P together with some input data x; it executes P on x: I(P, x) = P(x). A single interpreter is capable of executing every possible program (every program expressible in a formal system). This is one of Turing's most significant and profound discoveries. A processor is an interpreter of machine-language programs. Writing interpreters for high-level languages directly is, as a rule, too expensive, so such languages are first translated into a form that is easier to interpret.

Some kinds of translators have special names. An assembler translates assembly-language programs into machine language. A compiler translates from a high-level language to a lower-level one. In general, a translator is a program that takes as input a program P in some source language S and produces a program Q in a target language T such that both have the same semantics: P(x) = Q(x) for every input x.

Translating an entire program into an interpretable form before it runs is called ahead-of-time, or AOT, compilation. AOT compilers can be chained; the last one in the chain is very often an assembler. For example: Source code -> Compiler (translator) -> Assembly code -> Assembler (translator) -> Machine code -> CPU (interpreter).

Dynamic, or just-in-time (JIT), compilation occurs when part of a program is translated while other, previously compiled parts are executing. JIT translators remember what they have already translated so as not to repeat the work over and over on the same source. They are even capable of adaptive compilation and recompilation based on the run-time behavior of the program. Many languages make it possible to execute code during translation and to compile new code while the program is running.
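The equation I(P, x) = P(x) can be made concrete with a minimal sketch: a tiny interpreter for a made-up stack language (the instruction set here is invented purely for illustration).

```python
# A minimal interpreter I for a tiny hypothetical stack language:
# instructions are ("push", n), ("add",) and ("mul",); the input x
# starts on the stack, and the final top of stack is the result.
def interpret(program, x):
    """I(P, x): execute program P on input x and return the result."""
    stack = [x]
    for instr in program:
        if instr[0] == "push":
            stack.append(instr[1])
        elif instr[0] == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif instr[0] == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()

# P computes 2*x + 3; interpreting P on x gives the same answer a
# "native" implementation of P would: I(P, x) = P(x).
P = [("push", 2), ("mul",), ("push", 3), ("add",)]
print(interpret(P, 10))  # → 23
```

The same interpreter runs every program expressible in this little formal system, which is the point of Turing's observation above.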

Translation: stages

The translation process consists of analysis and synthesis stages. Schematically, the process looks like this: Source code -> Analyzer -> Conceptual representation -> Synthesizer (generator) -> Target code. This structure is used for the following reasons:

- any other method is simply not suitable;

- word-by-word translation simply does not work.

A useful engineering consequence: if you need to write translators for M source languages and N target languages, you only need to write M + N simple programs (half-compilers) that share a common conceptual representation, rather than M x N full (complex) compilers. In practice, however, it is quite rare for a conceptual representation to be expressive and powerful enough to cover all possible source and target languages, although some projects have come close. Real compilers go through many different stages. When creating your own compiler, you do not have to redo all the hard work that programmers have already put into generators and representations: you can translate your language into JavaScript or C and let existing C compilers and JavaScript engines do the rest, or you can reuse existing intermediate representations and virtual machines.
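The M + N idea can be sketched with one shared intermediate representation and separate "half-compilers" on each side. The IR below (a small tuple tree) and both back ends are invented purely for illustration.

```python
# One front end producing a shared IR, plus two back ends consuming it:
# adding a third target means one more back end, not a whole new compiler.
def front_end(src):
    """Trivially 'parse' expressions like 'a + b' into the IR ('add', 'a', 'b')."""
    left, right = src.split("+")
    return ("add", left.strip(), right.strip())

def back_end_c(ir):
    """Emit C-flavoured source text from the IR."""
    op, l, r = ir
    return f"{l} + {r};"

def back_end_stack(ir):
    """Emit instructions for a hypothetical stack machine from the same IR."""
    op, l, r = ir
    return [("load", l), ("load", r), (op,)]

ir = front_end("x + y")
print(back_end_c(ir))      # → x + y;
print(back_end_stack(ir))  # → [('load', 'x'), ('load', 'y'), ('add',)]
```

With M front ends and N back ends all meeting at this IR, each new source or target language costs one new half-compiler.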

Translator notation

A translator can be a hardware device or a program, and it involves three languages: the source language, the target language, and the base (implementation) language. They can be written in the form of a T-diagram, with the source language on the left, the target language on the right, and the base language at the bottom. There are three types of compilers.

  1. A translator whose source language matches its base language is called self-compiling.
  2. A compiler whose target language equals its base language is called self-resident.
  3. If the target and base languages are different, the translator is a cross-compiler.

Why is it important to distinguish between these types of compilers? Even if you never build a really good compiler, it is worth learning the technology behind one, since the same concepts turn up everywhere: in database query languages, text formatting, advanced computer architectures, graphical interfaces, general optimization problems, machine translation, controllers, and virtual machines. Likewise, if you ever need to write preprocessors, loaders, assemblers, debuggers, or profilers, you will go through the same steps as when writing a compiler. You will also learn better ways to write programs, since developing a translator for a programming language means understanding all of its ambiguities and subtleties. Finally, studying the general principles of translation can make you a better language designer. And that matters: how good is a language that cannot be implemented efficiently?

Large-scale technology

Compiler technology covers a wide range of areas of computer science. It includes formal language theory, grammars, parsing, and computability; computer architecture, instruction sets, CISC and RISC, pipelining, clock cycles, and cores; as well as sequence control, recursion, conditional execution, functional decomposition, iteration, modularity, synchronization, metaprogramming, constants, scope, templates, type inference, annotations, prototypes, threads, mailboxes, monads, wildcards, continuations, transactional memory, regular expressions, polymorphism, inheritance, parameter modes, and so on. Creating a compiler also requires an understanding of programming languages, algorithms and data structures, regular expressions, graph algorithms, and dynamic programming.

Compiler design. Problems that arise when creating a real translator

What problems might arise with the source language? Is it easy to compile? Does it have a preprocessor? How are types handled? What grouping of compiler passes is used: single-pass or multi-pass? The desired degree of optimization also deserves special attention. A quick-and-dirty translation with little or no optimization may be perfectly acceptable, while heavy optimization can slow the compiler down, although the better code it produces at run time may be worth it.

Error handling. Should the translator stop at the first error? When should it stop? Can the compiler be trusted to correct errors?

Required set of tools

If your source language is not too small, having a scanner generator and a parser generator is a prerequisite. There are also special code generators, but they are not very widespread.

As for the type of target code to generate, you can choose among pure machine code, augmented machine code, or virtual machine code. You can also write a front end that produces a popular intermediate representation such as LLVM, JVM bytecode, or RTL, or perform source-to-source translation into JavaScript or C. As for the target code format, the options are portable machine code, a machine-code memory image, or assembly language.

Retargeting

When many code generators are in use, it is convenient for them to share a common front end; for the same reason, it is better for many front ends to share a single code generator.

Compiler components

Let us list the main functional components of a translator that generates machine code, C code, or virtual machine code from an input program:

— the input program enters the lexical analyzer, or scanner, which converts it into a stream of tokens;

— the syntax analyzer (parser) builds an abstract syntax tree from the tokens;

— the semantic analyzer distributes semantic information over the tree and checks its nodes for errors;

— the result is a semantic graph, that is, an abstract syntax tree with added links and attributes;

— the intermediate code generator builds a flow graph (tuples grouped into basic blocks);

— the machine-independent optimizer performs local and global optimization, mostly staying within subroutines, simplifying calculations and removing redundant code; the result is a modified flow graph;

— the target code generator links the basic blocks into straight-line code with control transfers, creating an assembly-language object file with virtual registers, which may not be very efficient;

— the machine-dependent optimizer/linker allocates real registers and memory in place of the virtual registers and performs instruction scheduling, converting the program into true assembly code that takes advantage of pipelining;

— throughout all stages, error-detection subsystems and a symbol table manager are used.

Scanning and lexical analysis. The scanner converts the stream of source code characters into a stream of tokens, removing comments and whitespace and expanding macros. Scanners often face questions such as whether to take into account indentation, letter case, and nested comments.

Errors that may occur during scanning are called lexical errors. They include:

— characters that are not in the language's alphabet;

— exceeding the allowed number of characters in a line or identifier;

— an unclosed string or character literal;

— end of file inside a comment.
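A scanner of the kind just described can be sketched in a few lines. The token names and the comment syntax below are our own choices for illustration; real scanners are usually produced by scanner generators.

```python
import re

# A minimal scanner: turn source characters into a token stream,
# discarding whitespace and '#' comments along the way.
TOKEN_RE = re.compile(r"""
      (?P<NUMBER>\d+)
    | (?P<IDENT>[A-Za-z_]\w*)
    | (?P<OP>[+\-*/=()])
    | (?P<SKIP>\s+|\#[^\n]*)     # whitespace and comments are dropped
""", re.VERBOSE)

def scan(src):
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            # a character outside the alphabet: a lexical error
            raise SyntaxError(f"lexical error: illegal character {src[pos]!r}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(scan("sum = a1 + 42  # total"))
# → [('IDENT', 'sum'), ('OP', '='), ('IDENT', 'a1'), ('OP', '+'), ('NUMBER', '42')]
```

Feeding it a character outside the alphabet raises a SyntaxError, matching the first kind of lexical error listed above.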

Parsing, or syntax analysis, transforms the sequence of tokens into an abstract syntax tree. Each tree node is stored as an object with named fields, many of which are themselves tree nodes; there are no cycles at this stage. When creating a parser, first assess the complexity class of the grammar (LL or LR) and find out whether any disambiguation rules are needed; indeed, some languages require semantic analysis even to parse. Errors that occur at this stage are called syntax errors.
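A recursive-descent parser for a small LL grammar makes the idea concrete. The grammar and the dict-based node shapes below are chosen for illustration; they simply follow the "object with named fields" description above.

```python
# Recursive-descent parser for the LL grammar
#   expr -> term ('+' term)* ; term -> factor ('*' factor)* ; factor -> NUMBER
# Tokens are (kind, text) pairs; each AST node is a dict with named fields.
def parse(tokens):
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else None
    def eat(expected):
        nonlocal pos
        tok = peek()
        if tok is None or (tok[0] != expected and tok[1] != expected):
            raise SyntaxError(f"syntax error: expected {expected}, got {tok}")
        pos += 1
        return tok
    def factor():
        return {"node": "num", "value": int(eat("NUMBER")[1])}
    def term():
        node = factor()
        while peek() == ("OP", "*"):
            eat("*")
            node = {"node": "mul", "left": node, "right": factor()}
        return node
    def expr():
        node = term()
        while peek() == ("OP", "+"):
            eat("+")
            node = {"node": "add", "left": node, "right": term()}
        return node
    tree = expr()
    if peek() is not None:
        raise SyntaxError("syntax error: trailing tokens")
    return tree

tokens = [("NUMBER", "2"), ("OP", "+"), ("NUMBER", "3"), ("OP", "*"), ("NUMBER", "4")]
tree = parse(tokens)
# operator precedence is encoded in the grammar: the tree is 2 + (3 * 4)
print(tree["node"], tree["right"]["node"])  # → add mul
```

A token sequence that violates the grammar raises a SyntaxError: exactly the syntax errors the text describes.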

Semantic analysis

Semantic analysis, first of all, checks the validity rules and links the parts of the syntax tree into a whole to form the semantic graph: inserting implicit type conversions, resolving name references, and so on. Naturally, different programming languages have different sets of validity rules. When compiling Java-like languages, translators may detect the following errors:

— multiple declarations of a variable within its scope;

— violation of accessibility rules;

— presence of references to an undeclared name;

— too many or, conversely, too few arguments in a method call;

- type mismatch.
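A semantic-analysis pass can be sketched as a walk over the program that enforces validity rules. The checker below handles two of the rules listed above (duplicate declarations and undeclared names); the statement shapes are invented for illustration.

```python
# A sketch of a semantic checker over a toy statement list:
# ("decl", name) declares a variable, ("use", name) references one.
def check(stmts):
    declared, errors = set(), []
    for stmt in stmts:
        kind, name = stmt
        if kind == "decl":
            if name in declared:
                errors.append(f"multiple declaration of '{name}'")
            declared.add(name)
        elif kind == "use":
            if name not in declared:
                errors.append(f"reference to undeclared name '{name}'")
    return errors

program = [("decl", "x"), ("decl", "x"), ("use", "y")]
print(check(program))
# → ["multiple declaration of 'x'", "reference to undeclared name 'y'"]
```

A real analyzer would track nested scopes, accessibility, and types in the same traversal, accumulating errors rather than stopping at the first one.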

Generation

Intermediate code generation produces a flow graph composed of tuples grouped into basic blocks; code generation then produces the actual machine code. In traditional compilers for RISC machines, the first step is to generate assembly code with an infinite supply of virtual registers; for CISC machines this is usually not done.
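The "tuples" mentioned above are classic three-address code. A minimal sketch: walk an expression AST (the dict shape used here is for illustration only) and emit one tuple per operation, inventing a fresh temporary for each result.

```python
import itertools

# Emit (op, arg1, arg2, result) tuples from an expression AST;
# returns the name of the place holding the subexpression's value.
def gen(node, code, temps):
    if node["node"] == "num":
        return str(node["value"])
    left = gen(node["left"], code, temps)
    right = gen(node["right"], code, temps)
    result = f"t{next(temps)}"          # fresh virtual register / temporary
    code.append((node["node"], left, right, result))
    return result

ast = {"node": "add",
       "left": {"node": "num", "value": 2},
       "right": {"node": "mul",
                 "left": {"node": "num", "value": 3},
                 "right": {"node": "num", "value": 4}}}
code = []
gen(ast, code, itertools.count(1))
for tup in code:
    print(tup)
# → ('mul', '3', '4', 't1')
# → ('add', '2', 't1', 't2')
```

In a real intermediate-code generator these tuples would be grouped into basic blocks, and the unlimited temporaries (t1, t2, ...) are exactly the virtual registers a later pass maps onto real ones.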


Each specific language is oriented either towards compilation or towards interpretation, depending on the purpose for which it was created. For example, Pascal is usually used to solve rather complex problems in which program speed matters, so this language is usually implemented with a compiler. BASIC, on the other hand, was created as a language for novice programmers, for whom line-by-line execution of a program has undeniable advantages. Sometimes both a compiler and an interpreter exist for the same language; in that case you can use the interpreter to develop and test the program, and then compile the debugged program to improve its execution speed.

What are programming systems?

Modern programming systems usually provide users with powerful and convenient program development tools. These include:

· compiler or interpreter;

· integrated development environment;

· tools for creating and editing program texts;

· extensive libraries of standard programs and functions;

· user-friendly dialogue environment;

· multi-window operating mode;

· powerful graphics libraries;

· utilities for working with libraries;

· built-in assembler;

· built-in help system;

· other specific features.

Popular programming systems include Turbo Basic, Quick Basic, Turbo Pascal, and Turbo C.

Recently, programming systems focused on creating Windows applications have become widespread:

Borland Delphi 3.0

· the Borland Delphi package - a brilliant successor to the Borland Pascal family of compilers, providing high-quality and very convenient visual development tools. Its exceptionally fast compiler allows you to solve virtually any application programming problem efficiently and quickly.

· the Microsoft Visual Basic package - a convenient and popular tool for creating Windows programs using visual tools. It contains tools for creating diagrams and presentations.

· the Borland C++ package - one of the most common tools for developing DOS and Windows applications.

Below, for illustration, are programs in BASIC, Pascal, and C for solving the same simple problem: calculating the sum S of the elements of a one-dimensional array A = (a1, a2, ..., an).

What are tool programs needed for?

In purpose they are similar to programming systems. Tool programs include, for example:

· editors;

· program composition tools;

· debugging programs, i.e. programs that help find and fix errors in the program;

· auxiliary programs that implement frequently used system actions;

· graphics program packages, etc.

Instrumental software can provide assistance at all stages of software development.

What is a text editor?

A text editor is a program for entering and editing text data; this data can be a program, a document, or a book. The edited text is displayed on the screen, and the user can make changes to it interactively.

Text editors can provide a variety of functions, namely:

· editing text lines;

· ability to use different character fonts;

· copying and transferring part of the text from one place to another or from one document to another;

· contextual search and replacement of parts of text;

· setting arbitrary line spacing;

· automatic wrapping of words onto a new line;

· automatic page numbering;

· processing and numbering of footnotes;

· alignment of paragraph edges;

· creation of tables and diagrams;

· checking the spelling of words and selecting synonyms;

· construction of tables of contents and subject indexes;

· printing the prepared text on a printer in the required number of copies, etc.

The capabilities of text editors vary widely: from programs designed for preparing small documents of simple structure to programs for typesetting, design, and complete prepress preparation of books and magazines (publishing systems).

Fig. 6.5. The Microsoft Word editor window

The most famous text editor is Microsoft Word.

Full-featured publishing systems include Microsoft Publisher, Corel Ventura, and Adobe PageMaker. Publishing systems are indispensable for computer layout and graphics. They greatly simplify working with multi-page documents, offering automatic pagination, page numbering, heading creation, and so on. Creating layouts for any publication, from flyers to multi-page books and magazines, becomes very simple, even for beginners.

The actual executors of programming languages are translators and interpreters.

A translator is a program with whose help a computer converts programs fed into it into machine language, since the computer can execute only programs written in the language of its own processor, and algorithms specified in any other language must be translated into machine language before they can be executed.

Translator - a program or hardware device that translates a program.

Program translation - the transformation of a program written in one programming language into a program in another language that is equivalent to the first in terms of execution results. The translator usually also diagnoses errors, builds identifier dictionaries, produces program listings for printing, and so on.

The language in which the input program is written is called the source language, and the program itself the source code. The output language is called the target language, and the output the object code. The purpose of translation is to convert text from one language into another that is understandable to the addressee of the text. In the case of translator programs, the addressee is a technical device (the processor) or an interpreter program.

Translators are implemented as compilers or interpreters, which differ significantly in how they go about their work.

The language of processors (machine code) is low-level. A translator that converts programs into machine language, which is then received and executed directly by the processor, is called a compiler.

A compiler (from the English compiler: one who compiles, gathers) reads the entire program, translates it, and creates a complete machine-language version of the program, which is then executed. The compiler's output is a binary executable file.

The advantage of the compiler: the program is compiled once and no additional transformations are required each time it is executed. Accordingly, a compiler is not required on the target machine for which the program is compiled. Disadvantage: A separate compilation step slows down writing and debugging and makes it difficult to run small, simple, or one-off programs.

If the source language is an assembly language (a low-level language close to machine language), then the compiler of such a language is called an assembler.

Another implementation method is to execute the program using an interpreter, with no translation at all.

An interpreter (from the English interpreter: one who interprets) translates and executes the program line by line.

The interpreter models in software a machine whose fetch-execute cycle operates on instructions in a high-level language rather than on machine instructions. Such software modeling creates a virtual machine that implements the language. This approach is called pure interpretation. Pure interpretation is typically used for languages with a simple structure (for example, APL or Lisp). Command-line interpreters, which process commands in UNIX scripts or in batch files (.bat) in MS-DOS, also usually work in pure interpretation mode.

The advantage of a pure interpreter: the absence of an intermediate translation step simplifies the interpreter's implementation and makes it more convenient to use, including interactively. The disadvantage is that an interpreter must be present on the target machine where the program is to be executed, and, as a rule, there is a more or less significant loss in speed. The property of a pure interpreter that errors in the interpreted program are detected only when an attempt is made to execute the offending command (or line) can be considered both a disadvantage and an advantage.

There are also compromises between compilation and pure interpretation in the implementation of programming languages: before executing the program, the interpreter translates it into an intermediate language (for example, bytecode or p-code) that is more convenient to interpret; that is, an interpreter with a built-in translator. This method is called mixed implementation; Perl is an example. The approach combines advantages of both compiler and interpreter (greater execution speed and ease of use) as well as their disadvantages (additional resources are required to translate and store the program in the intermediate language, and an interpreter must be provided on the target machine). Also, as with a compiler, a mixed implementation requires that the source code be free of errors (lexical, syntactic, and semantic) before execution.
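CPython is another everyday example of the mixed scheme, and it exposes the two steps directly: compile() translates source into a bytecode object, which the interpreter loop then executes.

```python
# The mixed scheme, made explicit with CPython's built-ins:
# first translate to bytecode, then interpret the bytecode.
src = "result = 2 ** 10"
bytecode = compile(src, "<example>", "exec")   # translation step
namespace = {}
exec(bytecode, namespace)                      # interpretation step
print(namespace["result"])  # → 1024
```

Normally these two steps are fused inside the interpreter, but separating them shows why translation errors (here, syntax errors in src) are reported before any of the program runs.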

As computer resources have grown and heterogeneous networks (including the Internet) connecting computers of different types and architectures have expanded, a new kind of interpretation has emerged, in which source (or intermediate) code is compiled into machine code directly at run time, on the fly. Already-compiled sections of code are cached so that when they are reached again they receive control immediately, without recompilation. This approach is called dynamic compilation.

The advantage of dynamic compilation is that the speed of program interpretation becomes comparable to the speed of program execution in conventional compiled languages, while the program itself is stored and distributed in a single form, independent of target platforms. The disadvantage is greater implementation complexity and greater resource requirements than in the case of simple compilers or pure interpreters.

This method works well for web applications. Accordingly, dynamic compilation has appeared and is supported to one degree or another in implementations of Java, the .NET Framework, Perl, and Python.

Once a program is compiled, neither the program's source code nor a compiler is needed to run it. A program processed by an interpreter, by contrast, must be re-translated into machine language each time it is launched: the source file itself is what gets executed.

Compiled programs run faster, but interpreted ones are easier to fix and change.

Each specific language is oriented either towards compilation or towards interpretation, depending on the purpose for which it was created. For example, C++ is usually used to solve rather complex problems in which program speed matters, so this language is implemented using a compiler.

To make programs in interpreted programming languages run faster, translation into intermediate bytecode can be used. Languages that allow this include Java, Python, and some others.
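In CPython this intermediate bytecode can be inspected with the standard dis module; the exact instruction names vary between Python versions, so treat the listing below as indicative.

```python
import dis

# Peek at the bytecode CPython compiles a function into.
def add(a, b):
    return a + b

dis.dis(add)
# prints instructions such as LOAD_FAST and BINARY_ADD (BINARY_OP
# in newer Python versions), followed by RETURN_VALUE
```

It is this bytecode, not the source text, that the virtual machine's fetch-execute loop actually runs, which is where the speed gain over re-parsing source lines comes from.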

Algorithm for a simple interpreter:

1. read an instruction;

2. analyze the instruction and determine the appropriate actions;

3. perform those actions;

4. if the program-termination condition has not been reached, read the next instruction and go to step 2.
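The four steps above can be sketched as a fetch-analyze-execute loop. The instruction set here (set/add/print/halt) is invented for illustration only.

```python
# The simple-interpreter algorithm as code, for a tiny hypothetical
# instruction set: ("set", var, n), ("add", var, n), ("print", var), ("halt",).
def run(program):
    env, pc, output = {}, 0, []
    while True:
        instr = program[pc]        # step 1: read an instruction
        op = instr[0]              # step 2: analyze it
        if op == "set":            # step 3: perform the action
            env[instr[1]] = instr[2]
        elif op == "add":
            env[instr[1]] += instr[2]
        elif op == "print":
            output.append(env[instr[1]])
        elif op == "halt":         # step 4: termination condition reached
            return output
        pc += 1                    # otherwise continue with the next instruction

print(run([("set", "x", 1), ("add", "x", 41), ("print", "x"), ("halt",)]))
# → [42]
```

Note that no resulting program is produced anywhere: the interpreter simply carries out the source program's meaning, which is exactly the fundamental difference from a translator discussed below.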


The concept of translation applies not only to programming languages but also to other computer languages, such as markup languages like HTML, and to natural languages such as English or Russian. Here, however, we are concerned only with programming languages.

Types of translators

  • Address. A functional device that converts a virtual address into a real memory address.
  • Dialog. Provides the use of a programming language in time-sharing mode.
  • Multi-pass. Forms an object module over several passes through the source program.
  • Reverse. The same as a detranslator. See also: decompiler, disassembler.
  • Single-pass. Forms an object module in one sequential pass through the source program.
  • Optimizing. Performs code optimization in the generated object module.
  • Syntax-oriented (syntax-driven). Receives as input a description of the syntax and semantics of a language together with text in that language, and translates the text according to the given description.
  • Test. A set of assembly-language macros that allow various debugging procedures to be set up in programs written in assembly language.

Implementations


There are a number of other examples in which the architecture of a developed series of computers was based on, or depended heavily on, some model of program structure. Thus, the GE/Honeywell Multics series was based on a semantic model of the execution of programs written in PL/1. The Burroughs B5500, B6700 ... B7800 series was based on a model of run-time programs written in extended ALGOL. ...

The i432 processor, like these earlier architectures, is also based on a semantic model of program structure. However, unlike its predecessors, the i432 is not based on the model of a specific programming language. Instead, the developers' main goal was to provide direct run-time support both for abstract data (that is, programming with abstract data types) and for domain-specific operating systems. …


Almost all translators (both compilers and interpreters) contain most of the following processes in one form or another: lexical analysis; parsing; semantic analysis; generation of internal program representation; optimization; generating an object program. A translator is a program that translates an input program in the source (input) language into an equivalent output program in the resulting (output) language. Three programs are always involved in the operation of a translator: 1) the translator itself is a program, usually it is part of the system software of the computing system. That is, the translator is a part of the software. It is a set of machine instructions and data and is executed by a computer, like all other programs within the operating system. 2) the initial data for the operation of the translator is the text of the input program - a certain sequence of sentences in the input programming language. This file must contain program text that satisfies the syntactic and semantic requirements of the input language. 3) the output data of the translator is the text of the resulting program. The resulting program is built according to syntactic rules specified in the output language of the translator, and its meaning is determined by the semantics of the output language. An important requirement in defining a translator is the equivalence of the input and output programs, that is, the coincidence of their meaning in terms of the semantics of the input language (for the source program) and the semantics of the output language (for the resulting program). To create a translator, you must first select the input and output languages. From the point of view of transforming sentences of the input language into equivalent sentences of the output language, the translator acts as a translator. 
The result of the translator's work will be the resulting program if the text of the source program is correct - does not contain errors in terms of the syntax and semantics of the input language. If the source program is incorrect, the translator will produce an error message. Apart from the concept "translator" a concept close in meaning to it is also widely used "compiler". A compiler is a translator that translates a source program into an equivalent object program in machine command language or assembly language. A compiler differs from a translator only in that its resulting program must always be written in machine code or assembly language. The resulting compiler program is called "object program" or "object code". The file in which it is written is usually called "object file". A program generated by a compiler cannot be directly executed on a computer, since it is not tied to a specific memory area where its code and data should be located. Compilers are by far the most common type of translator. They have the widest practical application, which is due to the widespread use of all kinds of programming languages. Now in modern systems In programming, compilers began to appear in which the resulting program was created not in machine command language or assembly language, but in some intermediate language. It cannot be directly executed on a computer, but requires a special intermediate interpreter to execute programs written on it. An interpreter is a program that accepts an input program in the source language and executes it. Unlike translators, interpreters do not generate a resulting program - and in this fundamental difference between them. The interpreter, like the translator, analyzes the text of the source program. But it does not generate the resulting program, but immediately executes the original one in accordance with its meaning, given by the semantics of the input language. 
Thus, the result of the interpreter's work is either the desired result (if the program is correct) or an error message. To execute the source program, the interpreter must convert it into machine code. The resulting machine codes are not available to the user: they are generated by the interpreter, executed, and discarded as needed. The user sees only the result of executing these codes, that is, the result of executing the original program.
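The equivalence requirement discussed above can be illustrated with a toy example. The following Python sketch is purely illustrative: the mini-language (nested tuples of operators, numbers, and variable names) is invented here, not taken from any real system. It places an interpreter, which computes I(P, x) = P(x) directly, next to a translator, which emits an equivalent program Q such that P(x) = Q(x) for any x.

```python
import operator

# A tiny invented expression language: nested tuples like ("+", 2, ("*", 3, "x")).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def interpret(expr, env):
    """Interpreter: walks the source form and computes the result directly."""
    if isinstance(expr, tuple):
        op, a, b = expr
        return OPS[op](interpret(a, env), interpret(b, env))
    if isinstance(expr, str):          # variable reference
        return env[expr]
    return expr                        # literal number

def compile_expr(expr):
    """Translator: emits an equivalent program (here, a Python lambda)."""
    def gen(e):
        if isinstance(e, tuple):
            op, a, b = e
            return f"({gen(a)} {op} {gen(b)})"
        if isinstance(e, str):
            return f"env[{e!r}]"
        return repr(e)
    return eval(f"lambda env: {gen(expr)}")

program = ("+", 2, ("*", 3, "x"))
compiled = compile_expr(program)
# Equivalence requirement: for any x, P(x) == Q(x).
assert interpret(program, {"x": 5}) == compiled({"x": 5}) == 17
```

Both routes yield the same value for any binding of `x`, which is exactly the "coincidence of meaning" the definition demands.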


Purpose of translators, compilers and interpreters

The first compilers were compilers from assembly languages or, as they were called, mnemonic codes. Mnemonic codes turned program text written in machine instruction language into a language more or less understandable to a specialist. It became much easier to create programs, but no computer can execute mnemonic code directly; accordingly, the need arose to create compilers. The next stage was the creation of high-level languages. These represent an intermediate link between purely formal languages and the natural languages people use to communicate. From the former they inherited a strict formalization of the syntactic structure of sentences; from the latter, a significant part of the vocabulary and the semantics of basic constructions and expressions. The emergence of high-level languages greatly simplified programming. However, computers of traditional architecture, which understand only machine instructions, still predominate, so the creation of compilers remains relevant. Compilers have been and continue to be created not only for new languages but also for long-established ones. Since most theoretical results in the field of compilation had received practical implementation by the late 1960s, compiler development has since followed the path of convenience for the user, the developer of programs in high-level languages. The logical conclusion of this process was the creation of programming systems: software systems that combine, in addition to the compilers themselves, many related components. Today, compilers are an integral part of any computing system. Without them, programming any applied task would be difficult, if not simply impossible; and programming of specialized system tasks is, as a rule, carried out, if not in a high-level language, then in assembly language, so an appropriate compiler is used there as well.
Compilers are usually somewhat simpler to implement than interpreters. They are also superior in efficiency: compiled code will, as a rule, execute faster than an interpretation of the same source program. In addition, not every programming language permits a simple interpreter. However, interpreters have one significant advantage: compiled code is always tied to the architecture of the computing system it targets, whereas the source program is tied only to the semantics of the programming language, which is much easier to standardize. The first compilers were mnemonic code compilers. Their descendants, modern assembly language compilers, exist for almost all known computing systems and are extremely architecture-oriented. Then came compilers for languages such as Fortran and ALGOL-68, aimed at large computers with batch processing of tasks. Of these, only Fortran remains in wide use today, thanks to its enormous number of libraries for various purposes. C and C++ compilers dominate the software market. The first was born together with UNIX-type operating systems and later spread to other operating systems; the second successfully embodied the ideas of object-oriented programming on a well-proven practical basis. Initially, interpreters were not given much importance, since they are inferior to compilers in almost all respects. The situation has changed, however, as the portability of programs and their independence from the hardware platform has become increasingly relevant with the development of the Internet. The best-known example today is the Java language (which itself combines compilation and interpretation) and the related JavaScript. In addition, HTML, the markup language delivered over the HTTP protocol, is also an interpreted language.

Translation stages. General scheme of translator operation

The compilation process consists of two main stages: analysis and synthesis. At the analysis stage, the text of the source program is recognized, and identifier tables are created and filled in. The result of this stage is an internal representation of the program that the compiler can work with. At the synthesis stage, the text of the resulting program is generated from this internal representation and the information contained in the identifier tables. The result of this stage is object code. In addition, the compiler contains a part responsible for analyzing and reporting errors, which, if there is an error in the source program text, should inform the user as fully as possible about the type of error and where it occurred; at best, the compiler can offer the user a way to correct it. These stages, in turn, consist of smaller steps called compilation phases. From the point of view of the theory of formal languages, the compiler as a whole performs two main functions. First, it is a recognizer for the language of the source program: it receives a chain of symbols of the input language, checks that the chain belongs to the language, and identifies the rules by which the chain was built. The generator of input language chains is the user, the author of the input program. Second, the compiler is a generator for the language of the resulting program: it must construct an output chain according to certain rules, typically in machine instruction language or assembly language. The lexical analyzer (scanner) is the part of the compiler that reads the characters of the program in the source language and builds from them the words (tokens) of the source language. The input of the lexical analyzer is the text of the source program; its output is passed on for further processing by the compiler at the parsing stage.
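A minimal scanner can be sketched in a few lines of Python. The token kinds and patterns below are invented for illustration and do not come from any particular compiler; real scanners handle keywords, comments, string literals, and source positions as well.

```python
import re

# Illustrative token classes for a toy language: numbers, identifiers,
# single-character operators, and whitespace (which is discarded).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Yield (kind, lexeme) pairs -- the scanner's output for the parser."""
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if not m:
            raise SyntaxError(f"illegal character {source[pos]!r} at {pos}")
        pos = m.end()
        if m.lastgroup != "SKIP":        # whitespace carries no token
            yield (m.lastgroup, m.group())

print(list(tokenize("x = 2 + 31")))
# → [('IDENT', 'x'), ('OP', '='), ('NUMBER', '2'), ('OP', '+'), ('NUMBER', '31')]
```

Note how the scanner already reports one class of errors (illegal characters), while everything about the arrangement of tokens is left to the parsing phase.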
The parser is the main part of the compiler in the parsing phase. It extracts syntactic constructs from the source program text preprocessed by the lexical analyzer, and in the same phase the syntactic correctness of the program is checked. The parser plays the main role: that of the recognizer of text in the input programming language. Semantic analysis is the part of the compiler that checks the correctness of the source program text with respect to the semantics of the input language. In addition to this verification, semantic analysis must perform the text transformations required by the semantics of the input language. Preparation for code generation is the phase in which the compiler performs preliminary actions directly related to the synthesis of the resulting program text but not yet producing text in the target language. Code generation is the phase that actually produces the commands making up the sentences of the output language and, as a whole, the text of the resulting program; it is the main phase of the synthesis stage. Besides generating the text itself, generation usually also includes optimization, a process that works on the already generated text. Identifier tables (sometimes "symbol tables") are specially organized data sets used to store information about the elements of the source program, which is then used to generate the text of the resulting program. A particular compiler implementation may have one or several identifier tables. The elements of the source program whose information must be stored during compilation are variables, constants, functions, and so on; the specific set of elements depends on the input programming language.
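The recognizer role of the parser, together with a simple identifier-table-based semantic check, can be sketched as follows. The grammar here is hypothetical and tiny (only `+` and `*` over numbers and identifiers), and the "identifier table" is just a set of declared names; it serves only to show how parsing builds a tree while semantic analysis rejects undeclared identifiers.

```python
def parse(tokens, declared):
    """Recursive-descent parser with a toy semantic check:
    every identifier must appear in the 'declared' table."""
    toks = list(tokens) + [("EOF", "")]
    pos = 0

    def peek():
        return toks[pos]

    def eat():
        nonlocal pos
        tok = toks[pos]
        pos += 1
        return tok

    def factor():
        kind, text = eat()
        if kind == "NUMBER":
            return ("num", int(text))
        if kind == "IDENT":
            if text not in declared:            # semantic analysis
                raise NameError(f"undeclared identifier {text!r}")
            return ("var", text)
        raise SyntaxError(f"unexpected token {text!r}")

    def term():                                  # '*' binds tighter than '+'
        node = factor()
        while peek() == ("OP", "*"):
            eat()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == ("OP", "+"):
            eat()
            node = ("+", node, term())
        return node

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError("trailing input")
    return tree

# tokens as a scanner would deliver them for "a + 2 * b"
tokens = [("IDENT", "a"), ("OP", "+"), ("NUMBER", "2"), ("OP", "*"), ("IDENT", "b")]
print(parse(tokens, declared={"a", "b"}))
# → ('+', ('var', 'a'), ('*', ('num', 2), ('var', 'b')))
```

The nested-tuple result is one possible internal representation of the program; a real compiler would pass it on to the preparation and code generation phases.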
More generally: in the lexical analysis phase, lexemes are extracted from the text of the input program as they are needed by the next phase, parsing. Parsing and code generation can be performed simultaneously. Thus, these three compilation phases can work in combination, and preparation for code generation can be performed along with them.
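Code generation, and the subsequent interpretation of the object code by a "CPU", can be sketched for the same kind of tree-shaped internal representation. The stack-machine instruction set below is invented for illustration; real compilers target an actual machine instruction set or assembly language.

```python
def gen_code(node, out):
    """Code generation sketch: emit instructions for a hypothetical
    stack machine from an AST like ('+', ('num', 2), ('var', 'x'))."""
    kind = node[0]
    if kind == "num":
        out.append(("PUSH", node[1]))
    elif kind == "var":
        out.append(("LOAD", node[1]))
    else:                                  # binary operator node
        gen_code(node[1], out)
        gen_code(node[2], out)
        out.append(("ADD" if kind == "+" else "MUL",))
    return out

code = gen_code(("+", ("num", 2), ("*", ("num", 3), ("var", "x"))), [])
print(code)
# → [('PUSH', 2), ('PUSH', 3), ('LOAD', 'x'), ('MUL',), ('ADD',)]

def run(code, env):
    """A tiny 'CPU' interpreting the generated object code."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "LOAD":
            stack.append(env[instr[1]])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr[0] == "ADD" else a * b)
    return stack.pop()

assert run(code, {"x": 5}) == 17
```

The generated instruction list plays the role of the object program: it is equivalent in meaning to the source tree but expressed in the (invented) output language of the stack machine.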

The concept of a pass. Multi-pass and single-pass compilers

The process of compiling programs consists of several phases. In real compilers, the composition of these phases may differ slightly: some may be split into components, while others, on the contrary, are combined into a single phase. Real compilers, as a rule, translate the source program text in several passes.

A pass is the process of sequentially reading data from external memory by the compiler, processing it, and placing the result of the work in external memory. Most often, a single pass involves executing one or more compilation phases. The result of intermediate passes is the internal representation of the source program, the result of the last pass is the resulting object program.

Any storage medium can serve as this external memory: the computer's RAM, magnetic disk drives, magnetic tapes, and so on. Modern compilers, as a rule, strive to make maximum use of the computer's RAM and resort to hard disk drives only when the available memory is insufficient.

As each pass executes, the compiler has access to the information obtained in all previous passes. It tends to use primarily the information from the pass immediately preceding the current one, but in principle it can also access data from earlier passes, all the way back to the source code of the program. The information obtained by the compiler during its passes is not available to the user: it is either stored in RAM, which the compiler releases once translation is complete, or written to temporary files on disk, which are likewise destroyed after the compiler finishes. The person working with the compiler therefore need not even know how many passes it performs; the user always sees only the text of the source program and the resulting object program. Nevertheless, the number of passes is an important technical characteristic of a compiler, and reputable compiler developers usually state it in the description of their product.

Reducing the number of passes increases a compiler's speed and reduces the memory it requires. The ideal is a one-pass compiler, which takes the source program as input and immediately produces the resulting object program.

However, it is not always possible to reduce the number of passes. The required number of passes is determined primarily by the grammar and semantic rules of the source language: the more complex the grammar and the more variants the semantic rules allow, the more passes the compiler will perform.

One-pass compilers are rare and are possible only for very simple languages. Real compilers are multi-pass, typically performing two to five passes. The most common are two- and three-pass compilers, for example: the first pass is lexical analysis; the second, parsing and semantic analysis; the third, code generation and optimization (the actual division, of course, depends on the developer). In modern programming systems, the first pass (lexical analysis of the code) is often performed in parallel with editing the source program text.

Interpreters. Features of constructing interpreters

An interpreter is a program that accepts an input program in a source language and executes it. The main difference between interpreters and translators or compilers is that an interpreter does not generate a resulting program; it simply executes the original one. The term "interpreter", like "translator", means "one who renders". The simplest way to implement an interpreter would be to translate the source program completely into machine instructions and then execute them immediately. Such an implementation would differ little from a compiler, except that the resulting program would be inaccessible to the user. Its disadvantage would be that the user must wait for the entire source program to be compiled before execution can begin; in essence, such an interpreter would have little point, since it provides no advantage over a similar compiler. Therefore, the vast majority of interpreters execute the source program sequentially, as it arrives at the interpreter's input. The user then does not have to wait for the whole program to be compiled; he can enter the program incrementally and immediately observe the result of each command as it is entered. With this mode of operation, a significant feature appears that distinguishes the interpreter from the compiler: if the interpreter executes commands as they arrive, it cannot optimize the source program. Consequently, the overall structure of an interpreter has no optimization phase; otherwise it differs little from the structure of a similar compiler. Not all programming languages allow the construction of interpreters that can execute the source program as commands are received: for this, the language must admit a compiler that parses the source program in a single pass.
In addition, a language cannot be interpreted command-by-command if it allows references to functions and data structures before their declaration. The absence of an optimization phase means that program execution under an interpreter is less efficient than under a comparable compiler; thus interpreters always lose to compilers in performance. The interpreter's advantage is that program execution does not depend on the architecture of the target computing system. The result of compilation is object code, which is always oriented to a specific architecture; to move to another target architecture, the program must be compiled again. To interpret a program, one needs only its source text and an interpreter for the appropriate language. For a long time, interpreters existed only for a limited circle of relatively simple programming languages (such as Basic), while high-performance professional software development tools were built on compilers. A new impetus for the development of interpreters came from the spread of global computer networks. Such networks may include computers of different architectures, and then the requirement that the same source program text execute uniformly on each of them becomes decisive. Therefore, with the development of global networks and the spread of the World Wide Web, many new systems have appeared that interpret the text of the source program. Modern programming systems include implementations that combine both compiler and interpreter functions: depending on the user's requirements, the source program is either compiled or executed (interpreted). Some modern programming languages involve two stages: first, the source program is compiled into intermediate code, and then this compilation result is executed by an interpreter of the intermediate language.
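The command-at-a-time mode of operation can be sketched as follows. The toy statement forms (`let` and `print`) are invented for illustration; the point is that each line is executed as soon as it is read, so the interpreter never sees the whole program at once and has no opportunity for whole-program optimization.

```python
# A sketch of a statement-at-a-time interpreter: each input line is
# executed immediately upon arrival. The two toy statement forms are
# 'let NAME = EXPR' and 'print EXPR'.
def run_line(line, env, output):
    line = line.strip()
    if line.startswith("let "):
        name, expr = line[4:].split("=", 1)
        env[name.strip()] = eval(expr, {}, env)   # toy language, trusted input only
    elif line.startswith("print "):
        output.append(eval(line[6:], {}, env))
    elif line:
        raise SyntaxError(f"unknown statement: {line!r}")

env, output = {}, []
for source_line in ["let x = 2 + 3", "let y = x * 4", "print y - 1"]:
    run_line(source_line, env, output)    # executed immediately, in order

print(output)   # → [19]
```

Note that a forward reference such as `print y` before `let y = ...` would fail at the moment the line is executed, which mirrors the restriction stated above: command-by-command interpretation requires declarations to precede use.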
An example of an interpreted language is HTML (hypertext markup language), a language for describing hypertext; the Java and JavaScript languages, by contrast, combine compilation and interpretation functions.