Merriam-Webster's online dictionary defines a database as a large collection of data organized in a special way to allow rapid search and retrieval (for example, by a computer).

A database management system (DBMS) is, as a rule, a set of libraries, applications and utilities that frees the application developer from worrying about the details of data storage and management. The DBMS also provides facilities for searching and updating records.

Over the years, many DBMSs have been created to solve various types of data storage problems.

Database types

In the 1960s and 70s, databases were developed that in one way or another solved the problem of repeating groups. These techniques led to the creation of models of database management systems. The basis for such models, which are still used today, was research conducted at IBM.

One of the fundamental design factors of early DBMSs was efficiency. It is much easier to manipulate database records that have a fixed length, or at least a fixed number of elements per record (columns per row). This avoids the problem of repeating groups. Anyone who has programmed in a procedural language will easily understand that in this case each database record can be read into a simple C structure. However, in real life such convenient situations are rare, so programmers have to deal with less conveniently structured data.

Database with network structure

The network model introduces pointers into databases: records contain links to other records. For example, you can store a record for each customer. Each customer has placed numerous orders with us over a period of time. The data is arranged so that the customer record contains a pointer to exactly one order record. Each order record contains both the data for that specific order and a pointer to another order record. In the currency converter application that we worked on earlier, we could then use a structure that would look something like this (Fig. 1):

Fig. 1. Structure of currency converter records

The data is loaded and a linked list for languages is obtained (Fig. 2); the links between records are what give the network model its name:

Fig. 2. Linked list

The two different types of records shown in the figure will be stored separately, each in their own table.

Of course, it would be better if the names of the languages were not repeated in the database over and over again. It would probably be better to introduce a third table containing the languages and an identifier (often an integer) that would be used to refer to a language table entry from records of another type. This identifier is called a key.

The network database model has several important advantages. If you want to find all records of one type related to a specific record of another type (for example, the languages spoken in one country), you can do this very quickly by following the pointers, starting with the specified record.

There are, however, some disadvantages. If we wanted a list of countries where French was spoken, we would have to follow the links of all the country records, and for large databases this would be very slow. This can be corrected by creating other linked lists of pointers specifically for languages, but this solution quickly becomes too complex and is certainly not universal, since it is necessary to decide in advance how the links will be organized.

Additionally, writing an application that uses the network database model is quite tedious because it is usually the application's responsibility to create and maintain pointers as records are updated and deleted.
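The pointer chain described in this section can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular network DBMS stores its data; the class names Customer and Order, their fields and the sample orders are invented for the example.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Order:
    description: str
    next_order: Optional["Order"] = None  # pointer to the customer's next order


@dataclass
class Customer:
    name: str
    first_order: Optional[Order] = None  # pointer to exactly one order record


def orders_of(customer: Customer) -> Iterator[Order]:
    """Follow the pointer chain from the customer through all of its orders."""
    order = customer.first_order
    while order is not None:
        yield order
        order = order.next_order


# The chain is built by hand: the application, not the DBMS, maintains the pointers.
third = Order("toothbrush")
second = Order("large fan", next_order=third)
first = Order("small fan", next_order=second)
customer = Customer("Alice", first_order=first)

order_names = [order.description for order in orders_of(customer)]
print(order_names)
```

Finding all orders of one customer is a simple walk along the chain, but answering the reverse question (which customers ordered a toothbrush) would require visiting every customer, which is the weakness discussed above.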

Hierarchical database model

In the late 1960s, IBM used a hierarchical database model in the IMS DBMS. In this model, the problem of repeating groups was solved by representing some records as consisting of sets of others.

This can be thought of as a "bill of materials" that is used to describe the components of a complex product. For example, a car consists of (say) a chassis, a body, an engine and four wheels. Each of these main components in turn consists of several others. An engine includes several cylinders, a cylinder head, and a crankshaft. These components again consist of smaller ones, and so on down to the nuts and bolts that go into any part of the car.

The hierarchical database model is still used today. A hierarchical DBMS can optimize data storage for certain specific issues, such as being able to easily determine which car uses a particular part.

Relational database model

A huge leap in the development of the theory of database management systems occurred in 1970, when E. F. Codd's paper "A Relational Model of Data for Large Shared Data Banks" was published. This truly revolutionary work introduced the concept of relations and showed how tables could be used to represent facts that establish relationships with, and therefore store data about, "real world" objects.

By this time, it had already become obvious that efficiency, the achievement of which was initially fundamental in database design, was not as important as data integrity. The relational model places much more importance on data integrity than any other model previously used.

A relational database management system is defined by a set of rules. First, a table entry is called a “tuple,” and this is the term used in some of the PostgreSQL documentation. A tuple is an ordered group of components (or attributes), each of which belongs to a specific type. All tuples are built according to the same template, they all have the same number of components of the same types. Here's an example of a set of tuples:

("France", "FRF", 6.56) ("Belgium", "BEF", 40.1)

Each of these tuples consists of three attributes: country name (string type), currency (string type), and exchange rate (floating-point type). In a relational database, all records added to this set (or table) must follow this same form, so a record with a missing attribute or with extra attributes cannot be added.

Moreover, no table can have duplicate tuples. That is, duplicate rows or records are not allowed in any relational database table.

This may seem draconian: for a system that stores orders placed by customers, it appears to mean that the same customer cannot order the same product twice.

Each attribute of a record must be "atomic", that is, it must be a simple piece of information, not another record or a list of other attributes. In addition, the types of the corresponding attributes in each record must match, as shown above. Technically this means that they must come from the same set of values, or domain. In practice, they must all be strings, or integers, or floating-point numbers, or belong to some other type supported by the DBMS.

The attribute that distinguishes otherwise identical records is called a key. In some cases, a combination of several attributes can act as a key.

An attribute (or set of attributes) designed to distinguish a table record from all other records in that table (in other words, to make a record unique) is called a primary key. In a relational database, every relation (table) must have a primary key: something that makes each record different from all the others in that table.
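The rules about tuple shape and primary keys can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not PostgreSQL: the table name rate and its column names are assumptions made for the example, and SQLite is used only because it needs no server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rate ("
    " country TEXT PRIMARY KEY,"  # the key that makes each tuple unique
    " currency TEXT NOT NULL,"
    " exchange_rate REAL NOT NULL)"
)
conn.execute("INSERT INTO rate VALUES ('France', 'FRF', 6.56)")
conn.execute("INSERT INTO rate VALUES ('Belgium', 'BEF', 40.1)")

# A tuple with too few attributes does not fit the table's form.
try:
    conn.execute("INSERT INTO rate VALUES ('Germany', 'DEM')")
    short_tuple_rejected = False
except sqlite3.Error:
    short_tuple_rejected = True

# A second tuple with the same primary key value is a duplicate.
try:
    conn.execute("INSERT INTO rate VALUES ('France', 'FRF', 6.56)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM rate").fetchone()[0]
print(short_tuple_rejected, duplicate_rejected, row_count)
```

Both bad inserts are refused by the database itself, so the table still holds exactly the two valid tuples.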

The final rule that defines the structure of a relational database is referential integrity. This requirement reflects the fact that at any given time all database records must be meaningful. The developer of an application that interacts with a database must take care that the code does not violate the integrity of the database. Imagine what happens when a customer is deleted. If a customer is removed from the CUSTOMER relation, all of that customer's orders must also be removed from the ORDERS table. Otherwise, there will be records of orders that are not associated with any customer.
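Many DBMSs can enforce this rule themselves through foreign keys. The sketch below uses Python's sqlite3 module; the column names and the ON DELETE CASCADE choice are assumptions for illustration (other systems and other designs may instead forbid deleting a customer who still has orders).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, lname TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL"
    "   REFERENCES customer(customer_id) ON DELETE CASCADE)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Stones')")
conn.execute("INSERT INTO orders VALUES (10, 1)")
conn.execute("INSERT INTO orders VALUES (11, 1)")

# An order that points at a customer who does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (12, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the customer removes that customer's orders as well.
conn.execute("DELETE FROM customer WHERE customer_id = 1")
remaining_orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(orphan_rejected, remaining_orders)
```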

My next posts will provide more detailed theoretical and practical information about relational databases. For now, remember that the relational model is built on mathematical concepts such as sets and relations, and that there are certain rules to follow when creating systems.

Query languages: SQL and others

Relational database management systems, of course, provide ways to add and update data, but this is not the main thing; the power of such systems is that they provide the user with the ability to ask questions about the stored data in a special query language. Unlike earlier databases, which were specifically designed to answer certain types of questions about the information they contained, relational databases are much more flexible and answer questions that were not yet known when the database was created.

Codd's relational model exploits the fact that relations define sets, and sets can be processed mathematically. Codd suggested that queries could be based on a branch of formal logic, the predicate calculus, and query languages were built on that foundation. This approach provides unprecedented power for searching and retrieving sets of data.

One of the first query languages to be implemented was QUEL; it was used in the Ingres database created in the late 1970s. Another query language, which took a different approach, was QBE (Query By Example). Around the same time, a group working at IBM Research developed the Structured Query Language (SQL), a name usually pronounced "sequel."

SQL is a standardized query language. Its most commonly cited definition is the ISO/IEC 9075:1992 standard, "Information Technology - Database Languages - SQL" (or, more simply, SQL92), and its American counterpart ANSI X3.135-1992, which differs from the first only in a few cover pages. These standards replaced the earlier SQL89. There is also a later standard, SQL99, but it has not yet become widespread, and most of its updates do not affect the core SQL language.

There are three levels of SQL92 compliance: Entry SQL, Intermediate SQL and Full SQL. The most common is the "Entry" level, and PostgreSQL comes very close to this, although there are some minor differences. Developers are fixing minor omissions, and with each new version PostgreSQL is getting closer to the standard.

There are three types of commands in SQL:

  • Data Manipulation Language (DML): the part of SQL that is used 90% of the time. It consists of commands to add, delete, update and, most importantly, retrieve data from the database.
  • Data Definition Language (DDL): commands for creating tables and managing other aspects of the database that are structured at a higher level than the data they hold.
  • Data Control Language (DCL): a set of commands that control access rights to data. Many database users never use such commands because they work in large companies where a dedicated database administrator (or even several) manages the database and controls access rights.

SQL

SQL is almost universally recognized as the standard query language and, as mentioned, is described in many international standards. These days, almost every DBMS supports SQL to some degree. This promotes unification because an application written using SQL as a database interface can be ported and used on another database at a low cost in terms of time and effort.

However, market pressure forces database vendors to create different products. This is how several dialects of SQL appeared, which was facilitated by the fact that the standard describing the language does not define commands for many database administration tasks, which are a necessary and very important component when using the database in the real world. Therefore, there are differences between the SQL dialects adopted by (for example) Oracle, SQL Server and PostgreSQL.

SQL will be described throughout the book, but for now here are a few examples to show what the language is like. It turns out that in order to start working with SQL, you don’t have to learn its formal rules.

Let's create a new table in the database using SQL. This example creates a table for the items offered for sale that will be included in the order:

CREATE TABLE item (
    item_id serial,
    description char(64) not null,
    cost_price numeric(7,2),
    sell_price numeric(7,2)
);

Here we have determined that the table needs an identifier to act as a primary key, and that it must be automatically generated by the database management system. The identifier is of type serial, which means that every time a new item is added to the table, the next value is taken from a sequence to give a new, unique item_id. The description is a text attribute of 64 characters that must always be supplied (not null). The cost price (cost_price) and the selling price (sell_price) are defined as fixed-point numbers of up to seven digits with two decimal places.

Now we use SQL to populate the newly created table. There is nothing complicated about this:

INSERT INTO item(description, cost_price, sell_price) VALUES('Fan Small', 9.23, 15.75);
INSERT INTO item(description, cost_price, sell_price) VALUES('Fan Large', 13.36, 19.95);
INSERT INTO item(description, cost_price, sell_price) VALUES('Toothbrush', 0.75, 1.45);

The basis of SQL is the SELECT statement. It is used to create result sets - groups of records (or record attributes) that meet some criterion. These criteria can be quite complex. Result sets can be used as targets for changes made by an UPDATE statement or deletions made by a DELETE statement.

Here are some examples of using the SELECT statement:

SELECT *
FROM customer, orderinfo
WHERE orderinfo.customer_id = customer.customer_id
ORDER BY customer.customer_id;

SELECT customer.title, customer.fname, customer.lname,
       COUNT(orderinfo.orderinfo_id) AS "Number of orders"
FROM customer, orderinfo
WHERE customer.customer_id = orderinfo.customer_id
GROUP BY customer.title, customer.fname, customer.lname;

These SELECT statements list all customer orders in the specified order and count the number of orders placed by each customer.
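To see the counting query at work without installing a server, it can be run against an in-memory SQLite database from Python. This is a sketch under assumptions: the column definitions are a guess at a minimal customer and orderinfo schema, the sample rows are invented, and an ORDER BY has been added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        title TEXT, fname TEXT, lname TEXT
    );
    CREATE TABLE orderinfo (
        orderinfo_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    INSERT INTO customer VALUES (1, 'Miss', 'Jenny', 'Stones');
    INSERT INTO customer VALUES (2, 'Mr', 'Andrew', 'Stones');
    INSERT INTO orderinfo VALUES (1, 1);
    INSERT INTO orderinfo VALUES (2, 1);
    INSERT INTO orderinfo VALUES (3, 2);
    """
)

# Join each customer to that customer's orders, then count orders per customer.
order_counts = conn.execute(
    """
    SELECT customer.title, customer.fname, customer.lname,
           COUNT(orderinfo.orderinfo_id) AS "Number of orders"
    FROM customer, orderinfo
    WHERE customer.customer_id = orderinfo.customer_id
    GROUP BY customer.title, customer.fname, customer.lname
    ORDER BY customer.fname
    """
).fetchall()
print(order_counts)
```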

For example, the PostgreSQL database provides several ways to access data, in particular you can:

  • Use a console application to execute SQL statements
  • Directly embed SQL into the application
  • Use API (Application Programming Interfaces) function calls to prepare and execute SQL statements, view result sets, and update data from many different programming languages
  • Use indirect access to PostgreSQL database data using an ODBC (Open Database Connectivity) or JDBC (Java Database Connectivity) driver or a standard library such as DBI for Perl

Database Management Systems

A DBMS, as mentioned earlier, is a set of programs that makes it possible to build databases and use them. The responsibilities of the DBMS include:

  • Database creation. Some systems manage one large file and create one or more databases within it, others may use multiple operating system files or directly implement low-level access to disk partitions. Users and developers do not have to worry about the low-level structure of such files, since all necessary access is provided by the DBMS.
  • Providing a means to perform queries and updates. The DBMS must provide the ability to query data that satisfies some criterion, for example, the ability to select all orders placed by a certain customer that have not yet been delivered. Before SQL became widely accepted as a standard language, the way such queries were expressed varied from system to system.
  • Multitasking. If several applications work with the database or several users access it simultaneously, the DBMS must ensure that the processing of each user's request does not affect the work of others. That is, users only have to wait if someone else is writing data exactly when they need to read (or write) data to some element. Multiple data readings can occur simultaneously. In fact, it turns out that different databases support different levels of multitasking and that these levels can even be customized.
  • Journaling. The DBMS must keep a log of all data changes over a period of time. It can be used to track errors and also (perhaps more importantly) to recover data after a system failure such as an unscheduled power outage. It is common to keep both a data backup and a transaction log, because together they can be used to restore the database in the event of disk damage.
  • Ensuring database security. The DBMS must provide access control so that only registered users can manipulate the data stored in the database and the database structure itself (attributes, tables and indexes). Typically, a hierarchy of users is defined for each database: at the top is a "superuser" who can change anything, then come users who can add and delete data, and at the very bottom are those who have read-only rights. The DBMS must be able to add and remove users and specify which database features they can access.
  • Maintaining referential integrity. Many DBMSs have features that help maintain referential integrity, that is, data correctness. Typically, if a query or update violates the rules of the relational model, the DBMS issues an error message.

The database query language SQL appeared in the 70s. Its prototype was developed by IBM and is known as SEQUEL (Structured English QUEry Language). SQL incorporates the advantages of the relational model, in particular the fact that it is based on the mathematical apparatus of relational algebra and relational calculus, using a relatively small number of operators and simple syntax.

Thanks to these qualities, SQL became first the de facto and then the officially approved standard language for working with relational databases, supported by all the world's leading database technology companies. The use of an expressive and effective standard language makes application software largely independent of the specific DBMS used, and significantly raises the level and uniformity of tools for developing applications that work with relational databases.

Speaking about the SQL language standard, it should be noted that most of its commercial implementations have greater or lesser deviations from the standard. This, of course, impairs the compatibility of systems using different SQL “dialects”. On the other hand, useful extensions of language implementations relative to the standard are a means of language development and are eventually included in new editions of the standard.

A large number of books, including textbooks, are devoted to the SQL language; some of them are listed in the bibliography of this manual, including a textbook devoted specifically to the practical study of SQL. For this reason, this manual considers only the general features of the language that matter for the subsequent presentation of the material.

8.1. Difference between SQL and procedural programming languages

The SQL language belongs to the class of non-procedural programming languages. Unlike general-purpose procedural languages, which can also be used to work with databases, SQL is not record-oriented but set-oriented. This means the following. The input to a query formulated in SQL is the set of tuples (records) of one or more relation tables. The result of executing the query is again a set of tuples, which forms the resulting relation table. In other words, in SQL, the result of any operation on relations is also a relation. An SQL query does not specify a procedure, i.e. the sequence of actions necessary to obtain the result, but rather the conditions that the tuples of the resulting relation must satisfy, formulated in terms of the input relations.
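The difference can be made concrete by solving the same small task twice. The sketch below is illustrative only: the sample students are invented, and SQLite (through Python's sqlite3 module) stands in for whatever DBMS is actually used.

```python
import sqlite3

students = [
    ("Ivan", "Petrov", 3),
    ("Anna", "Sidorova", 1),
    ("Oleg", "Petrov", 2),
]

# Procedural, record-oriented: spell out how to walk the records one by one.
procedural_result = []
for name, surname, kurs in students:
    if surname == "Petrov":
        procedural_result.append(name)

# Declarative, set-oriented: state only the condition the result rows must satisfy.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (NAME TEXT, SURNAME TEXT, KURS INTEGER)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?)", students)
declarative_result = [
    row[0]
    for row in conn.execute(
        "SELECT NAME FROM STUDENT WHERE SURNAME = 'Petrov' ORDER BY NAME"
    )
]

print(sorted(procedural_result), declarative_result)
```

In the second version nothing says how the rows are to be found; the DBMS is free to scan the table, use an index, or choose any other strategy.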

8.2. SQL Forms and Parts

There are two forms of the SQL language in use: interactive SQL and embedded SQL.

Interactive SQL is used by a user to enter SQL queries directly and obtain their results in interactive mode.

Embedded SQL consists of SQL commands embedded inside programs that are usually written in some other language (Pascal, C, C++, etc.). This makes programs written in such languages more powerful and efficient, allowing them to work with data stored in relational databases; it does, however, require additional tools that provide an interface between SQL and the host language.

Both interactive and embedded SQL are typically divided into the following components.

Data Definition Language (DDL) makes it possible to create, modify and delete various database objects (tables, indexes, users, privileges, etc.).

Additional features of the data definition language may also include tools for defining data integrity constraints, determining the order in data storage structures, and describing the elements of the physical level of data storage.

Data Manipulation Language (DML) provides the ability to retrieve information from a database and transform the data stored in it.

However, these are not two different languages, but components of a single SQL.

8.3. Terms and terminology

Keywords are words used in SQL expressions that have a special purpose. For example, they can stand for specific SQL commands. Keywords cannot be used for other purposes, such as names of database objects.

SQL statements are instructions that make SQL calls to a database. Statements are made up of one or more separate logical parts called clauses. Clauses begin with an appropriate keyword and consist of keywords and arguments.

It should be noted that the terms used in the SQL language are somewhat different from the terms used when describing the relational model. In particular, instead of the term relation, it uses the term table, instead of the terms tuple and attribute, respectively, row and column.

8.4. Data retrieval. The SELECT statement

The simplest SELECT queries

The SQL SELECT statement is the most important and most commonly used statement. It is designed to retrieve information from database tables. The simplified syntax for the SELECT statement is as follows.

SELECT <attribute list>
FROM <table list>

Square brackets indicate elements that may be omitted from the query.

The SELECT keyword tells the DBMS that this statement is a request to retrieve information. After the word SELECT, separated by commas, the names of the fields (list of attributes) whose contents are requested are listed.

The required keyword in a SELECT query clause is the word FROM. The FROM keyword is followed by a comma-separated list of table names from which information is retrieved.

For example,

SELECT NAME, SURNAME FROM STUDENT;

An SQL query must end with a semicolon. The above query retrieves all values of the NAME and SURNAME fields from the STUDENT table.

Its result is a table like this

The order of the columns in this table matches the order of the NAME and SURNAME fields specified in the query, not their order in the input table STUDENT.

Please note that the tables obtained as a result of an SQL query do not fully meet the definition of a relation. In particular, they may contain duplicate tuples with the same attribute values.

For example, the query: “Get a list of names of cities in which students live, information about which is in the STUDENT table,” can be written in the following form

SELECT CITY FROM STUDENT;

The result will be a table

Belgorod

You can see that this table may contain identical rows.

To exclude duplicate records from the result of a SELECT query, use the DISTINCT keyword. If a SELECT query retrieves multiple fields, DISTINCT eliminates duplicate rows in which the values of all selected fields are identical.
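The effect of DISTINCT is easy to check on a tiny table. The following sketch uses Python's sqlite3 module with invented sample rows; only the STUDENT table and its CITY column come from the text, and ORDER BY is added so that the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (NAME TEXT, SURNAME TEXT, CITY TEXT)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [
        ("Ivan", "Petrov", "Belgorod"),
        ("Anna", "Sidorova", "Kursk"),
        ("Oleg", "Petrov", "Belgorod"),
    ],
)

# Without DISTINCT every student contributes a row, so cities repeat.
all_cities = conn.execute("SELECT CITY FROM STUDENT ORDER BY CITY").fetchall()

# With DISTINCT each city appears once.
distinct_cities = conn.execute(
    "SELECT DISTINCT CITY FROM STUDENT ORDER BY CITY"
).fetchall()

print(all_cities)
print(distinct_cities)
```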

Adding a clause introduced by the WHERE keyword to a SELECT statement lets you specify a condition (predicate) that evaluates to true or false for the field values of each table row accessed by the statement. The WHERE clause determines which rows of the specified tables should be selected. The table that is the result of a query includes only those rows for which the condition (predicate) specified in the WHERE clause evaluates to true.

Write a query that selects the names (NAME) of all students with the surname (SURNAME) Petrov whose information is in the STUDENT table.

SELECT SURNAME, NAME
FROM STUDENT
WHERE SURNAME = 'Petrov';

The conditions specified in the WHERE clause can use comparison operations specified by the following operators: = (equal), > (greater than), < (less than), >= (greater than or equal to), <= (less than or equal to), <> (not equal), as well as the logical operators AND, OR, and NOT.

For example, a query to obtain the names and surnames of students who are in their third year and receive a scholarship (scholarship amount greater than zero) would look like this:

SELECT NAME, SURNAME FROM STUDENT
WHERE KURS = 3 AND STIPEND > 0;
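A quick way to convince yourself that both conditions must hold is to run the query on a few rows where each condition fails in turn. The sample data below is invented; the sketch uses Python's sqlite3 module rather than any particular server DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT (NAME TEXT, SURNAME TEXT, KURS INTEGER, STIPEND INTEGER)"
)
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?)",
    [
        ("Ivan", "Petrov", 3, 150),   # third year, has a scholarship
        ("Anna", "Sidorova", 3, 0),   # third year, no scholarship
        ("Oleg", "Zaitsev", 1, 200),  # scholarship, but first year
    ],
)

# AND keeps only the rows for which both comparisons are true.
third_year_with_stipend = conn.execute(
    "SELECT NAME, SURNAME FROM STUDENT WHERE KURS = 3 AND STIPEND > 0"
).fetchall()
print(third_year_with_stipend)
```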

8.5. Implementation of relational algebra operations using SQL language. SQL Relational Completeness

In the previous sections on relational algebra, it was noted that one important benefit of having such a mathematical apparatus in the relational model is that it makes it possible to evaluate and prove the relational completeness of query languages used in practice, in particular SQL. To show that SQL is relationally complete, it is necessary to show that any relational algebra operator can be expressed in SQL. In fact, it is enough to show this for each of the primitive relational operators. Below are examples of implementing relational operators in SQL.

Union operator

Relational algebra: A UNION B

SQL statement:

SELECT * FROM A
UNION
SELECT * FROM B;

Intersection operator

Relational algebra: A INTERSECT B

SQL statement:

SELECT A.FIELD1, A.FIELD2, …
FROM A, B
WHERE A.FIELD1 = B.FIELD1 AND A.FIELD2 = B.FIELD2 AND …;

or

SELECT A.* FROM A, B
WHERE A.pk = B.pk;

Subtraction operator

Relational algebra: A MINUS B

SQL statement:

SELECT * FROM A
WHERE A.pk NOT IN (SELECT pk FROM B);

where A.pk and B.pk are the primary keys of tables A and B.
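The three set operators can be checked on two small tables that share the same structure. This is a sketch using Python's sqlite3 module; the tables A and B follow the text, while the label column and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE A (pk INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE B (pk INTEGER PRIMARY KEY, label TEXT);
    INSERT INTO A VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO B VALUES (2, 'two'), (3, 'three'), (4, 'four');
    """
)


def keys(sql):
    """Run a query and return the first column of every result row."""
    return [row[0] for row in conn.execute(sql)]


# Union: rows that are in A or in B, without duplicates.
union_keys = keys("SELECT * FROM A UNION SELECT * FROM B ORDER BY pk")

# Intersection: rows of A that also appear in B (matched on the primary key).
intersection_keys = keys("SELECT A.* FROM A, B WHERE A.pk = B.pk ORDER BY A.pk")

# Difference: rows of A whose key does not appear in B.
difference_keys = keys(
    "SELECT * FROM A WHERE A.pk NOT IN (SELECT pk FROM B) ORDER BY A.pk"
)

print(union_keys, intersection_keys, difference_keys)
```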

Cartesian product operator

Relational algebra: A TIMES B

SQL statement:

SELECT A.FIELD1, A.FIELD2, …, B.FIELD1, B.FIELD2, …
FROM A, B;

or

SELECT A.FIELD1, A.FIELD2, …, B.FIELD1, B.FIELD2, …
FROM A CROSS JOIN B;

Projection operator

Relational algebra: A [X, Y, …, Z]

SQL statement:

SELECT DISTINCT X, Y, …, Z FROM A;

Selection (restriction) operator

Relational algebra: A WHERE θ

SQL statement:

SELECT * FROM A
WHERE θ;

θ-join operator

Relational algebra: (A TIMES B) WHERE θ

SQL statement:

SELECT A.FIELD1, A.FIELD2, …, B.FIELD1, B.FIELD2, …
FROM A, B
WHERE θ;

or

SELECT A.FIELD1, A.FIELD2, …, B.FIELD1, B.FIELD2, …
FROM A CROSS JOIN B WHERE θ;

Division operator

Relational algebra: A(X,Y) DIVIDE BY B(Y)

SQL statement:

SELECT DISTINCT A.X FROM A
WHERE NOT EXISTS
  (SELECT * FROM B
   WHERE NOT EXISTS
     (SELECT * FROM A A1
      WHERE A1.X = A.X AND A1.Y = B.Y));

Thus, the above expressions prove that the SQL language, like relational algebra, is relationally complete.
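Division is the least intuitive of these operators, so a small worked case may help. A common way to express it in SQL is a double NOT EXISTS: keep every X for which there is no Y in B that X is not paired with in A. The sketch below runs that form with Python's sqlite3 module; the sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE A (X TEXT, Y TEXT);
    CREATE TABLE B (Y TEXT);
    INSERT INTO A VALUES ('ann', 'sql'), ('ann', 'c'), ('bob', 'sql');
    INSERT INTO B VALUES ('sql'), ('c');
    """
)

# Keep X if there is no Y in B for which the pair (X, Y) is missing from A.
quotient = conn.execute(
    """
    SELECT DISTINCT A.X FROM A
    WHERE NOT EXISTS
      (SELECT * FROM B
       WHERE NOT EXISTS
         (SELECT * FROM A A1
          WHERE A1.X = A.X AND A1.Y = B.Y))
    """
).fetchall()
print(quotient)
```

Here 'ann' is paired with every value in B and so belongs to the quotient, while 'bob' is missing the pair with 'c' and is excluded.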

Note that if the tables used in the above queries contain NULL values (see section 9.1 below), the queries may not work correctly, because the comparisons NULL <> NULL and NULL = NULL do not evaluate to true.

This, however, does not refute the conclusion made about the relational completeness of SQL, since NULL values are not supported by the relational model.
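The behaviour of NULL in comparisons can be observed directly. The sketch below uses Python's sqlite3 module, where an SQL NULL result is returned as Python's None; other DBMSs display the unknown result differently, but the effect on WHERE clauses is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Comparing NULL with anything, even another NULL, yields NULL (unknown), not true.
null_equal, null_not_equal, null_is_null = conn.execute(
    "SELECT NULL = NULL, NULL <> NULL, NULL IS NULL"
).fetchone()

# A NULL inside a NOT IN list makes the whole condition unknown, so no row is returned.
not_in_rows = conn.execute("SELECT 1 WHERE 1 NOT IN (2, NULL)").fetchall()

print(null_equal, null_not_equal, null_is_null, not_in_rows)
```

This is why the NOT IN form of the subtraction query can silently return nothing when the subquery produces a NULL.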

Hi all! Today I will try to explain as simply as possible, especially for beginners, what SQL is and what it is needed for. From this material you will also learn what a database and a database management system are, and what an SQL dialect is, because the whole article is built to lead you gradually to an understanding of what SQL is.

I think you already have an idea that SQL is a language associated with databases of some kind, but to understand it better you need to understand what this language is for, i.e. its purpose.

Therefore, I will first give you some background information that makes the purpose of SQL, and why it is needed at all, clear.

What is a database

I'll start with the fact that a database is usually understood as any set of information that is stored in a certain way and can be used. But when we talk about automated databases, we usually mean so-called relational databases.

A relational database is ordered information interconnected by certain relationships. It is presented in the form of tables that hold all this information. This is very important, because from now on you should picture a modern database simply as tables (speaking in the context of SQL), i.e. in a general sense, a database is a collection of tables. Of course, this is a very simplified definition, but it gives some practical understanding of the database.

What is SQL

Due to the fact that the information in the database is organized, divided into certain entities and presented in the form of tables, it is easy to access and find the information we need.

And here the main question arises: how can we access it and get the information we need?

There must be a special tool for this, and here SQL comes to our aid: it is the tool with which data in the database is manipulated (created, retrieved, deleted, etc.).

SQL (Structured Query Language) is a structured query language used to write special queries (so-called SQL statements) to a database in order to obtain data from the database or to manipulate that data.

It is also worth noting that the relational model is based on set theory, which deals with combining different objects into one whole; in a database, that whole is a table. This is important, since SQL works specifically with sets of data, i.e. with tables.

What is a DBMS

You may ask: if a database is information stored in tables, then what does it look like physically? How can you look at it as a whole?

In short, it is just a file created in a special format; this is what a database looks like (in most cases a database consists of several files, but at this level that is not so important).

Going further, if a database is a file in a special format, how do you create or open it? Here a difficulty arises, because you cannot create such a file, i.e. a relational database, without any tools; you need a special tool that can create and manage the database, in other words, work with these files.

This tool is precisely the database management system, abbreviated as DBMS.

What kind of DBMS are there?

In fact, there are quite a lot of different DBMSs. Some of them are paid and cost a lot of money in their full-featured versions, but even the most advanced ones have free editions, which, by the way, are great for learning.

Among them, the following systems stand out in terms of capabilities and popularity:

  • Microsoft SQL Server is a database management system from Microsoft. It is very popular in the corporate sector, especially in large companies. It is not just a DBMS but a whole set of applications that lets you store and modify data, analyze it, keep it secure, and much more;
  • Oracle Database is a database management system from Oracle. It is also a very popular DBMS, again among large companies. Oracle Database and Microsoft SQL Server are comparable in capabilities and functionality, so they are serious competitors, and the cost of their full-featured versions is very high;
  • MySQL is a database management system that also belongs to Oracle, but it is distributed free of charge. MySQL has become very popular on the Internet: a large share of websites use this DBMS to store their data;
  • PostgreSQL is also a free database management system, and it is very popular and feature-rich.

Useful materials on the topic:

  • Installing Microsoft SQL Server 2016 Express - an example of installing the free edition of Microsoft SQL Server on Windows;
  • Installing Microsoft SQL Server 2017 Express on Ubuntu Server - an example of installing the free edition of Microsoft SQL Server on Linux;
  • Installing PostgreSQL 11 on Windows - example of installing PostgreSQL on Windows;
  • Installing MySQL on Windows - example of installing MySQL on Windows;
  • Installing and configuring MySQL on Linux Mint - an example of installing MySQL on Linux;
  • Installing Oracle Database Express Edition 11g - an example of installing the free edition of Oracle on Windows (the article is old, but it is still useful).

SQL dialects (SQL extensions)

SQL is a standard, and every relational DBMS implements it. Each DBMS, however, also extends the standard with its own language for working with data, usually called an SQL dialect. A dialect is based on SQL but adds facilities for full-fledged programming; it also makes it possible to obtain system information and to simplify SQL queries.

Here are some SQL dialects:

  • Transact-SQL (abbreviated T-SQL) – used in Microsoft SQL Server;
  • PL/SQL (Procedural Language / Structured Query Language) – used in Oracle Database;
  • PL/pgSQL (Procedural Language / PostgreSQL) – used in PostgreSQL.

Thus the DBMS you work with determines which extension you use to write SQL statements. If we talk about simple SQL queries, for example,

SELECT ProductId, ProductName FROM Goods

then, of course, such queries will work in all DBMSs, because SQL is a standard.

Note! This is a simple SQL query to retrieve data from one table, displaying two columns.
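To try the query above without installing a server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in DBMS. The Goods table layout and its two rows are made-up sample data, not something taken from this article.

```python
import sqlite3

# In-memory SQLite database; the table layout and rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Goods (ProductId INTEGER PRIMARY KEY, ProductName TEXT, Price REAL)"
)
conn.executemany(
    "INSERT INTO Goods (ProductId, ProductName, Price) VALUES (?, ?, ?)",
    [(1, "Keyboard", 30.0), (2, "Mouse", 15.0)],
)

# The query from the text: two columns from one table.
rows = conn.execute("SELECT ProductId, ProductName FROM Goods").fetchall()
for product_id, product_name in rows:
    print(product_id, product_name)
```

The same SELECT text could be sent unchanged to the other systems mentioned here; only the connection code would differ.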

However, if you are going to program and use all the internal capabilities of the DBMS (develop procedures, use built-in functions, obtain system information, and so on), you need to learn a specific SQL dialect and practice in the DBMS that uses it. This matters because the syntax of many constructs differs between dialects, as do their capabilities. If, for example, you run an SQL statement that relies on one dialect's extensions on a different DBMS, the statement will generally not execute.
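As an illustration of a dialect-specific construct failing elsewhere, the sketch below sends a T-SQL style row limit (TOP) to SQLite through Python's sqlite3 module. SQLite is used only because it ships with Python; the helper try_query and the sample rows are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Goods (ProductId INTEGER PRIMARY KEY, ProductName TEXT)")
conn.executemany("INSERT INTO Goods VALUES (?, ?)", [(1, "Keyboard"), (2, "Mouse")])


def try_query(sql):
    """Return the rows, or a message if this DBMS rejects the statement."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return f"rejected: {exc}"


# TOP is how T-SQL limits rows; SQLite's dialect does not accept it.
tsql_result = try_query("SELECT TOP 1 ProductName FROM Goods ORDER BY ProductId")

# SQLite spells the same idea with LIMIT.
sqlite_result = try_query("SELECT ProductName FROM Goods ORDER BY ProductId LIMIT 1")

print(tsql_result)
print(sqlite_result)
```

The first statement is reported as a syntax error, while the second returns the first product name.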

For example, I specialize in T-SQL and have been working with Microsoft SQL Server for more than 8 years.

I have also worked with other DBMSs, of course. At one time I supported two applications, one of which worked with PostgreSQL and the other, as you have probably guessed, with Microsoft SQL Server.

Like many others, I worked with MySQL while maintaining websites and services, and I worked with Oracle Database on other projects.

I have gathered all my T-SQL experience in one place, in the form of books, so if you want to learn Transact-SQL (T-SQL), I recommend reading them:

  • The T-SQL Programmer's Path - a tutorial on the Transact-SQL language for beginners. In it I cover all the constructs of the language in detail, moving step by step from simple to complex. Suitable for comprehensive study of T-SQL;
  • T-SQL programming style - the basics of correct coding. A book aimed at improving the quality of T-SQL code (for those who already know at least the basics of T-SQL).

I hope you now understand what SQL is and what it is needed for. In the following materials I will explain how to write SQL queries and which tools to use for each DBMS, since each DBMS has its own tools. To keep up with new articles, follow my groups on social networks.

In this chapter, you will learn how SQL is used to extend programs written in other languages. Although the non-procedural nature of SQL makes it very powerful, it also imposes a number of limitations. To overcome these limitations, you can embed SQL in programs written in a procedural language (one that expresses an explicit algorithm). For our examples we chose Pascal, believing that this language is the easiest for beginners to understand, and also because Pascal is one of the languages for which ANSI has a semi-official standard.

WHAT EMBEDDING SQL INVOLVES

To embed SQL in another language, you must use a software package that supports embedded SQL for that language and, of course, supports the language itself. Naturally, you must be familiar with the language you are using. Primarily, you will use SQL commands to operate on database tables, passing output to the program and taking input from the program in which they are embedded, generally called the host program (which may or may not in turn receive input from, or send output to, an interactive user or another program).

WHY EMBED SQL?

Although we have spent some time showing what SQL can do, an experienced programmer will have noticed that on its own it is not very useful for writing programs. The most obvious limitation is that, while SQL can execute a batch of commands at once, interactive SQL basically executes one command at a time. The logical constructs that structure most computer programs, such as if ... then, for ... do, and while ... repeat, are missing, so you cannot decide whether, how, or how long to perform one action based on the result of another. In addition, interactive SQL cannot do much with values other than put them into a table, locate or derive them with queries, and output them to some device.

More traditional languages, however, are strong in exactly these areas. They are designed so that a programmer can begin processing data and, based on the results, decide whether to take one action or another, or repeat an action until some condition is met, creating logical paths and loops. Values are stored in variables that can be used and changed by any number of commands. This lets you prompt users for input or read it from a file, and format output in elaborate ways (for example, converting numeric data into charts). The purpose of embedded SQL is to combine these capabilities: you can write complex procedural programs that access the database through SQL, which spares you from implementing complex table operations in a procedural language that is not oriented toward such data structures, while keeping the structural strength of the procedural language.

HOW SQL IS EMBEDDED

SQL commands are placed in the source text of the host program, each preceded by the phrase EXEC SQL (EXECute SQL). A few commands are specific to the embedded form of SQL; they are introduced in this chapter.

Strictly speaking, the ANSI standard does not support embedded SQL as such. It supports a concept called a module, which is more precisely a callable set of SQL procedures than an embedding in another language. An official definition of embedded SQL syntax would involve extending the official syntax of every language in which SQL can be embedded, a long and thankless task that ANSI avoids. However, ANSI provides four appendices (not part of the standard proper) that define embedded SQL syntax for four languages: COBOL, Pascal, FORTRAN, and PL/1. The C language is also widely supported, as are other languages.

When you embed SQL commands in a program written in another language, you must precompile the program before you compile it. A program called a precompiler (or preprocessor) scans the text of your program and converts the SQL commands into a form that the host language can use. You then use a regular compiler to transform the program from source code into executable code.

According to the module language approach defined by ANSI, the host program calls SQL procedures. The procedures take parameters from the host program and return values, already processed, back to it. A module can contain any number of procedures, each consisting of a single SQL command. The idea is that the procedures can operate in the same way as procedures in the host language (although the module must still identify the host language because of differences in the data types of different languages). Implementations can conform to the standard by making embedded SQL behave as if modules had been explicitly defined. For this purpose the precompiler creates a module, called an access module. Only one module, containing any number of SQL procedures, can exist for a given program. Placing SQL statements directly in the host code is easier and more practical than creating the modules directly.

Each program that uses embedded SQL is associated with an authorization ID while it executes. That authorization ID must have all the privileges needed for the SQL operations performed in the program. In general, an embedded SQL program logs on to the database just as a user would. The details are up to the implementer, but you will probably need to include a CONNECT command or something similar in your program.

USING HOST LANGUAGE VARIABLES IN SQL

The main way the SQL and host language parts of your program communicate is through the values of variables. Naturally, different languages recognize different data types for variables. ANSI defines SQL equivalents for four host languages: PL/1, Pascal, COBOL, and FORTRAN; the details are given in Appendix B. Equivalents for other languages are defined by the implementer. Be aware that types such as DATE are not recognized by ANSI, so the ANSI standard defines no equivalent host language types for them. More complex host language data types, such as arrays, have no SQL equivalent.

You can use variables from the host program in embedded SQL statements wherever you would use value expressions. (In this chapter, SQL means embedded SQL unless otherwise noted.) The value used in the command is the current value of the variable. Host variables must:

* be declared in an SQL DECLARE SECTION, which is described later;

* have a data type compatible with their role in the SQL command (for example, a numeric type if they are inserted into a numeric field);

* be assigned a value by the time they are used in an SQL command, unless the SQL command itself makes the assignment;

* be preceded by a colon (:) when they are mentioned in an SQL command.

Since host variables are distinguished from SQL column names by the colon, you can use variables with the same names as your columns if you wish. Suppose your program has four variables, named id_num, salesperson, loc, and comm, that contain the values you want to insert into the Salespeople table. You could embed the following SQL command in your program:

EXEC SQL INSERT INTO Salespeople VALUES (:id_num, :salesperson, :loc, :comm)

The current values of these variables are placed in the table. As you can see, the comm variable has the same name as the column into which its value is inserted. Note that there is no semicolon at the end of the command. This is because the appropriate terminator for an embedded SQL command depends on the host language.

For Pascal and PL/1 it is a semicolon; for COBOL, the word END-EXEC; and for FORTRAN there is no terminator. In other languages it is implementation dependent, so in this book we will always use a semicolon, for consistency with interactive SQL and Pascal. Pascal terminates embedded SQL commands and its own commands in the same way, with a semicolon. To repeat the command above with different variable values, place it in a loop, as in the following example:

while not end-of-file (input) do
  begin
  readln (id_num, salesperson, loc, comm);
  EXEC SQL INSERT INTO Salespeople VALUES (:id_num, :salesperson, :loc, :comm);
  end;

This Pascal fragment defines a loop that reads values from a file, stores them in four named variables, inserts the values of those variables into the Salespeople table, and then reads the next four values, repeating the process until the entire input file has been read. Each set of values is assumed to end with a carriage return (for those unfamiliar with Pascal, the readln procedure reads the input and then moves to the next line of the input source). This gives you an easy way to transfer data from a text file into a relational structure. Of course, you can first process the data in any way your host language allows, for example to exclude all commissions below .12:

while not end-of-file (input) do
  begin
  readln (id_num, salesperson, loc, comm);
  if comm >= .12 then
    EXEC SQL INSERT INTO Salespeople VALUES (:id_num, :salesperson, :loc, :comm);
  end;

Only rows that meet the condition comm >= .12 are inserted. This shows that you can use both loops and conditions just as you normally would in the host language.
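For readers who want to run the loop above, here is a rough modern analogue in Python with the built-in sqlite3 module, which happens to accept the same colon-prefixed parameter names. It is a sketch, not embedded SQL proper: there is no precompiler, the column layout of Salespeople is assumed from the later examples, and the input rows are made-up sample data.

```python
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Salespeople (snum INTEGER, sname TEXT, city TEXT, comm REAL)")

# Stand-in for the input file; the rows are invented sample data.
input_file = io.StringIO(
    "1001 Peel London 0.12\n"
    "1002 Serres SanJose 0.13\n"
    "1004 Motika London 0.11\n"
)

for line in input_file:
    fields = line.split()
    # Host variables, named as in the text.
    id_num, salesperson, loc, comm = int(fields[0]), fields[1], fields[2], float(fields[3])
    if comm >= 0.12:  # a host-language condition decides whether the SQL runs
        conn.execute(
            "INSERT INTO Salespeople VALUES (:id_num, :salesperson, :loc, :comm)",
            {"id_num": id_num, "salesperson": salesperson, "loc": loc, "comm": comm},
        )

inserted = conn.execute("SELECT snum FROM Salespeople ORDER BY snum").fetchall()
print(inserted)
```

The row with a commission of 0.11 is read but never reaches the table, which mirrors the filtered Pascal loop.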

DECLARING VARIABLES

All variables referenced in SQL statements must first be declared in an SQL DECLARE SECTION, using the ordinary host language syntax. You can have any number of such sections in a program, and they can be located anywhere in the code before the variables are used, subject to restrictions defined by the host language. The declare section must begin and end with the embedded SQL commands BEGIN DECLARE SECTION and END DECLARE SECTION, preceded as usual by EXEC SQL. To declare the variables used in the previous example, you could enter the following:

EXEC SQL BEGIN DECLARE SECTION;
  Var
    id_num: integer;
    salesperson: packed array [1..10] of char;
    loc: packed array [1..10] of char;
    comm: real;
EXEC SQL END DECLARE SECTION;

For those unfamiliar with Pascal, Var is a heading that precedes a series of variable declarations, and packed (or unpacked) arrays are series of values of a fixed type distinguished by index numbers (for example, the third character of loc is loc[3]). The semicolon after each variable declaration belongs to Pascal, not to SQL.

RETRIEVING VALUES INTO VARIABLES

In addition to putting the values of variables into tables with SQL commands, you can use SQL to obtain values for those variables. One way to do this is with a variation of the SELECT command that contains an INTO clause. Let's go back to our previous example and move the Peel row from the Salespeople table into our host language variables:

EXEC SQL SELECT snum, sname, city, comm
  INTO :id_num, :salesperson, :loc, :comm
  FROM Salespeople
  WHERE snum = 1001;

The selected values are placed, in order, in the variables named in the INTO clause. Naturally, those variables must be of the proper types to receive the values, and there must be one variable for each selected column. Apart from the INTO clause, this query is like any other. However, the INTO clause adds a significant restriction: the query must retrieve no more than one row. If it retrieves several rows, they cannot all be placed in the same variables at the same time, and the command fails. For this reason, SELECT INTO should be used only under the following conditions:

* when you use a predicate that tests for a value you know to be unique, as in this example. Values you can know to be unique are those with an enforced uniqueness constraint or a unique index, as discussed in Chapters 17 and .

* when you use one or more aggregate functions and do not use GROUP BY.

* when you use SELECT DISTINCT on a foreign key with a predicate referencing a single value of the parent key (provided that your system enforces referential integrity), as in the following example:

EXEC SQL SELECT DISTINCT snum
  INTO :salesnum
  FROM Customers
  WHERE snum = (SELECT snum FROM Salespeople WHERE sname = 'Motika');

Assuming that Salespeople.sname and Salespeople.snum are, respectively, a unique key and the primary key of that table, and that Customers.snum is a foreign key referencing Salespeople.snum, this query produces a single row.

There are other cases where you may know that a query should produce a single row of output, but they are obscure and, in most cases, rely on your data having an integrity that cannot be enforced by constraints. Do not rely on this! You are writing a program that is likely to be used for some time, and it is best to play it safe so that it does not fail in the future. In any case, there is no need to hunt for queries that produce single rows, since SELECT INTO is only a convenience. As you will see, you can handle queries that produce multiple rows by using a cursor.
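The single-row restriction can be made explicit in a host program. The following sketch uses Python's sqlite3 module; select_into is a hypothetical helper written for this example (it is not part of sqlite3), and it is stricter than the text in that it also treats an empty result as a failure. The table rows are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Salespeople (snum INTEGER PRIMARY KEY, sname TEXT, city TEXT, comm REAL)"
)
conn.executemany(
    "INSERT INTO Salespeople VALUES (?, ?, ?, ?)",
    [(1001, "Peel", "London", 0.12), (1004, "Motika", "London", 0.11)],
)


def select_into(sql, params=()):
    """Hypothetical helper: return the single row of a query, or raise."""
    rows = conn.execute(sql, params).fetchmany(2)  # two rows suffice to detect a violation
    if len(rows) != 1:
        found = "none" if not rows else "more than one"
        raise LookupError(f"expected exactly one row, got {found}")
    return rows[0]


# A predicate on the primary key can match at most one row.
id_num, salesperson, loc, comm = select_into(
    "SELECT snum, sname, city, comm FROM Salespeople WHERE snum = ?", (1001,)
)

# A predicate on city matches two rows here, so the helper refuses.
try:
    select_into("SELECT snum, sname, city, comm FROM Salespeople WHERE city = ?", ("London",))
    many_rows_failed = False
except LookupError:
    many_rows_failed = True
```

Failing loudly is a design choice for the sketch; silently taking the first of several rows would hide exactly the data problem the text warns about.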

THE CURSOR

One of the strengths of SQL is its ability to operate on all rows of a table that meet a certain condition as a unit, without knowing how many such rows there are. If ten rows satisfy a predicate, the query returns all ten. If ten million rows qualify, all ten million are produced. This gets awkward when you try to interface with other languages. How can you assign the output of a query to variables when you do not know how large that output will be? The solution is to use what is called a cursor.

You are probably familiar with the cursor as the blinking mark that shows your position on the computer screen. You can think of an SQL cursor as a device that similarly marks your place in the output of a query, although the analogy is not exact. A cursor is a kind of variable associated with a query. The value of this variable can be each row, in turn, that the query produces. Like host variables, cursors must be declared before they are used. This is done with the DECLARE CURSOR command, as follows:

EXEC SQL DECLARE Londonsales CURSOR FOR
  SELECT * FROM Salespeople
  WHERE city = 'London';

The query is not executed at this point; it is only defined. A cursor is a bit like a view, in that the cursor contains a query, and the content of the cursor is whatever the query produces at the time the cursor is opened. Unlike base tables or views, however, the rows of a cursor are ordered: there is a first, a second, ... and a last row. This order can be controlled explicitly with an ORDER BY clause in the query, or it can be arbitrary, or it can default to some implementation-defined ordering. When you reach the point in your program where you want the query to be executed, you open the cursor with the following command:

EXEC SQL OPEN Londonsales;

The values in the cursor are those present when this command is executed, not when the earlier DECLARE command or a later FETCH command is executed.

Then you use the FETCH command to extract the output of the query, one row at a time:

EXEC SQL FETCH Londonsales INTO :id_num, :salesperson, :loc, :comm;

This puts the values from the first selected row into the variables. Another FETCH command retrieves the next set of values. The idea is to put the FETCH command inside a loop, so that you fetch a row, move its values into the variables, and then go around the loop to move the next set of values into the same variables. For example, perhaps you want the output shown one row at a time, asking the user each time whether to continue to the next row:

Look_at_more := True;
EXEC SQL OPEN Londonsales;
while Look_at_more do
  begin
  EXEC SQL FETCH Londonsales INTO :id_num, :salesperson, :loc, :comm;
  writeln (id_num, salesperson, loc, comm);
  writeln ('Do you want to see more data? (Y/N)');
  readln (response);
  if response = 'N' then Look_at_more := False
  end;
EXEC SQL CLOSE Londonsales;

In Pascal, the sign := means "is assigned the value of", while = keeps its usual meaning of "equals". The writeln procedure writes its output and then moves to a new line. The single quotes around character values in the second writeln and in the if ... then statement are the Pascal convention, which happens to coincide with that of SQL. This fragment sets a Boolean variable named Look_at_more to true, opens the cursor, and enters the loop. Inside the loop, a row is fetched from the cursor and displayed on the screen. The user is asked whether he wants to see the next row. Unless he answers N (No), the loop repeats and the next row of values is fetched. Although the Look_at_more and response variables must be declared as a Boolean variable and a char variable, respectively, in the Pascal variable declaration section, they need not be included in the SQL declare section, because they are not used in SQL commands.

As you can see, colons are not placed before variable names in non-SQL statements. Next, notice that there is a CLOSE statement corresponding to the OPEN statement. As you may have guessed, it empties the cursor of values, so the query has to be executed again with an OPEN statement before more values can be fetched. It is not necessary for all rows selected by the query to have been fetched before the cursor is closed, although that is the usual procedure. Once the cursor is closed, SQL does not keep track of which rows were fetched. If you open the cursor again, the query is re-executed at that point and you start over from the beginning. This example does not exit the loop automatically when all rows have been fetched. When FETCH has no more rows to fetch, it simply leaves the values in the INTO clause variables unchanged. Therefore, once the data is exhausted, these variables are output again and again with identical values until the user ends the loop by answering N.
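The open, fetch, close cycle described above has a close counterpart in Python's sqlite3 module, sketched below. This is an analogy, not embedded SQL: execute() plays the role of OPEN, fetchone() the role of FETCH, and, unlike the FETCH described in the text, fetchone() signals exhaustion by returning None. The function name and sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Salespeople (snum INTEGER PRIMARY KEY, sname TEXT, city TEXT, comm REAL)"
)
conn.executemany(
    "INSERT INTO Salespeople VALUES (?, ?, ?, ?)",
    [
        (1001, "Peel", "London", 0.12),
        (1002, "Serres", "San Jose", 0.13),
        (1004, "Motika", "London", 0.11),
    ],
)


def fetch_london_rows():
    seen = []
    # "OPEN": the query runs when execute() is called.
    cursor = conn.execute(
        "SELECT snum, sname, city, comm FROM Salespeople WHERE city = ? ORDER BY snum",
        ("London",),
    )
    while True:
        row = cursor.fetchone()  # "FETCH": one row per call
        if row is None:          # no rows left
            break
        id_num, salesperson, loc, comm = row
        seen.append((id_num, salesperson))
    cursor.close()               # "CLOSE"
    return seen


first_pass = fetch_london_rows()
second_pass = fetch_london_rows()  # reopening re-runs the query from the first row
```

Calling the function twice shows the point made above: after a close, the position is forgotten and a new open starts over.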

SQLCODE

It would be nice to know when the data is exhausted, so that the user can be told and the loop ended automatically. It is even more important to know whether an SQL command has produced an error. The SQLCODE variable (called SQLCOD in FORTRAN) is provided for this purpose. It must be declared as a host language variable and must have a host language data type that corresponds to one of the SQL exact numeric types, as shown in Appendix B. The value of SQLCODE is set each time an SQL command is executed. There are basically three possibilities:

1. The command executed without error but had no effect. What this means differs from command to command:

A) For SELECT, no rows were selected by the query.

B) For FETCH, the last row had already been fetched, or the query in the cursor selected no rows.

C) For INSERT, no rows were inserted (this implies that a query was used to generate the values to insert and that it failed to retrieve any rows).

D) For UPDATE and DELETE, no row met the predicate, so no change was made to the table.

In each of these cases, SQLCODE = 100 is set.

2. The command executed normally, and none of the above conditions applied. In this case, SQLCODE = 0 is set.

3. The command produced an error. If this happens, the changes made to the database by the current transaction are rolled back (see Chapter 23). In this case, SQLCODE is set to some negative number defined by the implementer. The purpose of this number is to identify the problem as precisely as possible. In principle, your system should come with a routine to run in this case that gives you information explaining the negative number. Typically, an error message is displayed on the screen or written to a log file, and the program rolls back the changes of the current transaction, disconnects from the database, and exits.

USING SQLCODE TO CONTROL LOOPS

Now we can improve our previous example so that it exits the loop automatically when the cursor is empty, when all rows have been fetched, or when an error occurs:

Look_at_more := True;
EXEC SQL OPEN Londonsales;
while Look_at_more and (SQLCODE = 0) do
  begin
  EXEC SQL FETCH Londonsales INTO :id_num, :salesperson, :loc, :comm;
  writeln (id_num, salesperson, loc, comm);
  writeln ('Do you want to see more data? (Y/N)');
  readln (response);
  if response = 'N' then Look_at_more := False;
  end;
EXEC SQL CLOSE Londonsales;
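The three SQLCODE outcomes can be imitated in a host program that has no SQLCODE of its own. The sketch below does this in Python with sqlite3; the fetch helper, the constant names, and the choice of -1 for "some negative number" are all assumptions made for illustration, and the rows are sample data.

```python
import sqlite3

# 0 and 100 follow the text; -1 is an arbitrary stand-in for an error code.
SQL_OK, SQL_NOT_FOUND, SQL_ERROR = 0, 100, -1

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Salespeople (snum INTEGER PRIMARY KEY, sname TEXT, city TEXT, comm REAL)"
)
conn.executemany(
    "INSERT INTO Salespeople VALUES (?, ?, ?, ?)",
    [(1001, "Peel", "London", 0.12), (1004, "Motika", "London", 0.11)],
)


def fetch(cursor):
    """Hypothetical helper: return (sqlcode, row) in the style of an embedded FETCH."""
    try:
        row = cursor.fetchone()
    except sqlite3.Error:
        return SQL_ERROR, None
    if row is None:
        return SQL_NOT_FOUND, None
    return SQL_OK, row


cursor = conn.execute(
    "SELECT snum, sname FROM Salespeople WHERE city = 'London' ORDER BY snum"
)
printed = []
sqlcode, row = fetch(cursor)
while sqlcode == SQL_OK:  # the loop ends by itself on 100 or on an error
    printed.append(row)
    sqlcode, row = fetch(cursor)
cursor.close()
```

Fetching before testing the code, as done here, means a row is only used after the fetch that produced it has been confirmed as successful.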

THE WHENEVER CLAUSE

This is convenient for exiting when the condition is met, that is, when all rows have been fetched. But if you get an error, you should do something like what was described for the third case above. For this purpose, SQL provides a GOTO clause. In fact, SQL lets you apply it quite broadly, so that a program executes a GOTO automatically whenever a certain SQLCODE value is produced. You do this with the WHENEVER clause. Here is a sample fragment:

EXEC SQL WHENEVER SQLERROR GOTO Error_handler;
EXEC SQL WHENEVER NOT FOUND CONTINUE;

SQLERROR is another way of saying that SQLCODE < 0, and NOT FOUND is another way of saying that SQLCODE = 100. (Some implementations also call the latter case SQLWARNING.) Error_handler is the name of the place in the program to which execution is transferred if an error occurs (GOTO may be written as one word or two). This place is identified in whatever way suits the host language, for example by a label in Pascal, or by a section name or paragraph name in COBOL (from here on we will use the term label). The label would most suitably identify a standard routine distributed by the implementer for inclusion in all programs.

CONTINUE means that nothing special is done for that SQLCODE value. This is also the default if you have not used a WHENEVER command for that SQLCODE value. However, being able to specify inaction explicitly lets you switch back and forth between taking and not taking an action at various points in your program. For example, if your program includes several INSERT statements that use queries which really ought to produce values, you could print a special message, or otherwise explain that the queries are coming back empty and no values were inserted. In that case, you could enter the following:

EXEC SQL WHENEVER NOT FOUND GOTO No_rows;

No_rows is a label on some code that performs the appropriate action. On the other hand, if you need to fetch rows later in the program, you could enter the following at that point:

EXEC SQL WHENEVER NOT FOUND CONTINUE;

because fetching repeatedly until all rows have been retrieved is the normal procedure and requires no special handling.

MODIFYING CURSORS

Cursors can also be used to select a group of rows from a table that are then updated or deleted one by one. This lets you get around some restrictions on the predicates used in the UPDATE and DELETE commands. Specifically, in the predicate of the cursor's query, or in any of its subqueries, you can reference the table being changed, which you cannot do in the predicates of those commands themselves. As pointed out in Chapter 16, standard SQL rejects an attempt to delete all customers with a below-average rating in the following form:

EXEC SQL DELETE FROM Customers
  WHERE rating < (SELECT AVG (rating) FROM Customers);

However, you can get the same effect by using a query to select the appropriate rows, holding them in a cursor, and performing the DELETE through the cursor. First you declare the cursor:

EXEC SQL DECLARE Belowavg CURSOR FOR
  SELECT * FROM Customers
  WHERE rating < (SELECT AVG (rating) FROM Customers);

Then you create a loop to delete all the customers selected by the cursor:

EXEC SQL WHENEVER SQLERROR GOTO Error_handler;
EXEC SQL OPEN Belowavg;
while SQLCODE <> 100 do
  begin
  EXEC SQL FETCH Belowavg INTO :a, :b, :c, :d, :e;
  EXEC SQL DELETE FROM Customers WHERE CURRENT OF Belowavg;
  end;
EXEC SQL CLOSE Belowavg;

The WHERE CURRENT OF clause means that the DELETE applies to the row currently selected by the cursor. This implies that the cursor and the DELETE command both refer to the same table and, therefore, that the query in the cursor is not a join. The cursor must also be updatable. To be updatable, a cursor must satisfy the same conditions as a view (see Chapter 21). In addition, ORDER BY and UNION, which are not permitted in views, are permitted in cursors but prevent the cursor from being updatable. Note in the example above that we have to fetch the rows from the cursor into a set of variables even though we have no intention of using those variables. The syntax of the FETCH command requires this.

UPDATE works the same way. You can increase the commission of all salespeople who have customers with a rating of 300 as follows. First you declare the cursor:

EXEC SQL DECLARE High_cust CURSOR FOR
  SELECT * FROM Salespeople
  WHERE snum IN (SELECT snum FROM Customers WHERE rating = 300);

Then you perform the updates in a loop:

EXEC SQL OPEN High_cust;
while SQLCODE = 0 do
  begin
  EXEC SQL FETCH High_cust INTO :id_num, :salesperson, :loc, :comm;
  EXEC SQL UPDATE Salespeople SET comm = comm + .01 WHERE CURRENT OF High_cust;
  end;
EXEC SQL CLOSE High_cust;

Note that some implementations require you to specify in the cursor definition that the cursor will be used to perform UPDATE on certain columns. This is done with a final clause of the cursor definition, FOR UPDATE OF. To declare the High_cust cursor in this way, so that you can update the comm column with UPDATE, you would enter the following:

EXEC SQL DECLARE High_cust CURSOR FOR
  SELECT * FROM Salespeople
  WHERE snum IN (SELECT snum FROM Customers WHERE rating = 300)
  FOR UPDATE OF comm;

This gives you some protection against accidental updates that could disrupt the whole database.
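The select-then-delete pattern can be tried out with Python's sqlite3 module. Two caveats apply to this sketch: SQLite has no WHERE CURRENT OF, so the primary key (an assumed cnum column) identifies the current row instead, and SQLite would in fact also accept the single DELETE with the subquery, so the loop here only illustrates the technique. The customer rows are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (cnum INTEGER PRIMARY KEY, cname TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [(2001, "Hoffman", 100), (2002, "Giovanni", 200), (2003, "Liu", 200), (2004, "Grass", 300)],
)

# Step 1: the cursor's query picks the rows. The keys are read out completely
# first, so the table is not modified while the SELECT is still being stepped.
below_avg = conn.execute(
    "SELECT cnum FROM Customers "
    "WHERE rating < (SELECT AVG(rating) FROM Customers) ORDER BY cnum"
).fetchall()

# Step 2: delete row by row; the primary key stands in for WHERE CURRENT OF.
for (cnum,) in below_avg:
    conn.execute("DELETE FROM Customers WHERE cnum = ?", (cnum,))

remaining = conn.execute("SELECT cnum FROM Customers ORDER BY cnum").fetchall()
```

Because the average is computed once, before any row is removed, later deletions cannot shift the threshold; with these rows the average is 200 and only customer 2001 falls below it.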

INDICATOR VARIABLE

NULLs are special markers defined by SQL itself. They cannot be placed in host variables. Attempting to insert a NULL into a host variable is an error, since host languages do not, by definition, support SQL NULLs. Although the result of such an attempt is implementation-defined, it must not contradict database theory and should therefore produce an error: SQLCODE is set to a negative number and the error-handling routine is called. Naturally, you want to avoid this. You will often need to retrieve NULLs alongside valid values without crashing your program. And even if the program does not crash, the values in the host variables would be wrong, because they cannot represent NULLs.

The alternative provided for this situation is the indicator variable. An indicator variable is declared in the SQL declare section and looks like any other variable. It can have any host-language type that corresponds to a numeric type in SQL. Whenever you perform an operation that could place a NULL in a host variable, you should use an indicator variable to be safe. You place the indicator variable in the SQL command directly after the host variable you want to protect, with no intervening space or comma, although you may optionally insert the word INDICATOR between them.

The indicator variable in a command is initially set to 0. If a NULL is produced, however, the indicator variable is set to a negative number. You can then test the indicator variable to find out whether a NULL was encountered. Suppose that the city and comm fields of the Salespeople table do not have the NOT NULL constraint, and that we have declared two Pascal integer variables, i_a and i_b, in the SQL declare section. (Nothing in the declare section marks them as indicator variables; they become indicator variables when they are used as such.) Here is one possibility:

EXEC SQL OPEN CURSOR High_cust;
while SQLCODE = 0 do
   begin
   EXEC SQL FETCH High_cust
      INTO :id_num, :salesperson, :loc:i_a, :comm INDICATOR :i_b;
   if (i_a >= 0) and (i_b >= 0) then
      {no NULLs produced}
      EXEC SQL UPDATE Salespeople
         SET comm = comm + .01
         WHERE CURRENT OF High_cust;
   else
      {one or both NULL}
      begin
      if i_a < 0 then
         writeln ('salesperson ', id_num, ' has no city');
      if i_b < 0 then
         writeln ('salesperson ', id_num, ' has no commission');
      end; {else}
   end; {while}
EXEC SQL CLOSE CURSOR High_cust;

As you can see, we included the keyword INDICATOR in one case and omitted it in the other, to show that the effect is the same either way. Every row is fetched, but the UPDATE command is executed only if no NULLs are found. If NULLs are found, the else part of the program runs and prints a warning message for each place where a NULL was found. Note that indicator variables must be tested in the host language, as shown above, and not in the WHERE clause of an SQL command. The latter is not prohibited in principle, but the results are often unexpected.
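The same test-before-update pattern can be sketched in Python with the standard sqlite3 module. Python has no indicator variables: sqlite3 returns None for a NULL, so the helper indicator() below is a hypothetical stand-in that maps None to a negative number. The table contents are invented, and the update is done by primary key because SQLite has no WHERE CURRENT OF.

```python
import sqlite3

def indicator(value):
    """Mimic an indicator variable: negative if the fetched value was NULL, else 0."""
    return -1 if value is None else 0

def update_or_report(conn):
    """Raise comm for rows without NULLs; collect a warning for each NULL found."""
    messages = []
    rows = conn.execute(
        "SELECT snum, city, comm FROM Salespeople ORDER BY snum"
    ).fetchall()
    for snum, city, comm in rows:
        i_a, i_b = indicator(city), indicator(comm)
        if i_a >= 0 and i_b >= 0:  # no NULLs produced
            conn.execute(
                "UPDATE Salespeople SET comm = comm + 0.01 WHERE snum = ?", (snum,)
            )
        else:                      # one or both NULL
            if i_a < 0:
                messages.append(f"salesperson {snum} has no city")
            if i_b < 0:
                messages.append(f"salesperson {snum} has no commission")
    return messages

# Hypothetical sample data, not taken from the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Salespeople (snum INTEGER PRIMARY KEY, city TEXT, comm REAL)")
conn.executemany(
    "INSERT INTO Salespeople VALUES (?, ?, ?)",
    [(1001, "London", 0.12), (1002, None, 0.13), (1003, "Rome", None)],
)
messages = update_or_report(conn)
first_comm = conn.execute("SELECT comm FROM Salespeople WHERE snum = 1001").fetchone()[0]
```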

USING THE INDICATOR VARIABLE TO EMULATE NULL SQL VALUES

Another possibility is to treat the indicator variable associated with each host variable in a way that emulates the behavior of SQL NULLs. Whenever you use one of these values in your program, for example in an if ... then statement, you can first check the associated indicator variable to see whether the value is NULL. If it is, you treat the variable differently. For example, if a NULL was retrieved from the city field into the host variable city, which is associated with the indicator variable i_city, you could set city to a string of blanks. This matters only if you are going to print it; its value should make no difference to your program logic. Naturally, i_city is automatically set to a negative value. Suppose your program contained the following construct:

if city = 'London' then
   comm := comm + .01
else
   comm := comm - .01;

Any value placed in the variable city either equals 'London' or does not. Consequently, the commission is either increased or decreased in every case. The equivalent commands in SQL, however, behave differently:

EXEC SQL UPDATE Salespeople
   SET comm = comm + .01
   WHERE city = 'London';

and

EXEC SQL UPDATE Salespeople
   SET comm = comm - .01
   WHERE city <> 'London';

(The Pascal version works on a single value, while the SQL version works on the whole table.) If the value of city in the SQL version is NULL, both predicates are unknown, and the value of comm is therefore not changed in either case. You can use the indicator variable to make your host-language code behave consistently with this by adding a condition that excludes NULLs:

if i_city >= 0 then
   begin
   if city = 'London' then
      comm := comm + .01
   else
      comm := comm - .01;
   end;
{begin and end are needed here only for clarity}
In more complex programs, you may want to set a Boolean variable to true to indicate that the value of city is NULL. You can then simply test that variable whenever you need to.
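The difference between the two-way host-language test and SQL's three-valued logic can be checked directly. This is a small sketch using Python's standard sqlite3 module with invented rows: one salesperson in London, one elsewhere, and one whose city is NULL. The function adjust() is a hypothetical host-side equivalent in which None plays the role of a negative indicator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Salespeople (snum INTEGER PRIMARY KEY, city TEXT, comm REAL)")
conn.executemany(
    "INSERT INTO Salespeople VALUES (?, ?, ?)",
    [(1001, "London", 0.10), (1002, "Rome", 0.10), (1003, None, 0.10)],
)

# Each predicate is unknown for the NULL city, so neither UPDATE touches that row.
raised = conn.execute(
    "UPDATE Salespeople SET comm = comm + 0.01 WHERE city = 'London'"
).rowcount
lowered = conn.execute(
    "UPDATE Salespeople SET comm = comm - 0.01 WHERE city <> 'London'"
).rowcount
null_comm = conn.execute(
    "SELECT comm FROM Salespeople WHERE city IS NULL"
).fetchone()[0]

def adjust(city, comm):
    """Host-side equivalent: leave comm alone when city is NULL (None)."""
    if city is None:
        return comm
    return comm + 0.01 if city == "London" else comm - 0.01
```

Each UPDATE affects exactly one row, and the row with the NULL city keeps its original commission, which is what adjust() reproduces on the host side.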

OTHER USES OF THE INDICATOR VARIABLE

An indicator variable can also be used to assign a NULL. Simply append it to the host variable name in an UPDATE or INSERT command, in the same way as in a SELECT command. If the indicator variable has a negative value, a NULL is placed in the field. For example, the following command places NULLs in the city and comm fields of the Salespeople table whenever the indicator variables i_a or i_b are negative; otherwise, it places the values of the host variables there:

EXEC SQL INSERT INTO Salespeople
   VALUES (:id_num, :salesperson, :loc:i_a, :comm:i_b);

The indicator variable is also used to signal string truncation. This happens when you place an SQL character value into a host variable that is not long enough to hold all the characters. It is a particular problem with the nonstandard data types VARCHAR and LONG (see Appendix C). In this case, the variable is filled with the leading characters of the string, and the trailing characters are lost. If an indicator variable is used, it is set to a positive value indicating the length of the string before truncation, which lets you work out how many characters were lost. In this case, you would test whether the indicator variable is > 0 rather than < 0.
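Assigning a NULL through an indicator can be imitated in Python, where None is stored as NULL by the standard sqlite3 module. In this sketch the helper host_value() is hypothetical: it returns None when the indicator is negative and the host value otherwise. The sample values are invented, and truncation is not modeled.

```python
import sqlite3

def host_value(value, ind):
    """Return None (stored as NULL) when the indicator is negative, else the value."""
    return None if ind < 0 else value

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Salespeople (snum INTEGER PRIMARY KEY, sname TEXT, city TEXT, comm REAL)"
)

# Hypothetical host variables and their indicators.
id_num, salesperson, loc, comm = 1004, "Motika", "London", 0.11
i_a, i_b = -1, 0  # i_a is negative, so city is stored as NULL

conn.execute(
    "INSERT INTO Salespeople VALUES (?, ?, ?, ?)",
    (id_num, salesperson, host_value(loc, i_a), host_value(comm, i_b)),
)
stored = conn.execute(
    "SELECT city, comm FROM Salespeople WHERE snum = ?", (id_num,)
).fetchone()
```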

SUMMARY

SQL commands are embedded in procedural languages to combine the strengths of the two approaches. Some extensions to SQL are needed to make this work. Embedded SQL commands are translated by a program called a precompiler into a form that the host-language compiler can use: procedure calls to subroutines, called access modules, that the precompiler creates. ANSI supports embedded SQL in the following languages: Pascal, FORTRAN, COBOL, and PL/I. Other languages are also used, especially C. As a brief recap of embedded SQL, here are the most important points from this chapter:

* All embedded SQL commands begin with EXEC SQL and end in a manner that depends on the host language used.

* All host variables used in SQL commands must be declared in the SQL declare section before they are used.

* All host variables must be preceded by a colon when used in an SQL command.

* Queries can store their output directly in host variables using an INTO clause if, and only if, they select a single row.

* Cursors can be used to store the output of a query and access it one row at a time. Cursors are declared (which defines the query they will contain), opened (which executes the query), and closed (which removes the query output from the cursor). While a cursor is open, the FETCH command is used to advance it to each row of the query output in turn.

* Cursors are either updatable or read-only. To be updatable, a cursor must satisfy all the criteria that an updatable view satisfies; furthermore, it must not use ORDER BY or UNION, which views cannot use anyway. A cursor that is not updatable is read-only.

* If a cursor is updatable, it can be used to determine which rows are affected by embedded UPDATE and DELETE commands through the WHERE CURRENT OF clause. The DELETE or UPDATE must operate on the table that the cursor accesses in its query.

* SQLCODE must be declared as a numeric variable in every program that uses embedded SQL. Its value is set automatically after each SQL command is executed.

* If an SQL command executed normally but produced no output or made no change to the database where one was expected, SQLCODE = 100. If the command produced an error, SQLCODE equals some implementation-defined negative number that describes the error. Otherwise, SQLCODE = 0.

* The WHENEVER clause can be used to define the action to take when SQLCODE = 100 (NOT FOUND) or when SQLCODE is a negative number (SQLERROR). The action can be either a jump to a specific label in the program (GOTO

The SQL language is used to retrieve data from a database. SQL is a programming language that closely resembles English but is designed for database management programs. Every query in Access uses SQL.

Understanding how SQL works helps you create more accurate queries and makes it easier to correct queries that return incorrect results.

This is an article from a series of articles about the SQL language for Access. It describes the basics of using SQL to retrieve data and provides examples of SQL syntax.


What is SQL?

SQL is a programming language designed to work with sets of facts and the relationships between them. Relational database management programs such as Microsoft Office Access use SQL to manipulate data. Unlike many programming languages, SQL is readable and understandable even for beginners. Like many programming languages, SQL is an international standard recognized by standards committees such as ISO and ANSI.

Data sets are described in SQL to help answer questions. When using SQL, you must use the correct syntax. Syntax is a set of rules that allow the elements of a language to be combined correctly. SQL syntax is based on English syntax and shares many elements with Visual Basic for Applications (VBA) syntax.

For example, a simple SQL statement that retrieves a list of last names for contacts named Mary might look like this:

SELECT Last_Name
FROM Contacts
WHERE First_Name = "Mary";
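To try this statement outside Access, here is a minimal sketch using Python's standard sqlite3 module. The table layout and rows are assumptions made for illustration, and the string literal is written with single quotes, which is the portable SQL form (Access also accepts double quotes).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Contacts (First_Name TEXT, Last_Name TEXT)")
conn.executemany(
    "INSERT INTO Contacts VALUES (?, ?)",
    [("Mary", "Smith"), ("John", "Brown"), ("Mary", "Jones")],
)

# The query has no ORDER BY, so the result is sorted in Python for a stable listing.
last_names = sorted(
    row[0]
    for row in conn.execute(
        "SELECT Last_Name FROM Contacts WHERE First_Name = 'Mary'"
    )
)
```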

Note: The SQL language is used not only to perform operations on data, but also to create and change the structure of database objects, such as tables. The part of SQL that is used to create and modify database objects is called DDL. DDL is not covered in this article. For more information, see Create or modify tables or indexes using a data definition query.

SELECT statements

The SELECT statement is used to describe a set of data in SQL. It contains a complete description of the set of data that needs to be retrieved from the database, including the following:

    tables that contain data;

    relationships between data from different sources;

    fields or calculations based on which data is selected;

    selection conditions that must be met by the data included in the query result;

    necessity and method of sorting.

SQL clauses

An SQL statement is made up of several parts called clauses. Each clause in an SQL statement has a purpose. Some clauses are required. The table below shows the most commonly used SQL clauses.

SQL clause | Description | Mandatory
SELECT | Defines the fields that contain the required data. | Yes
FROM | Defines the tables that contain the fields specified in the SELECT clause. | Yes
WHERE | Defines the field selection conditions that all records included in the results must meet. | No
ORDER BY | Determines the sort order of the results. | No
GROUP BY | In an SQL statement that contains aggregate functions, specifies the fields for which a summary value is not calculated in the SELECT clause. | Only if such fields are present
HAVING | In an SQL statement that contains aggregate functions, defines the conditions that apply to the fields for which a summary value is calculated in the SELECT clause. | No

SQL terms

Each SQL clause consists of terms that can be compared to parts of speech. The table below shows the types of SQL terms.

SQL term | Comparable part of speech | Definition | Example
identifier | noun | A name used to identify a database object, such as a field name. | Clients.[Phone Number]
operator | verb or adverb | A keyword that represents or modifies an action. |
constant | noun | A value that does not change, such as a number or NULL. |
expression | adjective | A combination of identifiers, operators, constants, and functions designed to calculate a single value. | >= Products.[Price]

Basic SQL Clauses: SELECT, FROM, and WHERE

The general format of SQL statements is:

SELECT field_1
FROM table_1
WHERE criterion_1
;

Notes:

    Access ignores line breaks in SQL statements. Even so, it is recommended to start each clause on a new line so that the SQL statement is easy to read both for the person who wrote it and for everyone else.

    Every SELECT statement ends with a semicolon (;). The semicolon can appear either at the end of the last clause or on a separate line at the end of the SQL statement.

Example in Access

The example below shows what an SQL statement for a simple select query might look like in Access.

SELECT [Email Address], Company
FROM Contacts
WHERE City="Seattle";

The statement consists of three clauses:

1. SELECT clause

2. FROM clause

3. WHERE clause

Let's look at the example clause by clause to understand how SQL syntax works.

SELECT clause

SELECT [Email Address], Company

This is a SELECT clause. It contains an operator (SELECT) followed by two identifiers ([Email Address] and Company).

If an identifier contains spaces or special characters (for example, "Email Address"), it must be enclosed in square brackets.

The SELECT clause does not need to name the tables that contain the fields, and it cannot specify selection conditions that the data included in the results must meet.

In a SELECT statement, the SELECT clause always comes before the FROM clause.

FROM clause

FROM Contacts

This is a FROM clause. It contains an operator (FROM) followed by an identifier (Contacts).

The FROM clause does not specify the fields to select.

WHERE clause

WHERE City="Seattle"

This is a WHERE clause. It contains an operator (WHERE) followed by an expression (City="Seattle").
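The three clauses discussed above can be run together as one statement. The sketch below uses Python's standard sqlite3 module rather than Access; SQLite also accepts square brackets around identifiers that contain spaces. The Contacts rows are invented, and the string literal uses single quotes for portability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Contacts ([Email Address] TEXT, Company TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Contacts VALUES (?, ?, ?)",
    [
        ("ann@example.com", "Contoso", "Seattle"),
        ("bob@example.com", "Fabrikam", "Portland"),
    ],
)

rows = conn.execute(
    "SELECT [Email Address], Company "  # which fields to return
    "FROM Contacts "                    # which table holds them
    "WHERE City = 'Seattle'"            # which records qualify
).fetchall()
```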

There are many things you can do with SELECT, FROM, and WHERE clauses.

Sorting results: ORDER BY

Like Microsoft Excel, Access allows you to sort the results of a query in a table. By using the ORDER BY clause, you can also specify how the results are sorted when the query is executed. If an ORDER BY clause is used, it must appear at the end of the SQL statement.

The ORDER BY clause contains a list of fields to sort, in the same order in which the sort will be applied.

For example, suppose you first want to sort the results by the Company field in descending order, and then, if there are records with the same Company field value, sort them by the Email Address field in ascending order. The ORDER BY clause would look like this:

ORDER BY Company DESC, [Email Address]

Note: By default, Access sorts values in ascending order (A to Z, smallest to largest). To sort the values in descending order instead, you must specify the DESC keyword.
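The two-level sort described above (Company descending, then Email Address ascending) can be checked with a short sketch in Python's standard sqlite3 module. The rows are invented and deliberately inserted out of order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Contacts ([Email Address] TEXT, Company TEXT)")
conn.executemany(
    "INSERT INTO Contacts VALUES (?, ?)",
    [
        ("bob@example.com", "Contoso"),
        ("cat@example.com", "Fabrikam"),
        ("ann@example.com", "Contoso"),
    ],
)

# DESC applies only to Company; [Email Address] uses the default ascending order.
ordered = conn.execute(
    "SELECT Company, [Email Address] FROM Contacts "
    "ORDER BY Company DESC, [Email Address]"
).fetchall()
```

Fabrikam sorts before Contoso because of DESC, and the two Contoso rows are ordered by address.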

For more information about the ORDER BY clause, see the ORDER BY clause article.

Working with summary data: GROUP BY and HAVING clauses

Sometimes you need to work with summary data, such as total sales for the month or the most expensive items in stock. To do this, the SELECT clause applies an aggregate function to the field. For example, if you were to run a query to get the number of email addresses for each company, the SELECT clause might look like this:

SELECT COUNT([Email Address]), Company

The ability to use a particular aggregate function depends on the type of data in the field and the desired expression. For more information about available aggregate functions, see SQL Statistical Functions.

Specifying fields that are not used in an aggregate function: GROUP BY clause

When using aggregate functions, you usually need to create a GROUP BY clause. The GROUP BY clause specifies all fields to which the aggregate function does not apply. If aggregate functions apply to all fields in the query, you do not need to create a GROUP BY clause.

The GROUP BY clause must immediately follow the WHERE clause, or the FROM clause if there is no WHERE clause. The GROUP BY clause lists the fields in the same order as the SELECT clause.

Let's continue the previous example. In the SELECT clause, if the aggregate function applies only to the [Email Address] field, then the GROUP BY clause would look like this:

GROUP BY Company
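Put together with a SELECT and FROM clause, the grouping can be tried out as follows. This is a minimal sketch in Python's standard sqlite3 module with invented Contacts rows; the result is collected into a dictionary because the statement has no ORDER BY and the row order is therefore not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Contacts ([Email Address] TEXT, Company TEXT)")
conn.executemany(
    "INSERT INTO Contacts VALUES (?, ?)",
    [
        ("ann@example.com", "Contoso"),
        ("bob@example.com", "Contoso"),
        ("cat@example.com", "Fabrikam"),
    ],
)

# Company is not aggregated, so it must appear in GROUP BY.
rows = conn.execute(
    "SELECT COUNT([Email Address]), Company FROM Contacts GROUP BY Company"
).fetchall()
counts = {company: n for n, company in rows}
```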

For more information about the GROUP BY clause, see the GROUP BY clause article.

Restricting aggregated values using grouping conditions: the HAVING clause

If you need to specify conditions to limit the results, but the field to which you want to apply them is used in an aggregate function, you cannot use a WHERE clause. The HAVING clause should be used instead. The HAVING clause works the same as the WHERE clause, but is used for aggregated data.

For example, suppose that the COUNT function (which returns the number of rows) is applied to the first field in the SELECT clause:

SELECT COUNT([Email Address]), Company

If you want to limit query results based on the value of the COUNT function, you cannot apply a selection condition to this field in the WHERE clause. Instead, the condition should be placed in the HAVING clause. For example, if you want your query to return rows only if a company has multiple email addresses, you can use the following HAVING clause:

HAVING COUNT([Email Address])>1

Note: A query can include both a WHERE clause and a HAVING clause. Selection conditions for fields that are not used in aggregate functions go in the WHERE clause, and conditions for fields that are used in aggregate functions go in the HAVING clause.
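The division of labor between WHERE and HAVING can be seen in one statement. The sketch below uses Python's standard sqlite3 module with invented rows; the City column and the Seattle filter are assumptions added to give the WHERE clause something to do. WHERE filters individual records before grouping, and HAVING filters the groups afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Contacts ([Email Address] TEXT, Company TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Contacts VALUES (?, ?, ?)",
    [
        ("a1@example.com", "Contoso", "Seattle"),
        ("a2@example.com", "Contoso", "Seattle"),
        ("b1@example.com", "Fabrikam", "Seattle"),
        ("c1@example.com", "Northwind", "Portland"),
        ("c2@example.com", "Northwind", "Portland"),
    ],
)

rows = conn.execute(
    "SELECT COUNT([Email Address]), Company "
    "FROM Contacts "
    "WHERE City = 'Seattle' "            # condition on a non-aggregated field
    "GROUP BY Company "
    "HAVING COUNT([Email Address]) > 1"  # condition on the aggregated value
).fetchall()
```

Northwind has two addresses but is excluded by WHERE; Fabrikam passes WHERE but is excluded by HAVING.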

For more information about the HAVING clause, see the HAVING clause article.

Combining query results: UNION operator

The UNION operator is used to simultaneously view all the data returned by multiple similar select queries as a combined set.

The UNION operator allows you to combine two SELECT statements into one. The SELECT statements being merged must have the same number and order of output fields with the same or compatible data types. When a query is executed, the data from each set of matching fields is combined into a single output field, so the query output has as many fields as each individual SELECT statement.

Note: In union queries, numeric and text data types are compatible.

Using the UNION operator, you can specify whether duplicate rows, if any, should be included in the query results. To do this, use the ALL keyword.

A query to combine two SELECT statements has the following basic syntax:

SELECT field_1
FROM table_1
UNION
SELECT field_a
FROM table_a
;

For example, suppose you have two tables called "Products" and "Services". Both tables contain fields with the name of the product or service, price and warranty information, as well as a field that indicates the exclusivity of the product or service offered. Although the Products and Services tables provide different types of warranties, the basic information is the same (whether individual products or services are warranted). You can use the following union query to combine the four fields from the two tables:

SELECT name, price, warranty_available, exclusive_offer
FROM Products
UNION ALL
SELECT name, price, guarantee_available, exclusive_offer
FROM Services
;
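The effect of the ALL keyword is easiest to see when the two tables share an identical row. The sketch below uses Python's standard sqlite3 module; the column types and the sample rows are assumptions made for illustration. UNION ALL keeps every row, while plain UNION removes duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (name TEXT, price REAL, warranty_available INTEGER, exclusive_offer INTEGER);
    CREATE TABLE Services (name TEXT, price REAL, guarantee_available INTEGER, exclusive_offer INTEGER);
    INSERT INTO Products VALUES ('Support plan', 10.0, 1, 0), ('Widget', 5.0, 1, 0);
    INSERT INTO Services VALUES ('Support plan', 10.0, 1, 0);
""")

def combine(operator):
    """Run the union query with either 'UNION' or 'UNION ALL'."""
    return conn.execute(
        "SELECT name, price, warranty_available, exclusive_offer FROM Products "
        f"{operator} "
        "SELECT name, price, guarantee_available, exclusive_offer FROM Services"
    ).fetchall()

all_rows = combine("UNION ALL")   # duplicates kept
distinct_rows = combine("UNION")  # duplicates removed
```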

For more information about combining SELECT statements using the UNION operator, see