
1. Introduction. History of database development

2. Files and file systems

3. The first stage - databases on mainframe computers

4. The second stage - the era of personal computers

5. Third stage - distributed databases

6. The fourth stage - prospects for the development of database management systems

7. MySQL DBMS data types

a. Numeric types

b. Text data types

c. Date and Time Types

8. Prospects for the development of network databases

Bibliography

1. Introduction. History of database development

In the history of computer technology, one can trace the development of two main areas of its use. The first area is the use of computer technology to perform numerical calculations that are too time-consuming or impossible to perform manually. The development of this area stimulated methods for the numerical solution of complex mathematical problems, the emergence of programming languages oriented toward the convenient recording of numerical algorithms, and the formation of feedback with the developers of new computer architectures. A characteristic feature of this area of application of computer technology is the presence of complex processing algorithms applied to data that is simple in structure and relatively small in volume.

The second area is the use of computer technology in automatic or automated information systems. An information system is a hardware and software complex that provides the following functions:

reliable storage of information in computer memory;

execution of application-specific transformations of information and calculations;

providing users with a convenient and easy-to-learn interface.

Typically, such systems deal with large volumes of information with a fairly complex structure. Classic examples of information systems are banking systems, automated enterprise management systems, and reservation systems for air and railway tickets, hotel rooms, etc.

The second area of using computer technology arose somewhat later than the first. This is because at the dawn of computing, the ability of computers to store information was very limited. Reliable, long-term storage of information is possible only with storage devices that retain information after the power supply is turned off. RAM (main memory) usually does not have this property. The first computers used two types of external memory devices: magnetic tapes and drums. The capacity of magnetic tapes was quite large, but by their physical nature they provided only sequential access to data. Magnetic drums (the closest relatives of modern magnetic disks with fixed heads) allowed random access to data, but had a limited storage capacity.

These limitations were not too significant for purely numerical calculations. Even if a program must process (or produce) a large amount of information, the programmer can plan the layout of this information in external memory (for example, on a sequential magnetic tape) to ensure that the program runs efficiently. In information systems, however, a set of interrelated information objects actually reflects a model of real-world objects, and users' need for information that adequately reflects the state of those objects demands a relatively quick response from the system. Here, relatively slow storage devices such as magnetic tapes and drums were insufficient.

It can be assumed that it was the requirements of non-numerical applications that brought about removable magnetic disks with movable heads, a revolution in the history of computing. These external memory devices had a significantly larger capacity than magnetic drums, provided satisfactory data access speed in random access mode, and the ability to change the disk pack on the device made it possible to keep an almost unlimited data archive.

With the advent of magnetic disks, the history of data management systems in external memory began. Previously, each application program that needed to store data in external memory itself determined the location of each piece of data on a magnetic tape or drum and performed exchanges between RAM and external memory devices using low-level software and hardware facilities (machine commands or calls to the corresponding operating system routines). This mode of operation makes it very difficult, if not impossible, to maintain several archives of long-term information on one external storage medium. In addition, each application program had to solve for itself the problems of naming pieces of data and structuring data in external memory.

2. Files and file systems

An important step in the development of information systems was the transition to the use of centralized file management systems. From the application program's point of view, a file is a named area of external memory that can be written to and from which data can be read. The rules for naming files, how the data stored in a file is accessed, and the structure of that data depend on the particular file management system and possibly on the file type. The file management system takes care of allocating external memory, mapping file names to corresponding addresses in external memory, and providing access to data.

Such systems are sometimes called file systems. Despite the relative simplicity of organization, file systems have a number of disadvantages:

Data redundancy. File systems are characterized by significant redundancy, since the same data is often stored in different files in order to solve different management problems. Because of this duplication of data across files, memory on external storage devices is used uneconomically, and information about a single management object is scattered across many files. At the same time, it is quite difficult to form a general information model of the subject area.

Data inconsistency. Given that the same information can be placed in different files, it is technologically difficult to ensure that changes are made simultaneously in all of them. Because of this, data inconsistency may occur, when the same field in different files has different values.

Dependence of data structures and application programs. With file organization, the logical and physical structure of a file must match its description in the application program, and the application program must be modified whenever the logical or physical structure of the file changes. Since changes to one program often require changes to other, information-related programs, it is sometimes easier to create a new program than to modify the old one. This disadvantage of file systems therefore leads to a significant increase in software maintenance costs, which can sometimes approach 70% of development costs.

Users see the file as a linear sequence of records and can perform a number of standard operations on it:

create a file (of the required type and size);

write a new record to the file in place of the current record, or add a new record to the end of the file.

In different file systems these operations could differ slightly, but their general meaning was the same. The main thing to note is that the structure of a file's records was known only to the program that worked with it; the file management system did not know it. Therefore, in order to extract some information from a file, it was necessary to know the structure of the file record exactly, down to the bit. Each program working with a file had to contain an internal data structure corresponding to the structure of that file. Hence, when the file structure changed, the structure of the program had to be changed too, and this required recompilation, that is, translating the program into executable machine code anew. This situation was characterized as the dependence of programs on data. Information systems are characterized by a large number of different users (programs), each with its own specific algorithms for processing the information stored in the same files. A change to the file structure needed by one program required correcting, recompiling and additionally debugging all the other programs working with the same file. This was the first significant drawback of file systems, and it gave the impetus for creating new systems for storing and managing information.

To illustrate, let us look at an example from the book: W. Davis, Operating Systems, M., Mir, 1980:

“Several years ago, the Postal Department (with the best of intentions) decided that all addresses must include a postal code. In many data centers, this seemingly minor change had dire consequences. Adding a new six-character field to the address meant that every program that used this data had to be changed in accordance with the new total record length. The fact that a program did not need to know the postal code to perform its functions was not taken into account: if a program read the new, longer record, it was modified to provide the additional memory space.

Under automated management of a centralized database, all such changes are handled by the database control program. Programs that do not use postal code values do not need to be modified: they are still sent the same data elements in accordance with their requests. In such cases, the change is imperceptible. Only those programs that use the new data element need to be modified.”

Further, since file systems are shared storage for files belonging, generally speaking, to different users, file management systems must provide authorization of access to files. In general, the approach is that, for each registered user of a given computing system and for each existing file, the actions that are allowed or prohibited for this user are indicated. Most modern file management systems use the file protection approach pioneered by UNIX. In this OS, each registered user is associated with a pair of integer identifiers: the identifier of the group to which the user belongs and his own identifier within the group. For each file, the full identifier of the user who created it is stored, along with a record of what actions its creator can perform with the file, what actions are available to other users of the same group, and what users of other groups can do with the file. Administration of a file's access mode is mainly carried out by its creator-owner. For the many files reflecting the information model of one subject area, such a decentralized access control principle caused additional difficulties, and the lack of centralized methods for managing access to information was another reason for the development of DBMSs.

The next reason was the need to ensure efficient parallel work of many users with the same files. In general, file management systems provided multi-user access. If the operating system supports multi-user mode, it is quite possible for two or more users to try to work with the same file simultaneously. If all users are only going to read the file, nothing bad will happen. But if at least one of them changes the file, mutual synchronization of their actions with respect to the file is required for these users to work correctly.

File management systems typically took the following approach. In the operation of opening a file (the first and mandatory operation with which a session of working with a file begins), among other parameters, the operating mode (reading or changing) was indicated. If, by the time some user process PR1 performed this operation, the file had already been opened by another process PR2 in change mode, then, depending on the features of the system, process PR1 was either informed that the file could not be opened, or it was blocked until process PR2 performed the file-close operation.

With this method of organization, the simultaneous work of several users associated with modifying data in a file was either not implemented at all or was very slow.

These shortcomings served as the impetus that forced information system developers to propose a new approach to information management. This approach was implemented within the framework of new software systems, later called Database Management Systems (DBMS), and the information repositories themselves, which worked under the control of these systems, were called databases or data banks (DB and BnD).

3. The first stage - databases on mainframe computers

The history of DBMS development goes back more than 30 years. In 1968, the first industrial DBMS, IMS from IBM, was put into operation. In 1975, the first standard of the Conference on Data Systems Languages (CODASYL) appeared, defining a number of fundamental concepts in the theory of database systems that are still fundamental for network data models.

A major further contribution to database theory was made by the American mathematician E. F. Codd, the creator of the relational data model. In 1981, for his creation of the relational model and relational algebra, E. F. Codd received the prestigious Turing Award of the Association for Computing Machinery (ACM).

Less than two decades have passed since then, but the rapid development of computer technology, the change in its fundamental role in the life of society, the boom in personal computers and, finally, the emergence of powerful workstations and computer networks have all influenced the development of database technology. Four stages can be distinguished in the development of this area of data processing. It should be noted that there are no strict time boundaries between these stages: they smoothly transition from one to another and even coexist in parallel; nevertheless, distinguishing them makes it possible to characterize the individual stages of the development of database technology more clearly and to highlight the specific features of each.

The first stage of DBMS development is associated with the organization of databases on large machines such as the IBM 360/370, ES computers and minicomputers such as the PDP-11 (Digital Equipment Corporation, DEC) and various HP (Hewlett-Packard) models.

The databases were stored in the external memory of the central computer; the users of these databases were tasks run mainly in batch mode. Interactive access was provided through console terminals, which had no computing resources of their own (processor, external memory) and served only as input/output devices for the central computer. Programs that accessed the databases were written in various languages and were run like ordinary numerical programs. Powerful operating systems provided quasi-parallel execution of the entire set of tasks. These systems could be classified as distributed access systems, because the database was centralized, stored on the external memory devices of one central computer, and access to it was supported by many user tasks.

The features of this stage of development are expressed as follows:

All DBMSs are based on powerful multiprogramming operating systems (MVS, SVM, RTE, OSRV, RSX, UNIX), so they mainly support working with a centralized database in distributed access mode.

Resource allocation management functions are primarily performed by the operating system (OS).

Low-level data manipulation languages ​​focused on navigational methods of data access are supported.

Data administration plays a significant role.

Serious work is being done to substantiate and formalize the relational data model, and the first system (System R) has been created to implement the ideology of the relational data model.

Theoretical work is being carried out to optimize queries and manage distributed access to a centralized database, and the concept of a transaction has been introduced.

The results of scientific research are openly discussed in the press, there is a strong flow of publicly available publications concerning all aspects of database theory and practice, and the results of theoretical research are actively being implemented in commercial DBMSs.

The first high-level languages for working with the relational data model appear. However, there are as yet no standards for these first languages.

4. The second stage - the era of personal computers

Personal computers quickly burst into our lives and literally transformed our understanding of the place and role of computing technology in the life of society. Computers became closer and more accessible to every user. The reverent fear of ordinary users before incomprehensible and complex programming languages disappeared. Many programs appeared that were designed for untrained users and were easy to use and intuitive: first of all, various text editors, spreadsheets and others. Operations such as copying files, transferring information from one computer to another, and printing texts, tables and other documents became simple and clear. System programmers were relegated to the background. Each user could feel like the complete master of this powerful and convenient device, which made it possible to automate many aspects of everyday activity.

And, of course, this also affected work with databases. Programs appeared that were called database management systems and made it possible to store significant amounts of information; they had a convenient interface for entering data and built-in tools for generating various reports. These programs made it possible to automate many accounting functions that had previously been carried out manually. The constant reduction in prices for personal computers made them accessible not only to organizations and firms, but also to individual users. Computers became a tool for record keeping and personal accounting.

All this played both a positive and a negative role in the field of database development. The apparent simplicity and accessibility of personal computers and their software spawned many amateurs. These developers, considering themselves experts, began to design short-lived databases that did not take into account many of the features of real-world objects. Many fly-by-night systems were created that did not conform to the laws of development and interconnection of real objects. However, the availability of personal computers forced users from many areas of knowledge who had not previously used computer technology in their work to turn to it. The demand for convenient, well-developed data processing programs forced software suppliers to deliver more and more new systems, commonly called desktop DBMSs. Significant competition among suppliers forced them to improve these systems, offering new capabilities, improving interfaces and performance, and reducing cost. The presence on the market of a large number of DBMSs performing similar functions required the development of methods for exporting and importing data between these systems and the opening up of data storage formats.

But even during this period, amateurs appeared who, contrary to common sense, developed their own DBMSs using standard programming languages. This was a dead-end path, because further development showed that transferring data from non-standard formats into new DBMSs was much more difficult, and in some cases required such labor costs that it would have been easier to develop everything anew - yet the data still had to be transferred to a new, promising DBMS. This, too, was the result of underestimating the functions that a DBMS should perform.

The features of this stage are as follows:

All DBMSs were designed to create databases primarily with exclusive access. And this is understandable. The computer was personal, it was not connected to the network, and the database on it was created for use by one user. In rare cases, the sequential work of several users was assumed, for example, first an operator who entered accounting documents, and then a chief accountant who determined the transactions corresponding to the primary documents.

Most DBMSs had a developed and convenient user interface. Most offered an interactive mode of working with the database, both for describing the database and for designing queries. In addition, most DBMSs offered developed and convenient tools for building ready-made applications without programming. The tool environment consisted of ready-made application elements in the form of templates for screen forms, reports, labels, and graphical query builders, which could quite easily be assembled into a single complex.

All desktop DBMSs supported only the external level of presentation of the relational model, that is, only the external tabular view of data structures.

Despite the existence of high-level data manipulation languages such as relational algebra and SQL, desktop DBMSs supported only low-level data manipulation languages, at the level of individual table rows.

Desktop DBMSs lacked tools for maintaining the referential and structural integrity of the database. These functions were supposed to be performed by applications, but the paucity of application development tools sometimes prevented this; in that case, the functions had to be performed by the user, requiring additional control on the user's part when entering and changing information stored in the database.

The presence of an exclusive operating mode effectively led to the degeneration of database administration functions and, as a result, to the absence of database administration tools.

And finally, a last feature that remains very positive today is the relatively modest hardware requirements of desktop DBMSs. Quite functional applications developed, for example, in Clipper ran on a PC 286.

In principle, these systems can hardly even be called full-fledged DBMSs. Prominent representatives of this family are dBase (dBase III+, dBase IV), FoxPro, Clipper, and Paradox, which until recently were very widely used.

5. The third stage - distributed databases

It is well known that history develops in a spiral, so after the process of “personalization” the reverse process began - integration. The number of local networks is increasing, more and more information is transferred between computers, the problem of consistency of data stored and processed in different places, but logically connected to each other, is becoming acute; problems arise related to the parallel processing of transactions - sequences of operations on the database, transferring it from one consistent state into another consistent state. The successful solution of these problems leads to the emergence of distributed databases that retain all the advantages of desktop DBMSs and at the same time allow for parallel processing of information and support for database integrity.

Features of this stage:

Almost all modern DBMSs provide support for the full relational model, namely:

structural integrity - only data presented in the form of relations of the relational model are valid;

linguistic integrity, that is, high-level data manipulation languages (mainly SQL);

referential integrity - control of compliance with referential integrity throughout the entire operation of the system, with a guarantee from the DBMS that these constraints cannot be violated.

Most modern DBMSs are designed for a multi-platform architecture, that is, they can run on computers with different architectures and under different operating systems, while for users access to data managed by the DBMS on different platforms is practically indistinguishable.

The need to support multi-user work with the database and the possibility of decentralized data storage required the development of database administration tools with the implementation of the general concept of data protection tools.

The need for new implementations led to serious theoretical work on optimizing the implementation of distributed databases and on handling distributed transactions and queries, with the results obtained being implemented in commercial DBMSs.

In order not to lose customers who previously worked on desktop DBMSs, almost all modern DBMSs have tools for connecting client applications developed using desktop DBMSs, and tools for exporting data from desktop DBMS formats of the second stage of development.

This stage includes the development of a number of standards within the framework of data description and manipulation languages ​​(SQL89, SQL92, SQL99) and technologies for data exchange between various DBMSs, which include the ODBC (Open DataBase Connectivity) protocol proposed by Microsoft.

The beginning of work related to the concept of object-oriented databases (OODBs) can also be attributed to this stage. Representative DBMSs of the third stage are MS Access 97 and all modern database servers: Oracle 7.3, Oracle 8.4, MS SQL 6.5, MS SQL 7.0, System 10, System 11, Informix, DB2, SQL Base and other modern database servers, of which there are currently several dozen.

6. The fourth stage - prospects for the development of database management systems

This stage is characterized by the emergence of a new data access technology - the intranet. The main difference between this approach and client-server technology is that no specialized client software is needed. To work with a remote database, a standard Internet browser such as Microsoft Internet Explorer or Netscape Navigator is used, and for the end user the process of accessing data is similar to surfing the World Wide Web. Code embedded in the HTML pages downloaded by the user, usually written in Java, JavaScript, Perl and other languages, tracks all user actions and translates them into low-level SQL queries to the database, thus performing the work that the client program does in client-server technology. The convenience of this approach has led to it being used not only for remote access to databases, but also for users of an enterprise's local network. Simple data processing tasks that do not involve complex algorithms requiring coordinated changes to data in many interconnected objects can be built quite simply and efficiently with this architecture, and no additional client software is required to give a new user access to such a task. However, algorithmically complex tasks are best implemented in a client-server architecture with the development of special client software.

Each of the above approaches to working with data has its own advantages and disadvantages, which determine the scope of application of a particular method, and currently all approaches are widely used.

7. MySQL DBMS data types

All types of data that the MySQL DBMS works with can be divided into three large groups: numeric, text and date-time. Let's look at these data types in order.

a. Numeric types

Numeric column types are used to store numbers. All numeric types can be divided into two subtypes: types for exact numbers and types for floating-point numbers. All numeric types are characterized by the length of the numbers they store, and floating-point types also by the number of decimal places. These values are specified after the column type declaration, for example FLOAT(10, 2): here the length of the number is 10 characters, with two places after the decimal separator. Declarations of numeric types may also end with the keywords ZEROFILL and/or UNSIGNED. The UNSIGNED keyword means that the column can contain only positive numbers or zeros.

ZEROFILL - means that the number will be displayed with leading zeros.

NUMERIC or DECIMAL

These data types are identical, and DECIMAL can be shortened to DEC. They store exact fixed-point numbers, which is why they are usually used to store monetary values.

The INTEGER data type can be abbreviated to INT. It is simply an integer in a given range. This type is stored in 4 bytes and holds numbers from -2^31 to 2^31-1 (up to 2^32-1 for UNSIGNED columns). There are also several variants of the INTEGER type.

TINYINT - storage size is one byte; stores numbers from -128 to 127 (one bit holds the sign).

SMALLINT - two bytes (values from -32768 to 32767).

MEDIUMINT - three bytes.

BIGINT - the largest integer type, with a range of eight bytes.

FLOAT. These are normal-precision floating-point numbers (4 bytes). They can represent values with magnitudes from about 1.18x10^-38 to 3.4x10^38.

DOUBLE. Double-precision floating-point numbers (8 bytes), with a range of values of about plus or minus 10^308 (very large, in other words).
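As an illustration, here is a minimal sketch of a table declaration using the numeric types described above (the table and column names are hypothetical, chosen only for this example):

CREATE TABLE goods (
    id       INT UNSIGNED,      -- integer, non-negative values only
    quantity SMALLINT,          -- small integer, -32768 to 32767
    price    DECIMAL(10, 2),    -- exact fixed-point value, suitable for money
    weight   FLOAT(10, 2),      -- approximate floating-point value, 2 decimal places
    code     INT(6) ZEROFILL    -- displayed padded with leading zeros, e.g. 000042
);

DECIMAL is chosen for the monetary column because it stores exact values, while FLOAT and DOUBLE are approximate.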

b. Text data types

The CHAR type is used to store fixed-length strings. After the CHAR keyword, the length of the string is usually indicated, for example CHAR(50); if the length is not specified, it is taken to be one character. The maximum length of a field of this type is 255 characters. If the string passed to the column is shorter than the declared length, it is padded with spaces; if longer, it is truncated. When the value is retrieved, the trailing spaces are removed.

The VARCHAR type is designed to store variable-length strings. Just as with the previous type, VARCHAR is given a maximum string length, for example VARCHAR(30); longer strings passed to this column will be truncated.

The difference between the two types described is that retrieval is much faster for fixed-length rows. So if the speed of the database matters to you, the fixed-length type is preferable.

TEXT field types are used to store longer pieces of text than the previous types allow. The abbreviation BLOB stands for Binary Large OBject. The two types are the same except that string comparisons are case-sensitive in the BLOB type and case-insensitive in the TEXT type. Both are variable-length, and both have several variants:

TINYTEXT and TINYBLOB - can store up to 255 characters;

TEXT and BLOB - can store up to 64 kilobytes of information;

MEDIUMTEXT and MEDIUMBLOB - up to 16 megabytes;

LONGTEXT and LONGBLOB - up to 4 gigabytes.

ENUM. This type allows you to list a set of possible values for a field and stores exactly one value from the list provided, for example ENUM('m','a','z'). If you do not specify a default value for the field, the first value in the list is used.

SET. This type is similar to ENUM, but allows a field to store several values from the list at once.
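A minimal sketch combining the string types described above (all names are hypothetical):

CREATE TABLE articles (
    code   CHAR(10),                     -- fixed length, padded with spaces to 10 characters
    title  VARCHAR(100),                 -- variable length, up to 100 characters
    body   TEXT,                         -- up to 64 KB, case-insensitive comparisons
    photo  BLOB,                         -- up to 64 KB of binary data, case-sensitive comparisons
    status ENUM('draft','published'),    -- exactly one value; defaults to 'draft'
    tags   SET('news','sql','mysql')     -- any combination of the listed values
);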

c. Date and time types.

DATE. This type is used to store dates in the format YYYY-MM-DD.

TIME. Stores time in the format HH:MM:SS.

DATETIME. A combination of the previous two types; the format is YYYY-MM-DD HH:MM:SS.

TIMESTAMP. If no value is specified for a column of this type, the current time at which the row was created or last modified is substituted automatically; the value is displayed in DATETIME format.

YEAR. This field type holds a year value. Two lengths are possible: YEAR(2) and YEAR(4), for two and four digits of the year respectively. Note that with YEAR(2) the date range is taken to be from 1970 to 2069.
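A minimal sketch pulling the date and time types together (the table and column names are hypothetical):

CREATE TABLE events (
    event_day  DATE,       -- e.g. 2008-12-31
    starts_at  TIME,       -- e.g. 18:30:00
    created    DATETIME,   -- e.g. 2008-12-31 18:30:00
    modified   TIMESTAMP,  -- filled in automatically when the row is created or changed
    event_year YEAR(4)     -- e.g. 2008
);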

This concludes our consideration of the data types used in the MySQL DBMS. In the next article we will try to learn how to modify and delete tables, as well as optimize the operation of tables.

8. Prospects for the development of network databases

The term “next (or third) generation systems” came into use after a group of well-known database experts published the “Manifesto of Third Generation Database Systems.” Proponents of this direction adhere to the principle of evolutionary development of DBMS capabilities without radically disrupting previous approaches and maintaining continuity with previous generation systems.

Some of the requirements for next-generation systems simply amount to implementing long-known features that are missing from most current relational DBMSs (integrity constraints, triggers, database modification through views, etc.). The new requirements include a complete type system supported in the DBMS, support for type hierarchy and inheritance, the ability to manage complex objects, and so on.

One of the most famous third-generation DBMSs is the Postgres system, and the creator of this system, M. Stonebraker, is apparently the inspirer of the entire trend. Postgres implements many interesting features: it supports a temporal model of data storage and access and, in connection with this, completely revises the mechanisms for logging changes, rolling back transactions and recovering the database after failures; it provides a powerful integrity constraint mechanism; it supports non-normalized relations (work in this direction began in the Ingres environment), although in a rather peculiar way: a dynamically executed database query can be stored in a relation field.

One property of the Postgres system brings it closer to object-oriented DBMSs: Postgres allows abstract, user-defined data types to be stored in relation fields. This makes it possible to introduce a behavioral aspect into the database, i.e. it solves the same problem as OODBs, although, of course, the semantic capabilities of the Postgres data model are significantly weaker than those of object-oriented data models.

Although a DBMS can currently be assigned to one class or another only conditionally (for example, the object-oriented DBMS O2 is sometimes classified as a next-generation system), three trends can be noted in the field of next-generation DBMSs. In order not to invent names, we will denote them by the names of the most typical DBMSs.

1. Postgres direction. Main characteristic: maximum adherence (as far as possible, taking into account the new requirements) to the known principles of DBMS organization (except for the mentioned radical reworking of the external memory management system).

2. Exodus/Genesis direction. The main characteristic: the creation of not a system itself, but a generator of systems that most fully meet the needs of applications. The solution is achieved by creating sets of modules with standardized interfaces, and the idea extends down to the most basic layers of the system.

3. Starburst direction. Main characteristic: achieving system extensibility and adaptability to the needs of specific applications through the use of a standard rules management mechanism. In essence, the system is a certain interpreter of a system of rules and a set of action modules called in accordance with these rules. You can change sets of rules (there is a special language for specifying rules) or change actions by substituting other modules with the same interface.

In general, we can say that the next generation of DBMSs are direct descendants of relational systems.

Bibliography

1. Brown M., Honeycutt D. “HTML 3.2”, K., 2006

2. Vyukova N.I., Galatenko V.A., “Information security of database management systems”, DBMS No. 1 2001

3. Graber M., “SQL Reference Guide”, M., 2002

4. Date C.J. “An Introduction to Database Systems”, M., 1999

5. Dunaev S.B. “Intranet technologies.”, M., 1997

6. Kirillov V.V. “Structured Query Language (SQL)”, M., 1997

7. Kuznetsov S.D. “Fundamentals of modern databases”, K., 1999

8. Kuznetsov S.D. “Safety and integrity or, Your own worst enemy is yourself,” St. Petersburg, 1998

9. Meyer M. “Theory of relational databases”, M., 2006

10. CNIT NSU. “Use of WWW technologies to access databases”, N., 1997

11. Shpenik M., Sledge O. et al. “Microsoft SQL Server 7.0 Database Administrator's Guide”, M., 1999

12. "SQL Complete Guide" K., 2008



Basics of using a database

So let's start from the beginning. What is a database? A database is a collection of data organized in accordance with certain rules and maintained in computer memory, characterizing the current state of a certain subject area and used to satisfy the information needs of users (definition from Wikipedia).

Thus, the database includes:

    an interface for database management, called a DBMS (Database Management System);

    the actual data, stored in a specific form.

There are different types of databases. The main classification criterion is the principle of data storage:

    Hierarchical

    Relational

    Object-oriented

    Object

    Object-relational


Question 3. Distributed databases (DDB) - a set of logically interconnected databases distributed over a computer network.

Basic principles

A DDB consists of a set of nodes connected by a communication network, in which:

    each node is a full-fledged DBMS in itself;

    nodes interact with each other in such a way that a user of any of them can access any data on the network as if it were on his own node.

Each node is itself a database system. Any user can perform operations on data on his local node in the same way as if this node was not part of the distributed system at all. A distributed database system can be thought of as a partnership between separate local DBMSs on separate local nodes.

Fundamental principle for creating distributed databases (“Rule 0”): To the user, a distributed system should look the same as a non-distributed system.

The fundamental principle entails certain additional rules, or goals. There are twelve such goals:

    Local independence. Nodes in a distributed system must be independent, or autonomous. Local independence means that all operations on a node are controlled by that node.

No reliance on a central node. Local independence implies that all nodes in a distributed system should be treated as equals. Therefore, there should be no calls to a "central" or "master" node in order to obtain some centralized service.

Continuous operation. Distributed systems should provide a higher degree of reliability and availability.

Location independence. Users should not have to know where exactly the data is physically stored and should act as if all the data were stored on their own local node.

Fragmentation independence. A system supports fragmentation independence if a given relation variable can be divided into parts, or fragments, for physical storage. In this case, data can be stored where it is most often used, which allows most operations to be performed locally and reduces network traffic.

Replication independence. A system supports data replication if a given stored relation variable - or, in general, a given fragment of a given stored relation variable - can be represented by several separate copies, or replicas, stored on several separate nodes.

Distributed query processing. The point is that a query may need to contact multiple nodes, and in such a system there may be many possible ways to route data to satisfy the query in question.

Distributed transaction management. There are two main aspects of transaction management: recovery management and concurrency management. With regard to recovery, to ensure the atomicity of a transaction in a distributed environment the system must guarantee that the entire set of agents related to a given transaction (an agent is a process that runs for a given transaction on a separate node) has either committed its results or rolled them back. As for concurrency control, in most distributed systems it is based on a locking mechanism, just as in non-distributed systems.

    Hardware independence. It is desirable to be able to run the same DBMS on different hardware platforms and, moreover, to ensure that different machines participate in the operation of a distributed system as equal partners.

Operating system independence. The ability to run the DBMS under various operating systems.

    Network independence. The ability to support many fundamentally different nodes, differing in hardware and operating systems, as well as a number of different types of communication networks.

    Independence from the type of DBMS. It is necessary that the DBMS instances on different nodes all support the same interface, and it is not at all necessary that these are copies of the same version of the DBMS.

Types of distributed databases

    Distributed Databases

    Multidatabases with global schema. A multidatabase system is a distributed system that serves as an external interface for access to multiple local DBMSs or is structured as a global level above local DBMSs.

Federated databases. Unlike multidatabases, they do not have a global schema that all applications access. Instead, a local data import-export schema is maintained: each node maintains a partial global schema that describes information from those remote sources whose data is needed for its operation.

Multidatabases with a common access language - distributed management environments with client-server technology.

Question 4. Database design is a complex process of solving a number of problems associated with creating databases.

Main tasks of database design:

    Ensuring the ability to correctly obtain data for all queries;

    Ensuring that all necessary information is stored in the database;

    Reducing data redundancy and duplication;

    Ensuring the integrity of all data in the database and preventing its loss.

Main stages of database design:

    Infological (conceptual) design is the construction of a formalized model of the entire subject area. Such a model is created using standard language tools, most often graphical ones such as ER diagrams, and without any orientation toward a specific DBMS.

The main elements of this model:

    Description of all objects of the subject area and all connections between them;

    Description of all information needs of users, for example, description of the most basic database queries, etc.;

    Drawing up a complete description of document flow. Description of all documents that are used as source data for the database;

    Description of the main algorithmic dependencies that arise between data;

    Detailed description of integrity constraints. This includes requirements for all valid data values and their relationships;

Types of design:

    Logical, or datalogical, design consists of mapping the infological model onto the data model used in a specific DBMS. For relational DBMSs the datalogical model is a set of tables with their primary (key) fields and all the relationships between these tables indicated. Datalogical design of an infological model built in the form of ER diagrams amounts to constructing tables according to specific formalized rules (a sketch of this mapping is given after this list).

    Physical design is the process of implementing the datalogical model using the tools of a specific DBMS, as well as making various choices related to the physical storage environment for all the data.
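As an illustration of datalogical design, here is a minimal sketch of how a one-to-many ER relationship ("each order belongs to one customer") might be mapped onto tables; all names are hypothetical:

CREATE TABLE customers (
    customer_id INT PRIMARY KEY,    -- key field of the "customer" entity
    name        VARCHAR(100)
);

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT,                -- foreign key implementing the relationship
    order_date  DATE,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

The one-to-many relationship from the ER diagram becomes a foreign key column in the "many" table - exactly the kind of formalized rule that datalogical design applies.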

Question 5. The relational data model (RDM) is a logical data model and an applied theory of database construction; it applies such branches of mathematics as set theory and first-order logic to data processing problems.

Relational databases are built on the relational data model.

The relational data model includes the following components:

    Structural aspect (component) - the data in the database is a set of relations.

    Integrity aspect (component) - relations (tables) satisfy certain integrity conditions. The RDM supports declarative integrity constraints at the domain (data type) level, the relation level, and the database level.

    Processing (manipulation) aspect (component) - the RDM supports operators for manipulating relations (relational algebra, relational calculus).

In addition, the relational data model includes the theory of normalization.

The term "relational" means that the theory is based on the mathematical concept of a relation. The word "table" is often used as an informal synonym for "relation". It must be remembered that "table" is a loose, informal concept and often refers not to a relation as an abstract notion but to a visual representation of a relation on paper or on screen. The careless use of the term "table" instead of "relation" often leads to misunderstandings. The most common mistake is to reason that the RDM deals with "flat" or "two-dimensional" tables, whereas only visual representations of tables can be such. Relations are abstractions and can be neither "flat" nor "non-flat".

For a better understanding of RMD, three important circumstances should be noted:

    the model is logical, that is, relations are logical (abstract) rather than physical (stored) structures;

    For relational databases, the information principle holds: all the information content of the database is represented in one and only one way, namely by explicit attribute values in relation tuples; in particular, there are no pointers (addresses) linking one value to another;

    The presence of relational algebra allows for declarative programming and declarative description of integrity constraints, in addition to navigational (procedural) programming and procedural condition checking.

The principles of the relational model were formulated in 1969-1970 by E. F. Codd. Codd's ideas were first publicly presented in the article "A Relational Model of Data for Large Shared Data Banks", which has become a classic.

A rigorous presentation of the theory of relational databases (the relational data model) in the modern sense can be found in the book by C. J. Date, "An Introduction to Database Systems".

The most well-known alternatives to the relational model are the hierarchical model and the network model. Some systems using these older architectures are still in use today. In addition, we can mention the object-oriented model on which the so-called object-oriented DBMSs are built, although there is no clear and generally accepted definition of such a model.

Question 6. SELECT operator.

The select command is used to retrieve data from a table. This command can be used to select data by rows or columns from one or more tables.

A query is a call to the database to obtain resulting data. This process is also called data retrieval. All SQL queries are expressed through the selection operator (SELECT). This operator can be used both to select records (rows) from one or more tables and to build projections, i.e. to select data for a certain subset of attributes (columns) from one or more tables.

SELECT is the keyword that tells the DBMS that this command is a query. All queries begin with this word, followed by a space. It may be followed by a sampling method: with removal of duplicates (DISTINCT) or without removal (ALL, implied by default). This is followed by a comma-separated list of the columns that the query selects from the tables, or the "*" (asterisk) character to select entire rows. Any columns not listed here will not be included in the resulting relation produced by the command. This, of course, does not mean that they will be deleted or that their information will be erased from the tables, because a query does not affect the information in the tables - it only displays the data.
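For example, a few queries against a hypothetical table employees(name, city, salary):

select * from employees;               -- select entire rows
select name, salary from employees;    -- projection onto two columns
select distinct city from employees;   -- duplicate cities removed
select all city from employees;        -- duplicates kept (ALL is the default)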

Question 7. Mathematical functions

Every DBMS must have a set of built-in functions for processing standard data types. In MySQL there must be no space between a built-in function's name and the opening parenthesis, otherwise an error message is issued saying that no such function exists in the database. In some DBMSs, such as Oracle, if a function has no arguments the parentheses can be omitted.

abs(x) - absolute value;

ceil(x) - the smallest integer that is not less than the argument;

exp(x) - exponent;

floor(x) - the largest integer that is not greater than the argument;

ln(x) - natural logarithm;

power(x, y) - raises x to the power y;

round(x [,y]) - rounds x to y digits to the right of the decimal point. By default, y is 0;

sign(x) - returns -1 for negative values ​​of x and 1 for positive ones;

sqrt(x) - square root;

trunc(x [,y]) - truncates x to y decimal places. If y is 0 (the default), then x is truncated to an integer. If y is less than 0, the digits to the left of the decimal point are discarded.

Trigonometric functions work with radians:

acos(x) - arc cosine;

asin(x) - arcsine;

atan(x) - arctangent;

cos(x) - cosine;

sin(x) - sine;

tan(x) - tangent.

The functions ceil, floor, round, pow and sqrt are also available under the same names in the PHP library commonly used together with MySQL. Two further functions from that environment are worth noting separately, since their signatures belong to PHP rather than to MySQL itself:

number_format(number, decimals, decimal_point, thousands_sep) – returns a formatted version of the specified number; the MySQL analogue is format(number, decimals);

rand(min, max) – generates a random number from the given range; MySQL's own rand() takes no range arguments and returns a value between 0 and 1.
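In MySQL these functions can be tried without any table, since SELECT accepts bare expressions; a small illustrative sketch (expected results in comments):

select abs(-5);                 -- 5
select ceil(2.1);               -- 3
select floor(2.9);              -- 2
select round(3.14159, 2);       -- 3.14
select power(2, 10);            -- 1024
select sign(-7);                -- -1
select truncate(3.987, 1);      -- 3.9 (MySQL's name for trunc(x, y))
select format(1234567.891, 2);  -- '1,234,567.89'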

Question 8. Advantages and disadvantages of MySQL.

Disadvantages of MySQL

MySQL is indeed a very fast server, but to achieve this, the developers had to sacrifice some of the requirements for relational DBMSs.

There is no support for foreign keys (this applies to the traditional MyISAM storage engine; the InnoDB engine does support them).

MySQL Advantages:

very high data processing speed on tables of up to about 500,000 records;

free, open-source licensing;

ease of use;

support by most hosting companies;

possibility of use on various platforms (Unix, Windows, etc.);

Question 9. Decomposition of a flat table.

The meaning of decomposition is as follows. A flat table (a large table that gathers together all the data needed to solve a problem, with a high degree of data repetition) is converted into a collection of interconnected individual tables.

    the number of entities (objects) described by the flat table is determined.

    the fields of a flat table are divided between tables (object relations) corresponding to objects (entities);

    a field (set of fields) is defined that is used as a key for the connection between individual tables. Sometimes special tables (linked relationships) can be used for this purpose.

    none of the fields should contain repeating groups of values;

    If in some fields the data is repeated too often, you can create additional tables (relations) that play the role of reference books.

    The above sequence of actions constitutes the steps of normalization, a method of organizing a relational database so as to reduce redundancy (a sketch of such a decomposition is given below).
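A minimal sketch of decomposition (all names are hypothetical): a flat table in which customer data is repeated for every purchase is split into a reference table and a fact table connected by a key field:

-- reference table: each customer is stored once
create table customers (
    customer_id int unsigned not null auto_increment primary key,
    customer_name char(100) not null,
    customer_city char(100) not null
);

-- fact table: one row per purchase, linked through the key field
create table sales (
    sale_id int unsigned not null auto_increment primary key,
    customer_id int unsigned not null,
    tovar_name char(100) not null,
    cena int not null,
    foreign key (customer_id) references customers (customer_id)
);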

Question 10. Commands for creating databases, tables and indexes

create database if not exists shop; – creating a database (the name shop here is an example)

create table if not exists tovar (
    ID int unsigned not null auto_increment primary key,
    tovar_name char(100) not null,
    tovar_mark char(100) not null,
    Cena int not null,
    data_buy date default (curdate()),  -- an expression default; requires MySQL 8.0.13 or later
    family char(100) not null
); – creating a table

creating an index on the au_id column of the authors table:

create index au_id_ind on authors (au_id);
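A possible usage sketch for the tovar table created above (the values are illustrative): the ID and data_buy columns are filled in automatically by auto_increment and by the default value.

insert into tovar (tovar_name, tovar_mark, Cena, family)
    values ('notebook', 'A4', 150, 'stationery');

select * from tovar;  -- the first inserted row receives ID = 1; data_buy defaults to today's date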

History of the emergence and development of databases


With the advent of magnetic disks, the history of data management systems in external memory began. Before this, each application program that needed to store data in external memory determined for itself the location of each piece of data on a magnetic tape or drum and performed the exchanges between RAM and external memory devices using low-level hardware and software means (machine commands or calls to the corresponding operating system routines). This mode of operation makes it very difficult, or even impossible, to maintain several archives of long-term stored information on one external storage device. In addition, each application program had to solve for itself the problems of naming parts of the data and structuring the data in external memory.

Files and file systems

An important step in the development of information systems was the transition to the use of centralized file management systems. From the application program's point of view, a file is a named area of external memory to which data can be written and from which data can be read. The rules for naming files, the way data stored in a file is accessed, and the structure of that data depend on the specific file management system and possibly on the file type. The file management system takes over the allocation of external memory, the mapping of file names to the corresponding addresses in external memory, and the provision of access to the data.

We will look at the specific file models used in file management systems later, when we move on to the physical ways of organizing databases. For now it is enough to know that users see a file as a linear sequence of records and can perform a number of standard operations on it:

    create a file (of the required type and size);

    write a new record in place of the current one, or append a new record to the end of the file.

In different file systems these operations could differ slightly, but their general meaning was the same. The main thing to note is that the structure of a file's records was known only to the program that worked with it; the file management system did not know it. Therefore, to extract some information from a file, it was necessary to know the structure of the file record exactly, down to the bit. Each program working with a file had to contain an internal data structure corresponding to the structure of that file. Consequently, when the file structure changed, the program structure had to be changed as well, and this required a new compilation, that is, translating the program into executable machine code anew. This situation was characterized as the dependence of programs on data.

Since file systems are shared storage for files belonging, generally speaking, to different users, file management systems must provide authorization of access to files. In general, the approach is that, for each registered user of a given computer system and for each existing file, the actions that are allowed or prohibited for this user are indicated. Most modern file management systems use the file protection approach pioneered by the UNIX operating system. In this OS, each registered user has a pair of integer identifiers: the identifier of the group to which the user belongs and the user's own identifier within the group. For each file, the full identifier of the user who created it is stored, together with a record of which actions the creator can perform on the file, which actions on the file are available to other users of the same group, and what users of other groups can do with the file.

Administration of a file's access mode is carried out mainly by its creator-owner. For sets of files reflecting the information model of a single subject area, such a decentralized access control principle caused additional difficulties. The absence of centralized methods of controlling access to information was another reason for the development of DBMSs.

The next reason was the need to ensure efficient parallel work of many users with the same files. In general, file management systems provided a multi-user access mode. If the operating system supports multi-user operation, it is quite possible that two or more users will try to work with the same file at the same time. If all users only intend to read the file, nothing bad will happen. But if at least one of them modifies the file, correct operation requires mutual synchronization of their actions on the file.

The first stage - databases on mainframe computers

The history of DBMS development goes back more than 30 years. In 1968, the first industrial DBMS, the IMS system from IBM, was put into operation. In 1975, the first standard of the CODASYL association (Conference on Data Systems Languages) appeared, which defined a number of fundamental concepts in the theory of database systems that are still fundamental to the network data model.

A major contribution to the further development of database theory was made by the American mathematician E. F. Codd, the creator of the relational data model. In 1981, E. F. Codd received the prestigious Turing Award of the Association for Computing Machinery (ACM) for creating the relational model and relational algebra.

The first stage of DBMS development is associated with the organization of databases on large machines such as the IBM 360/370 and ES computers, and on minicomputers such as the PDP-11 (Digital Equipment Corporation, DEC) and various Hewlett-Packard (HP) models.

Databases were stored in the external memory of a central computer, and the users of these databases were jobs run mainly in batch mode. Interactive access was provided through console terminals, which had no computing resources of their own (processor, external memory) and served only as input/output devices for the central computer. Programs accessing the database were written in various languages and were run like ordinary numerical programs. Powerful operating systems made it possible to execute many sets of tasks conditionally in parallel. These systems could be classified as distributed-access systems, because the database was centralized, stored on the external memory devices of a single central computer, while access to it was supported from many user tasks.

The features of this stage of development are as follows:

    All DBMSs were based on powerful multiprogram operating systems (MVS, SVM, RTE, OSRV, RSX, UNIX), and therefore mainly supported work with a centralized database in distributed access mode.

    Resource allocation management functions are primarily performed by the operating system (OS).

    Low-level data manipulation languages were supported, oriented towards navigational methods of accessing data.

    Data administration plays a significant role.

    Serious work is being done to substantiate and formalize the relational data model, and the first system (System R) has been created to implement the ideology of the relational data model.

    Theoretical work is being carried out on query optimization and managing distributed access to a centralized database, the concept of a transaction was introduced.

    The results of scientific research are openly discussed in the press, there is a strong flow of publicly available publications concerning all aspects of database theory and practice, and the results of theoretical research are actively being implemented in commercial DBMSs.

    The first high-level languages for working with the relational data model appear. However, there are no standards for these first languages.

The era of personal computers

The features of this stage are as follows:

    All DBMSs were designed to create databases primarily with exclusive access. And this is understandable. The computer was personal, it was not connected to the network, and the database on it was created for use by one user. In rare cases, the sequential work of several users was assumed, for example, first an operator who entered accounting documents, and then a chief accountant who determined the transactions corresponding to the primary documents.

    Most DBMSs had a developed and convenient user interface. Most supported an interactive mode of working with the database, both for describing the database and for designing queries. In addition, most DBMSs offered developed and convenient tools for building ready-made applications without programming. The tool environment consisted of ready-made application elements in the form of templates for screen forms, reports, labels, and graphical query designers, which could quite easily be assembled into a single complex.

    All desktop DBMSs supported only the external level of presentation of the relational model, that is, only the external tabular view of data structures.

    Although high-level data manipulation languages such as relational algebra and SQL existed, desktop DBMSs supported only low-level languages for manipulating data at the level of individual table rows.

    Desktop DBMSs lacked tools for maintaining the referential and structural integrity of the database. These functions were supposed to be performed by applications, but the scarcity of application development tools sometimes prevented this; in that case the functions had to be performed by the user, which required additional control on the user's part when entering and changing information stored in the database.

    The presence of an exclusive operating mode actually led to the degeneration of database administration functions and, in connection with this, to the absence of database administration tools.

    And finally, the last and still very positive feature is the relatively modest hardware requirements of desktop DBMSs. Quite functional applications developed, for example, in Clipper ran on a PC 286.

    In principle, they can hardly even be called full-fledged DBMSs. Prominent representatives of this family are the DBMS Dbase (DbaseIII+, DbaseIV), FoxPro, Clipper, Paradox, which were very widely used until recently.

Distributed Databases

After the "personalization" process began back process - integration. The number of local networks is increasing, more and more information is being transferred between computers, the problem of consistency of data stored and processed in different places, but logically connected to each other, is becoming acute; problems associated with parallel processing arise transactions - sequences of operations on DB, transferring it from one consistent state to another consistent state. Successful solution of these problems leads to the emergence distributed databases, preserving all the advantages of desktop DBMS and at the same time allowing for parallel processing of information and integrity support DB.

Features of this stage:

    Almost all modern DBMSs provide support for the full relational model, namely:

      support for structural integrity: only data presented in the form of relations of the relational model are valid;

      support for linguistic integrity, that is, high-level data manipulation languages (mainly SQL);

      support for referential integrity: compliance with referential integrity is monitored throughout the entire operation of the system, and the DBMS guarantees that these constraints cannot be violated.

    Most modern DBMSs are designed for multi-platform architecture, that is, they can run on computers with different architectures and under different operating systems, while for users access to data managed by the DBMS on different platforms is practically indistinguishable.

    The need to support multi-user work with the database and the possibility of decentralized data storage required the development of database administration tools with the implementation of the general concept of data protection tools.

    The need for new implementations has led to the creation of serious theoretical works on optimizing distributed database implementations and working with distributed transactions and queries with the implementation of the results obtained in commercial DBMSs.

    In order not to lose customers who previously worked on desktop DBMSs, almost all modern DBMSs have tools for connecting client applications developed using desktop DBMSs, and tools for exporting data from desktop DBMS formats of the second stage of development.

    It is this stage that includes the development of a number of standards within the framework of languages ​​for describing and manipulating data, starting with SQL89, SQL92, SQL99 and technologies for exchanging data between various DBMSs, which include the ODBC (Open DataBase Connectivity) protocol proposed by Microsoft.

    It is to this stage that we can attribute the beginning of work related to the concept of object-oriented databases (OODB). Representative DBMSs of this stage include MS Access 97 and all the modern database servers: Oracle 7.3, Oracle 8.4, MS SQL 6.5, MS SQL 7.0, System 10, System 11, Informix, DB2, SQL Base and other modern database servers, of which there are currently several dozen.

Prospects for the development of database management systems

This stage is characterized by the emergence of a new data access technology - the intranet. The main difference between this approach and client-server technology is that there is no need for specialized client software. To work with a remote database, a standard Internet browser is used, such as Microsoft Internet Explorer or Netscape Navigator, and for the end user the process of accessing data is similar to surfing the World Wide Web. The code embedded in the HTML pages loaded by the user, usually written in Java, JavaScript, Perl or other languages, tracks the user's actions and translates them into low-level SQL queries to the database, thus performing the work that in client-server technology is done by the client program. The convenience of this approach has led to its use not only for remote access to databases but also for users of an enterprise's local network. Simple data processing tasks not associated with complex algorithms requiring coordinated changes of data in many interconnected objects can be built quite simply and efficiently in this architecture; in this case no additional client software needs to be installed for a new user to work with such a task. However, algorithmically complex tasks are better implemented in the client-server architecture, with the development of special client software.

Fig. 1.1. Interaction with the database in intranet technology

Each of the above approaches to working with data has its own advantages and disadvantages, which determine the scope of application of a particular method, and currently all approaches are widely used.

Control questions

    Find similarities between the first and fourth stages of development.

    Find the differences between the first and third stages of development.

    When file systems are used for parallel user access, can creating a copy of the files for each user speed up parallel work with the information?

The history of database development can be divided into four periods.

1. Period of formation – early 60s to early 70s. During this period the term "database" itself appeared and several initial systems were created. The basis for the emergence of databases was the proposal, made in the late 50s, to use files to store source data. The main requirement for such file systems was to be a shared data store. Subsequently it became obvious that shared data must have specific properties, in particular: data independence, absence of duplication and inconsistency, control of data access rights, effective data access techniques, and many others.

Awareness of these facts, as well as the advent of large computers with magnetic disks as data carriers, led to the emergence in the mid-60s of the first database management systems, of which the most developed was the IMS system from IBM, which supported a hierarchical data structure. Bachman developed the first industrial database system, IDS, in 1963; the IDS system supported the network organization of data on magnetic media.

The CODASYL association, the body that developed the Cobol programming language, organized a database working group in 1967. This group summarized the language specifications for database systems and in 1969 and 1971 issued the corresponding reports, which, after the name of the working group (Data Base Task Group), were named DBTG69 and DBTG71. The approach chosen by the working group was based on the network data structure and the navigation methods developed in the IDS system, but in the DBTG reports the network data model received significant development and justification.

A typical system supporting the DBTG CODASYL proposals is the Integrated Database Management System (IDMS) from Cullinet Software, Inc., designed for use on mainstream IBM machines running most of their operating systems.

During the same period, two approaches to the question of the closedness of database systems clearly crystallized. Closed systems are characterized by the fact that they contain no traditional programming languages but have non-procedural query languages. The main goal here is to create a system that could be used by a specialist who is not a programmer. Such systems included TDMS and UL/1.

Systems with included languages provide, in addition to the database manipulation languages proper, language tools for developing applications using existing programming languages. This principle, in particular, was followed by the DBTG.

At the end of this period the term management information system (MIS) appeared. At that time, an MIS was understood as a database system oriented towards data retrieval and providing the ability to work from a remote terminal.

2. Development period – the 70s. The concept of databases spread widely thanks to the increasing performance of computer hardware. Systems supporting hierarchical and network data structures were being successfully implemented.

Throughout this period, the work of DBTG CODASYL continued. A language system for CODASYL databases was specified, which included the following groups of language specifications:

    Data definition language (DDL). A description of the conceptual schema in terms of the network data structure.

    Cobol database facilities. A means of providing an interface between the Cobol language and a database described in the DDL. Includes data manipulation language (DML) tools for Cobol.

    Fortran database facilities. A means of providing an interface between the Fortran language and a database described in the DDL. Includes data manipulation language tools for Fortran.

    End User Tools. Defines the user interface when such a user manages the database described in the DDL.

    Data storage description language (DSDL). A language that maps the conceptual schema described in the DDL onto the internal schema.

In 1975, the report of the ANSI/X3/SPARC working group of the American National Standards Institute appeared, which was a significant milestone in the development of database issues. The group was tasked with exploring to what extent it was advisable to standardize databases and DBMSs and what exactly could be subject to standardization. It came to the conclusion that standardization could concern only the interfaces that may exist between the various DBMS components; the software components themselves could not in any case be subject to standardization. Accordingly, the group directed its subsequent efforts to identifying such interfaces and, in the end, arrived at the formulation of the three-level database architecture, which became a classic and has not lost its relevance to this day.

This period, however, is characterized above all by the emergence of the relational data model, proposed in 1970 by E. F. Codd, an employee of the IBM research laboratory in San Jose; by comprehensive studies of the theoretical and applied aspects of this model; and by the development of experimental relational DBMSs. Theoretical research ultimately led to the creation of a formal theory of databases, which until then had been descriptive in nature. Over the years, many leading firms conducted experimental work on creating prototypes of relational DBMSs and improving their efficiency and functionality. At the end of the 70s the first industrial relational DBMSs appeared.

3. Maturity period – the 80s. The relational model received a full theoretical justification. Large relational DBMSs such as Oracle and Informix were developed. Industrial relational systems became widespread in all areas of human activity, and relational systems practically drove the early hierarchical and network-type DBMSs from the world market.

Further development of relational DBMSs went in the following directions:

Ease of use. The advent of personal computers made ease of use a fundamental requirement for programs, which also applied to DBMSs. Throughout this period, the external interface for user interaction with databases was intensively developed.

Versatility. Initially, databases were developed for storing and processing character information and were traditionally used in such areas as economic information processing, statistics, banking, reservation systems, and information systems in various fields. The emergence of demand for databases in non-traditional areas of application, such as design automation systems and publishing, required storing and processing images, sound, and full-text information in databases.

This period is also characterized by theoretical and experimental research in the field of knowledge bases. Numerous expert systems using knowledge bases are being developed. In the vast majority of cases, knowledge bases are developed on the basis of relational DBMSs.

4. Post-relational period – from the beginning of the 90s. During this period, intensive research began on deductive and object-oriented databases, along with the development of research prototypes of such systems.

A special place in the development of object-oriented DBMSs is occupied by the activities of the Object Data Management Group (ODMG), a non-profit consortium of object database vendors and other organizations interested in developing standards for storing objects in databases. ODMG was created in 1991. In 1993, the group released its first standard, ODMG-93; an improved version of this standard was published in 1995.

In connection with the development of Internet technologies, great efforts are being made to implement databases on the Internet. Various approaches are emerging to include DBMSs with their databases on the World Wide Web, ranging from the simplest “publications” of databases on the Internet to the development of web database servers that are able to provide a full range of services to Internet users for using databases on the server.

Finally, research and development on the representation and manipulation of data structures on the Internet is intensively developing.
