Artificial intelligence, neural networks, machine learning - what do all these currently popular concepts actually mean? To most uninitiated people, myself included, they have always seemed like something fantastic, yet their essence in fact lies on the surface. I have long wanted to write about artificial neural networks in simple language: to find out for myself, and to tell others, what this technology is, how it works, and what its history and prospects are. In this article I have tried not to get into the weeds, but to describe this promising direction in the world of high technology simply and accessibly.

A little history

The concept of artificial neural networks (ANNs) first arose in attempts to simulate brain processes. The first major breakthrough in this area was the McCulloch-Pitts neural network model of 1943: the scientists developed the first model of an artificial neuron and proposed a design for a network of such elements to perform logical operations. Most importantly, they showed that such a network is capable of learning.

The next important step was Donald Hebb's development, in 1949, of the first learning rule for ANNs, which remained fundamental for several subsequent decades. In 1958 Frank Rosenblatt developed the perceptron, a system that imitates brain processes. At the time the technology had no analogues, and it is still fundamental to neural networks. In 1986, almost simultaneously and independently of each other, American and Soviet scientists substantially improved the fundamental method for training the multilayer perceptron. In 2007 neural networks experienced a rebirth: the British-born computer scientist Geoffrey Hinton developed deep learning algorithms for multilayer neural networks, which are now used, for example, in self-driving cars.

Briefly about the main thing

In the general sense of the word, neural networks are mathematical models that work on the principle of networks of nerve cells in a living organism. ANNs can be implemented both in software and in hardware. To make things easier to understand, a neuron can be thought of as a cell with many inputs and a single output. How the many incoming signals are combined into an output signal is determined by the computation algorithm. Values are fed to each neuron input and then travel along the connections between neurons (synapses). Each synapse has a single parameter - a weight - that changes the incoming information as it passes from one neuron to the next. The easiest way to picture how a neural network operates is mixing colors: if the blue, green and red neurons have different weights, the information from the neuron with the greater weight will dominate in the next neuron.
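To make the weighted-sum idea concrete, here is a minimal sketch in Python (the weights, inputs and threshold are made up purely for illustration):

```python
# A minimal sketch of a single artificial neuron: a weighted sum of inputs
# passed through a simple threshold. Weights, inputs and threshold are illustrative.

def neuron_output(inputs, weights, threshold=1.0):
    # Return 1 if the weighted sum of the inputs reaches the threshold, else 0.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# Three input signals with different synaptic weights: the input with the
# larger weight dominates the result, like the color-mixing analogy above.
print(neuron_output(inputs=[1, 1, 0], weights=[0.2, 0.9, 0.4]))  # -> 1 (sum 1.1)
print(neuron_output(inputs=[1, 0, 1], weights=[0.2, 0.9, 0.4]))  # -> 0 (sum 0.6)
```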

The neural network itself is a system of many such neurons (processors). Individually, these processors are quite simple (much simpler than a personal computer processor), but when connected into a larger system, neurons are capable of performing very complex tasks.

Depending on the area of application, a neural network can be interpreted in different ways. From the point of view of machine learning, an ANN is a pattern recognition method; from a mathematical point of view, it is a multi-parameter problem; from the point of view of cybernetics, it is a model of adaptive control for robotics. For artificial intelligence, the ANN is a fundamental component for modeling natural intelligence using computational algorithms.

The main advantage of neural networks over conventional computing algorithms is their ability to learn. In the general sense of the word, learning means finding the correct coupling coefficients between neurons, as well as generalizing the data and identifying complex dependencies between input and output signals. In effect, successful training of a neural network means that the system will be able to produce the correct result from data that was not in the training set.

Current situation

However promising this technology may be, ANNs are still very far from the capabilities of the human brain and of thought. Nevertheless, neural networks are already used in many areas of human activity. They are not yet capable of making highly intelligent decisions, but they can replace a person where one was previously required. Among the many areas where ANNs are applied are: self-learning production systems, unmanned vehicles, image recognition systems, intelligent security systems, robotics, quality monitoring systems, voice interfaces, analytics systems and much more. This widespread use of neural networks is due, among other things, to the emergence of various ways of accelerating ANN training.

Today, the market for neural networks is huge - billions and billions of dollars. As practice shows, most neural network technologies around the world differ little from each other. However, the use of neural networks is a very expensive activity, which in most cases can only be afforded by large companies. The development, training and testing of neural networks requires large computing power, and it is obvious that large players in the IT market have plenty of this. Among the main companies leading developments in this area are the Google DeepMind division, the Microsoft Research division, IBM, Facebook and Baidu.

Of course, all this is good: neural networks are developing and the market is growing, but the main task has still not been solved. Humanity has not managed to create a technology that even approaches the capabilities of the human brain. Let's look at the main differences between the human brain and artificial neural networks.

Why are neural networks still far from the human brain?

The most important difference, which radically changes the principle and efficiency of the system, is how signals are transmitted in artificial neural networks versus in a biological network of neurons. In an ANN, neurons pass values that are real numbers. In the human brain, impulses are transmitted with a fixed amplitude, and these impulses are almost instantaneous. This gives the human network of neurons a number of advantages.

First, the communication lines in the brain are much more efficient and economical than those in an ANN. Second, the pulse scheme makes the technology simple to implement: analog circuits suffice instead of complex computing mechanisms. Finally, pulsed networks are resistant to interference, whereas real numbers are subject to noise, which increases the likelihood of errors.

Bottom line

Of course, in the last decade there has been a real boom in the development of neural networks, primarily because the ANN training process has become much faster and easier. So-called "pre-trained" neural networks have also begun to be actively developed, which can significantly speed up the adoption of the technology. And while it is too early to say whether neural networks will one day fully reproduce the capabilities of the human brain, the likelihood that in the next decade ANNs will be able to replace humans in a quarter of existing professions looks increasingly realistic.

For those who want to know more

  • The Great Neural War: What Google is Really Up to
  • How cognitive computers could change our future

If you follow news from the world of science and technology, you have probably heard something about the concept of neural networks.

For example, in 2016 Google's neural network AlphaGo beat one of the best professional Go players in the world with a score of 4-1. YouTube also announced that it will use neural networks to better understand its videos.

But what is a neural network? How does it work? And why are neural networks so popular in machine learning?

Computer as a brain

Modern neuroscientists often discuss the brain as a type of computer. Neural networks aim to do the opposite: build a computer that functions like a brain.

Of course, we only have a superficial understanding of the brain's extremely complex functions, but by creating simplified simulations of how the brain processes data, we can build a type of computer that functions very differently from a standard one.

Computer processors process data sequentially ("in order"): they perform many operations on a set of data, one at a time. Parallel processing ("processing multiple streams at once") significantly speeds up a computer by using several processors at the same time.

In the figure below, the parallel processing example requires five different processors:

An artificial neural network (so called to distinguish it from the real neural networks in the brain) has a fundamentally different structure: it is highly interconnected. This allows it to process data very quickly, learn from that data, and update its own internal structure to improve performance.

However, this high degree of interconnectedness has some striking implications. For example, neural networks are very good at recognizing patterns in fuzzy or noisy data.

Learning ability

A neural network's ability to learn is its greatest strength. In a standard computing architecture, a programmer must design an algorithm that tells the computer what to do with incoming data to ensure that the computer produces the correct answer.

The input-to-output mapping can be as simple as "when the A key is pressed, display 'A' on the screen," or as complex as performing sophisticated statistical analysis. Neural networks, on the other hand, do not require such hand-crafted algorithms: through learning mechanisms, they can essentially develop their own algorithms that ensure the machine produces the correct output.

It is important to note that since neural networks are programs written for machines that use standard hardware with sequential processing, current technology still imposes limitations. Actually building a hardware version of a neural network is a completely different problem.

From neurons to nodes

Now that we've laid the foundation for how neural networks work, we can start looking at some of the specifics. The basic structure of an artificial neural network looks like this:


Each of the circles is called a “node” and simulates a single neuron. On the left are the input nodes, in the middle are the hidden nodes, and on the right are the output nodes.

In the most basic terms, input nodes accept input values, which can be binary 1s or 0s, part of an RGB color value, the state of a chess piece, or anything else. These nodes represent the information entering the network.

Each input node is connected to several hidden nodes (sometimes to every hidden node, sometimes to a subset). Input nodes take the information they are given and pass it along to the hidden layer.

For example, an input node might send a signal ("fire," in neuroscience parlance) if it receives a 1 and remain dormant if it receives a 0. Each hidden node has a threshold: if its summed inputs reach a certain value, it fires.

From synapses to connections

Each connection, equivalent to an anatomical synapse, also has a certain weight, which allows the network to pay more attention to the signal from a particular node. Here's an example:


As you can see, the weight of connection "B" is higher than that of connections "A" and "C". Let's say hidden node "4" fires only if it receives a total input of "2" or more. This means that "1" or "3" firing alone will not trigger "4", but "1" and "3" together will. Node "2" can also trigger the node on its own via connection "B".

Let's take the weather as a practical example. Let's say you're designing a simple neural network to determine whether there should be a winter storm warning.

Using the connections and weights above, node 4 can fire only if the temperature is below -18 °C and the wind is above 48 km/h, or if the chance of snow is greater than 70 percent. The temperature is fed to node 1, the wind to node 3, and the probability of snow to node 2. Node 4 can now take all of this into account when deciding what signal to send to the output layer.
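A rough sketch of that winter-storm example in Python (the weights of connections A, B, C and the firing threshold of node 4 are assumptions chosen to match the description above, not a real forecasting model):

```python
# Connection weights and the threshold of node 4, as in the example above:
# node 1 (temperature) and node 3 (wind) alone are not enough, together they
# fire node 4; node 2 (chance of snow) can fire it on its own via weight B.

WEIGHT_A = 1.0   # node 1 -> node 4
WEIGHT_B = 2.0   # node 2 -> node 4
WEIGHT_C = 1.0   # node 3 -> node 4
THRESHOLD = 2.0  # node 4 fires when its summed input reaches 2

def node4_fires(temp_c, wind_kmh, snow_chance):
    node1 = 1 if temp_c < -18 else 0        # cold enough?
    node2 = 1 if snow_chance > 0.70 else 0  # likely enough to snow?
    node3 = 1 if wind_kmh > 48 else 0       # windy enough?
    total = node1 * WEIGHT_A + node2 * WEIGHT_B + node3 * WEIGHT_C
    return total >= THRESHOLD

print(node4_fires(temp_c=-20, wind_kmh=55, snow_chance=0.20))  # True: cold AND windy
print(node4_fires(temp_c=-5, wind_kmh=10, snow_chance=0.90))   # True: snow alone suffices
print(node4_fires(temp_c=-20, wind_kmh=10, snow_chance=0.20))  # False: cold alone does not
```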

Better than simple logic

Of course, this function could simply be implemented using simple AND/OR gates. But more complex neural networks, like the ones below, are capable of much more complex operations.


The output layer nodes function in the same way as the hidden layer: the output nodes sum the inputs from the hidden layer, and if they reach a certain value, the output nodes trigger and send specific signals. At the end of the process, the output layer will send a set of signals that indicates the result of the input.

While the network shown above is simple, deep neural networks can have many hidden layers and hundreds of nodes.


Error correction

This process is still relatively simple. But where neural networks really shine is learning. Most neural networks use a backpropagation process, which sends signals backwards through the network.

Before developers deploy a neural network, they run it through a training phase in which it receives a set of inputs with known outputs. For example, a programmer can teach a neural network to recognize images. The input could be a picture of a car, and the correct output would be the word "car".

The programmer provides an image as input and sees what comes out of the output nodes. If the network responds with "airplane", the programmer tells the computer that it is incorrect.

The network then makes adjustments to its own connections, changing the weight of different links between nodes. This action is based on a special learning algorithm added to the network. The network continues to adjust the connection weights until it produces the correct output.
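As a toy illustration of such a training loop, here is a single artificial neuron learning a simple task with a perceptron-style weight update (the data, learning rate and number of epochs are illustrative assumptions, not the procedure of any particular production network):

```python
# A toy supervised training loop: adjust connection weights until the outputs
# match the known labels. Task, learning rate and epoch count are illustrative.

training_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]  # logical AND
weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.1

def predict(inputs):
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if s >= 0 else 0

for epoch in range(20):
    for inputs, target in training_data:
        error = target - predict(inputs)       # non-zero means "that answer was wrong"
        # Adjust the weights of the links in proportion to the error.
        weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
        bias += learning_rate * error

print([predict(x) for x, _ in training_data])  # -> [0, 0, 0, 1]
```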

This is a simplification, but neural networks can learn very complex operations using similar principles.

Continuous improvement

Even after training, backpropagation continues - and this is where neural networks get really interesting. They keep learning as they are used, integrating new information and adjusting the weights of the various connections, becoming ever more effective at the task they were designed for.

It can be as simple as pattern recognition or as complex as playing Go.

Thus, neural networks are constantly changing and improving. And this can have unexpected consequences, leading to networks that prioritize things that a programmer wouldn't consider a priority.

In addition to the process described above, which is called supervised learning, there is another method: unsupervised learning.

In this situation, neural networks take input data and try to recreate it exactly at their output, using backpropagation to update their connections. This may sound like a futile exercise, but it is how networks learn to extract useful features and generalize them to improve their models.
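A minimal sketch of this "reproduce the input at the output" idea, assuming a tiny linear autoencoder trained by plain gradient descent (real networks add nonlinearities and far more units; the data here is random and purely illustrative):

```python
# A tiny linear autoencoder: compress 4 input features to 2 and reconstruct them,
# updating both weight matrices to reduce the reconstruction error.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                 # 100 made-up input vectors of size 4
W_enc = rng.normal(scale=0.1, size=(4, 2))    # encoder: 4 features -> 2
W_dec = rng.normal(scale=0.1, size=(2, 4))    # decoder: 2 -> 4
lr = 0.01

for step in range(500):
    H = X @ W_enc                             # compressed "hidden" representation
    X_hat = H @ W_dec                         # attempted reconstruction of the input
    err = X_hat - X
    # Gradients of the mean squared reconstruction error (backpropagation).
    grad_dec = H.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(float(np.mean(err ** 2)))               # reconstruction error shrinks during training
```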

Questions of depth

Backpropagation is a very effective way to train neural networks... when they consist of only a few layers. As the number of hidden layers increases, the effectiveness of backpropagation decreases. This is a problem for deep networks: trained with plain backpropagation, they are often no more effective than simple networks.

Scientists have developed a number of solutions to this problem, the specifics of which are quite complex and beyond the scope of this introductory part. What many of these solutions attempt to do in layman's terms is reduce the complexity of the network by teaching it to "compress" data.


To do this, the network learns to extract fewer identifying features from the input data, ultimately becoming more efficient in its computations. Essentially, the network makes generalizations and abstractions, much in the same way that humans learn.

After this training, the network can prune nodes and connections that it deems less important. This makes the network more efficient and learning easier.

Neural Network Applications

In this way, neural networks model how the brain learns using multiple layers of nodes—input, hidden, and output—and they can learn in both supervised and unsupervised situations. Complex networks are able to make abstractions and generalizations, which makes them more efficient and more capable of learning.

What can we use these exciting systems for?

In theory, we can use neural networks for almost anything. And you've probably used them without realizing it. They are very common in speech and visual recognition, for example, because they can learn to pick out certain features, something common in sounds or images.

So when you say, "OK Google," your phone runs your speech through a neural network to understand what you're saying. There may be another neural network that learns to predict what you are likely to ask for next.

Self-driving cars can use neural networks to process visual data, thereby following road rules and avoiding collisions. Robots of all types can benefit from neural networks that help them learn to perform tasks efficiently. Computers can learn to play games like chess or Go. If you've ever interacted with a chatbot, chances are it uses a neural network to suggest appropriate responses.

Internet search can benefit greatly from neural networks, as the highly efficient parallel processing model can sift through a lot of data very quickly. A neural network can also learn your habits to personalize your search results or predict what you're going to search for in the near future. This predictive ability will obviously be very valuable to marketers (and anyone who needs to predict complex human behavior).

Pattern recognition, optical image recognition, stock market forecasting, route finding, big data processing, medical cost analysis, sales forecasting, artificial intelligence in video games, the possibilities are almost endless. Neural networks' ability to learn patterns, make generalizations, and successfully predict behavior makes them valuable in countless situations.

The future of neural networks

Neural networks have advanced from very simple models to highly complex learning simulations. They are in our phones and tablets and in many of the web services we use. There are, of course, many other machine learning systems.

But neural networks, because of their similarities (in a very simplified form) to the human brain, are among the most fascinating. While we continue to develop and improve the models, we cannot say what they are capable of.

Do you know of any interesting applications of neural networks? Do you have experience working with them yourself? What excites you most about this technology? Share your thoughts in the comments below!

Good afternoon, my name is Natalia Efremova, and I am a research scientist at NtechLab. Today I will talk about the types of neural networks and their applications.

First, I will say a few words about our company. The company is new, maybe many of you don’t yet know what we do. Last year we won the MegaFace competition. This is an international facial recognition competition. In the same year, our company was opened, that is, we have been on the market for about a year, even a little more. Accordingly, we are one of the leading companies in facial recognition and biometric image processing.

The first part of my report is aimed at those who are unfamiliar with neural networks. I work directly in deep learning and have been in this field for more than 10 years. Although deep learning as such appeared a little less than a decade ago, there were earlier rudiments of neural networks similar to today's deep learning systems.

Over the past 10 years, deep learning and computer vision have developed at an incredible pace. Everything that has been done that is significant in this area has happened in the last 6 years.

I will talk about practical aspects: where, when, what to use in terms of deep learning for image and video processing, for image and face recognition, since I work in a company that does this. I’ll tell you a little about emotion recognition and what approaches are used in games and robotics. I will also talk about the non-standard application of deep learning, something that is just emerging from scientific institutions and is still little used in practice, how it can be applied, and why it is difficult to apply.

The report will consist of two parts. Since most of you are familiar with neural networks, first I will quickly cover how neural networks work, what biological neural networks are, why it is important for us to know how they work, what artificial neural networks are, and which architectures are used in which areas.

I apologize in advance that I will sometimes slip into English terminology, because I don't even know what most of it is called in Russian. Perhaps you don't either.

So, the first part of the report will be devoted to convolutional neural networks. I will explain how convolutional neural networks (CNNs) and image recognition work, using face recognition as the example. I will also say a little about recurrent neural networks (RNNs) and about reinforcement learning in deep learning systems.

As a non-standard application of neural networks, I will talk about how CNNs are used in medicine to recognize voxel images, and how neural networks are used to recognize poverty in Africa.

What are neural networks

The prototype for creating neural networks was, oddly enough, biological neural networks. Many of you may know how to program a neural network, but where it came from, I think some do not know. Two-thirds of all sensory information that comes to us comes from the visual organs of perception. More than one-third of the surface of our brain is occupied by the two most important visual areas - the dorsal visual pathway and the ventral visual pathway.

The dorsal visual pathway begins in the primary visual area at the back of the head and continues upward toward the crown, while the ventral pathway also begins at the back of the head and ends approximately behind the ears. All the important pattern recognition that happens in us, everything that carries meaning that we are aware of, takes place right there, behind the ears.

Why is this important? Because it is often necessary for understanding neural networks. Firstly, everyone talks about it, and I'm used to that; secondly, all the areas that are used in neural networks for image recognition came to us precisely from the ventral visual pathway, where each small zone is responsible for its own strictly defined function.

The image comes to us from the retina, passes through a series of visual zones and ends in the temporal zone.

In the distant 60s of the last century, when the study of the visual areas of the brain was just beginning, the first experiments were carried out on animals, because there was no fMRI. The brain was studied using electrodes implanted into various visual areas.

The first visual area was studied by David Hubel and Torsten Wiesel in 1962. They conducted experiments on cats: the cats were shown various moving objects, and whatever the brain cells responded to was the stimulus the animal recognized. Even now many experiments are carried out in these draconian ways, but it is nevertheless the most effective way to find out what every little cell in our brain is doing.

In the same way, many more important properties of the visual areas were discovered, which we use in deep learning now. One of the most important properties is the increase in the receptive fields of our cells as we move from the primary visual areas to the temporal lobes, that is, the later visual areas. The receptive field is that part of the image that every cell of our brain processes. Each cell has its own receptive field. The same property is preserved in neural networks, as you probably all know.

Also, as receptive fields increase, so do the complex stimuli that neural networks typically recognize.

Here you see examples of the complexity of stimuli, the different two-dimensional shapes that are recognized in areas V2, V4 and various parts of the temporal fields in macaque monkeys. A number of MRI experiments are also being carried out.

Here you can see how such experiments are carried out. This is a small part of the monkey's IT (inferotemporal) cortex during the recognition of various objects; the areas where recognition takes place are highlighted.

Let's sum it up. An important property that we want to adopt from the visual areas is that the size of the receptive fields increases, and the complexity of the objects that we recognize increases.

Computer vision

Before we learned to apply all this to computer vision, it did not really exist as such - in any case, it did not work as well as it does now.

We transfer all these properties to the neural network, and now it works - leaving aside for the moment a small digression about datasets, which I will talk about later.

But first, a little about the simplest perceptron. It too is formed in the image and likeness of our brain. The simplest element resembling a brain cell is the neuron. It has input elements, which by convention are drawn from left to right, occasionally from bottom to top: on the left are the input parts of the neuron, on the right the output parts.

The simplest perceptron is capable of performing only the simplest operations. In order to perform more complex calculations, we need a structure with more hidden layers.

In the case of computer vision, we need even more hidden layers. And only then will the system meaningfully recognize what it sees.

So, I will tell you what happens during image recognition using the example of faces.

For us to look at this picture and say that it shows exactly the face of a statue is quite simple. However, before 2010 this was an incredibly difficult task for computer vision. Those who dealt with this issue before then probably know how difficult it was to describe, without words, the object we want to find in a picture.

We had to do it in some geometric way: describe the object, describe how its parts can relate to one another, then find this pattern in the image and compare them - and the result was poor recognition, usually only a little better than flipping a coin, slightly better than chance level.

This is not how it works now. We divide our image either into pixels or into patches - 2x2, 3x3, 5x5, 11x11 pixels, whatever is convenient for the creators of the system - and these serve as the input layer of the neural network.

Signals from these input layers are transmitted from layer to layer using synapses, each layer having its own specific coefficients. So we pass from layer to layer, from layer to layer, until we get that we have recognized the face.

Conventionally, all these parts can be divided into three classes; we will denote them X, W and Y, where X is our input image, Y is the set of labels, and W is the set of weights we need to obtain. How do we calculate W?

Given our X and Y, this seems simple. However, the operation indicated by the asterisk is a very complex nonlinear operation which, unfortunately, has no inverse. Even with two components of the equation given, it is very difficult to calculate. Therefore we must gradually, by trial and error, select the weights W so that the error decreases as much as possible, preferably to zero.

This process occurs iteratively: we keep reducing the error until we find a value of the weights W that suits us well enough.
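As a sketch of that iterative idea, here is gradient descent on a single made-up weight; real networks apply the same kind of nudging to millions of parameters at once:

```python
# Start with an arbitrary weight and repeatedly nudge it in the direction
# that reduces the error. The data (true relationship y = 3 * x) is made up.

data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
w = 0.0            # initial guess for the weight W
lr = 0.02          # step size for each correction

for step in range(200):
    # Gradient of the squared error sum((w*x - y)^2) with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in data)
    w -= lr * grad                   # move w so that the error decreases

print(round(w, 3))                   # approaches 3.0
```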

By the way, not a single neural network that I worked with achieved an error equal to zero, but it worked quite well.

This is the first network to win the international ImageNet competition, in 2012 - the so-called AlexNet. This is the network that announced to the world that convolutional neural networks exist, and since then convolutional networks have never surrendered their positions in any international competition.

Even though this network is quite small (it has only 7 hidden layers), it contains 650 thousand neurons and 60 million parameters. In order to learn the necessary weights iteratively, we need a great many examples.

The neural network learns from the example of a picture and a label. Just as we are taught in childhood “this is a cat, and this is a dog,” neural networks are trained on a large number of pictures. But the fact is that until 2010 there was no large enough data set that could teach such a number of parameters to recognize images.

The largest databases that existed before then were PASCAL VOC, which had only 20 object categories, and Caltech 101, which was developed at the California Institute of Technology. The latter had 101 categories, and that was considered a lot. Those who could not find their objects in either of these databases had to build their own, which, I can tell you, is terribly painful.

However, in 2010, the ImageNet database appeared, which contained 15 million images, divided into 22 thousand categories. This solved our problem of training neural networks. Now everyone who has an academic address can easily go to the base’s website, request access and receive this base for training their neural networks. They respond quite quickly, in my opinion, the next day.

Compared to previous data sets, this is a very large database.

The example shows how insignificant everything that came before it was. Simultaneously with the ImageNet base, the ImageNet competition appeared, an international challenge in which all teams wishing to compete can take part.

This year the winning network was created in China, it had 269 layers. I don’t know how many parameters there are, I suspect there are also a lot.

Deep neural network architecture

Conventionally, it can be divided into two parts: the layers that learn and the layers that do not.

Black indicates the parts that do not learn; all the other layers are capable of learning. There are many definitions of what is inside each convolutional layer. One accepted notation is that a single layer is divided into three components: the convolution stage, the detector stage and the pooling stage.

I won’t go into details; there will be many more reports that will discuss in detail how this works. I'll tell you with an example.

Since the organizers asked me not to mention many formulas, I threw them out completely.

So, the input image passes through a series of layers, which can be thought of as filters of different sizes recognizing elements of different complexity. These filters form their own index, or set of features, which then goes into a classifier - usually either an SVM or an MLP (multilayer perceptron), whichever you prefer.
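To illustrate what a single such filter computes, here is a hand-written 2D convolution over a toy image; the 3x3 kernel is an illustrative edge detector, not a filter learned by any of the networks discussed here:

```python
# Slide a small window over the image and compute a weighted sum at each position.
import numpy as np

image = np.zeros((6, 6))
image[:, 3:] = 1.0                       # toy image: dark left half, bright right half

kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)   # responds to vertical edges

h, w = image.shape
kh, kw = kernel.shape
feature_map = np.zeros((h - kh + 1, w - kw + 1))
for i in range(feature_map.shape[0]):
    for j in range(feature_map.shape[1]):
        patch = image[i:i + kh, j:j + kw]
        feature_map[i, j] = np.sum(patch * kernel)   # the filter's response here

print(feature_map)   # strong responses along the vertical edge in the middle
```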

In the same way as in a biological neural network, objects of increasing complexity are recognized. As the number of layers grew, all of this lost its connection with the cortex, since the number of zones in the brain is limited, whereas a network may have 269 or a great many zones of abstraction; what is preserved is only the growth in complexity, in the number of elements and in the receptive fields.

If we look at the example of face recognition, then our receptive field of the first layer will be small, then a little larger, larger, and so on until finally we can recognize the entire face.

In terms of what is inside our filters, first there will be slanted lines plus a little color, then parts of faces, and then entire faces will be recognized by each cell of the layer.

There are people who claim that a person always recognizes better than a network. Is it so?

In 2014, scientists decided to test how well we recognize objects in comparison with neural networks. They took the two best networks at that moment - AlexNet and the network of Matthew Zeiler and Fergus - and compared them with the responses of different areas of the brain of a macaque, which had also been taught to recognize certain objects. The objects were from the animal world, so that the monkey would not get confused, and the experiments looked at who recognizes better.

Since it is impossible to get a clear response from the monkey, electrodes were implanted into it and the response of each neuron was directly measured.

It turned out that under normal conditions, brain cells responded as well as the state-of-the-art model at that time, that is, Matthew Zeiler's network.

However, with an increase in the speed at which objects are displayed and an increase in the amount of noise and the number of objects in the image, the recognition speed and quality of our brain and of the primate brain drop significantly, and even the simplest convolutional neural network recognizes objects better. That is, formally, neural networks work better than our brain.

Classic problems of convolutional neural networks

There are actually not many of them; they fall into three classes. Among them are tasks such as object identification, semantic segmentation, face recognition, recognition of human body parts, semantic edge detection, highlighting objects of attention in an image, and estimating surface normals. They can roughly be divided into three levels, from the lowest-level tasks to the highest-level ones.

Using this image as an example, let's look at what each task does.

  • Defining boundaries - the lowest-level task, for which convolutional neural networks are already classically used.
  • Determining the vector to the normal, which allows us to reconstruct a three-dimensional image from a two-dimensional one.
  • Saliency, identifying objects of attention - what a person would pay attention to when looking at this picture.
  • Semantic segmentation, which allows objects to be divided into classes according to their structure without knowing anything about these objects, that is, even before they are recognized.
  • Semantic boundary highlighting - the extraction of boundaries divided into classes.
  • Highlighting human body parts.
  • And the highest-level task is recognition of the objects themselves, which we will now consider using face recognition as an example.

Face recognition

The first thing we do is run a face detector over the image in order to find a face. Next, we normalize and center the face and submit it for processing to the neural network. After that we obtain a set, or vector, of features that uniquely describes the features of this face.

Then we can compare this feature vector with all the feature vectors that are stored in our database, and get a reference to a specific person, to his name, to his profile - everything that we can store in the database.
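A minimal sketch of that matching step, assuming cosine similarity between feature vectors (the 4-dimensional vectors and names are invented; production systems use much longer embeddings and a dedicated search index):

```python
# Compare a face's feature vector against the vectors stored in a database
# and return the closest match.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

database = {
    "person_a": np.array([0.9, 0.1, 0.3, 0.4]),
    "person_b": np.array([0.1, 0.8, 0.5, 0.2]),
}

query = np.array([0.85, 0.15, 0.35, 0.38])   # feature vector from the new photo

best_name = max(database, key=lambda name: cosine_similarity(query, database[name]))
print(best_name, cosine_similarity(query, database[best_name]))  # closest match and score
```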

This is exactly how our FindFace product works - it is a free service that helps you search for people's profiles in the VKontakte database.

In addition, we have an API for companies who want to try our products. We provide services for face detection, verification and user identification.

We have currently developed two scenarios. The first is identification - searching for a person in the database. The second is verification - comparing two images with a certain probability that they show the same person. In addition, we are currently developing emotion recognition, image recognition in video, and liveness detection - determining whether the person in front of the camera is live or just a photograph.

Some statistics: in identification, when searching through 10 thousand photos, we have an accuracy of about 95%, depending on the quality of the database, and 99% accuracy in verification. Besides this, the algorithm is very robust to changes - we don't have to look at the camera, and there can be obstructing objects: glasses, sunglasses, a beard, a medical mask. In some cases we can even cope with challenges that are incredibly hard for computer vision, such as glasses together with a mask.

The search is very fast: it takes 0.5 seconds to process 1 billion photos. We have developed a unique index for fast search. We can also work with low-quality images received from CCTV cameras, and we can process all of this in real time. You can upload photos via the web interface, via Android or iOS, and search through 100 million users and their 250 million photos.

As I already said, we took first place in the MegaFace competition - an analogue for ImageNet, but for face recognition. It has been running for several years, last year we were the best among 100 teams from around the world, including Google.

Recurrent neural networks

We use recurrent neural networks when it is not enough for us to recognize only an image - when it is important to preserve the sequence, when the order of what is happening matters.

They are used for natural language recognition, for video processing, and even for image recognition.

I won’t talk about natural language recognition - after my report there will be two more that will be aimed at natural language recognition. Therefore, I will talk about the work of recurrent networks using the example of emotion recognition.

What are recurrent neural networks? They are roughly the same as ordinary neural networks, but with feedback. We need feedback in order to pass the previous state of the system to the input of the neural network or to some of its layers.

Let's say we are processing emotions. Even in a smile - one of the simplest emotions - there are several moments, from a neutral facial expression to the moment of a full smile. They follow one another in sequence. To understand this well, we need to be able to observe how it happens, to carry what was in the previous frame into the next step of the system's operation.
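A minimal sketch of that feedback idea: the hidden state computed on one frame is fed back in together with the next frame, so the order of the frames matters (the sizes and random weights are purely illustrative):

```python
# One step of a simple recurrent network: the new state depends on the current
# frame AND on the previous state.
import numpy as np

rng = np.random.default_rng(1)
W_in = rng.normal(scale=0.5, size=(3, 4))    # frame features (3) -> hidden state (4)
W_rec = rng.normal(scale=0.5, size=(4, 4))   # previous hidden state -> hidden state

def rnn_step(frame, prev_state):
    return np.tanh(frame @ W_in + prev_state @ W_rec)

frames = [rng.normal(size=3) for _ in range(5)]   # a short sequence, e.g. neutral -> smile
state = np.zeros(4)                               # nothing seen yet
for frame in frames:
    state = rnn_step(frame, state)                # the previous state is fed back in

print(state)   # the same frames in a different order would give a different final state
```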

In 2015, at the Emotion Recognition in the Wild competition, a team from Montreal presented a recurrent system for recognizing emotions that looked very simple: it had only a few convolutional layers and worked exclusively with video. This year they also added audio recognition and aggregated the frame-by-frame data obtained from convolutional neural networks with the audio-signal data in a recurrent neural network (with state feedback), and took first place in the competition.

Reinforcement learning

The next type of neural network, used very often lately but not as widely publicized as the previous two types, is deep reinforcement learning.

The point is that in the previous two cases we use databases. We have either face data, or image data, or emotion data from videos. But if we don't have such data, if we can't film it, how do we teach a robot to pick up objects? We can't write that down by hand - we don't really know how it works. Another example: compiling large databases for computer games is complicated, and unnecessary, since it can be done much more simply.

Everyone has probably heard about the success of deep reinforcement learning in Atari and Go.

Who has heard of Atari? Well, someone heard, okay. I think everyone has heard about AlphaGo, so I won’t even tell you what exactly happens there.

What's going on with Atari? The architecture of this neural network is shown on the left. It learns by playing against itself in order to get the maximum reward. The maximum reward is the fastest possible completion of the game with the highest possible score.

At the top right is the last layer of the neural network, which depicts the whole set of states of the system, which played against itself for just two hours. Desirable outcomes of the game with the maximum reward are shown in red, and undesirable ones in blue. The network builds a kind of field and moves through its trained layers toward the state it wants to reach.
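For flavor, here is a much-simplified, tabular form of the reward-driven update behind such systems; the Atari network described above replaces the table with a deep network, but the idea of nudging value estimates toward observed rewards is the same (states, actions and the transition are made up):

```python
# A tabular Q-learning update: move the estimated value of (state, action)
# toward the observed reward plus the best value of the next state.

q_table = {}                      # (state, action) -> estimated future reward
alpha, gamma = 0.1, 0.9           # learning rate and discount factor

def update(state, action, reward, next_state, actions):
    old = q_table.get((state, action), 0.0)
    best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
    q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)

actions = ["left", "right", "fire"]
update(state="s0", action="fire", reward=1.0, next_state="s1", actions=actions)
print(q_table)                    # {('s0', 'fire'): 0.1}
```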

In robotics the situation is a little different. Why? Here we have several difficulties. Firstly, we don't have many databases. Secondly, we need to coordinate three systems at once: the perception of the robot, its actions with the help of manipulators and its memory - what was done in the previous step and how it was done. In general, this is all very difficult.

The fact is that no neural network, not even deep learning, can currently cope with this task effectively enough, so deep learning is only a piece of what robots need. For example, Sergey Levine recently presented a system that teaches a robot to grasp objects.

Here are the experiments he conducted on his 14 robotic arms.

What's going on here? In the bins you see in front of you there are various objects: pens, erasers, mugs large and small, rags - different textures, different hardness. It is not obvious how to teach a robot to grasp them. For many hours, even weeks, the robots trained until they could grasp these objects, and databases were compiled in the process.

Databases are a kind of environmental response that we need to accumulate in order to be able to train the robot to do something in the future. In the future, robots will learn from this set of system states.

Non-standard applications of neural networks

Unfortunately, this is the end - I don't have much time left. I will tell you about the non-standard solutions that currently exist and which, according to many forecasts, will find application in the future.

Stanford scientists recently came up with a very unusual application of a CNN: predicting poverty. What did they do?

The concept is actually very simple. In Africa, the level of poverty goes beyond all imaginable and inconceivable limits, and there is often not even the ability to collect socio-demographic data. Therefore, since 2005 there has been almost no data at all about what is happening there.

Scientists collected daytime and nighttime satellite maps and fed them to a neural network over a period of time.

The neural network was pre-trained on ImageNet. That is, its first layers of filters were tuned so that it could recognize some very simple things - for example, roofs of houses - in order to find settlements on the daytime maps. Then the daytime maps were compared with nighttime maps of the illumination of the same area of the surface, in order to estimate how much money the population has, at the very least to light their houses at night.

Here you see the results of the forecast built by the neural network. The forecast was made at different resolutions. And you see - the very last frame - real data collected by the Ugandan government in 2005.

You can see that the neural network made a fairly accurate forecast, even with a slight shift relative to 2005.

Of course there were side effects. Scientists who do deep learning are always surprised to discover various side effects - for example, the fact that the network learned to recognize water, forests, large construction sites and roads, all without a teacher, without pre-built databases, entirely on its own. There were even individual layers that reacted, for example, to roads.

And the last application I would like to talk about is semantic segmentation of 3D images in medicine. In general, medical imaging is a complex field that is very difficult to work with.

There are several reasons for this.

  • We have very few databases. It is not so easy to find an image of a brain, let alone a damaged one, and it is impossible to simply take one from somewhere.
  • Even if we have such an image, we need to take a medic and force him to manually annotate all the multi-layered images, which is very time-consuming and extremely inefficient. Not all doctors have the resources to do this.
  • Very high precision is required. The medical system cannot make mistakes. If, when recognizing cats, some were missed - no big deal; but if we fail to detect a tumor, that is not good at all. The requirements for system reliability are especially stringent here.
  • Images are in three-dimensional elements - voxels, not pixels, which brings additional complexity to system developers.
How was this problem dealt with in this case? The CNN was dual-stream: one part processed the image at normal resolution, the other at a slightly lower resolution, in order to reduce the number of layers that need to be trained. Thanks to this, the time required to train the network was slightly reduced.

Where it is used: identifying damage after an impact, searching for a tumor in the brain, and in cardiology to determine how the heart is working.

Here is an example for determining the volume of the placenta.

It works well automatically, but not well enough to be released into production, so this is just getting started. There are several startups creating such medical vision systems. In general, there will be many deep learning startups in the near future; they say that venture capitalists have allocated more budget to deep learning startups in the last six months than in the past five years.

This area is actively developing, there are many interesting directions. We live in interesting times. If you are involved in deep learning, then it’s probably time for you to open your own startup.

Well, I'll probably wrap it up here. Thank you very much.

This article contains materials - mostly in Russian - for a basic study of artificial neural networks.

An artificial neural network, or ANN, is a mathematical model, as well as its software or hardware embodiment, built on the principle of the organization and functioning of biological neural networks - the networks of nerve cells of a living organism. The science of neural networks has existed for quite a long time, but it is precisely in connection with recent achievements of scientific and technological progress that this area is beginning to gain popularity.

Books

Let's start the selection with the classic way of studying - through books. We have selected Russian-language books with a large number of examples:

  • F. Wasserman, Neurocomputer technology: Theory and practice. 1992
    The book sets out in a publicly accessible form the basics of building neurocomputers. The structure of neural networks and various algorithms for their configuration are described. Separate chapters are devoted to the implementation of neural networks.
  • S. Haykin, Neural Networks: A Complete Course. 2006
    The main paradigms of artificial neural networks are discussed here. The presented material contains a strict mathematical justification for all neural network paradigms, is illustrated with examples, descriptions of computer experiments, contains many practical problems, as well as an extensive bibliography.
  • D. Forsyth, Computer Vision: A Modern Approach. 2004
    Computer vision is one of the most in-demand fields at the current stage of development of global digital computer technologies. It is required in manufacturing, robot control, process automation, medical and military applications, satellite surveillance, and personal computer applications such as digital image retrieval.

Video

There is nothing more accessible and understandable than visual learning using video:

  • To understand what machine learning is in general, see these two lectures from the Yandex School of Data Analysis (ShAD).
  • Introduction into the basic principles of neural network design - great for continuing your introduction to neural networks.
  • A lecture course on the topic "Computer Vision" from the Faculty of Computational Mathematics and Cybernetics at Moscow State University. Computer vision is the theory and technology of creating artificial systems that detect and classify objects in images and videos. These lectures can be considered an introduction to this interesting and complex science.

Educational resources and useful links

  • Artificial Intelligence Portal.
  • Laboratory “I am intelligence”.
  • Neural networks in Matlab.
  • Neural networks in Python (English):
    • Classifying text using ;
    • Simple .
  • Neural network on .

A series of our publications on the topic

We have previously published a course #neuralnetwork@tproger on neural networks. In this list, publications are arranged in order of study for your convenience.

NEURAL NETWORKS, artificial - multilayer, highly parallel (i.e. with a large number of elements operating independently in parallel) logical structures composed of formal neurons. The theory of neural networks and neurocomputers began with the work of the American neurophysiologists W. McCulloch and W. Pitts, "A Logical Calculus of the Ideas Immanent in Nervous Activity" (1943), in which they proposed a mathematical model of the biological neuron. Among the fundamental works one should also highlight the model of D. Hebb, who in 1949 proposed a learning law that became the starting point for learning algorithms for artificial neural networks. The further development of neural network theory was significantly influenced by the monograph of the American neurophysiologist F. Rosenblatt, "Principles of Neurodynamics," in which he described in detail the scheme of the perceptron (a device that models the process of information perception by the human brain). His ideas were developed in the scientific works of many authors. In 1985-86 the theory of neural networks received a "technological impetus" from the possibility of modeling neural networks on the affordable and high-performance personal computers that appeared at that time. The theory of neural networks continues to develop actively at the beginning of the 21st century. According to experts, significant technological growth in the design of neural networks and neurocomputers is expected in the near future. Many new possibilities of neural networks have been discovered in recent years, and work in this area makes a significant contribution to industry, science and technology and has great economic importance.

Main areas of application of neural networks

Potential areas of application of artificial neural networks are those where human intelligence is ineffective and traditional calculations are labor-intensive or physically inadequate (i.e. do not reflect, or poorly reflect, real physical processes and objects). The relevance of neural networks (i.e. neurocomputers) increases many times over when there is a need to solve poorly formalized tasks. The main areas of application of neural networks are: automation of classification, forecasting, recognition and decision-making; control; encoding and decoding of information; approximation of dependencies, etc.

With the help of neural networks, an important problem in the field of telecommunications is solved - the design and optimization of communication networks (finding the optimal traffic path between nodes). In addition to controlling flow routing, neural networks are used to obtain effective solutions in the design of new telecommunication networks.

Speech recognition is one of the most popular applications of neural networks.

Another area is price and production management (losses from suboptimal production planning are often underestimated). Because demand and sales conditions depend on time, season, exchange rates and many other factors, the volume of production must vary flexibly to make optimal use of resources (a neural network system detects complex dependencies between advertising costs, sales volumes, price, competitors' prices, day of the week, season, etc.). As a result of using the system, the production strategy that maximizes sales volume or profit is chosen.

In consumer market analysis (marketing), when conventional (classical) methods of predicting consumer response may not be accurate enough, a predictive neural network system with an adaptive architecture (a neurosimulator) is used.

Demand research allows you to maintain the company’s business in a competitive environment, i.e. maintain constant contact with consumers through “feedback”. Large companies conduct consumer surveys to find out what factors are decisive for them when purchasing a given product or service, why in some cases preference is given to competitors, and what products the consumer would like to see in the future. Analyzing the results of such a survey is a rather difficult task, since there are a large number of correlated parameters. The neural network system allows you to identify complex dependencies between demand factors, predict consumer behavior when changing marketing policies, find the most significant factors and optimal advertising strategies, and also outline the consumer segment that is most promising for a given product.

In medical diagnostics, neural networks are used, for example, to diagnose hearing in infants. The objective diagnostic system processes recorded "evoked potentials" (brain responses), which appear as bursts on the electroencephalogram in response to an audio stimulus synthesized during the examination. Typically, to diagnose a child's hearing with confidence, an experienced audiologist needs to perform up to 2,000 tests, which takes about an hour. A system based on a neural network can determine the hearing level with the same reliability from 200 observations within just a few minutes, and without the participation of qualified personnel.

Neural networks are also used to forecast short- and long-term trends in various fields (financial, economic, banking, etc.).

Structure of neural networks

The human nervous system and brain consist of neurons connected to each other by nerve fibers. Nerve fibers are capable of transmitting electrical impulses between neurons. All processes of transmission of irritations from our skin, ears and eyes to the brain, processes of thinking and control of actions - all this is implemented in a living organism as the transmission of electrical impulses between neurons.

A biological neuron (cell) has a nucleus, as well as processes of nerve fibers of two types (Fig. 1): dendrites, along which impulses are received (carrying signals in), and a single axon, along which the neuron can transmit an impulse (carrying signals away). The axon contacts the dendrites of other neurons through special formations - synapses - which affect the strength of the transmitted impulse. A structure consisting of a large number of such neurons is called a biological (or natural) neural network.

The formal neuron appeared largely as a result of the study of biological neurons. A formal neuron (hereinafter simply a neuron) is the basis of any artificial neural network. Neurons are relatively simple, uniform elements that imitate the functioning of neurons in the brain. Each neuron is characterized by its current state, by analogy with the nerve cells of the brain, which can be excited or inhibited. An artificial neuron, like its natural prototype, has a group of synapses (inputs) connected to the outputs of other neurons, as well as an axon - the output connection of the given neuron, from which the excitation or inhibition signal arrives at the synapses of other neurons.

A formal neuron is a logical element with $N$ inputs, $(N+1)$ weighting coefficients, an adder and a nonlinear converter. The simplest formal neuron, which performs the logical transformation $y = \operatorname{sign}\sum_{i=0}^{N}a_ix_i$ of input signals (which may, for example, be the output signals of other formal neurons of the network) into an output signal, is presented in Fig. 1.

Here $y$ is the output value of the formal neuron; $a_i$ are the weighting coefficients; $x_i$ are the input values of the formal neuron ($x_i \in \{0,1\},\; x_0=1$). The process of computing the output value of a formal neuron is a movement and transformation of a data stream. First the data arrives at the input block of the formal neuron, where the original data is multiplied by the corresponding weighting coefficients, the so-called synaptic weights (by analogy with the synapses of biological neurons). A weighting coefficient is a measure that determines how strongly the corresponding input value affects the state of the formal neuron. Weighting coefficients can change in accordance with training examples, the network architecture, learning rules, etc. The values obtained by multiplication are combined in the adder into one numeric value $g$ (by summation). Then, to determine the output of the formal neuron, in the nonlinear transformation block (which implements the transfer function) $g$ is compared with a certain number (the threshold). If the sum exceeds the threshold value, the formal neuron generates a signal; otherwise the signal will be zero or inhibitory. This formal neuron uses the nonlinear transformation $$\operatorname{sign}(g)= \begin{cases} 0, & g < 0 \\ 1, & g \geqslant 0 \end{cases},\quad \text{where}\;\, g = \sum_{i=0}^N a_i x_i.$$
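As a direct transcription of this formal neuron into code (the example weights are arbitrary, and the bias is folded in via the constant input $x_0 = 1$):

```python
# Weighted sum of inputs (x[0] = 1 is the constant bias input) passed through
# the threshold function sign(g) defined above: output 1 if g >= 0, else 0.

def formal_neuron(x, a):
    g = sum(a_i * x_i for a_i, x_i in zip(a, x))
    return 1 if g >= 0 else 0

a = [-1.5, 1.0, 1.0]                 # a_0 acts as a threshold via x_0 = 1
print(formal_neuron([1, 1, 1], a))   # -> 1 (both inputs on: g = 0.5)
print(formal_neuron([1, 1, 0], a))   # -> 0 (only one input on: g = -0.5)
```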

The choice of neural network structure is made in accordance with the characteristics and complexity of the task. Theoretically, the number of layers and the number of neurons in each layer can be arbitrary, but in practice it is limited by the resources of the computer or the specialized chip on which the neural network is implemented. Moreover, if a unit step function is used as the activation function for all neurons of the network, the neural network is called a multilayer perceptron.

Fig. 3 shows the general diagram of a multilayer neural network with serial connections. A high degree of processing parallelism is achieved by combining a large number of formal neurons into layers and connecting the neurons to one another in a certain way.

In the general case, cross connections and feedback connections with adjustable weighting coefficients can be introduced into this structure (Fig. 4).

Neural networks are complex nonlinear systems with a huge number of degrees of freedom. The principle by which they process information differs from that used in computers based on processors with the von Neumann architecture, with its logical basis of AND, OR, NOT (see J. von Neumann, computing machine). Instead of classical programming (as in traditional computing systems), neural network training is used, which usually comes down to adjusting the weighting coefficients so as to optimize a given criterion of the quality of functioning of the neural network.

Neural network algorithms

A neural network algorithm for solving a problem is a computational procedure fully or mostly implemented in the form of a neural network of one structure or another (for example, a multilayer neural network with sequential or cross connections between layers of formal neurons) with a corresponding algorithm for adjusting the weighting coefficients. The basis for developing a neural network algorithm is a systems approach, in which the process of solving the problem is represented as the functioning in time of some dynamic system. To build it, it is necessary to determine: the object that acts as the input signal of the neural network; the object that acts as the output signal of the neural network (for example, the solution itself or some of its characteristics); the desired (required) output signal of the neural network; the structure of the neural network (the number of layers, the connections between layers, the objects serving as weighting coefficients); the system error function (characterizing the deviation of the desired output signal of the neural network from the actual output signal); the criterion of the quality of the system and its optimization functional, depending on the error; and the values of the weighting coefficients (for example, determined analytically directly from the problem statement, using numerical methods, or via the procedure for adjusting the weighting coefficients of the neural network).
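To make this list more concrete, the following sketch shows these components - input and output signals, desired outputs, a structure with adjustable weighting coefficients, an error function and a quality criterion - for a tiny two-layer network. All names, layer sizes and the toy data are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Structure: 2 inputs -> 3 hidden neurons -> 1 output (the weights are the adjustable coefficients)
W1 = rng.normal(size=(3, 2)); b1 = np.zeros(3)
W2 = rng.normal(size=(1, 3)); b2 = np.zeros(1)

def forward(x):
    """Output signal of the network for the input signal x."""
    h = np.tanh(W1 @ x + b1)          # hidden layer (a smooth activation instead of a step)
    return W2 @ h + b2                # output layer

def error(y_actual, y_desired):
    """System error function: deviation of the actual output from the desired one."""
    return float(np.mean((y_actual - y_desired) ** 2))

# Quality criterion: mean error over a (hypothetical) training set
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[0.], [1.], [1.], [0.]])                 # desired output signals
print(np.mean([error(forward(x), y) for x, y in zip(X, Y)]))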

The number and type of formal neurons in the layers, as well as the number of layers of neurons, are selected based on the specifics of the problem being solved and the required quality of the solution. A neural network, in the process of being configured to solve a specific problem, is considered as a multidimensional nonlinear system that, in an iterative mode, purposefully seeks the optimum of some functional that quantitatively determines the quality of the solution. For neural networks, as multidimensional nonlinear control objects, algorithms for adjusting the multitude of weighting coefficients are constructed. The main stages of studying a neural network and constructing algorithms for setting (adapting) its weighting coefficients include: studying the characteristics of the input signal for various modes of operation of the neural network (the input signal of a neural network is, as a rule, the input information being processed and the indication of the so-called “teacher” of the neural network); selecting the optimization criteria (with a probabilistic model of the outside world such criteria may be the minimum of the average risk function or the maximum of the posterior probability, in particular if there are restrictions on individual components of the average risk function); developing an algorithm for searching for the extrema of the optimization functionals (for example, algorithms for finding local and global extrema); constructing algorithms for adapting the neural network coefficients; analyzing the reliability and diagnostic methods of the neural network, etc.

It should be noted that the introduction of feedback and, as a consequence, the development of algorithms for adjusting the corresponding coefficients in the 1960s–80s had a purely theoretical meaning, since at the time there were no practical problems adequate to such structures. Only in the late 1980s and early 1990s did such problems, and simple structures with adjustable feedback loops for solving them (the so-called recurrent neural networks), begin to appear. Developers in the field of neural network technologies were engaged not only in creating algorithms for setting up multilayer neural networks and neural network algorithms for solving various problems, but also in the most effective (for the current level of electronics technology) hardware emulators (special programs designed to run one system in the shell of another) of neural network algorithms.

In the 1960s, before the advent of the microprocessor, the most effective emulators of neural networks were analog implementations of open-loop neural networks with developed tuning algorithms on mainframe computers (sometimes systems based on adaptive elements with analog memory). This level of development of electronics made it urgent to introduce cross connections into the structures of neural networks, which led to a significant reduction in the number of neurons in a network while preserving the quality of the solution (for example, the discriminant ability when solving pattern recognition problems). Research of the 1960s–70s on optimizing the structures of cross-connected neural networks will certainly be developed further in the implementation of memristor neural systems [a memristor (from memory and resistor, electrical resistance) is a passive microelectronic element capable of changing its resistance depending on the charge that has flowed through it], taking into account their specificity in terms of analog-digital information processing and the very significant number of adjustable coefficients.

The specific requirements of applied problems determined some features of the structures of neural networks and their tuning algorithms: a continuum (from the Latin continuum, continuous) of the number of classes, when the indication of the “teacher” of the system is formed as the value of a continuous function over a certain range of variation; a continuum of solutions of a multilayer neural network, formed by the choice of a continuum activation function for the neurons of the last layer; a continuum of the number of features, formed by a transition in the feature space from representing the output signal as an $N$-dimensional vector of real numbers to a real function over a certain range of variation of its argument (the continuum of the number of features, as a consequence, requires a specific software and hardware implementation of the neural network; a variant of the continuum of input-space features was implemented in the problem of recognizing periodic signals without converting them with an analog-to-digital converter (ADC) at the system input, i.e., with an analog-digital implementation of the multilayer neural network); and a continuum of the number of neurons in a layer. Multilayer neural networks with a continuum of classes and solutions are implemented by selecting the appropriate types of activation functions for the neurons of the last layer.

The table shows a systematic set of options for algorithms for setting up multilayer neural networks in the “input signal – solution space” plane. It presents the many variants of the characteristics of the input and output signals of neural networks for which algorithms for adjusting the coefficients were developed by the Russian scientific school in the 1960s–70s. The signal at the input of the neural network is described by the number of classes (gradations) of images representing the instructions of the “teacher”. The output of the neural network is a quantitative description of the solution space. The table classifies the variants of neural network operation for various types of input signal (2 classes, $K$ classes, a continuum of classes) and various quantitative descriptions of the solution space (2 solutions, $K_p$ solutions, a continuum of solutions). The numbers 1, 7, 8 and so on denote specific variants of the functioning of neural networks.

Table. A set of configuration algorithm options

Space (number) of solutions | Input signal: 2 classes | Input signal: $K$ classes | Input signal: continuum of classes
2 | 1 | 7 | 8
$K_p$ | $K_p = 3$: 3a; $K_p = \text{const}$: 3b | $K \lt K_p$: 9; $K = K_p$: 2; $K \gt K_p$: 4 | 10
Continuum | 5 | 6 | 11

The main advantages of neural networks as a logical basis for algorithms for solving complex problems are: the invariance of the methods of synthesizing neural networks with respect to the dimension of the feature space; the ability to choose the structure of the neural network within a wide range of parameters depending on the complexity and specifics of the problem being solved, so as to achieve the required quality of the solution; adequacy to current and promising microelectronics technologies; and fault tolerance in the sense that the quality of the solution changes gradually, rather than catastrophically, with the number of failed elements.

Neural networks as a special type of control object in an adaptive system

Neural networks were one of the first examples in control theory of the transition from the control of the simplest linear stationary systems to the control of complex nonlinear, nonstationary, multidimensional, multiply connected systems. In the second half of the 1960s, a methodology for synthesizing neural networks emerged that was developed and successfully applied over the following almost fifty years. The general structure of this methodology is presented in Fig. 5.

Input signals of neural networks

A probabilistic model of the surrounding world is the basis of neural network technologies. Such a model is the basis of mathematical statistics. Neural networks arose just at the time when experimenters using methods of mathematical statistics asked themselves the question: “Why are we obliged to describe the distribution functions of input random signals in the form of specific analytical expressions (normal distribution, Poisson distribution, etc.)? If this is correct and there is some physical reason for it, then the task of processing random signals becomes quite simple."

Specialists in neural network technologies said: “We know nothing about the distribution function of the input signals; we reject the need for a formal description of the distribution function of the input signals, even if this narrows the class of problems being solved. We consider the distribution functions of the input signals to be complex and unknown, and we will solve specific practical problems under such conditions of a priori uncertainty (i.e., with an incomplete description and no information about the possible results).” This is why neural networks were effectively used in solving pattern recognition problems as early as the early 1960s. Moreover, the problem of pattern recognition was treated as the problem of approximating a multidimensional random function taking $K$ values, where $K$ is the number of image classes.

Below are some operating modes of multilayer neural networks, determined by the characteristics of random input signals, for which algorithms for adjusting coefficients were developed back in the late 1960s.

Training neural networks

It is obvious that the functioning of a neural network, i.e., the actions it is capable of performing, depends on the magnitudes of the synaptic connections. Therefore, having specified the structure of a neural network that corresponds to a certain task, the developer must find optimal values for all the weighting coefficients $w$. This stage is called training the neural network, and the ability of the network to solve the problems posed to it during operation depends on how well it is performed. The most important parameters of training are the quality of the selection of the weighting coefficients and the time that has to be spent on training. As a rule, these two parameters are inversely related and must be chosen as a compromise. Currently, all neural network training algorithms can be divided into two large classes: “supervised” and “unsupervised”.

Prior probabilities of class appearance

Despite the lack of a priori information about the distribution functions of the input signals, ignoring whatever useful information is available can lead to a loss of quality in solving the problem. This primarily concerns the a priori probabilities of the appearance of the classes. Algorithms for setting up multilayer neural networks were therefore developed taking into account the available information about the a priori probabilities of the appearance of the classes. This occurs in tasks such as recognizing letters in text, when the probability of the appearance of each letter of the given language is known and this information must be used when constructing the algorithm for adjusting the coefficients of the multilayer neural network.
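A minimal sketch of how such a priori class probabilities can be taken into account when forming a decision is given below; the letter frequencies and the class likelihoods are invented for illustration only and are not taken from the original text.

```python
import numpy as np

# Hypothetical a priori probabilities of two letter classes (e.g., a frequent and a rare letter)
priors = np.array([0.8, 0.2])

# Hypothetical class likelihoods produced for one input image (e.g., by a network's outputs)
likelihoods = np.array([0.4, 0.6])

# Bayes-style decision: weight the evidence by the prior probability of each class
posterior = priors * likelihoods
posterior /= posterior.sum()
print(posterior, "-> decide class", int(np.argmax(posterior)))
```

Even though the likelihoods favor the second class, the known prior probabilities flip the decision toward the more frequent class, which is exactly the kind of information the tuning algorithm is meant to exploit.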

Teacher qualification

The neural network is presented with the values of both the input and output parameters, and according to some internal algorithm it adjusts the weights of its synaptic connections. Supervised learning assumes that for each input vector there is a target vector representing the desired output. Together they are called a representative, or training, sample. In general, the qualification of the “teacher” may differ for different classes of images. Typically, a neural network is trained on a certain number of such samples. An input vector is presented, the output of the neural network is calculated and compared with the corresponding target vector, and the difference (error) is fed back into the neural network, with the weights changed according to an algorithm that seeks to minimize the error. The vectors of the training set are presented sequentially, the errors are calculated and the weights adjusted for each vector, until the error over the entire training set reaches an acceptably low level.
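The loop just described can be sketched for the simplest possible case: a single linear neuron trained by error correction. This is only an illustrative stand-in for the general weight-adjustment algorithms discussed in the article; the training data, learning rate and stopping threshold are made-up values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Training sample: input vectors and the "teacher's" target values (hypothetical data
# that a single linear neuron can reproduce exactly: t = x1 + 2*x2)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([0., 2., 1., 3.])

w = rng.normal(size=2); b = 0.0         # adjustable weights
lr = 0.1                                # learning rate: the quality/time trade-off mentioned above

for epoch in range(200):
    total_error = 0.0
    for x, t in zip(X, T):              # present the training vectors sequentially
        y = w @ x + b                   # compute the network output
        err = t - y                     # compare it with the target value
        w += lr * err * x               # feed the error back and adjust the weights
        b += lr * err
        total_error += err ** 2
    if total_error < 1e-3:              # stop when the error over the whole set is low enough
        break

print(epoch, w, b)
```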

In pattern recognition tasks, as a rule, the qualification of the “teacher” is by default assumed to be complete, i.e. the probability of the “teacher” correctly assigning images to one class or another is equal to one. In practice, with indirect measurements, this is often not true: for example, in medical diagnostic tasks, when verifying (checking) an archive of medical data intended for training, the probability of attributing the data to a particular disease is not equal to one. The introduction of the concept of “teacher” qualification made it possible to develop unified algorithms for adjusting the coefficients of multilayer neural networks for the modes of learning, learning “with a teacher” of finite qualification, and self-learning (clustering), when, in the presence of $K$ or two classes of images, the qualification of the “teacher” (the probability of assigning images to one class or another) is equal to $\frac{1}{K}$ or $\frac{1}{2}$, respectively. The introduction of the concept of “teacher” qualification in pattern recognition systems also made it possible to consider, purely theoretically, modes of “sabotage” of the system, when it is given a deliberately false (with varying degrees of falsity) assignment of images to a particular class. This mode of adjusting the coefficients of a multilayer neural network has not yet found practical application.

Clustering

Clustering (self-learning, unsupervised learning) is a particular mode of operation of multilayer neural networks, in which the system is not given information about which class each sample belongs to. The neural network is presented only with input signals, and the outputs of the network are formed independently, taking into account only the input signals and their derivatives. Despite numerous applied successes, supervised learning has been criticized for its biological implausibility. It is difficult to imagine a learning mechanism in natural human intelligence that would compare desired and actual output values and make adjustments through feedback. Even if such a mechanism were assumed in the human brain, where would the desired outputs come from? Unsupervised learning is a more plausible model of learning in a biological system. It does not require a target vector for the outputs and therefore does not require comparison with predefined ideal responses; the training set consists only of input vectors. The training algorithm adjusts the weights of the neural network so that consistent output vectors are obtained, i.e., so that the presentation of sufficiently close input vectors yields identical outputs. The learning process therefore extracts the statistical properties of the training set and groups similar vectors into classes. Presenting a vector from a given class as input will produce a certain output vector, but before training it is impossible to predict which output will be produced by a given class of input vectors. Consequently, the outputs of such a network must be transformed into some understandable form determined by the learning process. This is not a serious problem: it is usually not difficult to identify the correspondence between inputs and outputs established by the network.

Many scientific works are devoted to clustering. The main task of clustering is to process a set of vectors in a multidimensional feature space and identify compact subsets (subsets of vectors close to one another), their number and their properties. The most common clustering method is the $K$-means method, which is practically unrelated to backpropagation methods and does not generalize to architectures such as multilayer neural networks.
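For reference, here is a minimal sketch of the $K$-means idea mentioned above; the sample data, the number of clusters and the iteration limit are illustrative assumptions.

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Plain K-means: alternate assigning vectors to the nearest centre and recomputing centres."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each vector to the nearest centre (compact subsets in the feature space)
        labels = np.argmin(((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1), axis=1)
        new_centres = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centres[j] for j in range(k)])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return centres, labels

# Hypothetical two-cluster sample in a 2-D feature space
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(3.0, 0.3, (50, 2))])
centres, labels = k_means(X, k=2)
print(centres)
```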

The introduction of the concept of “teacher” qualification and of a unified approach to learning and self-learning in the 1960s made it possible to create a real basis for implementing the clustering mode in multilayer neural networks with a wide class of structures.

Non-stationary images

Existing developments in the field of pattern recognition systems based on multilayer neural networks mainly relate to stationary images, i.e. to random input signals that have complex unknown but time-stationary distribution functions. Some works have attempted to extend the proposed technique for tuning multilayer neural networks to non-stationary images, when the assumed unknown distribution function of the input signal depends on time or the input random signal is a superposition of a regular component and a random component with an unknown complex distribution function that does not depend on time.

On primary optimization criteria in multilayer neural networks

The probabilistic model of the world, taken as the basis for constructing adaptation algorithms in multilayer neural networks, made it possible to formulate the primary optimization criterion in the systems under consideration in the form of requirements for the minimum of the average risk function and its modifications: the maximum of the posterior probability (the conditional probability of a random event given the a posteriori, i.e., experimental, data); the minimum of the average risk function; the minimum of the average risk function subject to equality of the conditional risk functions for the different classes; the minimum of the average risk function subject to a given value of the conditional risk function for one of the classes; and other primary optimization criteria arising from the requirements of a specific practical problem. The works of Russian scientists presented modifications of the algorithms for setting up multilayer neural networks for the above primary optimization criteria. Note that the vast majority of works in the field of neural network theory and backpropagation algorithms consider the simplest criterion - the minimum mean square error - without any restrictions on the conditional risk functions.

In the self-learning (clustering) mode, a prerequisite for forming the criterion and functional of primary optimization of neural networks is the representation of the distribution function of the input signal as a multimodal function in the multidimensional feature space, where each mode corresponds, with a certain probability, to a class. Modifications of the average risk function were used as primary optimization criteria in the self-learning mode.

The presented modifications of the primary optimization criteria were generalized to the cases of a continuum of classes and solutions, a continuum of features of the input space, a continuum of the number of neurons in a layer, and arbitrary “teacher” qualification. An important part of forming the criterion and functional of primary optimization in multilayer neural networks with a probabilistic model of the world is the choice of the loss matrix, which in the theory of statistical decisions determines the loss coefficient $L_{12}$ for erroneously assigning images of the 1st class to the 2nd and the loss coefficient $L_{21}$ for assigning images of the 2nd class to the 1st. As a rule, the matrix $L$ of these coefficients is by default assumed to be symmetric when synthesizing algorithms for tuning multilayer neural networks, including when the backpropagation method is used. In practice this is not true. A typical example is a mine detection system using a geolocator. Here the loss from erroneously classifying a stone as a mine is only a small loss of time for the user of the geolocator, whereas the loss from erroneously classifying a mine as a stone is the life, or a significant loss of health, of the user.
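The effect of an asymmetric loss matrix can be shown with a small sketch; the posterior probabilities and the loss values below are invented for the mine/stone example and are not taken from the original text.

```python
import numpy as np

# Hypothetical posterior probabilities produced for one object: P(stone), P(mine)
p = np.array([0.7, 0.3])

# Loss matrix L[i, j] = loss of deciding class j when the true class is i
# (classes: 0 = stone, 1 = mine); calling a real mine a stone is far more costly
L = np.array([[0.0,   1.0],    # true stone: small loss of time if called a mine
              [100.0, 0.0]])   # true mine: severe loss if called a stone

# Expected (conditional) risk of each decision; pick the decision with minimum risk
risk = p @ L
print(risk, "-> decide:", ["stone", "mine"][int(np.argmin(risk))])
```

With a symmetric loss matrix the object would be called a stone (the more probable class); with the asymmetric losses the minimum-risk decision is “mine”, which is the point of choosing the loss matrix deliberately.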

Analysis of open-loop neural networks

This stage of synthesis aims to determine, in general form, the statistical characteristics of the output and intermediate signals of neural networks as multidimensional nonlinear control objects, with the aim of subsequently forming the criterion and functional of secondary optimization, i.e., the functional actually optimized by the adaptation algorithm in a specific neural network. In the vast majority of works, the mean square error is taken as such a functional, which degrades the quality of the solution or does not correspond at all to the optimization problem posed by the primary optimization criterion.

Methods and algorithms for generating a secondary optimization functional corresponding to a given primary optimization functional have been developed.

Algorithms for finding the extremum of secondary optimization functionals

The algorithm for searching for an extremum of a specific secondary optimization functional determines the algorithm for adjusting the coefficients of the multilayer neural network. At the beginning of the 21st century, the algorithms of greatest practical interest are those implemented in the MatLab system (short for “Matrix Laboratory”, a package of application programs for solving technical computing problems and the programming language of the same name). However, it is necessary to note the particular nature of the adaptation algorithms for multilayer neural networks used in MatLab (the Neural Network Toolbox provides functions and applications for modeling complex nonlinear systems; it supports supervised and unsupervised learning, feedforward networks, radial basis functions, etc.) and the orientation of these algorithms not toward the specifics of the problems being solved but toward the presumed “geometry” of the secondary optimization functionals. These algorithms do not take into account many details of the specifics of applying multilayer neural networks to specific problems and will naturally require radical, if not fundamental, reworking when moving to memristor neural systems. A detailed comparative analysis of the backpropagation method and the Russian methods of the 1960s–70s has been carried out.

The main feature of these algorithms is the need to search for local and global extrema of a multiextremal functional in the multidimensional space of the adjustable coefficients of the neural network. An increase in the size of the neural network leads to a significant increase in the number of adjustable coefficients, i.e., to an increase in the dimension of the search space. Back in the 1960s, works proposed search-based and analytical procedures for calculating the gradient of the secondary optimization functional, and in the class of analytical procedures the use of not only the first but also the second derivative of the secondary optimization functional for organizing the search was proposed and studied. The multiextremal nature of the secondary optimization functional led, over the following decades, to the appearance of various modifications of search methods (genetic algorithms, etc.). Algorithms have also been created for searching for extrema of secondary optimization functionals with restrictions on the magnitude, rate of change and other parameters of the weighting coefficients of neural networks. It is these methods that should form the basis of work on methods for tuning neural networks with memristors as weighting coefficients, taking into account such specific characteristics as their transfer functions.
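The following sketch illustrates a gradient-based search for an extremum of a deliberately multiextremal one-dimensional functional; the functional itself, the numerical estimate of the derivative, the step size and the starting points are all illustrative assumptions, not anything prescribed in the original text.

```python
import numpy as np

def functional(w):
    """A made-up multiextremal secondary optimization functional of one coefficient."""
    return np.sin(3 * w) + 0.1 * w ** 2

def grad(f, w, eps=1e-6):
    """Numerical (search-based) estimate of the first derivative."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)

def descend(w0, lr=0.05, steps=500):
    """Plain gradient descent from a given starting point; it finds only a local minimum."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(functional, w)
    return w

# Different starting points can end in different local extrema of the functional
for w0 in (-2.0, 0.0, 2.0):
    w = descend(w0)
    print(f"start {w0:+.1f} -> w = {w:+.3f}, J(w) = {functional(w):.3f}")
```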

Initial conditions when setting coefficients

The choice of initial conditions for the iterative procedure of searching for the extrema of the secondary optimization functional is an important stage in the synthesis of algorithms for tuning multilayer neural networks. The problem of choosing initial conditions must be solved specifically for each problem solved by a neural network and must be an integral component of the general procedure for synthesizing the tuning algorithms. A high-quality solution to this problem can significantly reduce the tuning time. The a priori complexity of the secondary optimization functional made it necessary to introduce a procedure of choosing the initial conditions as random values of the coefficients, with repetition of this procedure and of the coefficient-adjustment procedure. Back in the 1960s this procedure seemed extremely wasteful in terms of the time spent adjusting the coefficients; nevertheless, it is still widely used today. For individual problems, the idea of choosing initial conditions specific to the given problem was adopted. This procedure was tested for three tasks: pattern recognition; clustering; and neuroidentification of nonlinear dynamic objects.
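The procedure of random initial conditions with repetition can be sketched as follows; the functional, the search range and the number of restarts are again illustrative assumptions rather than values from the original text.

```python
import numpy as np

def functional(w):
    """The same kind of made-up multiextremal functional as in the previous sketch."""
    return np.sin(3 * w) + 0.1 * w ** 2

def descend(w0, lr=0.05, steps=500, eps=1e-6):
    """Gradient descent with a numerically estimated derivative, started from w0."""
    w = w0
    for _ in range(steps):
        g = (functional(w + eps) - functional(w - eps)) / (2 * eps)
        w -= lr * g
    return w

# Random initial conditions with repetition: run the adjustment procedure several
# times from random starting points and keep the best extremum found.
rng = np.random.default_rng(0)
best_w, best_J = None, np.inf
for _ in range(10):
    w = descend(rng.uniform(-4.0, 4.0))
    if functional(w) < best_J:
        best_w, best_J = w, functional(w)
print(best_w, best_J)
```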

Memory in the coefficient adjustment circuit

A systematic approach to constructing algorithms for searching for the extremum of the secondary optimization functional assumes, as one of the tuning modes, readjustment of the coefficients at each step of the arrival of input images according to the current value of the gradient of the secondary optimization functional. Algorithms have been developed for tuning multilayer neural networks with filtering of the sequence of gradient values of the secondary optimization functional: a zero-order filter with memory $m_n$ (for stationary images); filters of order $1, \ldots, k$ with memory $m_n$ (for non-stationary images), under different hypotheses about the change in time of the distribution functions of the images of different classes.
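A zero-order filter with memory $m_n$ over the sequence of gradient values can be sketched as a sliding average of the last $m_n$ gradients; the noisy gradient stream below is simulated purely for illustration.

```python
import numpy as np
from collections import deque

def filtered_gradients(grad_stream, m_n):
    """Zero-order filter with memory m_n: average the last m_n gradient values,
    and the smoothed value is what would be used to adjust the coefficients at each step."""
    window = deque(maxlen=m_n)
    for g in grad_stream:
        window.append(g)
        yield np.mean(window)

# Hypothetical noisy sequence of gradient values arriving one per input image
rng = np.random.default_rng(0)
raw = 1.0 + 0.5 * rng.normal(size=20)      # "true" gradient 1.0 plus noise
for step, g in enumerate(filtered_gradients(raw, m_n=5)):
    if step % 5 == 0:
        print(step, round(g, 3))
```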

Study of adaptation algorithms in neural networks

The main question - how to choose the structure of a multilayer neural network for a specific problem - is still largely unresolved. One can only offer a reasonable, purposeful enumeration of structural variants with an assessment of their effectiveness in the process of solving the problem. However, assessing the quality of a tuning algorithm on one specific structure or one specific task may not be sufficiently objective. By comparison, to assess the quality of operation of linear dynamic control systems, standard input signals (step, quadratic, etc.) are used, and from the response to them the steady-state error (the astatism of the system) and the errors in transient processes are assessed.

Similarly, typical input signals have been developed for multilayer neural networks in order to test and compare the performance of different tuning algorithms. Naturally, typical input signals for objects such as multilayer neural networks are specific to each problem being solved. First of all, standard input signals were developed for the following tasks: pattern recognition; clustering; and neurocontrol of dynamic objects.

The main axiomatic principle of using neural network technologies instead of methods of classical mathematical statistics is the rejection of a formalized description of probability distribution functions for input signals and the adoption of the concept of unknown, complex distribution functions. It is for this reason that the following typical input signals have been proposed.

For the clustering problem, a random sample with a multimodal distribution in the $N$-dimensional feature space was proposed, in which the $Z$ centres of the modes of the distribution function are located on the hyperbisector of the $N$-dimensional feature space. Each mode is a random sample component with a normal distribution and a standard deviation $\sigma$ that is the same for each of the $Z$ modes. The subject of comparison of different clustering methods is then the dynamics of tuning and the quality of the solution as functions of $N$, $Z$ and $\sigma$, for a sufficiently large sample size $M$. This can be considered one of the first fairly objective approaches to comparing clustering algorithms, including those based on multilayer neural networks with an appropriate choice of structure to achieve the required clustering quality. For classification problems, the test inputs are similar to those for clustering, with the difference that the multimodal sample is divided into two (in the case of two classes) or $K$ (in the case of $K$ classes) parts with interleaved modes of the distribution function for the individual classes.
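A sketch of generating such a typical test sample is given below; the particular values of $N$, $Z$, $\sigma$, $M$ and the spacing of the mode centres are illustrative assumptions.

```python
import numpy as np

def multimodal_sample(N=5, Z=3, sigma=0.2, M=600, spacing=2.0, seed=0):
    """Random sample with Z normal modes of equal sigma whose centres lie on the
    hyperbisector of the N-dimensional feature space (the line along (1, ..., 1))."""
    rng = np.random.default_rng(seed)
    direction = np.ones(N) / np.sqrt(N)                  # unit vector of the hyperbisector
    centres = np.outer(np.arange(1, Z + 1) * spacing, direction)
    modes = rng.integers(0, Z, size=M)                   # which mode each vector comes from
    X = centres[modes] + sigma * rng.normal(size=(M, N))
    return X, modes

X, modes = multimodal_sample()
print(X.shape, np.bincount(modes))
```

Splitting the modes between two (or $K$) labels, as described above, turns the same generator into a test signal for classification problems.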

Neural networks with variable structure

The rejection, in neural network technologies, of a priori information about the distribution functions of the input signals leads to the need for a reasoned enumeration of the structural parameters of multilayer neural networks in order to ensure the required quality of the problem solution.

In the 1960s, for a class of problems that was very relevant at that time, pattern recognition, a procedure was proposed for tuning multilayer neural networks in which the structure is not fixed a priori but is itself a result of the tuning, along with the values of the adjusted coefficients. In this case, the number of layers and the number of neurons in the layers are selected during the tuning process. The procedure for adjusting the coefficients of a multilayer neural network with a variable structure is easily transferred from the problem of recognizing two classes of images to the problem of recognizing $K$ classes of images; here the result of the tuning is $K$ neural networks, in each of which the first class is the $k$-th class ($k = 1, \ldots, K$) and the second class is all the others. A similar idea of setting up multilayer neural networks with a variable structure is also applicable to the clustering problem. In this case, the original sample being analyzed is taken as the first class of images, and a sample with a uniform distribution over the range of variation of the features is taken as the second class. A multilayer neural network with a variable structure, built up during the tuning process, qualitatively and quantitatively reflects the complexity of the problem. From this point of view, the task of clustering, as a task of generating new knowledge about the object under study, consists in identifying and analyzing those regions of the multidimensional feature space in which the probability distribution function exceeds the level of the uniform distribution over the range of feature values.

Development prospects

At the beginning of the 21st century, one of the main concepts in the development (training) of multilayer neural networks is the desire to increase the number of layers, which requires choosing a neural network structure adequate to the problem being solved and developing new methods for constructing coefficient-adjustment algorithms. The advantages of neural networks include: the property of so-called gradual degradation - when individual elements fail, the quality of the system's operation decreases gradually (by comparison, logical networks of AND, OR, NOT elements fail when the operation of any network element is disrupted); and increased resistance to changes in the parameters of the circuits that implement them (for example, very significant changes in the weights do not lead to errors in the implementation of a simple logical function of two variables), etc.

The widespread use of neural network algorithms in the field of difficult-to-formalize, weakly formalizable and non-formalizable problems has led to the creation of a new direction in computational mathematics - neuromathematics. Neuromathematics includes neural network algorithms for solving the following problems: pattern recognition; optimization and extrapolation of functions; graph theory problems; cryptographic problems; the solution of real and Boolean systems of linear and nonlinear equations, of ordinary one-dimensional and multidimensional differential equations, and of partial differential equations, etc. On the basis of the theory of neural networks, a new branch of the modern theory of control of complex nonlinear, multidimensional, multiply connected dynamic systems has been created - neurocontrol, including methods for the neural network identification of complex dynamic objects, the construction of neuroregulators in the control loops of complex dynamic objects, etc.