If you follow news from the world of science and technology, you have probably heard something about the concept of neural networks.

For example, in 2016, Google's neural-network-based program AlphaGo beat one of the best professional Go players in the world with a score of 4-1. YouTube also announced that it will be using neural networks to better understand its videos.

But what is a neural network? How does it work? And why are neural networks so popular in machine learning?

Computer as a brain

Modern neuroscientists often discuss the brain as a type of computer. Neural networks aim to do the opposite: build a computer that functions like a brain.

Of course, we only have a superficial understanding of the brain's extremely complex functions, but by creating simplified simulations of how the brain processes data, we can build a type of computer that functions very differently from a standard one.

Computer processors process data sequentially ("in order"): they perform many operations on a set of data, one at a time. Parallel processing ("processing multiple streams at the same time") significantly speeds up a computer by splitting the work across multiple processors working side by side.

In the figure below, the parallel processing example requires five different processors:

An artificial neural network (so called to distinguish it from the real neural networks in the brain) has a fundamentally different structure. It is highly interconnected. This allows it to process data very quickly, learn from that data, and update its own internal structure to improve performance.

However, this high degree of interconnectedness has some striking implications. For example, neural networks are very good at recognizing patterns in unclear or noisy data.

Learning ability

A neural network's ability to learn is its greatest strength. In a standard computing architecture, a programmer must design an algorithm that tells the computer what to do with incoming data to ensure that the computer produces the correct answer.

The input-output relationship can be as simple as "when the A key is pressed, display A on the screen," or as complex as performing advanced statistics. Neural networks, on the other hand, do not require such hand-written algorithms: through learning mechanisms, they can essentially develop algorithms of their own.

It is important to note that since neural networks are programs running on standard hardware built for sequential processing, current technology still imposes limitations. Actually building a hardware version of a neural network is a different problem entirely.

From neurons to nodes

Now that we've laid the foundation for how neural networks work, we can start looking at some of the specifics. The basic structure of an artificial neural network looks like this:


Each of the circles is called a “node” and simulates a single neuron. On the left are the input nodes, in the middle are the hidden nodes, and on the right are the output nodes.

In the most basic terms, input nodes accept input values, which can be binary 1 or 0, part of an RGB color value, the status of a chess piece, or anything else. These nodes represent information entering the network.

Each input node is connected to several hidden nodes (sometimes to every hidden node, sometimes to a subset). Input nodes take the information they are given and pass it along to the hidden layer.

For example, an input node might send a signal ("fire," in neuroscience parlance) if it receives a 1, and remain dormant if it receives a zero. Each hidden node has a threshold: if all of its summed inputs reach a certain value, it fires.

From synapses to connections

Each connection, equivalent to an anatomical synapse, also has a certain weight, which allows the network to pay more attention to the action of a particular node. Here's an example:


As you can see, connection "B" has a higher weight than connections "A" and "C". Let's say hidden node "4" fires only if it receives a total input of 2 or more. This means that if "1" or "3" fires alone, "4" will not fire, but "1" and "3" together will trigger it. Node "2" can also trigger the node by itself via connection "B".
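The threshold behavior just described can be sketched in a few lines of Python. The exact weights are assumptions, since the figure is not reproduced here: connections "A" and "C" get weight 1, connection "B" gets weight 2, and node 4 fires at a total of 2.

```python
# Assumed weights: "B" is stronger than "A" and "C"; node 4 fires at 2.
WEIGHTS = {"A": 1, "B": 2, "C": 1}
THRESHOLD = 2

def node4_fires(fired):
    """fired maps a connection name to 1 if its upstream node fired, else 0."""
    total = sum(WEIGHTS[name] * fired[name] for name in WEIGHTS)
    return total >= THRESHOLD

print(node4_fires({"A": 1, "B": 0, "C": 0}))  # node 1 alone: False
print(node4_fires({"A": 1, "B": 0, "C": 1}))  # nodes 1 and 3 together: True
print(node4_fires({"A": 0, "B": 1, "C": 0}))  # node 2 alone, via "B": True
```

With these weights, no single weight-1 connection can reach the threshold on its own, which reproduces the behavior described in the text.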

Let's take the weather as a practical example. Let's say you're designing a simple neural network to determine whether there should be a winter storm warning.

Using the connections and weights above, node 4 can fire only if the temperature is below -18 °C and the wind is above 48 km/h, or it will fire if the chance of snow is greater than 70 percent. Temperatures are fed to node 1, winds to node 3, and the probability of snow to node 2. Node 4 can then take all of this into account when deciding what signal to send to the output layer.

Better than simple logic

Of course, this function could simply be implemented using simple AND/OR gates. But more complex neural networks, like the ones below, are capable of much more complex operations.


The output layer nodes function in the same way as the hidden layer: the output nodes sum the inputs from the hidden layer, and if they reach a certain value, the output nodes trigger and send specific signals. At the end of the process, the output layer will send a set of signals that indicates the result of the input.
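The hidden-then-output flow described above can be sketched as a toy forward pass. All weights and thresholds below are made-up illustrative values, not taken from the article's figures.

```python
# Thresholded hidden nodes feeding thresholded output nodes.

def layer(inputs, weights, thresholds):
    """Each row of `weights` defines one node; a node outputs 1 when
    its weighted input sum reaches its threshold, otherwise 0."""
    return [
        1 if sum(w * x for w, x in zip(row, inputs)) >= t else 0
        for row, t in zip(weights, thresholds)
    ]

hidden = layer([1, 0, 1], weights=[[1, 2, 1], [2, 0, 1]], thresholds=[2, 2])
output = layer(hidden, weights=[[1, 1]], thresholds=[2])
print(hidden, output)  # [1, 1] [1]
```

Note that the output layer applies exactly the same rule as the hidden layer, just to the hidden nodes' signals instead of the raw inputs.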

While the network shown above is simple, deep neural networks can have many hidden layers and hundreds of nodes.


Error correction

This process is still relatively simple. But where neural networks really shine is learning. Most neural networks use a backpropagation process that sends signals back through the network.

Before developers deploy a neural network, they run it through a training phase in which it receives a set of inputs with known outputs. For example, a programmer can teach a neural network to recognize images. The input could be a picture of a car, and the correct output would be the word "car".

The programmer provides an image as input and sees what comes out of the output nodes. If the network responds with "airplane", the programmer tells the computer that it is incorrect.

The network then makes adjustments to its own connections, changing the weight of different links between nodes. This action is based on a special learning algorithm added to the network. The network continues to adjust the connection weights until it produces the correct output.
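The adjust-until-correct loop above can be sketched with the classic perceptron learning rule, a simpler relative of the backpropagation the text mentions: when the output is wrong, each weight is nudged toward the correct answer. This is an illustrative sketch, not the article's exact algorithm.

```python
# Perceptron learning rule: nudge weights when the output is wrong.

def train(samples, lr=0.1, epochs=20):
    w = [0.0, 0.0]   # connection weights
    b = 0.0          # bias (a learnable threshold)
    for _ in range(epochs):
        for x, target in samples:
            out = 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0
            err = target - out           # 0 when the answer was correct
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Learn the AND function from labelled examples.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(data)
print([1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0 for x, _ in data])
# [0, 0, 0, 1]: the network now reproduces AND
```

The key idea carries over to backpropagation: weights are repeatedly adjusted based on the difference between the produced output and the correct one.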

This is a simplification, but neural networks can learn very complex operations using similar principles.

Continuous improvement

Even after training, backpropagation continues - and this is where neural networks get really, really cool. They keep learning as they are used, integrating new information and adjusting the weights of various connections, becoming increasingly effective at the task for which they were designed.

It can be as simple as pattern recognition or as complex as playing Go.

Thus, neural networks are constantly changing and improving. And this can have unexpected consequences, leading to networks that prioritize things that a programmer wouldn't consider a priority.

In addition to the process described above, which is called supervised learning, there is another method: unsupervised learning.

In this situation, neural networks take input data and try to reproduce it exactly as their output, using backpropagation to update their connections. This may sound like a futile exercise, but this is how networks learn to extract useful features and generalize them to improve their models.

Questions of depth

Backpropagation is a very effective way to teach neural networks... when they consist of only a few layers. As the number of hidden layers grows, the effectiveness of backpropagation decreases. This is a problem for deep networks: with backpropagation alone, they are often no more effective than simple networks.

Scientists have developed a number of solutions to this problem, the specifics of which are quite complex and beyond the scope of this introduction. In simple terms, what many of these solutions attempt is to reduce the complexity of the network by teaching it to "compress" data.


To do this, the network learns to extract fewer identifying features from the input data, ultimately becoming more efficient in its computations. Essentially, the network makes generalizations and abstractions, much in the same way that humans learn.

After this training, the network can prune nodes and connections that it deems less important. This makes the network more efficient and learning easier.

Neural Network Applications

In this way, neural networks model how the brain learns using multiple layers of nodes—input, hidden, and output—and they can learn in both supervised and unsupervised situations. Complex networks are able to make abstractions and generalizations, which makes them more efficient and more capable of learning.

What can we use these exciting systems for?

In theory, we can use neural networks for almost anything. And you've probably used them without realizing it. They are very common in speech and visual recognition, for example, because they can learn to pick out certain features, something common in sounds or images.

So when you say, "OK Google," your phone runs your speech through a neural network to understand what you're saying. There may be another neural network that learns to predict what you are likely to ask for.

Self-driving cars can use neural networks to process visual data, thereby following road rules and avoiding collisions. Robots of all types can benefit from neural networks that help them learn to perform tasks efficiently. Computers can learn to play games like chess or Go. If you've ever interacted with a chatbot, chances are it uses a neural network to suggest appropriate responses.

Internet search can benefit greatly from neural networks, since their highly efficient parallel processing model can churn through a lot of data quickly. A neural network can also learn your habits to personalize your search results or predict what you're going to search for in the near future. This predictive ability will obviously be very valuable to marketers (and anyone else who needs to predict complex human behavior).

Pattern recognition, optical image recognition, stock market forecasting, route finding, big data processing, medical cost analysis, sales forecasting, artificial intelligence in video games - the possibilities are almost endless. Neural networks' ability to learn patterns, make generalizations, and successfully predict behavior makes them valuable in countless situations.

The future of neural networks

Neural networks have advanced from very simple models to high-level training simulations. They are on our phones, tablets and many of the web services we use. There are many other machine learning systems.

But neural networks, because of their similarities (in a very simplified form) to the human brain, are among the most fascinating. While we continue to develop and improve the models, we cannot say what they are capable of.

Do you know of any interesting applications of neural networks? Do you have experience working with them yourself? What excites you most about this technology? Share your thoughts in the comments below!

Recently, more and more people have been talking about so-called neural networks: they say these will soon be actively used in robotics, mechanical engineering, and many other areas of human activity, and search engine algorithms, like Google's, are already gradually starting to use them. What are these neural networks, how do they work, what are their applications, and how can they be useful to us? Read on to find out.

What are neural networks

Neural networks are one of the research directions in the field of artificial intelligence (AI), based on the attempt to imitate the human nervous system, including its ability to correct errors and learn on its own. All this, although somewhat crudely, should allow us to simulate the functioning of the human brain.

Biological neural networks

But the definition in the paragraph above is purely technical; in the language of biology, a neural network is the human nervous system: the collection of neurons in our brain through which we think, make decisions, and perceive the world around us.

A biological neuron is a special cell consisting of a nucleus, a body, and processes, and it is closely connected with thousands of other neurons. Through these connections, electrochemical impulses are continually transmitted, bringing the entire neural network into a state of excitation or, conversely, calm. For example, a pleasant and at the same time exciting event (meeting a loved one, winning a competition, etc.) will generate an electrochemical impulse in the neural network in our head, leading to its excitation. As a result, the neural network in our brain will transmit this excitation to other organs of our body, leading to an increased heart rate, more frequent blinking, and so on.

The picture here shows a highly simplified model of the brain's biological neural network. We see that a neuron consists of a cell body and a nucleus; the cell body, in turn, has many branched fibers called dendrites, plus a single long fiber called an axon, whose length is far greater than shown in this figure. Axons carry the communication between neurons, and thanks to them, the biological neural network in our heads works.

History of neural networks

What is the history of the development of neural networks in science and technology? It begins with the advent of the first computers, or electronic computing machines as they were called in those days. Back in the late 1940s, Donald Hebb proposed a neural network learning mechanism that laid down rules for teaching these "proto-computers."

The further chronology of events was as follows:

  • In 1954, neural networks were used in computer operation for the first time in practice.
  • In 1958, Frank Rosenblatt developed the perceptron, a pattern-recognition algorithm, together with the mathematical notation for it.
  • In the 1960s, interest in the development of neural networks faded somewhat due to the weak computing power of the time.
  • Interest revived in the 1980s; it was during this period that systems with feedback mechanisms appeared and self-learning algorithms were developed.
  • By 2000, computing power had grown so much that it could make the wildest dreams of earlier scientists come true. At this time, voice recognition programs, computer vision, and much more appeared.

Artificial neural networks

Artificial neural networks are commonly understood as computer systems that have the ability to self-learn and gradually increase their performance. The main elements of the neural network structure are:

  • Artificial neurons, which are elementary, interconnected units.
  • A synapse is a connection that is used to send and receive information between neurons.
  • The signal is the actual information to be transmitted.

Application of neural networks

The scope of artificial neural networks is expanding every year; today they are used in such areas as:

  • Machine learning, a branch of artificial intelligence. It is based on training AI on millions of examples of similar tasks. Today, machine learning is actively implemented in search engines such as Google, Yandex, Bing, and Baidu: based on the millions of search queries we all enter into Google every day, their algorithms learn to show us the most relevant results, so that we can find exactly what we are looking for.
  • In robotics, neural networks are used to develop numerous algorithms for the iron “brains” of robots.
  • Computer system architects use neural networks to solve the problem of parallel computing.
  • With the help of neural networks, mathematicians can solve various complex mathematical problems.

Types of Neural Networks

In general, different types of neural networks are used for different tasks, including:

  • convolutional neural networks,
  • recurrent neural networks,
  • Hopfield neural network.

Convolutional Neural Networks

Convolutional networks are among the most popular types of artificial neural networks. They have proven effective in visual pattern recognition (video and images), recommender systems, and language processing.

  • Convolutional neural networks scale well and can be used for image recognition at any resolution.
  • These networks use 3D volumes of neurons. Within a layer, each neuron is connected only to a small region of the previous layer, called its receptive field.
  • Neurons of neighboring layers are connected through a spatially local mechanism. Stacking many such layers creates nonlinear filters that respond to ever larger groups of pixels.

Recurrent neural networks

Recurrent neural networks are those whose connections between neurons form a directed cycle. They have the following characteristics:

  • Each connection has its own weight, also known as priority.
  • Nodes are divided into two types: input nodes and hidden nodes.
  • Information in a recurrent neural network is transmitted not only layer by layer in a straight line, but also between the neurons themselves.
  • An important distinctive feature of a recurrent neural network is the presence of a so-called "area of attention," where the machine can be pointed at specific pieces of data that require deeper processing.

Recurrent neural networks are used for recognizing and processing text and speech data (Google Translate, Yandex's "Palekh" algorithm, Apple's voice assistant Siri, etc.).


In the previous chapter, we became familiar with concepts such as artificial intelligence, machine learning, and artificial neural networks.

In this chapter, I will describe in detail the artificial neuron model, talk about approaches to training the network, and also describe some well-known types of artificial neural networks that we will study in the following chapters.

Simplification

In the last chapter, I repeatedly mentioned serious simplifications. The reason for them is that no modern computer can quickly simulate a system as complex as our brain. Besides, as I already said, our brain is full of biological mechanisms that have nothing to do with information processing.

We need a model that converts an input signal into the output signal we need. Everything else does not concern us. Let's start simplifying.

Biological structure → diagram

In the previous chapter, you realized how complex biological neural networks and biological neurons are. Instead of drawing neurons as tentacled monsters, let's just draw diagrams.

Generally speaking, there are several ways of depicting neural networks and neurons graphically. Here we will depict artificial neurons as circles.

Instead of a complex interweaving of inputs and outputs, we will use arrows indicating the direction of signal movement.

Thus, an artificial neural network can be represented as a collection of circles (artificial neurons) connected by arrows.

Electrical signals → numbers

In a real biological neural network, an electrical signal is transmitted from the network inputs to the outputs. It may change as it passes through the neural network.

An electrical signal remains an electrical signal; conceptually, nothing changes. But what, then, does change? The magnitude of the signal changes (stronger/weaker), and any magnitude can always be expressed as a number (larger/smaller).

In our artificial neural network model, we do not need to implement the behavior of the electrical signal at all, since nothing will depend on its implementation anyway.

We will supply some numbers to the network inputs, symbolizing the magnitude of the electrical signal if it existed. These numbers will move through the network and change in some way. At the output of the network we will receive some resulting number, which is the response of the network.

For convenience, we will still call our numbers circulating in the network signals.

Synapses → connection weights

Let us recall the picture from the first chapter, in which the connections between neurons - synapses - were depicted in color. Synapses can strengthen or weaken the electrical signal passing through them.

Let's characterize each such connection with a number, called the weight of the connection. A signal passing through a connection is multiplied by the weight of that connection.

This is a key point in the concept of artificial neural networks, so let me explain it in more detail. Look at the picture below. Each black arrow (connection) in this picture now corresponds to a number \(w_i\) (the weight of the connection). When a signal passes through a connection, its magnitude is multiplied by that connection's weight.

In the figure above, not every connection is labeled with a weight, simply because there is no room for the labels. In reality, each \(i\)-th connection has its own weight \(w_i\).

Artificial Neuron

We now move on to consider the internal structure of an artificial neuron and how it transforms the signal arriving at its inputs.

The figure below shows a complete model of an artificial neuron.

Don't be alarmed, there is nothing complicated here. Let's look at everything in detail from left to right.

Inputs, weights and adder

Each neuron, including an artificial one, must have inputs through which it receives signals. We have already introduced the concept of weights, by which the signals passing through a connection are multiplied. In the picture above, the weights are shown as circles.

The signals received at the inputs are multiplied by their weights. The signal of the first input \(x_1\) is multiplied by the weight \(w_1\) corresponding to that input, giving \(x_1w_1\). And so on up to the \(n\)-th input, where we get \(x_nw_n\).

Now all products are transferred to the adder. Just based on its name, you can understand what it does. It simply sums all the input signals multiplied by the corresponding weights:

\[ x_1w_1+x_2w_2+\cdots+x_nw_n = \sum\limits^n_{i=1}x_iw_i \]
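The adder's sum above is straightforward to compute directly; here is a minimal sketch with made-up example numbers:

```python
# The adder: sum of input signals multiplied by their weights.
def weighted_sum(x, w):
    return sum(xi * wi for xi, wi in zip(x, w))

print(weighted_sum([1, 0, 1], [0.5, 0.8, 0.25]))  # 0.5*1 + 0.8*0 + 0.25*1 = 0.75
```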

Mathematical help


When it is necessary to briefly write down a large expression consisting of a sum of repeating/same-type terms, the sigma sign is used.

Let's consider the simplest form of this notation:

\[ \sum\limits^5_{i=1}i=1+2+3+4+5 \]

Thus, below the sigma we assign the counter variable \(i\) a starting value, which increases until it reaches the upper limit (in the example above, 5).

The upper limit can also be variable. Let me give you an example of such a case.

Suppose we have \(n\) stores, each with its own number from 1 to \(n\). Each store makes a profit. Take some (it does not matter which) \(i\)-th store; its profit is \(p_i\).

\[ P = p_1+p_2+\cdots+p_i+\cdots+p_n \]

As you can see, all terms of this sum are of the same type. Then they can be briefly written as follows:

\[ P=\sum\limits^n_{i=1}p_i \]

In words: "Sum the profits of all stores, from the first to the \(n\)-th." As a formula, it is much simpler, more convenient, and more elegant.
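The sigma notation is, in effect, just a loop; the example above can be checked in a few lines:

```python
# The sigma sign as a loop: start the counter at the lower bound
# and add one term for each value up to the upper bound.
total = 0
for i in range(1, 6):   # i = 1, 2, 3, 4, 5
    total += i
print(total)  # 15, the same as 1+2+3+4+5

# Python's built-in sum plays the role of the sigma sign:
print(sum(range(1, 6)))  # 15
```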

The result of the adder is a number called a weighted sum.

Weighted sum (\(net\)) - the sum of the input signals multiplied by their corresponding weights.

\[ net=\sum\limits^n_{i=1}x_iw_i \]

The role of the adder is obvious: it aggregates all the input signals (of which there can be many) into one number, a weighted sum that characterizes the total signal received by the neuron. The weighted sum can also be viewed as the degree of the neuron's overall excitation.

Example

To understand the role of the last component of an artificial neuron - the activation function - I will give an analogy.

Let's look at one artificial neuron. Its task is to decide whether to go on vacation to the sea. To do this, we supply various data to its inputs. Let our neuron have 4 inputs:

  1. The cost of the trip
  2. The weather at sea
  3. The current situation at work
  4. Whether there will be a snack bar on the beach

We will characterize all these parameters as 0 or 1. Accordingly, if the weather at sea is good, then we apply 1 to this input. And so with all other parameters.

If a neuron has four inputs, it must have four weights. In our example, the weighting coefficients can be thought of as indicators of each input's importance to the neuron's overall decision. We distribute the input weights as follows:

It is easy to see that the factors of cost and weather at sea (the first two inputs) play a very important role. They will also play a decisive role when the neuron makes a decision.

Let us supply the following signals to the inputs of our neuron:

We multiply the weights of the inputs by the signals of the corresponding inputs:

The weighted sum for such a set of input signals is 6:

\[ net=\sum\limits^4_{i=1}x_iw_i = 5 + 0 + 0 + 1 = 6 \]
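The calculation above can be sketched directly. The weights are an assumption consistent with the text (cost and weather matter most, and the products sum as 5 + 0 + 0 + 1), since the figure with the exact values is missing:

```python
weights = [5, 4, 1, 1]   # assumed: cost, weather, work, snack bar
inputs  = [1, 0, 0, 1]   # the signals supplied in the example

net = sum(x * w for x, w in zip(inputs, weights))
print(net)  # 6
```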

This is where the activation function comes into play.

Activation function

Simply outputting the weighted sum would be rather pointless. The neuron must somehow process it and produce a sensible output signal. That is exactly what the activation function is for.

It converts the weighted sum into a number that becomes the output of the neuron (we denote the neuron's output by the variable \(out\)).

Different types of artificial neurons use a variety of activation functions. In general, they are denoted \(\phi(net)\). Writing the weighted sum in parentheses means that the activation function takes it as a parameter.

Activation function (\(\phi(net)\)) - a function that takes the weighted sum as its argument. The value of this function is the output of the neuron (\(out\)).

Single jump function

The simplest type of activation function. The output of the neuron can only be 0 or 1. If the weighted sum is greater than a certain threshold \(b\), the output of the neuron is 1; if lower, 0.

How can it be used? Let's assume that we go to the sea only when the weighted sum is greater than or equal to 5. This means our threshold is 5:

In our example, the weighted sum was 6, which means the output signal of our neuron is 1. So, we are going to the sea.

However, if the weather at sea were bad and the trip very expensive, but there was a snack bar and the work situation was normal (inputs: 0011), the weighted sum would be 2, so the output of the neuron would be 0. We're not going anywhere.

Basically, a neuron looks at a weighted sum and if it is greater than its threshold, then the neuron produces an output equal to 1.

Graphically, this activation function can be depicted as follows.

The horizontal axis shows the values of the weighted sum; the vertical axis shows the output signal. As is easy to see, only two output values are possible: 0 or 1. The output is 0 everywhere from minus infinity up to a certain value of the weighted sum, called the threshold; once the weighted sum equals or exceeds the threshold, the function returns 1. Everything is extremely simple.

Now let's write this activation function mathematically. You have almost certainly come across the concept of a piecewise function: several rules combined under one function, with the applicable rule chosen by the argument. As a piecewise function, the single jump function looks like this:

\[ out(net) = \begin{cases} 0, & net < b \\ 1, & net \geq b \end{cases} \]

There is nothing complicated about this notation. The output of the neuron (\(out\)) depends on the weighted sum (\(net\)) as follows: if \(net\) is less than some threshold \(b\), then \(out\) is 0; and if \(net\) is greater than or equal to the threshold \(b\), then \(out\) is 1.
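The single jump function is trivial to sketch in code; here it is with the threshold b = 5 from the vacation example:

```python
# Step activation: output 1 once the weighted sum reaches the threshold.
def step(net, b=5):
    return 1 if net >= b else 0

print(step(6))  # weighted sum 6 -> 1: we go to the sea
print(step(2))  # inputs 0011 gave net = 2 -> 0: we stay home
```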

Sigmoid function

In fact, there is a whole family of sigmoid functions, some of which are used as activation functions in artificial neurons.

All these functions have some very useful properties, for which they are used in neural networks. These properties will become apparent once you see graphs of these functions.

So... the sigmoid most commonly used in neural networks is the logistic function.

The graph of this function looks quite simple. If you look closely, you can see some resemblance to the letter \(S\), which is where the name of this family of functions comes from.

And this is how it is written analytically:

\[ out(net)=\frac{1}{1+\exp(-a \cdot net)} \]

What is the parameter \(a\)? It is a number that characterizes how steep the function is. Below are logistic functions with different values of \(a\).

Let's remember our artificial neuron, which determines whether it is necessary to go to the sea. In the case of the single jump function, everything was obvious. We either go to the sea (1) or not (0).

Here the case is closer to reality. We are not completely sure (especially if you are paranoid) whether the trip is worth it. Using the logistic function as the activation function means you get a number between 0 and 1. The larger the weighted sum, the closer the output is to 1 (though it never reaches it exactly); conversely, the smaller the weighted sum, the closer the neuron's output is to 0.

For example, if the output of our neuron is 0.8, it believes that going to the sea is still worth it. If its output were 0.2, it would be almost certainly against the trip.

What remarkable properties does the logistic function have?

  • it is a "squashing" function: regardless of the argument (the weighted sum), the output signal is always in the range from 0 to 1
  • it is more flexible than the single jump function - its result can be not only 0 or 1, but any number in between
  • it is differentiable at every point, and its derivative can be expressed through the function itself

It is because of these properties that the logistic function is most often used as an activation function in artificial neurons.
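As a quick sketch, the logistic function above can be written in a few lines; the steepness parameter a defaults to 1:

```python
import math

# Logistic activation: squashes any weighted sum into (0, 1).
def logistic(net, a=1.0):
    return 1.0 / (1.0 + math.exp(-a * net))

print(logistic(0))   # 0.5: completely unsure
print(logistic(6))   # close to 1: strongly "yes"
print(logistic(-6))  # close to 0: strongly "no"
```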

Hyperbolic tangent

However, there is another sigmoid, the hyperbolic tangent. Biologists use it as an activation function to build a more realistic model of a nerve cell.

This function can produce output values of different signs (for example, from -1 to 1), which is useful for a number of networks.

The function is written as follows:

\[ out(net) = \tanh\left(\frac{net}{a}\right) \]

In the above formula, the parameter \(a\) likewise determines how steep the graph of the function is.

And this is what the graph of this function looks like.

As you can see, it looks like a graph of a logistic function. The hyperbolic tangent has all the useful properties that the logistic function has.
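The hyperbolic tangent activation can be sketched just like the logistic one, using the standard library's tanh:

```python
import math

# Hyperbolic tangent activation: outputs range over (-1, 1).
def tanh_activation(net, a=1.0):
    return math.tanh(net / a)

print(tanh_activation(0))   # 0.0
print(tanh_activation(6))   # close to 1
print(tanh_activation(-6))  # close to -1: outputs can be negative
```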

What have we learned?

Now you have a complete picture of the internal structure of an artificial neuron. Here is a brief recap of how it works.

A neuron has inputs that receive signals in the form of numbers. Each input has its own weight (also a number). The input signals are multiplied by the corresponding weights, giving a set of "weighted" input signals, which the adder combines into a weighted sum.

The activation function then converts the weighted sum into the neuron's output.

Let us now formulate the shortest description of the operation of a neuron – its mathematical model:

Mathematical model of an artificial neuron with \(n \) inputs:

\[ out=\phi\left(\sum\limits_{i=1}^{n} x_i w_i\right) \]

where
\(\phi \) – activation function
\(\sum\limits_{i=1}^{n} x_i w_i \) – weighted sum: the sum of the \(n \) products of the input signals and the corresponding weights.
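The whole model fits in a few lines of Python. This is a sketch under my own naming; `phi` defaults to the logistic function discussed above:

```python
import math

def neuron_output(inputs, weights,
                  phi=lambda net: 1.0 / (1.0 + math.exp(-net))):
    """out = phi(x_1*w_1 + x_2*w_2 + ... + x_n*w_n)."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return phi(net)
```

With `phi=lambda net: net` you can inspect the raw weighted sum; with the default logistic phi the output always lands in (0, 1).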

Types of ANN

We have figured out the structure of an artificial neuron. Artificial neural networks consist of collections of artificial neurons. A natural question arises: how should these artificial neurons be placed and connected to each other?

As a rule, most neural networks have a so-called input layer, which performs only one task - distributing input signals to other neurons. The neurons in this layer do not perform any calculations.

Single-layer neural networks

In single-layer neural networks, signals from the input layer are fed directly to the output layer, which performs the necessary calculations and immediately sends the results to the outputs.

A single-layer neural network looks like this:

In this picture, the input layer is indicated by circles (it is not considered a neural network layer), and on the right is a layer of ordinary neurons.

Neurons are connected to each other by arrows. Above the arrows are the weights of the corresponding connections (weighting coefficients).

Single-layer neural network (Single-layer neural network) - a network in which signals from the input layer are fed directly to the output layer, which converts the signal and immediately produces a response.
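A single-layer network is then just several such neurons sharing the same inputs. A minimal sketch (the weight values here are arbitrary illustration numbers):

```python
import math

def logistic(net):
    return 1.0 / (1.0 + math.exp(-net))

def single_layer(inputs, weight_rows):
    """The input layer only distributes the signals; each row of weights
    belongs to one output neuron, which computes phi(weighted sum)."""
    return [logistic(sum(x * w for x, w in zip(inputs, row)))
            for row in weight_rows]

# two input signals fanned out to three output neurons
outputs = single_layer([0.5, -1.0], [[0.4, 0.6], [1.0, -0.2], [-0.7, 0.3]])
```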

Multilayer neural networks

In addition to the input and output layers, such networks have one or more hidden layers. Their location is easy to understand: these layers sit between the input and output layers.

This structure of neural networks copies the multilayer structure of certain parts of the brain.

The hidden layer got its name for a reason: methods for training hidden-layer neurons were developed only relatively recently. Before that, only single-layer neural networks were used.

Multilayer neural networks have much greater capabilities than single-layer ones.

The work of hidden layers can be compared to the work of a large factory, where the product (the output signal) is assembled in stages: each machine produces some intermediate result. Hidden layers likewise transform input signals into intermediate results.

Multilayer neural network (Multilayer neural network) - a neural network consisting of an input layer, an output layer, and one or more hidden layers between them.
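The factory analogy maps directly onto code: each layer turns the previous layer's signals into an intermediate result. A sketch with made-up weights:

```python
import math

def logistic(net):
    return 1.0 / (1.0 + math.exp(-net))

def multilayer(inputs, layers):
    """`layers` is a list of weight matrices, one per computational layer.
    The signal is transformed stage by stage, like a product in a factory."""
    signal = inputs
    for weight_rows in layers:
        signal = [logistic(sum(x * w for x, w in zip(signal, row)))
                  for row in weight_rows]
    return signal

# 2 inputs -> hidden layer of 3 neurons -> 1 output neuron
net_weights = [
    [[0.5, -0.6], [0.1, 0.8], [-0.3, 0.2]],  # hidden layer
    [[1.0, -1.0, 0.5]],                      # output layer
]
result = multilayer([1.0, 0.5], net_weights)
```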

Feedforward networks

You can notice one very interesting detail in the pictures of neural networks in the examples above.

In all examples, the arrows strictly go from left to right, that is, the signal in such networks goes strictly from the input layer to the output layer.

Feedforward networks (Feedforward neural network) - artificial neural networks in which the signal propagates strictly from the input layer to the output layer; the signal does not propagate in the opposite direction.

Such networks are widely used and quite successfully solve a certain class of problems: forecasting, clustering and recognition.

However, no one forbids the signal to go in the opposite direction.

Feedback networks

In networks of this type, the signal can also go in the opposite direction. What's the advantage?

The fact is that in feedforward networks, the output of the network is determined by the input signal and the weighting coefficients of the artificial neurons.

And in networks with feedback, the outputs of neurons can return to the inputs. This means that the output of a neuron is determined not only by its weights and input signal, but also by previous outputs (since they returned to the inputs again).

The ability of signals to circulate in a network opens up new, amazing possibilities for neural networks. Using such networks, you can create neural networks that restore or complement signals. In other words, such neural networks have the properties of short-term memory (like a person’s).

Feedback networks (Recurrent neural network) - artificial neural networks in which the output of a neuron can be fed back to its input. More generally, this means the ability of a signal to propagate from outputs to inputs.
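A toy sketch of this idea (all names and weight values are mine): a single neuron whose previous output is fed back as an extra input, so repeating the same input still changes the output over time.

```python
import math

def logistic(net):
    return 1.0 / (1.0 + math.exp(-net))

def recurrent_step(x, prev_out, w_in=0.8, w_back=1.5):
    """The output depends on the current input AND the previous output."""
    return logistic(x * w_in + prev_out * w_back)

out = 0.0
history = []
for _ in range(5):
    out = recurrent_step(1.0, out)  # same input every time
    history.append(out)
# the output keeps changing: the network has a kind of short-term memory
```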

Neural network training

Now let us look at neural network training in a little more detail. What is it? And how does it happen?

What is network training?

An artificial neural network is a collection of artificial neurons. Take, for example, 100 neurons and connect them to each other. Clearly, if we apply a signal to the input, we will get something meaningless at the output.

This means we need to change some network parameters until the input signal is converted into the output we need.

What can we change in a neural network?

Changing the total number of artificial neurons makes no sense for two reasons. Firstly, increasing the number of computing elements as a whole only makes the system heavier and more redundant. Secondly, if you gather 1000 fools instead of 100, they still won’t be able to answer the question correctly.

The adder cannot be changed, since it performs one strictly defined function - adding. If we replace it with something or remove it altogether, then it will no longer be an artificial neuron at all.

If we change the activation function of each neuron, we will get a neural network that is too heterogeneous and uncontrollable. In addition, in most cases, neurons in neural networks are of the same type. That is, they all have the same activation function.

There is only one option left - change connection weights.

Neural network training (Training) - the search for a set of weighting coefficients under which the input signal, after passing through the network, is converted into the output we need.

This understanding of the term “neural network training” also matches biological neural networks. Our brain consists of a huge number of interconnected neural networks, each of which individually consists of neurons of the same type (with the same activation function). We learn by changing synapses, the elements that strengthen or weaken the input signal.

However, there is one more important point. If you train a network on only one input signal, the network will simply “remember the correct answer.” From the outside it will seem that it “learned” very quickly, but as soon as you give it a slightly modified signal, expecting the correct answer, the network will produce nonsense.

Indeed, why would we need a network that detects a face in only one photo? We expect the network to be able to generalize certain features and recognize faces in other photographs too.

It is for this purpose that training sets are created.

Training set (Training set) - a finite set of input signals (sometimes together with the correct output signals) on which the network is trained.

After the network is trained, that is, when the network produces correct results for all input signals from the training set, it can be used in practice.

However, before sending a freshly trained neural network into battle, its quality is usually assessed on a so-called test set.

Test set (Testing set) - a finite set of input signals (sometimes together with the correct output signals) on which the quality of the network is assessed.

We have understood what “network training” is - choosing the right set of weights. Now the question arises: how can a network be trained? In the most general case, there are two approaches that lead to different results: supervised learning and unsupervised learning.

Supervised learning

The essence of this approach is that you provide a signal as an input, look at the network’s response, and then compare it with a ready-made, correct response.

An important point: do not confuse the correct answers with a known solution algorithm! You can trace a face in a photo with your finger (the correct answer), but you cannot explain how you did it (a known algorithm). The situation here is the same.

Then, using special algorithms, you change the weights of the neural network connections and again give it an input signal. You compare its answer with the correct one and repeat this process until the network begins to respond with acceptable accuracy (as I said in Chapter 1, the network cannot give unambiguously accurate answers).

Supervised learning (Supervised learning) - a type of network training in which the weights are changed so that the network's answers differ as little as possible from prepared correct answers.
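As a concrete sketch of this loop (using the delta rule on a single logistic neuron is my choice here; the text does not fix a particular algorithm), here is a neuron being taught the OR function from a small training set:

```python
import math

def logistic(net):
    return 1.0 / (1.0 + math.exp(-net))

def train(samples, epochs=2000, rate=0.5):
    """Repeat: show an input, compare the answer with the correct one,
    nudge the weights so the error shrinks."""
    weights = [0.0, 0.0, 0.0]  # two inputs plus a constant bias input
    for _ in range(epochs):
        for inputs, target in samples:
            x = list(inputs) + [1.0]
            out = logistic(sum(xi * wi for xi, wi in zip(x, weights)))
            error = target - out
            weights = [wi + rate * error * out * (1.0 - out) * xi
                       for wi, xi in zip(weights, x)]
    return weights

training_set = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # OR
w = train(training_set)
```

After training, `logistic(x1*w[0] + x2*w[1] + w[2])` rounds to the correct OR value for every example in the training set.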

Where can I get the correct answers?

If we want the network to recognize faces, we can create a training set of 1000 photos (input signals) and independently select faces from it (correct answers).

If we want the network to predict price rises and falls, the training set must be built from past data. The input signals might be particular days, the general state of the market, and other parameters; the correct answers are the rises and falls of prices on those days.

It is worth noting that the teacher, of course, is not necessarily a person. The fact is that sometimes the network has to be trained for hours and days, making thousands and tens of thousands of attempts. In 99% of cases, this role is performed by a computer, or more precisely, a special computer program.

Unsupervised learning

Unsupervised learning is used when we do not have the correct answers to the input signals. In this case, the entire training set consists of a set of input signals.

What happens when the network is trained in this way? It turns out that with such “training” the network begins to distinguish classes of signals supplied to the input. In short, the network begins clustering.

For example, suppose you show the network candies, pastries, and cakes. You do not regulate the network's operation in any way; you simply feed data about each object to its inputs. Over time, the network will begin to produce three different types of signals, corresponding to the objects at the input.

Unsupervised learning (Unsupervised learning) - a type of network training in which the network independently classifies the input signals. The correct (reference) output signals are not shown to it.
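A tiny sketch of this effect (the two “features” and all numbers are invented for illustration): each prototype vector plays the role of one output neuron, and without any correct answers the prototypes drift toward the three groups of desserts.

```python
def nearest(x, prototypes):
    """Index of the prototype closest to input x (squared distance)."""
    return min(range(len(prototypes)),
               key=lambda k: sum((xi - pi) ** 2
                                 for xi, pi in zip(x, prototypes[k])))

def cluster(samples, prototypes, rate=0.3, epochs=20):
    """No correct answers: the closest prototype simply moves toward
    each input, so similar inputs end up activating the same prototype."""
    for _ in range(epochs):
        for x in samples:
            j = nearest(x, prototypes)
            prototypes[j] = [pi + rate * (xi - pi)
                             for xi, pi in zip(x, prototypes[j])]
    return prototypes

# features: (size, sweetness) -- candies, pastries, cakes
samples = [(0.1, 0.9), (0.15, 0.85), (0.5, 0.55),
           (0.45, 0.5), (0.9, 0.1), (0.85, 0.15)]
protos = cluster(samples, [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]])
```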

Conclusions

In this chapter, you learned about the internal structure of an artificial neuron and gained a thorough understanding of how it works (including its mathematical model).

Moreover, you now know about various types of artificial neural networks: single-layer and multilayer, as well as feedforward networks and networks with feedback.

You also learned about supervised and unsupervised network learning.

You already know the necessary theory. The following chapters cover specific types of neural networks, specific algorithms for training them, and programming practice.

Questions and tasks

You should know the material in this chapter very well, since it contains basic theoretical information on artificial neural networks. Be sure to achieve confident and correct answers to all the questions and tasks below.

Describe the simplifications of ANNs compared to biological neural networks.

1. The complex and intricate structure of biological neural networks is simplified and represented as diagrams; only the signal-processing model is kept.

2. The nature of the electrical signals in neural networks is the same; they differ only in magnitude. We therefore replace electrical signals with numbers indicating the magnitude of the transmitted signal.

The activation function is often denoted by \(\phi(net) \).

Write down a mathematical model of an artificial neuron.

An artificial neuron with \(n \) inputs converts an input signal (number) into an output signal (number) as follows:

\[ out=\phi\left(\sum\limits_{i=1}^{n} x_i w_i\right) \]

What is the difference between single-layer and multi-layer neural networks?

Single-layer neural networks consist of a single computational layer of neurons. The input layer sends signals directly to the output layer, which converts the signal and immediately produces the result.

Multilayer neural networks, in addition to input and output layers, also have hidden layers. These hidden layers carry out some internal intermediate transformations, similar to the stages of production of products in a factory.

What is the difference between feedforward networks and feedback networks?

Feedforward networks allow the signal to pass in only one direction - from inputs to outputs. Networks with feedback do not have these restrictions, and the outputs of neurons can be fed back into the inputs.

What is a training set? What is its meaning?

Before using the network in practice (for example, to solve current problems for which you do not have answers), you need to collect a collection of problems with ready-made answers, on which to train the network. This collection is called the training set.

If you collect too small a set of input and output signals, the network will simply memorize the answers, and the goal of learning will not be achieved.

What is meant by network training?

Network training is the process of changing the weighting coefficients of the artificial neurons of the network in order to select a combination of them that converts the input signal into the correct output.

What is supervised and unsupervised learning?

In supervised learning, signals are fed to the network's inputs, and its output is compared with a previously known correct output. This process is repeated until the required accuracy is achieved.

If the network is only supplied with input signals, without comparing its outputs to ready-made correct outputs, it begins to independently classify these input signals: it performs clustering of the inputs. This type of learning is called unsupervised learning.

17.04.1997 Jianchang Mao, Anil Jain

Intelligent systems based on artificial neural networks (ANNs) successfully solve problems of pattern recognition, prediction, optimization, associative memory, and control. There are other, more traditional approaches to these problems, but they lack the necessary flexibility outside limited conditions, so ANNs provide promising alternatives, and many applications benefit from their use. This article is an introduction to modern ANN problems and discusses the reasons for their rapid development. It also describes the basic principles of operation of a biological neuron and its artificial computational model, says a few words about neural network architectures and ANN training processes, and concludes with an introduction to the problem of text recognition, one of the most successful applications of ANNs.

A long period of evolution has given the human brain many qualities that are absent both in machines with von Neumann architecture and in modern parallel computers. These include:

  • massive parallelism;
  • distributed representation of information and calculations;
  • learning ability and generalization ability;
  • adaptability;
  • property of contextual information processing;
  • fault tolerance;
  • low power consumption.

It can be assumed that devices built on the same principles as biological neurons will have the listed characteristics.

From biological networks to ANN

Modern digital computers are superior to humans in the ability to perform numerical and symbolic calculations. However, a human effortlessly solves complex perceptual problems (for example, recognizing a person in a crowd from a mere glimpse of a face) with a speed and accuracy against which the most powerful computer in the world seems hopelessly slow-witted. What is the reason for such a significant difference in performance? The architecture of a biological neural system is completely different from that of a von Neumann machine (Table 1), and this significantly influences the types of functions each model performs more efficiently.

Table 1. Von Neumann machine compared to a biological neural system

                          von Neumann machine            Biological neural system
Processor                 Complex                        Simple
                          High speed                     Low speed
                          One or several                 A large number
Memory                    Separate from the processor    Integrated into the processor
                          Localized                      Distributed
                          Addressing not by content      Content-addressable
Computations              Centralized                    Distributed
                          Sequential                     Parallel
                          Stored programs                Self-learning
Reliability               Highly vulnerable              Robust
Specialization            Numerical and symbolic         Perception problems
                          operations
Operating environment     Strictly defined               Poorly defined
                          Strictly limited               No restrictions

Like a biological neural system, an ANN is a computing system with a huge number of simple processors operating in parallel, with many connections. ANN models reproduce, to some extent, the “organizational” principles of the human brain. Modeling a biological neural system with ANNs can also contribute to a better understanding of biological functions. Manufacturing technologies such as VLSI (very large scale integration) and optical hardware make such simulations possible.

An in-depth study of ANNs requires knowledge of neurophysiology, cognitive science, psychology, physics (statistical mechanics), control theory, computational theory, artificial intelligence, statistics/mathematics, pattern recognition, computer vision, parallel computing, and hardware (digital/analog/VLSI/optical). On the other hand, ANNs also stimulate these disciplines by providing them with new tools and insights. This symbiosis is vital for neural network research.

Let us present some problems that can be solved in the context of ANN and are of interest to scientists and engineers.

Image classification. The task is to assign an input image (for example, a speech signal or a handwritten character), represented by a feature vector, to one or more predefined classes. Notable applications include letter recognition, speech recognition, electrocardiogram signal classification, and blood cell classification.

Clustering/categorization. In clustering, also known as unsupervised image classification, there is no training set with class labels. The clustering algorithm relies on image similarity and places similar images in one cluster. Known uses of clustering include knowledge extraction, data compression, and exploration of data properties.

Function approximation. Suppose there is a training set \(\{(x_1,y_1), (x_2,y_2), \dots, (x_n,y_n)\} \) (input-output data pairs) generated by an unknown function \(f(x) \) distorted by noise. The approximation problem is to find an estimate of the unknown function \(f(x) \). Function approximation is necessary in numerous engineering and scientific modeling problems.

Prediction/forecasting. Let \(n \) discrete samples \(y(t_1), y(t_2), \dots, y(t_n) \) be given at successive times \(t_1, t_2, \dots, t_n \). The task is to predict the value \(y(t_{n+1}) \) at the future time \(t_{n+1} \). Prediction/forecasting significantly affects decision making in business, science, and technology. Predicting stock market prices and forecasting the weather are typical applications of prediction/forecasting techniques.

Optimization. Numerous problems in mathematics, statistics, engineering, science, medicine, and economics can be considered optimization problems. The task of an optimization algorithm is to find a solution that satisfies a system of constraints and maximizes or minimizes an objective function. The traveling salesman problem, which belongs to the class of NP-complete problems, is a classic example of an optimization problem.

Content-addressable memory. In the von Neumann model of computing, memory is accessed only through an address, which is independent of the memory contents. Moreover, if an error is made in computing the address, completely different information may be retrieved. Associative memory, or content-addressable memory, is accessed by content: its contents can be recalled even from partial or corrupted input. Associative memory is extremely desirable when creating multimedia information databases.

Control. Consider a dynamic system defined by the set (u(t), y(t)), where u(t) is the input control action, and y(t) is the output of the system at time t. In control systems with a reference model, the control goal is to calculate the input action u(t) such that the system follows the desired path dictated by the reference model. An example is optimal engine control.

Brief historical overview

Research in the field of ANNs has seen three periods of intensification. The first peak, in the 1940s, is due to the pioneering work of McCulloch and Pitts. The second arose in the 1960s thanks to Rosenblatt's perceptron convergence theorem and the work of Minsky and Papert, which pointed out the limited capabilities of the simplest perceptron. Minsky and Papert's results dampened the enthusiasm of most researchers, especially those working in computer science, and the lull in neural network research lasted almost 20 years. Since the early 1980s, ANNs have attracted renewed interest thanks to Hopfield's energy approach and the backpropagation algorithm for training multilayer perceptrons (multilayer feedforward networks), first proposed by Werbos and independently developed by a number of other authors. The algorithm became famous thanks to Rumelhart in 1986. Anderson and Rosenfeld prepared a detailed historical account of the development of ANNs.

Biological neural networks

A neuron (nerve cell) is a special biological cell that processes information (Fig. 1). It consists of a cell body, or soma, and two types of external tree-like branches: the axon and the dendrites. The cell body includes a nucleus, which contains information about hereditary properties, and plasma, which holds the molecular machinery for producing the materials the neuron needs. A neuron receives signals (impulses) from other neurons through dendrites (receivers) and transmits signals generated by the cell body along the axon (transmitter), which eventually branches into fibers (strands). At the ends of these fibers are synapses.

Fig. 1.

A synapse is an elementary structural and functional unit between two neurons (an axon fiber of one neuron and a dendrite of another). When an impulse reaches a synaptic terminal, certain chemicals called neurotransmitters are released. The neurotransmitters diffuse across the synaptic cleft, stimulating or inhibiting, depending on the type of synapse, the receiver neuron's ability to generate electrical impulses. The effectiveness of a synapse can be tuned by the signals passing through it, so synapses learn depending on the activity of the processes in which they participate. This history dependence acts as a memory and is possibly responsible for human memory.

The human cerebral cortex is an extensive surface formed by neurons, 2 to 3 mm thick, with an area of about 2200 cm\(^2 \), roughly twice the surface area of a standard keyboard. The cerebral cortex contains about \(10^{11} \) neurons, approximately the number of stars in the Milky Way. Each neuron is connected to \(10^3 \) to \(10^4 \) other neurons. In total, the human brain contains approximately \(10^{14} \) to \(10^{15} \) connections.

Neurons communicate through short series of impulses, typically lasting a few milliseconds. The message is transmitted by pulse-frequency modulation. The frequency can vary from a few hertz to hundreds of hertz, which is a million times slower than the fastest switching electronic circuits. Nevertheless, a person makes complex perceptual decisions, such as recognizing a face, in a few hundred milliseconds. These decisions are made by a network of neurons whose switching time is only a few milliseconds. This means that the computation requires no more than about 100 sequential stages. In other words, for such complex tasks the brain “runs” parallel programs containing about 100 steps. This is known as the hundred-step rule. Reasoning similarly, one finds that the amount of information sent from one neuron to another must be very small (a few bits). It follows that the key information is not transmitted directly but is captured and distributed in the connections between neurons. This explains the name “connectionist model” applied to ANNs.

Basic Concepts

Technical neuron model

McCulloch and Pitts proposed using a binary threshold element as a model of an artificial neuron. This mathematical neuron computes the weighted sum of \(n \) input signals \(x_j \), \(j = 1, 2, \dots, n \), and generates an output of 1 if this sum exceeds a certain threshold \(u \), and 0 otherwise:

\[ y = \theta\left(\sum\limits_{j=1}^{n} w_j x_j - u\right), \]

where \(\theta \) is the unit step function.

It is often convenient to consider \(u \) as a weighting coefficient associated with the constant input \(x_0 = 1 \). Positive weights correspond to excitatory connections, and negative weights to inhibitory connections. McCulloch and Pitts showed that, with appropriately selected weights, a collection of parallel neurons of this type is capable of universal computation. There is a certain analogy here with a biological neuron: signal transmission and connections are imitated by axons and dendrites, connection weights correspond to synapses, and the threshold function reflects the activity of the soma.
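The McCulloch-Pitts neuron is easy to state in code. A sketch (names are mine), including the classic observation that suitably chosen weights yield logic gates:

```python
def mp_neuron(inputs, weights, threshold):
    """Binary threshold element: 1 if the weighted sum exceeds the
    threshold u, and 0 otherwise."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net > threshold else 0

def AND(a, b):
    # both excitatory inputs must fire to clear the threshold
    return mp_neuron([a, b], [1.0, 1.0], threshold=1.5)

def OR(a, b):
    # a single firing input is already enough
    return mp_neuron([a, b], [1.0, 1.0], threshold=0.5)
```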

Neural network architecture

An ANN can be considered as a directed graph with weighted connections, in which artificial neurons are nodes. Based on the architecture of connections, ANNs can be grouped into two classes (Fig. 2): feed-forward networks, in which graphs do not have loops, and recurrent networks, or networks with feedback connections.

Fig. 2.

In the most common family of networks of the first class, called multilayer perceptrons, neurons are arranged in layers with unidirectional connections between layers. Figure 2 shows typical networks of each class. Feedforward networks are static in the sense that for a given input they produce a single set of output values that does not depend on the network's previous state. Recurrent networks are dynamic: due to feedback, the inputs of the neurons are modified, which changes the state of the network.

Learning

The ability to learn is a fundamental property of the brain. In the context of an ANN, the learning process can be viewed as tuning the network architecture and connection weights to efficiently perform a specific task. Typically, a neural network must adjust the connection weights based on the available training set. Network performance improves as the weights are adjusted iteratively. The ability of networks to learn from examples makes them more attractive compared to systems that follow a certain system of operating rules formulated by experts.

To design the learning process, first of all, it is necessary to have a model of the external environment in which the neural network operates - to know the information available to the network. This model defines the learning paradigm. Secondly, it is necessary to understand how to modify the network's weight parameters - what learning rules govern the tuning process. A learning algorithm means a procedure that uses learning rules to adjust the weights.

There are three learning paradigms: supervised, unsupervised (self-learning), and hybrid. In the first case, the neural network has the correct answers (network outputs) for every input example, and the weights are adjusted so that the network produces answers as close as possible to the known correct ones. Reinforcement learning is a variant of supervised learning in which only a critical assessment of the correctness of the network's output is known, not the correct output values themselves. Unsupervised learning does not require knowing the correct answer for each example of the training set; it reveals the internal structure of the data or correlations between patterns, allowing the examples to be grouped into categories. In hybrid learning, some of the weights are determined through supervised learning, while the rest are obtained through self-learning.

Learning theory considers three fundamental properties associated with learning from examples: capacity, sample complexity, and computational complexity. Capacity indicates how many patterns the network can store, and what functions and decision boundaries it can form. Sample complexity determines the number of training examples required to achieve the network's generalization ability. Too few examples can cause the network to “overfit”: it performs well on examples from the training set but poorly on test examples drawn from the same statistical distribution. There are four main types of learning rules: error correction, the Boltzmann machine, Hebb's rule, and competitive learning.

Error-correction rule. In supervised learning, each input example has a desired output d. The actual network output y may not coincide with the desired one. The principle of error correction is to use the signal (d - y) to modify the weights, gradually reducing the error. Learning takes place only when the perceptron makes a mistake. Various modifications of this learning algorithm are known.
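A sketch of the error-correction rule for the binary threshold neuron described above (the learning rate and training data are illustrative choices of mine): weights change only on a mistake, in proportion to (d - y).

```python
def predict(x, w, u):
    """Threshold neuron: fires when the weighted sum exceeds u."""
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) - u > 0 else 0

def perceptron_train(samples, rate=0.1, epochs=50):
    """Classic perceptron learning: on error (d - y != 0), shift the
    weights toward the example; the threshold u learns the same way."""
    w, u = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, d in samples:
            err = d - predict(x, w, u)
            if err:  # learning takes place only on a mistake
                w = [wi + rate * err * xi for wi, xi in zip(w, x)]
                u -= rate * err
    return w, u

# teach logical AND from a labeled sample
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, u = perceptron_train(data)
```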

Boltzmann learning. A stochastic learning rule that follows from information-theoretic and thermodynamic principles. The goal of Boltzmann learning is to adjust the weights so that the states of the visible neurons satisfy a desired probability distribution. Boltzmann learning can be considered a special case of error correction, in which the error is understood as the divergence of state correlations between the two modes of operation.

Hebb's rule. The oldest learning rule is Hebb's learning postulate. Hebb relied on the following neurophysiological observation: if neurons on both sides of a synapse fire simultaneously and regularly, the strength of the synaptic connection increases. An important feature of this rule is that the change in synaptic weight depends only on the activity of the neurons connected by that synapse, which greatly simplifies the learning circuits in VLSI implementations.

Competitive learning. Unlike Hebbian learning, in which many output neurons can fire simultaneously, in competitive learning the output neurons compete with each other for the right to fire. This phenomenon is known as the winner-take-all rule. Similar learning occurs in biological neural networks. Competitive learning allows input data to be clustered: similar examples are grouped by the network according to their correlations and represented by a single element.

During training, only the weights of the “winning” neuron are modified. The effect of this rule is achieved by changing the pattern stored in the network (the weight vector of the winning neuron), moving it slightly closer to the input example. Figure 3 gives a geometric illustration of competitive learning. The input vectors are normalized and represented by points on the surface of a sphere. The weight vectors of the three neurons are initialized with random values; their initial and final values after training are marked with X in Fig. 3a and 3b, respectively. Each of the three groups of examples is detected by one of the output neurons, whose weight vector settles at the center of gravity of the detected group.

Fig. 3.

It can be observed that the network never stops learning unless the learning rate parameter is 0. An input sample may activate a different output neuron in subsequent iterations of the learning process. This raises the question of the stability of the learning system: the system is considered stable if no example in the training set changes its category membership after a finite number of training iterations. One way to achieve stability is to gradually decrease the learning rate to 0. However, this artificial freezing of learning causes another problem, concerning plasticity, the ability to adapt to new data. These properties of competitive learning are known as Grossberg's stability-plasticity dilemma.

Table 2 lists various learning algorithms and the network architectures associated with them (the list is not exhaustive). The last column gives the tasks to which each algorithm can be applied. Each learning algorithm targets a network of a specific architecture and is intended for a limited class of tasks. Besides those discussed, a few other algorithms deserve mention: Adaline and Madaline, linear discriminant analysis, Sammon's projection, and principal component analysis.

Table 2. Known learning algorithms

Paradigm     | Learning rule                    | Architecture                           | Learning algorithm                                                   | Task
Supervised   | Error correction                 | Single-layer and multilayer perceptron | Perceptron learning algorithms; backpropagation; Adaline and Madaline | Pattern classification; function approximation; prediction, control
Supervised   | Boltzmann                        | Recurrent                              | Boltzmann learning algorithm                                         | Pattern classification
Supervised   | Hebbian                          | Multilayer feedforward                 | Linear discriminant analysis                                         | Data analysis; pattern classification
Supervised   | Competitive                      | Competitive                            | Learning vector quantization                                         | Within-class categorization; data compression
Supervised   | Competitive                      | ART network                            | ARTMap                                                               | Pattern classification
Unsupervised | Error correction                 | Multilayer feedforward                 | Sammon's projection                                                  | Within-class categorization; data analysis
Unsupervised | Hebbian                          | Feedforward or competitive             | Principal component analysis                                         | Data analysis; data compression
Unsupervised | Hebbian                          | Hopfield network                       | Associative memory learning                                          | Associative memory
Unsupervised | Competitive                      | Competitive                            | Vector quantization                                                  | Categorization; data compression
Unsupervised | Competitive                      | Kohonen SOM                            | Kohonen SOM                                                          | Categorization; data analysis
Unsupervised | Competitive                      | ART networks                           | ART1, ART2                                                           | Categorization
Hybrid       | Error correction and competitive | RBF network                            | RBF learning algorithm                                               | Pattern classification; function approximation; prediction, control

Multilayer feedforward networks

A standard L-layer feedforward network consists of an input layer (we follow the convention that it is not counted as an independent layer of the network), (L-1) hidden layers, and an output layer, connected sequentially in the forward direction, with no connections between elements within a layer and no feedback between layers. Figure 4 shows the structure of a three-layer network.

Fig. 4.

Multilayer Perceptron

The most popular class of multilayer feedforward networks is the multilayer perceptron, in which each computational element uses a threshold or sigmoid activation function. A multilayer perceptron can form arbitrarily complex decision boundaries and implement arbitrary Boolean functions. The development of the backpropagation algorithm for determining the weights of a multilayer perceptron made these networks the most popular among researchers and users of neural networks. A geometric interpretation explains the role of the hidden-layer elements (assuming a threshold activation function).
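As a small illustration of the claim about Boolean functions, here is a hand-wired (not trained) two-layer network with threshold activations that computes XOR, a function no single-layer perceptron can represent. All weights are chosen by hand for this sketch:

```python
import numpy as np

def step(z):
    """Threshold activation: 1 if the weighted sum is nonnegative, else 0."""
    return (z >= 0).astype(float)

def mlp_forward(x, layers):
    """Feedforward pass; each layer is a (weights, bias) pair."""
    a = x
    for W, b in layers:
        a = step(a @ W + b)
    return a

# Hidden unit 1 computes OR, hidden unit 2 computes AND;
# the output fires for "OR but not AND", i.e. XOR.
layers = [
    (np.array([[1.0, 1.0], [1.0, 1.0]]), np.array([-0.5, -1.5])),  # hidden layer
    (np.array([[1.0], [-1.0]]), np.array([-0.5])),                 # output layer
]

outputs = [int(mlp_forward(np.array(x, dtype=float), layers)[0])
           for x in ([0, 0], [0, 1], [1, 0], [1, 1])]
# outputs == [0, 1, 1, 0], the XOR truth table
```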

RBF networks

Radial basis function (RBF) networks are a special case of a two-layer feedforward network. Each element of the hidden layer uses a radial basis function of Gaussian type as its activation function. The radial basis function (kernel function) is centered at the point defined by the weight vector associated with the neuron. Both the position and the width of the kernel must be learned from training samples. There are usually far fewer kernels than training examples. Each output element computes a linear combination of these radial basis functions. From the point of view of the approximation problem, the hidden elements form a set of functions that constitute a basis for representing input examples in the space spanned by them.

There are various algorithms for training RBF networks. The basic algorithm uses a two-step learning strategy, or hybrid learning. It estimates the kernel positions and widths using an unsupervised clustering algorithm, followed by a supervised least-mean-square algorithm to determine the weights of the connections between the hidden and output layers. Since the output elements are linear, a non-iterative algorithm can be used. Once this initial approximation is obtained, gradient descent can be used to refine the network parameters.

This hybrid training algorithm for RBF networks converges much faster than the backpropagation algorithm used for multilayer perceptrons. However, an RBF network often contains too many hidden elements, which makes it slower to execute than a multilayer perceptron. The relative efficiency (error as a function of network size) of an RBF network versus a multilayer perceptron depends on the problem being solved.
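The two-step strategy can be sketched as follows. For simplicity, this illustration fixes the kernel centers on a grid instead of running a clustering algorithm and uses one shared width; the target function and all numeric choices are assumptions:

```python
import numpy as np

def rbf_features(X, centers, width):
    """Gaussian kernel activations of the hidden layer."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * width ** 2))

# Step 1 (unsupervised in general): place kernels -- here simply on a grid.
centers = np.linspace(0, 1, 8)[:, None]
width = 0.15

# Step 2 (supervised, non-iterative): linear least squares for output weights.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 1))
y = np.sin(2 * np.pi * X[:, 0])               # target function (assumed)
Phi = rbf_features(X, centers, width)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # closed-form output weights

pred = Phi @ w                                # 8 kernels approximate sin(2*pi*x)
```

Because the output layer is linear, step 2 needs no iteration at all, which is the source of the speed advantage mentioned above.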

Unsolved problems

Many issues remain open in the design of feedforward networks: how many layers are needed for a given task, how many elements to choose in each layer, how the network will respond to data not included in the training set (what is its generalization ability), and what size of training set is needed to achieve "good" generalization.

Although multilayer feedforward networks are widely used for classification and function approximation, many parameters still have to be determined by trial and error. Existing theoretical results provide only weak guidelines for choosing these parameters in practice.

Self-organizing Kohonen maps

Self-organizing Kohonen maps (SOMs) have the valuable property of topology preservation, which reproduces an important aspect of the feature maps in the cerebral cortex of higher animals. In a topology-preserving mapping, nearby input examples excite nearby output elements. Figure 2 shows the basic architecture of Kohonen's SOM network. It is essentially a two-dimensional array of elements, each connected to all n input nodes.

Such a network is a special case of a competitive learning network in which a spatial neighborhood is determined for each output element. The local neighborhood can be a square, rectangle or circle. The initial neighborhood size is often set to 1/2 to 2/3 the size of the network and shrinks according to a specific law (for example, exponential decay). During training, all weights associated with the winner and its neighbors are modified.
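A single training step of this scheme might look as follows (a sketch with assumed simplifications: a one-dimensional array of neurons, a Gaussian neighborhood, and exponential decay of both the learning rate and the neighborhood radius):

```python
import numpy as np

def som_step(W, x, t, eta0=0.5, sigma0=3.0, tau=50.0):
    """Update the winner and its spatial neighbors toward input x."""
    winner = int(np.argmin(((W - x) ** 2).sum(axis=1)))
    eta = eta0 * np.exp(-t / tau)               # learning rate shrinks...
    sigma = sigma0 * np.exp(-t / tau)           # ...and so does the neighborhood
    dist = np.arange(len(W)) - winner           # grid distance to the winner
    h = np.exp(-dist ** 2 / (2 * sigma ** 2))   # neighborhood function
    W += eta * h[:, None] * (x - W)             # winner AND neighbors move
    return winner

rng = np.random.default_rng(0)
W = rng.uniform(size=(10, 2))                   # 10 neurons, 2-D inputs
for t in range(200):
    som_step(W, rng.uniform(size=2), t)
# after training, neighboring neurons hold nearby weight vectors
```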

Self-organizing Kohonen maps (networks) can be used for projecting high-dimensional data, density approximation, and clustering. Such networks have been used successfully for speech recognition, image processing, robotics, and control tasks. The network's parameters include the dimension of the neuron array, the number of neurons in each dimension, the shape of the neighborhood, the neighborhood shrinking schedule, and the learning rate.

Models of adaptive resonance theory

Recall that the stability-plasticity dilemma is an important feature of competitive learning: how can new phenomena be learned (plasticity) while maintaining stability, so that existing knowledge is not erased or destroyed?

Carpenter and Grossberg, who developed the adaptive resonance theory models (ART1, ART2 and ARTMAP), attempted to resolve this dilemma. The network has a sufficient number of output elements, but they are not used until the need arises. An element is said to be allocated (or unallocated) if it is in use (or not). The learning algorithm adjusts an existing category prototype only if the input vector is sufficiently similar to it; in that case they are said to resonate. The degree of similarity is controlled by the similarity parameter k, 0 < k ≤ 1.

To illustrate the model, consider the ART1 network, which is designed for binary (0/1) inputs. A simplified diagram of the ART1 architecture is shown in Fig. 5. It contains two fully connected layers of elements.

Fig. 5.

The top-down weight vector w_j corresponds to input-layer element j, and the bottom-up weight vector w̄_i is associated with output element i; w̄_i is the normalized version of w_i. The vectors w_j store the cluster prototypes. The role of normalization is to prevent long vectors from dominating short ones. The reset signal R is generated only when the similarity falls below the specified level.

The ART1 model can create new categories and reject input examples when the network's capacity is exhausted. However, the number of categories the network detects is sensitive to the similarity parameter.
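The allocate-on-demand behavior can be sketched roughly as follows. This is a strong simplification of ART1 (the real model performs a bottom-up/top-down search with a reset mechanism); the similarity test |x AND w| / |x| >= k and all example vectors are illustrative assumptions:

```python
import numpy as np

def art1_step(prototypes, x, k=0.7):
    """Return the index of the resonating category, allocating a new
    output element if no stored prototype is similar enough."""
    for i, w in enumerate(prototypes):
        overlap = np.logical_and(x, w).sum() / x.sum()
        if overlap >= k:                           # resonance: similarity test passed
            prototypes[i] = np.logical_and(x, w)   # refine the prototype
            return i
    prototypes.append(x.copy())                    # allocate a fresh category
    return len(prototypes) - 1

protos = []
a = np.array([1, 1, 1, 0, 0], dtype=bool)
b = np.array([1, 1, 0, 0, 0], dtype=bool)   # similar to a
c = np.array([0, 0, 0, 1, 1], dtype=bool)   # dissimilar to both
i_a = art1_step(protos, a)   # no prototypes yet -> new category 0
i_b = art1_step(protos, b)   # resonates with category 0
i_c = art1_step(protos, c)   # no match -> new category 1
```

Raising k makes the similarity test stricter, so more categories are allocated, which is why the number of detected categories is sensitive to this parameter.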

Hopfield network

Hopfield used the energy function as a tool for constructing recurrent networks and for understanding their dynamics. Hopfield's formalization made clear the principle of storing information as dynamically stable attractors and popularized the use of recurrent networks for associative memory and for solving combinatorial optimization problems.

The network state can evolve in at least two ways: synchronously and asynchronously. In the first case, all elements are updated simultaneously at each time step; in the second, at each moment one element, possibly chosen at random, is selected and updated. The key property of the energy function is that as the network state evolves according to the update equation, the energy decreases and reaches a local minimum (an attractor), where it then remains constant.

Associative memory

If the patterns stored in the network are attractors, the network can be used as an associative memory. Any example that falls within the basin of attraction of a stored pattern can serve as a pointer for retrieving it.

Associative memory typically operates in two modes: storage and retrieval. In storage mode, the connection weights of the network are chosen so that the p n-dimensional samples {x1, x2, ..., xp} to be stored become attractors. In retrieval mode, an input example is used as the initial state of the network, and the network then evolves according to its dynamics. The output pattern is read off when the network reaches equilibrium.
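The two modes can be sketched for bipolar (+1/-1) samples, assuming the classical Hebbian outer-product storage rule (the patterns and network size here are illustrative):

```python
import numpy as np

def store(patterns):
    """Storage mode: weights from the sum of outer products, zero diagonal."""
    n = patterns.shape[1]
    W = sum(np.outer(p, p) for p in patterns).astype(float)
    np.fill_diagonal(W, 0)
    return W / n

def recall(W, x, steps=100, seed=0):
    """Retrieval mode: start from x, update one random element at a time
    until the state settles into an attractor."""
    rng = np.random.default_rng(seed)
    x = x.copy()
    for _ in range(steps):
        i = rng.integers(len(x))
        x[i] = 1 if W[i] @ x >= 0 else -1
    return x

patterns = np.array([[1, 1, 1, -1, -1, -1],
                     [1, -1, 1, -1, 1, -1]])
W = store(patterns)
noisy = np.array([1, 1, 1, -1, -1, 1])   # stored pattern 0 with one bit flipped
out = recall(W, noisy)                   # the noisy cue falls into pattern 0's basin
```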

How many examples can be stored in a network with n binary elements? In other words, what is the memory capacity of the network? It is finite, because a network with n binary elements has at most 2^n distinct states, and not all of them are attractors. Moreover, not all attractors store useful patterns: spurious attractors may also store patterns, but ones that differ from the training set examples. It has been shown that the maximum number of random samples a Hopfield network can store is Pmax ≈ 0.15n. When the number of stored samples p < 0.15n, recall from memory is most successful. If the stored samples are represented by orthogonal vectors (rather than random ones), the number of samples that can be stored increases. The number of spurious attractors grows as p approaches the network's capacity. Several learning rules have been proposed to increase the memory capacity of a Hopfield network. Note that the network requires n^2 connections to store p n-bit examples.

Energy minimization

The Hopfield network evolves in the direction of reducing its energy. This allows combinatorial optimization problems to be solved if they can be formulated as energy minimization problems. In particular, the traveling salesman problem can be formulated in a similar way.

Applications

At the beginning of the article, seven classes of ANN applications were described. Bear in mind that to successfully solve a real-world problem, a number of characteristics must be chosen, including the network model, its size, the activation function, the training parameters, and the set of training examples. To illustrate the practical application of feedforward networks, consider the problem of character recognition (the OCR task, which consists of processing a scanned image of text and converting it into text form).

OCR system

An OCR system typically consists of preprocessing, segmentation, feature extraction, classification, and contextual processing blocks. The paper document is scanned to produce a grayscale or binary (black-and-white) image. At the preprocessing stage, filtering is applied to remove noise, and the text region is localized and converted to a binary image using global and locally adaptive thresholding. In the segmentation step, the text image is divided into individual characters. This task is especially difficult for handwritten text containing connections between adjacent characters. One effective technique is to split a composite pattern into small patterns (intermediate segmentation) and find the correct segmentation points using the output of the pattern classifier. Because of varying slant, distortion, noise, and writing styles, recognizing the segmented characters is itself a hard task.

Calculation schemes

Figure 6 shows two main schemes for using ANNs in OCR systems. The first performs explicit extraction of features (not necessarily with a neural network); for example, these could be outline (contour) features. The extracted features are fed to the input of a multilayer feedforward network. This scheme is flexible in the sense that a wide variety of features can be used. The second scheme involves no explicit feature extraction from the source data: feature extraction occurs implicitly in the hidden layers of the ANN. The advantage of this scheme is that feature extraction and classification are combined and trained simultaneously, which yields an optimal classification result. However, it requires a larger network than the first scheme.

Fig. 6.

A typical example of such an integrated scheme is the network used by Le Cun for zip code recognition.

Results

ANNs are used very effectively in OCR applications. However, there is no convincing evidence of their superiority over appropriate statistical classifiers. At the first OCR systems conference in 1992, more than 40 handwriting recognition systems were compared on the same data. The top 10 all used some variant of a multilayer feedforward network or a nearest-neighbor classifier. ANNs tend to be superior in speed and memory requirements to the nearest-neighbor method; unlike that method, the classification speed of an ANN does not depend on the size of the training set. The recognition accuracy of the best OCR systems on a database of pre-segmented characters was about 98% for digits, 96% for capital letters, and 87% for lowercase letters. (The low accuracy for lowercase letters is largely due to the fact that the test data differed significantly from the training data.) From these results one can conclude that on isolated characters OCR systems approach human accuracy. However, humans still outperform OCR systems on unconstrained and handwritten documents.

***

The development of ANNs has generated both enthusiasm and criticism. Some comparative studies have been optimistic, others pessimistic. For many tasks, such as pattern recognition, no dominant approach has yet emerged. The choice of the best technique should be dictated by the nature of the problem. One should try to understand the capabilities, background, and applicability of the different approaches and make the most of their complementary benefits for the further development of intelligent systems. Such efforts may lead to a synergistic approach that combines ANNs with other technologies to achieve significant breakthroughs on pressing problems. As Minsky recently noted, it is time to build systems out of individual components. Individual modules are important, but we also need a methodology for integration. It is clear that interaction and joint work between researchers in ANNs and in other disciplines will not only avoid duplication but, more importantly, stimulate and lend new quality to the development of the individual fields.

Literature

1. DARPA Neural Network Study, AFCEA Int'l Press, Fairfax, Va., 1988.
2. J. Hertz, A. Krogh, and R.G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, Reading, Mass., 1991.
3. S. Haykin, Neural Networks: A Comprehensive Foundation, MacMillan College Publishing Co., New York, 1994.
4. W.S. McCulloch and W. Pitts, "A Logical Calculus of the Ideas Immanent in Nervous Activity", Bull. Mathematical Biophysics, Vol. 5, 1943, pp. 115-133.
5. R. Rosenblatt, "Principles of Neurodynamics", Spartan Books, New York, 1962.
6. M. Minsky and S. Papert, "Perceptrons: An Introduction to Computational Geometry", MIT Press, Cambridge, Mass., 1969.
7. J.J. Hopfield, "Neural Networks and Physical Systems with Emergent Collective Computational Abilities", in Proc. National Academy of Sciences, USA 79, 1982, pp. 2554-2558.
8. P. Werbos, "Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences", Phd Thesis, Dept. of Applied Mathematics, Harvard University, Cambridge, Mass., 1974.
9. D.E. Rumelhart and J.L. McClelland, Parallel Distributed Processing: Exploration in the Microstructure of Cognition, MIT Press, Cambridge, Mass., 1986.
10. J.A. Anderson and E. Rosenfeld, "Neurocomputing: Foundation of Research", MIT Press, Cambridge, Mass., 1988.
11. S. Brunak and B. Lautrup, Neural Networks, Computers with Intuition, World Scientific, Singapore, 1990.
12. J. Feldman, M.A. Fanty, and N.H. Goddard, “Computing with Structured Neural Networks,” Computer, Vol. 21, No. 3, Mar.1988, pp. 91-103.
13. D.O. Hebb, The Organization of Behavior, John Wiley & Sons, New York, 1949.
14. R.P. Lippmann, "An Introduction to Computing with Neural Nets", IEEE ASSP Magazine, Vol.4, No.2, Apr. 1987, pp. 4-22.
15. A.K. Jain and J. Mao, “Neural Networks and Pattern Recognition,” in Computational Intelligence: Imitating Life, J.M. Zurada, R.J. Marks II, and C.J. Robinson, eds., IEEE Press, Piscataway, N.J., 1994, pp. 194-212.
16. T. Kohonen, Self-Organization and Associative Memory, Third Edition, Springer-Verlag, New York, 1989.
17. G. A. Carpenter and S. Grossberg, Pattern Recognition by Self-Organizing Neural Networks, MIT Press, Cambridge, Mass., 1991.
18. "The First Census Optical Character Recognition System Conference", R.A. Wilkinson et al., eds., Tech. Report NISTIR 4912, US Dept. Commerce, NIST, Gaithersburg, Md., 1992.
19. K. Mohiuddin and J. Mao, “A Comparative Study of Different Classifiers for Handprinted Character Recognition,” in Pattern Recognition in Practice IV, E.S. Gelsema and L.N. Kanal, eds., Elsevier Science, The Netherlands, 1994, pp. 437-448.
20. Y. Le Cun et al., "Back-Propagation Applied to Handwritten Zip Code Recognition", Neural Computation, Vol 1, 1989, pp. 541-551.
21. M. Minsky, "Logical Versus Analogical or Symbolic Versus Connectionist or Neat Versus Scruffy", AI Magazine, Vol. 12, No. 2, 1991, pp. 34-51.

Anil K. Jain ([email protected]) is with Michigan State University; Jianchang Mao and K.M. Mohiuddin are with the IBM Almaden Research Center.

Anil K. Jain, Jianchang Mao, K.M. Mohiuddin. Artificial Neural Networks: A Tutorial. IEEE Computer, Vol. 29, No. 3, March 1996, pp. 31-44. IEEE Computer Society. All rights reserved. Reprinted with permission.



An artificial neural network is a collection of neurons that interact with one another. They can receive, process, and generate data. It is as hard to picture as the workings of the human brain. The neural network in your brain is working so that you can read this right now: its neurons recognize letters and assemble them into words.

An artificial neural network is like a brain. Originally it was programmed to simplify certain complex computations. Today neural networks can do far more: some of them live on your smartphone, and another has already recorded in its database that you opened this article. Read on to find out how and why all this happens.

How it all started

People have long wanted to understand where the human mind comes from and how the brain works. In the middle of the last century, the Canadian neuropsychologist Donald Hebb took up this question. Hebb studied how neurons interact with one another, investigated the principle by which they combine into groups (in scientific terms, ensembles), and proposed the first learning algorithm for neural networks.

A few years later, a group of American scientists modeled an artificial neural network that could distinguish square shapes from other shapes.

How does a neural network work?

Researchers found that a neural network is a collection of layers of neurons, each responsible for recognizing a specific criterion: shape, color, size, texture, sound, volume, and so on. Year after year, through millions of experiments and tons of calculations, more and more layers of neurons were added to the simplest networks. The layers work in turn: for example, the first determines whether a shape is a square or not, the second whether the square is red or not, the third calculates the size of the square, and so on. Shapes that are not squares, are not red, or are the wrong size end up in new groups of neurons and are examined there.

What are neural networks and what can they do?

Scientists have developed neural networks to the point where they can distinguish complex images, video, text, and speech. Today there are many types of neural networks, classified by their architecture: the sets of data parameters and the weights of those parameters, a kind of priority. Below are some of them.

Convolutional neural networks

The neurons are divided into groups, and each group computes a characteristic assigned to it. In 1993, the French scientist Yann LeCun showed the world LeNet 1, the first convolutional neural network, which could quickly and accurately recognize digits handwritten on paper. See for yourself:

Today, convolutional neural networks are used mainly for multimedia purposes: they work with graphics, audio and video.

Recurrent neural networks

The neurons memorize information sequentially and build further actions on that data. In 1997, German scientists modified the simplest recurrent networks into networks with long short-term memory (LSTM). Networks with gated recurrent units (GRU) were later developed on their basis.

Today, with the help of such networks, texts are written and translated, bots are programmed to conduct meaningful dialogues with humans, and page and program codes are created.

Using this kind of neural network makes it possible to analyze and generate data, compile databases, and even make predictions.

In 2015, SwiftKey released the world's first keyboard running on a recurrent neural network with gated neurons. At first the system offered suggestions while typing based on the last words entered. Last year the developers taught the network to study the context of the text being typed, and the suggestions became meaningful and useful:

Combined neural networks (convolutional + recurrent)

Such neural networks can understand what is in an image and describe it, and vice versa: draw images from a description. The most striking example was demonstrated by Kyle McDonald, who took a neural network for a walk around Amsterdam. The network instantly determined what was in front of it, and almost always accurately:

Neural networks are constantly self-learning. Through this process:

1. Skype has introduced simultaneous translation for 10 languages, among them, mind you, Russian and Japanese, some of the most difficult in the world. Of course, the quality of the translation still needs serious improvement, but the very fact that you can now speak Russian with colleagues from Japan and be confident you will be understood is inspiring.

2. Yandex created two search algorithms based on neural networks: Palekh and Korolev. The first helped find the most relevant sites for low-frequency queries: Palekh studied page titles and compared their meaning with the meaning of the queries. Korolev was built on top of Palekh; this algorithm evaluates not only the title but the entire text content of the page. Search is becoming more accurate, and site owners are starting to approach page content more intelligently.

3. SEO colleagues from Yandex created a musical neural network: it composes poetry and writes music. The neuro-band is symbolically named Neurona, and it already has its first album:

4. Google Inbox uses neural networks to reply to messages. The technology is developing rapidly: today the network already studies your correspondence and generates possible reply options, so you don't have to waste time typing or worry about forgetting some important arrangement.

5. YouTube uses neural networks to rank videos, following two principles at once: one neural network studies videos and audience reactions to them, while another researches users and their preferences. That is why YouTube's recommendations are always on point.

6. Facebook is actively working on DeepText AI, a communications program that understands jargon and cleans chats of obscene language.

7. Apps like Prisma and Fabby, built on neural networks, create images and videos:

Colorize restores colors in black and white photos (surprise grandma!).

MakeUp Plus selects the perfect lipstick for girls from a real range of real brands: Bobbi Brown, Clinique, Lancome and YSL are already in business.


8. Apple and Microsoft are constantly upgrading their neural-network assistants Siri and Cortana. For now they only carry out our commands, but in the near future they will begin to take the initiative: offering recommendations and anticipating our wishes.

What else awaits us in the future?

Self-learning neural networks can replace people, starting with copywriters and proofreaders: robots already create meaningful, error-free texts, and they do it much faster than people. Next will come call-center employees, technical support, moderators, and administrators of public pages on social networks. Neural networks can already learn a script and reproduce it by voice. What about other areas?

Agricultural sector

Neural networks will be built into specialized machinery. Harvesters will drive themselves, scanning plants and studying the soil, and transmitting the data to a neural network, which will decide whether to water, fertilize, or spray against pests. Instead of a couple of dozen workers, at most two specialists will be needed: a supervisor and a technician.

Medicine

Microsoft is currently working actively on a cure for cancer. Its scientists are engaged in bioprogramming: they are trying to digitize the process by which tumors arise and develop. When they succeed, programmers will be able to find a way to block this process, and by analogy a medicine will be created.

Marketing

Marketing is becoming highly personalized. Already, neural networks can determine in seconds which content to show to which user and at what price. In the future, the marketer's involvement in the process will be reduced to a minimum: neural networks will predict queries from user-behavior data, scan the market, and deliver the most suitable offers by the time a person even thinks about buying.

Ecommerce

Ecommerce will be embedded everywhere. You will no longer need to follow a link to an online store: you will be able to buy anything you see with a single click. Say you are reading this article a few years from now. You really like the lipstick in the screenshot from the MakeUp Plus app (see above): you click on it and go straight to the cart. Or you watch a video about the latest HoloLens model (mixed-reality glasses) and place an order right from YouTube.

In almost every field, specialists who know, or at least understand, the structure of neural networks, machine learning, and artificial intelligence systems will be valued. We will live side by side with robots, and the more we know about them, the calmer our lives will be.

P.S. Zinaida Falls is a Yandex neural network that writes poetry. Here is a work the machine wrote after being trained on Mayakovsky (spelling and punctuation preserved):

"This"

This
just everything
something
in future
and power
that person
is everything in the world or not
there's blood all around
deal
getting fat
glory to
land
with a bang in the beak

Impressive, right?