Google search engine. Google (search engine). Google Search Tools

Search engine Google system(Google) — world famous and largest search system.

The name comes from a corruption of "googol" - a number depicted as 1 followed by 100 zeros. The creator of the search engine, Sergey Brin, simply spelled the word incorrectly, and this typo has become firmly established among Internet users.

Where it all started

Personal information was posted on the Internet, occupying the memory of the World Wide Web; it seemed to “settle” on the Internet, which is why the name of the Internet information unit, site (literally translated as “sitting”) arose.

Soon, site owners, especially businessmen, wanted fame on the Internet. Sites were advertised in every possible way, even by distributing leaflets.

But, as you know, supply creates demand. To buy a product, the client will spend a long time looking for other options, for example, cheaper ones. There was a need for search, and the Internet had to satisfy it: sites were developed that focused on searching for goods, services, and soon information. It was they who received the name search engines or systems, one of which was Google.

Supernova explosion

The people responsible for the birth of Google are Stanford University students Larry Page and Sergey Brin. Innovation met with enthusiasm, as a result of which Google emerged, which 20 years later took the place of the No. 1 search engine in the whole world. The search engine domain was registered in September 1997, and a year later the Google Inc. corporation opened specifically for Google.

How Google works

The search engine is constantly acquiring more and more new capabilities in terms of algorithms and functionality.

Any search engine algorithm is based on software templates that rank sites according to the correspondence of queries to search results and the level of significance. In 1997 algorithms counted the number of external links to the site. A large number of links was the key to high positions in search engine results. Over time, the authority of the site on which external links were placed began to be taken into account, and the term “link weight” was introduced.

Google gained worldwide fame as it improved its navigation in every possible way and made it easier to find information. As soon as the user wrote part of a word, options for its ending appeared in a pop-up menu, any of which could be clicked.

Google in SEO

The search engine and search engine promotion are inextricably linked with each other, because the webmaster strives in every possible way to improve his position, but without indexing the site this is impossible. Therefore, the webmaster, to attract attention Google robot, optimizes your website using white hat and illegal black hat SEO techniques. It is better to avoid the latter, otherwise you may get banned or filtered.

Each page of the site is assigned a certain degree of quality, a rank - PR, or Page Rank. The coincidence of the sound of the name Larry Page and Page Rank has even led to the fact that there is an opinion on the Internet that PR is based on the sympathy or antipathy of the search engine creator for a particular site.

Considering the authority and scope of Google, optimizers try to promote their sites in this search engine. But a huge number of external links and prohibited, black hat optimization methods do not at all guarantee positions in the TOP. First of all, here we need to focus on the desires of users.

Hello, dear readers of the blog site. A couple of months ago, I wrote about the features of promotion specifically for this search engine, arguing that I cannot help but repay the same coin to the company that brings more than half of all visitors to my blog. What do you think has happened since then?

That's right, everything turned upside down and now Google brings more than half of all visitors to my blog. I’m a little afraid that after this article he will also turn his back on me, but I’ll still take the risk. Although it is worth it, because now it is one of the most expensive and promising companies in the world.

And the quality of Google search itself is in many ways superior to the analogue from the Runet mirror (read about that), but the majority still sit in Yandex mainly out of habit (like me, in fact).

History of the Google.com search engine

So, the history of the development of this company can begin to be counted from 1996, although the search engine officially began to work only in the fall of 1998 (it turns out that Googlers were a year behind Yandex, but nevertheless managed to catch up with everything).

It was in 1996 that the prototype of today's search engine began working on the campus of Stanford University, where at that time Sergei Brin and Larry Page, now well known in the world, were studying in graduate school.

Both of these gentlemen were born in 1973 (practically my age) and both came from professorial families with Jewish roots. The mothers and fathers of both Google founders did mathematics (taught) and computer technologies, which, in fact, prompted their children’s interest in these areas of science. Both Sergey Brin and Larry Page have always prioritized obtaining a high-quality specialized education (well done, what can I say).

However, they had a difference in origin. Sergey Brin was born in the Soviet Union in the glorious city of Moscow and was taken by his parents in 1979 to the states under the emigration program for Jewish families. And Larry Page was already originally a born American, although, in fact, this is not so important, because Sergei was only six years old when he ceased to be our compatriot. What is noteworthy is that Brin still speaks Russian very well.

If you've read the history of Yandex, you've probably noticed that it essentially appeared as a development of the topic that its creators were engaged in when they were scientists. The same can be said about Google.com. Brin and Page, while graduate students at Stanford, were working on solving the problem of searching through large, unstructured data sets. Initially, they wanted to tie this to something related to identifying the most popular goods, but the problem of searching on the Internet came up in time.

The search engines available at that time could hardly cope with their task. The search results had a very low correlation with what the user wanted to see in response to their query. The fact is that then the main marker (factor) by which it was carried out in the search results was word frequency from a user request in a document.

It is clear that such a selection criterion is very easy to cheat on the part of webmasters by simply increasing the nausea of texts. Can you imagine how much time has already passed since text spam appeared, which search engines have only now begun to seriously fight and eradicate (because other factors have appeared that can significantly reduce the importance of the frequency of occurrence of keys in the text when ranking).

Here you go. From childhood, Larry Page, using the example of his parents, who moved in scientific circles, saw and understood that the authority of a particular scientist largely depends on how many scientific works refer to him as a primary source or as an authoritative specialist. The more links, the more authoritative the name of the scientist. Logical?

Of course, it’s logical, but what does Google have to do with it? The fact is that Page had the idea to transfer this ranking system to Internet search. He associated scientists with individual documents (not websites, but individual web pages), and links on the Internet have existed since back in 1989 (by the way, a couple of years later, it was Tim who founded).

Well, as a result, a well-known ranking factor appeared, which is still taken into account by search engines - PageRank. This term is compound. Rank means ranking, but Page can mean either a web page in the English spelling variation, or the fact that this ranking parameter was invented by none other than Page (who is Larry).

But this is not the point, because PageRank made a revolution and made it possible to improve the quality of search future Google to an unattainable height. In general, for general development, you can read an article about, in which I tried to explain their essence with my fingers, or look diagonally at my largest creation on this blog, which is entirely dedicated specifically.

PR made it possible to take into account when ranking documents not only the quantity, but also the quality of links leading to a particular web page. Well, the quality of the link, accordingly, depended on the number of incoming backlinks to the donor page (in SEO, a donor is usually called the one from whom the link originates, and an acceptor is the one to whom it is placed).

Those. This is where the concept of static weight of pages on the Internet appears, according to which the quality of links leading from them is assessed. All this disgrace was calculated by Google in several passes (iterations) and was (at least at that time) an excellent ranking factor. In general, the topic of PageRank is not so obvious, so for a detailed acquaintance you will have to read the articles mentioned just above.

Well, let's go back to Google history and let's see how two talented gentlemen from the stronghold of democracy managed to realize the idea of ranking documents global network into life and casually earn twenty billion greens from this (although for them this was not an end in itself, but simply became the result of their work, which it would be foolish to refuse).

First of all, to check the functionality of the Page Rank calculation program, it was necessary to obtain a huge amount of data. Larry Page decided that he could for this purpose download the entire Internet to your computer, which left its leaders in bewilderment.

However, with funds from Stanford University, Larry and Sergey were able to assemble the required number of computers from components (this made it possible for the same money to get three times more hardware than by purchasing already assembled servers) and launch their spider (a program that copies the web pages it found on the Internet) .

By the way, I note that Larry and Sergey remained true to their idea - Google is now number one computer assembler in the world(significantly ahead of Dell and HP) without selling a single server, but using them all for its data centers, which, according to approximate data, already have more than a million units worldwide. This allows them to obtain significantly greater productivity at the same cost and without saving on redundancy, which makes it the absolute leader in speed and reliability.

But let's return to history again. So, as a result, Brin and Page's brainchild began to serve as a search tool for all users of Stanford University. Sergey and Larry asked their first users to express their impressions and comments on the search operation, tried to take them into account and refine them (in fact, this was the alpha testing stage). The search became available in 1997 at google.stanford.edu. An application was submitted from Stanford for search technologies using PageRank.

As you can see, the URL already uses the word Google (although there was a time when the search algorithm was called BackRub, because it was based on backlinks), which was invented the day before as one of the possible names of the search engine. The origin of this word is due to misspelling term Googol, which denotes a tricky number consisting of one and one hundred zeros following it.

In general, at first there was a proposal to call the search GooglePlex (in the correct spelling Googolplex - ten to the power of googol), but it seemed too long and they settled on the mentioned term (for some reason, it was written incorrectly as a result, but they didn’t change it anymore, because .k. was almost immediately domain purchased Google.com). With this name, the creators probably wanted to emphasize or predict the enormity of the index base of the future world search leader.

Google and Yandex - common points in development and formation

A distinctive feature of Google’s main page at that time was its complete asceticism, which remains to this day. Compare (there sometimes you can see a banner under the search bar, the cost of placing it for a week is equal to the cost of an apartment in Moscow):

And look at the main page of Google.ru:

As they say, feel the difference. In the distant nineties, all sites and portals were full of colorful banners and inscriptions (ala today’s Terekhoff, and I, too, am not without a sin), which caused genuine irritation in Larry and Sergei. Therefore, when working on the design of the main page of his search, Sergey Brin used the principle of minimalism, allowing himself only to paint the letters of the Google logo in different colors.

It turned out well, but there was a curious case when the test group, which was tasked with finding something there through Google.com, sat for several minutes in front of the computer screen with a rather puzzled look. It turns out that they were waiting for the main page to load completely (there were no minimalist style sites on the Internet at that time).

Therefore, the developers had to increase the font at the bottom of the main page so that it would become a kind of marker for users when the page has finished loading.

Do you know what is the most interesting thing in the history of Google? That it could end around this point. As I mentioned earlier, for Sergei and Larry the main thing was to receive a quality education, and working on the search engine took up all the time and did not leave any time for studying. What do you think they came up with?

Surely, sell all rights on the use of PageRank technology and put an end to the development of the Google project. What is noteworthy is that they offered their product for a relatively small amount of one lemon of greens to such well-known titans at that time as AltaVista, and other companies that are no longer well-known. Remarkably, AltaVista even managed to reduce the price by a quarter, but in the end still did not buy it.

After this, Larry Page and Sergei decided to give up on studying after all (many of us made the same decision even for much less compelling reasons) and began to closely refine and promote an innovative, at that time, search system with an unknown name to anyone Google.

It all came down to the fact that for development we needed money to buy servers. Without this, it would have been impossible to move on, because even with Google’s relatively low popularity at that time, it already required quite large resources to store and process numerous user requests on the fly.

So, let me remind you that the domain was registered in September 1997, and exactly a year later it was already registered Google company Inc. A few days earlier, Sergey Brin and Larry Page managed to receive their first development check from a competent guy from Sun Microsystems. They say that when the check was issued, the company as such did not yet exist, therefore, having indicated the name on the check, already during the official registration of the company it was necessary to focus on this name (otherwise there would have been a problem with cashing out money).

This amount was spent on the purchase of components and the assembly of new servers, which were designed to process the ever-increasing number of requests to Google.com, because the popularity of the search engine was growing. Although even despite the fact that at the end of 1998 their home page the Beta inscription was still adorned, leading media outlets writing about IT technologies had already paid attention to the young search engine and expressed their positive reviews about her work.

The popularity of Googol grew and the hundred thousand greens received were quickly spent on components. The guys again hit a wall, but they again managed to do the incredible - get twenty-five lyams of green money for development from two venture capital firms, without driving themselves into bondage (leaving reserves full right to manage the company and resolve all issues at our own discretion). Well done, what can I say.

Once again, I never tire of drawing parallels between the development of bourgeois and Russian searches. Arkady Volozh and Ilya Segalovich were similarly forced to seek money for development from investors, and in the same way managed to defend their right to manage the company at their own discretion.

In general, the ideologists of Yandex and Google have a lot in common (smart, educated and intelligent people) and the main thing that unites them is the desire, first of all, to develop their projects (and get a buzz from it), and not stupidly make money (although not without it):

It is clear that this could not continue indefinitely. The project must generate income or at least be at the level of self-sufficiency. True, there are examples of large Internet projects that do not make money from their brainchild. . However, other wealthy companies, whose owners understand the importance of Wiki for the Internet, help it stay afloat.

Google is such an understanding company, donating several lyams to Wiki. By the way, the owner social network“” also recently threatened to give Wikipedia about a liter of greenery, which does him honor and puts him on a par with the Greats.

But Google (as well as Yandex) did not dare to follow the path of the unmercenary. Sergey Brin and Larry Page had to give up a little on the principles of not accepting advertising on the pages of their brainchild. But they did it very elegantly and their way of presenting advertising brings much more positivity to users than the banners then commonly used.

I'm talking about contextual advertising Google AdWords, which is text strings with a link to the advertiser’s website. Moreover, ads are shown in strict accordance with the specific query entered by the user in the search bar. Such advertising does not create discomfort for users, but it brings them truly fantastic income:

By the way, Google is not greedy and allows any webmasters to make money from contextual advertising by showing it on their websites. True, for this he keeps for himself about half of the amount paid by the advertiser, but this is a completely logical payment for using the advertiser base. Many webmasters live solely by the desire to pass.

Since we periodically compare Google and Yandex here, I’ll say that the way a RuNet mirror makes a living is no different from the leader of world search - it’s still the same context, but it’s all called . Well, it also allows webmasters to make money on their context (for half the margin) and for this purpose it was created, with which you can work both directly and through partner service centers (I work through Profit Partner).

Google launched its context in 2000, and Yandex in 2002. The question of the original source of the idea probably does not arise. Although, the founder did not immediately begin to use the payment scheme only for clicks made by users on advertisers’ ads, but worked towards this for several years.

Well, the idea of organizing a continuous auction for the sale of certain search queries (when entered, advertisements from advertisers will be displayed) is generally an ingenious solution that can significantly increase income and simplify and facilitate the pricing scheme as much as possible. As they say, we hit the bull's eye.

In 2001, Google.com turned out to be in the black by as much as seven million dollars, and now the profit of the world search leader is in the billions. In addition to the American audience, it has long been aimed at the whole world. You can search in 200 languages on various regional websites. In addition to, in fact, search technologies, this company during its existence has opened a lot of related services, starting with and , and ending with .

He has already even encroached on the reign of the most popular in the world, opposing it with his brainchild called. Even the dominance of Melkosoft in the market operating systems he is ready to take the plunge by developing his OS based on Chrome. Well, he has already revolutionized the mobile operating system market with his Android, which is now installed on most tablets and smartphones.

I already wrote that Yandex went public in 2011 and very successfully became an open joint stock company with a fairly large margin for itself. In 2004, Google also went public in a similar manner, although not as successfully as it could have been. However, in both cases, the ideologists of the companies still remained at the helm and can choose their own development paths and not follow anyone’s lead. In principle, this is good, because both teams lack enthusiasm.

Google.ru - features of promotion and SEO optimization

Now let's talk a little about Google and how it differs from a similar action under Yandex. Of course, by and large, there is no need to make special differences when using these search engines, but they have their own characteristics and various factors rankings are taken into account differently in them.

Well, I think it’s clear that we’ll be talking about the Russian-language Google.ru, because.com carries out rankings using a different formula and several other factors are taken into account (we are backward, whatever you say). I have already written about shares in the RuNet search market:

But it is worth considering the fact that selling queries are asked much more often in Yandex (maybe even many times more often), but the share of information queries is higher in the bourgeois counterpart. Therefore, blogs like mine may have an even higher share of referrals from Google than from the RuNet mirror.

It all has to do with the audience targeting of these search engines. Let no one be offended by me, but Yandex is dominated by ordinary people who want to buy something or have fun, and in the brainchild of Page and Brin there are intellectuals (well, sort of) who want to learn something. Personally, out of habit, I use a domestic product, but if it does not give me a comprehensive answer, then I turn to the bourgeoisie, and, as a rule, he does not disappoint me.

You've probably all already experienced that Google not only works faster and indexes new pages faster, but also that you can get to the Top much faster.

The main factor here is, most likely, that Yandex some time ago introduced a lag (from several months to a year, depending on the subject of the search query), after which the backlinks placed (purchased) on the document begin to be taken into account (thus it increased their income, because during the period necessary to promote to the Top, the site owner will be forced to attract users through Yandex Direct).

With Google, this lag is much smaller (if it exists at all) and you can get to the Top quite quickly, having good and not spammy text on a decent site, and also a certain amount of external links.

A bit of a minor fact, but I remembered that a bourgeois search will (from the same article) lead to the same page only if these are hash links.

Well, again, it seems to me that Google loves texts like those that you can find on this blog more than Yandex - large in volume and structured (headings using H1-H6 tags, OL or UL lists, and you could also I wish I could use tables for variety).

In any case, structured texts will give you an advantage. Search engines take into account the density of keys in passages, and lists, tables and headings allow create more passages, which means it will be possible to use a larger number of occurrences of keys without the risk of falling under .

IN Google regions represent countries, and not individual subjects of Russia or Ukraine, as in Yandex. Therefore, you should take this into account, as well as take into account those factors by which it can attribute your site to a particular region (read more below).

It also has one feature - when ranking, it pays more attention to the indicators of a specific web page (the relevance of which it calculates) and pays less attention to the indicators of the entire site. This means in comparison with Yandex, for promotion in which it is very important to have a trust resource.

What does this give us? It turns out that a young project that does not have sufficient trust is much It’s easier to get to the Top of Google than in the Top Runet mirrors, provided that the article is of high quality, optimized and well-pumped with links from quality donors (for example, from an exchange eternal links or from the article exchange) and from the internal pages of the same site (and, along with external links, helps to increase the static and dynamic weight of the page).

For example, my blog is in the Top of Google for many frequent queries, but not in the Top of the RuNet mirror. It turns out that either the general trust of the resource is not yet enough for this, or the backlinks have not yet started working due to the lag used, or these documents are under a filter for over-optimizing texts (which is also possible).

The Google.ru search engine attaches great importance to the presence and number of incoming links (external and internal) to a document and more respects direct occurrences search query. In Yandex, the link factor is somewhat less important and there it will be very important not to overdo it with direct occurrences, to use word forms and dilutions more. But, again, search algorithms are developing and over time they will get closer to what we now have in global search with the .com extension.
I haven’t checked it myself and haven’t experienced it myself, but it is believed that end-to-end links (placed from all pages of the resource, for example, in the donor’s sidebar) work well in Google, because For him, it is the quantitative ratio of the link mass that is important. Yandex simply glues drafts together and it is unlikely that their total weight exceeds the weight of a regular external link from one single page of the same donor.

Search traffic- this is manna from heaven for any owner of a web resource, the most stable and, as a rule, the main source of influx of visitors to the site. For example, for me it makes up about three quarters of the total attendance. Therefore, I never tire of repeating in almost every article - take all the slightest nuances seriously. search engine optimization. What may seem trivial or unnecessary to you can become a key point in the success of your project.

True, there are also quite a few nuances in complying with all requirements, the main one of which is rapid change rules of the game for search engines. Of course, the main basic and fundamental optimization techniques remain unchanged (at least for a long time), but nevertheless, Yandex is characterized by a sharp change in attitudes towards certain methods of cheating.

Previously, with Google one could feel more calm, because she was not so feverish. But spring 2012 came and with it came the zoo: Penguin and Panda. They came immediately on a global scale and affected all countries. After this, the search results began to shake constantly and almost every second website in the world suffered to one degree or another from these filters.

It was like a bolt from the blue. Everyone unanimously criticized Yandex, and then abruptly switched to its bourgeois competitor. Penguin began to punish the poor quality of incoming links to the site, and Panda began to punish for over-optimization of content. One gets the feeling that the developers have set themselves the task of making promotion under Google is as unpredictable as possible(today you are in the Top, and tomorrow in...). Why do they need this?

Most likely the reason is money. If SEO does not provide at least some guarantees and stability, then everyone will begin to use it even more actively alternative method, namely to buy contextual advertising. Actually, this is what is happening now. As long as search engines exist, the situation will get worse and worse.

But nevertheless, search traffic is generated precisely from Yandex and Google, and in many cases these two search engines will provide approximately the same number of visitors. This means you will need to take into account the nuances of optimization for both.

Global (Google.com) and regional search (.ru, .ua)

Let's start with the fact that Google positions itself as a search engine for the entire global Internet. In this regard, it has both a global search engine.com, which works with English-speaking users from all over the world, and regional search engines (for example, .ru or.com.ua, related to Russia and Ukraine).

In the global google.com, in addition to a huge number of visitors, there is also a huge number of indexed sites in the database (a huge collection is collected there), and from this the conclusion follows that it will be very difficult to get into the top search results. In addition, in a global search engine it will be almost impossible to take a place in the top for high-frequency queries in a short time.

This is primarily due to the large number of filters that are used. Global search engine.com uses pessimization, thereby exercising strict control over the quality of donors (sites from which links to your resource will be placed), the development of your project, the speed of building up the link mass, etc. things.

As a result, only resources that have existed and developed for quite a long time will be in the top for highly competitive queries (enough for the search to be convinced of their viability).

At the same time, in regional search engines (for example, google.ru or.com.ua) it is quite possible to get to the top for a high-frequency query within one or two months after the creation of the project. This is due to the fact that many of the filters used in.com do not work there, or do not work to the fullest extent.

But he is gradually transferring and strengthening the filters of his regional search engines, so this problem will also have to be faced in the near future when promoting. Actually, this has already happened. After the arrival of Panda and Penguin, everyone became equal.

With all its sophistication and perfection, the search engine cannot guarantee a 100% correct determination of the region to which your resource belongs. As a result of this, a rather unpleasant situation may arise, consisting in the absence of your site in the search results of the desired regional Google search engine (we were promoting, for example, under England, and the search decided that your site belongs to Australia, as a result, your project will not appear in the English results ).

Even the language of the content of your resource cannot serve guarantor of the correct determination of the region. Alas and ah, even the fact that your site is in Albanian does not mean at all that it will appear in Albanian Google results. How does he determine the region to which this or that project belongs, and how can we, on our part, help him make the right choice?

Everything is quite simple here. First of all, when trying to attribute a resource to a particular region, Google looks at the region to which this Internet project belongs. If the domain clearly indicates a regional affiliation (for example: zone .RU - Russia, .DE - Germany or .US - states), then the search engine will select the region for the site based on this.

Therefore, if you use for your website Domain name, belonging to the zone of a particular country, then problems with the wrong choice of region for your resource should not arise.

But your resource may well have a domain that belongs to some common zone (like .COM or .NET). What guides the search when choosing a region in this case? It turns out that he is analyzing Hosting server IP address, where this project is located. Which country this IP address will belong to, the site will be assigned to this region.

Therefore, when creating a new project focused on promoting a specific region (country) in the Google search engine, you should ensure that it immediately and accurately identifies the region of your site. To do this, you will need to either select a name in the domain zone of the desired region, or use hosting with the IP addresses of the servers of the country you need.

If for your resource it has not correctly identified the regional affiliation, then you, in principle, can add an additional word to the keywords of the search queries for which you are promoting, which specifies the regional affiliation (for example, “search engine promotion Russia”).

And in this case, your website will participate in the regional Google results for the country you need (Russia), but not for a high-frequency query (“search engine promotion”), but for a much less high-frequency one (“... Russia”).

If, on the contrary, you want to create a resource focused on many countries, then be sure to choose a domain name from a common zone (like .COM or .NET), and choose hosting with the IP address of the country from which the largest number of visitors is expected.

Setting the site region in the Yandex search engine

The Yandex search engine has recently also begun to distinguish sites by region. But in this case, regions do not mean countries (as in Google), but regions of Russia. The regional affiliation of a site is determined based on the mention of the region on its pages or based on the settings made by the owner of the resource in .

After you log in from your account to the webmaster panel, you will need to select “Site Geography” - “Region” from the left menu:

As you can see from the screenshot above, Yandex did not define a region for my blog: The content of the site does not have a clear regional affiliation. Otherwise, you will need to enter the desired region of the site in the appropriate field and indicate in the field below the URL of your blog page where the name of this region will appear.

Once again, I would like to draw your attention to the fact that you should choose hosting not only based on the correct definition of the region (with the IP address of the country in which you want to see your resource), but also on the degree of its reliability. In terms of website optimization, reliable and stable hosting operation is very important, because constantly for search engine bots, as a result of or other host failures, it can lead to a lower position of the site in search engine results.

Therefore, approach your project with full responsibility. For me personally, on this moment, after a series of experiments most impressed InfoBox provider, my first impressions about which you can read in the article on.

In short, InfoBox currently has a lot of bonuses (30 days of freebies plus a free domain forever, if you pay for hosting for three months) and an excellent technical support service.

If you think that I am here harping on about this hosting only in the hope of getting , then I hasten to dissuade you, because InfoBox exists only for legal entities, and I am an individual. It’s just really not a bad hosting and I sincerely, without any benefit for myself, recommend it to you.

How the Google search engine works

In principle, there are no special differences in logic Google work there is no difference from the work of other search engines. I have already written a fairly detailed article about, and almost all of this can be attributed to our hero. Therefore, without dwelling on the details, I will try to briefly describe how this whole thing works from the point of view of identifying the documents that are most relevant to a particular request.

So, in Google, as well as in other search engines, two basic principles are used, guided by which it determines the position of a particular document (by document I mean a web page) in the search results for a specific request. First, it analyzes the textual content of the document, thus determining its topic and making calculations.

And based on these two factors (document content and link ranking), it determines the site’s position in the search results for a particular search query. Google does not search on real sites, but according to the so-called collection, which represents all documents indexed by the search engine on the network.

Indexing involves reading the contents of a page and storing all the words it contains in the form of reverse indexes, which take into account the location of a given word in a document and the frequency of its use.

Saved copies of documents are also added to the search database, on the basis of which the search system will then generate specific queries. Scanning sites on the Internet is carried out by so-called search bots, which move from document to document using links leading from these documents.

How can Google search bots find new resource pages? Firstly, a search bot can receive a task to visit a particular document after you add the address of a particular page to . Secondly, the bot can index a document by following a link from another or from your own resource.

From this we can conclude that good and thoughtful navigation will not only be useful to your visitors, but will also help speed up search engines. One of the ways to achieve this acceleration is designed specifically for bots.

Backlinks to this document collected by the Google search engine when indexing the documents on which they were affixed. As a result, when a user enters a specific query into the search bar, he will analyze and find all documents that have at least some relation to this request, and then among them will be sorted according to the relevance (compliance) of documents with this request.

When calculating relevance, the content of the document is taken into account, as well as the quantity and quality of backlinks leading to it.

Main and supplementary Google indexes

The material capabilities of the Google company (both monetary and hardware) allow this search engine to index all pages in a row and store them in its index database (collection). Smaller search engines, including Yandex, cannot afford such luxury and remove duplicate content (for example) and other low-quality documents from the index.

But our hero is not like that - he has such great power that he is able to store in his collection all the documents (web pages) indexed by him on the network.

True, this is of little benefit to webmasters, because the Google database consists of two parts: main index and additional(supplemental, it is also sometimes called snotty or simply snot). So, it searches only for documents located in the main index, and documents (web pages) that are included in the snot practically do not participate in the search, unless, in general, there are no answers relevant to the request at all. And the likelihood of such a case is extremely low.

I have already written about special and therefore indirectly evaluate the quality of your web project. Although you can yourself, without using any services, see how many pages your site has is in Google's main index.

To get started, enter the following query into the search bar:

Site:site

replacing my blog's domain name with your own. As a result, a page with results will open, where all the pages of your resource that are in the index will be listed. Documents not only from the main index, but also from the supplemental index will be listed here. Below the query string will be the total number of pages of your resource indexed by this search engine:

Now, having remembered the total number of pages in the index, modify the query in the Google search bar to the following:

Site:site/&

replacing my blog's domain name with your own. As a result, a page with results will open, where only those pages of your site will be listed. which are in the main index:

Only these documents located in the main index will be searched. I have already written about what reasons can lead to pages ending up in an additional rather than the main index, but I will repeat myself a little and, perhaps, add a few new ones possible reasons why the page may end up in Google's snot:

non-uniqueness of the page content (full or partial duplication of the text of another page of your own resource or any other)
there is too little text on the web page (pictures, although they are taken into account by image search, and hyperlinks do not count). I can’t say exactly how much text in characters or words should be in a document to get it out of Google’s snot, but I’ve already written about what the text size should be to best optimize it for search queries
the page may end up in an additional index if you forgot to register for it
snot also threatens pages whose Title or Description mega-tags are not unique or consist of one word

Video from youtube.com about the principles of promotion on Google:

Good luck to you! See you soon on the pages of the blog site

1. Either one or the other

Sometimes we are not exactly sure that we remembered or heard the right information correctly. No problem! Just enter a few suitable options via the “|” icon or English "or", and then select the appropriate result.

2. Search by synonym

As you know, the great and mighty Russian language is rich in synonyms. And sometimes this is not at all beneficial. If you need to quickly find sites on a given topic, and not just a specific phrase, use the "~" symbol.

For example, the results of the query “healthy food” will help you learn the principles of healthy eating, introduce you to healthy recipes and products, and also suggest visiting healthy restaurants.

3. Search within the site

4. Star power

When an insidious memory fails us and hopelessly loses words or numbers from a phrase, the “*” icon comes to the rescue. Just put it in place of the forgotten fragment and get the desired results.

5. Lots of missing words

But if not just one word, but half a phrase has been lost from memory, try writing the first and last word, and between them - AROUND (the approximate number of missing words). For example, like this: “I didn’t really love you AROUND(7).”

6. Time frame

Sometimes we desperately need to get acquainted with the events that occurred in a certain period of time. To do this, we add a time frame to the main phrase, written through an ellipsis. For example, we want to know what scientific discoveries were made between 1900 and 2000.

Obtaining private data does not always mean hacking - sometimes it is published in public access. Knowledge Google settings and a little ingenuity will allow you to find a lot of interesting things - from credit card numbers to FBI documents.

WARNING

All information is provided for informational purposes only. Neither the editors nor the author are responsible for any possible harm caused by the materials of this article.

Today, everything is connected to the Internet, with little concern for restricting access. Therefore, many private data become the prey of search engines. Spider robots are no longer limited to web pages, but index all content available on the Internet and constantly add non-public information to their databases. Finding out these secrets is easy - you just need to know how to ask about them.

Looking for files

In capable hands, Google will quickly find everything that is not found on the Internet, for example, personal information and files for official use. They are often hidden like a key under a rug: there are no real access restrictions, the data simply lies on the back of the site, where no links lead. Google's standard web interface only provides basic settings advanced search, but even these will be sufficient.

You can limit your Google search to a specific type of file using two operators: filetype and ext . The first specifies the format that the search engine determined from the file title, the second specifies the file extension, regardless of its internal content. When searching in both cases, you only need to specify the extension. Initially, the ext operator was convenient to use in cases where the file did not have specific format characteristics (for example, to search for ini and cfg configuration files, which could contain anything). Now Google's algorithms have changed, and there is no visible difference between operators - in most cases the results are the same.

Filtering the results

By default, Google searches for words and, in general, any entered characters in all files on indexed pages. You can limit your search by domain top level, a specific site or at the location of the desired sequence in the files themselves. For the first two options, use the site operator, followed by the name of the domain or selected site. In the third case, a whole set of operators allows you to search for information in service fields and metadata. For example, allinurl will find the given one in the body of the links themselves, allinanchor - in the text equipped with the tag , allintitle - in page titles, allintext - in the body of pages.

For each operator there is a lightweight version with a shorter name (without the prefix all). The difference is that allinurl will find links with all words, and inurl will only find links with the first of them. The second and subsequent words from the query can appear anywhere on web pages. The inurl operator also differs from another operator with a similar meaning - site. The first also allows you to find any sequence of characters in a link to the searched document (for example, /cgi-bin/), which is widely used to find components with known vulnerabilities.

Let's try it in practice. We take the allintext filter and make the request produce a list of numbers and verification codes of credit cards that will expire only in two years (or when their owners get tired of feeding everyone).

Allintext: card number expiration date /2017 cvv

When you read in the news that a young hacker “hacked into the servers” of the Pentagon or NASA, stealing classified information, in most cases we are talking about just such a basic technique of using Google. Suppose we are interested in a list of NASA employees and their contact information. Surely such a list is available in electronic form. For convenience or due to oversight, it may also be on the organization’s website itself. It is logical that in this case there will be no links to it, since it is intended for internal use. What words can be in such a file? At a minimum - the “address” field. Testing all these assumptions is easy.

Inurl:nasa.gov filetype:xlsx "address"

We use bureaucracy

Finds like this are a nice touch. A truly solid catch is provided by a more detailed knowledge of Google's operators for webmasters, the Network itself, and the peculiarities of the structure of what is being sought. Knowing the details, you can easily filter the results and refine the properties of the necessary files in order to get truly valuable data in the rest. It's funny that bureaucracy comes to the rescue here. It produces standard formulations that are convenient for searching for secret information accidentally leaked onto the Internet.

For example, the Distribution statement stamp, required by the US Department of Defense, means standardized restrictions on the distribution of a document. The letter A denotes public releases in which there is nothing secret; B - intended only for internal use, C - strictly confidential, and so on until F. The letter X stands out separately, which marks particularly valuable information representing a state secret of the highest level. Let those who are supposed to do this on duty search for such documents, and we will limit ourselves to files with the letter C. According to DoDI directive 5230.24, this marking is assigned to documents containing a description of critical technologies that fall under export control. You can find such carefully protected information on sites in the top-level domain.mil, allocated for the US Army.

"DISTRIBUTION STATEMENT C" inurl:navy.mil

It is very convenient that the .mil domain contains only sites from the US Department of Defense and its contract organizations. Search results with a domain restriction are exceptionally clean, and the titles speak for themselves. Searching for Russian secrets in this way is practically useless: chaos reigns in domains.ru and.rf, and the names of many weapons systems sound like botanical ones (PP “Kiparis”, self-propelled guns “Akatsia”) or even fabulous (TOS “Buratino”).

By carefully studying any document from a site in the .mil domain, you can see other markers to refine your search. For example, a reference to the export restrictions “Sec 2751”, which is also convenient for searching for interesting technical information. From time to time it is removed from official sites where it once appeared, so if you cannot follow an interesting link in the search results, use Google’s cache (cache operator) or the Internet Archive site.

Climbing into the clouds

In addition to accidentally declassified government documents, links to personal files from Dropbox and other data storage services that create “private” links to publicly published data occasionally pop up in Google's cache. It’s even worse with alternative and homemade services. For example, the following query finds data for all Verizon customers who have an FTP server installed and actively using their router.

Allinurl:ftp:// verizon.net

There are now more than forty thousand such smart people, and in the spring of 2015 there were many more of them. Instead of Verizon.net, you can substitute the name of any well-known provider, and the more famous it is, the larger the catch can be. Through the built-in FTP server, you can see files on an external storage device connected to the router. Usually this is a NAS for remote work, a personal cloud, or some kind of peer-to-peer file downloading. All contents of such media are indexed by Google and other search engines, so you can access files stored on external drives via a direct link.

Looking at the configs

Before the widespread migration to the cloud, simple FTP servers ruled as remote storage, which also had a lot of vulnerabilities. Many of them are still relevant today. For example, the popular WS_FTP Professional program stores configuration data, user accounts and passwords in the ws_ftp.ini file. It is easy to find and read, since all records are saved in text format, and passwords are encrypted with the Triple DES algorithm after minimal obfuscation. In most versions, simply discarding the first byte is sufficient.

It is easy to decrypt such passwords using the WS_FTP Password Decryptor utility or a free web service.

When talking about hacking an arbitrary website, they usually mean obtaining a password from logs and backups of configuration files of CMS or e-commerce applications. If you know their typical structure, you can easily indicate keywords. Lines like those found in ws_ftp.ini are extremely common. For example, in Drupal and PrestaShop there is always a user identifier (UID) and a corresponding password (pwd), and all information is stored in files with the .inc extension. You can search for them as follows:

"pwd=" "UID=" ext:inc

Revealing DBMS passwords

In the configuration files of SQL servers, names and addresses Email users are stored in clear text, and instead of passwords, their MD5 hashes are recorded. Strictly speaking, it is impossible to decrypt them, but you can find a match among the known hash-password pairs.

There are still DBMSs that do not even use password hashing. The configuration files of any of them can simply be viewed in the browser.

Intext:DB_PASSWORD filetype:env

With the appearance on the servers Windows place configuration files were partially taken over by the registry. You can search through its branches in exactly the same way, using reg as the file type. For example, like this:

Filetype:reg HKEY_CURRENT_USER "Password"=

Let's not forget the obvious

Sometimes it is possible to get to classified information with the help of accidentally opened and caught in the field of view Google data. The ideal option is to find a list of passwords in some common format. Store account information in text file, Word document or electronic Excel spreadsheet Only desperate people can, but there are always enough of them.

Filetype:xls inurl:password

On the one hand, there are a lot of means to prevent such incidents. It is necessary to specify adequate access rights in htaccess, patch the CMS, not use left-handed scripts and close other holes. There is also a file with a list of robots.txt exceptions that prohibits search engines from indexing the files and directories specified in it. On the other hand, if the structure of robots.txt on some server differs from the standard one, then it immediately becomes clear what they are trying to hide on it.

The list of directories and files on any site is preceded by the standard index of. Since for service purposes it must appear in the title, it makes sense to limit its search to the intitle operator. Interesting things are in the /admin/, /personal/, /etc/ and even /secret/ directories.

Stay tuned for updates

Relevance here is extremely important: old vulnerabilities are closed very slowly, but Google and its search results change constantly. There is even a difference between a “last second” filter (&tbs=qdr:s at the end of the request URL) and a “real time” filter (&tbs=qdr:1).

Date time interval last update Google also indicates the file implicitly. Through the graphical web interface, you can select one of the standard periods (hour, day, week, etc.) or set a date range, but this method is not suitable for automation.

By appearance address bar We can only guess about a way to limit the output of results using the &tbs=qdr: construction. The letter y after it sets the limit of one year (&tbs=qdr:y), m shows the results for the last month, w - for the week, d - for the past day, h - for the last hour, n - for the minute, and s - for give me a sec. The most recent results that Google has just made known are found using the filter &tbs=qdr:1 .

If you need to write a clever script, it will be useful to know that the date range is set in Google in Julian format using the daterange operator. For example, this is how you can find a list PDF documents with the word confidential, uploaded from January 1 to July 1, 2015.

Confidential filetype:pdf daterange:2457024-2457205

The range is indicated in Julian date format without taking into account the fractional part. Translating them manually from the Gregorian calendar is inconvenient. It's easier to use a date converter.

Targeting and filtering again

In addition to specifying additional operators in search query they can be sent directly in the body of the link. For example, the filetype:pdf specification corresponds to the construction as_filetype=pdf . This makes it convenient to ask any clarifications. Let's say that the output of results only from the Republic of Honduras is specified by adding the construction cr=countryHN to the search URL, and only from the city of Bobruisk - gcs=Bobruisk. You can find a complete list in the developer section.

Google's automation tools are designed to make life easier, but they often add problems. For example, the user’s city is determined by the user’s IP through WHOIS. Based on this information, Google not only balances the load between servers, but also changes the search results. Depending on the region, for the same request, different results will appear on the first page, and some of them may be completely hidden. The two-letter code after the gl=country directive will help you feel like a cosmopolitan and search for information from any country. For example, the code of the Netherlands is NL, but the Vatican and North Korea do not have their own code in Google.

Often, search results end up cluttered even after using several advanced filters. In this case, it is easy to clarify the request by adding several exception words to it (a minus sign is placed in front of each of them). For example, banking, names and tutorial are often used with the word Personal. Therefore, cleaner search results will be shown not by a textbook example of a query, but by a refined one:

Intitle:"Index of /Personal/" -names -tutorial -banking

One last example

A sophisticated hacker is distinguished by the fact that he provides himself with everything he needs on his own. For example, VPN is a convenient thing, but either expensive, or temporary and with restrictions. Signing up for a subscription for yourself is too expensive. It's good that there are group subscriptions, and with the help of Google it's easy to become part of a group. To do this, just find the Cisco VPN configuration file, which has a rather non-standard PCF extension and a recognizable path: Program Files\Cisco Systems\VPN Client\Profiles. One request and you join, for example, the friendly team of the University of Bonn.

Filetype:pcf vpn OR Group

INFO

Google finds password configuration files, but many of them are encrypted or replaced with hashes. If you see strings of a fixed length, then immediately look for a decryption service.

Passwords are stored encrypted, but Maurice Massard has already written a program to decrypt them and provides it for free through thecampusgeeks.com.

At Google help hundreds are executed different types attacks and penetration tests. There are many options, affecting popular programs, major database formats, numerous vulnerabilities of PHP, clouds, and so on. If you know exactly what you are looking for, it will make it much easier to get necessary information(especially one that was not planned to be made public). Shodan is not the only one that feeds with interesting ideas, but every database of indexed network resources!

Google (Google) is the largest search engine on the Internet, which began operating in September 1998.

Google is owned by the American corporation Google Inc. and ranks first in the world in popularity with an indicator of 77%.

Google.ru - Home page

There is no doubt that the main function of Google.ru is search. necessary information based on queries entered by users. In order to find the necessary information, the user only needs to enter the desired word, phrase or sentence into the search bar. After this, just click on “Enter” on your keyboard or on the magnifying glass image directly on the site. By the way, in order to find the necessary information on the Internet, you can also use the portal.

It is quite convenient that when searching for the necessary information using Google.ru, the user can enter the desired queries not only directly from the keyboard of his device, but also use on-screen keyboard or voice search, which is undoubtedly convenient and affordable.

Google.ru - On-screen keyboard

In order to be able to access Gmail, YouTube, Google+ and other Google services, the user only needs to create an account. After that, using a single login and password for it, you can visit various Google services.

Google.ru - Account creation

It’s worth saying right away that Google.ru has a lot of services and tools, each of which has its own purpose and application.

There are products whose task is to make working on the Internet easier. Among them is the famous Google browser Chrome.

Google.ru - Google Chrome

Products developed for mobile devices(Search, Maps) and specifically for businesses, which include AdWords, AdMob, AdSense and others.

For those who are interested in multimedia, YouTube, Image Search and Video Search, Books, News and Picasa will be useful.

Google.ru - YouTube

It will undoubtedly be useful Google service Maps, which is a map and satellite images of the Earth. There is also a business directory and a road map.

Google.ru - Google Maps

This is only a part of the available services. Get to know them full list you can directly on the Google.ru website by clicking on the “Services” icon located at the top of the page.

Google.ru - Tabs

For those who find this not enough and need even more information about Google.ru, it is worth visiting the tab called “All about Google,” which is located at the bottom of the page.