To search the index, the user formulates a query and sends it to the search engine. The query can be very simple; at a minimum it consists of a single word. To build a more complex query, Boolean operators are used to refine and broaden the search terms.

The most commonly used Boolean operators are:

  • AND - all expressions connected by the AND operator must be present in the pages or documents being searched. Some search engines use the "+" operator instead of the word AND.
  • OR - at least one of the expressions connected by the OR operator must be present in the pages or documents being searched.
  • NOT - the expression or expressions following the NOT operator must not appear in the pages or documents being searched. Some search engines use the "-" operator instead of the word NOT.
  • FOLLOWED BY - one of the expressions must immediately follow the other.
  • NEAR - one of the expressions must be no further from the other than the specified number of words.
  • Quotation marks - words enclosed in quotation marks are treated as a phrase that must be found in the document or file.
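
As an illustration, here is a minimal Python sketch of how these operators could be evaluated against a toy inverted index. The documents, terms and helper names are invented for the example and do not reflect any particular search engine.

    # Minimal sketch of Boolean query evaluation over a toy inverted index.
    # The documents and helper names are illustrative only.
    docs = {
        1: "search engines index web pages",
        2: "boolean operators refine a search query",
        3: "flower bed in the garden",
    }

    # Build an inverted index: term -> set of document ids containing it.
    index = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)

    def term_docs(term):
        return index.get(term.lower(), set())

    # AND: intersection, OR: union, NOT: difference with the matching set.
    print(term_docs("search") & term_docs("query"))   # search AND query -> {2}
    print(term_docs("search") | term_docs("bed"))     # search OR bed    -> {1, 2, 3}
    print(term_docs("search") - term_docs("query"))   # search NOT query -> {1}

    # Quotation marks (phrase search): require the exact word sequence.
    def phrase_docs(phrase):
        return {d for d, text in docs.items() if phrase.lower() in text.lower()}

    print(phrase_docs("flower bed"))                  # -> {3}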

Prospects for the development of search engines

A search specified with Boolean operators is literal: the machine looks for words or phrases exactly as they were entered. This causes problems when the words entered are ambiguous. For example, the English word "bed" can mean a piece of furniture, a flower bed, a place where fish spawn, and much more. A user interested in only one of these meanings does not need pages where the word is used in its other senses. It is possible to construct a literal query that cuts out the unwanted meanings, but it would be better if the search engine itself could provide that kind of assistance.

One alternative approach is conceptual search. Part of this approach uses statistical analysis of the pages containing the words or phrases entered by the user to find other pages that might interest that user. Conceptual search clearly requires storing more information about each page, and each query requires more computation. Many development teams are currently working on improving the efficiency and performance of such engines, while other researchers have focused on a different area: natural-language queries, discussed below.
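
Before turning to natural-language queries, here is a rough illustration of the statistical idea behind conceptual search: represent each page as a simple term-frequency vector and treat pages with high cosine similarity as related. The toy pages and the bare-bones scoring are assumptions made for the example, not how any production engine measures relatedness.

    # Sketch: find pages statistically similar to a given page using cosine
    # similarity of term-frequency vectors (a toy stand-in for the statistical
    # analysis behind conceptual search).
    import math
    from collections import Counter

    pages = {
        "bed-furniture": "a wooden bed with a soft mattress for the bedroom",
        "bed-garden": "a flower bed planted with tulips in the garden",
        "mattress-shop": "buy a mattress and a bed frame for your bedroom",
    }

    def vector(text):
        return Counter(text.lower().split())

    def cosine(a, b):
        dot = sum(a[t] * b[t] for t in a)
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    vectors = {name: vector(text) for name, text in pages.items()}

    # The furniture page scores noticeably closer to the mattress shop than to
    # the flower-bed page, hinting at how statistics can separate word senses.
    query_page = vectors["bed-furniture"]
    for name, vec in vectors.items():
        if name != "bed-furniture":
            print(name, round(cosine(query_page, vec), 2))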

The idea behind natural-language queries is that the user formulates a query the same way they would ask a person sitting next to them, without having to keep track of Boolean operators or complex query structures. The best-known natural-language search site is AskJeeves.com, which analyzes the query to identify keywords that are then used to search the index built by the engine. The site handles only simple queries, but its developers, working in a highly competitive environment, are building a natural-language engine that can cope with far more complex ones.
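
A minimal sketch of the keyword-extraction step such a system relies on is shown below; the stopword list and function name are purely illustrative assumptions.

    # Sketch: reduce a natural-language question to keywords that can then be
    # looked up in the index, roughly the first step an AskJeeves-style system
    # performs. The stopword list is a tiny illustrative subset.
    STOPWORDS = {
        "a", "an", "the", "is", "are", "was", "were", "what", "which", "who",
        "how", "do", "does", "can", "i", "to", "of", "in", "on", "for", "my",
    }

    def extract_keywords(question):
        tokens = question.lower().strip("?!.").split()
        return [t for t in tokens if t not in STOPWORDS]

    print(extract_keywords("How do I plant a flower bed in my garden?"))
    # -> ['plant', 'flower', 'bed', 'garden']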


    Modern search engines are among the most powerful hardware and software systems in existence; their purpose is to index documents on the Internet and serve data in response to user queries.

    To provide high-quality, up-to-date information, search engines have to constantly improve their ranking formulas. Ensuring the highest possible quality of search results for users and preventing optimizers from manipulating them are the key goals of search engine development.

    When search engines were just beginning to emerge, their ranking algorithms were very primitive. Because of this, the most resourceful optimizers began promoting their sites so that they appeared in the results for the queries that interested them. As a result, resources that often gave the user no useful information at all rose to the top, pushing more useful sites into the background.

    In response, search engines began to defend themselves by improving their ranking algorithms, introducing more and more variables into the formulas and taking more and more factors into account. Over time, this struggle between optimizers and search engines moved to a new level and led to the emergence of more advanced algorithms, including ones based on machine learning.

    Stages of search engine development:

    As the diagram shows, the development of search engines and their algorithms goes in circles: some create new algorithms, others adapt to them. It is hard to say whether this process will ever stop, but personally I am inclined to believe it will not. Although search engine ranking algorithms have recently not only changed the weight of various factors but also changed qualitatively, this does not frighten optimizers: their arsenal is constantly being replenished with new techniques.

    How often do search engines change their algorithms?

    Let's turn to the Runet's main search engine, Yandex. Qualitative, fundamental changes to its ranking formulas occur on average about once a year. Not long ago Yandex introduced a new search platform called "Kaliningrad", whose essence is to generate personalized results for each user based on their search history and preferences.

    In addition, we should not forget that every search engine, Yandex included, constantly applies "tweaks" to its ranking formulas: in automatic or semi-automatic mode the influence of some factors is reduced while that of others is increased. All of this serves a single goal: to improve search results as much as possible by ridding them of sites that do not satisfy user needs, thereby increasing their relevance.

    Looking at changes to Google's search system, you can see that the ranking formula is also transformed constantly, and Google itself reports hundreds of small changes every year. As for the filters that help Google clear its results of low-quality sites, new versions of algorithms such as Panda or Penguin appear every 3-6 months.

    The answer to the question posed above is this: search engines are constantly improving their ranking algorithms, and dramatic changes occur on average once every 6-12 months.

    Which search engine algorithms pose a real threat to promotion?

    I would like to answer right off the bat: none. But still, let's figure it out. To do that, we need to ask: do search engines set themselves the goal of preventing search engine promotion?

    I think not. There are several justifications for this:

    1. Optimizers help search engines improve their algorithms, which ultimately leads to better quality of search results. After all, if there were no optimizers, search engines would most likely have stopped developing back in 2000.

    2. Without optimizers, the results for many commercial queries would look like a collection of abstracts and useless information articles.

    If search engine promotion did not exist in principle, then it would not make sense for search engines to grow and develop as intensively as they do now.

    Thus, we come to the following conclusion:

    Search engines and SEO are closely and inextricably linked. By following the rules the search engines set, you need not fear their algorithms, because they do not set out to destroy SEO as such.

    Development of search engine services

    Speaking of search engines, do not forget that Yandex, Google and Bing each have their own services designed to help users. Beyond the results page itself, over the years search engines have studied the behavior of their users in order to increase satisfaction with the results.

    It was for this purpose that Yandex came up with the so-called "wizards", a mechanism that helps the user get an answer to their question quickly. For example, when the query "weather forecast" is entered, Yandex displays the weather for the current date directly on the results page, relieving the user of the need to click through the results.

    Other search engines, such as Google, went further and offered a more interesting solution instead of wizards: the Knowledge Graph.

    The Knowledge Graph is the first step on Google's path to intelligent search. Thanks to this innovation, the search engine shows not only standard links but also direct answers to the user's question, a brief reference card about the object of the query and facts related to it. Technically, the Knowledge Graph is a semantic network that links together various entities: people, events, spheres of life, things, categories. Its information base draws on a number of sources: the open semantic database Freebase, Wikipedia, the CIA's open data collections and others.
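
    A hedged sketch of the underlying idea: a knowledge graph can be modelled as a set of (entity, relation, entity) triples that are queried directly for a factual answer instead of a list of links. The triples and helper below are illustrative and say nothing about Google's internal representation.

    # Sketch: a tiny semantic network stored as (subject, relation, object)
    # triples, queried directly for a factual answer. Data and helper names
    # are illustrative only.
    triples = [
        ("Leo Tolstoy", "author_of", "War and Peace"),
        ("Leo Tolstoy", "born_in", "Yasnaya Polyana"),
        ("War and Peace", "published_in", "1869"),
    ]

    def answer(subject, relation):
        return [obj for s, r, obj in triples if s == subject and r == relation]

    # Direct answers shown on the results page instead of a list of links:
    print(answer("Leo Tolstoy", "author_of"))       # -> ['War and Peace']
    print(answer("War and Peace", "published_in"))  # -> ['1869']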

    What conclusions can be drawn, you ask?

    The answer is simple: search and search services will keep developing towards quick, relevant answers to user questions, making it possible to get all the necessary information directly in the SERP (the search results page) and eliminating the need to visit other sites.

    There is an opinion that search engines, in their desire to answer the user's question here and now, could destroy search engine optimization by becoming global knowledge bases of a sort. Such fears are unfounded: to become global knowledge bases they need information, and that information is stored on the very sites maintained by the same optimizers who keep search engines from standing still and force them to keep evolving.

    As you can see, SEO and search engines are links in the same chain and cannot exist without each other, so talk of the imminent death of SEO is unfounded. It is quite possible that over time search engine optimization will evolve into, say, consulting, but it certainly won't die. I wish everyone successful promotion to the top!

    Modern information retrieval systems apply a variety of technologies and methods created over the years of development of information retrieval theory and practice. Alongside classic library retrieval systems, which continue to be improved, intensive development is under way in global information retrieval on the Internet, which has become the main driving force of modern retrieval technologies. The enormous volume of available information resources requires scalable search algorithms. Hypertext allows fundamentally new search models based on semantic analysis of document collections. The high rate at which pages change, their unrestricted placement and the lack of any guarantee of constant access lead to the need for continual re-indexing of current information resources.

    Finally, the heterogeneous composition of users, who often do not have the skills to work with a search engine, forces us to look for effective ways to formulate queries that work with minimal initial information.

    6.1. Dictionary information retrieval systems

    Dictionary information retrieval systems (IRS) are today the fastest and most effective search engines and the most widespread on the Internet. Searching for information in a dictionary system is carried out using keywords. Search results are produced by running a search algorithm over the dictionary and the query that the user has formulated in the system's query language.

    The structure of a dictionary IRS (Fig. 13) consists of the following components: an information array of documents, a user interface, a search engine, a database of search images and an indexing agent.

    The information array comprises the information resources potentially available to the user: text and graphic documents, multimedia information, and so on. For a global IRS this is the entire Internet, where every document is identified by a unique URL (Uniform Resource Locator).

    The search engine interface determines how the user interacts with the system: the rules for forming queries, the mechanism for viewing search results, and so on. The interface of Internet search engines is usually implemented in a web browser; dedicated software is used to work with audio and video information.

    The main function of the search engine is to implement the adopted search model. First, the user's query, written in the query language, is translated according to established rules into a formal query. Then, as the search algorithm runs, the query is compared with the search images of documents in the database. Based on the comparison results, a final list of found documents is generated; typically it contains the title, size, creation date and a brief annotation of each document, a link to it, and the value of the similarity measure between the document and the query.

    Fig. 13. Structure of a dictionary IRS.

    The list is subject to ranking (ordering according to some criterion, usually according to the value of formal relevance).
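
    A minimal sketch of this pipeline is given below; the stored search images, the overlap-based similarity measure and the field names are assumptions made for illustration, not the formula of any real system.

    # Sketch of the query pipeline: formalize the query, compare it with the
    # stored search images (here: term sets), compute a similarity measure and
    # return a ranked result list. All data and weights are illustrative.
    search_images = {
        "http://example.org/a": {"title": "Information retrieval basics",
                                 "terms": {"information", "retrieval", "index", "query"}},
        "http://example.org/b": {"title": "Gardening tips",
                                 "terms": {"flower", "bed", "garden"}},
        "http://example.org/c": {"title": "Query languages",
                                 "terms": {"query", "language", "boolean"}},
    }

    def search(query):
        query_terms = set(query.lower().split())        # formalized query
        results = []
        for url, image in search_images.items():
            overlap = query_terms & image["terms"]
            relevance = len(overlap) / len(query_terms)  # toy similarity measure
            if relevance > 0:
                results.append((relevance, image["title"], url))
        # Ranking: order the result list by the formal relevance value.
        return sorted(results, reverse=True)

    for relevance, title, url in search("information retrieval query"):
        print(f"{relevance:.2f}  {title}  {url}")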

    The database of document search images stores the descriptions of indexed documents. The structure of a typical dictionary IRS database is described in detail in Part 1 of the guidelines.

    The indexing agent indexes the available documents in order to compile their search images. In local systems this operation is usually performed once: after the document array has been assembled, all of the information is indexed and the search images are entered into the database. In the dynamic, decentralized information array of the Internet a different approach is used: a special robot program, called a spider or crawler, continuously crawls the network, moving between documents via the hyperlinks they contain. The speed at which information in the search engine database is refreshed is directly related to the speed of this scanning; a powerful indexing robot can crawl the entire Internet in a few weeks. With each new crawl cycle the database is updated and old, invalid addresses are removed.
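
    A simplified sketch of the crawl loop such a robot performs is shown below; page fetching and link extraction are stubbed out, so no real network access is implied.

    # Sketch of a crawler's core loop: follow hyperlinks from seed pages,
    # avoid revisiting URLs and hand each fetched page to the indexer.
    # fetch_page() and extract_links() are stubs standing in for real HTTP
    # fetching and HTML parsing.
    from collections import deque

    def fetch_page(url):
        # Stub: a real crawler would download the page here.
        return f"<html>contents of {url}</html>"

    def extract_links(url, html):
        # Stub: a real crawler would parse href attributes from the HTML.
        return []

    def crawl(seed_urls, index_document, max_pages=1000):
        queue = deque(seed_urls)
        visited = set()
        while queue and len(visited) < max_pages:
            url = queue.popleft()
            if url in visited:
                continue
            visited.add(url)
            html = fetch_page(url)
            index_document(url, html)              # build/update the search image
            for link in extract_links(url, html):  # transitions via hyperlinks
                if link not in visited:
                    queue.append(link)
        return visited

    crawled = crawl(["http://example.org/"], lambda url, html: None)
    print(crawled)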

    Some documents are closed to search engines: information that requires authorization, or that is reached not through a link but by submitting a form. Intelligent methods for scanning this hidden part of the Internet are being developed, but they are not yet in widespread use.

    To index hypertext documents, agent programs use the following sources: hypertext links (href), document titles (title), headings (H1, H2, etc.), annotations, keyword lists (keywords) and image captions. URLs are used to index non-text information (for example, files transferred via FTP).

    Semi-automatic or manual indexing capabilities are also used.

    In the first case, administrators leave messages about their documents, which the indexing agent processes after some time; in the second, administrators independently enter the necessary information into the IRS database.

    An increasing number of information retrieval systems perform full-text indexing, in which the entire text of a document is used to compose its search image. Formatting, links and the like then become an additional factor influencing the significance of a particular term: a term from the title receives more weight than a term from a figure caption.
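
    A hedged sketch of how such position-dependent weighting might look when a search image is built; the field names and weight values are assumptions chosen for the example.

    # Sketch: full-text indexing in which a term's weight depends on where it
    # occurs (title vs. heading vs. body vs. image caption). The weights are
    # illustrative, not those of any real engine.
    from collections import defaultdict

    FIELD_WEIGHTS = {"title": 5.0, "h1": 3.0, "body": 1.0, "caption": 0.5}

    def build_search_image(fields):
        """fields: dict mapping a field name to the text of that field."""
        weights = defaultdict(float)
        for field, text in fields.items():
            for term in text.lower().split():
                weights[term] += FIELD_WEIGHTS.get(field, 1.0)
        return dict(weights)

    image = build_search_image({
        "title": "Information retrieval",
        "h1": "Dictionary retrieval systems",
        "body": "Full text indexing uses the entire document text",
        "caption": "Fig. 13 structure of a retrieval system",
    })
    # "retrieval" occurs in the title, a heading and the caption, so it ends
    # up with a much higher weight than a body-only term such as "indexing".
    print(image["retrieval"], image["indexing"])   # -> 8.5 1.0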

    Large modern information retrieval systems must process hundreds of queries per second, so any delay can lead to an outflow of users and, consequently, to the system's unpopularity and commercial failure. Architecturally, such systems are implemented as distributed computing systems consisting of hundreds of computers located around the world, and the search algorithms and program code are optimized with extreme care.

    Information retrieval systems with a large document database use two techniques to speed up query processing: separation and pruning.

    Separation consists of dividing the database into a presumably more relevant part and a less relevant part. The IRS searches the first part first; if no documents, or too few, are found there, the search continues in the second part.

    With pruning (from the English "pruning", cutting away), query processing stops automatically once a sufficient number of relevant documents has been found.
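
    The sketch below combines both techniques: separation (search the presumably more relevant partition first and fall back to the rest only if needed) and pruning (stop as soon as enough relevant documents have been collected). The partitioning, the toy relevance function and the cut-off values are illustrative assumptions.

    # Sketch: separation plus pruning during query processing.
    # All data, thresholds and the scoring function are illustrative.
    def relevance(doc, query_terms):
        terms = set(doc["text"].lower().split())
        return len(terms & query_terms) / len(query_terms)

    def search_partition(partition, query_terms, needed, threshold=0.5):
        found = []
        for doc in partition:
            if relevance(doc, query_terms) >= threshold:
                found.append(doc)
                if len(found) >= needed:      # pruning: stop early
                    break
        return found

    def search(primary, secondary, query, needed=10):
        query_terms = set(query.lower().split())
        results = search_partition(primary, query_terms, needed)
        if len(results) < needed:             # separation: fall back to the rest
            results += search_partition(secondary, query_terms, needed - len(results))
        return results

    primary = [{"url": "http://example.org/1", "text": "dictionary information retrieval systems"}]
    secondary = [{"url": "http://example.org/2", "text": "information retrieval on the internet"}]
    print([d["url"] for d in search(primary, secondary, "information retrieval", needed=2)])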

    Threshold search models are also widely used. They define threshold values for the characteristics of the documents returned to the user. For example, relevance is usually bounded by some threshold value r0: all documents with relevance r >= r0 are brought to the user's attention.

    When search results are ranked by date, the thresholds define the time interval within which documents must have been modified; for example, the IRS can automatically cut off documents that have not changed in the last three years.

    The main advantage of a dictionary-type IRS is its almost complete automation: the system independently analyzes resources, compiles and stores their descriptions, and searches among those descriptions. Wide coverage of Internet resources is another advantage. Their large databases make dictionary systems especially useful for exhaustive searches, complex queries, or locating obscure information.

    At the same time, the huge number of documents in the system's database often results in too many documents being found, which makes it difficult for most users to analyze the results and rules out a quick search. Automatic indexing methods cannot take the specifics of individual documents into account, so the proportion of non-pertinent documents among those found by such a system is often large.

    Another disadvantage of dictionary systems is the need to formulate queries in a special query language. Although query languages are tending to converge with natural languages, today the user still needs certain skills to formulate queries.

    Search engine ranking algorithms are constantly evolving and improving. The main goals of this development are to provide high-quality search for users and to make it as difficult as possible for website optimizers to manipulate the results.

    These goals are interrelated, since the quality of search depends directly on whether interested parties are able to influence it.

    When Yandex and Google were just beginning to develop, their ranking algorithms were primitive and therefore quite easy to manipulate. Page relevance depended heavily on factors such as meta tags, keyword density on the page and emphasis tags. This allowed "black-hat" optimizers, who promoted sites aimed not at people but at search engines in order to make money from the flow of visitors, to degrade the overall quality of search.

    As a result, search engines stopped taking the Keywords meta tag into account, and apparently the Description tag as well, which is now used only to form a snippet (a brief description of the page) in Google. The importance of other internal optimization factors that allowed malicious manipulation of search results was also reduced.

    Then optimizers discovered that the number of external links to a site, as well as their anchor text, affects the site's position in the results. Thousands of website directories immediately appeared, along with programs for automatically submitting sites to them (the best-known program of this kind is AllSubmitter).

    Search engines quickly excluded most of these directories from their indexes, sharply reducing the effectiveness of the directory runs that optimizers had come to use so widely.

    After this, effective attempts to manipulate search results began to consist mainly of buying links from regular sites that were not created using directory scripts.

    Very soon, search engines learned to recognize the crude work of selling links and introduced sanctions in the form of a filter or ban for sites created solely for selling links. Moreover, in some cases, sanctions may apply to sites to which links are purchased.

    All stages of search engine development represent the following logical chain:

    1. A basic ranking algorithm is in place.

    2. Optimizers identify weaknesses in it and begin to massively manipulate search results.

    3. Search engines seriously adjust the ranking algorithm, changing the degree of influence of certain factors.

    4. Optimizers analyze these changes, adapt to the new conditions and once again begin to manipulate the results en masse.

    Recently, however, search engine ranking algorithms have not only changed the importance of various factors but have also changed qualitatively.

    Hundreds of different factors are now taken into account comprehensively, and the single ranking formula is being abandoned in favor of a matrix-based system. An example is Yandex's "Snezhinsk" algorithm (a description is given at http://seo-in.ru/poiskovaya-optimizaciya/62-snezhinsk.html).

    Under the new system, a separate ranking formula is generated for each individual query, and it may be completely different from the formulas used for other queries. Whereas it used to be fairly easy to identify common patterns in how search engines rank, in the future such common patterns simply will not exist.

    Paid tools for website promotion will most likely remain, but their use will very probably become economically unviable. This is exactly the situation currently observed in the English-language sector of the Internet.

    In the near future, a combination of the following main factors will have the greatest effect on website promotion:

    • a large array of high-quality content (unique and useful);
    • site trust;
    • site age;
    • reasonable internal optimization.

    Any purely technical promotion based on exploiting weaknesses in ranking algorithms will most likely lose its relevance; at least, everything is heading in that direction.