In terms of content, the Internet is a treasure trove of information; in terms of organization, it is a vast junkyard. Fortunately, the situation is not hopeless: directories and search engines help users find the information they need.
Search engines, with which directories are often and mistakenly conflated, are fully automated and operate according to the following scheme: a robot program scans resources, an index database is built, and queries are then served by keyword. However popular directories may be, only automatic indexes can make the full breadth of information on the Internet genuinely accessible.
Which search engines do users prefer?
In various online surveys in Russia asking "Which search engines do you use?", the distribution of answers is roughly as follows:
How can search engines be used to popularize a site? Up to forty percent of visitors reach a resource through links from search engines. Correct indexing of the site, that is, the correspondence of its content to users' queries, should therefore be a matter of special care.
How does indexing happen? Either a search-engine robot reaches your site through links, or you submit the site yourself on the registration page that every search engine provides. In the first case the indexing process may take a long time; in the second you will have to spend some of your own.
For the site to be indexed correctly, keep the following in mind:
• site pages should be textual: search engines do not recognize text embedded in graphic images, although the text in the ALT attribute of the IMG tag is usually indexed;
• every document on the site should have an intelligible title (the TITLE tag), keywords (a meta tag with name="keywords") and a short description (a meta tag with name="description");
• a robots.txt file should be prepared, and a meta tag with name="robots" added to the documents;
• it is advisable to register the site with each search engine of interest manually and then monitor how it is indexed.
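A minimal sketch of what the checklist above looks like in practice; the title, keywords, description, and path are placeholders, not recommendations:

```html
<!-- In the HEAD of each document: title, keywords, description, robots hint -->
<head>
  <title>A clear title describing the page</title>
  <meta name="keywords" content="search engines, indexing, site promotion">
  <meta name="description" content="A short summary of what this page is about.">
  <meta name="robots" content="index, follow">
</head>
```

The robots.txt file sits at the site root and tells robots which paths to skip, for example:

```
User-agent: *
Disallow: /private/
```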
After registering the site with the various search engines, make sure that a link to it appears at least in the first ten results for a relevant search; better still, that several links to your documents appear in that ten.
A directory is essentially a database that stores a resource's address and a description of it. The description is written either by the directory's compilers (as on Yahoo!, for example) or by those who submit the resource for inclusion. A search engine works differently.
A search engine is a fully automated system that scans the Internet. Its network agent (robot, "spider", "worm") traverses the servers assigned to it and collects an index, that is, information about what was found on which page. Network agents are essentially programs that examine the structure of hypertext on the Internet: moving from one document to another, a robot transmits the information it has collected to the search engine, which enters it into its database.
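The robot's traversal of hypertext can be sketched roughly as follows. This is a simplification of what real spiders do: the hypothetical `fetch_page` function stands in for the actual HTTP download, and a real engine would index words rather than store raw HTML.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags -- the edges a robot follows."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, fetch_page, limit=100):
    """Breadth-first walk from start_url; fetch_page(url) returns HTML or None."""
    seen, queue, index = set(), [start_url], {}
    while queue and len(seen) < limit:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        html = fetch_page(url)
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        index[url] = html  # a real engine would extract and index the words here
        queue.extend(parser.links)
    return index
```

The `seen` set keeps the robot from revisiting pages, which matters because hypertext is full of cycles (page A links to B, B links back to A).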
The main functions of search engines are as follows:
• collecting statistics. The first robots were created precisely for this purpose: they determined the number of pages on a server, the types of files present on it, their ratios, the average page size, and so on;
• service functions, such as collecting information about broken links and updated documents, checking the links of sites whose authors have submitted a registration application themselves, etc.;
• discovering new resources. You do not have to register a site by hand: a robot, which is constantly looking for new resources, can find it on its own, although this may take a long time.
Full-text search engines index every word found on a Web page, except for stop words, which are usually high-frequency, low-information words such as conjunctions and prepositions.
Every day, search engines crawl Web sites and store textual information in their huge indexes so that users can retrieve lists of Web pages by keyword. As a rule, hundreds of resources matching a query are found, but they are displayed in "chunks" of 10 to 25 records, with the pages the engine rates as most relevant shown first.
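Displaying hits in such "chunks" is plain pagination over the ranked result list; a sketch, assuming the results are already sorted by the engine's relevance score:

```python
def paginate(results, page, per_page=10):
    """Return one screenful of ranked results; page numbers start at 1."""
    start = (page - 1) * per_page
    return results[start:start + per_page]
```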
In this light, the growing interest of Web-site developers in search services is understandable: such services can account for up to 40%, and in some cases up to 70%, of visits to a site.
Using search engines to promote a Web site cannot guarantee success unless the developer takes into account a number of subtleties of the procedure. For one thing, it is not always obvious which search engines matter most for increasing traffic; only an analysis of visit statistics after registration can answer that question conclusively. In addition, to make your site practically, and not just theoretically, reachable from the response list for a given query, you must allow for the peculiarities of each individual service.
The existence of special search tools makes it easier to find information in the Russian-language part of the Internet. Their principle of operation resembles that of traditional databases: in response to a keyword, a list of documents containing the desired concept is displayed. These systems are, in effect, databases of words, replenished by periodically scanning the contents of Internet servers. Using special robot programs, search engines regularly scan the Internet, recording both newly appeared and updated resources and deleting information about resources that no longer exist. This colossal material, with links to where each word occurs, is stored as giant index files that search engines consult for each query.
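The "giant index files with links to where each word occurs" are what is usually called an inverted index: a mapping from words to the documents that contain them. A toy version, assuming documents are plain strings keyed by id:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """docs: {doc_id: text}. Returns {word: set of doc_ids containing it}."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index
```

Answering a one-word query is then a single lookup in `index`, which is why the approach scales to huge collections.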
The advantages and disadvantages of a search engine are determined by several characteristics. The principal one is how completely the system examines documents: whether all words are entered into the index files, or only terms from titles, headings, the first few lines or pages of text, and so on, and how well the results are ranked by their correspondence to the query. Not the least role is played by the simplicity and convenience of the interface, support for Boolean operators (the operators of mathematical logic) and operators on the distance between words in the text of a document, as well as additional services, for example, searching for news, music files, goods, and the like.
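Boolean operators map naturally onto set operations over an inverted index; a sketch, assuming the index maps words to sets of document ids:

```python
def boolean_and(index, *words):
    """Documents containing every word (the AND operator)."""
    sets = [index.get(w, set()) for w in words]
    return set.intersection(*sets) if sets else set()

def boolean_or(index, *words):
    """Documents containing at least one of the words (the OR operator)."""
    result = set()
    for w in words:
        result |= index.get(w, set())
    return result
```

Distance (proximity) operators require a richer index that also stores word positions within each document, which is why not every engine supports them.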
The service an information-retrieval system provides includes text preprocessing, that is, compiling the index that is then searched. Such a system can be organized as a database with text fields. Another option is to work with external texts: the texts keep their original form, remaining files in a file system, pages on a server, or fields of some other database, and the index holds only links to the corresponding sources.
Working with search tools requires a certain amount of experience and skill from the user.