Smart Web Center - Basic Search Engine Tutorial
Search Engine Basics
The purpose of this quick tutorial for beginners is to explain, briefly and simply, how search engines work.

Simple Definition of a Search Engine

One of the ways that people find what they are looking for on the Internet is by using search engines. Search engines are websites that catalog other websites. Visitors to a search engine type in one or more words that relate to what they are looking for, and the search engine gives them a list of websites that contain those words.

How Search Engines Work

Search engines use computer programs called robots to collect information for their indexes. Search engine robots are also called spiders, web crawlers, worms or bots. A robot is designed to follow a link to a web page, read and analyze the text it finds there, copy all or some of the text, record the website address, find the links on the page and follow them to other web pages. The robot returns the information it gathers to the search engine, where it is added to the search engine's collection of information.

Some search engine robots are very smart and are designed to collect information only under certain conditions. All search engine robots are designed to travel the web by finding links on web pages and following them to other web pages.

Back at the search engine, the collected text and web page information is catalogued and indexed. When a visitor comes to the search engine and types in a word, the search engine looks in its catalog and finds the pages that match the word. The visitor is given a list of web pages containing that word, each with a short description and a link to the web page.

The words that visitors type into a search engine are called keywords. They are called keywords because the search engine doesn't index words like "a", "the", "at" and so on.
It only indexes words that are useful for discovering information. Visitors can type in more than one word to find information on a search engine. The group of words is called a "keyword search phrase."

Keyword Relevance and Density

When a web page is indexed by a search engine, all of the keywords on the page are counted and each one is assigned a percentage. For example, if a web page contains 100 keywords and the keyword "cat" appears seven times on the page, then "cat" makes up 7% of the total number of keywords. This percentage is called keyword density. In our example, "cat" has a keyword density of 7%.

The search engine sorts its list of web pages by keyword density for a particular keyword or keyword phrase, with the highest-density page at the top and the lowest-density page at the bottom. Pages with higher keyword density are considered more relevant than those with lower keyword density.

Page Ranking

Small, simple search engines use keyword density alone to sort their results. Larger, more sophisticated search engines have collected millions and millions of web pages and added them to their indexes. Any given keyword may match dozens, even hundreds, of web pages that have exactly the same keyword density. For example, there could be 457 web pages with a keyword density of 7% for the keyword "cat". The more sophisticated engines use various procedures to decide how to further sort those 457 pages, and each search engine has its own procedures. They add overall web page relevance, overall website relevance, website quality and website popularity to their sorting process.

Overall Page Relevance

To check overall page relevance, the search engine may cross-reference in many different ways:
Overall Site Relevance

The search engine might check other pages on the website to see if they appear to be about the feline variety of cat, for instance whether there are more images or text about tabby cats, whether the domain name is tabbycats.com and so on. It may check the links to other websites and see what their overall site relevance is. It might count how many pages there are on the entire website.

Website Quality

The search engine's robot may check for things like outdated links that don't work any more, web page code it recognizes as harmful, code for multiple pop-up pages and so on. These are indexed as problems and can be factored into the search engine's ranking method. Some search engines look at the age of the website to see how long it has been online. The search engine may compare the number of links on the website to the amount of text on the website. It may check whether the website is linked to sites that it has banned from its index for spamming or that it has given a low quality rating. It may check whether the website links only to other websites with the same domain name owner. It may also check whether the website links to the same website on every page.

Some search engines compare the changes made to a website over time, looking at how often it adds new content, how often the pages are updated and so on. The search engine may check for duplicate web pages on the website, or for web pages that exactly match web pages on other websites.

Website Popularity

To see if a website is popular, the search engine checks how many sites link to the website and what the relevance, popularity and quality level of those sites are.

Overall Result Ranking

The search engine keeps track of the results it has accumulated about relevancy, quality and popularity. As the search engine robots bring back fresh information, it re-indexes its entire collection, usually on a schedule: once a day, once a week or once a month. It really depends on the individual search engine's procedure.
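
As a rough illustration of the keyword density calculation and the density-based sorting described under Keyword Relevance and Density, here is a minimal Python sketch. It is not how any particular search engine works. The stop-word list, the function names (extract_keywords, keyword_density, rank_pages) and the two sample pages are all assumptions made up for this example.

```python
import re
from collections import Counter

# A tiny, illustrative stop-word list (an assumption for this sketch);
# a real search engine would use a much longer one.
STOP_WORDS = {"a", "an", "the", "at", "and", "of", "on", "in", "is", "to"}


def extract_keywords(text):
    """Lowercase the text, split it into words and drop the stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


def keyword_density(text, keyword):
    """Return the keyword's share of all keywords on the page, in percent."""
    keywords = extract_keywords(text)
    if not keywords:
        return 0.0
    return 100.0 * Counter(keywords)[keyword.lower()] / len(keywords)


def rank_pages(pages, keyword):
    """Return (url, density) pairs sorted from highest to lowest density."""
    scored = [(url, keyword_density(text, keyword)) for url, text in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical pages, invented purely for this example.
pages = {
    "http://example.com/cats": "the cat sat at the window and the cat purred",
    "http://example.com/pets": "a dog and a cat share the garden with a rabbit",
}

for url, density in rank_pages(pages, "cat"):
    print(f"{density:5.1f}%  {url}")
```

In this sketch the first sample page has two occurrences of "cat" among five keywords, so it is listed above the second page, which has one occurrence among six. A page with 100 keywords, seven of them "cat", would get a density of 7%, matching the example in the text. Pages with equal density would need the further relevance, quality and popularity checks described above to be ordered.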