Search engines may look uncomplicated from the outside: a simple box that you type your search query into. Behind the scenes, however, it is a whole other story. 'Complex' doesn't begin to cover the intricacies of how search works.
Imagine a giant cobweb, bigger than anything you can comprehend, and all over its intricate silk threads a family of spiders crawls speedily around, capturing prey and binding it in silk to feed on later.
The process of search begins in a similar way, with virtual spiders crawling all over the web to find content to index, so that relevant pages can later be returned in answer to a typed search query.
The World Wide Web currently consists of more than thirty trillion pages and every minute of every day this number continues to grow.
A search engine such as Google navigates the web using its virtual spiders, which crawl from one page to another by following the links on web pages. These pages are sorted by their content, along with other relevant factors, and stored in the Index, where they stay until a search query determines whether or not they should be returned as search results.
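To make the idea of following links from page to page concrete, here is a minimal sketch of a crawl over a tiny made-up web. The page names and the `TOY_WEB` link map are invented for illustration, and a real crawler would fetch pages over the network rather than read them from a dictionary; this is not how Google's spiders are actually implemented.

```python
from collections import deque

# Hypothetical miniature "web": each page maps to the pages it links to.
TOY_WEB = {
    "a.example/home": ["a.example/about", "b.example/blog"],
    "a.example/about": ["a.example/home"],
    "b.example/blog": ["b.example/post-1", "a.example/home"],
    "b.example/post-1": [],
}


def crawl(start, links):
    """Visit every page reachable from `start` by following links, breadth-first."""
    seen = {start}          # pages already discovered, so none is visited twice
    queue = deque([start])  # pages waiting to be visited
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


crawl_order = crawl("a.example/home", TOY_WEB)
print(crawl_order)
```

The `seen` set is what stops the spider from going round in circles when pages link back to each other.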
Site owners who don't want their pages to be crawled can opt out by blocking those pages in a robots.txt file that resides on the server.
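As a rough illustration of that opt-out, the sketch below uses Python's standard `urllib.robotparser` module to check whether a well-behaved crawler may visit a page. The robots.txt rules, the `ExampleBot` crawler name and the example.com URLs are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url, agent="ExampleBot"):
    """Return True if the rules above allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


print(may_crawl("https://www.example.com/index.html"))
print(may_crawl("https://www.example.com/private/report.html"))
```

Note that robots.txt is a request rather than a lock: it relies on the crawler choosing to check the rules before fetching a page.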
The Index contains over 100 million gigabytes of content. In addition to all the sorted web page content, it includes copy from millions of books from international libraries and a variety of other collaborators.
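One common way to picture an index like this is as an "inverted index", much like the index at the back of a book: each word points to the pages that contain it. The sketch below is a simplified illustration under that assumption, with three invented pages; the source does not describe how Google's Index is actually structured.

```python
from collections import defaultdict

# Hypothetical crawled pages and their text.
PAGES = {
    "page-1": "spiders crawl the web",
    "page-2": "search engines index the web",
    "page-3": "spiders spin silk",
}


def build_index(pages):
    """Map each word to the set of pages in which it appears."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


index = build_index(PAGES)


def lookup(query):
    """Return the pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


print(lookup("spiders web"))
```

Because the lookup goes from word to pages, answering a query does not require rereading every stored page.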
Computer algorithms are the driving force that provides the searcher with the answers they are seeking. An algorithm here is a computer program made of formulas based on over 200 factors, which search for clues to better understand what your search query means and what exactly you are looking for, in order to deliver the best results possible.
Solving these clues results in appropriate documents being pulled from the Index, and the results are then ranked in order of relevance. This ranked list is what the searcher sees when they view their search results in Google or another search engine. All this happens in around 1/8th of a second.
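The idea of combining many factors into a single ranking can be sketched as a weighted score. The three signals, their weights and the candidate pages below are entirely hypothetical and stand in for the 200-plus factors mentioned above; Google's actual formulas are not described in the source.

```python
# Hypothetical ranking signals and how much each one counts towards the score.
WEIGHTS = {"term_match": 0.6, "freshness": 0.3, "link_popularity": 0.1}

# Hypothetical candidate pages pulled from an index, with a 0..1 value per signal.
CANDIDATES = {
    "page-1": {"term_match": 0.9, "freshness": 0.2, "link_popularity": 0.5},
    "page-2": {"term_match": 0.4, "freshness": 0.9, "link_popularity": 0.9},
    "page-3": {"term_match": 0.8, "freshness": 0.8, "link_popularity": 0.1},
}


def score(signals):
    """Combine a page's signals into one number using the weights above."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)


def rank(candidates):
    """Return page ids ordered from highest score to lowest."""
    return sorted(candidates, key=lambda page: score(candidates[page]), reverse=True)


print(rank(CANDIDATES))
```

Changing the weights changes the order of the results, which is one way to think about why adjustments to an algorithm can move a site up or down the rankings.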
There is no option to pay for a high ranking, as Google does not accept payment to improve a site's position in the search results. A site owner who wants to appear prominently without relying on SEO rankings must pay for the privilege using Pay per Click (PPC) advertising, which is completely separate from organic search results.
Google's search algorithms change constantly. The process begins as ideas are sparked in the engineers' minds; these ideas evolve into algorithmic tests. The results are analysed, amended and run again, with the process repeated countless times until relevant search results are achieved. With new content constantly being added to the web and old content becoming outdated, this testing never ends.
The results a Google visitor is presented with after running a search query can take several different forms.
Fighting spam has never been more important, and Google makes it its business to fight spam 24/7 to ensure the search results are relevant and helpful to the searcher.
The majority of spam removal is automatic, triggered by algorithmic detection.
In addition, Google staff examine other questionable documents by hand. If spam is found, manual action is taken, such as a written warning to the site owner, a drop in rankings or a complete ban from the Index.
What Google has removed lately is readily available for visitors to see in real time. These page results include ‘pure spam’: pages deemed to use destructive spam practices such as automatically generated spun content (which reads as nonsensical gibberish to humans), cloaking, and content scraped from websites around the web.
When action is taken against spam, a notification is sent from Google to the website owners to allow them a chance to rectify the problem.
So there you have it: behind the simple-looking search box is a multifaceted web of ever-evolving tests designed to support the more than one hundred billion searches Google receives each and every month.
This overview of ‘How Search Works’ is derived from this fascinating depiction on Google: www.google.com/insidesearch/howsearchworks/thestory/