It’s no secret that SEO is a complicated undertaking. This post is intended to help lay users understand some of the jargon and basic concepts behind search engines and their indexes.
So let’s start with the terms in the title. A search engine spider (also called a crawler) is a piece of software that, at its core, collects links. Google starts with a seed list of roughly 10,000 high-quality sites and pages. The quality of those pages was judged by hand: each one was picked by a human reviewer. Google has since started mixing input from human reviewers into many of its algorithms and quantifying that data. Back to spiders: a spider takes all the internal and external links on those seed pages and follows them as far down the chain of the internet as they go, reaching new sites on new domains. In this way Google can collect a very large portion of the links that exist on the internet, eventually ending up with a giant list of about 50 to 100 billion unique URLs. This is the main way Google discovers and crawls new content and decides what to keep and include in its database, otherwise known as an index. Other additions to Google’s index come from sitemaps submitted through Google Webmaster Tools or from manual submissions, but these are very small in number compared with what the spiders find.
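To make the "follow every link outward from a seed list" idea concrete, here is a minimal sketch of a breadth-first crawler in Python. It is not how Google’s spider is actually built. To keep it runnable without network access, the hypothetical `get_links` function stands in for "download the page and pull out its links", and `toy_web` is a made-up four-page web on reserved `.example` domains.

```python
from collections import deque


def crawl(seeds, get_links, max_pages=1000):
    """Breadth-first crawl starting from a list of seed URLs.

    get_links is a hypothetical callable standing in for "fetch the page
    and extract its links"; a real spider would download HTML here.
    Returns the set of unique URLs discovered.
    """
    seen = set(seeds)          # every unique URL discovered so far
    frontier = deque(seeds)    # URLs waiting to be visited, oldest first
    visited = 0
    while frontier and visited < max_pages:
        url = frontier.popleft()
        visited += 1
        for link in get_links(url):
            if link not in seen:   # only queue URLs we have not seen yet
                seen.add(link)
                frontier.append(link)
    return seen


# A made-up web: each URL maps to the links found on that page.
toy_web = {
    "https://seed.example/": ["https://seed.example/about", "https://other.example/"],
    "https://seed.example/about": ["https://seed.example/"],
    "https://other.example/": ["https://third.example/page"],
    "https://third.example/page": [],
}

found = crawl(["https://seed.example/"], lambda url: toy_web.get(url, []))
print(sorted(found))
```

Starting from one seed page, the crawler reaches a page on a third domain it was never told about, which is the same reason a modest seed list can grow into billions of URLs. The `seen` set keeps the crawler from revisiting pages that link back to each other, and `max_pages` is an arbitrary safety cap.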
Now on to indexes. While Google is crawling through all these pages, it is making a fairly accurate copy of that portion of the internet, including pages and content such as Flash, videos, audio, PDFs, and much more. This copy of the internet is known as an index. As you can imagine, keeping the index fresh and useful is a lot of work: just think how much the interlinking and content on Facebook.com alone can grow and change in a single day. Google is a mammoth machine using some very sophisticated and exciting technology to deliver relevant, on-topic results for its users’ search queries, and this copy of the internet, the index, is what it runs its calculations and analysis on to determine which pages should rank and in what position.
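A stored copy of billions of pages is only useful if it can be searched quickly. One common textbook technique for that (an assumption here, not a description of Google’s internal design) is an inverted index, which maps each word to the set of pages that contain it. The sketch below builds one from a couple of made-up pages; `build_index`, `lookup`, and the sample URLs are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """pages: dict of url -> stored text of that page.

    Returns an inverted index: word -> set of URLs containing that word.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def lookup(index, query):
    """Return the URLs that contain every word in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))  # copy so the index is not modified
    for term in terms[1:]:
        results &= index.get(term, set())      # keep pages that have this word too
    return results


pages = {
    "https://a.example/": "Search engine spiders collect links",
    "https://b.example/": "An index is a copy of pages collected by spiders",
}
index = build_index(pages)
print(lookup(index, "spiders links"))
```

The point of the design is that answering a query means looking up a few words rather than rereading every stored page. Real search indexes add much more (word positions, stemming, compression), all of which is left out of this sketch.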
Using roughly 200 unique factors, Google determines which pages should rank for a query and in what position. It searches its giant index of billions of documents, then displays the relevant results to the user on what is known as a Search Engine Results Page (SERP), and it does all of this in less than half a second. As I mentioned, Google is a mammoth machine with a very noble purpose: to organize the world’s information and make it accessible and useful. There is much more to the process, but that’s the gist of it for now.
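Google does not publish its ranking factors or how they are weighted, so the following is only a toy illustration of the general idea of combining several signals into one score and sorting by it. The three factor names, their 0-to-1 values, and the weights are all invented for this example.

```python
def rank(candidates, weights):
    """Order candidate pages best-first by a weighted sum of factors.

    candidates: dict of url -> dict of factor name -> value between 0 and 1
    weights: dict of factor name -> weight (hypothetical values)
    A factor missing from a page counts as 0.
    """
    def score(url):
        factors = candidates[url]
        return sum(weight * factors.get(name, 0.0) for name, weight in weights.items())

    return sorted(candidates, key=score, reverse=True)


# Invented factors and weights, purely for illustration.
weights = {"relevance": 0.6, "inbound_links": 0.3, "freshness": 0.1}
candidates = {
    "https://a.example/": {"relevance": 0.9, "inbound_links": 0.1, "freshness": 0.5},
    "https://b.example/": {"relevance": 0.5, "inbound_links": 0.9, "freshness": 0.9},
    "https://c.example/": {"relevance": 0.2, "inbound_links": 0.1, "freshness": 1.0},
}
serp = rank(candidates, weights)
print(serp)
```

With these made-up numbers, page b edges out page a because its strong link signal outweighs a’s higher relevance score, which shows how no single factor decides a position on its own.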
As always we are open to feedback, suggestions, and comments. Let us know if there is something specific you would like us to write about or if you have common problems in your online marketing and SEO efforts. Thanks and have a great day!