Not much: an “indexed” page is a page that has been crawled by a search engine spider and filed away in that search engine’s index for later use (note the word “later”), and a “cached” page is one that might actually show up in search results.
Apparently, getting your submitted pages listed in the SERPs is a multi-step process. First, the spiders need to access your home page and crawl whatever links they find there. Whether they crawl, when they crawl, and how deep they crawl all depend on a number of factors, but let’s assume you have an easily accessible site rich with content and that the spiders have crawled every page. Congratulations! The next step is for them to decide whether or not to put the information they find into their index. Now assume they like your content well enough and decide to index all of it. Great! At this point, you can say that your pages have been indexed.
So what? Big deal. By this I mean it’s good to be indexed, but it’s not enough because this part of the index is not offered to the general search public. That is, your pages have not been cached, but all of the textual information from your pages (URLs, titles, tags, snippets) has been indexed. Of course, this textual information has been cataloged, categorized, and filed (indexed!) based on the keywords you chose when you built your pages and on the relative focus and interaction of those keywords on those pages, and it has been ranked accordingly, but this part of the index is not meant for the SERPs and is not accessible to surfers. This is web page purgatory. La-la land. The Google “sandbox”. It’s a way station for lonely, unused web content. A word bank from which search engines can withdraw money. Okay, no more metaphors. The point is that the search engines will not offer this content to web surfers at this stage. Google may tell you, “We have just indexed your pages!” Great, but they will NOT show up in search engine results. They may eventually arrive, but not yet.
The last step in the process comes when the search engines decide that your information (which they have already indexed and so have access to) might actually be of use to someone. When that happens, they will take a “snapshot” of the page (save or download it) and store the file away in their index of cached web pages. This is a completely different index, or more accurately, a subset of the original index, but it is safe to say that these cached pages qualify for inclusion in the SERPs, and now your potential customers may be able to find you. The search engine gods have gone one step beyond indexing your content and have now indexed the actual pages, which they will present to web surfers in their full and complete glory. It’s really just a matter of semantics. Your content has been indexed the whole time, but that doesn’t guarantee inclusion in the SERPs. Only further indexing (caching) will do that.
So, a search engine could have all 1,000 of your pages “indexed” but only 50 or so “cached.” This explains the strange numbers you get when you do a site search (site:yourdomain.com) and notice that the number at the top differs, often drastically, from the actual results on the page. This is to be expected because there may be several pages on your site which the spiders have crawled and indexed, but which the search engines did not find meaningful or useful enough to cache. You may even agree with them. A “thank you for ordering” page may get indexed (spiders are not overly picky eaters), but it probably shouldn’t get cached and it probably won’t. But the search engines figure (or were told to figure) that this page may become useful someday and so keep it in the index uncached. They will likely revisit the page on subsequent crawls to see if any changes have been made and to reevaluate the situation.
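To make the indexed-versus-cached bookkeeping concrete, here is a toy Python sketch of the three stages described above. It is only an illustration of this article's model, not how any real search engine stores its data: the Stage and Page names, the summarize helper, and the page URLs are all made up for the example, and the 1,000/50 split simply mirrors the numbers used above.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """Hypothetical stages, following the article's description."""
    CRAWLED = 1  # a spider has fetched the page
    INDEXED = 2  # its text is filed in the index, but not offered to searchers
    CACHED = 3   # a snapshot is stored; the page qualifies for the SERPs


@dataclass
class Page:
    url: str
    stage: Stage


def summarize(pages):
    """Return (indexed_count, cached_count); cached pages also count as indexed."""
    indexed = sum(1 for p in pages if p.stage.value >= Stage.INDEXED.value)
    cached = sum(1 for p in pages if p.stage is Stage.CACHED)
    return indexed, cached


# Made-up site: 1,000 indexed pages, of which the first 50 are also cached.
site = [
    Page(f"https://yourdomain.com/page-{i}",
         Stage.CACHED if i < 50 else Stage.INDEXED)
    for i in range(1000)
]

indexed, cached = summarize(site)
print(f"indexed: {indexed}, cached: {cached}")
```

In this model the cached pages are a subset of the indexed ones, which is why the two counts can differ so sharply for the same site.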
Sometimes a search engine will just stubbornly refuse to cache a page in their index which you know for a fact is very useful and which you know your customers would just love