Thursday 27 November 2008

What page errors? Soft vs. hard 404s

One of the websites I look after demonstrated its amazing crapness to me the other day: I discovered that the supplier who built and runs it for me hadn't configured it to return hard 404s. They'd used soft 404s instead.
  • Hard 404 - tells your web browser that the page you've tried to access does not exist. More precisely, it tells the browser that the server (website) exists but couldn't find the page you wanted. More on 404s here.
  • Soft 404 - the page you're trying to access still doesn't exist, but instead of saying so the server returns a 200 ("OK") response code. It may well show you, the user, an error page similar to a hard 404, but it might equally just redirect you to the homepage, so you don't actually know whether the page is broken or not. More importantly, it doesn't tell your browser that the page doesn't exist.
[Image: T-shirt with the 404 error code on it. Caption: Do geekier t-shirts exist?]
Now replace 'browser' with 'search engine spider' in the above definitions and hopefully you can see the problem: with soft 404s instead of hard 404s, search engines will just assume that all these dead pages exist (or just get confused, not crawl you properly and ruin your search rankings).

If, on the other hand, you return hard 404s, search engine spiders will know that the page doesn't exist, won't index it and will move on. There's also no reason why a hard 404 response can't show a nice pretty page explaining this to humans - such as Google's 404 page.
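
To make that concrete, here's a rough sketch of a hard 404 that still shows humans a friendly page. It uses Python's built-in http.server purely for illustration; my site doesn't run on this, and the PAGES dictionary, the HTML and the make_server helper are all made up for the example. The point is just that the status code and the page body are two separate things.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content for the example: path -> HTML body.
PAGES = {"/": "<h1>Home</h1>"}

# A friendly error page for humans (made-up wording).
NOT_FOUND_HTML = (
    "<h1>Sorry, that page doesn't exist</h1>"
    "<p><a href='/'>Back to the home page</a></p>"
)


class HardNotFoundHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGES.get(self.path)
        if body is None:
            # Hard 404: pretty page for people, 404 status for browsers and spiders.
            # A soft 404 would send this same page with status 200 instead.
            self._send(404, NOT_FOUND_HTML)
        else:
            self._send(200, body)

    def _send(self, status, html):
        payload = html.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the example quiet; remove to see request logs


def make_server(port=8000):
    """Build (but don't start) a server bound to localhost on the given port."""
    return HTTPServer(("127.0.0.1", port), HardNotFoundHandler)
```

To try it, call make_server().serve_forever() and visit a made-up path on http://127.0.0.1:8000/ - you should see the friendly page in the browser while the response itself still carries the 404 status.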

Now back to my example. Google had reached a page on my site that didn't exist (due to a link to a dud page on another website) and, instead of ignoring it, had indexed it, so it was appearing as the first hit on Google for my site name. Users clicking on this link went straight to an error page - not good.

With the soft 404s changed to hard 404s, Google won't index this page, and my home page is the top hit on Google for my site name - better.

Not sure what your site returns? Think of a page that doesn't exist on your site, e.g. www.mysite.com/thispageobviouslydoesntexist.html, and enter it into this handy online tool that will tell you the HTTP status code (that's the bit that says 404 or 200).
UPDATE: If the above tool doesn't work try this one from gsitecrawler.com.
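
If you'd rather check from your own machine, something like the following Python sketch should do the same job as those tools. It isn't what either tool actually runs; status_of and describe are names I've made up, and it only uses the standard library. It deliberately doesn't follow redirects, so a site that bounces dead pages to the homepage shows up as a 3xx rather than a 200.

```python
import http.client
from urllib.parse import urlsplit


def status_of(url, timeout=10):
    """Return the raw HTTP status code for a GET on url, without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()


def describe(status):
    """Interpret the status returned for a page you know doesn't exist."""
    if status == 404:
        return "hard 404: the server says the page doesn't exist"
    if status == 200:
        return "soft 404: the server claims a missing page is fine"
    if 300 <= status < 400:
        return "redirect: the missing page sends visitors somewhere else"
    return "something else: status %d" % status
```

For example, print(describe(status_of("http://www.mysite.com/thispageobviouslydoesntexist.html"))), with your own site's address swapped in, tells you which camp your site falls into. Note the URL needs the http:// or https:// prefix.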

Never use soft 404s. It's the first thing I check on any website build. It's amazing how many sites, such as Wikipedia, still use soft 404s. Maybe there is a good reason for it? Would somebody care to tell me?

Useful link - How to make a snazzy 404 page that's good for SEO
