Published in Search Engines on Tuesday, August 17th, 2004
Keeping your urls consistent with a couple of simple re-write rules.
Updated on 19/08/2004
The following are two (of the many) little things that I've learnt over the years trolling the bounty of information that is WebmasterWorld. They're useful, and I've noticed that not everyone seems to apply them...
Search engines sometimes get a little fooled when they find links pointing to both the 'www' and 'non-www' version of websites.
Many people have the experience of finding, for example, that Google gives them a Page Rank of 4 for http://mydomain.com and 5 for http://www.mydomain.com. In addition, the same type of confusion can occur with http://www.mydomain.com/index.html vs. http://www.mydomain.com/.
What to do? Well, luckily the answer is quite simple.
The general consensus is to solve both problems with a 301 (Moved Permanently) response that sends the user to the URL you want to use. The following code, on an Apache server with mod_rewrite enabled, will do the trick:
RewriteEngine on

# =============================================
# This sends all to www. Remove the 'www' to
# send to mydomain.com
# ---------------------------------------------
RewriteCond %{HTTP_HOST} !^www\.mydomain\.com
RewriteRule ^(.*)$ http://www.mydomain.com/$1 [R=301,L]

# ==================================================
# This sends requests for index.html to the root.
# --------------------------------------------------
RewriteRule ^index\.html$ / [R=301,L]
The result is that search engines have only one place to go to find your home page, and any links pointing to the other URLs are credited to the one which you select to use. Clean, simple and consistent.
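One caveat worth sketching: when rules like these live in a per-directory .htaccess file, some setups report a redirect loop, because the internal DirectoryIndex lookup for / can itself be seen as a request for index.html. A commonly used guard is to test the raw request line instead. The following is a minimal sketch, assuming Apache 2.x (for the `\s` regex shorthand), a root-level .htaccess, and the same placeholder domain as above:

```apache
RewriteEngine on

# Assumption: Apache 2.x with mod_rewrite, rules in the site's root .htaccess.
# THE_REQUEST holds the raw request line sent by the client,
# e.g. "GET /index.html HTTP/1.1", and is not altered by internal
# sub-requests such as the DirectoryIndex lookup.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/index\.html[\s?]
RewriteRule ^index\.html$ http://www.mydomain.com/ [R=301,L]
```

With this condition in place, only clients that literally asked for /index.html are redirected; a request for / is served as usual.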
Note that by using RewriteCond %{HTTP_HOST} !^www\.mydomain\.com we are redirecting anything that is not www.mydomain.com to that domain. If you have joe.mydomain.com type sub-domains, you may want to consider using RewriteCond %{HTTP_HOST} ^mydomain\.com instead, which redirects only the non-www version. That is:

RewriteEngine on
RewriteCond %{HTTP_HOST} ^mydomain\.com
RewriteRule ^(.*)$ http://www.mydomain.com/$1 [R=301,L]
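For readers who prefer the bare domain, the same idea can be turned around. This is a sketch only, using the placeholder domain from the examples above and assuming the rules sit in the site's root .htaccess; the `[NC]` flag (case-insensitive match) and the trailing `$` anchor are additions that are not in the original rules:

```apache
RewriteEngine on

# Redirect only www.mydomain.com to the bare domain.
# Other sub-domains (joe.mydomain.com, etc.) are left alone.
RewriteCond %{HTTP_HOST} ^www\.mydomain\.com$ [NC]
RewriteRule ^(.*)$ http://mydomain.com/$1 [R=301,L]
```

Whichever direction you choose, use only one of the two rule sets on a given site; enabling both would bounce visitors back and forth.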
Comments and Feedback
Finally, someone took the lead in this thing and wrote it up. This applies to more things though. Like creating directories for weblog posts with an 'index.php' inside (and linking to both, depending if you start from the mainpage or the Atom feed), all WordPress weblogs with posts retrievable in both the directory, /post/, and file, /post, way. Et cetera.
However, this is a great first step. I prefer no-www myself, but consistency is key.
Good information. I always wondered how to get around the www or index.html.
What would the referring log show up as for it?
Hey Blake, here's the result of a quick test on my local laptop server:
Glad you both found this useful/relevant!
This is so easy yet so many big, big websites have issues between "www." and no "www.", especially ".co.uk" domains. I see no need for the "www." prefix now that http:// is synonymous with web documents (as opposed to FTP traffic, etc.). Anyway, nice to see this documented at a high profile site.
just one question though: what's the purpose of
RewriteCond %{HTTP_HOST} .
unless i'm missing something, it just tests whether HTTP_HOST is made up of any single character?
Thanks Patrick, all fixed. Funny how the eye misses these things...
Hmmmm, I wonder who wrote it first?
;-)
Ha, I do believe that I e-mailed you about this before you used it on your sites ;-P. Maybe you missed it 'cause I never did hear back from you on that one...
Anyway, I didn't realize that someone had to take the lead, or I would have published it sooner!
You emailed me? *cough* Must've missed it *cough* You did guide me a bit, but I had the drive and determination to study for weeks before finding the correct way of doing it and writing the most concisely thorough article on the subject the world has ever seen. So I demand that you delete this entry along with all comments so that I get my just due.
...
To everyone who doesn't know me, yes I am joking. Mike and I be cool.
All good... I read your article and liked it better; a different tone, one that comes with age and experience ;-]
Another option for enforcing www consistency, particularly if you are already running multiple sites, is to use a separate virtual host. For instance:
<VirtualHost *>
ServerName example.tld
Redirect permanent / http://www.example.tld/
</VirtualHost>
(Sorry for the double-line spacing, but your comment form insists that pre is an illegal tag... despite the fact that an existing comment uses it! And since it won't take br either, even in XHTML form, I had to make do with paragraphs.)
The same technique can be used to remove the www.
This has the advantage that it doesn't require mod_rewrite. On the other hand, it can't take care of extra index.html links.
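As a sketch of the reverse direction mentioned above, the virtual host can be keyed on the www name and redirect to the bare one. This assumes the same placeholder domain as the example in this comment and mod_alias being available; on current Apache versions the opening tag is usually written with an explicit port, such as `*:80`:

```apache
# Catches requests for www.example.tld and sends them,
# path included, to the bare domain with a 301.
<VirtualHost *>
    ServerName www.example.tld
    Redirect permanent / http://example.tld/
</VirtualHost>
```

The main site then lives in its own virtual host with `ServerName example.tld`.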
One last note: on new sites, you can prevent duplicate index.html entries by making sure you never use index.html in a link. Absolute links are easy, and for relative links within a directory, you can use <a href="./">...</a>. If search engines (and visitors making their own links) never see a link straight to the file, and it hasn't been manually submitted, they'll never even look at the index.html location.
Great advice Kelson, and sorry about the pre bit. I'll admit I cheated and added it right into the db. I was going to 'allow' pre and some other things today but haven't gotten around to it.
That last bit is most certainly true. Don't let 'em know it exists and they can't post to it...