301 Redirect: How Do I Love You, Let Me Count the Ways presented by Stephan Spencer, Founder & President, Netconcepts
Time To Drink From the Firehose! • No need to take furious notes though. (Phew!) • Download this Powerpoint right now from www.netconcepts.com/learn/301-redirect.ppt
Let’s Go Under the Hood with 301s • In .htaccess (or httpd.conf), you can redirect individual URLs, the contents of directories, or entire domains: • Redirect 301 /old_url.htm http://www.example.com/new_url.htm • Redirect 301 /old_dir/ http://www.example.com/new_dir/ • Redirect 301 / http://www.example.com/ • (Keep the trailing slashes consistent: whatever follows the matched prefix is appended to the target) • Pattern matching can be done with RedirectMatch 301 • RedirectMatch 301 ^/(.+)/index\.html$ http://www.example.com/$1/
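A RedirectMatch pattern can be sanity-checked offline before it goes live. The sketch below uses Python's re module as a stand-in for Apache's regex engine (an assumption: the two agree on simple patterns like this one). redirect_match is a hypothetical helper written for this illustration, not part of Apache.

```python
import re

def redirect_match(pattern, target, path):
    """Approximate RedirectMatch: return the Location URL, or None if no match."""
    m = re.search(pattern, path)
    if m is None:
        return None
    # Apache writes backreferences as $1; Python's expand() expects \1.
    template = re.sub(r"\$(\d)", r"\\\1", target)
    return m.expand(template)

rule = (r"^/(.+)/index\.html$", "http://www.example.com/$1/")
print(redirect_match(*rule, "/blah/index.html"))  # http://www.example.com/blah/
print(redirect_match(*rule, "/index.html"))       # None: (.+) needs a directory
```

Note that the root /index.html is not covered by this pattern, so it would need a rule of its own.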
301 Redirects via Rewrite Rules • My preference is to use Apache’s mod_rewrite module and set up rewrite rules that use the [R=301] flag. Or, if you’re on Microsoft IIS, use the ISAPI_Rewrite plugin. • The rewrite rules go in either .htaccess or your Apache config file (e.g. httpd.conf, sites_conf/…) • Precede all the rewrite rules with the line “RewriteEngine on” • If within .htaccess, also add the line “RewriteBase /” (never add it to the server config). In .htaccess the leading slash is stripped from the path before matching, so rules start with “^” rather than “^/”
An Example Rewrite Rule • A simple example for httpd.conf • RewriteRule ^(.*)/index\.html$ $1/ [R=301,L] • Capture part of the match with ( ), then reference it as $1. In httpd.conf the captured path already starts with “/” • A rough equivalent for .htaccess • RewriteBase / • RewriteRule ^(.*)/?index\.html$ /$1/ [R=301,L] • Ah, but there’s an error with the rule immediately above. Hint: “.*” is “greedy”
The Magic of Regular Expressions • You need to become a master of pattern matching • * means 0 or more of the immediately preceding character (or group) • + means 1 or more of the immediately preceding character • ? means 0 or 1 occurrence of the immediately preceding character • ^ means the beginning of the string, $ means the end of it • . means any character (i.e. wildcard) • \ “escapes” the character that follows, e.g. \. means a literal dot • [ ] is for character classes and ranges, e.g. [A-Za-z] • ^ inside [ ] brackets means “not”, e.g. [^/]
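Each bullet above can be tried out in a few lines. This is a minimal sketch in Python's re module, which is assumed here to behave like Apache's regex engine for these basic operators. The sample strings are made up for illustration.

```python
import re

# (pattern, text, should_match) triples, one per bullet above.
cases = [
    (r"ab*c", "ac", True),          # * : zero or more of the preceding "b"
    (r"ab+c", "ac", False),         # + : at least one "b" required
    (r"ab?c", "abbc", False),       # ? : at most one "b"
    (r"^/old", "/new/old", False),  # ^ : must start at the beginning
    (r"\.htm$", "a.html", False),   # $ : must end right after ".htm"
    (r"a.c", "abc", True),          # . : any single character
    (r"a\.c", "abc", False),        # \. : a literal dot only
    (r"[A-Za-z]+", "Blah", True),   # [ ] : a character range
    (r"^[^/]+$", "a/b", False),     # [^ ] : anything except "/"
]

def matches(pattern, text):
    """True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

for pattern, text, expected in cases:
    assert matches(pattern, text) == expected
```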
Regular Expression Errors • It’s incredibly easy to make errors in regular expressions • When debugging, RewriteLog and RewriteLogLevel (4+) are your friends! (Apache 2.4 replaces both with “LogLevel rewrite:trace4”) • Back to the previous example... • RewriteRule ^(.*)/?index\.html$ /$1/ [L,R=301] • What’s the problem? .* is greedy, so it captures the trailing “/” in $1 and the optional /? matches nothing • http://www.example.com/blah/index.html redirects to http://www.example.com/blah//
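The double slash is easy to reproduce outside Apache. In this sketch, Python's re module stands in for mod_rewrite's matching (an assumption that holds for a pattern this simple). rewrite is a hypothetical helper that applies one pattern and one substitution to a .htaccess-style path with no leading slash.

```python
import re

def rewrite(pattern, substitution, path):
    """Return the rewritten path, or None when the pattern does not match."""
    m = re.search(pattern, path)
    if m is None:
        return None
    # Translate Apache's $1 into Python's \1 before expanding.
    return m.expand(re.sub(r"\$(\d)", r"\\\1", substitution))

greedy = r"^(.*)/?index\.html$"
print(re.search(greedy, "blah/index.html").group(1))  # blah/
print(rewrite(greedy, "/$1/", "blah/index.html"))     # /blah//
```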
Regular Expression Gotchas • “Greedy” expressions. Use a negated class like [^/] or the lazy .*? instead of .* • e.g. [^/]+/[^/] instead of .*/.* • e.g. ^(.*?)/ instead of ^(.*)/ • .* can match on nothing. Use .+ instead • e.g. .+/ instead of .*/ • Unintentional substring matches because ^ or $ wasn’t specified, or because . was used for a dot instead of \. • e.g. ^/default\.htm$ instead of /default.htm
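The three gotchas can be checked side by side. This is a rough sketch using Python's re module, assumed equivalent to Apache's engine for these constructs. The test strings are invented for the example.

```python
import re

path = "blah/index.html"

# Lazy .*? and the negated class [^/]+ both stop before the slash.
lazy = re.search(r"^(.*?)/?index\.html$", path).group(1)
negated = re.search(r"^([^/]+)/index\.html$", path).group(1)

# .* can match nothing, so ".*/" accepts a bare "/"; ".+/" does not.
star_accepts_bare_slash = re.search(r"^.*/$", "/") is not None
plus_accepts_bare_slash = re.search(r"^.+/$", "/") is not None

# An unanchored pattern with an unescaped dot matches more than intended.
loose = re.search(r"/default.htm", "/x/defaultxhtml") is not None
strict = re.search(r"^/default\.htm$", "/x/defaultxhtml") is not None

print(lazy, negated)                                     # blah blah
print(star_accepts_bare_slash, plus_accepts_bare_slash)  # True False
print(loose, strict)                                     # True False
```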
Let’s Go Deeper Down the Rabbit Hole • A more complex example • RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC] • RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301] • [NC] flag makes the rewrite condition case-insensitive • [L] flag stops processing at this rule, which saves on server processing • [QSA] flag not needed. The original query string is passed through by default when the destination has no query string of its own. If you don’t want the query string maintained, put ? at the end of the destination URL in the rule.
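The host check above boils down to a small decision. Here it is as a hedged Python sketch: canonical_redirect is a hypothetical function, the path is assumed to arrive the way .htaccess sees it, and ports in the Host header are ignored for brevity.

```python
import re

CANONICAL_HOST = "www.example.com"

def canonical_redirect(host, path, query=""):
    """Return the 301 target for a non-canonical host, else None."""
    # Mirrors the RewriteCond: anything but www.example.com, case-insensitive.
    if re.search(r"^www\.example\.com$", host, re.IGNORECASE):
        return None
    target = f"http://{CANONICAL_HOST}/{path.lstrip('/')}"
    # With no "?" in the destination, the original query string is kept.
    return f"{target}?{query}" if query else target

print(canonical_redirect("example.com", "shoes.html", "color=red"))
print(canonical_redirect("WWW.Example.com", "shoes.html"))
```

The first call returns http://www.example.com/shoes.html?color=red, and the second returns None because the host already matches once case is ignored.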
Speaking of Tracking Parameters • Here’s how to 301 static URLs that have a tracking param appended to their canonical equivalents (minus the param) • RewriteCond %{QUERY_STRING} ^source=[a-z0-9]*$ • RewriteRule ^(.*)$ /$1? [L,R=301] • And for dynamic URLs... • RewriteCond %{QUERY_STRING} ^(.+)&source=[a-z0-9]+(&?.*)$ • RewriteRule ^(.*)$ /$1?%1%2 [L,R=301]
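To see what %1 and %2 capture, the two rule pairs can be approximated in Python. This is a sketch only: strip_source is a hypothetical name, Python's re is assumed to match the way Apache does here, and the sample URLs are invented.

```python
import re

def strip_source(path, query):
    """Return the canonical URL to 301 to, or None if neither rule applies."""
    # Static URL: the tracking parameter is the whole query string.
    if re.search(r"^source=[a-z0-9]*$", query):
        return f"/{path}"
    # Dynamic URL: keep what surrounds the tracking parameter (%1 and %2).
    m = re.search(r"^(.+)&source=[a-z0-9]+(&?.*)$", query)
    if m:
        return f"/{path}?{m.group(1)}{m.group(2)}"
    return None

print(strip_source("shoes.html", "source=email1"))        # /shoes.html
print(strip_source("item.php", "id=5&source=abc&x=1"))    # /item.php?id=5&x=1
print(strip_source("item.php", "source=abc&id=5"))        # None
```

The last case shows that a tracking parameter placed first, ahead of other parameters, is not caught by either rule as written and would need a third condition.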
More Fun with Tracking Parameters • Need to do some fancy stuff with cookies before 301ing? Invoke a script that cookies the user and then 301s them to the canonical URL. • RewriteCond %{QUERY_STRING} ^source=([a-z0-9]*)$ • RewriteRule ^(.*)$ /cookiefirst.php?source=%1&dest=$1 [L] • Note the lack of an R=301 flag above. That’s on purpose. No need to expose this script to the user. Use a rewrite and let the script send the 301 after it’s done its work.
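The slide's script is PHP and its contents aren't shown. As a rough idea of what such a script has to do, here is a minimal WSGI sketch in Python. Everything in it is an assumption: the function name, the cookie name, and the handling of dest are illustrative and not taken from cookiefirst.php.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs

def cookie_then_redirect(environ, start_response):
    """WSGI app: remember the tracking source in a cookie, then send the 301."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    source = params.get("source", [""])[0]
    # Strip leading slashes so "dest" can only point at a path on this site.
    dest = params.get("dest", [""])[0].lstrip("/")

    cookie = SimpleCookie()
    cookie["source"] = source
    cookie["source"]["path"] = "/"

    start_response("301 Moved Permanently", [
        ("Location", "/" + dest),
        ("Set-Cookie", cookie["source"].OutputString()),
    ])
    return [b""]
```

Because the rewrite rule has no R flag, the visitor never sees the script's URL, only the 301 it sends to the canonical address.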
301 Retired Legacy URLs • Got legacy dynamic URLs you’re trying to phase out after switching to static URLs? 301 them... • RewriteCond %{QUERY_STRING} id=([0-9]+) • RewriteRule ^get_product\.php$ /products/%1.html? [L,R=301] • Switching to keyword URLs and the script can’t do anything with the keywords if passed as params? Use RewriteMap (declared in the server config, not .htaccess) with a lookup table stored as a text file. • RewriteMap prodmap txt:/home/someusername/prodmap.txt • RewriteRule ^/product/([0-9]+)$ ${prodmap:$1} [L,R=301]
301 Retired Legacy URLs • What would the lookup table for the above rule look like? • 1001 /products/canon-g10-digital-camera • 1002 /products/128-gig-ipod-classic • DBM files are supported too, and are faster than a text file. • Or you could use a script that takes the requested input and returns its corresponding output. • RewriteMap prodmap prg:/home/someusername/mapscript.pl • RewriteRule ^/product/([0-9]+)$ ${prodmap:$1} [L,R=301]
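The slide's map script is Perl and isn't shown, so the following is a stand-in written in Python. It sketches both halves: parsing the text lookup table, and the prg: contract of reading one key per line and answering one line per key, with the literal string NULL when there is no match. load_map and serve are hypothetical names.

```python
import io

def load_map(text):
    """Parse 'key value' lines; blank lines and # comments are skipped."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, value = line.split(None, 1)
        table[key] = value
    return table

def serve(table, stdin, stdout):
    """prg:-style loop: one key per input line, one answer per output line."""
    for line in stdin:
        # mod_rewrite treats the literal string NULL as "no match".
        stdout.write(table.get(line.strip(), "NULL") + "\n")
        stdout.flush()  # answers must not sit in a buffer

PRODMAP = load_map("""
1001 /products/canon-g10-digital-camera
1002 /products/128-gig-ipod-classic
""")

out = io.StringIO()
serve(PRODMAP, io.StringIO("1001\n9999\n"), out)
print(out.getvalue(), end="")
```

In a real deployment the loop would read from sys.stdin and write to sys.stdout. The in-memory streams here just make the behavior easy to inspect.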
Canonicalization • Non-www and typo domains • (The example mentioned earlier...) • RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC] • RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301] • HTTPS • (If you have a separate secure server, you can skip this first line) • RewriteCond %{HTTPS} on • RewriteRule ^catalog/(.*) http://www.example.com/catalog/$1 [L,R=301]
Canonicalization • If the trailing slash is missing, add it • RewriteRule ^(.*[^/])$ /$1/ [L,R=301] • WordPress handles this by default. Yay WordPress!
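One thing worth testing with this rule is which URLs it touches. The Python sketch below mimics the pattern and adds a guard for real files. The guard is an assumption for illustration, corresponding to what a RewriteCond %{REQUEST_FILENAME} !-f line would do in Apache. add_trailing_slash is a hypothetical helper.

```python
import re

def add_trailing_slash(path, is_file=False):
    """Return the 301 target if the path needs a trailing slash, else None."""
    # Assumed guard: leave real files such as logo.png alone.
    if is_file:
        return None
    m = re.search(r"^(.*[^/])$", path)
    return f"/{m.group(1)}/" if m else None

print(add_trailing_slash("catalog/shoes"))           # /catalog/shoes/
print(add_trailing_slash("catalog/shoes/"))          # None
print(add_trailing_slash("logo.png"))                # /logo.png/
print(add_trailing_slash("logo.png", is_file=True))  # None
```

The third call shows why the guard matters: without it, file URLs pick up a trailing slash too.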
Iterative URL Optimization • When iteratively optimizing a page’s URL, 301 all previous iterations directly to the latest iteration. Don’t daisy chain 301s. • WordPress handles this beautifully, and by default • Tip: Use Netconcepts’ “SEO Title Tag” plugin to mass edit all your permalink post URLs and let WordPress handle the 301s automagically. But don’t then “set it and forget it”. Continue optimizing the URLs iteratively over time to maximize search traffic.
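Avoiding daisy chains is mechanical once the redirects live in a table. This is a minimal Python sketch assuming the old-to-new pairs are kept in a dictionary. flatten_redirects and the sample URLs are made up for the example.

```python
def flatten_redirects(redirects):
    """Point every old URL straight at its final destination."""
    flat = {}
    for start in redirects:
        seen = {start}
        target = redirects[start]
        # Follow the chain until we reach a URL that is not itself redirected.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        flat[start] = target
    return flat

chain = {
    "/p?id=7": "/post-7",
    "/post-7": "/seo-tips",
    "/seo-tips": "/301-redirect-tips",
}
print(flatten_redirects(chain))
```

After flattening, all three old URLs map directly to /301-redirect-tips, so no request needs more than one hop.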
If You’re on Microsoft IIS Server • ISAPI_Rewrite is not that different from mod_rewrite • Rewrite rules go in the httpd.ini file • Precede the first rewrite rule with “[ISAPI_Rewrite]” • Capitalization and IIS’ case insensitivity w.r.t. URLs • RewriteRule (.*) http://www.example.com$1 [I,RP,L] • Non-www and typo domains • RewriteCond Host: (?!www\.example\.com) • RewriteRule (.*) http://www.example.com$1 [I,RP,L]
More IIS Examples • Drop the default • RewriteRule (.*)/default\.htm $1/ [I,RP,L] • Add trailing slash if it’s missing • RewriteCond Host: (.*) • RewriteRule ([^.?]+[^.?/]) http\://$1$2/ [I,RP,L]
Conditional Redirects? • Risky territory! Read “Redirects: Good, Bad & Conditional” • To selectively redirect bots that request URLs with session IDs to the URL sans session ID: • RewriteCond %{QUERY_STRING} PHPSESSID • RewriteCond %{HTTP_USER_AGENT} Googlebot.* [OR] • RewriteCond %{HTTP_USER_AGENT} ^msnbot.* [OR] • RewriteCond %{HTTP_USER_AGENT} Slurp [OR] • RewriteCond %{HTTP_USER_AGENT} Ask\ Jeeves • RewriteRule ^(.*)$ /$1? [R=301,L] • (The trailing ? drops the query string, session ID included; without it the redirect would keep the PHPSESSID and loop) • browscap.ini provides spiders’ user agents
Conditional Redirects Not Necessary • There’s almost always another way (w/o using user agent or IP) • In the above example, simply 301 everybody – bots and humans alike – and stop appending PHPSESSID • See http://yoast.com/phpsessid-url-redirect/ for more on this. • If you have to keep session IDs for functionality reasons, you could use a script to detect whether the session has expired, and 301 the URL to the canonical equivalent if it has. • Matt Cutts will be talking about this topic tomorrow in the “Ask the Search Engines” session. Don’t miss it!
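If a script handles the "301 everybody" approach, it can drop just the session parameter and keep the rest of the query string. This is a hedged Python sketch: strip_session_id is a hypothetical function, and keeping the other parameters is a design choice that goes beyond what a bare trailing ? in a rewrite rule does.

```python
from urllib.parse import parse_qsl, urlencode

def strip_session_id(path, query, param="PHPSESSID"):
    """Return the 301 target without the session parameter, or None."""
    pairs = parse_qsl(query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if k != param]
    if len(kept) == len(pairs):
        return None  # no session ID present, nothing to redirect
    return f"{path}?{urlencode(kept)}" if kept else path

print(strip_session_id("/cart.php", "PHPSESSID=abc123"))             # /cart.php
print(strip_session_id("/item.php", "id=5&PHPSESSID=abc&c=red"))     # /item.php?id=5&c=red
print(strip_session_id("/item.php", "id=5"))                         # None
```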
Capture PageRank on Dead Pages • Traditional approach is to serve up a 404, which drops that obsolete URL out of the index, squandering that URL’s link juice. • But what if you 301 redirect to something valuable (e.g. your home page or the category page one level up) and dynamically include a small error notice? • Or return a 200 status code instead, so that the spiders follow the links on the error page? Then include a meta robots noindex so the error page itself doesn’t get indexed. • IMPORTANT: Don’t respond to garbage (nonsense) URLs with anything but a 404 status code. Googlebot looks for this!
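The distinction between retired URLs and garbage URLs can be expressed as a simple lookup. This is a sketch under the assumption that the site keeps a list of formerly valid URLs and where each should now point. respond_to_missing and the sample data are hypothetical.

```python
def respond_to_missing(path, retired):
    """Pick a (status, location) pair for a URL that no longer resolves."""
    if path in retired:
        # Known, formerly valid URL: send visitors and spiders somewhere useful.
        return 301, retired[path]
    # Anything else is treated as garbage and gets a real 404.
    return 404, None

# Hypothetical data: one retired product page mapped to its category page.
RETIRED = {"/products/old-widget.html": "/products/"}

print(respond_to_missing("/products/old-widget.html", RETIRED))  # (301, '/products/')
print(respond_to_missing("/asdf1234qwer", RETIRED))              # (404, None)
```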
Thanks! • This Powerpoint can be downloaded from www.netconcepts.com/learn/301-redirect.ppt • For a 180-minute screencast (including 90 minutes of Q&A) on SEO for large dynamic websites – including transcripts – email seo@netconcepts.com • Questions after the show? Email me at stephan@netconcepts.com