.htaccess tricks

// January 21st, 2009 // General

Tips and tricks for using a .htaccess file on a Unix or Linux server running Apache.

What is the .htaccess file?

It’s a plain text file that normally lives in your main site directory, in a subdirectory, or both. You can have just one for the whole site, a separate one in every directory, or one in a single specific directory.

Any commands in a .htaccess file will affect both the directory it is in and any subdirectories of that directory. Thus if you have just one, in your main directory, it will affect your whole site. If you place one in a subdirectory it will affect all the contents of that directory.

In your root directory, simply add a file called: .htaccess

To prevent people viewing your .htaccess, add the following:

<Files .htaccess>
order allow,deny
deny from all
</Files>
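
On Apache 2.4 and later, the order/deny syntax above is only available through the mod_access_compat module. A rough equivalent using the newer Require directive, assuming your host runs 2.4 with mod_authz_core enabled, might look like this:

```apache
# Apache 2.4+ sketch: refuse all requests for the .htaccess file itself
<Files ".htaccess">
    Require all denied
</Files>
```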

Page Redirects

When changing servers, moving content, or altering directory structures, we can prevent 404 error pages by implementing a 301 redirect, which sends all incoming requests to their new permanent location:

Redirect 301 /oldpage.html http://www.yourwebsite.com/newpage.html

Want to redirect your entire website? Not a problem:

Redirect 301 / http://www.your-new-website.com/
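
If a whole folder has moved rather than a single page, RedirectMatch (also part of mod_alias) can carry the rest of the path across. A minimal sketch, where /old-folder/ and /new-folder/ are placeholder names invented for illustration:

```apache
# Send /old-folder/anything to /new-folder/anything with a permanent redirect
RedirectMatch 301 ^/old-folder/(.*)$ http://www.yourwebsite.com/new-folder/$1
```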

Prevent Directory browsing

If you don’t have an "index.htm" file in your directories, many servers will show the entire contents of the folder when a user browses to it. Rather than create an index file for each folder, simply add the following directive, which hides every file from the listing:

IndexIgnore *

You may want the directory to list but hide certain file types. If so, use this:

IndexIgnore *.zip *.jpg

On the other hand, directory listings might already be switched off on your server. To override this and show them, use:

Options +Indexes
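
The reverse of the directive above switches listings off altogether, so visitors get a 403 Forbidden page rather than a listing. This sketch assumes your host allows Options to be overridden in .htaccess:

```apache
# Turn off automatic directory listings for this folder and its subfolders
Options -Indexes
```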

Prevent image hotlinking

When people "hotlink" to your images, they are linking directly to them from within their own websites and using your bandwidth to do it (we are lucky Flickr don’t use this directive!). To disallow hotlinking and serve them up another image instead, simply use:

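RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yourdomain\.com/.*$ [NC]
# Skip the replacement image itself, otherwise the redirect loops
RewriteCond %{REQUEST_URI} !/imagetheif\.gif$
RewriteRule \.(gif|jpg)$ http://www.yourdomain.com/imagetheif.gif [R,L]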

Just change "yourdomain" to your domain name, and modify the path and name of the replacement image (perhaps an angry image of Mr T?).
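
If you would rather refuse hotlinked requests outright than serve a replacement image, the same conditions can end in a 403 Forbidden. This is only a sketch: yourdomain.com is a placeholder, and the patterns are widened to allow https referrers and a couple of extra image extensions, which is an assumption about what you want to protect.

```apache
RewriteEngine on
# Let through requests with no referrer (direct visits, some proxies)
RewriteCond %{HTTP_REFERER} !^$
# Let through requests coming from your own pages, http or https
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?yourdomain\.com/ [NC]
# Everything else asking for an image gets a 403 Forbidden
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]
```

Because nothing is redirected here, there is no replacement image to exclude from the rule.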

Change Directory listing

Many websites have a splash page, or you may want to define the order in which a visitor will see your default pages. This is handy if you want to show a static page first. Apache serves the first file in the list that exists. Simply add the following, changing the file names to suit:

DirectoryIndex index.html index.php

MIME Types

MIME types tell the browser what kind of content a file contains. Most servers are set up to handle Flash, MP3s and so on. If yours is not, you can add a type like this:

AddType application/x-shockwave-flash swf
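
The same directive works for any other file type: you pair the MIME type with one or more extensions. A few illustrative lines, with the types picked only as common examples:

```apache
# Map file extensions to the MIME type sent in the Content-Type header
AddType audio/mpeg mp3
AddType video/mp4 mp4
AddType image/svg+xml svg
```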

View the entire list of MIME types at Webmaster-toolkit.

Error handling

Proper error handling is good for usability purposes, and helps your website maintain a professional feel. Some of the most common errors that occur are:

400 Bad request – Normally a malformed URL request or improper usage of a script.
401 Authorization Required – A result of attempted access to a Restricted Area without the proper credentials
403 Forbidden – When a user tries to access a file which has incorrect permissions
404 Not Found – A result of a file/image being moved, or normally a result of renaming an item
500 Internal Server Error – The server pooped itself

To let the user know what the problem is, you can add the following directives to your .htaccess file. Note that we place the error pages in a folder called errors. Both the folder name and the file names can be changed, but I prefer to do it this way to keep it clean and easy to use. One other important thing I like to do is add this folder to the disallowed list in my robots.txt file in the root directory.

ErrorDocument 400 /errors/400.html
ErrorDocument 401 /errors/401.html
ErrorDocument 403 /errors/403.html
ErrorDocument 404 /errors/404.html
ErrorDocument 500 /errors/500.html
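
ErrorDocument also accepts a short message in quotes in place of a file path, which can be handy before you have built the error pages. A small sketch with made-up wording:

```apache
# Show a plain text message instead of a custom page
ErrorDocument 403 "Sorry, you are not allowed to view this page."
ErrorDocument 404 "Sorry, we could not find that page."
```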

Block Users

Sick of pests, or want to deny certain IPs access to your website? You can block these people by adding the following (just change the IP addresses to the ones you want):

order allow,deny
deny from 123.4.5.6
deny from 007.0.1.2
allow from all
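
On Apache 2.4 and later, the same idea is usually written with Require directives. A rough equivalent, assuming mod_authz_core and mod_authz_host are enabled, and using the same placeholder addresses written without leading zeros:

```apache
# Apache 2.4+ sketch: allow everyone except the listed addresses
<RequireAll>
    Require all granted
    Require not ip 123.4.5.6
    Require not ip 7.0.1.2
</RequireAll>
```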

Block Referring websites

Many sites may "hotlink" to your content or information, including showing your pages in iframes on their website. To overcome this, we can use something similar to the following:

RewriteEngine on
# Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} websitetoblock\.com [NC,OR]
RewriteCond %{HTTP_REFERER} secondwebsitetoblock\.com
RewriteRule .* - [F]

[NC] denotes that the match is case insensitive. If you get a 500 error after using this code, simply remove the # in front of Options +FollowSymlinks.
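
Referrer checks only catch requests that arrive with the blocked site in the Referer header. For the framing case specifically, many browsers also honour the X-Frame-Options response header. A sketch assuming mod_headers is available on your server:

```apache
# Ask browsers to refuse rendering these pages inside frames on other sites
<IfModule mod_headers.c>
    Header always set X-Frame-Options "SAMEORIGIN"
</IfModule>
```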

Block Bots

There are MANY bad bots roaming the internet; many will scrape data, mine email addresses and more. To block the main ones and send them a 403 Forbidden, add this:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]
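
The long list can also be folded into a single condition using regex alternation, which is easier to maintain. A shortened sketch with only a handful of the names above; extend the group as needed. Note that the [NC] flag here is an addition that makes every name match regardless of case, which the list above does not do.

```apache
RewriteEngine On
# One pattern, several user agents; [NC] makes the match case-insensitive
RewriteCond %{HTTP_USER_AGENT} ^(BlackWidow|EmailSiphon|Teleport\ Pro|WebZIP|Wget|Zeus) [NC]
RewriteRule .* - [F]
```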

File Permissions

Finally, don’t forget to set the file permissions of your .htaccess file to 644, so that the server can read it but only the owner can write to it.

Conclusion

Well, this isn’t the be-all and end-all of .htaccess information, but hopefully it’s a good start and will help you on your way!
