August 18, 2022

If you run a website on an Apache server, then you have undoubtedly come across the .htaccess file. You probably already know that, with it, you can change the way your website behaves, such as redirecting users to the www version of your website or denying access for certain IP addresses. For most people, if they need to modify the .htaccess file, they will look up the code snippet required and add it to their own .htaccess file. However, most of us will not fully comprehend the actual code contained within that snippet, potentially leading to errors and other problems. Wouldn't it be nice to understand what each line of code actually does? This article will give you the power to interpret and use common .htaccess code correctly and with confidence.


The role of the .htaccess file

Most of us use shared hosting facilities to host our websites. As a result, we almost never have control of the server at the main configuration-file level to configure it for our particular website. Normally, this would mean that if you want to make server-level changes, you would have to contact the server administrator to make those changes for you. Shared hosting facilities often have hundreds or even thousands of clients, and if all of those clients needed to contact the server admin every time they wanted to make any change to the server, the hosting company would quickly become overwhelmed.

In the case of Apache servers, hosting administrators often enable .htaccess files, which helps reduce their customer service workload. An .htaccess file is a bit like the main server configuration file, but it only affects the directory in which it is located and any subdirectories. The top-level directory of a website, also known as the document root, contains the main .htaccess file for that website. This file can override directives in the main server configuration file (to the extent that the server's AllowOverride setting permits), allowing each hosting client to control how the server behaves for his or her particular website. In addition, subdirectories may contain their own .htaccess files, which can in turn override the .htaccess files of their parent directories.

The code inside .htaccess files

Apache server software is written in the C programming language. It has been designed so that users can interact with it by writing directives in a configuration file, such as httpd.conf or .htaccess. These directives are expressed using syntax defined by the Apache software, but they also make ample use of Regular Expressions (Regex) to pattern-match various text variables. Consequently, understanding .htaccess code also means being able to interpret Regular Expressions.

WordPress needs the .htaccess file on Apache servers

In addition to being able to configure the server for a particular website, the .htaccess file is also used by the WordPress application itself to run that website. This comes in the form of some default code that is included in every installation of WordPress running on an Apache server (Nginx servers do not use .htaccess files). This default code is shown below:

# BEGIN WordPress

RewriteEngine On
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]

# END WordPress

Without this code, a WordPress website running on an Apache server would cease to function properly. So what does the code mean?

Let's break it down.

RewriteEngine On

This turns on a rewriting engine that will rewrite any URL requested by a website visitor according to rules specified in the next few lines of .htaccess code.

RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]

It's not worth going into too much detail on this line of code, as it is more technical than useful. Suffice it to say that, in many common setups (for example, when PHP runs as CGI or FastCGI), Apache does not pass the HTTP Authorization request header on to scripts. This protects a web visitor's credentials from being passed to every script on a server, which is especially important if you do not control the whole server. WordPress therefore uses this line to restore the 'missing' header. The [E=...] flag copies the header's value into the HTTP_AUTHORIZATION environment variable, where PHP can read it. Because the line only sets an environment variable, the requested URL itself is unimportant. This RewriteRule therefore matches any requested URL (.*) and leaves it unchanged (-).

Regex used

. (dot)

The dot symbol represents any single character

*

The asterisk symbol reads as "zero or more occurrences of the preceding element" (in this case, the preceding element is 'any character' represented by the dot), so .* matches any URL, including an empty one
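The following small Python sketch shows what "zero or more" means in practice. It uses Python's standard re module as a stand-in for Apache's regex engine, since the syntax is the same for simple patterns like these. The sample paths are made up for illustration.

```python
import re

ANY = re.compile(r".*")           # any character, zero or more times
AT_LEAST_ONE = re.compile(r".+")  # any character, one or more times (for comparison)

for path in ["", "index.php", "blog/hello-world/"]:
    # fullmatch() asks whether the whole path fits the pattern
    print(repr(path), bool(ANY.fullmatch(path)), bool(AT_LEAST_ONE.fullmatch(path)))
```

Only the empty path separates the two patterns: .* accepts it and .+ does not. This is why the WordPress line can safely apply to every request.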

RewriteBase /

This sets the URL-path prefix that mod_rewrite adds to relative substitutions in per-directory rules, such as those in an .htaccess file. In this case, the '/' refers to the root of the website, which essentially means the domain of the website itself, for example, www.example.com/.

RewriteRule ^index\.php$ - [L]

Now we come to the actual rules to follow when rewriting a URL. The RewriteRule syntax is as follows:

RewriteRule Pattern Substitution [flags]

So in the above line of default WordPress .htaccess code, we are saying the following. If the requested URL matches the pattern index.php (the Regex symbols are explained below), it should be left as is (-) and not rewritten. This .htaccess file sits in the document root, so mod_rewrite matches the pattern against the URL path with the leading '/' removed. A request for www.example.com/index.php therefore matches and is passed through untouched. URLs other than /index.php simply pass on to the next lines of code in the .htaccess file. The [L] flag (think "Last") indicates that if this rule is applied, no further rules need to be processed. In .htaccess context, a request that has already been rewritten to /index.php runs through the rules again, and this line lets it stop straight away.

Regex used

^

This symbol can be read as "the pattern should start with"

\

This symbol is used to prevent the character that follows it from being read as Regex. Here, it makes the dot (.) a literal dot (as in dot com) rather than the Regex symbol for 'any character'

$

The dollar symbol can be read as "the pattern should end here"
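The effect of ^, \ and $ is easiest to see by trying a few paths. The Python sketch below again uses the standard re module as a stand-in for Apache's regex engine, with made-up paths. It tests the WordPress pattern against the URL path as mod_rewrite sees it in a document-root .htaccess file, that is, without the leading "/".

```python
import re

INDEX_RULE = re.compile(r"^index\.php$")  # the pattern from the WordPress rule

for path in ["index.php", "indexXphp", "index.php.bak", "blog/index.php"]:
    result = "matches" if INDEX_RULE.search(path) else "no match"
    print(path, "->", result)

# Without the backslash, the dot means "any character", so this does match:
print(bool(re.search(r"^index.php$", "indexXphp")))
```

Only index.php matches. The escaped dot rejects indexXphp, $ rejects index.php.bak, and ^ rejects blog/index.php. The last line shows why the backslash matters.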

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]

The next lines of code deal with all the other requests that are not www.example.com/index.php.

The system checks whether the requested resource (%{REQUEST_FILENAME}, the full file-system path that the request maps to) is NOT an existing file (!-f) and NOT an existing directory (!-d). Why does it do this? Well, WordPress posts and pages are not files or directories. They are stored in the WordPress database, so to reach them we have to start the WordPress environment, which happens through the file index.php located in the document root.

Therefore, if the URL requested is neither a file nor a directory, the next RewriteRule is applied and the request is internally rewritten to /index.php. This is not a redirect: the visitor's browser still shows the original URL. Note that the RewriteCond line can be seen as an "if" statement, and sequential RewriteCond directives have an implicit AND between them, so translating the code above gives:

If REQUEST_FILENAME is NOT a file

AND

If REQUEST_FILENAME is NOT a directory

THEN

Rewrite the URL to /index.php

If, however, the URL path is a file or a directory, then it does not apply this RewriteRule and simply deals with the URL as a file or directory from the file system.

Regex used

. (dot)

The dot symbol represents any single character, which here acts as a filter to match any URL except the domain name alone (when the URL is empty)

!

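The exclamation symbol can be read as "not", so !-f can be read as "not a file". Strictly speaking, ! is mod_rewrite's negation prefix and -f is a file test rather than Regex, but they are read the same way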

WordPress default code summary

In summary, the default WordPress .htaccess code rewrites all requested URLs that are not existing files or directories so that they are handled by the file index.php. This file is located in the document root, or 'home' folder, of the website. It contains the instructions to load the WordPress environment, theme and templates. Once WordPress has loaded, it uses the original requested URL to find the matching post, page or other WordPress structure, all of which are stored within the WordPress database. If, however, the requested URL is a file or directory in the file system, it is served as a normal request for that file or directory.
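The whole WordPress rule set can be modelled as a short Python function. This is a simplified sketch, not how Apache is implemented. The function name wordpress_rewrite and the two sets standing in for the -f and -d file-system checks are hypothetical. Paths are given without the leading "/", as mod_rewrite sees them in a document-root .htaccess file.

```python
import re

def wordpress_rewrite(path, existing_files, existing_dirs):
    """Return the path that ends up being served after the WordPress rules.

    Hypothetical model: the two sets stand in for Apache's -f and -d checks.
    """
    # RewriteRule ^index\.php$ - [L]: leave index.php alone and stop
    if re.search(r"^index\.php$", path):
        return path
    # RewriteRule . /index.php [L] applies only if the path has at least one
    # character AND it is not a file (!-f) AND it is not a directory (!-d)
    if re.search(r".", path) and path not in existing_files and path not in existing_dirs:
        return "index.php"
    # Anything else is served from the file system unchanged
    return path


files = {"wp-login.php", "wp-content/uploads/logo.png"}
dirs = {"", "wp-admin"}
for requested in ["about-us/", "wp-content/uploads/logo.png", "wp-admin", ""]:
    print(repr(requested), "->", repr(wordpress_rewrite(requested, files, dirs)))
```

A "pretty" permalink such as about-us/ ends up at index.php. The real image file and the wp-admin directory are served directly.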

Other common .htaccess code snippets

Force redirect to HTTPS

RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

This .htaccess code snippet forces any URLs requested over HTTP to be rewritten to use the secure HTTPS protocol. As with other 'rewrites', the rewrite engine has to be first turned on with RewriteEngine On.

RewriteCond %{HTTPS} off

This is essentially the 'if statement'. It checks whether the HTTPS server variable is set to off, which is the case only when the request was not made over HTTPS. If it is off, the RewriteRule that follows is executed.

RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

Remember the RewriteRule syntax is RewriteRule Pattern Substitution [flags], so the first part of the RewriteRule is the Pattern to watch out for, which is the regular expression ^(.*)$. Individual regex symbols are explained below, but essentially this matches any URL path of any length.

The next part is the Substitution part of the RewriteRule which says to rewrite the URL to its HTTPS version. %{HTTP_HOST} can be seen as the domain part of the URL, while the %{REQUEST_URI} represents the remaining path to the requested resource on the website.

Finally, we have the [flags], which are [L,R=301] (there must be no space between the flags). We have come across [L] before: it indicates that if this rule is applied, it should be the last one processed. [R=301] turns the rewrite into an external redirect with the HTTP status code 301 (Moved Permanently). This tells browsers and search engines that the HTTPS address is the permanent one.

Regex used

^

This symbol can be read as "the pattern should start with"

. (dot)

The dot symbol is used to represent any single character

*

The asterisk symbol reads as "zero or more occurrences of the preceding element" (in this case, the preceding element is 'any character' represented by the dot)

$

The dollar sign can be read as "the pattern should end here"
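The following Python function is a rough model of this rule (a sketch, not Apache's code). The hypothetical function https_redirect takes stand-ins for the %{HTTPS}, %{HTTP_HOST}, %{REQUEST_URI} and %{QUERY_STRING} variables and returns the redirect Apache would send. It also reflects mod_rewrite's default behaviour of appending the original query string, which %{REQUEST_URI} does not include.

```python
def https_redirect(https, host, request_uri, query_string=""):
    """Return (status, location) if the force-HTTPS rule fires, else None.

    Hypothetical model: the arguments stand in for %{HTTPS}, %{HTTP_HOST},
    %{REQUEST_URI} and %{QUERY_STRING}.
    """
    # RewriteCond %{HTTPS} off: only plain-HTTP requests continue
    if https != "off":
        return None
    # RewriteRule ^(.*)$ matches every path, so the substitution always applies
    location = f"https://{host}{request_uri}"
    # mod_rewrite appends the original query string by default
    if query_string:
        location += "?" + query_string
    return 301, location


print(https_redirect("off", "www.example.com", "/blog/hello-world/", "page=2"))
print(https_redirect("on", "www.example.com", "/blog/hello-world/"))
```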

Force redirect to WWW

    RewriteEngine On
    RewriteCond %{HTTP_HOST} !^www\. [NC]
    RewriteRule ^ https://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

This snippet of .htaccess code forces any URLs requested without the 'www.' prefix to be rewritten to add the prefix. Once again, the RewriteEngine has to be turned on first in the initial line of code.

    RewriteCond %{HTTP_HOST} !^www\. [NC]

The next line is an 'if statement' that checks whether the domain part of the requested URL, represented by %{HTTP_HOST}, does NOT (!) start with the 'www.' prefix. If it does not, the next line of code is executed. The [NC] flag (short for "nocase") makes the comparison case-insensitive. A host written as WWW.example.com therefore also counts as starting with www.

Regex used

!

The exclamation symbol can be read as "not", so !^www\. can be read as "not www."

^

This symbol can be read as "the pattern should start with"

\

This symbol is used to prevent the character that follows it from being read as a Regular Expression (Regex). In this case, the next character is a dot (.), which is not Regex and should be read literally as a dot (as in dot com)

    RewriteRule ^ https://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

Once again, the RewriteRule syntax is RewriteRule Pattern Substitution [flags]. The Pattern here is a lone ^, which matches the start of any string, so it matches every requested URL. The Substitution indicates that the request should be redirected to "https://www." followed by the requested domain and path. As we have seen before, the [L,R=301] flags are used at the end of the line. [L] indicates that if this rule is applied, no further rules need to be processed, and [R=301] makes the change an external 301 (permanent) redirect.
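The Python sketch below models this rule, with re.IGNORECASE playing the role of the [NC] flag. The function name www_redirect and the sample hosts are hypothetical.

```python
import re

# RewriteCond %{HTTP_HOST} !^www\. [NC]  ([NC] corresponds to re.IGNORECASE)
STARTS_WITH_WWW = re.compile(r"^www\.", re.IGNORECASE)

def www_redirect(host, request_uri):
    """Return (301, location) if the host lacks the www. prefix, else None."""
    # The "!" negates the condition: hosts that already start with www. are left alone
    if STARTS_WITH_WWW.search(host):
        return None
    # RewriteRule ^ https://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
    return 301, f"https://www.{host}{request_uri}"


for host in ["example.com", "WWW.example.com", "blog.example.com"]:
    print(host, "->", www_redirect(host, "/shop/"))
```

The third host shows a side effect of the rule: any host that does not start with www., including a subdomain such as blog.example.com, gets the prefix added.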

Error page handling

ErrorDocument 400 /wp-content/plugins/bulletproof-security/400.php
ErrorDocument 401 /subscription/how_to_subscribe.html
ErrorDocument 403 /wp-content/plugins/bulletproof-security/403.php
ErrorDocument 404 /404.php
ErrorDocument 500 http://error.example.com/server_error.html

Using the .htaccess file, you can also specify what should happen when a specific website error occurs. The error handling syntax is ErrorDocument <3-digit code> <action>. For example, ErrorDocument 400 /wp-content/plugins/bulletproof-security/400.php means that when a 400 (Bad Request) error occurs, Apache serves the 400.php page of the WordPress plugin, BulletProof Security. A local path like this is served internally, so the address in the visitor's browser does not change. A full URL, such as the one used for the 500 error above, instead redirects the visitor's browser to that address.
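Because each line is just ErrorDocument, a status code and a target, the syntax is easy to model. The Python helper below reads simple ErrorDocument lines and labels each target as served internally or as an external redirect. The name parse_error_documents is illustrative only and is not part of Apache. As a simplification, it ignores the quoted-message form that Apache also accepts.

```python
def parse_error_documents(config_text):
    """Map status codes to (target, kind) for simple ErrorDocument lines.

    Hypothetical helper: handles only "ErrorDocument <code> <path-or-URL>".
    """
    documents = {}
    for line in config_text.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[0] != "ErrorDocument" or not parts[1].isdigit():
            continue
        code, target = int(parts[1]), parts[2]
        if target.startswith(("http://", "https://")):
            documents[code] = (target, "external redirect")  # browser is sent elsewhere
        elif target.startswith("/"):
            documents[code] = (target, "served internally")  # address bar unchanged
    return documents


config = """\
ErrorDocument 400 /wp-content/plugins/bulletproof-security/400.php
ErrorDocument 404 /404.php
ErrorDocument 500 http://error.example.com/server_error.html
"""
for code, (target, kind) in sorted(parse_error_documents(config).items()):
    print(code, target, kind)
```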

Website access control

Order Deny,Allow
Deny from all
Allow from 35.166.77.22
Allow from 127.0.0.1

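Another important use of the .htaccess file is to control which visiting IP addresses can access your website. Note that Order, Deny and Allow are Apache 2.2 directives. Apache 2.4 still accepts them through the mod_access_compat module, but its native equivalent is the Require directive (for example, Require ip 35.166.77.22). Here, some simple rules apply: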

Order Deny,Allow

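With this order, Deny lines are evaluated first and Allow lines second. A matching Allow therefore overrides a matching Deny, and visitors that match neither are allowed. Combined with Deny from all, this denies all access to the website EXCEPT for the addresses listed in the accompanying Allow lines.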

Order Allow,Deny

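Alternatively, with this order, Allow lines are evaluated first and Deny lines second. A matching Deny therefore overrides a matching Allow, and visitors that match neither are denied. In other words, anyone who is allowed but then denied in other lines of code stays denied.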

Deny from all

This line matches every visitor, so it denies access to everyone unless the 'Order' directive above lets a matching Allow line override it.

Allow from 35.166.77.22

This line allows access to the website for visitors with the indicated IP address. However, access ultimately depends on the rules defined in the 'Order' directive used (see above).
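Because the two Order settings behave so differently, it can help to model them. The Python function is_allowed below is a hypothetical, simplified sketch of the Apache 2.2 evaluation rules described above. It only understands full IP addresses, CIDR ranges and the keyword all, whereas Apache also accepts partial addresses and host names.

```python
import ipaddress

def is_allowed(order, allow_from, deny_from, client_ip):
    """Hypothetical model of Apache 2.2 Order/Allow/Deny for one client.

    order is "Deny,Allow" or "Allow,Deny"; the rule lists hold full IP
    addresses, CIDR ranges such as "10.0.0.0/8", or the keyword "all".
    """
    ip = ipaddress.ip_address(client_ip)

    def matches(rules):
        return any(rule == "all" or ip in ipaddress.ip_network(rule) for rule in rules)

    allowed, denied = matches(allow_from), matches(deny_from)
    if order == "Deny,Allow":
        # Default allow; a matching Allow overrides a matching Deny
        return allowed or not denied
    if order == "Allow,Deny":
        # Default deny; a matching Deny overrides a matching Allow
        return allowed and not denied
    raise ValueError("order must be 'Deny,Allow' or 'Allow,Deny'")


allow = ["35.166.77.22", "127.0.0.1"]
deny = ["all"]
print(is_allowed("Deny,Allow", allow, deny, "35.166.77.22"))  # True
print(is_allowed("Deny,Allow", allow, deny, "203.0.113.5"))   # False
print(is_allowed("Allow,Deny", allow, deny, "35.166.77.22"))  # False
```

The last call shows why the order matters. With Allow,Deny, Deny from all overrides every Allow line, so the same rules would lock everyone out.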

Hotlink Protection

<IfModule mod_rewrite.c>
 RewriteEngine on
 RewriteCond %{HTTP_REFERER}     !^$
 RewriteCond %{REQUEST_FILENAME} -f
 RewriteCond %{REQUEST_FILENAME} \.(gif|jpe?g?|png)$ [NC]
 RewriteCond %{HTTP_REFERER} !^https?://([^.]+\.)?domain\. [NC]
 RewriteRule \.(gif|jpe?g?|png)$ - [F,NC,L]
</IfModule>

This code (thanks to Jeff Starr for putting it together) provides thorough .htaccess protection against other websites hotlinking your images. Hotlinking means that another website embeds media hosted on your website directly into its own webpages. It is considered bad online etiquette because the hotlinking website uses your hosting resources to display the image, with little benefit to you.

<IfModule mod_rewrite.c>
...
</IfModule>

This code enclosure should really be used around all 'Rewriting' .htaccess code. It checks that the Rewrite module of Apache, known as mod_rewrite, is available before running any of the code contained within it. If the module is not active, this prevents the rewrite directives from causing a 500 Internal Server Error that would take your site down.

 RewriteEngine on

We have seen this line of code before. This turns on the rewriting engine that will rewrite any requested URL according to rules specified in the next few lines of .htaccess code.

 RewriteCond %{HTTP_REFERER} !^$

As alluded to earlier, using the RewriteCond directive is like using the word 'if'. %{HTTP_REFERER} represents one of the HTTP headers that accompany a URL request (the Referer header). It contains the address of the webpage from which the resource has been requested. In the case of hotlinking, it would be the address of the page on the website that has embedded your image.

This line of code, however, is written to allow URL requests with a blank HTTP_REFERER header (!^$ - see Regex details below) through without being rewritten. The line of code itself can be essentially translated as:

If HTTP_REFERER is NOT blank

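Blank HTTP_REFERER headers can result from a range of legitimate situations, such as a visitor typing an image address directly into the browser or privacy settings that suppress the header. Blocking them would lock out legitimate visitors, so URL requests with blank HTTP_REFERER headers are let through untouched. The trade-off is that a hotlinking site that suppresses the header will not be blocked.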

Regex used

!

The exclamation symbol can be read as "not"

^

This symbol can be read as "the pattern should start with"

$

The dollar sign can be read as "the pattern should end here". With nothing between them, ^$ matches only an empty value, so !^$ means "not blank"

 RewriteCond %{REQUEST_FILENAME} -f

We have seen something similar to this line of .htaccess code earlier. It checks whether %{REQUEST_FILENAME} is an existing file (-f) on the server. In the case of a hotlinked image, the answer would be yes.

 RewriteCond %{REQUEST_FILENAME} \.(gif|jpe?g?|png)$ [NC]

Similar to the previous line, this line checks whether %{REQUEST_FILENAME} is one of the file types that we want to protect from hotlinking. In this case, any .gif, .jpg (or .jpeg / .jpe), and .png files will be protected. The [NC] flag, which we have also seen before, makes the match case-insensitive, so the file extension can be in either upper or lower case.

Regex used

\

This symbol is used to prevent the character that follows it from being read as a Regular Expression (Regex). In this case, the next character is a dot (.), which is not Regex and should be read literally as a dot (as in dot com)

|

This pipe symbol reads as "or", so in this case, "either .gif, .jpg, .jpeg, .jpe, or .png"

?

The question mark symbol makes the previous element "optional". In this case, .jpe?g? means "e" and "g" are optional characters. Consequently, all forms of this type of image extension (.jpg, .jpeg, and .jpe) will be accepted, as will the rarely used .jp

$

The dollar sign can be read as "the pattern should end here"
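The following Python check tries the extension pattern on a few file names, with re.IGNORECASE standing in for the [NC] flag. The file names are made up.

```python
import re

# \.(gif|jpe?g?|png)$ with [NC], modelled by re.IGNORECASE
PROTECTED = re.compile(r"\.(gif|jpe?g?|png)$", re.IGNORECASE)

for name in ["logo.PNG", "photo.jpg", "photo.jpeg", "photo.jpe", "odd.jp",
             "notes.pdf", "archive.png.zip"]:
    print(name, bool(PROTECTED.search(name)))
```

Every image name matches, including upper-case logo.PNG and odd.jp. The PDF and archive.png.zip do not match, because $ requires the image extension to come at the very end.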

 RewriteCond %{HTTP_REFERER} !^https?://([^.]+\.)?domain\. [NC]

In this line of code, "domain" needs to be replaced with your own website's domain name, so that requests coming from your own website are not blocked. The code checks that the referring website is NOT (!) your own website or a subdomain of it. The Regex used is quite clever, as it covers the common forms of your website address. These include HTTP and HTTPS, as well as www, non-www, or any other single-level subdomain (such as shop.). Finally, the [NC] flag makes the match case-insensitive, so the HTTP_REFERER domain can be written in either upper or lower case.

Regex used

!

The exclamation symbol can be read as "not"

^

This symbol can be read as "the pattern should start with"

?

The question mark symbol makes the previous element "optional". In this case, ^https? means the final "s" is an optional character, so both HTTP and HTTPS will match. In addition, the second question mark applies to the multi-character element ([^.]+\.), which makes that whole group optional too.

[^.]

The ^ symbol normally means "the pattern should start with". However, when it is the first character inside square brackets, as is the case here, its meaning changes to "not". So [^.] means "any character that is not a dot" (inside square brackets, the dot is always a literal dot)

+

The plus symbol means one or more occurrences of the preceding element. Here the preceding element is [^.], so [^.]+ matches a run of one or more characters, none of which is a dot (for example, www or shop)

\

This symbol is used to prevent the character that follows it from being read as a Regular Expression (Regex). In the above case, the next character is a dot (.) which is not Regex and should be read literally as a dot (as in dot com)
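The following Python sketch tests the referer pattern, using example as a hypothetical stand-in for "domain" and re.IGNORECASE in place of [NC]. The referer URLs are made up.

```python
import re

# "example" is a hypothetical stand-in for "domain"; [NC] -> re.IGNORECASE
OWN_SITE = re.compile(r"^https?://([^.]+\.)?example\.", re.IGNORECASE)

for referer in ["https://example.com/blog/",
                "http://WWW.example.com/",
                "https://shop.example.com/cart",
                "https://other-site.net/gallery",
                "https://a.b.example.com/"]:
    # True means "own site", which the !-negated RewriteCond lets through
    print(referer, bool(OWN_SITE.search(referer)))
```

The first three referers count as your own site and other-site.net does not. The last result shows that ([^.]+\.)? allows at most one subdomain label. A referer from a deeper subdomain such as a.b.example.com would therefore be treated as a foreign site.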

 RewriteRule \.(gif|jpe?g?|png)$ - [F,NC,L]

Finally, we come to the actual URL rewriting part. As before, the syntax for the RewriteRule is:

RewriteRule Pattern Substitution [flags]

In this case, the Pattern to match is a URL ending in one of the following extensions: .gif, .jpg, .jpeg, .jpe, or .png. If the URL request is for one of these image file types, the Substitution is "-" (dash), which means that the URL itself is not rewritten. At the end of the line of code, we find the [flags], which are [F,NC,L]. We know from previous encounters that [NC] means 'no case', so the letters that make up the URL can be either upper or lower case. [L] should also be familiar: if this RewriteRule is applied, no further rules need to be processed. Finally, the [F] flag stands for "Forbidden". Instead of serving the requested file, Apache responds with a 403 Forbidden error. The [F] flag already implies [L], so listing L as well is harmless but redundant. In this way, the requested media file is not served to the hotlinking website.

Regex used

\

This symbol is used to prevent the character that follows it from being read as a Regular Expression (Regex). In the above case, the next character is a dot (.), which is not Regex and should be read literally as a dot (as in dot com)

|

This pipe symbol reads as "or", so in this case, "either .gif, .jpg, .jpeg, .jpe, or .png"

?

The question mark symbol makes the previous element "optional". In this case, .jpe?g? means "e" and "g" are optional characters, so all forms of the image extension (.jpg, .jpeg, and .jpe) will match.

$

The dollar sign can be read as "the pattern should end here"
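Putting the conditions together, the hypothetical Python function is_forbidden below sketches the decision made by the whole hotlink protection block. It is a simplified model. The name example again stands in for "domain", and the file_exists argument replaces Apache's -f check. The file name is used for both the RewriteCond and the RewriteRule extension tests.

```python
import re

EXTENSIONS = re.compile(r"\.(gif|jpe?g?|png)$", re.IGNORECASE)
OWN_SITE = re.compile(r"^https?://([^.]+\.)?example\.", re.IGNORECASE)  # "example" replaces "domain"

def is_forbidden(referer, filename, file_exists):
    """Return True if the hotlink rules would answer 403 Forbidden."""
    return (
        referer != ""                           # RewriteCond %{HTTP_REFERER} !^$
        and file_exists                         # RewriteCond %{REQUEST_FILENAME} -f
        and bool(EXTENSIONS.search(filename))   # RewriteCond ... \.(gif|jpe?g?|png)$ [NC]
        and not OWN_SITE.search(referer)        # RewriteCond %{HTTP_REFERER} !^https?://... [NC]
        # RewriteRule \.(gif|jpe?g?|png)$ - [F,NC,L] repeats the extension test
    )


image = "/var/www/html/wp-content/uploads/photo.jpg"
print(is_forbidden("https://other-site.net/page/", image, True))   # True
print(is_forbidden("https://www.example.com/post/", image, True))  # False
print(is_forbidden("", image, True))                               # False
```

Only the request from another site is refused. Your own pages and requests with a blank referer still receive the image.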

Conclusion

The .htaccess file can be difficult to understand, but it is critically important for WordPress websites running on Apache servers. Having a little knowledge of how it works, and of how to read one, can go a long way toward helping you configure your own .htaccess files correctly.
