Proper .htpasswd usage

Whymarrh · Sep 6, 2012

Assuming a small site (fewer than 5 pages), what is the proper usage of .htaccess and .htpasswd? I recently watched a tutorial from Nettuts+ where this sample code was given:

.htaccess

AuthName "Login title"
AuthType Basic
AuthUserFile /path/to/.htpasswd
require valid-user

.htpasswd (created using htpasswd -c <file> <username> command)

username:encrypted-version-of-password

I am also curious as to the actual level of security this provides: can it be bypassed easily? If Apache by default does not allow users to access either of the two files directly, do they need to be outside the public directory? Are there any speed implications?

Answer

Will Palmer · Sep 15, 2012

What level of security does this provide?

.htpasswd does not provide much security by itself. That is, it provides a login mechanism, and Apache will not respond without the proper credentials, but unless separately configured, nothing about the exchange is encrypted (or even obfuscated). For example, listening to the GET request with Wireshark gives you a nice view of all the headers being sent by the client, including:

Authorization: Basic d3BhbG1lcjp0ZXN0dGVzdA==

"d3BhbG1lcjp0ZXN0dGVzdA==" is just the base64-encoded form of "wpalmer:testtest". These days, a hacker (or more likely, a virus) can sit on a public WiFi connection and log any requests containing the Authorization: header for later perusal. In general, sending any authentication information over an unencrypted HTTP connection is considered a bad idea, even if you're on a wired or secured WiFi connection end-to-end. It would not, for example, meet the requirements of PCI compliance if you were storing customer data, payment information, etc. behind the .htpasswd lock.
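
To make the point about base64 concrete, here is a minimal Python sketch that reverses the header value shown above. The function name decode_basic_auth is made up for this illustration; it uses only the standard library base64 module and assumes the credentials are UTF-8 text.

```python
import base64
from typing import Tuple


def decode_basic_auth(header_value: str) -> Tuple[str, str]:
    """Recover (username, password) from a Basic Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    # base64 is an encoding, not encryption: no key is needed to reverse it.
    decoded = base64.b64decode(token).decode("utf-8")
    # Everything before the first colon is the username, the rest is the password.
    username, _, password = decoded.partition(":")
    return username, password


captured = "Basic d3BhbG1lcjp0ZXN0dGVzdA=="
print(decode_basic_auth(captured))  # ('wpalmer', 'testtest')
```

Anyone who can read the request can do the same, which is why the header needs to travel over an encrypted connection.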

Add HTTPS to the mix, and you completely eliminate that particular problem. However...

.htpasswd authentication as implemented by Apache httpd does not provide any form of rate-limiting or brute-force protection. You can make as many simultaneous attempts at a password guess as Apache is willing to serve simultaneous pages, and Apache will respond with success/failure as soon as it possibly can. You can use something like Fail2Ban to limit the number of failed attempts that can be made before the client is blocked from talking to the server, but that will not necessarily provide any useful protection against a botnet, which may automatically target your server from thousands of unique addresses. This can lead to the decision of "do I leave myself vulnerable to password attempts from botnets, or do I leave myself vulnerable to denial-of-service attacks, when the entire account is locked down due to failures from multiple clients?"
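
As a rough illustration of why per-address blocking struggles against a botnet, here is a small Python sketch. It is not how Fail2Ban is implemented; the function name addresses_to_block, the threshold of 5 and the placeholder addresses are all assumptions made for the example.

```python
from collections import Counter


def addresses_to_block(failed_attempts, max_failures):
    """Return the client addresses with at least max_failures failed logins."""
    counts = Counter(failed_attempts)
    return {address for address, failures in counts.items() if failures >= max_failures}


# One client guessing 1000 times is easy to spot ...
single_attacker = ["198.51.100.7"] * 1000
# ... but 1000 distinct bots making one guess each never reach the threshold.
botnet = ["203.0.%d.%d" % (i // 250, i % 250 + 1) for i in range(1000)]

print(addresses_to_block(single_attacker, max_failures=5))  # {'198.51.100.7'}
print(len(addresses_to_block(botnet, max_failures=5)))      # 0
```

Both lists contain the same number of password guesses; only the first one trips the per-address limit.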

These angles of attack can be limited by adding IP-based restrictions to your .htaccess file, allowing connections only from certain addresses. Depending on how you operate, this may be inconvenient, but it also severely limits the types of threats which you would be vulnerable to. You would still be at risk from someone specifically targeting your site, or an infection on part of the network infrastructure itself. Depending on the type of content you are protecting, this may be "good enough". An example of this type of restriction is:

Order deny,allow
Deny from all
Allow from 127.0.0.1

This means, in short, "only allow connections from the local host". Line-by-line, it means:

  • Order deny,allow: defines the order in which rules are processed, with the last match taking precedence.
  • Deny from all: begin by assuming that all clients are denied.
  • Allow from 127.0.0.1: if the client has the IP 127.0.0.1, then it is allowed.
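
The evaluation order described above can be sketched in Python. This is a simplified model, not Apache's code: it only understands "all" and literal addresses or CIDR networks (the real directives also accept hostnames and partial addresses), and the function names are invented for the example.

```python
import ipaddress


def matches(client_ip, rules):
    """True if any rule is 'all' or an address/network containing client_ip."""
    address = ipaddress.ip_address(client_ip)
    return any(
        rule == "all" or address in ipaddress.ip_network(rule)
        for rule in rules
    )


def is_allowed(client_ip, order, deny_from, allow_from):
    denied = matches(client_ip, deny_from)
    allowed = matches(client_ip, allow_from)
    if order == "deny,allow":
        # Deny rules are checked first, then Allow rules can override them.
        # A client matching neither list is allowed.
        return allowed or not denied
    if order == "allow,deny":
        # Allow rules are checked first, then Deny rules can override them.
        # A client matching neither list is denied.
        return allowed and not denied
    raise ValueError("unknown order: %r" % order)


print(is_allowed("127.0.0.1", "deny,allow", ["all"], ["127.0.0.1"]))    # True
print(is_allowed("203.0.113.9", "deny,allow", ["all"], ["127.0.0.1"]))  # False
```

Note that Order, Deny and Allow are the httpd 2.2 directives used in this answer; httpd 2.4 expresses the same restriction with Require directives and keeps the older ones only through mod_access_compat.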

To some extent, IP-based restrictions will also protect you to the point where HTTPS may be considered optional. Attackers / viruses can still see your credentials, but it is harder for them to use those credentials on the page itself. Again, this would not be PCI compliant, and it would not be suitable for "important" information, but there are some situations for which this may be considered "good enough". Be aware that many people re-use credentials across multiple sites, so failing to protect login details is generally considered to be very dangerous to the user, even if the site itself is protected.

Finally, the .htpasswd file itself is a bit of a liability. See the response to "do they need to be outside the public directory?" for more details on that.

Can it be bypassed easily?

No. There is no reason to expect that the server, when properly configured, would ever fail to require login details to access the protected content. While HTTP Basic authentication has its flaws, Apache httpd is very robust and is one of the most thoroughly tested pieces of software in the world. If you tell Apache that HTTP Basic authentication is required to access certain content, it will be required.

If Apache by default does not allow users to access either of the two files directly, do they need to be outside the public directory?

There are a couple of points to this. First, Apache does not default to preventing access to either of these files. Many distributions of Apache httpd include initial configuration which prevents access (using "Deny from all" rules) to, depending on the distribution, .htaccess/.htpasswd files, .ht* files, or .* files. It is very common, but there are plenty of reasons why this may not be the case. You can add a rule yourself to block these files, if they are not already blocked:

<FilesMatch "^\.(htaccess|htpasswd)$">
    Order Allow,Deny
    Deny from all
</FilesMatch>

Secondly, it should be pointed out that, because of the way .htaccess files work, they are processed when the directory they are in is matched. That is to say: .htpasswd may be elsewhere, but .htaccess needs to be in the directory it protects. That said, see the "speed implications" section for a bit more detail.

So, as they can be blocked so easily, why keep .htpasswd outside of the public directory? Because mistakes happen, and the .htpasswd file is a big liability. Even if you're using HTTPS, exposure of your .htpasswd file means that your passwords can be easily cracked via brute-force attacks. These days, consumer-grade GPUs can make millions of password guesses per second. This can make even "strong" passwords fall in comparatively little time. Again, this argument generally only applies to a targeted attack, but the fact remains that if an attacker has your .htpasswd file and wants access to your system, they may be able to get it easily. See Speed Hashing on Coding Horror for a relatively recent (April 2012) overview of the state of things.
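
To show why a leaked file matters, here is a minimal Python sketch of offline guessing. It assumes the entry uses the unsalted {SHA} scheme (what htpasswd -s produces); salted schemes take more work per guess, but the attacker still never has to talk to your server. The function names and the tiny word list are made up for the example.

```python
import base64
import hashlib


def sha_entry(username, password):
    """Build an .htpasswd line in the unsalted {SHA} format."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return "%s:{SHA}%s" % (username, base64.b64encode(digest).decode("ascii"))


def guess_password(entry, candidates):
    """Try each candidate against a leaked entry; no server is involved."""
    username, _, stored = entry.partition(":")
    for candidate in candidates:
        if sha_entry(username, candidate).partition(":")[2] == stored:
            return candidate
    return None


leaked = sha_entry("wpalmer", "testtest")
wordlist = ["password", "letmein", "testtest", "hunter2"]
print(guess_password(leaked, wordlist))  # testtest
```

Rate limiting and IP restrictions on the server do nothing here, because every guess is checked locally against the stolen hash.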

With that in mind, the possibility of accidentally (temporarily) exposing your .htpasswd file makes it worth moving the file somewhere that should never even be looked at when httpd is looking for content to serve. Yes, there are still configuration changes which could expose it if it's "one level up" instead of "in the public directory", but those changes are much less likely to happen accidentally.

Are there any speed implications?

Some.

First off, the use of .htaccess files does slow things down somewhat. More specifically, the AllowOverride All directive causes a lot of potential slow-down. It causes Apache to look for .htaccess files in every directory, and every parent of a directory, that is accessed (up to and including the DocumentRoot). This means querying the filesystem for a file (or updates to the file), for every request. Compared to the alternative of potentially never hitting the filesystem, this is quite a difference.
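
As a sketch of the lookup described above, the following Python lists the .htaccess paths that would be checked for a single request, following this answer's description (from the DocumentRoot down to the requested file's directory). The function name and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def htaccess_candidates(document_root, requested_file):
    """List the .htaccess paths checked for one request, root first."""
    root = PurePosixPath(document_root)
    directory = PurePosixPath(requested_file).parent
    # relative_to raises ValueError if the file is outside the DocumentRoot.
    relative_parts = directory.relative_to(root).parts
    candidates = [root / ".htaccess"]
    current = root
    for part in relative_parts:
        current = current / part
        candidates.append(current / ".htaccess")
    return [str(path) for path in candidates]


for path in htaccess_candidates("/var/www", "/var/www/admin/reports/index.html"):
    print(path)
# /var/www/.htaccess
# /var/www/admin/.htaccess
# /var/www/admin/reports/.htaccess
```

Each of those paths is a filesystem check on every request, whether or not the file exists.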

So, why does .htaccess exist at all? There are many reasons which might make it "worth it":

  • Depending on your server load, you may never notice the difference. Does your server really need to squeeze every last millisecond out of every request? If not, then don't worry about it. As always, don't worry about estimates and projections. Profile your real-world situations and see if it makes a difference.
  • .htaccess can be modified without restarting the server. In fact, this is what makes it so slow: Apache checks for changes or the presence of an .htaccess file on every request, so changes are applied immediately.
  • An error in .htaccess will take down the directory, not the server. This makes it much less of a liability than changing the httpd.conf file.
  • .htaccess can be modified even if you only have write-access to a single directory. This makes it ideal for shared hosting environments. No need to have access to httpd.conf at all, or access to restart the server.
  • .htaccess can keep the rules for access next to the files they are meant to affect. This can make them a lot easier to find, and just keeps things more organised.

Don't want to use .htaccess, even considering all of the above? Any rule that can go in .htaccess can be added directly to httpd.conf or an included file.

What about .htpasswd? That depends on how many users you have. It is file-based, and the bare minimum in terms of implementation. From the docs for httpd 2.2:

Because of the way that Basic authentication is specified, your username and password must be verified every time you request a document from the server. This is even if you're reloading the same page, and for every image on the page (if they come from a protected directory). As you can imagine, this slows things down a little. The amount that it slows things down is proportional to the size of the password file, because it has to open up that file, and go down the list of users until it gets to your name. And it has to do this every time a page is loaded.

A consequence of this is that there's a practical limit to how many users you can put in one password file. This limit will vary depending on the performance of your particular server machine, but you can expect to see slowdowns once you get above a few hundred entries, and may wish to consider a different authentication method at that time.

In short, .htpasswd is slow. If you only have a handful of users who need to authenticate, you'll never notice, but it is yet another consideration.
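
The linear scan the docs describe can be sketched in a few lines of Python. This is only a model of the lookup cost, not Apache's implementation; the function name find_user and the generated entries are made up for the example.

```python
def find_user(lines, username):
    """Scan password-file lines in order; return (hash, lines examined)."""
    examined = 0
    for line in lines:
        examined += 1
        name, separator, hashed = line.rstrip("\n").partition(":")
        if separator and name == username:
            return hashed, examined
    return None, examined


# A fake password file with 500 entries: user0:hash0 ... user499:hash499
entries = ["user%d:hash%d\n" % (i, i) for i in range(500)]

print(find_user(entries, "user0"))    # ('hash0', 1)
print(find_user(entries, "user499"))  # ('hash499', 500)
print(find_user(entries, "nobody"))   # (None, 500)
```

The last user in the file, and any unknown user, costs a full pass over the file, and that pass is repeated for every protected request.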

Summary

Securing an admin section with .htpasswd is not ideal for all situations. Given its simplicity, it may be worth the risks and problems where security and performance are not the highest of priorities. For many situations, with a little bit of tweaking, it can be considered to be "good enough". What constitutes "good enough" is a judgement call for you to make.