Please, please, please, learn about injection attacks!

Thursday, August 19, 2004

ASP.NET HTML Injection PHP Security SQL

I answer a lot of posts on the forums of the ASP.NET site. And more often than I would like to, I answer a different question than the one the poster asked, because I happened to easily spot a potential injection attack in the posted code.

Now, what is an injection attack? If you don't know and you're a web developer, you're in trouble. Read on.

There are two main types of injection attacks, and both use the same vector of penetration: unvalidated user input.

Rule #1: User input is EVIL (pronounced eyveel, like the doctor of the same name) and should never be trusted. Validate all user input. In the case of a web application, user input is form fields, headers, cookies, query strings, or anything that was input or sent by users (that may include some database data, or other sometimes more exotic input like mail or FTP).

The first type of injection attack, and the deadliest for most web sites, is the SQL injection attack. It happens most of the time when the developer injects user input into a SQL query using string concatenation. For example:

SqlCommand cmd = new SqlCommand(
"SELECT ID, FullName FROM User WHERE Login='"
+ Login.Text
+ "' AND Password='"
+ Password.Text
+ "'");

This is C#, but I'm sure our VB programmer friends will get the idea (C#'s + is VB's &). This code is simply frightening, but I've seen it or a variation on it so often I just can't count. OK, why is it frightening? Well, try entering this string into both the login and password textboxes:

' OR ''='

There, you're authenticated! What happens is simple. Instead of being the simple text that you expected, the user input some evil text that contains the string delimiter and some SQL code that you're very generously executing.
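To see why, it helps to look at the text the database actually receives. Here is a minimal sketch, assuming a hypothetical BuildLoginQuery helper that does the same concatenation as the code above (the class and method names are made up for illustration):

```csharp
using System;

public static class InjectionDemo
{
    // Hypothetical helper: same concatenation as the vulnerable example above.
    public static string BuildLoginQuery(string login, string password)
    {
        return "SELECT ID, FullName FROM User WHERE Login='"
            + login
            + "' AND Password='"
            + password
            + "'";
    }

    public static void Main()
    {
        string evil = "' OR ''='";
        // Prints the query text that would be sent to the database:
        // SELECT ID, FullName FROM User WHERE Login='' OR ''='' AND Password='' OR ''=''
        Console.WriteLine(BuildLoginQuery(evil, evil));
    }
}
```

AND binds tighter than OR, so the WHERE clause ends with OR ''='', which is always true. Every row matches, whatever the real login and password are.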

Of course, this is not the worst that could happen. Any SQL command could be executed, especially if you've been careless enough not to restrict the rights of the ASP.NET user on your database. For example, the user could very well steal all the information in your database, completely obliterate it or even take complete control of the server. This leads us to rule #2, our first countermeasure:

Rule #2: Secure your database: don't use the sa user to connect to the database from your application, have a strong password on the sa user (and on any user), preferably use integrated authentication to keep all secrets out of your connection string and config file, and restrict the authorizations on your database objects to what's absolutely necessary (don't give writing rights to internet users except on log tables or forum tables, for example). This way, even if you accidentally write injectable code, the database will refuse to execute anything harmful beyond information disclosure (which could still be pretty bad). Please note that the above injection example will still work even if the database is secured.

So it may seem at first that the quotes are the usual suspects in this case. Actually, if you use this kind of code, you may have noticed a few glitches, for example when people have legitimate quotes in their names. So what many people have been doing for a long time is to double the quotes in the input strings, something like Login.Text.Replace("'", "''"), or replace them with another harmless character (you can usually recognize these sites because they use the ` character instead of quotes). This gives a false sense of security, which is sometimes worse than no security at all. Consider this query:

"SELECT FullName FROM User WHERE ID=" + Identity.Text

Here, there is no need for quotes to inject code: all you need is spaces, letters and digits. Enter 0 DELETE FROM User into the textbox, and there goes your User table. And I'm sure a creative enough hacker could come up with wicked injections that don't even need spaces. Escape sequences in particular are a usual way to pass characters that were thought to be invalid in many applications (including, yes, Microsoft products whose names have two letters, begin with I and end with E). This leads us to the third rule:

Rule #3: Black lists are always incomplete (because hackers are many and potentially smarter than you). If you have to, rely on white lists, but never black lists.

A black list is a list of all characters you consider evil (like the quote). There will always be something missing in it. Consider this a fact: of course a black list can happen to be complete, but you should act as if it never is.

A white list is a list of authorized characters. What's great about it is that you know precisely what's permitted (what's in the list) and what's forbidden (everything else). In the last example, restricting Identity.Text to numeric characters is enough to secure the query.
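As a rough illustration of such a white list, here is a minimal sketch. The IsNumericId helper is hypothetical, and it deliberately compares against '0' to '9' because Char.IsDigit also accepts non-ASCII Unicode digits:

```csharp
using System;

public static class IdWhiteList
{
    // Hypothetical helper: accepts only non-empty strings made of ASCII digits 0-9.
    public static bool IsNumericId(string input)
    {
        if (string.IsNullOrEmpty(input)) return false;
        foreach (char c in input)
        {
            if (c < '0' || c > '9') return false;
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(IsNumericId("42"));                 // True
        Console.WriteLine(IsNumericId("0 DELETE FROM User")); // False
    }
}
```

A string that passes this check can still be too long to fit in an int, so the value should still be parsed before it is used as a number.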

And now for the good news. While it is useful to know all this about SQL injections, the .NET framework (like all modern development frameworks, Java for example) provides an excellent way to prevent injections: parametrized queries. Parametrized queries are safer, cleaner and make your code easier to read. Here are the two previous examples, rewritten as parametrized queries:

SqlCommand cmd = new SqlCommand("SELECT ID, FullName FROM User WHERE Login=@Login AND Password=@Password");
cmd.Parameters.Add("@Login", SqlDbType.NVarChar, 50).Value = Login.Text;
cmd.Parameters.Add("@Password", SqlDbType.NVarChar, 50).Value = Password.Text;

SqlCommand cmd = new SqlCommand("SELECT FullName FROM User WHERE ID=@ID");
cmd.Parameters.Add("@ID", SqlDbType.Int).Value = int.Parse(Identity.Text);

This way, there is no need to escape any characters because the parameter values are directly communicated in a strongly typed manner to the database.

N.B. In the second example, you may also want to use validator controls on the Identity TextBox, and check the validity of the page server-side before you build and execute the SQL query using Page.IsValid.

Rule #4: Use parametrized queries whenever possible.

Whenever possible? Does that mean that it's not always possible? Well, here's a little problem I got from the ASP.NET forums: you have a list of checkboxes on a page that have numeric identifiers as their values. Let's say that you must extract all the database rows that have the checked values. You'd want to write something like this (pseudo-code here):

SqlCommand cmd = new SqlCommand("SELECT FullName FROM User WHERE ID IN(@IdArray)");
cmd.Parameters.Add("@IdArray", SqlDbType.IntArray).Value = Convert.ChangeType(Request.Form["Identities"], typeof(int[]));

Unfortunately, there is no such thing as an array type in SQL. So in this case, unless someone comes up with something better, you have to rely on concatenation:

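string idList = Request.Form["Identities"];
if (IntListValidate(idList)) {
    SqlCommand cmd = new SqlCommand("SELECT FullName FROM User WHERE ID IN(" + idList + ")");
}
else {
    throw new InvalidDataException("The posted data contains illegal characters.");
}
...
private bool IntListValidate(string input) {
    // Reject missing or empty input: "IN()" is not valid SQL anyway.
    if (String.IsNullOrEmpty(input)) return false;
    for (int i = 0; i < input.Length; i++) {
        // White list: ASCII digits and commas only.
        // (Char.IsDigit would also let non-ASCII Unicode digits through.)
        char c = input[i];
        if ((c < '0' || c > '9') && c != ',') return false;
    }
    return true;
}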

Of course, here, you have to use a white list: digits and commas in this case. Not even space is allowed. That's pretty safe, but I wish you could do that with parameters.

Update 8/19/2004: Kyle Heon just pointed me to this great SQL article that explains just how to do that with a parameter. Thanks for the link, Kyle! So now, there's one less reason not to use parameters everywhere.
http://msdn.microsoft.com/en-us/library/aa496058.aspx
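
The following is a minimal sketch of another common workaround, which is not necessarily the technique described in the linked article: generate one parameter per value, so that only parameter names are ever concatenated into the SQL text. The BuildInClause helper and the @Id0, @Id1... naming are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class InListBuilder
{
    // Hypothetical helper: turns "3,7,12" into a query that uses one parameter per value.
    // The parsed values are appended to 'parameters' as (name, value) pairs.
    // int.Parse throws FormatException or OverflowException on anything that is not an integer.
    public static string BuildInClause(string postedIds, List<KeyValuePair<string, int>> parameters)
    {
        string[] parts = postedIds.Split(',');
        string[] names = new string[parts.Length];
        for (int i = 0; i < parts.Length; i++)
        {
            names[i] = "@Id" + i;
            int value = int.Parse(parts[i], CultureInfo.InvariantCulture);
            parameters.Add(new KeyValuePair<string, int>(names[i], value));
        }
        // Only generated parameter names end up in the SQL text, never user input.
        return "SELECT FullName FROM User WHERE ID IN(" + string.Join(",", names) + ")";
    }

    public static void Main()
    {
        var parameters = new List<KeyValuePair<string, int>>();
        // Prints: SELECT FullName FROM User WHERE ID IN(@Id0,@Id1,@Id2)
        Console.WriteLine(BuildInClause("3,7,12", parameters));
    }
}
```

Each pair would then be added to the command the same way as in the earlier examples, with cmd.Parameters.Add(name, SqlDbType.Int).Value = value.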

The second common type of injection attack is Cross-Site Scripting, or XSS. Consider this simple piece of an ASP page:

Bonjour, <%= Request.Form("Name") %>.

What if the user enters <script>alert("5tUp1dH4k3R ownz you");</script> in the Name form field? Well, he successfully displayed an alert on his browser using your page. Nothing to be afraid about for the moment, as he'll be the only one to see it. But what if instead of directly displaying the user input we store it in a database for other users to see, in a forum application, for example? What if the parameter is passed in the querystring parameters of a hyperlink in a mail that seems to come from your bank?

Well, then a lot of nasty things can happen (by the way, these are real scenarios, stuff that happened and continues to happen every day). For example, the attacker can inject script that will post all your cookies or some confidential information that's displayed on the page to some remote site that he owns. This can include your social security number, your authentication cookies, your credit card number, or any sensitive information that may be displayed on the page.

It is usually relatively harmless for many sites because they just don't have any information that's confidential or that could allow for other, more dangerous attacks.

Once again, ASP.NET gives a first line of defense out of the box. Since v1.1, all form and querystring data is validated on the server before the page is executed, so the above example does not work on ASP.NET 1.1: it just throws an exception. Be aware, though, that this feature is sometimes deactivated by controls such as rich text editors.

The second thing we're doing, which is what you should do in your own pages and controls, is to HtmlEncode any property that comes from user input. That includes the value of an input tag that's rendered from a TextBox. It protects this particular textbox from script injections and also makes it robust against legitimate characters in the contents, such as quotes.

Rule #5: Encode all rendered properties that come from user input when rendering them.

The above example would then become:

Bonjour, <%= Server.HtmlEncode(Request.Form("Name")) %>.
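
To see what the encoding does to the malicious input, here is a minimal sketch that can run outside a web page. It assumes System.Net.WebUtility.HtmlEncode (available since .NET 4.0) as a stand-in for Server.HtmlEncode, and the Greet helper is hypothetical:

```csharp
using System;
using System.Net;

public static class EncodeDemo
{
    // Hypothetical helper mirroring the "Bonjour" page above.
    public static string Greet(string name)
    {
        return "Bonjour, " + WebUtility.HtmlEncode(name) + ".";
    }

    public static void Main()
    {
        // Prints: Bonjour, &lt;script&gt;alert(1);&lt;/script&gt;.
        Console.WriteLine(Greet("<script>alert(1);</script>"));
    }
}
```

The browser now displays the script as text instead of executing it.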

There's another often overlooked rule:

Rule #6: Don't display any secrets in error messages.

By default, ASP.NET only shows complete error messages to requests coming from the local machine. A complete error message sent to any machine can reveal table names or other secrets that could give an attacker clues. And usually, an error message gives an indication as to how to make the application fail, an attack that can then be repeated and refined using all the information the error message contains.

And of course, probably the most important rule:

Rule #7: Encrypt any page that contains sensitive data.

Of course, these rules are all important and the order in which they are presented here is irrelevant. Did I forget something?

If you need more information, here's some more reading:

On Sql Injections: http://www.governmentsecurity.org/articles/SQLInjectionModesofAttackDefenceandWhyItMatters.php

On Cross-Site Scripting: http://www.net-security.org/dl/articles/xss_anatomy.pdf

And of course, Google is your friend.

UPDATE: I've used the word "quote" in this article for both the apostrophe (or single quote) and the double quote. Todd pointed out in the comments that this was somewhat ambiguous. The point is that anything that can be used as a string delimiter, or as a delimiter in general, should be considered suspicious. Double quotes are more frequent, but some languages such as JavaScript use both single and double quotes. SQL uses single quotes. Bottom line: beware of delimiters, and remember you may not even know the list of possible delimiters.

UPDATE: we just released new tools that aim at helping developers scan their code for potential injection attacks.
http://www.microsoft.com/technet/security/advisory/954462.mspx

"Writing Secure Code" has a lot to say about all of this, and should be a must-read book for all developers.

George V. Reilly - Thursday, August 19, 2004 6:54:00 AM

Great article, thanks.

Mike Griggs - Thursday, August 19, 2004 10:21:00 AM

Excellent write up.

John S. - Thursday, August 19, 2004 2:57:00 PM

Jerry, this is simply wrong. If you had just tried this: <asp:textbox runat=server Text="<script>alert('blah');</script>"/>, this is the rendered html you would have got: <input name="ctl00" type="text" value="&lt;script>alert('blah');&lt;/script>" />

We do not encode literal controls because they are... literal.

Bertrand Le Roy - Thursday, August 19, 2004 6:03:00 PM

Thanks for the article!

JP.Boomtown | spIder - Thursday, August 26, 2004 6:30:00 AM

Not sure why you think this should be a standard, but you can find a lot of such regular expressions on sites such as regexplib.com.

The problem is that there would be about as many expressions as there are use cases. If you try to do something too universal, there is a serious risk to have too many false positives and to block legitimate data. The white list is often very context sensitive. I've showed an example on integer fields, which is one of the simplest cases, but it can get a lot more complicated than that.

The point is that to fight against SQL injections, the best you can do is always use parametrized queries. This way, you never even need white lists.

The cases where you can't use parametrized queries are very few.

Bertrand Le Roy - Thursday, September 9, 2004 6:16:00 PM

So what is the best way to reject scripts in input? anyone please...

jigar - Monday, October 8, 2007 12:46:46 AM

Jigar: do you mean in an html input tag? Then use an asp:textbox, or just html-encode the value when rendering it.
If you mean in input data in general, the trick is to always use a technique that makes it impossible in the current context to inject code. That means parameters  in SQL instead of concatenation, HTML-encoding when rendering HTML, etc. Finally, validate input through a white list.

Bertrand Le Roy - Monday, October 8, 2007 4:55:35 AM

hello
i am like hack login admin panel in asp.net web applection !

df - Thursday, February 14, 2008 6:59:10 PM

One thing I see all over the place in programming circles that people mess up all the time is quotes and apostrophes.

A quote is a "

An apostrophe is a '

There is no such thing as a double quote unless it's two quotation marks back to back like ""

If we all help educate people, then perhaps they will start to understand.

Todd - Wednesday, June 18, 2008 3:39:02 PM

@Todd: thanks for the tip, and I apologize for my English which may sometimes be imprecise as I'm a non-native English speaker. Still, I did some research and here's the definition I found:
"quo·ta·tion mark (plural quo·ta·tion marks) noun
Definition:
punctuation identifying quotation: either of a pair of punctuation marks, either in double (" ") or single (' ') form, used around direct speech, quotations, and titles, or to give special emphasis to a word or phrase"
http://encarta.msn.com/dictionary_1861698453/quotation_mark.html
Still, I'll update the post to make this clearer as this definition doesn't seem to be universally adopted and yours is less ambiguous.

Bertrand Le Roy - Wednesday, June 18, 2008 4:51:56 PM

Bertrand,

the link you highlighted from Kyle Heon no longer seems to work, I just get a page with a load of Chinese characters.

Please can you check or advise on what solution the page revealed?

Thanks

Dan - Friday, June 27, 2008 1:48:06 PM

@Dan: I updated the link. Thanks for the heads up.

Bertrand Le Roy - Friday, June 27, 2008 7:21:58 PM

Did you know you have two "Rule #4"'s? :)

Good writeup, stuff every beginning programmer should know but most never learn.

David Nelson - Tuesday, December 16, 2008 3:45:31 PM

using htmlencode/decode works well i have been using it awhile. it is a bit of a pain to have to encode all user input but it works.

i never really thought about the error messages being a weakness but it makes perfect sense, it really would reveal database information.

the db connection thing in webconfig i can't seem to get around not having to put user & password information in it.

any ideas ?

bahamas4ever - Wednesday, January 7, 2009 3:56:38 AM

@bahamas4ever: you should use integrated security to avoid sticking passwords in config http://msdn.microsoft.com/en-us/library/ms254500.aspx

Bertrand Le Roy - Thursday, January 8, 2009 12:37:11 AM

16 Comments