Tales from the Evil Empire

Bertrand Le Roy's blog

August 2004 - Posts

Yield and generics rock!
In this post, I'll try to show you why you should not limit your use of generics to strongly typed lists, and why yield may become your favorite new keyword.
Here's an example of how you can use both in a very simple piece of code.
Imagine you want to filter an enumeration using an arbitrary function. For obvious performance reasons, you don't want to create a temporary list and filter that. You have to construct the filtered enumeration on the fly as the original enumeration is enumerated.
Of course, you can do that with .NET 1.1, but if you try it, you'll see that the code will be considerably more verbose. Here's how it may be done with .NET 2.0:
 

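using System;
using System.Collections;
using System.Collections.Generic;

// Wraps an enumeration and lazily yields only the items accepted by the filter.
public sealed class FilteredEnumerable<T> : IEnumerable<T>, IEnumerable {

  private IEnumerable<T> _enumerable;
  private Predicate<T> _filter;

  public FilteredEnumerable(IEnumerable<T> enumerable, Predicate<T> filter) {
    _enumerable = enumerable;
    _filter = filter;
  }

  // Generic enumerator: the filtering happens here, one item at a time.
  IEnumerator<T> IEnumerable<T>.GetEnumerator() {
    foreach (T item in _enumerable) {
      // A null filter lets every item through.
      if (_filter == null || _filter(item)) {
        yield return item;
      }
    }
  }

  // Non-generic enumerator: delegates to the generic one.
  IEnumerator IEnumerable.GetEnumerator() {
    return ((IEnumerable<T>)this).GetEnumerator();
  }
}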

The constructor takes two generic arguments: an IEnumerable<T> and a Predicate<T>. A Predicate<T> is a boolean function that takes an argument of type T. That will be our filtering function. The actual work happens in the generic GetEnumerator function, which enumerates the original enumeration, tests whether the predicate returns true for each item, and yields the item only in that case.
You can use this class this way:
 

string[] stringsToFilter = new string[] {"Red", "Green", "Blue", "Pink"};
Predicate<string> filter = delegate(string stringToFilter) {
  return (stringToFilter.IndexOf('e') != -1);
};
IEnumerable<string> filteredStrings = new FilteredEnumerable<string>(stringsToFilter, filter);

The filteredStrings enumerable then contains all strings from stringsToFilter that contain the letter "e": Red, Green and Blue, but not Pink. Pretty slick, eh? Note the use of an anonymous method here to construct the filtering predicate. Now, you've got a generic class that filters anything by any function.
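The same lazy filtering can also be sketched as a single generic iterator method, since yield works in methods that return IEnumerable<T> as well. The names EnumerableFilter and Filter are hypothetical and do not come from the post; this is only one possible shape for the idea, not a replacement for the class above.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerableFilter {
  // Hypothetical helper (not from the original post): lazily yields the
  // items of source for which filter returns true. Nothing is enumerated
  // until the caller starts iterating over the result.
  public static IEnumerable<T> Filter<T>(IEnumerable<T> source, Predicate<T> filter) {
    foreach (T item in source) {
      // A null filter lets every item through, as in FilteredEnumerable<T>.
      if (filter == null || filter(item)) {
        yield return item;
      }
    }
  }
}
```

Calling EnumerableFilter.Filter(stringsToFilter, filter) with the array and predicate from the usage example would enumerate the same three strings. No temporary list is built in either version.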
Please, please, please, learn about injection attacks!
I answer a lot of posts on the forums of the ASP.NET site. And more often than I would like to, I answer a different question than the one the poster asked, because I happened to easily spot a potential injection attack in the posted code.
 
Now, what is an injection attack? If you don't know and you're a web developer, you're in trouble. Read on.
 
There are mainly two types of injection attacks, but both use the same vector of penetration: unvalidated user input.
 
Rule #1: User input is EVIL (pronounced eyveel, like the doctor of the same name) and should never be trusted. Validate all user input. In the case of a web application, user input is form fields, headers, cookies, query strings, or anything that was input or sent by users (that may include some database data, or other sometimes more exotic input like mail or ftp).
 
The first type of injection attack, and the most deadly for most web sites, is the SQL injection attack. It happens most of the time when the developer injects user input into a SQL query using string concatenation. For example:

SqlCommand cmd = new SqlCommand(
  "SELECT ID, FullName FROM User WHERE Login='"
  + Login.Text
  + "' AND Password='"
  + Password.Text
  + "'");
 
This is C#, but I'm sure our VB programmer friends will get the idea (in VB, + would be &). This code is simply frightening, but I've seen it or a variation on it more often than I can count. OK, why is it frightening? Well, try entering this string into both the login and password textboxes:
' OR ''='
 
There, you're authenticated! What happens is simple. Instead of being the simple text that you expected, the user input some evil text that contains the string delimiter and some SQL code that you're very generously executing.
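To see what the database actually receives, here is a minimal sketch that builds the query text the same way the vulnerable code does, but from plain strings instead of textboxes. The class and method names are hypothetical and only there for illustration.

```csharp
public static class InjectionDemo {
  // Hypothetical helper: concatenates user input into the SQL text exactly
  // like the vulnerable example, so the resulting query can be inspected.
  public static string BuildLoginQuery(string login, string password) {
    return "SELECT ID, FullName FROM User WHERE Login='" + login
         + "' AND Password='" + password + "'";
  }
}
```

With ' OR ''=' passed as both arguments, the method returns SELECT ID, FullName FROM User WHERE Login='' OR ''='' AND Password='' OR ''=''. The final ''='' comparison is always true, so the WHERE clause matches every row regardless of the real login and password.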
 
Of course, this is not the worst that could happen. Any SQL command could be executed, especially if you've been careless enough not to restrict the rights of the ASP.NET user on your database. For example, the user could very well steal all the information in your database, completely obliterate it or even take complete control of the server. This leads us to rule #2, our first countermeasure:
 
Rule #2: Secure your database: don't use the sa user to connect to the database from your application, have a strong password on the sa user (and on any user), preferably use integrated authentication to keep all secrets out of your connection string and config file, and restrict the authorizations on your database objects to what's absolutely necessary (don't give writing rights to internet users except on log tables or forum tables, for example). This way, even if you accidentally write injectable code, the database will refuse to execute anything harmful beyond information disclosure (which could still be pretty bad). Please note that the above injection example will still work even if the database is secured.
 
So it may seem at first that the quotes are the usual suspects in this case. Actually, if you use this kind of code, you may have noticed a few glitches, for example when people have legitimate quotes in their names. So what many people have been doing for a long time is to double the quotes in the input strings, something like Login.Text.Replace("'", "''"), or replace them with another harmless character (you can usually recognize these sites because they use the ` character instead of quotes). This gives a false sense of security, which is sometimes worse than no security at all. Consider this request:

"SELECT FullName FROM User WHERE ID=" + Identity.Text
 
Here, there is no need for quotes to inject code: all you need is spaces and letters. Enter 0 DELETE FROM User into the textbox, and there goes your User table. And I'm sure a hacker creative enough could come up with wicked injections that don't even need spaces. Escape sequences in particular are a usual way to pass characters that were thought to be invalid in many applications (including, yes, Microsoft products whose names have two letters, begin with I and end with E). This leads us to the third rule:
 
Rule #3: Black lists are always incomplete (because hackers are many and potentially smarter than you). If you have to, rely on white lists, but never black lists.
 
A black list is a list of all characters you consider evil (like the quote). There will always be something missing in it. Treat this as a fact: even though a black list could in principle be complete, you should act as if it never is.
 
A white list is a list of authorized characters. What's great about it is that you know precisely what's permitted (what's in the list) and what's forbidden (everything else). In the last example, restricting Identity.Text to numeric characters is enough to secure the request.
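As a rough sketch of such a white list for the numeric identifier, the check below accepts only non-empty strings made of the ASCII digits 0 to 9. The class and method names are hypothetical. The explicit range comparison is a deliberate choice here, on the assumption that only ASCII digits should pass: Char.IsDigit also returns true for non-ASCII Unicode digits.

```csharp
public static class WhiteList {
  // Hypothetical helper: returns true only when input is a non-empty
  // string made exclusively of the ASCII digits '0' through '9'.
  public static bool IsAsciiDigits(string input) {
    if (string.IsNullOrEmpty(input)) return false;
    foreach (char c in input) {
      // Anything outside the white list rejects the whole input.
      if (c < '0' || c > '9') return false;
    }
    return true;
  }
}
```

WhiteList.IsAsciiDigits("42") passes, while the injection string 0 DELETE FROM User is rejected because of its spaces and letters.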
 
And now for the good news. While it is useful to know all this about SQL injections, the .NET framework (like all modern development frameworks, Java included) provides an excellent way to prevent injections: parametrized queries. Parametrized queries are safer, cleaner and make your code easier to read. Here are the two previous examples, rewritten as parametrized queries:

 

SqlCommand cmd = new SqlCommand("SELECT ID, FullName FROM User WHERE Login=@Login AND Password=@Password");
cmd.Parameters.Add("@Login", SqlDbType.NVarChar, 50).Value = Login.Text;
cmd.Parameters.Add("@Password", SqlDbType.NVarChar, 50).Value = Password.Text;

SqlCommand cmd = new SqlCommand("SELECT FullName FROM User WHERE ID=@ID");
cmd.Parameters.Add("@ID", SqlDbType.Int).Value = int.Parse(Identity.Text);

This way, there is no need to escape any characters because the parameter values are directly communicated in a strongly typed manner to the database.
 
N.B. In the second example, you may also want to use validator controls on the Identity TextBox, and check the validity of the page server-side before you build and execute the SQL query using Page.IsValid.
 
Rule #4: Use parametrized queries whenever possible.
 
Whenever possible? Does that mean that it's not always possible? Well, here's a little problem I got from the ASP.NET forums: you have a list of checkboxes on a page that have numeric identifiers as their values. Let's say that you must extract all the database rows that have the checked values. You'd want to write something like that (pseudo-code here):

 

// Pseudo-code: SqlDbType.IntArray does not exist.
SqlCommand cmd = new SqlCommand("SELECT FullName FROM User WHERE ID IN(@IdArray)");
cmd.Parameters.Add("@IdArray", SqlDbType.IntArray).Value = Convert.ChangeType(Request.Form["Identities"], typeof(int[]));

Unfortunately, there is no such thing as an array type in SQL. So in this case, unless someone comes up with something better, you have to rely on concatenation:

 

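string idList = Request.Form["Identities"];
if (IntListValidate(idList)) {
  SqlCommand cmd = new SqlCommand("SELECT FullName FROM User WHERE ID IN(" + idList + ")");
}
else {
  throw new InvalidDataException("The posted data contains illegal characters.");
}
...
// White list: only the ASCII digits 0-9 and the comma are accepted.
private bool IntListValidate(string input) {
  // A missing form field comes back as null and is rejected.
  if (input == null) return false;
  for (int i = 0; i < input.Length; i++) {
    char c = input[i];
    // Char.IsDigit would also accept non-ASCII Unicode digits.
    if ((c < '0' || c > '9') && c != ',') return false;
  }
  return true;
}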

Of course, here, you have to use a white list, digits and comma in this case. Not even space is allowed. That's pretty safe, but I wish you could do that with parameters.

Update 8/19/2004: Kyle Heon just pointed me to this great SQL article that explains just how to do that with a parameter. Thanks for the link, Kyle! So now, there's one less reason not to use parameters everywhere.
http://msdn.microsoft.com/en-us/library/aa496058.aspx
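Another common way to keep the checkbox scenario parametrized, which is not necessarily the technique the linked article describes, is to generate one parameter per value. The sketch below only builds the SQL text and collects the name/value pairs; InListBuilder and the @Id0, @Id1... naming are hypothetical choices made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class InListBuilder {
  // Hypothetical helper: parses a comma-separated list such as "3,7,12",
  // returns SQL text with one placeholder per value, and fills parameters
  // with the matching name/value pairs.
  public static string Build(string idList, IDictionary<string, int> parameters) {
    StringBuilder sql = new StringBuilder("SELECT FullName FROM User WHERE ID IN(");
    string[] parts = idList.Split(',');
    for (int i = 0; i < parts.Length; i++) {
      int id;
      // Anything that is not an integer is rejected before it gets near the SQL.
      if (!int.TryParse(parts[i], out id)) {
        throw new FormatException("The posted data contains illegal characters.");
      }
      string name = "@Id" + i;
      if (i > 0) sql.Append(',');
      sql.Append(name);
      parameters.Add(name, id);
    }
    sql.Append(')');
    return sql.ToString();
  }
}
```

For the input 3,7,12 this produces SELECT FullName FROM User WHERE ID IN(@Id0,@Id1,@Id2), and each pair can then be added to the command with cmd.Parameters.Add(name, SqlDbType.Int).Value = value. The user input never becomes part of the SQL text, only the generated placeholder names do.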

The second common type of injection attack is Cross-Site Scripting, or X-Site Scripting. Consider this simple piece of an ASP page:

Bonjour, <%= Request.Form("Name") %>.

What if the user enters <script>alert("5tUp1dH4k3R ownz you");</script> in the Name form field? Well, he successfully displayed an alert in his browser using your page. Nothing to be afraid of for the moment, as he'll be the only one to see it. But what if, instead of directly displaying the user input, we store it in a database for other users to see, in a forum application, for example? What if the parameter is passed in the querystring parameters of a hyperlink in an e-mail that seems to come from your bank?

Well, then a lot of nasty things can happen (by the way, these are real scenarios, stuff that happened and continues to happen every day). For example, the attacker can inject script that will post all your cookies or some confidential information that's displayed on the page to some remote site that he owns. This can include your social security number, your authentication cookies, your credit card number, or any sensitive information that may be displayed on the page.

It is usually relatively harmless for many sites because they just don't have any information that's confidential or that could allow for other, more dangerous attacks.

Once again, ASP.NET gives a first line of defense out of the box. Since v1.1, all form and querystring data is validated on the server before the page is executed. So the above example does not work on ASP.NET 1.1: it just throws an exception. Note that this feature is sometimes deactivated by controls such as rich text editors.

The second thing we're doing, which is what you should do in your own pages and controls, is to HtmlEncode any property that comes from user input. That includes the value of an input tag that's rendered from a TextBox. It protects this particular textbox from script injections and also makes it robust against legitimate characters in the contents, such as quotes.

Rule #5: Encode all rendered properties that come from user input when rendering them.

The above example would then become:

Bonjour, <%= Server.HtmlEncode(Request.Form("Name")) %>.
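Outside of a page, the effect of the encoding can be sketched with System.Net.WebUtility.HtmlEncode, used here as a stand-in for Server.HtmlEncode on the assumption that a recent version of .NET is available. The Greeting class is hypothetical.

```csharp
using System.Net;

public static class Greeting {
  // Hypothetical helper: renders the greeting with the user-supplied name
  // HTML-encoded, so markup in the name is displayed instead of executed.
  public static string Render(string name) {
    return "Bonjour, " + WebUtility.HtmlEncode(name) + ".";
  }
}
```

Greeting.Render("<script>alert(1)</script>") returns Bonjour, &lt;script&gt;alert(1)&lt;/script&gt;. so the browser shows the text of the tag rather than running the script.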

There's another often overlooked rule:

Rule #6: Don't display any secrets in error messages.

By default, ASP.NET only shows complete error messages to requests coming from the local machine. A complete error message sent to any machine can reveal table names or other secrets that could give an attacker useful clues. And usually, an error message gives an indication as to how to make the application fail, an attack that can then be repeated and refined using all the information the error message contains.

And of course, probably the most important rule:

Rule #7: Encrypt any page that contains sensitive data.

Of course, these rules are all important and the order in which they are presented here is irrelevant. Did I forget something?

If you need more information, here's some more reading:

On SQL injections: http://www.governmentsecurity.org/articles/SQLInjectionModesofAttackDefenceandWhyItMatters.php

On Cross-Site Scripting: http://www.net-security.org/dl/articles/xss_anatomy.pdf

And of course, Google is your friend.

UPDATE: I've used the word "quote" in this article for both the apostrophe (or single quote) and the double quote. Todd pointed out in the comments that this was somewhat ambiguous. The point is that anything that can be used as a string delimiter, or as a delimiter in general, should be considered suspicious. Double quotes are more frequent, but some languages such as JavaScript use both single and double quotes. SQL uses single quotes. Bottom line: beware of delimiters, and remember you may not even know the list of possible delimiters.

UPDATE: we just released new tools that aim at helping developers scan their code for potential injection attacks.
http://www.microsoft.com/technet/security/advisory/954462.mspx

The ASP.NET 2.0 page lifecycle in detail
In his new and already excellent blog, Léo gives us a great diagram that details the ASP.NET 2.0 page lifecycle.
Needless to say, this is a poster you can now find in nearly all offices here in the ASP.NET team at Microsoft. Thanks for the great work, Léo!
Read it, print it, use it every day!
 
http://blog.rioterdecker.net/blogs/avalonboy/archive/2006/06/24/114.aspx
 
UPDATE: updated the links to the new locations for these resources. The poster would probably need some updating in particular where callbacks are concerned but it's still very useful.
What level of control do you need over the rendered HTML?
I'm answering a post from Dimitri Glazkov here. Dimitri tracked this back to my post about UI reusability. It's probably a good idea to read his post before you go on reading this if you want to understand what this is about.
 
In an architect's ideal dreamworld, I'd say you're absolutely right, Dimitri. In the real world, though, I'd temper this.
After all, that's what server controls are all about: abstracting the HTML rendering and substituting higher-level abstractions for it. The controls are not ethereal entities, and they need to have some level of control over their rendering to actually work. If you want to have complete control over the rendered HTML, the only thing you can do is output it yourself, and you're back to classic ASP (or PHP). So we should probably be somewhere between complete control and pages made of only server controls.
 
I'm sure you're aware of this, but I'll say it anyways for your readers and mine who may not be as advanced as you are in ASP.NET.
 
There are a few things you can do to control the rendering of ASP.NET controls:
- Use CSS (works with any server-side web technology)
- Use styles (and in particular their CssClass property to link them to CSS) (v1)
- Use templates, which give you total control over the HTML that's rendered by some parts of the controls (usually the ones that are the most visual and are not vital for the control to actually work). Templates rule! (v1)
- Know the control set: you get fine-grained control over the rendering just by choosing the right control. For example, DataGrid, DataList and Repeater are similar list controls that give you more and more control over the final rendering. (v1)
- Develop your own controls, from scratch or by inheriting from an existing one. This way, you can override part or all of the rendering code. (v1)
- Use themes and skins to isolate the general presentation of the site. Themes are more or less equivalent to server-side CSS: they act at the same level of abstraction as controls, and let you set any property (hint: even TEMPLATES) of any control, site-wide or based on a skin ID. Themes are very easy to write as they have the same syntax as a page. (v2)
 
About adapters, you're right to mention that there is as yet unfulfilled potential there. But it may lie not in their implementation but in the way they are used. They may be used for something other than just device adaptation. I'll try to blog on that if I have time to experiment with the concept a little more.
 
Your point about the three roles in the designer is a good one and there may be things more or less along these lines in Orcas. But if you look at it as it is currently, we're already kind of there... You have the visual design view, for designers, you have the HTML view, for what you call the prototype, and you have codebehind for the actual plumbing of the page. Yes, the first two actually act on the same thing, but at a different abstraction level.
I do not understand your third role, though: why would theme development be the role of an advanced developer? I would have given this role to the graphics designer. Well, at least, the designer can determine the general look of the page and a developer can transform that into a theme.
Don't redirect after setting a Session variable (or do it right)
A problem I see over and over again on the ASP.NET forums is the following:
In a login page, if the user and password have been validated, the page developer wants to redirect to the default page. To do this, he writes the following code:
Session["Login"] = true;
Response.Redirect("~/default.aspx");
Well, this doesn't work. Can you see why? Yes, it's because of the way Redirect and session variables work.
When you create a new session (that is, the first time you write to a Session variable), ASP.NET sets a volatile cookie on the client that contains the session token. On all subsequent requests, and as long as the server session and the client cookie have not expired, ASP.NET can look at this cookie and find the right session.
Now, what Redirect does is to send a special header to the client so that it asks the server for a different page than the one it was waiting for. Server-side, after sending this header, Redirect ends the response. This is a very violent thing to do. Response.End actually stops the execution of the page wherever it is using a ThreadAbortException.
What happens really here is that the session token gets lost in the battle.
There are a few things you can do to solve this problem.
First, in the case of the forms authentication, we already provide a special redirect method: FormsAuthentication.RedirectFromLoginPage. This method is great because, well, it works, and also because it will return the user to the page he was asking for in the first place, and not always default. This means that the user can bookmark protected pages on the site, among other things.
Another thing you can do is use the overloaded version of Redirect:
Response.Redirect("~/default.aspx", false);
This does not abort the thread and thus conserves the session token. Actually, this overload is used internally by RedirectFromLoginPage. As a matter of fact, I would advise always using this overloaded version over the other, just to avoid the nasty effects of the exception. The single-parameter version is actually here to stay syntactically compatible with classic ASP.
UPDATE: session loss problems can also result from a misconfigured application pool. For example, if the application pool your site is running in is configured as a web farm or a web garden (by setting the maximum number of worker processes to more than one), and if you're not using the session service or SQL sessions, incoming requests will unpredictably go to one of the worker processes, and if it's not the one the session was created on, the session is lost.
The solution to this problem is either not to use a web garden if you don't need the performance boost, or to use one of the out-of-process session providers.
More on web gardens: http://technet2.microsoft.com/WindowsServer/en/library/f38ee1ff-bdd5-4a5d-bef6-b037c77b44101033.mspx?mfr=true
How to configure IIS worker process isolation mode: http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/26d8cee3-ec31-4148-afab-b6e089a0300b.mspx?mfr=true
Thanks to Frédéric Gareau for pointing that out.
UPDATE 2: Another thing that can cause similar problems is if your server has a name that contains underscores. Underscores are not allowed in host names by RFC 952 and may interfere with the ability to set cookies and thus to persist sessions.
UPDATE 3: It appears that some bug fixes to Session have permanently fixed this problem, or at least the one caused by the thread-aborting redirect. Still, it is good practice not to abort the thread (and thus to use the overload with the false parameter).