URL Rewrite – Multiple domains under one site. Part II
I believe I have it …
I’ve been meaning to put together the ultimate outgoing rule for hosting multiple domains under one site. I finally sat down this week, set up a few test cases, and created one rule to rule them all.
In Part I of this two-part series, I covered the incoming rule necessary to host a site in a subfolder of a website, while making it appear as if it’s in the root of the site. Part II won’t work without applying Part I first, so if you haven’t read it, I encourage you to read it now.
However, the incoming rule by itself doesn’t address everything. Here’s the problem …
Let’s say that we host www.site2.com in a subfolder called site2, off of masterdomain.com. This is the same example I used in Part I.
Using an incoming rewrite rule, we are able to make a request to www.site2.com even though the site is really in the /site2 folder.
The gotcha comes with any type of path that ASP.NET generates (I’m sure other scripting technologies could do the same too). ASP.NET thinks that the path to the root of the site is /site2, but the URL is /. See the issue? If ASP.NET generates a path or a redirect for us, it will always add /site2 to the URL. That results in a path that looks something like www.site2.com/site2.
In Part I, I mentioned that you should add a condition where “{PATH_INFO} ‘does not match’ /site2”. That allows www.site2.com/site2 and www.site2.com to both function the same. This allows the site to always work, but if you want to hide /site2 in the URL, you need to take it one step further.
One way to address this is in your code. Ultimately this is the best bet. Ruslan Yakushev has a great article on a few considerations that you can address in code. I recommend giving that serious consideration. Additionally, if you have upgraded to ASP.NET 3.5 SP1 or greater, it takes care of some of the references automatically for you.
However, what if you inherit an existing application? Or you can’t easily go through your existing site and make the code changes? If this applies to you, read on.
That’s where URL Rewrite 2.0 comes in. With URL Rewrite 2.0, you can create an outgoing rule that will remove the /site2 before the page is sent back to the user. This means that you can take an existing application, host it in a subfolder of your site, and ensure that the URL never reveals that it’s in a subfolder.
Performance Considerations
Performance overhead is something to be mindful of. These outbound rules aren’t simply changing the server variables. The first rule I’ll cover below needs to parse the HTML body and pull out the path (i.e. /site2) on the way through. This will add overhead, possibly significant if you have large pages and a busy site. In other words, your mileage may vary and you may need to test to see the impact that these rules have. Don’t worry too much though. For many sites, the performance impact is negligible.
So, how do we do it?
Creating the Outgoing Rule
There are really two things to keep in mind.
First, ASP.NET applications frequently generate URLs that add the /site2 back in. Besides links, these paths can show up in form elements, img elements and the like. The goal is to find all of those situations and rewrite them on the way out. Let’s call this the ‘URL problem’.
Second, and similarly, ASP.NET can send a LOCATION header that redirects the browser to another page. Again, ASP.NET isn’t aware of the different URL and it will add the /site2 to the redirect. Forms authentication is a good example of when this occurs. Try to password protect a site running from a subfolder using forms auth and you’ll quickly find that the URL becomes www.site2.com/site2 again. Let’s term this the ‘redirect problem’.
Solving the URL Problem – Outgoing Rule #1
Let’s create a rule that removes the /site2 from any URL. We want to remove it from relative URLs like /site2/something, or absolute URLs like http://www.site2.com/site2/something. Most URLs that ASP.NET creates will be relative URLs, but I figure that there may be some applications that piece together a full URL, so we might as well expect that situation.
Additionally, we don’t want this to apply to non-code pages like images, binaries, and .axd files.
Let’s get started. First, create a new outbound rule. You can create the rule within the /site2 folder which will reduce the performance impact of the rule. Just a reminder that incoming rules for this situation won’t work in a subfolder … but outgoing rules will.
Give it a name that makes sense to you, for example “Outgoing – URL paths”.
It’s important that you create a precondition so that this only applies to text based pages, like .aspx. Running this on binary images will add needless server overhead and running this rule on some .axd files will cause errors when viewing your site.
Additionally, if you place the rule in the subfolder, it will only run for that site and folder, so there’s no need for an HTTP_HOST precondition; it can run for all requests. If you place it in the root of the site, you may want to create a precondition for HTTP_HOST = ^(www\.)?site2\.com$.
To create the filter for just .aspx pages, select <Create New Precondition…> from the Precondition dropdown. Give it a name like “.aspx pages only” and add a Condition with an input of {SCRIPT_NAME} and Pattern of “\.aspx$”.
Click OK on the Add Condition and Add Precondition dialog boxes until you’re back at the Edit Outbound Rule form.
For the Match section, there are a few things to consider. For performance reasons, it’s best to match the fewest elements that you need to accomplish the task. For my test cases, I just needed to rewrite the <a /> tag, but you may need to rewrite any number of HTML elements. Note that as long as you have the exclude /site2 rule in your incoming rule as I described in Part I, some elements that don’t show their URL (like your images) will work without removing the /site2 from them. That reduces the processing needed for this rule.
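If the rule ends up in the root of the site instead of the subfolder, the host check and the .aspx check can share a single precondition. The following is only a sketch: the precondition name “site2.com .aspx pages” is a made-up label, and the host pattern is the one mentioned above for site2.com, so swap in your own values.

```xml
<preConditions>
  <!-- Hypothetical name; MatchAll means both conditions must be true -->
  <preCondition name="site2.com .aspx pages" logicalGrouping="MatchAll">
    <!-- Only responses served for site2.com or www.site2.com -->
    <add input="{HTTP_HOST}" pattern="^(www\.)?site2\.com$" />
    <!-- Only .aspx pages, to skip images, binaries and .axd handlers -->
    <add input="{SCRIPT_NAME}" pattern="\.aspx$" />
  </preCondition>
</preConditions>
```

The outbound rule would then reference it with preCondition="site2.com .aspx pages". When the rule lives in the /site2 folder, the HTTP_HOST line can be dropped.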
Leave the “matching scope” at “Response” and choose the elements that you want to change.
Set the pattern to “^(?:site2|(.*//[_a-zA-Z0-9-\.]*)?/site2)(.*)”. Make sure to replace ‘site2’ with your subfolder name in both places. Yes, I realize this is a pretty messy looking rule, but it handles a few situations. This rule will handle the following situations correctly:
Original | Rewritten using {R:1}{R:2} |
http://www.site2.com/site2/default.aspx | http://www.site2.com/default.aspx |
http://www.site2.com/folder1/site2/default.aspx | Won’t rewrite since it’s a sub-sub folder |
/site2/default.aspx | /default.aspx |
site2/default.aspx | /default.aspx |
/folder1/site2/default.aspx | Won’t rewrite since it’s a sub-sub folder. |
For the conditions section, you can leave that be.
Finally, for the action, set the Action Type to “Rewrite” and set the Value to “{R:1}{R:2}”. The {R:1} and {R:2} are back references to the sections within capturing parentheses. In other words, in http://domain.com/site2/something, {R:1} will be http://domain.com and {R:2} will be /something. For a relative URL like /site2/something, {R:1} is empty and {R:2} is /something.
If you view your rule from your web.config file (or applicationHost.config if it’s a global rule), it should look like this:
<rule name="Outgoing - URL paths" preCondition=".aspx pages only" enabled="true">
  <match filterByTags="A" pattern="^(?:site2|(.*//[_a-zA-Z0-9-\.]*)?/site2)(.*)" />
  <action type="Rewrite" value="{R:1}{R:2}" />
</rule>
<preConditions>
  <preCondition name=".aspx pages only">
    <add input="{SCRIPT_NAME}" pattern="\.aspx$" />
  </preCondition>
</preConditions>
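If your application also writes the subfolder into form actions or image sources, the same rule can cover more elements. Here is a hedged variation: the choice of A, Form and Img is just an illustration, not something my test cases required, and every extra tag adds parsing work.

```xml
<rule name="Outgoing - URL paths" preCondition=".aspx pages only" enabled="true">
  <!-- filterByTags accepts a comma-separated list of tags to inspect -->
  <match filterByTags="A, Form, Img" pattern="^(?:site2|(.*//[_a-zA-Z0-9-\.]*)?/site2)(.*)" />
  <action type="Rewrite" value="{R:1}{R:2}" />
</rule>
```

Start with the smallest list that fixes the URLs your visitors can actually see, and add tags only when you find a case that needs them.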
Solving the Redirect Problem
Outgoing Rule #2
The second issue that we can run into is with a client-side redirect. This is triggered by a LOCATION response header that is sent to the client. Forms authentication is a common example. To reproduce this, password protect your subfolder and watch how it redirects and adds the subfolder path back in.
Notice in my test case the extra paths:
http://site2.com/site2/login.aspx?ReturnUrl=%2fsite2%2fdefault.aspx
I want to remove /site2 from both the URL and the ReturnUrl querystring value. For semi-readability, let’s do this in 2 separate rules, one for the URL and one for the querystring.
Create a second rule. As with the previous rule, it can be created in the /site2 subfolder. In the URL Rewrite wizard, select Outbound rules –> “Blank Rule”.
Fill in the following information:
Name | response_location URL |
Precondition | Don’t set |
Match: Matching Scope | Server Variable |
Match: Variable Name | RESPONSE_LOCATION |
Match: Pattern | ^(?:site2|(.*//[_a-zA-Z0-9-\.]*)?/site2)(.*) |
Conditions | Don’t set |
Action Type | Rewrite |
Action Properties | {R:1}{R:2} |
It should end up like so:
<rule name="response_location URL">
  <match serverVariable="RESPONSE_LOCATION" pattern="^(?:site2|(.*//[_a-zA-Z0-9-\.]*)?/site2)(.*)" />
  <action type="Rewrite" value="{R:1}{R:2}" />
</rule>
Outgoing Rule #3
Outgoing Rule #2 only takes care of the URL path, and not the querystring path. Let’s create one final rule to take care of the path in the querystring to ensure that ReturnUrl=%2fsite2%2fdefault.aspx gets rewritten to ReturnUrl=%2fdefault.aspx.
The %2f is the URL encoding (percent-encoding) for a forward slash (/).
Create a rule like the previous one, but with the following settings:
Name | response_location querystring |
Precondition | Don’t set |
Match: Matching Scope | Server Variable |
Match: Variable Name | RESPONSE_LOCATION |
Match: Pattern | (.*)%2fsite2(.*) |
Conditions | Don’t set |
Action Type | Rewrite |
Action Properties | {R:1}{R:2} |
The config should look like this:
<rule name="response_location querystring">
  <match serverVariable="RESPONSE_LOCATION" pattern="(.*)%2fsite2(.*)" />
  <action type="Rewrite" value="{R:1}{R:2}" />
</rule>
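To see how the two redirect rules work together, here is a hand trace of the forms auth test case. It assumes the Location header is sent as a site-relative path and that the outbound rules run in the order listed, with each rule seeing the value left by the one before it. The comments are my own annotations, not anything IIS generates.

```xml
<!-- Before:       /site2/login.aspx?ReturnUrl=%2fsite2%2fdefault.aspx -->
<rule name="response_location URL">
  <!-- {R:1} is empty, {R:2} is /login.aspx?ReturnUrl=%2fsite2%2fdefault.aspx -->
  <match serverVariable="RESPONSE_LOCATION" pattern="^(?:site2|(.*//[_a-zA-Z0-9-\.]*)?/site2)(.*)" />
  <action type="Rewrite" value="{R:1}{R:2}" />
</rule>
<!-- After rule 2: /login.aspx?ReturnUrl=%2fsite2%2fdefault.aspx -->
<rule name="response_location querystring">
  <!-- {R:1} is /login.aspx?ReturnUrl= and {R:2} is %2fdefault.aspx -->
  <match serverVariable="RESPONSE_LOCATION" pattern="(.*)%2fsite2(.*)" />
  <action type="Rewrite" value="{R:1}{R:2}" />
</rule>
<!-- After rule 3: /login.aspx?ReturnUrl=%2fdefault.aspx -->
```

Because the first (.*) in rule #3 is greedy, a single pass removes only the last %2fsite2 in the header. That is enough for a ReturnUrl like the one above, which contains it once.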
It’s possible to squeeze the last two rules into one, but it gets kind of confusing so I felt that it’s better to show it as two separate rules.
Summary
With the rules covered in these two parts, we’re able to have a site in a subfolder and make it appear as if it’s in the root of the site. Not only that, we can overcome automatic redirecting that is caused by ASP.NET, other scripting technologies, and especially existing applications.
Following is an example of the incoming and outgoing rules necessary for a site called www.site2.com hosted in a subfolder called /site2. Remember that the outgoing rules can be placed in the /site2 folder instead of in the root of the site.
<rewrite>
  <rules>
    <rule name="site2.com in a subfolder" enabled="true" stopProcessing="true">
      <match url=".*" />
      <conditions logicalGrouping="MatchAll" trackAllCaptures="false">
        <add input="{HTTP_HOST}" pattern="^(www\.)?site2\.com$" />
        <add input="{PATH_INFO}" pattern="^/site2($|/)" negate="true" />
      </conditions>
      <action type="Rewrite" url="/site2/{R:0}" />
    </rule>
  </rules>
  <outboundRules>
    <rule name="Outgoing - URL paths" preCondition=".aspx pages only" enabled="true">
      <match filterByTags="A" pattern="^(?:site2|(.*//[_a-zA-Z0-9-\.]*)?/site2)(.*)" />
      <action type="Rewrite" value="{R:1}{R:2}" />
    </rule>
    <rule name="response_location URL">
      <match serverVariable="RESPONSE_LOCATION" pattern="^(?:site2|(.*//[_a-zA-Z0-9-\.]*)?/site2)(.*)" />
      <action type="Rewrite" value="{R:1}{R:2}" />
    </rule>
    <rule name="response_location querystring">
      <match serverVariable="RESPONSE_LOCATION" pattern="(.*)%2fsite2(.*)" />
      <action type="Rewrite" value="{R:1}{R:2}" />
    </rule>
    <preConditions>
      <preCondition name=".aspx pages only">
        <add input="{SCRIPT_NAME}" pattern="\.aspx$" />
      </preCondition>
    </preConditions>
  </outboundRules>
</rewrite>
If you run into any situations that aren’t caught by these rules, please let me know so I can update this to be as complete as possible.
Happy URL Rewriting!