Scott Van Vliet

Less Talk, More Rock

Search Engine Safe URLs in ASP.NET

I first came across Search Engine Safe (SES) URLs a year or two ago when Erik Voldengen and Bert Dawson created the CF_sesConverter Custom Tag for ColdFusion applications (http://www.fusium.com/index.cfm?fuseaction=home.buildmaster&bodyFuseaction=ses.intro). 

This tag accepted a URL that looked like a reference to a static page, extracted the parameters embedded in its path, and added them to the Query String (URL in ColdFusion) scope.  This made it easier for search engines to spider dynamic Web pages and return them in search results.

Here’s an example of a standard ASP.NET URL and its SES equivalent.

Standard URL:
http://www.opgirlslearntoride.com/home.aspx?tab=371A6304

SES URL:
http://www.opgirlslearntoride.com/home.aspx/tab/371A6304

I always thought it would be cool to be able to do this in ASP.NET.  My first thought, however, was that the Request.QueryString collection is read-only, so you cannot add parameters to it at runtime.

After looking around the Web a bit, I discovered the RewritePath() method of the HttpContext class.  This method rewrites the path of the request prior to page processing, and is actually used by ASP.NET itself to support cookieless session state.  Because the URL is rewritten before the Page methods and events run, this method makes it possible to manipulate the URL Query String at runtime.

Based on this method, an SES URL can be converted into a standard URL using Regular Expressions and some string splitting:

public static string FromSesUrl(string path)
{
    // Group 1 captures the page (e.g. "/home.aspx"); group 2 captures
    // everything after it (e.g. "/tab/371A6304").
    const string sesPattern = "([\\/].[^\\/]*\\.aspx)(.*)";

    Match sesMatch = Regex.Match(path, sesPattern, RegexOptions.IgnoreCase);

    string urlBase = sesMatch.Groups[1].Value;
    string sesUrlParams = sesMatch.Groups[2].Value;
    string urlQueryString = String.Empty;

    if (sesUrlParams.Trim().Length > 0)
    {
        sesUrlParams = sesUrlParams.Replace('\\', '/');
        string[] urlParams = sesUrlParams.Split('/');

        // urlParams[0] is the empty string before the leading slash, so
        // names sit at odd indexes and each value at the index after it.
        for (int idx = 1; idx < urlParams.Length; idx += 2)
        {
            if (urlParams[idx].Trim().Length > 0)
            {
                // "?" before the first parameter actually written, "&" after.
                urlQueryString += (urlQueryString.Length == 0) ? "?" : "&";
                urlQueryString += urlParams[idx];
                urlQueryString += "=";
                urlQueryString += (idx + 1 != urlParams.Length) ?
                    urlParams[idx + 1] : String.Empty;
            }
        }
    }

    // Paths that do not match the pattern are returned unchanged.
    return Regex.Replace(
        path, sesPattern, urlBase + urlQueryString, RegexOptions.IgnoreCase);
}


This method works well and is quite fast.  Using this method, you can enable an ASP.NET application to automatically parse SES URLs and rewrite them to standard URLs.

However, this is only one piece of the puzzle.  Although you can now use SES URLs, it does not make sense to go through your application and replace the existing standard URL references with SES URLs by hand.  So, for this method to be effective, there needed to be a way to automatically convert existing standard URL references to SES URLs.  But how can this be done without rewriting source code?

Again, after looking around the Web, I discovered a pretty cool property of the HttpResponse class – the Filter property.  This Stream instance wraps around the HTTP entity body before transmission to the client.  Utilizing this property, I created a custom filter and added some code that would search through the response body and replace standard URL references with SES URLs.

public override void Write(byte[] buffer, int offset, int count)
{
    string sBuffer = Encoding.Default.GetString(buffer, offset, count);

    MatchCollection hrefMatches = Regex.Matches(
        sBuffer, SesRegexPattern.HrefPattern, RegexOptions.IgnoreCase);

    foreach (Match match in hrefMatches)
    {
        // The second-to-last group holds the attribute value (the URL).
        string originalHref = match.Groups[match.Groups.Count - 2].Value;
        string href = originalHref;

        // Convert references to .aspx pages into their SES form.
        if (Regex.IsMatch(href, SesRegexPattern.AspxPattern))
        {
            href = SesUrlUtil.ToSesUrl(href);
        }

        // URLs that match HttpProtocolPattern are left as they are.
        if (!Regex.IsMatch(href, SesRegexPattern.HttpProtocolPattern))
        {
            // Prefix relative URLs with the part of the current request
            // path captured by CurrentPathPattern.
            if (!Regex.IsMatch(href, SesRegexPattern.AbsolutePathPattern))
            {
                href = Regex.Match(
                    this.Context.Request.Path,
                    SesRegexPattern.CurrentPathPattern).Groups[1].Value + href;
            }

            // Swap the rewritten URL into the tag within the response body.
            sBuffer = sBuffer.Replace(
                match.Value,
                match.Value.Replace(originalHref, href));
        }
    }

    byte[] bufferNew = Encoding.Default.GetBytes(sBuffer);
    this.BaseStream.Write(bufferNew, 0, bufferNew.Length);
}


This method works like a charm, and is also quite fast.
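
The filter above calls SesUrlUtil.ToSesUrl, which is not listed in this post. Purely as an illustration, here is a minimal sketch of what such a conversion might look like. The class name SesUrlSketch is hypothetical, and the logic is an assumption pieced together from the behavior described in this post and its comments (empty values become "+", and HTML-encoded "&amp;" delimiters are accepted); it is not the code that ships in the module, and it ignores details such as "#" fragments.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for SesUrlUtil.ToSesUrl; an illustrative sketch only.
public static class SesUrlSketch
{
    public static string ToSesUrl(string url)
    {
        int queryStart = url.IndexOf('?');
        if (queryStart < 0)
        {
            // No query string, so there is nothing to convert.
            return url;
        }

        StringBuilder sesUrl = new StringBuilder(url.Substring(0, queryStart));

        // Links inside HTML attributes may use "&amp;" as the delimiter.
        string query = url.Substring(queryStart + 1).Replace("&amp;", "&");

        foreach (string pair in query.Split('&'))
        {
            int eq = pair.IndexOf('=');
            string name = (eq < 0) ? pair : pair.Substring(0, eq);
            string value = (eq < 0) ? String.Empty : pair.Substring(eq + 1);

            if (name.Length == 0)
            {
                continue; // skip empty segments such as a trailing "&"
            }

            // An empty value is written as "+" so that the path never
            // contains two consecutive slashes.
            sesUrl.Append('/').Append(name);
            sesUrl.Append('/').Append(value.Length == 0 ? "+" : value);
        }

        return sesUrl.ToString();
    }
}
```

With this sketch, SesUrlSketch.ToSesUrl("/home.aspx?tab=371A6304") returns "/home.aspx/tab/371A6304", and a URL with an empty value such as "/profile.aspx?personId=21&orderid=&userId=5" becomes "/profile.aspx/personId/21/orderid/+/userId/5".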

Finally, in order to tie these pieces of functionality together, I created an HttpModule that handled the BeginRequest event of the HttpApplication class.  This event handler first rewrites the SES URL to its standard URL and then adds the custom filter to the Response object of the current context.

It is also worth noting that although the module rewrites SES URLs, it allows standard URLs to pass through without being altered.  Thus, implementing SES for ASP.NET takes just two easy steps:

1)      Add the Boardworks.Utilities.SearchEngineSafe.dll to the /bin directory of the ASP.NET application in which you wish to implement it

2)      Modify the application’s Web.config file to include the following code (under the <configuration><system.web> node):


<httpModules>
    <add name="SesHttpModule"
         type="Boardworks.Utilities.SearchEngineSafe.SesHttpModule, Boardworks.Utilities.SearchEngineSafe"/>
</httpModules>


To see this module in action, check out the following site:
http://www.opgirlslearntoride.com

If you have any comments, or better ideas for SES URLs in ASP.NET, please drop me a line!  Also, if you would like a copy of this project, please shoot me an email.  If there are enough requests, I will add the Visual Studio .NET Project to this post.

UPDATE:

There's a new version of this module, which includes a small fix to some errant debug code.  NOTE: This link was previously broken, and has been fixed - sorry about that!

http://www.scottvanvliet.com/downloads/ASP_NET_SES_1_0_2.zip

As always, feedback on this would be greatly appreciated!

Comments

brady gaster said:

Without looking too carefully at the code I ask the question - would it be able to handle situations in which you had a QS variable with an empty value - for example...

?personId=21&orderid=&userId=5
# May 4, 2004 8:55 AM

brady gaster said:

and please! post the code!
# May 4, 2004 8:57 AM

Scott Van Vliet said:

Thanks for your comments. Yes, the filter will handle empty values within a standard URL. It will turn the following URL into its SES counterpart below:

Standard URL:
/profile.aspx?personId=21&orderid=&userId=5

SES URL:
/profile.aspx/personId/21/orderid/+/userId/5

Note that the value of the parameter orderid is actually a URL-encoded white space character (+). This is inserted by the filter, and is required for the parameters to be parsed correctly. The reason is that when a URL is passed from the browser to the server, consecutive slashes are condensed into one slash. For example:

SES URL:
/profile.aspx/personId/21/orderid//userId/5

Converted Standard URL:
/profile.aspx?personId=21&orderid=userId&5=

So, in order to compensate for this condensing, the filter will add a UrlEncoded white space character as depicted above.

Please let me know if this answers your question.
# May 4, 2004 10:53 AM

Mario Davalos said:

It looks great. I just tested the URL rewriting code and it works very well.

Can you post more code of the HTML Conversion filter?


Thanks.
# May 6, 2004 11:51 AM

Scott Van Vliet said:

The code listed in this post contains almost all of the code from the Response.Filter. If you would like to see how this all fits together, send me an email and I will get you the source code.
# May 6, 2004 11:55 AM

Robert J Collins (DNN Core) said:

Hi,

I have been working on something similar and I have had issues with images that do not have a fully qualified URL (i.e. images/image.gif vs. http://site/images/image.gif). Does this handle them as well?
# May 7, 2004 5:59 PM

Scott Van Vliet said:

Robert,

Yes, this filter automatically rewrites relative image URL paths to absolute paths.

Consider the following SES URL:

http://www.opgirlslearntoride.com/home.aspx/tab/371A6304

And let's say that the following image tag was in the Response body:

<img src="images/foo.jpg">

When the browser parses this <img> tag, the src="" attribute would be resolved relative to the SES path rather than to the page's real directory:

<img src="http://www.opgirlslearntoride.com/home.aspx/tab/images/foo.jpg">

Since this is not desirable, the SesUrlFilter class will rewrite the src="" attribute of all <img> and <script> tags, as well as the href="" attribute of all <a> and <link> tags based on the Request.Path property, to the result below:

<img src="/images/foo.jpg">

It is important to note that there are some instances in which we would not want to rewrite these attributes. The SesUrlFilter class understands this, and will not rewrite these attributes if they are off-site references (e.g. <img src="http://www.foo.com/image.jpg">), email links (e.g. <a href="mailto:foo@bar.com">), JavaScript statements (e.g. <a href="javascript:foo();">) or anchor references (e.g. <a href="#Top">).

There may be other instances that need accommodation, but these provide a baseline of exclusions.

Another caveat to this utility is that some browsers do not handle relative paths in JavaScript when using SES URLs. In Netscape, Mozilla and Firefox, this can be fixed by using the <base href=""> tag. However, this does not have the same effect in IE. Additional logic can be written in the filter to rewrite URLs in JavaScript, but this would not help with JavaScript include files (which are the most common case in applications).
# May 7, 2004 8:30 PM

Heinz said:

Your code doesn't work if the web site is using Forms Authentication.
If I access a page where I have to login first, I am redirected to my login page like so:

http://myserver.com/login.aspx?ReturnUrl=/loginrequired.aspx

The SesHttpModule turns this into the following:

http://myserver.com/login.aspx/ReturnUrl/loginrequired.aspx

This will not work, because this file does not exist and I am redirected to a 404 page after login.
# May 11, 2004 4:17 AM

Mike said:

I have done the following steps:
1) Add the Boardworks.Utilities.SearchEngineSafe.dll to your /bin directory of the ASP.NET application you wish to implement

2) Modify the application’s Web.config file to include the following code:

<httpModules>
    <add name="SesHttpModule"
         type="Boardworks.Utilities.SearchEngineSafe.SesHttpModule, Boardworks.Utilities.SearchEngineSafe"/>
</httpModules>

However, it does not seem to be changing the url to a SES URL. Are there any other steps I need to take?

Thanks,
Mike
# May 12, 2004 10:12 AM

Scott Van Vliet said:

Mike,

Are you setting the Response.Filter property anywhere in your code? If so, this would override the SesUrlFilter. Also, where are you putting the <httpModules> code in the Web.config? It should be under the <configuration><system.web> node.

In addition, I have posted a new version of the package. Download it at the URL below:
http://www.brdwrks.com/downloads/ASP_NET_SES_1_0_1.zip

This new package accommodates HTML-encoded "&" delimiters in URLs created by ASP.NET (i.e. &amp;).
# May 12, 2004 10:50 AM

Robert J Collins (DNN Core) said:

I am noticing some issues when there are spaces in a URL. The override void Write method appears to be chopping off the query string at the space and then reattaching the rest after the write is completed, so the following results:

This URL
default.aspx?this=that&name=Robert Collins&age=32

Becomes
default.aspx/this/that/name/Robert Collins&age=32

This in turn causes an error when clicked…

Any thoughts?
# May 14, 2004 12:34 PM

Scott Van Vliet said:

Robert,

Thanks for your post. I looked into this, and the Query String parameter value is being cut off by the HrefPattern Regex. It currently accommodates any actionable HTML tags (e.g. <A>, <IMG>, <LINK>) and allows double-quoted ("), single-quoted (') and unquoted attributes.

For example, the following tags would all be valid with the filter:

<a href="default.aspx?this=that&name=Robert Collins&age=32">
<a href='default.aspx?this=that&name=Robert Collins&age=32'>
<a href=default.aspx?this=that&name=Robert Collins&age=32>

Note that in the third example, where the HREF attribute is not enclosed in quotes, the unescaped space would truncate the Query String parameter value even without the filter. If you are using <asp:HyperLink> controls, however, this is a non-issue, as the NavigateUrl must be enclosed in quotes.

To resolve this, you can update the SesRegexPattern.HrefPattern to this:

<(a|link|img|script|input|form).[^>]*(href|src|action)=(\"|'|)(.[^\"']*)(\"|'|)[^>]*>

Notice that the only change here is the removal of the \\s identifier from the middle group of the HREF attribute pattern. The pattern will now include white space characters in the Query String parameter value.
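
As a quick check, the updated pattern can be tried against the quoted link from the example above. This is only a sketch: the class and method names (HrefPatternDemo, ExtractUrl) are made up for illustration, and it reads the URL from the second-to-last group in the same way the filter's Write method does.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for trying out the pattern; not part of the module.
public static class HrefPatternDemo
{
    // The updated pattern from this comment, written as a C# string literal.
    private const string HrefPattern =
        "<(a|link|img|script|input|form).[^>]*(href|src|action)=(\"|'|)(.[^\"']*)(\"|'|)[^>]*>";

    // Returns the attribute value captured for the first matching tag,
    // or an empty string when no tag matches.
    public static string ExtractUrl(string html)
    {
        Match match = Regex.Match(html, HrefPattern, RegexOptions.IgnoreCase);
        return match.Success
            ? match.Groups[match.Groups.Count - 2].Value
            : String.Empty;
    }
}
```

For the double-quoted tag above, ExtractUrl returns the full value, including the space: default.aspx?this=that&name=Robert Collins&age=32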

Please let me know if this helps, or if you have other questions.

Thanks!

Scott
# July 14, 2004 1:12 AM

Rafiq said:

There is one problem I am encountering during a postback from a drop-down list or other controls which have a redirect in their event. They are lost since the module does its own redirect and the page events never happen.

Is there a work around this?

Thanks

# July 27, 2006 12:29 PM

teo said:

Just a question.

If my hosting supports ASP.NET but my website has classic ASP (*.asp) files, do you think I could use this to rewrite ASP URLs?

Thank you

# August 26, 2006 2:08 PM

skillet said:

Rafiq,

I'm not sure I follow your question. The module rewrites the requested URL, which should not preclude an automatic PostBack from the DropDownList Control.

Teo,

Unfortunately, you will not be able to use a module written in .NET within a Classic ASP application, as these two platforms have different runtime environments.  However, you might be able to work out some type of hybrid solution by converting your Classic ASP pages to ASP.NET pages using the aspcompat="true" Page directive.

Thanks for your comments!

# August 30, 2006 1:10 AM

Kishore said:

I am facing a problem when the query string contains special characters like & and ;. The problem also exists if the query string is encoded. It is reported as a bad request.

# August 10, 2007 2:48 AM

Neha said:

I have made a site search engine.

It's not a Google or MSDN search engine.

My problem is that it does not include query strings in the search results.

I also have to restrict some pages from unregistered users in the search results...

Can somebody help me with this?

# September 5, 2007 6:32 AM