April 2005 - Posts

The two people who read my blog (besides my sister) know this already, but if you author a class library, please mark all classes deriving from Exception (or even better, ApplicationException) as [Serializable]. If I am using your library on the other side of a remoting channel, a non-serializable exception cannot be marshaled back to me, so I get a serialization failure instead of your actual error, and that causes me much heartache.

With Affection,

Your Customer.
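For the record, here is a minimal sketch of what that request looks like in practice. WidgetException is a hypothetical name used only for illustration. Besides the attribute, the class also needs the protected serialization constructor, because Exception implements ISerializable and the formatter calls that constructor when it rebuilds the exception on the other side of the channel. (Newer .NET versions flag this legacy serialization constructor as obsolete; the sketch assumes the .NET Framework remoting scenario described above.)

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical library exception; the name is illustrative only.
[Serializable]
public class WidgetException : Exception {
    public WidgetException() { }

    public WidgetException(string message) : base(message) { }

    public WidgetException(string message, Exception inner)
        : base(message, inner) { }

    // Serialization constructor. The formatter calls this when the
    // exception is deserialized on the receiving side of the channel.
    protected WidgetException(SerializationInfo info, StreamingContext context)
        : base(info, context) { }
}
```

If the exception carries extra fields of its own, it would also need to override GetObjectData and read those fields back in the serialization constructor; the sketch above has none.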

 

I know that there are other samples of web scraping out there, but here's mine. One of my customers asked me how to scrape our ASP.NET Web application, so I thought that I might post the example code. I like the viewstate regex - it's my first time using lookarounds in a regular expression.

using System;
using System.Text;
using System.Text.RegularExpressions;
using System.Net;
using System.IO;

// Note: GetViewState uses System.Web.HttpUtility, so the project
// needs a reference to System.Web.dll.

namespace Dennany.WebScrape {

  class MainClass {

    [STAThread]
    static void Main(string[] args) {

      try {
        // Modify as appropriate:
        const string baseUri = "http://remotewebhost/webpagedirectory/";
        const string loginDlgUri = baseUri + "LoginDlg.aspx";
        const string mainConsoleUri = baseUri + "Mainpage.aspx";
        const string username = "myuser";
        const string password = "p@ssw0rd";

        // This cookie container will persist the ASP.NET session ID cookie.
        CookieContainer cookies = new CookieContainer();

        // Perform the first HTTP request against
        // the ASP.NET application login dialog.
        HttpWebRequest request =
          (HttpWebRequest)WebRequest.Create(loginDlgUri);

        // Attach the cookie container before sending the request, so that
        // the session cookie in the response is stored in it.
        request.CookieContainer = cookies;

        // Get the response object; this also populates the cookie container.
        HttpWebResponse response =
          (HttpWebResponse)request.GetResponse();

        // Read the incoming stream containing the login dialog page.
        StreamReader reader =
          new StreamReader(response.GetResponseStream());
        string loginDlgPage = reader.ReadToEnd();
        reader.Close();

        // Extract the viewstate value from the login dialog page.
        // We need to post this back,
        // along with the username and password.
        string viewState = GetViewState(loginDlgPage);

        // Build the postback string.
        // This string will vary depending on the page. The best
        // way to find out what your postback should look like is to
        // monitor a normal login using a utility like TCPTrace.
        // The username and password are URL-encoded because characters
        // such as '@' and '&' are not safe in form data.
        string postback =
          String.Format("__VIEWSTATE={0}&txtUserName={1}" +
            "&txtPassword={2}&txtMessage=&btnOK=OK",
            viewState,
            System.Web.HttpUtility.UrlEncode(username),
            System.Web.HttpUtility.UrlEncode(password));

        // Our second request is the POST of the username / password data.
        HttpWebRequest request2 =
          (HttpWebRequest)WebRequest.Create(loginDlgUri);
        request2.Method = "POST";
        request2.ContentType = "application/x-www-form-urlencoded";
        request2.CookieContainer = cookies;

        // Write our postback data into the request stream.
        StreamWriter writer =
          new StreamWriter(request2.GetRequestStream());
        writer.Write(postback);
        writer.Close();

        request2.GetResponse().Close();

        // Our third request is for the actual webpage after the login.
        HttpWebRequest request3 =
          (HttpWebRequest)WebRequest.Create(mainConsoleUri);
        request3.CookieContainer = cookies;

        reader =
          new StreamReader(request3.GetResponse().GetResponseStream());

        // And read the response.
        string page = reader.ReadToEnd();
        reader.Close();

        // Our webpage data is in the 'page' string.
        Console.WriteLine(page);
      }
      catch (Exception ex) {
        Console.WriteLine(ex);
      }
    }

    // Extract the viewstate data from a page.
    // The lookbehind anchors the match just after __VIEWSTATE" value=",
    // and the lookahead stops it just before the closing " />.
    // The lazy .*? keeps the match from running on to a later " />
    // on the same line.
    private static string GetViewState(string aspxPage) {
      Regex regex =
        new Regex("(?<=(__viewstate\".value.\")).*?(?=\"./>)",
          RegexOptions.IgnoreCase);

      Match match = regex.Match(aspxPage);

      // The value is URL-encoded because it goes straight into the postback.
      return System.Web.HttpUtility.UrlEncode(match.Value);
    }
  }
}
// EOF
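Since the lookarounds are the interesting part, here is a small standalone sketch that runs the same idea against a hard-coded line of markup. It is a slightly stricter variant of the pattern above, not the same regex: it matches "anything but a quote" for the value rather than a wildcard. The class name, the sample markup, and the viewstate value are all made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class ViewStateRegexDemo {
    // Returns the viewstate value, or null if the page has none.
    public static string Extract(string html) {
        // (?<=...) is a lookbehind: it must match just before the value,
        //          but is not included in the result.
        // (?=...)  is a lookahead: it must match just after the value,
        //          and is likewise excluded from the result.
        Regex regex = new Regex(
            "(?<=__viewstate\"\\svalue=\")[^\"]*(?=\"\\s*/>)",
            RegexOptions.IgnoreCase);

        Match match = regex.Match(html);
        return match.Success ? match.Value : null;
    }

    static void Main() {
        // Made-up markup and a made-up viewstate value.
        string html =
            "<input type=\"hidden\" name=\"__VIEWSTATE\" " +
            "value=\"dDwtMTIzNDU2Nzg5Ozs+\" />";

        // Prints only the value between the quotes: dDwtMTIzNDU2Nzg5Ozs+
        Console.WriteLine(Extract(html));
    }
}
```

Because the lookarounds consume nothing, match.Value is the bare viewstate string with no surrounding markup to trim off.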

Larry Osterman and Raymond Chen are constantly pointing out that third-party developers rely on undocumented APIs, implementation side effects of the Windows APIs, and other OS internal details. A common refrain among developers is that such things shouldn't be done, and Microsoft shouldn't be afraid to 'break' applications that dare to use undocumented features.

Unfortunately, I think that we are looking at the past through 'revisionist history'.  If I hit F1 in VS.NET today, or go to msdn.microsoft.com, I can easily find information on just about anything I want to do.  If MSDN is insufficient, I can hit Google, and I'm all set.  There are plenty of books lining the local Barnes & Noble, and I can always ping people like Yang Cao, or Paul Wilson if I need an answer.

But that's not how things have always been. Back in the '90s, programming against Windows wasn't so easy. There was no Google. Microsoft API documentation was anemic, and often incorrect. It was common for a programmer to have to figure out how an API worked through trial and error, and a bit of debugging. On top of this, it was common for other Microsoft applications to use undocumented APIs, or use documented APIs in undocumented ways. In fact, one analysis points out that Microsoft still does this.

Microsoft isn't the only party guilty of poor documentation - I've been on several software teams that have produced reams of useless docs, while still not addressing core user needs. I'm not picking on Microsoft documentation, but I am pointing out that some things need to be put in historical context.

 

I really, really dislike Microsoft Visual SourceSafe. I've worked with it day in and day out for the last seven years. I've come to learn its quirky ways, and for the most part I get by. I do warn my employers that putting your source code in VSS is akin to putting your life savings in a wet paper bag in the back alley, but most PHBs see that it's 'free' with MSDN, and that's the end of the topic.

 

Over the years, I've seen VSS databases 'corrupt' themselves. This is often recoverable using analyze.exe, but not always. I schedule regular VSS database maintenance to prevent this, and I've been fortunate enough lately that nothing has happened.

 

Until today.

 

We use CruiseControl.NET and NAnt to run our continuous integration builds. We get regular emails after checkin, and these usually assure developers that their code didn't 'break' the build. We do get the occasional 'Build Failed' email, but these are easily rectified with a quick turnaround. (Continuous Integration rocks!)

 

Today, I got a 'Build Failed' email. Well, not just one. Twenty-five 'Build Failed' emails. They weren't terribly clear, but obviously something major was wrong:

 

 

ThoughtWorks.CruiseControl.Core.CruiseControlException: Source control operation failed: No VSS database (srcsafe.ini) found. Use the SSDIR environment variable or run netsetup. . Process command: C:\Program Files\Microsoft Visual Studio\VSS\win32\SS.EXE history $/polaris_Dev -R -Vd4/5/2005;2:22:44p~4/5/2005;2:52:52a -Yguest, -I-Y at ThoughtWorks.CruiseControl.Core.Sourcecontrol.ProcessSourceControl.Execute(ProcessInfo processInfo)  [etc, etc]

 

 

 

The CC.NET stacktrace was more informative:

 

4/5/2005 3:37:38 PM: [polaris:Error]: Exception: Source control operation failed: File "\\vss\vss\Active\polaris_Dev\data\b\bvsaaaaa.a" not found

. Process command: C:\Program Files\Microsoft Visual Studio\VSS\win32\SS.EXE history $/polaris_Dev -R -Vd4/5/2005;3:37:34p~4/5/2005;3:36:28p -Yguest, -I-Y

--

 

Anyone who's spent time with VSS should dread the familiar \data\a\aaaaaaaa.a error message.  This almost always points to database corruption.

 

We shut down our VSS access immediately, and ran analyze on the database.

 

Analyze seems to have found the problem:

 

Writing a new copy of 'f:\vss databases\active\elink_dev\data\b\bvsaaaaa'.

 

But it unearthed another issue:

 

The file 'f:\vss databases\active\elink_dev\data\p\pjraaaaa' appears to be corrupt.  Unable to read the format or header.

 

So far, I’ve not been successful in fixing the above error, and it’s vexing me.

 

Sometimes I feel like I'm sitting on a VSS sinkhole. One day it's going to swallow all of our source code.

 

--

 

I wrote this for all of the people who say "I've never heard of a problem with VSS".  I've had several over the course of the last half-decade, and each one is a significant cause of distress.  Usually these problems manifest themselves in lost history, which many teams never notice because they don’t do historical builds.  Sometimes they blow up the database, and you’ve got to pull out the backups.  Neither scenario is fun, and I’d like to avoid them in the future. 

 

Perhaps I should print out my source code and put it in the file cabinet next to me.  That should keep my source safe.

 

 

I went to last night's C# user group.  Although I've been to a few DNUGs, I had not been to the C# group before.  I was not disappointed.

The pre-networking session was somewhat lively, with several people carrying on a discussion of the pros and cons of DotNetNuke. It was interesting, as I don't normally pay a lot of attention to what is going on in the ASP.NET space.

I also met the .NET Regular Guys, pretty cool.  Brendon has a picture of the back of Clark Allen's head.  Paul Wilson is the invisible guy sitting to the right of Clark.  It was really good to see Paul and Clark again.  I really don't get a chance to see them as often as I'd like.

Keith Rome gave a really good presentation on asynchronous programming, with one of the best descriptions that I've seen of locking issues. He's delivering the same presentation at the Atlanta Code Camp next month. Don't miss it!

Mitch Harpur also presented, on the new C# 2.0 language features. My favorite part was his talk on preconditions on generics (type parameter constraints, declared with the 'where' keyword). I wasn't aware that these were part of C# generics. Mitch also did a good job of giving real-life examples - so many talks on generics all seem to boil down to "Look, you can make a strongly typed Hashtable!" With Mitch's experience with .NET 2.0, he was able to bring real-world scenarios to the table.
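For anyone else who hadn't run into these, here is a minimal sketch of a precondition on a generic type parameter. This is not one of Mitch's examples; the Finder class and its Max method are hypothetical names made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
static class Finder {
    // 'where T : IComparable<T>' is the constraint. The compiler rejects
    // any T that cannot be compared to itself, which is what allows the
    // body to call CompareTo without a cast.
    public static T Max<T>(IEnumerable<T> items) where T : IComparable<T> {
        bool found = false;
        T best = default(T);

        foreach (T item in items) {
            if (!found || item.CompareTo(best) > 0) {
                best = item;
                found = true;
            }
        }

        if (!found) {
            throw new ArgumentException("Sequence is empty.", "items");
        }
        return best;
    }
}

class Program {
    static void Main() {
        Console.WriteLine(Finder.Max(new int[] { 3, 9, 4 }));     // prints 9
        Console.WriteLine(Finder.Max(new string[] { "b", "a" })); // prints b
    }
}
```

Calling Finder.Max with a type that does not implement IComparable<T>, such as object, fails at compile time rather than at run time, which is the whole appeal over a cast inside the method.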

There's an unattributed saying: "Behind every great man, there is a great woman."

I'm not "great", but I certainly owe a portion of my relative success to Mrs. Dennany.

So, Blessing, here's to your own success.  I'll be certain to be behind you for yours.
