March 2004 - Posts
I have been fighting a bit with how to "correctly" parse DateTimes from RSS & OPML feeds. After much wrangling, I decided to go with the strictest form of parsing, whereby I assume the datetime is in RFC822 or RFC1123 format.
My Opml library uses Serialization, so I use the following technique for Serializing/Deserializing DateTimes:
/// <summary>
/// Indicates when the document was created
/// </summary>
[XmlIgnore()]
public DateTime DateCreated
{
get{ return _dateCreated;}
set{ _dateCreated = value;}
}
/// <summary>
/// Indicates when the document was created
/// In Universal Time.
/// </summary>
[XmlElement("dateCreated")]
public string DateCreatedUT
{
get
{
if (_dateCreated == DateTime.MinValue)
return String.Empty;
else
return _dateCreated.ToUniversalTime().ToString("r");
}
set
{
DateTime tmpDate = DateTime.MinValue;
if (value != null && value.Trim() != String.Empty)
{
try
{
// Try to parse as an RFC1123 date
tmpDate = DateTime.Parse(value, CultureInfo.InvariantCulture.DateTimeFormat, DateTimeStyles.None);
}
catch (FormatException)
{
// Unparseable input: leave tmpDate as DateTime.MinValue
}
}
_dateCreated = tmpDate;
}
}
I store the DateTime internally as local time, but serialize/deserialize it as universal time. Along the way, I run into two problems: 1) What do I store if DateTime parsing fails? (I chose DateTime.MinValue), and 2) What should I return if the DateTime value is invalid? (I chose String.Empty)
There must be a better way...any suggestions?
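One possible answer, offered only as a sketch: represent "no valid date" with a nullable DateTime instead of DateTime.MinValue, and use TryParseExact with the "r" (RFC1123) format so that bad input does not need a try/catch. The class and method names below (OpmlDates, ParseRfc1123, FormatRfc1123) are hypothetical and not part of my Opml library, and the sketch assumes a newer framework version with nullable types and String.IsNullOrWhiteSpace.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of the original Opml library.
public static class OpmlDates
{
    // Returns null when the text is empty or is not a valid RFC1123 date.
    public static DateTime? ParseRfc1123(string text)
    {
        if (String.IsNullOrWhiteSpace(text))
            return null;

        DateTime parsed;
        bool ok = DateTime.TryParseExact(
            text.Trim(),
            "r", // RFC1123 pattern, e.g. "Thu, 11 Mar 2004 16:42:02 GMT"
            CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal,
            out parsed);

        // parsed is a UTC value here; the caller decides whether to call ToLocalTime().
        return ok ? parsed : (DateTime?)null;
    }

    // Returns String.Empty for a missing date, otherwise the RFC1123 string in universal time.
    public static string FormatRfc1123(DateTime? value)
    {
        return value.HasValue
            ? value.Value.ToUniversalTime().ToString("r", CultureInfo.InvariantCulture)
            : String.Empty;
    }
}
```

With this shape, the string property used for serialization would simply delegate to the two helpers, and callers can test for a missing date with HasValue instead of comparing against a sentinel.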
After my previous post/rant on the abuse of “pubDate” in RSS feeds, I received comments about, and discovered other, often more eloquent, discussions of RSS's RFC822 dates. One of the more useful was Scott Mitchell's comment about RFC1123 being an extension of RFC822.
So, here are a few links I found helpful.
I'll update this post with more links and info as I have time.
I constantly see people write code like this for formatting numbers as strings, and wonder why:
int testNumber = 12345;
string myString = testNumber.ToString();
int padCount = 8 - myString.Length; // compute once; myString grows inside the loop
for(int i=0; i < padCount; i++)
{
myString = "0" + myString;
}
Especially when this is much easier:
int testNumber = 12345;
string myString = testNumber.ToString("00000000");
And you can do even greater things with:
String.Format(myFormatString, args);
I think a lot of this comes from people (like me) who came from VB6 and other primitive languages that didn't have string formatting features built in. Here are some links I frequently suggest for those interested in learning:
Moral of the story: the .NET Framework gives developers a lot of tools to do their jobs. If you ever start thinking, "Gosh, this should be wrapped in the framework", you will likely find that it already is. Obviously, I left out another option, Regular Expressions, but that discussion is left for a day with more time.
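To make the comparison concrete, here is a small sketch of several built-in ways to zero-pad a non-negative integer to eight digits. The class and method names (PaddingExamples, PadToEight) are made up for illustration; the formatting calls themselves are standard framework APIs.

```csharp
using System;
using System.Globalization;

// Hypothetical demo class; only the formatting calls come from the framework.
public static class PaddingExamples
{
    // Returns the same non-negative number padded to 8 digits in four different ways.
    public static string[] PadToEight(int number)
    {
        return new string[]
        {
            // Custom format string: one '0' placeholder per required digit.
            number.ToString("00000000", CultureInfo.InvariantCulture),
            // Standard "D" (decimal) format with a precision of 8.
            number.ToString("D8", CultureInfo.InvariantCulture),
            // Plain conversion followed by PadLeft.
            number.ToString(CultureInfo.InvariantCulture).PadLeft(8, '0'),
            // Composite formatting with String.Format.
            String.Format(CultureInfo.InvariantCulture, "{0:D8}", number)
        };
    }
}
```

For 12345, each of the four entries is "00012345". Negative numbers are handled differently by PadLeft than by the format strings, so this sketch is only meant for non-negative values.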
As a follow-up to my previous post on the abuse of software patents, here, I wanted to share a couple of links to sites fighting against software patents:
Foundation for a Free Information Infrastructure (FFII)
A European site that opposes software patents
League for Programming Freedom (LPF)
An organization that opposes software patents and user interface copyrights
Are there other, more prominent sites/organizations that I am missing?
An often overlooked performance enhancement is to set the Capacity for a type. This leads to the following Best Practice recommendation:
"Always initialize a type with a capacity if it is supported. Lists and other collections often provide a capacity parameter on a constructor overload, or a MinimumCapacity property. This capacity helps the type manage its initial allocation and avoid repeatedly re-allocating to grow its memory during usage."
DataSets are a common offender for capacity-initialization related overhead. By setting the MinimumCapacity property on each contained DataTable, the overall performance of your DataSets could improve dramatically, depending on your data. Unfortunately, many developers see DataSets as a black box and fail to look any further. As a result, DataSets are frequently blamed for memory or performance woes.
To solve this problem I often write code like:
void Main()
{
DataSet ds = new DataSet();
ds.Tables.Add("ParentTable");
ds.Tables.Add("ChildTable");
InitMyDataSet(ds);
// ... do Work ...
}
void InitMyDataSet(DataSet ds)
{
ds.Tables["ParentTable"].MinimumCapacity = 5;
ds.Tables["ChildTable"].MinimumCapacity = 200;
}
Or for better encapsulation:
private static void InitDataset(DataSet ds, int[] minCapacities)
{
if (ds.Tables.Count != minCapacities.Length)
throw new ArgumentOutOfRangeException("minCapacities", minCapacities.Length, "Invalid minCapacity array length!");
for(int i=0; i < minCapacities.Length; i++)
{
ds.Tables[i].MinimumCapacity = minCapacities[i];
}
}
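The same recommendation applies to ordinary collections. As a rough sketch, assuming a framework version that includes the generic List type, the expected size can be passed to the constructor so the backing array is allocated once. The class and method names (CapacityExamples, BuildList) are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class illustrating the capacity recommendation.
public static class CapacityExamples
{
    // Builds a list of expectedCount integers without intermediate re-allocations.
    public static List<int> BuildList(int expectedCount)
    {
        // The constructor argument sets the initial Capacity, not the Count.
        List<int> items = new List<int>(expectedCount);
        for (int i = 0; i < expectedCount; i++)
        {
            items.Add(i); // never exceeds the pre-allocated capacity
        }
        return items;
    }
}
```

Without the constructor argument, the list starts small and grows its backing array several times as items are added, copying the existing elements on each growth.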
The Technologist has an excellent blog post explaining what happens when IIS handles a request for a page served by ASP.NET.
I just watched the live news conference from NASA on the results of tests from the Mars rover “Opportunity”.
The basic gist is that they believe they have definitive proof that this section of Mars was covered with water for “some time”, and that it appears to show that Mars could have sustained life for that period.
Cool stuff!
Here is the official link to the release.
I quickly recognized many of my pet peeves with DateTime problems in the recent MSDN article, "Coding Best Practices Using DateTime in the .NET Framework".
The single biggest offender in the category of "inaccurate DateTime usage" is RSS feeds!
Today, several popular and commonly used RSS class libraries introduce inaccurate dates due to their impenetrable APIs and the way they are consumed by developers. The main cause is that the RSS specification requires dates to be expressed as an RFC 822 date/time.
For example:
Thu, 11 Mar 2004 10:42:02 CST
This is just a different string expression for a date/time that includes the timezone. This isn't a problem in itself, but it can become one because most RSS library APIs use a standard DateTime type with parameter names such as "pubDate", which don't explicitly tell the developer that RSS date-times must be timezone-specific. So, after the RSS feed is created and output as XML, it takes a keen eye to notice that the DateTime says it is in GMT, but is in fact incorrect to the tune of 6 hours (this .Text weblog engine is a perfect example). This happens because the .NET Framework doesn't provide a built-in converter for RFC822, so RSS libraries often use a format string that forces the timezone to GMT or UT:
private const string DateTimeFormatString = "ddd, dd MMM yyyy HH':'mm':'ss 'GMT'";
instead of:
private const string DateTimeFormatString = "ddd, dd MMM yyyy HH':'mm':'ss zzz";
The RSS library creators will simply say "It's the developer's fault for not converting to GMT"; others will say "It's the .NET Framework's fault for not supporting RFC822". Both are true, but it is equally the fault of the RSS API implementation for not leading the eager developer to the correct implementation. As a result, most developers will start by creating a new instance of an RSS feed and begin adding new RSSItems based upon their local data source without regard to timezone. Frequently this local data source is based upon local time, not GMT or UT (Universal Time), so you end up with skewed datetimes.
So, I would just like to add a Best Practice of:
Always make your DateTime variable-names explicitly state the DateTime type expected. For example, use "pubUTDate" or "pubGMTDate" instead of just "pubDate" for your parameter or property name. This will make your API's behavior much more transparent when dealing with DateTime values. Alternately, for RSS feeds, you should consider names such as Rfc822Date. Perhaps someone could do the work of creating a new Rfc822Date formatter, converter, and type to simplify this further.
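As a starting point for such a formatter, here is a minimal sketch. It assumes a framework version that has the DateTimeOffset type, which carries its own UTC offset and so avoids the local-versus-GMT ambiguity described above. The Rfc822Date class and its Format method are hypothetical names, and the output uses a numeric zone such as -0600 rather than a named zone like CST. Note that the "zzz" specifier on its own emits the offset with a colon (for example -06:00), which is why the offset is built by hand here.

```csharp
using System;
using System.Globalization;

// Hypothetical formatter; not an existing framework or RSS library type.
public static class Rfc822Date
{
    // Formats a date/time with an explicit numeric offset, e.g. "-0600".
    public static string Format(DateTimeOffset value)
    {
        // Date and time portion, always using invariant (English) day and month names.
        string stamp = value.ToString(
            "ddd, dd MMM yyyy HH':'mm':'ss", CultureInfo.InvariantCulture);

        // Build the zone as +hhmm / -hhmm with no colon.
        TimeSpan offset = value.Offset;
        string sign = offset < TimeSpan.Zero ? "-" : "+";
        TimeSpan abs = offset.Duration();

        return String.Format(
            CultureInfo.InvariantCulture,
            "{0} {1}{2:00}{3:00}",
            stamp, sign, abs.Hours, abs.Minutes);
    }
}
```

Calling Rfc822Date.Format(new DateTimeOffset(2004, 3, 11, 10, 42, 2, TimeSpan.FromHours(-6))) returns "Thu, 11 Mar 2004 10:42:02 -0600". Because the parameter type itself carries the offset, the caller cannot accidentally hand over a local time that gets labeled as GMT.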