String concatenation considered harmful (or how is HTML different from XML when it comes to generating it?)

Note: this entry has moved.

Via Brad, I found Craig's post about the bad things that happen when you build XML by string concatenation. His core statement is:
The moral of the story here is that if you find yourself doing something like this:
      xml = "<FOO>" + fooContents + "</FOO>";
then you should lose points on your programming license.
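To make the point concrete, here is a minimal Java sketch. Java stands in for the post's .NET snippets, and the class and method names are hypothetical. The concatenated version breaks as soon as `fooContents` contains a markup character such as `&` or `<`. Writing the element through the JDK's `javax.xml.stream.XMLStreamWriter` escapes that text for you. Each result is then checked by parsing it with the JDK's DOM parser.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class FooXml {
    // What the quote warns about: fooContents goes into the markup unescaped.
    static String concatenated(String fooContents) {
        return "<FOO>" + fooContents + "</FOO>";
    }

    // Let an XML writer produce the markup; writeCharacters escapes '&' and '<'.
    static String withWriter(String fooContents) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("FOO");
            xml.writeCharacters(fooContents);
            xml.writeEndElement();
            xml.flush();
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Returns the root element's text, or null if the input is not well-formed XML.
    static String parseText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String contents = "Tom & Jerry <3";
        System.out.println(parseText(concatenated(contents))); // null: the parser rejects it
        System.out.println(parseText(withWriter(contents)));   // Tom & Jerry <3
    }
}
```

In .NET, `System.Xml.XmlWriter` (`WriteStartElement`, `WriteString`, `WriteEndElement`) plays the same role.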
Now, I wonder how XML is different from HTML when it comes to generating it. After all, you should probably be creating *X*HTML anyway, as it's the most compatible way of doing it. So you should *also* lose points if you're doing the following:
output.WriteLine("<FIELDSET><LEGEND>Federation namespace list</LEGEND></FIELDSET>");
Guess where you will find **TONS** of code like that. This is so 1996 of them!!
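That particular line happens to be well-formed. However, XHTML element names must be lowercase, so `<FIELDSET>` is not valid XHTML. Nothing would stop the next edit from dropping a closing tag or splicing in unescaped data, either. Here is a sketch of emitting the same fragment through the JDK's StAX writer (Java again). `fieldset` is a hypothetical helper, and the output is a bare fragment without the XHTML namespace or document wrapper.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

final class FieldsetFragment {
    // Hypothetical helper: emits <fieldset><legend>...</legend></fieldset> as well-formed markup.
    static String fieldset(String legendText) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("fieldset"); // XHTML element names are lowercase
            xml.writeStartElement("legend");
            xml.writeCharacters(legendText);   // '&' and '<' in the text get escaped
            xml.writeEndElement();             // closes <legend>
            xml.writeEndElement();             // closes <fieldset>
            xml.flush();
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fieldset("Federation namespace list"));
    }
}
```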

6 Comments

  • Guilty as charged! And this is one of the things that I'd love to fix about FlexWiki, believe me...I never would have written it that way in the first place.

    Although I would argue that HTML and XML differ in that the primary consumer of HTML is typically a browser, which is already set up to deal with stuff like this. And unclosed tags are well within the HTML spec, right?

    The XML in question is meant as a generic export, presumably to be consumed by an XML parser. Unclosed tags are explicitly disallowed by the spec. So while valid XHTML would be better, FlexWiki isn't committing quite the same degree of sin...perhaps merely a web misdemeanor, rather than an XML felony. ;)

  • Hehee... I know you didn't write that code, of course :).

    But I'm not sure I agree completely with you with regard to HTML. The web (for authors and users alike) would be a much better place if browsers were not so forgiving of malformed HTML (almost always in incompatible ways). That forgiveness is exactly what people usually miss in XML: not being able to be LAZY! Imagine what it would be like if XHTML were mandatory... web scraping paradise!! Hahaa... (although SgmlReader does an excellent job of converting to XHTML)

  • To be sure, it would be better if FlexWiki produced XHTML. And perhaps someday I'll get the cycles to do the massive overhaul of the formatting engine it would require. Certainly, when I get around to writing HTML export from my FlexWikiPad text editor control, it *will* be well-formed XHTML.

    I guess my point is that when emitting HTML via string concatenation, you're probably not rendering it useless, given its likely use. But when emitting XML via string concatenation, you likely are. So don't do it in either case, but if you *have* to do it, only do it with HTML.

    So I completely agree with you.

  • "I guess my point is that when emitting HTML via string concatenation, you're probably not rendering it useless,"

    No, you're opening yourself up to HTML script injection instead.

    [)

  • Well, in the general case you are not, unless you are concatenating input from the client, which usually isn't the case. Rather, I've seen people take data from their storage (DB, XML files, etc.) and generate an HTML representation using Response.Write and/or string concatenation.

  • That's a great point, Damien. Cazzu's point is valid, too, but in the FlexWiki case most of the input comes from the users (that being rather the point).

    Like I said, I've long known the FlexWiki formatting engine needs a massive overhaul. This is just another (and scary) reason. I'm not aware of any particular vulnerabilities (the input gets rather heavily processed on its way to being rendered), but there probably are a few.
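Damien's concern applies whenever user-supplied text, such as wiki page content, is concatenated into HTML. The usual mitigation is to escape it before concatenating. Below is a minimal sketch in Java. `HtmlEscape.escape` is a hypothetical helper that only covers element text and quoted attribute values, not URLs or inline script. In ASP.NET, `HttpUtility.HtmlEncode` serves this purpose.

```java
final class HtmlEscape {
    // Escapes the characters that let text break out of element content or a quoted attribute.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String userInput = "<script>alert('hi')</script>";
        // Concatenated raw, this would run in the reader's browser; escaped, it renders as plain text.
        System.out.println("<p>" + escape(userInput) + "</p>");
    }
}
```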
