Stripping out HTML tags


Does anybody know the best way to remove HTML tags from a parsed document?

I tried regular expressions, but the results were not conclusive.

3 Comments

  • Here's how I do it (the variable 'HTML' is a String containing the HTML code):

    ' Requires: Imports System.Text.RegularExpressions
    ' Matches anything from "<" up to the next ">".
    Dim regEx As New Regex("<[^>]+>")
    Dim Text As String = regEx.Replace(HTML, "")


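  • If your HTML is not "simple", a regex will not work reliably (what if the content itself contains a "<" or ">", for example inside an attribute value or a script block?).

    Another option is to parse your HTML as XML with SgmlReader (available on GotDotNet) and then process the XML as you wish. For instance, an XSL stylesheet with no templates of its own falls back to the default template rules, which drop all tags and output only the text.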

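To make the two suggestions above concrete, here is a minimal sketch that contrasts the regex approach with the "default XSL template" idea. It assumes the input is already well-formed XML (for example, the output of SgmlReader, which is not shown here), and the names StripTagsDemo, StripTagsWithXslt and EmptyStylesheet are made up for illustration. It uses only System.Xml and System.Text.RegularExpressions.

```vbnet
Imports System
Imports System.IO
Imports System.Text.RegularExpressions
Imports System.Xml
Imports System.Xml.Xsl

Module StripTagsDemo
    ' A stylesheet with no templates of its own. XSLT then applies its
    ' built-in rules, which walk the tree and emit only the text nodes.
    Private Const EmptyStylesheet As String = _
        "<xsl:stylesheet version=""1.0"" " & _
        "xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">" & _
        "<xsl:output method=""text""/></xsl:stylesheet>"

    ' Hypothetical helper: expects well-formed XML, not raw tag soup.
    Function StripTagsWithXslt(ByVal wellFormedXml As String) As String
        Dim xslt As New XslCompiledTransform()
        Using styleReader As XmlReader = XmlReader.Create(New StringReader(EmptyStylesheet))
            xslt.Load(styleReader)
        End Using

        Using input As XmlReader = XmlReader.Create(New StringReader(wellFormedXml))
            Using output As New StringWriter()
                xslt.Transform(input, Nothing, output)
                Return output.ToString()
            End Using
        End Using
    End Function

    Sub Main()
        ' The attribute value contains a ">", which confuses the regex.
        Dim html As String = "<a title=""x > y"">link</a>"

        ' The regex stops at the first ">", so part of the attribute leaks through.
        Console.WriteLine(Regex.Replace(html, "<[^>]+>", ""))   ' prints:  y">link

        ' The XML-based approach sees the attribute as part of the tag.
        Console.WriteLine(StripTagsWithXslt(html))              ' prints: link
    End Sub
End Module
```

The XSLT route only works once the markup parses as XML, so for real-world HTML it would sit behind a tolerant parser such as SgmlReader rather than replace it.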
