Stripping out HTML tags

Monday, April 14, 2003

.NET

Does anybody know the best way to remove HTML tags from a parsed document?

I tried regular expressions, but the results were not conclusive.

3 Comments

Here's how I do it (the variable 'HTML' is a String containing the HTML code):

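' Requires: Imports System.Text.RegularExpressions
' Matches "<", then one or more characters that are not ">", then ">".
Dim regEx As New Regex("<[^>]+>")

Dim Text As String = regEx.Replace(HTML, "")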

Phil Weber - Monday, April 14, 2003 3:14:00 AM
If your HTML is not "simple", then a regex will not work reliably (what if your HTML contains a literal "<" or ">" in the text or inside an attribute value, etc.?).
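
To make that concrete, here is a small sketch of where the pattern from the first comment goes wrong. The module name, the StripTags helper and the sample strings are made up for illustration; only the regex itself comes from the comment above.

```vb
Imports System
Imports System.Text.RegularExpressions

Module RegexLimits
    ' Same pattern as in the first comment: "<", one or more non-">" characters, ">".
    Private ReadOnly TagPattern As New Regex("<[^>]+>")

    ' Hypothetical helper: removes everything the pattern matches.
    Function StripTags(html As String) As String
        Return TagPattern.Replace(html, "")
    End Function

    Sub Main()
        ' Simple markup: the tags are removed cleanly.
        Console.WriteLine(StripTags("<p>Hello <b>world</b></p>"))
        ' prints: Hello world

        ' A ">" inside an attribute value ends the match too early.
        Console.WriteLine(StripTags("<a title=""a > b"">link</a>"))
        ' prints:  b">link

        ' Unescaped "<" and ">" in plain text are mistaken for a tag.
        Console.WriteLine(StripTags("if a < b then c > d"))
        ' prints: if a  d
    End Sub
End Module
```

In the second case part of the attribute leaks into the output, and in the third case real text is lost, so the pattern is only safe on markup where "<" and ">" never appear outside of tags.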

Another option is to parse your HTML as XML with SgmlReader (available on GotDotNet) and then process the XML however you wish. For instance, an XSL stylesheet that relies only on the default templates will remove all the tags and output just the text.
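
A rough sketch of the second half of that idea, assuming the HTML has already been turned into well-formed XML (for example by SgmlReader, which is not shown here). It uses XslCompiledTransform, which is an assumption about the .NET version in use since it is newer than this thread; the module name, the StripTags helper and the sample input are hypothetical.

```vb
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl

Module StripWithXslt
    ' A stylesheet with no templates of its own. Only the XSLT built-in rules apply:
    ' they walk the tree and copy text nodes to the output, dropping the element tags.
    Private Const EmptyStylesheet As String =
        "<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">" &
        "<xsl:output method=""text""/>" &
        "</xsl:stylesheet>"

    ' Hypothetical helper: expects well-formed XML, returns its text content.
    Function StripTags(wellFormedXml As String) As String
        Dim xslt As New XslCompiledTransform()
        Using styleReader As XmlReader = XmlReader.Create(New StringReader(EmptyStylesheet))
            xslt.Load(styleReader)
        End Using

        Using input As XmlReader = XmlReader.Create(New StringReader(wellFormedXml))
            Using output As New StringWriter()
                xslt.Transform(input, Nothing, output)
                Return output.ToString()
            End Using
        End Using
    End Function

    Sub Main()
        ' Sample input is already well-formed; real HTML would need converting first.
        Console.WriteLine(StripTags("<p>Hello <b>world</b>, 1 &lt; 2</p>"))
    End Sub
End Module
```

Because the input goes through a real XML parser, escaped characters such as &lt; are handled as text rather than confused with tags. Note that the default rules also output the text inside elements like script or style, so those would need their own empty templates if they should be dropped.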

Christian Dehaeseleer - Monday, April 14, 2003 4:34:00 AM
gljhgljh<hr>

fasdf - Wednesday, July 21, 2004 5:46:00 AM
