I have seen many versions of bad word filters, and a lot of the time people expect that a bad word will be written out in full, e.g. BADWORD. They sometimes overlook the fact that others work out this rule and simply bypass it by adding symbols in between the letters, e.g. B*A*D*W*O*R*D. Of course, this would not be recognized by simply searching the string for BADWORD.
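To make the problem concrete, here is a minimal sketch of the naive approach. The class and method names (NaiveSearchDemo, ContainsBadWord) are made up for this illustration and are not part of the filter itself.

```csharp
using System;

public static class NaiveSearchDemo
{
    // Naive check: only finds the word when it is written out in full.
    public static bool ContainsBadWord(string input, string badWord)
    {
        return input.IndexOf(badWord, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    public static void Main()
    {
        // The plain spelling is caught...
        Console.WriteLine(ContainsBadWord("this is a BADWORD", "BADWORD"));       // True
        // ...but symbols between the letters slip straight past the search.
        Console.WriteLine(ContainsBadWord("this is a B*A*D*W*O*R*D", "BADWORD")); // False
    }
}
```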
The technique I have used here relies on a base list in XML. I have created a class called BadWordFilter, which uses the singleton pattern. I do this because the class first has to compile a list of Regex objects from the words inside the base XML file, and as I do not want these recompiled at every bad word check, I have opted for the singleton pattern.
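As a rough sketch of the "compile once, reuse everywhere" idea, the snippet below uses Lazy<T>, which assumes .NET 4.0 or later and is not what the class in this post does. The names CompiledWordList and BuildCount, the hard-coded words, and the simplified patterns are all placeholders for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public sealed class CompiledWordList
{
    // Lazy<T> runs the factory once, on first access, and is thread-safe by default.
    private static readonly Lazy<CompiledWordList> instance =
        new Lazy<CompiledWordList>(() => new CompiledWordList(new[] { "badword", "uglyword" }));

    // Counts how many times the patterns were built (for demonstration only).
    public static int BuildCount;

    private readonly List<Regex> patterns = new List<Regex>();

    private CompiledWordList(IEnumerable<string> words)
    {
        BuildCount++;
        foreach (string word in words)
        {
            // Simplified placeholder pattern, not the per-letter pattern from the post.
            patterns.Add(new Regex(Regex.Escape(word), RegexOptions.IgnoreCase));
        }
    }

    public static CompiledWordList Instance
    {
        get { return instance.Value; }
    }

    public bool IsBad(string input)
    {
        foreach (Regex pattern in patterns)
        {
            if (pattern.IsMatch(input))
                return true;
        }
        return false;
    }
}
```

However many times Instance is read, the constructor (and so the regex construction) runs only once.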
For any word in the list, the rendered pattern follows a set form. So if we look again at BADWORD, the regular expression I have come up with is as follows.
([b|B][\W]*[a|A][\W]*[d|D][\W]*[w|W][\W]*[o|O][\W]*[r|R][\W]*[d|D][\W]*)
I create the pattern at runtime. For each letter it accepts either the lower or upper case form, followed by any number of non-word characters (\W), so it ultimately matches anything which, if we ignore everything that is not a word character, spells our bad word.
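Here is a small, self-contained sketch of building such a pattern for one word and trying it out. PatternSketch and BuildPattern are hypothetical names, and the sketch assumes the word contains only letters, digits and spaces. One deliberate difference from the pattern shown above: inside a character class the | is matched as a literal pipe rather than acting as "or", so the sketch writes [bB] instead of [b|B].

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class PatternSketch
{
    // Builds e.g. ([bB][\W]*[aA][\W]*[dD][\W]*) for "bad".
    // Assumes the word holds only letters, digits and spaces.
    public static string BuildPattern(string word)
    {
        StringBuilder sb = new StringBuilder("(");
        foreach (char c in word)
        {
            // One class per letter holding both cases, then any run of non-word characters.
            sb.Append('[')
              .Append(char.ToLowerInvariant(c))
              .Append(char.ToUpperInvariant(c))
              .Append("][\\W]*");
        }
        return sb.Append(')').ToString();
    }

    public static void Main()
    {
        Console.WriteLine(BuildPattern("bad"));               // ([bB][\W]*[aA][\W]*[dD][\W]*)

        Regex regex = new Regex(BuildPattern("badword"));
        Console.WriteLine(regex.IsMatch("B*A*D*W*O*R*D"));    // True
        Console.WriteLine(regex.IsMatch("b.a.d w-o-r-d"));    // True
        Console.WriteLine(regex.IsMatch("a good word"));      // False

        // Why the sketch drops the pipe: [b|B] also accepts a literal '|'.
        Console.WriteLine(new Regex("[b|B]").IsMatch("|"));   // True
    }
}
```

Note that \W does not cover the underscore, since _ counts as a word character, so something like B_A_D would still get through a pattern of this shape.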
I have created a simple test page here to have a go. Please note I have only got the really serious words in the list for the purposes of this demonstration. I have not published this list as I do not think it is necessary. I have used a simple XML structure, so please feel free to copy the code here and add as many bad words as you like.
Example Page : http://andrewrea.co.uk/badwordfilter/Default.aspx
The BadWordFilter class
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
using System.Web;
using System.Xml;

/// <summary>
/// Filters bad words out of text using one regular expression per word,
/// built from an XML word list.
/// </summary>
public class BadWordFilter
{
    /// <summary>
    /// These are the options which I use in order to determine the way I handle any bad text
    /// </summary>
    public enum CleanUpOptions
    {
        ReplaceEachWord,
        BlankBadText,
        ReplaceWholeText
    }

    /// <summary>
    /// Private constructor; instantiates the list of regexes
    /// </summary>
    private BadWordFilter()
    {
        patterns = new List<Regex>();
    }

    /// <summary>
    /// The patterns, one per bad word
    /// </summary>
    private List<Regex> patterns;
    public List<Regex> Patterns
    {
        get { return patterns; }
        set { patterns = value; }
    }

    private static BadWordFilter m_instance = null;

    /// <summary>
    /// The single shared instance, created on first access
    /// </summary>
    public static BadWordFilter Instance
    {
        get
        {
            if (m_instance == null)
                m_instance = CreateBadWordFilter(HttpContext.Current.Server.MapPath("listofwords.xml"));
            return m_instance;
        }
    }

    /// <summary>
    /// Create all the patterns required and add them to the list
    /// </summary>
    /// <param name="badWordFile">Path to the XML file holding the bad words</param>
    /// <returns>A filter holding one regex per word in the file</returns>
    protected static BadWordFilter CreateBadWordFilter(string badWordFile)
    {
        BadWordFilter filter = new BadWordFilter();
        XmlDocument badWordDoc = new XmlDocument();
        badWordDoc.Load(badWordFile);

        //Fetch the word nodes once rather than on every iteration
        XmlNodeList words = badWordDoc.GetElementsByTagName("word");

        //Loop through the xml document for each bad word in the list
        for (int i = 0; i < words.Count; i++)
        {
            //Split each word into a character array
            char[] characters = words[i].InnerText.ToCharArray();

            //We need a fast way of appending to an existing string
            StringBuilder patternBuilder = new StringBuilder();

            //The start of the pattern
            patternBuilder.Append("(");

            //We next go through each letter and append the part of the pattern.
            //It is this stage which generates the upper and lower case variations
            for (int j = 0; j < characters.Length; j++)
            {
                patternBuilder.AppendFormat("[{0}|{1}][\\W]*", characters[j].ToString().ToLower(), characters[j].ToString().ToUpper());
            }

            //End the pattern
            patternBuilder.Append(")");

            //Add the new pattern to our list.
            filter.Patterns.Add(new Regex(patternBuilder.ToString()));
        }
        return filter;
    }

    /// <summary>
    /// The function which returns the manipulated string
    /// </summary>
    /// <param name="input">The text to check</param>
    /// <param name="options">How any bad text should be handled</param>
    /// <returns>The cleaned text</returns>
    public string GetCleanString(string input, CleanUpOptions options)
    {
        if (options == CleanUpOptions.BlankBadText)
        {
            for (int i = 0; i < patterns.Count; i++)
            {
                //In this instance we want to return an empty string if we find any bad word
                if (patterns[i].IsMatch(input))
                    return String.Empty;
            }
        }
        else if (options == CleanUpOptions.ReplaceWholeText)
        {
            for (int i = 0; i < patterns.Count; i++)
            {
                //In this instance we want to return a specified statement if we find any bad word
                if (patterns[i].IsMatch(input))
                    return "The text contains unsuitable content";
            }
        }
        else
        {
            for (int i = 0; i < patterns.Count; i++)
            {
                //In this instance we actually replace each instance of any bad word with a specified string.
                input = patterns[i].Replace(input, "**Unsuitable Word**");
            }
        }

        //return the manipulated string
        return input;
    }
}
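To show what the three clean-up options do to the same input, here is a cut-down, self-contained sketch. CleanUpDemo and Clean are hypothetical stand-ins so the example runs without a web context or an XML file. It hard-codes the single BADWORD pattern shown earlier rather than loading a list.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CleanUpDemo
{
    public enum CleanUpOptions { ReplaceEachWord, BlankBadText, ReplaceWholeText }

    // The BADWORD pattern from earlier in the post, hard-coded for the demo.
    private static readonly Regex Pattern = new Regex(
        @"([b|B][\W]*[a|A][\W]*[d|D][\W]*[w|W][\W]*[o|O][\W]*[r|R][\W]*[d|D][\W]*)");

    public static string Clean(string input, CleanUpOptions options)
    {
        switch (options)
        {
            case CleanUpOptions.BlankBadText:
                // Any bad word blanks the whole text.
                return Pattern.IsMatch(input) ? string.Empty : input;
            case CleanUpOptions.ReplaceWholeText:
                // Any bad word swaps the whole text for a fixed statement.
                return Pattern.IsMatch(input) ? "The text contains unsuitable content" : input;
            default:
                // Only the matched spans are replaced.
                return Pattern.Replace(input, "**Unsuitable Word**");
        }
    }

    public static void Main()
    {
        string text = "You B*A*D*W*O*R*D you";
        Console.WriteLine(Clean(text, CleanUpOptions.ReplaceEachWord));  // You **Unsuitable Word**you
        Console.WriteLine(Clean(text, CleanUpOptions.ReplaceWholeText)); // The text contains unsuitable content
        Console.WriteLine(Clean(text, CleanUpOptions.BlankBadText).Length); // 0
    }
}
```

With the real class the equivalent call would be BadWordFilter.Instance.GetCleanString(text, BadWordFilter.CleanUpOptions.ReplaceEachWord). The first output line also shows a side effect of the trailing [\W]* in the pattern: the space after the bad word is part of the match, so it is swallowed by the replacement.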
The XML file which I have used is below. Dead simple, but does the job.
<?xml version="1.0" encoding="utf-8" ?>
<words>
    <word>bad word</word>
    <word>ugly word</word>
    <word>bla bla bla</word>
</words>
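If you want to check that a word list of this shape is being read the way you expect, a quick sketch like the following may help. WordListLoader and LoadWords are made-up names, and the XML is passed in as a string here (via LoadXml) purely so the example runs without a file on disk.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class WordListLoader
{
    // Returns the text of every <word> element in the given XML.
    public static List<string> LoadWords(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        List<string> words = new List<string>();
        foreach (XmlNode node in doc.GetElementsByTagName("word"))
        {
            words.Add(node.InnerText);
        }
        return words;
    }

    public static void Main()
    {
        string xml = "<words><word>bad word</word><word>ugly word</word><word>bla bla bla</word></words>";
        foreach (string word in LoadWords(xml))
        {
            Console.WriteLine(word);
        }
    }
}
```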
Cheers,
Andrew :-)