Mark Brown

Nota Bene

Comment Spam and Blacklisting

Eric's “Killing Comment Spam” post and Luke Hutteman's comment sparked a little adventure for me. The result is neither complete nor very functional; it's more of a motivation piece to get something started. I might have to jump into the .Text source to see about implementing some sort of plugin. I know .Text (Community Server :: Blogs) is provider-based, so it might be fairly easy to build this into the Comment Provider... if there is such a beast. Here's a little bit of test code that grabs the MT-BlackList from the Comment Spam ClearingHouse and checks a sample domain against each expression in it.

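using System;
using System.Collections;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

namespace BlackList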
{
    class Check
    {
        public static void Main(string[] args)
        {
            ArrayList expressions = BuildBlackList();
            if ( expressions.Count > 0 )
            {
                foreach (string expression in expressions)
                {
                    try
                    {
                        Regex pattern = new Regex(expression,
                         RegexOptions.Multiline|RegexOptions.IgnoreCase);
                        if (pattern.IsMatch("01-logo.com"))
                            Console.WriteLine("Found A Match");
                        pattern = null;
                    }
                    catch (ArgumentException)
                    {
                        // Skip entries that are not valid .NET regular expressions.
                    }
                }
            }
            Console.ReadLine();
        }
        private static ArrayList BuildBlackList()
        {
            // In reality this file would be stored locally and downloaded
            // once a day. Or, in a web environment, we could cache it.
            // But, hey, it's just a demo to spark some thought.
            string url = "http://www.jayallen.org/comment_spam/blacklist.txt";
       
            ArrayList expressions = new ArrayList();
            WebResponse response = null;
            try
            {
                WebRequest request = WebRequest.Create(url);
                if (request != null)
                {
                    response = request.GetResponse();
                    using (StreamReader sr =
                     new StreamReader(response.GetResponseStream()))
                    {
                        String line;
                        while ((line = sr.ReadLine()) != null)
                        {
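                            // Skip blank lines (Substring(0,1) would throw on
                            // them) and lines that start with a '#' comment.
                            if (line.Length > 0 && line[0] != '#')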
                                expressions.Add(line);
                        }
                    }
                }
            }
            catch(Exception)
            {
                throw;
            }
            return expressions;
        }
    }
}
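
The comment in BuildBlackList says the file would really be kept locally and refreshed once a day. Here is a minimal sketch of that idea. The BlackListCache class, its method names, and the local path are all made up for illustration, and WebClient is used only for brevity. Unlike the demo above, Parse also trims each line and skips blank ones, which is my assumption about how the list should be read.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

namespace BlackList
{
    // Hypothetical helper, not part of .Text or MT-Blacklist.
    public static class BlackListCache
    {
        // True when the local copy is missing or older than maxAge.
        public static bool IsStale(string path, TimeSpan maxAge, DateTime utcNow)
        {
            if (!File.Exists(path))
                return true;
            return utcNow - File.GetLastWriteTimeUtc(path) > maxAge;
        }

        // Reads one expression per line, skipping blank lines and
        // lines that start with '#'.
        public static List<string> Parse(TextReader reader)
        {
            List<string> expressions = new List<string>();
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                line = line.Trim();
                if (line.Length > 0 && line[0] != '#')
                    expressions.Add(line);
            }
            return expressions;
        }

        // Downloads the list only when the local copy is stale,
        // then parses the local file.
        public static List<string> Load(string url, string path, TimeSpan maxAge)
        {
            if (IsStale(path, maxAge, DateTime.UtcNow))
            {
                using (WebClient client = new WebClient())
                {
                    client.DownloadFile(url, path);
                }
            }
            using (StreamReader reader = new StreamReader(path))
            {
                return Parse(reader);
            }
        }
    }
}
```

Calling BlackListCache.Load(url, "blacklist.txt", TimeSpan.FromDays(1)) would then hit the network at most once a day, as long as the process can write to that path.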

Comments

Scott Galloway said:

Very nice...I certainly plan to stick this into my .TEXT implementation this weekend...
# July 27, 2004 5:14 PM

Luke Hutteman said:

I've never really looked at Jay's MT-BlackList code (Perl scares me :-) but I wonder if it would be faster to build and prepare one huge Regex from all the individual lines instead of going through them one at a time.
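
The "one huge Regex" idea could be sketched roughly as follows. This is an illustration under assumptions, not a measured improvement: the CombinedBlackList class is hypothetical, each line is wrapped in a non-capturing group and joined with "|", and lines that do not compile on their own are dropped so one bad entry cannot break the combined expression. Expressions that rely on numbered backreferences could behave differently once merged, and whether this is actually faster would need timing against the real list.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace BlackList
{
    // Hypothetical helper for illustration only.
    public static class CombinedBlackList
    {
        public static Regex Build(IEnumerable<string> expressions)
        {
            List<string> valid = new List<string>();
            foreach (string expression in expressions)
            {
                try
                {
                    // Compile each line alone first so malformed
                    // entries can be skipped.
                    new Regex(expression);
                    valid.Add("(?:" + expression + ")");
                }
                catch (ArgumentException)
                {
                    // Not a valid .NET regular expression; ignore it.
                }
            }

            // "(?!)" never matches, so an empty list blocks nothing.
            if (valid.Count == 0)
                return new Regex("(?!)");

            return new Regex(
                string.Join("|", valid.ToArray()),
                RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Compiled);
        }
    }
}
```

The combined Regex would be built once (for example when the list is loaded) and then reused with a single IsMatch call per comment.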

IMO, true integration of this in .Text should at least:
* keep track of its own blacklist, which users can add new regexes to that may not be in the master blacklist yet. Adding new regexes should be done through a textarea instead of a one-line input so new ones can be mass-imported (by copy-and-pasting the master blacklist, for instance)
* block comments and trackbacks on-the-fly if they match a Regex in the blacklist, giving the user some level of feedback in case of false positives (contact me at user[at]domain[dot]com if you think this is in error yada yada yada)
* give the user the option of checking the last x comments/trackbacks against the blacklist (very useful if you just got spambombed and all of a sudden have hundreds of new comments to get rid of; just add the spam-url and mass-delete all matching comments)
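
The three points above might look something like this in code. It is only a sketch: CommentFilter and its members are invented names, nothing here comes from the .Text source, and storage, feedback messages, and the actual deletion are left out. Import takes pasted text with one expression per line (the textarea case), IsBlocked is the on-the-fly check, and FindBlocked scans a batch of recent comments for mass-delete candidates.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

namespace BlackList
{
    // Hypothetical class for illustration; not a .Text API.
    public class CommentFilter
    {
        private readonly List<Regex> patterns = new List<Regex>();

        // Accepts pasted text with one expression per line and
        // returns how many expressions were accepted.
        public int Import(string pastedText)
        {
            int added = 0;
            using (StringReader reader = new StringReader(pastedText))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    line = line.Trim();
                    if (line.Length == 0 || line[0] == '#')
                        continue;
                    try
                    {
                        patterns.Add(new Regex(line, RegexOptions.IgnoreCase));
                        added++;
                    }
                    catch (ArgumentException)
                    {
                        // Skip expressions that do not compile.
                    }
                }
            }
            return added;
        }

        // On-the-fly check for a new comment or trackback.
        public bool IsBlocked(string text)
        {
            foreach (Regex pattern in patterns)
            {
                if (pattern.IsMatch(text))
                    return true;
            }
            return false;
        }

        // Returns the indexes of recent items that match the blacklist.
        public List<int> FindBlocked(IList<string> recent)
        {
            List<int> hits = new List<int>();
            for (int i = 0; i < recent.Count; i++)
            {
                if (IsBlocked(recent[i]))
                    hits.Add(i);
            }
            return hits;
        }
    }
}
```

After a spam run, the blog owner would Import the offending URL and pass the last x comment bodies to FindBlocked to get the ones to review or delete.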

any particular reason for the "catch (Exception) { throw; }" btw? :-)
# July 29, 2004 4:31 PM

Mark Brown said:

Luke, those are great suggestions for true integration. I wonder if it makes sense to build a tool that would integrate other blacklists and filters. SpamAssassin comes to mind and I'm sure there are others. The Bayesian filters won't be of much use, but the content filters might fit into a solution nicely. Maybe some sort of SpamAssassin proxy.

Did you ever look at something and wonder "What was I thinking...?" and not know the answer?
# July 29, 2004 11:46 PM

TrackBack said:

^_^,Pretty Good!
# April 10, 2005 8:29 AM

Resume said:

I think it would be better if the IP address were taken into consideration.

Also, this function would cost some computing capacity.

# September 12, 2007 2:06 AM
