A quick note on security and anti-spam tactics that take advantage of human pattern matching abilities...

Okay, so the BlogX engine now has a security word. Well, that is fine, I guess. These strips take advantage of the human mind's ability to process patterns and make out words in a distorted image. So I'll start by saying that I've failed the recognition test 5 or 6 times on the same form before. The whole darn process becomes guesswork as the images become more and more distorted and the spammers get more capable processing software. Eventually we won't be able to read the images ourselves, and the test will flip: if the correct answer is given, then the user must be a spammer.

Some issues I have with pattern-matching anti-spam measures:

  • False sense of security - I remember that a few years ago, while I was working at Microsoft, one of the employees there wrote a bot that was able to process the images and submit entries. If I recall, the entries were somehow linked to a small payout (possibly PayPal?), and the security mechanism was there simply to prevent users from submitting thousands of entries and turning the small money into an actual payday. Well, the false sense of security the company had in their system would have cost them dearly.
  • I can't read them half the time - I actually wrote a small processing application that I will be using to post comments to Chris's blog from now on, since I couldn't read the image supplied to me. Maybe this won't always be the case, but the word I was given I simply couldn't read.
  • They suck for international users - The tests require not only the human ability to pattern match, but also the human ability to understand a written language. That means they suck for children who can read well-formed text but not obfuscated text, they suck for international users who might not even understand English, and they must really be a kick in the groin for users who spend 5 years learning English only to find out they can't make out the words. So much for all that money you spent on English classes.

Anyway, in the interest of getting rid of these devices, I'll give the spammers a little head start. If they weren't using .NET and GDI+ before, then they should be. After running the code below, you still need an OCR program to pull out the words. However, I have another piece of code that I use for non-transformed fonts (hence the wavy lines that a lot of the sites are starting to use). It caches a bunch of font data and superimposes it over the cleaned-up text I get from something like the algorithm below. It takes about 15 seconds unoptimized and gives you an 80% chance of getting the word right. If you hook it up to a dictionary, it adds a look-up to see if the word is real. The problem there is that sites are starting to use random letters and numbers. The key is that they always use the same letter/number format, so you know where to look for numbers and where to pattern match for letters. These, in my opinion, are completely inferior, as they let me cut my sample matching to just numbers or just letters.

using System;
using System.Drawing;
using System.Drawing.Imaging;

public class FilterWord {
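    private static void Main(string[] args) {
        // Expect an input image path and an output bitmap path.
        if (args.Length < 2) {
            Console.Error.WriteLine("usage: FilterWord <input image> <output bitmap>");
            return;
        }

        Image img = Image.FromFile(args[0]);
        int delta = 3; // assigned but never used

        // Work on a copy of the image at the same resolution.
        Bitmap b = new Bitmap(img.Width, img.Height);
        b.SetResolution(img.HorizontalResolution, img.VerticalResolution);
        using(Graphics gfx = Graphics.FromImage(b)) {
            gfx.DrawImage(img, 0, 0);
        }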
       
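        // Clear space: whiten every pixel except dark gray ones inside
        // the horizontal band this code treats as containing the word.
        for(int i = 0; i < b.Height; i++) {
            for(int j = 0; j < b.Width; j++) {
                // Band check: rows between 35% and 70% of the height
                if ( i > (b.Height * .35) && i < (b.Height * .7) ) {
                    // Grayscale check: R, G and B are all equal
                    Color check = b.GetPixel(j, i);
                    if ( check.R == check.G && check.G == check.B ) {
                        // Color range check: dark, but not pure black; keep this pixel
                        if ( check.R > 10 && check.R < 100 ) {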
                            continue;
                        }
                    }
                }
               
                b.SetPixel(j, i, Color.White);
            }
        }

        // Clear dots: remove isolated specks and bridge one-pixel vertical gaps.
        // Edits are made in place, so later pixels see earlier changes.
        for(int i = 1; i < b.Height - 1; i++) {
            for(int j = 1; j < b.Width - 1; j++) {
                // The pixel itself and its four direct neighbors
                Color up = b.GetPixel(j, i-1);
                Color left = b.GetPixel(j-1, i);
                Color mid = b.GetPixel(j, i);
                Color right = b.GetPixel(j+1, i);
                Color down = b.GetPixel(j, i+1);

                if ( mid.R < 255 ) {
                    // Dark pixel with four white neighbors: treat it as noise
                    if ( up.R == 255 && left.R == 255 && right.R == 255 && down.R == 255 ) {
                        b.SetPixel(j, i, Color.White);
                    }
                } else {
                    // White pixel with dark pixels directly above and below:
                    // fill it in. (A left/right version of this test is disabled.)
                    if ( up.R < 255 && down.R < 255 ) {
                        b.SetPixel(j, i, Color.Black);
                    }
                }
            }
        }
       
        b.Save(args[1], ImageFormat.Bmp);
    }
}
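
A minimal sketch of the font-overlay idea described above, not the actual code referred to in the post. It assumes each character has already been cut out of the cleaned image into a small fixed-size cell. The tiny glyph table, the GlyphMatcher class, and the 'A'/'9' format codes are all made up for illustration; real cached font data would be rendered from the font the site uses.

```csharp
using System;
using System.Collections.Generic;

public static class GlyphMatcher {
    // Hypothetical cached glyphs: '#' = ink, '.' = blank, 3 wide by 5 tall.
    public static readonly Dictionary<char, string[]> Glyphs =
        new Dictionary<char, string[]> {
            { 'I', new[] { "###", ".#.", ".#.", ".#.", "###" } },
            { 'L', new[] { "#..", "#..", "#..", "#..", "###" } },
            { '1', new[] { ".#.", "##.", ".#.", ".#.", "###" } },
            { '7', new[] { "###", "..#", ".#.", ".#.", ".#." } },
        };

    // Superimpose a glyph on a same-sized cell and count the agreeing pixels.
    public static int Overlap(string[] cell, string[] glyph) {
        int score = 0;
        for (int y = 0; y < glyph.Length; y++)
            for (int x = 0; x < glyph[y].Length; x++)
                if (cell[y][x] == glyph[y][x]) score++;
        return score;
    }

    // kind is the assumed format code for this position:
    // 'A' = only try letters, '9' = only try digits.
    public static char BestMatch(string[] cell, char kind) {
        char best = '?';
        int bestScore = -1;
        foreach (KeyValuePair<char, string[]> entry in Glyphs) {
            bool isDigit = char.IsDigit(entry.Key);
            if ((kind == '9') != isDigit) continue; // wrong class for this slot
            int score = Overlap(cell, entry.Value);
            if (score > bestScore) { bestScore = score; best = entry.Key; }
        }
        return best;
    }
}
```

Calling GlyphMatcher.BestMatch(cell, '9') for a position known to hold a digit only compares against the digit glyphs, which is the saving a fixed letter/number format hands to the attacker. A dictionary check on the assembled word would be a separate step after this one.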

Published Sunday, June 13, 2004 8:09 AM by Justin Rogers

Comments

Sunday, June 13, 2004 1:01 PM by denny

# re: A quick note on security and anti-spam tactics that take advantage of human pattern matching abilities...

Hmmm.... some very good points in there...

Sunday, June 13, 2004 1:06 PM by Justin Rogers

# re: A quick note on security and anti-spam tactics that take advantage of human pattern matching abilities...

Yes, it's truly a question of what you are gaining and at whose expense. I clearly think this new paradigm is a shift towards fraud protection at the expense of every user who has to fill out a form. So much for Gator and my automatic form filler.
Sunday, June 13, 2004 10:13 PM by I have to disagree on your second and third points

# re: A quick note on security and anti-spam tactics that take advantage of human pattern matching abilities...

I have written a custom CAPTCHA (the generic term for the type of human-detection code you abhor) for my browser-based game because, after all, there's not much point in the game if it becomes an exercise in who runs his script the longest. So for some purposes the choice is between trying to foil scripts and shutting down your site entirely... Anyway, having done this since 2001, I have found the following:

- humans are remarkably good at pattern recognition. Most people miss a couple while they are starting out, but very seldom after they get the hang of it. (The penalty for missing in my game is a suspension of one hour so it's not the end of the world, and you can pay a game-currency fee to avoid even that.)
- anyone with good written English skills can read them. A significant portion of my user base does not speak English as a first language. (Actually, the people who have the most trouble are American school children; I use a cursive font to make things a bit more difficult for the would-be cheater, and apparently cursive is becoming something of a lost art in today's education system. So I link a cursive tutorial.)

And no, my captcha doesn't rely on adding noise to the image for the simple reason that, as you have shown, it only takes a few minutes to write code to strip that sort of stuff out. I've included the url if you're curious.

P.S. warning: The variable 'delta' is assigned but its value is never used
Sunday, June 13, 2004 10:15 PM by Jonathan

# re: A quick note on security and anti-spam tactics that take advantage of human pattern matching abilities...

wow, I flubbed that pretty nicely. url is nicely hyperlinked with my "name" in the above post.
Sunday, June 13, 2004 10:46 PM by Justin Rogers

# re: A quick note on security and anti-spam tactics that take advantage of human pattern matching abilities...

Where exactly do you implement your pattern system? I just messed around the site for about half an hour and didn't see one. I think using your limiting system is probably more than enough to stop the average bot.
Monday, June 14, 2004 10:36 AM by Jonathan

# re: A quick note on security and anti-spam tactics that take advantage of human pattern matching abilities...

yeah, it starts serving them to new players after 6h? I forget the cutoff. But here's a page with some examples:

http://www.carnageblender.com/challenge/help.tcl
Monday, June 14, 2004 7:58 PM by Justin Rogers

# re: A quick note on security and anti-spam tactics that take advantage of human pattern matching abilities...

Okay, so I've played the game a bit more and gone through a number of the challenges. I would dare to take the position that the idea of no one having hacked this yet is absurd. You clearly used a fixed font set when creating these words. With a glyph set of only 26, a dictionary file of only 3- and 4-letter words, and the ability to make annotations within the dictionary, a bot could be trained to do the scanning in less than 20 seconds per page. And if you ever repeat the same item (which I swear I've seen at least once so far; I've actually played quite a bit, I guess ;-), then a very simple CRC-to-word lookup could be devised to answer questions on known samples in under a second.

About 8 years ago I did work for a small phone company branch (the phone company wasn't small, just the branch). The position of the company was to sell various services in previously untapped markets, so they tended to sell to third world countries. Now, being a small branch, they often got tasked with doing input jobs based on polls and collections they had taken. Generally the documents were printed on a set of very specific printers, each of which had its own specific glyph problems. You start to see the problem: the software-based OCR packages of 8 years ago simply couldn't handle this type of data. Hell, it was hard enough to put the data in by hand, since you oftentimes couldn't read the document without scanning and enhancing it anyway.

Long story short, the process was very tedious. However, rather than creating a general solution the way the OCR software did, I created a specific solution that identified the printer the document came off of by looking for tell-tale glyph abnormalities. Once this was done I could easily load the appropriate subset of matching glyphs and process the document in question much faster than the OCR software and with far fewer errors. What had previously been a full-day job of correcting OCR errors, at an average of 25 per document, became a 30-minute job of correcting the 1 or 2 errors present in 1 of every 10 documents. Humans may be great at general OCR, but we sure can't touch a computer when it knows exactly how to expect the incoming data stream to look.
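
The CRC-to-word lookup for repeated challenges mentioned in this comment could look roughly like the sketch below. It assumes a repeated challenge is served as a byte-identical image file, which may not hold for any given site. The KnownChallenges class and its method names are hypothetical, and the CRC-32 routine is the standard table-driven one written out by hand rather than taken from a library.

```csharp
using System;
using System.Collections.Generic;

public static class KnownChallenges {
    private static readonly uint[] Table = BuildTable();
    private static readonly Dictionary<uint, string> Answers =
        new Dictionary<uint, string>();

    // Precompute the 256-entry table for the reflected polynomial 0xEDB88320.
    private static uint[] BuildTable() {
        uint[] table = new uint[256];
        for (uint n = 0; n < 256; n++) {
            uint c = n;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[n] = c;
        }
        return table;
    }

    // Standard CRC-32 (the variant used by zip and PNG) over the raw bytes.
    public static uint Crc32(byte[] data) {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in data)
            crc = Table[(crc ^ b) & 0xFF] ^ (crc >> 8);
        return crc ^ 0xFFFFFFFFu;
    }

    // Store the answer for an image once it has been solved.
    public static void Remember(byte[] imageBytes, string word) {
        Answers[Crc32(imageBytes)] = word;
    }

    // Returns null when the image has not been seen before.
    public static string Lookup(byte[] imageBytes) {
        string word;
        return Answers.TryGetValue(Crc32(imageBytes), out word) ? word : null;
    }
}
```

A 32-bit checksum can collide, so a more careful version would compare the stored bytes as well. Re-rendering each challenge with any random variation at all defeats this lookup, since the bytes no longer match.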
Monday, June 14, 2004 8:40 PM by Justin Rogers

# re: A quick note on security and anti-spam tactics that take advantage of human pattern matching abilities...

Quick correction: the number of glyphs is closer to 30-something, since a couple of the letters have some strange deviations (the e for one). Randomizing the skew translation would stop a uniform glyph approach, but would make the words harder for humans to read. I still like the concept of overlapping two words, since the algorithm tends to have a hard time differentiating which pixels belong to which word. Still, it doesn't take long to converge on a matching glyph with the current system.
Monday, July 05, 2004 3:01 PM by Phil

# bringing that post back to life

Hi everyone,

(first of all, excuse my English, I'm not a native speaker)

Seems like I'm a bit late on this topic, but since I'm working on an anti-bot image generator at the moment, I thought I might get a bit more information here...

I quickly wrote a PHP script that generates an image (http://www.blutch.net/checkimage/index.php), and despite its quite simple construction, I feel like it's not really easier to OCR than Microsoft Passport's one, for instance (yet I feel like the characters are easier for a human to read). It merely uses random rotations, two different blurred fonts and a few random lines (whose usefulness hasn't convinced me yet).

You seem to know quite a lot on the topic, so you might want to give a few more recommendations on what I could do to improve the proofness (if that word exists) of the image.

Thanks in advance (if anyone ever reads this),
PS

(an answer by mail in addition to posts here would be appreciated: blogs [ a t ] lar ampe·com (you know of course what to do with the whitespaces)).
Sunday, November 22, 2009 8:32 PM by Cornelius

# re: A quick note on security and anti-spam tactics that take advantage of human pattern matching abilities...

I read a few topics. I respect your work and added blog to favorites.

