A quick note on security and anti-spam tactics that take advantage of human pattern-matching abilities...
Okay, so the BlogX engine now has a security word. Well, that's fine, I guess. These image strips take advantage of the human mind's ability to process patterns and make out words in a distorted image. So I'll start by saying that I've failed the recognition test 5 or 6 times on the same form before. The whole darn process becomes guesswork as the images get more and more distorted and the spammers get more capable recognition software. Eventually we won't be able to read the images ourselves, and the test will flip: if the correct answer is given, the user must be a spammer.
Here are some issues I have with pattern-matching anti-spam measures:
- False sense of security - I remember that a few years ago, while I was working at Microsoft, one of the employees there had actually written a bot that was able to process the images and submit entries. If I recall correctly, the entries were somehow linked to a small payout (possibly PayPal?), and the security mechanism was there simply to prevent users from submitting thousands of entries and turning the small money into an actual payday. Well, the false sense of security the company had in their system would have cost them dearly.
- I can't read them half the time - I actually wrote a small processing application that I will be using to post comments to Chris's blog from now on, since I couldn't read the image supplied to me. Maybe that won't always be the case, but this time I simply couldn't make out the word.
- They suck for international users - These tests require not only the human ability to pattern match, but also the human ability to understand a written language. That means they suck for children who can read well-formed text but not obfuscated text, they suck for international users who might not even understand English, and they must really be a kick in the groin for people who spent 5 years learning English only to find out they can't make out the words. So much for all that money you spent on English classes.
Anyway, in the interest of getting rid of these devices, I'll give the spammers a little head start. If they weren't using .NET and GDI+ before, then they should be. After running the FilterWord program below, you still need an OCR program to pull out the words. However, I have another piece of code that I use for non-transformed fonts (which is why a lot of sites are starting to add wavy lines) that caches a bunch of font data and superimposes it over the text I get from something like the algorithm below. It takes about 15 seconds unoptimized and gives you an 80% chance of getting the word right.
If you hook it up to a dictionary, it can also check whether the word is real. The problem there is that sites are starting to use random letters and numbers. The key is that they always use the same letter/number layout, so you know where to look for numbers and where to pattern match for letters. In my opinion these are completely inferior, since they let me cut my sample matching down to just numbers or just letters.
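The letter/number layout trick is easy to sketch. Assuming some recognizer has already produced a scored list of candidate characters for each position (that recognizer isn't shown here), a known layout lets you throw away every candidate of the wrong kind before picking the best one. The layout notation ("L" for a letter, "D" for a digit) and the `FormatConstraint` class are hypothetical, made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: narrows per-position OCR guesses using a known
// letter/digit layout such as "LLLDD" (L = letter, D = digit).
public static class FormatConstraint
{
    // True if character c is allowed by the layout symbol at that position.
    public static bool Allowed(char layoutSymbol, char c) => layoutSymbol switch
    {
        'D' => char.IsDigit(c),
        'L' => char.IsLetter(c),
        _ => throw new ArgumentException($"unknown layout symbol '{layoutSymbol}'")
    };

    // candidates[i] maps each character guessed at position i to a score
    // (higher = better match). Picks the best-scoring character the layout
    // allows at each position, or '?' if no candidate qualifies.
    public static string Decode(string layout, IList<Dictionary<char, double>> candidates)
    {
        if (layout.Length != candidates.Count)
            throw new ArgumentException("layout and candidate counts differ");

        var result = new char[layout.Length];
        for (int i = 0; i < layout.Length; i++)
        {
            var allowed = candidates[i]
                .Where(kv => Allowed(layout[i], kv.Key))
                .ToList();
            result[i] = allowed.Count == 0
                ? '?'
                : allowed.OrderByDescending(kv => kv.Value).First().Key;
        }
        return new string(result);
    }

    // Optional dictionary check for all-letter words (case-insensitive,
    // assuming the dictionary is stored in lower case).
    public static bool IsRealWord(string word, ISet<string> dictionary) =>
        dictionary.Contains(word.ToLowerInvariant());
}
```

For example, with the layout "LLD", a position where the recognizer slightly preferred '0' over 'O' still comes out as 'O', because a digit isn't allowed there.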
using System;
using System.Drawing;
using System.Drawing.Imaging;

// Cleans up a "security word" image so an OCR program has a better chance.
// Usage: FilterWord <input image> <output.bmp>
public class FilterWord {
    private static void Main(string[] args) {
        if (args.Length < 2) {
            Console.Error.WriteLine("usage: FilterWord <input image> <output.bmp>");
            return;
        }

        using (Image img = Image.FromFile(args[0]))
        using (Bitmap b = new Bitmap(img.Width, img.Height)) {
            // Copy the source into an editable bitmap at the same resolution.
            b.SetResolution(img.HorizontalResolution, img.VerticalResolution);
            using (Graphics gfx = Graphics.FromImage(b)) {
                gfx.DrawImage(img, 0, 0);
            }

            // Clear space: keep only dark-gray pixels (R == G == B, 10 < R < 100)
            // inside the horizontal band between 35% and 70% of the height;
            // everything else becomes white.
            for (int i = 0; i < b.Height; i++) {
                for (int j = 0; j < b.Width; j++) {
                    if (i > b.Height * .35 && i < b.Height * .7) {
                        Color check = b.GetPixel(j, i);
                        if (check.R == check.G && check.G == check.B
                            && check.R > 10 && check.R < 100) {
                            continue;
                        }
                    }
                    b.SetPixel(j, i, Color.White);
                }
            }

            // Clear dots: look at the four direct neighbours of each interior pixel.
            // This works in place, so pixels changed earlier in the scan affect later checks.
            for (int i = 1; i < b.Height - 1; i++) {
                for (int j = 1; j < b.Width - 1; j++) {
                    Color up = b.GetPixel(j, i - 1);
                    Color left = b.GetPixel(j - 1, i);
                    Color mid = b.GetPixel(j, i);
                    Color right = b.GetPixel(j + 1, i);
                    Color down = b.GetPixel(j, i + 1);

                    if (mid.R < 255) {
                        // Isolated dark pixel (all four neighbours white): erase it.
                        if (up.R == 255 && left.R == 255 && right.R == 255 && down.R == 255) {
                            b.SetPixel(j, i, Color.White);
                        }
                    } else if (up.R < 255 && down.R < 255) {
                        // White pixel between dark pixels above and below: fill the gap.
                        // (A left/right version of this check was tried and is left disabled.)
                        b.SetPixel(j, i, Color.Black);
                    }
                }
            }

            b.Save(args[1], ImageFormat.Bmp);
        }
    }
}
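The font-caching code isn't shown, but the core idea of superimposing cached glyphs over the cleaned-up text can be sketched as plain template matching. This assumes the FilterWord output has already been split into one cell per character, scaled to the template size, and converted to a bool grid where any non-white pixel counts as ink; none of those steps are shown. `TemplateMatcher` and its methods are hypothetical names, and scoring by the fraction of agreeing pixels is just one simple choice:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: score a cleaned-up character cell against cached
// glyph templates by superimposing them pixel by pixel.
public static class TemplateMatcher
{
    // Fraction of pixels on which cell and template agree (both ink or
    // both background). Both grids must have the same dimensions.
    public static double Overlap(bool[,] cell, bool[,] template)
    {
        int h = cell.GetLength(0), w = cell.GetLength(1);
        if (template.GetLength(0) != h || template.GetLength(1) != w)
            throw new ArgumentException("cell and template sizes differ");

        int agree = 0;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (cell[y, x] == template[y, x]) agree++;
        return (double)agree / (h * w);
    }

    // Returns the character whose template best overlaps the cell,
    // or '?' if no templates are supplied.
    public static char BestMatch(bool[,] cell, IDictionary<char, bool[,]> templates)
    {
        char best = '?';
        double bestScore = -1;
        foreach (var kv in templates)
        {
            double score = Overlap(cell, kv.Value);
            if (score > bestScore) { bestScore = score; best = kv.Key; }
        }
        return best;
    }

    // Builds a grid from rows of text where '#' is ink (all rows must be
    // the same length). Handy for quick experiments.
    public static bool[,] FromRows(params string[] rows)
    {
        var grid = new bool[rows.Length, rows[0].Length];
        for (int y = 0; y < rows.Length; y++)
            for (int x = 0; x < rows[0].Length; x++)
                grid[y, x] = rows[y][x] == '#';
        return grid;
    }
}
```

Counting agreeing background pixels as matches slightly favours thin glyphs; an ink-only score such as intersection over union is a common alternative. Passing only digit templates or only letter templates for a given position is where the fixed letter/number layout cuts the matching work down.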