Natural Language & AI: Dissecting the process of reading.

It has been a few years since I've seen the reports on how people read and more specifically how speed readers get so darn fast, but I recently got an email that demonstrated the concept rather well. Rather than explain the research study in plain words, it explained the research study in plain words! Only they weren't exactly plain. Each word consisted of the first and last letters completely in place, but the remainder were munged and out of place. Strangely enough you read the email without much thought as to the out of place characters. In fact the document is rather easy to read. As an example:

It has been a few yreas scine Iv'e seen the rortpes on how popele raed and mroe seifcpically how seped redares get so dran fsat, but I rlecenty got an eaiml taht darmtteonsed the cpeonct rtaher wlel. Rthear tahn elaipxn the rseearch sutdy in pailn wsdor, it elxpained the rseaerch sdtuy in pilan wsord! Olny tehy wrnee't eltxacy pnila. Ecah wrod ctsiseond of the frist and lsat lreetts cmpetolely in pacel, but the rienamder wree mngeud and out of pcela. Sengtraly eonugh you raed the eamil wtihout mcuh toughht as to the out of pclae ccehtarars. In fcat the deumocnt is rthaer esay to rade. As an eamxple:
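For anyone who wants to experiment, here is a minimal Python sketch of the transformation described above. It is an illustration under assumptions, not the program that produced the scrambled paragraph: the names `scramble_word` and `scramble_text` are made up for this example, a "word" is assumed to be any run of ASCII letters, and punctuation such as apostrophes is left where it is.

```python
import random
import re

def scramble_word(word, rng):
    """Shuffle the interior letters of a word; keep the first and last in place."""
    if len(word) <= 3:
        # Zero or one interior letters, so there is nothing to shuffle.
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]

def scramble_text(text, seed=None):
    """Scramble every run of ASCII letters in text (assumed definition of a word)."""
    rng = random.Random(seed)  # a seed makes the result repeatable
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)

print(scramble_text("Each word keeps its first and last letters in place", seed=1))
```

Words of three letters or fewer come out unchanged, simply because they have at most one interior letter. A shuffle can also leave a longer word looking the same by chance.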

Don't ask me for the program, since I don't find the process interesting enough to post readable code. What I will point out is that if, as humans, we really only use the first and last characters of a word initially, then there must be some other properties or features that help us quickly identify words. If we can find out exactly what those are, we can probably write better software. The people over at Google and Yahoo most likely have a nice head start. Mathematically, the first and last characters act as a reduction, letting us partition our list of words into a fixed number of groups:

26^2 = 676 possible combinations for the first and last characters.
470,000 words in the unabridged dictionary

As you can see, breaking down by first and last character doesn't buy us all that much in terms of reducing the problem set: 470,000 words spread over 676 groups still averages roughly 700 words per group. Or does it? Humans don't really understand 470,000 words. Hell, a bunch of those would seem pretty strange, even incorrect. We really only use a few thousand, so across the 676 possible combinations there are only a few words left in each group by the time we are done parsing just the first and last characters. If we combine that with relative word length, maybe we don't even need the letters in the middle. There is probably more to the patterns, namely defining features of the character shapes, such as curvy, tall, or short.
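The arithmetic and the grouping idea can be sketched in a few lines of Python. This is a rough illustration, not a measurement: the `partition` helper and the `SAMPLE` word list are hypothetical, and the 3,000-word working vocabulary is an assumed stand-in for "a few thousand".

```python
from collections import defaultdict

def partition(words, use_length=False):
    """Group words by (first letter, last letter), optionally adding word length."""
    groups = defaultdict(list)
    for word in words:
        word = word.lower()
        if not word:
            continue  # skip empty strings
        key = (word[0], word[-1])
        if use_length:
            key += (len(word),)
        groups[key].append(word)
    return dict(groups)

# Hypothetical sample vocabulary, chosen only to show collisions.
SAMPLE = ["the", "there", "table", "time", "take", "that", "text",
          "treat", "read", "rated", "road", "word", "weird", "would"]

buckets = 26 ** 2
print(buckets)                     # number of first/last letter pairs
print(round(470_000 / buckets))    # average group size, full dictionary
print(round(3_000 / buckets, 1))   # average group size, assumed 3,000-word vocabulary

by_ends = partition(SAMPLE)
by_ends_and_length = partition(SAMPLE, use_length=True)
print(len(by_ends), len(by_ends_and_length))  # adding length splits the groups further
```

The averages assume words spread evenly across the 676 pairs, which real English does not do, so some groups will be far more crowded than others.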

Why in the hell would I even bring this up? Well, I figured it tied in with some of the other natural language posts, like the telephone number to words algorithm. I'm also working on a neat little social game that involves a lot of dictionary processing to see how little virtual people pass around rumors. It is not very easy to allow a rumor to grow, shrink, be remembered, and/or exert its power over the various people of an area. I think understanding how we munge things mentally can help define some basic transforms, or perhaps I should call them compressions, that occur while people transfer and remember information. If you think of rumor storage as a fixed resource, with each person allowed only a certain amount of space, then only important properties will be stored. Other properties may remain, but be reduced. Each player may store a small dictionary and reduce words to look-ups. In the case of a redundant look-up, where more than one word matches, the character has to guess based on context... Rumors .NET is low on my radar for now, but it isn't that much work. It's one of those games I go back to every now and then as I get the time, and I find it fun to play with the algorithms. If I get over the fact that the graphics suck, I might give you all a glance.
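One way to picture the look-up idea is the following minimal Python sketch. It is not the Rumors .NET code. It assumes each word is reduced to a (first letter, last letter, length) key and that each character carries a small personal vocabulary; the names `key_of`, `build_lookup`, `compress`, and `recall` are hypothetical.

```python
from collections import defaultdict

def key_of(word):
    """Reduce a word to an assumed compact key: first letter, last letter, length."""
    return (word[0], word[-1], len(word))

def build_lookup(vocabulary):
    """Index a character's personal vocabulary by compact key."""
    lookup = defaultdict(list)
    for word in vocabulary:
        lookup[key_of(word)].append(word)
    return dict(lookup)

def compress(rumor):
    """Store a rumor as a list of keys instead of full words."""
    return [key_of(word) for word in rumor.lower().split()]

def recall(keys, lookup):
    """Return the candidate words for each stored key.

    One candidate is a clean recall, several mean the character has to
    guess from context, and an empty list means the word is unknown.
    """
    return [lookup.get(key, []) for key in keys]

vocabulary = ["the", "mayor", "major", "stole", "style", "gold", "good"]
stored = compress("The mayor stole gold")
for candidates in recall(stored, build_lookup(vocabulary)):
    print(candidates)
```

With this tiny vocabulary most keys come back with two candidates, which is exactly the case where some form of context matching would have to break the tie.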

Published Thursday, September 23, 2004 10:52 PM by Justin Rogers

Comments

Friday, September 24, 2004 12:51 PM by nospamplease75@yahoo.com (Haacked)

# RE: Natural Language & AI: Dissecting the process of reading.

The algorithm you mentioned doesn't munge words with three letters or less. This means we get a LOT of context words (such as prepositions) to help make sense of the rest of it.

I think it's a combination of pattern recognition and context recognition. The first and last letters matter the most, but the letters in between still have to be the correct letters. You can't replace those arbitrarily.

Likewise, some words do slow reading down. Those are the longer ones without "tall" letters to help guide the eye.

Friday, September 24, 2004 5:13 PM by Justin Rogers

# re: Natural Language & AI: Dissecting the process of reading.

Well, this is a nationally renowned study. The qualifications for the algorithm were set by them, so I'm sure they are aware of 1, 2, 3, and sometimes even 4 letter symmetries.

My main concern was whether or not 2 letters and a dictionary of words you are very familiar with would be enough to reconstruct a message. I'll have a game ready that tests the theory using some algorithms. I'm going to start light with basic matching, but then add some context matching code if and only if it is required.

Tuesday, June 17, 2008 7:48 AM by pat

# re: Natural Language & AI: Dissecting the process of reading.

Well, thanks a lot. I'm going to use this concept for my science assignment, and your blog really helped! THANKS! I will give you credit. I'm not gonna use it as info or anything... I just like the concept.
