September 2004 - Posts

I think I've narrowed down the new Whidbey generic Stack class to using a linked array Performance: Linked Arrays's now and later. I figured, hey, they must have used the same features in the Queue right? Not quite, in fact the generic queue can actually be slower in some cases. You really have to understand how the old collections worked and how the new ones work. When testing the old collections you had two seprate code-paths that are important. First you have to test the collections with boxing, and that means using a value-type, such as int. Then you test them without boxing using a reference type. Just passing in an object will do for this second test. Generics should have the same performance no matter what we use, so we can just test a single code-path there (or both for completeness).

That gives us three tests, but we'll add three stack tests as well to show how the performance bars are completely different for the two collections.

Testing 100 values for 1000000 iterations
Generic Queue Usage: 00:00:11.6868048
Generic Stack Usage: 00:00:05.0773008
Queue Usage (int): 00:00:11.9672080
Queue Usage (Object): 00:00:10.6953792
Stack Usage (int): 00:00:08.5322688
Stack Usage (Object): 00:00:06.0286688

That is pretty amazing stuff. The Stack is definitely showing clear signs of being improved in terms of performance. I've already shown that in earlier articles, but the percentages between the various tests are important because you'd expect the Queue to share some of that attention to performance. Testing 100 values (enqueue and dequeue) the generic queue has slightly better performance than the boxed integer Queue... What? Hell, it should get quite a bit faster right? Even worse, the old Queue when using reference types is actually faster than the generic version. Something changed. Not sure what, but it is possible to use LinkedArray structures again to make the Queue almost as fast as the Stack. I say almost, because there are some extra features in the Queue, extra checks as you were, that take more time to resolve. As an example, you can see a multi-segment queue implementation of the Enqueue method. Each item in the linked list of arrays is of size growthSize, and we use this to allocate more arrays as needed rather than growing this value and creating increasingly larger arrays as we advance.

public void Enqueue( T value ) {
    if ( tail == growthSize ) {
        if ( tailList.Next == null ) {
            tailList.Next = new LinkedArray<T>( growthSize, null );
        }
        tailList = tailList.Next;
        tail = 0;
    }

    tailList.Data[tail++] = value;
}

Thats just the start of a performant collection, but you can start to see where the operations required are just about the same as those required for the stack, really bringing this collection into the same league in terms of performance. MS must have some data that people use stacks more than queues?

I love testing my code against the latest and greatest available in Whidbey. Turns out there are some collections that are insane when it comes to performance. One such collection is the generic Stack and of course the new Queue. Now both of these new collections are fast, and I mean really fast. Bare-bones do nothing code for the most part, just the minimum to make them work. At least that is what I produced in order to compare them. I figured I'd start with a LinkedArray implementation so that I didn't have to do a bunch of array copies as the stack grows.

public class LinkedArray<T> {
    public T[] Data;
    public LinkedArray<T> Next;

    public LinkedArray() : this( 10, null ) { }
    public LinkedArray( int initialSize ) : this(initialSize, null) { }

    public LinkedArray( int initialSize, LinkedArray<T> next ) {
        this.Data = new T[initialSize];
        this.Next = next;
    }
}

Again, the above is completely bare. We are using fields for extra speed, even though the JIT would most likely inline properties that return the vitals. Nice and generic so we'll get the same performance no matter what kinds of objects we are pushing and popping. To implement a stack, we'll actually use the linked arrays in reverse order. That way we can hold onto the bottom of the stack and the next pointer will actually point towards our previous a upper segment. In this way, we truly do grow downwards like a real stack, not that this matters very much. The only thing this really sucks at is ToArray, but being a personal collection, I'm not going to worry about that particular method. If I want to do a ToArray, I'd just return the collection in top down order in a larger array, rathe than reversing the entire thing.

public void Push( T value ) {
    if ( this.offset == 0 ) {
        this.growthSize <<= 1;
        this.stack = new LinkedArray<T>( growthSize, this.stack );
        this.offset = growthSize;
    }

    this.offset--;
    this.stack.Data[this.offset] = value;
}

We allocate at the top of the array so that we can do a delay allocation... Say we we just consumed the last element slot in our current array, but then we never add another element. By growing the array for data coming in, we ensure that we are going to use at least one element of the newly allocated segment. Pretty sweet. We do something similar for the Pop command, but what we want to prevent is the user popping the last element of a segment and then having us throw away the array. For that reason, during a Pop we position the offset right past the end of the used array. If we pop a second time, the array will be lost, but if we start to push elements again, we'll re-use that array and it won't be a loss.

public T Pop() {
    if ( this.offset == growthSize ) {
        this.stack = this.stack.Next;
        this.offset = 0;
        this.growthSize >>= 1;
    }

    return this.stack.Data[offset++];
}

That is only a partial implementation of a stack, but it is the most interesting part, and it took me a while to tune that code just right in order to outperform the actual Whidbey Stack implementation. I'm kind of assuming that they must be using linked arrays as well in order to perform at the same level as a linked array implementation. The most likely differences in the code would be in the ordering of elements in the array (I walk high->low they probably walk low->high) and the tuning metrics (I'm doing a *2 growth factor, they have better heuristics based on tuning by the performance team).

I did say Linked Array's now and later though didn't I? While I do battle with Whidbey, you can also make heavy use of linked array's today. Collections with random access potential don't tend to have the greatest use of linked arrays, but any time you can directly control which element the user has access to next, that is the algorithm that linked array's work well for. A basic stack implementation in V1.1 for intrinsic types is about 4 times faster than the equivalent collection classes and an object array based implemenation for storing any reference type objects is about 3 and a half times faster after you perform your casting on the way out. Hell, even boxing/unboxing integers into and out of the stack you wind up with a 3 times faster implementation. This just goes to show how much work has gone into these new generic collections, and for good reason if they are used throughout the rest of the BCL, ASP .NET, and Windows Forms.

If you take a look through Rotor you'll find a number of methods that have the signature TrySZ[...]. What these methods are meant to do is take intrinsic data types (int, float, long, etc...) and perform faster than normal operations on them. That is at least the hope. Another thing you wind up with are a bunch of conditional checks at the top of the files that tend to slow down the method if it is called often. Since we have been doing analysis of sorting algorithms, I figured, what the hell, might as well make sure all of our helper functions are as fast as possible.

If BinarySearch was a complicated function, you might be enticed to just use the available version. It isn't that difficult, and you can get some drastic performance improvements out of writing your own custom version. A custom implementation is about twice as fast as the built in version. Waiting for Whidbey might be a good idea, since intrinsics get faster, but if you check out that TrySZ[...] you can instantly note that generics won't make much of a difference... Why? Because TrySZ is taking your data back into unmanaged code and operating on it at an intrinsic level already. You are basically eating the cost of boxing only once, which doesn't have much impact. TrySZ is also skipping the use of the IComparer interface (unless you pass an IComparer in) so it is already as fast as using the generics version of the IComparer. The basic premise here, is that if our functional replacement is faster, it will more than likely remain faster even through version 2.0 of the CLR.

private static int BinarySearch(int[] searchArray, int low, int high, int start, int value) {
    while(low <= high) {
        if ( searchArray[start] == value ) {
            return start;
        }

        if ( searchArray[start] > value ) {
            high = start - 1;
        } else {
            low = start + 1;
        }

        start = (low + high) >> 1;
    }

    return ~low;
}

To explain the new algorithm a bit, we are using the same algorithm that the CLR does internally. We are inlining the comparison routines which makes a very small difference in performance. We might be able to twiddle the comparison order and see if we can get a bit more speed, but I won't worry about that for now. The method takes a start index. Where the initial binary search will begin. This is actually for an additional optimization that is talked about in Algorithms in C where you can tune the BinarySearch to be a bit faster in the case of regularly distributed data. I haven't changed any of the sorting methods to actually use the new tuning, but maybe later.

I removed all of the exceptional code-paths. Do you really need an ArgumenException instead of a NullReferenceException? What about an ArgumentOutOfRangeException in place of an IndexOutOfBoundsException? It may be nice, but it clearly isn't crucial to debugging the application if something goes wrong. The default exceptions will certainly be enough. Since I offset the initial start index, rather than compute it, you may also want to move the start computation to the top of the loop, and remove start from the parameter list. Up to you. This isn't a BinarySearch direct replacement either, as the method takes high instead of length. If you are passing length, then you'll need to do a high = low + length - 1.

The one thing I've taken away from the .NET Terrarium in terms of AI development is that players value their code over everything else. Protecting code from being viewed by other users is extremely difficult and hard to implement. After all, the focus of a distributed AI game is to pass the intelligence all over the place, to multiple machines, so it can operate across a large number of machines and potentially become the dominant system. If the code leaves your machine, it becomes possible that other users are going to steal your code, plain and simple. You can try trusting some other environments, perhaps a central server that hosts your code, but even then your code exists somewhere that you aren't in charge of basic protection.

Code as an Asset:
The development of new AI in the Terrarium was driven by the decompilation and examination of existing code. To show you how pervasive code dissassembly was, we introduced an Animal Farm, or location where users could share code. Even with this central repository of code, we constantly saw users working in ILDasm and ILAsm to produce verbatim copies of existing AI. The existing code as considered an asset by not only the developer, but also by the rest of the players. Controlling that asset meant possibly controlling the Terrarium's game world. Code is such an asset that users will go to any length to get their hands on it.

What is the best way to protect your assets? Well, don't create a situation where your assets fall into the hands of other users. Why do you think ASP .NET is so popular and Windows Forms is still riding coat tails? People can protect their code assets when it comes to server side coding, but find it much harder to protect code on the client. This is a common problem. So if you want code protection, you need to host your assembly either at a central server level or from your own machine. If you follow these guidelines then users will be more likely to participate and generate professional level AI while still protecting their investment in programming the code.

Execution Time as an Asset:
We've identified three places to run the AI. You can operate at the local client level, at the server level, or at a distributed client level. Identifying execution time as an asset, load balancing across the greater distributed network of all clients is going to yield the best results in terms of execution time. The resources are apparently limitless as long as you can maintain a large user base. In reality, users aren't always connected. The only 100% up-time resource is going to be your server. You can run AI on the server, but then you run the risk of not having enough execution time to schedule the AI as the game becomes more and more popular. You can offload this risk by allowing users to host their own personal servers and implementing relatively simple queueing routines.

The most limited location to run the AI is on the local client. Each user that produces AI hosts a networkable interface through which their AI can operate. This is the most resource consumptive option because it entails replicating a large amount of game state down to the local client and the propagation of AI commands back up to wherever the game is being hosted. Your AI is bound by the amount of processor time you want to give it. Further, you can only host a single instance of your AI at a time (potentially more in a more robust hosting environment).

Up-Time as an Asset:
Looking at our three models again, the natural next step is examining up-time. If you are programming an AI it is truly an investment of time and effort, but should it also be an investment of your own computer time? In the distributed model it up-time becomes a non-issue as long as the server is available to server AI assemblies down to distributed clients that do all of the processing. The server model is fairly identical in that your AI is playable as long as the server is up and people are selecting it as an opponent. The problem with local client interfaces is that your AI is only around as long as you are. You have to maintain an interface for people to connect through.

Can we solve the up-time problem for local only hosting? Yes, if we trust someone else to host our code for us. This is a sticky point, but we can share a hosting location with a number of friends, family, colleagues where we each host an AI off of the same box. This box might be a web server or similar online resource, and once the AI is loaded, it now acts as an AI interface in addition to it's other roles. A third option is the pay for play option where you AI is hosted by the game originator for a small maintenance fee proportional to keeping their server up and running with your AI on it.

Penalties:
This is really important and something a lot of people don't think about. How many different ways can code hose a machine? Plenty right? You can get stuck in an infinite loop and never return, you can run a machine out of memory, you can screw up a static initializer, you write finalizers in your code to hang the GC. Who should be penalized for bad code? If you allow AI to run on the server or distributed clients then everyone is going to pay the price for bad code. Forcing code to run at the local client level means only the original AI developer is going to pay the penalty for bad code.

Can you get around all of those issues in server or distributed hosting? Sure, but I offer the following advice. Fixing the hosting problems comes in the form of adding constraints. These constraints are based on usage scenarios and known patterns of attack. They leverage existing code protection features as much as possible. In the end, you have a system capable of doing nearly all of the things the original system could do, minus a select few things that are considered unsafe. From the other end of the spectrum, you can explicitly enable features one by one. This direction is safer, but much more limited. It is akin to a custom scripting language because you have full control over what the underlying code is capable of. .NET doesn't currently offer good facilities to make this a possibility. You can't explicitly enable access to only certain classes, and disable all other access. You can't hook into the memory management functionality and suppress creation of objects once the component has reached it's limits. They may be available as Whidbey roles around in the form of hosting protections, but that will be a code-path much more difficult to code than the average .NET programmer will be able to handle.

At the end of the day the penalties should probably be placed right back on the AI developer. In a real game, AI developers are just as responsible for bugs as the rest of the development team. In fact, if there is a bug in the AI, it often takes down the entire game. In a developer game the model is switched because the AI now becomes an unknown and the AI developer may not bear the brunt of poorly written code. In fact, there are issues with developer code in the Terrarium that only manifest after the code has traversed several machines and is widely available when an odd code-path is run through.

I'm leaning more and more towards a local client hosting model for developer based AI games for a number of reasons. I think I've mentioned the potential network models that allow AI and Networked players to operate almost transparently by taking advantage of a network protocol. Now stack the asset protection on top of that along with the ability to remove advanced code hosting features and you have yourself a definite winner.

I might do this every now and then in the hopes of getting together a few people that might be interested in doing some .NET RPG and MUD development. In general the community for using C# and VB .NET for these types of games is very small. The web variants are dominated by CGI and PHP while the client/server variants are still dominated by C/C++, Java, and Python. I think we have some great options for spec'ing out new technology using .NET rather than trying to combat these existing variants using their own tricks.

Integrating Information - There is discussion of trying to integrate many different out-of-game elements into the game in order to prevent players from having to exit the game to achieve a goal. Currently any large game consists of the main gaming interfaces in the form of a client and a large server cluster, a number of additional servers are allocated for servicing the marketing site, and then the information starts to flow with network outtage pages, game update information, and finally forums. Existing architectures jump through hoops to put all of the pieces together and mistakes are often made. The Forums system doesn't work correctly or doesn't scale, they can't find great ways to get information from the web into the game. These are all places where .NET tends to excel. It doesn't have to stop there though. The ability to use out-of-game systems in game can be really important. Enabling those through web services features where the client talks directly with the remote systems, means no additional load on your primary game system. You can add plenty of meta-gaming to keep long term playability a reality.

Alternative Levelling Systems - Everything comes down to experience points in games today. Especially games that feature character development. Games have come and gone that try to overcome the basic system with something ingenious, but nobody so far has gotten it right. One of the major problems is investing a large amount of money into a new system and have it go bust when you put it on the shelves. The logical location for new ideas is in protoypes that you can develop quickly and target groups of users to identify problems. .NET again comes to the rescue and allows you to quickly develop prototypes. Java is already taking the lead with progressive multiplayer games such as Puzzle Pirates. But there aren't many, so we have the time to catch up. Especially with advanced graphics through Managed DirectX to compete with Java's mastery of the 2D API's.

Creative Player Narratives - This is that last important discussion that has happened over the last month and it entails generating story-line content based around player actions within the game. I've been pushing this feature for a while, through a number of different sources, and the general consensus is that while it is cool, it isn't understood well enough to implement. While books and movies drive the gaming industry, it isn't understood how to reverse the concept so that games can drive the book and movie industries. It is starting to happen where books focused on the social and engineering aspects of a particular game are popular, but that clearly isn't storytelling nor is it a player narrative. I think the solution to the problem is going to be either a lot of code, or some very ingenious code refactored through a number of failing attempts. That spells .NET and C# to me for sure, since I can work a different implementation each week, whereas I might be limited to one implementation a month using C++. Taking an interesting step in this direction is Rumors .NET, but I'm interested in other concepts as well that have the same feature of building a player narrative or story whether it be inside or outside of a game.

Plenty of .NETophiles read weblogs, so I figure this is a choice audience to point out areas where .NET has little or no presence. We have the best and the brightest tuned in and ready to look for the next innovation. Well, these are defnitely places for innovation. The gaming industry doesn't want a piece of .NET yet, they keep finding it unsuitable with too many unknowns to use in a production system, so develop some smaller systems that prove differently and work into larger models.

Posted by Justin Rogers | with no comments
Filed under:

Well, not really. I found this link on MUD-Dev and thought it was stunningly hilarious, while at the same time finding it extremely accurate. Hopefully some others enjoy it as much as I do. It takes a while to read through, so allocate a good ten minutes, fifteen if you expect that you might laugh a bit.

http://thenoob.keenspace.com/d/20040608.html

Posted by Justin Rogers | 1 comment(s)
Filed under:

I'm starting to get all excited now. Before I go too much into Rumors .NET, the idea isn't entirely mine. During one of my projects working at Microsoft I had some chats with an excellent engineer on the concepts of messaging and rumor systems within games. That engineer was Mark Gabarra, and I told him that if I was ever able to get the time to publish or produce anything on the subject, I'd give him his due attribution.

The Meetings
Can't really call them meetings, more short term chats. The idea is simple, in that the people remember things, events travel, and people talk about whatever might be on their mind. The dynamics can be complex or simple and we really wanted something simple at first just to play with. To sum this up in a kind of gaming scenario, imagine that a monstrous invasion has occured in some area of the world. At this point in time, some scouts or messengers are dispatched to send the news to another town. By the time those guys get to the town or are killed several things will have happened. They may be unintelligible, they may have crossed far too much distance for the people to even care, the event in the remote town may be over, etc... You can start to see where NPC propagation of messages is important. Now, in that context the messages were all defined, and well defined, in order to be passed, perhaps with some properties that define how future NPCs will interact with them. Here is where the disconnect came for me and I wasn't able to produce anything interesting because the system was only as good as you were willing to spend time tuning the message protocol.

The Ponderings
Now, I've been working on the idea for at least a year now in sketches and other forms. Probably still only spent 5-10 hours, but spread that over a long period of time where I may have also pondered it while in the shower or burning time in the car while going somewhere. The idea still has great warrant and potential, but I just needed to find the right thing to tweak the system and make it independent of system generated messages and more dependent on player generated messages, or something I'm going to call rumors. Rumors are awesome because they embody the natural laws of communication without making any assumptions. What are some common assumptions?

  1. How much do I trust this person? Trust is an assumption that a person is going to tell the truth or lie.
  2. Requirements of belief. This is when you can't afford not to believe something.
  3. Shared vocabulary. This is where you know what the person is talking about and share vocabulary and definitions of words others might not understand.

Rumors break all of these down. In general, every time you hear a rumor, you examine it with as much as possible. You don't trust the source, or the source of the source, but rather question them both. You aren't required to believe rumors, because they are something extra. They do required shared vocabulary and context, but only in trying to understand them. The beauty of a rumor is the ability to not understand it, still take something away from the conversation, and not have any negative impacts from it. You see, understanding the rumor is unimportant, and nobody cares if the (woman/man) that fell in the (mud/puddle/elevator/stairwell) was wearing a (red/pink/brick) colored (dress/shirt/blazer).... We listen to rumors in most cases for shear enjoyment, but we also tend to be more accurate, more conniving when we run across rumors that hit close to home, perhaps about people we know. And this is where the system comes in.

The System
Rumors .NET is going to come with 2 problem sets that you'll get to set for all of the individuals involved. Normally, you'll host a town, and you'll be responsible for passing rumors to people in the town. Depending on how you set up your players, things will either go well or not. To add to the chaos, people will visit your town every now and then to pick up rumors. Call these people bards, merchants, Jay Leno, they are just out to get the dirt and find out some funny things to take back home. In effect your rumors may evolve and spread across several systems. Simple for now, so don't expect much. I may improve the logic later. The first set of attributes I call memory control, and the second set are relay control. I'll go more into them later, but for now they are listed.

  1. Intelligence, Forgetfullness, Vocabulary, Photographic, Adaptability
  2. Gossip, Social, Banter, Reminisce, Importance, Certainty

These will be outlayed on a specially designed UI that I'm showing off to the right. You can see the diagram is a basic polar coordinates or star graph. The nodes with drag handles are able to be changed, and are unlocked. Once you get something perfect, you can choose to lock the node so you don't mess things up. Every attribute will take from 0.0 to 1.0f points and currently there is no maximum to the number of points available. Later this may get changed.

I accidentally clipped the text, but each point is labelled. I'm not sure yet if I should add some input controls for manually setting the attributes as well? Maybe you guys can weigh in on that. I'm also not quite sure that the model shown is going to be very usable, so maybe you can weigh in on that as well. One thing to imagine is that eventually you may be capped to the number of points you can spend. Every time you change an attribute, you might conversely drop the others by some amount. So imagine that dragging one up say .1 points, will drop the other 4 down by say .025 points. Hopefully that explains why the star graph is being used and why locking is important.

I have some other UI concepts, but I welcome any input. Each game is going to consist of up to 16 or so characters per town, each of which will get controlled by the shown graph (of which there will be one for memory and one for relay control). The game itself is a sleeper, meaning it will run in real-time, so you'll be able to check each day and see if anyone has told you any rumors. Or you can proactively talk with various townsfolk to get the rumors they've accumulated. Again, expect nothing more than incoherent banter at first, but I suspect as each townsfolk builds a small dictionary of vocabulary, that they are able to pass rumors with much more efficiently.

Posted by Justin Rogers | with no comments
Filed under: , ,

I often use very small powers in approximating math functions used in gaming. Normally these are for AI activations where I'm computing a term that will be used in another expression, most likely a division, and all of my precision lies in the balance in this last expression. The speed of computing the power definitely matters, but if it is a bit off, that doesn't have nearly as much impact. First, I tend to compute powers of small integers. You can't get too large with these or you'll overflow. In most cases, IntPow and LongPow are sufficient though.

private static int IntPOW(int b, int e) {
    if ( e < 0 ) e = -e;

    for(int i = 1;;b*=b) {
        if ( (exp&1) != 0 ) i*=b;
        if ( (exp>>=1) == 0 ) return i;
    }
}

private static long LongPOW(long b, long e) {
    if ( e < 0 ) e = -e;

    for(long i = 1;;b*=b) {
        if ( (exp&1) != 0 ) i*=b;
        if ( (exp>>=1) == 0 ) return i;
    }
}

They don't support negative exponents, because that requires you return a floating point number, or somehow encode the number. For floating point arithmetic you can use DoublePow and SinglePow depending on the accuracy you require. No matter which function you choose, they all take integer exponents because of the trick or algorithm being used to compute the final value.

private static double DoublePOW(double b, int e) {
    int exp = ( e < 0 ) ? -e:e;

    for(double i = 1;;b*=b) {
        if ( (exp&1) != 0 ) i*=b;
        if ( (exp>>=1) == 0 ) return (e<0)?(1.0f/i):i;
    }
}

private static float SinglePOW(float b, int e) {
    int exp = ( e < 0 ) ? -e:e;

    for(float i = 1;;b*=b) {
        if ( (exp&1) != 0 ) i*=b;
        if ( (exp>>=1) == 0 ) return (e<0)?(1.0f/i):i;
    }
}

Most of these are fairly well tested. Because of the incremental approach we take there is precision being lost at a number of different levels. More complete approaches implemented in say the standard C compiler are much more accurate and accounts for the accuracy present in the CLR Math libraries. Don't expect much, use em at your own risk, and try to test a good range of expected values for adequate accuracy. Once you've decide the method running 10x faster is worth the drop in accuracy for whatever you are doing, drop me a line back and let me know they helped you out.

I realized there were quite a few APIs inside of the BCL that were using Math.Max and figured that couldn't be good, not good at all. Math.Max has more meaning when you are talking about floating point values, because of IsNaN checks. After all, if it is more than one line of code, then you are probably better off calling the method.

My tests focus on the concept of picking the largest element in a given array. This is quite popular in locations such as the DataSet, where you can get a Max/Min aggregate of a given column. Also popular in charting applications and spreadsheet style applications where you need to build boundings and all sorts of other goodies. I'm testing three methods, instead of just the two, since three are available. First, I'm testing Math.Max, which internally is just a tenerary operator wrapped in a function call. The function all aspect appears to be optimized away by the JIT... Good job JIT team! The second method is a basic if statement which turns out to be the fastest (because we aren't doing storage in every case). Finally you have a ternary operator thrown in to ensure we really are getting an inlined version of Math.Max earlier in the program.

using System;

public class MaxVersusIf {
    private static void Main(string[] args) {
        int[] values = new int[5000]; Random rand = new Random();
        int i = 0; int j = 0; int max = 0; int iterations = 1000000;
        for(i = 0; i < values.Length; i++) {
            values[i] = rand.Next(0, 100000);
        }
       
        DateTime start, end;
        start = DateTime.Now;
        for(i = 0; i < iterations; i++) {
            for(j = 0; j < values.Length; j++) {
                max = Math.Max(values[j], max);
            }
        }
        end = DateTime.Now;
        Console.WriteLine("Math.Max: {0}... Double Check {1}", end - start, max);

        max = 0;
        start = DateTime.Now;
        for(i = 0; i < iterations; i++) {
            for(j = 0; j < values.Length; j++) {
                if ( values[j] > max ) {
                    max = values[j];
                }
            }
        }
        end = DateTime.Now;
        Console.WriteLine("If Statement: {0}... Double Check {1}", end - start, max);

        max = 0;
        start = DateTime.Now;
        for(i = 0; i < iterations; i++) {
            for(j = 0; j < values.Length; j++) {
                max = (values[j] > max) ? values[j] : max;
            }
        }
        end = DateTime.Now;
        Console.WriteLine("Ternary Operator: {0}... Double Check {1}", end - start, max);
    }
}

The winner is the if statement by a land-slide, so it might be smarter to use that if you have a number of values you are checking, rather than using the built-in prototyping method in the Math library. Other functions, such as Min suffer the same performance hits, and you can probably better write the absolute value functions if you are careful of the MinValue overflow situations.

Math.Max: 00:00:07.3806128... Double Check 99976
If Statement: 00:00:05.5079200... Double Check 99976
Ternary Operator: 00:00:07.3205264... Double Check 99976

It has been a few years since I've seen the reports on how people read and more specifically how speed readers get so darn fast, but I recently got an email that demonstrated the concept rather well. Rather than explain the research study in plain words, it explained the research study in plain words! Only they weren't exactly plain. Each word consisted of the first and last letters completely in place, but the remainder were munged and out of place. Strangely enough you read the email without much thought as to the out of place characters. In fact the document is rather easy to read. As an example:

It has been a few yreas scine Iv'e seen the rortpes on how popele raed and mroe seifcpically how seped redares get so dran fsat, but I rlecenty got an eaiml taht darmtteonsed the cpeonct rtaher wlel. Rthear tahn elaipxn the rseearch sutdy in pailn wsdor, it elxpained the rseaerch sdtuy in pilan wsord! Olny tehy wrnee't eltxacy pnila. Ecah wrod ctsiseond of the frist and lsat lreetts cmpetolely in pacel, but the rienamder wree mngeud and out of pcela. Sengtraly eonugh you raed the eamil wtihout mcuh toughht as to the out of pclae ccehtarars. In fcat the deumocnt is rthaer esay to rade. As an eamxple:

Don't ask me for the program, since I don't find the process interesting enough to post readable code. What I will point out is that if, as humans, we really only use the first and last characters of the word initially, then there must be some other properties or features that help us quickly identify words. If we can find out what those are exactly, we can probably write better software. The people over at Google and Yahoo most likely have a nice jumpstart. Mathematically the first and last characters act as a reduction by allowing us to partition our list of words down into some number of groups:

26^2 = 676 possible combinations for the first and last characters.
470,000 words in the unabridged dictionary

As you can see, breaking down by first and last character doesn't buy us all that much in terms of reducing the problem set. Or does it? Humans don't really understand 470,000 words. Hell a bunch of those would seem pretty strange, even incorrect. We only use a few thousand really and so for the 676 possible combinations there are only a few words left by the time we are done parsing just the first and last characters. If we combine that with a relative word lengths maybe we don't even need the letters in the middle. There is probably more to the patterns, namely defining features of the character shapes, such as curvy, tall, short.

Why in the hell would I even bring this up? Well, I figured it played in with some of the other natural language posts like the telephone number to words algorithm. I'm also working on a neat little social game that involves a lot of dictionary processing to see how little virtual people pass around rumors. It is not very easy to allow a rumor to grow, shrink, be remembered, and/or affect its power on the various people of an area. I think understanding how we munge things mentally can help define some basic transforms, or perhaps I should call them compressions, that occur while people transfer and remember information. If you think of rumors as a fixed asset, only allowing a certain space for each person to store them, then only important properties will be stored. Other properties may remain, but be reduced. Each player may store a small dictionary and reduce words to look-ups. In the case of a redundant look-up the character has to guess based on context... Rumors .NET is low on my radar for now, but it isn't that much work. Its one of those games I go back to every now and then as I get the time and find it fun to play with the algorithms. If I get over the fact that the graphics suck I might give you all a glance.

More Posts Next page »