March 2004 - Posts

Omar Shahine is doing some great work on his JPEG Hammer application.  He talks about finding some hot-fixes for the System.Drawing.Image class that provide overloads for disabling image verification.  While this is inherently unsafe, and you now have to have elevated permissions to use the class, I figured, what the hell, why not make this readily available to everyone without the need to install a hot-fix.  System.Drawing.Image Performance.

Note my method isn't any better than the hot-fix.  It does require private reflection permissions.  The reason for this is the absurd protection of the Image and Bitmap classes in System.Drawing.  I start by doing a simple Gdip load on the stream, then I use private reflection to instantiate a Bitmap over my loaded file.  This prevents the GdipImageForceValidation function from being called, and does all of the necessary internal fix-ups of the native image pointers inside of the Image class.  Note the method, EnsureSave, is not being called, so animated gifs might not load properly using this method.  Also, only bitmapped types will be loaded.  I could add additional logic to also load metafiles, but I don't see the point.  Enjoy!

using System;
using System.Drawing;
using System.IO;
using System.Reflection;
using System.Runtime.InteropServices;

public class ImageFast {
    [DllImport("gdiplus.dll", CharSet=CharSet.Unicode)]
    public static extern int GdipLoadImageFromFile(string filename, out IntPtr image);
   
    private ImageFast() {
    }
   
    private static Type imageType = typeof(System.Drawing.Bitmap);

    public static Image FastFromFile(string filename) {
        filename = Path.GetFullPath(filename);
        IntPtr loadingImage = IntPtr.Zero;
       
        // We are not using ICM at all, fudge that, this should be FAAAAAST!
        if ( GdipLoadImageFromFile(filename, out loadingImage) != 0 ) {
            throw new Exception("GDI+ threw a status error code.");
        }
       
        return (Bitmap) imageType.InvokeMember("FromGDIplus", BindingFlags.NonPublic | BindingFlags.Static | BindingFlags.InvokeMethod, null, null, new object[] { loadingImage });
    }
}

This was a surprise to me.  Almost every character set has an ellipses defined in the unicode range.  The ellipses is assigned to U+2026 or Alt+0133.  The actual name of this character is the Horizontal Ellipses and you can find it by launching the handy dandy Character Map utility that ships, hopefully, with every version of windows.  The idea that people go to extremes in order to create these is confusing.  I guess while you can't do much to change the appearance of a unicode based ellipses, you could define it as a textual element in just about anything and mess with the formatting until you got the appropriate *appearance* of size, etc...  This is similar to what Wayne has done in his entry Ellipsis, only he chooses a drawing technique to achieve his ellipses.

Note, there are many ways to make an ellipses.  The most readily apparent to those of us that blog, would be to use the ellipses character directly from the HTML view &#0133.  Just make sure the character set being used has such a character.  Another easy method is to use HTML and create a dotted border around an element that is properly sized to appear where you need it to.  This is more similar to Wayne's method.

Strangely enough, I'm pointing this out, because so many game engines don't properly use the ellipses for continuation.  Many of them either don't include it in their character set (many games use pre-generated graphics for printing text since it looks *cool*), or don't think to use it, instead inserting the three periods and taking up extra screen real-estate.  Even more strange is that most people actually think the ellipses is three periods rather than actually being it's own character.  I know I've been using extensive use of the three periods when writing blog entries.  When writing Word documents three periods will automagically get converted into the ellipses character.  I really do love that feature.

Posted by Justin Rogers | 7 comment(s)
Filed under: ,

Figuring out which events you are going to need for your creature to process can be very difficult within the Terrarium, especially for new users overwhelmed by the idea of having to optionally handle up to 10 events.  Some creature authors don't even hook an event and wonder why their creature is never called.  Here are the events, in order that they are called, along with their purpose and how often they are called.

Event Times Called Order Description
Load 1 1 This event is called at the beginning of your turn. It is always called unless you have been skipped, in which case none of your events are called. This event is best used to set up your processing state in order to prepare for the remainder of the events. Many people erroneously use this event in order to do other processing that later gets overwritten by processing inside of the remaining events.
MoveCompleted 0-1 2 If your move completed because you reached your destination, or you were blocked, then this event is fired. This event is best processed by storing the event args locally on your creature and processing movement in the Idle event along with the rest of the events. I highly recommend aggregating all data from events 2 through 9 so they can be processed by event 10 in one fell swoop. Note this event may or may not fire immediately following a movement action in your previous tick. This is a multi-round action and so the event can take multiple rounes before it fires.
AttackCompleted 0-1 3 Your previous attack action has completed. This will happen if you previously attacked. Attacking is a one round event, and so this event will fire on the tick immediately following an attack. You can check how much damage you were able to do, and if the target was killed, or was able to escape without any damage being done.
EatCompleted 0-1 4 Your previous eat action has completed. This will tell you if your eat action was successful. This is a one round even and will fire immediately following an eat action in your previous tick.
Teleported 0-1 5 If you get teleported, either locally or remotely, this will notify you of that event immediately following your insertion back into the processing queue. Note the only evidence of your teleport is that you were sent back to the local terrarium, or your were teleported remotely. This event may be fired after your creature was taken out of the queue for multiple rounds. The results of the LocalTeleport property will greatly change how you process this event.
ReproduceCompleted 0-1 6 This event tells you that your baby has been born. It gives you no information. However, this event, under normal circumstances, should fire IncubationTicks after you start your reproduction. This is normally 10 ticks, but under various circumstances can take much longer.
Born 0-1 7 This is a brand new creature, and you were just born. The sole purpose of this event is to pass you a byte[] representing your Dna from the parent creature. This event should probably be processed in-line inside of the event, since it is only called once and it is used to set up your initial state information for your creature.
DefendCompleted 0-1 8 If you tried to defend in the previous round then this event will fire. There is no notification of a success or failure on this action. Damage is computed within a random range based on your attributes, and so depending on the layout of the random numbers, you'll either take more or less damage.
Attacked 0+ 9 This event will fire once for every creature that attacks you in the previous round. You can use this event to get the Attacker, so that you can defend against them in the future. This is important, since if you don't get the attacker here, you might not see the attacker later in a Scan due to camouflage. On the following round, you will only be able to defend against a single creature, so choose carefully.
Idle 1 10 This is the final event called in the sequence. Like Load it is guaranteed to be called, so long as you aren't being skipped. I highly recommend that the results of all previous actions be stored and saved to be processed in this event. Users always run into concurrency problems when logic from various events overwrites the logical decisions of other events, and it is so much easier to process your creature in a linear and deterministic manner. I almost think that a special inherited Animal class could be developed that allowed for just this model.

I make some assertions in the descriptions of events that people might not like, but I will stick by them.  I think that the conditional events 2-9 should always be used (with the exception of Born), to set state locally on the creature.  Once the state is set, it should later be processed in Idle.  A large number of creatures developed do the exact opposite of this.  They instead put all of their logic in the Load event, and then later process all of the optional events.  This leads me to a list of common pit-falls to watch out for.

  1. Make sure to hook at least one event.  Some creatures forget to hook them (first time authors mainly), and then wonder whey they aren't getting called for processing.
  2. Make sure to process the TurnsSkipped property.  Your creature may be going over the time limit and you may never know.  Process the TurnsSkipped property so you can realize this.  Once your creature is tuned, you should process this anyway to make sure you aren't waiting on events that will NEVER fire.  Some creatures issue a move command and then wait for the MoveCompleted event.  This even can be skipped over during a skipping turn and your creature would never get notified.
  3. Don't do extraneous processing in optional events, that you are later going to overwrite in the Idle event.  Lots of authors wonder why the actions being set in their MoveCompleted event or Attacked event later get overwritten.  It is because they aren't taking into account the code running in their Idle event.  This is also a huge perf sink, since often creatures will run the same logic twice, from two locations in your code.
  4. Don't use the Load event for your main processing code.  If you do, you are going to lose all of the extra information from the optional events that could help you short-circuit code or better process your turn.  Also setting actions in the Load event can cause them to later be overwritten by code in your optional handlers.
  5. You have to hook the Teleported event.  Most authors don't bother because they don't think it is important.  Instead they rebuild their world state every single tick doing the same scans over and over again, not reusing information.  The extra processing time used scanning could be going into more complex processing algorithms.

That is all for now.  Enjoy!

Alex Campbell followed up with some comments on his blog that he saw the continued use of int.Parse as acceptable because of it's additional parsing abilities to determine if a number truly was an integer.  C# if(!FindFunction(IsNumeric)) { return "this language is retarded"; }, while I find this admirable, I can't see how possibly throwing exceptions could ever be considered a better method.  In the interest of providing faster numeric processing methods than are available I'll be parking a new number processing class under my articles, at DWC.Algorithms.NumberUtilities.

Why would I even bother writing these number processing functions to begin with?  Well, I create a large number of applications that have to process input from streams.  I also write a large number of XML applications.  And further, I use regular expressions quite often.  In all of these cases you have to process the string result into a number on the back-end.  This always has the opportunity to throw an exception.  If you have a few thousand numeric strings, and 50 of them throw an exception, you just killed the performance of your application.  Not to mention, int.Parse, even without exceptions, is VERY slow.  My current algorithm is approximately 4x faster than int.Parse when processing valid integers.  It is even faster when int.Parse would throw an exception, somewhere along the lines of 15-20x faster.  You can draw an average use case based on how many invalid integers you might parse and determine exactly how much more performant my algorithm would be for your use case.

Note the algorithm is complex, and I am only concerned with US english processing.  You should still use int.Parse if you want to numbers under different locales.  This is the first iteration of the algorithm, and while I tested against 40 billion numeric strings, it doesn't mean there aren't issues in the algorithm.  Let me know if you find any please, and if you actually decide to use this function, let me know in the comments, so I can help determine its beneficial impact on the community and further develop it for use in string->number processing.  (Sidenote: This improved a stream processing algorithm I was using from 41 ms down to 28 ms, and that was in the case of no invalidly formatted numbers).

Photon mapping is a somewhat new technique for computing the flow of light enery within a scene (I will attempt to use layman's terms and won't be defining any of the domain specific terms in this entry if I can help it).  I say somewhat new, because it has been around for a while in the research community, and only recently, has it acquired actual usage in real-time computer graphics in gaming.  I was actually somewhat stunned that it was being used to some degree in Halo 2.  Seems Adrian Perez, author of Advanced 3D Game Programming in DirectX 7.0 (later updated by Peter Walsh for 8/9), had a small chat with some college students, and the article was later published (thought I can't seem to find it, if someone wants to post it in the comments then please do).  The results were that various tools used by the modelers used photon mapping so they could get a better feel for what their graphics would look like in game.  Impressive to say the least.

Why was Adrian using photon mapping?  Well, it is faster than radiosity.  Radiosity was previously the number one lighting technique used in games.  But it is very slow, and so you generally run a radiosity tool to generate radiosity maps, rather than in real-time.  If they decided later to use radiosity instead, they still could, but the artists tools needed more immediate feedback if they were going to be useful and the artists were going to be productive.

The alternatives to photon mapping tend to be ray tracing and radiosity.  Both methods are tried and true, and I'm sure everyone has seen the results of these techniques.  They just take huge amounts of processing power that don't exist on end-user-desktops (at least not for real-time use).  The largest issue with ray-tracing is the fact that it isn't a global illumination algorithm (it doesn't properly compute all of the features of shadows, namely the penumbra or soft edge that exists the further something is from the surface it is casting the shadow on).  While the largest issue with radiosity is it's huge consumption of memory while processing.

Photon mapping overcomes these two issues by being lightweight (doesn't consume that much memory), and still providing a model for global illumination.  Don't get me wrong, it still isn't perfect, but it definitely fills a slot in the graphics pipeline quite well.  It achieves lightweight status by only storing primitive point information for each photon, namely direction, position, and some sort of color that demonstrates the power of the photon.  This is much smaller than the equivalent form of storing light using subdivided mesh patches for the objects in the scene (aka radiosity).

As the photons interact with the geometry in the scene, they are stored in a global collection, generally a BSP or KD tree for fast access.  They are accessed later to compute lighting while the scene is being rendered.  This means photon mapping is always going to be a two step process.  First you have to emit the light from the light-source, and then you have to fire rays from the observer's viewpoint to calcuate the effect of those photons on each pixel in the resulting image.  The space partioning tree speeds up the search for these photons later, and photons are sampled by averaging them together to get a final lighting sample.  Since we average photons together we get a softer shadow, and thus can simulate shadowing more accurately.

There are some issues with photon mapping:

  • The more photons you shoot the more accurate the scene is, so you tend to want a large number of photons.  The more photons the more processing power is used.
  • You need a large number of photons to accurately sample the light at a given point.  Marlon John points out in his book Focus on Photon Mapping, that he uses about 50.
  • Current techniques don't take very many short-cuts.  Photon mapping is meant to simulate the way photons traverse the natural world, but yet photon mapping doesn't intrinsically build in various laws of nature.
  • The more lights you have, the more photons will be emitted into the scene to be processed.
  • Materials have to be carefully designed so that photon interactions with the material are accurate.  This produces accurate surfaces, while improper material design can really mess up a scene.

Some things I'll be talking about to help augment photon mapping:

  • Photon ownership.  In nature photons are constantly being processed by a massive, highly parralel computing system.  Don't believe me do you?  Well, each time a photon interacts with an atom, the atom individually processes that photon, and decides what to do with it.  That means photons are individually being processed by their own CPU.  Once an atom is done processing a photon it gets to decide whether or not to release a new one (it doesn't decide, rather the energy computations involved decide, aka the massive super computer known as the laws of nature) and in what direction.
  • Locality of photons.  This bridges from the first assertion.  Once a photon hits an object that object determines the future of the photon.  Current photon mapping techniques compute the bounce and continue going.  I'll be talking about how you can gain performance by culling the processing of photons that wouldn't change the scene (note that you can compute the photon mapping for a scene once and re-use it, assuming nothing moves within the scene and nothing about the emission of photons changes).
  • SOL Rendering Principles.  Turn on a light in a room and it takes some fraction of time (even if this is really fast) for the room to *balance* the flow of light within the room.  This isn't just our eyes adjusting (though that is one thing that is happening), but also the concepts that photons only travel so fast, and as they interact with things on the atomic level, it takes a while for the flow of light energy (flux, so I don't have to keep saying flow of light energy), to balance.  This leads us to an optimization where we can slowly build the photon map over time, and re-use previous results.  This will mean the quality of the scene will get better over time, depending on how much process power we have, this could be instantaneous, or it might take a seconds worth of frames to reach parity.  The human eye will appreciate the increased detail in shadows and reflections.
  • Lights or Emissive properties.  Lights are generally defined in a 3D environment.  But I hate lights, since they represent some pseudo-element.  I prefer that the actual object be emissive (aka a lightbulb), rather than declare an overlapped light on top of the model of a light bulb.  Or in most cases, you can't even see the light bulb model.  I have some great optimizations that take into account dynamic lights such as flashlights, strobes, and other types of extremely dynamic lighting that can really add to the atmosphere of the scene.  Currently in 3D gaming, the user doesn't appreciate when the scene is too interactive for them to be able to clearly interpret the image, but Hollywood makes use of this technique all the time to provide an atmosphere.  One of my first photon mapping demos will be a nice techno club scene.
  • Instance lights, continuous lights, and turning lights off.  Changing the flux in an environment is easy.  You just have to change the light source.  Some light sources are extremely continous, some are constantly changing.  Notifying the photon map of the origin of various photons is important.  This lets us tell materials that the photon affecting their surface should either continue effecting (building up our LOD) the material in question or whether it should hold only a specific few photons from that source (a queue so that older photons fade off, and possibly a timed queue, so that if the source stops emitting, eventually all the photons are lost).  My example for this will be a television set.  While I can't render this example in real-time, I expect to be able to eventually.  As the television changes picture, the entire scene oustide of the TV changes as well.

The biggest question is whether or not photon mapping is here to stay?  I really think so.  I mean look at all of the great optimizations that still exist for the algorithm (or at least the optimizations that I've noted from personal experience against currently published examples of the algorithm).  Photon mapping for me is a passion, but I have a lot of them.  And like most passions, they take a back burner to more realistic goals.  Hopefully I find enough time to continue talking on this subject, since this was a thoroughly enjoyable article for me.  Ask lots of questions in the comments, or make comments of your own.  I'll definitely make sure to respond.

The algorithm for grabbing samples in the System.Random class comes from Numeric Recipes in C, Second Edition.  They've made some small changes to how the algorithm is run, to either improve performance, or improve the random characteristics of the resulting data.  However, they did change some key lines, or what might be considered key lines (according to Knuth).

Currently, I have a couple of problems with the random number generator.  It is virtual, so there is some call overhead.  They are using a possibly unproven method (not really, but there isn't as much testing on this method as some others).  It is pretty fast, I do have to give them that.  Knuth's method is one of the fastest RNG's that actually produces decent random results over a large period.  Here are the items I want to improve on over the next week:

  • Additional options: I'd love to provide 5-10 different random number generators of varying speeds, randomness, and period repeats.  I'll be writing each method in C#, along with any statistical data detailing how strong the algorithm is and how fast it is.  I'll try to include pitfalls for using the algorithm or invalid seeding notes, but only if I find reference to them on the Internet (hint: this is not a full examination of random numbers, but instead an attempt to provide randon number options for people making games or doing simulations).
  • Theory of Randomness: I'd love to provide some good history on the theory of randomness.  Just having an algorithm often isn't enough if you don't know why you need to go to such lengths in order to get random numbers.  In most cases linear congruential generators would appear very appealing to the majority of users, because looking at the randomly generated numbers it would seem okay.  It is only after you look at millions of them, that you realize the faults in the system.
  • Testing for Randomness: Personally, I've never really found any good testing models for randomness.  I've used some statistical modelling for bit runs, bit group runs, total bits, etc...  All of the basic stuff.  But there are so many hard core and off-the-wall tests out there that I thought it might be cool to examine some of them.  This step will really have to be examined first, since the test suites found and the test suites created will be used to examine each of the random number generators.
  • Game Generator Considerations: Recently there have been concerns that true random number generators are too random for game players to trust.  In true random generators players get the feeling that the generator is broken if they get a string of bad luck.  This is exacerbated by the methods most generators use for creating random numbers within small ranges that are much smaller than the underlying random sample.  So while in some cases the generators appear broken, in others they actually are broken, and believe me, players see this.  We'll discover enhancements that can help weed out strings of *improbable* numbers that the user might confuse for a broken random number generator.

References:
Numeric Recipes in C, 2nd Edition - This book definitely has loads of random number generators, but they don't talk about why the random number generators are any good.  They do, however, cover random number generators with uniform devitates (we'll be focused on these), and also many without focused on the Monte Carlo method, Poisson distributions, and Gaussians.
AI Game Programming Wisdom 2 - This book has some great insights on not-so-random random number generation to confuse the player ;-)  I say confuse, but it is really to appease the player and give them the feeling the world is in order.
Intel Random Number Generation - A good article on the intel random number generator and the tests they've undergone to make sure everything is up to speed.  I've also heard of a random number generator being created by intel that resides on the chip and uses thermal variances (why by call accounts should be truly random) to compute an actual random number.
Florida State, DIEHARD - Florida State has created a series of tests known as DIEHARD.  These have been used by the intel group in the brief, so I figured I'd either implement their testing suites in the managed world, or at least use them for sanity checks.

So there was a blog entry about the VB .NET IsNumeric function today.  The question was in regards to a C# equivalent.  I have two things to say really.  First, if you really want the IsNumeric function from VB you can always grab it out of Microsoft.VisualBasic.dll.  Everything from VB .NET is in that library and you can use any of it that you want.  I made explicit use of their array redimensioning code a while back until I found out that I could write faster code in C#.  In some cases I just assumed the VB team wrote performant methods to back all of their functions, but in some cases the performance is lacking.

So how many different ways can you test for a number?  Well, I've picked 5 methods.  4 are available through the CLR.  The 5th is a hand-made algorithm that is meant for high speed ASCII only processing.  So here are the 5 methods and their functions:

  • IntParse - This method uses the int.Parse and traps the exception.  This is the slowest method of all 5 because int.Parse throws and we know that exceptions are costly.  In this case they are so costly, that the method is several orders of magnitude slower than the other routines.
  • HandCodeSwitch - This is a method that I created using my own performance knowledge.  For instance, I made use of a switch statement to mask out digits since I know this is extremely fast when working over character data.  I know that behind the scenes the switch statement is causing a single subtraction and then either a jump into the digits portion of the case statements, or is jumping directly to the default clause and exiting the method.
  • HandCodeIf - Now, since we are doing the sub, it might be faster to do an if statement.  I've ordered the if to check > '9' first, since in most cases we'll see characters above 0x39 rather than characters below 0x30, followed by < '0'.  This performs a bit faster than the switch statement because we know the ordering of things.  However, the switch statement will be much faster if we add more characters that aren't sequentially ordered (perhaps the decimal point, or if we use a culture comparer to build things, any number of other *numeric* characters).
  • IsNumber - This uses the CLR char.IsNumber method.  This checks for digits and hex digits.  Have to be careful with this.
  • IsDigit - This is more like what the rest of the tests.  This uses the CLR char.IsDigit and only pulls through on digit characters.
  • RegexDigit - This uses a regular expression.  If anyone wants to come up with something faster than what I've created then so be it.  I'll insert it into the test.  As it stands a Regex is 6x slower than the CLR methods and over 13x slower than my hand generated methods.

Okay, so the tests aren't 100% fair since all of the CLR versions are using culture comparers and all sorts of other things, but I'm trying to show you how to write the faster integer routine in the West (as in US!).  The HandCodeIf and HandCodeSwitch win hands down.  If you need to create a method for a culture comparer that took into account things like decimal points, group separators, or any number of other issues, I would highly recommend using the switch model, and dynamically generating the method using what I've written as a template.  If there is any interest I'll show you how you can write such a model in another entry.  It would be just as fast as the current HandCode'ed methods, only it would take into account the additional characters when determining if something was a number or not.  Below is the code.  Enjoy!

using System;
using System.Text.RegularExpressions;

public class IsNumeric {
    private static void Main(string[] args) {
        int iterations = 10000000;
        bool makeSureTheJitDoesNotOptimizeMeOut = false;
        string[] testStrings = new string[] { "1234M", "12345" };
       
        Console.WriteLine("IsNumber String 1: {0}", IsNumber(testStrings[0]));
        Console.WriteLine("IsNumber String 2: {0}", IsNumber(testStrings[1]));
        Console.WriteLine("IsDigit String 1: {0}", IsDigit(testStrings[0]));
        Console.WriteLine("IsDigit String 2: {0}", IsDigit(testStrings[1]));
        Console.WriteLine("HandCodeSwitch String 1: {0}", HandCodeSwitch(testStrings[0]));
        Console.WriteLine("HandCodeSwitch String 2: {0}", HandCodeSwitch(testStrings[1]));
        Console.WriteLine("HandCodeIf String 1: {0}", HandCodeIf(testStrings[0]));
        Console.WriteLine("HandCodeIf String 2: {0}", HandCodeIf(testStrings[1]));
        Console.WriteLine("RegexDigit String 1: {0}", RegexDigit(testStrings[0]));
        Console.WriteLine("RegexDigit String 2: {0}", RegexDigit(testStrings[1]));
        Console.WriteLine("IntParse String 1: {0}", IntParse(testStrings[0]));
        Console.WriteLine("IntParse String 2: {0}", IntParse(testStrings[1]));
       
        DateTime start, end;
        start = DateTime.Now;
        for(int i = 0; i < iterations; i++) {
            makeSureTheJitDoesNotOptimizeMeOut = IsNumber(testStrings[i%2]);
        }
        end = DateTime.Now;
        Console.WriteLine("IsNumber {0}", end - start);

        start = DateTime.Now;
        for(int i = 0; i < iterations; i++) {
            makeSureTheJitDoesNotOptimizeMeOut = IsDigit(testStrings[i%2]);
        }
        end = DateTime.Now;
        Console.WriteLine("IsDigit {0}", end - start);

        start = DateTime.Now;
        for(int i = 0; i < iterations; i++) {
            makeSureTheJitDoesNotOptimizeMeOut = HandCodeIf(testStrings[i%2]);
        }
        end = DateTime.Now;
        Console.WriteLine("HandCodeIf {0}", end - start);

        start = DateTime.Now;
        for(int i = 0; i < iterations; i++) {
            makeSureTheJitDoesNotOptimizeMeOut = HandCodeSwitch(testStrings[i%2]);
        }
        end = DateTime.Now;
        Console.WriteLine("HandCodeSwitch {0}", end - start);

        start = DateTime.Now;
        for(int i = 0; i < iterations; i++) {
            makeSureTheJitDoesNotOptimizeMeOut = RegexDigit(testStrings[i%2]);
        }
        end = DateTime.Now;
        Console.WriteLine("RegexDigit {0}", end - start);

        start = DateTime.Now;
        for(int i = 0; i < iterations; i++) {
            makeSureTheJitDoesNotOptimizeMeOut = IntParse(testStrings[i%2]);
        }
        end = DateTime.Now;
        Console.WriteLine("IntParse {0}", end - start);
        Console.WriteLine(makeSureTheJitDoesNotOptimizeMeOut);
    }
   
    public static bool IntParse(string test) {
        try {
            int.Parse(test);
            return true;
        } catch { return false; }
    }
   
    public static bool HandCodeSwitch(string test) {
        for(int i = 0; i < test.Length; i++) {
            switch(test[i]) {
                case '0':
                case '1':
                case '2':
                case '3':
                case '4':
                case '5':
                case '6':
                case '7':
                case '8':
                case '9':
                    continue;
                default:
                    return false;
            }
        }
       
        return true;
    }

    public static bool HandCodeIf(string test) {
        for(int i = 0; i < test.Length; i++) {
            if ( test[i] > '9' || test[i] < '0' ) {
                return false;
            }
        }
       
        return true;
    }
   
    public static bool IsNumber(string test) {
        for(int i = 0; i < test.Length; i++) {
            if ( !char.IsNumber(test[i]) ) {
                return false;
            }
        }
       
        return true;
    }
   
    public static bool IsDigit(string test) {
        for(int i = 0; i < test.Length; i++) {
            if ( !char.IsDigit(test[i]) ) {
                return false;
            }
        }
       
        return true;
    }
   
    private static Regex matchString = new Regex("^[0-9]+$", RegexOptions.Compiled);
    public static bool RegexDigit(string test) {
        return matchString.IsMatch(test);
    }
}

Okay, so Darren Neimke thinks this deserves a blog entry and I think he is probably correct.  The method for punishing creatures that take up too much time really isn't understood and the methods are actually somewhat complicated, so they deserve to be examined.  I talked about the Terrarium game loop in an earlier discussion that you can view here with specific comments about punishing creatures time-wise, without explaining how, “Terrarium: A focus on game loops and scheduling”.

The first thing we do is time an average creature.  We assume that an average creature will do specific amounts of specific things.  Right now those things include adding/removing items from collection, processing collections in loops, and making mathematical decisions.  We do this using three methods:

  • InitAI - Simulates initializing data storage for AI components.  Here we basically create all of the collection storage and create a sample organism state that defines our creature for later comparison.
  • DistanceChecks - Assume the creature will do a certain number of distance checks.  Currently we time 50 complex distance checks.  If you optimize your distance checks you could easily quadruple this number.  We end up making a bunch of calls to rand to create the random locations, and then compute a solid square root using the Math.Sqrt function. (You can use an approximation to speed things up here).
  • StoreInformation - StoreInformation stores a bunch of random creatures and plants into varying arrays depending on whether or not they are friend or foe.  This again lets us do a series of additional distance checks to determine the closest creature.

We do this a few times to warm-up the GC and get the methods JIT'ed.  Then we clear the timing information and run these tests 100 times so we can get the average.  Depending on the timing returned, creatures will on average get about 2 ms to operate their tick based on the timings that we do.  There is a kind of minimum timing that creatures will get and a theoretical maximum, that causes an assertion that only happens if you break in the debugger during this method (or you have a horked run-time since you work at MS and you just compiled yourself a bug in your code ;-).

We use the base-time to determine how many creatures your computer should run as well, since we want to fit all of the creatures within about 300 milliseconds or 3/5ths of a game tick.  That is why some machines have smaller world sizes than others.  The maximum timings for creatures dictate this size, even if creatures only use half the time, we still don't allow more to operate since if the creatures did use the time it would elongate the game tick and cause choppy speed artifacts.

Now for the fun stuff.  When the creature gets run, we time them to make sure they don't go over their maximum time allowance.  We do this unobtrusively, so that we don't break creatures that are only going over by a little bit.  We never interrupt the creature unless they go really over the time limit.  In fact, the deadlock detection routine allows for up to a couple of seconds before the creature actually gets blacklisted using the deadlock routine.  Blacklisting is really bad, since we zero out the creature's assembly and it no longer loads or allows that creature to participate.

Under normal operation, creatures actually do return though, and we check the amount of time they spent processing against the allowed quantum.  Depending on the state of things, we add points to a special overage variable or subtract them off if they didn't use all of their time.  Overage is a number between 0 and some positive amount, and we never let you have a negative overage amount that allows you to essentially process longer in the future because you didn't use your time earlier.

At the beginning of the following ticks, if your overage ever exceeds _maxOverage (something like a couple times the quantum), then we process your tick without you.  Basically, the creature class processes it's portion of the tick, but you don't get called.  You can check if you get skipped at any time by checking the TurnsSkipped property on your creature.  Unfortunately overage is stored within the game engine itself, so you can't really check if you are operating too long until you've been skipped at least once and been notified using the TurnsSkipped property.

Note you have to check this property every time you run your code because it does get reset if callbacks into your code are made.  This is to ensure that it only holds value when you need to know how many ticks have elapsed since the last time you got to process a turn.  This can be important since various actions may have completed and you'll never get the event handler callbacks for them (they were processed while you were taking a processing nap for being naughty ;-)

A large number of developers have expressed concern that they don't know how much time they have left or how much overage they used.  Our response is that the Terrarium is not about maximizing the amount of time you have available to you, and it isn't about managing your overage allowing you to think longer than you should some ticks and then buy your overage back down on other ticks.  It is simply about writing simple and effective algorithms to solve the problem space provided to your buy the Terrarium environment and other creature authors.  Note, the rules for stopping the processing of a creature would have to be quite a bit more strict if we did give out the overage information.  There are a number of gameplay attacks and tick elongation issues that developers could introduce and then completely work around the detection logic.  That would be extremely bad.

To wrap up:

  • Your creature is granted a time quanta within the Terrarium based on tuning metrics of the average creature developed very early in the Terrarium process.  This tuning logic has worked for a long time now and is an effective method for load balancing the number of creatures per machine and the amount of time they should get to run.
  • Creatures that hang can cause deadlocks.  These creatures are removed from the system (the entire species, not just the instance), and never allowed to run on the machine again (assuming Ecosystem mode).
  • Creatures that go over their time by small amounts here and there are never affected.  These creatures end up buying back their overage by going under the allowed quanta on some ticks.  This is how the average creature functions in the Terrarium.
  • Creatures that go over consistently eventually have their ticks skipped.  These creatures can determine that they are being bad by checking the TicksSkipped propery in their code to determine if they have been skipped during the previous tick(s).  This can be used to custom tune depth in path finding algorithms, shrink problem spaces, or provide better maximums for collections that are being processed.
  • Giving programmatic access to overage variables means possible attacks against the scheduling logic that would allow for UI/gameplay issues to arise.  To combat these attacks the Terrarium would need to employ extremely strict rules on how creatures are run, and the liberal allowance of overage currently allowed that works so well would have to be revoked and replaced by a system that didn't allow nearly as much overage.  The threading system would also have to be replaced with a much stricter system that would, in many cases, break into user code while it was running and short-circuit user code execution paths.  This would place a great deal of stress on the user to write code that would recover from such a situation, along with numerous gameplay issues since the creature wouldn't complete it's tick.

So it turns out that while I'm doing some research work to help build a history behind my, soon to be released, C#/Direct3D photon mapping software, I actually find the creator of ping.  Now, you'd think that someone in the computer industry for over 8 years would know the creator of the ping program we use every day, but I guess not, because I sure as hell didn't know (or probably just didn't remember).

Turns out the program was created by Mike Muuss, a computer research scientist for the US military.  Even cooler than ping, though, this guy worked on some really awesome 3D ray-tracing and modelling software.  Now, his plans seem to all center around hundreds of CPUs processing every aspect of the environment in real-time, but some of his 200 processor ideas are nearly put into action on a single computer running a high end graphics cards.  We are probably still 5-10 years out from seeing his true vision come true, and it will be a shame that he won't around to see it.  If you think you might want to check out some of his research, he has a repository that appears to be maintained by Lee Butler “Mike John Muuss Repository”.  For more about the ping utility and some additional history on the name, you can check out “The Truth about Ping” over on the CramSession website.

Definitely good stuff, and kind of strange that old ray-tracing techniques would dig up such an important historical marker.

If you had your choice of how IL got JIT'ed down to the host processor assembly language, how would your JIT handle unconditional branching with regards to how the IL is laid out?  I would think the JIT'er would be a two step processor myself.  It would basically lay out the assembly to kind of mimic what the IL looks like.  However, the JIT'er as programmed, has a really cool optimization.  Now, I'm not sure if this is fully implemented as an optimization, but I'm pretty sure it is.  Check out the following code:

    IL_0000:  br         IL_0005
    IL_0005:  br         IL_000a
    IL_000a:  br         IL_000f
    IL_000f:  br         IL_0014
    IL_0014:  br         IL_0019
    IL_0019:  br         IL_001e
    IL_001e:  br         IL_0023
    IL_0023:  br         IL_0028
    IL_0028:  br         IL_002d

So when the JIT'er encounters a forward jumping unconditional break, what does it do?  Apparently, it doesn't do much at all.  It simply moves the instruction pointer to the new location, and specifies that the IL be processed from there.  There are some assumptions that I'm making here, because the JIT'er does a few additional things.  It makes sure you aren't jumping out of try/catch blocks.  That is a given, since the jumps would have to be different if that were the case.  It also makes sure the address jump doesn't run outside of the bounds of the method and all of the other basic checking.  If the target has already been JIT'ed this optimization is kind of turned off as well, since we can't just start JIT'ing at the next address.

This brings me to something that might be really cool.  Unconditional breaks over unused code might mean that unused code isn't compiled into memory.  Code coverage into today's world is a big performance tool for removing unused code-paths that aren't in use anymore, or for ensuring that the test cycle covers all of the code that needs to be tested.  However, at least from a performance perspective, the JITer is doing some of this work for you already.  Taking a large assembly that is nothing more than say a break to the next statement, and doing this many thousands of times, results in a piece of JIT'ed code the same size as if you had only done the jump once.

A second side effect is that your code, no matter how it is formatted as IL, is also execution inlined in many cases.

While you can't really take advantage of this knowledge unless you want to be extremely picky about how you program your managed code, it is nice to know that the JIT is doing some really cool things behind the scenes.  Note this isn't a license to leave in old code, since while the JIT'er may create an optimized method out of it, that would be the same as if the code were not even there, the JIT process still takes into account the total size of the IL when creating a FJitState tables and the resulting unmanaged code buffers.  The smaller your IL the less memory the compilation of your method will take up.  Also note that the code enhancements are different when you are inside of try/catch blocks.  If you stay within the try/catch blocks, then things work fine, but trying to jump out of these blocks might have really bad performance ramifications.  I might leave that for another entry once I've learned more about how the JITer works with exception catching code.

I also just learned of the FJitState tables and of some strange tuning metrics used by the JITer when allocating the unmanaged code buffer.  I'm not sure yet, but I think these items might be a possible source of code performance improvement (or at least memory footprint improvement).  Again, another entry, another time, but something to look forward to.

More Posts Next page »