A C# Set class based on enums - You don´t want to be without it when constructing a compiler ;-)

To refresh myself from obscure information models like Pile and restless thinking about software architecture I decided to pick up a new hobby: compiler construction ;-) Or rather I decided to apply my long time interest in this topic to a good cause: When I talked to Niklaus Wirth - the father of Pascal - in November at the iX conference he told me, he was about to release an updated version of his book "Compiler Construction" (read the complete book here: [PDF]) - which happens to be one of my all time favorite computer text books.

However, when I asked him, if he intended to bring his classical introduction to the world of .NET, he declined. He said, he deliberately chose Oberon as the implementation language for all examples in his book.

Now, since I´m a great fan of the book as well as the .NET Framework, I have the idea of translating all code samples in the book to C# to make the text more accessible to .NET developers. This sure is some undertaking and I don´t know, if I will find the time to really complete this effort - but at least I want to start.

So, here is my first very small piece of the puzzle: a set class. It´s very handy when you write a parser to check, if a symbol belongs to a certain set of expected symbols, e.g.

IF symbol IN [number, identifier, leftParent, minusOp] THEN ...

This checks, if the current symbol of the parser is a number or identifier etc. which means, if it might signify the start of an expression.

A set data type exists in Pascal and Oberon, but not in C# or VB. That´s pretty sad, because it´s sometimes quite convenient. Of course there exist some implementations like [1] and [2], but I thought, why not try it again using .NET 2.0 Generics? Also I though, enum data types were an ideal foundation to build a set on.

Why not be able to write:

public enum Colors
{
   red, blue, green
}

Set<Colors> sc = new Set<Colors>();
sc.Add(Colors.red);
sc.Add(Colors.green);

Set<Colors> sc2 = new Set<Colors>();
sc2.Add(Colors.blue);

sc.Add(sc2); // union

Console.WriteLine(sc); // prints: [red,blue,green]

With such a set data type setting up a scanner and parser would be as easy as with Pascal, e.g.

public enum Symbols
{
   identifier, number, string, leftParent, rightParent, ...
}

Set<Symbols> firstExpressions = new Set<Symbols>();
firstExpressions.Add(Symbols.identifier);
firstExpressions.Add(Symbols.number);
...

Symbols currentSymbol;
...

if(firstExpressions.Contains(currentSymbol))
{
   ...
}

I find using enums as a starting point very convenient. They look pretty much like the Pascal sets, but just lack any operations. So I devised a generic Set data type in C# to wrap an enum. So far it works ok for me:

    public class Set<TEnum>

    {

        #region "Data"

        bool flagsEnum;

        System.Collections.BitArray members;

        System.Collections.Generic.Dictionary<TEnum, int> mapMember2Index;

        #endregion

 

 

        #region "ctors"

        public Set()

        {

            Initialize();

            members = new System.Collections.BitArray(mapMember2Index.Count);

        }

 

        private Set(System.Collections.BitArray members)

        {

            Initialize();

            this.members = members;

        }

 

        private void Initialize()

        {

            if (typeof(TEnum).BaseType != typeof(System.Enum))

                throw new ApplicationException(string.Format("Generic type parameter <{0}> is not an enum type!",
                                                             
typeof(TEnum).FullName));

 

            flagsEnum = typeof(TEnum).GetCustomAttributes(typeof(System.FlagsAttribute), true).Length > 0;

 

            int i = 0;

            mapMember2Index = new Dictionary<TEnum, int>();

            foreach (TEnum m in Enum.GetValues(typeof(TEnum)))

                mapMember2Index.Add(m, i++);

        }

        #endregion

 

 

        #region "Working with the set"

        public void Clear()

        {

            members.SetAll(false);

        }

 

 

        public Set<TEnum> Add(TEnum member)

        {

            members.Set(mapMember2Index[member], true);

            return this;

        }

 

        public Set<TEnum> Add(Set<TEnum> otherSet)

        {

            if(otherSet != null) members.Or(otherSet.members);

            return this;

        }

 

 

        public Set<TEnum> Remove(TEnum member)

        {

            members.Set(mapMember2Index[member], false);

            return this;

        }

 

        public Set<TEnum> Remove(Set<TEnum> otherSet)

        {

            if (otherSet != null)

                for (int i = 0; i < members.Count; i++)

                    if (otherSet.members[i])

                        members[i] = false;

            return this;

        }

 

 

        public Set<TEnum> Intersect(Set<TEnum> otherSet)

        {

            if (otherSet != null) members.And(otherSet.members);

            return this;

        }

 

 

        public bool Contains(TEnum member)

        {

            return members[mapMember2Index[member]];

        }

 

 

        public int Cardinality

        {

            get

            {

                return members.Length;

            }

        }

        #endregion

 

 

        #region "Overrides"

        public override bool Equals(object obj)

        {

            return this == (Set<TEnum>)obj;

        }

 

        public override int GetHashCode()

        {

            return members.GetHashCode();

        }

 

 

        public override string ToString()

        {

            string[] names = Enum.GetNames(typeof(TEnum));

 

            StringBuilder memberNames = new StringBuilder("[");

            for (int i = 0; i < members.Count; i++)

                if (members[i])

                {

                    if (memberNames.Length > 1) memberNames.Append(",");

                    memberNames.Append(names[i]);

                }

            memberNames.Append("]");

 

            return memberNames.ToString();

        }

        #endregion

 

 

        #region "Operators"

        // intersection

        public static Set<TEnum> operator &(Set<TEnum> left, Set<TEnum> right)

        {

            System.Collections.BitArray result = new System.Collections.BitArray(new bool[left.members.Count]);

            result.Or(left.members);

            if(right != null) result.And(right.members);

            return new Set<TEnum>(result);

        }

 

        // union

        public static Set<TEnum> operator |(Set<TEnum> left, Set<TEnum> right)

        {

            System.Collections.BitArray result = new System.Collections.BitArray(new bool[left.members.Count]);

            result.Or(left.members);

            if(right != null) result.Or(right.members);

            return new Set<TEnum>(result);

        }

 

        public static bool operator ==(Set<TEnum> left, Set<TEnum> right)

        {

            if (right != null)

            {

                for (int i = 0; i < left.members.Count; i++)

                    if (left.members[i] != right.members[i]) return false;

                return true;

            }

            else

                return false;

        }

 

        public static bool operator !=(Set<TEnum> left, Set<TEnum> right)

        {

            return !(left == right);

        }

        #endregion

 

 

        #region "Iterator"

        public IEnumerator<TEnum> GetEnumerator()

        {

            TEnum[] memberValues = (TEnum[])Enum.GetValues(typeof(TEnum));

            for (int i = 0; i < members.Count; i++)

                if (members[i])

                    yield return memberValues[i];

        }

        #endregion

    }

 

The only drawbacks so far: The generic parameter type TEnum cannot be constrained to the base class System.Enum, and I left out handling bit field enumerations. (Currently if you pass Colors.red | Colors.green to Add() it will crash.) (Yes, yes, and the class is not thread-safe, I know.)

Enjoy!

PS: It seems there is one less excuse to not start translating the Compiler Construction sources... Let´s see when I really find time for that. Stay tuned!

[1] A Generic Set Data Structure
[2] Yet Another C# set class

Comments

# re: A C# Set class based on enums - You don´t want to be without it when constructing a compiler ;-)

Saturday, January 07, 2006 7:36 PM by SBC

Regarding Niklaus Wirth - you may find this amusing:
http://radio.weblogs.com/0112769/2002/10/08.html

;-)

# re: A C# Set class based on enums - You don´t want to be without it when constructing a compiler ;-)

Sunday, January 08, 2006 4:27 AM by David

Have you looked at Chrome, a .Net Pascal compiler? They recently released version 1.5 with support for .net 2.0 and it works really well.

# re: A C# Set class based on enums - You don´t want to be without it when constructing a compiler ;-)

Sunday, January 08, 2006 4:28 AM by David

And another thing: Doesn't the PowerCollection library have a set class? I am not sure, but it might ;)

# AW: A C# Set class based on enums - You don´t want to be without it when constructing a compiler ;-)

Sunday, January 08, 2006 10:17 AM by torsten.rendelmann@gmx.net (Torsten Rendelmann)

The translation project sounds cool! But did you had a look at http://www.oberon.ethz.ch/lightning/ ?

TorstenR

# re: A C# Set class based on enums - You don´t want to be without it when constructing a compiler ;-)

Sunday, January 08, 2006 2:47 PM by Ralf

@SBC: Thx for the URL. I´ll listen into it.

@David, Chrome: No, I haven´t had a look at Chrome yet. But I heard it was pretty good.

@David, PowerCollection: Yes, the PowerCollection lib (http://www.wintellect.com/powercollections/documentation/Wintellect.PowerCollections.html) contains a Set class.
For several reasons, though, I did not want to use it: a.) The PowerCollection Set class allows sets of any type. That´s good and well, but I wanted to focus on a set for a specific purpose. b.) For the purpose of the translation of the Oberon code I do not want to introduce too many dependencies on 3rd party technologies. c.) I wanted to play around with Generics ;-)

@TorstenR: Yes, I know about the Oberon for .NET implementation.

@David & TorstenR: My primary interest in migrating the Compiler Construction code to .NET is not getting it to run on .NET, but to provide an introduction to this important topic for the average .NET developer. The average developer does not know Pascal or Oberon, though. He´s fully occupied by learning and using C# (or VB.NET, which is much closer to C# than Pascal or Oberon). C# (or VB.NET) are sufficiently different from Oberon to justify a translation of the Compiler Construction code, I´d say. Also I want to show how you can do real component oriented development using the major .NET languages.

# re: A C# Set class based on enums - You don´t want to be without it when constructing a compiler ;-)

Sunday, January 08, 2006 4:05 PM by Ralf

@SBC: upps, I just realized, there is not audio interview at "radio.weblogs" as I expected from the name. Nevertheless it´s a funny quote :-)

# re: A C# Set class based on enums - You don´t want to be without it when constructing a compiler ;-)

Tuesday, January 10, 2006 8:03 AM by Thomas Freudenberg

You can shorten your ToString implementaion by using String.Join:

public override string ToString()
{
return "[" + String.Join(", ", Enum.GetNames(typeof(TEnum))) + "]";
}

# re: A C# Set class based on enums - You don&#180;t want to be without it when constructing a compiler ;-)

Tuesday, November 07, 2006 5:52 AM by Joe McRay

I vote for the following overloaded constructor: public Set()(params TEnum[] valuesSet) { ... } This allows code like: Set firstExpressions = new Set(Symbols.identifier, Symbols.number); :-)

# re: A C# Set class based on enums - You don´t want to be without it when constructing a compiler ;-)

Monday, October 27, 2008 7:27 AM by frank jurisch

Why does it work without StackOverflowException ?

public Set<TEnum> Add...  {

     if(otherSet != null) ...

calls A

public static bool operator !=...{

     return !(left == right);

calls B

public static bool operator ==...{

     if (right != null)...

calls A op != again and so on

I use "if (((object)otherSet) != null) "

and no exception occurs

Leave a Comment

(required) 
(required) 
(optional)
(required)