A C# Set class based on enums - You don´t want to be without it when constructing a compiler ;-)

To refresh myself from obscure information models like Pile and restless thinking about software architecture I decided to pick up a new hobby: compiler construction ;-) Or rather I decided to apply my long time interest in this topic to a good cause: When I talked to Niklaus Wirth - the father of Pascal - in November at the iX conference he told me, he was about to release an updated version of his book "Compiler Construction" (read the complete book here: [PDF]) - which happens to be one of my all time favorite computer text books.

However, when I asked him, if he intended to bring his classical introduction to the world of .NET, he declined. He said, he deliberately chose Oberon as the implementation language for all examples in his book.

Now, since I´m a great fan of the book as well as the .NET Framework, I have the idea of translating all code samples in the book to C# to make the text more accessible to .NET developers. This sure is some undertaking and I don´t know, if I will find the time to really complete this effort - but at least I want to start.

So, here is my first very small piece of the puzzle: a set class. It´s very handy when you write a parser to check, if a symbol belongs to a certain set of expected symbols, e.g.

IF symbol IN [number, identifier, leftParent, minusOp] THEN ...

This checks, if the current symbol of the parser is a number or identifier etc. which means, if it might signify the start of an expression.

A set data type exists in Pascal and Oberon, but not in C# or VB. That´s pretty sad, because it´s sometimes quite convenient. Of course there exist some implementations like [1] and [2], but I thought, why not try it again using .NET 2.0 Generics? Also I though, enum data types were an ideal foundation to build a set on.

Why not be able to write:

public enum Colors
{
   red, blue, green
}

Set<Colors> sc = new Set<Colors>();
sc.Add(Colors.red);
sc.Add(Colors.green);

Set<Colors> sc2 = new Set<Colors>();
sc2.Add(Colors.blue);

sc.Add(sc2); // union

Console.WriteLine(sc); // prints: [red,blue,green]

With such a set data type setting up a scanner and parser would be as easy as with Pascal, e.g.

public enum Symbols
{
   identifier, number, string, leftParent, rightParent, ...
}

Set<Symbols> firstExpressions = new Set<Symbols>();
firstExpressions.Add(Symbols.identifier);
firstExpressions.Add(Symbols.number);
...

Symbols currentSymbol;
...

if(firstExpressions.Contains(currentSymbol))
{
   ...
}

I find using enums as a starting point very convenient. They look pretty much like the Pascal sets, but just lack any operations. So I devised a generic Set data type in C# to wrap an enum. So far it works ok for me:

    public class Set<TEnum>

    {

        #region "Data"

        bool flagsEnum;

        System.Collections.BitArray members;

        System.Collections.Generic.Dictionary<TEnum, int> mapMember2Index;

        #endregion

 

 

        #region "ctors"

        public Set()

        {

            Initialize();

            members = new System.Collections.BitArray(mapMember2Index.Count);

        }

 

        private Set(System.Collections.BitArray members)

        {

            Initialize();

            this.members = members;

        }

 

        private void Initialize()

        {

            if (typeof(TEnum).BaseType != typeof(System.Enum))

                throw new ApplicationException(string.Format("Generic type parameter <{0}> is not an enum type!",
                                                             
typeof(TEnum).FullName));

 

            flagsEnum = typeof(TEnum).GetCustomAttributes(typeof(System.FlagsAttribute), true).Length > 0;

 

            int i = 0;

            mapMember2Index = new Dictionary<TEnum, int>();

            foreach (TEnum m in Enum.GetValues(typeof(TEnum)))

                mapMember2Index.Add(m, i++);

        }

        #endregion

 

 

        #region "Working with the set"

        public void Clear()

        {

            members.SetAll(false);

        }

 

 

        public Set<TEnum> Add(TEnum member)

        {

            members.Set(mapMember2Index[member], true);

            return this;

        }

 

        public Set<TEnum> Add(Set<TEnum> otherSet)

        {

            if(otherSet != null) members.Or(otherSet.members);

            return this;

        }

 

 

        public Set<TEnum> Remove(TEnum member)

        {

            members.Set(mapMember2Index[member], false);

            return this;

        }

 

        public Set<TEnum> Remove(Set<TEnum> otherSet)

        {

            if (otherSet != null)

                for (int i = 0; i < members.Count; i++)

                    if (otherSet.members[i])

                        members[i] = false;

            return this;

        }

 

 

        public Set<TEnum> Intersect(Set<TEnum> otherSet)

        {

            if (otherSet != null) members.And(otherSet.members);

            return this;

        }

 

 

        public bool Contains(TEnum member)

        {

            return members[mapMember2Index[member]];

        }

 

 

        public int Cardinality

        {

            get

            {

                return members.Length;

            }

        }

        #endregion

 

 

        #region "Overrides"

        public override bool Equals(object obj)

        {

            return this == (Set<TEnum>)obj;

        }

 

        public override int GetHashCode()

        {

            return members.GetHashCode();

        }

 

 

        public override string ToString()

        {

            string[] names = Enum.GetNames(typeof(TEnum));

 

            StringBuilder memberNames = new StringBuilder("[");

            for (int i = 0; i < members.Count; i++)

                if (members[i])

                {

                    if (memberNames.Length > 1) memberNames.Append(",");

                    memberNames.Append(names[i]);

                }

            memberNames.Append("]");

 

            return memberNames.ToString();

        }

        #endregion

 

 

        #region "Operators"

        // intersection

        public static Set<TEnum> operator &(Set<TEnum> left, Set<TEnum> right)

        {

            System.Collections.BitArray result = new System.Collections.BitArray(new bool[left.members.Count]);

            result.Or(left.members);

            if(right != null) result.And(right.members);

            return new Set<TEnum>(result);

        }

 

        // union

        public static Set<TEnum> operator |(Set<TEnum> left, Set<TEnum> right)

        {

            System.Collections.BitArray result = new System.Collections.BitArray(new bool[left.members.Count]);

            result.Or(left.members);

            if(right != null) result.Or(right.members);

            return new Set<TEnum>(result);

        }

 

        public static bool operator ==(Set<TEnum> left, Set<TEnum> right)

        {

            if (right != null)

            {

                for (int i = 0; i < left.members.Count; i++)

                    if (left.members[i] != right.members[i]) return false;

                return true;

            }

            else

                return false;

        }

 

        public static bool operator !=(Set<TEnum> left, Set<TEnum> right)

        {

            return !(left == right);

        }

        #endregion

 

 

        #region "Iterator"

        public IEnumerator<TEnum> GetEnumerator()

        {

            TEnum[] memberValues = (TEnum[])Enum.GetValues(typeof(TEnum));

            for (int i = 0; i < members.Count; i++)

                if (members[i])

                    yield return memberValues[i];

        }

        #endregion

    }

 

The only drawbacks so far: The generic parameter type TEnum cannot be constrained to the base class System.Enum, and I left out handling bit field enumerations. (Currently if you pass Colors.red | Colors.green to Add() it will crash.) (Yes, yes, and the class is not thread-safe, I know.)

Enjoy!

PS: It seems there is one less excuse to not start translating the Compiler Construction sources... Let´s see when I really find time for that. Stay tuned!

[1] A Generic Set Data Structure
[2] Yet Another C# set class

6 Comments

  • Have you looked at Chrome, a .Net Pascal compiler? They recently released version 1.5 with support for .net 2.0 and it works really well.

  • And another thing: Doesn't the PowerCollection library have a set class? I am not sure, but it might ;)

  • @SBC: upps, I just realized, there is not audio interview at &quot;radio.weblogs&quot; as I expected from the name. Nevertheless it&#180;s a funny quote :-)

  • You can shorten your ToString implementaion by using String.Join:



    public override string ToString()

    {

    return &quot;[&quot; + String.Join(&quot;, &quot;, Enum.GetNames(typeof(TEnum))) + &quot;]&quot;;

    }

  • I vote for the following overloaded constructor:

    public Set()(params TEnum[] valuesSet)
    {
    ...
    }

    This allows code like:
    Set firstExpressions = new Set(Symbols.identifier, Symbols.number);
    :-)

  • Why does it work without StackOverflowException ?

    public Set Add... {
    if(otherSet != null) ...
    calls A
    public static bool operator !=...{
    return !(left == right);
    calls B
    public static bool operator ==...{
    if (right != null)...
    calls A op != again and so on

    I use "if (((object)otherSet) != null) "
    and no exception occurs

Comments have been disabled for this content.