Software Transactional Memory II - Isolation of Changes to Transactional Objects

Wednesday, July 4, 2007

In yesterday's post I introduced NSTM, my C# implementation of the Software Transactional Memory (STM) concept. STM is supposed to make concurrent programming easier than it is today with explicit locking of shared in-memory resources. With NSTM, multithreaded processing becomes as easy as accessing an RDBMS from multiple applications that are isolated from each other by transactions.

How ACID is NSTM?

Since NSTM provides transactionality on a resource, and since you've come to love the usual ACID properties of database transactions [1], I should quickly explain to what extent NSTM supports them:

Atomicity: The whole purpose of NSTM is to provide atomicity across several operations on in-memory data. The changes applied to INstmObjects between opening an NSTM transaction with NstmMemory.BeginTransaction() and closing it with INstmTransaction.Commit() either all take effect or none of them does.
Consistency: NSTM does not have a metadata concept, so there are no independent consistency rules that it could enforce during a transaction. If, for example, you want to implement a transactional queue data structure, which of course defines some internal consistency rules, then your code needs to take care of enforcing them. However, you'll get some help with this from the Isolation property of NSTM transactions.
Isolation: As with Atomicity, the whole purpose of NSTM is to isolate changes to shared in-memory data structures from each other if they happen in different transactions, which at the same time means on different threads. Changes made during one transaction become visible to others only when the transaction is committed.
Durability: Durability, i.e. protecting committed data against unexpected process abortion (e.g. through a power outage), obviously is of no concern when working just in-memory ;-) Volatile data is just that: volatile.

NSTM is not ACID, just AI. But I think that's perfectly ok. It's not a deficiency, it's a feature ;-) The ACID properties are not absolute; they mirror what can go wrong when concurrently accessing databases. A non-persistent resource with no inherent rules thus does not need C and D. But how does NSTM ensure A and I?

Transactional Objects

The basic unit of NSTM transactional memory is the INstmObject<T>, which can be called a transactional object, or txo for short. It's allocated on the heap, so you can trust the .NET garbage collector to take care of it once it's no longer needed. Each txo wraps a piece of your application data, ranging from scalar values to complex object models, e.g.

    INstmObject<int> i;
    INstmObject<string> s;
    INstmObject<MyStruct> r;
    INstmObject<MyClass> c;

with MyStruct and MyClass defined as:

    struct MyStruct
    {
        public int j;

        public MyStruct(int value)
        {
            this.j = value;
        }
    }

    class MyClass : ICloneable
    {
        public int i;

        public MyClass(int value)
        {
            this.i = value;
        }

        #region ICloneable Members

        public object Clone()
        {
            return new MyClass(this.i);
        }

        #endregion
    }

You can allocate space for any type in transactional memory, provided its values can be cloned by the NSTM infrastructure. Scalar types and strings don't pose a problem here. Value types (struct) are easy too, as long as they just aggregate scalar values. All other types need to implement ICloneable to let NSTM create a copy of their instances.
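The cloning rules above can be sketched in a few lines. This is only an illustration under my own assumptions, not NSTM's actual code: CloneSketch, CloneValue and MyPoint are hypothetical names made up for this example.

```csharp
using System;

// Hypothetical helper, not part of NSTM: decides how a value can be copied.
public static class CloneSketch
{
    public static T CloneValue<T>(T value)
    {
        object boxed = value;
        if (boxed == null) return value;             // nothing to copy

        // Strings are immutable, and boxed value types are copied on
        // assignment. Caveat: a struct holding references to mutable
        // objects would only be copied shallowly here.
        if (boxed is string || boxed is ValueType) return value;

        // Every other type has to supply its own copy logic.
        ICloneable cloneable = boxed as ICloneable;
        if (cloneable != null) return (T)cloneable.Clone();

        throw new NotSupportedException(
            typeof(T).Name + " cannot be cloned; implement ICloneable.");
    }
}

// Hypothetical sample type for the ICloneable branch.
public class MyPoint : ICloneable
{
    public int X;
    public MyPoint(int x) { X = x; }
    public object Clone() { return new MyPoint(X); }
}
```

Calling CloneSketch.CloneValue(new MyPoint(1)) returns a separate MyPoint instance, so changing the copy leaves the original untouched; passing a plain object that does not implement ICloneable throws.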

ICloneable is used instead of [Serializable] because so far it seemed to me that serialization leads to more problems than it solves:

Serialization is slower by nature than cloning.
Serialization (or deserialization to be more precise) leads to problems if your own txo types contain references to other transactional objects, since they must not be serialized and need to regain their identity during deserialization. Some overall management of txo would be necessary.

Values of transactional objects need to be cloneable to isolate transactions from each other. The default mode of NSTM transactions is CloneOnRead. That means that when you first read a txo in a transaction, its value is cloned. Any subsequent read then returns the same cloned value (including any changes you made to it). If some other transaction overwrites the txo's value in the meantime, you won't see those changes. You keep on working with your own clone. This makes working with NSTM thread-safe out of the box. There is no need for you to worry about locking your own values stored in transactional objects.

If a transaction is opened in CloneOnWrite mode, though, the values of transactional objects are not cloned when you just read them. That makes working with NSTM a little bit faster, but it opens the door to inconsistencies. If another transaction commits a new value to a txo between your first read and a later read, the two reads return different values within the same transaction. However, CloneOnWrite might be ok for you if you keep an eye on when and how often you read from a txo.
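To make the difference between the two clone modes tangible, here is a toy model. It is not NSTM code: Cell, ToyTx, Counter and CloneModeDemo are hypothetical names, everything runs on a single thread, and the "other transaction" is simulated by assigning to the cell directly.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-in for a transactional object (not an NSTM type).
public class Cell<T> where T : ICloneable
{
    public T Value;                          // the shared, committed value
    public Cell(T value) { Value = value; }
}

// Toy transaction that either clones on first read or does not clone at all.
public class ToyTx
{
    private readonly bool cloneOnRead;
    // One private copy per cell, keyed by the cell's identity.
    private readonly Dictionary<object, object> copies =
        new Dictionary<object, object>();

    public ToyTx(bool cloneOnRead) { this.cloneOnRead = cloneOnRead; }

    public T Read<T>(Cell<T> cell) where T : ICloneable
    {
        // No copy: the caller sees whatever is committed right now.
        if (!cloneOnRead) return cell.Value;

        object copy;
        if (!copies.TryGetValue(cell, out copy))
        {
            copy = cell.Value.Clone();       // first read: take a private copy
            copies[cell] = copy;
        }
        return (T)copy;                      // later reads: the same private copy
    }
}

public class Counter : ICloneable            // hypothetical sample type
{
    public int N;
    public Counter(int n) { N = n; }
    public object Clone() { return new Counter(N); }
}

public static class CloneModeDemo
{
    // Returns what the second read sees after a simulated concurrent commit.
    public static int SecondRead(bool cloneOnRead)
    {
        var cell = new Cell<Counter>(new Counter(1));
        var tx = new ToyTx(cloneOnRead);
        tx.Read(cell);                       // first read sees 1 in both modes
        cell.Value = new Counter(2);         // "another transaction" commits
        return tx.Read(cell).N;              // 1 with cloning, 2 without
    }
}
```

With cloning the transaction keeps seeing its own copy; without it the second read picks up the foreign commit, which is exactly the inconsistency described above.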

Reading and Writing Transactional Memory

Reading from and writing to transactional objects unfortunately are very explicit:

    int iValue = i.Read();

This is due to the difficulty of intercepting regular memory access. If I had chosen not to allow scalar types for txo, I could have used some form of proxy to intercept reads from and writes to transactional object values. But I did not want to limit the range of types in that way. Maybe in the future I will find an easier way to interact with txo values, or to get rid of explicit transactional objects altogether. Code enhancement could be a way to achieve that. But right now I opted for a quick implementation with a reasonable programming model instead of the most intuitive and unobtrusive one.

Note that a plain Read() opens the txo in ReadWrite mode. Again, this is the safest option: even if the clone mode is not CloneOnRead, a clone of the current value is created, because this read mode signals that your application intends to write to the txo at some point in the future.

    i.Read()

is the same as

    i.Read(NstmReadOption.ReadWrite)

However, if you know you will never write to a txo, you can spare NSTM the effort of cloning it (unless the clone mode is CloneOnRead). Just state ReadOnly as the read mode:

    i.Read(NstmReadOption.ReadOnly)

Inconsistencies are of course still detected in this mode. If another transaction commits a change to a txo that you have read in your transaction, NSTM notices this either on your next read access to the txo or when your transaction commits.
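How such a check could work is easiest to see with version numbers. Please take the following as a sketch under a loud assumption: this post does not describe NSTM's validation mechanism, so the version counter and the names VersionedCell, ValidatingTx and ValidationDemo are made up for illustration only.

```csharp
using System;
using System.Collections.Generic;

// Toy cell (not an NSTM type). Assumption: every committed write bumps Version.
public class VersionedCell
{
    public int Value;
    public long Version;
}

public class ValidatingTx
{
    // The version each tracked cell had when this transaction first read it.
    private readonly Dictionary<VersionedCell, long> seen =
        new Dictionary<VersionedCell, long>();

    // track == false models a read that opts out of later validation.
    public int Read(VersionedCell cell, bool track)
    {
        if (track && !seen.ContainsKey(cell))
            seen[cell] = cell.Version;       // remember what we based our work on
        return cell.Value;
    }

    // Can be called before a further read or at commit time.
    public bool IsStillValid()
    {
        foreach (KeyValuePair<VersionedCell, long> entry in seen)
            if (entry.Key.Version != entry.Value) return false;
        return true;
    }
}

public static class ValidationDemo
{
    // Returns { tracked transaction still valid?, untracked transaction still valid? }.
    public static bool[] Run()
    {
        var cell = new VersionedCell { Value = 1, Version = 0 };
        var tracked = new ValidatingTx();
        var untracked = new ValidatingTx();
        tracked.Read(cell, true);
        untracked.Read(cell, false);

        cell.Value = 2;                      // "another transaction" commits
        cell.Version++;

        return new[] { tracked.IsStillValid(), untracked.IsStillValid() };
    }
}
```

Run() yields false for the transaction that tracked its read and true for the one that did not, because the latter has nothing to validate.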

If you don't care about such inconsistencies and want to save the effort of this kind of validation - e.g. because you just want to traverse a couple of txo - then read from a txo with PassingReadOnly:

    i.Read(NstmReadOption.PassingReadOnly)

With this option, no consistency checks are performed for the txo, neither upon future reads nor during commit, unless you subsequently read it with another option.

So much for reading from transactional objects. It's explicit, but it's simple.

Changing a txo works similarly. You just call Write() and pass the new value to it:

    i.Write(99);

If the value has already been cloned, the clone is overwritten. In any case, the new value is not yet visible to any other parallel transaction. You have to call Commit() on the transaction to make new values public. Until then they are kept in a transaction-local buffer, the transaction log.
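The idea of buffering writes until commit can be shown with a very small model. Again, this is not NSTM's implementation: IntCell and BufferingTx are hypothetical names, and the sketch leaves out everything a real commit needs (validation, locking, atomic publication).

```csharp
using System;
using System.Collections.Generic;

// Toy cell holding the committed, publicly visible value (not an NSTM type).
public class IntCell
{
    public int Value;
    public IntCell(int value) { Value = value; }
}

public class BufferingTx
{
    // The "transaction log": pending values, private to this transaction.
    private readonly Dictionary<IntCell, int> log = new Dictionary<IntCell, int>();

    public void Write(IntCell cell, int value)
    {
        log[cell] = value;                   // overwrite any earlier pending value
    }

    public int Read(IntCell cell)
    {
        int pending;
        // A transaction sees its own pending writes first.
        return log.TryGetValue(cell, out pending) ? pending : cell.Value;
    }

    public void Commit()
    {
        // Publish all pending values. A real STM must do this atomically.
        foreach (KeyValuePair<IntCell, int> entry in log)
            entry.Key.Value = entry.Value;
        log.Clear();
    }
}

public static class BufferingDemo
{
    // Returns { value inside tx, public value before commit, public value after commit }.
    public static int[] Run()
    {
        var cell = new IntCell(1);
        var tx = new BufferingTx();
        tx.Write(cell, 99);
        int inside = tx.Read(cell);          // 99: own write is visible
        int before = cell.Value;             // 1: others still see the old value
        tx.Commit();
        int after = cell.Value;              // 99: published
        return new[] { inside, before, after };
    }
}
```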

Explicitly writing to a txo is the surest way of changing its value. And it's the only way for scalar types, strings and any other value type. Objects, however, can also be changed implicitly. That's why there is the ReadWrite read option:

    MyClass x = c.Read();
    x.i = 99;

Read() returns a clone of c's value, which is then changed. Although no explicit write happens afterwards, the txo's value is changed and can be committed. Were it not for the automatic clone of the txo's value due to ReadWrite, assigning 99 would change the original and "true" value of c. But that must not happen if transactions are to be isolated from each other. That's why the default read option is ReadWrite.
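The underlying issue is plain C# reference semantics and can be reproduced without any STM at all. In the following sketch MyClass is repeated from above so that it compiles on its own; AliasingDemo is a hypothetical name, and the two local variables merely stand in for what a cloning and a non-cloning read would hand out.

```csharp
using System;

public class MyClass : ICloneable
{
    public int i;
    public MyClass(int value) { this.i = value; }
    public object Clone() { return new MyClass(this.i); }
}

public static class AliasingDemo
{
    // Returns { committed value after editing a clone,
    //           committed value after editing an alias }.
    public static int[] Run()
    {
        var committed = new MyClass(1);            // stands for the txo's "true" value

        var clone = (MyClass)committed.Clone();    // what a cloning read hands out
        clone.i = 99;                              // private change only
        int afterCloneEdit = committed.i;          // still 1

        MyClass alias = committed;                 // what a non-cloning read hands out
        alias.i = 99;                              // changes the shared object
        int afterAliasEdit = committed.i;          // now 99

        return new[] { afterCloneEdit, afterAliasEdit };
    }
}
```

Editing the clone leaves the committed object at 1, while editing the alias changes it to 99 immediately, without any commit and visible to everyone holding the same reference.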

You could have read the value with ReadOnly and then changed the field, and NSTM would not have detected this violation. As explained above, I did not build in any checks against that; some form of proxy would have been necessary. So the only safeguards are the ReadWrite default read option and the CloneOnRead default clone mode for transactions.

If you change these defaults, NSTM assumes you know what you're doing ;-) Feel free to do so; you'll find many places in my collection implementations where I did it. But be careful! It's a form of optimization and should not be done lightly.

What's next?

Transactions are isolated from each other by making changes only to copies of the values of transactional objects. But where are those changes kept? How are changes detected? How is the Atomicity of all changes ensured? That's all a matter of the transaction log and will be explained in my next post.

Resources

[1] Wikipedia, ACID, http://en.wikipedia.org/wiki/ACID
