Software Transactional Memory - Making multithreading easier

Tuesday, July 3, 2007

A while ago Carl Rosenberger - chief architect of db4o - mentioned in a personal conversation the concept of Software Transactional Memory (STM) [1, 8]. I was immediately intrigued by the idea - but the conversation went on. So I sat down later and read up on STM. And what I found made me very confident, STM was a very useful idea. Microsoft has recognized this too and is working on two versions of STM: a Haskell implementation [2, 3] and a more palatable version for mere mortal C# programmers called SXM [4].

However, these implementations are either not so close to the usual transactional programming model or need some unmanaged code like LibCMT [5] with its C# API. That made think about an easy to use purely managed code STM. And here it is: the .NET Software Transactional Memory. It´s written entirely in C# and integrates well with the overall .NET Framework, I´d say. But see for yourself...

What is a Software Transactional Memory all about?

So what´s the fuzz about STM? They make it much, much easier to program multithreaded code sharing in-memory resources. And I´d like to emphasize, an STM makes it easier, not necessarily more performing.

Parallel programming traditionally is quite hard. Once you go beyond starting just a couple of threads in the background using the .NET ThreadPool and try to access shared in-memory structures you´ll realize, multithreading is not for the faint of heart. Synchronizing access to those shared data structures can become very difficult to get right. Juggling around with all the locking options of the .NET Framework to avoid deadlock and not make the whole thing too slow by locking in a too coarse grained fashion quickly uses up a lot of your brainpower. And then comes debugging multiple threads...

To make a long story short: Multithreading with access shared in-memory resources today is much too difficult for the masses really exploit it. That means, all the brandnew multi-core/multi-processor power is of little use at least for your own applications. Sad, isn´t it?

But fortunately STM comes to the rescue. It promises to aleviate you from all the nitty gritty detail of shared data structure access synchronization by giving you the same programming model for in-memory data access as you are used to for database access. Instead of locking individual records in a database you just wrap all your access code in a transaction and let the database engine do the locking.

That´s right what STM offers: just wrap a transaction around your in-memory work and in the end commit all changes. Juval Löwy has already demonstrated something along this line based on the System.Transactions namespace [6]. However, his transaction aware collections are very coarse grained. Also .NET Transactions don´t care for threadsafety. They isolate work done on volatile data structures in different transactions - but they don´t help avoiding access conflicts of concurrently running code.

Sounds too abstract to you? Ok, let´s look at some code...

Simple Example

The canonical example for transactions probably is transfering money between two accounts. So why deviate from this? Here´s how such a transfer could look in memory:

    1 static void Main()
    2 {
    3     double myAccount;
    4     double yourAccount;
    5 
    6     myAccount = 1000;
    7     yourAccount = 500;
    8 
    9     double amountToTransfer = 150;
   10     myAccount -= amountToTransfer;
   11     yourAccount += amountToTransfer;
   12 }

All´s fine as long as nothing bad happens between subtracting the amount to be transferred from one account and adding it to the other account. If an error occurred during some processing between above lines 10 and 11 the accounts would be in an inconsistent state. It´s right the same as with databases. And as with databases the solution to the problem are transactions.

There´s the solution to the same problem using .NET Software Transactional Memory:

    1 static void Main(string[] args)
    2 {
    3     INstmObject<double> myAccount;
    4     INstmObject<double> yourAccount;
    5 
    6     myAccount = NstmMemory.CreateObject<double>(1000);
    7     yourAccount = NstmMemory.CreateObject<double>(500);
    8 
    9     using (INstmTransaction tx = NstmMemory.BeginTransaction())
   10     {
   11         double amountToTransfer = 150;
   12 
   13         myAccount.Write(myAccount.Read() - amountToTransfer);
   14         yourAccount.Write(yourAccount.Read() + amountToTransfer);
   15 
   16         tx.Commit();
   17     }
   18 
   19     Console.WriteLine("My account balance: {0}", myAccount.Read());
   20     Console.WriteLine("Your account balance: {0}", yourAccount.Read());
   21 }

The code structure has not changed - but the outcome is threadsafe and transactional at the same time. No explicit locks needed.

The only price to pay: You have to allocate transactional volatile data as NstmObjects in the transactional memory represented by NstmMemory. Instead of

double myAccount;

you write

INstmObject<double> myAccount;

And instead of calling new on the type you ask the transactional memory as a factory to create instances:

myAccount = NstmMemory.CreateObject<double>(1000);

This also entails an explicit Read() and Write() to get at the data wrapped in INstmObjects. Although, I think this is not too high a price to pay at the moment, I´d have liked to make the programming model easier. Microsoft´s STM implementations provide somewhat more simplicity in this regard - but overall I found them not really easy to use. For the time being and version 1.0 of my NSTM I´ll stick with the above approach. At least you´ll find it in other publications, too ;-) (see [7] for example)

But although it´s a little bit clumsy to have this indirection of INstmObject in the programming model I deem the translation from non-transactional, not threadsafe code to transactional and threadsafe code very straightforward. It becomes even more straightforward, if you switch to System.Transactions for the transaction handling:

    1 using System.Transactions;
    2 using NSTM;
    3 
    4 namespace SimpleSample
    5 {
    6     class Program
    7     {
    8         static void Main(string[] args)
    9         {
   10             INstmObject<double> myAccount;
   11             INstmObject<double> yourAccount;
   12 
   13             myAccount = NstmMemory.CreateObject<double>(1000);
   14             yourAccount = NstmMemory.CreateObject<double>(500);
   15 
   16             NstmMemory.SystemTransactionMode = NstmSystemTransactionMode.EnlistOnAccess;
   17             using (TransactionScope tx = new TransactionScope())
   18             {
   19                 double amountToTransfer = 150;
   20 
   21                 myAccount.Write(myAccount.Read() - amountToTransfer);
   22                 yourAccount.Write(yourAccount.Read() + amountToTransfer);
   23 
   24                 tx.Complete();
   25             }
   26 
   27             Console.WriteLine("My account balance: {0}", myAccount.Read());
   28             Console.WriteLine("Your account balance: {0}", yourAccount.Read());
   29         }

Microsoft does not even offer this for its own STM implementations.

Looking ahead

STM promises to make parallel programming tasks on shared in-memory data structures much easier than it is today using all those pretty locking facilities. But of course there is more to it than the above simple example shows. How about isolation levels? How does the STM actually make concurrent access threadsafe? What about all those beloved collection data structures?

If you like, stay tuned. I´ll talk about all this in a future posting.

Download

You can download NSTM (short for .NET Software Transactional Memory) from its Google project hosting site: http://code.google.com/p/nstm/.

Enjoy - and let me know how you like it, how it performs for you, or what you think needs to be improved.

Resources

[1] Wikipedia, Software transactional memory, http://en.wikipedia.org/wiki/Software_transactional_memory

[2] Microsoft Channel9, Programming in the Age of Concurrency: Software Transactional Memory, http://channel9.msdn.com/Showpost.aspx?postid=231495

[3] Tim Harris et al., Composable Memory Transactions, http://research.microsoft.com/~simonpj/papers/stm/stm.pdf

[4] Microsoft Research, C# Software Transactional Memory, http://research.microsoft.com/research/downloads/Details/6cfc842d-1c16-4739-afaf-edb35f544384/Details.aspx

[5] Duilio Protti, LibCMT, http://libcmt.sourceforge.net

[6] Juval Löwy, Can´t Commit: Volatile Resource Managers in .NET Bring Transactions to the Common Type, http://msdn.microsoft.com/msdnmag/issues/05/12/transactions/default.aspx

[7] Maurice Herlihy et al., Software Transactional Memory for Dynamic-Sized Data Structures, http://www.cs.rice.edu/~wns1/papers/2003-PODC-DSTM.pdf

[8] Derrick Coetzee, Developing for Developers: Software Transactional Memory, http://blogs.msdn.com/devdev/archive/2005/10/20/483247.aspx

[9] Robert Ennals, Software Transactional Memory Should Not Be Obstruction-Free, http://web.cecs.pdx.edu/~walpole/class/cs510/papers/19.pdf

Why just don't use some kind of embedded DBMS?They seems to be designed to handle transactional data.

Regards.

Andrey Skvortsov - Wednesday, July 4, 2007 5:21:01 PM

@Andrey: An embedded DBMS does not run just in memory - despite what its name might suggest -, but persists data. That´s the purpose of a DBMS. Persisting data is much slower than just manipulating data structures in memory, though.

In addition, DBMS - except for ODBMS like db4o - do not offer a very object oriented API. That´s why so many O/R mappers are now offered.

Then DBMS have a much larger memory footprint compared to my STM implementation.

-Ralf

ralfw - Wednesday, July 4, 2007 8:26:50 PM

gosh, it so elegant...
i love you...;)

om wok - Thursday, July 19, 2007 5:21:58 AM

What is a Software Transactional Memory all about?

Simple Example

Looking ahead

Download

Resources

3 Comments