If you're using Linq and Resharper, you've probably seen the warning Resharper shows when you use a foreach loop in which you use the loop variable in a Linq extension method (be it on IQueryable<T> or IEnumerable<T>). In case you don't know what it is or what damage it can do if you ignore the issue, I'll give you a database oriented query (so on IQueryable<T>, using LLBLGen Pro's Linq provider) which creates a dynamic Where clause based on input, the typical scenario you should be careful with when it comes to this particular problem.
// left join Order with Customer and project the customer fields
var customers = from o in metaData.Order
                join c in metaData.Customer on o.CustomerId equals c.CustomerId into oc
                from x in oc.DefaultIfEmpty()
                select new { CustomerId = x.CustomerId, CompanyName = x.CompanyName, Country = x.Country };
string searchTerms = "U A";
var searchCriteria = searchTerms.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries);
// append one Where clause per search term
foreach(var search in searchCriteria)
{
    customers = customers.Where(p => p.Country.Contains(search));
}
// the query is executed here, not inside the loop
var ids = (from c in customers select c.CustomerId).ToArray();
The above code snippet has the demon embedded into itself, likely without you noticing it. Can you spot it? (Ok, I already gave it away a bit with the foreach loop hint).
The problem with the above query is that it will produce a WHERE clause in the SQL query with two LIKE predicates which both filter on %A%. How's that possible? The cause is the 'Access to modified closure' problem: search is a local variable. The first time the body of the foreach is run, search has the value "U". The .Where() extension method adds a MethodCall expression with a call to Where(lambda), and inside the lambda there is, among other things, a ConstantExpression which refers to the local variable search for the value (strictly speaking, the constant holds the compiler-generated closure object which carries search as a field). And that's precisely the problem: when the foreach loop loops again, search gets another value, namely "A". As there are no more values, the loop ends and the query is executed.
Well, executed is more complex than it sounds: first, the expression tree has to be converted into SQL. When the linq provider runs into the two .Where() extension method calls, it evaluates the argument, which is a LambdaExpression which contains a ConstantExpression which refers to... a local variable called search. It can't do anything else but read that variable, which has the value ... "A" for both, as it reads the same variable. So it's not storing the constant value search has when the call to Where and Contains is made, it's storing a reference to the local variable.
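To see this for yourself, here's a minimal sketch which builds the same kind of lambda as an expression tree and looks inside it. The class and method names are made up for this illustration, and a plain string stands in for the Country field. It assumes nothing beyond System.Linq.Expressions.

```csharp
using System;
using System.Linq.Expressions;

public static class ClosureExpressionDemo
{
    // Builds a filter over a captured local and changes that local afterwards,
    // just like the next iteration of the loop does.
    public static Expression<Func<string, bool>> BuildFilterThenChangeLocal()
    {
        string search = "U";
        Expression<Func<string, bool>> filter = country => country.Contains(search);
        search = "A"; // the expression tree built above sees this new value
        return filter;
    }

    public static void Main()
    {
        var filter = BuildFilterThenChangeLocal();
        var call = (MethodCallExpression)filter.Body;

        // The argument of Contains is a member access on the closure object,
        // and that closure object sits in the tree as a ConstantExpression.
        var captured = (MemberExpression)call.Arguments[0];
        Console.WriteLine(captured.Member.Name);
        Console.WriteLine(captured.Expression is ConstantExpression); // True

        var compiled = filter.Compile();
        Console.WriteLine(compiled("UK"));  // False: it looks for "A", not "U"
        Console.WriteLine(compiled("USA")); // True
    }
}
```

The value is only read when the tree is evaluated, which is why a provider that translates the tree to SQL ends up with whatever the variable holds at that moment.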
How to fix this? It's pretty straightforward: create a new local variable:
foreach(var search in searchCriteria)
{
    var searchTerm = search;
    customers = customers.Where(p => p.Country.Contains(searchTerm));
}
With each iteration, it creates a new local variable, and thus each Contains call will refer to a different variable and thus the SQL query will contain the two LIKE predicates the way it should, one with %U% and one with %A%.
This subtle issue pops up with Linq to Objects as well, so beware when you pass the foreach loop variable to a Linq extension method: if the query doesn't run at that same spot, you likely will run into this problem and will have an obscure bug to track down.
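A small Linq to Objects sketch of the same trap, with a made-up list of countries standing in for the database. One thing to be aware of: since C# 5 the foreach loop variable is scoped per iteration, so to reproduce the effect on any compiler version this sketch captures a variable which is declared outside the loop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ModifiedClosureDemo
{
    static readonly string[] Countries = { "USA", "UK", "Austria", "Germany" };

    // Buggy: every lambda captures the same variable.
    public static List<string> FilterWithSharedVariable(string[] terms)
    {
        IEnumerable<string> query = Countries;
        string search = "";
        for (int i = 0; i < terms.Length; i++)
        {
            search = terms[i];
            query = query.Where(c => c.Contains(search));
        }
        // The query runs here, so all filters see the last term.
        return query.ToList();
    }

    // Fixed: each iteration gets its own variable.
    public static List<string> FilterWithLocalCopy(string[] terms)
    {
        IEnumerable<string> query = Countries;
        for (int i = 0; i < terms.Length; i++)
        {
            string searchTerm = terms[i];
            query = query.Where(c => c.Contains(searchTerm));
        }
        return query.ToList();
    }

    public static void Main()
    {
        var terms = new[] { "U", "A" };
        Console.WriteLine(string.Join(", ", FilterWithSharedVariable(terms))); // USA, Austria
        Console.WriteLine(string.Join(", ", FilterWithLocalCopy(terms)));      // USA
    }
}
```

The shared-variable version filters twice on "A", so "Austria" slips through even though it contains no "U".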
Happy hunting 
By popular demand, I've published the C# source code of my Multi-value Dictionary class, which can also merge dictionaries into itself and which implements ILookup<T, V> as well. It's part of Algorithmia, our upcoming data-structure and algorithm library which will ship with LLBLGen Pro v3.0 later this year. The code is released under the BSD2 license, see the enclosed readme.txt. The class comes with its own general purpose Grouping<T, V> class as well and of course its own ToMultiValueDictionary() extension method.
I hope this is useful to others. 
Update: it seems that if you run a Linq query (Linq to objects) over the MultiValueDictionary, the compiler and intellisense get confused as there are now two enumerators and both work with the linq operators, which means you either want to remove the ILookup code from the class (which is not that hard) or explicitly state the generic arguments. It's not a big problem, though in case you run into this problem, you know the reason.
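To illustrate the confusion without the full class, here's a sketch with a hypothetical TwoSequences type which, like the dictionary, can be enumerated in two ways. It isn't the MultiValueDictionary itself, it only shows the two workarounds: stating the generic arguments or casting to the interface you mean.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in: one class, two generic enumerators.
public class TwoSequences : IEnumerable<int>, IEnumerable<string>
{
    IEnumerator<int> IEnumerable<int>.GetEnumerator()
    {
        yield return 1;
        yield return 2;
    }

    IEnumerator<string> IEnumerable<string>.GetEnumerator()
    {
        yield return "one";
        yield return "two";
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return ((IEnumerable<int>)this).GetEnumerator();
    }
}

public static class AmbiguityDemo
{
    public static void Main()
    {
        var source = new TwoSequences();

        // source.Select(x => x) doesn't compile: the compiler can't infer
        // which of the two element types is meant.

        // Workaround 1: state the generic arguments explicitly.
        List<int> lengths = source.Select<string, int>(s => s.Length).ToList();

        // Workaround 2: cast to the interface you want to query.
        List<int> doubled = ((IEnumerable<int>)source).Select(i => i * 2).ToList();

        Console.WriteLine(string.Join(", ", lengths)); // 3, 3
        Console.WriteLine(string.Join(", ", doubled)); // 2, 4
    }
}
```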
This morning I ran into an interesting design decision. The problem at hand isn't that interesting, I've solved it a lot of times before. The interesting thing is that this problem isn't always solved the same way. It goes like this: do you tell an element which is inside a container (which can be inside another container) to exclude (remove) itself from its container or do you tell the container to exclude (remove) the element? This might sound simple enough, but what is the right thing to do here? And if one is chosen, on what ground is that approach the right thing and is that always the case, no matter what the scenario might be? No, "It depends" doesn't cut it, for the sole reason that every single day probably millions of developers around the world are, in any state of desperation, searching for the right thing to do, be it for this or other problems. Check the various Q&A sites, the various newsgroups and above all, the wide range of developer blogs, articles and twitter channels, and you'll see that a lot has been, is and will be discussed about that single concept: the right thing.
When I was confronted with the decision outlined above (more on that below), I wondered how a developer with decades of experience in the trenches like myself still has to wonder about this somewhat small decision and isn't capable of instantly choosing one over the other. Is it the big fear deep in all of us that if we make the wrong decision it might haunt us and eventually will bring us down? Looking at myself, with the massive code base this decision will be part of taken into account, it does bug my mind: if I pick the wrong decision, it might hurt the system, my company, and everyone depending on that. If I have these kinds of questions, there must be others with the same question, wondering the same thing: what's the right thing to do. Looking at all the blogs, articles, answers given to similar questions on various Q&A sites, indeed, there are many, many people out there wondering that same thing: either showing that by giving advice how to do the right thing (dear reader, if you now start to wonder if this is a recursive blog post, you're probably right), or by asking what it might be.
So I wondered: isn't this quest to find what the right thing is actually haunting our profession and why exactly is this? Why do we care so much? And more importantly: can we ourselves solve this?
This post was also partly triggered by a blog post I read this morning by Patrick Smacchia, where he shows that a toolkit written by Jeremy D. Miller called StructureMap has cyclic dependencies between namespaces, and Patrick tries to make a case that this kind of coupling is apparently not that great to have. Reading the post I wondered why anyone would give a hoot about such a thing. Don't get me wrong, I like solidly written software which allows great maintainability and extensibility without a lot of effort, but I couldn't help wondering who decides what's right and wrong and why we should care about these kinds of 'rules'. A lot of these rules make sense simply because they are based on common sense. Still, I have the feeling that the vast majority of these rules only work in a given scenario, while in many situations the boundaries of these scenarios are omitted, be it deliberately or by mistake. The pitfall is that if these scenario boundaries aren't given, the rule at hand starts to look like a rule which can always be applied: it apparently is one of the rules based on common sense and is the right thing to do, and as no boundaries or scenario are given for where it does work, it should work always.
With the rewrite of LLBLGen Pro's designer for v3.0 using .NET 3.5, I'm trying to do some things differently compared to what I did in v2.x. One of these things is a completely different set of data-structures to store meta-data. These data-structures give a lot of freedom to reason about the meta-data and as everything is event/observer controlled, it's very loosely coupled. However, I too ran into cyclic dependencies of namespaces (inside a common root namespace in the same assembly). When I detected this, I wondered... "should I correct this" ? But then I thought: "Why? Will the sky fall down if I leave these few cycles in?". So I did what I always do in the case when I have to make a design decision: make a pro/con list and decide what's the better option based on that list, document that decision and why you took it (so you can always read back why a decision was made, the most important thing about design documentation), move on. In this particular case, I didn't see any advantage of refactoring the code to obey some rule which is only important if you're going to split up assemblies (which I'm not planning to do in this case) so I left it in.
You might wonder why I didn't present to you right away the pro/con list of the decision I started this post with. The main reason is that I wanted to show you that there is no such thing as the right thing if there's no context given, or better: if your situation isn't known. This is very important. Today, and in the years to come, you'll likely be exposed to articles, blog posts, books, lectures and what not, written by generous people who simply want to help you out, which will tell you what's the right thing to do. I'd like you to keep one single thought in mind, whenever you read such an article, post etc.: Does the scenario this rule, this good advice, applies to actually match my scenario? If not, be careful about applying that advice without properly thinking it through. Software engineering isn't an exact science: there's no such thing as a formula where you put something in and the result is calculated. Our profession is about building an executable form of what's been described as the functionality of a system, however there are no turn-key solutions to make that possible: every situation is different.
Let's go back to the decision I started with and give it some context, a scenario, the situation it occurred in. This might help you with what you thought instantly what the right thing was when you read the first paragraph. In LLBLGen Pro v3, the user can add meta-data obtained from multiple databases to a single project and choose which elements to obtain from these databases, e.g. which tables, which schemas etc. See the screenshot below for an impression.
LLBLGen Pro v3 alpha: Step 2 of meta-data retrieval wizard.
The screenshot above shows how the user obtains the meta-data, a simple click-through wizard with some fancy selection/filtering (not shown). The ability to select elements also raises the question: "what if I selected a couple of tables I don't want to see anymore in my project?". Or in other words: I want to be able to exclude elements (tables in this example) from the project later on without actually removing them from the database. In the scenario presented to the user this looks something like this:
LLBLGen Pro v3 alpha: Context menu (incomplete, not all features have been added yet!)
The above screenshot shows the ability to exclude the selected tables (Customer and Employee in this case) from the project. This means that the objects representing these two tables are removed from the meta-data data-structure of the Project (through a controller which asks the Project to exclude the elements at hand which will delegate the call further) and all mappings to these elements are cleared. This feature works, it removes the tables from the Tables collection in the schema object representing the schema the tables are in and as everything is observer-aware, events are raised which are picked up by mapping objects which clear themselves if they have the elements excluded as their target, all undo-able thanks to Algorithmia.
When I wanted to implement the feature on the Schema node I ran into a slight problem: a Schema isn't a Schema Element like a table, view or stored procedure, it is a container, although it is contained in a Catalog. I had to refactor the code I had to also support these different elements (catalog, schema, database meta container, which all aren't a schema element). This gave me the hint that the whole setup might not be correct and I should simply create an interface like IExcludable or something, implement that on all elements which are excludable and call an Exclude method on that. Sounds logical and feng shui-compliant with Common Sense Software Engineering (CSSE), don't you think?
However this runs into a tiny problem: does every element know its container? Is the container of a table the schema (its logical container) or the Tables collection in the schema object (its physical container)? To be able to work with meta-data, it's essential that a table knows its schema. However it doesn't need to know that it's in a Tables collection. A schema knows its parent catalog, but a catalog doesn't know its parent container as it's not something it should be aware of (it doesn't have a logical container, although it has a physical container in the Project). Is it better to tell the element that it should remove itself from its parent's container, or is it better to tell the parent that a contained element has to be removed? Example: do you tell the schema that it should remove itself (which in turn will force the schema to tell its logical container (catalog) to remove the schema) or do you tell the catalog that a schema has to be removed?
The interface idea sounds great, but requires that elements know their container (also for the catalog). It gives the freedom to implement the logic inside the elements which it is all about instead of in code outside the data-structures. On the other hand, code outside the data-structures and placed in a method in the Project class, which contains the meta-data, which controls the calls to the right parent is also tempting as it also knows the container of the catalog, something the catalog itself doesn't know. Yes, the Catalog owns the Schemas collection, but does it own the Schema elements inside the Schemas collection?
So it comes down to:
- Interface route. This route requires little code in Project, as an IExcludable is passed into a method on Project by the controller and the project simply calls the Exclude method on the object and the object has to take care of it being excluded (removed) from its parent. We'd have to make sure the Catalog knows its container as well, which is outside its reach at the moment (other assembly), so logically that's not the right thing to do. Instead, for the catalog this requires some if/else code to call the container of the catalog.
- Manager code route. This route uses some switch/case statement in Project and based on the element to exclude, it calls the proper container to exclude the element. This looks straightforward, but places the knowledge of which element has which parent inside the Project class instead of in the element itself. This could be less ideal when for example the parent type changes and you have to hunt for all the references to that parent and change that code everywhere.
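The interface route could look roughly like the sketch below. Only the name IExcludable comes from the text above. The Table, Schema, Catalog and Project classes here are heavily simplified stand-ins invented for this illustration, not LLBLGen Pro's actual classes.

```csharp
using System;
using System.Collections.Generic;

public interface IExcludable
{
    void Exclude();
}

public class Table : IExcludable
{
    public Schema ParentSchema;
    // A table knows its schema, so it can ask to be removed from it.
    public void Exclude() { ParentSchema.Tables.Remove(this); }
}

public class Schema : IExcludable
{
    public Catalog ParentCatalog;
    public readonly List<Table> Tables = new List<Table>();
    public void Exclude() { ParentCatalog.Schemas.Remove(this); }
}

public class Catalog : IExcludable
{
    public readonly List<Schema> Schemas = new List<Schema>();
    // A catalog doesn't know its container, so it can't remove itself.
    public void Exclude()
    {
        throw new InvalidOperationException("Ask the Project to exclude a catalog.");
    }
}

public class Project
{
    public readonly List<Catalog> Catalogs = new List<Catalog>();

    public void Exclude(IExcludable element)
    {
        // The if/else for the catalog: only the project knows where it lives.
        var catalog = element as Catalog;
        if (catalog != null)
        {
            Catalogs.Remove(catalog);
        }
        else
        {
            element.Exclude();
        }
    }
}

public static class ExcludeDemo
{
    public static void Main()
    {
        var project = new Project();
        var catalog = new Catalog();
        var schema = new Schema { ParentCatalog = catalog };
        var table = new Table { ParentSchema = schema };
        project.Catalogs.Add(catalog);
        catalog.Schemas.Add(schema);
        schema.Tables.Add(table);

        project.Exclude(table);
        Console.WriteLine(schema.Tables.Count);    // 0
        project.Exclude(catalog);
        Console.WriteLine(project.Catalogs.Count); // 0
    }
}
```

The manager code route would move the three Remove calls into a switch inside Project, which keeps the elements dumb but puts the parent knowledge in one central place.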
One thing to note is that, as you can see in the second screenshot, multiple target database types can be in the same project. Telling a project to exclude a given table object isn't a matter of asking the single container of catalogs to remove the given table object, one first has to find the proper database specific store which contains the catalog with that table. A table itself doesn't have this knowledge of course, its parent schema's parent catalog does, at least the ID of the database.
Still convinced what you picked as the right thing to do in the first paragraph is the right thing, or is it more complicated than what you initially thought? I know this scenario is very specific but that's precisely the point: the question presented is very simple and likely you've made this same decision a lot of times before, as I have too. Yet, the specific scenario, the specific context the decision has to be made in makes things less trivial than it initially looked like.
I don't believe in "do this and it will be all right" kind of advice without a firm context description, so I'm not going to give you one. Instead, I'm giving you advice on how you yourself might be able to find the right decision in similar and other situations you will run into: make a pro/con list of each alternative, prioritize these pro/con items if you will, and simply look at the lists and make the decision which rationally makes the most sense, considering the pro/con lists. That's it. Make a decision, based on rational reasoning and cold hard facts and document it, implement that decision and move on. Don't let your decision be led by what looks like a turn-key 10-step process to get it right which is applicable to any (and thus also your) scenario/context. There's no such thing, every situation, every scenario is different, yours is too. Perhaps a guide tells you to componentize everything which will take you 2 weeks to complete and in the end 99% of the components are used by only one other component. Did you gain anything by that? It might be you actually made things more complex than the situation you had before you componentized everything. Who has to deal with that extra complexity? The people who gave you advice in a generic "Use ABC with XYZ and everything shall be great" article, or you?
In the end, what matters is that you a) made a decision and b) documented the reason why you made that decision and didn't take one of the alternatives. If for example, after a year or two it turns out your decision wasn't the best, based on the knowledge you have at that moment, you can always check the design decision documentation you've made of the decision, and conclude you did make the right decision back then, but based on other, perhaps incomplete (compared to the situation after two years) knowledge/information. That's life. As my wise mother always says: "If you'd know everything up front, it's not hard anymore to get rich".
What I decided? For this particular case I chose the interface route with the if/else for Catalog in the Project method. Yes, it's perhaps not that pretty due to the if-statement but the alternative isn't that great either and based on the pro/con items of both alternatives, the interface route seems the best choice. In this context, at this moment, with the information at hand.
Should you do the same thing in the situation where you have to make the same decision? That's not a conclusion you should draw from this post, instead you should take my advice, make the pro/con list and decide for yourself what to do. It might be you make the same decision, it also might be you pick an alternative. That's ok, you have the pro/con list plus the reasoning to prove you made the right decision at that moment. That's what matters.
Yesterday I received the MVP award for C# again, thanks Microsoft! 
In July 2008 I started development on LLBLGen Pro v3's new designer. The first thing I realized was that I needed a good, solid, generic framework to base the new designer on, especially because v3 would introduce a new big feature: model-first entity model development. In short, model-first means that the user starts the designer and can build an entity model from scratch (so no meta-data available whatsoever) and create meta-data and mappings from that entity model, or modify an existing or reverse engineered model by adding new elements. So the user will edit, delete, and do other things which aren't based on any meta-data, but based on theory, thought processes and perhaps trial/error. In short: the user will make changes to a live model in memory and will try to undo and redo these changes during the process. Everywhere. Always. So undo/redo has to be present everywhere, and always in every situation. Removing an element, like an entity definition, should remove all its related and depending elements or at least make them update themselves and undo-ing that removal should restore the original state.
The framework I had in mind would need to be able to undo any edit action, any change. I also needed a new set of data-structures to store the entity model in. In v2.x of LLBLGen Pro, the entity model is stored in an 'enclosed way': if an entity E has a relationship with entity F, it has a relationship object R which is stored inside E (if F also has a relationship with E, there's a relationship object representing that relationship and which is stored in F). While this might be a natural way of storing object graphs (the graph edges are the references between the vertices), it leads to a problem: you can't reason over the entire model, as it always requires traversal of the object tree in a way where you need to dig through an object (e.g. the EntityDefinition instance which contains the instance of the entity relationship) to get to other elements. A graph object with vertices and edges (so the entities would be the vertices and the relationships would be the edges) would make it easier to reason over the model.
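To make the difference concrete, here's a tiny sketch of such a graph. The class names and the string-based vertices are assumptions made for this illustration, Algorithmia's real graph classes are far more general. The point is that all relationships live in one place, so a question like "which relationships become dangling if this entity goes away?" is a single query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical edge: a named relationship between two entities.
public class RelationshipEdge
{
    public readonly string Name, From, To;

    public RelationshipEdge(string name, string from, string to)
    {
        Name = name;
        From = from;
        To = to;
    }
}

public class EntityModelGraph
{
    public readonly HashSet<string> Vertices = new HashSet<string>();
    public readonly List<RelationshipEdge> Edges = new List<RelationshipEdge>();

    public void AddEdge(string name, string from, string to)
    {
        Vertices.Add(from);
        Vertices.Add(to);
        Edges.Add(new RelationshipEdge(name, from, to));
    }

    // Removes the vertex and every edge which would otherwise dangle.
    // Returns the removed edges so the caller can react to them.
    public List<RelationshipEdge> RemoveVertex(string vertex)
    {
        Vertices.Remove(vertex);
        var dangling = Edges.Where(e => e.From == vertex || e.To == vertex).ToList();
        foreach (var edge in dangling)
        {
            Edges.Remove(edge);
        }
        return dangling;
    }
}

public static class GraphDemo
{
    public static void Main()
    {
        var graph = new EntityModelGraph();
        graph.AddEdge("R", "Customer", "Order");
        graph.AddEdge("S", "Employee", "Order");

        var removed = graph.RemoveVertex("Customer");
        Console.WriteLine(removed.Single().Name);     // R
        Console.WriteLine(graph.Edges.Single().Name); // S
    }
}
```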
To be able to undo any change to a model, you need to have some kind of mechanism to perform the change in the first place and then simply revert the action the mechanism performed. This is solved with the Command pattern. In short, it describes a way to perform actions (the 'commands') onto a data-structure or other element and as you have described the action through a command, you can extend it to perform another action when the command action has to be 'undone' or better: rolled back. However, my v2.x code base of LLBLGen Pro doesn't use the command pattern to do its actions, as it never needed to: to get things done you call methods on objects which call other methods, set properties etc., the basic OO style of maintaining an object model in memory. Implementing everything through commands now seemed like a lot of work: imagine every property get/set action has to be done through commands so the change is undo-able, every method call made might change internal members and these changes have to be undoable as well.
Algorithmia
I decided to solve this properly and from the ground up so I started working on a separate project: Algorithmia. Algorithmia started as a .NET 3.5 class library I wrote in my spare time to learn .NET 3.5's new lambda stuff and which contained some well-known algorithms and data-structures which weren't in the .NET 3.5 BCL (or not implemented in a useful manner). So I implemented in-place sort algorithms (so these sort a data-structure in-place, not like Linq's OrderBy() methods which return a new enumerable) as extension methods, a couple of priority queues and heaps like a full Fibonacci Heap. Algorithmia seemed (and still is) perfect to add my general purpose algorithms and data-structures to, and the undo-redo algorithms and related classes are no exception.
After some long, deep thinking I realized I needed two fundamental things to meet LLBLGen Pro v3's requirements: a general purpose undo/redo mechanism and a set of data-structures, like a graph structure, which are undo/redo aware. The two separate areas should have one thing in common: undo/redo should be transparent to the user (the developer). By transparent I mean:
someObject.Name = "Some String";
where the property set action in the statement above should be undoable (and redo-able). The traditional command pattern approach would have forced one to write such a simple statement with a command, so the action (setting the property) would be undoable by setting it again to the original value. I wanted it solved differently so I didn't need to write command calls everywhere and I also could leverage databinding for example or events and other things built into the .NET framework.
Commands, Command queues and their overlord manager.
To understand what undo/redo really means and how complex it can get, let's look at an example. Say I have a graph with two entity definitions: Customer and Order, and a one to many relationship R between them. Furthermore I map a foreign key field onto R in Order (so it points to Customer's identifying fields, which happens to be CustomerID). All nice and dandy. I feel a bit bold today and I select Customer and hit the DEL key. Obviously, the Customer entity definition is deleted from the model. But that's not enough. To remove it from the model, I have to remove it from the graph and because I do that, I have a dangling relationship (R) which has to be removed from the graph as well. If R is removed, the foreign key field in Order also has to be removed as it's based on R. Pressing DEL sounds rather complex all of a sudden.
The traditional command pattern approach suggests that you issue the action to remove Customer from the entity model graph through a command however that immediately gives a problem: what has to happen to the actions which follow immediately after the removal, like the removal of R and the foreign key field? Do we have to add these commands to the command which did the removal of Customer from the graph or not? If we don't, undo-ing the removal of Customer doesn't automatically undo the follow up actions as well, as these seem to be unrelated. But if we do add these commands to the initial command, it will create a complex piece of code which is also immediately unmaintainable as it has to know about all things which could happen after we've removed Customer from the graph.
The Command
To undo an action, you can take several approaches. For example, you could use the transactional approach where you make changes to a temporary space and finalize it when you commit the transaction. Another approach is to read the initial state right before the Do action is performed and have Undo simply restore that state. I've taken the second approach as it is more flexible: there's no transaction to commit. A change is a change and it's final, however it's always undoable. What makes this easy is the introduction of lambdas in .NET 3.5. Do and Undo are simply lambdas. The command has support for lambdas which read the state before the Do lambda is called, and the Undo action simply passes the original state into the Undo lambda and the action is undone. There are various bells and whistles added to that of course, but that's the basic idea.
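The basic idea fits in a few lines. The sketch below is not Algorithmia's Command class. LambdaCommand and its constructor parameters are names I made up here to show the three lambdas at work: one to do, one to read the state and one to restore it.

```csharp
using System;

// Hypothetical sketch of a command built from lambdas.
public class LambdaCommand<TState>
{
    private readonly Action _doAction;
    private readonly Func<TState> _readState;
    private readonly Action<TState> _undoAction;
    private TState _originalState;
    private bool _executed;

    public LambdaCommand(Action doAction, Func<TState> readState, Action<TState> undoAction)
    {
        _doAction = doAction;
        _readState = readState;
        _undoAction = undoAction;
    }

    public void Do()
    {
        // Read the state right before the change is made.
        _originalState = _readState();
        _doAction();
        _executed = true;
    }

    public void Undo()
    {
        if (!_executed) return;
        // Undo is nothing more than handing the original state back.
        _undoAction(_originalState);
        _executed = false;
    }
}

public class Person
{
    public string Name = "Original";
}

public static class LambdaCommandDemo
{
    public static void Main()
    {
        var person = new Person();
        var command = new LambdaCommand<string>(
            () => person.Name = "Changed",
            () => person.Name,
            original => person.Name = original);

        command.Do();
        Console.WriteLine(person.Name); // Changed
        command.Undo();
        Console.WriteLine(person.Name); // Original
    }
}
```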
The Command Queue
To be able to manage when to undo what, commands are placed in stack-like data-structures: the last command placed into the data-structure is the first to undo. However, that's a 1-dimensional data-structure. In my example above, undo-ing the removal of Customer requires the undo-ing of the removal of R and the removal of the foreign key field. So I created a Command Queue. A Command Queue is internally a Linked List (with a more flexible implementation than BCL's as concatenating these Linked Lists takes O(1) as it should instead of BCL's LinkedList class) with a simple pointer where the last command is. Commands are placed in a Command Queue, one after the other. This gives the flexibility of undo-ing and redo-ing them by simply moving a pointer along the Linked List inside the Command Queue.
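A rough sketch of such a queue with a moving pointer follows. It uses the BCL's LinkedList<T> for brevity, where the text above describes a custom linked list. The class names are invented, and dropping the redo-able tail when a new command comes in is my assumption for this sketch, not something stated above.

```csharp
using System;
using System.Collections.Generic;

public class SimpleCommand
{
    public readonly Action Do;
    public readonly Action Undo;

    public SimpleCommand(Action doAction, Action undoAction)
    {
        Do = doAction;
        Undo = undoAction;
    }
}

public class PointerCommandQueue
{
    private readonly LinkedList<SimpleCommand> _commands = new LinkedList<SimpleCommand>();
    // Points at the last executed command, null when nothing is left to undo.
    private LinkedListNode<SimpleCommand> _current;

    public void EnqueueAndDo(SimpleCommand command)
    {
        // Assumption: commands which were undone can't be redone once a new one arrives.
        while (_commands.Last != _current)
        {
            _commands.RemoveLast();
        }
        _commands.AddLast(command);
        _current = _commands.Last;
        command.Do();
    }

    public bool Undo()
    {
        if (_current == null) return false;
        _current.Value.Undo();
        _current = _current.Previous; // move the pointer back
        return true;
    }

    public bool Redo()
    {
        var next = _current == null ? _commands.First : _current.Next;
        if (next == null) return false;
        next.Value.Do();
        _current = next; // move the pointer forward
        return true;
    }
}

public static class PointerQueueDemo
{
    public static void Main()
    {
        var items = new List<string>();
        var queue = new PointerCommandQueue();
        queue.EnqueueAndDo(new SimpleCommand(() => items.Add("a"), () => items.Remove("a")));
        queue.EnqueueAndDo(new SimpleCommand(() => items.Add("b"), () => items.Remove("b")));

        queue.Undo();
        queue.Undo();
        queue.Redo();
        Console.WriteLine(string.Join(", ", items)); // a
    }
}
```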
To be able to undo a command which spawned other commands, I placed a Command Queue inside every Command. This gives the advantage that when I undo a command, it first calls Undo on all commands in its own queue and then it performs the Undo of itself. Undo-ing a command inside its queue could mean that that command also will perform an Undo action on several other commands first. And here we have our multi-dimensional structure we needed for the situation of our example. However it of course gives another problem: how do we get all these commands neatly nested into each other without any hassle?
The Command Queue manager
I created a thread-safe singleton class which manages the command queues, the CommandQueueManager. This manager is fairly straightforward and it's the interface for the developer to undo/redo anything, to enqueue and execute commands and to keep everything working in the right order. There are some static helper methods on the Command class to easily enqueue itself, but in general the manager is the one to talk to (ain't that always the case?)
The bare-bones mechanism comes down to this: it has an active stack of Command queues and a command which comes in to be executed is simply placed in the command queue at the top of the stack. If a command is executed its queue is placed on top of this stack and every command that gets created while this command is executed is thus placed inside the queue of the command which originated it. When the command is done with its Do method, its queue is popped and the previous queue is now at the top of the stack, which can be the queue of the previous command or the main queue of the manager.
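The stack-of-queues mechanism, together with the nested queue inside every command, can be sketched as below. NestedCommand and QueueStackManager are simplified, made-up stand-ins for the real classes: there's no redo, no threading and no singleton, just the nesting and the undo order, played out on the Customer example from earlier.

```csharp
using System;
using System.Collections.Generic;

public class NestedCommand
{
    private readonly Action _do;
    private readonly Action _undo;
    // The command's own queue: everything spawned while it was executing.
    public readonly List<NestedCommand> Queue = new List<NestedCommand>();

    public NestedCommand(Action doAction, Action undoAction)
    {
        _do = doAction;
        _undo = undoAction;
    }

    public void Do() { _do(); }

    public void Undo()
    {
        // First undo the spawned commands, last one first, then this one.
        for (int i = Queue.Count - 1; i >= 0; i--) Queue[i].Undo();
        _undo();
    }
}

public class QueueStackManager
{
    private readonly List<NestedCommand> _mainQueue = new List<NestedCommand>();
    private readonly Stack<List<NestedCommand>> _activeQueues = new Stack<List<NestedCommand>>();

    public QueueStackManager() { _activeQueues.Push(_mainQueue); }

    public void EnqueueAndRun(NestedCommand command)
    {
        _activeQueues.Peek().Add(command);    // goes into the queue on top of the stack
        _activeQueues.Push(command.Queue);    // its own queue is now the active one
        try { command.Do(); }
        finally { _activeQueues.Pop(); }      // back to the previous queue
    }

    public void UndoLast()
    {
        if (_mainQueue.Count == 0) return;
        NestedCommand last = _mainQueue[_mainQueue.Count - 1];
        _mainQueue.RemoveAt(_mainQueue.Count - 1);
        last.Undo();
    }
}

public static class NestedUndoDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();
        var manager = new QueueStackManager();
        Func<string, Action, NestedCommand> make = (name, followUp) =>
            new NestedCommand(
                () => { log.Add("do " + name); if (followUp != null) followUp(); },
                () => log.Add("undo " + name));

        manager.EnqueueAndRun(make("delete Customer", () =>
        {
            manager.EnqueueAndRun(make("remove vertex Customer", null));
            manager.EnqueueAndRun(make("remove edge R", () =>
                manager.EnqueueAndRun(make("remove FK field", null))));
        }));
        manager.UndoLast();
        return log;
    }

    public static void Main()
    {
        foreach (var line in Run()) Console.WriteLine(line);
    }
}
```

Running it logs the four "do" lines in the order they were issued, followed by the undo of the FK field, the undo of R, the undo of the Customer vertex and finally the undo of the outer command.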
Scopes and threads
Singletons have the side-effect that there's just one instance at runtime, which is nice because that's the reason they're there. The downside is of course that multi-threaded applications have to deal with a shared resource and that's always a sign trouble is ahead if you're not careful. The manager is thread-safe, which means only one thread can queue and work with commands at any given time. Per thread there's also one stack, so different threads can't add commands to each other's command queues. In a way, per thread there's a unique scope. Such a scope consists of a command queue stack. It might be handy in some cases in single-threaded approaches as well: what if you want to create a boundary in which a user can undo/redo actions but when the user closes the form for example the actions are final? That requires a unique scope for that edit form. The Command Queue manager can deal with that, you simply ask for a scope with a new ID and you get it. If you ask for the scope of an ID which was already known, the scope of that ID becomes active.
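A bare-bones sketch of the scope idea, under the assumption that a scope is identified by a Guid and holds nothing more than a stack of undo actions. The class name and its methods are invented for this illustration, and the per-thread part is left out entirely.

```csharp
using System;
using System.Collections.Generic;

public class ScopedUndoStacks
{
    private readonly object _sync = new object();
    private readonly Dictionary<Guid, Stack<Action>> _scopes = new Dictionary<Guid, Stack<Action>>();
    private Guid _activeScope;

    public ScopedUndoStacks()
    {
        ActivateScope(Guid.Empty); // the main scope
    }

    // A new ID gives you a new scope, a known ID makes that scope active again.
    public void ActivateScope(Guid id)
    {
        lock (_sync)
        {
            if (!_scopes.ContainsKey(id)) _scopes[id] = new Stack<Action>();
            _activeScope = id;
        }
    }

    public void Run(Action doAction, Action undoAction)
    {
        lock (_sync) { _scopes[_activeScope].Push(undoAction); }
        doAction();
    }

    public bool UndoLast()
    {
        Action undo;
        lock (_sync)
        {
            var stack = _scopes[_activeScope];
            if (stack.Count == 0) return false;
            undo = stack.Pop();
        }
        undo();
        return true;
    }
}

public static class ScopeDemo
{
    public static void Main()
    {
        string title = "old title", formText = "old text";
        var scopes = new ScopedUndoStacks();
        scopes.Run(() => title = "new title", () => title = "old title");

        var formScope = Guid.NewGuid();
        scopes.ActivateScope(formScope);
        scopes.Run(() => formText = "new text", () => formText = "old text");
        scopes.UndoLast();                  // only reverts the edit made in the form
        Console.WriteLine(formText);        // old text
        Console.WriteLine(title);           // new title

        scopes.ActivateScope(Guid.Empty);   // back to the main scope
        scopes.UndoLast();
        Console.WriteLine(title);           // old title
    }
}
```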
Back to our example...
So let's go back to our Customer, Order, R and the foreign key field. The user selected Customer and pressed DEL. The UI controller calls into the main system and asks the Project to remove entity Customer. The Project then starts working, but what exactly does it have to do? Remember that I needed a graph to be undo/redo aware. The Entity Model is implemented using Algorithmia's graph class where entity definitions are vertices and relationships are edges (non-directed edges). The graph is undo-redo aware, it manages itself through commands. So removing the Customer entity definition from the Project is as simple as telling the graph to get rid of the Customer instance it has inside itself as a vertex. The UI controller called the single Project method through a command. That command's Command Queue was placed onto the stack of the current scope and its Do lambda was executed. All commands added to the Command Queue Manager will end up in this queue or in a nested queue, so undo-ing the removal will undo all these commands as well.
The graph removes the vertex Customer from itself through a command, which is placed inside the UI call command's queue. The graph notices a dangling edge, R. It removes it too, also through a command, and this command is also placed in that same queue. And now things start to get interesting: when R was removed from the graph, the graph called a method on the edge which raised the edge's event ElementRemoved. Is anyone listening to that? Yes, the object which is used inside the foreign key field inside Order. As the relationship has been removed (as signalled through the event), the foreign key field has no purpose anymore, and has to remove itself as well. As it is placed inside a command-aware list, called CommandifiedList<T>, it simply removes itself from its container, though that container does the actual removal through a command. That command ends up in... the queue of the removal of R, as that was the active command in progress and that command's queue is on the stack.
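The observer part of this chain can be sketched in isolation. The Relationship and ForeignKeyField classes below are simplified inventions for this illustration, and a plain List<T> stands in for CommandifiedList<T>, so the removal in this sketch is not done through a command and isn't undoable.

```csharp
using System;
using System.Collections.Generic;

public class Relationship
{
    public event EventHandler ElementRemoved;

    // Called by the graph after it has removed this edge.
    public void NotifyRemoved()
    {
        var handler = ElementRemoved;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}

public class ForeignKeyField
{
    public readonly string Name;

    public ForeignKeyField(string name, Relationship basedOn, List<ForeignKeyField> container)
    {
        Name = name;
        container.Add(this);
        // When the relationship goes away, the field has no purpose anymore
        // and removes itself from its container.
        basedOn.ElementRemoved += (sender, e) => container.Remove(this);
    }
}

public static class ObserverDemo
{
    public static void Main()
    {
        var r = new Relationship();
        var orderFields = new List<ForeignKeyField>();
        new ForeignKeyField("CustomerID", r, orderFields);
        Console.WriteLine(orderFields.Count); // 1

        r.NotifyRemoved();
        Console.WriteLine(orderFields.Count); // 0
    }
}
```

With a command-aware container, the Remove call inside the event handler would issue a command of its own, which is how it ends up in the queue of whatever command is active at that moment.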
So after all this, we have a nested set of commands which we can undo, in the right order, and also which we can redo, in the right order, without the complexity of requiring command creation everywhere, being aware of which command is spawned from where... none of that at all: it's straight-forward .NET code like you and I are used to write.
Undo-ing this Customer removal starts by calling the Undo method on that command. As that command contains a queue with two commands (removal of Customer from the graph and removal of R from the graph). It starts with the last command, which is the removal of R and calls Undo on that. The removal of R command has also commands in its queue, namely the removal of the foreign key field, and starts undo-ing that command first. This makes sure everything is played back in the right order.
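To make the nesting concrete, here is a minimal sketch of the idea in C#. The types SketchCommand and SketchCommandManager are hypothetical and are not Algorithmia's API; they only illustrate how a command that is executed while another command is running lands in that command's queue, and how Undo walks those queues back to front.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, NOT Algorithmia's API: a command with its own queue of spawned commands.
public sealed class SketchCommand
{
    private readonly Action _do;
    private readonly Action _undo;
    // Commands which were executed while this command's Do action was running.
    private readonly List<SketchCommand> _nested = new List<SketchCommand>();

    public SketchCommand(Action doAction, Action undoAction)
    {
        _do = doAction;
        _undo = undoAction;
    }

    public void AddNested(SketchCommand command) { _nested.Add(command); }

    public void Do(Stack<SketchCommand> activeCommands)
    {
        activeCommands.Push(this);   // this command's queue is now the current scope
        try { _do(); }
        finally { activeCommands.Pop(); }
    }

    public void Undo()
    {
        // Undo the spawned commands last-to-first, then undo this command's own action.
        for (int i = _nested.Count - 1; i >= 0; i--)
        {
            _nested[i].Undo();
        }
        _undo();
    }
}

public sealed class SketchCommandManager
{
    private readonly Stack<SketchCommand> _active = new Stack<SketchCommand>();
    private readonly Stack<SketchCommand> _history = new Stack<SketchCommand>();

    public void Execute(Action doAction, Action undoAction)
    {
        var command = new SketchCommand(doAction, undoAction);
        // A command issued while another one is running is stored in that command's queue.
        if (_active.Count > 0) { _active.Peek().AddNested(command); }
        else { _history.Push(command); }
        command.Do(_active);
    }

    public void UndoLast()
    {
        if (_history.Count > 0) { _history.Pop().Undo(); }
    }
}

public static class NestedUndoDemo
{
    // Replays the Customer / R / foreign key field scenario and returns the log of actions.
    public static List<string> Run()
    {
        var log = new List<string>();
        var manager = new SketchCommandManager();
        manager.Execute(
            () =>
            {
                manager.Execute(() => log.Add("do: remove Customer"), () => log.Add("undo: remove Customer"));
                manager.Execute(
                    () =>
                    {
                        log.Add("do: remove R");
                        manager.Execute(() => log.Add("do: remove FK field"), () => log.Add("undo: remove FK field"));
                    },
                    () => log.Add("undo: remove R"));
            },
            () => { });
        manager.UndoLast();
        return log;
    }
}
```

Calling NestedUndoDemo.Run() logs the three removals first, and then the undo entries in the order foreign key field, R, Customer. Redo is left out to keep the sketch short.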
But what about that simple property setter example we started with? Let's look at the logic behind that simple statement and how things are made transparently undoable.
The little worker class under the hood: CommandifiedMember
The following code snippet shows a simple test class used in some unit-tests for the command functionality:
public class HelperClass
{
    private enum HelperChangeType
    {
        Name
    }

    private readonly CommandifiedMember<string, HelperChangeType> _name;

    public HelperClass()
    {
        // create a new commandifiedmember instance and set the default value to empty string
        _name = new CommandifiedMember<string, HelperChangeType>("Name",
            HelperChangeType.Name, string.Empty);
    }

    public string Name
    {
        get { return _name.MemberValue; }
        set { _name.MemberValue = value; }
    }
}
To combine a lot of functionality around a single member which was needed in a lot of cases I created a class called CommandifiedMember. CommandifiedMember does a lot of things: it sets the value of the member using commands, so setting the value is undo-able. It checks whether the value to set is equal to the current value of the member, so it doesn't issue unnecessary commands. It raises events when the value changes so observers can subscribe on these changes and act accordingly. It has awareness of interfaces which might be implemented on values set as the member value. This is important in the case of the foreign key field of our example: if the identifying field's type changes, the foreign key field's type also changes. To be aware of that, it needs a signal from the identifying field it relates to. Simply changing the identifying field's type will raise an event which will end up in the foreign key field's member which notices this as it automatically subscribed to the event as it recognized it. The member then simply raises an event so the foreign key field notices this and can act accordingly. Similar to removing the relationship R for example: R is removed, so it raises an event that it's been removed. Observers, like CommandifiedMember instances which refer to it, can now act accordingly and set themselves to null or raise an event for example.
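A stripped-down sketch of the first three responsibilities (set through an undoable action, skip no-op sets, raise an event on change) could look like the code below. SketchMember<T> is a hypothetical stand-in which uses a plain stack of undo actions instead of a command queue manager; it isn't how CommandifiedMember is actually implemented.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the idea behind CommandifiedMember, NOT the real class.
public sealed class SketchMember<T>
{
    private T _value;
    private readonly Stack<Action> _undoStack;

    // Raised every time the value changes, also when a change is undone.
    public event EventHandler ValueChanged;

    public SketchMember(T initialValue, Stack<Action> undoStack)
    {
        _value = initialValue;
        _undoStack = undoStack;
    }

    public T MemberValue
    {
        get { return _value; }
        set
        {
            // Setting the same value again shouldn't produce a command.
            if (EqualityComparer<T>.Default.Equals(_value, value)) { return; }
            T previous = _value;
            Apply(value);
            // The undo action restores the previous value and raises the event again.
            _undoStack.Push(() => Apply(previous));
        }
    }

    private void Apply(T newValue)
    {
        _value = newValue;
        EventHandler handler = ValueChanged;
        if (handler != null) { handler(this, EventArgs.Empty); }
    }
}
```

Setting MemberValue to "Foo" twice pushes only one undo action, and invoking that action restores the old value and notifies the observers, which is the behavior the property setter example relies on.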
The code snippet above doesn't show it, but there's more built in: it is also IDataErrorInfo aware. This is done through an object which is pluggable into a CommandifiedMember and which is also part of Algorithmia, called ErrorContainer. CommandifiedMember is aware of validation and calls a virtual method before it continues to call the Do action. It takes care of logging the error in the ErrorContainer and if a correct value is accepted, it clears the error accordingly. The code snippet above also shows the usage of an enum which is used for the change-type specification. This is useful if you want to use undo/redo to its full potential and implement a lot of logic through events using the Observer pattern: HelperClass could sport an ElementChanged event which propagated the HelperChangeType to its subscribers, which could then easily determine what exactly changed in HelperClass without the necessity for a lot of events and also avoiding string-based approaches like INotifyPropertyChanged.
So with the CommandifiedMember in place, I can create the following, undo/redo aware code:
HelperClass h = new HelperClass();
h.Name = "Foo";
By setting the property, I indirectly create a command which sets the actual member, compile-time checked. I can undo this action by simply asking the Command Queue Manager to undo the last command. However, I'm not even aware that setting the property is an undo/redo aware affair, nor do I care. I simply write code like I used to do, without the hassle of creating commands to make sure things are undoable later on. With hand-written commands, missing one spot makes some things suddenly not undoable; with the CommandifiedMember, that's not possible. As it's transparent, I can bind the Name property of an instance of HelperClass to a control and have undo/redo awareness without writing any extra code: if the control sets the value of Name, it will be undoable. Of course, to make the control become aware of the fact that Name has been rolled back, I have to implement INotifyPropertyChanged on HelperClass, but that's pretty easy to do: I get an event when _name changes, so I can respond to the change with a simple event handler:
public class BindableHelperClass : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    private enum HelperChangeType
    {
        Name
    }

    private readonly CommandifiedMember<string, HelperChangeType> _name;

    public BindableHelperClass()
    {
        _name = new CommandifiedMember<string, HelperChangeType>("Name", HelperChangeType.Name, string.Empty);
        _name.ValueChanged += new EventHandler<MemberChangedEventArgs<HelperChangeType, string>>(_name_ValueChanged);
    }

    private void OnPropertyChanged(string propertyName)
    {
        if(this.PropertyChanged!=null)
        {
            this.PropertyChanged(this, new PropertyChangedEventArgs(propertyName));
        }
    }

    private void _name_ValueChanged(object sender,
        MemberChangedEventArgs<HelperChangeType, string> e)
    {
        switch(e.TypeOfChange)
        {
            case HelperChangeType.Name:
                OnPropertyChanged("Name");
                break;
        }
    }

    public string Name
    {
        get { return _name.MemberValue; }
        set { _name.MemberValue = value; }
    }
}
I introduced a switch for checking on the change type, which is a little overkill as there's just 1 member, but you get the idea. It's not really more code than one would write in the case of a simple normal class, however you get value checking, event raising, undo/redo etc. all for free. Binding the Name property of an instance of this class to a control, say a TextBox, will make it possible to edit this instance with undo/redo awareness.
So where does this 'Paradox' of the title come into play exactly? Well I think you now know enough information to understand the following example of it.
The Undo/Redo 'Paradox'
The Undo/Redo 'paradox', as I dubbed it (probably a bad name, so forgive me), is the contradiction between what the user thinks is being undone and what the system thinks the user wants undone. I've put 'paradox' in quotes as people sometimes call things a paradox while they clearly aren't one, and I'm not yet sure if this is a true paradox, though I have a feeling it unfortunately is.
I've created a real-life example of the paradox in the following screenshot. It's a screenshot of a part of the LLBLGen Pro v3 GUI (where I moved everything close together so it fits in a tiny area):
There's a lot of info in this tiny screenshot and I'll describe briefly what's important to understand the problem. The project shown is a dummy test project with a couple of random entities. At the left you'll see the Project Explorer which shows the groups, the entities, the value types and the typed lists (some elements are still not there, in case you're missing something: it's not done yet). At the right of the Project Explorer you see the editor for the Customer entity which is a subtype of Person, and below it a debug panel for the command queue manager where I can see which commands are in the queue and inside which other commands they're stored. As you can see, after I've loaded the project, I created a typed list called Test which spawned one command, the addition of a new item to a CommandifiedList. The arrow suggests it's the current command, so pressing cntrl-Z or clicking Undo in the toolbar will undo that command.
So, what's the problem? Well, it's at the top: I typed a space in the entity name and tabbed away from the textbox. The validator plugged into the CommandifiedMember kicked in and denied the value and reported an error: names can't have spaces. So the cursor stays in that textbox.
What will happen if I press cntrl-Z or click Undo? Will that undo the change I made inside the textbox by undoing the insertion of the space, or will it undo the last command it knows, creating the typed list?
The 'paradox' is that the system isn't aware of any command setting the Entity Name to an invalid value (as that would make the project become erroneous: what if I entered a name which is already taken?) however the user is. The textbox has a cntrl-Z mechanism, where pressing cntrl-Z will undo the changes in the textbox, which in the case above would remove the inserted space and everything would be normal. However, what does the user mean: local undo or global undo when issuing the undo command and when are local undo's all of a sudden global undo's?
In general: there's a global undo/redo system with a global access mechanism (cntrl-Z/cntrl-Y) and there are two different scopes in play: the local editor scope and the global model scope: issuing an Undo action raises the question: do you want to undo a local action which might not be propagated to the global model scope (e.g. the change hasn't been processed yet) or do you want to undo the last change at the global model scope level? This isn't an easy question to answer, as I hope to illustrate in the explanations below.
A perhaps more well-known example of this problem is the issue you run into with the Windows Forms designer and after that when you change code in the form class: after you've made some changes to a form in design view, you switch to the class and add some code, like a member declaration. Then press cntrl-Z a couple of times till you've undone all your changes to the code and you'll likely see a message box pop up which tells you that you can undo one last thing which can't be redone. Why is that?
It's the same issue: suddenly the local scope you were working in (the code editor) has no more commands to undo and pressing cntrl-Z again then raises the question: does the user want to undo more commands in the editor (though there aren't any left) or does the user want to undo things on a model /global scale, like the changes made to the design view of the form? That's unclear and can't be solved by the undo/redo system by itself: perhaps the user simply only wants to undo/redo the changes in the editor (like the textbox or code editor) and stop undo-ing commands if there aren't any left, in that scope. However, perhaps the user wanted to undo things on a global scale after all commands in the local scope are undone and to do that the user has to leave the editor to signal that the undo action is not for a local scope. This is of course confusing and unclear for a user as the user isn't aware of the length of command queues or even local / global scopes.
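To show why neither choice is obviously right, here's a small sketch of one possible routing policy, the one where the user has to leave the editor before undo applies to the global scope. UndoRouterSketch and its members are hypothetical names made up for this illustration; it isn't a solution to the 'paradox', it just makes the trade-off visible in code.

```csharp
using System;
using System.Collections.Generic;

public enum UndoTarget { Local, Global, Nothing }

// Hypothetical sketch of one routing policy for a shared undo key.
public sealed class UndoRouterSketch
{
    private readonly Stack<Action> _localUndo = new Stack<Action>();   // e.g. edits inside a textbox
    private readonly Stack<Action> _globalUndo = new Stack<Action>();  // commands on the model

    // True while the cursor is inside the local editor.
    public bool EditorHasFocus { get; set; }

    public void RecordLocal(Action undo) { _localUndo.Push(undo); }
    public void RecordGlobal(Action undo) { _globalUndo.Push(undo); }

    public UndoTarget Undo()
    {
        if (EditorHasFocus)
        {
            // Policy choice: inside the editor, undo never falls through to the model,
            // even when the editor has nothing left to undo.
            if (_localUndo.Count == 0) { return UndoTarget.Nothing; }
            _localUndo.Pop()();
            return UndoTarget.Local;
        }
        if (_globalUndo.Count == 0) { return UndoTarget.Nothing; }
        _globalUndo.Pop()();
        return UndoTarget.Global;
    }
}
```

With this policy a user who keeps pressing the undo key inside the textbox sees nothing happen once the local stack is empty, which is exactly the confusing behavior described above. Flipping the policy so it falls through to the global stack instead trades that for the risk of undoing a model change the user didn't mean to touch.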
In the specific situation of the screenshot above, there are a couple of obvious things which one might want to try to solve this paradox, like disabling the global undo/redo mechanism when an error occurs, however that doesn't solve the situation where I don't create an error but simply append a couple of characters to the name and then press cntrl-Z. One could think of introducing a scope used only for the textbox, but it then gets tricky to get rid of that scope once the value is indeed valid as that action has to be in the global scope to be able to be undone on a global scale (so I don't have to go back to the textbox to undo the name change). Another solution might be to store the invalid value in the model and simply use the mechanism available so pressing cntrl-Z will undo the change which caused the error. The downside is that if the user presses cntrl-S after the change, the erroneous value is saved which could cause a problem, for example if the file format is in an XML format and elements are referenced by name, so what happens if I specify a name which is already in use, which is an error trapped by the validator, however I still save the project?
I can't find a simple solution for this 'paradox', and I fear there isn't one either, but perhaps some solution pops up soon.
LLBLGen Pro v3.0 is slated for release later this summer/autumn, with support for LLBLGen Pro Runtime Framework, Entity Framework, NHibernate and Linq to Sql, and Algorithmia is shipped with LLBLGen Pro v3, very likely in sourcecode form and a flexible license so you can use it in your own applications as well.
(This post was originally written in Dutch, as it's about a Dutch user group meeting.)
In the Netherlands we have a number of user groups which organize meetings for developers at regular intervals. A new one has been added to that number: Devnology (http://www.devnology.nl). Devnology isn't so much aimed at holding meetings where one person gives a talk and the rest try not to doze off; it's more focused on discussion and interaction between developers, on working together on code, software engineering and other matters related to our profession. Devnology also isn't limited to .NET alone: other languages and platforms are just as welcome. After all, it's about software engineering and not about the latest tricks for a random MS product.
On April 1st Devnology holds its first meeting, a Code Fest: you get an assignment in advance and whoever creates the best implementation wins. The assignment this time is: program the Game of Life, a well-known concept, and the choice of platform and language is free. This setup seems familiar to me from the old days in the demoscene, where these kinds of compos were held at demo parties as well. The nice thing about these kinds of events is that on the one hand the competition does force you to do your best, and on the other hand the discussions with fellow developers on site always yield some new knowledge and info you can use.
The Game of Life assignment may at first glance look like a bit of a dead end, but you can approach this problem in so many ways that thinking about it, puzzling out which approach yields precisely that original solution, can bring you to ideas which benefit you as a software engineer, in your profession and thus in your daily work.
Devnology will also hold an Open Spaces meeting in June and I'm already looking forward to it.
I'll be there on April 1st. See you there!
Update: I've reworded a sentence as it was too vague. Sorry for that.
Here's a simple performance tip which can benefit you without doing any effort. Linq to Objects has two methods to combine two sequences together, both with different characteristics: Union() and Concat(). The difference in characteristics makes it possible to gain performance without doing anything difficult. Let's look at a simple example first:
Say we have two lists of integers: A: {1, 2, 3, 4} and B: {1, 2, 5, 6}. When using A.Union(B), a set union is executed, which results in { 1, 2, 3, 4, 5, 6}. When A.Concat(B) is used, the sequences are simply concatenated and { 1, 2, 3, 4, 1, 2, 5, 6} is the result. Pretty straightforward stuff. If you do not want duplicates in the second sequence to appear in the resulting sequence, Union() is necessary. However, in the case where it's impossible to have duplicates in the second sequence or you don't care if duplicates in the second sequence appear in the resulting sequence, Concat() is a better choice.
It seems obvious that Union() is more performance intensive than Concat(): Concat() simply makes sure the enumerator returned enumerates over the two sequences, while Union() has to keep track of the elements it has already returned so it can filter out duplicates. If your sequences have a lot of elements, using Union() will make the operation significantly slower.
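The example with A and B can be checked with a few lines of Linq to Objects. The class and method names below are made up for this sketch; Union() and Concat() are the standard System.Linq extension methods.

```csharp
using System.Linq;

public static class UnionVersusConcat
{
    public static readonly int[] A = { 1, 2, 3, 4 };
    public static readonly int[] B = { 1, 2, 5, 6 };

    // Set union: every distinct value once, in order of first appearance.
    // Gives 1, 2, 3, 4, 5, 6.
    public static int[] SetUnion()
    {
        return A.Union(B).ToArray();
    }

    // Plain concatenation: all of A followed by all of B, duplicates included.
    // Gives 1, 2, 3, 4, 1, 2, 5, 6.
    public static int[] Concatenation()
    {
        return A.Concat(B).ToArray();
    }
}
```

Note that Union() returns distinct elements over both sequences, so duplicates inside the first sequence are removed as well, not only the ones in the second.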
In the past 8 months I've written a lot of Linq to Objects queries and today I saw:
/// <summary>
/// Gets the entity mapping targets in this meta-data store
/// </summary>
/// <returns>all tables/views, ordered by catalogname/schemaname/tablename unioned with
/// all views ordered by catalogname/schemaname/viewname</returns>
internal IEnumerable<IEntityMapTargetElement> GetEntityMappingTargets()
{
    return from c in this.PopulatedCatalogs
           from s in c.Schemas
           from e in s.Tables.Cast<IEntityMapTargetElement>()
                      .Union(s.Views.Cast<IEntityMapTargetElement>())
           orderby c.CatalogName ascending, s.SchemaOwner ascending, e.Name ascending
           select e;
}
It turned out I happened to have used Union() in many cases in the code where two sequences had to be merged into one sequence, however it was impossible to have duplicates in the second sequences in these queries. Must be an old strain of SQL-itis, I think: "Oh I have two sets to combine to one set: UNION". However, in the query above, it's not possible to have duplicates in the second sequence: there aren't views in the set of Tables and vice versa. So this same query could be written with a Concat(), saving performance as the second set doesn't have to be filtered from duplicates.
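For illustration, this is what the same query looks like with Concat(). To keep the sketch self-contained, the catalog, schema, table and view classes below are simplified, hypothetical stand-ins for the real meta-data types; only the query itself mirrors the original.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-ins for the meta-data types used in the query above.
public interface IEntityMapTargetElement { string Name { get; } }
public sealed class Table : IEntityMapTargetElement { public string Name { get; set; } }
public sealed class View : IEntityMapTargetElement { public string Name { get; set; } }

public sealed class Schema
{
    public string SchemaOwner { get; set; }
    public List<Table> Tables { get; } = new List<Table>();
    public List<View> Views { get; } = new List<View>();
}

public sealed class Catalog
{
    public string CatalogName { get; set; }
    public List<Schema> Schemas { get; } = new List<Schema>();
}

public static class MappingTargets
{
    // Same query shape as above, but with Concat(): a table is never a view,
    // so there are no duplicates to filter out.
    public static IEnumerable<IEntityMapTargetElement> Get(IEnumerable<Catalog> populatedCatalogs)
    {
        return from c in populatedCatalogs
               from s in c.Schemas
               from e in s.Tables.Cast<IEntityMapTargetElement>()
                          .Concat(s.Views.Cast<IEntityMapTargetElement>())
               orderby c.CatalogName ascending, s.SchemaOwner ascending, e.Name ascending
               select e;
    }
}
```

The result is the same as with Union() as long as the two sequences really are disjoint, because the orderby clause decides the final order either way.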
If you too have the habit to use .Union() to combine sequences, pay attention to that second sequence: if it can't have duplicates (make sure it also doesn't contain duplicates in the future!), it's better to use Concat() instead of Union().
I just ran into a weird issue. During profiling I saw that controls on a form which was already closed were still reacting to events. I checked whether the Dispose() routine of the particular Form was called, but it wasn't. However, the Dispose() routine of other forms was called after it was closed, as in: immediately.
The difference between the two situations was that if I used Form.ShowDialog(parentForm), a call to Close() on the particular form didn't call Dispose. Checking the Form.Close() documentation describes this behavior:
The two conditions when a form is not disposed on Close is when (1) it is part of a multiple-document interface (MDI) application, and the form is not visible; and (2) you have displayed the form using ShowDialog. In these cases, you will need to call Dispose manually to mark all of the form's controls for garbage collection.
I never knew that. It's easy to overlook, as opening a form with Show() will result in a call to Dispose when Close() is called. Not calling Dispose (or better: wrap the Form usage in a using block) will lead to a memory leak and worse: could lead to hard-to-find bugs because event handlers aren't cleaned up.
So just in case you use ShowDialog() or ShowDialog(form) to show modal dialogs in winforms, be aware that you've to call Dispose() yourself.
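A minimal way to make sure that always happens is to wrap the dialog in a using block. DialogHelper and ShowModal are hypothetical names for this sketch; Form.ShowDialog(IWin32Window) and Dispose() are the regular Windows Forms members.

```csharp
using System;
using System.Windows.Forms;

public static class DialogHelper
{
    // Creates the dialog, shows it modally and always disposes it afterwards,
    // also when ShowDialog throws.
    public static DialogResult ShowModal(Func<Form> createDialog, IWin32Window owner)
    {
        using (Form dialog = createDialog())
        {
            return dialog.ShowDialog(owner);
        }
    }
}
```

Usage would be something like DialogHelper.ShowModal(() => new MyOptionsForm(), this), where MyOptionsForm is whatever form you want to show. If you need to read values from the dialog after it closes, do so inside the using block, before the form is disposed.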
Have you ever run into database tables with a field which is used to mark if a row has been 'deleted'? Probably. These fields are used to implement 'soft-deletes'. For the soft-delete impaired, a quick introduction. Soft-deletes are row deletes which are not really happening: instead of removing the row from the database table, a field in the row is set to a value which signals that the row is officially 'deleted'. Every SELECT statement on that table is then filtering on that field's value so only rows which aren't marked as 'deleted' are returned (as the deleted data is not there anymore, semantically).
If this sounds rather awkward, it is. However, there are people who insist on having soft-deletes instead of real deletes, because it allows them to go back in time, to look back at the data that was deleted, as all data is, well... , still there. A small group of those people even believes that soft-deletes allow them to roll back to deleted data, a kind of 'undo' facility.
The truth is, soft-deletes using status fields in rows is a bad solution to the real problem. Fortunately there are alternatives.
First let's look at why people would want soft-deletes. In general there are two reasons, which are already mentioned above: to be able to look at deleted data and to be able to roll back to deleted data. Let's discuss the second reason first: rolling back to deleted data.
Roll-back to deleted data is hard.
Let's use Northwind as our example database. Let's say we don't delete rows from that database, but flag them as 'deleted' using a new field, IsDeleted (bit), added to every table. If you want to roll-back a deleted Order row, it looks as simple as setting the 'IsDeleted' field to 0, right? Though, what if that Order row refers to a deleted Customer row using its CustomerID foreign key? For the RDBMS, it's OK, as the 'deleted' Customer row is still in the Customers table, it just has its IsDeleted field set to 1. However, executing a SELECT statement which fetches the just recovered Order row with its Customer row will run into a problem: the Customer isn't technically there: the mandatory IsDeleted filter prohibits that the Customer row is showing up. The only solution to this is to also recover the deleted Customer row. Order might have had OrderDetail rows as well, which requires the OrderDetail rows to be recovered as well.
In short, recovering a row is not what this is all about: it's about recovering a graph. Recovering graphs instead of table rows is much more complicated, due to the dependencies between the involved entity instances (rows). Writing roll-back code for this is therefore likely a complex affair, especially if you want to make sure the data-integrity of the working set is still 100% correct (so all rows involved in the recovered graph indeed have their IsDeleted flag set to 0 and are part of the working set). In the end you'll run into issues where rows have to be merged, similar to source-control systems (e.g. in the situation where a row becomes deleted several times in different graphs). So rolling back graphs is not likely going to be implemented in the average system, and it's therefore not the main reason for soft-deletes.
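The Order/Customer case can be simulated in memory with Linq to Objects. The row classes and method names below are hypothetical and stand in for real tables and queries; the point is only to show that flipping one flag isn't enough.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for soft-deleted Customers and Orders rows.
public sealed class CustomerRow
{
    public string CustomerId;
    public bool IsDeleted;
}

public sealed class OrderRow
{
    public int OrderId;
    public string CustomerId;
    public bool IsDeleted;
}

public static class SoftDeleteDemo
{
    // Every query has to repeat the IsDeleted filter for every table it touches.
    public static List<int> VisibleOrderIds(IEnumerable<OrderRow> orders, IEnumerable<CustomerRow> customers)
    {
        return (from o in orders
                where !o.IsDeleted
                join c in customers.Where(x => !x.IsDeleted) on o.CustomerId equals c.CustomerId
                select o.OrderId).ToList();
    }

    // Naive 'undelete': only the order's own flag is reset.
    public static void RestoreOrderOnly(OrderRow order)
    {
        order.IsDeleted = false;
    }

    // Graph-aware 'undelete': the customer the order depends on is restored as well.
    public static void RestoreOrderGraph(OrderRow order, IEnumerable<CustomerRow> customers)
    {
        order.IsDeleted = false;
        foreach (CustomerRow customer in customers.Where(c => c.CustomerId == order.CustomerId))
        {
            customer.IsDeleted = false;
        }
    }
}
```

After RestoreOrderOnly the order still doesn't show up in VisibleOrderIds, because its customer is filtered out. Only RestoreOrderGraph brings it back, and a real schema would need the same treatment for OrderDetail rows and every other dependency.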
On to the first reason, looking back at old data.
Old data is old for a reason.
When a system deletes rows from a table, it's hopefully done inside a transaction, so when something goes wrong, the delete is 'undone'. When the transaction completes, the delete is final, and the data is gone. If you ever would want to look at the old data (the data you just deleted) again, you can't, it's gone. Let's ignore that some people can't throw anything away ("I might want to look at it in 2 years, then what!? <wild panic>") and focus on what 'delete' actually means: it means that the data is considered 'no longer needed' and therefore removed from the working set. If the data was necessary after the delete, don't execute the delete in the first place, it's the same with your file system really.
However, 'data is no longer needed' doesn't mean it is totally worthless in all situations: there's a situation where old data could be useful, namely reports on the history of an entity, what happened when, etc. But when will those reports be run? Every 5 minutes? Will the average user of the system look at historic data all day long or work with the actual working set? The answer to these questions is likely the same: functionality consuming historic (deleted) data is rarely used compared to the functionality consuming working set (not deleted) data.
Let it be clear that it's sometimes required for legal reasons not to toss away data; however, in other situations the same requirement actually leads to different solutions: old email is stored in archives and not kept in the in-box. Is that solution useful for this situation as well?
Implications of using soft-deletes
So the main reasons why some people want this are clear, but what are the implications when soft-deletes are used? Below I've listed a few. I'm sure there are more, but I think this list is already convincing enough to look for another solution instead.
- Queries become very complex. Make no mistake, once you introduce a field to signal if a row has been deleted, you have to make sure every table reference is accompanied by a filter on that IsDeleted status field. Every table reference. If you forget one, it's over: your data is then officially not correct. Maintainability will become more cumbersome, and as most time on a software project is spent on maintenance, it's something which will hurt the project, hard.
- Queries become slower over time. Every 'deleted' row is still there, and for the database system the row is just an ordinary row like any other. This means that DML operations on the rows will become slower, but especially SELECT statements will become slower over time: the percentage of rows which are 'live' of the total number of rows in the table is getting smaller, as more and more rows will become 'deleted'. This baggage could hurt in the long run, especially in tables with a lot of inserts/deletes: the working set might stay the same (e.g. 10K rows) but the total set of rows grows every day so the total number of rows might be millions. When you have millions of rows in the table while the actual set of rows which are 'not deleted' is a small percentage of that, it will influence the performance of queries dramatically. Compare that to a table with the actual set of rows you've to work with.
- Using constraints (UC, FK) is impossible. Using a unique constraint (UC) is not really possible, as the RDBMS will take into account the rows which are 'deleted' as well. So a value might be unique for the rows in the working set, but for the total set of rows in the table it doesn't have to be unique and the update or insert fails. In our example above with Customer and Order, we've seen that foreign key constraints are not really working anymore either, as they don't protect us from 'deleting' a PK side while keeping the FK side: for the RDBMS, both rows are still there and everything is fine (which is not the case)
A better solution to these requirements
There's a better solution, and I've already mentioned it briefly: archiving. RDBMS's (I assume you're using a proper, professional, solid ACID compliant database, not some toy RDBMS) usually sport a system called triggers. Triggers are neat things: they get called when something happens. You can compare them to event handlers really: an event occurs (e.g. a row gets deleted from a table) and the trigger responsible for handling that event is called. Additionally, the trigger is called in the same transaction as the code which triggered the trigger. So all actions taken inside a trigger are rolled back when the transaction containing the code which triggered the trigger is rolled back.
If you look closely at the two main reasons for soft-deletes, you'll recognize that both can be satisfied with simply keeping the data around which is deleted by DELETE statements: looking back at old, deleted data is possible wherever the old data is located and rolling back deleted data is not going to be less complex when the data is located elsewhere, as the complexity is in the graph rollback mechanism, not the location of the data to rollback to.
This leads to the solution of an archiving database. An archiving database is a catalog (or schema if you wish) which contains the same table definitions as the real database, with perhaps no UC or FK constraints, as data integrity is implied by the data integrity of the source data (after all, it's just for archiving consistent data). Every table of which you want to keep deleted data around in the real database now gets a DELETE trigger which simply grabs the row(s) deleted and inserts them in the same table in the archiving database. If the transaction fails, the inserted rows in the archiving database roll back too; if the transaction succeeds, the data is successfully archived and still available. Additionally, you could add date/time fields to the rows in the archiving database to store exact dates and times when the row was deleted; the trigger can insert these values when the deleted row is inserted.
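As a sketch of what such a trigger could look like, the helper below holds a T-SQL (SQL Server) trigger definition for the Northwind Orders table and installs it over an open connection. The archive database name NorthwindArchive, the trigger name and the DeletedOn column are assumptions made for this example, and only three of the Orders columns are copied to keep it short.

```csharp
using System.Data;

public static class ArchiveTriggerSketch
{
    // T-SQL, SQL Server syntax. NorthwindArchive, the trigger name and the
    // DeletedOn column are assumptions for this sketch.
    public const string CreateTriggerSql = @"
CREATE TRIGGER dbo.trg_Orders_ArchiveOnDelete
ON dbo.Orders
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- 'deleted' is the pseudo-table with the rows removed by the DELETE statement.
    -- The insert runs in the same transaction as that DELETE statement.
    INSERT INTO NorthwindArchive.dbo.Orders (OrderID, CustomerID, OrderDate, DeletedOn)
    SELECT d.OrderID, d.CustomerID, d.OrderDate, SYSUTCDATETIME()
    FROM deleted AS d;
END";

    // Installs the trigger using an already opened connection to the real database.
    public static void Install(IDbConnection openConnection)
    {
        using (IDbCommand command = openConnection.CreateCommand())
        {
            command.CommandText = CreateTriggerSql;
            command.ExecuteNonQuery();
        }
    }
}
```

A real setup would copy all columns and get one such trigger per table you want to archive. Other databases have their own trigger syntax, so treat the T-SQL above as one possible flavor.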
This makes sure the data is still available, so the reasons why people want this are still met, though it doesn't pollute the working set for the application anymore, and the implications of soft-deletes are gone. The only thing you've to be sure of is that the triggers and the archive database are maintained together with the real database (so schema changes in the real database are applied to the archive database as well, or you could go overboard and add a new archive database!). However, that's a small price to pay compared to the overly complicated queries one has to write (even with O/R mappers) and work with, queries which also have to be maintained and documented for the length the application is in use. Using the triggers and the archive database, the application can be written normally, can be tested normally, and no data is thrown away. Ever. One could extend this system with an UPDATE trigger as well, so updates are also tracked, so value deletes on the field level could be tracked as well.
So do yourself a favor, next time someone tells you to use soft-deletes, discuss the implications and offer this alternative solution. Everyone will be better off: you, the customer, and the group of people who will maintain the system for the next 20 years.
To all my readers, and everyone else: I hope you all have a great, productive, healthy, awesome 2009!
For me personally, 2009 will be a big year with the release of LLBLGen Pro v3, which I think will be a serious milestone in my vision about how people should approach the generic problem of Data Access in software. It will be a while before I can show anything, but I'm sure it will be worth the wait. 