Choose .Concat() over .Union() if possible

Update: I've reworded a sentence as it was too vague. Sorry for that.

Here's a simple performance tip that costs you almost no effort. Linq to Objects has two methods for combining two sequences, each with different characteristics: Union() and Concat(). That difference makes it possible to gain performance without doing anything difficult. Let's look at a simple example first:

Say we have two lists of integers: A: {1, 2, 3, 4} and B: {1, 2, 5, 6}. When using A.Union(B), a set union is executed, which results in {1, 2, 3, 4, 5, 6}. When A.Concat(B) is used, the sequences are simply concatenated and {1, 2, 3, 4, 1, 2, 5, 6} is the result. Pretty straightforward stuff. If you do not want elements of the second sequence that already occur in the first to appear again in the resulting sequence, Union() is necessary. (Being a set operation, Union() also removes duplicates that occur within a single sequence.) However, if the second sequence cannot contain such duplicates, or you don't care whether they appear in the resulting sequence, Concat() is a better choice.
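To make the difference concrete, here is a minimal sketch of the example above as a small console program. The class name UnionVersusConcat is just an illustrative choice; the lists are the A and B from the text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UnionVersusConcat
{
    // The two lists from the example above.
    public static readonly List<int> A = new List<int> { 1, 2, 3, 4 };
    public static readonly List<int> B = new List<int> { 1, 2, 5, 6 };

    public static void Main()
    {
        // Set union: the 1 and 2 from B are dropped because A already produced them.
        Console.WriteLine(string.Join(", ", A.Union(B)));   // 1, 2, 3, 4, 5, 6

        // Plain concatenation: every element of A, then every element of B.
        Console.WriteLine(string.Join(", ", A.Concat(B)));  // 1, 2, 3, 4, 1, 2, 5, 6
    }
}
```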

It seems obvious that Union() is more performance intensive than Concat(): Concat() simply makes sure the returned enumerator enumerates over the two sequences one after the other, while Union() also has to filter out duplicates, which means keeping track of every element it has already returned. If your sequences have a lot of elements, using Union() can make the operation significantly slower.
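Where the extra work comes from can be sketched with two hand-written iterators. ConcatSketch and UnionSketch are hypothetical names, and this is a simplified illustration of the idea rather than the framework's actual source: the concatenating version only loops, while the union version has to remember every element it has already yielded.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OperatorSketch
{
    // Hypothetical stand-in for Concat(): no bookkeeping, just two loops.
    public static IEnumerable<T> ConcatSketch<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        foreach (T item in first) yield return item;
        foreach (T item in second) yield return item;
    }

    // Hypothetical stand-in for Union(): each element is checked against
    // a set of the elements that have already been returned.
    public static IEnumerable<T> UnionSketch<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        var seen = new HashSet<T>();
        foreach (T item in first)
            if (seen.Add(item)) yield return item;   // Add returns false for a duplicate
        foreach (T item in second)
            if (seen.Add(item)) yield return item;
    }

    public static void Main()
    {
        var a = new[] { 1, 2, 3, 4 };
        var b = new[] { 1, 2, 5, 6 };
        Console.WriteLine(string.Join(", ", ConcatSketch(a, b)));
        Console.WriteLine(string.Join(", ", UnionSketch(a, b)));
    }
}
```

The set lookup and the memory held by the set are costs that the concatenating version never pays, and they grow with the number of distinct elements.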

In the past 8 months I've written a lot of Linq to Objects queries and today I saw:

/// <summary>
/// Gets the entity mapping targets in this meta-data store
/// </summary>
/// <returns>all tables/views, ordered by catalogname/schemaname/tablename unioned with 
/// all views ordered by catalogname/schemaname/viewname</returns>
internal IEnumerable<IEntityMapTargetElement> GetEntityMappingTargets()
{
    return from c in this.PopulatedCatalogs
           from s in c.Schemas
           from e in s.Tables.Cast<IEntityMapTargetElement>()
                     .Union(s.Views.Cast<IEntityMapTargetElement>())
           orderby c.CatalogName ascending, s.SchemaOwner ascending, e.Name ascending
           select e;
}

It turned out I had used Union() in many places in the code where two sequences had to be merged into one, even though it was impossible for the second sequence to contain duplicates in these queries. Must be an old strain of SQL-itis, I think: "Oh, I have two sets to combine into one set: UNION". However, in the query above, duplicates are not possible: there aren't any views in the set of Tables and vice versa. So this same query could be written with Concat(), which saves the work of filtering out duplicates that can never occur.
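As a rough sketch of that rewrite, the program below runs the same kind of query with Concat(). The types are hypothetical, stripped-down stand-ins for the real meta-data classes (only IEntityMapTargetElement, Tables, Views, SchemaOwner and Name come from the query above), the catalog level is left out for brevity, and the schemas are passed in as a parameter so the example runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal stand-ins for the meta-data types in the query above.
public interface IEntityMapTargetElement { string Name { get; } }
public class Table : IEntityMapTargetElement { public string Name { get; set; } }
public class View : IEntityMapTargetElement { public string Name { get; set; } }

public class Schema
{
    public string SchemaOwner { get; set; }
    public List<Table> Tables = new List<Table>();
    public List<View> Views = new List<View>();
}

public static class ConcatQuery
{
    public static IEnumerable<IEntityMapTargetElement> GetEntityMappingTargets(IEnumerable<Schema> schemas)
    {
        // A table is never a view, so nothing needs to be filtered: Concat() is enough.
        return from s in schemas
               from e in s.Tables.Cast<IEntityMapTargetElement>()
                         .Concat(s.Views.Cast<IEntityMapTargetElement>())
               orderby s.SchemaOwner ascending, e.Name ascending
               select e;
    }

    public static void Main()
    {
        var schemas = new List<Schema>
        {
            new Schema
            {
                SchemaOwner = "dbo",
                Tables = { new Table { Name = "Orders" }, new Table { Name = "Customers" } },
                Views = { new View { Name = "Invoices" } }
            }
        };

        // Prints Customers, Invoices, Orders (one per line).
        foreach (var e in GetEntityMappingTargets(schemas))
            Console.WriteLine(e.Name);
    }
}
```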

If you too have the habit of using .Union() to combine sequences, pay attention to that second sequence: if it can't contain duplicates (and make sure it won't in the future either!), it's better to use Concat() instead of Union().
