Recently I received an email with the following contents:
I recently started working at a startup. I'm learning lots of cool stuff, including unit testing, but as a math major something kinda nagged at me about them. I don't know if I could write a unit test that I really trust all that much. It's better than manually poking my code to be sure, but I felt there might be a better way. Fast forward to an article posted in ycombinator which gave me the knowledge that there are actually people who prove their code correct. I stumbled upon a blog you wrote, in 2009 and have found a few more from around the same time, but I was wondering if you were familiar with any literature on best practices, application, etc. I have some guesses as to how I personally would start to go about attempting proving code correctness, but I find it useful to have books written by people who implement this practice in their day to day work lives. Thank you for your time
Proving code correctness of a whole system is hard. It's so hard in fact that for a large system it's undoable. So should one give up on that altogether? Well... no not exactly, as there's a different approach to it: do it in two steps. It's based on the idea that a unit test actually tests at least two things, not one:
- A unit test tests whether the algorithm implemented functions correctly with the input specified
- A unit test tests whether the implementation of the algorithm is correct.
If a test fails, it can be because there's a bug in the code (e.g. the code uses the wrong index variable) or a bug in the algorithm (e.g. it fails to reach the desired outcome with a given input set) or both (and then you're in real trouble ;)). Because if a test fails it can mean more than one thing, it's good to weed out all possibilities but one: that way, when a test or verification fails, we know it's always because of the same reason and never because of another. So if we for example eliminate errors in the algorithm, we can be sure that if a test fails, the implementation of the algorithm is borked, the algorithm itself is OK.
The 'easiest' thing to eliminate from your tests is algorithm correctness testing. I say 'easiest', because formulating an algorithm is often easier than writing it down in an executable form, as that comes down to projecting it after you formally written it out in your head. To prove an algorithm to be correct is a task which is ranging from trivial to extremely difficult and it's part of what's called Formal Verification. Formal verification is, like its name suggests, rather formal, so if you're not into math that much, it can be daunting. So I'll describe a different approach which, in my humble opinion, is rooted onto the same ideas as formal verification, albeit not using formal math.
The overall idea is this:
- For a given feature, a specification is written (be it a user story or a thick, 120 page document written by an over-paid consultant) which describes what the feature should do with what input.
- To be able to realize the feature as stated in the specification, algorithms and data-structures on which they operate are designed. No coding is done yet. You'll see that during these first two steps, changes happen often and it's very easy to apply them, much easier than when altering code.
- Using verification techniques, the algorithms are proven to be correct for the expected input.
- Using Program derivation the algorithms are implemented into an executable form. As Program derivation can be very formal, it's often the case a developer falls back to Design by Contract or a less formal way to transform an algorithm into executable form.
The core idea is that if the algorithm is correct, a carefully constructed implementation of it as executable form is therefore correct too. At least for the algorithm part. The implementation of the algorithm can still contain bugs: wrong statements, wrong variable usage, boundaries of used constructs aren't taken into account etc.
One thing that is crucial is that testing and proving are both after the same goal: to make you realize whether what you've written is doing what it should (namely what the spec says!) or not. Using proving techniques is less time consuming in many cases as you can reason about all possible situations in one go, while with unit tests, you test for a subset of situations (the one implemented with the test) which can result in many many tests if there are many different situations the code has to work correctly with. On top of that, unit tests without algorithm proving can still fail for two reasons (algorithm error or code error) instead of one (code error).
How I use this in practice
I'm not mathematician, so I am not using math to prove whether my algorithms are correct. I use pre-/post conditions per step and formal algorithm descriptions to write down the algorithms. I also use per feature a description about what the feature has to do with what input. After all, if there's no description what a feature should do with what input, there's no way to say whether the code written does what it should do, as there's no description of what it should do in the first place.
On a whiteboard I use pre/post condition proving of the algorithm and change it if necessary. If I'm satisfied, I write down the algorithm steps in comments in the code and implement the data-structures required, or look for existing data-structures whether they fit the algorithm. After that I implement each step, either in-place where the comment is placed or in its own method/class.
When I'm done, and the compiler thinks I'm done too, I go back to what I've written and start reading the code again. This is something that should be taken very seriously: humans suck at reading code, so when you read code, it's essential you pay extra attention to what everything does, what the state of things is, etc. It helps to run the code through a debugger if you're not skilled in reading code (developers often 'glance' over code, not read it) to see what it does with each step.
While reading the code, I match the code parts of each step with the algorithm step it implements, check whether pre-/post conditions are indeed taken into account. In short this means: did I implement the algorithm step as intended or did I cut corners? Remember, this verification isn't done to check whether the algorithm works or not! It's meant to see whether the implementation of the algorithm is done correctly. We already proved the algorithm to be correct before we started typing one line of code.
It's important not to get cocky here: it's nice to show off your OO skills and how clever you can intertwine these silly loops, but doing so should never ruin the connection between algorithm description and code itself: it's very important for a person who reads the code to know what bizarre algorithm it implements, so changes can be made accordingly if the algorithm changes after a while.
At this point I have:
- A description of the feature, which is the basis of proving whether the code implementing the feature does what it should
- A series of algorithm descriptions, which are proven to be correct.
- Code which implements the algorithms for the feature
- Data-structures which are needed for the algorithms.
And now I run the code to see what I read and thought to be correct is actually also correct. Think of this as integration testing, as the code is ran in-system, with all the other code. Running the code can mean running a few unit tests, if the feature is a subsystem which is UI-less or which is used later on by other subsystems. The unit tests are focused on what the specification describes as valid inputs (so you also know what invalid inputs are).
"So your code is bug-free?"
In short: not always ;). As all these steps take time, it's sometimes not possible to do every step in the time it requires but has to be done in the time that's available. This leads to some corners being cut for things which seem trivial, but later on turn out to be less trivial. However what I see in our code base is that the parts on average are relatively bug free, and that with a minimum of unit tests. Our runtime test base is only a few thousand unit tests, which is rather small for a code base of about 100K lines. The designer, which is also over 100K lines has only 50 unit tests for core sub systems and has had only 25 bug fixes in the past year. We achieved this by using a solid foundation with proven code, namely our Algorithmia library (open source algorithm and data-structures library).
There are more aspects about making your code bug-free: separation of logic/code so changing one part doesn't affect everything else, clean code so it's easy to understand and maintain, logical placement of functionality, so you can easily find things back and see what affects what as understanding a code base makes making changes easier and less error prone.
There are always a chance that you break something when you make a change. For this, unit tests can help (but are not a way to be 100% certain!): write integration tests. You can design the tests specifically for detecting breaking changes. A lot of our runtime unit tests are designed for just that: fail if a sub-system got changed in such a way it affects the consumers of that sub-system.
Conclusion
I hope to have given some insight in how to write more solid code, namely by starting with a solid root where the code is coming from: what should it do? what algorithms are used and are these correct? Are the algorithms implemented correctly?
Many algorithms have already been proven, so you can pick these up and implement them right away (if someone hasn't done that for you as well). After all, if what you have to implement sucks, your program will never work correctly, even if the code doesn't contain a single programmer error. You can't avoid all mistakes, but it's doable to make the amount of bugs small.
We have released v1.0 of our new product, ORM Profiler! ORM Profiler is a tool for data-access analysis, a must-have if you really want to know what's going on inside your application when it accesses the database. See this earlier blogpost for a more detailed overview or visit the website and grab the trial to test it out :)
Introduction price is €99, LLBLGen Pro customers pay €69.
Thanks to all the beta-testers and other people for their support, time and effort to make this a great release!
To celebrate today's release of ORM Profiler has been released today we lowered the LLBLGen Pro license price for the month October (starting today ;)) with €100 to €199!
Get it while it's hot! :)
Normally I don't use this blog to post short messages with a link to other blogposts, but as this blogpost is worth reading, I make an exceptions ;)
Matt Cowan has written an in-depth post about using LLBLGen Pro with Entity Framework to create a repository using system. Highly recommended for every LLBLGen Pro and/or Entity Framework user!
The past few months, I've spent developing (together with Walaa Atef) a general ORM / Data-access profiler for .NET, simply called ORM Profiler. (How's that for a pronounceable name, eh? ;)) As we're now feature complete and about to start the beta-cycle, we're looking for beta-testers. To help you make the decision to test ORM Profiler out, we'll give every beta-tester who is accepted a free full license for ORM Profiler! We're currently looking for 50 beta-testers: 25 LLBLGen Pro users and 25 users of other ORM frameworks/data-access layers. Please read below how to get accepted to the beta and whether your ORM / data-access logic of choice is currently supported by ORM Profiler.
First, let me briefly describe what ORM Profiler does.
ORM Profiler profiles data-access code, which means it intercepts ADO.NET calls, profiles the activity and does analysis on the gathered data. To do so, it places itself between your ORM / data-access code and the DbProviderFactory instance(s) your application use(s). To enable profiling in your application's database activity, you call a single line to register the profiler's interceptor and from then on the ADO.NET activity and queries are logged and profiled.
The data gathered during profiling is stored in a snapshot, which can be filtered and analyzed using many filters and analysis functions. A visual client offers a handy UI for analysis and viewing the profiled data and for example allows you to pull query execution plans from the database.
So in short: It allows you to gain insight in what your data-access code is doing, find performance problems and quickly learn how to fix them.
I'll now briefly go into some questions I think will be asked a lot in the coming weeks. After that I'll describe the beta-process and how to get into the beta.
Short F.A.Q.:
- Will ORM Profiler be a free product?
No, ORM Profiler will be a commercial product, similar to our other product LLBLGen Pro. Introduction pricing will be as follows:
- LLBLGen Pro customers: €69.-
- Bundle with 1 new LLBLGen Pro license: €69.- (+ €299.- for LLBLGen Pro)
- Non-LLBLGen Pro customers: €99.-
Support is free, as well as upgrades within the v1.x version period. - When will ORM Profiler be released?
ORM Profiler will likely be released at the end of September 2011. - Which frameworks are currently supported?
LLBLGen Pro v3.x runtime framework, Entity Framework v1, Entity Framework v4.x, Massive, Dapper.NET, Linq to SQL (with a few lines of manual-coding). We're currently testing SubSonic and expect it to be supported as well. We also plan to test Microsoft.Data (coming with WebMatrix) before release. Any ORM / data-access layer which uses DbProviderFactory will work, as long as it doesn't expect a specific ADO.NET provider assembly reference. - Which databases are currently supported?
Any database supported by the supported O/R mapper frameworks / data-access layers, such as: MS Access, SQL Server, Oracle, PostgreSql, Firebird, IBM DB2, MySql, Sybase Adaptive Server Enterprise (ASE), Sybase SQL iAnywhere (ASA), SQLite, VistaDB, as well as any database accessed through ODBC or OleDB (via DbProviderFactory). - My application uses only stored procedures, can it profile my application?
As long as you call your stored procedures with code which uses DbProviderFactory to create connections and commands, it will profile the calls to stored procedures and other ADO.NET activity. - How can I profile an application?
You can profile an application through code (manually creating a snapshot and start recording), from the command line (e.g. from .cmd files) either interactive or non-interactive, and from the Visual UI. - Is enabling profiling in my application intrusive? Do I have to add calls all over the place?
No, you have to do just two things:
- reference the interceptor assembly from your main application
- call the interceptor initialization method.
That's it. (For Linq to Sql, you have to pass a connection object, which is a bit more intrusive. We hope to have this more streamlined in a future update) - Does ORM Profiler support multiple ORM frameworks / data-access layers in 1 application?
Yes. You can profile for example an Entity Framework using application which has optimized parts using Dapper.NET as if it uses one data-access layer. - Does ORM Profiler support production profiling?
Yes, you can switch on/off profiling of your application with 1 line of code at any time. So you can for example create a web-page in your admin part of your web application and enable / disable profiling at will by calling that line of code. If no application is listening to the named pipe the messages are send over, profiling application code has no effect, yet it's more efficient to switch it off completely with the built-in setting for this. - Will you continue developing LLBLGen Pro?
Absolutely :). V3.5 of LLBLGen Pro is planned to go beta at the end of the year. LLBLGen Pro is a mature product (it was first released in 2003), so development of new features slowed down over the years because of that, hence we decided to create an additional product in the data-access / ORM domain. - What .NET versions are supported by ORM Profiler?
.NET 3.5 and up. The application profiled has to be an application running on .NET 3.5 or higher. - Ok, I want to become a beta tester, but what do I get in return?
As a thank-you for your time spent on testing ORM Profiler v1.0, you'll get a free full license of ORM Profiler when ORM Profiler is released at the end of September 2011. If you get accepted as a beta-tester of course, see below :)
How to get into the beta
The beta period will last for 1 month. We will start the beta a.s.a.p. so if you can't use it in the coming weeks, you don't need to apply (sorry for that :)). We're looking for 50 beta testers: 25 registered LLBLGen Pro customers and 25 users of other supported data-access technologies / ORMs. Every beta-tester who is accepted will receive a free full license of ORM Profiler. To apply for one of the 50 available beta-test slots, please do the following:
Mail to support@ormprofiler.com with the subject "ORM Profiler beta-tester application" and in the email include your full name and if you're a registered LLBLGen Pro customer, please specify your customer id as well. Additionally, if possible, please describe a little bit about the situation you're going to test ORM Profiler in, e.g. a large web application with large databases or a small desktop application, a WCF service etc., so we get a picture about the situation so we can choose testers across the wide spectrum of applications.
If you're accepted, we'll email you back a.s.a.p. with the non-distributable beta license, the download URL and the information about how to provide feedback and get support during the beta period.
All email addresses as well as other information is treated confidential and will be used only for selecting beta-testers and to send you your licenses.
Ok, a little teaser
Below you see a screenshot which shows the visual UI and three of the views on the same data, which are kept in-sync when you click a query in one of them, so you can navigate through the three views to see the query in different contexts. Click the image for a full size version.
If you have questions, please ask them below in the comments, so I can answer them there as well. I hope you all are as excited about ORM Profiler as we are and we're looking forward to hearing from you!
Updated with fix of Microsoft's code so Microsoft can get this fixed quickly. See below
As you might know, we've been busy with our own data-access profiler for a while now. The profiler, which can profile any DbProviderFactory using data-access layer / ORM, works by overwriting the DbProviderFactories table for the app-domain the profiler is used in. This is a common technique: it replaces the type name of the DbProviderFactory of an ADO.NET provider with the type name of a wrapper factory, which receives the real factory as a generic type argument. Example: ProfilerDbProviderFactory<System.Data.SqlClient.SqlClientFactory>.
This is the same technique used by the Hibernating Rhino's profilers and others, and it has the benefit that it's very easy to use and has no intrusive side effect: you only have to add 1 line of code to your own application and everything in the complete application can be profiled.
This morning I was looking into how the stacktraces of code executed by MVC 3 looked like so I used an example application to get up and running quickly. It required Entity Framework v4.1 (for code first), so I grabbed the latest bits of Entity Framework v4.1, which is the update 1 version. Our tests on Entity Framework v4.0 worked fine, so I was pretty confident.
However, it failed. Inside the Entity Framework v4.1 update 1 assembly, it tried to obtain the factory type name from the DbProviderFactories table, and did some string voodoo on the name to obtain the assembly name. As it doesn't expect a generic type, it fails and it simply crashes. For the curious:
Method which fails (in EntityFramework.dll, v4.1.10715.0, downloaded this morning):
public static string GetProviderInvariantName(this DbConnection connection)
{
Type type = connection.GetType();
if (type == typeof(SqlConnection))
{
return "System.Data.SqlClient";
}
AssemblyName name = new AssemblyName(type.Assembly.FullName);
foreach (DataRow row in DbProviderFactories.GetFactoryClasses().Rows)
{
string str = (string) row[3];
AssemblyName name2 = new AssemblyName(str.Substring(str.IndexOf(',') + 1).Trim()); /// CRASH HERE
if ((string.Equals(name.Name, name2.Name, StringComparison.OrdinalIgnoreCase) && (name.Version.Major == name2.Version.Major)) && (name.Version.Minor == name2.Version.Minor))
{
return (string) row[2];
}
}
throw Error.ModelBuilder_ProviderNameNotFound(connection);
}
Stacktrace:
[FileLoadException: The given assembly name or codebase was invalid. (Exception from HRESULT: 0x80131047)]
System.Reflection.AssemblyName.nInit(RuntimeAssembly& assembly, Boolean forIntrospection, Boolean raiseResolveEvent) +0
System.Reflection.AssemblyName..ctor(String assemblyName) +80
System.Data.Entity.ModelConfiguration.Utilities.DbConnectionExtensions.GetProviderInvariantName(DbConnection connection) +349
System.Data.Entity.ModelConfiguration.Utilities.DbConnectionExtensions.GetProviderInfo(DbConnection connection, DbProviderManifest& providerManifest) +57
System.Data.Entity.DbModelBuilder.Build(DbConnection providerConnection) +159
System.Data.Entity.Internal.LazyInternalContext.CreateModel(LazyInternalContext internalContext) +61
System.Data.Entity.Internal.RetryLazy`2.GetValue(TInput input) +117
System.Data.Entity.Internal.LazyInternalContext.InitializeContext() +423
System.Data.Entity.Internal.InternalContext.GetEntitySetAndBaseTypeForType(Type entityType) +18
System.Data.Entity.Internal.Linq.InternalSet`1.Initialize() +63
System.Data.Entity.Internal.Linq.InternalSet`1.GetEnumerator() +15
System.Data.Entity.Infrastructure.DbQuery`1.System.Collections.Generic.IEnumerable.GetEnumerator() +40
System.Collections.Generic.List`1..ctor(IEnumerable`1 collection) +315
System.Linq.Enumerable.ToList(IEnumerable`1 source) +58
...
Mind you, this isn't a CTP. It's the real deal. Hibernating Rhino's blogged yesterday about this problem in v4.2 CTP1, and they added a temporary workaround, but in the end this situation actually sucks big time.
We're close to beta for our profiler, which supports (among all other DbProviderFactory using data-access code) LLBLGen Pro, Linq to Sql, Massive, Dapper and Entity Framework v1 and v4, but from the looks of it, not v4.1. In the many years we're now building tools for .NET, this is the biggest let-down Microsoft has given me: almost done with the release and now this...
Frankly I don't know what Microsoft is up to, but it sure as hell isn't helping the tool eco-system along, on the contrary. At the moment, I'm simply sad and angry... sad for hitting just another wall after all the work we've done and angry because it's so unnecessary.
Hopefully they fix this soon...
Update
I rewrote their code in a test to see if I could obtain what they want to obtain and still use the overwriting. It's easy, especially since they have access to the DbConnection.ProviderFactory property, which is internal, but not for Microsoft. My test below uses reflection, which they don't have to use. Hacked together, so not production readly code, but it serves the purpose of illustrating what could be done about it with little effort. The 'continue' in the catch is there because you can't recover from any exceptions at that point anyway (and most of them are originating from factories you can't load)
[Test]
public void GetProviderInvariantName()
{
var factory = DbProviderFactories.GetFactory("System.Data.SqlClient");
var connection = factory.CreateConnection();
Type type = connection.GetType();
AssemblyName name = new AssemblyName(type.Assembly.FullName);
var factories = DbProviderFactories.GetFactoryClasses();
string invariantName = string.Empty;
var dbProviderFactoryProperty = connection.GetType().GetProperty("DbProviderFactory", BindingFlags.NonPublic | BindingFlags.Instance);
foreach(DataRow row in factories.Rows)
{
try
{
var tableFactory = DbProviderFactories.GetFactory(row);
if(tableFactory.GetType()==dbProviderFactoryProperty.GetValue(connection, null).GetType())
{
// found it.
invariantName = (string)row[2];
break;
}
}
catch
{
continue;
}
}
Assert.AreEqual("System.Data.SqlClient", invariantName);
}
Last time I introduced LLBLGen Pro QuerySpec, a new fluent API for specifying queries for LLBLGen Pro. As promised I'll write a couple of blogposts about certain aspects of the new API and how it works. Today I'll kick off with the basics.
Two types of queries: EntityQuery<T> and DynamicQuery
There are two types of queries in QuerySpec: entity queries (specified with objects of type EntityQuery<T>, where T is the type of the entity to return) and dynamic queries, which are queries with a custom projection (specified with objects of type DynamicQuery or its typed variant DynamicQuery<T>). The difference between them is mainly visible in the number of actions you can specify on the query. For example, an entity query doesn't have a way to specify a group by, simply because fetching entities is about fetching rows from tables/views, not rows from a grouped set of data. Similarly, DynamicQuery doesn't have a way to specify an entity type filter, simply because it's about fetching a custom projection, not about fetching entities. This difference guides the user of the API with writing the final query: the set of actions to specify, e.g. Where, From, OrderBy etc., is within the scope of what the query will result in.
A custom projection is any projection (the list of elements in the 'SELECT' statement returned by a query) which isn't representing a known, mapped entity. This distinction between entity queries and dynamic queries might sound odd at first, but it will be straightforward once you've worked with a couple of queries.
QuerySpec queries are specifications, they're not executed when enumerated, in fact you can't enumerate them. To obtain the result-set, you have to execute the queries. I'll dig deeper in how to execute QuerySpec queries in a follow up post.
The starting point: the QueryFactory
To get started with a query, the user has to decide what kind of objects the query has to produce: entity objects or objects which contain the result of a custom projection. The question is rather easy if you formulate it like this: "Do I want to obtain one (or more) instances of a known entity type, or something else?". If the answer to that is: "One (or more) instances of a known entity type" the query you'll need is an EntityQuery<T>, in all other cases you'll need a DynamicQuery.
If you change your mind half-way writing your query, no worries: you can create a DynamicQuery from an EntityQuery<T> with the .Select(projection) method and can define a DynamicQuery to return entity class instances, so there's always a way to formulate what you want.
To create a query, we'll use a factory. This factory is a small generated class called QueryFactory. It's the starting point for all your queries in QuerySpec: to create a query, you need an instance of the QueryFactory:
var qf = new QueryFactory();
The QueryFactory instance offers a couple of helper methods and a property for each known entity type, which returns an EntityQuery<entitytype> instance, as well as a method to create a DynamicQuery instance.
We'll first focus on entity queries.
Formulating your query, getting started.
QuerySpec is defined to be close to SQL, so it contains the same building blocks for queries as SQL does. Not all elements are mandatory. For example, not all queries need ordering, a where filter etc. and e.g. entity queries have a fixed projection so you don't have to specify it.
- A projection which defines the result-set shape (the 'SELECT' list). Automatic for Entity Queries, specified with .Select(projection) for Dynamic Queries.
- A source set from which the result-set has to be obtained (the 'FROM' clause). Automatic for a query if a single source (entity) is used, otherwise required for both query types, specified with .From(join list).
- A filter on the source set to obtain a subset of the complete set (the 'WHERE' clause). Optional, specified with .Where(predicate).
- An ordering definition to return the result-set rows in a given order (the 'ORDER BY' clause). Optional, but recommended when a limit or paging directive is specified. Specified with .OrderBy(sort clauses).
- A grouping specification to group source set rows on a subset of fields (the 'GROUP BY' clause). Optional, only available in Dynamic Queries. Specified with .GroupBy(field list).
- A filter on the grouped set of rows after a grouping action (the 'HAVING' clause). Optional, only available in Dynamic Queries, and only used with queries with a grouping specification specified with .GroupBy(). Specified with .Having(expression list).
- A limit specification to limit the number of rows in the result-set (the 'TOP', 'LIMIT', 'FIRST' or equivalent clause). Optional, specified with .Limit(number).
- A specification which filters out duplicate rows in the result-set (the 'DISTINCT' clause). Optional. Automatic for entity queries, if the engine decides it's more efficient to use distinct. Specified with .Distinct().
- A specification to obtain a specific range of rows in the result-set (a Paging directive). Optional, specified with .Page(pageNumber, pageSize).
There are other specific elements per query type, e.g. prefetch paths for entity queries, a typed projection lambda mechanism for dynamic queries, but we'll get to that in a later post.
Entity Queries, getting started.
To create an entity query in its most simplest form, we can simply use one of the properties exposed by a QueryFactory instance. Say we want to formulate a query for Customer entities. The QueryFactory class, which is generated for our domain, exposes a property for each entity in the domain and each property returns an EntityQuery<entitytype>, so the Customer property returns an EntityQuery<CustomerEntity>:
var qf = new QueryFactory();
var q = qf.Customer;
the variable 'q' contains an EntityQuery<CustomerEntity> instance and can be used to fetch customer entity instances. As it doesn't contain any filter, executing it will fetch all customer instances. This is the base line: a query which specifies the complete set of all instances of a given entity type, in this case Customer entities.
In general, you'd want to specify a subset of this complete set. To do so, you specify a Where clause. For example, let's specify a query for all customer entities from the USA:
var qf = new QueryFactory();
var q = qf.Customer
.Where(CustomerFields.Country=="USA");
This query specification is similar to our first specification, but it now defines a filter on the complete set of 'Customer' entities. The .Where method accepts a predicate object, which can be a single predicate (e.g. field == value) or a predicate expression, which can be a combination of predicates concatenated with AND, OR, AND NOT, or OR NOT.
The mechanism to formulate the predicates is the native LLBLGen Pro query api mechanism, and which is documented extensively in the runtime framework documentation. The generated code base contains helper classes to specify a field of a given entity, like the CustomerFields.Country specification above, and the runtime framework uses operator overloading to formulate predicates under the hood. QuerySpec offers additional methods to help you formulate predicates, as shown in the following example which is equal to the one above:
var qf = new QueryFactory();
var q = qf.Customer
.Where(CustomerFields.Country.Equal("USA"));
For now we'll leave the predicate specification semantics and focus on the query definitions themselves.
Now that we can filter our set of entities, we can look at how we can do more advanced things. For example, let's specify a query for all Customer entities which have an order on file which was filed by employee with employee id '2'. This is a query which has a filter on a related entity. To do that we can use several mechanisms, one of them is a join. This comes down to: defining a large set with Customer and Order data (the source set), filter that data with a filter and obtain the result-set from that filtered set. In short: join customer with order, use a where clause on the order fields, select the customer fields:
var qf = new QueryFactory();
var q = qf.Customer
.From(QueryTarget.InnerJoin(qf.Order)
.On(CustomerFields.CustomerId==OrderFields.CustomerId))
.Where(OrderFields.EmployeeId==2);
We need to specify the source set to be a joined set between Customer and Order, so when filtering on the fields from Order in that set, we automatically filter the rows of Customer. To do that in SQL you'd use Customer INNER JOIN Order ON .... Here we do the same thing, we specify an INNER JOIN between Customer and Order in the From() call. As the query we're specifying it on is already a Customer query, we use QueryTarget instead of re-specifying qf.Customer. In case you're wondering why the .InnerJoin isn't defined on qf.Customer instead of the From() clause, it's because it introduced ambiguity which is explained more in detail in the document in the QuerySpec beta archive available to LLBLGen Pro v3 customers.
In a follow up post we'll go deeper into what's available in QuerySpec with respect to joins and filters.
Dynamic Queries, getting started.
After we've seen how to create entity queries, it's now time to show how to create Dynamic Queries. This is done roughly in the same way as entity queries:
var qf = new QueryFactory();
var q = qf.Create();
The variable 'q' now contains a DynamicQuery instance, but we can't really do much with it: it's completely empty. To specify a query, we have to specify a projection using the .Select() method. Let's specify a query for all country names available in the Customer set, with the additional requirement that duplicates have to be filtered out. LLBLGen Pro is clever enough that when you specify CustomerFields.Country, you're fetching from 'Customer' so you don't have to specify a source set.
var qf = new QueryFactory();
var q = qf.Create()
.Select(CustomerFields.Country)
.Distinct();
This query is an untyped query, the resultset will be a list of object arrays (with 1 value in each array). To make it a typed query, we have to specify a typed projection. In the untyped variant, 'q' is a DynamicQuery. In the typed variant, 'q' will be a DynamicQuery<T>, where 'T' is the type of the objects returned by the query when it's executed. To create a typed variant, we use a different overload of .Select(), namely the one which accepts a lambda:
var qf = new QueryFactory();
var q = qf.Create()
.Select(()=>CustomerFields.Country.ToValue<string>())
.Distinct();
The variable 'q' is now a DynamicQuery<string>, due to the lambda specification in the Select(). One other new element in the Select call is the .ToValue<T>() call. QuerySpec will parse the lambda specification in the Select() method and will transform the objects which have calls to known methods to elements for the projection and leave all other code alone. This allows you to formulate complex projection statements with code which isn't convertible to SQL statements. For each row to project into the resultset, the lambda is ran at runtime, executing all code you've specified there. In our simple example above, it simply returns the value for CustomerFields.Country.
Executing the query above will result in a List<string> with all country values from the Customer set.
Dynamic Queries can, similar to Entity Queries, contain definitions for larger source sets from which you project your final result-set. Let's say you want all CompanyName values for all customers who have an order on file filed by employee with employee id '2':
var qf = new QueryFactory();
var q = qf.Create()
.Select(()=>CustomerFields.CompanyName.ToValue<string>())
.From(qf.Customer.InnerJoin(qf.Order)
.On(CustomerFields.CustomerId == OrderFields.CustomerId))
.Where(OrderFields.EmployeeId==2)
.Distinct();
Here we first specify the projection, using a .Select() method call. We then specify the source-set by using a .From() method call and after that we specify the filter on that source-set by using the .Where() method call. To limit duplicates, we specify .Distinct().
There's much more possible than this tiny scratch of the surface. I'll go deeper into the possibilities of QuerySpec in a follow up post.
QuerySpec is a free add-on for LLBLGen Pro v3 customers and is currently in beta. The complete archive with sourcecode is downloadable from the customer area on our website.
In the past two months I've been busy creating a new query specification API for LLBLGen Pro. Our native query API is modeled after SQL statement fragments (like a 'predicate', a 'relationship', a 'field'), but specifying a query with it can be a little verbose, and above all: the code doesn't look like a query. Especially with complex queries and projections it can sometimes be tedious to grasp what the SQL will look like and what the query is doing. With LLBLGen Pro QuerySpec this changes: a fluent, compact, highly expressive API which allows you to write queries in the structure of the SQL it will produce and with the expressiveness of Linq.
LLBLGen Pro QuerySpec went into beta yesterday, and is a free add-on for LLBLGen Pro v3.0/v3.1 users and available in the customer area. In theory it should work for v2.6 users as well, but we didn't test that. In the next weeks I hope to write more blog posts about QuerySpec and its features.
But first, let's look into why QuerySpec saw the light of day. In 2008 we introduced LLBLGen Pro v2.6 with a full Linq provider: you could formulate queries in Linq which made things much easier than writing queries in our own native query API. In v3.0, released in 2010, we fine-tuned the Linq provider more to make it handle more edge cases. So why a new query API?
Linq, but simpler
I've spend roughly 10 months full time on our first version of the Linq provider. In that period I learned that there are two core issues with Linq when you use it for specifying a database query: one for the user writing the Linq query and one for the Linq provider developer. I'll briefly try to explain these problems.
The user's problem: no 1:1 mapping of Linq onto SQL.
To make a Linq query runnable on a relational database, it has to be transformed into SQL. This is problematic because Linq's construct in many cases don't map 1:1 onto SQL constructs. Examples are group join, group by, multiple aggregates in projections, where clauses inside join clauses etc. To be able to create SQL that works on the RDBMS, these constructs have to be transformed into other constructs which are then interpreted and transformed into SQL. The transformation of these elements is a complex affair as the intend of the Linq expression sub-tree has to be interpreted and that result has to be used to create the transformation result.
This can get complex rather quickly. An example is a group by with a nested aggregate in the projection. Linq allows that, SQL doesn't. See this example:
LinqMetaData metaData = new LinqMetaData(adapter);
var q = from o in metaData.Order
group o by o.Customer.Country into g
orderby g.Key
where g.Sum(n => n.OrderDetails.Count()) > 10
select new
{
Country = g.Key,
Num = g.Average(n => n.OrderDetails.Count(od => od.ProductId == 3))
};
Sounds simple? Let's look at the SQL it results in. (param values are inlined by the profiler for easy displaying the SQL, the query did use parameters when it was run)
SELECT [LPA_L1].[Country],
[LPA_L1].[LPAV_1] AS [Num]
FROM (SELECT [LPA_L3].[Country],
SUM([LPA_L3].[LPAV_]) AS [LPAV_],
AVG([LPA_L3].[LPAV_1]) AS [LPAV_1]
FROM (SELECT [LPA_L4].[Country],
[LPA_L4].[OrderId],
[LPA_L4].[LPAV_],
(SELECT COUNT(* ) AS [LPAV_]
FROM [Northwind].[dbo].[Order Details] [LPLA_7]
WHERE (([LPA_L4].[OrderId] = [LPLA_7].[OrderID])
AND ([LPLA_7].[ProductID] = 3 /* @p1 */))) AS [LPAV_1]
FROM (SELECT [LPA_L5].[Country],
[LPA_L6].[OrderID] AS [OrderId],
(SELECT COUNT(* ) AS [LPAV_]
FROM [Northwind].[dbo].[Order Details] [LPLA_4]
WHERE (([LPA_L6].[OrderID] = [LPLA_4].[OrderID]))) AS [LPAV_]
FROM ([Northwind].[dbo].[Customers] [LPA_L5]
RIGHT JOIN [Northwind].[dbo].[Orders] [LPA_L6]
ON [LPA_L5].[CustomerID] = [LPA_L6].[CustomerID])) [LPA_L4]) [LPA_L3]
GROUP BY [LPA_L3].[Country]) [LPA_L1]
WHERE ((((([LPA_L1].[LPAV_] > 10 /* @p3 */)))))
ORDER BY [LPA_L1].[Country] ASC
As you can see, this shows what I call folding: it folds a query into another one as the source to make sure SQL can deal with it. With every transformation comes change. This means that the user can't easily predict what the SQL might look like. The user might take a ballpark guess, but unless s/he has worked with the Linq provider for quite some time, it's hard to do. This is problematic because the application doesn't run in a void, it uses the database, likely quite heavily. This means that the database interaction has to be efficient. With all the transformations going on, this is very hard to optimize, from a developer's point of view and from a DBA's point of view.
QuerySpec overcomes this by offering a transformation free API. There are no transformations done, what you write is what you get.
The Linq provider developer problem: Scoping
A problem I had with the Linq provider was scoping. Due to the sequence oriented nature of Linq, scopes are hard to define, however they're essential for SQL: an alias used inside a sub-query isn't usable outside that sub-query. For the provider developer it becomes problematic to determine what will end up in which SQL scope to make sure aliases are correct. A Linq provider aliases all elements by itself, as the user doesn't specify any aliases. Due to transformations, these aliases change, and scopes change with them. This in itself is solvable with mapping tables, alias objects and scope objects. The problem becomes complex when the original source is hard to track down.
For example, when you specify 2 join statements in a linq query, the second join will return an element which joins the result of the first with the right operand of the second. What was the entity joined by the left operand's first property? You have to track that back. If a transformation took place, you have to make sure the mapping tables are kept up to date. This is all caused by the fact that Linq doesn't know any scoping as SQL does. It does have some scoping but it's incompatible with SQL, and making the two match was for me the biggest challenge when writing a Linq provider. Don't get fooled by the syntactic sugar C# and VB.NET offer you, that's not what the Linq provider sees, the linq provider sees calls to extension methods in a sequence.
In the end my code works on, I think, 99% of the scoping problems (and trust me, there are some very complex problems thinkable), but to get there, I had to write a lot of complex code with a lot of transformation logic in it. To understand all that takes time, and thus to make changes to it takes even longer.
QuerySpec doesn't have this problem: as it follows SQL's structure, scoping isn't a problem, aliases are specified by the user, and scopes are defined by the user as well, through the API's flow.
To have a solid alternative to Linq, we needed an API that could match it in expressiveness and readability and could offer what our Linq provider does too: nested queries in projections, complex query definitions in a single statement, function mappings and full LLBLGen Pro feature compatibility, like prefetch path support, exclusion/inclusion of fields etc.
It also had to have the flexibility of our native API: easily append query fragments in logic (e.g. append where clauses), write your own predicate classes or other methods to create queries for your domain really easily, and be compile time safe, so when a field or entity is renamed or removed, your queries should fail to compile so you can adjust the problem right away.
QuerySpec is built on top of our native API (as is our Linq provider), and embeds all that.
Quick tour of QuerySpec's features.
I'll now very briefly go into what QuerySpec offers. One of the main goals was to re-use what was already generated, and make the user use it in an elegant, simple fashion. Another goal was to keep things close to SQL: no transformations due to mismatches. We did add some helper constructs to make things more expressive, but if you don't want to use them, you don't have to. For example there's an Any() operator, but if you feel more confident with using WHEN EXISTS, you're perfectly fine by using it, it doesn't take any extra code.
QuerySpec starts with its own factory, the QueryFactory. The QueryFactory produces either DynamicQuery instances for ad-hoc projections, or EntityQuery<T> instances for queries which return one or more entity instances of type T. Let's look at a couple of examples.
Entity query: select orders from customers from the UK:
var qf = new QueryFactory();
var q = qf.Order
.From(QueryTarget.InnerJoin(OrderEntity.Relations.CustomerEntityUsingCustomerId))
.Where(CustomerFields.Country == "UK")
.OrderBy(CustomerFields.City.Descending());
This uses a join specified by a relationship which is already generated into the generated code. Advantage is that you don't need to know the FK - PK relationship. Disadvantage is that it doesn't look like a real join.
Alternative:
var qf = new QueryFactory();
var q = qf.Order
.From(QueryTarget.InnerJoin(qf.Customer)
.On(OrderFields.CustomerId==CustomerFields.CustomerId)
.Where(CustomerFields.Country == "UK")
.OrderBy(CustomerFields.City.Descending());
This uses a join using an ON clause, which accepts a predicate. I specified an INNER join, but I also could have specified a left or right join instead. No join .. into with a DefaultIfEmpty() construct which gets transformed under the hood into a left join.
Dynamic query: a group by query with a typed projection (so a List<T> will be returned) and a nested query in the projection, using an aliased set.
var qf = new QueryFactory();
var q = qf.Create()
.Select(() => new
{
Key = CustomerFields.Country.ToValue<string>(),
CustomersInCountry = qf.Customer.TargetAs("C")
.CorrelatedOver(CustomerFields.Country.Source("C") == CustomerFields.Country)
.ToResultset()
})
.GroupBy(CustomerFields.Country);
LLBLGen Pro doesn't require you to specify a From clause as it can figure that one out by itself. Every query has methods like Select, Where, From, Having, GroupBy, which represent a clause in the final SQL query. This is predictable and has a 1:1 projection onto SQL. As we're working with an O/R mapper and the entities it deals with, writing plain SQL isn't going to help: the entity might look different than the actual table it's mapped on, it might be a subtype in an inheritance hierarchy etc. etc.
Because there are no transformations and because scoping is clear, two of the more major problems with a Linq provider are solved. I'm sure some people will not like it and stick with Linq and our Linq provider. That's fine, and it was never our goal to replace it. We just wanted to provide a solid alternative for the people who want one, and I think our new API fits that requirement. Because it all uses the same core elements under the hood, you can write one query in one API and the other query in another API and run it on the same framework: more choice, more ways to solve a problem, so less problems for you, the developer.
LLBLGen Pro QuerySpec beta is now available for LLBLGen Pro v3.x users and is a free add-on for v3.x users and can be downloaded in the customer area of our website. Stay tuned for more QuerySpec examples and blogposts in the coming weeks!
With DevExpress' Seth Juarez I recorded a webinar / video (Length: 1:25:05) which shows the most important features of the LLBLGen Pro designer (roughly the first hour of the video) and how to use the LLBLGen Pro runtime framework with DevExpress' reporting tools using Linq to LLBLGen Pro and normal databinding. Enjoy! :)
Yesterday we released LLBLGen Pro v3.1! Version 3.1 comes with new features and enhancements, which I'll describe briefly below. v3.1 is a free upgrade for v3.x licensees.
What's new / changed?
Designer
- Extensible Import system. An extensible import system has been added to the designer to import project data from external sources. Importers are plug-ins which import project meta-data (like entity definitions, mappings and relational model data) from an external source into the loaded project. In v3.1, an importer plug-in for importing project elements from existing LLBLGen Pro v3.x project files has been included. You can use this importer to create source projects from which you import parts of models to build your actual project with.
- Model-only relationships. In v3.1, relationships of the type 1:1, m:1 and 1:n can be marked as model-only. A model-only relationship isn't required to have a backing foreign key constraint in the relational model data. They're ideal for projects which have to work with relational databases where changes can't always be made or some relationships can't be added to (e.g. the ones which are important for the entity model, but are not allowed to be added to the relational model for some reason).
- Custom field ordering. Although fields in an entity definition don't really have an ordering, it can be important for some situations to have the entity fields in a given order, e.g. when you use compound primary keys. Field ordering can be defined using a pop-up dialog which can be opened through various ways, e.g. inside the project explorer, model view and entity editor. It can also be set automatically during refreshes based on new settings.
- Command line relational model data refresher tool, CliRefresher.exe. The command line refresh tool shipped with v2.6 is now available for v3.1 as well
- Navigation enhancements in various designer elements. It's now easier to find elements like entities, typed views etc. in the project explorer from editors, to navigate to related entities in the project explorer by right clicking a relationship, navigate to the super-type in the project explorer when right-clicking an entity and navigate to the sub-type in the project explorer when right-clicking a sub-type node in the project explorer.
- Minor visual enhancements / tweaks
LLBLGen Pro Runtime Framework
- Entity creation is now up to 30% faster and takes 5% less memory. Creating an entity object has been optimized further by tweaks inside the framework to make instantiating an entity object up to 30% faster. It now also takes up to 5% less memory than in v3.0
- Prefetch Path node merging is now up to 20-25% faster. Setting entity references required the creation of a new relationship object. As this relationship object is always used internally it could be cached (as it's used for syncing only). This increases performance by 20-25% in the merging functionality.
- Entity fetches are now up to 20% faster. A large number of tweaks have been applied to make entity fetches up to 20% faster than in v3.0.
- Full WCF RIA support. It's now possible to use your LLBLGen Pro runtime framework powered domain layer in a WCF RIA application using the VS.NET tools for WCF RIA services. WCF RIA services is a Microsoft technology for .NET 4 and typically used within silverlight applications.
- SQL Server DQE compatibility level is now per instance. (Usable in Adapter). It's now possible to set the compatibility level of the SQL Server Dynamic Query Engine (DQE) per instance of the DQE instead of the global setting it was before. The global setting is still available and is used as the default value for the compatibility level per-instance. You can use this to switch between CE Desktop and normal SQL Server compatibility per DataAccessAdapter instance.
- Support for COUNT_BIG aggregate function (SQL Server specific). The aggregate function COUNT_BIG has been added to the list of available aggregate functions to be used in the framework.
- Minor changes / tweaks
I'm especially pleased with the import system, as that makes working with entity models a lot easier. The import system lets you import from another LLBLGen Pro v3 project any entity definition, mapping and / or meta-data like table definitions. This way you can build repository projects where you store model fragments, e.g. the building blocks for a customer-order system, a user credential model etc., any model you can think of. In most projects, you'll recognize that some parts of your new model look familiar. In these cases it would have been easier if you would have been able to import these parts from projects you had pre-created. With LLBLGen Pro v3.1 you can.
For example, say you have an Oracle schema called CRM which contains the bread 'n' butter customer-order-product kind of model. You create an entity model from that schema and save it in a project file. Now you start working on another project for another customer and you have to use SQL Server. You also start using model-first development, so develop the entity model from scratch as there's no existing database. As this customer also requires some CRM like entity model, you import the entities from your saved Oracle project into this new SQL Server targeting project. Because you don't work with Oracle this time, you don't import the relational meta-data, just the entities, their relationships and possibly their inheritance hierarchies, if any.
As they're now entities in your project you can change them a bit to match the new customer's requirements. This can save you a lot of time, because you can re-use pre-fab model fragments for new projects. In the example above there are no tables yet (as you work model first) so using the forward mapping capabilities of LLBLGen Pro v3 creates the tables, PK constraints, Unique Constraints and FK constraints for you.
This way you can build a nice repository of model fragments which you can re-use in new projects.
More Posts
Next page »