A new interview with me about everything O/R mapping and more is now live at .NET Rocks!
Enjoy!
Justin Etheredge posted a great article arguing that Design Up Front (DUF) is something other than Big Design Up Front (BDUF). He discusses the misunderstanding that, because BDUF is considered harmful in a lot of projects, DUF must be harmful too and one should just jump in, start hammering in code and hope for the best. As an example of BDUF, Justin takes a car manufacturer and asks himself why a car manufacturer uses BDUF (according to Justin) while we in software land often don't, or at least try to avoid it.
One of the arguments he uses is that software is easily changeable, or at least more easily changeable. I beg to differ. Sure, you can change text in an editor and use some refactoring tool to apply very low-level refactorings to your code, but that's similar to hammering on a drive train which doesn't fit until it does: it doesn't change a design. It changes the result of the design. One of the biggest problems in software land is that clients / customers think software is easy to change and that they can ask for changes right in the middle of a project or even at the end. Have you ever seen a large, almost completed building being transformed into something else, with the swimming pool moved from the basement to the top right before it was done? No, because every human knows that changing a building when it's finished is very hard to do.
But software has the same restrictions.
Software parts represent functionality. This functionality is consumed by other parts so those parts can deliver their functionality and so on. This is a functional dependency, not necessarily a technical one. I'm not talking about code, I'm talking about functionality and the design of the functionality elements inside the total system and subsystems, the data flows between them etc. Code is created from these functionality elements, so the programmer knows what to write and the architect knows what that piece of code does. That's Software Design 101, and for the hard-boiled 'code == design' visionaries: think again about why you wrote class ABC the way you wrote it and not differently.
It's these functional dependencies which make it hard to change a design after code has been written. The reason is simple: changing the functional elements often leads to a lot of changes in the code (which is nothing more than the executable form of the functional elements). Changes which aren't made from the code's Point of View (POV), but from the functionality's POV. This means that refactoring code isn't a technical matter anymore, but a functional one: the basis of the code, the sole reason the code is there has changed.
So, Justin has a good point in stressing the fact that DUF is required. What's left untouched is when a design becomes 'Big'. I think what's meant by BDUF is that the design is completely done up front and not changed. DUF is more of a design where the borders are known and the general goals are known, but the details in a lot of areas are filled in along the way. Two things are important to make this work:
- Know the goals of the project. A comment by Glenn Block has a good point: often the goals are unclear, so what a project should become is unclear, so doing design up front is a task which can't be completed as it's unclear what to design. I think that the acknowledgement of not having a clear set of goals should lead to a quest to get these goals defined properly before anything further is done. Remember: this doesn't mean you have to define details. It means you have to analyze and determine what the core problems are that the software is trying to solve and what the possible solutions to these problems are.
- Details shouldn't be able to change the design. To be able to fill in details during a project, you shouldn't run the risk of having a required detail forcing a significant change in the project, which for example could change a lot of the rest of the project already written.
Agilists will now say that by following Kent Beck's rules their software automatically evolves into something a customer / client wants. Yet the engineers at BMW, as described in Justin's example, are also able to deliver a car which works, drives well and which a lot of customers are willing to buy. If BDUF is so bad, why is BMW able to produce cars which work well, using BDUF? Justin uses the argument of 'costs' to illustrate that within software we simply can't do BDUF. I think costs have nothing to do with it, though. One of the reasons is that building cars also costs a lot of money, millions if not billions. Although BMWs cost a reasonable amount of money, small European cars are often very inexpensive and are built on the same principle. Having 1 billion in development costs is a big hurdle to overcome when a company wants to make money in the end.
If you consider the fact that large software projects also cost a lot of money (here in the Netherlands we have had some terrible government project failures recently which added up to more than a billion Euros), and in general are meant to bring other costs down, it's first-grade math to conclude that a software project has the same restrictions: to make a profit, costs shouldn't be too high. That costs shouldn't be too high isn't a reason why BDUF doesn't work, though: a failure is much more expensive. Having to throw away a lot of the work done because an important part of the design has to change is very costly, so the path followed by BMW and other car makers is simply meant to avoid having to do that. But isn't that also true in software land? Can we afford to throw away a lot of work because the design has to change? No.
So what's the secret? Why is BMW pulling it off with BDUF and we poor software developers aren't? It's simple: BMW actually doesn't use BDUF. They use roughly this route:
- Define the goals the car should meet: which category it should fit in (so they know their competitors and the target price), whether it should be a family car, a sports car etc.
- Make a rough design within boundaries which are well known. This has nothing to do with details like the thickness of the steering wheel. It has everything to do with the areas they have previous experience with: an engine up front with front wheel drive gives different characteristics than rear wheel drive, etc.
- Make a prototype. Often these prototypes are built with existing, proven parts, like the base of a different model. Subsystems which are new are filled in with surrogates.
- Test the prototype in a wind tunnel and with a computer model.
- Build the subsystems they need and design the assembly lines.
- Test the final version and tweak some things here and there, like car height etc. In this phase it's impossible to change a lot of the car, as that would make a lot of things change with it, effectively going back to an early stage in development. In software, this is the same thing.
Don't let the word prototype make you go blind and ditch this path as stupid or not applicable to software. The lesson to learn isn't in the word 'prototype'. The lesson to learn is in the second step and the third step: design within known boundaries and re-use proven parts. This isn't at the level of code, if we project this onto software development. It's at the level of functionality elements: if you need authentication, pick the authentication subsystem which worked the last time and use that and move on, as that's already solved. After you've decided that, you can check whether the implementation you have of that functionality still works or you have to refactor it. Though it's not at the level of 'I have this class, it will work'. That class represents functionality, that functionality should fit into the design. If not, don't use that class. A simple matter of what comes first and what follows.
I know, someone will come up with 'but... BMW is creating a product, we're creating software for a client who doesn't know what he wants/needs'. Think! You're not in a different position than BMW is. In fact, you're in a better position, because you know your client, your customer, you can even ask questions. BMW can't, they can only research what a potential customer might look for. Though if you think a person who buys a car is a person who knows what s/he wants and a person who buys software is a person who has no clue, you're wrong: both think they know what they want and both turn out to have no clue what they want. That's ok, that's why they hired you to analyse what they need and give that to them.
Yes, Software Engineering is hard, deal with it. Don't use the excuse that because goals are unknown, because the client changes his mind every day, the design therefore has to be done in the code editor. If those are your problems, solve them, deal with them. Because after all, software isn't written for the sake of writing software, it's written to solve a problem, to make things easier, more manageable, more controllable and more understandable. What your client needs should therefore fit those criteria, and it's your job to figure out if that's the case and in what form, not your client's.
I was looking for a reference in the ADO.NET Entity Framework documentation (via Google) on whether a Complex type in an EDM could be part of an association (relationship), like Hibernate supports. I needed this for some tool I'm working on. Google gave me an interesting link, namely to a patent held by MS about relationship sets in the EDM. It refers to other patents on similarly straightforward concepts, either based on stuff defined by Codd or defined by other O/R mapper frameworks like Hibernate or Toplink long before the filing took place. Why these common concepts are even patentable (as they're discoveries in math-space, so not really inventions) is beyond me.
I think it's great that people get credited for their work, after all they put in the hard effort. The problem with this kind of patent is that unless you patent the living daylights out of every line of code and every design you think of (which costs a lot of money), you'll sooner or later find that some company owns a patent on work you perhaps thought of first: you didn't file a patent request costing thousands of dollars for every design, so someone else did.
This is a big risk for software engineers out there. I know Microsoft in general doesn't use patents to bitch on competitors or companies who make stuff they also make. But some other company, e.g. a patent-trolling law firm which bought a stack of patents, might. It's good the EU forbade software patents for now and it's likely they'll never be valid here. Let's hope the US patent office follows suit.
Btw, 'Object relational mapping' is patented a lot of different times, often duplicates more or less. Oh, and the answer to my initial question is: no, complex types can't refer to entity types. (Which is expected, having a reference to an entity from a complex type (Value object in DDD) seems rather strange and a true edge case)
With almost bleeding ears I'm currently listening to show #369 of .NET Rocks!, which has Danny Simmons and Stephen Forte as guests. Danny is of course known for his major role in the Entity Framework (EF) design and Forte is one of the Council of Wise Men (TM) who are advising the EF team on how to make the EF a better product / system / whatever.
The quote in the title is one of many silly remarks you'll encounter while listening to the show. Let me start with the quote in the title of this post:
"The Entity Data Model is much bigger than just an ORM". -- Stephen Forte
Now, I know a thing or two about O/R mapping, O/R mapping tools and the like, but for the life of me, I can't understand what Mr. Forte means by the above quote. I mean: the EF allows you to define entities, map them to elements in a persistent storage, generate code from that to use these defined entities in your code and... that's it. If the Entity Data Model is so much bigger than an O/R mapper (ORM) (is that even possible? Isn't that comparing a model (declarations) with a toolkit (code)?), what else is there that I apparently must have missed? Ok, so you can use the model defined in the edmx file with other toolkits, big deal, I can do that in LLBLGen Pro as well: the whole model is available to you in an object graph which is accessible through any task performer class, so you can use it for whatever you want: emitting code, doing configurations, creating other projects, whatever comes into your imagination.
The two discuss this feature of the EF as if it's an achievement of the EF, but that's not true: it's the achievement of the consuming tools like Astoria and Dynamic Data that they can use an edmx file and the embedded model for their own services. Any O/R mapper which has a designer which lets you define a model and mappings has the same potential and the same feature; the only difference is that the EF is from Microsoft and Microsoft also produces the services which seem to work fine with the EF. If Microsoft put in the effort to make their tools drag-n-drop compatible with other O/R mappers out there, the EF would look like the tool it actually is: Yet Another O/R Mapper, one with one of the crappiest model designers ever made. I mean, if the EF is meant for serious applications bigger than the average Mickey Mouse website app, why is the EF shipping with a designer which forces you to have everything on one big canvas?
What's on the table with the EF, what's available to the developer, is simply an O/R mapper: it defines an abstraction above the tables/views in the database, though any O/R mapper is doing that. That's the sole purpose of an O/R mapper!
Sorry Danny and Microsoft, you can keep on trying to sell the EF as something much much bigger than an O/R mapper framework, but it won't help: walk, quack, duck etc. What strikes me as silly as well is that Microsoft tries so hard to make it not look like an O/R mapper framework, as if O/R mappers are bad and evil. "*Boooohh*, beware of the evil ORM! Quick, call EFMan to rescue us!"...
After speaking out the quote in the title, Forte seems to step on a block of soap as he slips into berserk mode with the all time classic:
"You can even build ORMs on top of the Entity Data Model" -- Stephen Forte
He then goes on to refer to Ideablade for already doing this. Sorry Stephen, but Ideablade's $2500 per seat (!) costing product isn't an O/R mapper. It's effectively an additional framework with features not a lot of people will ever need on top of the EF, using the EF as... its O/R mapper! He ends with the wish that the NHibernate developers would rip out the, and I quote: "Old Code", and instead build on top of the Entity Data Model. No I'm not making this up.
Why would Ayende and friends do that? What advantage does it have for the user, the application developer, that NHibernate would replace their O/R mapper logic with the EF? That would be a step backwards: all the flaws in the EF are then suddenly something you have to live with, and changing things in the O/R mapper core is hard for the NHibernate team because... they don't own the code, it's closed for them. It's similar for us: we won't replace our own O/R mapper framework with the EF, because that would mean we have to drop features like auditing, authorization, entity views, advanced eager loading etc. Porting some of that to the EF would make sense, to help out the ones who have to work with the EF because some CTO thought it would be wise to base everything on the EF. But replacing the O/R mapper logic with the EF makes no sense whatsoever. The outspoken wish coming from Forte clearly shows to me that Microsoft is struggling to get major support for its second O/R mapper framework.
With that latest remark, Forte makes his "EDM is much bigger than just an ORM" statement even more difficult to understand: if the EF can serve as the base for an O/R mapper, how can it be, and I quote: "much bigger", than an O/R mapper? I think I must have missed something plainly obvious which everyone else apparently understands seamlessly. However, if a person like me, who has spent almost every minute of the past 6 years on O/R mapper framework design and development, doesn't understand what's so incredibly special about the EF, how does Microsoft think it will convince the developers out there, who have spent perhaps a month of their life fiddling a bit with O/R mappers or no time at all, that the EF is the Silver Bullet for everything Data Access?
Here's how, and Forte says it himself: "Push it as a platform". But it's not a platform, it's an O/R mapper framework to work with data somewhere in a database. Windows is a platform, .NET is a platform, the EF isn't. Positioning it as a platform will pollute the minds of novices with the idea that the EF does much more than it really does, that it is more than what it really is. I mean: the features Danny mentions at the end as features for 'v2' are features which have often been available for years in major O/R mapper frameworks out there. If EF is a platform, or in the light of Microsoft's 'vision': the platform, for data-access, what are those O/R mapper frameworks which pack even more features, features which the EF clearly lacks at the moment? Super-platforms? Oh no, my mistake, those will of course still be 'just ORMs'!
I don't mind yet another O/R mapper framework on the market, even if it's from Microsoft: the more frameworks, the more people get interested in O/R mapping. What I do mind is that Microsoft tries to sell the idea that before the EF there wasn't any data-access framework out there which could do what the EF does, combined with a bundled release of the EF inside SP1 so every developer gets it installed by default. But I guess 'fairness' isn't something you should expect in business-land, so it's a given that this would happen eventually.
The solution to it is of course to cope with it and to come up with an answer which will make Microsoft's effort look like a pig with lipstick. Oh, and without the help of a Council of Wise Men. Because you know, it doesn't take a Council of Wise Men to create what you should be creating: you should create what you personally would like to use, what you as the architect and developer of the framework would like to see in that framework, because if that given feature wasn't there, you wouldn't use such a framework yourself. One doesn't need a Council of Wise Men nor a petition of angry ALT.NET-ers to get things going in the right direction: just build it for yourself. That does mean that writing such a framework can only succeed if you have already written the alternatives by hand yourself a couple of times. As Microsoft has done that a gazillion times internally (Sharepoint, CRM etc.), one can't deny that there is a group of people inside Microsoft who know what is needed and what isn't. For example, who cares if the EF isn't POCO? Does anyone out there really think that the people who are now in love with NHibernate will jump ship to the EF and embrace it? Why? Yegge is absolutely right in his post linked above. That's not to say you shouldn't build in features you wouldn't use yourself, of course you should. But chances are the set of features you have to build which aren't used by yourself is pretty small.
To me, it's a big failure to bypass this internal group of people who already know what to build and instead hire a group of Wise Men, who individually likely know what they're talking about in their own field and playground, but are so far away from the developer who has to use the framework created. I'm sure the Council will produce a solid, clear vision. I'm also pretty sure that vision is shared among a lot of 'architects' across the globe. But above all, I'm sure the Council's vision is not what the EF needs. What's even more troubling is that apparently the EF team doesn't have a strong vision themselves of what to build. How can that ever lead to a framework which does what it should do? Scott Ambler wrote his design documents almost a decade ago. Toplink (now open source) is more than 10 years old and often considered one of the best O/R mapper frameworks ever made. It's not as if the problems the EF team is trying to solve are new, nor that the solutions for these problems have been crappy at best, on the contrary. If after all these years, after all those solutions, effort, papers and debates, the EF team still needs external counseling, they need something else: an internet connection and a pair of glasses.
For the next major version of a certain application I'm working on (gee, what might that be) I'm researching some UI frameworks and techniques. In the past few months I've spent most of my time working on application support library code, language designs, algorithm design etc. (more on that in a later article) and I arrived at the point where I wanted to see how my vision for the major version would work in a draft application, just to see how the various elements would work together visually.
One of the first questions one would ask these days when a new desktop application is started is: WPF or Winforms? The current version is built with Winforms all the way, though it's tempting to go for WPF, as it's new, has nice performance and great looks (if you're able to cook up the styles). After a day or two of fiddling with the various WPF docking frameworks out there, there's one firm conclusion to be drawn: WPF isn't up to par with Winforms when it comes to serious applications which use a normal Windows look and feel. Automatic menu and button bar handling based on the selected docked window, for example, one of the cool features of many Winforms control libraries, is one of those things which is hard to do in WPF (at least, it's not directly available / built in). One other thing which made me draw the above conclusion is that WPF in general sucks big time when you have a normal Windows application with normal menus: the text is in general blurry (or at least blurry for a short period of time after a move/open) and making the menus look like normal menus, as in VS.NET, is a pain (it doesn't get close).
Because we will need a custom rendering system in this major version for some areas, we do need WPF. However, one can host a WPF control just fine in a winforms application, so re-using our already written winforms skeleton was a choice I didn't expect at first but which makes sense.
To Ribbon or Not To Ribbon
The second question one will ask is: should we use a ribbon-like menu system? After all, it's well known that good-looking applications are often chosen over the uglier competition: Looks Sell. So, a nice fancy ribbon menu in an application is tempting, right? After all, at first sight a ribbon-containing application looks better (although it doesn't have to be more productive / easier to use, that's another story). Almost every control package out there, WPF or Winforms, comes with a ribbon control. Now, there's a catch, and you likely already know it: Microsoft had the fancy idea that if you're going to use a ribbon UI, you have to sign a license with them.
Let's keep the usefulness of a ribbon UI (which IMHO is heavily overrated) aside for now and focus on this 'license'. There are two things in this license which made me decide not to sign it and to write this blog entry instead, to warn you not to sign it either. Disclaimer: I'm in Europe, and we think a bit differently about imaginary property than for example in the USA (like software patents and the like).
- It's unclear what a ribbon UI is
This might sound weird, but let's do a thought experiment. One of the paragraphs in the license says:
Your Licensed UI must comply with the Design Guidelines. If Microsoft notifies you that the Design Guidelines have been updated or that you are not complying with the Design Guidelines, you will make the necessary changes to comply as soon as you reasonably can, but no later than your next product release that is 6 months or more from the date you receive notice.
Let's say I create a UI, call it MyUI. MyUI uses a ribbon control. However, as it's MyUI of MyApp, I use it as I see fit, it's my application and my work after all. The control I use is from a 3rd party vendor, not MS. If MyUI doesn't comply with the Office UI license, is it then a UI which has to be licensed? After all, a normal button bar UI also doesn't comply with the ribbon license. That's perhaps obvious, but where is the line which makes a given UI not a ribbon UI? To define that, one has to define exactly what a ribbon UI is. Though if a given UI, MyUI isn't compliant with that definition, it can't be in violation of this Office UI license, as it's not, by definition, a ribbon UI, and therefore doesn't need a license. After all, a normal windows UI isn't a ribbon UI either, although several Office 2007 applications use exactly that UI (Outlook main window for example, no ribbon in sight)
Let's take another point of view. Let's say 5 other companies are also super proud of their UI work and also try to tie a developer's hands by forcing him to sign a license for these UIs. Perhaps you're not even aware of these companies' work and you've never heard of them. Still, your UI looks remarkably similar to their work, or at least to main parts of it. Are you now liable? Should you be seen as liable? I don't think so. For the people who think I'm paranoid: have you ever created a UI for an application which uses a 'Card' UI to display customer data, for example? Likely, as the 3rd party control package used had this fancy option and it looked great. But... who cooked up that UI concept? Unknown, yet that company or person might want to force a license on you similar to the Office UI license, simply because that person or that company is also proud of that Card UI concept. "That's stupid" you might say, and I agree, but why is this Office UI then different? Because it's owned by Microsoft? (Although I would like to see what exactly they 'own' in this case: a 'button bar' is definitely not something they invented, and for example many Winforms control packages have a system where you can define one set of commands which can be placed in a menu and on a button bar, equally.)
It's similar to the Office UI and the ribbon: for WPF, there aren't any button bar / toolbar controls available, just ribbon controls. You are of course aware of Office 2007, but say you're working in a company which still has Office 2003, and your application is a normal business application, not something which competes with Office 2007 at all. Why should you sign a license agreement with some company X while you purchased the ribbon control you're using from company Y and have never used, or hardly use, this 'UI' from X? And why should you sign a license agreement with company X only for that ribbon control, and not for the docking framework, the statusbar, the grid etc.?
Doesn't make any sense.
- The license contains traps
For an ISV, the 'I' in ISV is important: we're independent and want to stay that way. Only an independent software vendor can create software which is really worth using, as it's not tied to the train of thought of another company. However, the Office UI license contains restrictions with which you effectively sign your independence away to Microsoft: Microsoft decides if the UI of your application is 'compliant' with the license, and you have to update your UI to make it compliant or you're in violation of the license. You have at most 6 months for that, which means you can't decide when to make the changes: MS sets the deadlines. But that's not the biggest issue. The biggest traps are these:
c. Microsoft is the sole owner of the Microsoft IP. All goodwill arising from your use of the Microsoft trademark and trade dress rights granted to you in Section 2 will be for Microsoft’s benefit. The quality of your Licensed Products will be consistent with your other products, meet or exceed relevant industry standards and comply with all laws, rules and regulations that apply.
f. You will comply with all export laws that apply to the subject matter of this license.
As a European, I can't comply with rule f. The point is: the USA has different export laws for software than the EU. For an ISV outside the USA, it's the ISV, and not MS nor the USA, which decides to which clients to sell the software and to which countries to export it. Example: according to the USA, one can't sell a software application to an Iranian firm at the moment. However, that firm apparently has computers which run Windows and .NET. It's up to the laws of the country the ISV is in, not to the laws of the USA nor to MS.
The rule c is a trap as well: since when is the hard work of a) the control vendor (not MS) and b) the ISV something MS can benefit from? Goodwill can come into play when the ISV is sold to another company: the total price to pay will contain 'goodwill'. MS also can't decide what the quality bar is that the ISV will bring into the application: who will judge if the ISV has met the 'relevant industry standard' ? Who even decides what that industry standard is? Microsoft?
I'm often accused of being negative instead of being one of the cheerio-pass-me-the-marketeese-pipe crowd, but keep in mind that the world of an ISV is somewhat different from that of a developer who can leave all legal crap to some suit on the top floor: if the people in charge of the ISV don't take care of these legal issues, if they don't look into the 'what if's, it might well be that the ISV simply keels over due to a heavy lawsuit which drains them of all their money, even if they're totally correct and on paper should win. That's why ISVs should be careful.
It might sound far-fetched, but these kinds of issues will in the end affect the normal developer as well. Therefore every developer should say no to software patents and no to silly, evil licenses like the Office UI license.
In the first part of this series I talked about the fact that Linq to LLBLGen Pro is a full implementation of Linq and why it's so important to use a full Linq provider instead of a half-baked one. Today, I'll discuss a couple of native LLBLGen Pro features we've added to our Linq provider via extension methods: hierarchical fetches and exclusion of entity fields in a query. Furthermore, some other features will be in the spotlight as well. What I also want to highlight is that using an O/R mapper is more than just filling dumb classes with dumb data: it's entity management, and the O/R mapper framework should offer you tools so you will be able to manage and do whatever you want with the entity graph in memory, with as few problems and as little friction as possible. After all, the task you have isn't writing infrastructure code, entity classes or code to make these interact with each other; your task is to write code which consumes these classes and works with these classes. This means that you should be able to work on that code from the get-go, as that's what your client expects from you.
Exclusion / inclusion of entity fields in a query
The first feature I want to highlight today is the exclusion of entity fields in a query. Say you want to fetch a set of entities and the entities contain one or more large fields, e.g. a blob/image field or a text/clob field. If you don't need these large fields, it's useless to fetch them in your query, as the transportation of the large data (which can be many megabytes) could make the query a slow performer: all the data has to be fetched by the database and sent over the wire. LLBLGen Pro has a feature called Exclusion / Inclusion of fields, which allows you to exclude a set of fields from an entity when fetching one or more instances of that entity (exclusion). You can also specify the fields you want (inclusion), for example if you want to fetch just a few fields from an entity which has a lot of fields. If you want to fetch the excluded fields into the entities later on, that's possible too: LLBLGen Pro offers a special mechanism for that which efficiently fetches the excluded field data into the existing entities. We'll see an example of that later on in this post.
For this example, we'll fetch a set of Northwind Employee instances. The Northwind Employee entity has two large fields: an image (Photo) and an ntext field (Notes). Initially we'll fetch all the Employee entities using Linq and exclude the two fields, Photo and Notes:
// Listing 1
EntityCollection<EmployeeEntity> employees = null;
using(DataAccessAdapter adapter = new DataAccessAdapter())
{
    LinqMetaData metaData = new LinqMetaData(adapter);
    var q = (from e in metaData.Employee
             select e).ExcludeFields(e => e.Photo, e => e.Notes);
    // consume 'q' here. Use the Execute method to return an entity collection.
    employees = ((ILLBLGenProQuery)q).Execute<EntityCollection<EmployeeEntity>>();
}
The code uses Lambda expressions which offer compile time checked correctness. Later on, we'll see how fields also can be excluded in hierarchical fetches. One could argue that this also can be achieved by a projection onto the EmployeeEntity type using a select new {} statement. That's true in theory, but it will likely be more work (as you have to specify all fields you do want) and it also will use a different pipeline internally (namely the one for custom types being fetched through a projection), and not the entity fetch pipeline.
This might sound strange but fetching entities is more than just putting data into a class instance. The biggest hurdle is inheritance. If you do a new projection, the instances to create are known: they're instances of the type specified in the projection, be it an anonymous type or a specific type. With entity fetches this is different: the type to instantiate is determined based on the data received from the database. What if the type specified in the projection isn't a known entity type? How can the system then create an instance of a subtype of that type if the data received from the database is the data of a subtype? Only in the case where the developer has specified a class of a known entity type, the same pipeline can be used, but that's not always the case, as the developer is allowed to specify any type, including anonymous types, as shown in the following example:
// Listing 2
using(DataAccessAdapter adapter = new DataAccessAdapter())
{
    LinqMetaData metaData = new LinqMetaData(adapter);
    var q = from e in metaData.Employee
            select new {
                e.EmployeeId,
                e.FirstName,
                e.LastName,
                e.Title,
                e.TitleOfCourtesy,
                e.BirthDate,
                e.HireDate,
                e.Address,
                e.City,
                e.Region,
                e.PostalCode,
                e.Country,
                e.HomePhone,
                e.Extension,
                e.ReportsTo,
                e.PhotoPath,
                e.RegionId
            };
    // consume 'q' here.
}
Here, we fetch the same data, though we fetch it into an anonymous type using a new projection. We omit the two big fields, so effectively this is excluding Photo and Notes. However, what if Employee was an entity type in an inheritance hierarchy and the row returned from the database was for a subtype of Employee, e.g. SalesManager? I would now exclude more than just Photo and Notes, as I would also exclude the fields for SalesManager. With the ExcludeFields() extension method used in the first example, that's not the case: if Employee is in an inheritance hierarchy, all subtypes are fetched nicely and their Photo and Notes fields would be empty, as specified.
As all Employee instances have two fields left empty, it's of course necessary to fetch these into the entities again, if that's required. Let's say I consumed the query in Listing 1 and fetched it into an entity collection of Employee instances. Say I want to fetch all Photo and Notes data into the employees from the UK, which are in my employees collection fetched in Listing 1. I'll now create, using a Lambda filter which is run in-memory, an entity view on this employees collection with solely the UK employees. Creating a view is like creating a DataView on a DataTable: it's a view on a normal collection, and you can filter it, sort it and project it onto another object again. Creating this view doesn't affect the original collection. I can also create multiple views on the same collection, with different filters and different sortings. The nice thing about this is that I can bind all views to different controls, and it will look like I have multiple collections while I have only one. As the view is a view on a live collection, modifications on the collection will be shown in the view as well.
Listing 3 shows how to create the view on the employees collection with a filter on the Country field, which is specified as a Lambda and which is run in-memory. The view is then exported as a new collection and that collection is used to fetch the Photo and Notes field data into the entities in the collection. We have to rebuild the collection of excluded fields, as this info isn't stored inside the entity, which allows us to be flexible about which excluded fields to fetch. The excluded fields fetch code doesn't use Linq, as it would otherwise have been a bit awkward to formulate the query.
// Listing 3
// create a view. Adapter uses EntityView2, SelfServicing uses EntityView
EntityView2<EmployeeEntity> employeesFromUkView =
    new EntityView2<EmployeeEntity>(employees, e => e.Country == "UK");
// create new collection with the data of the view (same entity instances)
EntityCollection<EmployeeEntity> employeesFromUk =
    (EntityCollection<EmployeeEntity>)employeesFromUkView.ToEntityCollection();
// create the set of excluded fields to fetch, use initializers.
ExcludedFieldsList fieldsToFetch = new ExcludedFieldsList() { EmployeeFields.Photo, EmployeeFields.Notes };
// fetch the fields into the entities, using efficient batch queries and merging techniques
using(DataAccessAdapter adapter = new DataAccessAdapter())
{
    adapter.FetchExcludedFields(employeesFromUk, fieldsToFetch);
}
After Listing 3 has run, the Employee entity instances in employeesFromUk have their Photo and Notes fields filled with data. As the view is just a view on an existing collection, the employee instances in the original collection are the same, so we effectively fetched the Photo and Notes fields in a selection of the entities in the original collection. We'll see exclusion of fields re-appear in our next section, about Prefetch Paths.
Hierarchical fetching of entity graphs using Prefetch Paths
One core part of working with entities is the ability to fetch graphs of entities efficiently. A graph of entities contains entities of multiple types which are related to each other. A typical example is a set of Customer entities which have their Orders collection filled and each Order entity has its OrderDetails collection filled, and each Order also refers to its related Employee entity. LLBLGen Pro has offered the ability to fetch these kinds of graphs efficiently for a long time now and we've extended this into the Linq provider as well. In LLBLGen Pro this feature is called Prefetch Paths, and it's similar to spans (Objectspaces), Include (Entity Framework) and to some extent even LoadOptions (Linq to Sql), however all of them are pretty limited compared to Prefetch Paths. LLBLGen Pro's Linq provider offers two ways to specify Prefetch Paths, and I'll use the more Linq-esque way, using extension methods written by Jeremy Skinner. These extension methods are included in the Linq to LLBLGen Pro provider.
I'll specify a fetch for the graph: Customer - Order - OrderDetails, Order - Employee. This is a multi-branch path, with 4 different nodes: Customer, Order, OrderDetails and Employee. LLBLGen Pro will therefore fetch this whole graph in just 4 queries, one for Customer, one for Order, one for Employee and one for OrderDetails. It will fetch only the data required for the graph and will merge the entities in-memory.
The Prefetch Path execution code uses some optimization techniques under the hood, for example it will use parameterized queries instead of subqueries if the number of parent entities is below a given, settable threshold. For example, if you're fetching all Customer entities from Germany and their Order instances, you can fetch the Order instances with an IN filter on Order.CustomerId and a subquery on Customer (with the filter on Country), but you can also create an IN query with just the PK values from the Customers already fetched. This is much more efficient, when the number of parent entities (here Customer) is small (say below 100). The framework will decide this for itself, so you don't have to specify anything. The framework doesn't use joins for path node fetching, because that is less efficient due to the duplication of data and also causes problems in multi-branched paths.
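To make that decision a bit more tangible, here's a rough sketch of it in plain C#. This is not the LLBLGen Pro implementation: the class, the BuildOrderFilter method and the SQL fragments are made up for illustration, and only the idea (a parameter list when the number of parents is below the threshold, a subquery otherwise) is taken from the description above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not LLBLGen Pro code: picks the filter for the child
// fetch based on how many parent rows were already fetched.
public static class ChildFilterSketch
{
    public static string BuildOrderFilter(
        IReadOnlyList<string> fetchedCustomerIds, int threshold)
    {
        if (fetchedCustomerIds.Count < threshold)
        {
            // Few parents: one parameter per already-known PK value.
            var parameters = fetchedCustomerIds.Select((id, i) => "@p" + i);
            return "[CustomerID] IN (" + string.Join(", ", parameters) + ")";
        }
        // Many parents: repeat the parent filter as a subquery instead.
        return "[CustomerID] IN (SELECT [CustomerID] FROM [Customers] " +
               "WHERE [Country] = @country)";
    }

    public static void Main()
    {
        var ids = new List<string> { "ALFKI", "BLAUS", "DRACD" };
        Console.WriteLine(BuildOrderFilter(ids, 50)); // parameter list
        Console.WriteLine(BuildOrderFilter(ids, 2));  // subquery
    }
}
```

The sketch ignores the case of zero parents, where there's nothing to fetch for the child node anyway.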
In Listing 4, we're fetching all Customer instances from Germany and their Orders, the Order's OrderDetails and the Employees who filed the Orders. Also, we're excluding Photo and Notes from the Employee instances fetched. Everything is merged for us by the framework so the end result is a collection of Customer instances and their related entities available through navigational properties (e.g. customer.Orders, order.Employee, order.OrderDetails), using just 4 queries!
// Listing 4
using(DataAccessAdapter adapter = new DataAccessAdapter())
{
    LinqMetaData metaData = new LinqMetaData(adapter);
    var q = (from c in metaData.Customer
             where c.Country == "Germany"
             select c).WithPath<CustomerEntity>(cpath => cpath
                .Prefetch<OrderEntity>(c => c.Orders)
                    .SubPath(opath => opath
                        .Prefetch(o => o.OrderDetails)
                        .Prefetch<EmployeeEntity>(o => o.Employee).Exclude(e => e.Photo, e => e.Notes)));
    // consume 'q' here.
}
Ok, let's break it down into pieces to discuss what happens here. As a Linq query is a sequence of statements (calls to extension methods), and the Prefetch Path to use is a multi-branched path, we need a way to specify these multiple branches in a single line of code. This is done through the usage of multiple path definitions chained together with SubPath and Prefetch. The first few lines of the query are pretty straightforward: a query on Customer, with a filter on Country and a projection which selects the Customer instance. Added to that is a call to an extension method of Linq to LLBLGen Pro, WithPath.
WithPath is a method which allows you to specify a Prefetch Path to be used together with the query you call it on, in this case the query on Customer filtered on country. Through the usage of a Lambda expression we can define the path edge Customer - Order, using the Prefetch method on the path variable. We specify what to fetch, namely Customer.Orders, and after that we continue on the same path branch by specifying the path below Order and we do that by using the method SubPath. This method specifies a new path edge below the path edge it is called on. We define a new path edge for Order - OrderDetails using Prefetch again (using a shortcut version without generics) and we also define a second branch in the path, for Order - Employee. On that path edge, we call the Exclude extension method so we can define that the Employee instances fetched with this path should have their Photo and Notes fields excluded, as they're big and these aren't needed for now.
There are more methods defined, besides Exclude, to be called on a path edge. You can specify a filter for that path edge, e.g. if you wanted only the orders before a given OrderDate fetched in the above query, you could specify a Lambda filter on the .Prefetch<OrderEntity>(c => c.Orders) line using FilterOn, and the filter specified would of course be run inside the database. Furthermore, you can specify limiters (only fetch n instances) and a sort specification to order the fetched set. And paging with prefetch paths? Sure, paging is supported together with prefetch paths as well, as long as the page size is smaller than the set threshold. By default the threshold is set to 50, but you can adjust that to whatever you like with a parameter on the DataAccessAdapter instance. So if I add the following line below the query declaration in Listing 4:
q = q.TakePage(2, 3);
the framework will fetch page 2 of size 3 with Customer instances from the total set of Customer instances from Germany. The 3 Customer instances will be fetched together with their related entities as defined in the Prefetch Path.
As everything is inside a graph, I can navigate that graph using normal property navigation. Also, because all collections of entities inside entities (e.g. customer.Orders) are entity collections, I can create entity views on them, similar to what I've shown above, and filter them, sort them and project them in-memory without touching the original collection. Don't make the mistake of thinking that this is similar to just running a Linq to Objects query on the collection: if I bind an entity view to a grid and add a row (which is a new entity), it's added to the collection. If I remove an entity from the collection and it happens to be in, say, 3 entity views, it's removed from those 3 views as well. An entity view is a live view on a subset of the entity collection, with the awareness as if you're handling the collection.
Prefetch Paths of course support inheritance and are fully polymorphic. This means that you can specify path branches which are solely for some subtypes of a given entity fetched. This way, you're able to specify very powerful paths to fetch complex graphs with very little code.
There is another form of hierarchical fetches, using nested queries inside the projection, as I've described last time in short and also more in detail in part 14 of the Developing Linq to LLBLGen Pro articles, so rehashing here what's said there is a bit redundant. I'd recommend you to read part 14 if you're interested in how this works behind the scenes and why our mechanism is more efficient than say the one inside Linq to Sql.
Next time I'll discuss more in depth the advanced method mapping capabilities in Linq to LLBLGen Pro to map .NET constructs onto database constructs, and will also give an example of how LLBLGen Pro's authorization feature works nicely with the Linq queries, thanks to our Dependency Injection framework, so you can exclude entities, hide data etc. based on the user using the data through authorizers you write yourself. Stay tuned!
Shawn Wildermuth passed the torch to me (among others) in a new version of an old blog-theme. Oh well, these are always fun. It's more or less an interview-like way of pulling trivia out of people, so here we go!
How old were you when you first started programming?
I was 16, my younger brother and I bought a Toshiba MSX-1 (Z80 power!) back in 1986 after working for a couple of years as newspaperboys. From day one I started writing code, as that was the main reason for me to get the computer in the first place: to be able to write a little program which did things for you, it was like magic!
How did you get started in programming?
At my high school, a math teacher had installed large PC-like systems with 16KB RAM and some BASIC interpreters. Once a week we were allowed to touch the keyboards, under his supervision of course. The first time I wrote a little program (by simply repeating what he told us), I was sold: writing code was the best thing since sliced bread for me. Everything I did after that was to become someone who would write software.
What was your first language?
MSX basic, though quickly after that I bought a book about Z80 assembler and started writing assembler on the MSX-1, and later on the MSX-2. Z80 assembler is very clean and because you don't have much room on the chip (just a few registers), you have to find all kinds of solutions to tiny problems which pop up in your way, which of course is great when you're beginning with writing software.
What was the first real program you wrote?
It might sound geeky, but the first program I wrote was in MSX basic and it solved the Quadratic Equation for a given a, b and c. Pretty simple (ok, back then it was a challenge!)
What languages have you used since you started programming?
MSX Basic, Z80 assembler, Pascal, Modula-2, MC68000 assembler (amiga), APL (for CPU design emulation), C, SQL, Prolog, Miranda (functional programming), Lisp, C++, VB5-6, Java, JavaScript, VBScript, C#. I'm sure I forgot some...
What was your first professional programming gig?
At Triple-P, working on RoadRunner, which was a software product for transportation companies. The product was written in some 4GL system, on top of uniVerse, a post-relational database system (which still exists). I was a fresh B.Sc. graduate in computer science and tried to convince everyone over there to change their horrible ways of how they wrote software but of course all they did was laugh at me and do their own thing. I learned a lot about how a real-world software project works, when most of the people on the project have learned how to write software while working on a similar project. I can't say it was a disaster, I had a lot of fun there, but I didn't learn a lot, technically (except of course that there are weird database systems like uniVerse).
If you knew then what you know now, would you have started programming?
Absolutely! There's nothing else in the world I'd rather do. Every day I am allowed to do what I like the most, and they even give me money for it.
If there is one thing you learned along the way that you would tell new developers, what would it be?
Get an education from a good school, e.g. a B.Sc. degree or something like that, in software engineering or computer science. Learning 'on the job' isn't going to give you all the knowledge you might want to have later on in your software engineering life. Sure, most B.Sc. courses are dull and force you to wade through dark areas of computer science you don't want to hear about ever again, but at the same time they give you the unique opportunity to learn things you'd otherwise not be able to learn, because the job takes all the time during the day, and during the evenings and weekends you'll likely do something else than dig through books. So if you have the chance: go to a good school and finish it.
What's the most fun you've ever had … programming?
What I find one of the greatest things in writing software is that every day you're faced with puzzles you have to solve. And when you find the answer, it gives a great sense of joy, at least for me, but perhaps I'm weird, dunno. So the 'most fun', I think there are too many situations to mention: winning a demo competition at an Amiga demo party (demoscene), finally finding a bug after two days of hunting for it, finishing a part of a project you'd never thought you'd be able to do because it seemed so complex, finding a totally unrelated situation where you can apply a class you wrote two days earlier and it is the perfect fit... etc.
So who's next?
It's always difficult to make a selection. So, please don't be mad if you're not on the list below.
I've again disabled the email form on this blog, because I now get about 15 spam emails every 5 minutes through this form and enough is enough. Apparently it's not fixable by the blog-engine overloads, so this is the only option. If you want to contact me, my email address is on the about page.
Some people asked me what the highlights are of Linq to LLBLGen Pro, which was released this week, as it seems that Linq support is apparently growing on trees these days. In this and some future posts I'll try to sum up some of the characteristic features of Linq to LLBLGen Pro, so you don't have to wade through the 15 articles I wrote about writing Linq to LLBLGen Pro. I'll write several of these articles, this is the first one. I hope to write more of them in the coming weeks.
Linq to LLBLGen Pro is a full implementation of Linq
The first feature I'd like to highlight is the simple fact that it's a full implementation of a Linq provider. You now might think "Isn't that obvious? It is a Linq provider", but I've to spoil that dream for you: Most Linq 'providers' out there are just implementations of a small subset of what a Linq provider is expected to do. You see, implementing a Linq provider isn't just about writing a handler for MethodCall expressions to Queryable's extension methods. That's just a small part of it. The main part is about writing code which supports everything you can run into when traversing and handling an Expression tree. Everything. There's no room for compromises, sadly enough: if you don't implement a feature offered to the developer through Linq, it might be the developer is unable to write the query with Linq, as you can't mix and match things to form a SQL query in the end: the query is written in Linq or it's not.
If you run the risk of getting exceptions at runtime because the expression tree contains unsupported elements or constructs, or worse, whole subtrees which aren't understood, would you use such a Linq provider? You might think this isn't a big deal, but the thing is: you only know exactly which expression tree is generated and passed to the Linq provider when you run the code. It is perfectly possible that the provider only expects Queryable extension method calls at the root of the tree and, for example, doesn't expect a 'Where' call at one side of a Join.
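A small sketch shows how such a tree can look. It uses the in-memory AsQueryable() provider of .NET rather than an O/R mapper, and the helper DescribeSourceChain is a made-up name for this illustration: it simply walks from the root of the expression tree down the first argument of every method call and lists the names it finds.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class TreeShapeSketch
{
    // Walks from the root of the tree down each call's first argument (the
    // source sequence) and lists the method names it passes on the way.
    public static string DescribeSourceChain(IQueryable query)
    {
        var names = new List<string>();
        Expression current = query.Expression;
        while (current is MethodCallExpression call && call.Arguments.Count > 0)
        {
            names.Add(call.Method.Name);
            current = call.Arguments[0];
        }
        return string.Join(" <- ", names);
    }

    public static void Main()
    {
        var customers = new[] { new { Id = "ALFKI", Country = "Germany" } }.AsQueryable();
        var orders = new[] { new { OrderId = 1, CustomerId = "ALFKI" } }.AsQueryable();

        // The filter sits on one side of the join, so the Where call ends up
        // below the Join call instead of at the root of the tree.
        var q = from c in customers.Where(x => x.Country == "Germany")
                join o in orders on c.Id equals o.CustomerId
                select c;

        Console.WriteLine(DescribeSourceChain(q)); // prints "Join <- Where"
    }
}
```

A provider which only looks at the root call would see the Join here and never handle the Where below it.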
Writing a Linq provider is a lot of work which requires a lot of code. If you're dealing with a Linq provider which is just, say, 32KB in size, you can be sure it will not support the majority of situations you will run into. However, the O/R mapper developer likely simply said 'We have Linq support', and it's even likely the provider can handle the more basic examples of a single entity type fetch with a Where, an Order By or even a Group By. But in real life, once you as a developer have tasted the joy of writing compact, powerful queries using Linq, you will write queries with much more complexity than these Linq 101 examples. Will the Linq 'provider' you chose be able to handle these as well? In other words: is it a full Linq provider or, as some would say, a 'toy' ?
For developers who will use a Linq provider, it's often a tough call to decide what will be a solid Linq provider to work with and how much 'Linq' is actually implemented and supported by that provider. Below I've added a list of questions you can ask yourself when you're testing out a Linq provider. Perhaps some aren't important to you now, but consider that a software project often lasts for several years: most time on a software project is spent during maintenance, so the provider chosen has to be able to deal with many, many cases, and you shouldn't be forced to swap out the provider (and thus the O/R mapper) later on.
It can be the O/R mapper doesn't support a given feature in general, e.g. UNION queries. It's then unlikely that the Linq provider will support the feature. That's not the set of problems I'm talking about here: what I'm talking about are queries which are expected to work as Linq queries, considering the feature set of the O/R mapper (e.g. it supports paging, so paging through Linq queries should also work) but are failing at runtime due to the lack of support in the Linq provider.
The list below is far from complete; I'll update it if more topics are brought forward. I've compiled this list mainly from memory, from my experiences when writing Linq to LLBLGen Pro, where countless hours have been spent on answering 'what if?' questions related to what is possible with Linq.
- Can it do joins? Linq has two types of joins (it also has GroupJoins which are discussed further below)
// Type A: typical cross join
var q = from c in md.Customer
        from o in md.Order
        where c.CustomerId==o.CustomerId
        select c;
// Type B: real join clauses
var q = from c in md.Customer
        join o in md.Order on c.CustomerId equals o.CustomerId
        select c;
Type A results in a MethodCall expression to SelectMany(), Type B results in a MethodCall expression to Join(). SelectMany() is typically handled through a CROSS JOIN SQL statement, Join() is typically handled through INNER JOIN SQL statements (or non-ansi joins if the database requires it, like Oracle 8i). If the Linq provider doesn't handle type B, you're stuck with cross joins which can seriously slow down your queries at runtime. Also keep in mind that with type B you can only create INNER JOINs. To produce LEFT / RIGHT joins, you need support for GroupJoin and DefaultIfEmpty.
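Written out as method calls, the two forms look roughly as follows. This is a sketch against in-memory lists (Linq to Objects), with made-up Customer and Order classes, just to show which extension methods the two query forms correspond to; it says nothing about the SQL a particular provider produces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up in-memory types for the illustration.
public sealed class Customer { public string Id = ""; }
public sealed class Order { public int OrderId; public string CustomerId = ""; }

public static class JoinFormsSketch
{
    // Type A written out: SelectMany pairs every customer with every order,
    // and the Where afterwards keeps the matching pairs.
    public static List<string> TypeA(IEnumerable<Customer> customers, IEnumerable<Order> orders)
    {
        return customers
            .SelectMany(c => orders, (c, o) => new { c, o })
            .Where(t => t.c.Id == t.o.CustomerId)
            .Select(t => t.c.Id)
            .ToList();
    }

    // Type B written out: a single Join call with the two key selectors.
    public static List<string> TypeB(IEnumerable<Customer> customers, IEnumerable<Order> orders)
    {
        return customers
            .Join(orders, c => c.Id, o => o.CustomerId, (c, o) => c.Id)
            .ToList();
    }

    public static void Main()
    {
        var customers = new List<Customer> { new Customer { Id = "ALFKI" }, new Customer { Id = "PARIS" } };
        var orders = new List<Order>
        {
            new Order { OrderId = 1, CustomerId = "ALFKI" },
            new Order { OrderId = 2, CustomerId = "ALFKI" }
        };
        Console.WriteLine(string.Join(",", TypeA(customers, orders))); // ALFKI,ALFKI
        Console.WriteLine(string.Join(",", TypeB(customers, orders))); // ALFKI,ALFKI
    }
}
```

Both return the same rows; the difference is in how much work is needed to get there.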
- Can it handle GroupJoin and DefaultIfEmpty? A GroupJoin is a Linq specific type of join: it's a combination of group by and a join. As it doesn't really have a SQL equivalent, it's a bit difficult to translate it to SQL. GroupJoin is one of the extension methods of Queryable which isn't implemented in many Linq providers out there. However, you need it if you want to do LEFT/RIGHT joins, as it works together with DefaultIfEmpty:
var q = from c in md.Customer
        join o in md.Order on c.CustomerId equals o.CustomerId into co
        from x in co.DefaultIfEmpty()
        where x.OrderId == null
        select c;
This query produces a LEFT JOIN between Customer and Order and filters out any customer with one or more orders. LEFT / RIGHT JOINs are a vital part of any querying system, and therefore if you can't express these kinds of queries, you're likely going to have a struggle writing efficient code, or you're forced to use another querying API than Linq, which makes the whole point of using Linq in the first place rather moot. The biggest pain with GroupJoin is that the actual GroupJoin resides in one part of the expression tree, but the place where it is actually being used, the DefaultIfEmpty, is in another part of the expression tree. This might not sound like a big deal, but it can be that the GroupJoin is behind an alias border, which means the GroupJoin isn't as directly reachable as it seems, while it should be reachable as the DefaultIfEmpty requires it: it's as if you pull one subtree into another part of the expression tree, but with the scope of the new location. This often results in aliasing nightmares and other complex problems.
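For comparison, here's the same pattern run on in-memory lists with Linq to Objects. It's only a sketch with made-up Customer and Order classes. One difference to keep in mind: in memory, the missing side of the join is a null object, so the filter tests x itself instead of x.OrderId.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up in-memory types for the illustration.
public sealed class Customer { public string Id = ""; }
public sealed class Order { public int OrderId; public string CustomerId = ""; }

public static class LeftJoinSketch
{
    // GroupJoin ('join ... into co') plus DefaultIfEmpty gives left join
    // semantics: customers without orders are paired with a null Order.
    public static List<string> CustomersWithoutOrders(
        IEnumerable<Customer> customers, IEnumerable<Order> orders)
    {
        var q = from c in customers
                join o in orders on c.Id equals o.CustomerId into co
                from x in co.DefaultIfEmpty()
                where x == null
                select c.Id;
        return q.ToList();
    }

    public static void Main()
    {
        var customers = new List<Customer> { new Customer { Id = "ALFKI" }, new Customer { Id = "PARIS" } };
        var orders = new List<Order> { new Order { OrderId = 1, CustomerId = "ALFKI" } };
        Console.WriteLine(string.Join(",", CustomersWithoutOrders(customers, orders))); // PARIS
    }
}
```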
- Does it support Linq on all supported databases? A typical O/R mapper supports more than one database, for example SqlServer, Oracle, DB2, MySql etc. Is Linq supported on all these databases? Or is Linq only supported on a subset of these databases? One great point of Linq is that you can write queries without having to worry about database specific issues, at least, if the Linq provider offers enough features.
- Can it handle boolean values anywhere in the Linq query, also on other databases? Boolean values aren't supported in SQL, at least not in the SQL standard which is implemented in most databases. Sure, WHERE clause predicates are boolean expressions, but ever tried to use a boolean expression in the SELECT clause ? Another point is that databases like Oracle don't have support for types which can be mapped as booleans: it lacks a bit type. Does the Linq provider offer the ability to map any field in an entity to any type possible, so you can use that field with that type in a Linq query, no matter what database is used? In Linq to LLBLGen Pro I can write the following query and it works on all databases supported:
var q = from p in md.Product
        where !p.Discontinued
        select p;
Discontinued is a boolean field in the Product entity. Through type converter technology I can create my own type converter and map any field with any .NET type (so also your custom classes) to any database field type. Booleans in projections are also something which isn't always supported by the Linq providers out there:
var q = from c in md.Customer
        select new {
            c.CustomerId,
            HasOrders = (c.Orders.Count() > 0)
        };
The query above seems simple, but it requires a CASE statement under the hood to convert the boolean expression into 1 or 0, which are then converted back to a boolean in the projection. It might be that the Linq provider decided to do this completely on the client, inside the Linq provider, so it returns the c.Orders.Count() results. But this gives the problem that if the query is folded into a subquery, it goes wrong:
var q = (from c in md.Customer
         select new {
             c.CustomerId,
             HasOrders = (c.Orders.Count() > 0)
         }).Where(c => c.CustomerId.StartsWith("C"));
Here, the complete query is folded into a derived table and surrounded with a query which filters on the inner result. But, it still has to bring out the boolean value. The SQL query looks like this:
SELECT [LPA_L1].[CustomerId], [LPA_L1].[HasOrders]
FROM
(
    SELECT [LPLA_1].[CustomerID] AS [CustomerId],
        CASE
            WHEN (
                SELECT COUNT(*) AS [LPAV_]
                FROM [Northwind].[dbo].[Orders] [LPLA_2]
                WHERE [LPLA_1].[CustomerID] = [LPLA_2].[CustomerID]
            ) > @LPFA_11
            THEN 1
            ELSE 0
        END AS [HasOrders]
    FROM [Northwind].[dbo].[Customers] [LPLA_1]
) [LPA_L1]
WHERE [LPA_L1].[CustomerId] LIKE @CustomerId2
and the output looks like this:
{ CustomerId = CACTU, HasOrders = True }
{ CustomerId = CENTC, HasOrders = True }
{ CustomerId = CHOPS, HasOrders = True }
{ CustomerId = COMMI, HasOrders = True }
{ CustomerId = CONSH, HasOrders = True }
- Can it combine queries into one single query? Most textbook examples of Linq are one query, stored in a var typed variable and often consumed right away. But, you'll soon find out that you'll write a lot of queries which actually are the same queries but for example have a different Where clause or a different projection. This can be solved by using routines which produce different queries which are then combined into one query. The expression tree however won't automatically fold these query's subtrees into the main tree: they'll appear as ConstantExpression instances. The Linq provider therefore has to be able to pull these expression trees into the main expression tree, and in such a way as if the subtrees always were part of the main tree. If the Linq provider at hand doesn't grasp the concept of external trees being folded into another query, you'll not be able to share query fragments through methods which are used inside the actual Linq query.
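The sketch below shows what the provider gets to see in that case. It uses the in-memory AsQueryable() provider and made-up Customer and Order classes; FromCountry, PlacedBy and DescribeFragmentNode are hypothetical names for this illustration only. The query fragment is built by one method and used inside the filter of another query, and in the outer tree it doesn't appear as a Where call but as a reference to a constant (the compiler-generated closure object holding the fragment).

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Made-up in-memory types for the illustration.
public sealed class Customer { public string Id = ""; public string Country = ""; }
public sealed class Order { public int OrderId; public string CustomerId = ""; }

public static class QueryFragmentSketch
{
    // Reusable fragment: customers from one country.
    public static IQueryable<Customer> FromCountry(IQueryable<Customer> source, string country)
    {
        return source.Where(c => c.Country == country);
    }

    // Uses the fragment inside the filter of another query.
    public static Expression<Func<Order, bool>> PlacedBy(IQueryable<Customer> fragment)
    {
        return o => fragment.Any(c => c.Id == o.CustomerId);
    }

    // Reports how the fragment shows up in the outer expression tree.
    public static string DescribeFragmentNode(Expression<Func<Order, bool>> predicate)
    {
        var anyCall = (MethodCallExpression)predicate.Body;
        Expression source = anyCall.Arguments[0];
        if (source is MemberExpression member && member.Expression is ConstantExpression)
            return "closure constant";
        return source.NodeType.ToString();
    }

    public static void Main()
    {
        var customers = new[]
        {
            new Customer { Id = "ALFKI", Country = "Germany" },
            new Customer { Id = "PARIS", Country = "France" }
        }.AsQueryable();
        var orders = new[] { new Order { OrderId = 1, CustomerId = "ALFKI" } }.AsQueryable();

        var predicate = PlacedBy(FromCountry(customers, "Germany"));
        Console.WriteLine(DescribeFragmentNode(predicate)); // closure constant
        Console.WriteLine(orders.Where(predicate).Count());
    }
}
```

To translate the whole thing into one SQL query, a provider has to read the value behind that constant, notice it's a query of its own and pull that query's expression tree into the outer one.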
- Can it do Group By in C# and VB.NET? With multiple aggregates? GroupBy is an extension method which is handled differently by C# and VB.NET: the VB.NET compiler generates the projection into the GroupBy method call expression, the C# compiler always emits a separate Select method call expression which simply refers to the Group By expression sub tree. Needless to say: if you like one of these two languages, you have to make sure your Linq provider of choice handles a GroupBy in your language of choice well. With C# it's more complicated, as the separate Select is expected but you can't simply 'look ahead' for it: you have to keep track of the GroupBy and when you run into the Select, you have to pull it out of your big hat and re-use it. Though not just simply 'use it': the Select's projection has to be folded into the GroupBy's query as the projection, in any situation. One problematic issue is multiple aggregates on a group by which groups on multiple fields. It has to do 'folding' of query fragments into the GroupBy query to make everything work (as in SQL, all aggregates have to be present in the projection of the group by query). This leads to complex code if you want to support multiple aggregates in a Group By. Be sure to check for this if you expect to group on multiple fields. (Yes, that's possible in Linq, didn't you know?)
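For those who haven't seen it: grouping on multiple fields is done by grouping on an anonymous type. The sketch below runs on an in-memory list with Linq to Objects, using a made-up Sale class, and computes two aggregates per group in the separate Select which the C# compiler emits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up in-memory type for the illustration.
public sealed class Sale { public string Country = ""; public int Year; public int Amount; }

public static class GroupBySketch
{
    // Groups on two fields (Country, Year) and computes two aggregates per group.
    public static List<(string Country, int Year, int Count, int Total)> Summarize(
        IEnumerable<Sale> sales)
    {
        var q = from s in sales
                group s by new { s.Country, s.Year } into g
                select (g.Key.Country, g.Key.Year, Count: g.Count(), Total: g.Sum(x => x.Amount));
        return q.ToList();
    }

    public static void Main()
    {
        var sales = new List<Sale>
        {
            new Sale { Country = "Germany", Year = 1997, Amount = 10 },
            new Sale { Country = "Germany", Year = 1997, Amount = 5 },
            new Sale { Country = "UK", Year = 1998, Amount = 7 }
        };
        foreach (var row in Summarize(sales))
        {
            Console.WriteLine(row.Country + " " + row.Year + " " + row.Count + " " + row.Total);
        }
    }
}
```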
- Can it handle let? Let is a keyword in Linq queries which allows you to store a query result into a variable which is from then on used instead of the query assigned to the variable. See the example below:
var q = from c in metaData.Category
        let x = c.Products.Select(p => (int)p.UnitsInStock).Sum()
        where x > 500
        orderby x ascending, c.CategoryName descending
        select c;
Here, we store the number of products of a category c which are in stock into the variable x and select only the categories which have more than 500 products overall in stock. We re-use x in the order by clause. In SQL, which is a set-based language, there is no such concept as let. Using let results in a wrapping query around the main query with the Sum(), and in the Where and Order By the value of the Sum() is referenced. Let is one of these statements which aren't always straightforward for a Linq provider, so pay attention to what you want to do in Linq and if you want to use this statement.
- Can it produce server-side paging queries? On all supported databases? Paging is a crucial benefit of using an O/R mapper framework: without any effort the framework produces a query which allows you to page through a big resultset (e.g. millions of rows). However, it's only efficient if the paging takes place in the SQL query, or during fetching of the rows, not after the whole set is fetched into memory. Linq itself has two methods, Skip and Take, which combined offer the ability to specify which page of data to obtain. Some O/R mappers, like LLBLGen Pro, also offer their own extension method to specify which page to retrieve, as Skip and Take can cause confusion: in Linq it does make a difference whether Skip is mentioned first or Take is mentioned first.
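The difference between the two orders is easy to see in a small sketch. The helper class PagingProbe and its method names are hypothetical; the code only uses the standard Queryable.Skip and Queryable.Take methods and is shown here on an in-memory source, assuming the source is already ordered.

```csharp
using System;
using System.Linq;

public static class PagingProbe
{
    // Returns the requested page; pageNumber is 1-based.
    // Skip comes first, then Take: skip the earlier pages, keep one page.
    public static IQueryable<T> Page<T>(IQueryable<T> source, int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException("pageNumber");
        if (pageSize < 1) throw new ArgumentOutOfRangeException("pageSize");
        return source.Skip((pageNumber - 1) * pageSize).Take(pageSize);
    }

    // The same two calls in reverse order: Take first caps the set at
    // pageSize rows, so Skip then removes rows from that capped set.
    public static IQueryable<T> TakeThenSkip<T>(IQueryable<T> source, int pageNumber, int pageSize)
    {
        return source.Take(pageSize).Skip((pageNumber - 1) * pageSize);
    }
}
```

With ten rows and a page size of two, Page for page three yields the fifth and sixth row, while TakeThenSkip yields nothing at all: it takes two rows and then skips four. A provider has to translate both forms faithfully, and it should do so in SQL rather than in memory.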
- Does it handle in-memory object construction inside a query? And in-memory method calls? Linq allows the developer to mix code which runs in-memory with code which is translated into a query to be run on a database server. For example, it should be possible to call a method on an object inside the projection which processes the value from the database before it is stored inside the object to return. It should also be possible to use array and collection constructors inside a Linq query in combination with Contains() calls, so you don't have to first set up the collection and then use it in the query: you can define it right inside the query. See the example below:
var q = from c in metaData.Customer
        where !new List<string>(){"FISSA", "PARIS"}.Contains(c.CustomerId)
        select c.Orders;
This query retrieves the order collections of all customers which actually have orders in Northwind, by excluding the CustomerIds of the customers which don't have an order defined. This example is fairly simple, but far more complex examples are conceivable, also with in-projection method calls which perform last-minute in-memory processing of values. Using in-memory objects and methods inside the projection is a key element of Linq: you can process the values retrieved from the database by writing simple code inside a query. It does require that the Linq provider understands which code is in-memory code and which isn't.
- Does it offer a flexible way to map .NET methods/properties onto DB functions / constructs? We all know the LIKE statement in SQL, and a typical way to produce LIKE queries is to use the string methods StartsWith, EndsWith or Contains (if they're handled by the provider, of course). However, what if you have a database function and you want to call that function inside a query? Linq only understands .NET methods and types. The way to do this is to map a .NET method onto a database construct, for example a database function, and simply specify the .NET method in the Linq query; the Linq provider then translates that method call into the usage of the database construct it is mapped on. This way, you're able to utilize the large library of database functions in for example Oracle or DB2, straight from .NET. Also, with these mechanisms it's possible to add Full Text Search support to Linq, and not only for SqlServer but also for MySql for example. More on this in a later episode of this series.
- Can it do hierarchical fetches? Of entities? Efficiently? Linq allows you to specify another query in the projection of a query, which can contain one or more queries in its projection etc. etc. It's key for performance that the Linq provider fetches these nested queries as efficiently as possible, i.e.: one SQL query per nested Linq query. Take this simple example:
var q = from c in metaData.Customer
        select new {
            c.CustomerId,
            Orders = (from o in c.Orders
                      select new {
                          o.OrderId,
                          o.OrderDate })
        };
How many SQL queries should this Linq query result in, if we consider that there are say 50,000 customers in the database? Only two: one for the customers and one for the orders. Another typical example is fetching graphs of entities: is it possible to specify which entities to fetch in a graph, like fetch all customers from 'Germany' and all their orders? Linq doesn't offer a facility to specify such a query, which leads to the question whether the Linq provider offers this facility, together with the question whether the graph is fetched efficiently. I'll go deeper into this feature in a later episode of this series.
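To illustrate why two queries are enough, here is a minimal sketch of merging two flat result sets into a hierarchy in memory. It is an illustration of the general idea, not a description of how any particular provider does it; the types OrderInfo and CustomerWithOrders and the class HierarchyMerge are hypothetical, and the two input sequences stand in for the results of the customer query and the order query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat row as it could come back from the single order query.
public class OrderInfo
{
    public string CustomerId { get; set; }
    public int OrderId { get; set; }
}

// Hypothetical parent object with its nested child set.
public class CustomerWithOrders
{
    public string CustomerId { get; set; }
    public List<OrderInfo> Orders { get; set; }
}

public static class HierarchyMerge
{
    // Merges the results of two queries: one pass to index the orders
    // per customer, one pass over the customers to attach them.
    public static List<CustomerWithOrders> Merge(
        IEnumerable<string> customerIds, IEnumerable<OrderInfo> orders)
    {
        // ToLookup returns an empty sequence for keys without orders.
        ILookup<string, OrderInfo> ordersByCustomer = orders.ToLookup(o => o.CustomerId);
        return customerIds
            .Select(id => new CustomerWithOrders
            {
                CustomerId = id,
                Orders = ordersByCustomer[id].ToList()
            })
            .ToList();
    }
}
```

The alternative, one order query per customer, would mean 50,001 round trips in the example above instead of two.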
- Can it deal with Nullable types and implicit casts? .NET supports nullable types, and databases support NULL values. A match made in hell, err... heaven. Nullable types have two properties: HasValue (a boolean, see the point above about booleans) and Value, the actual non-null value. These two properties can be used inside a Linq query as well, in various places. But... it's also possible to compare a Nullable type with a variable or a value which isn't of a Nullable type. The developer can also compare HasValue with a boolean value or a variable, which leads to a different expression subtree. Add to that that the location where HasValue is compared to a value could lead to a CASE statement or not, and you're in for a lot of fun.
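A small sketch of how differently these variations arrive at the provider: four predicates over an int? which a developer might consider near-equivalent ways of filtering on a nullable field, each compiled by the C# compiler into an expression tree. The class NullableShapes is hypothetical; only the root node type of each tree is inspected here.

```csharp
using System;
using System.Linq.Expressions;

public static class NullableShapes
{
    // Returns the node type at the root of each predicate's expression tree,
    // in the order: HasValue, null check, value comparison, HasValue == true.
    public static ExpressionType[] RootNodeTypes()
    {
        // Property access on the nullable.
        Expression<Func<int?, bool>> viaHasValue = x => x.HasValue;
        // Comparison of the nullable with null.
        Expression<Func<int?, bool>> viaNullCheck = x => x != null;
        // Comparison of the nullable with a non-nullable constant.
        Expression<Func<int?, bool>> viaComparison = x => x > 5;
        // Comparison of HasValue with a boolean constant.
        Expression<Func<int?, bool>> viaHasValueCompare = x => x.HasValue == true;

        return new[]
        {
            viaHasValue.Body.NodeType,
            viaNullCheck.Body.NodeType,
            viaComparison.Body.NodeType,
            viaHasValueCompare.Body.NodeType
        };
    }
}
```

The first tree is a plain member access, the second and third are binary expressions with the nullable on one side, and the last one wraps the member access in yet another binary expression. A provider has to recognize each of these shapes separately to produce sensible SQL.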
- Does it handle type casts for inheritance scenarios? Can it handle inheritance types in Linq queries? O/R mappers typically support inheritance in one way or another: they let you derive a subtype entity from a supertype entity and for example allow you to add a relation between the subtype and another entity. Linq offers a couple of ways to specify which types should be used in a query, and not all of them are through a Queryable extension method. If you want to use inheritance in your project, be sure your O/R mapper's Linq provider allows you to filter and specify which types to fetch in a Linq query.
As I said in the beginning, I'm sure I forgot to mention some topics to look into. Like the gazillion ways a Contains query can be written, or if the Linq provider supports joins between queries which again contain joins between entities and queries etc. but this article has already grown too long. The general point is this: if some O/R mapper vendor, and we are one of them so you can apply the same logic to our own code as well, claims 'Linq support', be sure it supports the Linq constructs you're looking for now and in the future.
Don't make the naive mistake of thinking that you won't need all those fancy joins, hierarchical fetches and what have you: you will need them, simply because they make life very easy for you as a developer. You want to know how simple? Let's close this post with an example of how easy Linq can make it, as it offers the combination of very powerful features right there at your fingertips. The example fetches a hierarchy, using a group by. Though instead of running an aggregate, it returns the whole grouping result. This results in a hierarchy where per key (the field(s) the set was grouped on) all matching elements are stored. In the example below it fetches per Country all Customer entities. It takes 4 lines of code and two SQL queries: one for the group by keys, and one for the data per group. I left the unit-test code around it so you have an idea what the data looks like (Linq to LLBLGen Pro, Selfservicing paradigm):
[Test]
public void GroupingOfEntitiesByGroupByKeyUsingGroupByVariable()
{
    LinqMetaData metaData = new LinqMetaData();
    var q = from c in metaData.Customer
            where c.Country != null
            group c by c.Country into g
            select g;
    int count = 0;
    foreach(var v in q)
    {
        int customerCount = 0;
        foreach(var c in v)
        {
            Assert.AreEqual(v.Key, c.Country);
            customerCount++;
        }
        Assert.IsTrue(customerCount > 0);
        count++;
    }
    Assert.AreEqual(22, count);
}
In future episodes I'll discuss excluding fields, dealing with hierarchical fetches of entities and custom projections, function mappings and much more.
After almost 11 months of design, development, beta testing and adding final polish, it's here: LLBLGen Pro v2.6!
This version, which is a free upgrade for all our v2.x customers, has a couple of major new features, the biggest of course being the full implementation of Linq support in our O/R mapper framework. The work on our Linq provider, which we've dubbed 'Linq to LLBLGen Pro', lasted almost 9 months and was discussed on this blog in a series of articles, which I'll linq to below.
When I started writing the Linq provider, I was pretty optimistic that it would be easy and quick, but after a while I got very pessimistic and wanted to skip it entirely as it would simply cost too much effort, and therefore time and money. The main reason was the lack of serious documentation and background on various essential details, like which expression trees were formed from which Linq queries, and how to understand them in full so a meaningful query could be produced from them to run on the database. Anyone who has written some form of Linq provider or is currently busy doing so will run into this problem. Doing trial-and-error development/research for a couple of months in a row isn't a picnic, but it's also part of being a Software Engineer, so it always left me with mixed feelings: it's exciting and interesting, but also frustrating.
A good example of the lack of serious documentation on expression trees is the way the VB.NET compiler compiles a group-by query into an expression tree vs. how the C# compiler does that: the C# compiler always adds a separate .Select() method call, the VB.NET compiler doesn't. In theory, the VB.NET compiler is right: the group-by clause has to be in the same query scope as the projection, but as C# has to be supported as well, you have to build some form of 'look-ahead' inside the tree to see when / if / how the projection is present after the group-by expression is seen. This isn't documented anywhere. Another one is the way anonymous types are detected. There's no boolean on the Type object which tells you 'This is an anonymous type'. So you check the name. The C# compiler generates names which start with <>, the VB.NET compiler generates names which start with $VB$. And yet another language which runs on the CLR and which adds Linq support might well choose another prefix. Nowhere is this documented, but it is sometimes required to know that the type you're dealing with is an anonymous type.
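A name-based check like the one described could look like the sketch below. It is a heuristic under stated assumptions, not a documented API: the prefixes <> and $VB$ are the ones named above, VB$ is added on the assumption that VB.NET names these types along the lines of VB$AnonymousType_0, and the extra test on 'AnonymousType' in the name is an assumption about what both compilers emit. The class AnonymousTypeProbe is hypothetical.

```csharp
using System;

public static class AnonymousTypeProbe
{
    // Name prefixes treated as 'compiler generated'. "<>" and "$VB$" come from
    // the text above; "VB$" is an assumption to verify against your compiler.
    private static readonly string[] KnownPrefixes = { "<>", "$VB$", "VB$" };

    // Heuristic: returns true if the type's name looks like the name a
    // compiler gives to an anonymous type. There is no flag on Type for this.
    public static bool LooksAnonymous(Type type)
    {
        if (type == null) throw new ArgumentNullException("type");
        string name = type.Name;
        bool hasKnownPrefix = false;
        foreach (string prefix in KnownPrefixes)
        {
            if (name.StartsWith(prefix, StringComparison.Ordinal))
            {
                hasKnownPrefix = true;
                break;
            }
        }
        // Closure classes also get generated names, so additionally require
        // the 'AnonymousType' marker (an assumption, see the lead-in).
        return hasKnownPrefix && name.IndexOf("AnonymousType", StringComparison.Ordinal) >= 0;
    }
}
```

As the check rests on undocumented naming conventions, it can break with a new compiler version or a new CLR language, which is precisely the problem described above.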
But that's all water under the bridge now. Looking back, I'm so incredibly happy that management did succeed in motivating me to go on and continue working on the Linq provider, as the end result is one of the most feature-rich Linq providers available on .NET today. With the future technology coming from Microsoft like Dynamic Data and ADO.NET Data Services, but also with IQueryable-supporting controls from third-party developers like DevExpress, it's becoming more and more clear that a modern O/R mapper system has to have deep and solid Linq support.
What's new?
I won't list everything here, just a few items of what's new in LLBLGen Pro v2.6.
- Full Linq support with our own Linq to LLBLGen Pro provider.
- .NET 3.5 support. With code changes in the runtime so it works better with Linq to Objects and with VS.NET 2008 project templates
- Derived table support. Use any query as source for a join side or as From clause
- Much lower memory consumption during transactions: 90% less memory overhead for temp values during transactions. Temp values are used to be able to roll-back to the start state of the entity graph when a transaction rolls back (PK's roll back, FK's synced with the new PK values roll back etc.)
- Up to 20% less memory usage for entity graphs
- String uniquing. When fetching a lot of redundant string data, the same string instance is now re-used to avoid unnecessary memory consumption. This is done without string interning.
- SqlServer 2008 support, SqlServer CE 3.5 support, CF.NET 3.5 support
- Using SqlServer CE Desktop is now much easier
- Plus... a lot of small, but important changes and enhancements.
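The string uniquing item in the list above can be illustrated with a small sketch. This is not the LLBLGen Pro implementation, only a minimal example of the general technique of re-using equal string instances without string.Intern; the class StringUniquer is hypothetical. The assumption is that one such object lives for the duration of a single fetch, so the strings can be garbage collected afterwards, which interned strings can't.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: hands out one shared instance per distinct string value.
public sealed class StringUniquer
{
    // Maps a string value to the first instance seen with that value.
    private readonly Dictionary<string, string> _seen =
        new Dictionary<string, string>(StringComparer.Ordinal);

    // Returns the instance seen earlier with the same value, if any;
    // otherwise remembers and returns the instance passed in.
    public string Unique(string value)
    {
        if (value == null) return null;
        string existing;
        if (_seen.TryGetValue(value, out existing)) return existing;
        _seen.Add(value, value);
        return value;
    }
}
```

When a resultset contains the same country name in thousands of rows, each row read would pass its string through Unique() and all of them would end up referring to a single instance.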
Linq to LLBLGen Pro development articles
For the people who want to re-read all the articles on the development of Linq to LLBLGen Pro, they're linked below.
- Developing Linq to LLBLGen Pro, part 0
- Developing Linq to LLBLGen Pro, part 1
- Developing Linq to LLBLGen Pro, part 2
- Developing Linq to LLBLGen Pro, part 3
- Developing Linq to LLBLGen Pro, part 4
- Developing Linq to LLBLGen Pro, part 5
- Developing Linq to LLBLGen Pro, part 6
- Developing Linq to LLBLGen Pro, part 7
- Developing Linq to LLBLGen Pro, part 8
- Developing Linq to LLBLGen Pro, part 9
- Developing Linq to LLBLGen Pro, part 10
- Developing Linq to LLBLGen Pro, part 11
- Developing Linq to LLBLGen Pro, part 12
- Developing Linq to LLBLGen Pro, part 13
- Developing Linq to LLBLGen Pro, part 14
Hereby, I'd like to thank all the beta-testers and all the others who have supported us and have given feedback in one way or the other!
In the next weeks we'll be releasing updated code for our Dynamic Data support and also new code for support for ADO.NET Data Services (Astoria). We'll also publish our Linq to Sql templates on our main site so people who aren't yet a customer can try them out as well.
The coming year
In the coming year, we'll be working on LLBLGen Pro v3, something I'm very excited about. It won't just be a designer upgrade, it will be a designer revolution, combined with a new way of generating code. Of course, we'll be bringing further enhancements and fine-grained tweaks to our own O/R mapper framework and runtime. The project goals are huge, but that's what makes things interesting, right?
One of the first steps is a transactional graph manager which can manipulate object graphs in-memory on a transactional basis. I hope to blog about that sometime soon.
But for now, a day of rest and sunshine.