Frans Bouma's blog

Generator.CreateCoolTool();

Syndication

News




    Add to Technorati Favorites

About me

Fun stuff I created

My work

November 2003 - Posts

Eric Sink hints SourceGear is for sale

Eric blogs that his company SourceGear is the ideal candidate to fill up the gap that is open at the moment in the VSS department at Microsoft.

I never used SourceGear's products, but for what I've read about their work it indeed seems an ideal candidate to replace VSS related material. I wondered while reading Eric's words: does Eric know more or is this just a hint? History will tell... :)

Posted Friday, November 21, 2003 6:45 PM by FransBouma | with no comments

Filed under:

Roundtrips and the real bottlenecks

Now the discussion of yesterday about Dynamic SQL and Stored Procedures are slowing down, I'd like to mention a little thing that popped up during the debate: roundtrips.

A roundtrip is a new request from client to database server with a response from the database server. This takes time, there is no doubt about that. I'll admit that a batch of statements in a stored procedure is easier to work with than a set of statements to create a batch of dynamic sql queries. Some argued that the call to a single stored procedure requires just 1 roundtrip and creating and executing a bunch of dynamic queries would require sometimes more than 1 roundtrip (e.g.: for each dynamic query 1 roundtrip).

This triggered me (pun intended) to think about this further. I wondered: if these people are so focussed on roundtrips and the costs of a roundtrip, are these people also using DataReader.GetBytes() ? Are these people also batching all calls required for, say, a webform, into 1 stored procedure? A wild guess from my side: no I don't think a lot of these people use GetBytes() to do efficient blob retrieval and no they're not batching multiple calls of a page into 1 stored proc. Why are these 2 obvious random features related to roundtrips? Well, focussing on the time it takes a roundtrip over an already opened connection is performed (a few milliseconds to set it up), implies all other bottlenecks in a conversation between client and server are removed already and 'hidden' extra roundtrips are optimized away. I have doubts about that. Performance is important, but if you want to discuss benefits of method A compared to method B, using the argument of the roundtrip is perhaps not the best argument you can pick, because the time a roundtrip may take is far less than for example the time it takes to fetch large blobs in resultsets.

I can imagine a lot of people simply ignore DataReader.GetBytes() and read the complete blob at once, stored procedure or not. This can be very inefficient. When roundtrips concern you because they take time, make certain that you have removed every other bottleneck first, starting with unnecessary fetching of blobdata, after removing the overhead time of using a DataAdapter of course.

When you have optimized your dataretrieval of individual rows and batches of rows, there is another roundtrip issue to be discussed: when for example a webform requests 3 or 4 batches of data to build the webform's content, are these requests bundled into 1 call? If not, you have 3 or 4 roundtrips to the server, while there could be 1. When you are concerned about roundtrips and the time it takes to make one, you definitely shouldn't leave these 2 or 3 unnecessary requests go unnoticed and optimize them away. After all: a roundtrip seems to be important.

Ah, you can't? Because that would interfer GUI design with DAL implementation, it would make the DAL be tight to the GUI's layout? That's right. You can't, or better: you shouldn't. A DAL is a generic service layer, which shouldn't have any GUI-related design aspects, after all: changing the GUI shouldn't affect the DAL. That makes the aspect of talking about roundtrips a little silly don't you think? Focussing on 1 or 2 extra roundtrips for Dynamic SQL statements (which can be batched as well, but that aside) as being a 'disadvantage', while roundtrips which are obvious are not optimized away because of the generic approach of layered design, doesn't seem to me a valid argument. OR you optimize the cr*p out of every line of code and remove every roundtrip there is OR you accept roundtrips exist and deal with the other bottlenecks to keep the performance as high as possible. You can't do both because optimization requires compromises on the design and vice versa.

I'm all for discussions with fair arguments. To me, using 'the amount of roundtrips' for technique A compared to technique B isn't a fair argument, because it is irrelevant when other aspects of a software system is taken into account. Optimization doesn't start with optimizing tiny loops. It starts with optimizing the general algorithms used.

Posted Wednesday, November 19, 2003 11:23 AM by FransBouma | 10 comment(s)

Stored procedures are bad, m'kay?

Let me start with a blunt statement: stored procedures are bad, they are a bad way to formulate data-access logic. I can't state that enough. Today I stumbled into a blog by Rob Howard, which tries to convince the reader that whatever you do, use stored procedures! With tears in my eyes I've read the arguments he brings to the table. Some are silly and one is even based on completely wrong information and assumptions and so far away from the truth it hurts. I've blogged about stored procedures before (here, here and here) and I used them a lot for 8 years, but I'm now almost stored procedure 'free' for 8 months now, and I love the feeling. The reason is obvious: the stress of maintaining a lot of stored procedures, to write another stored procedure for each thing you want to do, is gone. Dynamic SQL is the future. (Dynamic SQL is generated on the fly by a generic piece of code which gets various data as input and generates a parametrized query from it. Which can be cached on the client and will be cached on the server. Dynamic SQL generated based on objects written in C# or VB.NET).

Let's walk through Rob's list of arguments and see what we can make out of it. Let me start by saying that Rob's list of arguments is the default list of arguments any DBA will offer you when you ask him/her why stored procedures are better. However, keep in mind that the DBA most likely will offer you that list to save his/her job.

  • Ad-hoc SQL Script is brittle
    Sure, it is generated on the fly, it is gone when it is executed (except for the execution plan, Rob, but more on that later). Rob comes with the argument that putting ad-hoc SQL in your BL code is not that good. Agreed, but who said that there is just one other option to solve that problem and that option is stored procedures? ... small changes to the database can have severe impacts on the application. Yes, that's right, however changes to a relational model will have always an impact on the application that targets that model: add a non-nullable column to a table. You can use stored procedures or ad-hoc queries, you have to change the calling code to make sure that column gets a value when a new row is inserted. For Ad-hoc queries, you change the query, and you're set. For stored procedures, you have to change the signature of the stored procedure, since the INSERT/UPDATE procs have to receive a value for the new column. This can break other code targeting the stored procedure as well, which is a severe maintenance issue. A component which generates the SQL on the fly at runtime, doesn't suffer from this: it will for example receive an entity which has to be saved to the database, that entity contains the new field, the SQL is generated and the entity is saved. No maintenance problems. With a stored procedure this wouldn't be possible.
  • Security
    Ah yes, the good old dead horse, security! The most heard argument in favor of stored procedures is that with using stored procedures, security can be controlled using a fine grained mechanism: place the execution rights on the procedure and you're set. Well... ever heard of role-based security? Rob Howard hasn't obviously. Security is a subject that has to be taken seriously, very seriously. Therefore, a good DBA defines user-roles in SqlServer. Users are added to those roles and rights are defined per role, not per user. This way, you can control which users can insert / update and which users can for example select or delete or have access to views in an easy way: in most applications you have 2 roles: the average user, and the administrator user which configures the application. Define 2 roles in the SqlServer database, place the users in the right role, define the rights per role on the tables and views and off you go: fine grained security which works, without a single stored procedure in sight. A new user is added? You just add it to a user role and it has the rights it should have.
    Another nice object in SqlServer, which I already mentioned earlier, is the view. Views are there to control which data is accessed on a column basis or row basis. This means that if you want user U to select only 2 or so columns from a table, you can give that user access to a view, not the underlying table. The same goes for rows in one or more tables. Create a view which shows those rows, filtering out others. Give access rights to the view, not the table, obviously using user-roles. This way you can limit access to sensitive data without having to compromise your programming model because you have to move to stored procedures. Views are totally ignored by Rob Howard, while the view is especially created for this purpose. See Books Online: Using Views as Security Mechanisms.
    Rob mentions SQL injection attacks. SQL queries which are constructed by simply concatenating data into a query statement are indeed open for SQL injection attacks. However because ADO.NET has a great parameter support, why would anyone neglect this fine instrument? A good Dynamic SQL engine creates parametrized queries, which are not only faster (because the execution plan is cached, yes Rob, it is), they are also not open for SQL injection attacks due to the parameters.
  • Performance
    Everyone who thinks stored procedures are pre-compiled, say "Aye!". Whoa, what a noise! For all of you who said "Aye!" a few seconds ago: open SqlServer's Books Online (v7 or v2000, doesn't matter), search for "cache execution plan". You'll find fine articles like "Execution Plan Caching and Reuse" and "SQL Stored Procedures". Let me just quote some lines from the "SQL Stored Procedures" article:
    SQL Server 2000 and SQL Server version 7.0 incorporate a number of changes to statement processing that extend many of the performance benefits of stored procedures to all SQL statements. SQL Server 2000 and SQL Server 7.0 do not save a partially compiled plan for stored procedures when they are created. A stored procedure is compiled at execution time, like any other Transact-SQL statement. SQL Server 2000 and SQL Server 7.0 retain execution plans for all SQL statements in the procedure cache, not just stored procedure execution plans.
    I didn't make that up, people. It's there, for a long time (since SqlServer 7.0).
    However, what does Rob Howard say? I quote:
    There are also internal performance benefits to SQL Server for using stored procedures vs. ad-hoc SQL script. When stored procedures are used SQL Server can cache or pre-compile the ‘execution plan’ that it uses to execute the SQL vs. having to recalculate the execution plan on each request.
    Isn't that a little wrong, Rob? There is no precompilation of stored procedure code in SqlServer. It caches execution plans for each query, also ad-hoc queries. Even better: it will parametrize queries which don't even have parameters to keep the execution plan in the cache!
    Rob also claims that you can better loop through rows in T-SQL (in 'many cases') than in the data-access layer because doing it in T-SQL is faster (according to Rob). Now, looping through rows in T-SQL requires a cursor. A cursor will always create a temp table in tempdb, which can hurt performance pretty bad if tempdb is full and has to be resized. Doing a lot of looping will require that sooner or later. Besides that, T-SQL is not a procedural language, it's a set-based language (© Joe Celko). This means that the only power T-SQL has over procedural languages is that it can work on sets of data very fast. Looping through a resultset means that you didn't understand what set-based means or what you want can't be done in a set-based batch of statements. It's then better to choose the best language for the procedural statements instead of using a set-based language for this and feed it the least amount of data to perform its task.
    Now, let me add something about performance here. Say, you have a database with 100 tables, with an average of 7 fields per table. This requires 100 update stored procedures, at least (the crUd procs). These 100 procedures will have on average 7 parameters to update all fields at once, because you can't create optional update statements in stored procedures. You can't create stored procedures for all possible combinations either, that would require 100*7! procedures, which is quote a lot. A component which will create the SQL on the fly will tailor the UPDATE query to the job it has to do, for example updating that single varchar field in a row with 2 blob fields. The blob fields aren't passed to the database again, because that field isn't updated. Another nice thing about creating SQL on the fly is that you can create the WHERE predicate on the fly. Foreseeing every possibly WHERE predicate for an UPDATE stored procedure is undoable. You can fall back to optional parameters, but that will do you no good either, because COALESCE isn't that great when it comes to performance. (see my previous blogs about dynamic SQL and stored procedures).

Stored procedures also will open up a maintenance problem. The reason for this is that they form an API by themselves. Changing an API is not that good, it will break a lot of code in some situations. Adding new functionality or new procedures is the "best" way to extend an existing API. A set of stored procedures is no different. This means that when a table changes, or behaviour of a stored procedure changes and it requires a new parameter, a new stored procedure has to be added. This might sound like a minor problem but it isn't, especially when your system is already large and runs for some time (when it becomes 'legacy' but the amount of time and money invested in the system is that huge that replacing it will cost more than altering the current system). Every system developed runs the risk to become a legacy system that has to be maintained for several years. This takes a lot of time, because the communication between the developer(s) who maintain/write the stored procedures and the developer(s) who write the DAL/BL code has to be intense: a new stored procedure will be saved fine, however it will not be called correctly until the DAL code is altered. When you have Dynamic SQL in your BL at your hands, it's not a problem. You change the code there, create a different filter, whatever you like and whatever fits the functionality to implement. A good O/R mapper helps you with this. Microsoft also believes stored procedures are over: Microsoft's next generation business framework MBF is based on Objectspaces, which generates SQL on the fly.

I too find writing code like string s = "SELECT * FROM Foo WHERE Bar = " + barValue; in your code not the right thing to do. However the alternative is not stored procedures, it's a component that generates this SQL on the fly so you don't have the disadvantages of stored procedures and have the advantages of generating the SQL you need. Writing such a component is a one-time effort, you can reuse that component each time you access a database. Stored procedures are not the answer, Dynamic SQL is.

Posted Tuesday, November 18, 2003 12:33 PM by FransBouma | 185 comment(s)

Here's a (big, bad) difference between VB.NET and C#

Ricardo blogs about the differences between C# and VB.NET. Well, I'll give you one, which is very harmful in some areas.

Class C is a derived class from DataTable. DataTable implements ISerializable, but does this private. Class C has a new member variable which has to be serialized as well. Simply adding '[Serializable]' to Class C doesn't work, because a base class already implements ISerializable. So C has to implement ISerializable too.

Because DataTable implements ISerializable private (ISerializable.GetObjectData() is private), this can be a problem, since I have to do the complete serialization of C and the data in its base class by hand. Dino Esposito wrote an article about that, it's located here.

Using Dino's article, I can implement ISerializable and the GetObjectData() code to serialize the data and the private membervariable of C. That is... in C#. In VB.NET you can't, because the VB.NET compiler will throw an error that a base class of C already implements ISerializable. There is no way in VB.NET to re-implement ISerializable or to make the serialization formatter call a GetObjectData() method defined in C to do the serialization, simply because it will see the GetObjectData() method of DataTable as the implementation of ISerializable.GetObjectData(), which will serialize the DataTable contents, but not private member variables of derived classes.

Now, it took me some time to work around this (create a C# class with the member variable, derive the VB.NET class from that class) and it still isn't a solution which will work in all cases. In other words: VB.NET lacks re-implementation of interface members and with classes like DataTable in .NET (some winforms controls also have some interface members implemented privately, so you can't re-implement them) which have a privately implemented ISerializable, VB.NET can be a struggle in some situations, so be careful which language to pick for some of your classes.

Posted Wednesday, November 12, 2003 11:15 PM by FransBouma | 15 comment(s)

(Dutch) Tot over een uur op de DotNed meeting!

Voor de mensen die vandaag naar de DotNed meeting gaan in Delft, ik ben er ook, dus wellicht tot zo! :) Het zal bijzonder worden, Dino Esposito zal een presentatie houden over de internals van ASP.NET, alleen dat is al de moeite waard :) Ik zal proberen morgen een klein verslagje te tikken, mocht tijd dat toelaten.

Posted Wednesday, November 05, 2003 1:59 PM by FransBouma | 6 comment(s)

Filed under:

VB.NET developers, continued

The article I posted yesterday received the expected replies and I hope in the future it will show people how to look at software, the users of that software and on the various 'holy wars' that are still going on (and probably will be). Today

Paul Vick posted today an explanation why VB.NET will not have a Refactoring / Refactor menu, however it will have several features which will end up in the Refactoring / Refactor menu in C#. In short, he tries to explain why Microsoft made the decision to force VB.NET developers to use refactoring methods/functions but are not made familiar with the term. As I blogged yesterday that Microsoft is also to blame for the prejudice VB.NET developers have to face day in, day out, this is a good example of how Microsoft makes it hard for VB.NET developers to be seen as highly skilled developers. Below is the explanation for this, which I also posted as a comment to Paul's blog. I truly hope Paul Vick will reconsider their decision and will simply add the Refactoring / Refactor menu to the VB.NET IDE, so VB.NET developers who don't know the term will learn the term and discussions with other developers will show the VB.NET developer knows what he/she's talking about.

In a meeting C# developers, software architects and VB.NET developers are discussing the new product design. 2 groups understand the term 'Refactoring', it's a common IT/developer term. One group doesn't, because their IDE doesn't use it, or better: AVOIDS it to use it. Therefore, VB.NET developers can't participate in the discussion as equals, simply because they're never confronted with a general term in IT business however they're working with it every day. Not only does this hurt the quality of the discussion (you have to explain to them Refactoring is simply a term for a group of functions they already use), it also gives ammo to those who think VB.NET developers are less skilled and/or less smart.

Paul typed a long text to explain why Microsoft won't add a menu 'Refactoring'. I wonder why Paul went through all this trouble plus giving VB.NET developers a hard time in the future, while the opposite is much easier and will not give VB.NET developers a hard time: simply add the menu, add a simple explanation to the documentation what Refactoring is and you're set. VB.NET users who are not familiar with the term can google on the term, learn more, buy books about it etc., the same way as their C# counterparts will do when they open vs.net and see the term in the menu or read about it in the documentation. The C# developers will be able to discuss ways of develop software with java developers and system architects, even software development researchers on the university (e.g. researchers who look for methods to build better software faster, which resulted in methods like eXtreme Programming, which requires lots of refactoring). The VB.NET developer will not, because the terms used by the discussion partner are not familiar (in general) to the VB.NET developer, making the VB.NET developer look like a second grade developer.

I really don't understand why Microsoft puts in all this energy to explain a bad decision while the right decision is simple, appropriate and easy to explain. Fix it while you still can, I'd say :)

Posted Wednesday, November 05, 2003 12:04 PM by FransBouma | 11 comment(s)

Filed under:

VB Developers should learn to take criticism

A couple of days now I stumble across blogs which are written by VB.NET developers who feel offended by some remark of some MS employee I will not mention here or feel offended because someone criticizes VB.NET. I feel sorry for those VB.NET developers who feel offended: they do not understand that it isn't something personal, it's just criticism on a language and, sorry, every soul has the right to have that criticism. Sometimes it's criticism on the developers using VB.NET and then again, sorry, this is sometimes true, albeit it is based on generalization which is something that should be noted as well.

I've written VB5/6 COM components for IIS applications for years, but was really relieved when I could make the transition to C#. I really don't like VB or VB.NET, because of the various silly constructs that are in the language, like the most stupid construct ever created by man: "_". That's a personal opinion and perhaps not even a rational one. However saying that you find VB or VB.NET a really bad language makes some people get upset and make them start writing blogs about how wrong you are, not to mention the hate-mail you will receive or the "You are wrong!" kind of replies in blog comments.

These people don't get it. Using a language is not a way of life. It's simply that: using a language. When someone has a different opinion about the language you use than you have, it's just that: a different opinion. When you get personally offended by reading someone's opinion about a given language, you should take a step back, look in the nearest mirror and say out loud from the top of your lungs: "It's just software, for crying out loud!". Do you think I get offended when someone slams C#, the language I use every day? I'm more than happy to join the person at times. The reason for this is that criticism founded on rationality (a.k.a. constructive criticism) is good, it will make the subject of the criticism better in the long run. Criticism based on non-rational claims are not constructive and are just 'opinions' and can be ignored. "Java top designer slams C#"... gee... how surprising. I really feel offended now... 10, 9, 8.... . But perhaps he has something valuable to say which is true. If that's the case, embrace that criticism and follow it, support it and perhaps MS will take note of it and change something.

It seems however that having criticism about VB.NET is not welcomed to say the least, because of the set of people who get/feel offended by the criticism and ventilate that in every occasion they can find, like uber-zealous [insert random technology here]-advocates. To those I'd like to say: stop it. Look for the reasoning behind the criticism and when you find something based on rationality, embrace it, support it and also try to criticize your own language of choice so it will make it better in the long run. When someone slams the language you use, don't slam that person because you feel personally offended, try to find out if the reasons for the criticism are rational enough to follow them. After all, it's just software, a tool, not a way of life.

This is also true when you want to make clear that in some situations C# is better than VB.NET. These kinds of remarks get very fast out of hand, people start shouting "Don't start a language war!". Why would a couple of remarks about why language X sucks because of A, B and C and language Y doesn't result in a language war? Because of the people who feel personally offended by the criticism on their language X. Until these people grow up and get a life, the language X will not get better and questions about why their language X sucks because of A, B and C will not be answered. That's a shame, because it is so easy to debate pro's and con's about a given language so that given language will get better and better.

Not all blame is with the 'offended people' though. Microsoft itself helps prejudices about the IQ and skill set of the average VB.NET developer keep on living: refactoring is refactoring. It's called that way in the IT business, it's a general term. Therefore, don't re-name that. partial types is a term that is general within .NET. Do not change that in VB.NET. It will alienate these people from general discussions. Also remove silly constructs from VB.NET like the "_" character to concatenate keywords that form a single statement (the compiler doesn't need that character, it can perfectly find out where the statement ends) and On Error * and make it a language which doesn't drive people away from.

Posted Tuesday, November 04, 2003 11:45 AM by FransBouma | 26 comment(s)

Filed under:

More Posts