Buckle up, boys and girls, we’re going to be little DBA’s today ! (Data Base Administrators)
Earlier, we showed how to use LINQ over objects in memory. Today, we show how to write EXACTLY the same program, this time using DLINQ, over a real SQL-Express relational database. The whole DLINQ project, including the full data set of 3,053 rows, can be downloaded here. The earlier LINQ project – same program only over objects-in-memory, can be downloaded here. You will need the VB9 CTP (Customer Technology Preview), plus some patience. That is a pre-alpha version of VB, and you may have to create new projects and copy the source files (I have had some “inside” reports that this is so).
The point of both programs is to compare the time for a fast query to the time for a slow query. To deliver the punch line, even with 14 cities in a tiny DLINQ database, the slow query is 20 times slower than the fast one .
================================================================
SlowQuery1 found 14 cities, Time = 54.04 milliseconds
FastQuery1 found 14 cities, Time = 2.12 milliseconds
================================================================
With the full database of 3,053 cities, the results are crushing:
================================================================
SlowQuery1 found 3053 cities, Time = 46109.31 milliseconds
FastQuery1 found 3053 cities, Time = 40.53 milliseconds
================================================================
The data “entities” are from the Mondial database, a fun and free collection of facts and figures about the world. We start off small with just one “relationship” between cities and countries: each country has many cities, expressed through the “join condition” Country.Code == City.Country. In the in-memory case, we firmly establish that, practically speaking, it’s ALWAYS better to build an index, search the index, and throw the index away than to scan the relationship via a nested double loop. In the DLINQ case, we’ll use DLINQ to set up foreign keys in the Database and scan them in the fast query. Here is the query code for the DLINQ case (and we explain all the details below):
Sub SlowQuery1()
xs = Select New {Cty := it.t.Name, Cny := it.n.Name} _
From t In db.Cities, n In db.Countries _
Where it.t.Country = it.n.Code
xc = CountIEnum(xs)
End Sub
Sub FastQuery1()
xs = Select New {Cty := t.Name, Cny := t.MyCountry.Name} _
From t In db.Cities
xc = CountIEnum(xs)
End Sub
The neat thing is that DLINQ lets us express our relationship as a PROPERTY, “MyCountry,” on the City class. Neat. We don’t need an index in memory to get fast query. In my sample, I do build such an index, but I use it ONLY to set up the database quickly. I don’t use it at all in the queries and it’s absolutely not necessary.
Did I say that we’re making our SQL database FROM SCRATCH? Surprisingly, this is unusual. Almost all the samples I’ve seen magically install some big messy database and then show you how to manipulate it. They never show you how to create it. What’s up with that? I want to do EVERYTHING from my VB program. I don’t want to learn a bunch of SQL-Express GUI DBA tools too. Given the choice between “Hello World!” and “Hello Northwind!” I’ll take “Hello World!” every time. Wouldn’t you? Well, “Hello World!” translates into “Hello Mondial!”
Let’s start with class definitions. Remember, the Country class definition for our objects-in-memory LINQ version was:
Public Class Country
Public Name As String
Public Code As String 'Primary key
Public Capital As String
Public Province As String
Public Area As Integer
Public Population As Integer
End Class
The DLINQ version is, mostly, very similar. Visually, it’s more junky looking, but, fear not, I walk you through all of it:
<Table()> _
Public Class Country
<Column(DbType:="VARCHAR(32)")> _
Public Name As String
<Column(DbType:="VARCHAR(4)", Id:=True)> _
Public Code As String
<Column(DbType:="VARCHAR(35)")> _
Public Capital As String
<Column(DbType:="VARCHAR(32)")> _
Public Province As String
<Column(DbType:="INT")> _
Public Area As Integer
<Column(DbType:="INT")> _
Public Population As Integer
Private _myCities As New EntitySet(Of City)
<Association(Storage:="_myCities", OtherKey:="Country")> _
Public Property MyCities() As EntitySet(Of City)
Get
Return _myCities
End Get
Set(ByVal value As EntitySet(Of City))
_myCities.Assign(value)
End Set
End Property
End Class
The custom attribute <Table()> tells DLINQ that we want the class stored as a database table. The VB compiler ignores CLR custom attributes – it just copies them into assembly metadata so that libraries can read them through Reflection if they need to. That’s just what the DLINQ library does – it checks our class definitions at run time to see if they’re Tables.
Next, list out all the same fields as before, just decorated with <Column(…)> attributes to tell DLINQ what SQL types we want stored. I think there are some intelligent defaults I could have used, saving explicit specifying of these details, but I wanted to play it really safe, not testing those waters, so I slavishly set up column attributes for every field. The Mondial original was written in SQL, not VB, so it wasn’t hard to figure out the size of VARCHARS I needed and what not.
The new DLINQ EntitySet type and Association attribute are clearly the interesting things. They represent a recurring DLINQ pattern for one-to-many relationships. Over here, in the Country class, we’re on the ONE side of the one-to-many relationship. We must have a corresponding Association attribute on the MANY side, in the City class; we get to it in a minute. Let’s go through this side line-by-line:
Private _myCities As New EntitySet(Of City)
Ok, there is a private field that DLINQ will use behind our backs to store a “list” of cities, in effect, for each Country.
<Association(Storage:="_myCities", OtherKey:="Country")> _
Ok, DLINQ now knows the name of the storage field it can party on; it’s Country._myCities. It also knows the name of the field in the OTHER side of the association that points back to Country instances, here. Remember, our “join” condition is that City.Country == Country.Code. The OtherKey portion tells DLINQ that the field named “Country” in the OTHER side of the association hooks up to the primary key on THIS side of the relationship, which, oh-by-the-way, we already declared thus:
<Column(DbType:="VARCHAR(4)", Id:=True)> _
Public Code As String
The word “Country” in the fragment OtherKey:="Country" means the field named “Country” in the other side of the association, not the class named “Country” on this side of the association. Sorry about this, but that’s how Mondial was designed and we’re going to live with it.
Public Property MyCities() As EntitySet(Of City)
Get
Return _myCities
End Get
Set(ByVal value As EntitySet(Of City))
_myCities.Assign(value)
End Set
End Property
Ok, that is more-or-less standard stuff: it defines the public property allowing other parts of my program to interact with the EntitySet. DLINQ parties on _myCities behind the scenes, but we get to party on the public Property MyCities. The only thing to remember is to use the Assign method of EntitySet so DLINQ can keep track of the publicly made changes. Now, we just look at the City class and we’re done with almost everything difficult:
<Table()> _
Public Class City
<Column(DbType:="VARCHAR(35)", Id:=True)> _
Public Name As String
<Column(DbType:="VARCHAR(4)", Id:=True)> _
Public Country As String
<Column(DbType:="VARCHAR(35)", Id:=True)> _
Public Province As String
<Column(DbType:="INT")> _
Public Population As Integer
<Column(DbType:="FLOAT")> _
Public Longitude As Double
<Column(DbType:="FLOAT")> _
Public Latitude As Double
Private _myCountry As EntityRef(Of Country)
<Association(Storage:="_myCountry", ThisKey:="Country")> _
Public Property MyCountry() As Country
Get
Return _myCountry.Entity
End Get
Set(ByVal value As Country)
_myCountry.Entity = value
End Set
End Property
Ok, line-by-line. First, the primary key in this table is triplicate, consisting of the Name of the city, its Country code, and its province. Fair enough (how many “Springfields” ARE there, actually?) Let’s go straight to the Association. EntityRef(Of Country) on this side; a ThisKey whose name matches up with the OtherKey on Country. Once again, the word “Country” on both sides of the association means the field named “Country” in the City class, not to the Class named Country. City.Country holds a value that must be equal to the Code field value in class Country for every pair of City and Country participating in the relationship. Think about it for a minute and make sure you totally “get it” before moving on.
Last detail is that the Entity property on DLINQ’s EntityRef type is the one we must use to get and set EntityRefs on the background private party storage field, _myCountry.
There’s no freedom in the pattern, here. This is the exact way to express one-to-many relationships in DLINQ. Everything has to be wired up exactly this way. If we were NOT creating our database from scratch, we would just let the SQLmetal utility generate this goo for us from a pre-existing SQL schema, which we would have created from some other SQL toolsets I didn’t want to learn about. Furthermore, I don’t like stuff that generates VB code where I have no idea what’s going on, so I insisted on writing this sample by hand. They tried to tell me not to do it, but I march to a different drummer.
Ok, we’re done with almost all the hard stuff. Let’s get through some easy stuff:
Public Class Mondial
Inherits DataContext
Public Sub New(ByVal connection As String)
MyBase.New(connection)
End Sub
Public Countries As Table(Of Country)
Public Cities As Table(Of City)
End Class
DataContext is a DLINQ base class; we must inherit from it for our own database, and we must pass in a connection string when we create one of these. The connection string is the LAST bit of hard stuff. There is a cottage industry of helpage for these nasties: see http://www.connectionstrings.com for instance. They are amazingly brittle, so I just wish you luck. Here is the one I used (and I had to get help to construct it):
Private ReadOnly conn1 = "Data Source=.\SQLEXPRESS;AttachDbFilename=" & _
"'C:\Documents and Settings\brianbec\My Documents\Mondial.mdf';" & _
"Integrated Security=True;Connect Timeout=30;User Instance=True"
My suggestion is that you change only the highlighted thing to your user name. Even though I thought I understood this connection string, I found by bitter experience that I had no flexibility whatever with it. I couldn’t change the directory, the Timeout, the order of terms, NOTHING. If I looked at it sideways, I would get an exception from deep in the bowels.
Ok, that’s it, the rest is easy:
Module Module1
Public UseTinyData As Boolean = True
REM set to False to use big, slow data set
Public CountryFromCode As Dictionary(Of String, Country)
Private ReadOnly conn1 = ... (as before
Public db As Mondial
Public Sub Init()
CountryFromCode = New Dictionary(Of String, Country)
If db.DatabaseExists Then
db.DeleteDatabase()
End If
db.CreateDatabase()
If UseTinyData Then
InMemoryTinyDataInitializer.CountryData(db)
InMemoryTinyDataInitializer.CityData(db)
Else
InMemoryDataInitializer.CountryData(db)
InMemoryDataInitializer.CityData(db)
End If db.SubmitChanges()
End Sub
That just sets up our data tables, and yes, we do use an in-memory index to quickly set up the foreign-key attributes of type EntityRef, but we don’t have to at all. We also have two different data sets, one tiny, for debugging and development, and the full one with 3,053 city rows.
Module InMemoryTinyDataInitializer
Private Sub AddCountry(ByVal db As Mondial, ByVal c As Country)
CountryFromCode.Add(c.Code, c)
db.Countries.Add(c)
End Sub
Public Sub CountryData(ByVal db As Mondial)
AddCountry(db, New Country {Name := "Albania", Code := "AL", Capital := "Tirane", Province := "Albania", Area := 28750, Population := 3249136})
AddCountry(db, New Country {Name := "Greece", Code := "GR", Capital := "Athens", Province := "Greece", Area := 131940, Population := 10538594})
End Sub
Private Sub AddCity(ByVal db As Mondial, ByVal c As City)
c.MyCountry = CountryFromCode(c.Country)
db.Cities.Add(c)
End Sub
Public Sub CityData(ByVal db As Mondial)
AddCity(db, New City {Name := "Tirane", Country := "AL", Province := "Albania", Population := 192000, Longitude := 10.7, Latitude := 46.2})
AddCity(db, New City {Name := "Shkoder", Country := "AL", Province := "Albania", Population := 62000, Longitude := 19.2, Latitude := 42.2})
AddCity(db, New City {Name := "Durres", Country := "AL", Province := "Albania", Population := 60000, Longitude := 19.3, Latitude := 41.2})
AddCity(db, New City {Name := "Vlore", Country := "AL", Province := "Albania", Population := 56000, Longitude := 19.3, Latitude := 40.3})
AddCity(db, New City {Name := "Elbasan", Country := "AL", Province := "Albania", Population := 53000, Longitude := 20.1, Latitude := 41.1})
AddCity(db, New City {Name := "Korce", Country := "AL", Province := "Albania", Population := 52000, Longitude := 20.5, Latitude := 40.4})
AddCity(db, New City {Name := "Athens", Country := "GR", Province := "Greece", Population := 885737, Longitude := 23.7167, Latitude := 37.9667})
AddCity(db, New City {Name := "Thessaloniki", Country := "GR", Province := "Greece", Population := 406413, Longitude := 22.95, Latitude := 40.6167})
AddCity(db, New City {Name := "Piraeus", Country := "GR", Province := "Greece", Population := 196389, Longitude := Nothing, Latitude := Nothing})
AddCity(db, New City {Name := "Patrai", Country := "GR", Province := "Greece", Population := 142163, Longitude := Nothing, Latitude := Nothing})
AddCity(db, New City {Name := "Larisa", Country := "GR", Province := "Greece", Population := 102426, Longitude := Nothing, Latitude := Nothing})
AddCity(db, New City {Name := "Iraklion", Country := "GR", Province := "Greece", Population := 102398, Longitude := Nothing, Latitude := Nothing})
AddCity(db, New City {Name := "Volos", Country := "GR", Province := "Greece", Population := 71378, Longitude := Nothing, Latitude := Nothing})
AddCity(db, New City {Name := "Kavalla", Country := "GR", Province := "Greece", Population := 56705, Longitude := Nothing, Latitude := Nothing})
End Sub
End Module
I’ll leave the main routine and the timing details to the project I uploaded. Have fun!