SSIS Bulk Insert task that bit me in the butt...

Published 02 October 09 12:22 PM | MikeD

I've been working on SSIS packages that extract data from production databases and put them into data warehouses, and recently I hit an issue using the Bulk Insert task that bit me real good.

 When you create a Bulk Insert task in the control flow of your package, the properties you generally edit are:

1. The target connection (which references a connection manager)

2. The target table

3. The source file (which references a file-type connection manager).

 I did that, ran the package in Visual Studio with my local file against a dev SQL database on a test server and it all worked just fine.

I ran it again, and it failed, due to a primary key violation - so I needed to make the execution of the task conditional, so long as the table was empty, I would run the task, otherwise if it contained anything, I would skip the task.

This was harder to do than I thought it would be. I started by creating a variable to hold the row count of the table, then an Execute Sql Task to run a statement on the target table (select count(*) as RowCount from targetTable) and set the variable value to the column in the resultset of the statement.

Now I go to look for an IF construct and there isn't any such thing. The closest was a For Next loop that I went down a rabbit-trail trying to use, and having it execute only zero or once, and I couldn't get that to work. Is there magic between the @variable syntax in the initialize, condition, and iteration expressions and the package user:variable declarations that make those work together? I still don't know the answer to that.

Then I thought of using the Expression on the dependency arrow from the task that got the row count from the target table. So I joined the Row Count task to the Bulk Insert task using the green arrow, then edited the dependency to be dependent both on Success of the row count task and the value of the user:rowCount variable I had created. That worked.

 Believe it or not, that isn't really what bit me in the butt.

Now I had a package that I could execute multiple times and it would work properly. My buddy Jeremy would say that it is "idempotent".

 What bit me was when I went to execute the package in another environment.

 I moved the .dtsx file to a test server and used the Execute Package Utility. I set the values for the connection managers in the package to the new server connections (and the new location of the bulk copy file), and ran the package, and it worked.

Just to make sure it was "idempotent", I ran it again.

It failed this time.

Another PK violation. Why?

It took me a while to find the problem. Eventually it came down to the target table property of the Bulk Insert Task - the value of this property was not just a two part table name, but it also included the database name.

It just so happened that the database I was testing with from my Visual Studio is on the same server as when I was testing with the Execute Package utility.

So, the first time I ran it with the Execute Package utility with the modified connection manager settings, it was querying the *real* target database for the number of rows, and getting back 0. Then it executed the bulk insert task into the *original* database I was testing with on the server (that I happened to clear out the rows from the table), and the bulk insert worked. The second time, the number of rows was still 0, and it tried to do the bulk insert into the same database, despite the fact that the connection manager was pointing to a different database.

I can understand why this was done this way, since when you use the bcp command line utility, you need to database-qualify the table you are moving data into, because bcp doesn't specify the database otherwise. But the Bulk Insert task was using the T-SQL statement BULK INSERT which is already database specific, so you don't generally qualify the table name with the database. With the whole database name in the target table property, the task isn't very responsive to changes to the connection manager at runtime.

Here is how I fixed it, and it's a HACK.

You can't just free-form enter the table name in the Bulk Insert task, it only allows you to pick from the list, which is based off the source connection you specify. So, in the Expressions tab of the Bulk Insert task, I used an expression to set the source table property to the non-database-qualified name of the table. It's kinda hidden, and it now overrides whatever table you select from the drop down later, and you wouldn't know it.

I hope that the SSIS Bulk Insert task gets fixed so it doesn't include the database name in the target table; it isn't needed, and it gets in the way of runtime changes to the target connection.

Mike

Comments

No Comments