Mike Bosch's Blog on .NET

Agile enterprise architecture in .NET, SOA, WCF, WS-*, AJAX, MVC, Sharepoint and more...

SSIS - Skipping rows and stripping subtotals in Excel

I was working on an SSIS project to dump data from an Excel spreadsheet into a database table.  However, the Excel file happened to be a beautifully formatted, colorful, presentation of charts and graphs, subtotal rows and all sorts of interesting images.  The actual data I needed to dump didn't even start until about 15 rows down and 6 rows across.  My mind started wandering on some custom component that would skip rows, strip subtotal columns and basically rewrite the entire spreadsheet.  But SSIS had better plans.

Thank goodness for the OpenRowset() property on the Excel Datasource component. This property allows you to specify the range to be considered by the datasource in the format Sheet1$B15:Z2000.  By specifying a range, I was able to ignore all the titles and data spread all over the sheet and concentrate just on my data range.  I thought skipping rows and ignoring headers / titles was going to add considerable hours to this project. I was wrong.

The next challenge was getting rid of the subtotal rows.  Subtotal rows were distinguished by a cell with the word "RESULT" and an empty cell next to it.  I added a Conditional Split using the ISNULL([expression]) to separate "real data rows" from "subtotal rows".  After this, the import was straightforward into a OLEDB Destination.  

Posted: Oct 26 2007, 11:50 PM by MikeBosch | with 9 comment(s) |
Filed under: ,

Comments

Jonas said:

Brilliant, I had the same problem with skipping over a few rows in the beginning. This helped a lot.

# October 29, 2007 6:51 AM

Half Abude said:

Great tip! It would have helped me a lot if I had found it by the time I was dealing with the same scenario... I've spend a lot of time learning how to read Excel files via Script Component, and I thinks that setting OpenRowSet property would be enough for almost all files. Well, at least it worth the learning.

BTW, I've mentioned your post in my blog (it's in Portuguese).

# March 6, 2008 8:03 AM

surya said:

In the script task ReadOnlyVariables excelfile

Readwritevariables-excel tables

Imports System

Imports System.Data

Imports System.Math

Imports Microsoft.SqlServer.Dts.Runtime

Imports System.IO

Imports System.Xml

Imports ADODB

Imports System.Data.OleDb

Imports System.Data.SqlClient

Public Class ScriptMain

   Public Sub Main()

       Dim excelFile As String

       Dim connectionString1 As String

       Dim connectionString2 As String

       Dim excelConnection As OleDbConnection

       Dim dbconnection As SqlConnection

       Dim tablesInFile As DataTable

       Dim tablesindb As DataTable

       Dim tableExcel As DataTable

       Dim tableCount As Integer = 0

       Dim tableDBcount As Integer = 0

       Dim tableInFile As DataRow

       Dim tableInDB As DataRow

       Dim currentTable As String

       Dim currentDBTable As String

       Dim tableIndex As Integer = 0

       Dim excelRecordSet As ADODB.Recordset

       Dim excelDataReader As OleDbDataReader

       Dim intSuccess As Integer

       Dim strTableName As DataTable

       Dim excelTables As String()

       excelFile = Dts.Variables("excelfile").Value.ToString

       connectionString1 = "Provider=Microsoft.Jet.OLEDB.4.0;" & _

           "Data Source=" & excelFile & _

           ";Extended Properties=Excel 8.0"

       excelConnection = New OleDbConnection(connectionString1)

       excelConnection.Open()

       connectionString2 = "Data Source=MEDLINK64BIT;Initial Catalog=RCMS_VALIDATION; Integrated Security=True;"

       dbconnection = New SqlConnection(connectionString2)

       dbconnection.Open()

       tablesInFile = excelConnection.GetSchema("Tables")

       tableCount = tablesInFile.Rows.Count

       ReDim excelTables(tableCount - 1)

       For Each tableInFile In tablesInFile.Rows

           currentTable = tableInFile.Item("TABLE_NAME").ToString

           excelTables(tableIndex) = currentTable

           tableIndex += 1

       Next

       Dts.Variables("exceltables").Value = excelTables

       Dim i As Integer

       For i = 0 To excelTables.Length - 1

           If excelTables(i).Substring(0, 6) = "SOURCE" And excelTables(i).Substring(excelTables(i).Length - 1, 1) = "$" Then

               Dim command As New OleDbCommand("select * from [" & excelTables(i).ToString() & "A:AL]", excelConnection)

               Dim rdr As OleDbDataReader = command.ExecuteReader

               Dim BulkCopy As New SqlBulkCopy(dbconnection)

               BulkCopy.DestinationTableName = "Op_PaymentMatch_SOURCE"

               BulkCopy.WriteToServer(rdr)

           End If

       Next

       Dts.TaskResult = Dts.Results.Success

   End Sub

   Private Function ReadVariable(ByVal varName As String) As Object

       Dim result As Object

       Try

           Dim vars As Variables

           Dts.VariableDispenser.LockForRead(varName)

           Dts.VariableDispenser.GetVariables(vars)

           Try

               result = vars(varName).Value

           Catch ex As Exception

               Throw ex

           Finally

               vars.Unlock()

           End Try

       Catch ex As Exception

           Throw ex

       End Try

       Return result

   End Function

   Private Sub WriteVariable(ByVal varName As String, ByVal varValue As Object)

       Try

           Dim vars As Variables

           Dts.VariableDispenser.LockForWrite(varName)

           Dts.VariableDispenser.GetVariables(vars)

           Try

               vars(varName).Value = varValue

           Catch ex As Exception

               Throw ex

           Finally

               vars.Unlock()

           End Try

       Catch ex As Exception

           Throw ex

       End Try

   End Sub

End Class

# March 7, 2008 1:34 AM

Marts Liena said:

I have a similar problem. I have a large text dataset of 220,000 lines and I want to import every 10th line into excel. How is this done?

# April 27, 2008 2:49 AM

MikeBosch said:

I would imagine you want to import the entire dataset first.  Then, in a loop, keep a counter variable and everytime you go to the next line, increase that counter.  When the counter reaches 10, import the line, then reset the counter to 0 and continue the loop.

# April 29, 2008 4:55 PM

John said:

Are you using Studio 2005 or 8? 5 dont't take 'sheet1@A9:Q21'

# June 11, 2008 3:37 PM

MikeBosch said:

This was using VS 2005.  I haven't tested it in VS 2008.

# June 11, 2008 5:23 PM

Possum Jones said:

What about if the sheet name has a space in it (e.g. [New Sheet1]$. Ive tried every combination of brackets ([]) and single-quotes('') that I can think of and SSIS cannot recognize the table (worksheet) with the range with a space in the name.

# September 9, 2008 6:06 PM

Maria Braun said:

Thanks, this helped a lot

# September 22, 2008 12:48 PM
Leave a Comment

(required) 

(required) 

(optional)

(required)