Mike Bosch's Blog on .NET

Agile enterprise architecture in .NET, SOA, WCF, WS-*, AJAX, MVC, Sharepoint and more...

SSIS - Skipping rows and stripping subtotals in Excel

I was working on an SSIS project to dump data from an Excel spreadsheet into a database table.  However, the Excel file happened to be a beautifully formatted, colorful, presentation of charts and graphs, subtotal rows and all sorts of interesting images.  The actual data I needed to dump didn't even start until about 15 rows down and 6 rows across.  My mind started wandering on some custom component that would skip rows, strip subtotal columns and basically rewrite the entire spreadsheet.  But SSIS had better plans.

Thank goodness for the OpenRowset() property on the Excel Datasource component. This property allows you to specify the range to be considered by the datasource in the format Sheet1$B15:Z2000.  By specifying a range, I was able to ignore all the titles and data spread all over the sheet and concentrate just on my data range.  I thought skipping rows and ignoring headers / titles was going to add considerable hours to this project. I was wrong.

The next challenge was getting rid of the subtotal rows.  Subtotal rows were distinguished by a cell with the word "RESULT" and an empty cell next to it.  I added a Conditional Split using the ISNULL([expression]) to separate "real data rows" from "subtotal rows".  After this, the import was straightforward into a OLEDB Destination.  

Posted: Oct 26 2007, 11:50 PM by MikeBosch | with 27 comment(s) |
Filed under: ,

Comments

Jonas said:

Brilliant, I had the same problem with skipping over a few rows in the beginning. This helped a lot.

# October 29, 2007 6:51 AM

Half Abude said:

Great tip! It would have helped me a lot if I had found it by the time I was dealing with the same scenario... I've spend a lot of time learning how to read Excel files via Script Component, and I thinks that setting OpenRowSet property would be enough for almost all files. Well, at least it worth the learning.

BTW, I've mentioned your post in my blog (it's in Portuguese).

# March 6, 2008 8:03 AM

surya said:

In the script task ReadOnlyVariables excelfile

Readwritevariables-excel tables

Imports System

Imports System.Data

Imports System.Math

Imports Microsoft.SqlServer.Dts.Runtime

Imports System.IO

Imports System.Xml

Imports ADODB

Imports System.Data.OleDb

Imports System.Data.SqlClient

Public Class ScriptMain

   Public Sub Main()

       Dim excelFile As String

       Dim connectionString1 As String

       Dim connectionString2 As String

       Dim excelConnection As OleDbConnection

       Dim dbconnection As SqlConnection

       Dim tablesInFile As DataTable

       Dim tablesindb As DataTable

       Dim tableExcel As DataTable

       Dim tableCount As Integer = 0

       Dim tableDBcount As Integer = 0

       Dim tableInFile As DataRow

       Dim tableInDB As DataRow

       Dim currentTable As String

       Dim currentDBTable As String

       Dim tableIndex As Integer = 0

       Dim excelRecordSet As ADODB.Recordset

       Dim excelDataReader As OleDbDataReader

       Dim intSuccess As Integer

       Dim strTableName As DataTable

       Dim excelTables As String()

       excelFile = Dts.Variables("excelfile").Value.ToString

       connectionString1 = "Provider=Microsoft.Jet.OLEDB.4.0;" & _

           "Data Source=" & excelFile & _

           ";Extended Properties=Excel 8.0"

       excelConnection = New OleDbConnection(connectionString1)

       excelConnection.Open()

       connectionString2 = "Data Source=MEDLINK64BIT;Initial Catalog=RCMS_VALIDATION; Integrated Security=True;"

       dbconnection = New SqlConnection(connectionString2)

       dbconnection.Open()

       tablesInFile = excelConnection.GetSchema("Tables")

       tableCount = tablesInFile.Rows.Count

       ReDim excelTables(tableCount - 1)

       For Each tableInFile In tablesInFile.Rows

           currentTable = tableInFile.Item("TABLE_NAME").ToString

           excelTables(tableIndex) = currentTable

           tableIndex += 1

       Next

       Dts.Variables("exceltables").Value = excelTables

       Dim i As Integer

       For i = 0 To excelTables.Length - 1

           If excelTables(i).Substring(0, 6) = "SOURCE" And excelTables(i).Substring(excelTables(i).Length - 1, 1) = "$" Then

               Dim command As New OleDbCommand("select * from [" & excelTables(i).ToString() & "A:AL]", excelConnection)

               Dim rdr As OleDbDataReader = command.ExecuteReader

               Dim BulkCopy As New SqlBulkCopy(dbconnection)

               BulkCopy.DestinationTableName = "Op_PaymentMatch_SOURCE"

               BulkCopy.WriteToServer(rdr)

           End If

       Next

       Dts.TaskResult = Dts.Results.Success

   End Sub

   Private Function ReadVariable(ByVal varName As String) As Object

       Dim result As Object

       Try

           Dim vars As Variables

           Dts.VariableDispenser.LockForRead(varName)

           Dts.VariableDispenser.GetVariables(vars)

           Try

               result = vars(varName).Value

           Catch ex As Exception

               Throw ex

           Finally

               vars.Unlock()

           End Try

       Catch ex As Exception

           Throw ex

       End Try

       Return result

   End Function

   Private Sub WriteVariable(ByVal varName As String, ByVal varValue As Object)

       Try

           Dim vars As Variables

           Dts.VariableDispenser.LockForWrite(varName)

           Dts.VariableDispenser.GetVariables(vars)

           Try

               vars(varName).Value = varValue

           Catch ex As Exception

               Throw ex

           Finally

               vars.Unlock()

           End Try

       Catch ex As Exception

           Throw ex

       End Try

   End Sub

End Class

# March 7, 2008 1:34 AM

Marts Liena said:

I have a similar problem. I have a large text dataset of 220,000 lines and I want to import every 10th line into excel. How is this done?

# April 27, 2008 2:49 AM

MikeBosch said:

I would imagine you want to import the entire dataset first.  Then, in a loop, keep a counter variable and everytime you go to the next line, increase that counter.  When the counter reaches 10, import the line, then reset the counter to 0 and continue the loop.

# April 29, 2008 4:55 PM

John said:

Are you using Studio 2005 or 8? 5 dont't take 'sheet1@A9:Q21'

# June 11, 2008 3:37 PM

MikeBosch said:

This was using VS 2005.  I haven't tested it in VS 2008.

# June 11, 2008 5:23 PM

Possum Jones said:

What about if the sheet name has a space in it (e.g. [New Sheet1]$. Ive tried every combination of brackets ([]) and single-quotes('') that I can think of and SSIS cannot recognize the table (worksheet) with the range with a space in the name.

# September 9, 2008 6:06 PM

Maria Braun said:

Thanks, this helped a lot

# September 22, 2008 12:48 PM

Joe B said:

I have the same quaestion as Possum Jones. My sheet has a space in the name.

# October 14, 2008 12:36 PM

GNEGurl said:

Set the data access mode to 'SQL Command'

and try something like this:

SELECT F1,F2,F3,F4,F5,F6,F7,F8,F9,F10,F11,F12,F13,F14,F15,F16,F17

FROM

[Sheet Name$A2:V2000]

# November 5, 2008 3:06 PM

ABC said:

Thanks for this information.This really help me out.But I am facing problem when I export data to excel using Data Conversion + Excel Destination.It add single quote at start of every cell data .Also enter key comes as charachter of data.

Please help me how to resolve it...

# December 17, 2008 8:02 AM

Mahinath said:

Hi,

This sounds great. But can anyone tell me how to use Openrowset. Should I change the row numbers in Properties window or is there any other way. When I change the row numbers in properties window I am getting an 0x80040E37 error.

Please help me to solve this.

Thank you.

# December 18, 2008 12:46 PM

narsi said:

it worked, earlier i have tried many other ways but unsuccessful, this worked

# December 30, 2008 2:36 PM

Pete said:

Is there a way to just skip the first five lines.  I'm using the "Sheet1$B15:Z2000" which works great.  The problem I am having is I know where the data starts and how many columns but the number of rows I'll be pulling is unknown.  Is there a way to just specify only the first cell "Sheet1$B15:*"

# January 19, 2009 6:27 PM

Pete said:

I tried using a large number (4000) that would probably cover the upper bound of possible rows.  And then used the table(0).row.count function to find the number of actual data rows returned but the count number didn't always have the correct number of data rows.  If I had 5 data rows the count would be 5.  If I had 7 data rows the count would be 95

# January 20, 2009 1:32 AM

anoopmms said:

Thanks a lot..... this solved my prblm

# March 10, 2009 4:38 AM

Nutshell said:

Thanks - this worked out perfectly for what I was needing (skipping first 4 rows - and ignoring column A)

# March 24, 2009 6:53 PM

zinou92 said:

Great, thank 's a lot for this magic property

# March 25, 2009 11:52 AM

Murray said:

To answer a couple of the questions posted:

1) To change the OpenRowSet property in SSIS, right click the OLEDB source and select "Show Advanced Editor".  Go to Component Properties tab.  It's the last property.  Set it like this: MySheet$A1:X10

2) To use an entire column, and not hve to count the records, use this syntax: MySheet$A5:X

This example starts on row 5 and process all records in columns A thru X.  

Murray

# September 17, 2009 5:58 PM

Madan said:

How to skip TOP 2 OR 3 rows in Excel using c#

# November 2, 2009 8:49 AM

Chamara said:

Murray,

That is simply awsome! You saved my day.

# November 11, 2009 4:15 PM

Saaron said:

Thanks a lot.

This works fine with SSIS 2008 as well. I have just implemented the same on SQL 2008 and it works just fine!

# February 11, 2010 11:34 AM

anupam mondal said:

Great tip Mike! I used this trick in an SSIS package to populate an excel template.

Works like a charm!

cheers,

anupam

# April 19, 2010 1:04 AM

Mosaic said:

Just what I needed!

# August 16, 2010 11:13 AM