Thursday, March 8, 2012

Creating a generic package to import a variable number of columns

Hi,

We are building an application with

a database that contains Jobs. These Jobs have properties like Name, Code etc.

and some custom properties, definable by the application admin. For bulk import

of Jobs, we want to allow the import of an Excel sheet with the columns Name,

Code and a variable amount of columns. If the header names of these columns in

the Excel sheet match the name of a custom property in the system we want to add

the value of that cell into the database as property

value.

In our Data Flow of our Import

Package in SSIS we added an Excel Source that points to a test excel sheet with

the Name and Code columns and – for this example - 3 custom property columns

(Area, Department, Job Family). When we configure the Excel Source in the Excel

Source Editor, we have the option to select the Columns from the Available

External Columns table. But here lays the problem, we do not know at design

time, what custom property columns to expect. We DO expect the Name and Code

columns, but the rest is uncertain at design-time.

That raises the question: Is there

some way to select all of any incoming columns (something like a SELECT * in

T-SQL)? This looks like a big problem since it would mean that the .DTSX XML that is

being generated at design-time would need to be updated at run-time to reflect

the variability of the columns that might be encountered while reading the excel

sheet.

Then, we thought, we could add a Script

Component to our data flow that passes some kind of DataSet (or DataReader) in

which we can walk through the columns ourselves? But then still, we miss the

option to include ANY of the columns while reading an Excel sheet (or any other

datasource by the looks of it)

We are aware of the option of

optional columns in combination with the RaggedRight option, but it seems that

we would have to put all of the columns of a row in just one column and then

extract all the columns later with Derived Columns. But then, since the source

import file is being prepared by an application admin, we want don’t want to

burden him with this horrendous task of putting everything in one

column.

We would like to have some way of

iterating through all the columns, either in a Script Component or maybe with a

Pivot/Unpivot mechanism.

Does anyone have any suggestions? Are there other options we should have considered?

The metadata of the pipeline is fixed at design-time. You cannot change the columns at runtime.

-Jamie

|||

Since you're importing from Excel, you may be able to define a dataflow that reads the maximum amount of columns you anticipate ever having in one Excel file. The Excel files with less columns would return empty strings for the non-existent columns.

Haven't tried this, but it might work.

K

No comments:

Post a Comment