Archive for the ‘SSIS’ Category

I use this script all the time when setting up an SSIS package. (Unfortunately, I can’t remember where I found the original code. I’ve adapted it slightly, so if anyone recognises the original then let me know and I’ll link to it.)

The Problem

When setting up a data flow in SSIS, the data transfer speed can be very slow because the default settings in the package have not been optimised.

The Solution

[Screenshot: SSIS data flow properties]

The code below will show you, for each table in the database, the maximum row length and how many rows of that size fit in the default buffer. I take the column MaxBufferSize and round it down to the nearest thousand – so 87235 becomes 87000 – and use this value as the DefaultBufferMaxRows value. I change the DefaultBufferSize from 10485760 to 104857600 (the same number with a zero added to the end). Finally, I add values to the BlobTempStoragePath and BufferTempStoragePath properties. Normally I use C:\temp, but make sure the directory exists, and you are probably better off choosing a location that is not on the C drive.

-- Maximum possible row length per table, and how many rows of that size
-- fit in the default 10485760-byte (10 MB) buffer
SELECT s.[name] + '.' + t.[name] AS TableName,
       SUM(c.max_length) AS [row_length],
       10485760 / SUM(c.max_length) AS MaxBufferSize
FROM sys.tables AS t
JOIN sys.columns AS c
    ON t.[object_id] = c.[object_id]
JOIN sys.schemas AS s
    ON t.[schema_id] = s.[schema_id]
GROUP BY s.[name], t.[name];
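
If you want the rounding done for you, this variant of the query (my own addition, so double-check it against your data) rounds the figure down to the nearest thousand, ready to paste into DefaultBufferMaxRows:

-- Round the row count down to the nearest thousand for DefaultBufferMaxRows
SELECT s.[name] + '.' + t.[name] AS TableName,
       ((10485760 / SUM(c.max_length)) / 1000) * 1000 AS DefaultBufferMaxRows
FROM sys.tables AS t
JOIN sys.columns AS c
    ON t.[object_id] = c.[object_id]
JOIN sys.schemas AS s
    ON t.[schema_id] = s.[schema_id]
GROUP BY s.[name], t.[name];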

These changes will allow SSIS to load more rows at a time and so should speed up your loading. I tend to use an OLE DB connection for both Source and Destination.

To change the default path for the SSIS package store, edit this file on the server:

C:\Program Files\Microsoft SQL Server\100\DTS\Binn\MsDtsSrvr.ini.xml

<Folder xsi:type="FileSystemFolder">
  <Name>File System</Name>
  <StorePath>E:\SSIS</StorePath>
</Folder>
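
Note that the SQL Server Integration Services service needs to be restarted before the change takes effect.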

When importing Excel documents in SSIS on a 64-bit machine you may encounter this error:

[SSIS.Pipeline] Error: component “Excel Source” (35) failed validation and returned error code 0xC020801C.

It may also tell you that it failed to acquire a connection.

This is because there is no 64-bit driver for Access or Excel – for CSV files you can use the Flat File connection instead.

To carry on working you need to set your project up to run in 32-bit mode. Right-click the project, select Properties, go to Debugging and change Run64BitRuntime to False.
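
Note that Run64BitRuntime only applies when debugging in BIDS; if you run the deployed package with dtexec you will need to call the 32-bit dtexec.exe from the Program Files (x86) folder instead.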

[Screenshot: Run64BitRuntime set to False on the project's Debugging page]

The Problem

A data import failed with the following error messages:

  • Failed to retrieve long data for column “Description”.
  • There was an error with output “OLE DB Source Output” (742) on component “OLE DB Source” (732). The column status returned was: “DBSTATUS_UNAVAILABLE”.
  • SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The “output “OLE DB Source Output” (742)” failed because error code 0xC0209071 occurred, and the error row disposition on “component “OLE DB Source” (732)” specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure.
  • SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on component “OLE DB Source” (732) returned error code 0xC0209029. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.

The Description column is of data type VarChar(Max). This error will normally occur when the data type is some form of BLOB.
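
As an aside, if you want to see which columns SSIS will treat as BLOB data, a query along these lines should do it – the (Max) types report a max_length of -1 in sys.columns:

-- Columns SSIS treats as LOB/BLOB data: the (Max) types report max_length = -1,
-- and the older LOB types are text, ntext, image and xml
SELECT t.[name] AS TableName, c.[name] AS ColumnName, ty.[name] AS DataType
FROM sys.columns AS c
JOIN sys.tables AS t
    ON c.[object_id] = t.[object_id]
JOIN sys.types AS ty
    ON c.user_type_id = ty.user_type_id
WHERE c.max_length = -1
   OR ty.[name] IN ('text', 'ntext', 'image', 'xml');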

The Solution

When SSIS imports BLOB data it writes the data out to a file and then imports the file. By default it will use the system temp directory for this. However, you can set two directory locations, one for BLOB data and one for the buffer cache. It is recommended that you use fast disks and that these two directories are on separate drives.

Click on the Data Flow Task and view the properties. You need to enter values for two options: BLOBTempStoragePath and BufferTempStoragePath. I created directories with those names on the E and L drives.

[Screenshot: BLOBTempStoragePath and BufferTempStoragePath in the Data Flow Task properties]

After making this change the data task ran in 5 minutes rather than 8. More importantly, it shouldn't fail in the future.

SSIS DB2 and Stored Procedures

Posted: October 22, 2010 in Data Flow, DB2, OLE DB, SSIS

The Scenario

Extracting data from a DB2 database using SSIS when the data is accessed through a stored procedure.

With DTS packages this could be done by using an ODBC driver to connect to the DB2 database.

The Problems

With DTS packages there was an option to choose Other Driver. From this I could set up an ODBC connection to a DB2 database and execute a stored procedure – this acted as my data source. This ODBC connection used a system DSN which pointed at the DB2 database on an AS400 server.

SSIS does not give you the option of using another driver. What you get instead are OLE DB drivers or .NET drivers.

The OLE DB drivers will not allow the execution of a stored proc on the DB2 to retrieve data.

The IBM OLE DB DB2 provider and Microsoft OLE DB DB2 provider do not support using an SQL command that calls a stored procedure. When this kind of command is used, the OLE DB source cannot create the column metadata and, as a result, the data flow components that follow the OLE DB source in the data flow have no column data available and the execution of the data flow fails.

See: http://msdn.microsoft.com/en-us/library/ms141696.aspx

The .NET connection, in SSIS 2008, will allow you to execute the procedure in an Execute SQL Task. It will not allow execution in a Data Flow Task – although it will let you preview the data in the data flow.

After setting up a Linked Server to the DB2 I encountered the same problem – although the error message was slightly misleading and I spent a long time investigating permissions.

So the only options now are either to carry out a row-by-row import – using an Execute SQL Task to populate a recordset object and then looping through it to import the data – or to ask the DB2 DBA to use the stored procedures to populate a file which I can then import from.

My forum question on MSDN relating to the issue:
http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/7ea40a98-c930-48fc-a961-2fdb977de0d3/#e2edc306-f0f4-45bb-9844-53cf83fb845a

Connect issue raised with Microsoft:
https://connect.microsoft.com/SQLServer/feedback/details/615901/ssis-db2-and-stored-procedures
Please follow the link and vote if the issue affects you, and maybe Microsoft will bring out a new driver.

Importing Large Amounts of Data

Posted: October 21, 2010 in Data Flow, ETL, SSIS

The Problem

I have a system which imports large amounts of data overnight. This will normally be in the low millions of rows. If it fails for a few nights it can then be tens of millions. The problem is space – I don't have enough drive space to allow the transaction log to grow large enough to accommodate more than two million rows of data.

The Solution

Create a variable called loopRowCounter, and another called loopCount.

Use an Execute SQL Task to populate loopRowCounter with the number of rows in the data source.

You can do this as follows:

Under General:

  • The SQL command should start like this – SELECT COUNT(*) AS loopRowCount FROM …
  • Set the ResultSet to Single row.

Under Result Set:

  • Set the result name to loopRowCount and map it to your variable – User::loopRowCounter.

Now you have a variable holding the number of rows.

Move on to a For Loop.

  • In InitExpression enter @loopCount = 2000002 – this sets a starting value.
  • In EvalExpression enter @loopCount > 0.
  • In AssignExpression enter @loopCount = @loopRowCounter – inside the container @loopRowCounter will be decremented.

Now use a Data Flow Task, and in the data source use SELECT TOP (500000) …

You will need to check that the rows in the source are not already in the destination – I do a join on the destination table and compare the IDs; a sketch follows below.

This will limit the flow of data to 500000 rows.
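
As a rough sketch of what that source query might look like – all the table and column names here are made up, so substitute your own:

-- Pull the next 500000 rows that are not already in the destination,
-- comparing on the id column (all names here are hypothetical)
SELECT TOP (500000) s.*
FROM SourceDB.dbo.SourceTable AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM DestDB.dbo.DestTable AS d
    WHERE d.id = s.id
);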

Next go to an Execute SQL Task.

Under Parameter Mapping:

  • Select the variable name User::loopRowCounter, give the parameter a name, and set its type to Long with a size of 4.

Under General:

  • The SQL command should start like this:

    checkpoint;
    SELECT ? - 500000 AS loopRowCount

  • Set the ResultSet to Single row.

Under Result Set:

  • Set the result name to loopRowCount and map it to your variable – User::loopRowCounter.

We know how many rows were in the table, and we assigned that value to User::loopRowCounter. We have just moved 500000 rows out, so we subtract 500000 from the parameter which represents User::loopRowCounter, and the result is assigned back to User::loopRowCounter. The checkpoint is there so that, if the destination database is in the simple recovery model, the log space used by the previous batch can be reused rather than the log continuing to grow.

Now, no matter how many millions of records go through, my log file only has to cope with 500000 rows at a time.

Nearly everything in the package can be configured.

The main items you will want to configure are the passwords in server connections, along with the name of the server and possibly the initial catalog (database).
To set up a configuration option, right-click the package's main window and select Package Configurations.
The options available are to use an XML file, a registry entry, a parent package variable or a SQL Server database.
In this example I have a database called dataimports which I am going to use. In dataimports I have created a schema for each batch of work I use. This means I can separate out all the SSIS logging and configuration and I can give users access through the schemas.
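
As a minimal sketch of that setup – the schema and role names are mine, nothing SSIS requires:

-- One schema per batch of work; users get access through the schema
CREATE SCHEMA DataDownload AUTHORIZATION dbo;
GO
-- Hypothetical database role that should be able to read this batch's tables
GRANT SELECT ON SCHEMA::DataDownload TO DataDownloadReaders;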

Once you select the SQL Server option you will need to select a database connection to use and then a table. If you already have config tables the drop-down box will list them – otherwise you can create a new one. You will need to type in a configuration filter. This is a keyword that matches up the configuration settings in the table with this package. The name should be easily identifiable, like DataDownload.
When you click next you will see a list of configurable items. If this is a new package there will not be very many.

You should try to restrict the configured values to the ones really needed, otherwise you will make reading the configuration table difficult, which can lead to errors.

A config should be used to store secure data – like passwords. It should also be used to store values which will or could change. For example, a package which emails a user upon completion could have the To field of the email task configured; this way, if you need to change the recipient, you just change the config table. If a server name is going to change as the package is deployed then store the server name in the config. You shouldn't store static or trivial data, like the protection level of a database connection – this will probably never change – or both the connection string to a database and all the individual elements of that connection. If you are using Windows Authentication you may not need to store any connection data at all.

What it looks like in the table:

I have created a primary key on ConfigurationFilter and PackagePath – the table is created without any primary key.
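
For reference, the table generated by the wizard looks like the following (recreated here from memory, so treat it as a sketch), together with the primary key I added:

-- The table the package configuration wizard generates
CREATE TABLE [dbo].[SSIS Configurations]
(
    ConfigurationFilter NVARCHAR(255) NOT NULL,
    ConfiguredValue NVARCHAR(255) NULL,
    PackagePath NVARCHAR(255) NOT NULL,
    ConfiguredValueType NVARCHAR(20) NOT NULL
);

-- My addition: one row per setting per package path
ALTER TABLE [dbo].[SSIS Configurations]
ADD CONSTRAINT PK_SSISConfigurations PRIMARY KEY (ConfigurationFilter, PackagePath);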

Note that for passwords the value stored in the config table will be *****. This is a placeholder. You will need to change this value to the actual password, and set security on the table to stop people reading it.
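
Something along these lines will do it – DataDownload is the hypothetical filter name from earlier, and the WHERE clause will need adjusting to match your own package paths:

-- Replace the ***** placeholder with the real password
-- (DataDownload is the hypothetical configuration filter used above)
UPDATE [dbo].[SSIS Configurations]
SET ConfiguredValue = 'ActualPasswordHere'
WHERE ConfigurationFilter = 'DataDownload'
  AND PackagePath LIKE '%Password%';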