Archive for November, 2011

Slow Linked Server Query.

Posted: November 24, 2011 in Linked Servers, OpenQuery

A BI Developer reported a problem when using a linked server. Queries which normally executed at a reasonable speed were now taking 20-30 minutes.

I logged on to the linked server and ran a simple select

select * from sys.databases;

This returned the results set in less than 1 millisecond.

I then ran the developer’s query and it came back with the results in 5-6 minutes (this is roughly the time I was told it would take). So there was nothing wrong with the remote server itself.

I took the same command and converted the From clauses to run through the linked server using 4 part naming: servername.database.schea.object

I limited this result set to TOP 100. This came back in about 6 minutes.
As it ran there was a wait on the linked server for CXPacket. After this there was a wait for Async_Network_IO.

I ran the same query again, using the 4 part name, but this time I removed the TOP 100 clause. I watched the Activity Monitor on the remote server again. There was the same CXPacket wait – which took about the same amount of time as before. Then it switched to Async_Network_IO. I killed the query after 20 minutes.

After a bit of fiddling with the query I ran it a third time, using open query and the results came back in 2.5 minutes – this was around 44.5k rows. (The query had already been run once on the local server so this query took advantage of the pre-cached execution plan.)

Conclusion

This type of query (one which selects data from very large tables) is not suitable for linked servers using the four part naming convention.

When a query compiles, using this convention, it assumes that there is no difference between using the linked server and using the local server.

The way SQL returns data is to grab data from all the tables in the FROM clause (in order left to right) and then filter them on the joins then filter on the columns selected and finally filter on the WHERE clause. This is all done in memory and the final results returned. Using the linked server it can’t all be done in memory – the data is selected on the remote server and then sent back to the local server to begin filtering – so instead of waiting for a memory function we are waiting for a network connection.

When the same query is run using openquery the query is passed to the linked server for execution and the smaller result set is returned.

Advertisements

The Problem

A data import failed with the following error messages:

  • Failed to retrieve long data for column “Description”.
  • There was an error with output “OLE DB Source Output” (742) on component “OLE DB Source” (732). The column status returned was: “DBSTATUS_UNAVAILABLE”.
  • SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The “output “OLE DB Source Output” (742)” failed because error code 0xC0209071 occurred, and the error row disposition on “component “OLE DB Source” (732)” specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure.
  • SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on component “OLE DB Source” (732) returned error code 0xC0209029. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.

The column description is of the data type VarChar(Max). This error will normally occur when the data type is some form of BLOB

The Solution

When SSIS Imports BLOB data it writes the data our to a file and then imports the file. By default it will use the system temp directory for this. However you can set two directory locations for Blob data and system cache. It is recommended that you use fast disks and that these two directories are on separate disks.

Click on the Data Flow Task and view the properties. You need to enter values for two options: SSISBlobTempStoragePath and BufferTempStoragePath. I created directories with those names on the E and L drives.

Blob Path Config Setting

After making this change the data task ran in 5 minutes rather than 8. More importantly it shouldn’t fail in the future