Using Parallel HttpClientConnection objects in SSIS
In SSIS (SQL Server Integration Service) you often need to talk with web resources like WCF services. To do that you will generally define the connection using Connection Manager and HTTP. This sets up SSIS to communicate with the remote resource. To use the connection you normally use a script task to get the connection from Connection Manager, create an HttpClientConnection object and then use the methods on the connection to communicate with the remote server. You will generally find code similar to the following.
' Get connection from connection manager Dim cm = Dts.Connections("MyHttpConnection").AcquireConnection(null) ' Create connection and adjust the URL by adding a query parameter or something Dim conn = new HttpClientConnection(cm) conn.ServerURL = AddQueryStringToUrl(conn.ServerURL) ' Use it to download a file conn.DownloadFile(someFile, True)
This code works great if you have a sequential package but if you have multiple script tasks running in parallel, such as when you break up processing into pipelines, then the above code will not work correctly. You can confirm this for yourself by copying the task and creating multiple containers running in parallel. I posted in the SQL forums for help and not even Microsoft felt this should cause a problem but in fact it does.
The underlying problem is in
HttpClientConnection itself. It doesn’t actually do anything other than wrap the COM object representing the connection. The problem with this approach is that all instances of the class using the same connection manager object will be using the same underlying COM object. If it is called in parallel then any changes to the state of one connection impacts all the connections. This makes using it in parallel tasks useless.
While the problem is unexpected, the solution is straightforward. All that needs to be done is to have each connection be separate from all other connections. To accomplish this we simply need to copy the COM object. This is one of the few times where the Clone method actually makes sense and works. By calling it we get a different copy of the COM object allowing us to run in parallel. It does seem odd to have to new up an object and then clone it but it works.
' Get connection from connection manager Dim cm = Dts.Connections("MyHttpConnection").AcquireConnection(null) ' Create connection and adjust the URL by adding a query parameter or something Dim conn = new HttpClientConnection(cm).Clone(); conn.ServerURL = AddQueryStringToUrl(conn.ServerURL) ' Use it to download a file conn.DownloadFile(someFile, True)
Interestingly enough you do not have to worry about cleaning up the objects. Even if you wanted to you couldn’t as there are no cleanup methods exposed.
It seems odd to me that a core component of SSIS doesn’t support parallel processing since breaking work up into pipelines is common with SSIS. It also seems odd that this isn’t documented anywhere. Nevertheless the solution is straightforward and doesn’t require a lot of extra code. Hopefully this saves others the time I had to take to find and fix this issue.