Custom configuring an Azure SSIS integration runtime
I've been doing some Power BI work recently where my customers had on-premises databases that were fairly large. After building a data model in Power BI Desktop that accessed the data directly, we published it up to PowerBI.Com, installed and configured a Power BI Data Gateway, and configured scheduled data refreshes.
In one instance, the on-premises data was hosted by Postgres, and for the other, Sybase SQL Anywhere.
At some point, we decided in each case that we wanted to build a data warehouse (or data mart, if you wish) and use it for the Power BI data model, rather than the source databases directly. The primary reason was that some of the tables were quite large, and most of the data didn't change, so we wanted to load it incrementally. Sales data, in this case - we just load the warehouse staging table with the past 24-48 hours of transactions and incrementally merge it into the main table.
Neither of these customers had SQL Server, but I wanted to use SSIS packages to do the loading, and use Azure SQL database as the target location.
Now that Azure Data Factory V2 supports running SSIS packages, I delved into that configuration. Most of the docs for doing this are pretty good.
However, when it comes to customizing the Azure-SSIS integration runtime, the instructions are ok but I certainly learned a few things along the way. I've done it twice now, and I'm going to show you the things I learned here.
The Azure-SSIS integration runtime is just a docker container running Windows Server of some sort. What I mean by docker container is that it is essentially a virtual machine provisioned and started when necessary, from a template containing the OS and whatever other software it needs. You don't really get any console or remote access to this container, but you can have custom configuration steps to install other software as needed.
Be prepared for a long debug cycle though. It takes about 30-40 minutes for the runtime to start, and your custom steps happen at the very end of that. Hopefully the log files that are gathered will show you enough of what happened, or what didn't happen, or what failed, for you to fix your mistake, update your scripts, and try again.
Your configuration happens via an Azure blog storage account, containing a main.cmd file that you write and whatever other files your main.cmd references.
This is what my main.cmd file looks like:
@echo offpowershell.exe -ExecutionPolicy RemoteSigned -File %~dp0\InstallNetFx35.ps1msiexec /i %~dp0\sqlany12-64bit.msi /passive /l %CUSTOM_SETUP_SCRIPT_LOG_DIR%\sqlanyclient.txtpowershell [reflection.assembly]::Load('iAnywhere.Data.SQLAnywhere.v4.5, Version=12.0.1.41344, Culture=neutral, PublicKeyToken=f222fc4333e0d400') > %CUSTOM_SETUP_SCRIPT_LOG_DIR%\sqlanyverify.log=== Logging started: 10/31/2018  20:41:31 ===Action start 20:41:31: INSTALL.Action start 20:41:32: AppSearch.Action ended 20:41:32: AppSearch. Return value 1.Action start 20:41:32: LaunchConditions.MSI (s) (08:64) [20:41:32:477]: Product: SQL Anywhere 12 Deployment -- .NET Framework 2.0 is required by this product..NET Framework 2.0 is required by this product.Action ended 20:41:32: LaunchConditions. Return value 3.Action ended 20:41:32: INSTALL. Return value 3.MSI (s) (08:64) [20:41:32:482]: Product: SQL Anywhere 12 Deployment -- Installation failed.MSI (s) (08:64) [20:41:32:483]: Windows Installer installed the product. Product Name: SQL Anywhere 12 Deployment. Product Version: 12.01.4403. Product Language: 1033. Manufacturer: iAnywhere Solutions. Installation success or error status: 1603.=== Logging stopped: 10/31/2018  20:41:32 ===DISM /Online /Enable-Feature /FeatureName:NetFx3 /All > %CUSTOM_SETUP_SCRIPT_LOG_DIR%\dism.logDeployment Image Servicing and Management toolVersion: 10.0.14393.0Image Version: 10.0.14393.0Enabling feature(s)[                           0.1%                           ][                           1.1%                           ][==========================100.0%==========================]Error: 0x800f081fThe source files could not be found. Use the "Source" option to specify the location of the files that are required to restore the feature. For more information on specifying a source location, see http://go.microsoft.com/fwlink/?LinkId=243077.The DISM log file can be found at C:\Windows\Logs\DISM\dism.logpowershell Install-WindowsFeature Net-Framework-Core > %CUSTOM_SETUP_SCRIPT_LOG_DIR%\dotnetfx.logInstall-WindowsFeature : The request to add or remove features on the specified server failed.Installation of one or more roles, role services, or features failed.The source files could not be found. Use the "Source" option to specify the location of the files that are required to restore the feature. For more information on specifying a source location, see http://go.microsoft.com/fwlink/?LinkId=243077. Error: 0x800f081fAt line:1 char:1+ Install-WindowsFeature Net-Framework-Core+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~    + CategoryInfo          : InvalidOperation: (@{Vhd=; Credent...Name=localh    ost}:PSObject) [Install-WindowsFeature], Exception    + FullyQualifiedErrorId : DISMAPI_Error__Failed_To_Enable_Updates,Microsof    t.Windows.ServerManager.Commands.AddWindowsFeatureCommand Ok, in both those tries, it tells me it needs source files. Where might they be? What's actually in this virtual machine?
I decided to do some sleuthing to get a list of folders in that container:
dir /s /w /b
This is what came back (after 40 minutes):
c:\Bootc:\License Termsc:\PerfLogsc:\setupc:\ssisc:\SSIS64Uniquec:\Usersc:\Windowsc:\Program Files\Attunityc:\Program Files\Common Filesc:\Program Files\internet explorerc:\Program Files\Microsoft Analysis Servicesc:\Program Files\Microsoft Officec:\Program Files\Microsoft OLE DB Provider for DB2c:\Program Files\Microsoft SQL Serverc:\Program Files\Microsoft Visual Studio 10.0c:\Program Files\SharePoint Client Componentsc:\Program Files\Uninstall Informationc:\Program Files\Windows Defenderc:\Program Files\WindowsAppsc:\Program Files\WindowsPowerShellc:\Program Files (x86)\Attunityc:\Program Files (x86)\Common Filesc:\Program Files (x86)\internet explorerc:\Program Files (x86)\Microsoft Analysis Servicesc:\Program Files (x86)\Microsoft Help Viewerc:\Program Files (x86)\Microsoft SQL Serverc:\Program Files (x86)\Microsoft Visual Studio 10.0c:\Program Files (x86)\Microsoft Visual Studio 14.0c:\Program Files (x86)\Microsoft.NETc:\Program Files (x86)\MSECachec:\Program Files (x86)\Uninstall Informationc:\Program Files (x86)\Windows Defenderc:\Program Files (x86)\WindowsPowerShellc:\setup\tvm-1518333292_1-20181101t160318z##2018-11-01T161504Zc:\ssis\TaskExecutorc:\SSIS64Unique\License Terms
Actually, that's a heavily redacted list - there was actually 1.5 MB of folders listed there. Anyway, it's interesting to see what all is installed in this container. But I'm still not sure where the Windows source files could be.
THen my sub-conscious reminded me that I'd seen something about customization samples somewhere. Sure enough, there is a set of sample scripts available, it's mentioned on the How To Configure Azure SSIS IR Custom Setup page. Once you plug this URL into Azure Storage Explorer, you get some scenarios that are likely quite helpful.
Hey look! A scenario that installs .NET Framework 3.5! I think that's what I want to do.
That folder contains a Powershell script, InstallNetFx35.ps1, and a file microsoft-windows-netfx3-ondemand-package.cab.
The script looks like this:
if ((Get-WindowsFeature -Name Net-Framework-Core).Installed){    Write-Output ".NET framework 3.5 has already been installed."}else{    if ((Install-WindowsFeature -Name Net-Framework-Core -Source (Get-Location).Path -LogPath %CUSTOM_SETUP_SCRIPT_LOG_DIR%\install.log).Success)    {        Write-Output ".NET framework 3.5 has been installed successfully"    }    else    {        throw "Failed to install .NET framework 3.5"    }}
I have no idea how I would have found that cab file on my own, but I'm glad it was there.
So after I added the first Powershell to call that script, the setup log for the SQL Anywhere MSI was happy:
=== Logging started: 11/1/2018  18:25:01 ===Action start 18:25:01: INSTALL.Action start 18:25:01: AppSearch.Action ended 18:25:01: AppSearch. Return value 1.
<bunch of lines deleted>
Action ended 18:25:40: InstallFinalize. Return value 1.Action ended 18:25:40: INSTALL. Return value 1.MSI (s) (B4:80) [18:25:40:710]: Product: SQL Anywhere 12 Deployment -- Installation completed successfully.MSI (s) (B4:80) [18:25:40:711]: Windows Installer installed the product. Product Name: SQL Anywhere 12 Deployment. Product Version: 12.01.4403. Product Language: 1033. Manufacturer: iAnywhere Solutions. Installation success or error status: 0.=== Logging stopped: 11/1/2018  18:25:40 ===My final Powershell call is a sanity check that the data provider is actually installed correctly.
The SQL Anywhere provider is an ADO.NET Provider, so it uses ADO.NET and the .NET Framework to run in. I can use .NET reflection to instantiate the class from PowerShell, to prove that it loads ok.
powershell [reflection.assembly]::Load('iAnywhere.Data.SQLAnywhere.v4.5, Version=12.0.1.41344, Culture=neutral, PublicKeyToken=f222fc4333e0d400') > %CUSTOM_SETUP_SCRIPT_LOG_DIR%\sqlanyverify.log
The trick is to get the string inside the Load() method correct. That's pretty easy, once you know how.
In SSDT where I've built my SSIS package, I have my SQL Anywhere connection managers defined at the project level. So if I open the MyConnection.connmgr file (using View Code rather than View Designer), I see something like this:
The magic string is in line 5. After "ADO.NET:" the string is a .NET class name ("iAnywhere.Data.SQLAnywhere.SAConnection") and assembly information ("iAnywhere.Data.SQLAnywhere.v4.5, Version=12.0.1.41344, Culture=neutral, PublicKeyToken=f222fc4333e0d400") for where to load the class from. We use Powershell to attempt to load the assembly and it either raises an exception if the assembly is not installed or registered properly, or it succeeds.
DTS:CreationName="ADO.NET:iAnywhere.Data.SQLAnywhere.SAConnection, iAnywhere.Data.SQLAnywhere.v4.5, Version=12.0.1.41344, Culture=neutral, PublicKeyToken=f222fc4333e0d400">
Here is the line in main.cmd:
powershell [reflection.assembly]::Load('iAnywhere.Data.SQLAnywhere.v4.5, Version=12.0.1.41344, Culture=neutral, PublicKeyToken=f222fc4333e0d400') > %CUSTOM_SETUP_SCRIPT_LOG_DIR%\sqlanyverify.log
This is what the sqlanyverify.log file contained:
GAC    Version        Location                                                 ---    -------        --------                                                 True   v4.0.30319     C:\Windows\Microsoft.Net\assembly\GAC_MSIL\iAnywhere.D...
At this point, my SSIS package will be able to create the connection manager for Sybase successfully. Assuming that all my networking configuration for this container is done correctly (which I didn't touch on in this blog, but the instructions are fairly straightforward), my package should be able to run.
I'll make another blog post to show a few details around installing the Postgres ADO.NET provider in a container.
I hope this helps someone who is trying to configure their SSIS Integration Runtime with customized components.
Mike