Azure Feature Pack for Integration Services (SSIS)
Applies to: SQL Server (all supported versions), SSIS Integration Runtime in Azure Data Factory
SQL Server Integration Services (SSIS) Feature Pack for Azure is an extension that provides the components listed on this page for SSIS to connect to Azure services, transfer data between Azure and on-premises data sources, and process data stored in Azure.
The download pages also include information about prerequisites. Make sure you install SQL Server before you install the Azure Feature Pack on a server, or the components in the Feature Pack may not be available when you deploy packages to the SSIS Catalog database, SSISDB, on the server.
Components in the Feature Pack
Use TLS 1.2
The TLS version used by Azure Feature Pack follows system .NET Framework settings. To use TLS 1.2, add a value named with data under the following two registry keys.
Dependency on Java
Java is required to use ORC/Parquet file formats with Azure Data Lake Store/Flexible File connectors.
The architecture (32-bit or 64-bit) of the Java build should match that of the SSIS runtime you use. The following Java builds have been tested.
Set Up Zulu's OpenJDK
- Download and extract the installation zip package.
- From the Command Prompt, run .
- On the Advanced tab, select Environment Variables.
- Under the System variables section, select New.
- Enter for the Variable name.
- Select Browse Directory, navigate to the extracted folder, and select the subfolder. Then select OK, and the Variable value is populated automatically.
- Select OK to close the New System Variable dialog box.
- Select OK to close the Environment Variables dialog box.
- Select OK to close the System Properties dialog box.
If you use the Parquet format and encounter the error "An error occurred when invoking java, message: java.lang.OutOfMemoryError:Java heap space", you can add an environment variable to adjust the minimum and maximum heap size for the JVM.
Example: set variable with value . The flag Xms specifies the initial memory allocation pool for a Java Virtual Machine (JVM), while Xmx specifies the maximum memory allocation pool. This means that JVM will be started with amount of memory and will be able to use a maximum of amount of memory. The default values are min 64MB and max 1G.
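As a rough illustration of how the two flags combine, the sketch below builds an option string from a pair of heap sizes. The helper name `build_heap_flags` and the sizes 256 MB and 4096 MB are assumptions made for this example, not values taken from this page; choose sizes that fit the memory available on your machine.

```python
def build_heap_flags(initial_mb: int, maximum_mb: int) -> str:
    """Return JVM heap flags of the form '-Xms<N>m -Xmx<M>m'.

    initial_mb maps to Xms (initial allocation pool) and maximum_mb
    maps to Xmx (maximum allocation pool). Both sizes are hypothetical
    inputs chosen by the caller.
    """
    if initial_mb <= 0 or maximum_mb <= 0:
        raise ValueError("heap sizes must be positive")
    if initial_mb > maximum_mb:
        raise ValueError("initial heap (Xms) cannot exceed maximum heap (Xmx)")
    return f"-Xms{initial_mb}m -Xmx{maximum_mb}m"


# Hypothetical sizes: start with 256 MB, allow growth up to 4096 MB.
print(build_heap_flags(256, 4096))
```

The validation mirrors the relationship described above: the JVM starts with the Xms amount and may grow up to the Xmx amount, so the initial size should not be larger than the maximum.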
Set Up Zulu's OpenJDK on Azure-SSIS Integration Runtime
This should be done via custom setup interface for Azure-SSIS Integration Runtime. Suppose is used. The blob container could be organized as follows.
As the entry point, triggers execution of the PowerShell script which in turn extracts and sets accordingly.
If you use Parquet format and hit error saying "An error occurred when invoking java, message: java.lang.OutOfMemoryError:Java heap space", you can add command in to adjust the min/max heap size for JVM. Example:
The flag Xms specifies the initial memory allocation pool for a Java Virtual Machine (JVM), while Xmx specifies the maximum memory allocation pool. This means that JVM will be started with amount of memory and will be able to use a maximum of amount of memory. The default values are min 64MB and max 1G.
Set Up Oracle's Java SE Runtime Environment
- Download and run the exe installer.
- Follow the installer instructions to complete setup.
Scenario: Processing big data
Use the Azure connectors to complete the following big data processing work:
1. Use the Azure Blob Upload Task to upload input data to Azure Blob Storage.
2. Use the Azure HDInsight Create Cluster Task to create an Azure HDInsight cluster. This step is optional if you want to use your own cluster.
3. Use the Azure HDInsight Hive Task or Azure HDInsight Pig Task to invoke a Pig or Hive job on the Azure HDInsight cluster.
4. Use the Azure HDInsight Delete Cluster Task to delete the HDInsight cluster after use if you created an on-demand HDInsight cluster in step 2.
5. Use the Azure Blob Download Task to download the Pig/Hive output data from Azure Blob Storage.
Scenario: Managing data in the cloud
Use the Azure Blob Destination in an SSIS package to write output data to Azure Blob Storage, or use the Azure Blob Source to read data from Azure Blob Storage.
Use the Foreach Loop Container with the Azure Blob Enumerator to process data in multiple blob files.
- Updated target .NET Framework version from 4.6 to 4.7.2.
- Renamed "Azure SQL DW Upload Task" to "Azure Synapse Analytics Task".
- When accessing Azure blob storage and the machine running SSIS is in a non en-US locale, package execution will fail with error message "String not recognized as a valid DateTime value".
- For Azure Storage Connection Manager, secret is required (and unused) even when Data Factory managed identity is used to authenticate.
- Added support for shared access signature authentication to Azure Storage connection manager.
- For the Flexible File Task, there are three improvements: (1) wildcard support for copy/delete operations is added; (2) the user can enable or disable recursive searching for the delete operation; and (3) the file name of the Destination for a copy operation can be left empty to keep the source file name.
This is a hotfix version released for SQL Server 2019 only.
- When executing in Visual Studio 2019 and targeting SQL Server 2019, Flexible File Task/Source/Destination may fail with the error message
- When executing in Visual Studio 2019 and targeting SQL Server 2019, Flexible File Source/Destination using ORC/Parquet format may fail with the error message
- In certain cases, package execution reports "Error: Could not load file or assembly ‘Newtonsoft.Json, Version=184.108.40.206, Culture=neutral, PublicKeyToken=30ad4fe6b2a6aeed’ or one of its dependencies."
- Add delete folder/file operation to Flexible File Task
- Add External/Output data type convert function in Flexible File Source
- In certain cases, test connection malfunctions for Data Lake Storage Gen2 with the error message "Attempted to access an element as a type incompatible with the array"
- Bring back support for Azure Storage Emulator
Azure Blob Storage Data Upload with SSIS
By: Rajendra Gupta | Updated: 2020-12-16
Azure provides a cloud solution for storing data using Azure Blob Storage. We need to export SQL Server data and store it in Azure blob storage. How can we do so? This tip will cover the following topics.
- A brief overview of Azure storage.
- Set up an Azure storage account and containers.
- Configure an SSIS package for data upload into the blob storage.
- Download the data from blob storage into the local storage.
Azure Blob Storage Overview
Azure Storage provides a scalable, reliable, secure and highly available object storage for various kinds of data. You get the following kinds of data storage:
- Azure Blobs: An object-level storage solution similar to the AWS S3 buckets. You can store the file and access it through a URL. It can be used for streaming audio, video, documents, binary log files, images, etc.
- Azure Files: Use this to configure the file shares for on-premises or the cloud deployments.
- Azure Queues: Store a large number of messages for communication between application components.
- Azure tables: Azure tables can store the structured NoSQL data in the cloud. It is suitable for structured and non-relational data.
- Azure disks: Azure disks are used as storage volumes for the Azure Virtual machines.
In this tip, we are working with the Azure Blobs for storing the exported data from SQL Server.
Azure Storage Account
Log in to the Azure Portal by specifying your Azure subscription credentials. If you do not have access, use this link to Create your Azure free account.
Once you log in to the Azure portal, a dashboard is launched for all Azure services.
Click on the Storage accounts under the Azure services. As shown below I do not have a configured storage account in my subscription.
Click on Create storage account. In the below storage account configuration, enter the following values.
- Resource group: A resource group is an Azure container for Azure resources. Choose an existing resource group or create a new group. Here, I use the new resource group – StorageAccountResourceGroup.
- Storage account name: Enter a unique storage account name.
- Location: Select the Azure region from the drop-down. You should choose the nearest region to avoid network latency.
We can go ahead with default values for the other configurations.
First, Azure validates all configurations for the storage account. Once the validations pass, select Create.
Next, the deployment will start for the storage account.
A storage account is created quickly. Click on Go to resource to open the storage account.
In the storage account you can see storage options as described earlier.
Create a Container in the storage account
We need to create a container for blob storage. Select "Containers". If there are any existing containers you can view them or create a new container.
Select this "Container" and specify a name for your container. Here, we specify the container name – datauploadssis.
Our container datauploadssis now exists in the mssqltipsdata storage account.
Install Azure Storage Explorer
The Azure Storage Explorer tool manages the storage account and works with Azure blobs, files, queues, tables, Azure Cosmos DB and Azure Data Lake Storage entities.
Launch the Azure Storage Explorer and choose the option Use a storage account name and the key to connect to the storage account.
To connect with the storage account, use the access keys. You can click on the access keys option and it shows the following information:
- Storage account name
- Access keys
Note the access keys and enter them on the next screen to connect to the storage account.
Click Next. Review your storage account name, display name and access key.
Click on Connect. Here, we see the storage account and Container.
We can create a new directory in the Container for uploading files. Click on New Folder shown above and enter a folder name as shown below.
In the Azure storage explorer you can view the connected storage account, container (datauploadssis) and directory (Mydata).
Download Azure Feature Pack for Integration Services
We need to install the Azure feature pack for Integration Services to work with the Azure resources in the SSIS package. Before you install the Azure feature pack, make sure to have the following environment for this article.
Download and install the SSIS feature pack as per your integration service version.
Create a new SSIS Package for Azure Blob Storage Upload
Launch Visual Studio 2019 and create a new Integration Services Project.
Specify a project name and directory to store the package files.
In the SSIS toolbox we see the Azure blob tasks as highlighted in the square box.
To export the data from a SQL database table, add a data flow task.
In the data flow task, add an OLE DB Source (renamed as Source SQL Data) and a Flat File Destination (renamed as Destination CSV). We will not discuss the configuration of the source and destination in further detail in this tip. You can follow these integration services tips for detailed steps.
For reference purposes, my OLE DB source details are as below:
- SQL instance: SQLNode2\INST1
- Source Database: AdventureWorks2019
- Table for export: dbo.orders
The configuration for the Flat file destination is as below:
- Flat file format: CSV
- File name: C:\Test\SampleDataUpload.CSV
Add the Azure Blob Upload Task and join it with the Data flow task as shown below.
We need to configure the Azure Blob Upload task. In the task editor, perform the following configurations.
AzureStorageConnection: Add a new storage connection by specifying the storage account name, authentication (access keys), and account key (access key).
Select "Test Connection" to verify the connection with the storage account.
Once the storage account connection is successful, enter the blob container and directory as we configured earlier.
In the source folder, enter the path for the CSV file. This is the output of the data flow task.
Click OK to finish the configuration. Execute the SSIS package and it should run successfully.
Refresh the Azure Storage container and you should see the uploaded CSV using the SSIS Package.
You can view this using the Azure Storage Explorer.
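Since a blob is addressed through a URL, it can help to see how the names used in this tip fit together. The sketch below composes the URL for the uploaded file from the storage account (mssqltipsdata), container (datauploadssis), directory (Mydata) and file name used above. It assumes the standard public-cloud endpoint `blob.core.windows.net`; the helper name `blob_url` is hypothetical, and the URL is only readable by callers who have been granted access.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_path: str) -> str:
    """Compose the HTTPS URL of a blob in the Azure public cloud.

    Assumes the default endpoint suffix 'blob.core.windows.net'.
    quote() leaves '/' unescaped, so virtual directories are preserved.
    """
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob_path)}"


# Names taken from this tip: account, container, directory and CSV file.
print(blob_url("mssqltipsdata", "datauploadssis", "Mydata/SampleDataUpload.CSV"))
```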
SSIS Package for Azure Blob Storage Download
Similar to the Azure Blob Upload Task, use Azure Blob Download Task for downloading files from Azure blob storage as shown below.
In the below package, we also use an Execute SQL task in between the Azure Blob Upload Task and Azure Blob Download Task.
This Execute SQL task is to introduce a wait between a blob upload and download. If you have a large file to upload into the storage container, it might take some time depending upon the network bandwidth. Therefore, the wait might solve the issue if you want to download the same file.
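A fixed wait works when the upload time is predictable. As an alternative technique (not the approach used in this package), the minimal sketch below polls a readiness check until it succeeds or a timeout expires. The function `wait_until` and the `is_ready` callable are hypothetical; in practice `is_ready` would be whatever check confirms that the uploaded file is available.

```python
import time
from typing import Callable


def wait_until(is_ready: Callable[[], bool],
               timeout_s: float = 60.0,
               interval_s: float = 1.0) -> bool:
    """Poll is_ready() until it returns True or timeout_s elapses.

    Returns True if the check succeeded, False if the timeout was hit.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Demo with a stand-in check that succeeds on the third call.
calls = {"count": 0}


def fake_upload_finished() -> bool:
    calls["count"] += 1
    return calls["count"] >= 3


print(wait_until(fake_upload_finished, timeout_s=5.0, interval_s=0.01))
```

Polling returns as soon as the file is ready, so small files do not pay the full delay and large files are not cut off by a wait that is too short.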
Below is the configuration for the Execute SQL task.
Below is the configuration for the Azure Blob Download task.
Execute the SSIS package and see all tasks completed successfully.
- Go through existing tips on Microsoft Azure.
- Read more about the Azure Storage services in Microsoft docs.
Article Last Updated: 2020-12-16
SQL Server Integration Services Flexible File Task with Azure Data Lake Storage
By: John Miner | Updated: 2020-05-20
Integration Services has had a colorful product history over the 15 years since its release. The extract, transform and load product was introduced with the release of SQL Server 2005. This was a major change in design philosophy, with programming tasks divided into control flows and data flows. Microsoft introduced the SSIS catalog in SQL Server 2012 to satisfy runtime tracking requirements of packages. The idea of a flight recorder was a major enhancement. Not only did the package have a step by step trace of its execution, it also had runtime statistics for each step. The incremental package deployment option in SQL Server 2016 eliminates duplicate code from being copied to the catalog. Both enhancements have truly made SQL Server Integration Services a world class product.
There are a ton of packages out there written in various versions of the product. The management of folders and files is part of normal data processing. If your company is thinking of lifting and shifting these packages to Azure Data Lake Storage, how can you replicate these file management tasks using SSIS?
The most recent Azure Feature Pack for Visual Studio 2019 was released in November of 2019. Microsoft has been supplying the SSIS developer with a set of tools for Azure since 2012. Most systems designed for Azure use two types of storage: Blob Storage and Data Lake Storage. The Flexible File Task is the next evolution in managing files that reside on local, blob and data lake storage systems. System designers should only use generation 2 of Azure Data Lake Storage. The prior version is deprecated and will not work with this control.
Today, we are going to investigate how the Flexible File Task can replace existing legacy file management code.
There are three use cases in which the Flexible File Task can come in handy.
Because this Flexible File Task works in a variety of storage layers, replacing existing code with this new control might be a good idea. This will ease any future migrations to the cloud. Usually I like to compare and contrast older versus new controls. However, that is not possible since Azure Data Lake Storage went to general availability in February 2019. This feature pack is the first set of tools that work with the newer storage system.
It is assumed that a development machine with Visual Studio 2019 and the SSIS extensions has been created. See this prior article for a typical build list for a development machine. In addition, the Azure Feature Pack for Visual Studio 2019 must be downloaded and installed within the environment. I chose to install the x86 feature pack since drivers are plentiful for this build version.
The focus of this article will be on using the new control with Azure Data Lake Storage, Generation 2.
Local Data Files
The Flexible File Task can only be used to copy and delete files. It is important to have a set of staged files that can be used in the example packages that we create. The image below shows the combined S&P 500 daily stock information compressed into one zip file for each year. Each zip file contains 505 data files, one for each S&P 500 company. There is a total of 6 years of data shown in the image below.
Use Case #3 – Azure Data Lake Storage
The most exciting part of the new Azure Feature Pack is the ability to work with files stored on Azure Data Lake Storage, Generation 2 (ADLS2). The Azure Blob Filesystem (ABFS) Driver is an interface between the client and the Azure REST API. Please see the below image from Microsoft for details. This driver supports the Hadoop Filesystem logical calls from Big Data Tools such as Azure Databricks and Azure SQL Data Warehouse. It took Microsoft several months to write the new task controls, the destination connection and the source connection to work with ADLS2.
The diagram below shows a typical file movement from on-premises to the cloud. In this section, we are going to use a For Each Loop Container to iterate over both local and ADLS2 files. This repetitive loop will allow a single action to be applied to a group of files. Thus, we can copy or delete a bunch of files using this design.
At this time, let us create a blob storage account and a data lake storage container. Please adopt an Azure object naming convention for your company. I am using the abbreviation "sa" for storage account and "sc" for storage container. The image below shows a storage account on the default dashboard in the Azure Portal.
Bug Found In Control
Before we move onto the examples, I need to talk about a bug in Azure Feature Pack for Visual Studio 2019. The Flexible File task in version 220.127.116.11 might error when working with ADLS2. I had to work with the engineering team to find the bug and they suggested the work around.
The image below shows the output from the failed execution. The error message states "Attempted to access an element as a type incompatible with the array." How do we catch the actual root cause of the bug?
We need to apply break points on the package before and after the reporting of the error. The system internals, process explorer utility can be used to examine the runtime executable used for debugging (DtsDebugHost.exe). The image below shows the root cause of the issue. There are two versions of the Newtonsoft.Json.DLL being loaded into memory.
The fix for this issue is to find the "DtsDebugHost.config" file under the AppData folder for the current user profile. Edit the file and insert the binding redirect statement (XML) just before the closing tag </assemblyBinding>. Make sure you restart Visual Studio so that the new binding is applied correctly.
Assigning rights to Service Principal
The Flexible File Task has to use a connector to log into Azure Data Lake Storage. It only supports a user defined service principal. Please see my prior article on how to create a service principal. Azure Storage Explorer is still the only way to assign ACL rights to directories that you create. Please see my article on using Azure Storage Explorer with ADLS2. Please create two directories underneath the root directory of the file system. The image below shows the stocks directory for unprocessed files and the archives directory for processed files.
Please grab the application id, directory id, and object id assigned to your service principal. The image below shows the details of the service principal named svcprn01. Use the third hyperlink on the left to obtain the object id for the service principal. An application key (secret) will need to be defined for the principal. Please see Microsoft documentation for details.
The image below shows a look up of the svcprn01 using the hidden object id.
Please assign access control permissions of read, write and execute to the service principal at the root and subdirectories. The image below shows the assignment at the root directory. Repeat the same action for the /stocks and /archives sub-directories.
Last but not least, the service principal needs to be given RBAC rights to the storage container in Azure. We can see below that both the user (Dilbert) and the service principal (svcprn01) have owner rights.
Make sure you test the connectivity of the ADLS Gen 2 connector before proceeding. If you get a failure, please double check the permissions.
Copy Local Files to ADLS2
The last package will use three unit tests to make sure the Flexible File Task is ready for production use. The first unit test is to copy a bunch of files from the c:\data directory on-premises to the \sc4flexfile2adls2\stocks directory on ADLS2. The successful execution of the "FELC_LOCAL_DATA_FILES" container is shown below. The "For Each Loop Container" uses an enumerator to find any matching files. Then, for each file it will call the "Flexible File Task" to perform an action. This design pattern will be used for all three unit tests.
The "For Each Loop Container" will be looking for any files in the source directory that have a zip extension. This control has many different enumerators. We are interested in the enumerators for LOCAL and ADLS2 storage.
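To make the enumeration step concrete, here is a minimal Python sketch of what a local file enumerator with a "*.zip" pattern does conceptually: list a directory and keep only the matching file names. It is an analogy only, not how SSIS implements the container; the helper name `find_zip_files` and the sample file names are assumptions for this example.

```python
import tempfile
from pathlib import Path
from typing import List


def find_zip_files(source_dir: str) -> List[str]:
    """Return the sorted names of files in source_dir ending in .zip."""
    return sorted(p.name for p in Path(source_dir).glob("*.zip") if p.is_file())


# Demo against a throwaway directory with two zip files and one other file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("2013.zip", "2014.zip", "notes.txt"):
        (Path(tmp) / name).write_bytes(b"")
    for file_name in find_zip_files(tmp):
        # In the package, each name would be handed to the Flexible File Task.
        print(file_name)
```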
Please define three variables for this task: [varFileNm] – dynamically updated by loop container, [varSrcDir] – the location of the source files, and [varDstDir] – the location of the destination files. In addition, we need a second destination variable and a second file name variable. These variables will be explained and used later on.
Make sure that you map the file name variable to the output of the container given at index 0.
Initially, we have to hard code the "Flexible File Task" so that errors are not produced. Set the task's delay validation property to true to avoid validation errors.
The most important part of the process is to map the properties exposed by the control to the variables that we defined. Use the task editor to set the property expressions. In this example, we are using the three variables that we previously defined. This mapping turns a static execution into a dynamic execution.
A successful execution of the container will copy the files from on-premises to Azure Data Lake Storage. The Azure Storage Explorer image shows the results of Test 3a. Please disable this container so that we can create the next two unit tests.
Flexible File Task – Manage files in ADLS Gen2
The second unit test is to copy 6 files from \sc4flexfile2adls2\stocks to the \sc4flexfile2adls2\archives folder. The successful execution of the "FELC_ADLS2_STOCK_FILES" container is shown below.
There are configuration differences to note when using the ADLS2 enumerator. First, the folder path to the storage uses Linux-like paths. Second, there is no pattern matching to reduce the resulting file list. Third, we cannot dictate the format of the retrieved file name.
To shorten the article, I am going to skip steps that are the same as those examined previously. For instance, the variable mappings section will be using a different variable, varFileNm2. However, this step uses the same dialog box seen before and is required for the next two unit tests.
The above image shows the source and destination connections using data lake storage. The folder paths are pointing to the directories that we created using the Azure Storage Explorer. The image below shows the property expression to variable mapping. Why did I choose to use a different file variable? Unlike the local file system enumerator, we have no control over the format of the file returned by the for each loop. I wanted to point this out. The returned file name is a fully qualified path. We will have to use functions in SSIS to create a formula to extract the file name.
The image above shows the usage of three variables. The file name variable will have to be modified. Please see the image below. Use the expression builder to enter the following formula. It will return just the file name, assuming it is preceded by a slash.
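The formula itself was shown in an image. As a language-neutral illustration of the logic, the Python sketch below keeps everything after the last forward slash of a fully qualified path. In SSIS expression syntax the same idea is commonly written with the RIGHT, FINDSTRING and REVERSE functions, but the exact expression the author used is not reproduced here. The helper name `file_name_from_path` and the sample path are assumptions.

```python
def file_name_from_path(full_path: str) -> str:
    """Return the text after the last '/' in full_path.

    If the path contains no slash, the whole string is returned.
    """
    return full_path.rsplit("/", 1)[-1]


# Hypothetical fully qualified name returned by the ADLS2 enumerator.
print(file_name_from_path("stocks/2013.zip"))
```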
A successful execution of the container will copy the 6 files from the stocks to archives directory. Please disable the "For Each Loop Container" at this time so that we can create the final test.
The third unit test deletes the 6 files from the \sc4flexfile2adls2\stocks folder to complete the logical MOVE action. The successful execution of the "FELC_ADLS2_ARCHIVE_FILES" container is shown below.
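The second and third unit tests together amount to a MOVE implemented as a copy pass followed by a delete pass. The sketch below shows that pattern on local directories using only the Python standard library; it is an analogy for the control flow, not a call into ADLS2 or SSIS. The function name `move_by_copy_then_delete` and the sample file names are assumptions.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def move_by_copy_then_delete(src_dir: Path, dst_dir: Path) -> List[str]:
    """Copy every file from src_dir to dst_dir, then delete the originals.

    A source file is deleted only if its copy exists in dst_dir, so a
    failed copy never results in data loss. Returns the moved file names.
    """
    names = sorted(p.name for p in src_dir.iterdir() if p.is_file())
    # Pass 1: copy (the second unit test).
    for name in names:
        shutil.copy2(src_dir / name, dst_dir / name)
    # Pass 2: delete (the third unit test).
    for name in names:
        if (dst_dir / name).exists():
            (src_dir / name).unlink()
    return names


# Demo with throwaway "stocks" and "archives" directories.
with tempfile.TemporaryDirectory() as tmp:
    stocks = Path(tmp) / "stocks"
    archives = Path(tmp) / "archives"
    stocks.mkdir()
    archives.mkdir()
    for year in (2013, 2014):
        (stocks / f"{year}.zip").write_bytes(b"data")
    print(move_by_copy_then_delete(stocks, archives))
```

Keeping the two passes separate matches the package design, where each pass is its own loop container and can be tested and disabled independently.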
Similar actions must be executed to set up the enumerator and variable mappings of the "For Each Loop Container". Since these steps are not new, I am going to jump right to the "Flexible File Task" properties. The connection, folder path and file name are hard coded in the image below. We will need to map property expressions to variables to make this task dynamic.
Again, we need to apply an SSIS formula to convert the fully qualified path to just the file name and file extension.
In February of 2019, Azure Data Lake Storage Generation 2 (ADLS2) went to general availability. However, toolsets like PowerShell and SQL Server Integration Services were unable to take advantage of this new storage. The Flexible File Task released in November of 2019 allows the developer to write programs that can copy and delete files to, from, and within ADLS2 storage.
The Flexible File Task is the next evolution in controlling the copying and deleting of files, regardless of whether they exist in local, blob storage or data lake storage. You might want to consider replacing older controls with this newer one when legacy programs are changed. This preparation will ease the transition to the cloud if you still want to use SSIS as an ETL tool.
Regarding security, the use of a service principal or managed identity is always preferred over a certificate. That is why ADLS2 storage is more secure since it has both RBAC and ACL levels of permissions. Regardless of what type of Azure storage you are using, always test your connections before usage.
In closing, the use of naming conventions and documentation make the longevity of the package easier for the teams on support. The "Flexible File Task" must be used in conjunction with a "For Each Loop Container" to copy or delete a set of files. Next time, we can talk about the Azure Data Lake File System Task which performs the bulk actions within one control. It is important to note that the Azure Storage Explorer is still the main application to create directories, delete directories and manage file system access.
- What is new in the Azure Feature Pack for SSIS 2019
- Use the Azure Data Lake File System Task in a control flow
- Use the Flexible File Source in a data flow
- Use the Flexible File Destination in a data flow
Article Last Updated: 2020-05-20