A tool that helps import (stage) data for reporting purposes. This application allows users to upload data files in a logical, systematic way so that an Azure Data Factory pipeline can copy the data into an Azure SQL Database. In some instances, the files are normalized during upload for sustainability.
Steps to run this application on your local machine for development purposes.
- Java 14
- NPM (`cd client`, then `npm install`)
- Azure CLI
- Docker CLI
- kubectl CLI (if deploying to AKS)
Run the tests: `./gradlew test`
- Application: `./build.sh`
- Container: `./build-push-docker-image.sh`
- Log into Azure: `az login`
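  If your account has access to more than one subscription, you may also want to pick the one these resources should land in; a minimal sketch, assuming you know the target subscription ID:

  ```bash
  # Make the target subscription the default for the commands that follow
  az account set --subscription <subscription-id>
  ```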
- Create a Resource Group (skip this step if you already have one you would like to use): `az group create --name <resource-group> --location eastus`
- Create a Blob Storage Account (don't enable Hierarchical Namespace, i.e. we don't want Data Lake Gen 2), then enable soft-delete retention on it:
  `az storage account create -n <account-name> -g <resource-group> --location eastus --kind StorageV2 --access-tier Hot --sku Standard_ZRS`
  `az storage account blob-service-properties update --account-name <account-name> --enable-delete-retention true --delete-retention-days 365`
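  The account key retrieved here feeds the `AZURE_STORAGE_ACCOUNT_KEY` variable listed further down; a sketch using the same placeholders:

  ```bash
  # Print the first access key for the new storage account
  az storage account keys list -n <account-name> -g <resource-group> --query "[0].value" -o tsv
  ```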
- An Azure AD App Registration is required to procure an OAuth Client ID and Client Secret; a sample manifest is located here
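  If you prefer the CLI to the Portal for this step, a minimal sketch; note it creates a bare registration rather than applying the sample manifest, and the display name is a placeholder:

  ```bash
  # Create the registration; the appId in the output is the OAuth Client ID
  az ad app create --display-name <app-display-name>

  # Generate a client secret for it (the OAuth Client Secret)
  az ad app credential reset --id <app-id>
  ```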
- Create a SQL Server (skip this step if you already have one you would like to use): `az sql server create -l eastus -g <resource-group> -n <server-name> -u <username> -p <password>`
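  For the Data Factory pipeline to reach this server, Azure services typically need to be allowed through its firewall; a sketch (the rule name is arbitrary):

  ```bash
  # The special 0.0.0.0 range grants access to Azure services only, not the public internet
  az sql server firewall-rule create -g <resource-group> -s <server-name> -n AllowAzureServices --start-ip-address 0.0.0.0 --end-ip-address 0.0.0.0
  ```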
- Create a SQL Database: `az sql db create -g <resource-group> -n <database-name> -s <server-name> -e GeneralPurpose -f Gen5 -z false -c 4 --compute-model Serverless`
- Create a Data Factory using the Azure Portal and deploy the configuration using this repository
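  The Portal is the route these steps assume; if you would rather script the factory itself, a sketch using the Azure CLI's `datafactory` extension:

  ```bash
  # The datafactory commands ship as a CLI extension
  az extension add --name datafactory

  az datafactory create --resource-group <resource-group> --factory-name <data-factory-name> --location eastus
  ```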
- Create an App Registration for the Data Factory with the Contributor Role; a sample manifest is located here
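  One CLI shortcut that creates the registration and assigns the role in a single step is `az ad sp create-for-rbac`; a sketch scoped to the resource group, with placeholder names:

  ```bash
  # Output includes appId (DATAFACTORY_CLIENT_ID) and password (DATAFACTORY_CLIENT_SECRET)
  az ad sp create-for-rbac --name <df-app-name> --role Contributor \
    --scopes /subscriptions/<subscription-id>/resourceGroups/<resource-group>
  ```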
- Create a Container Registry (skip this step if you already have one you would like to use): `az acr create --resource-group <resource-group> --name <registry-name> --sku Basic`
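  `./build-push-docker-image.sh` above presumably handles the push; if you need to do it by hand, a sketch with placeholder image names:

  ```bash
  # Authenticate the local Docker client against the registry
  az acr login --name <registry-name>

  # Tag the locally built image for the registry and push it
  docker tag <image-name>:latest <registry-name>.azurecr.io/<image-name>:latest
  docker push <registry-name>.azurecr.io/<image-name>:latest
  ```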
- From here, you may deploy this application with the appropriate environment variables; see below
- For details on deploying containerized applications in Azure, see this repo of deployment scenarios
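If you take the kubectl/AKS route from the prerequisites, one way to hand the variables below to the pod is a Kubernetes Secret; a minimal sketch with hypothetical resource names (`dataloader-env`, `dataloader`):

```bash
# Store the application settings as a Secret (two shown; repeat for the rest of the list below)
kubectl create secret generic dataloader-env \
  --from-literal=AZURE_OAUTH_CLIENT_ID=<client-id> \
  --from-literal=AZURE_OAUTH_CLIENT_SECRET=<client-secret>

# Point an existing deployment's environment at the Secret
kubectl set env deployment/dataloader --from=secret/dataloader-env
```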
- AZURE_OAUTH_CLIENT_ID: Active Directory Client ID
- AZURE_OAUTH_CLIENT_SECRET: Active Directory Client Secret
- AZURE_OAUTH_TENANT_ID: Active Directory Tenant ID
- AZURE_STORAGE_ACCOUNT: Blob Storage Account
- AZURE_STORAGE_ACCOUNT_KEY: Blob Storage Account Key
- DATAFACTORY_CLIENT_ID: Client ID for the Data Factory App Registration
- DATAFACTORY_CLIENT_SECRET: Client Secret for the Data Factory App Registration
- DATAFACTORY_NAME: The name of the data factory
- DATAFACTORY_RESOURCE_GROUP: The resource group for the data factory
- AZURE_TENANT_ID: The Azure tenant ID the Data Factory is in
- AZURE_SUBSCRIPTION_ID: The Azure subscription ID the Data Factory is in
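For a quick local smoke test, the same variables can be passed straight to the container; a minimal sketch, assuming the image pushed earlier and that the app listens on port 8080 (all values are placeholders):

```bash
docker run -p 8080:8080 \
  -e AZURE_OAUTH_CLIENT_ID=<client-id> \
  -e AZURE_OAUTH_CLIENT_SECRET=<client-secret> \
  -e AZURE_OAUTH_TENANT_ID=<tenant-id> \
  -e AZURE_STORAGE_ACCOUNT=<account-name> \
  -e AZURE_STORAGE_ACCOUNT_KEY=<account-key> \
  -e DATAFACTORY_CLIENT_ID=<df-client-id> \
  -e DATAFACTORY_CLIENT_SECRET=<df-client-secret> \
  -e DATAFACTORY_NAME=<data-factory-name> \
  -e DATAFACTORY_RESOURCE_GROUP=<resource-group> \
  -e AZURE_TENANT_ID=<tenant-id> \
  -e AZURE_SUBSCRIPTION_ID=<subscription-id> \
  <registry-name>.azurecr.io/<image-name>:latest
```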
Once the application is initialized, a system administrator needs to be set.
- Go to the application URL (ex: https://dataloader-itadev2.vangos-cloudapp.us/#/)
- Select "Dataloader ADMIN"
- Add the trade.gov email address of a system administrator to the Dataloader ADMIN "Business Unit"
- Click Save