21 Jun 2024

10 Advanced SQL Concepts You Should Know for Data Science Interviews

 


  1. Common table expressions (CTEs).
  2. Recursive CTEs.
  3. Temporary functions.
  4. Pivoting data with case when.
  5. Except versus Not In.
  6. Self joins.
  7. Rank versus dense rank versus row number.
  8. Calculating delta values.
  9. Calculating running totals.
  10. Date-time manipulation.


1. Common Table Expressions (CTEs)

If you’ve ever wanted to query the result of another query, that’s when common table expressions (CTEs) come into play. A CTE essentially creates a temporary, named result set that you can reference in the rest of your query.

Using CTEs is a great way to modularize and break down your code just like you would break an essay down into several paragraphs.

Consider the following query with a subquery in the where clause:

SELECT name
       , salary
FROM People
WHERE name in (SELECT DISTINCT name 
               FROM population 
               WHERE country = "Canada"
                     AND city = "Toronto")
      AND salary >= (SELECT AVG(salary)
                     FROM salaries
                     WHERE gender = "Female")

This may not seem too difficult to understand, but what if there were many subqueries or subqueries within subqueries? This is where CTEs come into play.

with toronto_ppl as (
   SELECT DISTINCT name
   FROM population
   WHERE country = "Canada"
         AND city = "Toronto"
)
, avg_female_salary as (
   SELECT AVG(salary) as avgSalary
   FROM salaries
   WHERE gender = "Female"
)
SELECT name
       , salary
FROM People
WHERE name in (SELECT name FROM toronto_ppl)
      AND salary >= (SELECT avgSalary FROM avg_female_salary)

Now, the where clause is clearly filtering for names in Toronto. CTEs are useful both because you can break down your code into smaller chunks and because you can assign a variable name to each CTE (i.e., toronto_ppl and avg_female_salary).

CTEs also allow you to do more advanced techniques like creating recursive tables.

 

2. Recursive CTEs

A recursive CTE is a CTE that references itself, just like a recursive function in Python. Recursive CTEs are especially useful when querying hierarchical data like organization charts, file systems, a graph of links between web pages, and so on.

A recursive CTE has three parts:

  1. The anchor member, which is an initial query that returns the base result of the CTE.
  2. The recursive member, which is a recursive query that references the CTE. It is combined with the anchor member using UNION ALL.
  3. A termination condition that stops the recursive member.

Here’s an example of a recursive CTE that gets the manager ID for each staff ID (depending on your SQL dialect, you may need WITH RECURSIVE instead of WITH):

with org_structure as (
   SELECT id
          , manager_id
   FROM staff_members
   WHERE manager_id IS NULL
   UNION ALL
   SELECT sm.id
          , sm.manager_id
   FROM staff_members sm
   INNER JOIN org_structure os
      ON os.id = sm.manager_id
)
SELECT id
       , manager_id
FROM org_structure

 

3. Temporary Functions

Knowing how to write temporary functions is important for several reasons:

  1. It allows you to break code down into smaller chunks.
  2. It’s useful for writing cleaner code.
  3. It prevents repetition and allows you to reuse code, similar to using functions in Python.

Consider the following example:

SELECT name
       , CASE WHEN tenure < 1 THEN "analyst"
              WHEN tenure BETWEEN 1 and 3 THEN "associate"
              WHEN tenure BETWEEN 3 and 5 THEN "senior"
              WHEN tenure > 5 THEN "vp"
              ELSE "n/a"
         END AS seniority 
FROM employees

Instead, you can leverage a temporary function to capture the CASE clause.

CREATE TEMPORARY FUNCTION get_seniority(tenure INT64) AS (
   CASE WHEN tenure < 1 THEN "analyst"
        WHEN tenure BETWEEN 1 and 3 THEN "associate"
        WHEN tenure BETWEEN 3 and 5 THEN "senior"
        WHEN tenure > 5 THEN "vp"
        ELSE "n/a"
   END
);
SELECT name
       , get_seniority(tenure) as seniority
FROM employees

With a temporary function, the query itself is much simpler and more readable, and you can reuse the get_seniority function in other queries.

4. Pivoting Data With Case When

You’ll most likely see many questions that require CASE WHEN statements, simply because it’s such a versatile concept. It allows you to write complex conditional logic, for example assigning a value or class based on other variables.

Less commonly known, however, is that it also allows you to pivot data. For example, if you have a month column, and you want to create an individual column for each month, you can use case when statements to pivot the data.

Example Question: Write an SQL query to reformat the table so that there is a revenue column for each month.

Initial table:
+------+---------+-------+
| id   | revenue | month |
+------+---------+-------+
| 1    | 8000    | Jan   |
| 2    | 9000    | Jan   |
| 3    | 10000   | Feb   |
| 1    | 7000    | Feb   |
| 1    | 6000    | Mar   |
+------+---------+-------+
Result table:
+------+-------------+-------------+-------------+-----+-------------+
| id   | Jan_Revenue | Feb_Revenue | Mar_Revenue | ... | Dec_Revenue |
+------+-------------+-------------+-------------+-----+-------------+
| 1    | 8000        | 7000        | 6000        | ... | null        |
| 2    | 9000        | null        | null        | ... | null        |
| 3    | null        | 10000       | null        | ... | null        |
+------+-------------+-------------+-------------+-----+-------------+
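
One possible answer uses CASE WHEN together with aggregation. A sketch (assuming the initial table is called revenue_by_month; only the first three months are written out):

SELECT id
       , SUM(CASE WHEN month = 'Jan' THEN revenue END) AS Jan_Revenue
       , SUM(CASE WHEN month = 'Feb' THEN revenue END) AS Feb_Revenue
       , SUM(CASE WHEN month = 'Mar' THEN revenue END) AS Mar_Revenue
       -- ...repeat the pattern through Dec_Revenue
FROM revenue_by_month
GROUP BY id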

 

5. Except vs Not In

Except and not in operate almost identically. They’re both used to compare the rows between two queries/tables. That being said, there are subtle nuances between the two that you should know.

First, except filters out duplicates and returns distinct rows, unlike not in.

Further, except expects the same number of columns in both queries/tables, whereas not in compares a single column from each query/table.
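
A minimal sketch of the difference, assuming two tables table_a and table_b that each have an id column:

-- EXCEPT compares entire rows and returns distinct results
SELECT id FROM table_a
EXCEPT
SELECT id FROM table_b

-- NOT IN compares a single column and keeps duplicates
SELECT id
FROM table_a
WHERE id NOT IN (SELECT id FROM table_b)

One practical caveat: if the NOT IN subquery returns any NULL values, the comparison returns no rows at all, so NULLs are usually filtered out of the subquery first.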

 

6. Self Joins

An SQL self-join joins a table with itself. You might think that such an action serves no purpose, but you’d be surprised at how common this is. In many real-life settings, data is stored in one large table rather than many smaller tables. In such cases, self-joins may be required to solve unique problems.

Let’s look at an example.

Example Question: Given the Employee table below, write a SQL query that finds employees who earn more than their managers. 

+----+-------+--------+-----------+
| Id | Name  | Salary | ManagerId |
+----+-------+--------+-----------+
| 1  | Joe   | 70000  | 3         |
| 2  | Henry | 80000  | 4         |
| 3  | Sam   | 60000  | NULL      |
| 4  | Max   | 90000  | NULL      |
+----+-------+--------+-----------+
Answer:
SELECT
    a.Name as Employee
FROM
    Employee as a
    JOIN Employee as b on a.ManagerID = b.Id
WHERE a.Salary > b.Salary

For the above table, Joe is the only employee who earns more than his manager.


 

7. Rank vs Dense Rank vs Row Number

Ranking rows and values is a common application. Here are a few examples in which companies frequently use ranking:

  • Ranking highest valued customers by number of purchases, profits, and so on.
  • Ranking the top products sold by number of units sold.
  • Ranking the top countries with the most sales.
  • Ranking the top videos viewed by number of minutes watched, number of distinct viewers, and so on.

In SQL, there are several ways that you can assign a rank to a row, which we’ll dive into with an example. Consider the following query and results:

SELECT Name
       , GPA
       , ROW_NUMBER() OVER (ORDER BY GPA desc)
       , RANK() OVER (ORDER BY GPA desc)
       , DENSE_RANK() OVER (ORDER BY GPA desc)
FROM student_grades
[Image created by the author: example output showing ROW_NUMBER, RANK and DENSE_RANK for each student]

ROW_NUMBER() returns a unique number for each row starting at one. When there are ties (e.g., Bob versus Carrie), ROW_NUMBER() arbitrarily assigns a number if a second criterion is not defined.

RANK() returns a unique number for each row starting at one, except when there are ties, in which case RANK() assigns the same number. A gap follows a duplicate rank.

DENSE_RANK() is similar to RANK() except that there are no gaps after a duplicate rank. Notice that with DENSE_RANK(), Daniel is ranked third as opposed to fourth with RANK().
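
As a hypothetical illustration (Bob, Carrie and Daniel come from the example above; the top student and all GPA values are made up), the three functions could return:

+--------+-----+------------+------+------------+
| Name   | GPA | ROW_NUMBER | RANK | DENSE_RANK |
+--------+-----+------------+------+------------+
| Alice  | 4.0 | 1          | 1    | 1          |
| Bob    | 3.8 | 2          | 2    | 2          |
| Carrie | 3.8 | 3          | 2    | 2          |
| Daniel | 3.6 | 4          | 4    | 3          |
+--------+-----+------------+------+------------+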

 

8. Calculating Delta Values

Another common application is comparing values from different periods. For example, what was the delta between this month and last month’s sales? Or what was the delta between this month and the same one last year?

When comparing values from different periods to calculate deltas, this is when LEAD() and LAG() come into play.

Here are some examples:

-- Comparing each month's sales to last month
SELECT month
       , sales
       , sales - LAG(sales, 1) OVER (ORDER BY month)
FROM monthly_sales

-- Comparing each month's sales to the same month last year
SELECT month
       , sales
       , sales - LAG(sales, 12) OVER (ORDER BY month)
FROM monthly_sales

 

9. Calculating Running Totals

If you knew about ROW_NUMBER() and LAG()/LEAD(), this probably won’t be much of a surprise to you. But if you didn’t, this is probably one of the most useful window functions, especially when you want to visualize growth.

Using a window function with SUM(), we can calculate a running total. See the example below:

SELECT Month
       , Revenue
       , SUM(Revenue) OVER (ORDER BY Month) AS Cumulative
FROM monthly_revenue
[Image created by the author: example output showing the cumulative revenue column]
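
For example, with made-up revenue figures the cumulative column builds up like this:

+-------+---------+------------+
| Month | Revenue | Cumulative |
+-------+---------+------------+
| Jan   | 1000    | 1000       |
| Feb   | 1500    | 2500       |
| Mar   | 1200    | 3700       |
+-------+---------+------------+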

 

10. Date-Time Manipulation

You should definitely expect some sort of SQL questions that involve date-time data. For example, you may be required to group data by months or convert a variable format from DD-MM-YYYY to simply the month.

Some functions you should know are the following:

  • EXTRACT
  • DATEDIFF
  • DATE_ADD / DATE_SUB
  • DATE_TRUNC
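
A few quick sketches of these functions (exact names and argument order vary by dialect; the orders table and order_date column are placeholders):

-- Extract the month number from a date
SELECT EXTRACT(MONTH FROM order_date) FROM orders

-- Truncate a date to the start of its month (BigQuery style)
SELECT DATE_TRUNC(order_date, MONTH) FROM orders

-- Add or subtract an interval
SELECT DATE_ADD(order_date, INTERVAL 7 DAY) FROM orders
SELECT DATE_SUB(order_date, INTERVAL 7 DAY) FROM orders

-- Difference in days between two dates (MySQL style)
SELECT DATEDIFF('2015-01-02', '2015-01-01')   -- returns 1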

Example Question: Given a Weather table, write a SQL query to find all date IDs with higher temperatures compared to previous dates.

+---------+------------------+------------------+
| Id(INT) | RecordDate(DATE) | Temperature(INT) |
+---------+------------------+------------------+
|       1 |       2015-01-01 |               10 |
|       2 |       2015-01-02 |               25 |
|       3 |       2015-01-03 |               20 |
|       4 |       2015-01-04 |               30 |
+---------+------------------+------------------+
Answer:
SELECT
    a.Id
FROM
    Weather a,
    Weather b
WHERE
    a.Temperature > b.Temperature
    AND DATEDIFF(a.RecordDate, b.RecordDate) = 1

20 Jun 2024

Deploying Synapse Serverless SQL pool with DevOps

 Case

The external tables and other objects in my Synapse Serverless SQL pool are not stored in the Git repository of Synapse. How do I deploy those objects through the DTAP street?
[Screenshot: Release Synapse Serverless SQL Pools with DevOps]

Solution
You can use the good old database project in Visual Studio with the SQL Server Data Tools (SSDT) add-on to store the external tables, file formats and data sources in a Git repository, and then use Azure DevOps with YAML pipelines to release the database objects through the DTAP street. It is very similar to deploying a regular Azure SQL Database, with some minor differences.


1) Download Visual Studio
First download and install Visual Studio 2022, then make sure to add the SQL Server Data Tools (SSDT) extension, which adds the Database project option. If you already have Visual Studio 2022, make sure to update it, because versions before 17.7 don't support SSDT for Serverless SQL pools.
[Screenshot: Visual Studio 2022 with SQL Server Data Tools]

2) Create repository
This example is for Azure DevOps, but feel free to use GitHub instead. Create a new repository. In the root we have two folders:
  • CICD: to store all release-related files such as the YAML files. There is a subfolder called YAML to store the .yml files.
  • SQL: to store the database project from Visual Studio.
[Screenshot: The repository folder structure]

After you have created the initial folder structure you need to clone the repository to Visual Studio by hitting the clone button in the upper right corner.
[Screenshot: Clone repository to Visual Studio]

3) Create new database project
Once in Visual Studio, you now have a cloned repository folder. In the SQL folder we will create a new database project.
[Screenshot: Cloned DevOps repository in VS2022]

Create a new project via the File menu and search for SQL in the upper search textbox.
[Screenshot: Create new project]

Create the new SQL Server Database Project in the SQL folder from your repository. Since there will be only one project in the solution, the solution and project are stored in the same folder. Otherwise you will get an additional subfolder level.
[Screenshot: Create new SQL Server Database Project]

4) Create Azure DevOps Service Connection
Within your Azure DevOps project, click on Project settings and, under Service connections, create a new Service connection of the type Azure Resource Manager. You will need a Microsoft Entra service principal for this. Depending on the organization, project size and number of environments, create one or more Service Connections. Ideally, one per DTAP environment.
[Screenshot: Create Service Connection]

5) Give Service Principal access to database
Go to Synapse and open a new SQL script. Then either create a new database in your Serverless SQL pool to store your external tables or use an existing one. Our example database is called datamart, since it will host external tables from our gold layer, a.k.a. the datamart with facts and dimensions for Power BI.

Then switch to your datamart database, either via the USE command or via the database dropdown. Once you are in the datamart database, create an (external) user for the Service Principal (SP) that you used in the Azure DevOps Service Connection from the previous step. After that, give the SP enough rights to deploy all objects to this database: db_owner.
[Screenshot: Give Service Principal access to Serverless SQL pool DB]

-- First create a new database (if you don't have one)
CREATE DATABASE datamart;
 
-- Once created switch to your (new) database
USE datamart;
 
-- Create a new database user for the Service Principal
-- used in the DevOps Service Connection
CREATE USER [myserviceprincipal] FROM EXTERNAL PROVIDER;
 
-- Give the Service Principal enough rights to create
-- external resources and a master key
ALTER ROLE [db_owner] ADD MEMBER [myserviceprincipal];
You have to repeat this for all the Serverless SQL pool databases in your DTAP environment. Note that if you do the SQL development within Synapse Studio, and not in Visual Studio, you don't need to deploy to the Development environment, so you also don't need to give the SP access there. In that case, only do this for Test, Acceptance and Production.

Tip: you can also give the SP access within Synapse by making it Synapse Administrator or Synapse SQL Administrator. However, that grants access to everything within Synapse, or to all Serverless SQL pool databases within Synapse. If you are already using the same SP to roll out Synapse with Infrastructure as Code (Bicep or Terraform), then it already has the Synapse Administrator role.

6) Master Key and External Resources
For this example we will use a basic script with the following code. Note that the DeltaLocation data source points to the Development environment of our data platform. During the deployment we will need to override this, since each DTAP environment has its own Azure Storage Account.
[Screenshot: Create external table on delta table (for Power BI)]

-- Create Master Key for using Managed Service Identity access
CREATE MASTER KEY;
 
-- Create Credential to use Managed Service Identity of Synapse
CREATE DATABASE SCOPED CREDENTIAL [SynapseIdentity]
WITH
    IDENTITY = 'Managed Service Identity'
;
 
-- Create a File format for Delta tables
CREATE EXTERNAL FILE FORMAT [DeltaFormat]
    WITH (
    FORMAT_TYPE = DELTA,
    DATA_COMPRESSION = N'org.apache.hadoop.io.compress.SnappyCodec'
    );
 
-- Create a Data source pointing to the container of your Delta tables
-- (the URL below is a placeholder for the Development storage account)
CREATE EXTERNAL DATA SOURCE [DeltaLocation]
    WITH (
    LOCATION = 'https://<devstorageaccount>.dfs.core.windows.net/gold',
    CREDENTIAL = [SynapseIdentity]
    );
 
-- Create an external table
CREATE EXTERNAL TABLE [dbo].[dim_abcdef] (
    [integerfield] INT NULL,
    [stringfield] NVARCHAR (400) NULL,
    [datefield] DATETIME2 (7) NULL
)
    WITH (
    LOCATION = N'/delta/dim_abcdef',
    DATA_SOURCE = [DeltaLocation],
    FILE_FORMAT = [DeltaFormat]
    );
You can encrypt the master key with an additional password. The credential with the Managed Service Identity (MSI) ensures that this specific Synapse identity is used to connect to the data source (the Delta table in your Azure Storage Account, a.k.a. data lake a.k.a. Delta Lake). This means that if you have access to the Serverless SQL pool and its credentials, you also have access to the underlying storage account.

7) Schema Compare in Visual Studio
The datamart database in the Serverless SQL pool of the development Synapse workspace is now filled with all the SQL objects above, such as the external table. Next, we want to get those objects into our database project. For this, do a schema compare from the database project by right-clicking the project in the Solution Explorer. The schema compare is between the development database and the database project.
[Screenshot: Schema Compare]

In the Schema Compare window, click on Options and exclude Role Memberships and Users, because these differ for each environment in your DTAP street. Excluded object types won't show up in the comparison list, so you don't have to uncheck them every time you do a schema compare.
[Screenshot: Schema Compare options, Application-scoped section]

You also want to include Database Scoped Credentials and Master Key in the Non-Application-scoped section. These are the first two items in the script from the previous step. Without them, your external tables won't work in the test/acceptance/production database. Click OK to confirm the settings.
[Screenshot: Schema Compare options, Non-Application-scoped section]

Now make sure your dev database is on the left side of the schema compare and the database project is on the right side. You can find the URL of your Serverless SQL pool in the Azure portal on the overview page of your Synapse workspace. It's called Serverless SQL endpoint and looks like [synapsename]-ondemand.sql.azuresynapse.net.
[Screenshot: Azure Synapse Workspace overview page]

Then hit the Compare button to see all the new objects that are not yet in your project. Tip: save this schema compare (including the changed options) in your database project. Then you can reuse it.
[Screenshot: Result of schema compare]

Verify the objects and hit the Update button to include them in your database project. 
[Screenshot: Result of the update]

8) Override location of External Data Source
If you open the external data source, you will notice the hardcoded URL of the gold container in the development environment. Replace that URL with $(DeltaLocation).
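
After the replacement, the external data source in the project looks roughly like this (a sketch; the $(DeltaLocation) SQLCMD variable is resolved at deployment time):

CREATE EXTERNAL DATA SOURCE [DeltaLocation]
    WITH (
    LOCATION = '$(DeltaLocation)',
    CREDENTIAL = [SynapseIdentity]
    );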
[Screenshot: Replace hardcoded URL by variable]

Go to the properties of your project (not the solution) and then go to the SQLCMD Variables tab. Enter a new variable called $(DeltaLocation) and, in the Default column, enter the URL of your Development environment (the URL you just replaced). The Default column is the only one you need to fill in. Save it.
[Screenshot: Add variable to SQLCMD Variables]

Now we need to create a publish profile file, which we can override during deployment in the YAML pipeline from Azure DevOps. Right-click the project (not the solution) and click Publish. In this window, fill in the values from your development environment (we will replace them later on): the target database connection and the value of the SQLCMD variable. Then hit the Create Profile button, which adds the publish profile file to your database project. After that, cancel the window, because we will not publish via Visual Studio.
[Screenshot: Publish Profile]

9) Stage, Commit and Sync repository
The changes in Visual Studio need to go back to the repository in Azure DevOps. Branch strategy and branch policies are left out of this explanation to keep things short and simple.
[Screenshot: Commit and sync changes to Azure DevOps repos]

Now all changes are stored in the Azure DevOps repository. The next step is setting up the CICD process.
[Screenshot: Azure DevOps repository]

10) Pipeline Library - Variable Groups
To make the YAML scripts reusable for multiple Serverless SQL pools in your DTAP street, we need to create some Variable Groups in Azure DevOps. You can find them under Pipelines - Library. You need one for each of your DTAP environments and optionally one for general variables that don't change between environments.
[Screenshot: DevOps Variable Groups]

In this case we will store the database project name and the database name in the general variable group. You could also use a single variable, because they will often have the same value.
  • SqlDatabaseName - Name of the database within the Serverless SQL Pool
  • SqlProjectName - Name of the database project within Visual Studio
For the environment-specific variable groups we have the SQL server name, which equals the Synapse workspace name. So if you are also deploying the Synapse workspace itself, you could reuse that variable. The other one is the storage account location.
  • ADLSLocation - For replacing the storage account URL between environments
  • SqlServerName - For storing the name of the server (equals to Synapse workspacename)
11) YAML pipelines
The last development step is setting up the YAML files. This example has two YAML files located in the CICD\YAML folder of the repository. Tip: Visual Studio Code has some nice YAML editors, but you can also just use the Azure DevOps website to create and edit the YAML files.

BuildSqlServerless.yml
The YAML starts with some general settings, like reading the general variable group from the previous step. It then declares that a run is triggered when a change happens in the SQL folder in one of the four branches.
Step 1, checkout, gets the repository content onto the agent.
Step 2, treeview, is just for debugging and shows all files on the agent; useful for step 3.
Step 3, build, builds the database project that was just retrieved from the repository.
Step 4, copy, copies the files required for deployment to an artifact staging folder.
Step 5, publish, publishes the artifact so that it can be used in the next YAML file.
###################################
# General Variables
###################################
variables:
  - group: SQLServerlessParamsGen
   
###################################
# When to create a pipeline run
###################################
trigger:
  branches:
    include:
    - development
    - test
    - acceptance
    - main
  paths:
    include:
    - SQL/*
  
stages:
###################################
# Create DacPac Artifact
###################################
- stage: CreateSQLArtifact
  displayName: Create SQL Artifact (dacpac)
  
  jobs:
  - job: Job
    displayName: 'Build DacPac'
    workspace:
      clean: all
    pool:
      vmImage: 'windows-latest'
      # name: my-agentpool
    steps:
  
    ###################################
    # 1 Retrieve Repository
    ###################################
    - checkout: self
      displayName: '1 Retrieve Repository'
      clean: true
  
    ###################################
    # 2 Show treeview of agent
    ###################################
    - powershell: |
        Write-Output "This is the folder structure within Pipeline.Workspace"
        tree "$(Pipeline.Workspace)" /F
      displayName: '2 Treeview Workspace'
  
    ###################################
    # 3 Build Visual Studio project
    ###################################
    - task: MSBuild@1
      displayName: '3. Creating Artifact'
      inputs:
        solution: '$(System.DefaultWorkingDirectory)/SQL/$(SqlProjectName)/$(SqlProjectName).sqlproj'
        msbuildArchitecture: x86
        msbuildVersion: latest
  
    ###################################
    # 4 Stage dacpac and publish.xml
    ###################################
    - task: CopyFiles@2
      displayName: '4. Copy Artifact'
      inputs:
        contents: |
          **\*.dacpac
          **\*.publish.xml
        TargetFolder: '$(build.artifactstagingdirectory)'
  
    ###################################
    # 5 Publish dacpac and xml artifact
    ###################################
    - task: PublishPipelineArtifact@1
      displayName: '5. Publish Artifact'
      inputs:
        targetPath: '$(Build.ArtifactStagingDirectory)'
        artifact: 'SQL_Dacpac'
        publishLocation: 'pipeline'
After that, it calls the second YAML file that does the actual deployment. Because we want to reuse it for all environments, you need to pass some parameter values that come from the variable groups of the previous step. The hardcoded parameter is the name of the Azure DevOps Service Connection that uses the SP to connect to the database. The env parameter is just for logging/debugging purposes, to show which environment you are deploying to.
###################################
# Deploy Test environment
###################################
- stage: DeployTst
  displayName: Deploy TST
  variables:
    - group: SQLServerlessParamsTst
  pool:
    vmImage: 'windows-latest'
    # name: my-agentpool
  condition: and(succeeded(), eq(variables['Build.SourceBranchName'], 'test'))
  dependsOn: CreateSQLArtifact
  jobs:
    - template: DeploySqlServerless.yml
      parameters:
        env: TST
        ServiceConnection: SC-Synapse-T
        SqlServerName: $(SqlServerName)
        SqlDatabaseName: $(SqlDatabaseName)
        SqlProjectName: $(SqlProjectName)
        ADLSLocation: $(ADLSLocation)
 
###################################
# Deploy Acceptance environment
###################################
- stage: DeployAcc
  displayName: Deploy ACC
  variables:
    - group: SQLServerlessParamsAcc
  pool:
    vmImage: 'windows-latest'
    # name: my-agentpool
  condition: and(succeeded(), eq(variables['Build.SourceBranchName'], 'acceptance'))
  dependsOn: CreateSQLArtifact
  jobs:
    - template: DeploySqlServerless.yml
      parameters:
        env: ACC
        ServiceConnection: SC-Synapse-A
        SqlServerName: $(SqlServerName)
        SqlDatabaseName: $(SqlDatabaseName)
        SqlProjectName: $(SqlProjectName)
        ADLSLocation: $(ADLSLocation)
 
###################################
# Deploy Production environment
###################################
- stage: DeployPrd
  displayName: Deploy PRD
  variables:
    - group: SQLServerlessParamsPrd
  pool:
    vmImage: 'windows-latest'
    # name: my-agentpool
  condition: and(succeeded(), eq(variables['Build.SourceBranchName'], 'main'))
  dependsOn: CreateSQLArtifact
  jobs:
    - template: DeploySqlServerless.yml
      parameters:
        env: PRD
        ServiceConnection: SC-Synapse-P
        SqlServerName: $(SqlServerName)
        SqlDatabaseName: $(SqlDatabaseName)
        SqlProjectName: $(SqlProjectName)
        ADLSLocation: $(ADLSLocation)

[Screenshot: Build and publish artifact]

DeploySqlServerless.yml
The second YAML file starts with the parameters that are required to call this script, followed by the environment name in the job, which you can use to set approvals. The first step is just for debugging: it shows the parameter values and a treeview of the agent. In the treeview you should be able to see the artifact folder, which is useful for setting up the deployment task where you need the paths of the dacpac and the publish profile. In the AdditionalArguments property you can override the value of the storage account location; if you have multiple SQLCMD variables, just repeat the /Variables argument with a space in between (see the sketch after the YAML below).
parameters:
  - name: env
    displayName: Environment
    type: string
    values:
    - DEV
    - TST
    - ACC
    - PRD
  - name: ServiceConnection
    displayName: Service Connection
    type: string
  - name: SqlServerName
    displayName: Sql Server Name
    type: string
  - name: SqlDatabaseName
    displayName: Sql Database Name
    type: string
  - name: SqlProjectName
    displayName: Sql Project Name
    type: string
  - name: ADLSLocation
    displayName: Azure Data Lake Location
    type: string
  
  
jobs:
    - deployment: deploymentjob${{ parameters.Env }}
      displayName: Deployment Job ${{ parameters.Env }}
      environment: deploy-to-${{ parameters.Env }}
      strategy:
        runOnce:
          deploy:
            steps:
            ###################################
            # 1 Show environment and treeview
            ###################################
            - powershell: |
                Write-Output "Deploying ${{ parameters.SqlProjectName }} to DB ${{ parameters.SqlDatabaseName }} and server ${{ parameters.SqlServerName }} in the ${{ parameters.env }} environment"
                Write-Output "Changing SQLCMD variabele DeltaLocation to value ${{ parameters.ADLSLocation }}"
                tree "$(Pipeline.Workspace)" /F
              displayName: '1 Show environment and treeview Pipeline_Workspace'
  
            ###################################
            # 2 Deploy DacPac
            ###################################           
            - task: SqlAzureDacpacDeployment@1
              displayName: '2 Deploy DacPac'
              inputs:
                azureSubscription: '${{ parameters.ServiceConnection }}'
                AuthenticationType: 'servicePrincipal'
                ServerName: '${{ parameters.SqlServerName }}-ondemand.sql.azuresynapse.net'
                DatabaseName: '${{ parameters.SqlDatabaseName }}'
                deployType: 'DacpacTask'
                DeploymentAction: 'Publish'
                DacpacFile: '$(Pipeline.Workspace)/SQL_Dacpac/SQL/${{ parameters.SqlProjectName }}/bin/debug/${{ parameters.SqlProjectName }}.dacpac'
                PublishProfile: '$(Pipeline.Workspace)/SQL_Dacpac/SQL/${{ parameters.SqlProjectName }}/${{ parameters.SqlProjectName }}.publish.xml'
                AdditionalArguments: /Variables:DeltaLocation=${{ parameters.ADLSLocation }}
                IpDetectionMethod: 'AutoDetect'
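
If you need to override more than one SQLCMD variable, repeat the /Variables switch inside AdditionalArguments, separated by a space (a sketch; SecondLocation and OtherLocation are hypothetical names):

                AdditionalArguments: /Variables:DeltaLocation=${{ parameters.ADLSLocation }} /Variables:SecondLocation=${{ parameters.OtherLocation }}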

[Screenshot: Deploy Serverless SQL pool database]

Conclusion
In this post you learned to apply an 'old fashioned' solution to a relatively new service. This solution works not only for the Synapse Serverless SQL pool, but also for the Synapse Dedicated SQL pool.

There is still one bug/problem. For the Serverless SQL pool it seems that updating the external data source is not possible. The first deployment is no problem, but subsequent alter statements are ignored, probably because there are already external tables referencing this object. You probably won't update this external data source often enough for it to be a big issue, but if you need to, you can use a pre-deployment script in the database project to first drop those objects.
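
A minimal sketch of such a pre-deployment script, reusing the object names from the earlier example (drop the dependent external table first, then the data source, so the deployment can recreate both):

-- Pre-deployment script: drop objects that reference the external data source
IF EXISTS (SELECT 1 FROM sys.external_tables WHERE name = 'dim_abcdef')
    DROP EXTERNAL TABLE [dbo].[dim_abcdef];

IF EXISTS (SELECT 1 FROM sys.external_data_sources WHERE name = 'DeltaLocation')
    DROP EXTERNAL DATA SOURCE [DeltaLocation];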
