1 May 2024

Tips to Increase SQL Server Query Performance: Part 2

 Hello! Friends, this is my second article on this topic (Tips To Increase SQL Server Query Performance). Today we learn some new tips to increase the performance of SQL Server queries.

If you did not read the first part of this article series then I suggest you read that first.

  • Tips to Increase SQL Server Query Performance: Part 1

If you already have read it then it is OK.

Let us begin this article.

Avoid Null value in the fixed length field

We should avoid the Null value in fixed-length fields because if we insert the NULL value in a fixed-length field then it will take the same amount of space as the desired input value for that field. So if we require a null value in a field then we should use a variable length field that takes less space for NULL. We should reduce the use of NULLS in the database. The use of NULLs in a database can reduce database performance, especially in WHERE clauses.

We should avoid the following data types

  • CHAR- Fixed-length non-Unicode character data with a maximum length of 8,000 characters. (Use VARCHAR).
  • NCHAR- Fixed-length Unicode data with a maximum length of 4,000 characters. (Use NVARCHAR).
  • TEXT- Variable-length non-Unicode data with a maximum length of 2^31 - 1 (2,147,483,647) characters. (Use NTEXT).
  • BINARY- Fixed-length binary data with a maximum length of 8,000 bytes. (Use VARBINARY).

Keep Clustered Index Small

We know that a clustered index is used for fast retrieval of data from tables. But I suggest that clustered Indexes should be Narrow, Unique, and Static.

First of all, we need to understand what Clustered Indexes are.

A Clustered Index is a special type of index that reorders the way records in the table are physically stored. Therefore the table can have only one Clustered Index and this is usually made on the Primary Key. The leaf nodes of a Clustered Index contain the data pages. It is like a dictionary, where all words are sorted in alphabetical order in the entire book. Since it alters the physical storage of the table, only one Clustered Index can be created per table.

Because a Clustered index stores data physically in memory, if the size of a clustered index is very large then it can reduce the performance. Hence a large clustered index on a table with a large number of rows increases the size significantly. Never use an index for frequently changed data because when any change in the table occurs then the index is also modified and that can degrade performance. Create an index on columns used in a WHERE clause and use it in aggregate operations, such as GROUP BY, DISTINCT, ORDER BY, MIN, MAX, and so on. We can create multiple indexes per table in SQL Server. Small or narrow indexes provide more options than a wide composite index.

Usually, Foreign Keys are used in joins, so an index created on Foreign Keys is always beneficial.

USE Common Table Expressions (CTEs) Instead of Temp table

We should prefer a CTE over the temp table because Temp tables are stored physically in a TempDB and they are permanent tables that are deleted after the session ends but CTEs are created within memory. Execution of a CTE is very fast compared to Temp Tables and it is very lightweight.

Use UNION ALL instead of UNION

I think we should use the UNION ALL instead of UNION because:

  1. UNION ALL doesn't sort the result set for distinguished values.
  2. UNION ALL is faster than UNION.

Use Schema Name

A schema is an organization or structure for a database. We can define a schema as a collection of database objects that are owned by a single principal and form a single namespace. Schema name helps the SQL Server for finding that object in a specific schema. It increases the speed of query execution.

We should use the schema name before the SQL object name followed by "." as in the following:

SELECT e.Emp_IId,e.First_Name,e.Last_Name FROM Employee e /* Bad Practice */
SELECT e.Emp_IId,e.First_Name,e.Last_Name FROM dbo.Employee e /* Good Practice*/
SQL

SET NOCOUNT ON

When an INSERT, UPDATE, DELETE, or SELECT command is executed then SQL Server returns the number affected by the query.

SET NOCOUNT ON

We can stop this by using NOCOUNT ON as in the following:

SET NOCOUNT ON

We can again unset the NONCOUNT using the “SET NONCOUNT OFF” command. 

Remove Unused Index

Remove all unused indexes because indexes are always updated when the table is updated so the index must be maintained even if not used.

Use Relationship (Foreign Key) and Appropriate Action

A foreign key is a column or combination of columns that is the same as the primary key but in a different table. Foreign keys are used to define a relationship and enforce integrity between two tables. In addition to protecting the integrity of our data, FK constraints also help document the relationships between our tables within the database itself.

We should use the appropriate action for Delete and Update Operations.

SQL provides mainly 4 types of Action rules in maintaining relationships between tables for Delete and Update operations.  

  • No Action- If a value is deleted or updated from the parent table then no action (change) will be done on the child table.
  • Set NULL- Associated values in a child table would be set to NULL if a value is deleted or updated from the parent table.
  • Cascade- If a value is updated in the parent table then the associated values in the child table would also be updated and If the value is deleted from the parent table then the associated values in the child table would also be deleted.

Set Default

Associated values in a child table would be set to a default value specified in the column definition. Also, the default value should be present in the primary key column. Otherwise, the basic requirement of an FK relation would fail and the update/delete operation would not be successful. If no default value is provided in the foreign key column then this rule could not be implemented.

Use Index Name in Query

Although in most cases the query optimizer will pick the appropriate index for a specific table based on statistics, sometimes it is better to specify the index name in your SELECT query.

SELECT
e.Emp_IId,
e.First_Name,
e.Last_Name
FROM dbo.EMPLOYEE e
WITH (INDEX (Clus_Index))

WHERE e.Emp_IId > 5
SQL

Select Limited Data

We should retrieve only the required data and ignore the unimportant data. The fewer data retrieved, the faster the query will run. Rather than filtering on the client, push as much filtering as possible on the server end. This will result in fewer data being sent on the wire and you will see results much faster.

Let us see an example in the following

SELECT
e.Emp_IId,
e.First_Name,
e.Last_Name
FROM dbo.EMPLOYEE e
WITH (INDEX (Clus_Index))

WHERE e.Salary=12000
SQL

In the preceding example, we can easily avoid the Emp_Iid from the query because we know that all Employees have a salary of 12000.

Drop Index before Insertion of Data

We should drop the index before the insertion of a large amount of data. This makes the insert statement run faster. Once the inserts are completed, you can recreate the index again.

Use Temp Table for Insertion of Large data

We can enhance our previous concept using a temp table. If you are inserting thousands of rows into a table, use a temporary table to load the data. You should ensure that this temporary table does not have an index. Since moving data from one table to another is much faster than loading from an external source, you can now drop indexes on your primary table, move data from the temporary to the final table and finally recreate the indexes.

Use Unique Constraint and Check Constraint

A Check constraint checks for a specific condition before inserting data into a table. If the data passes all the Check constraints then the data will be inserted into the table otherwise the data for insertion will be discarded. The CHECK constraint ensures that all values in a column satisfy certain conditions.

A Unique Constraint ensures that each row for a column must have a unique value. It is like a Primary key but it can accept only one null value. In a table, one or more columns can contain a Unique Constraint.

So we should use a Check Constraint and Unique Constraint because it maintains the integrity of the database.

Position of Column

If we are creating a Non-Clustered index on more than one column then we should consider the sequence of the columns. The order or position of a column in an index also plays a vital role in improving SQL query performance. An index can help to improve the SQL query performance if the criteria of the query match the columns that are left the most in the index key. So we should place the most selective column on the leftmost side of a non-clustered index.

Reduce the Number of Columns

We should reduce the number of columns in tables. That means that when more rows can fit on a single data page then that helps boost SQL Server read performance.

Recompiled Stored Procedure

We all know that Stored Procedures execute T-SQL statements in less time than a similar set of T-SQL statements executed individually. The reason is that the query execution plan for the Stored Procedures is already stored in the "sys.procedures" system-defined view.

We all know that recompilation of a Stored Procedure reduces SQL performance. But in some cases, it requires recompilation of the Stored Procedure.

These cases might be

  1. Dropping and altering of a column, index, and/or trigger of a table.
  2. Updating the statistics used by the execution plan of the Stored Procedure. Altering the procedure will cause the SQL Server to create a new execution plan.

We can recompile the Stored Procedure using one of the following two ways.

Recompile at the Creation of Stored Procedure

We can use the recompile option during the creation of the Stored Procedure like as.

Create Procedure My_Proc_
WITH RECOMPILE
AS
Begin Transaction My_Trans
   Insert Into Student Values(9,'1',50,11)

   If @@ERROR <>0
   Begin
      Rollback Transaction My_Trans
      Return 4
   End
   Insert Into Student Values(9,'J',15,11)
   If @@ERROR <>0
   Begin
      Rollback Transaction My_Trans
      Return 4
   End
   Insert Into Student Values(11,'K',50,11)
   If @@ERROR <>0
   Begin
      Rollback Transaction My_Trans
      Return 4
End
Commit Transaction My_Trans
Go
SQL

But it has a big disadvantage in that it will always recompile every time we call the Stored Procedure so it can reduce the performance. We should avoid this method. We can refer to another method.

Recompile during Execution

In this method, we can create a Stored Procedure without a recompilation option. But we can use the recompilation option during the execution of the Stored Procedure. We should prefer this method for the recompilation of a Stored Procedure.

// Creation of a Stored Procedure
Create Procedure My_Proc_

AS
Begin Transaction My_Trans
Insert Into Student Values(9,'1',50,11)

If @@ERROR <>0
Begin
Rollback Transaction My_Trans
Return 4
End
Insert Into Student Values(9,'J',15,11)
If @@ERROR <>0
Begin
Rollback Transaction My_Trans
Return 4
End
Insert Into Student Values(11,'K',50,11)
If @@ERROR <>0
Begin
Rollback Transaction My_Trans
Return 4
End
Commit Transaction My_Trans
Go
// Execute Stored Procedure with recompile Option
EXEC My_Proc_ WITH RECOMPILE
Go

Tips to Increase SQL Server Query Performance: Part 1

 

Introduction

We all create a database, table, trigger, and Stored Procedure. Then we implement DML and DDL queries and fetch the desired result. Now the question is, was our task completed here?

The answer is no.

Because a bad logical database design results in a bad physical database design and generally results in poor database performance. So creation and implementation of data is one part of the task. After the creation of a database, improving performance is also important. A good database design provides the best performance during data manipulation which results in the best performance of an application. Users always want a fast response to their data retrieval action and developers put forth their best efforts to provide the data in the shortest time. Performance tuning is not easy and there aren't any Silver bullets, but we can go a surprisingly long way with a few basic guidelines.

Today we learn how to improve SQL Server Performance.

Select the Appropriate Data type

I think the selection of the data type for a column can have the most important role in database performance. If we select an appropriate data type then it will reduce the space and enhance the performance otherwise it generates the worst effect. So select an appropriate data type according to the requirements. SQL contains many data types that can store the same type of data but select an appropriate data type because each data type has some limitations and advantages over another data type.

The following are some guidelines about the selection of data type.

  1. Never use a nvarchar or nchar, instead use a varchar or char because nvarchar and nchar take a double amount of memory compared to varchar and char. Use nchar and nvarchar when we must store Unicode or 16-bit character data such as Hindi characters.
  2. Avoid the use of the text data type instead of varchar because the performance of varchar is much better than text. Use the text data type when we must store text data of more than 8000 characters.

Never use Select * Statement

When we require all the columns of a table then we usually use a “Select *” statement but this is not a good approach because when we use the “select *” statement then SQL Server convert * into all column names before executing the query and this approach takes some extra time and effort. So always provide all the column names in the query instead of “select *”. 

SELECT * FROM Employee  /*Bad Approach */

SELECT  Emp_IId, First_Name,Last_Name,Salary ,City FROM Employee   /*Good Approach  */
SQL

Use Appropriate Naming Convention

The main goal of adopting a naming convention for database objects is so that you and others can easily identify the type and purpose of all objects contained in the database. A good naming convention decreases the time required in searching for an object. A good name clearly indicates the action name of any object that it will perform.

tblEmployees // Name of table
vw_ProductDetails // Name of View
PK_Employees // Name of Primary Key
SQL

Select Appropriate choice between Exist and IN

I find that many programmers often suggest using Exist instead of IN but I am not satisfied with those people because it is not always correct that we use Exist instead of IN query. It mainly depends upon the amount of data. EXISTS provides the answer in the form of TRUE or FALSE & IN return values.

Now the question develops of which is faster, IN or EXISTS?

It totally depends upon the query. For some queries Exist is efficient and for some IN is efficient.

Exits is a correlated sub-query in which the outer query runs first and for each outer query, an inner query is probed.

Whereas in IN the sub-query is evaluated, distinct, indexed and then joined to the original table. So consider one big table (say 1 million rows) and one small table containing relatively fewer rows.

So if the outer table is small then it will be probed a fewer number of times and with the inner query (a big table) Exist will be faster.

If the inner query table (small table) is giving a smaller result set then IN will be faster. EXISTS will find the first row faster in general than the IN will and the IN will get the LAST row (all rows) faster then the where exists.

The recommendation at that time was

  • If the majority of the filtering criteria are in the subquery, use IN.
  • If the majority of the filtering criteria are in the main query, use EXISTS.

In other words

  • IN for a large outer query and a small inner query.
  • EXISTS for small outer query and big inner query.

Never use Count(*) or Count(1)

We should never use Count(*) or Count(1) in SQL. Instead of this, we should use Count (Col_Name).

Exception plan of Count(*), Count(1), Count (Col_Name) are the same. 

SELECT COUNT( *) FROM Employee
SELECT COUNT( 1) FROM Employee
SELECT COUNT(Emp_IId)  FROM Employee
SQL

Then we should prefer Count (Col_name) because it provides a meaningful expression and the time taken may be slightly different in terms of CPU usage.

SELECT COUNT( *) FROM Employee    /* Bad Practice  */
SELECT COUNT( 1) FROM Employee     /* Bad Practice  */
SELECT COUNT(Emp_IId)  FROM Employee   /* Good Practice  */
SQL

 Never Use ” Sp_” for User Define Stored Procedure

Most programmers use “sp_” for user-defined Stored Procedures. I suggest never using “sp_” for user-defined Stored Procedures because, In SQL Server, the master database has a Stored Procedure with the "sp_" prefix, so when we create a Stored Procedure with the "sp_" prefix then SQL Server always looks first in the master database than in the user-defined database so it takes some extra time.

So we use another prefix for the user-defined Stored Procedure like “usp_” as in the following.

sp_Employees_Insert   /* Bad Practice   */
usp_Employees_Insert  /* Good Practice */
SQL

Avoid Cursors

A cursor is a temporary work area created in the system memory when a SQL statement is executed. A cursor is a set of rows together with a pointer that identifies a current row. It is a database object to retrieve data from a result set one row at a time. But the use of a cursor is not good because it takes a long time because it fetches data row by row.

So we can use a replacement of cursors. A Temporary table, For or While loop may be a replacement for a cursor in some cases.

Use Normalization

In creating a database, normalization is the process of organizing it into tables in such a way that the results of using the database are always unambiguous and as intended. It consists of decomposing tables to eliminate data redundancy and undesirable characteristics like insert, update and delete anomalies.

We should use the following 4 normalization rules in the creation of a database:

  1. First Normal Form (1NF)
  2. Second Normal Form (2NF)
  3. Third Normal Form (3NF)
  4. Boyce & Codd Normal Form (BCNF)

Use Try–Catch

In T-SQL a Try-Catch block is very important for exception handling. A best practice and use of a Try-Catch block in SQL can save our data from undesired changes. We can put all T-SQL statements in a TRY BLOCK and the code for exception handling can be put into a CATCH block.

Use Exception Handling for the following

  1. In Transaction Management for Rollback the transaction
  2. When using Cursor in SQL Server
  3. When implementing DML Query (Insert, Update, Delete) for checking errors and handling them.

Example

Begin Transaction Trans
Begin Try
Delete From Employee Where Employee.Emp_IID<3
Update Employee Set Employee.First_Name='Pankaj kumar' Where Employee.Emp_IID='6th' /* Error Will Occur Here */
If @@TranCount>0
begin Commit Transaction Trans
End
End Try
Begin Catch
if @@TranCount>0
Print 'Error Is Occur in Transaction'
begin Rollback Transaction Trans   /* RollBack Occur  */
End
End Catch
SQL

Avoid Multiple Joins

Try to avoid writing a SQL query using multiple joins that include outer joins, cross apply, and outer apply. It reduces the speed of execution and reduces the choices for the Optimizer to decide the join order and join type. We can use a temp table or temp variables instead of Multiple Joins.

Create and Use the Index

An index is a data structure to retrieve fast data. Indexes are special lookup tables that the database search engine can use to speed up data retrieval. Simply put, an index is a pointer to data in a table. Mainly an index increases the speed of data retrieval.

There are the following two types of indexes in SQL.

  • Clustered Index- A clustered index is a special type of index that reorders the way records in the table are physically stored. Therefore a table can have only one clustered index and this is usually made on the primary key. The leaf nodes of a clustered index contain the data pages.  
  • Non-Clustered Index- A non-clustered index, on the other hand, does not alter the way the rows are stored in the table. It creates a completely different object within the table that contains the column(s) selected for indexing and a pointer back to the table's rows containing the data. A non-clustered index is a special type of index in which the logical order of the index does not match the physical stored order of the rows on the disk.
    Create Nonclustered Index NonClusteredIndex
    on Employee_Detail(Emp_Name,Emp_Age)       /* Non Clustered Index  */
    
    Create Clustered Index My_ClusteredIndex
    on Employee_Detail(Emp_IId)                   /* Clsutered Index   */
    SQL
    We should use the index in the table.

Use Primary Key

We should create a primary key on each table. It has two benefits. The first is that it uniquely identifies each record in the table and second it creates a clustered index that stores the data in the form of a B-Tree. So due to this approach, the retrieval of the data is very fast.

Column Level

Create Table Table_Name
(
Column_Name Datatype Constraint Constraint_Name Primary Key,
)
SQL

Example

Create Table Employee
(
IId int constraint Const_primary_IId primary key,
Name nvarchar(50)
)
SQL

Table Level

Alter Table Table_Name
Add constraint Constraint_Name Primary Key(Column_Name)
SQL

Example

Alter Table Employee
Add constraint Constraint_Name Primary Key(Emp_No,Salary)
SQL

Use Alias Name

Aliasing renames a table or a column temporarily by giving another name. The use of table aliases means renaming a table in a specific SQL statement. Using aliasing, we can provide a small name to a large name which will save us time. 

SELECT Employee.Emp_IId,Employee.First_Name,Employee.Last_Name ,Employee.Salary FROM Employee   /*Bad Practice    */
SELECT e.Emp_IId,e.First_Name,e.Last_Name,e.Salary FROM Employee e   /* Good Practice    */
SQL

Use Stored Procedure

We should use a Stored Procedure for repeated data. Stored Procedures are compiled once and stored in executable form, so procedure calls are quick and efficient. Executable code is automatically cached and shared among users. The code for any repetitious database operation is the perfect candidate for encapsulation in procedures. This eliminates needless rewrites of the same code, decreases code inconsistency and allows the code to be accessed and executed by any user or application possessing the necessary permissions.  

stored procedure in SQL Server

Find more about Stored Procedure here: Stored Procedure

Use Transaction Management

A transaction is a unit of work performed against the database. A transaction is a set of work (T-SQL statements) that execute together like a single unit in a specific logical order as a single unit. If all the statements are executed successfully then the transaction is complete and the transaction is committed and the data will be saved in the database permanently. If any single statement fails then the entire transaction will fail and then the complete transaction is either canceled or rolled back.

A transaction mainly contains 4 properties that are also known as ACID rules.

  • Atomicity- Atomic means that all the work in the transaction is treated as a single unit.
  • Consistency- Transaction ensures that the database properly changes states upon a successfully committed transaction.
  • Isolation- It ensures that transactions operate independently and are transparent to each other.
  • Durability- It ensures that the effect of a committed transaction will be saved in the database permanentlyand should persist no matter what (like due to power failure or something).

    Begin Transaction in SQL Server

    Conclusion

    In the next article, we learned some other concepts to increase the performance of a SQL Server database.

    SQL Server Copilot AI

     Here are some of the latest tips and advanced concepts for Microsoft SQL Server (MS SQL) in 2024: 1. SQL Server Copilot AI Microsoft has in...