Showing posts with label Database. Show all posts
Showing posts with label Database. Show all posts

12 Dec 2025

Best SQL Query to Find Duplicates – Step by Step Guide

 inding duplicate rows is one of the most common tasks in SQL, especially when working with real-world data like customer records, invoices, transactions, or log data.

In this article, you will learn the best SQL queries to identify duplicates, how to remove them, and how to prevent them in the future.

This is a beginner-friendly and practical tutorial you can use in SQL Server, MySQL, Oracle, PostgreSQL, and even BigQuery.


🔍 Why Finding Duplicates Is Important

Duplicate rows can create multiple problems:

  • Wrong totals in reports

  • Incorrect customer counts

  • Issues in billing

  • Slow performance

  • Errors in downstream systems (Power BI, ETL, APIs)

Cleaning duplicates is one of the first tasks in data quality improvement.


1. Find Duplicates Using GROUP BY (Best & Easiest Method)

This is the most common and universal method.

Query:

SELECT column1, column2, COUNT(*) AS duplicate_count FROM table_name GROUP BY column1, column2 HAVING COUNT(*) > 1;

✔ When to use:

  • When you want to know which values are duplicated

  • Works in SQL Server, MySQL, Oracle, PostgreSQL, BigQuery


📌 Example: Find Duplicate Email IDs

Suppose you have a table:

The email raj@gmail.com is repeated.

Query:

SELECT email, COUNT(*) AS duplicate_count FROM users GROUP BY email HAVING COUNT(*) > 1;

Output:

emailduplicate_count
 
        raj@gmail.com

🔥 3. Find Duplicates Using ROW_NUMBER() (Best for Deleting)

This is the most powerful method.
It uses window functions and gives a row number to each duplicate group.

Query:

WITH cte AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn FROM users ) SELECT * FROM cte WHERE rn > 1;

✔ When to use:

  • When you need to delete only duplicates and keep 1 record

  • When you want to mark duplicates


🗑 4. Delete Duplicate Rows (Keep Only One)

Using the same CTE:

WITH cte AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn FROM users ) DELETE FROM cte WHERE rn > 1;

This deletes all duplicate rows but keeps the first row.


🧠 5. Find Duplicates by Date or Conditional Columns

Example: Find duplicates only for rows created today.

SELECT name, email, created_date, COUNT(*) FROM users WHERE created_date = CURRENT_DATE GROUP BY name, email, created_date HAVING COUNT(*) > 1;

This is useful for ETL loads and daily imports.


🛑 6. Prevent Duplicates Using Constraints

Best long-term solution.

In SQL Server / MySQL / PostgreSQL:

ALTER TABLE users ADD CONSTRAINT uq_email UNIQUE (email);

In Oracle:

ALTER TABLE users ADD CONSTRAINT uq_email UNIQUE (email);

After this, the database will automatically block duplicate values.


📸 Screenshots You Should Add (To Increase SEO + AdSense Earnings)

You can add these screenshots:

  • SSMS window showing query

  • Duplicate rows in a sample table

  • Execution result window

(If you want, I can generate clean screenshots for you.)


FAQ – Frequently Asked Questions

1. What is the fastest way to find duplicates in SQL?

Using GROUP BY + HAVING COUNT(*) > 1 is the simplest and fastest method.


2. Does ROW_NUMBER() find duplicates?

Yes. ROW_NUMBER assigns sequence numbers to rows allowing you to identify duplicate rows easily.


3. Will deleting duplicates improve performance?

Yes.
Removing duplicates improves:

  • Query performance

  • Joins

  • Aggregations

  • ETL workflows


4. How to avoid duplicates permanently?

Use a UNIQUE constraint or PRIMARY KEY on important columns like email, mobile, employee code, etc.


5. Does this work in all SQL databases?

Yes — all examples work in:

  • SQL Server

  • MySQL

  • PostgreSQL

  • Oracle

  • BigQuery


⭐ Final Thoughts

Finding and removing duplicates is a critical part of database maintenance.
By using the queries shown above—GROUP BY, ROW_NUMBER, and unique constraints—you can quickly identify and eliminate duplicate data efficiently.

If you want more SQL tutorials, feel free to browse my other posts or ask in the comments!


👉 Want more articles like this for your blog?

I can write complete SEO-optimized posts on:

  • SQL Server

  • MySQL

  • BigQuery

  • Oracle

  • DBeaver

  • Power BI

  • ASP.NET

  • Data analysis topics

Just tell me your next topic!

7 Dec 2025

FULL PYTHON + MS SQL SCRIPT (BEST PRACTICE)

 

FULL PYTHON + MS SQL SCRIPT (BEST PRACTICE)

import pyodbc


# -----------------------------------------

# SQL CONNECTION (EDIT THIS PART)

# -----------------------------------------

def get_connection():

    try:

        conn = pyodbc.connect(

            "DRIVER={ODBC Driver 17 for SQL Server};"

            "SERVER=YOUR_SERVER_NAME;"      # e.g. DESKTOP-123\SQLEXPRESS

            "DATABASE=YOUR_DATABASE_NAME;"

            "UID=YOUR_USERNAME;"

            "PWD=YOUR_PASSWORD;",

            autocommit=False                # Manual commit = safer

        )

        return conn

    except Exception as ex:

        print("Database connection failed:", ex)

        return None



# -----------------------------------------

# INSERT RECORD

# -----------------------------------------

def insert_employee(emp_id, name, salary):

    conn = get_connection()

    if not conn:

        return

    try:

        cursor = conn.cursor()

        query = """

            INSERT INTO Employees (EmpID, Name, Salary)

            VALUES (?, ?, ?)

        """

        cursor.execute(query, (emp_id, name, salary))

        conn.commit()

        print("Record inserted successfully!")

    except Exception as ex:

        conn.rollback()

        print("Insert failed:", ex)

    finally:

        conn.close()



# -----------------------------------------

# UPDATE RECORD

# -----------------------------------------

def update_employee(emp_id, new_salary):

    conn = get_connection()

    if not conn:

        return

    try:

        cursor = conn.cursor()

        query = """

            UPDATE Employees

            SET Salary = ?

            WHERE EmpID = ?

        """

        cursor.execute(query, (new_salary, emp_id))

        conn.commit()

        print("Record updated successfully!")

    except Exception as ex:

        conn.rollback()

        print("Update failed:", ex)

    finally:

        conn.close()



# -----------------------------------------

# DELETE RECORD

# -----------------------------------------

def delete_employee(emp_id):

    conn = get_connection()

    if not conn:

        return

    try:

        cursor = conn.cursor()

        query = "DELETE FROM Employees WHERE EmpID = ?"

        cursor.execute(query, (emp_id,))

        conn.commit()

        print("Record deleted successfully!")

    except Exception as ex:

        conn.rollback()

        print("Delete failed:", ex)

    finally:

        conn.close()



# -----------------------------------------

# SELECT ALL RECORDS

# -----------------------------------------

def get_all_employees():

    conn = get_connection()

    if not conn:

        return

    try:

        cursor = conn.cursor()

        cursor.execute("SELECT EmpID, Name, Salary FROM Employees")


        print("\n--- Employee List ---")

        for row in cursor.fetchall():

            print(row.EmpID, row.Name, row.Salary)


    except Exception as ex:

        print("Select failed:", ex)

    finally:

        conn.close()



# -----------------------------------------

# CALL STORED PROCEDURE

# -----------------------------------------

def call_stored_procedure():

    """

    Example Stored Procedure:

    CREATE PROCEDURE GetEmployees

    AS

    BEGIN

        SELECT * FROM Employees

    END

    """

    conn = get_connection()

    if not conn:

        return

    try:

        cursor = conn.cursor()

        cursor.execute("{CALL GetEmployees}")   # or EXEC GetEmployees

        rows = cursor.fetchall()


        print("\n--- Stored Procedure Result ---")

        for row in rows:

            print(row.EmpID, row.Name, row.Salary)


    except Exception as ex:

        print("Stored procedure failed:", ex)

    finally:

        conn.close()



# -----------------------------------------

# MAIN TESTING

# -----------------------------------------

if __name__ == "__main__":

    insert_employee(1, "Vijay", 50000)

    update_employee(1, 60000)

    get_all_employees()

    delete_employee(1)

    get_all_employees()

    call_stored_procedure()

🚀 How to Use

  1. Update:

SERVER= DATABASE= UID= PWD=
  1. Create sample table in SQL:

CREATE TABLE Employees ( EmpID INT PRIMARY KEY, Name VARCHAR(100), Salary DECIMAL(18,2) );
  1. Run the Python file:

python mssql_python_demo.py

4 Dec 2025

Copilot in SSMS provides AI assistance similar to ChatGPT directly within SQL Server Management Studio (SSMS)

 Requirements

SQL Server Management Studio 21 or later is required, along with an Azure OpenAI endpoint and deployment from Azure AI Foundry. Azure authentication is recommended over API keys for security.​


Installation Steps

Launch the Visual Studio Installer, select your SSMS installation, and choose Modify. Under the AI Assistance workload, ensure Copilot in SSMS is selected, then click Modify to install.​


Configuration Steps

Open SSMS, click the Copilot button on the SQL Editor toolbar (or use View > Copilot or Ctrl+Alt+C). Enter your Azure OpenAI Endpoint, Deployment (e.g., gpt-4o), and API Key if needed, then select Launch Copilot. Authenticate via Microsoft Entra ID if prompted, and adjust settings under Tools > Options > Copilot.​


Usage

Connect to your database in a query editor, open the Copilot chat window, and ask natural language questions about your schema or T-SQL code. It supports SQL Server, Azure SQL, and related platforms but review outputs for accuracy. No direct ChatGPT integration exists without this Azure setup

SQL Server 2025 introduces vector data types and functions

 SQL Server 2025 introduces vector data types and functions, enabling native AI-driven similarity searches and machine learning tasks directly in SQL queries without external tools.​


Key AI Features

Vector data types store embeddings as optimized binary JSON arrays for efficient operations like distance calculations (e.g., VECTOR_DISTANCE, VECTOR_NORM). New vector indexes support approximate nearest neighbor searches, accelerating AI workloads. External AI model management allows invoking REST endpoints for embeddings via sp_invoke_external_rest_endpoint.​


Developer Enhancements

Regular expressions functions like REGEXP_LIKE and REGEXP_REPLACE enable pattern matching and text manipulation in queries. Fuzzy string matching with EDIT_DISTANCE_SIMILARITY computes similarity scores for approximate searches. Change event streaming captures real-time DML changes to Azure Event Hubs in JSON or Avro formats.​


Performance Improvements

Optimized locking reduces blocking and memory use, while tempdb governance prevents space exhaustion from runaway jobs. Cardinality estimation feedback for expressions adapts query plans across executions, and optional parameter plan optimization (OPPO) handles varying parameters dynamically. These align with 2025 trends like AI integration and query tuning for modern data stacks

20 Jun 2024

How to Make Link Server in SQL Server

 Creating a linked server in SQL Server allows you to connect to and query external data sources directly from your SQL Server instance. This can include other SQL Server instances, databases like Oracle, MySQL, or even non-relational data sources. Below is a guide on how to create a linked server in SQL Server.

Step-by-Step Guide

  1. Open SQL Server Management Studio (SSMS):

    • Launch SSMS and connect to your SQL Server instance.
  2. Open the Object Explorer:

    • In the Object Explorer, expand the node for the SQL Server instance where you want to create the linked server.
  3. Navigate to Linked Servers:

    • Expand the Server Objects node.
    • Right-click on Linked Servers and select New Linked Server....
  4. Configure the Linked Server:

    • In the New Linked Server dialog, you need to provide the following details:

    • General Page:

      • Linked server: Provide a name for the linked server.
      • Server type: Choose the appropriate server type.
        • If connecting to another SQL Server, select SQL Server.
        • For other data sources, select Other data source.
      • Provider: Select the appropriate OLE DB provider for the data source.
      • Product name: This can be a descriptive name of the data source (optional for SQL Server).
      • Data source: The name or network address of the server to which you are linking.
      • Provider string: Connection string information (if needed).
      • Location: For some providers, you may need to specify the location.
      • Catalog: The default database/catalog to use.
    • Security Page:

      • Configure the security settings to control how the local server will connect to the linked server.
      • You can choose to:
        • Not be made: Local logins cannot access the linked server.
        • Be made without using a security context: Connect using a self-mapping mechanism.
        • Be made using the login's current security context: Use the current login's security context.
        • Be made using this security context: Specify a remote login and password to use for the connection.
    • Server Options Page:

      • Configure additional server options if needed (e.g., collation compatibility, data access settings, etc.).
  5. Create the Linked Server:

    • Click OK to create the linked server. If everything is configured correctly, the linked server should appear under the Linked Servers node in Object Explorer.

Example Using T-SQL

Alternatively, you can create a linked server using T-SQL. Below is an example for linking to another SQL Server instance:

sql
EXEC sp_addlinkedserver @server=N'MyLinkedServer', @srvproduct=N'SQL Server'; EXEC sp_addlinkedsrvlogin @rmtsrvname = N'MyLinkedServer', @useself = N'False', @locallogin = NULL, @rmtuser = 'RemoteUser', @rmtpassword = 'RemotePassword';
  • Replace 'MyLinkedServer' with the name you want for your linked server.
  • Replace 'RemoteUser' and 'RemotePassword' with the appropriate credentials for the remote server.

For non-SQL Server data sources, the T-SQL might look like this:

sql

EXEC sp_addlinkedserver @server = N'MyOracleServer', @provider = N'OraOLEDB.Oracle', @datasrc = N'OracleServerName'; EXEC sp_addlinkedsrvlogin @rmtsrvname = N'MyOracleServer', @useself = N'False', @locallogin = NULL, @rmtuser = 'oracle_user', @rmtpassword = 'oracle_password';
  • Replace 'MyOracleServer' with the name for your linked server.
  • Replace 'OraOLEDB.Oracle' with the appropriate OLE DB provider for Oracle.
  • Replace 'OracleServerName' with the network name of your Oracle server.
  • Replace 'oracle_user' and 'oracle_password' with the appropriate Oracle credentials.

Testing the Linked Server

To test the linked server, you can run a simple

SELECT * FROM MyLinkedServer.DatabaseName.SchemaName.TableName;

Replace MyLinkedServer, DatabaseName, SchemaName, and TableName with the appropriate names for your setup.

Creating a linked server in SQL Server provides a powerful way to access and query data from diverse sources, enabling greater flexibility and integration in your data management workflows.

Best SQL Query to Find Duplicates – Step by Step Guide

 inding duplicate rows is one of the most common tasks in SQL, especially when working with real-world data like customer records, invoices,...