Understanding SQL Normalization and Joins
Ensuring data integrity and optimizing query performance are crucial when working with relational databases. Two fundamental concepts that help achieve these goals are SQL normalization and joins. In this blog post, we will explore these concepts in detail and how to implement them with practical examples.
SQL Normalization
Normalization is organizing the fields and tables of a relational database to minimize redundancy and dependency. The primary goal is to ensure data integrity and improve query performance. Let’s break down the normalization process into its normal form.
First Normal Form (1NF)
The first step in normalization is to ensure the table adheres to the first normal form (1NF). A table is in 1NF if:
- It contains only atomic (indivisible) values.
- Each column contains values of a single type.
- Each column has a unique name.
- The order in which data is stored does not matter.
Consider the following unnormalized table:
This table contains repeating groups and is not in 1NF. To convert it to 1NF, we eliminate the repeating groups:
Second Normal Form (2NF)
A table is in the second normal form (2NF) if:
- It is in 1NF.
- It has no partial dependencies (i.e., no non-prime attribute depends on any proper subset of any candidate key).
To achieve 2NF, we divide our table into two separate tables: Customers and Orders.
Customers Table:
Orders Table:
Third Normal Form (3NF)
A table is in the third normal form (3NF) if:
- It is in 2NF.
- It has no transitive dependencies (i.e., non-prime attributes should not depend on other non-prime attributes).
In our example, the data is already in 3NF as there are no transitive dependencies.
SQL Joins
Joins in SQL are used to combine rows from two or more tables based on a related column between them. Understanding different types of joins is essential for efficient data retrieval.
Types of Joins
- INNER JOIN: Returns only the rows that have matching values in both tables.
- LEFT (OUTER) JOIN: Returns all rows from the left table and the matched rows from the right table. If no match is found, NULL values are returned for columns from the right table.
- RIGHT (OUTER) JOIN: Returns all rows from the right table and the matched rows from the left table. If no match is found, NULL values are returned for columns from the left table.
- FULL (OUTER) JOIN: Returns all rows when there is a match in either the left or right table. If no match is found, NULL values are returned for columns from the other table.
Example of Joins
Here are the CREATE TABLE and INSERT queries used for setting up our example:
-- Customers Table
CREATE TABLE Customers (
CustomerID INT PRIMARY KEY,
CustomerName VARCHAR(100),
CustomerAddress VARCHAR(255)
);
-- Products Table
CREATE TABLE Products (
ProductID INT PRIMARY KEY,
ProductName VARCHAR(100)
);
-- SalesPersons Table
CREATE TABLE SalesPersons (
SalesPersonID INT PRIMARY KEY,
SalesPersonName VARCHAR(100),
SalesPersonPhone VARCHAR(15)
);
-- Orders Table
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
CustomerID INT,
ProductID INT,
Quantity INT,
Price DECIMAL(10, 2),
SalesPersonID INT,
FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID),
FOREIGN KEY (ProductID) REFERENCES Products(ProductID),
FOREIGN KEY (SalesPersonID) REFERENCES SalesPersons(SalesPersonID)
);
-- Insert data into Customers Table
INSERT INTO Customers (CustomerID, CustomerName, CustomerAddress) VALUES
(1, 'John Doe', '123 Elm St, City, State'),
(2, 'Jane Smith', '456 Oak St, City, State');
-- Insert data into Products Table
INSERT INTO Products (ProductID, ProductName) VALUES
(1, 'Laptop'),
(2, 'Phone'),
(3, 'Mouse');
-- Insert data into SalesPersons Table
INSERT INTO SalesPersons (SalesPersonID, SalesPersonName, SalesPersonPhone) VALUES
(1, 'Alice Johnson', '555-1234'),
(2, 'Bob Brown', '555-5678');
-- Insert data into Orders Table
INSERT INTO Orders (OrderID, CustomerID, ProductID, Quantity, Price, SalesPersonID) VALUES
(1, 1, 1, 1, 1000.00, 1),
(2, 2, 2, 2, 600.00, 2),
(3, 1, 3, 1, 25.00, 1),
(4, 3, 4, 1, 45.00, 3); -- Order references missing CustomerID, ProductID, SalesPersonID
Query with INNER JOIN:
SELECT
Customers.CustomerName,
Products.ProductName,
Orders.Quantity,
Orders.Price,
SalesPersons.SalesPersonName
FROM
Orders
INNER JOIN
Customers ON Orders.CustomerID = Customers.CustomerID
INNER JOIN
Products ON Orders.ProductID = Products.ProductID
INNER JOIN
SalesPersons ON Orders.SalesPersonID = SalesPersons.SalesPersonID;
Result:
Query with LEFT JOIN:
-- Insert data into Orders Table(Drop Existing data and insert this data)
INSERT INTO Orders (OrderID, CustomerID, ProductID, Quantity, Price, SalesPersonID) VALUES
(1, 1, 1, 1, 1000.00, 1),
(2, 2, 2, 2, 600.00, 2),
(3, 1, 3, 1, 25.00, 1),
(4, 1, 4, 1, 45.00, 3);
SELECT
Customers.CustomerName,
Products.ProductName,
Orders.Quantity,
Orders.Price,
SalesPersons.SalesPersonName
FROM
Customers
LEFT JOIN
Orders ON Customers.CustomerID = Orders.CustomerID
LEFT JOIN
Products ON Orders.ProductID = Products.ProductID
LEFT JOIN
SalesPersons ON Orders.SalesPersonID = SalesPersons.SalesPersonID;
Result:
Query with RIGHT JOIN:
SELECT
Customers.CustomerName,
Products.ProductName,
Orders.Quantity,
Orders.Price,
SalesPersons.SalesPersonName
FROM
Customers
RIGHT JOIN
Orders ON Customers.CustomerID = Orders.CustomerID
RIGHT JOIN
Products ON Orders.ProductID = Products.ProductID
RIGHT JOIN
SalesPersons ON Orders.SalesPersonID = SalesPersons.SalesPersonID;
Result:
Query with FULL OUTER JOIN:
SELECT
Customers.CustomerName,
Products.ProductName,
Orders.Quantity,
Orders.Price,
SalesPersons.SalesPersonName
FROM
Customers
FULL OUTER JOIN
Orders ON Customers.CustomerID = Orders.CustomerID
FULL OUTER JOIN
Products ON Orders.ProductID = Products.ProductID
FULL OUTER JOIN
SalesPersons ON Orders.SalesPersonID = SalesPersons.SalesPersonID;
Result:
Here I have used an online SQL server to run the queries so few results are not giving null values properly. Some online SQL servers may not support RIGHT OUTER JOIN or FULL OUTER JOIN, or they might have specific configurations that affect their behaviour. Here are a few reasons why the expected NULL values might not be returned:
- Database Engine Differences: Not all SQL engines handle joins the same way. For example, MySQL does not natively support FULL OUTER JOIN. If you are using an engine that doesn’t fully support these types of joins, you may not get the expected results.
- Incomplete Data: If your data does not have the conditions that produce NULLs (i.e., unmatched rows), you won’t see NULLs in the result set. Make sure the data setup includes cases where there are no matching records.
- Query Execution Plan: Some SQL execution plans might optimize queries in ways that affect how joins are processed, especially if indexes and optimizations are in play.
- SQL Syntax or Compatibility: The SQL syntax might differ slightly between different SQL servers. Ensure that the syntax you use is compatible with the SQL engine you are working on
Conclusion
Normalization helps in organizing the database to reduce redundancy and improve data integrity, while joins allow combining data from multiple tables based on related columns, providing various ways to retrieve related data efficiently. Understanding these concepts is essential for anyone working with relational databases.