Natural Keys vs Surrogate Keys part 2

A while back, I asked if surrogate keys provide better performance than natural keys in SQL Server. @sqlvogel provided an answer to that question yesterday that caused me to revisit it.

This question is an attempt to "upgrade" the prior question, and hopefully provide the opportunity for thoughtful answers that help the community.

Consider a system for storing details about computers. Each computer has an architecture and an operating system. In SQL Server, we could create these tables using natural keys like this:

    CREATE TABLE dbo.Architecture
    (
        ArchitectureName varchar(10) NOT NULL
        , ArchitectureVersion decimal(5,2) NOT NULL
        , ReleaseDate date NOT NULL
        , CONSTRAINT PK_Architecture
            PRIMARY KEY CLUSTERED
            (ArchitectureName, ArchitectureVersion)
    );

    CREATE TABLE dbo.Manufacturer
    (
        ManufacturerName varchar(10) NOT NULL
            CONSTRAINT PK_Manufacturer
            PRIMARY KEY CLUSTERED
    );

    CREATE TABLE dbo.OS
    (
        OSName varchar(30) NOT NULL
        , ManufacturerName varchar(10) NOT NULL
            CONSTRAINT FK_OS_Manufacturer
            FOREIGN KEY
            (ManufacturerName)
            REFERENCES dbo.Manufacturer(ManufacturerName)
        , ArchitectureName varchar(10) NOT NULL
        , ArchitectureVersion decimal(5,2) NOT NULL
        , CONSTRAINT FK_OS_Architecture
            FOREIGN KEY
            (ArchitectureName, ArchitectureVersion)
            REFERENCES dbo.Architecture(ArchitectureName, ArchitectureVersion)
        , CONSTRAINT PK_OS
            PRIMARY KEY CLUSTERED
            (OSName)
    );

    CREATE TABLE dbo.Computers
    (
        ComputerID varchar(10) NOT NULL
            CONSTRAINT PK_Computers
            PRIMARY KEY CLUSTERED
        , OSName varchar(30) NOT NULL
            CONSTRAINT FK_Computers_OSName
            FOREIGN KEY
            REFERENCES dbo.OS(OSName)
        , ComputerManufacturerName varchar(10) NOT NULL
            CONSTRAINT FK_Computers_Manufacturer
            FOREIGN KEY
            REFERENCES dbo.Manufacturer(ManufacturerName)
        , EffectiveDate datetime NOT NULL
            CONSTRAINT DF_Computers_EffectiveDate
            DEFAULT (GETDATE())
        , ExpiryDate datetime NULL
    );
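For anyone wanting to reproduce the comparison, the natural-key tables could be populated roughly as follows. This is only a sketch inferred from the query output shown in this post: the `ReleaseDate` value is a placeholder that does not come from the original data, and the real test tables may have held other rows.

```sql
-- Sketch only: rows inferred from the query output in this post.
-- ReleaseDate is a PLACEHOLDER (assumption); the column is NOT NULL,
-- so some value is required, but the real value is not known.
INSERT INTO dbo.Architecture (ArchitectureName, ArchitectureVersion, ReleaseDate)
VALUES ('x64', 1.00, '19000101');

INSERT INTO dbo.Manufacturer (ManufacturerName)
VALUES ('HP'), ('Microsoft');

INSERT INTO dbo.OS (OSName, ManufacturerName, ArchitectureName, ArchitectureVersion)
VALUES ('Windows 10', 'Microsoft', 'x64', 1.00);

-- EffectiveDate falls back to its GETDATE() default; ExpiryDate stays NULL.
INSERT INTO dbo.Computers (ComputerID, OSName, ComputerManufacturerName)
VALUES ('CM700-01', 'Windows 10', 'HP')
     , ('CM700-02', 'Windows 10', 'HP');
```

The parent rows are inserted before the child rows so that each foreign key is satisfied at the time of the insert.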

With 2 rows in dbo.Computers, we could query the table and show various details like this:

    SELECT Computers.ComputerID
        , Computers.ComputerManufacturerName
        , OSManufacturer = OS.ManufacturerName
        , Computers.OSName
        , OS.ArchitectureName
        , OS.ArchitectureVersion
    FROM dbo.Computers
        INNER JOIN dbo.OS ON Computers.OSName = OS.OSName
    WHERE Computers.EffectiveDate <= GETDATE()
        AND (Computers.ExpiryDate >= GETDATE() OR Computers.ExpiryDate IS NULL)
    ORDER BY Computers.ComputerID;

The query output is:

    ╔════════════╦══════════════════════════╦════════════════╦════════════╦══════════════════╦═════════════════════╗
    ║ ComputerID ║ ComputerManufacturerName ║ OSManufacturer ║ OSName     ║ ArchitectureName ║ ArchitectureVersion ║
    ╠════════════╬══════════════════════════╬════════════════╬════════════╬══════════════════╬═════════════════════╣
    ║ CM700-01   ║ HP                       ║ Microsoft      ║ Windows 10 ║ x64              ║ 1.00                ║
    ║ CM700-02   ║ HP                       ║ Microsoft      ║ Windows 10 ║ x64              ║ 1.00                ║
    ╚════════════╩══════════════════════════╩════════════════╩════════════╩══════════════════╩═════════════════════╝

The query plan for this is quite simple:

[Image: execution plan for the natural-key query]

Alternatively, we could use surrogate keys, like this:

    CREATE TABLE dbo.Architecture
    (
        ArchitectureID int NOT NULL IDENTITY(1,1)
            CONSTRAINT PK_Architecture
            PRIMARY KEY CLUSTERED
        , ArchitectureName varchar(10) NOT NULL
        , ArchitectureVersion decimal(5,2) NOT NULL
        , ReleaseDate date NOT NULL
        , CONSTRAINT UQ_Architecture_Name
            UNIQUE
            (ArchitectureName, ArchitectureVersion)
    );

    CREATE TABLE dbo.Manufacturer
    (
        ManufacturerID int NOT NULL IDENTITY(1,1)
            CONSTRAINT PK_Manufacturer
            PRIMARY KEY CLUSTERED
        , ManufacturerName varchar(10) NOT NULL
    );

    CREATE TABLE dbo.OS
    (
        OS_ID int NOT NULL IDENTITY(1,1)
            CONSTRAINT PK_OS
            PRIMARY KEY CLUSTERED
        , OSName varchar(30) NOT NULL
            CONSTRAINT UQ_OS_Name
            UNIQUE
        , ManufacturerID int NOT NULL
            CONSTRAINT FK_OS_Manufacturer
            FOREIGN KEY
            REFERENCES dbo.Manufacturer(ManufacturerID)
        , ArchitectureID int NOT NULL
            CONSTRAINT FK_OS_Architecture
            FOREIGN KEY
            REFERENCES dbo.Architecture(ArchitectureID)
    );

    CREATE TABLE dbo.Computers
    (
        ComputerID int NOT NULL IDENTITY(1,1)
            CONSTRAINT PK_Computers
            PRIMARY KEY CLUSTERED
        , ComputerName varchar(10) NOT NULL
            CONSTRAINT UQ_Computers_Name
            UNIQUE
        , OS_ID int NOT NULL
            CONSTRAINT FK_Computers_OS
            FOREIGN KEY
            REFERENCES dbo.OS(OS_ID)
        , ComputerManufacturerID int NOT NULL
            CONSTRAINT FK_Computers_Manufacturer
            FOREIGN KEY
            REFERENCES dbo.Manufacturer(ManufacturerID)
        , EffectiveDate datetime NOT NULL
            CONSTRAINT DF_Computers_EffectiveDate
            DEFAULT (GETDATE())
        , ExpiryDate datetime NULL
    );

In the design above, you may notice we have to include several new unique constraints to ensure our data model is consistent across both approaches.
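One gap worth noting: as written, the surrogate-key dbo.Manufacturer table carries no unique constraint on ManufacturerName, so it would accept duplicate names that the natural-key design rejects. A minimal sketch of closing that gap follows; the constraint name `UQ_Manufacturer_Name` is hypothetical and simply follows the naming pattern used above.

```sql
-- Sketch: enforce the same uniqueness rule the natural-key design gets
-- from its primary key. UQ_Manufacturer_Name is an assumed name,
-- not part of the original schema.
ALTER TABLE dbo.Manufacturer
    ADD CONSTRAINT UQ_Manufacturer_Name
    UNIQUE (ManufacturerName);
```

A unique constraint is backed by an index, so adding it means any later change to ManufacturerName also has to maintain that index. That would likely affect the cost of updates against this table.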

Querying this surrogate-key approach with 2 rows in dbo.Computers looks like:

    SELECT Computers.ComputerName
        , ComputerManufacturerName = cm.ManufacturerName
        , OSManufacturer = om.ManufacturerName
        , OS.OSName
        , Architecture.ArchitectureName
        , Architecture.ArchitectureVersion
    FROM dbo.Computers
        INNER JOIN dbo.OS ON Computers.OS_ID = OS.OS_ID
        INNER JOIN dbo.Manufacturer cm ON Computers.ComputerManufacturerID = cm.ManufacturerID
        INNER JOIN dbo.Architecture ON OS.ArchitectureID = Architecture.ArchitectureID
        INNER JOIN dbo.Manufacturer om ON OS.ManufacturerID = om.ManufacturerID
    WHERE Computers.EffectiveDate <= GETDATE()
        AND (Computers.ExpiryDate >= GETDATE() OR Computers.ExpiryDate IS NULL)
    ORDER BY Computers.ComputerID;

The results:

    ╔══════════════╦══════════════════════════╦════════════════╦════════════╦══════════════════╦═════════════════════╗
    ║ ComputerName ║ ComputerManufacturerName ║ OSManufacturer ║ OSName     ║ ArchitectureName ║ ArchitectureVersion ║
    ╠══════════════╬══════════════════════════╬════════════════╬════════════╬══════════════════╬═════════════════════╣
    ║ CM700-01     ║ HP                       ║ Microsoft      ║ Windows 10 ║ x64              ║ 1.00                ║
    ║ CM700-02     ║ HP                       ║ Microsoft      ║ Windows 10 ║ x64              ║ 1.00                ║
    ╚══════════════╩══════════════════════════╩════════════════╩════════════╩══════════════════╩═════════════════════╝

[Image: execution plan for the surrogate-key query]

The I/O statistics are even more telling. For the natural keys, we have:

    Table 'OS'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Computers'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

For the surrogate key setup, we get:

    Table 'Manufacturer'. Scan count 0, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Architecture'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'OS'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Computers'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
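Figures like these can be captured by wrapping a query in SET STATISTICS IO, as sketched below for the natural-key query. The exact numbers will vary with data volume and index structure, so treat the counts above as specific to this tiny test set.

```sql
-- Report per-table I/O for each statement run in this session.
SET STATISTICS IO ON;

SELECT Computers.ComputerID
    , Computers.ComputerManufacturerName
    , OSManufacturer = OS.ManufacturerName
    , Computers.OSName
    , OS.ArchitectureName
    , OS.ArchitectureVersion
FROM dbo.Computers
    INNER JOIN dbo.OS ON Computers.OSName = OS.OSName
WHERE Computers.EffectiveDate <= GETDATE()
    AND (Computers.ExpiryDate >= GETDATE() OR Computers.ExpiryDate IS NULL)
ORDER BY Computers.ComputerID;

-- Switch reporting back off so later statements are not affected.
SET STATISTICS IO OFF;
```

In SQL Server Management Studio the statistics appear on the Messages tab rather than in the result grid.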

Quite clearly, in this admittedly very simple setup, the surrogate-key design lags in both ease of use and performance: the query needs four joins instead of one, and 18 logical reads instead of 6.

Having said that, what happens if we need to change the name of one of the manufacturers? Here's the T-SQL for the natural key version:

    UPDATE dbo.Manufacturer
    SET ManufacturerName = 'Microsoft™'
    WHERE ManufacturerName = 'Microsoft';

And the plan:

[Image: execution plan for the natural-key UPDATE]

The T-SQL for the surrogate key version:

    UPDATE dbo.Manufacturer
    SET ManufacturerName = 'Microsoft™'
    WHERE ManufacturerID = 1;

And that plan:

[Image: execution plan for the surrogate-key UPDATE]

The natural key version has an estimated subtree cost nearly three times that of the surrogate key version.
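One caveat about the natural-key rename: the foreign keys in the natural-key schema are declared without an ON UPDATE clause, so they use SQL Server's default NO ACTION behaviour. The UPDATE is therefore rejected with a foreign key violation while any dbo.OS or dbo.Computers row still references the old name. A minimal sketch, assuming it is acceptable for the rename to propagate automatically, is to recreate the two foreign keys that reference dbo.Manufacturer with ON UPDATE CASCADE:

```sql
-- Sketch for the natural-key schema only. Recreate both foreign keys
-- that point at dbo.Manufacturer so that a rename cascades to child rows.
ALTER TABLE dbo.OS DROP CONSTRAINT FK_OS_Manufacturer;
ALTER TABLE dbo.OS ADD CONSTRAINT FK_OS_Manufacturer
    FOREIGN KEY (ManufacturerName)
    REFERENCES dbo.Manufacturer (ManufacturerName)
    ON UPDATE CASCADE;

ALTER TABLE dbo.Computers DROP CONSTRAINT FK_Computers_Manufacturer;
ALTER TABLE dbo.Computers ADD CONSTRAINT FK_Computers_Manufacturer
    FOREIGN KEY (ComputerManufacturerName)
    REFERENCES dbo.Manufacturer (ManufacturerName)
    ON UPDATE CASCADE;

-- The rename now also rewrites every referencing row in dbo.OS
-- and dbo.Computers as part of the same statement.
UPDATE dbo.Manufacturer
SET ManufacturerName = 'Microsoft™'
WHERE ManufacturerName = 'Microsoft';
```

With cascading in place, the cost of a rename grows with the number of referencing rows, whereas the surrogate-key rename touches a single row in dbo.Manufacturer regardless of how many rows reference it.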

Am I correct in saying that both natural keys and surrogate keys offer benefits, and that the choice between them should be considered carefully?

Are there common situations where the comparisons I made above don't work? What other considerations should be made when choosing natural or surrogate keys?

