Channel: Active questions tagged surrogate-key - Database Administrators Stack Exchange

Do natural keys provide higher or lower performance in SQL Server than surrogate integer keys?


I'm a fan of surrogate keys, so there is a risk that my findings are affected by confirmation bias.

Many questions I've seen both here and at http://stackoverflow.com use natural keys instead of surrogate keys based on IDENTITY() values.

My background in computer systems tells me performing any comparative operation on an integer will be faster than comparing strings.

This comment made me question my beliefs, so I thought I would create a system to investigate my thesis that integers are faster than strings for use as keys in SQL Server.

Since there is likely to be very little discernible difference in small datasets, I immediately thought of a two-table setup where the primary table has 1,000,000 rows and the secondary table has 10 rows for each row in the primary table, for a total of 10,000,000 rows in the secondary table. The premise of my test is to create two sets of tables like this, one using natural keys and one using integer keys, and run timing tests on a simple query like:

    SELECT *
    FROM Table1
        INNER JOIN Table2 ON Table1.Key = Table2.Key;
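
The join compares key values row by row, so the width of each key is one plausible factor in any difference. A minimal sketch of the per-value sizes, assuming a made-up 25-character literal in place of the random keys generated by the test bed:

```sql
/* Illustrative only: the string literal is a hypothetical 25-character key,
   not a value taken from the test tables. */
SELECT DATALENGTH(CAST(1 AS INT)) AS IntKeyBytes
    , DATALENGTH(N'ABCDEFGHIJKLMNOPQRSTUVWXY') AS NaturalKeyBytes;
```

DATALENGTH reports 4 bytes for the INT and 50 bytes for the 25-character NVARCHAR value, since NVARCHAR stores these characters at two bytes each.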

The following is the code I created as a test bed:

    USE Master;
    IF (SELECT COUNT(database_id) FROM sys.databases d WHERE d.name = 'NaturalKeyTest') = 1
    BEGIN
        ALTER DATABASE NaturalKeyTest SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
        DROP DATABASE NaturalKeyTest;
    END
    GO
    CREATE DATABASE NaturalKeyTest
        ON (NAME = 'NaturalKeyTest', FILENAME = 'C:\SQLServer\Data\NaturalKeyTest.mdf', SIZE=8GB, FILEGROWTH=1GB)
        LOG ON (NAME='NaturalKeyTestLog', FILENAME = 'C:\SQLServer\Logs\NaturalKeyTest.mdf', SIZE=256MB, FILEGROWTH=128MB);
    GO
    ALTER DATABASE NaturalKeyTest SET RECOVERY SIMPLE;
    GO
    USE NaturalKeyTest;
    GO
    CREATE VIEW GetRand
    AS
        SELECT RAND() AS RandomNumber;
    GO
    CREATE FUNCTION RandomString
    (
        @StringLength INT
    )
    RETURNS NVARCHAR(max)
    AS
    BEGIN
        DECLARE @cnt INT = 0
        DECLARE @str NVARCHAR(MAX) = '';
        DECLARE @RandomNum FLOAT = 0;
        WHILE @cnt < @StringLength
        BEGIN
            SELECT @RandomNum = RandomNumber
            FROM GetRand;
            SET @str = @str + CAST(CHAR((@RandomNum * 64.) + 32) AS NVARCHAR(MAX));
            SET @cnt = @cnt + 1;
        END
        RETURN @str;
    END;
    GO
    CREATE TABLE NaturalTable1
    (
        NaturalTable1Key NVARCHAR(255) NOT NULL
            CONSTRAINT PK_NaturalTable1 PRIMARY KEY CLUSTERED
        , Table1TestData NVARCHAR(255) NOT NULL
    );
    CREATE TABLE NaturalTable2
    (
        NaturalTable2Key NVARCHAR(255) NOT NULL
            CONSTRAINT PK_NaturalTable2 PRIMARY KEY CLUSTERED
        , NaturalTable1Key NVARCHAR(255) NOT NULL
            CONSTRAINT FK_NaturalTable2_NaturalTable1Key
            FOREIGN KEY REFERENCES dbo.NaturalTable1 (NaturalTable1Key)
            ON DELETE CASCADE ON UPDATE CASCADE
        , Table2TestData NVARCHAR(255) NOT NULL
    );
    GO

    /* insert 1,000,000 rows into NaturalTable1 */
    INSERT INTO NaturalTable1 (NaturalTable1Key, Table1TestData)
        VALUES (dbo.RandomString(25), dbo.RandomString(100));
    GO 1000000

    /* insert 10,000,000 rows into NaturalTable2 */
    INSERT INTO NaturalTable2 (NaturalTable2Key, NaturalTable1Key, Table2TestData)
    SELECT dbo.RandomString(25), T1.NaturalTable1Key, dbo.RandomString(100)
    FROM NaturalTable1 T1
    GO 10

    CREATE TABLE IDTable1
    (
        IDTable1Key INT NOT NULL CONSTRAINT PK_IDTable1
            PRIMARY KEY CLUSTERED IDENTITY(1,1)
        , Table1TestData NVARCHAR(255) NOT NULL
            CONSTRAINT DF_IDTable1_TestData DEFAULT dbo.RandomString(100)
    );
    CREATE TABLE IDTable2
    (
        IDTable2Key INT NOT NULL CONSTRAINT PK_IDTable2
            PRIMARY KEY CLUSTERED IDENTITY(1,1)
        , IDTable1Key INT NOT NULL
            CONSTRAINT FK_IDTable2_IDTable1Key FOREIGN KEY
            REFERENCES dbo.IDTable1 (IDTable1Key)
            ON DELETE CASCADE ON UPDATE CASCADE
        , Table2TestData NVARCHAR(255) NOT NULL
            CONSTRAINT DF_IDTable2_TestData DEFAULT dbo.RandomString(100)
    );
    GO
    INSERT INTO IDTable1 DEFAULT VALUES;
    GO 1000000
    INSERT INTO IDTable2 (IDTable1Key)
    SELECT T1.IDTable1Key
    FROM IDTable1 T1
    GO 10

The code above creates a database, a helper view and function for generating random strings, and four tables, then fills the tables with data, ready to test. The test code I ran is:

    USE NaturalKeyTest;
    GO
    DECLARE @loops INT = 0;
    DECLARE @MaxLoops INT = 10;
    DECLARE @Results TABLE (
        FinishedAt DATETIME DEFAULT (GETDATE())
        , KeyType NVARCHAR(255)
        , ElapsedTime FLOAT
    );
    WHILE @loops < @MaxLoops
    BEGIN
        DBCC FREEPROCCACHE;
        DBCC FREESESSIONCACHE;
        DBCC FREESYSTEMCACHE ('ALL');
        DBCC DROPCLEANBUFFERS;
        WAITFOR DELAY '00:00:05';

        DECLARE @start DATETIME = GETDATE();
        DECLARE @end DATETIME;
        DECLARE @count INT;

        SELECT @count = COUNT(*)
        FROM dbo.NaturalTable1 T1
            INNER JOIN dbo.NaturalTable2 T2 ON T1.NaturalTable1Key = T2.NaturalTable1Key;

        SET @end = GETDATE();

        INSERT INTO @Results (KeyType, ElapsedTime)
        SELECT 'Natural PK' AS KeyType, CAST((@end - @start) AS FLOAT) AS ElapsedTime;

        DBCC FREEPROCCACHE;
        DBCC FREESESSIONCACHE;
        DBCC FREESYSTEMCACHE ('ALL');
        DBCC DROPCLEANBUFFERS;
        WAITFOR DELAY '00:00:05';

        SET @start = GETDATE();

        SELECT @count = COUNT(*)
        FROM dbo.IDTable1 T1
            INNER JOIN dbo.IDTable2 T2 ON T1.IDTable1Key = T2.IDTable1Key;

        SET @end = GETDATE();

        INSERT INTO @Results (KeyType, ElapsedTime)
        SELECT 'IDENTITY() PK' AS KeyType, CAST((@end - @start) AS FLOAT) AS ElapsedTime;

        SET @loops = @loops + 1;
    END

    SELECT KeyType, FORMAT(CAST(AVG(ElapsedTime) AS DATETIME), 'HH:mm:ss.fff') AS AvgTime
    FROM @Results
    GROUP BY KeyType;

These are the results:

[image: timing results]
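
Elapsed wall-clock time alone does not show where the time goes. As a hedged sketch, SET STATISTICS IO and SET STATISTICS TIME can be switched on around the same two joins so that logical reads and CPU time are reported per query in the messages output. This is an additional measurement, not part of the test that produced the results shown here:

```sql
USE NaturalKeyTest;
GO
/* Report page reads and CPU/elapsed time for each statement that follows. */
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT COUNT(*)
FROM dbo.NaturalTable1 T1
    INNER JOIN dbo.NaturalTable2 T2 ON T1.NaturalTable1Key = T2.NaturalTable1Key;

SELECT COUNT(*)
FROM dbo.IDTable1 T1
    INNER JOIN dbo.IDTable2 T2 ON T1.IDTable1Key = T2.IDTable1Key;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing logical reads between the two queries gives a rough idea of how much of the gap comes from reading more pages and how much from the comparisons themselves.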

Am I doing something wrong here, or are INT keys 3 times faster than 25-character natural keys?
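
One thing that may help interpret the gap is how many 8 KB pages each clustered index occupies, since wider keys mean fewer rows per page and more pages to read. A sketch using sys.dm_db_partition_stats, assuming the four tables from the test bed exist in NaturalKeyTest and the login can view database state:

```sql
USE NaturalKeyTest;
GO
/* Rows and used pages per clustered index for the four test tables. */
SELECT OBJECT_NAME(ps.object_id) AS TableName
    , SUM(ps.row_count) AS TotalRows
    , SUM(ps.used_page_count) AS UsedPages
    , SUM(ps.used_page_count) * 8 / 1024 AS UsedMB /* pages are 8 KB each */
FROM sys.dm_db_partition_stats ps
WHERE ps.index_id = 1 /* clustered index only */
    AND OBJECT_NAME(ps.object_id) IN (N'NaturalTable1', N'NaturalTable2', N'IDTable1', N'IDTable2')
GROUP BY ps.object_id
ORDER BY TableName;
```

If the natural-key tables use substantially more pages than the integer-key tables, part of the timing difference may be I/O volume and not comparison cost alone.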

Note, I've written a follow-up question here.
