I'll start with an example. If I have a person
table with intended surrogate primary key Id
:
+----+------+------------+-------------+| Id | name | DoB | SSN |+----+------+------------+-------------+| 1 | John | 1901-01-01 | 111-11-1111 || 2 | Jane | 1902-02-02 | 222-22-2222 || 3 | John | 1901-01-01 | 111-11-1111 |+----+------+------------+-------------+
Note Id
s 1 & 3 have the same attributes; They both represent the same person.
Now from what we know about the theory behind what constitutes a primary key, which I think is well summarized here:
- The primary key must uniquely identify each record.
- A record’s primary-key value can’t be null.
- The primary key-value must exist when the record is created.
- The primary key must remain stable—you can’t change the primary-key field(s).
- The primary key must be compact and contain the fewest possible attributes.
Consider the first bullet, "The primary key must uniquely identify each record." In my example, I suppose whether or not each Id
does represent uniqueness depends on what's really supposed to be considered unique. A different database record? Yes. A different person (what the records are supposed to represent)? No.
So multiple Id
s represent what is functionally the same subject generating the data, present in 2 records. A "two to one Id" of sorts. I've not read anything that directly address the scenario my example illustrates, as is relates to what is or is not a PK.
- Does this example violate the theory behind what constitutes a primary key?
- If not, does this example illustrate a violation of any larger principles of database architecture, or can this concept be reduced to something as simple as "duplication of data - clean it up"?
Many thanks.