Surrogate vs composite key in hierarchical data structure

I'm currently evaluating a schema for a hierarchical data structure. My main problem is how to design the schema so that it prevents inconsistent data, i.e. a foreign key that references a row in another hierarchy. The two variants I have come up with use either a composite key or a surrogate key.

Requirements

  • Strict consistency checking per hierarchy, e.g. a Part in Hierarchy A cannot have a foreign key on a Type in Hierarchy B (see example schemas below).
  • The constraints have to be checked in the database and not with application code, as the database is used from different applications.
  • Scalable for performance and extendable with more hierarchy levels.

Variant 1: Composite Key

In this example, the foreign keys are part of the composite primary key. As a result, the database automatically checks that the Type of a Part is assigned to the same Item as the Part itself (same hierarchy), because the ItemId is part of each composite primary key.

This approach seems troublesome to me, as it is not very easy to extend the model. For example, if I decide to create a PartMetadata table, which holds metadata for a single Part, I have to include the whole composite key, even though the metadata has no connection to the hierarchy.

Item
  • ItemId (PK)
  • Name

Type
  • TypeId (PK)
  • ItemId (PK, FK)
  • Name

SubItem
  • SubItemId (PK)
  • ItemId (PK, FK)
  • Name

Part
  • PartId (PK)
  • ItemId (PK, FK)
  • SubItemId (PK, FK)
  • TypeId (PK, FK)
  • Name
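To make Variant 1 concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column types, the exact foreign key definitions (Part sharing one ItemId column across both composite foreign keys), the sample rows, and the helper try_insert_part are my assumptions for illustration, not a finished design. Other database engines would need their own DDL dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled

conn.executescript("""
CREATE TABLE Item (
    ItemId INTEGER PRIMARY KEY,
    Name   TEXT NOT NULL
);
CREATE TABLE Type (
    TypeId INTEGER NOT NULL,
    ItemId INTEGER NOT NULL REFERENCES Item (ItemId),
    Name   TEXT NOT NULL,
    PRIMARY KEY (TypeId, ItemId)
);
CREATE TABLE SubItem (
    SubItemId INTEGER NOT NULL,
    ItemId    INTEGER NOT NULL REFERENCES Item (ItemId),
    Name      TEXT NOT NULL,
    PRIMARY KEY (SubItemId, ItemId)
);
-- Assumption: both composite foreign keys share the single ItemId column,
-- which is what forces SubItem and Type into the same hierarchy.
CREATE TABLE Part (
    PartId    INTEGER NOT NULL,
    ItemId    INTEGER NOT NULL,
    SubItemId INTEGER NOT NULL,
    TypeId    INTEGER NOT NULL,
    Name      TEXT NOT NULL,
    PRIMARY KEY (PartId, ItemId, SubItemId, TypeId),
    FOREIGN KEY (SubItemId, ItemId) REFERENCES SubItem (SubItemId, ItemId),
    FOREIGN KEY (TypeId, ItemId)    REFERENCES Type (TypeId, ItemId)
);
-- Illustrative sample data: Item 1 is "Hierarchy A", Item 2 is "Hierarchy B".
INSERT INTO Item    VALUES (1, 'Hierarchy A'), (2, 'Hierarchy B');
INSERT INTO Type    VALUES (10, 1, 'Type in A'), (20, 2, 'Type in B');
INSERT INTO SubItem VALUES (100, 1, 'SubItem in A');
""")


def try_insert_part(part_id, item_id, sub_item_id, type_id):
    """Hypothetical helper: True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Part VALUES (?, ?, ?, ?, ?)",
                (part_id, item_id, sub_item_id, type_id, "demo part"),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# SubItem 100 and Type 10 both belong to Item 1.
same_hierarchy = try_insert_part(1, 1, 100, 10)
# Type 20 belongs to Item 2, so (TypeId=20, ItemId=1) matches no row in Type.
cross_hierarchy = try_insert_part(2, 1, 100, 20)
```

In this sketch the cross-hierarchy row is rejected purely by the declared foreign keys, without any trigger. The cost is visible in Part: its primary key already spans four columns, and any table referencing a Part would have to carry all of them.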

Variant 2: Surrogate Key

In this case I would have to define additional constraints or triggers to ensure that it is not possible to insert a Part with a Type that belongs to a different Item.

Item
  • ItemId (PK)
  • Name

Type
  • TypeId (PK)
  • ItemId (FK)
  • Name

SubItem
  • SubItemId (PK)
  • ItemId (FK)
  • Name

Part
  • PartId (PK)
  • SubItemId (FK)
  • TypeId (FK)
  • Name
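For comparison, a rough sketch of Variant 2 with a trigger, again using Python's built-in sqlite3 module. The trigger name Part_same_item_insert, the column types, the sample rows, and the helper try_insert_surrogate_part are assumptions made up for this example. It only guards inserts into Part, so a real schema would also need to cover updates.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled

db.executescript("""
CREATE TABLE Item (
    ItemId INTEGER PRIMARY KEY,
    Name   TEXT NOT NULL
);
CREATE TABLE Type (
    TypeId INTEGER PRIMARY KEY,
    ItemId INTEGER NOT NULL REFERENCES Item (ItemId),
    Name   TEXT NOT NULL
);
CREATE TABLE SubItem (
    SubItemId INTEGER PRIMARY KEY,
    ItemId    INTEGER NOT NULL REFERENCES Item (ItemId),
    Name      TEXT NOT NULL
);
CREATE TABLE Part (
    PartId    INTEGER PRIMARY KEY,
    SubItemId INTEGER NOT NULL REFERENCES SubItem (SubItemId),
    TypeId    INTEGER NOT NULL REFERENCES Type (TypeId),
    Name      TEXT NOT NULL
);

-- Hypothetical trigger: reject a Part whose SubItem and Type
-- point at different Items (different hierarchies).
CREATE TRIGGER Part_same_item_insert
BEFORE INSERT ON Part
WHEN (SELECT ItemId FROM SubItem WHERE SubItemId = NEW.SubItemId)
  <> (SELECT ItemId FROM Type WHERE TypeId = NEW.TypeId)
BEGIN
    SELECT RAISE(ABORT, 'SubItem and Type belong to different Items');
END;

-- Illustrative sample data: Item 1 is "Hierarchy A", Item 2 is "Hierarchy B".
INSERT INTO Item    VALUES (1, 'Hierarchy A'), (2, 'Hierarchy B');
INSERT INTO Type    VALUES (10, 1, 'Type in A'), (20, 2, 'Type in B');
INSERT INTO SubItem VALUES (100, 1, 'SubItem in A');
""")


def try_insert_surrogate_part(part_id, sub_item_id, type_id):
    """Hypothetical helper: True if the row is accepted, False if the database rejects it."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO Part VALUES (?, ?, ?, ?)",
                (part_id, sub_item_id, type_id, "demo part"),
            )
        return True
    except sqlite3.DatabaseError:
        return False


# SubItem 100 and Type 10 both belong to Item 1.
accepted = try_insert_surrogate_part(1, 100, 10)
# SubItem 100 belongs to Item 1, Type 20 to Item 2: the trigger aborts the insert.
rejected = try_insert_surrogate_part(2, 100, 20)
```

The keys stay narrow here, but the consistency rule now lives in procedural code. Matching triggers would be needed for updates to Part and for changes to ItemId in Type and SubItem, and each further hierarchy level adds more rules of this kind.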

Questions

  • Should the surrogate solution always be favored? In what cases is it useful to use the composite key solution?
  • Which solution is more maintainable in the long term, especially if the hierarchy gets bigger?
  • Which solution offers better performance (reporting, CRUD)? Does the surrogate solution have a huge impact because of check constraints or triggers?
  • Are there other options (in my opinion, both my variants seem like non-ideal solutions)?
  • Does anyone have experience with both variants and can share their wisdom?
