Why Surrogate Keys are Important

Example of Primary / Foreign Keys

Lets have a sample data

StoreID *StreetCityStateCountry
S100124th BlvdPhoenixAZUSA
S100221 Bell RoadMiamiFLUSA
S1003Main StreetNew PortCAUSA

StoreID is Natural Key (PK) It has a meaning S stands for Store 1001 means its the first store.

What is the issue now ?

When data changes (Slowly Changing Dimension we will cover in next chapter) how to handle the change.

Situation 1: Store 1 moves to new location or store is closed for sometime and opened under new franchise.

StoreID *StreetCityStateCountry
S100124th BlvdPhoenixAZUSA
S100221 Bell RoadMiamiFLUSA
S1003Main StreetNew PortCAUSA
S10011st StreetPhoenixAZUSA

Situation 2: When acquiring competitive business (say Target buys KMart), the Natural Keys dont make sense now.

StoreID *StreetCityStateCountry
S100124th BlvdPhoenixAZUSA
S100221 Bell RoadMiamiFLUSA
S1003Main StreetNew PortCAUSA
233South StreetNew BrunswickNJUSA
1233JFK BlvdCharlotteNCUSA

These business decisions / changes have nothing to do with the Technology.

How to Overcome this issue?

Add Surrogate Keys (running sequence number)

Surr_Store *StoreIDStreetCityStateCountry
1S100124th BlvdPhoenixAZUSA
2S100221 Bell RoadMiamiFLUSA
3S1003Main StreetNew PortCAUSA
4233South StreetNew BrunswickNJUSA
51233JFK BlvdCharlotteNCUSA

Properties of Surrogate Keys

- Numerical
- Sequential
- Meaningless Simple Number

Adv of Surrogate Keys

- Constant Behavior (will not change based on Business need)
- Integration is easier.
- Faster Query Performance. (because of Integer values)
- Future records (every other column can be NULL still ID is available)

Its a good practice to have Surrogate Key in DataWarehouse Dimension & Fact tables.