Candidate Keys vs Primary Keys: A Database Guide

Imagine you need to store information about customers placing orders in your ecommerce store. You set up neatly organized tables in a database: one table for Customer details such as name and address, and another table for their Orders.

This is a common example of a relational database, which relies on table relationships and on key columns that uniquely identify records (rows of data). Keys are essential to identify, access and connect data across tables.

But when modeling this database, how do you decide what columns to use as the keys? Should you use a single column like CustomerID or multiple columns combined? This is where the crucial concepts of candidate keys and primary keys come in…

What Exactly is a Candidate Key?

To properly understand candidate keys, it helps to start with the formal definition used in relational database theory:

"A candidate key is defined as a set of one or more attributes (columns) that can uniquely identify each entity (row) in the table and no proper subset of the key attributes can uniquely identify entities."

That's quite a mouthful! Let's break it down:

Uniqueness – Candidate keys guarantee uniqueness, i.e. no two records can have identical values for that key column (or set of columns).
Irreducibility – You cannot remove any column from the key and still retain uniqueness.
Minimality – Another name for irreducibility: the key contains only the bare minimum set of columns needed to identify records.

These qualities make candidate keys well-suited for crucial database functions like indexing, constraints, optimizations and relationships.

Usage and Examples

Consider a simple Customer table with columns for CustomerID, Name, DateOfBirth and Email:

CustomerID can uniquely identify each customer record by itself. The combination of Name + DateOfBirth can too, provided the business rules guarantee that no two customers share both values.
Under that assumption, both CustomerID and {Name, DateOfBirth} would be candidate keys.
The Email column alone would NOT be a candidate key in this store because it does not satisfy uniqueness: multiple customers (members of one household, say) can have the same email address.
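The reasoning above can be checked mechanically against sample data. The following is a minimal sketch, not a definitive tool: the rows are invented for illustration, and the helper names `is_unique` and `candidate_keys` are hypothetical. It tries column sets from smallest to largest, keeps those whose values never repeat, and skips any set that contains an already-found key, which is what enforces irreducibility.

```python
from itertools import combinations

COLUMNS = ("CustomerID", "Name", "DateOfBirth", "Email")

# Invented sample rows: a household sharing one email address.
customers = [
    {"CustomerID": 1, "Name": "Ana Silva", "DateOfBirth": "1990-04-12", "Email": "family@example.com"},
    {"CustomerID": 2, "Name": "Ana Silva", "DateOfBirth": "1975-01-30", "Email": "family@example.com"},
    {"CustomerID": 3, "Name": "Ben Silva", "DateOfBirth": "1990-04-12", "Email": "family@example.com"},
    {"CustomerID": 4, "Name": "Chris Wu", "DateOfBirth": "1988-11-03", "Email": "chris@example.com"},
]


def is_unique(rows, columns):
    """Return True if no two rows share the same values for `columns`."""
    seen = set()
    for row in rows:
        value = tuple(row[c] for c in columns)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, columns):
    """Return every irreducible column set that is unique in `rows`."""
    keys = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            # A superset of a known key is unique but reducible, so skip it.
            if any(set(key) <= set(combo) for key in keys):
                continue
            if is_unique(rows, combo):
                keys.append(combo)
    return keys


print(candidate_keys(customers, COLUMNS))
# [('CustomerID',), ('Name', 'DateOfBirth')]
```

Sample data can only disprove a key, never prove one. Whether {Name, DateOfBirth} is truly a candidate key depends on the rules of the business, not on the rows that happen to be stored today.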

Candidate keys like CustomerID can also be used to:

Index the table on CustomerID to optimize search/retrieval
Apply a uniqueness constraint to ensure no duplicate IDs
Help establish relationships by associating a foreign key in another table to this candidate key column
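The three uses listed above can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions: the `Orders` columns and the sample values are invented, the `attempt` helper is hypothetical, and other database engines use slightly different syntax. In SQLite, a UNIQUE constraint creates its own index, so one declaration covers both indexing and uniqueness.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign key checks off by default

conn.executescript("""
CREATE TABLE Customer (
    CustomerID  INTEGER NOT NULL UNIQUE,   -- candidate key 1
    Name        TEXT    NOT NULL,
    DateOfBirth TEXT    NOT NULL,
    Email       TEXT,
    UNIQUE (Name, DateOfBirth)             -- candidate key 2
);
CREATE TABLE Orders (
    OrderID    INTEGER NOT NULL UNIQUE,
    CustomerID INTEGER NOT NULL REFERENCES Customer (CustomerID)
);
""")


def attempt(sql, params=()):
    """Run one statement and report whether a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


insert_customer = "INSERT INTO Customer VALUES (?, ?, ?, ?)"
first = attempt(insert_customer, (1, "Ana Silva", "1990-04-12", "family@example.com"))
duplicate_id = attempt(insert_customer, (1, "Ben Silva", "1988-11-03", "family@example.com"))

# A foreign key may point at any candidate key that has a unique constraint.
orphan_order = attempt("INSERT INTO Orders VALUES (?, ?)", (100, 99))
valid_order = attempt("INSERT INTO Orders VALUES (?, ?)", (101, 1))

# Ask SQLite how it would look up a customer by CustomerID.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT Name FROM Customer WHERE CustomerID = ?", (1,)
).fetchall()
uses_index = any("INDEX" in row[-1] for row in plan)

print(first, duplicate_id, orphan_order, valid_order, uses_index)
```

The duplicate CustomerID and the order for a non-existent customer should both be rejected, while the lookup by CustomerID should be served from the index that backs the unique constraint.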

Introducing the Primary Key

While candidate keys flexibly identify records in a table, for consistency you need one clear way to uniquely distinguish each row. This critical task is handled by the primary key.

Formally, a primary key has the same uniqueness and irreducibility as any other candidate key, and in addition its columns can never be NULL. It can be described as:

"The one candidate key selected to uniquely identify tuples within a table and to serve as the reference for foreign keys in other tables"

Breaking this definition down:

Chosen from existing candidate keys but limited to only ONE per table
Used as the definitive unique ID for rows in this table
Enables referential integrity with child tables since foreign keys can reference the primary key
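As a rough sketch of these three points, again using Python's `sqlite3` module: CustomerID is promoted to primary key, {Name, DateOfBirth} stays behind as a unique constraint, and Orders references the primary key. The `Demo` table, the `run` helper and the sample values are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce foreign keys

conn.executescript("""
CREATE TABLE Customer (
    CustomerID  INTEGER PRIMARY KEY,       -- the one chosen candidate key
    Name        TEXT NOT NULL,
    DateOfBirth TEXT NOT NULL,
    Email       TEXT,
    UNIQUE (Name, DateOfBirth)             -- remaining candidate key
);
CREATE TABLE Orders (
    OrderID    INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL REFERENCES Customer (CustomerID)
);
""")


def run(sql):
    """Execute one statement and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.Error:
        return "rejected"


# Only ONE primary key per table: a second declaration is an error.
two_primary_keys = run("CREATE TABLE Demo (A INTEGER PRIMARY KEY, B INTEGER PRIMARY KEY)")

customer = run(
    "INSERT INTO Customer (CustomerID, Name, DateOfBirth) VALUES (1, 'Ana Silva', '1990-04-12')"
)
# Same primary key value, different person.
duplicate_pk = run(
    "INSERT INTO Customer (CustomerID, Name, DateOfBirth) VALUES (1, 'Ben Silva', '1988-11-03')"
)
# New primary key value, but the other candidate key repeats.
duplicate_alternate = run(
    "INSERT INTO Customer (CustomerID, Name, DateOfBirth) VALUES (2, 'Ana Silva', '1990-04-12')"
)

# Referential integrity: an order must point at an existing customer.
order_ok = run("INSERT INTO Orders (OrderID, CustomerID) VALUES (100, 1)")
order_orphan = run("INSERT INTO Orders (OrderID, CustomerID) VALUES (101, 42)")

print(two_primary_keys, customer, duplicate_pk, duplicate_alternate, order_ok, order_orphan)
```

Only the first customer and the order that points at that customer should be accepted. Every other statement violates either the single-primary-key rule, a uniqueness constraint or the foreign key.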

Setting Column Rules

The crucial task then is to select the optimal primary key column(s) from your candidate keys. Some guidelines for this selection:

Should have mostly static, unchanging values
Short, numeric keys simplify joins and storage
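Single-column keys keep joins and foreign keys simple, and no key column may ever contain NULL
Optionally carry semantic meaning (a natural key), provided the value stays stable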

In our Customer table example, CustomerID is the best primary key choice based on these principles.

Composite keys (combining columns) are occasionally useful, for example to enforce a natural key, but simpler is better!

Handling Database Evolution

Proper modeling with primary keys is pivotal because schema changes can break dependencies. Changing a primary key value affects every foreign key that references it: depending on how the constraint is declared, the database either rejects the change or cascades the update to all associated child rows!

Ideally, the choice of primary key should seldom, if ever, change after initial selection, and the key values themselves should be treated as immutable.
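To make the cost of changing a key value concrete, here is a small sketch with Python's `sqlite3` module. The helper `change_customer_id` and its sample rows are hypothetical. It renumbers a customer twice: once with a plain foreign key and once with one declared `ON UPDATE CASCADE`.

```python
import sqlite3


def change_customer_id(on_update_clause):
    """Renumber customer 1 to 2 and report (outcome, CustomerID stored on the order)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
        conn.executescript(f"""
            CREATE TABLE Customer (
                CustomerID INTEGER PRIMARY KEY,
                Name       TEXT NOT NULL
            );
            CREATE TABLE Orders (
                OrderID    INTEGER PRIMARY KEY,
                CustomerID INTEGER NOT NULL
                    REFERENCES Customer (CustomerID) {on_update_clause}
            );
            INSERT INTO Customer VALUES (1, 'Ana Silva');
            INSERT INTO Orders VALUES (100, 1);
        """)
        try:
            conn.execute("UPDATE Customer SET CustomerID = 2 WHERE CustomerID = 1")
            outcome = "updated"
        except sqlite3.IntegrityError:
            # The order still references customer 1, so the change is refused.
            outcome = "rejected"
        order_customer = conn.execute(
            "SELECT CustomerID FROM Orders WHERE OrderID = 100"
        ).fetchone()[0]
        return outcome, order_customer
    finally:
        conn.close()


print(change_customer_id(""))                   # plain foreign key
print(change_customer_id("ON UPDATE CASCADE"))  # cascading foreign key
```

With the plain foreign key the update is refused and the order keeps pointing at customer 1. With the cascading declaration the update succeeds and the order row is rewritten to point at customer 2. Either way, one changed key value touches every referencing table, which is why stable keys are preferred.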

When to Use Each Key Type

Candidate keys that are not chosen as the primary key (often called alternate keys) serve as supplementary unique identifiers and help with optimization. Enforce them with unique constraints wherever the data genuinely must not repeat.

The primary key establishes the identity for a record, centralizing ownership and acting as a known reference point. Rely on this single immutable identifier both within the table and externally via foreign keys.

Hopefully this clears up when to use candidate vs primary keys for effective database modeling! Proper selection sets the foundation for performance and maintainability long term.