What is the main purpose of using foreign keys in a relational database design?

A foreign key is a column or group of columns in a relational database table that provides a link between data in two tables. It acts as a cross-reference between tables because it references the primary key of another table, thereby establishing a link between them.

The majority of tables in a relational database system adhere to the foreign key concept. In complex databases and data warehouses, data in a domain must be added across multiple tables, thus maintaining a relationship between them. The concept of referential integrity is derived from foreign key theory.

Foreign keys and their implementation are more complex than primary keys.

Advertisement

Techopedia Explains Foreign Key

While a primary key may exist on its own, a foreign key must always reference to a primary key somewhere. The original table containing the primary key is the parent table (also known as referenced table). This key can be referenced by multiple foreign keys from other tables, known as “child” tables.

For any column acting as a foreign key, a corresponding value should exist in the linked table. Special care must be taken while inserting data and removing data from the foreign key column, as a careless deletion or insertion might destroy the relationship between the two tables.

If the integrity between the two databases is compromised, errors may ensue.

For instance, if there are two tables, customer and order, a relationship can be created between them by introducing a foreign key into the order table that refers to the CUSTOMER_ID in the customer table. The CUSTOMER_ID column exists in both customer and order tables.

The CUSTOMER_ID in the order table becomes the foreign key, referring to the primary key in the customer table. To insert an entry into the order table, the foreign key constraint must be satisfied.

An attempt to enter a CUSTOMER_ID that is not present in the customer table fails, thus maintaining the table's referential integrity.

In real-world databases, integrity between foreign key and primary key tables can be ensured by enforcing referential constraints whenever rows in a referenced table are updated or deleted.

Some referential actions associated with a foreign key action include the following:

Cascade

When rows in the parent table are deleted, the matching foreign key columns in the child table are also deleted, creating a cascading delete.

Set Null

When a referenced row in the parent table is deleted or updated, the foreign key values in the referencing row are set to null to maintain the referential integrity.

Triggers

Referential actions are normally implemented as triggers. In many ways foreign key actions are similar to user-defined triggers. To ensure proper execution, ordered referential actions are sometimes replaced with their equivalent user-defined triggers.

Set Default

This referential action is similar in function to the "restrict" action except that a no-action check is performed only after trying to alter the table.

In a previous column, I talked about primary keys ("How to Choose a Primary Key," April 1999). This month, I cover the foreign key. A foreign key is an integral part of a relational database design. It establishes relationships between tables, and it makes possible the procedures that cross-reference data stored in separate tables in the database. It enforces data integrity rules and prohibits modifications that might compromise data integrity. The foreign key is as important to the database design as the primary key, and together they control updates to the data. So let's look at what a foreign key is, what you use it for, and what value it brings to your database design.

What Is a Foreign Key?

A foreign key is a table column that establishes a link from the table it resides in to the primary key or a candidate key in another, related table. The foreign key is the anchor on the many side of a one-to-many (1:M) relationship, much as the primary or candidate key is the anchor on the one side of this relationship.

A foreign key is the linchpin that makes sure you don't enter invalid data into a table. It also prohibits you from doing a delete or update operation that might leave orphan rows. The database development community refers to this feature as declarative referential integrity, and it can't function without foreign keys.

Figure 1 is an entity relationship diagram (ERD) representing three tables (Project, Employee, and Paycheck) and two 1:M relationships. Project contains attributes describing the kinds of workplace projects an employee might be assigned to, Employee contains attributes describing individual employees, and Paycheck contains attributes describing employee compensation. To demonstrate a point, I've simplified the relationships and restricted the business rules. Each Project can have many Employees assigned to it, but an Employee is associated with only one Project at a time. Each Employee gets many Paychecks, but each Paycheck goes to only one Employee.

In Paycheck, the attribute EmpNo is a foreign key (FK). EmpNo in Paycheck has the same meaning and is the same data type and length as EmpNo in Employee. In fact, the set of real data values for EmpNo must be the same in both Employee and Paycheck to ensure that you never cut a paycheck for a nonexistent employee. This relationship creates a referencing link between Paycheck and Employee that you can enforce and exploit when you write SQL queries, as you'll see later in this article.

Figure 2 is an implementation of the Project-Employee-Paycheck ERD, in which I've converted each entity to a table, and each attribute has become a column in a table. I populated the tables with test data to demonstrate the concept of foreign keys.

Column ProjNo in the Employee table anchors the 1:M relationship between Project and Employee on the many side. In this model, the arrow points from the foreign key to the primary key, implying a foreign key reference from Employee.ProjNo to Project.ProjNo. This arrangement means that the project number in the Employee table references the project number in the Project table. So in Figure 2, both Abby and Don are assigned to work on the Y2K Remediation Project.

Dependent Relationships

Not all 1:M relationships are created equal. Notice that in both Figures 1 and 2, some relationship lines are dotted and some are solid. The solid connector line represents the dependent, or identifying, 1:M relationship. A dependent relationship means each record on the many side of the relationship depends for its existence on a related record on the one side.

The dotted connector line represents the independent, or nonidentifying, relationship, which is the condition between Project and Employee. An independent 1:M relationship means that a record may exist on the many side of the relationship without a related record on the one side. An Employee doesn't need to be working on a Project in order to be an employee. In Figure 2, Ethyl Etheridge is not connected to a project, but she still gets a paycheck.

The terms identifying relationship and non-identifying relationship refer to how the primary key of the table on the many side is supposed to be constructed. The theory is that for identifying relationships, the primary key of the one table cascades down to the many table, and is concatenated to the identifier of the many table to form the primary key. Many CASE tools that you can use to draw ERDs implement this theory as a rule that you can't easily circumvent.

You can enforce the identifying relationship two ways. You can declare the relationship when you initially create the table. Or you can declare this relationship after you create the table, with an ALTER TABLE statement, as in:

ALTER TABLE paycheck
  ADD CONSTRAINT fk_Paycheck2Emp FOREIGN
    KEY (EmpNo) 
    REFERENCES Employee(EmpNo)
GO

The nonidentifying relationship doesn't have a restriction about cascading the primary key. Each table has its own unique identifier; neither table inherits part of its primary key from the other. The Project table primary key, ProjNo, is a foreign key in the Employee table. ProjNo in the Employee table can be null, as in the case of Ethyl Etheridge, who is not assigned to a project.

Using the Foreign Key

You always need a foreign key in the many table when you have a 1:M relationship. The database must include this relationship somewhere. A nonrelational (object-oriented) database might store the relationship as pointers, or addresses in a one table record that point to associated records in the many table.

However, in a relational database management system (RDBMS), pointers aren't necessary and might not be used to define the 1:M relationships. Instead, when you specify two related tables in a JOIN query, the values in the foreign key column of the many table are compared to the values in the primary key column of the one table. If the two values match, the query returns that record in the result set.

Although I've been talking about a foreign key and its corresponding primary key, a foreign key can also be compared to a candidate key. In the April issue, I talked about what criteria you use to promote a candidate to primary key. A candidate key can be a primary key, but for business reasons, which you determine, it isn't always. Such is the case in the Employee table, where both EmpNo and EmpSSN are candidates, but for privacy reasons, I've promoted EmpNo to primary key.

Make sure that a foreign key and its corresponding primary or candidate key have the same data type and length when you create tables. (You don't have to make the attribute names the same.) Then you can use the following query, for example, to compare the values of EmpSSN in the Employee table to the values of EmpSSN in the Paycheck table and generate a report listing the total amount of pay each employee has received. The report appears in Table 1.

SELECT EmpLname, EmpFname, SUM(PayAmt) AS
  "Total Pay Week 15-17"
FROM Employee INNER JOIN PayCheck ON
Employee.EmpSSN = Paycheck.EmpSSN
GROUP BY EmpLname, EmpFname

You can use EmpNo to do the comparison in the previous query and generate the same result set. However, a reasonable assumption is that the payroll department will reference employees by Social Security Number (SSN).

In much the same way, by comparing ProjNo from the Project table to ProjNo from the Employee table, you can use the following query to generate a report that lists employees and their associated projects. The output report appears in Table 2.

SELECT ProjName, EmpFname + " " + EmpLname
  AS "Project Team Member"
FROM Project LEFT OUTER JOIN Employee ON
  Project.ProjNo = Employee.ProjNo
ORDER BY ProjName

The SQL Server query optimizer uses an index on the foreign key column, if one is available, to resolve a JOIN query if the tables are large (multipage). (For more information on the SQL Server query processor, see Petkovic and Unterreitmeier, "New Features for Query Processing," July 1999.)

When you create the foreign key reference, SQL Server doesn't automatically build an index for the foreign key column. Create the foreign key index in a separate step:

CREATE INDEX idx_PaycheckEmpNo ON
paycheck(EmpNo)

After you declare the foreign key relationship to the DBMS, the DBMS always enforces it; you cannot program around the constraint. For example, if you try to insert a record containing an employee ID of 6 into the Paycheck table and no employee number 6 exists, the DBMS will reject your INSERT query. If you try to delete the record for Becky Brown in the Employee table, the DBMS will reject that operation with a warning: DELETE statement conflicted with COLUMN REFERENCE constraint 'fk_Paycheck2Emp.' The conflict occurred in database 'SQLmag,' table 'Employee,' column 'EmpNo.' This message means that if you want to get rid of Becky Brown, you first have to remove from the Paycheck table all records that refer to her. Then you can remove her employee record from the Employee table. DRI, or the foreign key constraint, prevents you from leaving behind orphan records for Becky Brown in the Paycheck table.

Value-Added Design

A foreign key creates a bridge between two tables in a database. Database programmers use the foreign key as one of the critical components to dynamically join data from two or more tables. Without a foreign key, if you want to create a report that uses data from more than one table, you have to resort to exhaustive coding in a non-SQL programming language.

Extracting information from multiple tables is significantly complicated beyond the simple joins shown earlier, and end-user access to the data diminishes as the complexity of retrieval increases. You might be tempted to denormalize the tables to make data access easier for the end user. But table denormalization compromises data integrity because you have to manually synchronize duplicate data (the result of denormalization) on each data update.

The foreign key and the primary key control updates to a database. Used correctly, the foreign key can ensure database table integrity.

What is the most important function of a foreign key?

The primary purpose of the foreign key constraint is to enforce referential integrity and improve performance, but there are additional benefits of including them in your database design.

What is the purpose of the foreign key attribute?

A foreign key is an attribute that completes a relationship by identifying the parent entity. Foreign keys provide a method for maintaining integrity in the data (called referential integrity) and for navigating between different instances of an entity.