What is one requirement for a relational table?

When assigned to a column of a table, a domain defines all the values, and only the values, that are valid in row-level instances of that column. For example, the domain of a customer status column is the set of all valid customer status codes.

Sometimes domains include all the values permitted by the datatype of the column they are assigned to. At other times, domains contain only some of those values.
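In standard SQL, such a domain can be declared once and then assigned to columns. A minimal sketch for the customer status example above, using a hypothetical CustStatus domain name:

CREATE DOMAIN CustStatus AS CHAR(8)
    CHECK (VALUE IN ('Platinum', 'Gold', 'Silver'));

CREATE TABLE Customer (
    cust_id     INTEGER PRIMARY KEY,
    cust_status CustStatus    -- only the three status codes are valid here
);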

Figure 3.2 shows three domains, D1, D2 and D3. The first two domains are unordered, indicated by braces; the third is ordered, indicated by brackets. These domains are the sets S1, S2, and S3, now understood as functioning as domains for the columns of database tables. Later on, we will assign them to TX.C1, TX.C2, and TX.C3, the columns of table TX.


Figure 3.2. Three Domains.

Domain D1 appears to be ordered, since it would seem that we can sort the members of that set. For similar reasons, domain D2 appears to be ordered. But in fact, these domains are not ordered. The character strings “C1”, “C2” and “C3”, and “A”, “B” and “C” each have an obvious ordering. But these character strings aren’t the members of these two sets. They are the symbols used to represent those members. The ordering of a set is an ordering of its members, not of the symbols which represent those members.

D3 isn’t as obviously ordered, and it might not have been ordered at all. To say that it is ordered is to say that the position of any of its members in the set is known, which is to say that each one can be paired with one of the first three positive integers. Given that there is an order to the members of D3, and that the order is Platinum first, Gold second, and Silver third, we recognize that order as intuitively right because we understand that platinum is more valuable than gold and gold is more valuable than silver. But we could have called the three customer statuses Mike, Susan, and Frank, and we would still have had the same set and the same ordering in terms of relative rank or value. What the set consists of is three customer statuses, one of which is most highly ranked, another of which is least highly ranked, and the third of which is ranked between those two.

If we combine Figures 3.1 and 3.2, we have a description of SX which shows the members of the sets on which it is defined, and also shows that the third of those sets is itself an ordered set. This is shown as Figure 3.3.


Figure 3.3. An Ordered Set of Three Sets.


URL: https://www.sciencedirect.com/science/article/pii/B9780124080676000036

Data Persistence

Raul Sidnei Wazlawick, in Object-Oriented Analysis and Design for Information Systems, 2014

13.2.1.2 Index selection

By default, a relational table is just a set of records. Finding a given object in a table would require iterating over the elements until the desired one is found. For example, looking for a book by its ISBN would require an exhaustive search.

Databases usually provide, however, the possibility of indexing columns. An indexed column has an auxiliary table that allows specific records to be found in almost constant time. For example, if the isbn column of the table is indexed, then when a book is searched for by its ISBN, no iteration is performed over the set of all records: the system simply translates the ISBN value into a position in memory by using a hash function and retrieves the element from that position. If that is not the desired element, it looks at the next one, and so on, until the element is found. If the hash function is well implemented and the hash table has enough space to avoid collisions (two values being translated to the same hash value), then the desired element is usually at the first position searched, or very close to it.

The use of indices improves query speed. However, it slows database updating because every time a record is updated, inserted, or deleted, the auxiliary table must be updated as well (Choenni, Blanken, & Chang, 1993). Furthermore, indices also require more storage space for accommodating the auxiliary tables.

A primary key is indexed by default. Other columns may be indexed if the designer chooses to do so. Given the restrictions mentioned before, creating other indices may be an advantage in the case of an attribute that is used as an internal qualifier. In that case, finding the objects quickly may be crucial for the application’s performance. Otherwise, indices should be avoided. For example, columns that are rarely used for searching purposes (for example, a book’s page count) should not be indexed.
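As an illustration of the trade-off, a secondary index on the isbn column could be declared as follows; a minimal sketch, assuming a hypothetical Book table:

CREATE TABLE Book (
    book_id    INTEGER PRIMARY KEY,   -- the primary key is indexed by default
    isbn       CHAR(13),
    title      VARCHAR(200),
    page_count INTEGER                -- rarely searched, so left unindexed
);

CREATE INDEX book_isbn_idx ON Book (isbn);   -- speeds ISBN lookups, slows updates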


URL: https://www.sciencedirect.com/science/article/pii/B9780124186736000132

Structured Query Language

Catherine M. Ricardo, in Encyclopedia of Information Systems, 2003

IV.A. CREATE SCHEMA, DROP SCHEMA Commands

A relational schema is a set of relational tables and associated items that are related to one another. All of the base tables, views, indexes, domains, user roles, stored modules, and other items that a user creates to fulfill the data needs of a particular enterprise or set of applications belong to one schema. SQL provides a statement to define a schema. Its form is

CREATE SCHEMA schema-name AUTHORIZATION user-id;

For our sample database, we could write

CREATE SCHEMA OrderSystem AUTHORIZATION User111;

This statement permits User111 to create the tables and other structures in OrderSystem, and to write authorization statements allowing others to have access to them.

There is also a statement to destroy a schema, with the form

DROP SCHEMA schema-name RESTRICT | CASCADE;

If the user chooses RESTRICT, which is the default, the schema must contain no tables, views, or other items; otherwise the operation fails. With CASCADE, the system drops all of the schema’s items, along with their data, when it drops the schema.
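For example, dropping the sample schema together with all of its tables and data would be written:

DROP SCHEMA OrderSystem CASCADE;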


URL: https://www.sciencedirect.com/science/article/pii/B0122272404001738

Temporal Transactions on Multiple Tables

Tom Johnston, Randall Weis, in Managing Time in Relational Databases, 2010

Foreign Keys and Temporal Foreign Keys

Before proceeding, let's remind ourselves of the difference between (i) foreign keys (FKs), the relationships they implement and the constraints they impose, and (ii) temporal foreign keys (TFKs), the relationships they implement and the constraints they impose.

A foreign key is a column in a relational table whose job is to relate rows to other rows.5 If the foreign key column is declared to the DBMS to be nullable, then any row in that table may or may not contain a value in its instance of that column. But if it does contain a value, that value must match the value of the primary key of a row in the table declared as the target table for that foreign key. For non-nullable foreign keys, of course, every row in the source table must contain a valid value in its foreign key column.
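In SQL, such a declaration is straightforward. A minimal sketch, using hypothetical Client and Policy table layouts (the entities used as examples later in this section):

CREATE TABLE Client (
    client_id INTEGER PRIMARY KEY
);

CREATE TABLE Policy (
    policy_id INTEGER PRIMARY KEY,
    client_id INTEGER NOT NULL,   -- non-nullable FK; drop NOT NULL to make it nullable
    FOREIGN KEY (client_id) REFERENCES Client (client_id)
);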

In addition, once the FK relationship is declared to the DBMS, the DBMS is able to guarantee that the two managed objects—the child row and the parent row—accurately reflect the existence dependency between the objects they represent. It does so by enforcing the constraint expressed in the declaration, the constraint that if the child row's FK points to a parent row, that parent row must have existed in its table at the time the child row was added to its table, and must continue to exist in the parent table for as long as the child row exists in its table and continues to point to that same parent.

This is a somewhat elaborate way of describing something that most of us already understand quite well, and that few of us may think is worth describing quite so carefully—that foreign keys relate child rows to parent rows and that, in doing so, they reflect a relationship that exists in the real world. We have gone to this length in order to be very clear about both the semantics and the mechanics of foreign keys—semantics described in our talk about objects, and mechanics in our talk about managed objects—and to place the descriptions at a level of generality where the semantics and mechanics of TFKs can be seen as analogous to those of the more familiar FKs. So if we use an “X/Y” notation in which the “X” term is part of the referential integrity description and the “Y” term is part of the temporal referential integrity description, we have a description which makes it clear that temporal referential integrity really is temporalized referential integrity, that TRI is RI as it applies to temporal data. That description is given in the following paragraph.

Once the FK/TFK relationship is declared to the DBMS/AVF, the DBMS/AVF is able to guarantee that the two managed objects—the child row/version and the parent row/episode—accurately reflect the existence dependency between the objects they represent. Each does so by enforcing the constraint expressed in the declaration, the constraint that if the FK/TFK in the child row/version points to a parent row/episode, that parent row/episode must have existed in its table/be currently asserted and currently effective at the time the child row/version was added to its table, and must continue to exist/be currently asserted and currently effective in the parent table for as long as the child row/version exists/is currently asserted and currently effective in its table and continues to point to that same parent.

TFKs: A Data Part and a Function Part

As a data element, a TFK is a column in an asserted version table whose job is to relate child managed objects to parent managed objects. Of course, the same may be said of FKs. The difference is that the parent managed object of an FK is a non-temporal row, while the parent managed object of a TFK is a group of possibly many rows. A TRI child table is an asserted version table that contains a TFK. A TRI parent table is an asserted version table referenced by a TFK. The FK reference is a data value, and is unambiguous; but the TFK reference, as a data value alone, is ambiguous.

So as a data element, all a TFK can do is designate the object on which the object represented by its own row is existence dependent. There may be any number of versions representing that object in the parent table, and those versions may be grouped into any number of episodes scattered along the assertion and effective time timelines. So as a data value, a TFK reference is incomplete.

For example, a TFK data value in a Policy table references all the episodes in a Client table which represent the client on which that policy is existence dependent, that being the client whose oid matches the data value in the TFK. To complete the reference, we need to identify, from among those episodes, the one episode which was in effect when the policy version went into effect, and will remain in effect as long as that policy version remains in effect.

What is needed to complete the reference is a function. We will name this function fTRI. It has the following syntax:

fTRI(PTN, TFK, [eff-beg-dt – eff-end-dt])

PTN is the name of the parent table which this TFK points to. Given the TFK and effective time period of a version in a TRI child table, the AVF searches the parent table for an episode whose versions have that oid as part of their primary key, and whose effective time period fully includes the effective time period designated by the function. If there is such an episode, it is the TRI parent episode of that version, and the fTRI function evaluates to True. If there is no such episode, then the function evaluates to False, and that version will never be added to the database because if it were, it would violate TRI.

If the AVF finds such an episode, in carrying out this function, it does not have to check further to ensure that there is only one such episode. If there were more than one, those episodes would be in TEI conflict across all of their clock ticks which intersect. The AVF does not allow TEI violations to occur, so if there is a TRI parent episode for the TFK reference, there is exactly one.

For example, the oid value in the TFK of P861-A(2) picks out client C903. Before the AVF added that version to the database, it used the fTRI function to determine whether or not it was referentially valid.6 That TRI validation check would look something like this:

IF ISTRUE(fTRI(Client, C903, [Jul10 – 9999])) THEN
    {add the version}
ELSE
    {notify the calling program of a TRI error}
ENDIF
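Expressed directly against the tables, the same check might look like the following. This is only a sketch of the semantics, not the AVF’s actual code; the column names oid, epis_beg_dt and epis_end_dt, and the assumption that each episode’s effective-time period is available as a pair of dates, are illustrative:

SELECT COUNT(*)
FROM   Client ep                        -- the TRI parent table
WHERE  ep.oid = 'C903'                  -- the TFK value
AND    ep.epis_beg_dt <= :eff_beg_dt    -- episode begins no later than the version
AND    ep.epis_end_dt >= :eff_end_dt;   -- and remains in effect at least as long

-- A nonzero count means the fTRI check succeeds and the version may be added.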

Together, the explicit and implicit parts of the TFK, its data element part and its function part, complete an unambiguous reference from a TFK to the one episode which satisfies the TRI constraint on the relationship from that version to that episode.

Note that this description of a TFK is a semantic description, not an implementation-level description. The fTRI function is one component of a TFK. Its representation here is obviously not source code that could be compiled or interpreted. But however it is expressed, whether in the AVF or in some other framework based on these concepts, it is a function; and without it, the columns of data we call TFKs are not TFKs. Those columns of data are simply those components of TFKs which can be expressed as data.


URL: https://www.sciencedirect.com/science/article/pii/B978012375041900011X

Data Matching

AnHai Doan, ... Zachary Ives, in Principles of Data Integration, 2012

7.1 Problem Definition

We consider the following problem. Suppose we are given two relational tables X and Y. In some cases we will assume that X and Y have identical schemas, but in the general case they will not. We assume that each of the rows in X and Y describes some properties of an entity (e.g., person, book, movie). We say a tuple x ∈ X matches a tuple y ∈ Y if they refer to the same real-world entity. We call such a pair (x,y) a match. Our goal is to find all matches between X and Y. For example, given the two tables X and Y in Figures 7.1(a-b), whose tuples describe properties of people, specifically their names, phone numbers, cities, and states, we want to find the matches shown in Figure 7.1(c). The first match (x1,y1) states that (Dave Smith, (608) 395 9462, Madison, WI) and (David D. Smith, 395 9426, Madison, WI) refer to the same real-world person. Of course, while we consider the data matching problem in the context of relational data, it also arises in other data models.


Figure 7.1. An example of matching relational tuples that describe persons.

The challenges of data matching are similar in spirit to those of string matching: how to match accurately and to scale the matching algorithms to large data sets. Matching tuples accurately is difficult due to variations in formatting conventions, use of abbreviations, shortening, different naming conventions, omission, nicknames, and errors in the data. We could, in principle, treat each tuple as a string by concatenating the fields, and then apply string matching techniques described in Chapter 4. While effective in certain cases, in general it is better to keep the fields apart, since more sophisticated techniques and domain-specific knowledge can then be applied to the problem. For example, when the entities are represented as tuples we can write a rule that states that two tuples match if the names and phones match exactly.
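Such a rule is easy to state in SQL; a minimal sketch, assuming name and phone columns in both tables:

-- Declare (x, y) a match when names and phone numbers agree exactly.
SELECT x.name, x.phone, y.name, y.phone
FROM   X x
JOIN   Y y
  ON   x.name  = y.name
 AND   x.phone = y.phone;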

We cover several classes of solutions to the data matching problem. The first kind employs handcrafted rules to match tuples. These techniques typically make heavy use of domain-specific knowledge, in domains where the complexity of the rules is manageable. The next kind of solution learns the appropriate rules from labeled examples, using supervised learning. The third kind, clustering, does not use training data. Instead, it iteratively assigns tuples to clusters, such that all tuples within a single cluster match and those across clusters do not.

The fourth kind of solution, probabilistic approaches, models the matching domain using a probability distribution, and then reasons with the distribution to make matching decisions. As such, these approaches can naturally incorporate a wide variety of domain knowledge, leverage the wealth of probabilistic representation and reasoning techniques that have been developed in the past two decades, and provide a frame of reference for understanding other matching approaches.

The above approaches match tuple pairs independently. The last kind of approach we consider, known as collective matching, considers correlations among tuples to improve the accuracy of its matching decisions. For example, suppose that “David Smith” is a coauthor of “Mary Jones” and that “D. M. Smith” is a coauthor of “M. Jones.” Then if we have successfully matched “David Smith” with “D. M. Smith,” we should have increased confidence that “Mary Jones” matches “M. Jones.” Collective matching will propagate the result of one matching decision into others in an iterative fashion.

We cover the above matching techniques in the following sections and discuss scaling up in Section 7.7.


URL: https://www.sciencedirect.com/science/article/pii/B9780124160446000077

What's Missing?

Jim Melton, Stephen Buxton, in Querying XML, 2006

IBM

At the time of writing, IBM had not yet released an XQuery implementation. Naturally, they had not released an XQuery update capability either.

IBM has, however, offered a tool that “provides an XML view of relational tables and a query of those views as if they were XML documents.” This tool, named XML for Tables, is available34 on IBM’s alphaWorks site. The tool, which implements a subset of XQuery, translates XQuery expressions into SQL expressions that are executed by DB2. The tool does not, however, appear to have any ability to update the data behind those XML views.

IBM has frequently indicated that a future version of DB2 will support XQuery directly. Presumably that support will include XQuery updating capabilities at the same time. We presume that IBM will add support for the XQuery Update Facility when that spec has been finalized.


URL: https://www.sciencedirect.com/science/article/pii/B9781558607118500149

Time, Types and the Future of Relational Databases

Tom Johnston, in Bitemporal Data, 2014

In today’s relational databases:

1. Each row of data in a relational table is an instantiation of the statement schema for that table, and each row is identified by a primary key.

2. The thing referred to by each row is an instance of the type represented by the table, and each thing referred to is represented by a referent identifier.

3. The attributes of those things are instances of the types represented by the non-surrogate-valued columns in those statement schemas.

4. The referent of each row is the thing one of whose states is the subject of the statement made by the row.

5. Each attribute is a predicate in that statement.

6. Each predicate is an instance of the type represented by the column of its attribute.

7. The combination of a row’s referent identifier and state-time period identifies each temporal state of each thing. In conventional tables, the state-time period is implicit. Each time a row in a conventional table is accessed, its state time takes on the value of Henceforth determined by the then-current moment in time.


URL: https://www.sciencedirect.com/science/article/pii/B978012408067600019X

The Relational Paradigm

Tom Johnston, in Bitemporal Data, 2014

Statements and Assertions

Statements are the units of language that can be true or false. Rows in relational tables express statements. When a written statement is inscribed by an author who intends, by inscribing it, to state something he believes is true, and when he is, or can reasonably be expected by those who read the statement to be, doing just that, then the statement is asserted to be true, and is thus an assertion. Asserting that what a statement says is true (to the best of the author’s knowledge) is a speech act. A complementary speech act is that of withdrawing an assertion. As withdrawn, that statement is an inscription which once was an assertion, but which no longer is.

Aside

Neither the standard theory of bitemporal data nor my own Asserted Versioning theory manages the distinction between a statement and an inscription. Neither, that is, can keep track of multiple inscriptions of the same statement.

However, it is clear that the same statement may be expressed by more than one inscription. And, as we shall see, an assertion is withdrawn by setting the end point of its assertion-time period to a non-9999 value, i.e. to a real point in time. But since this physical activity is applied to a row as a physical object, it is possible to leave a set of databases in an inconsistent state. If the same statement is expressed by one row in an enterprise data warehouse and by another row in an operational database, for example, then when it is the user’s intention to withdraw the statement, both inscriptions – the one in the warehouse and the one in the operational database – should be withdrawn as part of the same atomic unit of work.12

This might suggest that we re-interpret assertion time (which everyone else calls “transaction time”) as being the assertion of one inscription of a statement, of which there may be other inscriptions. Indeed, since database transactions manage individual inscriptions, the use of transaction time (assertion time) in fact amounts to this. Bitemporal DBMSs, like all DBMSs, create and manage physical rows of data. They manage inscriptions, and don’t attempt to help us keep track of which inscriptions are inscriptions of which statements.

But from the semantic point of view, what a user is doing when he creates a row of data is not asserting a physical inscription. Physical objects aren’t the kind of thing that can be asserted. Rather, the user is asserting the statement expressed by that inscription.

This issue could actually be raised to yet another level, because it obviously applies to sets of statements which express the same proposition. When a statement is asserted, what is being asserted is that what the statement says is what is indeed the case. What a statement says is the proposition it expresses. So that statement and all other statements that say the same thing should, to maintain semantic consistency in the database, be asserted during exactly the same temporal intervals, which means that if they are withdrawn, they should be withdrawn at the same point in time.

But until the next-to-last chapter, this book will continue to treat a statement and its inscription as one managed object, just as the standard theory, the standards committees, and DBMS vendors do. In the meantime, let us turn to the question of under what conditions a statement can be asserted. This is the question of what the felicity conditions for asserting a statement are.

Consider, again, the statement “John loves Mary”. The statement “John loves Mary” contains no variables (pronouns), and the present tense makes its temporal indexical determinate. The statement is true if it is a present fact that John loves Mary, and is otherwise false.

But suppose some kind of party game is being played, in which the prize goes to the person who can create the most made-up statements. Each made-up statement is created by choosing two entries from one list, and one entry from a second list. “John” and “Mary” are two entries in the first list; “loves” is one entry in the second list. The only constraint is that the first letters of each of the three words must be in alphabetical sequence.

That would be a pretty boring game, of course; but that’s not the point. The point is that if the statement “John loves Mary” is written down as part of that game, it doesn’t make sense to ask whether or not the person who created it believes that it is true. Written down as part of that game, that statement is not and cannot be asserted. By writing down that statement, as part of that game, we are not writing down anything that we are asserting is true, or false. Truth and falsity don’t enter into it.

Rows in conventional tables and in state tables are implicitly asserted. By putting them in these tables, it is understood that we are not playing a party game. It is understood that we believe that what those rows say is true, and that we assert that what those rows say is true.


URL: https://www.sciencedirect.com/science/article/pii/B9780124080676000061

Introduction

Tom Johnston, Randall Weis, in Managing Time in Relational Databases, 2010

Non-Temporal, Uni-Temporal and Bi-Temporal Data

Figure Part 1.1 is an illustration of a row of data in three different kinds of relational table.1 id is our abbreviation for “unique identifier”, PK for “primary key”, bd1 and ed1 for one pair of columns, one containing the begin date of a time period and the other containing the end date of that time period, and bd2 and ed2 for columns defining a second time period.2 For the sake of simplicity, we will use tables that have single-column unique identifiers.


Figure Part 1.1. Non-Temporal, Uni-Temporal and Bi-Temporal Data.

The first illustration in Figure Part 1.1 is of a non-temporal table. This is the common, garden-variety kind of table that we usually deal with. We will also call it a conventional table. In this non-temporal table, id is the primary key. For our illustrative purposes, all the other data in the table, no matter how many columns it consists of, is represented by the single block labeled “data”.

In a non-temporal table, each row stands for a particular instance of what the table is about. So in a Customer table, for example, each row stands for a particular customer and each customer has a unique value for the customer identifier. As long as the business has the discipline to use a unique identifier value for each customer, the DBMS will faithfully guarantee that the Customer table will never concurrently contain two or more rows for the same customer.

The second illustration in Figure Part 1.1 is of a uni-temporal Customer table. In this kind of table, we may have multiple rows for the same customer. Each such row contains data describing that customer during a specified period of time, the period of time delimited by bd1 and ed1.

In order to keep this example as straightforward as possible, let's agree to refrain from a discussion of whether we should or could add just the period begin date, or just the period end date, to the primary key, or whether we should add both dates. So in the second illustration in Figure Part 1.1, we show both bd1 and ed1 added to the primary key, and in Figure Part 1.2 we show a sample uni-temporal table.


Figure Part 1.2. A Uni-Temporal Table.

Following a standard convention we used in the articles leading up to this book, primary key column headings are underlined. For convenience, dates are represented as a month and a year. The two rows for customer id-1 show a history of that customer over the period May 2012 to January 2013. From May to August, the customer's data was 123; from August to January, it was 456.

Now we can have multiple rows for the same customer in our Customer table, and we (and the DBMS) can keep them distinct. Each of these rows is a version of the customer, and the table is now a versioned Customer table. We use this terminology in this book, but generally prefer to add the term “uni-temporal”, because it suggests the idea of a single temporal dimension to the data, a single kind of time associated with the data, and this notion of one (or two) temporal dimensions is a useful one to keep in mind. In fact, it may be useful to think of the two temporal dimensions of a bi-temporal table as the X and Y axes of a Cartesian graph, and of each row in such a table as represented by a rectangle on that graph.

Now we come to the last of the three illustrations in Figure Part 1.1. Pretty clearly, we can transform the second table into this third table exactly the same way we transformed the first into the second: we can add another pair of dates to the primary key. And just as clearly, we achieve the same effect. Just as the first two date columns allow us to keep multiple rows all having the same identifier, bd2 and ed2 allow us to keep multiple rows all having the same identifier and the same first two dates.

At least, that's the idea. In fact, as we all know, a five-column primary key allows us to keep any number of rows in the table as long as the value in just one column distinguishes that primary key from all others. So, for example, the DBMS would allow us to have multiple rows with the same identifier and with all four dates the same except for, say, the first begin date.

This first example of bi-temporal data shows us several important things. However, it also has the potential to mislead us if we are not careful. So let's try to draw the valid conclusions we can from it, and remind ourselves of what conclusions we should not draw.

First of all, the third illustration in Figure Part 1.1 does show us a valid bi-temporal schema. It is a table whose primary key contains three logical components. The first is a unique identifier of the object which the row represents. In this case, it is a specific customer. The second is a unique identifier of a period of time. That is the period of time during which the object existed with the characteristics which the row ascribes to it, e.g. the period of time during which that particular customer had that specific name and address, that specific customer status, and so on.

The third logical component of the primary key is the pair of dates which define a second time period. This is the period of time during which we believe that the row is correct, that what it says its object is like during that first time period is indeed true. The main reason for introducing this second time period, then, is to handle the occasions on which the data is in fact wrong. For if it is wrong, we now have a way to both retain the error (for auditing or other regulatory purposes, for example) and also replace it with its correction.
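A minimal sketch of such a table in SQL, using the column names of Figure Part 1.1 (the datatypes are illustrative assumptions):

CREATE TABLE Customer (
    id   CHAR(8) NOT NULL,   -- unique identifier of the customer
    bd1  DATE    NOT NULL,   -- first time period: begin and end dates
    ed1  DATE    NOT NULL,
    bd2  DATE    NOT NULL,   -- second time period: begin and end dates
    ed2  DATE    NOT NULL,
    data VARCHAR(100),       -- all other columns, collapsed into one for illustration
    PRIMARY KEY (id, bd1, ed1, bd2, ed2)
);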

Now we can have two rows that have exactly the same identifier, and exactly the same first time period. And our convention will be that, of those two rows, the one whose second time period begins later will be the row providing the correction, and the one with the earlier second time period will be the row being corrected. Figure Part 1.3 shows a sample bi-temporal table containing versions and a correction to one of those versions.


Figure Part 1.3. A Bi-Temporal Table.

In the column ed2, the value 9999 represents the highest date the DBMS can represent. For example, with SQL Server, that date is 12/31/9999. As we will explain later, when used in end-date columns, that value represents an unknown end date, and the time period it delimits is interpreted as still current.

The last row in Figure Part 1.3 is a correction to the second row. Because of the date values used, the example assumes that it is currently some time later than March 2013. Until March 2013, this table said that customer id-1 had data 456 from August 2012 to the following January. But beginning in March 2013, the table says that customer id-1 had data 457 during exactly that same period of time.

We can now recreate a report (or run a query) about customers during that period of time that is either an as-was report or an as-is report. The report specifies a date that is compared against bd2 and ed2. If the specified date is any date from August 2012 up to March 2013, it will produce an as-was report. That report will show only the first three rows, because the specified date does not fall within the second time period of the fourth row in the table. But if the specified date is any date from March 2013 onwards, it will produce an as-is report. That report will show all rows but the second, because the specified date falls within the second time period of those rows but not within the second time period of the second row.

Both reports will show the continuous history of customer id-1 from May 2012 to January 2013. The first will report that customer id-1 had data 123 and 456 during that period of time. The second will report that customer id-1 had data 123 and 457 during that same period of time. So bd1 and ed1 delimit the time period out in the world during which things were as the data describes them, whereas bd2 and ed2 delimit a time period in the table, the time period during which we claimed that things were as each row of data says they were.
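The choice between an as-was and an as-is report is then just a matter of which date is supplied. A minimal sketch, assuming the table of Figure Part 1.3, a supplied date :asof, and the convention (discussed below) that ed2 is treated as exclusive:

-- History of customer id-1 as it was claimed on the supplied date.
SELECT id, bd1, ed1, data
FROM   Customer
WHERE  id = 'id-1'
AND    bd2 <= :asof
AND    :asof < ed2;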

Clearly, with both rows in the table, any query looking for a version of that customer, i.e. a row representing that customer as she was at a particular point or period in time, will have to distinguish the two rows. Any query will have to specify which one is the correct one (or the incorrect one, if that is the intent). And, not to anticipate too much, we may notice that if the end date of the second time period on the incorrect row is set to the same value as the begin date of the second time period on its correcting row, then simply by querying for rows whose second time period contains the current date, we will always be sure to get the correct row for our specific version of our specific customer.

That's a lot of information to derive from Figures Part 1.1, Part 1.2 and Part 1.3. But many experienced data modelers and their managers will have constructed and managed structures somewhat like that third row shown in Figure Part 1.1. Also, most computer scientists who have worked on issues connected with bi-temporal data will recognize that row as an example of a bi-temporal data structure.

This illustrates, we think, the simple fact that when good minds think about similar problems, they often come up with similar solutions. Similar solutions, yes; but not identical ones. And here is where we need to be careful not to be misled.


URL: https://www.sciencedirect.com/science/article/pii/B9780123750419000236

DATA

David C. Hay, in Data Model Patterns, 2006

The Conceptual Model to Relational Database Design

Figure 2-35 shows how the column and table definitions may be based on attribute and entity class definitions—via column attribute mappings and table entity class mappings. In principle, an initial database design should be directly derived from the entity-relationship model structure. In most cases, a table should be based on a single entity class. Things are not always as tidy as that, of course, because, among other things, the super-type/sub-type structures in an entity-relationship model cannot be directly implemented in a purely relational database.


Fig. 2-35. Tables and entity classes.

There are fundamentally three different approaches to mapping super-type/sub-type structures to flat relational tables. Each has advantages and disadvantages, so the selection of an approach must be made with some care (a sketch of the first approach follows the three options below):

A table can be defined for each super-type, encompassing all of its sub-types. Columns are then defined both for the super-type attributes and for those in each sub-type. This has the advantage of simplicity, but it means that many columns will have null values. Moreover, it will not be possible to require values for any of the sub-type columns. A column that describes a sub-type will only have values for rows that represent occurrences of that sub-type.

One table can be defined for each sub-type, to include both columns derived from its attributes and columns for all inherited super-type attributes. This allows for requiring a value for each row of a column, but it adds complexity to the structure. Any relationship pointing to the super-type, for example, must now be implemented (via an exclusionary arc) with foreign keys pointing to each of the corresponding sub-type tables. In addition, the columns for the super-type must be defined redundantly for each sub-type.

One table can be defined for the super-type and for each sub-type, with foreign keys pointing from the super-type to each sub-type. This approach is the most elegant, with the least redundancy and best control, but it is also the most complex to implement.
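A minimal sketch of the first approach, assuming a hypothetical Party super-type with Person and Organization sub-types:

-- One table for the super-type, with nullable columns for each sub-type.
CREATE TABLE Party (
    party_id   INTEGER PRIMARY KEY,
    party_type CHAR(12) NOT NULL,      -- 'Person' or 'Organization'
    name       VARCHAR(100) NOT NULL,  -- super-type attribute
    birth_date DATE,                   -- Person attribute: null for organizations
    tax_number CHAR(20)                -- Organization attribute: null for persons
);

Note that birth_date and tax_number cannot be declared NOT NULL, which is exactly the limitation described in the first option above.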

What are the requirements for a table in a relational database?

In the relational model, a table cannot contain duplicate rows, because duplicates would create ambiguities in retrieval. To ensure uniqueness, each table should have a column (or a set of columns), called the primary key, that uniquely identifies every record in the table.
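A minimal illustration in SQL:

CREATE TABLE Customer (
    customer_id INTEGER PRIMARY KEY,   -- no two rows may share this value
    name        VARCHAR(100)
);

-- The DBMS rejects any insert that would duplicate an existing customer_id.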

What are the requirements of the relational model?

Best practices for creating a relational model:
Data need to be represented as a collection of relations.
Each relation should be depicted clearly in the table.
Rows should contain data about instances of an entity.
Columns must contain data about attributes of the entity.
Cells of the table should hold a single value.

What are the characteristics of a relational table?

Properties of relational tables:
Values are atomic.
Column values are of the same kind.
Each row is unique.
The sequence of columns is insignificant.
The sequence of rows is insignificant.
Each column has a unique name.

What are the four basic requirements of a relational database?

Relational databases need ACID characteristics. ACID refers to four essential properties: Atomicity, Consistency, Isolation, and Durability.