Schemas
The Three Schema Architecture
A schema is a description of data contents, structure, and
possibly other aspects of a database
- external schema: describes data as seen by an application
program or by an end user
- conceptual schema: describes the base logical structure of
all data
- internal schema: describes how the database is physically
encoded, including selection of files, indexes, etc.
The separation of external schema from conceptual schema enables logical
data independence. The separation of conceptual schema from
internal schema enables physical data independence.
Data Independence
Isolates applications from data description and changes to data. Data
structure description is stored separately from the access programs. In
a database, if a record structure is changed by adding a field, then the
new field shows up on the next access of the database. The DBMS
itself does not need to be changed in order for it to access the data.
The details of storage implementation are hidden from the user.
- data about data is called meta-data (catalog and description of
data)
- independent of the data itself, and allows program independence
- separates interface from implementation
- as long as you know that the data exists, and what it looks
like, you can access it without worrying about how it is accessed
- design pattern abstractions or concerns - two concerns
- basic concern: functionality -what is the basic purpose of
software or a database
- specialization (is-a) - involved in inheritance.
ex: a TA is a special version of a student. They have all the
attributes of a student, and perhaps more
- aggregation (has a) - an element is made up of sub
elements, and the sub elements have no existence outside of
the upper level element. ex: a Mark is an element of a
student. Without the student, there can be no Mark
- association (uses a) - an element may use another
element, but both exist independently. ex: Course and student
each have independent existences, but they are associated
- special purpose concern: user interface, control, or timing
- views-a - the relationship between modules, i.e.
the way they look at each other. ex: a student and an
Instructor have different views of a Course. A student may
examine portions of a Course, whereas an Instructor may alter
Course data. This view is not inherent in the data, but rather
is imposed from outside independent of the data. (This is a
relatively 'new' software primitive).
- can have multiple user interfaces (analog and digital clock
show same data using different interface)
- Separating these concerns makes software easier to design,
implement, and reuse. (Why?)
- Databases may contain virtual data and/or computed data - a
different view of the data
- In the DCRIS database, a member is associated with a
narrow expertise by the memberNarrow table
- The memberNarrow table contains only the memberNarrowMemberId
and the memberNarrowNarrowId, which is not
particularly informative when read by a human. User can define a
view that gives the member name and the narrow expertise
description, narrowDesc. These are never actually
stored together in the same table, but exist together virtually in
the view.
- student assignment marks are stored as raw marks and assignment
value - percent result is a computed field that is
never stored, but derived from two other fields
- Concurrency insures that simultaneous updates are not committed by
separate users, and data integrity is maintained
- Redundancy is reduced by efficient data storage
- (see sample member, List, and narrow
data)
- members each appear once, and therefore occupy a full record
- Each member may have many areas of expertise, and are related
through memberNarrow. Use of ID fields minimizes (but
does not eliminate) redundancy
- Redundancy reduction improves data integrity:
- allows limiting of choice for data fields
- by storing information in only one place, eliminates
possibility that two or more pieces of data that should match
do not
- reduces storage space necessary
- reduces duplication of effort - if the data is already there,
why re-enter it?
- Rules can be designed into databases
- Moves procedures (code) from the application program into the
DBMS
- Application programs do not need to worry about applying the
rules, and thus the rules do not have to be programmed into every
application
- Triggers can be applied whenever data is inserted, updated, or
deleted. A trigger is an event that is 'triggered' by the
insertion, update, or deletion.
- Constraints or checks can be set upon data so that insert or
update attempts that violate the checks can be rejected
Relational Model
A domain is a set of atomic values. An atomic value is
a value that is indivisible; i.e. it cannot be broken down into smaller
values. This depends partly on how the value is to be used. A
Name
attribute (ex: "David B. Brown") may be considered atomic if only the
entire name value is used. However a
Name
could be broken down into
Surname
,
Forename
, and
Initial
should there be a use for the individual attributes. On the other hand a
student ID would be irreducibly atomic in that it cannot be broken down
further.
- A domain has a name, data type, and format. (
INTEGER
,
CHAR[n]
, VARCHAR[n]
, SMALLINT
,
amongst others.)
- A column header or field is an attribute. The content of an
attribute is a value, where the value is an element of the domain of
the attribute. (i.e. if the attribute domain is
INTEGER
,
then the attribute value must be INTEGER
). A NULL
is a special value, and is not necessarily allowed in all domains.
- A row or record is a tuple. A tuple is a set of
<attribute, value> pairs. At the abstract level, the order of
<attribute, value> pairs is irrelevant. (It is important to a
particular view of the data).
- A table is a relation. A relation is a set of tuples. Sets
are logically unordered, though they may have a physical order when
stored
- A simple sample relation:
relation name |
attributes |
student |
id |
surname |
forename |
tuples |
165465 |
Brown |
David |
458965 |
Zzap |
Zachary |
532595 |
Snord |
Cranston |
- A key is an attribute or set of attributes that can
uniquely identify a tuple. In the above example ID is the
key attribute because although names may be repeated, a student ID is
unique.
If an attribute is multivalued it should be represented by
separate relations. Thus if a
supervisor
has many
employees
this information cannot be stored in a single table because
employees
is multivalued. The relationship between the two must be broken up into
at least two tables.
Keys
Generally all tuples in a relation should be distinct
- this distinctiveness can be based upon some subset of the relation's
attributes: this set of attributes is a superkey
- A key is a minimal superkey: it is the smallest set of
attributes that can uniquely identify a tuple in a relation.
- A relation may have more than one key - each of these possible keys
is a candidate key.
- One of a relation's keys can be designated as the primary key.
- Ex: a student table may contain the attributes studentId,
OHIPNumber and SIN. Taken together these three
attributes form a superkey. As well any two of these attributes taken
together form a superkey. Individually all three are candidate keys
because each can be used to uniquely identify a tuple in student.
Only one can be chosen to be the primary key.
- Ex: In the case of a lookup table, like that between member
and narrow the primary key is a combination of the
attributes memberId and narrowId because only
the combination of these two attributes uniquely identifies a tuple in
the memberNarrow table.
- No primary key value should be null. (This is the entity
integrity constraint).
Key constraints between tables are called referential integrity
constraints.
- A tuple in one relation may refer to an existing tuple in another
relation.
- An attribute in one table is called a foreign key if its
values refer to the values of a primary key in another table.
- A foreign key may or may not be allowed to have a null value.
- These key constraints must hold when table data is inserted,
deleted, or updated.
DDL / DML
DDL: Data Definition Language
- specifies schemas
- may be different DDLs for external, conceptual and internal schemas
- meta-data is stored in a data dictionary
DML: Data Manipulation Language
- specifies data queries and updates
- two general ways of querying and updating a database
- through "stand alone" DML facilities
- from within application programs
- two kinds of DMLs: result sets may be
- navigational (one record at a time)
- non-navigational
SQL (Structured Query Language) is used as both a DDL and a DML.
Constraints
A referential key constraint can protect data in one table from
deletions or updates in another table.
- If a tuple in a parent table has children (i.e. tuples related by a
foreign key) then attempts to delete the parent before the children
are deleted fail. This is because the children are linked to the
parent's primary key by their foreign key value, and if the primary
key is deleted, their foreign key values become invalid. Thus the
deletion is rejected. Similarly this can prevent the primary key of a
parent from being updated. Once all the children are deleted, however,
the parent tuple may also be deleted.
A referential key constraint can also make deletions and updates
easier.
- Rather than rejecting the deletion of a parent, the deletion can be
cascaded. In this case when a parent tuple is deleted all
child tuples with matching foreign keys are deleted as well. Thus a
single command can be used to delete large amounts of data. Similarly
an update to a particular primary key value can be cascaded to all
child foreign keys.
The database designer must choose whether a particular relation rejects
or cascades deletions and updates.