A schema is a description of data contents, structure, and
possibly other aspects of a database
external schema: describes data as seen by an
application program or by an end user
conceptual schema: describes the base logical structure
of all data
internal schema: describes how the database is
physically encoded, including selection of files, indexes, etc.
The separation of external schema from conceptual schema enables logical
data independence. The separation of conceptual schema from internal
schema enables physical data independence.
Data Independence
Isolates applications from data description and changes to data. Data
structure description is stored separately from the access programs. In a
database, if a record structure is changed by adding a field, then the new
field will show up on the next access of the database. The DBMS itself does
not need to be changed in order for it to access the data. The details of
storage implementation are hidden from the user.
data about data is called meta-data (catalog and description of data)
independent of the data itself, and allows program independence
separates interface from implementation
as long as you know that the data exists, and what it looks like,
you can access it without worrying about how it is accessed
design pattern abstractions or concerns - two concerns
basic concern: functionality -what is the basic purpose of software
or a database
specialization (is-a) - involved in inheritance. ex: a
TA is a special version of a Student. They have all the attributes
of a Student, and perhaps more
aggregation (has a) - an element is made up of sub
elements, and the sub elements have no existence outside of the
upper level element. ex: a Mark is an element of a Student.
Without the Student, there can be no Mark
association (uses a) - an element may use another
element, but both exist independently. ex: Course and Student each
have independent existences, but they are associated
special purpose concern: user interface, control, or timing
views-a - the relationship between modules, i.e. the
way they look at each other. ex: a Student and an Instructor have
different views of a Course. A Student may examine portions of a
Course, whereas an Instructor may alter Course data. This view is
not inherent in the data, but rather is imposed from outside
independent of the data. (This is a relatively 'new' software
primitive).
can have multiple user interfaces (analog and digital clock show
same data using different interface)
Separating these concerns makes software easier to design,
implement, and reuse. (Why?)
Databases may contain virtual data and/or computed data - a different
view of the data
In the DCRIS database, a member is associated with a keyword
through member_keyword table
The member_keyword table contains only the member_id
and the keyword_ID, which are not particularly informative
when read by a human. User can define a view that gives the member
name and the keyword Description. These are never actually
stored together as such, but exist together virtually in the view.
Student assignment marks are stored as raw marks and assignment
value - Percent result is a computed field that is never
stored, but derived from two other fields
Concurrency insures that simultaneous updates are not comitted by
separate users, and data integrity is maintained
Redundancy is reduced by efficient data storage
(see sample member, List, and keyword
data)
members each appear once, and therefore occupy a full record
Each member may have many areas of expertise, and are related
through member_keyword. Use of ID fields minimizes (but does
not eliminate) redundancy
Redundancy reduction improves data integrity:
allows limiting of choice for data fields
by storing information in only one place, eliminates possibility
that two or more pieces of data that should match do not
reduces storage space necessary
reduces duplication of effort - if the data is already there, why
re-enter it?
Rules can be designed into databases
Moves procedures (code) from the application program into the DBMS
Application programs do not need to worry about applying the rules,
and thus the rules do not have to be programmed into every application
Triggers can be applied whenever data is inserted, updated, or
deleted. A trigger is an event that is 'triggered' by the insertion,
update, or deletion.
Constraints or checks can be set upon data so that insert or update
attempts that violate the checks can be rejected
Relational Model
A domain is a set of values and has a name, data type, and
format. (INTEGER, CHAR[n], VARCHAR[n], SMALLINT, etc.)
A column header or field is an attribute. The content of an
attribute is a value, where the value is an element of the domain of the
attribute. (i.e. if the attribute domain is INTEGER, then the attribute
value must be INTEGER). A NULL is a special value, and is not necessarily
allowed in all domains.
A row or record is a tuple. A tuple is a set of <attribute,
value> pairs. At the abstract level, the order of <attribute,
value> pairs is irrelevant. (It is important to a particular view of
the data).
A table is a relation. A relation is a set of tuples. Sets are
logically unordered, though they may have a physical order when stored
A simple sample relation:
relation name
attributes
STUDENT
Surname
Forename
ID
tuples
Brown
David
165465
Zzap
Zachary
458965
Snord
Cranston
532595
A key is an attribute or set of attributes that can uniquely
identify a tuple. In the above example ID is the key
attribute because although names may be repeated, a student ID is unique.