General Notes
These notes use examples of Structured Query Language (SQL)
code throughout. Not all dialects of SQL support the syntax shown here.
This syntax is based upon MySQL.
There are many dialects of SQL, including: SQL Anywhere, DB2, Oracle,
amongst others. Depending on the dialect, some operations may have to be
reworded, or restricted.
Introduction
What is a Database?
- a collection of related data
- has meaning and rules for relations between data elements
Database Applications
- business: payroll, inventory control, banking, reservations
- computer aided design (CAD)
- computer aided software environment (CASE)
- telecommunications systems
- archiving and warehousing (large data set storage and analysis)
- multimedia (object oriented databases)
Common Database Properties
- data is formatted
- data has integrity
- data can be manipulated
- (usually) large amounts of data
- there is concurrency control
- data is secure
- has data independence
Database Management System (DBMS)
- details of data storage and access is removed from application
program
- storage and access controlled by DBMS
- applications access data through DBMS
- reduces redundancy
- minimizes risk of inconsistency
- restricts access
- can be accessed by multiple applications
- data independent of application
Database and Application Development
Analysis
- ignores details of structure and implementation
- Answers the questions:
- why is a database needed?
- what goes into the database?
- what form is the data needed in?
- who needs to access the data?
- how is the database updated? (i.e. where do updates come from?)
- can be done with Unified Modelling Language (UML) use cases
Design
- lay out data structure(s) and relationships
- design data access points (how do we get at the data?)
- design application interface
Implementation
- how data is stored, indexed, updated, and retrieved
- data can be distributed across multiple systems
- data can be accessed interactively or through scripts (ex: SQL)
- scripts can be stored as part of the database, or in application
programs
- application programs must follow design rules of database
Data
Metadata
- data about data (i.e.: catalog and description of data)
- independent of the data itself - stored separately from the data
- allows program independence
- separates interface from implementation - details of storage
implementation are hidden from application program
- as long as you know that the data exists, and what it looks like,
you can access it without worrying about how it is accessed
Data Relations
- specialization (is-a): involved in inheritance. ex: a TA
is a special version of a student. A TA has all the
attributes of a student and more
- aggregation (has a): an element is made up of sub elements,
and the sub elements have no existence outside of the upper level
element. ex: an address is an element of a student.
Without the student, there can be no address
- association (uses a): an element may use another element,
but both exist independently. ex: course and student
each have independent existences, but they are associated - a student
can take a course, and a course can be taken by many
students
- views-a: the relationship between modules, i.e. the way
they look at each other. ex: a student and an instructor
have different views of a course. A student may
examine portions of a course, whereas an instructor
may alter course data. This view is not inherent in the data,
but rather is imposed from outside independent of the data.
- These conceptual relations can be modeled graphically in UML
(Unified Modeling Language)
- These conceptual relations are implemented in different ways
depending on the type of application: procedural programming,
object-oriented programming, or relational database design
Data Redundancy
Uncontrolled redundancy (repetition of data) can lead to problems:
- duplication of effort: multiple adds and updates are
required
- wasted storage space: repeated data takes up more storage
- inconsistent data: multiple occurrences of data can lead to
update mistakes
Controlled redundancy can be used in specific instances - ex. data
repeated in order to improve query performance. In such a case the DBMS
must be designed to make sure that repeated data is automatically added
or updated as necessary in order to avoid duplication of effort and
inconsistent data.
Data Integrity
A DBMS must define and enforce constraints on its data. This helps to
verify the correctness of the data, at least so far as the constraints
are defined.
- key constraint: unique identifier for a data item
- referential integrity constraint: data that references
other data - relates one set of data to another set
- domain constraint: data type definition
- general semantic integrity constraint: data must fit
real-world rules
Triggers can be applied whenever data is inserted, updated,
or deleted. A trigger is an event that is 'triggered' by the
insertion, update, or deletion.
- concurrency constraint: restrictions on multiple user
access to a DBMS - insures that simultaneous updates are not committed
by separate users, and data integrity is maintained
Data insert or update attempts that violate these integrity checks can
be rejected by the DBMS. Application programs do not need to worry about
applying the rules, and thus these integrity checks do not have to be
programmed into every application. The application merely needs to
handle any error exceptions provided by the DBMS.
The Relational Model
- A data type is a domain. It has a name, data type, and
format. (SQL domains include
INTEGER
, CHAR[n]
,
VARCHAR[n]
, SMALLINT
, amongst others)
- A column header or field is an attribute. The content of an
attribute is a value, where the value is an element of the domain of
the attribute. (i.e. if the attribute domain is
INTEGER
,
then the attribute value must be INTEGER
). A NULL
is a special value, and is not necessarily allowed in all domains.
- A row or record is a tuple. A tuple is a set of
<attribute, value> pairs. At the abstract level, the order of
<attribute, value> pairs is irrelevant. (It is important to
particular views of the data).
- A table is a relation. A relation is a set of tuples. Sets
are logically unordered, though they may have a physical order when
stored.
- A key is an attribute or set of attributes that can
uniquely identify a tuple.
- A view is a virtual relation - a description of how data
should be computed or displayed is stored as metadata rather than the
computed data itself.