General Notes

These notes use examples of Structured Query Language (SQL) code throughout. Not all dialects of SQL support the syntax shown here. This syntax is based upon MySQL. There are many dialects of SQL, including: SQL Anywhere, DB2, Oracle, amongst others. Depending on the dialect, some operations may have to be reworded, or restricted.

Introduction

What is a Database?

a collection of related data
has meaning and rules for relations between data elements

Database Applications

business: payroll, inventory control, banking, reservations
computer aided design (CAD)
computer aided software environment (CASE)
telecommunications systems
archiving and warehousing (large data set storage and analysis)
multimedia (object oriented databases)

Common Database Properties

data is formatted
data has integrity
data can be manipulated
(usually) large amounts of data
there is concurrency control
data is secure
has data independence

Database Management System (DBMS)

details of data storage and access is removed from application program
storage and access controlled by DBMS
applications access data through DBMS
reduces redundancy
minimizes risk of inconsistency
restricts access
can be accessed by multiple applications
data independent of application

Database and Application Development

Analysis

ignores details of structure and implementation
Answers the questions:
- why is a database needed?
- what goes into the database?
- what form is the data needed in?
- who needs to access the data?
- how is the database updated? (i.e. where do updates come from?)
can be done with Unified Modelling Language (UML) use cases

Design

lay out data structure(s) and relationships
design data access points (how do we get at the data?)
design application interface

Implementation

how data is stored, indexed, updated, and retrieved
data can be distributed across multiple systems
data can be accessed interactively or through scripts (ex: SQL)
scripts can be stored as part of the database, or in application programs
application programs must follow design rules of database

Data

Metadata

data about data (i.e.: catalog and description of data)
independent of the data itself - stored separately from the data
allows program independence
separates interface from implementation - details of storage implementation are hidden from application program
as long as you know that the data exists, and what it looks like, you can access it without worrying about how it is accessed

Data Relations

specialization (is-a): involved in inheritance. ex: a TA is a special version of a student. A TA has all the attributes of a student and more
aggregation (has a): an element is made up of sub elements, and the sub elements have no existence outside of the upper level element. ex: an address is an element of a student. Without the student, there can be no address
association (uses a): an element may use another element, but both exist independently. ex: course and student each have independent existences, but they are associated - a student can take a course, and a course can be taken by many students
views-a: the relationship between modules, i.e. the way they look at each other. ex: a student and an instructor have different views of a course. A student may examine portions of a course, whereas an instructor may alter course data. This view is not inherent in the data, but rather is imposed from outside independent of the data.
These conceptual relations can be modeled graphically in UML (Unified Modeling Language)
These conceptual relations are implemented in different ways depending on the type of application: procedural programming, object-oriented programming, or relational database design

Data Redundancy

Uncontrolled redundancy (repetition of data) can lead to problems:

duplication of effort: multiple adds and updates are required
wasted storage space: repeated data takes up more storage
inconsistent data: multiple occurrences of data can lead to update mistakes

Controlled redundancy can be used in specific instances - ex. data repeated in order to improve query performance. In such a case the DBMS must be designed to make sure that repeated data is automatically added or updated as necessary in order to avoid duplication of effort and inconsistent data.

Data Integrity

A DBMS must define and enforce constraints on its data. This helps to verify the correctness of the data, at least so far as the constraints are defined.

key constraint: unique identifier for a data item
referential integrity constraint: data that references other data - relates one set of data to another set
domain constraint: data type definition
general semantic integrity constraint: data must fit real-world rules
Triggers can be applied whenever data is inserted, updated, or deleted. A trigger is an event that is 'triggered' by the insertion, update, or deletion.
concurrency constraint: restrictions on multiple user access to a DBMS - insures that simultaneous updates are not committed by separate users, and data integrity is maintained

Data insert or update attempts that violate these integrity checks can be rejected by the DBMS. Application programs do not need to worry about applying the rules, and thus these integrity checks do not have to be programmed into every application. The application merely needs to handle any error exceptions provided by the DBMS.

The Relational Model

A data type is a domain. It has a name, data type, and format. (SQL domains include INTEGER, CHAR[n], VARCHAR[n], SMALLINT, amongst others)
A column header or field is an attribute. The content of an attribute is a value, where the value is an element of the domain of the attribute. (i.e. if the attribute domain is INTEGER, then the attribute value must be INTEGER). A NULL is a special value, and is not necessarily allowed in all domains.
A row or record is a tuple. A tuple is a set of <attribute, value> pairs. At the abstract level, the order of <attribute, value> pairs is irrelevant. (It is important to particular views of the data).
A table is a relation. A relation is a set of tuples. Sets are logically unordered, though they may have a physical order when stored.
A key is an attribute or set of attributes that can uniquely identify a tuple.
A view is a virtual relation - a description of how data should be computed or displayed is stored as metadata rather than the computed data itself.

Lesson 1: Introduction