2020-12-16 07:12

CP363 : Data and Relations

The Three Schema Architecture

A schema is a description of data contents, structure, and possibly other aspects of a database

external schema: describes data as seen by an application program or by an end user
conceptual schema: describes the base logical structure of all data
internal schema: describes how the database is physically encoded, including selection of files, indexes, etc.

The separation of external schema from conceptual schema enables logical data independence. The separation of conceptual schema from internal schema enables physical data independence.

Data Independence

Isolates applications from data description and changes to data. Data structure description is stored separately from the access programs. In a database, if a record structure is changed by adding a field, then the new field will show up on the next access of the database. The DBMS itself does not need to be changed in order for it to access the data. The details of storage implementation are hidden from the user.

data about data is called meta-data (catalog and description of data)
- independent of the data itself, and allows program independence
- separates interface from implementation
- as long as you know that the data exists, and what it looks like, you can access it without worrying about how it is accessed
design pattern abstractions or concerns - two concerns
- basic concern: functionality -what is the basic purpose of software or a database
  - specialization (is-a) - involved in inheritance. ex: a TA is a special version of a Student. They have all the attributes of a Student, and perhaps more
  - aggregation (has a) - an element is made up of sub elements, and the sub elements have no existence outside of the upper level element. ex: a Mark is an element of a Student. Without the Student, there can be no Mark
  - association (uses a) - an element may use another element, but both exist independently. ex: Course and Student each have independent existences, but they are associated
- special purpose concern: user interface, control, or timing
  - views-a - the relationship between modules, i.e. the way they look at each other. ex: a Student and an Instructor have different views of a Course. A Student may examine portions of a Course, whereas an Instructor may alter Course data. This view is not inherent in the data, but rather is imposed from outside independent of the data. (This is a relatively 'new' software primitive).
  - can have multiple user interfaces (analog and digital clock show same data using different interface)
- Separating these concerns makes software easier to design, implement, and reuse. (Why?)
Databases may contain virtual data and/or computed data - a different view of the data
- In the DCRIS database, a member is associated with a keyword through member_keyword table
- The member_keyword table contains only the member_id and the keyword_ID, which are not particularly informative when read by a human. User can define a view that gives the member name and the keyword Description. These are never actually stored together as such, but exist together virtually in the view.
- Student assignment marks are stored as raw marks and assignment value - Percent result is a computed field that is never stored, but derived from two other fields
Concurrency insures that simultaneous updates are not comitted by separate users, and data integrity is maintained
Redundancy is reduced by efficient data storage
- (see sample member, List, and keyword data)
- members each appear once, and therefore occupy a full record
- Each member may have many areas of expertise, and are related through member_keyword. Use of ID fields minimizes (but does not eliminate) redundancy
- Redundancy reduction improves data integrity:
  - allows limiting of choice for data fields
  - by storing information in only one place, eliminates possibility that two or more pieces of data that should match do not
- reduces storage space necessary
- reduces duplication of effort - if the data is already there, why re-enter it?
Rules can be designed into databases
- Moves procedures (code) from the application program into the DBMS
- Application programs do not need to worry about applying the rules, and thus the rules do not have to be programmed into every application
- Triggers can be applied whenever data is inserted, updated, or deleted. A trigger is an event that is 'triggered' by the insertion, update, or deletion.
- Constraints or checks can be set upon data so that insert or update attempts that violate the checks can be rejected

Relational Model

A domain is a set of values and has a name, data type, and format. (INTEGER, CHAR[n], VARCHAR[n], SMALLINT, etc.)
A column header or field is an attribute. The content of an attribute is a value, where the value is an element of the domain of the attribute. (i.e. if the attribute domain is INTEGER, then the attribute value must be INTEGER). A NULL is a special value, and is not necessarily allowed in all domains.
A row or record is a tuple. A tuple is a set of <attribute, value> pairs. At the abstract level, the order of <attribute, value> pairs is irrelevant. (It is important to a particular view of the data).
A table is a relation. A relation is a set of tuples. Sets are logically unordered, though they may have a physical order when stored

A simple sample relation:

STUDENT	Surname	Forename	ID
relation name	attributes
tuples	Brown	David	165465
	Zzap	Zachary	458965
	Snord	Cranston	532595

A key is an attribute or set of attributes that can uniquely identify a tuple. In the above example ID is the key attribute because although names may be repeated, a student ID is unique.