2020-12-16 07:12

CP363 : Data

Metadata

data about data (i.e. a catalog and description of the data)
independent of the data itself - stored separately from the data
allows program independence
separates interface from implementation - details of the storage implementation are hidden from the application program
as long as you know that the data exists, and what it looks like, you can access it without worrying about how it is stored or retrieved
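
As a rough illustration of metadata being kept apart from the data it describes, the sketch below uses Python's built-in sqlite3 module. The student table and its columns are made up for the example; sqlite_master and PRAGMA table_info are SQLite's own catalog facilities, and other DBMSs expose their catalogs differently.

```python
import sqlite3

# In-memory database; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO student (id, name) VALUES (1, 'Ada')")

# The catalog (sqlite_master) describes the table; it does not hold its rows.
catalog = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

# The application needs only the names and types, not the storage layout.
rows = conn.execute("SELECT id, name FROM student").fetchall()

print(catalog)
print(columns)
print(rows)
```

The query on the last lines would keep working if the DBMS changed how it lays out the table on disk, which is the program independence described above.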

Data Relations

specialization (is-a): involved in inheritance. ex: a TA is a special version of a Student. A TA has all the attributes of a Student and more
aggregation (has-a): an element is made up of sub elements, and the sub elements have no existence outside of the upper level element. ex: an Address is an element of a Student. Without the Student, there can be no Address
association (uses-a): an element may use another element, but both exist independently. ex: Course and Student each have independent existences, but they are associated - a Student can take a Course, and a Course can be taken by many Students
views-a: the relationship between modules, i.e. the way they look at each other. ex: a Student and an Instructor have different views of a Course. A Student may examine portions of a Course, whereas an Instructor may alter Course data. This view is not inherent in the data, but rather is imposed from outside independent of the data.
These conceptual relations can be modeled graphically in UML (Unified Modeling Language)
These conceptual relations are implemented in different ways depending on the type of application: procedural programming, object-oriented programming, or relational database design
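
One possible object-oriented rendering of the first three relations is sketched below in Python. The class and attribute names follow the examples above, but the fields (street, city, assigned_course) and the enrol method are assumptions made for illustration. The views-a relation is not modelled here, since it is imposed from outside the data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Address:
    street: str
    city: str


@dataclass
class Course:
    code: str


@dataclass
class Student:
    name: str
    # aggregation (has-a): the Address lives inside its Student
    address: Address
    # association (uses-a): Course objects exist independently of any Student
    courses: List[Course] = field(default_factory=list)

    def enrol(self, course: Course) -> None:
        self.courses.append(course)


@dataclass
class TA(Student):
    # specialization (is-a): a TA has every Student attribute, and more
    assigned_course: Optional[Course] = None


cp363 = Course("CP363")
student = Student("Ada", Address("1 Example St", "Anytown"))
ta = TA("Sam", Address("2 Example St", "Anytown"), assigned_course=cp363)
student.enrol(cp363)

print(isinstance(ta, Student))   # a TA can be used wherever a Student is expected
print(student.courses)
```

In a relational design the same relations would more likely appear as tables linked by keys rather than as classes.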

Data Redundancy

Uncontrolled redundancy (repetition of data) can lead to problems:

duplication of effort: multiple adds and updates are required
wasted storage space: repeated data takes up more storage
inconsistent data: multiple occurrences of data can lead to update mistakes

Controlled redundancy can be used in specific instances - ex. data repeated in order to improve query performance. In such a case the DBMS must be designed to make sure that repeated data is automatically added or updated as necessary in order to avoid duplication of effort and inconsistent data.
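
A minimal sketch of controlled redundancy, assuming SQLite through Python's sqlite3 module: the enrolled column repeats information that could be counted from the enrolment table, and two triggers keep it in step automatically. The schema and the trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (
    code TEXT PRIMARY KEY,
    enrolled INTEGER NOT NULL DEFAULT 0   -- redundant: derivable from enrolment
);
CREATE TABLE enrolment (
    student_id INTEGER NOT NULL,
    course_code TEXT NOT NULL REFERENCES course(code),
    PRIMARY KEY (student_id, course_code)
);
-- The DBMS, not the application, keeps the repeated value up to date.
CREATE TRIGGER enrolment_added AFTER INSERT ON enrolment
BEGIN
    UPDATE course SET enrolled = enrolled + 1 WHERE code = NEW.course_code;
END;
CREATE TRIGGER enrolment_removed AFTER DELETE ON enrolment
BEGIN
    UPDATE course SET enrolled = enrolled - 1 WHERE code = OLD.course_code;
END;
""")

conn.execute("INSERT INTO course (code) VALUES ('CP363')")
conn.executemany("INSERT INTO enrolment VALUES (?, 'CP363')", [(1,), (2,), (3,)])
conn.execute("DELETE FROM enrolment WHERE student_id = 2")

# Fast lookup of the stored copy versus recomputing it from the source rows.
stored = conn.execute(
    "SELECT enrolled FROM course WHERE code = 'CP363'"
).fetchone()[0]
computed = conn.execute(
    "SELECT COUNT(*) FROM enrolment WHERE course_code = 'CP363'"
).fetchone()[0]

print(stored, computed)
```

The application issues only the inserts and the delete; it never updates the counter itself, so there is no duplicated effort and no chance for the two copies to drift apart.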

Data Integrity

A DBMS must define and enforce constraints on its data. This helps to verify the correctness of the data, at least so far as the constraints are defined.

key constraint: unique identifier for a data item
referential integrity constraint: data that references other data - relates one set of data to another set
domain constraint: data type definition
general semantic integrity constraint: data must fit real-world rules

Triggers can be applied whenever data is inserted, updated, or deleted. A trigger is an action that the DBMS runs automatically when it is 'triggered' by an insertion, update, or deletion.
concurrency constraint: restrictions on multiple user access to a DBMS - ensures that conflicting simultaneous updates are not committed by separate users, and data integrity is maintained

Data insert or update attempts that violate these integrity checks can be rejected by the DBMS. Application programs do not need to worry about applying the rules, and thus these integrity checks do not have to be programmed into every application.
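
The sketch below shows how a DBMS might reject rows that break key, domain, semantic, and referential constraints, again using SQLite via Python's sqlite3 module. The schema, the rule that year must lie between 1 and 4, and the try_insert helper are assumptions for illustration. Because SQLite is loosely typed by default, the domain constraint is approximated with a typeof check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys unenforced by default
conn.executescript("""
CREATE TABLE student (
    id INTEGER PRIMARY KEY,                      -- key constraint
    name TEXT NOT NULL,
    year INTEGER NOT NULL
        CHECK (typeof(year) = 'integer')         -- domain constraint (approximated)
        CHECK (year BETWEEN 1 AND 4)             -- semantic rule (assumed)
);
CREATE TABLE enrolment (
    student_id INTEGER NOT NULL REFERENCES student(id),  -- referential integrity
    course_code TEXT NOT NULL
);
""")


def try_insert(sql, params):
    """Return None on success, or the DBMS's error message if the row is rejected."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


ok = try_insert("INSERT INTO student VALUES (?, ?, ?)", (1, "Ada", 2))
duplicate_key = try_insert("INSERT INTO student VALUES (?, ?, ?)", (1, "Bob", 3))
bad_type = try_insert("INSERT INTO student VALUES (?, ?, ?)", (2, "Cy", "two"))
bad_year = try_insert("INSERT INTO student VALUES (?, ?, ?)", (3, "Di", 9))
dangling_ref = try_insert("INSERT INTO enrolment VALUES (?, ?)", (99, "CP363"))

for result in (ok, duplicate_key, bad_type, bad_year, dangling_ref):
    print(result)
```

Only the first insert succeeds. The application code contains no validation logic of its own; each rejection comes from the constraint declared in the schema.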

The Relational Model

A data type is a domain. It has a name, an underlying data type, and a format. (SQL domains include INTEGER, CHAR(n), VARCHAR(n), SMALLINT, etc.)
A column header or field is an attribute. The content of an attribute is a value, where the value is an element of the domain of the attribute. (i.e. if the attribute domain is INTEGER, then the attribute value must be INTEGER). A NULL is a special value, and is not necessarily allowed in all domains.
A row or record is a tuple. A tuple is a set of <attribute, value> pairs. At the abstract level, the order of <attribute, value> pairs is irrelevant. (The order matters only to particular views of the data.)
A table is a relation. A relation is a set of tuples. Sets are logically unordered, though they may have a physical order when stored.
A key is an attribute or set of attributes that can uniquely identify a tuple.
A view is a virtual relation - a description of how the data should be computed or displayed is stored as metadata, rather than the computed data itself.
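
To make the last point concrete, here is a small sketch of a view in SQLite through Python's sqlite3 module. The student table and the senior view are hypothetical. Only the view's definition is stored in the catalog, so its contents are recomputed from the base relation each time it is queried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL, year INTEGER NOT NULL);
INSERT INTO student VALUES (1, 'Ada', 1), (2, 'Bob', 4);
CREATE VIEW senior AS SELECT id, name FROM student WHERE year = 4;
""")

before = conn.execute("SELECT name FROM senior ORDER BY id").fetchall()

# Changing the base relation changes what the view returns; nothing is copied.
conn.execute("INSERT INTO student VALUES (3, 'Cy', 4)")
after = conn.execute("SELECT name FROM senior ORDER BY id").fetchall()

# The view exists in the catalog as a stored query, i.e. as metadata.
definition = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'view' AND name = 'senior'"
).fetchone()[0]

print(before)
print(after)
print(definition)
```

The ORDER BY is needed because a relation is a set of tuples with no inherent order; without it the DBMS may return rows in any sequence.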