CP363 : Distributed Databases

A distributed computing system consists of a number of processing elements connected by a computer network. Tasks are distributed amongst these components. A distributed database (DDB) is a group of interconnected databases spread over a network, and a distributed database management system (DDBMS) is system software that manages a DDB. To a user, a distributed database looks the same as a centralized database.

Types of multiprocessor system architecture:

shared memory architecture

shared disk architecture

shared nothing architecture

In more widely-spread architectures multiple servers are connected by a communication network. The various servers need not be of the same type; they need only share the same DDBMS.

Why distributed database?

Data distributed amongst many sites is more likely to be always available, and may be closer to where it is needed most. Transactions too are distributed amongst many sites, thus spreading the load. Some transactions may have various elements run in parallel (subqueries, for example). Expansion is easier, whether that means adding more data, adding processors, or increasing database size.
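As a rough illustration of running the elements of a query in parallel, the sketch below sends the same subquery to several sites at once and merges the partial results. The site names, the in-memory rows, and the functions `local_subquery` and `distributed_query` are all hypothetical stand-ins; a real DDBMS would do this inside its query processor.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sites, each holding its own rows of (customer, order_total).
SITES = {
    "site_a": [("alice", 120), ("bob", 80)],
    "site_b": [("carol", 200)],
    "site_c": [("dave", 50), ("erin", 150)],
}


def local_subquery(site, min_total):
    """Run the selection against the rows stored at one site."""
    return [row for row in SITES[site] if row[1] >= min_total]


def distributed_query(min_total):
    """Run the subquery at every site in parallel, then merge the results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda s: local_subquery(s, min_total), SITES))
    return sorted(row for part in partials for row in part)
```

Calling `distributed_query(100)` gathers the qualifying rows from all three sites; the caller never needs to know which site held which row.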

Database must be transparent in a number of ways:

Issues:

Distributed systems add a great deal of overhead that must be dealt with for the system to work properly.


Replication

fully replicated distributed database: the entire database is copied to multiple sites - great for availability and reliability, but costly for updates, since every copy must be kept consistent

non-redundant allocations: no data is repeated between sites, i.e. all fragments are disjoint

partial replication: some fragments may be replicated at some sites. Especially prevalent where mobile workers need access to data in the field, for example salespeople, insurance claims adjusters, and financial planners
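The three allocation schemes above can be told apart by looking at which fragments each site holds. The following is a minimal sketch, assuming an allocation is represented as a mapping from site name to a set of fragment names; the function name `classify_allocation` and the fragment names are made up for illustration.

```python
def classify_allocation(allocation, all_fragments):
    """Classify an allocation given as {site: set of fragment names}."""
    holdings = list(allocation.values())

    # Fully replicated: more than one site, and every site holds every fragment.
    if len(holdings) > 1 and all(frags == all_fragments for frags in holdings):
        return "fully replicated"

    # Count how many sites hold each fragment.
    copies = {}
    for frags in holdings:
        for name in frags:
            copies[name] = copies.get(name, 0) + 1

    # Non-redundant: no fragment is stored at more than one site.
    if all(count == 1 for count in copies.values()):
        return "non-redundant"

    # Otherwise some, but not all, data is repeated between sites.
    return "partially replicated"
```

For example, `classify_allocation({"A": {"f1", "f2"}, "B": {"f2"}}, {"f1", "f2"})` reports partial replication, because only fragment `f2` is stored twice.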

Choosing the best fragment allocation is a difficult optimization problem. Where is data most likely updated? Where is it most likely accessed? How important is 100% up time? How fast/easy is communication between systems?
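To make the trade-off concrete, here is a toy cost model for a single fragment. It is only a sketch under simplifying assumptions that are not part of these notes: a read is free when the fragment is stored locally and costs `remote_cost` otherwise, an update must be sent to every replica other than the one at the updating site, and all sites are equally far apart. The function names and the frequencies in the usage note are invented.

```python
from itertools import combinations


def allocation_cost(replicas, reads, updates, remote_cost=1.0):
    """Total communication cost of storing one fragment at `replicas`.

    reads and updates map each site to how often it reads or updates
    the fragment.
    """
    # A read is free if the fragment is stored locally, remote otherwise.
    read_cost = sum(
        freq * (0 if site in replicas else remote_cost)
        for site, freq in reads.items()
    )
    # An update must be propagated to every replica at another site.
    update_cost = sum(
        freq * remote_cost * sum(1 for r in replicas if r != site)
        for site, freq in updates.items()
    )
    return read_cost + update_cost


def best_replica_set(sites, reads, updates, remote_cost=1.0):
    """Brute-force search over every non-empty set of sites."""
    best = None
    for k in range(1, len(sites) + 1):
        for combo in combinations(sites, k):
            cost = allocation_cost(set(combo), reads, updates, remote_cost)
            if best is None or cost < best[0]:
                best = (cost, set(combo))
    return best
```

With sites `["A", "B", "C"]`, heavy reads everywhere and only rare updates, the search favours copying the fragment to every site; when one site does nearly all the updating and reads are rare, it favours a single copy at that site. The exhaustive search grows exponentially with the number of sites, which hints at why the real problem is hard.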


Types of Distributed Database Systems

homogeneous - all servers and software are of the same type

heterogeneous - the servers or software differ between sites

local autonomy - implies that local transactions are allowed direct access to a server

Federated Database Management System (FDBS)

A heterogeneous database system. Needs a global schema manager to coordinate all of the disparate data elements.

multidatabase system

Also heterogeneous, but does not have a global schema manager; it constructs a global schema on the fly as needed by an application.


Concurrency Control and Recovery in Distributed Databases

Problems unique to a distributed environment: