A Level Computing - COMP3 Databases



Blog Posts


Conceptual Data Modelling

Conceptual Data Model

A systems analyst designing a new system would need to produce a conceptual data model describing how data needs to be stored - how elements within the data are related. A conceptual data model is independent of any software that may be used for implementation.

The following key terms are used,

  • Entity - an object, event or person about which data is stored.
  • Attribute - a property or characteristic of an entity - like a field within a table
  • Relationship - a link or association between entities.
  • Degree of Relationship - the nature of the relationship between two entities

One-To-One

Eg. husband - wife

One-To-Many

Eg. parent - children, customer - order, sport - team, team - result

Many-To-Many

Eg. student - course, film - film star

You can also have a Many-To-One relationsip between entities. The arrows would be the mirror of the One-To-Many diagram.

Example

The data requirements for a database of school sports results are as follows.

A number of sports are played in the school. Each sport has a different name. A number of teams play each sport. Each team has a different name (eg U14A etc). The results of each match are to be stored including the date, points for and against, the result and the opposition.

Suggest 3 entities, listing their attributes. Suggest which attribute(s) in each entity will uniquely identify the records stored. Draw an entity relationship diagram to show the relationship between the entities.

More Examples

Draw Entity Relationship Diagrams To Illustrate the Relationships

  1.  Employee - Company Car
  2. Ward - Patient
  3. Album - Singer
  4. Product - Component
  5. Homeowner - Main Residence
  6. Racehorse owner - Racehorse


Database Design

Relational Databases - Key Terms

A relational database is a collection of tables.

In a relational database, we use the term relation to describe the headings and data that model an entity (table). The word refers not only to the attributes (field names and data types) but to the valid attribute values stored under each attribute. A set of attribute values is called a tuple. This is analogous to the rows in our database table.

In a relational database, each tuple or row must have a unique identifier. Often, one attribute is unique to each tuple and is used as the Primary Key in a table definition. Where no single attribute is unique to a tuple, a combination of two or more attributes is used to identify the tuple - this is called a Composite Primary Key.

Relationships are modelled by having common attributes among entities. An attribute in a table that is the primary key of another table is called a Foreign Key.

Since there is related data in several tables in a relaionship data, updates to one table may require updates to related tables. The need to cascade updates across more than table is called Referential Integrity.

Normalisation

The aim of database design is to store all of the data without unnecessary duplication of information. Duplicating data requires more work on maintaining referential integrity and leads to errors. Normalisation is a process that allows us to optimise a data model to avoid duplicating data.

Example - The Context

Pilcon Electronics is a manufacturer with a traditional hierarchical staffing structure. The personnel department is setting up a database to hold personal staff details which are currently stored manually. Each member's staff details are held on a single sheet.

The first thing we need to do is normalise the data. Data normalisation is a technique established by E F Codd to simplify the structure of data and to overcome the problems of data duplication in a database.

Original Structure

StaffMember(Staff Code, Surname, Initials, Job Title, Salary, Start Date, Pension, Dept Name, DeptHead [Course Date, Course Title, Duration, Certificated])

The fields in square brackets are repeating fields. The data as it is could not be represented in a single table.

First Normal Form

Data is in first normal form if there are no repeating fields. To put this data into first normal form we need to create a new entity for the repeating fields.

STAFFMEMBER-1 (StaffCode, Surname, Initials, Job Code, JobTitle, Salary, StartDate, Pension, DeptCode, DeptName, DeptHead)
TRAINING-1 (CourseDate, StaffCode, CourseTitle, Duration, Certificated)

Second Normal Form

To put the data into second normal form we need to remove non-key fields from tables with composite primary keys where those fields are not functionally dependent on the whole of the primary key.

STAFFMEMBER-2 (StaffCode, Surname, Initials, Job Code, JobTitle, Salary, StartDate, Pension, DeptCode, DeptName, DeptHead)
TRAINING-2 (CourseDate, StaffCode, CourseTitle, Duration, Certificated)

In this example, the data is already in second normal form since course details only appear in the DB if a staff member attends a course.

Third Normal Form

As far as we are concerned, this is the final stage of data normalisation. To place data in third normal form we need to remove non-key attributes that could act as a unique identifier in a separate table.

STAFFMEMBER-3 (StaffCode, Surname, Initials, Job Code, Salary, StartDate, Pension, DeptCode)
TRAINING-3 (CourseDate, StaffCode, CourseTitle, Duration, Certificated)
JOB-3 (JobCode, JobTitle)
DEPARTMENT-3 (DeptCode, DeptName, DeptHead)

Having the job and department tables allows us to edit details of these in one location and avoids the problems of duplication and inconsistency.

Data Dictionary

The following tables describe the data structure for the Pilcon Electronics database.

Staff Member

Field NameData TypeField SizeInput MaskValidation RuleValidation TextRequiredIndexed
StaffCodetext3>=100 and <=200Range 100 to 200YesYes (no duplicates)
Surnametext25NoNo
Initialstext4NoNo
JobCodetext3LL0NoNo
DeptCodetext3LLL"ACC" or "GOS" or "MIS" or "PER" or "PRO" or "PUR" or "RAD" or "SAL" or "WAR"Invalid CodeNoYes (duplicates OK)
Salarycurrency0DP>=8000 and <=45000Range 8000 to 45000NoNo
StartDatedate<=now()Invalid DateNoNo
PensionYes/noNoNo

Training

Field NameData TypeField SizeInput MaskValidation RuleValidation TextRequiredIndexed
CourseDatedateYesYes (Duplicates OK)
StaffCodetext3>=100 and <=200Range 100 to 200YesYes (Duplicates OK)
CourseTitletext30 NoNo
DurationnumberInteger 0 dp>=1 and <=10Range 1 to 10NoNo
CertificatedYes/noNoNo

Job

Field NameData TypeField SizeInput MaskValidation RuleValidation TextRequiredIndexed
JobCodetext3LL0YesYes (No Duplicates)
JobTitleText35NoNo


Department

Field NameData TypeField SizeInput MaskValidation RuleValidation TextRequiredIndexed
DeptCodetext3LLL"ACC" or "GOS" or "MIS" or "PER" or "PRO" or "PUR" or "RAD" or "SAL" or "WAR"Invalid CodeYesYes (duplicates OK)
DeptTitleText35NoNo
DeptHeadtext3>=100 and <=200Range 100 to 200YesYes (No Duplicates)


Relationships

The following relationships need to be established,

  • DeptCode in DEPARTMENT has a one-to-many relationship with DeptCode in STAFFMEMBER. There may be many staff in each department but each member of staff can only belong to one department.
  • JobCode in JOB has a one-to-many relationship with JobCode in the STAFFMEMBER table. Each member of staff has only one job title but many members of staff may have the same job title.
  • StaffCode in STAFFMEMBER has a one-to-many relationship with StaffCode in the TRAINING table. Each member of staff may undertake more than one course of training.
  • StaffCode in STAFFMEMBER has a one-to-many relationship with DeptHead in the DEPARTMENT table.


Database Basics - Online Lesson

Database Normalisation - Online Lesson

TeachICT Normalisation Theory and Task Booklet


Structured Query Language

DDL

More to come....

DML

More to come....

SQL

SQL is a

data access language

and not a programming language.

To query a Microsoft Access Database using SQL

  • Select Queries from the database window.
  • Open a new Query in Design View.
  • Close the Add Table box without adding a table.
  • Click on Query, SQL Specific, Data Definition.

For the purposes of the examination you are required to know how to use SELECT queries only. You can use SQL for all of the same types of query that you use through access. You may wish to learn more about SQL when you are using Access with Visual Basic.

For the examples below use the Access 2000 database, trek2000.mdb which is at the bottom of this post. You may find it useful to peek at the tables before you begin. Copy the examples and save all of the queries that you write.

Section 1

SELECT [Episode Number], [Episode Title], Series, Season
FROM tblEpisode
WHERE Series ="TOS";

This produces a table of episodes from The Original Series of Star Trek. There are 79 records for this series in the database. Square brackets are used where field names have spaces in their titles.

Use the format shown above to write queries that,

  • produce a table of episodes from the DS9 series of Star Trek showing only the episode number and title of each record.
  • produce a table of episodes from the TNG series of Star Trek showing only the episode number and title of each record.
  • produce a table of episode titles and story authors where Gene Rodenberry is the sole author of the episode. (Gene is the daddy of Star Trek)

Section 2

SELECT DISTINCT Series
FROM tblSeries;

You should now have a table showing 3 records from the tblSeries table. They should be TOS, TNG and DS9. Duplicate records have been removed with the DISTINCT statement.

Write SQL queries that,

  • produce a table of episode titles and story authors showing only distinct records.
  • produce a table of episode titles and story authors from the TNG series showing only distinct records.

Section 3

SELECT *
FROM tblEpisode
WHERE [Teleplay Author] IS NULL;

You should now have a table showing 123 episodes for which there is no value in the Teleplay Author field.

Section 4

SELECT *
FROM tblEpisode
WHERE [Episode Number] > 20 and Series ="TNG";

You should have 106 records.

Write SQL queries that,

  • produce a table of episodes showing the first 3 episodes in each series.
  • produce a table of series shown after 1987 (think about the date format and for Access include a # either side of the date).

Section 5

Other Conditions that can be specified in the WHERE line are,

ConditionMeaningExample
LIKESimilar to.[Episode Title] LIKE "Q*"
BETWEEN...ANDWithin a range of values including the two values which define the limits.[Episode Number] BETWEEN 20 and 50
OREither condition must be true for the record to be displayed.[Story Author] = "Gene Rodenberry" OR [Story Author] ="D C Fontana"

Operators such as >, <>, >=, <= can be used in SQL and work as you would expect.

Modify the example query from Number 4 for each of the above examples.

Section 6

SELECT tblEpisode.[Story Author], *
FROM tblEpisode
WHERE Series= "TNG"
ORDER BY [Episode Title];

This will produce a table of episodes from the TNG series, listing the episodes in alphabetical order by Title.

Write SQL queries that,

  • produce a table of episodes from the TOS series, listing the episodes in alphabetical order by Episode Title.
  • produce a table of episodes written by Gene Rodenberry, listing the episodes in numerical order by Episode Number.

Section 7

SELECT Sum([Episode Number]) AS [Episode Nos Added],[Story Author]
FROM tblEpisode
Group by [Story Author];

This query does not make a lot of sense since the Episode Numbers are used to identify episodes and not for any arithmetic purpose. You should have produced a list of Story Authors and another field showing the value produced when the episode numbers of the episodes that they wrote are added together. More use in another database. Notice that you have 'created' a field here.

Supplementary Tasks

Write queries that,

  1. produce a table of episodes which include the word Time or Data in the Episode Title field. (The character Data is very popular as are episodes about Time).
  2. tell you which series of the three included in this database uses the word 'Space' most in episode titles.

Explain the nature and purpose of a database query language. (3)

The Trek database began with the following data structure (repeating fields not identified).

Episode(Episode Number, Episode Title, Series, Season, Story Author, Teleplay Author, Date First Aired)

The database has not been properly normalised and, although it is usable for the above tasks, would need to be improved if it were used in another context.

  • Identify the problem field(s) that exist in this database, explaining why they would need to be modified if the database needed to be used in a professional context.
  • Convert the data structure to 3rd Normal Form showing all of the stages involved.
  • Draw an entity relationship model (ERM) for the data structure that you have specified.

Use the WWW to locate a more comprehensive introduction to using SQL queries in Access 2000. Concentrate on finding a resource that you will be able to refer to when completing your project. (SELECT queries are most useful).


Other Query Types

Delete
Whenever you delete information from a table, you want to take care not to delete any data that you still need. DELETE queries in SQL are relatively straightforward to write and can be easier to read.

If you wanted to delete the record for the Acacia seeds, you would write,

DELETE FROM tblseeds WHERE name = "Acacia";

Often we want to remove data in a more complex manner. The following query removes all records with a seed name starting with the letter A.

DELETE FROM tblseeds WHERE name like 'A%';

Update

Update queries are used to change the data in the table. You can change one row at a time or lots of rows depending on how you write the WHERE clause of your query. The following query will change the price of Acacia seeds to £9.

UPDATE tblseeds SET price = '9.00' WHERE name = 'Acacia';

SQL for Queries - Online Lesson
SQL for DDL, Creating the Data Structure - Online Lesson
SQL for DML, editing records - Online Lesson