Category Archives: Database

database

Database tips? SQL Lesson Tips – Notes from COMP1630 at BCIT

databaseWhy Learn SQL?

People may want to learn SQL for different reasons.  Some are computer programmers, some are entrepreneurs, and others may be even auditors who need to know what’s going on with calculations in order to catch mistakes.

That’s true, turns out that if an auditor from Revenue Canada or IRS shows up at your office and you tell them that your computer system calculates everything, they could ask you to see the code for the SQL query!

 

SQL and its advantages:

  • Non-procedural language – used to specify what information required rather than how to get it (4GL)
  • Free-format – statements can be typed at any locations on the screen
  • Used for data manipulation, data definition and data administration

 

Getting Started with SQL and Database Design

In an article previously published, we discussed some benefits of proper database design.  It’s important to have your plan perfected before starting to use any software to make it happen.

If you are also learning SQL, you may find these materials from our lecture room helpful.  For our class we used SQL Server 2012 RDBMS platform. Did you know that such SQL server can hold a maximum of 32,767  databases?  For home I am downloading SQL Server 2008 R2 Express software from my school, BCIT.

Entities translate into tables, attributes into columns.  RDBMS automatically indexes primary keys and foreign keys. There are two types of tables – user tables (filled with information from the database) and system tables (contains database description).

Beside tables with data we can store other objects, such as views, indexes, stored procedures, and triggers. Because SQL language allows for that, it makes simple and complex queries possible.

 

Using Proper Syntax

For writing commands we need to use reserved words without any spelling errors and also user-defined words.  It is crucial to follow the syntax rules in order to formulate some working SQL statements, like for example using single quotes for enclosing values. But on the lighter note, all commands are case-insensitive.  For naming attributes you can even use characters like @, #, and _ (underscore).  However , it is recommended to keep it consistent and to use upper case letters to represent reserved words and lower case letters to represent user-defined words.

Keep in mind that certain symbols at the beginning of an identifier have special meaning.  Table names must follow the rules for naming identifiers and the name itself must be unique in the database.

  • For a local variable or parameter the identifier begins with @
  • DO NOT begin identifiers with @@ because it is used in SQL function names

Each SQL statement (not line) should be ended with a semi-colon – it signifies the end of the statement.

 

Difference between DDL, DML and DCL commands

  • Data Definition Language (DDL) – for defining database structure and controlling access to the data, examples of such commands are CREATE, ALTER, DROP, TRUNCATE, COMMENT, RENAME.  There are two ways of implementing DDL – doing it by script and doing it in the Management Studio (GUI).
  • Data Manipulation Language (DML) – for retrieving and updating data, examples of such commands are SELECT, INSERT, UPDATE, DELETE, MERGE, UPSERT, CALL, EXPLAIN PLAN, LOCK TABLE
  • Data Control Language (DCL) – is used for giving and restricting access privileges to the database, these are commands like GRANT and REVOKE

 

Various Data Types

A data type is a constraint on what kind of data can be entered.  The most simple data types are integer, character, date, timestamp and money / smallmoney.  These are probably the ones you’ll see most of the time.

Integers for example also have several data types – bigint, int, smallint, tinyint (0 to 255), bit (1 or 0).

There is also decimal and numeric types.  Money types, quite interestingly, have 4 decimal places for accuracy. For numbers with more decimal places there is float and real data types.

Non-unicode type of data is recorded as char, varchar, and text.  If you are using less than 8,000 characters for a field, use char as this is the smallest out of these.

Date and time data also has its limits.  You may be in a sense amused about what they call “limits” as I am puzzled to imagine for what practical application I’d need to go that far back in time or that far forward.

  • Datetime: from 1st January 1753 through 31st December 9999 with accuracy of 3/100ths of a second (3.33 ms)
  • Smalldatetimefrom 1st January 1900 through 6th June 2079 with accuracy of one minute

The benefit of selecting proper datatypes is saving your disc space.  It also prevents people from entering clearly invalid data.

User-Defined Data Types

Still not enough data types for you?  You can create user-defined datatypes.  Such datatypes are good when you want to enforce data integrity.  They are used when several tables must store the same type of data in a column and you must ensure that these columns have exactly the same data type, length, and null ability

Such user-defined data types can apply a name to a data type that is more descriptive of the types of values to be held in the object.

Must supply:

  • Name
  • System data type upon which new data type is based
  • Nullability (whether data type allows null values)

In the Management Studio such data types can be created by using the programmability option for the current database. Once that is selected, it’s quite intuitive to create a new one.

If using new query editor window, here is an example of a command to create your own data type.  After typing that, click Execute button to run the query and save your result.

CREATE TYPE empidt FROM char(9) NOT NULL;

 

Here is an example of actually putting that data type to use:

CREATE TABLE employee
(
emp_id empidt,
fname char(20) NOT NULL,
minit char(1),
lname char(30) NOT NULL,
job_id smallint DEFAULT 1
);

Creating a Database

The first step is to create the physical files that will hold the database.  The CREATE DATABASE statement must run in autocommit mode (the default transaction management mode).

Here is an example of a script that will do that.

Using New Query Editor Window:

CREATE DATABASE cust_sales

This one phrase is enough to create the database, but if you wish to be more specific by imposing your own defaults, you can use the example below.

CREATE DATABASE product
ON
( NAME = product_data,
FILENAME = ‘C:\Data\product_data.mdf’,
SIZE =5,
MAXSIZE = 15,
FILEGROWTH = 1
)
LOG ON
( NAME = product_log,
FILENAME = ‘C:\Data\product_log.ldf’,
SIZE = 2,
MAXSIZE = 5,
FILEGROWTH = 1
)

 

Log file is used to store changes made to the database.  This is a smart way to keep track of going on, especially with multiple users connected to that database.

There is the primary master database and possibly with several users in it.  Do not create any user objects in the master database because it contains the system tables.  The master database should be backed up whenever a user database is created, modified, or dropped.

What is a Master Database?

The purpose of a master database is to be the core system database to manage the SQL Server instance.  In SQL Server 2005, the Master database is the logical repository for the system objects residing in the sys schema.  In SQL Server 2000 and previous editions of SQL Server, the Master database physically stored all of the system objects.

 

What is a User Database? 

A user database is any database created by the user, such as Pubs.  As mentioned above, SQL Microsoft Server can store close to 33 thousand of such databases.

 

Deleting a Database

This command will remove the database and all of its contents. Here is the correct Syntax:

DROP DATABASE database name

 

For example, to delete the Pubs database use the following command:

DROP DATABASE pubs

 

You can also remove it via Management Studio, which is more user-friendly like any other Microsoft software.  Make sure your database is not in use when you are doing that.  There is a ‘close connection’ box in Management Studio for that.

 

Creating a Table

The purpose of a table is storage and manipulation of data.  It contains rows and columns.  Each row represents a unique record, and each column represents a field within the record.

In the Management Studio

  • Click on correct database (expand)
  • Right click on Tables
  • Click on New Table and the rest is intuitive

 

In the query window use the following code:

CREATE TABLE publishers
(
pub_id int NOT NULL,
pub_name varchar(40),
city varchar(20),
province char(2) DEFAULT ‘BC’,
postal_code char(6)
);

or something like this if you want to specify your primary key,

CREATE TABLE invoice
(
inv_number csid PRIMARY KEY,
cus_code char(10) NOT NULL,
inv_date datetime,
cus_initial char(10),
cus_areacode char(3),
cus_phone char(8),
cus_balance smallmoney
);

Keep in mind that column definitions appear in a comma-separated list enclosed in parentheses.  Also, order of the column definitions determines the left-to-right order of the columns in the table.  For each column you can select the data type, required data that cannot be null, and a default value if desired. 

 

Identity Property in SQL

This identity property is used with the table creation code to specify increments for a column.

A column can use the Identity(seed, increment) function to provide values for a primary key (usually).  It is ideal for a surrogate key.  Such column cannot contain null values, it has integer datatype and cannot be updated.  (This is an equivalent to MySQL’s AUTO_INCREMENT property.)

Seed is the initial value (ex. 1) and increment is exactly that. Usually such combination is (1,1).

In the Management Studio when creating your table:

  • Data Type – integer or decimal
  • Required data
  • Identity Specification
    • (Is Identity) – Yes
    • Identity Increment: 1
    • Identity Seed: 1000

 

If using script, here is an example of the code to create such a column with the initial value of 1000 and incremented by 1:

CREATE TABLE titles
(
title_id int IDENTITY(1000,1),
title varchar(200)
);

Setting Constraints (Rules for Table Data)

We set constraints on our data to provide data integrity on a table and individual columns by limiting the values that the user can enter.

Time for constraints to be specified is when creating the table (inside the CREATE TABLE statement) or when the table is modified (inside the ALTER TABLE statement).

Some of SQL constraints:

  • NOT NULL – cannot have NULL value, there absolutely must be something entered
  • UNIQUE – must be a unique value, it prevents duplicates from being entered
  • PRIMARY KEY – both  NOT NULL and UNIQUE
  • FOREIGN KEY – ensures that referential integrity of the data in one table matches values in another table
  • CHECK – ensures that the value in a column meets a specific condition set earlier
  • DEFAULT – if nothing entered, a default value is displayed

 

CREATE TABLE Persons
(
P_Id int NOT NULL UNIQUE,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)

 Or let’s say if you want to alter your table and introduce some constraint at a later point, you can use the command below. This will verify all existing data for violations.

ALTER TABLE Persons
ADD CONSTRAINT uc_PersonID UNIQUE (P_Id,LastName)

If for some reason you no longer require that constraint, you can drop it.

ALTER TABLE Persons
DROP CONSTRAINT uc_PersonID

 

ALTER TABLE customers
ADD PRIMARY KEY ( customer_id );

 

ALTER TABLE sales_order_line
ADD CONSTRAINT fk_sales_order_line_product FOREIGN KEY (product_id)
REFERENCES product
(product_id);

 

ALTER TABLE customers
ALTER COLUMN name varchar(50) NOT NULL;

ALTER TABLE customers
ADD CONSTRAINT default_country
DEFAULT ( ‘Canada’ ) FOR country;

ALTER TABLE products
ADD CONSTRAINT min_reorder
CHECK( reorder_level >= 0 );

 

Pubs Database for Learning Purposes

Most likely your SQL server software comes with this Pubs database for your training purposes.  A lot of what we do in class involves this database.  If unsure how to structure your query, it’s very handy to have a look at the database diagram.

sql server pubs database diagram

 

Basic commands – Select, Insert, Update, Delete

Important tip – before you do any of this, make sure you have the right database selected!  Got it? Well then, let’s move on to some SQL command.

Insertion, modification, and deletion are primary data management tasks that one should know in SQL.

What happens if we deliberately try to insert an error?

INSERT INTO province

VALUES (‘BC’, ‘British Colombia’);

 

Yes, in this case British Columbia is spelled incorrectly!

But let’s say there was already a ‘BC’ record with its p_code value.  What will happen?

The system will reject it because we can’t have a duplicate primary key.

 

What if we want to make a correction?

Let’s use the UPDATE function.

 

UPDATE province

SET name = ‘British Columbia’

WHERE p_code = ‘BC’;

 

This way we try to correct the spelling.  How man records would be updated in this case?  The answer is One or None.  If there was no entry with BC p_code, there would be nothing to update.

 

Let’s now consider a basic delete example.

 

DELETE FROM inventory

WHERE obsolete_marker =’Y’;

 

In this case we are working with an inventory table.  How many records would be removed? The ones that have ‘Y’ will be removed.

But be careful, if you forget to put ‘where’, you will delete your whole database! Don’t forget, otherwise you may forget the entire table instead of a single record.

 

Keep in mind the writing standards for queries.  Each clause should begin with a new line and so on.  Use single quotes.

 

Examples of Writing Standards:

 

CREATE TABLE Persons

(P-ID int,

LastName char(50),

FirstName char(15),

Address char(255),

City char(255)

);

INSERT INTO Persons

VALUES (4, ‘Nilsen’, ‘Johan’, ‘Bakken 2’, ‘Stavanger’);

 

It’s a good idea not to be sloppy and to watch how you do your formatting.  Example above shows how to insert a list of 4 items.

Select function is probably the one most widely used since this is what produces queries.

 

SELECT *

FROM Persons;

this would mean that all columns and all records would be selected!

If you want specific columns, please specify them.

 

ex.

SELECT DISTINCT stor_ID

FROM sales

 

This lists the unique store numbers, each store gets mentioned only once.

 

Calculated fields are also possible.  Let’s say you want monthly average, you just divide the year by 12.  A calculated column (or derived column) will have no header in the query.

 

What about a monetary value?

SELECT Cus_Name

FROM Customers

WHERE balance >$10000;

Do you see, there is no quotes on the monetary value?

 

Pay attention to SELECT syntax.

The order of the clauses cannot be changed.

Literal values are enclosed in quotes.

 

Arithmetic operations are allowed like + _ / and so on.

Comparison operators are also allowed. = < > != and so on

Logical operators – AND, OR, NOT

Aggregate operations – COUNT, SUM, AVG, MIN, MAX.  They can be applied to numeric and non-numeric fields.

Non-null values work for each functions, except for COUNT which will count all the records.

If you want to search for it, write WHERE price IS NOT NULL or WHERE price IS NULL.

 

You can use distinct function to avoid duplicates, like for example to list the unique store numbers found in the sales table.

SELECT DISTINCT stor_id
FROM sales

Calculated fields are also possible to work with. In the example below we calculate average monthly sales.

SELECT title_id,
title,
( ytd_sales / 12 )
FROM titles

 

WHERE clause has 5 basic search conditions.

5 basic search conditions to restrict retrieved rows:

  • Comparison – compare the values of one expression to the value of another expression
  • Range – test whether the value of an expression falls within a specified range of values
  • Set membership – test whether the value of an expression equals one of a set of values
  • Pattern match – test whether a string matches a specified pattern
  • Null – test whether a column has a NULL (unknown) value

 

Comparison Search Condition 

Here is an example of how to list all the titles with a type of business.

SELECT *
FROM titles
WHERE type = ‘business’

List all sales which have an order date of September 14 1994.  Note that there are two ways to write the date and that it is enclosed in single quotes.

SELECT *
FROM sales
WHERE ord_date= ‘ 1994-09-14’

SELECT *
FROM sales
WHERE ord_date= ‘ Sep 14 1994’

 

Simple compound comparison logical search is also possible, ex – show either by name or by ID.  Situations can be mixing OR and AND in various combinations.  Remember to use the brackets properly like in any mathematical expression.

 

This example lists all the titles which are business types or have a publisher id of 1389.

SELECT *
FROM titles
WHERE type = ‘business’
OR pub_id = ‘1389’

 

Pay attention to the use of brackets in this example:

SELECT *
FROM employee
WHERE (pub_id = ‘1389’
OR pub_id = ‘9999 ‘)
AND job_id = ’11 ‘

 

Special operators:

  • BETWEEN
  • Used to check whether attribute value is within a range
  • IN
  • Used to check whether attribute value matches any value within a value list
  • LIKE
  • Used to check whether attribute value matches given string pattern
  • IS NULL
  • Used to check whether attribute value is null

Sometimes you can use the greater / lesser symbols and AND to replace BETWEEN.  But the 2nd one is neater.
List all the sales with an order date between September 1, 1994 and September 30,1994. This is called the range search condition.

SELECT *
FROM sales
WHERE ord_date BETWEEN ‘Sep 1 1994’ AND ‘Sep 30 1994’

SELECT *
FROM sales
WHERE ( ord_date >= ‘Sep 1 1994’ AND
ord_date <= ‘Sep 30 1994’ )

 

Membership search condition – IN condition.

List the title id, title, and type from the titles table where the type is mod_cook or trad_cook.

SELECT title_id,
title,
type
FROM titles
WHERE type IN ( ‘mod_cook’, ‘trad_cook’ )

SELECT title_id,
title,
type
FROM titles
WHERE ( type = ‘mod_cook’ OR
type = ‘trad_cook’ )

 

Pattern Matching in SQL Server

Pattern matching Search Conditions can be used to search for all the books that have “computer” in the beginning of their title and something like that.  You need to use LIKE clause for that.

LIKE clause selects rows containing fields that match specified portions of character strings.  It is used with data types like char, varchar, text, datetime and smalldatetime data.

% Any string of zero or more characters (percent wildcard)
_ Any single character (underscore wildcard)
[] Any single character within the specified range Example: [a-f]
[^] Any single character not within the specified range
Example: [^a-f]

 

For example, here is a task to list all the books that begin with ‘Computer’ in their title using percent wildcard.

SELECT *
FROM titles
WHERE title LIKE ‘Computer%’

SELECT supplier_name
FROM suppliers
WHERE supplier_name LIKE 'Hew%';

Only interested in the start phrase. % wildcard indicates that we are not interested in the value of the remaining characters in the column

 

More Pattern Matching examples with LIKE
(on a column called ‘name’)

  • ‘%inger’ searches for every name that ends with ‘inger’
  • ‘_heryl’ searches for every six-letter name ending with ‘heryl’
  • ‘[CK]ars[eo]n’ will find ‘Carsen,’ ‘Karsen,’ ‘Carson,’ and ‘Karson’
  • ‘[M-Z]inger’ searches for all names ending with ”inger” that begin with any single letter from M to Z
  • ‘M[^c]%’ searches for all names beginning with ‘M’ that do not have ‘c’ as the second letter.

By default all records are sorted in PK sequence.  It is possible run a query and display items in certain order.

Null Search Condition

What if you want to search for entries where something is left unfilled? Like let’s say, which books are not priced yet? This is when you can use null search condition.

SELECT title_id,
title
FROM titles
WHERE price IS NULL

 

Not Null Search Condition

The same way we are interested to search for null condition, we may have an interest to search for not null. This can be achieved simply by writing Not Null instead of just Null.

SELECT title_id,
title,
price
FROM titles
WHERE price IS NOT NULL

ORDER BY Clause

You may notice by now that rows of an SQL query result table are not arranged in any particular order. What if you wish them to be arranged in a certain order?
Use the ORDER BY clause to ensure the results of a query are sorted
Column identifier(s) may be either:
Column name or
Combination of column names separated by commas

List the authors ID and names, ordering the result set in last name order (first example) or by first name within last name (second example).
SELECT au_id,
au_lname,
au_fname
FROM authors
ORDER BY au_lname
SELECT au_id,
au_lname,
au_fname
FROM authors
ORDER BY au_lname,fname

 

People complain sometimes that the query is not fast enough if they are not ordering by their PK. This is where a secondary key might help.

 

What if you want to display a nicer title than what the programmer actually called the field?  How to give nicer names to the query results without permanently changing the name of the column?

Renaming Columns

  • Column Headings default to Column Names, unless they are changed in the select list. Examples shown below.
  • SELECT ‘First Name’ = fname,
  • lname AS ‘Last Name’
  • FROM employee
  • Results
  • First Name Last Name
  • ————– ————-
  • Paolo Accorti
  • Pedro Afonso
  • Victoria Ashworth
  • SELECT [Author’s First Name] = fname,
  • lname AS [Author’s Last Name]
  • FROM employee

With square brackets you can easily include special characters

International Date Formats

 

You can also use conversion options to show something in a different format, like date or numbers.

Date, especially, has so many formats depending on the country!  Different countries show it in different ways – British, Italian, etc.

If you only need the year for example, use DATEPART function.

DATEADD adds number of days, months, etc. to get a new derived date.  For example, if you are giving clients 30 day terms you may want to do something like that.

 

DATEADD

  • Returns a new datetime value based on adding an interval to the specified date
  • Result is a datetime value equal to the date plus the number of date parts
  • If the date parameter is a smalldatetime value, the result is also a smalldatetime

EXAMPLE

  • SELECT title_id,
  • pubdate,
  • ‘NEW_DATE’=DATEADD ( DAY, 30, pubdate )
  • FROM titles
  • WHERE pubdate >=’MAY 1 2000′
  • Displays 3 columns:
  • title_id, pubdate and pubdate+30 days

 

DATEDIFF shows the difference.   Let’s say, the difference between publication date and today’s date.

Mathematical functions are also possible, like square root and trigonometry.  Rounding function also exists.

 

Concatenation creates one result from multiple character columns.

Substring can return part of the text.

It is a good idea to just pull up the raw data first to see what you are dealing with.  Are there already spaces in the phone number?

for example:

SELECT
‘Name’= au_fname+’ ‘+au_lname,
‘Phone’='(‘+substring(phone,1,3)+’) ‘ +substring(phone,5,8) ,state
FROM authors
ORDER BY name;

 

//this adds brackers around first 3 characters starting at position 1 and then appends 8 characters starting from the 5th position.

‘Phone’='(‘+substring(phone,1,3)+’) ‘ +substring(phone,5,8) ,state

select phone number format sql

 

RTRIM can be quite useful to take out empty spaces.

DATALENGTH function measures the length.

 

Joins in SQL

sql joins

 

A site for practice suggested by our instructor – http://www.w3schools.com/sql/default.asp

 

Transaction Control (TCL) statements are used to manage the changes made by DML statements. It allows statements to be grouped together into logical transactions.

  • COMMIT – save work done
  • SAVEPOINT – identify a point in a transaction to which you can later roll back
  • ROLLBACK – restore database to original since the last COMMIT
  • SET TRANSACTION – Change transaction options like isolation level and what rollback segment to use
individual marketing

Database Theory Notes

This material is taken from COMP1630 course in BCIT

Evolution of First File Systems

Prior to pre- computer systems, manual file systems were made up of collection of folders, each tagged and kept in cabinet or ledger (large books).  Easy ad-hoc queries were out of question.  Maybe some places still keep their data like that to this very day.

Each file folder had its contents logically related, but such manual systems were still cumbersome for large collections.  It was impossible to run a query fast and efficiently.

With introduction of computers things got slightly easier.  Data processing specialists would convert previous manual system into file structures.  They would write software to manage such data and designed application programs.  But still this resembled the manual system very much.

Evolution of file systems was brought on by an increase in file volume.  Since then each file used its own application program to store, retrieve, and modify data and each file was owned by a specific person or  department.

Perhaps at this point let’s define a few words so we don’t get confused later:

Data: this is an unprocessed fact like a birthday, a name, or a telephone number.  It has little meaning on its own unless it is connected logically to something else.

Field: this is a character or a group of characters that has a specific meaning. It defines and stores data.

Record: This is a logically connected set of fields that describes something like an employee, a business, or a building.  Le’ts say a file on a mechanic may contain his address, contact info, pay rate, and so on.

File: This is a set of related records  like let’s say workers in a specific factory.

But with the file system the main inefficiency was that separate departments had file systems developed for their own use.

 

Difference Between 3GL and 4GL

3GL – For a file system every task would require extensive programming in a 3GL language (third generation programming).  A programmer would need to specify a task and how to do it.  It doesn’t matter how simple the change would be, it would still require programming.  Ad hoc queries would be impossible because DP specialists would require some time to complete the task.

Disadvantages:

  • Lengthy development times
  • Difficult to get quick answers
  • Complex system administration
  • Lack of Security
  • Limited Data Sharing
  • Extensive programming

 

4GL – In modern databases 4GL is used instead (fourth generation language).  It allows users to specify what must be done without being concerned with the “how”.  This is called the nonprocedural data manipulation language and it allows to specify “what” without the “how”.

4GL has stronger security because it is possible to restrict access to certain parts of the database and to install password protection.  So in short 3GL is “what to do and how to do it” and 4GL is just “what to do”.  File systems used 3GL and as a result required a lot of programming and could not perform ad hoc queries.  Such system administration would be quite complex, thus making it difficult to make changes to existing structures. It could also result in inadequate security features.

 

Structural and Data Dependence

Let’s start with some definitions of what it means to have or not to have structure and data dependence.

Structural dependence: access to a file is dependent on its own structure

Structural independence: can change file structure without affecting data access

Data dependence: data access changes when data storage characteristics change

Data independence: data storage characteristics do not affect data access

 

Structural dependence means means that access to a file is dependent on its structure. For example, adding a person’s birthday field to the STUDENT file would require a program to read the record, transform it, create a new file structure, and repeat that for each record in the original file.  Whatever files work with that file also need to be modified and adjusted, otherwise they won’t work.  All these modifications are likely to produce bugs.

Structural Independence means that it is possible to make such changes in the file structure without affecting the ability to access data.  There is no need to re-design the whole file to make one change.

 

Now, what if we have to change a field from an integer to a decimal?  Changing such data characteristics would also require changes in programs that access that file.  If by changing the datatype we create disharmony with the rest of the files, we say that the file system exhibits data dependence.

Data independence means that if we change data storage characteristics without affecting the program’s ability to access the data.

 

Avoiding Data Redundancy

Data redundancy is when same information is stored in different places unnecessarily.  This results in data inconsistency and data anomalies.

Types of data anomalies:

  • Update anomalies – Occur when changes must be made to existing records in different places – could end up with different values
  • Insertion anomalies – Occur when entering new records – different values or not inserted in all affected files
  • Deletion anomalies – Occur when deleting records – not all occurrences deleted

Example of Redundancy – some teacher characteristics occur multiple times in a file. For example, the teacher named Maria Cordoza’s first name, last name, and initial occur three times.   This means if changes must be made for any given teacher, those changes must be made multiple times.  All it takes is one incorrect entry or one forgotten change to create data inconsistencies.

Redundant data are not a luxury you can afford in a data environment.

You also should keep data about specific entities in their own tables, so thus deleting one entity doesn’t result in deleting another.

 

Evolution of First Databases

Now that we saw all the disadvantages of the file systems, we may better understand how databases evolved to solve these shortcomings.

Problems inherent in file systems make using a database system desirable.  File systems use many separate and unrelated files.

  • Database uses l
  • ogically related data stored in a single logical data repository.

Are you still using a paper system to keep track of data? If so, you are living in 1970’s!  

1970’s was the time when Egdar Codd came up with the first database system to manage data efficiently.  It was like reducing a 5 page program to a single line.  

 

“There was this guy Ted Codd who had some kind of strange mathematical notation, but nobody took it very seriously.” (Don Chamberlin, coinventor of SQL)

How competitors appeared when IBM was snoozing:

The symposium convinced IBM to fund System R, a research project that built a
prototype of a relational database and that would eventually lead to the creation of SQL
and DB2. IBM, however, kept System R on the back burner for a number of crucial years.
The company had a vested interest in IMS, a reliable, high-end database system that had
come out in 1968. Unaware of the market potential of this research, IBM allowed its staff
to publish these papers publicly.
Among those reading these papers was Larry Ellison, who had just founded a small
company. Recruiting programmers from System R and the University of California, Ellison
was able to market the first SQL-based relational database in 1979, well before IBM. By
1983, the company had released a portable version of the database, grossed over
$5,000,000 annually, and changed its name to Oracle. Spurred on by competition, IBM
finally released SQL/DS, its first relational database, in 1980.
IBM has yet to catch up. By 2007, global sales of relational database management systems
rose to $18.8 billion. Oracle captured 48.6% of the market share, more than its two
closest competitors, IBM and Microsoft, combined.

 

Data-Modeling Skills

  • Most users lack the skill to properly design databases
  • Data-modeling skills are vital in the data design process
  • Good data modeling facilitates communication between the designer, user, and the developer

The Database System Environment

Five major parts of a database system that we can’t do without:

  • Hardware – all the physical devices
  • Software (OS, DBMS, Application and Utility SW) – requires 3 types (operating system software, DBMS software, application programs and utility software)
  • People (administrators, designers, analysts, programmers, users)
  • Procedures (instructions and rules)
  • Data

What is data?

We hear this word all the time, but let’s formally define it.  Data is a raw fact, not meaning much on its own.  When stored in a database, such data begins to make sense when processed.

Did you know that telecommunications companies such as Sprint and AT&T are known to have systems that keep data on trillions of phone calls, with new data being added to the system at speeds up to 70,000 calls per second?  They would not be able to keep track of all that without proper databases.

Examples of where databases are used:

  • computerized library systems
  • automated teller machines
  • flight reservation systems
  • computerized parts inventory systems
  •  

    What is the difference between Data and Information?

    Information is the result of processing raw data to reveal its meaning.  Data can be structured or unstructured.

    Data are the foundation of information, which is the bedrock of knowledge – meaning the body of information and facts about a specific subject.

    •  Data constitute the building blocks of information.
    •  Information is produced by processing data.
    •  Information is used to reveal the meaning of data.
    •  Accurate, relevant, and timely information is the key to good decision making.
    •  Good decision making is the key to organizational survival in a global environment.

    Information is displayed in diagram that shows relationships.  An example of such diagram could be Crow’s Foot notation.

    Such relational model was introduced by E. F. Codd in 1970.  Relation as a synonym for table.  Microsoft Access, for example, refers to it as dataset.

    What is a Database?

    Database is more of a logical concept rather than a physical concept.  That’s why we spend so much time learning about theory.

    There are single-user databases and multi-user databases.  Examples of multi-user would be workgroup and enterprise databases.

    Some more types to consider:

    • Centralized database: data located at a single site
    • Distributed database: data distributed across several different sites
    • Operational database: supports a company’s day-to-day operations (ex. transaction or production database)
    • Data warehouse: stores data used for tactical or strategic decisions or to store historical data

    Extensible Markup Language (XML) is used to represent data elements in textual format.

     

    Database Management Systems

    This is a computer program (interface) that allows to store, retrieve, and modify information in a database.

    Advantages of DBMS:

    • – improved data sharing
    • – improved data security
    • – better data integration
    • – minimized data inconsistency
    • – improved data access
    • – improved decision making
    • – increased end-user productivity

     

    DBMS Functions

    • Most DBMS functions are transparent to end users
    • Data dictionary management
    • DBMS stores definitions of data elements and relationships (metadata) in a data dictionary
    • Changes automatically recorded in the dictionary
    • DBMS provides data abstraction and removes structural and data dependency
    • Data storage management
    • DBMS creates and manages complex data storage structures
    • DBMS stores related data entry forms, screen definitions, report definitions, etc.
    • DBMS carries automatic Performance tuning: activities that make the database perform more efficiently
    • DBMS stores the database in multiple physical data files
    • Data transformation and presentation
    • DBMS transforms data entered to conform to required data structures
    • DBMS transforms physically retrieved data to conform to user’s logical expectations
    • Security management
    • DBMS creates a security system that enforces user security and data privacy
    • Security rules determine which users can access the database, which items can be accessed, etc.
    • Multiuser access control
    • DBMS uses sophisticated algorithms to ensure concurrent access does not affect integrity
    • Backup and recovery management
    • DBMS provides backup and data recovery to ensure data safety and integrity
    • Recovery management deals with recovery of database after a failure
    • Critical to preserving database’s integrity
    • Database access languages and application programming interfaces
    • DBMS provides access through a query language
    • Query language is a nonprocedural language
    • Structured Query Language (SQL) is the de facto query language
    • Standard supported by majority of DBMS vendors
    • Database communication interfaces
    • Current DBMSs accept end-user requests via multiple different network environments
    • Communications accomplished in several ways:
    • End users generate answers to queries by filling in screen forms through Web browser
    • DBMS automatically publishes predefined reports on a Web site
    • DBMS connects to third-party systems to distribute information via e-mail

    Disadvantages of database systems:

    • Increased costs
    • Management complexity
    • Maintaining currency (latest supported version)
    • Vendor dependence
    • Frequent upgrade/replacement cycles (and inherent cost)

    Why Database Design Is Important?

    • Database design focuses on design of database structures – Designer must identify database’s expected use
    • Well-designed database:
    • Facilitates effective & efficient data management
    • Generates accurate and valuable information
    • Poorly designed database leads to errors

     

     

     

    You’ve probably heard of Primary Key and Foreign Key.

    But what is composite key?

    It might take more than a single attribute to define functional dependence; that is, a key may be composed of more than one attribute. Such a multiattribute key is known as a composite key.

    • Composite key – A key which is formed from a combination of attributes
    • Candidate key –  Each Candidate Key can qualify as Primary Key.
    • PRIMARY key – One or more attributes which determine the value of each of the other attributes within the row and which is unique amongst the set of like attributes in the table
    • Key attribute – Any attribute that is part of a composite key
    • Superkey – Any key that uniquely identifies each row. Normally, too complex (with too many attributes) to be used as a Primary Key

     

     

    Logical Data Format – how humans see the data.

    Physical Data Format – how computers see the data.

     

    Nulls

    • An attribute for which an actual value has not been entered:
    • No data entry
    • Not permitted in primary key
    • Should be avoided in other attributes
    • Can represent
    • An unknown attribute value
    • A known, but missing, attribute value
    • A “not applicable” condition
    • Can create problems when functions such as COUNT, AVERAGE, and SUM are used
    • Can create logical problems when relational tables are linked

     

    • Controlled redundancy
    • Same Data may be stored in more than one place to provide links between entities (tables)
    • Makes the relational database work for 1:M relationships
    • Tables within the database share common attributes
    • Enables tables to be linked together
    • Redundancy exists only when there is unnecessary duplication of attribute values

     

    • Entity Integrity – All primary Key values are unique and no part may be null
    • Referential integrity – FK contains a value that refers to an existing valid tuple (row) in another relation