People may want to learn SQL for different reasons. Some are computer programmers, some are entrepreneurs, and others may be even auditors who need to know what’s going on with calculations in order to catch mistakes.
That’s true, turns out that if an auditor from Revenue Canada or IRS shows up at your office and you tell them that your computer system calculates everything, they could ask you to see the code for the SQL query!
SQL and its advantages:
- Non-procedural language – used to specify what information required rather than how to get it (4GL)
- Free-format – statements can be typed at any locations on the screen
- Used for data manipulation, data definition and data administration
Getting Started with SQL and Database Design
In an article previously published, we discussed some benefits of proper database design. It’s important to have your plan perfected before starting to use any software to make it happen.
If you are also learning SQL, you may find these materials from our lecture room helpful. For our class we used SQL Server 2012 RDBMS platform. Did you know that such SQL server can hold a maximum of 32,767 databases? For home I am downloading SQL Server 2008 R2 Express software from my school, BCIT.
Entities translate into tables, attributes into columns. RDBMS automatically indexes primary keys and foreign keys. There are two types of tables – user tables (filled with information from the database) and system tables (contains database description).
Beside tables with data we can store other objects, such as views, indexes, stored procedures, and triggers. Because SQL language allows for that, it makes simple and complex queries possible.
Using Proper Syntax
For writing commands we need to use reserved words without any spelling errors and also user-defined words. It is crucial to follow the syntax rules in order to formulate some working SQL statements, like for example using single quotes for enclosing values. But on the lighter note, all commands are case-insensitive. For naming attributes you can even use characters like @, #, and _ (underscore). However , it is recommended to keep it consistent and to use upper case letters to represent reserved words and lower case letters to represent user-defined words.
Keep in mind that certain symbols at the beginning of an identifier have special meaning. Table names must follow the rules for naming identifiers and the name itself must be unique in the database.
- For a local variable or parameter the identifier begins with @
- DO NOT begin identifiers with @@ because it is used in SQL function names
Each SQL statement (not line) should be ended with a semi-colon – it signifies the end of the statement.
Difference between DDL, DML and DCL commands
- Data Definition Language (DDL) – for defining database structure and controlling access to the data, examples of such commands are CREATE, ALTER, DROP, TRUNCATE, COMMENT, RENAME. There are two ways of implementing DDL – doing it by script and doing it in the Management Studio (GUI).
- Data Manipulation Language (DML) – for retrieving and updating data, examples of such commands are SELECT, INSERT, UPDATE, DELETE, MERGE, UPSERT, CALL, EXPLAIN PLAN, LOCK TABLE
- Data Control Language (DCL) – is used for giving and restricting access privileges to the database, these are commands like GRANT and REVOKE
Various Data Types
A data type is a constraint on what kind of data can be entered. The most simple data types are integer, character, date, timestamp and money / smallmoney. These are probably the ones you’ll see most of the time.
Integers for example also have several data types – bigint, int, smallint, tinyint (0 to 255), bit (1 or 0).
There is also decimal and numeric types. Money types, quite interestingly, have 4 decimal places for accuracy. For numbers with more decimal places there is float and real data types.
Non-unicode type of data is recorded as char, varchar, and text. If you are using less than 8,000 characters for a field, use char as this is the smallest out of these.
Date and time data also has its limits. You may be in a sense amused about what they call “limits” as I am puzzled to imagine for what practical application I’d need to go that far back in time or that far forward.
- Datetime: from 1st January 1753 through 31st December 9999 with accuracy of 3/100ths of a second (3.33 ms)
- Smalldatetime: from 1st January 1900 through 6th June 2079 with accuracy of one minute
The benefit of selecting proper datatypes is saving your disc space. It also prevents people from entering clearly invalid data.
User-Defined Data Types
Still not enough data types for you? You can create user-defined datatypes. Such datatypes are good when you want to enforce data integrity. They are used when several tables must store the same type of data in a column and you must ensure that these columns have exactly the same data type, length, and null ability
Such user-defined data types can apply a name to a data type that is more descriptive of the types of values to be held in the object.
- System data type upon which new data type is based
- Nullability (whether data type allows null values)
In the Management Studio such data types can be created by using the programmability option for the current database. Once that is selected, it’s quite intuitive to create a new one.
If using new query editor window, here is an example of a command to create your own data type. After typing that, click Execute button to run the query and save your result.
CREATE TYPE empidt FROM char(9) NOT NULL;
Here is an example of actually putting that data type to use:
CREATE TABLE employee
fname char(20) NOT NULL,
lname char(30) NOT NULL,
job_id smallint DEFAULT 1
Creating a Database
The first step is to create the physical files that will hold the database. The CREATE DATABASE statement must run in autocommit mode (the default transaction management mode).
Here is an example of a script that will do that.
Using New Query Editor Window:
CREATE DATABASE cust_sales
This one phrase is enough to create the database, but if you wish to be more specific by imposing your own defaults, you can use the example below.
CREATE DATABASE product
( NAME = product_data,
FILENAME = ‘C:\Data\product_data.mdf’,
MAXSIZE = 15,
FILEGROWTH = 1
( NAME = product_log,
FILENAME = ‘C:\Data\product_log.ldf’,
SIZE = 2,
MAXSIZE = 5,
FILEGROWTH = 1
Log file is used to store changes made to the database. This is a smart way to keep track of going on, especially with multiple users connected to that database.
There is the primary master database and possibly with several users in it. Do not create any user objects in the master database because it contains the system tables. The master database should be backed up whenever a user database is created, modified, or dropped.
What is a Master Database?
The purpose of a master database is to be the core system database to manage the SQL Server instance. In SQL Server 2005, the Master database is the logical repository for the system objects residing in the sys schema. In SQL Server 2000 and previous editions of SQL Server, the Master database physically stored all of the system objects.
What is a User Database?
A user database is any database created by the user, such as Pubs. As mentioned above, SQL Microsoft Server can store close to 33 thousand of such databases.
Deleting a Database
This command will remove the database and all of its contents. Here is the correct Syntax:
DROP DATABASE database name
For example, to delete the Pubs database use the following command:
DROP DATABASE pubs
You can also remove it via Management Studio, which is more user-friendly like any other Microsoft software. Make sure your database is not in use when you are doing that. There is a ‘close connection’ box in Management Studio for that.
Creating a Table
The purpose of a table is storage and manipulation of data. It contains rows and columns. Each row represents a unique record, and each column represents a field within the record.
In the Management Studio
- Click on correct database (expand)
- Right click on Tables
- Click on New Table and the rest is intuitive
In the query window use the following code:
CREATE TABLE publishers
pub_id int NOT NULL,
province char(2) DEFAULT ‘BC’,
or something like this if you want to specify your primary key,
CREATE TABLE invoice
inv_number csid PRIMARY KEY,
cus_code char(10) NOT NULL,
Keep in mind that column definitions appear in a comma-separated list enclosed in parentheses. Also, order of the column definitions determines the left-to-right order of the columns in the table. For each column you can select the data type, required data that cannot be null, and a default value if desired.
Identity Property in SQL
This identity property is used with the table creation code to specify increments for a column.
A column can use the
Identity(seed, increment) function to provide values for a primary key (usually). It is ideal for a surrogate key. Such column cannot contain null values, it has integer datatype and cannot be updated. (This is an equivalent to MySQL’s AUTO_INCREMENT property.)
Seed is the initial value (ex. 1) and increment is exactly that. Usually such combination is (1,1).
In the Management Studio when creating your table:
- Data Type – integer or decimal
- Required data
- Identity Specification
- (Is Identity) – Yes
- Identity Increment: 1
- Identity Seed: 1000
If using script, here is an example of the code to create such a column with the initial value of 1000 and incremented by 1:
CREATE TABLE titles
title_id int IDENTITY(1000,1),
Setting Constraints (Rules for Table Data)
We set constraints on our data to provide data integrity on a table and individual columns by limiting the values that the user can enter.
Time for constraints to be specified is when creating the table (inside the CREATE TABLE statement) or when the table is modified (inside the ALTER TABLE statement).
Some of SQL constraints:
- NOT NULL – cannot have NULL value, there absolutely must be something entered
- UNIQUE – must be a unique value, it prevents duplicates from being entered
- PRIMARY KEY – both NOT NULL and UNIQUE
- FOREIGN KEY – ensures that referential integrity of the data in one table matches values in another table
- CHECK – ensures that the value in a column meets a specific condition set earlier
- DEFAULT – if nothing entered, a default value is displayed
CREATE TABLE Persons
P_Id int NOT NULL UNIQUE,
LastName varchar(255) NOT NULL,
Or let’s say if you want to alter your table and introduce some constraint at a later point, you can use the command below. This will verify all existing data for violations.
ALTER TABLE Persons
ADD CONSTRAINT uc_PersonID UNIQUE (P_Id,LastName)
If for some reason you no longer require that constraint, you can drop it.
ALTER TABLE Persons
DROP CONSTRAINT uc_PersonID
ALTER TABLE customers
ADD PRIMARY KEY ( customer_id );
ALTER TABLE sales_order_line
ADD CONSTRAINT fk_sales_order_line_product FOREIGN KEY (product_id)
ALTER TABLE customers
ALTER COLUMN name varchar(50) NOT NULL;
ALTER TABLE customers
ADD CONSTRAINT default_country
DEFAULT ( ‘Canada’ ) FOR country;
ALTER TABLE products
ADD CONSTRAINT min_reorder
CHECK( reorder_level >= 0 );
Pubs Database for Learning Purposes
Most likely your SQL server software comes with this Pubs database for your training purposes. A lot of what we do in class involves this database. If unsure how to structure your query, it’s very handy to have a look at the database diagram.
Basic commands – Select, Insert, Update, Delete
Important tip – before you do any of this, make sure you have the right database selected! Got it? Well then, let’s move on to some SQL command.
Insertion, modification, and deletion are primary data management tasks that one should know in SQL.
What happens if we deliberately try to insert an error?
INSERT INTO province
VALUES (‘BC’, ‘British Colombia’);
Yes, in this case British Columbia is spelled incorrectly!
But let’s say there was already a ‘BC’ record with its p_code value. What will happen?
The system will reject it because we can’t have a duplicate primary key.
What if we want to make a correction?
Let’s use the UPDATE function.
SET name = ‘British Columbia’
WHERE p_code = ‘BC’;
This way we try to correct the spelling. How man records would be updated in this case? The answer is One or None. If there was no entry with BC p_code, there would be nothing to update.
Let’s now consider a basic delete example.
DELETE FROM inventory
WHERE obsolete_marker =’Y’;
In this case we are working with an inventory table. How many records would be removed? The ones that have ‘Y’ will be removed.
But be careful, if you forget to put ‘where’, you will delete your whole database! Don’t forget, otherwise you may forget the entire table instead of a single record.
Keep in mind the writing standards for queries. Each clause should begin with a new line and so on. Use single quotes.
Examples of Writing Standards:
CREATE TABLE Persons
INSERT INTO Persons
VALUES (4, ‘Nilsen’, ‘Johan’, ‘Bakken 2’, ‘Stavanger’);
It’s a good idea not to be sloppy and to watch how you do your formatting. Example above shows how to insert a list of 4 items.
Select function is probably the one most widely used since this is what produces queries.
this would mean that all columns and all records would be selected!
If you want specific columns, please specify them.
SELECT DISTINCT stor_ID
This lists the unique store numbers, each store gets mentioned only once.
Calculated fields are also possible. Let’s say you want monthly average, you just divide the year by 12. A calculated column (or derived column) will have no header in the query.
What about a monetary value?
WHERE balance >$10000;
Do you see, there is no quotes on the monetary value?
Pay attention to SELECT syntax.
The order of the clauses cannot be changed.
Literal values are enclosed in quotes.
Arithmetic operations are allowed like + _ / and so on.
Comparison operators are also allowed. = < > != and so on
Logical operators – AND, OR, NOT
Aggregate operations – COUNT, SUM, AVG, MIN, MAX. They can be applied to numeric and non-numeric fields.
Non-null values work for each functions, except for COUNT which will count all the records.
If you want to search for it, write WHERE price IS NOT NULL or WHERE price IS NULL.
You can use distinct function to avoid duplicates, like for example to list the unique store numbers found in the sales table.
SELECT DISTINCT stor_id
Calculated fields are also possible to work with. In the example below we calculate average monthly sales.
( ytd_sales / 12 )
WHERE clause has 5 basic search conditions.
5 basic search conditions to restrict retrieved rows:
- Comparison – compare the values of one expression to the value of another expression
- Range – test whether the value of an expression falls within a specified range of values
- Set membership – test whether the value of an expression equals one of a set of values
- Pattern match – test whether a string matches a specified pattern
- Null – test whether a column has a NULL (unknown) value
Comparison Search Condition
Here is an example of how to list all the titles with a type of business.
WHERE type = ‘business’
List all sales which have an order date of September 14 1994. Note that there are two ways to write the date and that it is enclosed in single quotes.
WHERE ord_date= ‘ 1994-09-14’
WHERE ord_date= ‘ Sep 14 1994’
Simple compound comparison logical search is also possible, ex – show either by name or by ID. Situations can be mixing OR and AND in various combinations. Remember to use the brackets properly like in any mathematical expression.
This example lists all the titles which are business types or have a publisher id of 1389.
WHERE type = ‘business’
OR pub_id = ‘1389’
Pay attention to the use of brackets in this example:
WHERE (pub_id = ‘1389’
OR pub_id = ‘9999 ‘)
AND job_id = ’11 ‘
- Used to check whether attribute value is within a range
- Used to check whether attribute value matches any value within a value list
- Used to check whether attribute value matches given string pattern
- IS NULL
- Used to check whether attribute value is null
Sometimes you can use the greater / lesser symbols and AND to replace BETWEEN. But the 2nd one is neater.
List all the sales with an order date between September 1, 1994 and September 30,1994. This is called the range search condition.
WHERE ord_date BETWEEN ‘Sep 1 1994’ AND ‘Sep 30 1994’
WHERE ( ord_date >= ‘Sep 1 1994’ AND
ord_date <= ‘Sep 30 1994’ )
Membership search condition – IN condition.
List the title id, title, and type from the titles table where the type is mod_cook or trad_cook.
WHERE type IN ( ‘mod_cook’, ‘trad_cook’ )
WHERE ( type = ‘mod_cook’ OR
type = ‘trad_cook’ )
Pattern Matching in SQL Server
Pattern matching Search Conditions can be used to search for all the books that have “computer” in the beginning of their title and something like that. You need to use LIKE clause for that.
LIKE clause selects rows containing fields that match specified portions of character strings. It is used with data types like char, varchar, text, datetime and smalldatetime data.
% Any string of zero or more characters (percent wildcard)
_ Any single character (underscore wildcard)
 Any single character within the specified range Example: [a-f]
[^] Any single character not within the specified range
For example, here is a task to list all the books that begin with ‘Computer’ in their title using percent wildcard.
WHERE title LIKE ‘Computer%’
SELECT supplier_name FROM suppliers WHERE supplier_name LIKE 'Hew%';
Only interested in the start phrase. % wildcard indicates that we are not interested in the value of the remaining characters in the column
More Pattern Matching examples with LIKE
(on a column called ‘name’)
- ‘%inger’ searches for every name that ends with ‘inger’
- ‘_heryl’ searches for every six-letter name ending with ‘heryl’
- ‘[CK]ars[eo]n’ will find ‘Carsen,’ ‘Karsen,’ ‘Carson,’ and ‘Karson’
- ‘[M-Z]inger’ searches for all names ending with ”inger” that begin with any single letter from M to Z
- ‘M[^c]%’ searches for all names beginning with ‘M’ that do not have ‘c’ as the second letter.
By default all records are sorted in PK sequence. It is possible run a query and display items in certain order.
Null Search Condition
What if you want to search for entries where something is left unfilled? Like let’s say, which books are not priced yet? This is when you can use null search condition.
WHERE price IS NULL
Not Null Search Condition
The same way we are interested to search for null condition, we may have an interest to search for not null. This can be achieved simply by writing Not Null instead of just Null.
WHERE price IS NOT NULL
ORDER BY Clause
You may notice by now that rows of an SQL query result table are not arranged in any particular order. What if you wish them to be arranged in a certain order?
Use the ORDER BY clause to ensure the results of a query are sorted
Column identifier(s) may be either:
Column name or
Combination of column names separated by commas
List the authors ID and names, ordering the result set in last name order (first example) or by first name within last name (second example).
ORDER BY au_lname
ORDER BY au_lname,fname
People complain sometimes that the query is not fast enough if they are not ordering by their PK. This is where a secondary key might help.
What if you want to display a nicer title than what the programmer actually called the field? How to give nicer names to the query results without permanently changing the name of the column?
- Column Headings default to Column Names, unless they are changed in the select list. Examples shown below.
- SELECT ‘First Name’ = fname,
- lname AS ‘Last Name’
- FROM employee
- First Name Last Name
- ————– ————-
- Paolo Accorti
- Pedro Afonso
- Victoria Ashworth
- SELECT [Author’s First Name] = fname,
- lname AS [Author’s Last Name]
- FROM employee
With square brackets you can easily include special characters
You can also use conversion options to show something in a different format, like date or numbers.
Date, especially, has so many formats depending on the country! Different countries show it in different ways – British, Italian, etc.
If you only need the year for example, use DATEPART function.
DATEADD adds number of days, months, etc. to get a new derived date. For example, if you are giving clients 30 day terms you may want to do something like that.
- Returns a new datetime value based on adding an interval to the specified date
- Result is a datetime value equal to the date plus the number of date parts
- If the date parameter is a smalldatetime value, the result is also a smalldatetime
- SELECT title_id,
- ‘NEW_DATE’=DATEADD ( DAY, 30, pubdate )
- FROM titles
- WHERE pubdate >=’MAY 1 2000′
- Displays 3 columns:
- title_id, pubdate and pubdate+30 days
DATEDIFF shows the difference. Let’s say, the difference between publication date and today’s date.
Mathematical functions are also possible, like square root and trigonometry. Rounding function also exists.
Concatenation creates one result from multiple character columns.
Substring can return part of the text.
It is a good idea to just pull up the raw data first to see what you are dealing with. Are there already spaces in the phone number?
‘Name’= au_fname+’ ‘+au_lname,
‘Phone’='(‘+substring(phone,1,3)+’) ‘ +substring(phone,5,8) ,state
ORDER BY name;
//this adds brackers around first 3 characters starting at position 1 and then appends 8 characters starting from the 5th position.
‘Phone’='(‘+substring(phone,1,3)+’) ‘ +substring(phone,5,8) ,state
RTRIM can be quite useful to take out empty spaces.
DATALENGTH function measures the length.
Joins in SQL
A site for practice suggested by our instructor – http://www.w3schools.com/sql/default.asp
Transaction Control (TCL) statements are used to manage the changes made by DML statements. It allows statements to be grouped together into logical transactions.
- COMMIT – save work done
- SAVEPOINT – identify a point in a transaction to which you can later roll back
- ROLLBACK – restore database to original since the last COMMIT
- SET TRANSACTION – Change transaction options like isolation level and what rollback segment to use