Lesson 4: Data manipulation and extraction

Data manipulation and extraction

Learning objectives

Selecting data

Select statement

Retrieves data from the table for the client

Can calculate and use functions

Be careful of the overuse of functions in OLTP use cases where it may be better to add a column such as a virtual column that contains calculations already rendered

SELECT CONCAT(name_first, SPACE(1), name_last) AS 'Name',

(DAYOFYEAR(birthdate) - DAYOFYEAR(CURDATE())) / 12 AS 'Months to Birthday'

FROM clients

LIMIT 1;

An Example of Retrieving Data, Calculating, and Using Functions with a SELECT Statement

Where clause

Used with SELECT, UPDATE, and DELETE Statements

Columns with Operators can be Specified to Filter Results

SELECT Name, Population
FROM City
WHERE Population > 1000000;

+-------------+------------+
| Name        | Population |
+-------------+------------+
| Kabul       | 1780000    |
| Alger       | 2168000    |
| Luanda      | 2022000    |
| Buenos Aires| 2982146    |
| La Matanza  | 1266461    |
| ...         |            |

Basic Example for SELECT Statement using WHERE Clause

Operators

The WHERE Clause Allows For Several Operators

ArithmeticComparison
-AND, &&
+OR, ||
*XOR
/>=
%>

Limit clause

Used to limit the offset and number of rows returned

Values must be positive integer constants

SELECT Name, District
FROM City
WHERE CountryCode = 'FIN'
LIMIT 2, 3;
+-------------------+-----------+
| Name              | District  |
+-------------------+-----------+
| Helsinki          | Newmaa    |
| Espoo             | Newmaa    |
| Tampere           | Pirkanmaa |
+-------------------+-----------+

Order by and limit clause

Data can be Ordered and Broken into Blocks

SELECT cols FROM table WHERE clause ORDER BY col LIMIT offset, count;

Basic Syntax Example for SELECT Statement with ORDER BY Clause

SELECT Name, Population FROM City WHERE Population > 1000000 ORDER BY Population LIMIT 0, 3;

NamePopulation
Zapopan1,012,563
Napoli1,002,079
Perm1,070,162

Ascending and descending order

Add the ASC or DESC Options to Order Results in Ascending or Descending Order

Multiple Columns Allowed

No descending indexes however B+ Tree indexes can be used to optimize sort or range based queries

SELECT Name, District FROM City
WHERE CountryCode = 'FIN'
ORDER BY District DESC, Name ASC;


| Name           | District         |
|----------------|------------------|
| Turku [Åbo]    | Varsinais-Suomi  |
| Lahti          | Päijät-Häme      |
| Oulu           | Pohjois-Pohjanmaa|
| Tampere        | Pirkanmaa        |
| Espoo          | Newmaa           |
| Helsinki [Helsingfors]| Newmaa     |
| Vantaa         | Newmaa           |

Group by clause

Useful with Aggregate Functions to Create Sub-Groups

SELECT Continent, SUM(Population)
FROM Country
GROUP BY Continent;

+---------------+------------------+
| Continent     | SUM(Population)  |
+---------------+------------------+
| Asia          |    3750025700    |
| Europe        |     730074600    |
| North America |     482993000    |
| Africa        |     784475000    |
| Oceania       |      30401150    |
| Antarctica    |              0    |
| South America |     345780000    |
+---------------+------------------+

Select into outfile statement

Exports Results to a File on Server
File is Saved to datadir by Default
Fails if Another File with Same Name
FILE Privilege Required
One Record per Line
Fields Separated by Tabs (Ascii #9\t) by Default

SELECT INTO OUTFILE '/path/City.txt'
FROM City;

Basic Example of SELECT INTO OUTFILE Syntax

SELECT INTO OUTFILE '/path/City.csv'
FROM City
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY '\\'
LINES TERMINATED BY '\r\n'
STARTING BY 'Record: ';

Examples of More Precise Format Options

Inserting and loading data

Single versus multi-row insert

Single Row INSERT Repeats Full Write Process for Each

Multi-Row INSERT Minimizes Overhead

INSERT INTO table1 VALUES ('test',1,2);

INSERT INTO table1 VALUES ('test',1,2);

INSERT INTO table1 VALUES ('test',1,2), ('test',3,4);

Write Process

Multi-row insert caveats

Multi-Row INSERT Statements are Transactions

Size is Set by max_allowed_packet Variable

Faster Loading of Large Data Sets

Insert…select statement

CREATE TABLE employees_personal LIKE employees;

INSERT INTO employees_personal
(emp_id, gender, birth_date)
SELECT emp_id, gender, birth_date
FROM employees;

Duplicate keys statement

DUPLICATE KEY does a select then decides on insert versus update

REPLACE does a delete then an insert

Existing Rows are Updated not Duplicated when using ON DUPLICATE KEY UPDATE

Duplicates are Rows with same PRIMARY or UNIQUE Key Values

Only One is Updated even if Many Exist

INSERT INTO parts (part_code, part)
VALUES ('bt-45','45mm Bolt')
ON DUPLICATE KEY 
UPDATE part_code = part_code + '-1';
UPDATE parts 
SET part_code = part_code + '-1'
WHERE part_code = 'bt-45';
INSERT INTO parts (part_code, part)
VALUES ('bt-45','45mm Bolt');

This DUPLICATE KEYS statement is the equivalent of these statements ====>

Load data infile statement

Imports Data Directly from File to Server

Put File in Data Directory or give

Full Path

FILE Privilege Required

Use LOAD DATA LOCAL INFILE for File on Client Instead of Server

LOAD DATA INFILE '/tmp/City.csv'
INTO TABLE City
FIELDS
    TERMINATED BY ','
    ENCLOSED BY '"'
    ESCAPED BY '\\'
LINES TERMINATED BY '\r\n'
STARTING BY 'Record: ';

Importing SQL files

Import SQL Text Files using mysql Client from Command-Line

Fast Option and No Screen Output

Aborts on Errors by Default

# mysql -u devuser -p world < /path/to/world.sql

Changing and deleting data

Update statement

Use SET Clause to Change Column Values

Several Optional Clauses (i.e., JOIN, WHERE, ORDER BY, LIMIT)

All Rows Changed Without WHERE or LIMIT Clause

ORDER BY and LIMIT Are Not Compatible with JOIN

UPDATE table1
JOIN table2 USING(id)
SET table1.col1 = 'expression'
WHERE table2.field4 = 'whatever';

Basic Syntax Example for UPDATE Statement

UPDATE City
SET Population = Population + 1
WHERE CountryCode = 'FIN';

Query OK, 7 rows affected (0.00 sec)
Rows matched: 7 Changed: 7 Warnings: 0

Replace statement

REPLACE can be used instead of INSERT (MariaDB Extension to SQL)

Deletes Duplicates in Table

Inserts New Rows

Duplicates are Rows with Same Value for PRIMARY or UNIQUE Key

REPLACE INTO table (columns) VALUES (values);

Basic Syntax Example for REPLACE Statement

Delete statement

Specify Table or Tables from which to Delete Rows

Several Optional Clauses (i.e., JOIN, WHERE, ORDER BY, LIMIT)

All Rows Changed Without WHERE or LIMIT Clause

ORDER BY and LIMIT Are Not Compatible with JOIN

DELETE FROM table1
JOIN table2 USING(id)
WHERE condition;

Basic Syntax Example for DELETE Statement

DELETE FROM City
WHERE CountryCode = 'FIN';

Query OK, 7 rows affected (0.02 sec)
Rows matched: 7  Changed: 7  Warnings: 0

Truncate statement

Empties a table completely

TRUNCATE tablename [WAIT n|NOWAIT];

The TRUNCATE <tablename> statement will fail for an InnoDB table if any FOREIGN KEY constraints from other tables reference the table, returning the following error

ERROR 1701 (42000): Cannot truncate a table referenced in a foreign key constraint

Faster than a DELETE statement because it drops and re-creates the table

Will return a row count of 0 so if you need to know how many rows were removed, use the DELETE statement

Drops all historical records from a system-versioned table

Joins

Joins overview

Joins combine records from two or more tables, using values common within each one

Rows from Table A are joined with rows from Table B

The result set has rows containing fields from both tables

Result may be returned to the client, placed into a new table, or used as an implicit table for further operations

INNER OUTER NATURAL
CROSS STRAIGHT_JOIN RIGHT
LEFT USING ON

Implicit join syntax

Join and Other WHERE Conditions Mingled

Without Constraints, the Result is a Full Cartesian Product

SELECT cols
FROM table, table2
WHERE join-condition
AND other-condition;

Basic Syntax Example for Joining Tables

SELECT City.Name, Country.Name, City.Population
FROM Country, City
WHERE CountryCode = Code
AND City.Population < 100000;

Practical Example for Joining Tables within WHERE Clause

Explicit join syntax

With a JOIN Clause the Join Conditions are Separate

Other WHERE Clause Conditions are Not Mixed with Join Points

SELECT cols
FROM table JOIN table2
ON join-condition
WHERE other-condition;

Basic Syntax Example for JOIN Clause

SELECT City.Name, Country.Name, City.Population
FROM Country
JOIN City ON CountryCode = Code
WHERE City.Population < 100000;

Practical Example for Joining Tables with JOIN Clause

Inner joins

SELECT cols FROM
table INNER JOIN
table2
ON join-condition
WHERE
other-conditions;

Outer joins

LEFT JOIN really means LEFT OUTER JOIN

An OUTER JOIN does not require matches in both tables

MariaDB does not support full outer join

SELECT cols FROM
table LEFT JOIN
table2
ON join-condition
WHERE
other-conditions;

Outer join example

SELECT cols FROM table LEFT JOIN table2 ON join-condition WHERE other-conditions;

SELECT Country.Name as Country, City.Name as Capital FROM Country LEFT JOIN City ON Capital = City.Id WHERE Country.Name LIKE 'B%';

+-------------------------------+--------------+
| Country                       | Capital      |
+-------------------------------+--------------+
| Bulgaria                      | Sofija       |
| Burkina Faso                  | Ouagadougou  |
| Burundi                       | Bujumbura    |
| Belarus                       | Minsk        |
| Bouvet Island                 | NULL         | <- no capital
| British Indian Ocean Territory| NULL         | <- no capital
+-------------------------------+--------------+

Natural joins

General tips

Cartesian products of table subsets, with variations

The optimizer does a lot for you such as:

Other performance considerations to consider:

General tips

Watch for ambiguous column references

Example: If two tables contain a Name field

Use full notation such as: tableA.Name = tableB.Name

Use table aliases to shorten and simplify queries

An alias is just an alternative name for a table, within this query

The AS keyword is optional

Makes for easier reading and easier typing!

Less work required when tables are renamed

Ensures column names are never ambiguous

SELECT ... FROM TableA AS a, TableB AS b
WHERE a.Name = b.Name;

General tips

Tables in joins are not necessarily processed in the same order they are listed in the query
MariaDB’s Optimizer will move them around for efficiency
This does not affect the result set from a relational point of view
Use STRAIGHT_JOIN to control table join order, but generally only for debugging! The optimizer is really very good at its job

Beware that cartesian products can grow very large
Particularly when more than two tables are involved

Use inner joins over outer joins, where possible
Unless outer is explicitly required by the application
As a rule the working and result sets are smaller for inner joins
Unexpected NULLs make applications fall over (ie, when data changes)

Avoid using comma joins in queries to attain readability, flexibility and portability

Subqueries

Subqueries

Queries within Other Queries Entered within Parentheses

SELECT Language FROM CountryLanguage
WHERE CountryCode = (SELECT Code FROM Country WHERE Name = 'Finland');

Scalar subquery

SELECT Country.Name, 100 * CountryPopulation / 
(SELECT SUM(Population) FROM Country) AS pct_world_pop 
FROM Country;

+---------------------+--------------+
| Name                | pct_world_pop|
+---------------------+--------------+
| Afghanistan         |      0.3738  |
| Netherlands         |      0.2610  |
| Netherlands Antilles|      0.0036  |
...

Row subqueries

Row subqueries return a single row

The subquery result set must be one row with 2+ cols

Use with equality operators =, <>, !=, <=>

Use with comparison operators <, >, >=, <=

Column order is important here!

SELECT ('London', 'GBR') = 
(SELECT Name, CountryCode 
FROM City WHERE ID = 456) AS is_london;

+-----------+
| is_london |
+-----------+
|         0 |
+-----------+

Table subquery result set

Table subqueries return an implicit table

The subquery result may contain zero or more rows

Can be used in a FROM clause with a table alias

Use with WHERE and IN, EXISTS, ANY, ALL or SOME

SELECT * FROM
(SELECT Code, Name
 FROM Country
 WHERE IndepYear IS NOT NULL)
 AS independent_countries;

+------+---------------+
| Code | Name          |
+------+---------------+
| AFG  | Afghanistan   |
| NLD  | Netherlands   |
...

Rewriting to joins

Why rewrite a perfectly good subquery into a join?

In some cases, joins still outperform subqueries
The MariaDB Optimizer is more mature for joins…though it is getting better all the time!

Non-correlated subqueries generally perform well
They do not reference the outer query and are executed just once
Can be executed alone

Correlated subqueries can lead to performance traps
They depend on the outer query and may be executed repeatedly
Cannot be executed alone

Rewriting to joins

IN subqueries can be an INNER JOIN
Same concept: An inner join only returns rows that match
May require use of DISTINCT

NOT IN subqueries can be an OUTER JOIN
LEFT JOIN and WHERE ... IS NULL

Sometimes, the equivalent join is much more complex
For example using subqueries in the FROM clause for easy aggregation
Or using subqueries used for accessing hierarchical data

Lesson summary

Lab exercises


For academic and non-commercial usage, licensed under CC BY-NC-SA 4.0 by MariaDB plc.