Tags: sql technicalNotes
These are my notes on using SQL and PostgresSQL?.
This will be a collection of notes on best practices and summaries of content found will be added along with references as I find them.
Extracts from reference : <https://data36.com/sql-best-practices-data-analysts/>
ORDER BY
and GROUP BY
as the column order can change down the line, and this point of failure will not be easy to troubleshoot in a longer query.SELECT *
, or atleast use the LIMIT
for rows greater than 5000 entries. This is almost never used in practice as it is inefficient especially for large databases.Installing postgres on ubuntu / debian
sudo apt-get install postgres postgresql-contrib
When installed, a superuser named postgres is automatically created. Use the following to login
sudo -i -u postgres
This will login using the postgres user, and then sql at the command line will be accessible. Note: the latter can be directly accessed using the command psql
to log into a created user and database. Therefore it is necessary to create a user first.
CREATE USER shrysr WITH PASSWORD abcd
Exiting the postgres user is done withy \q
.
Now directly login with the created user and to the specified database
psql -U shrysr -d postgres
SQL commands are now available at the command line.
pgadmin4 and sqlworkbench are admin tools to connect to databases. sqlworkbench will require the specific jdbc driver installed (example postgres
In general, pgdmin is less user friendly and has higher tendency to crash in general. There does not appear to be any major difference in terms of capability of the applications.
Licensing note of sql-workbench: Refer <http://sql-workbench.eu/manual/license.html> This explicitly mentions that the Governments of several countries and organisations related to them are not allowed to use this software. Canada is included in this list. I am surprised to see such a restriction and it is actually not mentioned anywhere in the data36 course. I would actually go to the extent of saying that the data36 courses are more suitable for beginners in data science and it is not truly suitable to even intermediate, let alone advanced users.
CREATE TABLE test(col1 TEXT, col2 INT);
Here test
is the table name, and col1
and col2
are the column names, with the datatype being specified as text and integer.
The types could be for example text, int, bool, date, timestamp, decimal. Once the datatype is specified, this has to be adhered to.
There are additional parameters which can be specified while creating columns. Example from <https://data36.com/create-table-sql/> :
CREATE TABLE test_results ( name TEXT, student_id INTEGER PRIMARY KEY, birth_date DATE, test_result DECIMAL NOT NULL, grade TEXT NOT NULL, passed BOOLEAN NOT NULL );
Inserting values into the created table:
INSERT INTO test VALUES ('hello', 001); INSERT INTO test VALUES ('olo', 002);
Tables can be assigned nicknames when referenced (for convenience, and for tables with similarly named columns). Syntax would be FROM tableX tx
, FROM tableY ty
. The nickname is placed after a space. Specific columns from each table is referenced using the dot operators, i.e tx.column_id
.
Similarly, columns can also assigned a nickname when called SELECT student_id as "Stu ID"
. Note that if there are no spaces in the desired name then the double quote is not required.
The available tables can be viewed using
SHOW TABLE
If any changes to the table or data is being made via pgadmin4 or workbench, then the commit command has to be used to have the changes applied or formally published to the database. There are options to set an 'autocommit' in both pgadmin4 and sqlworkbench.
COMMIT;
After creating a table with the necessary headers and data types specified, the following command can be used to copy data from a csv/excel/txt file into the table.
COPY accidents FROM '/home/shrysr/sql_practice/accidents.csv' WITH CSV HEADER DELIMITER ',';
Here 'accidents' is the name of a created table, the path is the location of hte csv file, and the delimiter being commas is specified.
The copy command cannot be used without superuser privileges.
The user can be converted to a superuser after logging in as the superuser and converting the desired user into a superuser. Alternately, /copy
can be used in place of copy
if the user is not a superuser.
Converting the user to a superuser is done by logging in as super user:
sudo -u postgres -i
And then, on psql - use the sql command:
ALTER USER shrysr WITH superuser;
Alternately, it may be necessary to populate a new table using the query results from another table. Then the INSERT INTO
command is used, and the SQL query within parenthesis.
INSERT INTO accidents (SELECT name FROM test_results);
TRUNCATE TABLE accidents;
The truncate command combined with the table keyword will empty the table accidents of all the rows, but retain the table itself.
The DROP
command can be used to delete a table.
DROP TABLE accidents;
\dt
\q
DISTINCT
is a method of selection and is not a function that transforms the data, like count or sum.
Max function can be run on only one column at a time and the column has to be specified.
The main SELECT
function cannot be used as a calculation for a subsequent subquery. Each subquery will need it's own calculation as shown below.
SELECT damage FROM accidents WHERE damage > 0 ORDER BY damage LIMIT 2 OFFSET (SELECT COUNT(damage) / 2 FROM accidents WHERE damage > 0);
Refer to Ex 29 and 30 (7 da SQL course) in counting the number of insurance paid entries. Using a count just counts all the entries. And using a sum on a boolean throws up an error.
Joins can be a basic join, full join, left join and right join.