Shreyas Ragavan: Editing SQL

Tags: [[tag:sql]] [[tag:technicalNotes]] # Init These are my notes on using SQL and PostgresSQL. # TODO SQL Overview This will be a collection of notes on best practices and summaries of content found will be added along with references as I find them. Extracts from reference : <https://data36.com/sql-best-practices-data-analysts/> - Order in which the queries are processed 1. FROM 2. WHERE 3. GROUP BY 4. SELECT 5. ORDER BY 6. LIMIT - Spaces and line breaks do not affect the output of a query. However, it is good to stick to a convention throughout the program and use the minimal breaks necessary to enhance readability. - Always try to use the actual column name rather than the column number in `ORDER BY` and `GROUP BY` as the column order can change down the line, and this point of failure will not be easy to troubleshoot in a longer query. - Avoid `SELECT *` , or atleast use the `LIMIT` for rows greater than 5000 entries. This is almost never used in practice as it is inefficient especially for large databases. - Additional Reference [SQL Style Guide](https://www.sqlstyle.guide/) - Employ google data studio or tableau or other tools to visualise data. - Break down the query into smaller steps and go at it in portions. - After the join statements - it is helpful to view the table as a 'joined table, i.e all further where, group and order commands act on the joined table, however, it is good practice to have the columns specified as from a particular table. Only the select portions - Use the table-name.column-name format for specifying columns explicitly belonging to a particular table. # Notes on Postgres setup Installing postgres on ubuntu / debian sudo apt-get install postgres postgresql-contrib When installed, a superuser named postgres is automatically created. Use the following to login sudo -i -u postgres This will login using the postgres user, and then sql at the command line will be accessible. Note: the latter can be directly accessed using the command `psql` to log into a created user and database. Therefore it is necessary to create a user first. CREATE USER shrysr WITH PASSWORD abcd Exiting the postgres user is done withy `\q`. Now directly login with the created user and to the specified database psql -U shrysr -d postgres SQL commands are now available at the command line. pgadmin4 and sqlworkbench are admin tools to connect to databases. sqlworkbench will require the specific jdbc driver installed (example [postgres](https://jdbc.postgresql.org/download.html)) In general, pgdmin is less user friendly and has higher tendency to crash in general. There does not appear to be any major difference in terms of capability of the applications. Licensing note of sql-workbench: Refer <http://sql-workbench.eu/manual/license.html> This explicitly mentions that the Governments of several countries and organisations related to them are not allowed to use this software. Canada is included in this list. I am surprised to see such a restriction and it is actually not mentioned anywhere in the data36 course. I would actually go to the extent of saying that the data36 courses are more suitable for beginners in data science and it is not truly suitable to even intermediate, let alone advanced users. # Creating a table and adding content CREATE TABLE test(col1 TEXT, col2 INT); Here `test` is the table name, and `col1` and `col2` are the column names, with the datatype being specified as text and integer. The types could be for example text, int, bool, date, timestamp, decimal. Once the datatype is specified, this has to be adhered to. There are additional parameters which can be specified while creating columns. Example from <https://data36.com/create-table-sql/> : CREATE TABLE test_results ( name TEXT, student_id INTEGER PRIMARY KEY, birth_date DATE, test_result DECIMAL NOT NULL, grade TEXT NOT NULL, passed BOOLEAN NOT NULL ); Inserting values into the created table: INSERT INTO test VALUES ('hello', 001); INSERT INTO test VALUES ('olo', 002); Tables can be assigned nicknames when referenced (for convenience, and for tables with similarly named columns). Syntax would be `FROM tableX tx`, `FROM tableY ty`. The nickname is placed after a space. Specific columns from each table is referenced using the dot operators, i.e `tx.column_id`. Similarly, columns can also assigned a nickname when called `SELECT student_id as "Stu ID"`. Note that if there are no spaces in the desired name then the double quote is not required. The available tables can be viewed using SHOW TABLE # Commit if changes are made using pgadmin4 or sqlworkbench If any changes to the table or data is being made via pgadmin4 or workbench, then the commit command has to be used to have the changes applied or formally published to the database. There are options to set an 'autocommit' in both pgadmin4 and sqlworkbench. COMMIT; # Inserting data into a table from other sources After creating a table with the necessary headers and data types specified, the following command can be used to copy data from a csv/excel/txt file into the table. COPY accidents FROM '/home/shrysr/sql_practice/accidents.csv' WITH CSV HEADER DELIMITER ','; Here 'accidents' is the name of a created table, the path is the location of hte csv file, and the delimiter being commas is specified. The copy command cannot be used without superuser privileges. The user can be converted to a superuser after logging in as the superuser and converting the desired user into a superuser. Alternately, `/copy` can be used in place of `copy` if the user is not a superuser. Converting the user to a superuser is done by logging in as super user: sudo -u postgres -i And then, on psql - use the sql command: ALTER USER shrysr WITH superuser; Alternately, it may be necessary to populate a new table using the query results from another table. Then the `INSERT INTO` command is used, and the SQL query within parenthesis. INSERT INTO accidents (SELECT name FROM test_results); # Emptying table without deleting the table TRUNCATE TABLE accidents; The truncate command combined with the table keyword will empty the table accidents of all the rows, but retain the table itself. # Deleting a table : DROP The `DROP` command can be used to delete a table. DROP TABLE accidents; # Listing all the tables and relations `\dt` # Quit the psql interface : `\q` # Distinct is not a function `DISTINCT` is a method of selection and is not a function that transforms the data, like count or sum. # Max function Max function can be run on only one column at a time and the column has to be specified. # Example of median with subquery The main `SELECT` function cannot be used as a calculation for a subsequent subquery. Each subquery will need it's own calculation as shown below. SELECT damage FROM accidents WHERE damage > 0 ORDER BY damage LIMIT 2 OFFSET (SELECT COUNT(damage) / 2 FROM accidents WHERE damage > 0); # Appears not possible to sum up a boolean datatype in SQL Refer to Ex 29 and 30 (7 da SQL course) in counting the number of insurance paid entries. Using a count just counts all the entries. And using a sum on a boolean throws up an error. # Joins Joins can be a basic join, full join, left join and right join.

Summary:

This change is a minor edit.

To save this page you must answer this question:

What is the greatest editor out there?

Username:

Replace this text with a file