{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to Databases and SQL\n", "\n", "- **Authors**: Ian Dennis Miller\n", "- **Research field**: Social Psychology - Social Complexity\n", "- **Lesson topic**: Databases and SQL\n", "- **Lesson content URL**: " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Plan\n", "\n", "- get a data set from R: mtcars\n", "- select from it using SQL (via sqldf)\n", "- export it as CSV file\n", "- in sqlite, create database and import CSV\n", "- use RSQLite connection to select direct from database" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# get a data set from R: mtcars" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data contain Motor Trends car testing results. There are 32 cars." ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "32" ], "text/latex": [ "32" ], "text/markdown": [ "32" ], "text/plain": [ "[1] 32" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "11" ], "text/latex": [ "11" ], "text/markdown": [ "11" ], "text/plain": [ "[1] 11" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "nrow(mtcars)\n", "ncol(mtcars)" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
mpgcyldisphpdratwtqsecvsamgearcarb
Mazda RX421.0 6 160 110 3.90 2.62016.460 1 4 4
Mazda RX4 Wag21.0 6 160 110 3.90 2.87517.020 1 4 4
Datsun 71022.8 4 108 93 3.85 2.32018.611 1 4 1
Hornet 4 Drive21.4 6 258 110 3.08 3.21519.441 0 3 1
Hornet Sportabout18.7 8 360 175 3.15 3.44017.020 0 3 2
Valiant18.1 6 225 105 2.76 3.46020.221 0 3 1
\n" ], "text/latex": [ "\\begin{tabular}{r|lllllllllll}\n", " & mpg & cyl & disp & hp & drat & wt & qsec & vs & am & gear & carb\\\\\n", "\\hline\n", "\tMazda RX4 & 21.0 & 6 & 160 & 110 & 3.90 & 2.620 & 16.46 & 0 & 1 & 4 & 4 \\\\\n", "\tMazda RX4 Wag & 21.0 & 6 & 160 & 110 & 3.90 & 2.875 & 17.02 & 0 & 1 & 4 & 4 \\\\\n", "\tDatsun 710 & 22.8 & 4 & 108 & 93 & 3.85 & 2.320 & 18.61 & 1 & 1 & 4 & 1 \\\\\n", "\tHornet 4 Drive & 21.4 & 6 & 258 & 110 & 3.08 & 3.215 & 19.44 & 1 & 0 & 3 & 1 \\\\\n", "\tHornet Sportabout & 18.7 & 8 & 360 & 175 & 3.15 & 3.440 & 17.02 & 0 & 0 & 3 & 2 \\\\\n", "\tValiant & 18.1 & 6 & 225 & 105 & 2.76 & 3.460 & 20.22 & 1 & 0 & 3 & 1 \\\\\n", "\\end{tabular}\n" ], "text/plain": [ " mpg cyl disp hp drat wt qsec vs am gear carb\n", "Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 \n", "Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 \n", "Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 \n", "Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 \n", "Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 \n", "Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "head(mtcars)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ " mpg cyl disp hp \n", " Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0 \n", " 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5 \n", " Median :19.20 Median :6.000 Median :196.3 Median :123.0 \n", " Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7 \n", " 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0 \n", " Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0 \n", " drat wt qsec vs \n", " Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000 \n", " 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000 \n", " Median :3.695 Median :3.325 Median :17.71 Median :0.0000 \n", " Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375 \n", " 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000 \n", " Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000 \n", " am gear carb \n", " Min. :0.0000 Min. :3.000 Min. :1.000 \n", " 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000 \n", " Median :0.0000 Median :4.000 Median :2.000 \n", " Mean :0.4062 Mean :3.688 Mean :2.812 \n", " 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000 \n", " Max. :1.0000 Max. :5.000 Max. :8.000 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "summary(mtcars)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# use SQL to select cars with 6-cylinder engines" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": false }, "outputs": [], "source": [ "library(sqldf)\n", "# help(sqldf)" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
mpgcyldisphpdratwtqsecvsamgearcarb
21.0 6 160.0110 3.90 2.62016.460 1 4 4
21.0 6 160.0110 3.90 2.87517.020 1 4 4
21.4 6 258.0110 3.08 3.21519.441 0 3 1
18.1 6 225.0105 2.76 3.46020.221 0 3 1
19.2 6 167.6123 3.92 3.44018.301 0 4 4
17.8 6 167.6123 3.92 3.44018.901 0 4 4
19.7 6 145.0175 3.62 2.77015.500 1 5 6
\n" ], "text/latex": [ "\\begin{tabular}{r|lllllllllll}\n", " mpg & cyl & disp & hp & drat & wt & qsec & vs & am & gear & carb\\\\\n", "\\hline\n", "\t 21.0 & 6 & 160.0 & 110 & 3.90 & 2.620 & 16.46 & 0 & 1 & 4 & 4 \\\\\n", "\t 21.0 & 6 & 160.0 & 110 & 3.90 & 2.875 & 17.02 & 0 & 1 & 4 & 4 \\\\\n", "\t 21.4 & 6 & 258.0 & 110 & 3.08 & 3.215 & 19.44 & 1 & 0 & 3 & 1 \\\\\n", "\t 18.1 & 6 & 225.0 & 105 & 2.76 & 3.460 & 20.22 & 1 & 0 & 3 & 1 \\\\\n", "\t 19.2 & 6 & 167.6 & 123 & 3.92 & 3.440 & 18.30 & 1 & 0 & 4 & 4 \\\\\n", "\t 17.8 & 6 & 167.6 & 123 & 3.92 & 3.440 & 18.90 & 1 & 0 & 4 & 4 \\\\\n", "\t 19.7 & 6 & 145.0 & 175 & 3.62 & 2.770 & 15.50 & 0 & 1 & 5 & 6 \\\\\n", "\\end{tabular}\n" ], "text/plain": [ " mpg cyl disp hp drat wt qsec vs am gear carb\n", "1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 \n", "2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 \n", "3 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 \n", "4 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 \n", "5 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 \n", "6 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 \n", "7 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sqldf('select * from mtcars where cyl=6')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## There are 7 cars matching the query." ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "7" ], "text/latex": [ "7" ], "text/markdown": [ "7" ], "text/plain": [ "[1] 7" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "nrow(sqldf('select * from mtcars where cyl=6'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## We can obtain the same result directly with SQL... but let's not get ahead of ourselves." ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\n", "
count(*)
7
\n" ], "text/latex": [ "\\begin{tabular}{r|l}\n", " count(*)\\\\\n", "\\hline\n", "\t 7\\\\\n", "\\end{tabular}\n" ], "text/plain": [ " count(*)\n", "1 7 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sqldf('select count(*) from mtcars where cyl=6')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# write mtcars dataset to the file system" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df = mtcars\n", "df$name = rownames(df) # move R's rownames into their own column\n", "write.table(df, \"mtcars.csv\", quote=TRUE, row.names=FALSE, col.names=FALSE, sep=\",\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# create sqlite database in terminal\n", "\n", " make mtcars-init\n", "\n", "That will run the following:\n", "\n", " sqlite3 mtcars.sqlite < mtcars-init.sql\n", "\n", "That SQL file does the following:\n", "\n", "- drop a table called results\n", "- create a table called results with columns for the mtcars data\n", "- tell sqlite to load a CSV file\n", "- load the CSV file into the results table" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# access sqlite from R" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "collapsed": false }, "outputs": [], "source": [ "library(RSQLite)\n", "\n", "# connect to the database\n", "db = dbConnect(SQLite(), dbname=\"mtcars.sqlite\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## list the tables that are available" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "'results'" ], "text/latex": [ "'results'" ], "text/markdown": [ "'results'" ], "text/plain": [ "[1] \"results\"" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "dbListTables(db)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Inspect columns in the results table" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
    \n", "\t
  1. 'mpg'
  2. \n", "\t
  3. 'cyl'
  4. \n", "\t
  5. 'disp'
  6. \n", "\t
  7. 'hp'
  8. \n", "\t
  9. 'drat'
  10. \n", "\t
  11. 'wt'
  12. \n", "\t
  13. 'qsec'
  14. \n", "\t
  15. 'vs'
  16. \n", "\t
  17. 'am'
  18. \n", "\t
  19. 'gear'
  20. \n", "\t
  21. 'carb'
  22. \n", "\t
  23. 'name'
  24. \n", "
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 'mpg'\n", "\\item 'cyl'\n", "\\item 'disp'\n", "\\item 'hp'\n", "\\item 'drat'\n", "\\item 'wt'\n", "\\item 'qsec'\n", "\\item 'vs'\n", "\\item 'am'\n", "\\item 'gear'\n", "\\item 'carb'\n", "\\item 'name'\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 'mpg'\n", "2. 'cyl'\n", "3. 'disp'\n", "4. 'hp'\n", "5. 'drat'\n", "6. 'wt'\n", "7. 'qsec'\n", "8. 'vs'\n", "9. 'am'\n", "10. 'gear'\n", "11. 'carb'\n", "12. 'name'\n", "\n", "\n" ], "text/plain": [ " [1] \"mpg\" \"cyl\" \"disp\" \"hp\" \"drat\" \"wt\" \"qsec\" \"vs\" \"am\" \"gear\"\n", "[11] \"carb\" \"name\"" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "dbListFields(db, \"results\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# use SQL to select cars with 8-cylinder engines - directly from database" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
mpgcyldisphpdratwtqsecvsamgearcarbname
18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 Hornet Sportabout
14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 Duster 360
16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 Merc 450SE
17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 Merc 450SL
15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 Merc 450SLC
10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 Cadillac Fleetwood
10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 Lincoln Continental
14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 Chrysler Imperial
15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 Dodge Challenger
15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 AMC Javelin
13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 Camaro Z28
19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 Pontiac Firebird
15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 Ford Pantera L
15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 Maserati Bora
\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllll}\n", " mpg & cyl & disp & hp & drat & wt & qsec & vs & am & gear & carb & name\\\\\n", "\\hline\n", "\t 18.7 & 8 & 360.0 & 175 & 3.15 & 3.440 & 17.02 & 0 & 0 & 3 & 2 & Hornet Sportabout \\\\\n", "\t 14.3 & 8 & 360.0 & 245 & 3.21 & 3.570 & 15.84 & 0 & 0 & 3 & 4 & Duster 360 \\\\\n", "\t 16.4 & 8 & 275.8 & 180 & 3.07 & 4.070 & 17.40 & 0 & 0 & 3 & 3 & Merc 450SE \\\\\n", "\t 17.3 & 8 & 275.8 & 180 & 3.07 & 3.730 & 17.60 & 0 & 0 & 3 & 3 & Merc 450SL \\\\\n", "\t 15.2 & 8 & 275.8 & 180 & 3.07 & 3.780 & 18.00 & 0 & 0 & 3 & 3 & Merc 450SLC \\\\\n", "\t 10.4 & 8 & 472.0 & 205 & 2.93 & 5.250 & 17.98 & 0 & 0 & 3 & 4 & Cadillac Fleetwood \\\\\n", "\t 10.4 & 8 & 460.0 & 215 & 3.00 & 5.424 & 17.82 & 0 & 0 & 3 & 4 & Lincoln Continental\\\\\n", "\t 14.7 & 8 & 440.0 & 230 & 3.23 & 5.345 & 17.42 & 0 & 0 & 3 & 4 & Chrysler Imperial \\\\\n", "\t 15.5 & 8 & 318.0 & 150 & 2.76 & 3.520 & 16.87 & 0 & 0 & 3 & 2 & Dodge Challenger \\\\\n", "\t 15.2 & 8 & 304.0 & 150 & 3.15 & 3.435 & 17.30 & 0 & 0 & 3 & 2 & AMC Javelin \\\\\n", "\t 13.3 & 8 & 350.0 & 245 & 3.73 & 3.840 & 15.41 & 0 & 0 & 3 & 4 & Camaro Z28 \\\\\n", "\t 19.2 & 8 & 400.0 & 175 & 3.08 & 3.845 & 17.05 & 0 & 0 & 3 & 2 & Pontiac Firebird \\\\\n", "\t 15.8 & 8 & 351.0 & 264 & 4.22 & 3.170 & 14.50 & 0 & 1 & 5 & 4 & Ford Pantera L \\\\\n", "\t 15.0 & 8 & 301.0 & 335 & 3.54 & 3.570 & 14.60 & 0 & 1 & 5 & 8 & Maserati Bora \\\\\n", "\\end{tabular}\n" ], "text/plain": [ " mpg cyl disp hp drat wt qsec vs am gear carb name \n", "1 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 Hornet Sportabout \n", "2 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 Duster 360 \n", "3 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 Merc 450SE \n", "4 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 Merc 450SL \n", "5 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 Merc 450SLC \n", "6 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 Cadillac Fleetwood \n", "7 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 Lincoln Continental\n", "8 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 Chrysler Imperial \n", "9 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 Dodge Challenger \n", "10 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 AMC Javelin \n", "11 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 Camaro Z28 \n", "12 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 Pontiac Firebird \n", "13 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 Ford Pantera L \n", "14 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 Maserati Bora " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "dbGetQuery(conn = db, \"select * from results where cyl=8\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.3.1" } }, "nbformat": 4, "nbformat_minor": 1 }