Anyway I thought I would here write down some of the tools I use in R, python and julia to connect to and interact with such DBs. As I use R on an everyday basis I'll start with it. There are two basic paths in R to connect to DBs in general: DBI and OCDB. I use DBI as it is the first I discovered, but honestly OCDB is as good. In practice one will load the flavoured-DBI package that match the SQL-flavour of the DB one uses: RMySQL, RSQLite, RPostgreSQL, etc. All of them uses DBI in the background and all of them uses the same grammar. Which is:
library(RPostgreSQL)
# First we connect to the database
driver <- dbDriver("PostgreSQL")
host <- "212.201.100.111"
database <- "nsb"
username <- "guest"
password <- "arm_aber_sexy"
con <- dbConnect(driver, host = host, port = "5432", dbname = database,
user = username, password = password)
# Of course one can do that as a oneliner:
con <- dbConnect(dbDriver("PostgreSQL"), host = "212.201.100.111", port = "5432", dbname = "nsb",
user = "guest", password = "arm_aber_sexy")
# Then we can look at which tables are present:
dbListTables(con)
# Read one of them fully:
nhs <- dbReadTable(con, "neptune_hole_summary")
# Or one can send a query:
dbSendQuery(con, "UPDATE whatever SET field=value WHERE other_field='VALUE';") #Dummy code as guest do not have writing permissions anyway
# Or get data:
dbGetQuery(con, "SELECT hole_id FROM neptune_hole_summary WHERE ocean='ANT';")
# And of course one can do batch operations:
sapply(sprintf("SELECT hole_id FROM neptune_hole_summary WHERE leg=%i;", 1:4), function(x) dbGetQuery(con, x))
This is a very simple paradigm. Python uses a slightly different one though. In psycopg2 for instance one creates an engine, a connexion and a cursor:
import psycopg2 as pg
# First create an engine to connect to the db:
host = "212.201.100.111"
database = "nsb"
username = "guest"
password = "arm_aber_sexy"
engine = "host=%s dbname=%s port=5432 user=%s password=%s" % (host, database, username, password)
# Establish a connection
con = pg.connect(engine)
# Create a "cursor" that will send a query to the db
cur = con.cursor()
# Execute a query in SQL:
cur.execute("SELECT hole_id, sample_depth_mbsf FROM neptune_sample;")
# Fetch the results
cur.execute(query)
res = cur.fetchall()
# Then close the cursor (should be done right away)
cur.close()
# Here res is a list of unnamed tuples, i. e. in that case [("113_689B",23.456), ...]
# which is not always ideal.
# To get a list of dictionaries (i. e. [{'hole_id'; '113_689B', sample_depth_mbsf: 23.456}, ...])
# You need to do the following
cur = con.cursor(cursor_factory=ex.RealDictCursor)
query = "SELECT hole_id, sample_depth_mbsf FROM samples;"
cur.execute(query)
res = cur.fetchall()
cur.close()
# If you want to do an UPDATE or INSERT query, here is a way to do that:
input = [{'site': 511, 'geochemistry': 'not done yet'}, {'site': 703, 'geochemistry': 'not planned'}]
cur = con.cursor()
cur.executemany('UPDATE whatever SET geochemistry=%(geochemistry)s WHERE site=%(site)s;', input) #Dummy code here
con.commit() #Note that it is con.commit and not cur.commit!
cur.close()
#Eventually when everything is finished, close the connection:
con.close()
An alternative is to use sqlalchemy as it is neatly connected to pands:
import pandas as pd
import pandas.io.sql as psql
from sqlalchemy import create_engine
#The process is the same.
#First create an engine to connect:
theEngine = "postgresql://%s:%s@%s/%s" % (username, password, host, database)
engine = create_engine(theEngine, client_encoding='utf8')
engine.connect()
query = "SELECT hole_id, sample_depth_mbsf FROM neptune_sample;"
res = psql.read_sql_query(query, engine)
#Here res is a pandas dataframe with columns named after the columns called from the table.
#I'm not as familiar with sqlalchemy but the way I do UPDATE and INSERT queries are as follow:
input = [{'site': 511, 'geochemistry': 'not done yet'}, {'site': 703, 'geochemistry': 'not planned'}]
with engine.begin() as con: #Here this syntax allow us to open a 'cursor' without having to explicitely close it.
#Using dictionary comprehension and printf syntax:
queries = ["UPDATE whatever WITH geochemistry='%s' WHERE site=%s;" % (k['geochemistry'], k['site']) for k in input] #Dummy code of course
for q in queries: #As far as I know they need to be sent one by one.
con.execute(q)
#And close the connection:
engine.dispose()
Finally, in Julia. I am very new to Julia so do not expect a very good piece of code here, but I did write a function recently connecting to the DB to compute sample ages, so here we go:
using LibPQ
using DataFrames
host = "212.201.100.111"
database = "nsb"
username = "guest"
password = "arm_aber_sexy"
con = LibPQ.Connection("dbname=$database host=$host port=5432 user=$username password=$password")
nhs = DataFrame(execute(con, "SELECT hole_id, sample_depth_mbsf FROM neptune_sample;"))
close(con)