This month I finally uploaded to Zenodo a stack of backups I had of the Neptune database. The aim, in addition to making the database available at all times, even when our servers are down (as they are right now), is to make it compatible with chronosphere,
which will, among other nice things, allow people to work with past versions of the database in order to reproduce the results of old papers.
One of the constraints for this move was to provide an SQLite version of the database, which makes it considerably easier to manipulate with R. Neptune, however, is a PostgreSQL database, so a bit of gymnastics was needed.
The backups I had were saved as standalone files using the defaults of pg_dump and pg_dumpall, which was unfortunate as those defaults are definitely not compatible with SQLite.
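To illustrate the incompatibility: by default pg_dump exports table contents as COPY ... FROM stdin blocks, which SQLite does not understand, whereas the --inserts flag used below produces plain INSERT statements instead (table and column names here are invented for illustration):

```sql
-- Default pg_dump output: a COPY block, not valid in SQLite
COPY public.some_table (id, value) FROM stdin;
1	foo
2	bar
\.

-- With --inserts: plain SQL statements that SQLite can execute
INSERT INTO some_table (id, value) VALUES (1, 'foo');
INSERT INTO some_table (id, value) VALUES (2, 'bar');
```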
# The first step is to load the globals locally (i.e. the user accounts),
# since in PostgreSQL tables are saved along with their owner
system("psql -h localhost -U postgres -w < globals.sql")
# Then recreate an empty local copy of the database to restore into
system("psql -h localhost -U postgres -w -c 'DROP DATABASE nsb;'")
system("psql -h localhost -U postgres -w -c 'CREATE DATABASE nsb;'")
# Restore the original backup into it
system('psql -h localhost -U postgres -w -d "nsb" < nsb.sql')
# Resave it in a way that is compatible with SQLite, i.e. with INSERT statements
# and without the table-owner business.
# Also we do not want to keep Django's tables
system('/usr/local/bin/pg_dump -COx --inserts -h localhost -U postgres -w nsb \
-T public.triton -T public.auth* -T public.django* \
-f "nsb_simplified.sql"', intern=TRUE)
# Then we read it into R and split it into statements
sql <- readLines("nsb_simplified.sql")
sql <- sql[!grepl("^--", sql)]  # drop SQL comment lines
sql <- sql[sql != ""]           # drop empty lines
# el() extracts the first element of the list, i.e. the same as [[1]]
statements <- el(strsplit(paste(sql, collapse="\n"), ";\n"))
# Then we need to clean up PostgreSQL idiosyncrasies. The first one is the use of schemas:
# in Neptune we only use the default "public" schema
statements <- gsub("public\\.","",statements)
#Also, for what will follow, it is easier if we get rid of leading newlines
statements <- gsub("^\n+","",statements)
The next step is very dependent on the database design and thus needs to be tailored accordingly.
# One issue is casting, which needs to be removed, like this:
statements <- gsub("::character varying","",statements)
# Then the DEFAULT values in CREATE TABLE statements are rarely portable:
statements <- gsub("DEFAULT nextval\\(.+::regclass\\) NOT NULL,","NOT NULL,",statements)
# Similarly, complex CONSTRAINTs do not (AFAIK) exist in SQLite
statements <- gsub(",\n CONSTRAINT .+\\)$",")",statements)
# And this one was tough to figure out: "index" is a reserved keyword in SQLite, so it needs to be escaped:
statements <- gsub(" index "," `index` ",statements)
Finally, once sanitized, we only need to grab the CREATE TABLE and INSERT INTO statements and copy them to a file that we will read into sqlite3. Anything else (TRIGGERs, FUNCTIONs, VIEWs, etc.) does not exist as such in SQLite, so we can't port it.
tab <- grepl("^CREATE TABLE",statements)
ins <- grepl("^INSERT INTO",statements)
# The following contains the foreign keys based on multiple columns, which can't be used here,
# but we might have a look at them later to recreate them one way or another
alt <- grepl("^ALTER TABLE",statements)
# Finally we save everything to a file, wrapped in a transaction (BEGIN ... END).
# Note that sep supplies the semicolon after each statement, including after BEGIN.
cat("BEGIN", statements[tab], statements[ins], "END;", sep=";\n", file="temp.sql")
system("sqlite3 nsb.sqlite < temp.sql")
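As a quick sanity check, and since the whole point was to make the database easier to use from R, we can open the resulting file and list its tables. This is a minimal sketch assuming the DBI and RSQLite packages are installed; the table name in the commented query is hypothetical:

```r
library(DBI)
# Connect to the freshly created SQLite file
con <- dbConnect(RSQLite::SQLite(), "nsb.sqlite")
# List the tables that made it through the conversion
dbListTables(con)
# ...and optionally try a simple query on one of them, e.g.:
# dbGetQuery(con, "SELECT count(*) FROM neptune_sample")  # table name hypothetical
dbDisconnect(con)
```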