Using R code in Julia and Python

2021-03-13

As we are moving toward the data analysis part of our project, I am trying to teach my working group how to use the stack of R functions I wrote during the past decade. The issue is however that R does not seem to be a popular option among my working group and that they prefer using either Python or Julia. As translating a decade-worth of code into those two languages is a bit of a bummer, I've been looking at how to read and use R functions in Julia and Python instead.

Julia has, in particular, a very intuitive and well-done package to achieve this: RCall.jl:

using RCall
R""" #Macro to make operations in an R environment
x <- c(0,1,2)
a <- 2
"""
X = @rget x # moving object from the R environemnt to Julia
y = X .+ randn(3)
@rput y # moving object from julia to the R environment
R"z <- y + rpois(3,2)"
@rget z

# Skipping the @rput step using the dollar sign notation
x = randn(10)
R"y <- mean($x)"
@rget y

# Concrete example using a function I wrote in R to compute Chao1 estimator and its Confidence Interval
R"""
chao1 <- function(mat){
  a <- sum(mat==1) #Number of singletons
  b <- sum(mat==2) #Number of doubletons
  S <- sum(mat>0)  #Number of species
  chao1 <- S+(a^2)/(2*b)
  d <- a/b
  varS1 <- b*((d/4)^4+d^3+(d/2)^2)
  t <- chao1-S
  k <- exp(1.96*sqrt(log(1+varS1/t^2)))
  c(S+t/k, chao1, S+t*k)
}
"""
#Alternatively
R"source('path/to/chao1.R')"

sample = [1, 42, 0, 3, 23, 2, 1, 1, 2, 5, 6, 0, 123, 9] #Made-up sample
res = R"chao1($sample)"

It's very easy to use, and the conversion between Julia and R objects is a no-brainer, even with dataframes. In the following example I take a species-accumulation curve (here) and fit a curve (using de Caprariis et al. formula) to estimate its asymptote (the results of the function is a list containing the fitted curve as a data.frame and a named vector containing the parameters of the function along with fit quality measures).

using CSV
using DataFrames
sac = CSV.read("sac.csv", DataFrame, header=1, delim="\t")
R"""
caprariis <- function(SAC){
         caprariis.misfit <- function(parametres,x){
           Smax <- parametres[1]
           b <- parametres[2]
           FIT <- c()
           misfit <- 0
           for (i in 1:nrow(x)){
             FIT[i] <- (Smax*i)/(b+i)
             misfit <- sum(abs(FIT[i]-x[i,2]))+misfit
           }
           return(misfit)
         }
         OPT <- optim(c(50,10),caprariis.misfit,method="BFGS",x=SAC,control=list(trace=1))
         Smax <- OPT$par[1]
         b <- OPT$par[2]
         FIT <- c()
         caprar <- list()
         misfit <- OPT$value
         for (i in 1:nrow(SAC)) FIT[i] <- (Smax*i)/(b+i)
         caprar$Curve.fitting <- cbind(SAC,FIT)
         colnames(caprar$Curve.fitting) <- c("N","SAC","Fitting")
         pearson <- cor(FIT,SAC[,2])
         pearson.squared <- pearson^2
         all <- c(Smax,b,misfit,pearson,pearson.squared)
         names(all) <- c("Smax","b","Misfit","Pearson","Pearson squared")
         caprar$Summary <- all
         return(caprar)
       }
"""
R"caprariis($sac)"

In Python, using rpy2, it's still very easy (though a bit less elegant than with Julia I must say):

import rpy2.robjects as robjects

# As in Julia the object rpy2.robjects.r contains a queryable R environment
robjects.r('''
chao1 <- function(mat){
  a <- sum(mat==1) #Number of singletons
  b <- sum(mat==2) #Number of doubletons
  S <- sum(mat>0)  #Number of species
  chao1 <- S+(a^2)/(2*b)
  d <- a/b
  varS1 <- b*((d/4)^4+d^3+(d/2)^2)
  t <- chao1-S
  k <- exp(1.96*sqrt(log(1+varS1/t^2)))
  c(S+t/k,chao1,S+t*k)
}
''')
#Or read directly from the script
robjects.r('source(/path/to/chao1.R)')
#Or:
robjects.r.source("/path/to/chao1.R")

sample = [1, 42, 0, 3, 23, 2, 1, 1, 2, 5, 6, 0, 123, 9]
r_sample = robjects.IntVector(sample) #The conversion in python needs to be explicit
chao1 = robjects.r['chao1']
chao1(r_sample)

The explicit conversion is a bit annoying, however when using pandas and dataframes, one can bypass that and make it way easier:

import pandas
from rpy2.robjects import pandas2ri
pandas2ri.activate()
robjects.r.source("/path/to/caprariis.R")
cap = robjects.r['caprariis']
sac = pandas.read_csv("sac.csv", delimiter="\t", header=1)
res = cap(sac)

In the other direction, I have been using package reticulate to use Python code in R. Maybe I'll also write a post one day about it. Anyway I'm glad to see all my legacy code can still be used even by non-R-programmers.