Using R code in Julia and Python

2021-03-13

As we are moving toward the data analysis part of our project, I am trying to teach my working group how to use the stack of R functions I wrote during the past decade. The issue is however that R does not seem to be a popular option among my working group and that they prefer using either Python or Julia. As translating a decade-worth of code into those two languages is a bit of a bummer, I've been looking at how to read and use R functions in Julia and Python instead.

Julia has, in particular, a very intuitive and well-done package to achieve this: RCall.jl:

``````using RCall
R""" #Macro to make operations in an R environment
x <- c(0,1,2)
a <- 2
"""
X = @rget x # moving object from the R environemnt to Julia
y = X .+ randn(3)
@rput y # moving object from julia to the R environment
R"z <- y + rpois(3,2)"
@rget z

# Skipping the @rput step using the dollar sign notation
x = randn(10)
R"y <- mean(\$x)"
@rget y

# Concrete example using a function I wrote in R to compute Chao1 estimator and its Confidence Interval
R"""
chao1 <- function(mat){
a <- sum(mat==1) #Number of singletons
b <- sum(mat==2) #Number of doubletons
S <- sum(mat>0)  #Number of species
chao1 <- S+(a^2)/(2*b)
d <- a/b
varS1 <- b*((d/4)^4+d^3+(d/2)^2)
t <- chao1-S
k <- exp(1.96*sqrt(log(1+varS1/t^2)))
c(S+t/k, chao1, S+t*k)
}
"""
#Alternatively
R"source('path/to/chao1.R')"

sample = [1, 42, 0, 3, 23, 2, 1, 1, 2, 5, 6, 0, 123, 9] #Made-up sample
res = R"chao1(\$sample)"
``````

It's very easy to use, and the conversion between Julia and R objects is a no-brainer, even with dataframes. In the following example I take a species-accumulation curve (here) and fit a curve (using de Caprariis et al. formula) to estimate its asymptote (the results of the function is a list containing the fitted curve as a data.frame and a named vector containing the parameters of the function along with fit quality measures).

``````using CSV
using DataFrames
R"""
caprariis <- function(SAC){
caprariis.misfit <- function(parametres,x){
Smax <- parametres
b <- parametres
FIT <- c()
misfit <- 0
for (i in 1:nrow(x)){
FIT[i] <- (Smax*i)/(b+i)
misfit <- sum(abs(FIT[i]-x[i,2]))+misfit
}
return(misfit)
}
OPT <- optim(c(50,10),caprariis.misfit,method="BFGS",x=SAC,control=list(trace=1))
Smax <- OPT\$par
b <- OPT\$par
FIT <- c()
caprar <- list()
misfit <- OPT\$value
for (i in 1:nrow(SAC)) FIT[i] <- (Smax*i)/(b+i)
caprar\$Curve.fitting <- cbind(SAC,FIT)
colnames(caprar\$Curve.fitting) <- c("N","SAC","Fitting")
pearson <- cor(FIT,SAC[,2])
pearson.squared <- pearson^2
all <- c(Smax,b,misfit,pearson,pearson.squared)
names(all) <- c("Smax","b","Misfit","Pearson","Pearson squared")
caprar\$Summary <- all
return(caprar)
}
"""
R"caprariis(\$sac)"``````

In Python, using rpy2, it's still very easy (though a bit less elegant than with Julia I must say):

``````import rpy2.robjects as robjects

# As in Julia the object rpy2.robjects.r contains a queryable R environment
robjects.r('''
chao1 <- function(mat){
a <- sum(mat==1) #Number of singletons
b <- sum(mat==2) #Number of doubletons
S <- sum(mat>0)  #Number of species
chao1 <- S+(a^2)/(2*b)
d <- a/b
varS1 <- b*((d/4)^4+d^3+(d/2)^2)
t <- chao1-S
k <- exp(1.96*sqrt(log(1+varS1/t^2)))
c(S+t/k,chao1,S+t*k)
}
''')
#Or read directly from the script
robjects.r('source(/path/to/chao1.R)')
#Or:
robjects.r.source("/path/to/chao1.R")

sample = [1, 42, 0, 3, 23, 2, 1, 1, 2, 5, 6, 0, 123, 9]
r_sample = robjects.IntVector(sample) #The conversion in python needs to be explicit
chao1 = robjects.r['chao1']
chao1(r_sample)``````

The explicit conversion is a bit annoying, however when using pandas and dataframes, one can bypass that and make it way easier:

``````import pandas
from rpy2.robjects import pandas2ri
pandas2ri.activate()
robjects.r.source("/path/to/caprariis.R")
cap = robjects.r['caprariis']