
Installing rTASSEL
Brandon Monier
2025-04-23
Source: vignettes/rtassel_installation.Rmd
Prerequisite - installing rJava
Since TASSEL is written primarily in Java, a Java JDK will need to be installed on your machine. Additionally, for R to communicate with Java, the R package rJava will need to be installed. In order to use rTASSEL, ensure that you have:
- A JDK (Java Development Kit 8) installed on your system.
- Your system environment variable JAVA_HOME configured appropriately and pointing to your JDK of choice. This will usually be included in your PATH environment variable as well. Options and system environment variables that are available from R can be seen with Sys.getenv() and, more specifically, Sys.getenv("JAVA_HOME").
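The checklist above can be verified from within an R session. The following is a minimal sketch using only base R; the helper name check_java_setup is hypothetical and is not part of rTASSEL or rJava.

```r
# Hypothetical helper (not part of rTASSEL or rJava): summarise the state of
# the Java prerequisites using base R only.
check_java_setup <- function() {
    java_home <- Sys.getenv("JAVA_HOME")
    list(
        java_home        = java_home,
        # Is JAVA_HOME set to a non-empty value?
        java_home_set    = nzchar(java_home),
        # Does JAVA_HOME point to an existing directory?
        java_home_exists = nzchar(java_home) && dir.exists(java_home),
        # Can a java executable be found on the PATH?
        java_on_path     = unname(nzchar(Sys.which("java"))),
        # Is rJava installed? (system.file() does not load the package)
        rjava_installed  = nzchar(system.file(package = "rJava"))
    )
}

str(check_java_setup())
```

Any FALSE entry suggests which of the prerequisites to revisit before installing rTASSEL.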
NOTE: If you are using a UNIX system (e.g. Ubuntu) and are experiencing issues, you may need to reconfigure R with Java. To perform this, open a terminal and enter the command:
R CMD javareconf
You may need root privileges to do this; if so, prefix the prior command with sudo.
If you need additional steps on how to perform these actions, detailed information can be found using the following links, depending on your OS:
Install from GitHub
Building with vignettes
After you have rJava up and running on your machine, install rTASSEL from the source code in our GitHub repository using the devtools package. Here, we show how you can install the package and build vignettes locally:
if (!require("devtools")) install.packages("devtools")
devtools::install_github(
    repo = "maize-genetics/rTASSEL",
    ref = "master",
    build_vignettes = TRUE,
    dependencies = TRUE
)
The dependencies = TRUE parameter will have to be set if you do not have the suggested packages described in the DESCRIPTION file of this package.
Building without vignettes
If you do not wish to build vignettes, the prior method can be simplified as shown below:
if (!require("devtools")) install.packages("devtools")
devtools::install_github("maize-genetics/rTASSEL")
Loading rTASSEL
After the source code has been compiled, the package can be loaded using:
library(rTASSEL)
## Welcome to rTASSEL (version 0.10.0)
## ℹ Running TASSEL version 5.2.96
## ℹ Consider starting a TASSEL log file (see ?startLogger())
Or, if you want to use a function without attaching the package to your environment, you can use rTASSEL::<function>, where <function> is an rTASSEL function.
Running from Docker
If you wish to run a containerized version of rTASSEL, we also have a Docker image available. This can be retrieved from DockerHub using the following command:
docker pull maizegenetics/rtassel:latest
With the terminal
Once downloaded, you can run rTASSEL from a terminal window:
docker run --rm -ti maizegenetics/rtassel R
With RStudio Server
This image also contains an RStudio Server instance. To run this, you will need to publish the container’s port(s) to the host (-p). For example:
docker run --rm -ti -p 8787:8787 maizegenetics/rtassel
From here, you can go to localhost:8787 in a web browser and enter a:
- Username (by default, this will be rstudio)
- Password (this will be a randomly generated password displayed in the terminal output)
Setting rTASSEL/Java memory
Local overview
Since rTASSEL leverages the TASSEL 5 Java API via the rJava package, it is important to allocate sufficient memory to the Java Virtual Machine (JVM) before it starts. This is done using the options(java.parameters = "-Xmx...") command in R, which sets JVM parameters such as the maximum heap size (e.g., -Xmx4g for 4 GB). This must be set before loading rTASSEL because the JVM can only be configured at startup. Once initialized, its memory settings cannot be changed without restarting the R session. This becomes especially important when working with large datasets or computationally intensive method calls, which can quickly exceed the default memory allocation and lead to OutOfMemoryErrors. Increasing the available heap space proactively ensures that Java operations can run efficiently and without interruption due to memory constraints.
In short, if you are loading large genotype and/or phenotype datasets, it is imperative that you specify the memory to allocate before loading the rTASSEL package via the options() function:
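As a minimal sketch of this ordering, assuming that a 4 GB heap suits your data (the value is only an illustration) and that rTASSEL is installed before the library() call is uncommented:

```r
# Set the JVM heap size first. This has no effect if the JVM is already
# running in the current R session.
options(java.parameters = "-Xmx4g")

# Confirm the value that will be read when the JVM is started
getOption("java.parameters")

# Only now attach the package (requires rTASSEL to be installed):
# library(rTASSEL)
```

If the package has already been loaded in the session, restart R before running these lines so that the setting can take effect.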
Running rTASSEL on RStudio Server
On some RStudio Server instances, particularly on computing clusters, what you specify in the prior example (i.e., running the options() function before loading rTASSEL) can be overridden, because the JVM may already be initialized by the time options() is called in the newly started R session. If the JVM is already initialized, any value provided to the java.parameters key in the options() call will be silently ignored. To prevent this from happening, set up a .Rprofile configuration file containing the aforementioned options() call:
## Example .Rprofile entry
# Allocate 4 GB of memory to the JVM
options(java.parameters = "-Xmx4g")
Since setting up a .Rprofile configuration is out of scope for this package, please refer to Posit’s excellent write-up on the subject.
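For illustration only, the entry could also be added programmatically. The sketch below uses a hypothetical helper, add_jvm_option, which is not part of rTASSEL, and demonstrates it on a temporary file rather than on a real profile.

```r
# Hypothetical helper: append a java.parameters entry to a profile file
# unless the file already mentions java.parameters.
# `path` and `heap` are illustrative arguments.
add_jvm_option <- function(path, heap = "4g") {
    entry <- sprintf('options(java.parameters = "-Xmx%s")', heap)
    existing <- if (file.exists(path)) readLines(path, warn = FALSE) else character(0)
    if (any(grepl("java.parameters", existing, fixed = TRUE))) {
        return(invisible(FALSE))  # leave an existing setting untouched
    }
    writeLines(c(existing, entry), path)
    invisible(TRUE)
}

# Demonstrate on a temporary file instead of a real .Rprofile
demo_profile <- tempfile(fileext = ".Rprofile")
add_jvm_option(demo_profile, heap = "4g")
readLines(demo_profile)
```

Which profile file your server actually reads at startup depends on its configuration, so check that before pointing such a helper at a real path.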
Helpful tips
Verify memory has been set
If you are running into OutOfMemory exceptions, first verify that enough memory was specified via the prior options() call and that the setting took effect at startup. By default, rJava will allocate 500 MB (0.5 GB) of memory to your session. At any time during your R session, you can report the memory allocated to the JVM using a couple of rJava calls:
# Get a handle to the Java Runtime class
runtime <- rJava::.jcall("java/lang/Runtime", "Ljava/lang/Runtime;", "getRuntime")
# Get the maximum heap size (reported in gigabytes [GB]). maxMemory() reflects
# the -Xmx limit, whereas totalMemory() reports only the heap reserved so far.
gbConv <- 1024^3 # bytes per GB (roughly 1.07e9)
totMem <- round(rJava::.jcall(runtime, "J", "maxMemory") / gbConv, 3)
# Show in console
totMem
If the java.parameters value in options() was set up properly, the value reported in totMem should match the value you specified in options().
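To make that comparison easier, the -Xmx flag can be converted to gigabytes. This is a sketch using base R only; xmx_to_gb is a hypothetical helper name, and it assumes the flag has the simple form -Xmx<number><unit> with an optional unit of k, m, or g.

```r
# Hypothetical helper: convert an "-Xmx" flag such as "-Xmx4g" or "-Xmx512m"
# into gigabytes so that it can be compared with totMem.
xmx_to_gb <- function(params = getOption("java.parameters")) {
    flag <- grep("^-Xmx", params, value = TRUE)
    if (length(flag) == 0) return(NA_real_)  # no -Xmx flag was set
    flag <- flag[length(flag)]               # use the last -Xmx entry if several are given
    m <- regmatches(flag, regexec("^-Xmx([0-9]+)([kKmMgG]?)$", flag))[[1]]
    if (length(m) == 0) return(NA_real_)     # unrecognised format
    unit_bytes <- c(k = 1024, m = 1024^2, g = 1024^3)
    unit <- tolower(m[3])
    multiplier <- if (nzchar(unit)) unit_bytes[[unit]] else 1
    as.numeric(m[2]) * multiplier / 1024^3
}

xmx_to_gb("-Xmx4g")    # 4
xmx_to_gb("-Xmx512m")  # 0.5
```

Called without arguments, the helper reads the current java.parameters option and returns NA if no -Xmx flag is present.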
Ensure enough memory is allocated
In most instances, genotype data will be the main determinant of how much memory you should allocate to the JVM. As a rule of thumb, the amount of memory you should allocate for genotype data is at least:
(# taxa) * (# sites) * (1 byte)
For example, if you have genotype data consisting of 250 taxa and 3000 sites, this would be 750000 bytes or 0.75 megabytes (MB).
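The same arithmetic can be written as a small R function. This is a sketch of the rule of thumb above; estimate_genotype_bytes is a hypothetical helper name, not an rTASSEL function, and the result is a lower bound rather than a recommended -Xmx value.

```r
# Hypothetical helper: lower bound on the memory needed for genotype data,
# following (# taxa) * (# sites) * (1 byte).
estimate_genotype_bytes <- function(n_taxa, n_sites) {
    as.numeric(n_taxa) * as.numeric(n_sites) * 1
}

bytes <- estimate_genotype_bytes(250, 3000)
bytes        # 750000 bytes
bytes / 1e6  # 0.75 megabytes (MB)
```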
Prior issues and possible resolutions
Problems installing rJava on macOS with M1 CPU architecture
If you are running into issues installing rJava on the newer Mac chip architecture, note that Oracle JDK currently (as of writing this) does not work. Consider an alternative JDK source such as OpenJDK or Azul JDK.
More detailed information about a possible workaround can be found in this Stack Overflow post.
Problems installing if you have both 32- and 64-bit architecture installed for R
If you are using a machine that has both architectures installed for R, you might run into problems pulling code using devtools. If this is the case, one solution is to pass the --no-multiarch option via INSTALL_opts. This will force building the package for your currently running R version:
devtools::install_github(
    repo = "maize-genetics/rTASSEL",
    ref = "master",
    build_vignettes = FALSE,
    INSTALL_opts = "--no-multiarch"
)
Problems with rJava if you have upgraded Java
On macOS: if you previously had rJava working through RStudio, then upgraded your Java and it no longer works, try the following:
At the command line type:
R CMD javareconf
Then check for a left over symbolic link via:
ls -ltr /usr/local/lib/libjvm.dylib
If the link exists, remove it, then create it fresh via these commands:
rm /usr/local/lib/libjvm.dylib
sudo ln -s $(/usr/libexec/java_home)/lib/server/libjvm.dylib /usr/local/lib
You should now be able to enter RStudio and set up rJava.