rTASSEL Benchmarks
Brandon Monier
2023-11-08
Source: vignettes/rtassel_benchmarks.Rmd
Methods
Several data sets were used to produce the benchmarking results. Genotypic and phenotypic maize data consisting of 279 samples, 3,093 variant sites, and 1 measured trait were used to time variant call format (VCF) import, generalized linear model (GLM) association, mixed linear model (MLM) association, and kinship generation (Flint-Garcia et al., 2005). To illustrate the effectiveness of the fast association method, 100 simulated RNA expression traits were used with the same genotype data. The trait data were generated using the makeExampleDESeqDataSet() function from the R package DESeq2 (Love et al., 2014). A larger genotypic data set consisting of 1,210 samples and 2,255,405 variant sites was also used to time large VCF import and kinship generation. All benchmarks were generated using the microbenchmark() function from the R package microbenchmark (Mersmann, 2019).
All benchmarks except the large VCF import and kinship generation benchmarks were evaluated 100 times and recorded on a workstation with 16 GB of RAM and 4 cores on an Intel® Core™ i5-6500 CPU with a clock speed of 3.20 GHz. The large VCF import and kinship generation benchmarks were evaluated 10 times and recorded on a workstation with 256 GB of RAM and 12 cores on an Intel® Xeon® CPU E5-2643 v3 with a clock speed of 3.40 GHz.
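The timing procedure described above can be sketched roughly as follows. This is a minimal illustration, not the code used to produce the reported results: the genotype matrix is randomly generated, and `toy_kinship()` is a hypothetical base-R stand-in for the rTASSEL step being timed. Only the sample count, site count, and number of evaluations are taken from the text.

```r
# Sketch only: a stand-in workload, not the rTASSEL calls benchmarked here.
library(microbenchmark)

set.seed(1)
n_samples <- 279   # samples in the small maize data set
n_sites   <- 3093  # variant sites in the small maize data set

# Hypothetical genotype matrix coded as 0/1/2 allele counts (random data)
geno <- matrix(
    sample(0:2, n_samples * n_sites, replace = TRUE),
    nrow = n_samples
)

# Hypothetical stand-in for a kinship step:
# cross-product of column-centered genotypes, scaled by the number of sites
toy_kinship <- function(g) {
    centered <- scale(g, center = TRUE, scale = FALSE)
    tcrossprod(centered) / ncol(g)
}

# Evaluate the expression 100 times, as for the small-data benchmarks
bench <- microbenchmark(
    kinship = toy_kinship(geno),
    times = 100L
)

# Summarize timings in milliseconds (min, lq, mean, median, uq, max, neval)
res <- summary(bench, unit = "ms")
print(res)
```

For the large-data benchmarks, the same pattern would apply with `times = 10L`. In a real run, the expression passed to `microbenchmark()` would be the corresponding rTASSEL import, association, or kinship call rather than the placeholder function.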