GeneDiscoveR enables the statistical association of the presence or absence of coding sequences with multiple treatments or phenotypes through the identification of orthologs/homologs from multiple tools. To perform the association and identification, you can follow the steps indicated in the pipeline:

  • Prepare input files: These can be the results of a single OrthoFinder run or, as recommended, of multiple OrthoFinder runs that vary the inflation parameter of the MCL clustering algorithm. Varying the inflation allows appropriate identification of orthogroups for species with significant evolutionary divergence, such as liverworts. Scripts for this can be found in /Python and /bash;
  • Import ortholog definitions and phenotype or treatment table: Starting from ortholog definitions produced by OrthoFinder or obtained from Phytozome-inParanoiDB, together with a table of species IDs and their treatments or phenotypes, you can begin the analyses with GeneDiscoveR;
  • Select the inflation value (I): If you performed multiple runs of OrthoFinder, as recommended, you can select the inflation value that represents a balance between clustering and partitioning. Additionally, consider the root clade of the inferred species tree;
  • Group species by treatment or phenotype: Define the phenotypes for subsequent statistical detection of orthogroups. The phenotypes are listed in the input information table;
  • Statistical detection of orthogroups associated with a phenotype or treatment: A statistical test is performed to evaluate the association of the presence or absence of each orthogroup with the treatments or phenotypes under study;
  • Annotation mapping, visualization, and export: GeneDiscoveR supports functional annotation mapping and provides versatile visualizations, including a Shiny web app for exploring and analyzing the results. Results can also be exported.
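
The statistical detection step can be illustrated with base R alone. The sketch below is not GeneDiscoveR's API: the species IDs, the orthogroup names (`OG1` to `OG3`), the toy presence/absence matrix, and the choice of Fisher's exact test with Benjamini-Hochberg correction are all assumptions made for illustration. The steps above only state that a statistical test is performed.

```r
# Toy phenotype table: one row per species (hypothetical species IDs).
phenotypes <- data.frame(
    species   = paste0("sp", 1:8),
    phenotype = rep(c("with_trait", "without_trait"), each = 4),
    stringsAsFactors = FALSE
)

# Toy presence/absence matrix: rows are orthogroups, columns are species.
# TRUE means the orthogroup has at least one gene in that species.
presence <- rbind(
    OG1 = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
    OG2 = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
    OG3 = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)
)
colnames(presence) <- phenotypes$species

# Assumption: Fisher's exact test on the 2x2 table (presence x phenotype).
test_orthogroup <- function(present, phenotype) {
    tab <- table(
        factor(present, levels = c(TRUE, FALSE)),
        factor(phenotype, levels = c("with_trait", "without_trait"))
    )
    fisher.test(tab)$p.value
}

p_values <- apply(presence, 1, test_orthogroup, phenotype = phenotypes$phenotype)

# Collect raw and multiple-testing-adjusted p-values per orthogroup.
results <- data.frame(
    orthogroup = rownames(presence),
    p_value    = p_values,
    p_adjusted = p.adjust(p_values, method = "BH")
)
print(results[order(results$p_value), ])
```

In this toy data, `OG1` is present in every species with the trait and absent from every species without it, so it receives the smallest p-value; `OG2` is split evenly between the two groups and shows no association.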

The pipeline of GeneDiscoveR is illustrated in the following figure.

Installation instructions

Get the latest stable R release from CRAN. Then install GeneDiscoveR from this repository using the following code:

# Install devtools if it is missing (it provides install_github)
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")

# Install GeneDiscoveR from GitHub, then load it
devtools::install_github("AtilioRausch/GeneDiscoveR")

library(GeneDiscoveR)

Installation downloads about 800 MB of example files.
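
After installation, a quick check can confirm that the package and its bundled example data are visible to R. This is a minimal sketch using only base R functions. The three example names are the ones used in the example of use in this README; the variable names `installed`, `example_paths`, and `found` are illustrative only.

```r
# TRUE if the package can be loaded from the current library paths.
installed <- requireNamespace("GeneDiscoveR", quietly = TRUE)

# system.file() returns "" when the package or the file cannot be found.
example_paths <- c(
    overallsDir = system.file("extdata", "Comparatives-1dot3-6", package = "GeneDiscoveR"),
    N0sDir      = system.file("extdata", "N0-1dot3-6", package = "GeneDiscoveR"),
    dataFile    = system.file("extdata", "annotatedCDSs.tsv", package = "GeneDiscoveR")
)

# A path counts as found only if it is non-empty and exists on disk.
found <- vapply(
    example_paths,
    function(p) nzchar(p) && file.exists(p),
    logical(1)
)

if (!installed) {
    message("GeneDiscoveR is not installed in the current library paths.")
} else if (!all(found)) {
    message("Missing example data: ", paste(names(found)[!found], collapse = ", "))
} else {
    message("GeneDiscoveR and its example data were found.")
}
```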

Examples of use

The following code shows an example of how to use GeneDiscoveR:

# Directory where the data is located
overallsDir <- system.file("extdata", "Comparatives-1dot3-6", package = "GeneDiscoveR")
N0sDir <- system.file("extdata", "N0-1dot3-6", package = "GeneDiscoveR")
dataFile <- system.file("extdata", "annotatedCDSs.tsv", package = "GeneDiscoveR")

# Create a GeneDiscoveR object
GeneDiscoveRobject <- GeneDiscoveR(
    overallsDir = overallsDir,
    N0sDir = N0sDir,
    dataFile = dataFile,
    minInflation = 1.3,
    maxInflation = 6,
    stepInflation = 0.1,
    orthologsTool = "OrthoFinder"
)
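
To see which inflation values the arguments above span, the grid can be reconstructed with base R. This is a sketch under the assumption that `minInflation`, `maxInflation`, and `stepInflation` describe an inclusive, evenly spaced grid; how GeneDiscoveR builds the grid internally is not stated here, and the variable names are illustrative.

```r
min_inflation  <- 1.3
max_inflation  <- 6
step_inflation <- 0.1

# Assumed inclusive grid; round() guards against floating-point drift in seq().
inflation_grid <- round(seq(min_inflation, max_inflation, by = step_inflation), 1)

# Number of inflation values and the first and last values covered.
length(inflation_grid)
range(inflation_grid)
```

Under the same assumption, the example directory names (`Comparatives-1dot3-6`, `N0-1dot3-6`) would correspond to this 1.3 to 6 range.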

This example continues in TPS-map, which shows the detection of orthogroups associated with different types of oil bodies in liverworts. To run the pipeline for plants with less divergence, such as Brassicaceae, see Brassicaceae. If you have data from Phytozome-inParanoiDB, follow the example in Phytozome-inParanoiDB.

Citation

Below is the citation output from using citation('GeneDiscoveR') in R. Please run this yourself to check for any updates on how to cite GeneDiscoveR.

print(citation("GeneDiscoveR"), bibtex = TRUE)
#> To cite package GeneDiscoveR in publications use:
#> 
#>   Rausch, Atilio O, et al. (2024). GeneDiscoveR: an R package for the
#>   statistical detection of orthogroups associated to plant traits. R
#>   package version 1.0.0. URL
#>   https://github.com/AtilioRausch/GeneDiscoveR
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {GeneDiscoveR: an R package for the statistical detection of orthogroups associated to plant traits.},
#>     author = {Atilio O. Rausch},
#>     year = {2024},
#>     note = {R package version 1.0.0},
#>     url = {https://github.com/AtilioRausch/GeneDiscoveR},
#>   }
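
If you manage references in a `.bib` file, the BibTeX entry can be written to disk instead of copied by hand. The helper `write_package_bibtex()` and the output file name `GeneDiscoveR.bib` are hypothetical and not part of GeneDiscoveR; the sketch relies only on `citation()` and `toBibtex()` from the `utils` package.

```r
# Hypothetical helper: write a package's citation as BibTeX to a file.
write_package_bibtex <- function(pkg, file) {
    bib <- utils::toBibtex(utils::citation(pkg))
    writeLines(bib, con = file)
    invisible(file)
}

# Only attempt the export when GeneDiscoveR is actually installed.
if (requireNamespace("GeneDiscoveR", quietly = TRUE)) {
    write_package_bibtex("GeneDiscoveR", "GeneDiscoveR.bib")
}
```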

Please note that GeneDiscoveR was only made possible thanks to many other R and bioinformatics software authors. GeneDiscoveR is awaiting publication, and we will provide the DOI as soon as it is available.