This function performs gene identification based on a given formula, using the Fisher statistic. It separates the orthogroups by their enrichment in species with the analyzed phenotypes, which allows the response and predictor sides of the formula to be compared.

When one side of the formula contains several phenotypes, they must be joined with a plus sign (+). For example, if the formula is "one_in_specialized_cell ~ many_in_all_cells", the function applies the Fisher test to distinguish the orthogroups that are present in the species with the phenotype "one_in_specialized_cell" and absent in the species with the phenotype "many_in_all_cells", as well as the orthogroups that are present in the species with the phenotype "many_in_all_cells" and absent in the species with the phenotype "one_in_specialized_cell".

Before calling this function, the species must be selected by phenotype with select_species_by_phenotype.
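To make the presence/absence comparison concrete, here is a minimal sketch of a Fisher test on a single orthogroup using base R's `fisher.test`. It assumes the statistic corresponds to Fisher's exact test on a 2x2 table of species counts; the counts are hypothetical and are not taken from the package data, and this is not necessarily how GeneDiscoveR builds its tables internally.

```r
# Hypothetical species counts for ONE orthogroup (illustration only).
# Rows are the two phenotype groups; columns say whether the orthogroup
# is present or absent in the species of that group.
counts <- matrix(
  c(9, 1,   # one_in_specialized_cell: present in 9 species, absent in 1
    2, 8),  # many_in_all_cells:       present in 2 species, absent in 8
  nrow = 2, byrow = TRUE,
  dimnames = list(
    phenotype  = c("one_in_specialized_cell", "many_in_all_cells"),
    orthogroup = c("present", "absent")
  )
)

# Two-sided Fisher's exact test from the base 'stats' package.
result <- fisher.test(counts)

result$p.value   # a small value suggests presence depends on the phenotype
result$estimate  # odds ratio; above 1 means enrichment in the first row
```

An odds ratio above 1 would point to enrichment in the first phenotype group, and one below 1 to enrichment in the second, which matches the two directions described above.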

gene_identification_by_phenotype(
  formula,
  GeneDiscoveRobject = NULL,
  statistic = "Fisher",
  name = "PerType",
  cores = 1
)

Arguments

formula

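The formula specifying the relationship between the response and predictor variables, written as response ~ predictor. Several phenotypes on the same side are joined with a plus sign (+).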

GeneDiscoveRobject

An optional GeneDiscoveR object to store the identification results.

statistic

The statistical test to be used for gene identification. Default is "Fisher".

name

The name of the identification execution. Default is "PerType".

cores

The number of cores to be used for parallel processing. Default is 1.
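When choosing a value for cores, it can help to check how many cores the machine actually has. The following is a small sketch using the base `parallel` package; the cap of 8 and the variable names are assumptions made for illustration, not package recommendations.

```r
library(parallel)

# detectCores() may return NA on some platforms, so fall back to 1.
available <- detectCores()
if (is.na(available)) available <- 1L

# Leave one core free for the system, never go below 1,
# and cap at 8 (an arbitrary limit chosen for this sketch).
n_cores <- max(1L, min(8L, available - 1L))
n_cores
```

The resulting `n_cores` could then be passed as the cores argument.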

Value

The updated GeneDiscoveR object with the identification results.

Examples


# Create a GeneDiscoveR object
N0sDir <- system.file("extdata", "N0-1dot3-6", package = "GeneDiscoveR")
overallsDir <- system.file("extdata", "Comparatives-1dot3-6", package = "GeneDiscoveR")
dataFile <- system.file("extdata", "annotatedCDSs.tsv", package = "GeneDiscoveR")
minInflation <- 1.3
maxInflation <- 6
stepInflation <- 0.1

GeneDiscoveRobject <- GeneDiscoveR(
  overallsDir = overallsDir,
  N0sDir = N0sDir,
  dataFile = dataFile,
  minInflation = minInflation,
  maxInflation = maxInflation,
  stepInflation = stepInflation
)

# Set active run
GeneDiscoveRobject <- set_run_active(GeneDiscoveRobject, InflationValue = 1.8)
#> -----------From OrthoFinder-----------
#> The process has been completed successfully

# Select species by phenotype
GeneDiscoveRobject <- select_species_by_phenotype(
  GeneDiscoveRobject = GeneDiscoveRobject, columnPhenotype = "Oil-body-type",
  columnID = "OrthofinderID", type = "one_in_specialized_cell"
)
GeneDiscoveRobject <- select_species_by_phenotype(
  GeneDiscoveRobject = GeneDiscoveRobject, columnPhenotype = "Oil-body-type",
  columnID = "OrthofinderID", type = "many_in_all_cells"
)
GeneDiscoveRobject <- select_species_by_phenotype(
  GeneDiscoveRobject = GeneDiscoveRobject, columnPhenotype = "Oil-body-type",
  columnID = "OrthofinderID", type = "noneOB"
)

# Gene identification by phenotype
# Identifies genes that are present in the species with the phenotype
# "one_in_specialized_cell" and absent in the species with the phenotype
# "many_in_all_cells", and vice versa.
GeneDiscoveRobject <- gene_identification_by_phenotype(
  GeneDiscoveRobject = GeneDiscoveRobject,
  formula = as.formula("one_in_specialized_cell ~ many_in_all_cells"),
  statistic = "Fisher", name = "PerOBtype", cores = 8
)
# Compares the species with the phenotype "noneOB" against the species with
# the phenotype "one_in_specialized_cell" or "many_in_all_cells", identifying
# genes that are present in one group and absent in the other.
GeneDiscoveRobject <- gene_identification_by_phenotype(
  GeneDiscoveRobject = GeneDiscoveRobject,
  formula = as.formula("noneOB ~ one_in_specialized_cell + many_in_all_cells"),
  statistic = "Fisher", name = "OBpresence", cores = 8
)
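
To see which phenotypes end up on each side of a formula such as the last one, the formula can be taken apart with base R. This is only an illustrative sketch: the variable names are hypothetical, and the package may parse the formula differently internally.

```r
# Build the same formula used in the last example.
f <- as.formula("noneOB ~ one_in_specialized_cell + many_in_all_cells")

# A two-sided formula is a call: f[[2]] is the left-hand side and
# f[[3]] is the right-hand side. all.vars() lists the names in each.
response_groups  <- all.vars(f[[2]])
predictor_groups <- all.vars(f[[3]])

response_groups
predictor_groups
```

Each name listed this way should match a phenotype previously selected with select_species_by_phenotype.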