The Exemplar modules to be utilized are:
1 Genetic Algorithm Module (GA Module) – This module implements an Artificial Intelligence approach to finding logical combinations of SNPs for association-based studies.
2 Association Study Module (AS Module) – This module calculates association statistics such as Chi-Square, Yates-corrected Chi-Square, Fisher’s Exact, Odds Ratio, LD, D’, etc.
3 Chromosome Alteration Module (CA Module) – This module performs loss of heterozygosity (LOH) analysis on the dataset, using user-specified controls as the reference set, to identify possible deletions in the chromosome.
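As an illustration of one statistic the AS Module reports, the following is a minimal sketch of a Chi-Square test with the optional Yates continuity correction on a 2x2 table. It is not Exemplar's implementation; the function name `chi_square_2x2` and the table layout (cases in the first row, controls in the second) are assumptions made for this example.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test (1 degree of freedom) on the table [[a, b], [c, d]].

    Returns (statistic, p_value). With yates=True the Yates continuity
    correction is applied. Hypothetical helper, not part of Exemplar.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero")
    diff = abs(a * d - b * c)
    if yates:
        # Shrink the deviation by n/2, never below zero.
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

With small samples the corrected statistic is the more conservative of the two, which is why both are typically reported side by side.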
The difficulty with such a small sample size is the lack of statistical power. Nonetheless, we hoped that by performing multiple types of analysis on the data, we could reduce the problem space from ~10,000 SNPs to <50 SNPs for consideration. Applying biological knowledge to this reduced dataset should then further help to select candidate genes for the disorder under study.
Analytic Process
STEP 1
Exemplar’s AS Module is first utilized to provide extensive statistical analysis of the dataset, including:
1 Fisher’s Exact test by genotype and by allele.
2 Odds Ratio by genotype and by allele.
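The allele-level versions of these two statistics can be sketched as follows. This is an illustrative sketch only, not the AS Module's code: the function names, the two-character genotype strings (for example "AG"), and the 0.5 correction applied when a cell is zero (the Haldane-Anscombe convention) are all assumptions of this example.

```python
from math import comb


def allele_table(case_genotypes, control_genotypes, allele):
    """Build the 2x2 allele-count table (a, b, c, d) for one SNP.

    a, b = counts of `allele` and of the other allele in cases;
    c, d = the same counts in controls. Missing calls are assumed
    to have been removed beforehand.
    """
    def counts(genotypes):
        hits = sum(g.count(allele) for g in genotypes)
        return hits, 2 * len(genotypes) - hits

    a, b = counts(case_genotypes)
    c, d = counts(control_genotypes)
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1))
                if p <= observed * (1 + 1e-9))
    return min(1.0, total)


def odds_ratio(a, b, c, d):
    """Odds ratio (a*d)/(b*c); adds 0.5 to every cell if any cell is zero."""
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return (a * d) / (b * c)
```

The genotype-level statistics would follow the same pattern with genotype counts in place of allele counts, for instance by collapsing the three genotypes into carrier versus non-carrier.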
The AS Module is also used for feature selection on the dataset before it is input to the GA Module.
STEP 2
Exemplar’s GA Module is run against the dataset many times with various parameter settings. A brief overview follows:
The GA Module is run against the entire input dataset and attempts to build models of the smallest size that can effectively predict outcomes while minimizing False Positives and maximizing True Positives. Models of different sizes and types are tried to improve results as necessary.
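To make the optimization target concrete, the sketch below scores one candidate model in the way a genetic algorithm's fitness function might. It is an assumption-laden illustration, not the GA Module's actual scoring: the model representation (an AND/OR combination of SNP genotype conditions), the function names, and the `size_penalty` weight are all hypothetical.

```python
def predict(model, sample):
    """Return True if the model calls this sample a case.

    model  = ("AND" or "OR", [(snp_id, genotype), ...])
    sample = {snp_id: genotype}
    """
    operator, conditions = model
    matches = [sample.get(snp_id) == genotype
               for snp_id, genotype in conditions]
    return all(matches) if operator == "AND" else any(matches)


def fitness(model, samples, labels, size_penalty=0.05):
    """Reward true positives, penalize false positives and model size.

    labels[i] is True for a case and False for a control.
    size_penalty is a hypothetical weight per condition in the model.
    """
    true_pos = false_pos = 0
    for sample, is_case in zip(samples, labels):
        if predict(model, sample):
            if is_case:
                true_pos += 1
            else:
                false_pos += 1
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    tp_rate = true_pos / n_cases if n_cases else 0.0
    fp_rate = false_pos / n_controls if n_controls else 0.0
    return tp_rate - fp_rate - size_penalty * len(model[1])
```

Under this scoring, a one-SNP model that separates cases from controls outranks a larger model with the same predictions, which mirrors the stated preference for models of the smallest size.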
Various feature selection methods are employed to reduce the input parameter space. These include:
a. Statistical reduction (usually Fisher’s Exact test is used here) – a p-value is calculated for each SNP, and any SNP whose p-value does not fall below a certain threshold is eliminated.
b. Minor allele frequency changes – the minor allele frequency is calculated for each SNP in cases and in controls; if the difference between the two is below a defined threshold, the SNP is eliminated from consideration.
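The two reductions above can be sketched as a simple filter. This is a minimal sketch under stated assumptions: the record layout, the function names, and the default thresholds (0.05 for the p-value, 0.10 for the frequency difference) are hypothetical and are not taken from Exemplar. Here both filters are applied in sequence, although they could equally be used separately.

```python
def minor_allele_frequency(genotypes, minor_allele):
    """Fraction of alleles equal to `minor_allele` in two-character genotypes."""
    total = 2 * len(genotypes)
    if total == 0:
        return 0.0
    return sum(g.count(minor_allele) for g in genotypes) / total


def select_snps(snps, p_threshold=0.05, maf_diff_threshold=0.10):
    """Return the ids of SNPs that survive both reductions.

    snps maps snp_id -> {"p_value": float, "minor_allele": str,
                         "cases": [genotypes], "controls": [genotypes]}
    """
    kept = []
    for snp_id, record in snps.items():
        # a. Statistical reduction: drop SNPs whose p-value is not below the threshold.
        if record["p_value"] >= p_threshold:
            continue
        # b. Minor allele frequency change between cases and controls.
        case_maf = minor_allele_frequency(record["cases"], record["minor_allele"])
        control_maf = minor_allele_frequency(record["controls"], record["minor_allele"])
        if abs(case_maf - control_maf) < maf_diff_threshold:
            continue
        kept.append(snp_id)
    return kept
```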
Comprehensive model results are provided in this report, including:
1 Model predictive results for each sample
2 Model statistical p-values when possible
3 Relevant ontologies for GA-discovered SNPs
4 Complete details of each discovered SNP, including its ID, position, chromosome, and related genes.
STEP 3
Exemplar’s CA Module is run against the dataset to detect possible deletions in the chromosomes by looking for loss of heterozygosity (LOH).
Each SNP is assigned a p-value.
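How the CA Module derives its p-values is not described here, so the following is only one plausible sketch of a reference-based LOH score, not Exemplar's method. It assumes that heterozygosity rates are estimated from the control samples, and that a SNP's p-value is the chance probability that every SNP in a small surrounding window is homozygous. The function names and the `half_window` parameter are hypothetical.

```python
def heterozygosity_rates(control_samples):
    """Per-SNP fraction of heterozygous calls among the reference controls.

    control_samples is a list of genotype lists, all in the same SNP order,
    with genotypes given as two-character strings such as "AG".
    """
    n_snps = len(control_samples[0])
    rates = []
    for i in range(n_snps):
        calls = [sample[i] for sample in control_samples]
        het = sum(1 for g in calls if g[0] != g[1])
        rates.append(het / len(calls))
    return rates


def loh_p_values(sample, het_rates, half_window=2):
    """Assign each SNP in one sample a p-value for an all-homozygous window.

    If any call in the window is heterozygous, there is no evidence of LOH
    and the p-value is 1.0. Otherwise the p-value is the product of
    (1 - heterozygosity rate) over the window, which treats SNPs as
    independent (an assumption of this sketch).
    """
    p_values = []
    for i in range(len(sample)):
        lo = max(0, i - half_window)
        hi = min(len(sample), i + half_window + 1)
        if any(sample[j][0] != sample[j][1] for j in range(lo, hi)):
            p_values.append(1.0)
            continue
        p = 1.0
        for j in range(lo, hi):
            p *= 1.0 - het_rates[j]
        p_values.append(p)
    return p_values
```

Long runs of low p-values along a chromosome would then flag regions worth inspecting as possible deletions.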