
QST–FST Simulation Framework

Mapping the bias and failure modes of a classic evolutionary test across 500 replicates, thousands of conditions, and two population structures.

Type: Master's Thesis
Institution: Goudet Group, UNIL
Timeline: Jun 2024 – Sep 2025
Stack: Python · R · SLiM 5.0 · SLURM

The Question

The QST–FST comparison is a cornerstone method in evolutionary genetics. The logic: if phenotypic divergence between populations (QST) exceeds neutral genetic divergence (FST), then diversifying selection is likely acting on the trait. If QST ≈ FST, drift alone can explain the pattern.
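The two statistics can be sketched directly from their textbook definitions for a diploid, additive trait. The estimators below are a minimal illustration, not the thesis code: the FST estimator is Wright's simple variance ratio, without a Weir–Cockerham sample-size correction, and all names are mine.

```python
import numpy as np

def qst(between_var, within_var):
    # QST for a diploid, additive trait:
    # sigma^2_B / (sigma^2_B + 2 * sigma^2_W)
    return between_var / (between_var + 2 * within_var)

def fst(allele_freqs):
    # Simple multi-locus Wright FST: var(p) / (p_bar * (1 - p_bar)),
    # averaged over loci; no sample-size correction.
    p = np.asarray(allele_freqs)     # shape (n_demes, n_loci)
    p_bar = p.mean(axis=0)
    num = p.var(axis=0)
    den = p_bar * (1 - p_bar)
    keep = den > 0                   # drop loci fixed across all demes
    return float((num[keep] / den[keep]).mean())
```

Under the neutral expectation the two ratios coincide; the test asks whether the observed QST falls outside the distribution of FST.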

The theoretical framework is elegant, but its assumptions are rarely satisfied in real data. This thesis asked: under what conditions does the test produce false positives or false negatives — and can we quantify those failure modes precisely?

The specific axes of investigation were trait architecture (number of QTLs, dominance, epistasis), population structure (island model vs. stepping-stone), and MAF filtering thresholds.
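The MAF-filtering axis is mechanically simple. A sketch of the kind of pre-FST filter being varied, with hypothetical names and an illustrative threshold:

```python
import numpy as np

def maf_filter(genotypes, threshold=0.05):
    # genotypes: (n_individuals, n_loci) array of 0/1/2 allele counts.
    # Keep loci whose minor-allele frequency meets the threshold;
    # low-MAF loci inflate the sampling variance of FST and so
    # shift the null distribution the QST comparison relies on.
    p = genotypes.mean(axis=0) / 2        # alt-allele frequency per locus
    maf = np.minimum(p, 1 - p)
    return genotypes[:, maf >= threshold]
```

Sweeping `threshold` over the values used in empirical studies is what generates the MAF axis of the condition grid.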

Simulation Design

All simulations were run in SLiM 5.0, a forward-time population genetics simulator. Each condition was replicated 500 times to obtain stable false-positive rate estimates.

500 replicates per condition
1–1,000 QTLs tested
2 population models
4 trait architectures

Demographic models: Island model (all demes equally connected, panmictic assumption holds) and stepping-stone model (demes connected only to neighbours, creates isolation-by-distance).
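The difference between the two models is entirely in the migration structure. In SLiM this is set through per-deme migration rates; the helper below is only an illustration of the connectivity contrast, assuming a linear (non-circular) stepping-stone habitat, with names of my choosing:

```python
import numpy as np

def migration_matrix(n_demes, m, model):
    # Row-stochastic matrix M[i, j]: probability that an individual in
    # deme i immigrated from deme j. 'island' spreads the total rate m
    # evenly over all other demes; 'stepping_stone' spreads it over the
    # one or two neighbours on a line, creating isolation-by-distance.
    M = np.zeros((n_demes, n_demes))
    for i in range(n_demes):
        if model == 'island':
            nbrs = [j for j in range(n_demes) if j != i]
        else:  # 'stepping_stone'
            nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n_demes]
        for j in nbrs:
            M[i, j] = m / len(nbrs)
        M[i, i] = 1 - m
    return M
```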

Trait architectures: Additive, dominant (h=1), recessive (h=0), and epistatic (pairwise interactions). Effect sizes were drawn from an L-shaped distribution to mimic empirical QTL studies.
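One way to get an L-shaped effect-size distribution is a gamma with shape parameter below 1: many near-zero effects and a few large ones, as empirical QTL mapping tends to find. A sketch with illustrative parameters (the thesis's actual values may differ):

```python
import numpy as np

rng = np.random.default_rng(42)   # seeded for reproducibility

def draw_effect_sizes(n_qtls, shape=0.4, scale=1.0):
    # Gamma with shape < 1 is L-shaped; random signs keep the trait
    # from being directionally biased across loci.
    magnitudes = rng.gamma(shape, scale, size=n_qtls)
    signs = rng.choice([-1.0, 1.0], size=n_qtls)
    return magnitudes * signs
```

The skew is the point: the median absolute effect sits well below the mean, so a handful of loci dominate the genetic variance.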

# SLiM simulation dispatch — Python wrapper
import subprocess

def run_condition(n_qtls, arch, demography, replicate_id):
    slim_script = generate_slim_script(
        n_loci     = n_qtls,
        h          = DOMINANCE[arch],      # 0, 0.5, 1
        n_demes    = 10,
        model      = demography,           # 'island' | 'stepping_stone'
        seed       = replicate_id * 42
    )
    result = subprocess.run(
        ['slim', '-s', str(replicate_id), slim_script],
        capture_output=True,
        text=True,                         # decode stdout to str
        check=True                         # fail loudly if SLiM errors
    )
    return parse_slim_output(result.stdout)

# SLURM array — 500 replicates × N conditions in parallel
# sbatch --array=0-499 run_simulation.sh ${CONDITION_ID}

Key Findings

Statistical Validation

All FPR estimates are reported with exact binomial confidence intervals at the 95% level. Differences between conditions were tested with Wilcoxon rank-sum tests (non-parametric, since QST distributions are rarely normal).

The 500-replicate design was chosen to give confidence intervals narrow enough to distinguish FPR = 5% (the nominal level) from FPR = 10% with adequate power. Power analysis confirmed this was sufficient for the expected effect sizes.
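The arithmetic behind that choice can be checked with an exact binomial power calculation. This is a sketch under my own assumptions — a one-sided exact test of H0: FPR = 5% at the nominal level — not necessarily the power analysis the thesis ran:

```python
from math import comb

def binom_sf(k, n, p):
    # Exact upper tail P(X >= k) for X ~ Binomial(n, p).
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def power_fpr_test(n=500, p0=0.05, p1=0.10, alpha=0.05):
    # Smallest k with P(X >= k | p0) <= alpha is the one-sided critical
    # count of false positives; power is the chance of reaching it under p1.
    k = next(k for k in range(n + 1) if binom_sf(k, n, p0) <= alpha)
    return binom_sf(k, n, p1)
```

With n = 500 this gives power well above 0.9 for distinguishing a true FPR of 10% from the nominal 5%, consistent with the design choice.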

# R — exact binomial CI for FPR estimate
compute_fpr_ci <- function(n_replicates, n_false_pos, alpha = 0.05) {
  binom.test(
    x       = n_false_pos,
    n       = n_replicates,
    p       = alpha,
    alternative = "two.sided"
  ) |> broom::tidy()
}

# Wilcoxon rank-sum test — QST differences between two architectures
wilcox.test(
  qst_island ~ architecture,   # 'architecture' restricted to two levels
  data  = sim_results,
  exact = FALSE                # normal approximation; n = 500 per group
)

Infrastructure

Running 500 replicates across dozens of conditions required careful HPC job management. The pipeline was orchestrated with SLURM job arrays — each condition dispatched as a separate array job, with output files collected and merged post-hoc.
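The scheme in the sbatch comment above — one array job per condition, one array index per replicate — can be sketched as a small dispatch helper. The environment-variable names beyond `SLURM_ARRAY_TASK_ID` (which SLURM itself sets) are hypothetical:

```python
import os

def task_params(conditions, n_replicates=500):
    # Map a SLURM array task to (condition, replicate_id). Each condition
    # is its own array job (--array=0-499), so the array index is the
    # replicate and the condition comes from a job-level variable.
    replicate_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", 0))
    condition_id = int(os.environ.get("CONDITION_ID", 0))
    assert replicate_id < n_replicates
    return conditions[condition_id], replicate_id
```

Keeping the mapping in one function means the merge step can reconstruct exactly which output file belongs to which cell of the condition grid.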

The simulation-to-result pipeline is fully reproducible: a single shell script regenerates every figure from raw SLiM output. All random seeds are recorded, and intermediate files are versioned.

Tech Stack

Python 3.11 SLiM 5.0 R 4.3 SLURM / HPC NumPy / pandas broom (R) ggplot2 Bash scripting