Course Title: Biostatistics for Next-Generation Sequencing (NGS) Training Course
Executive Summary
This two-week intensive course gives participants a working understanding of the biostatistical methods used to analyze Next-Generation Sequencing (NGS) data. It covers fundamental statistical principles, experimental design for NGS studies, quality control, data normalization, differential expression analysis, variant calling, ChIP-Seq and metagenomics analysis, and pathway enrichment analysis. Participants gain hands-on experience using R and relevant Bioconductor packages to perform statistical analyses on real-world NGS datasets. Emphasis is placed on interpreting results correctly and applying appropriate statistical rigor to ensure the validity and reproducibility of findings. Upon completion, participants will be able to design, analyze, and interpret NGS experiments, contributing to advances in genomics research and personalized medicine.
Introduction
Next-Generation Sequencing (NGS) technologies have revolutionized genomic research, enabling scientists to explore the complexity of biological systems at an unprecedented scale. However, the vast amount of data generated by NGS platforms requires sophisticated biostatistical methods for proper analysis and interpretation. This course is designed to equip researchers, bioinformaticians, and other professionals with the knowledge and skills necessary to effectively analyze NGS data using sound statistical principles. The course will cover a wide range of topics, including experimental design, data preprocessing, statistical modeling, and interpretation of results. Through a combination of lectures, hands-on exercises, and case studies, participants will gain practical experience in applying biostatistical methods to address real-world research questions. This course aims to bridge the gap between NGS technology and statistical analysis, empowering participants to make meaningful discoveries from their genomic data.
Course Outcomes
- Understand the fundamental principles of biostatistics relevant to NGS data analysis.
- Design statistically sound NGS experiments to address specific research questions.
- Perform quality control and data preprocessing steps to ensure data integrity.
- Apply appropriate statistical methods for differential expression analysis and variant calling.
- Interpret the results of statistical analyses in the context of biological hypotheses.
- Utilize R and Bioconductor packages for NGS data analysis.
- Critically evaluate the statistical rigor of NGS studies and ensure reproducibility.
Training Methodologies
- Interactive lectures and discussions
- Hands-on exercises using R and Bioconductor
- Case studies of real-world NGS datasets
- Group projects to apply learned concepts
- Individual assignments to reinforce understanding
- Guest lectures from experts in the field
- Online resources and support materials
Benefits to Participants
- Gain expertise in biostatistical methods for NGS data analysis.
- Enhance skills in R programming and Bioconductor usage.
- Improve ability to design and analyze NGS experiments effectively.
- Develop critical thinking skills for interpreting statistical results.
- Expand professional network with experts in genomics and biostatistics.
- Increase competitiveness in the field of genomics research.
- Receive a certificate of completion to demonstrate acquired skills.
Benefits to Sending Organization
- Increased capacity for analyzing NGS data internally.
- Improved quality and reproducibility of genomic research.
- Enhanced ability to make data-driven decisions.
- More efficient use of resources for NGS experiments.
- Attraction and retention of talented researchers.
- Enhanced reputation as a leader in genomics research.
- Improved collaboration with external research partners.
Target Participants
- Researchers in genomics, molecular biology, and related fields
- Bioinformaticians and data scientists working with NGS data
- Graduate students pursuing research in genomics or bioinformatics
- Clinical researchers involved in NGS-based diagnostics
- Laboratory technicians responsible for NGS data generation
- Pharmaceutical scientists involved in drug discovery and development
- Principal Investigators seeking to incorporate NGS into their research programs
Week 1: Foundations of Biostatistics and NGS Data
Module 1: Introduction to Biostatistics
- Basic statistical concepts: probability, distributions, hypothesis testing
- Types of data and measurement scales
- Descriptive statistics: mean, median, standard deviation
- Inferential statistics: confidence intervals, p-values
- Statistical power and sample size calculations
- Introduction to R and RStudio
- Data import, manipulation, and visualization in R
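As a minimal illustration of the power and sample size calculations listed above, the sketch below uses the standard normal approximation for a two-sided, two-sample comparison of means with equal group sizes and a common standard deviation: n = 2 (z_{1-α/2} + z_{1-β})² σ² / δ² per group, where δ is the difference to detect. The function name `sample_size_per_group` is hypothetical, and Python is used only for a self-contained example (the course exercises use R). Count-based NGS designs usually call for methods that model overdispersion, so treat this as a generic starting point rather than an NGS-specific calculation.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta: float, sigma: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group n for a two-sided, two-sample test of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2,
    assuming equal group sizes and a common standard deviation sigma.
    """
    if delta <= 0 or sigma <= 0:
        raise ValueError("delta and sigma must be positive")
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)                 # round up to whole samples


if __name__ == "__main__":
    # Detect a one-standard-deviation difference with 80% power at alpha = 0.05.
    print(sample_size_per_group(delta=1.0, sigma=1.0))  # 16
```

Halving the detectable difference roughly quadruples the required sample size, which is why the effect size assumption dominates these calculations.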
Module 2: Experimental Design for NGS Studies
- Principles of experimental design: randomization, replication, blocking
- Types of NGS experiments: RNA-Seq, ChIP-Seq, whole-genome sequencing (WGS)
- Factorial designs and their application to NGS experiments
- Controlling for confounding factors
- Power analysis for NGS experiments
- Sample size considerations for differential expression analysis
- Best practices for sample preparation and library construction
Module 3: NGS Data Formats and Quality Control
- Introduction to NGS data formats: FASTQ, BAM, SAM
- Understanding read quality scores and their interpretation
- Quality control metrics for NGS data
- Using FastQC for quality assessment
- Adapter trimming and quality filtering
- Read alignment to a reference genome
- Introduction to Bioconductor packages for NGS data analysis
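To make the interpretation of read quality scores concrete: in the Phred+33 encoding used by current Illumina FASTQ files (an assumption; some older pipelines used a different offset), each quality character encodes Q = ASCII code - 33, and Q corresponds to an estimated base-call error probability P = 10^(-Q/10). The minimal Python sketch below parses one four-line FASTQ record and decodes its qualities; the record and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FastqRecord:
    name: str
    sequence: str
    quality: str


def parse_fastq_record(lines: list[str]) -> FastqRecord:
    """Parse one four-line FASTQ record: @header, sequence, '+' line, qualities."""
    header, seq, plus, qual = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError("malformed FASTQ record")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return FastqRecord(name=header[1:], sequence=seq, quality=qual)


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a quality string into integer Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Convert a Phred score Q into the estimated error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    record = parse_fastq_record(["@read1", "ACGT", "+", "II?5"])
    scores = phred_scores(record.quality)
    print(scores)  # [40, 40, 30, 20]
    print([error_probability(q) for q in scores])
```

Q30 therefore means an estimated 1-in-1,000 chance that the base call is wrong, and Q20 a 1-in-100 chance.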
Module 4: RNA-Seq Data Analysis – Part 1
- Overview of RNA-Seq workflow
- Read alignment (e.g., STAR) and transcript quantification (e.g., Salmon)
- Gene expression quantification and normalization
- Introduction to DESeq2 and edgeR for differential expression analysis
- Data exploration and visualization using ggplot2
- Variance stabilization and data transformation
- Batch effect correction using limma
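The idea behind one common normalization step can be shown in a few lines. DESeq2's default size factors follow the median-of-ratios approach: each gene's counts are divided by that gene's geometric mean across samples, and a sample's size factor is the median of those ratios. The pure-Python version below is a simplified illustration of that description (genes with a zero count in any sample are skipped because their geometric mean is zero), not the package's implementation; in the course, `DESeq2::estimateSizeFactors` would be used instead. The toy counts are made up.

```python
import math
from statistics import median


def size_factors(counts: list[list[int]]) -> list[float]:
    """Median-of-ratios size factors; counts[g][s] is the count for gene g in sample s."""
    n_samples = len(counts[0])
    ratios_per_sample: list[list[float]] = [[] for _ in range(n_samples)]
    for gene_counts in counts:
        if any(c == 0 for c in gene_counts):
            continue  # geometric mean would be zero, so the gene is skipped
        # Geometric mean on the log scale for numerical stability.
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for s, c in enumerate(gene_counts):
            ratios_per_sample[s].append(math.exp(math.log(c) - log_geo_mean))
    if not ratios_per_sample[0]:
        raise ValueError("no gene has non-zero counts in every sample")
    return [median(r) for r in ratios_per_sample]


def normalize(counts: list[list[int]]) -> list[list[float]]:
    """Divide each sample's counts by that sample's size factor."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


if __name__ == "__main__":
    # Toy data: sample 2 was sequenced about twice as deeply as sample 1.
    toy = [[10, 20], [100, 200], [5, 10], [0, 3]]
    print(size_factors(toy))  # approximately [0.707, 1.414]
```

Using the median rather than the total count makes the factors robust to a handful of very highly expressed or strongly differentially expressed genes.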
Module 5: RNA-Seq Data Analysis – Part 2
- Differential expression analysis using DESeq2 and edgeR
- Multiple testing correction methods: Bonferroni (family-wise error rate) and Benjamini-Hochberg (false discovery rate)
- Volcano plots and MA plots for visualizing differential expression results
- Functional enrichment analysis using Gene Ontology (GO) and KEGG pathway annotations
- Pathway analysis using tools like DAVID and Enrichr
- Interpretation of differential expression results in the context of biological hypotheses
- Case study: Analyzing a real-world RNA-Seq dataset
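The two multiple-testing corrections covered in this module differ in what they control. Bonferroni bounds the family-wise error rate by multiplying each p-value by the number of tests m; Benjamini-Hochberg controls the false discovery rate by scaling the i-th smallest p-value by m/i and then enforcing monotonicity from the largest p-value down. The Python sketch below should agree with R's `p.adjust(p, method = "bonferroni")` and `p.adjust(p, method = "BH")`; the p-values are made up for illustration.

```python
def bonferroni(pvalues: list[float]) -> list[float]:
    """Family-wise error rate control: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """False discovery rate control: Benjamini-Hochberg adjusted p-values."""
    m = len(pvalues)
    # Indices of the p-values ordered from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1 also caps adjusted values at 1
    for position, idx in enumerate(order):
        rank = m - position  # ascending rank of this p-value (1 = smallest)
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    p = [0.01, 0.04, 0.03, 0.005]
    print(bonferroni(p))          # approximately [0.04, 0.16, 0.12, 0.02]
    print(benjamini_hochberg(p))  # approximately [0.02, 0.04, 0.04, 0.02]
```

With four tests, only one p-value survives Bonferroni at 0.05 while all four pass a 5% FDR threshold, illustrating why FDR control is the usual choice for genome-wide differential expression.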
Week 2: Variant Calling and Advanced NGS Analysis
Module 6: Variant Calling – Part 1
- Introduction to variant calling
- Types of genetic variants: SNPs, indels, structural variants
- Read mapping and variant calling workflow
- Using GATK for variant calling
- Variant filtering and quality control
- Annotation of variants using tools like ANNOVAR
- Understanding the Variant Call Format (VCF)
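A VCF data line has eight fixed, tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), optionally followed by FORMAT and per-sample genotype columns. The Python sketch below parses the fixed columns and applies a deliberately simple allele classification (single-base REF and ALT as an SNV, a length change as an indel). It treats symbolic alleles such as `<DEL>`, breakend notation, and missing alleles as "other", so it is an illustration rather than a specification-compliant parser; the class and function names and the example record are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VcfRecord:
    chrom: str
    pos: int                # 1-based position, as in the VCF specification
    id: str
    ref: str
    alts: list[str]         # ALT may list several comma-separated alleles
    qual: Optional[float]   # None when QUAL is missing (".")
    filter: str
    info: dict[str, str] = field(default_factory=dict)


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the eight fixed columns of a VCF data line (not '#' header lines)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, vid, ref, alt, qual, filt, info = cols[:8]
    info_dict: dict[str, str] = {}
    if info != ".":
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            info_dict[key] = value  # flag entries such as "DB" map to ""
    return VcfRecord(chrom, int(pos), vid, ref, alt.split(","),
                     None if qual == "." else float(qual), filt, info_dict)


def classify(ref: str, alt: str) -> str:
    """Very simple classification; symbolic, breakend and missing alleles are 'other'."""
    if alt in (".", "*") or alt.startswith("<") or "[" in alt or "]" in alt:
        return "other"
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "indel"
    return "MNV"  # equal-length multi-nucleotide substitution


if __name__ == "__main__":
    rec = parse_vcf_line("chr1\t12345\t.\tA\tG,AT\t50\tPASS\tDP=30;DB")
    print(rec.pos, [classify(rec.ref, a) for a in rec.alts], rec.info)
```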
Module 7: Variant Calling – Part 2
- Variant filtering based on quality metrics and population allele frequencies
- Hard filtering vs. variant quality score recalibration (VQSR)
- Annotation of variants with functional information
- Identification of disease-associated variants
- Germline vs. somatic variant calling
- Using variant databases: dbSNP, 1000 Genomes Project
- Case study: Analyzing a real-world whole-genome sequencing dataset
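Hard filtering amounts to applying fixed thresholds to per-site annotations stored in the INFO field. The Python sketch below takes a dictionary mapping INFO keys to string values and reports which rules a site fails. QD, FS, and MQ are standard GATK annotation names, but the threshold values here are illustrative assumptions modeled loosely on GATK's published starting points for SNPs; check the current GATK documentation and tune them to your data. The rule table and function names are hypothetical, and skipping annotations that are absent is a design choice of this sketch.

```python
import operator

# Illustrative thresholds (assumption): a rule fails a site when
# `compare(value, threshold)` is true, e.g. QD < 2.0.
DEFAULT_RULES = {
    "QD": (operator.lt, 2.0),   # variant quality normalized by depth
    "FS": (operator.gt, 60.0),  # Fisher strand bias (Phred-scaled)
    "MQ": (operator.lt, 40.0),  # RMS mapping quality
}


def failed_filters(info: dict[str, str], rules: dict = DEFAULT_RULES) -> list[str]:
    """Return the names of the rules that a site fails."""
    failures = []
    for name, (compare, threshold) in rules.items():
        raw = info.get(name)
        if raw is None or raw == "":
            continue  # design choice: a missing annotation does not fail the site
        if compare(float(raw), threshold):
            failures.append(name)
    return failures


def filter_status(info: dict[str, str], rules: dict = DEFAULT_RULES) -> str:
    """'PASS' if no rule fails, otherwise the failing rule names joined by ';'."""
    failures = failed_filters(info, rules)
    return "PASS" if not failures else ";".join(failures)


if __name__ == "__main__":
    # Made-up annotation values for two sites.
    print(filter_status({"QD": "15.2", "FS": "3.1", "MQ": "60.0"}))  # PASS
    print(filter_status({"QD": "1.1", "FS": "75.0", "MQ": "59.0"}))  # QD;FS
```

VQSR replaces these fixed cut-offs with a model trained on known variant sites, which is why it typically needs a large callset to work well.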
Module 8: ChIP-Seq Data Analysis
- Introduction to ChIP-Seq
- ChIP-Seq experimental design and workflow
- Read alignment and peak calling (e.g., with MACS2)
- Peak annotation and functional enrichment analysis
- Differential binding analysis
- Visualization of ChIP-Seq data using genome browsers
- Case study: Analyzing a real-world ChIP-Seq dataset
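Peak callers such as MACS2 model the number of reads in a genomic window with a Poisson distribution whose rate λ is estimated from background (control) data, and flag windows where the observed count is improbably high. The Python sketch below computes the upper-tail probability P(X ≥ k | λ) for a single window. The read count and local λ in the example are made up, and MACS2's dynamic λ estimation, fragment-size modeling, and FDR calculation are not reproduced.

```python
import math


def poisson_pmf(i: int, lam: float) -> float:
    """P(X = i) for X ~ Poisson(lam), computed on the log scale to avoid overflow."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam): the enrichment p-value for k reads in a window."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0  # with lam = 0, X is always 0, so any positive count has probability 0
    if k <= lam:
        # The tail is not small here, so 1 - P(X <= k - 1) is accurate enough.
        return max(0.0, 1.0 - sum(poisson_pmf(i, lam) for i in range(k)))
    # Small tail: sum P(X = i) for i >= k directly; terms shrink because i > lam.
    term = poisson_pmf(k, lam)
    tail = 0.0
    i = k
    while term > tail * 1e-17:
        tail += term
        i += 1
        term *= lam / i
    return tail


if __name__ == "__main__":
    # Hypothetical window: 12 ChIP reads where the local background predicts 3.5.
    print(poisson_upper_tail(12, 3.5))
```

Summing the tail directly, rather than computing 1 minus the CDF, keeps very small p-values from being rounded to zero.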
Module 9: Metagenomics Data Analysis
- Introduction to metagenomics
- Metagenomic sequencing strategies: amplicon (e.g., 16S rRNA) vs. shotgun sequencing
- Read assembly and taxonomic classification
- Diversity analysis and community profiling
- Functional analysis of metagenomic data
- Metabolic pathway reconstruction
- Case study: Analyzing a real-world metagenomics dataset
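Alpha diversity summarizes the community profile of a single sample. The Python sketch below computes two widely used indices from per-taxon read counts: the Shannon index H = -Σ pᵢ ln pᵢ and the Gini-Simpson index 1 - Σ pᵢ², where pᵢ is the relative abundance of taxon i. The function names and counts are hypothetical; in practice the counts would come from a taxonomic classification table, and rarefaction or other normalization choices (not shown) affect the values.

```python
import math


def _proportions(counts: list[int]) -> list[float]:
    """Relative abundances of the taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    return [c / total for c in counts if c > 0]


def shannon_index(counts: list[int]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i), using natural logarithms."""
    return -sum(p * math.log(p) for p in _proportions(counts))


def gini_simpson_index(counts: list[int]) -> float:
    """1 - sum(p_i^2): chance that two reads drawn with replacement differ in taxon."""
    return 1.0 - sum(p * p for p in _proportions(counts))


if __name__ == "__main__":
    even = [25, 25, 25, 25]      # four equally abundant taxa
    dominated = [85, 5, 5, 5]    # one taxon dominates
    print(shannon_index(even), shannon_index(dominated))
    print(gini_simpson_index(even), gini_simpson_index(dominated))
```

Both indices increase with the number of taxa and with evenness, so a community dominated by one taxon scores lower than an even one with the same richness.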
Module 10: Advanced Statistical Methods for NGS Data
- Mixed models for analyzing NGS data with complex experimental designs
- Bayesian methods for differential expression analysis
- Network analysis of gene expression data
- Machine learning methods for predicting disease outcomes from NGS data
- Survival analysis relating NGS-derived features to clinical outcomes
- Reproducible research practices and data sharing
- Course wrap-up and Q&A
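Survival analysis in this setting usually relates NGS-derived features (for example, expression-based patient groups) to time-to-event outcomes. As a minimal sketch of the most basic estimator, the Python function below computes a Kaplan-Meier survival curve from follow-up times and event indicators. The function name and data are hypothetical, and group comparisons (log-rank test) or covariate adjustment (Cox regression), which R's survival package provides, are not shown.

```python
from collections import Counter


def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Kaplan-Meier estimate as (time, survival) pairs at each distinct event time.

    events[i] is 1 if the event (e.g. relapse) was observed at times[i] and 0 if
    the subject was censored then. Subjects censored at time t are still counted
    as at risk at t, the usual convention.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    events_at = Counter(t for t, e in zip(times, events) if e)
    leaving_at = Counter(times)  # subjects leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving_at):
        d = events_at[t]  # Counter returns 0 for times with no events
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving_at[t]
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up times (months) and event indicators for five patients.
    print(kaplan_meier([1, 2, 2, 3, 4], [1, 0, 1, 1, 0]))
```

Censored subjects contribute to the risk set until they leave it but never cause a drop in the curve, which is what distinguishes this estimator from a naive proportion of survivors.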
Action Plan for Implementation
- Identify a specific NGS project within your organization that can benefit from the skills learned in this course.
- Develop a detailed analysis plan, including experimental design, statistical methods, and expected outcomes.
- Gather the necessary data and resources for the project.
- Implement the analysis plan using R and Bioconductor.
- Document the entire analysis process, including code, results, and interpretations.
- Present the findings to your colleagues and stakeholders.
- Share your code and data to promote reproducibility and collaboration.
Course Features
- Skill level: All levels
- Certificate: No
- Assessments: Self