Coursera Genomics Courses

Last published: 2023-08-01 21:14:41

Introduction to Genomic Technologies

This course introduces you to the basic biology of modern genomics and the experimental tools that we use to measure it. We'll introduce the Central Dogma of Molecular Biology and cover how next-generation sequencing can be used to measure DNA, RNA, and epigenetic patterns. You'll also get an introduction to the key concepts in computing and data science that you'll need to understand how data from next-generation sequencing experiments are generated and analyzed.
This is the first course in the Genomic Data Science Specialization.

Overview

In this module, you will study the following topics: "Just enough molecular biology", "The genome", "Writing a DNA sequence", "Central dogma", "Transcription", "Translation", and "DNA structure and modifications".

Why Genomics? (12 min)
What Is Genomics? (7 min)
What Is Genomic Data Science? (8 min)
Just Enough Cell Biology (8 min)
Important Molecules in Molecular Biology (7 min)
The Human Genome Project (17 min)
Molecular Biology Structures (9 min)
From Genes to Phenotypes (9 min)

Measurement Technology

In this module, you'll learn about polymerase chain reaction, next generation sequencing, and applications of sequencing.

Polymerase Chain Reaction (8 min)
Next Generation Sequencing (7 min)
Applications of Sequencing (8 min)

Computing Technology

The lectures for this module cover a few basic topics in computing technology. We'll go over the foundations of computer science, algorithms, memory and data structures, efficiency, software engineering, and computational biology software.

What Is Computer Science? (5 min)
Algorithms (4 min)
Memory and Data Structures (6 min)
Efficiency (3 min)
Software Engineering (8 min)
What Is Computational Biology Software? (11 min)

Data Science Technology

In this module on Data Science Technology, we'll be covering quite a lot of information about how to handle the data produced during the sequencing process. We'll cover reproducibility, analysis, statistics, question types, the central dogma of inference, analysis code, testing, prediction, variation, experimental design, confounding, power, sample size, correlation, causation, and degrees of freedom.

Why Care About Statistics? (4 min)
What Went Wrong? (4 min)
The Central Dogma of Statistics (3 min)
Data Sharing Plans (4 min)
Getting Help with Statistics (2 min)
Plotting Your Data (5 min)
Sample Size and Variability (7 min)
Statistical Significance (6 min)
Multiple Testing (7 min)
Study Design, Batch Effects, and Confounding (7 min)

Python for Genomic Data Science

This class provides an introduction to the Python programming language and the IPython notebook. This is the third course in the Genomic Big Data Science Specialization from Johns Hopkins University.

Week One

This week we will have an overview of Python and take the first steps towards programming.

Lecture 1: Overview of Python (12 min)
Lecture 2.1 - First Steps Toward Programming Part 1 (10 min)
Lecture 2.2 - First Steps Toward Programming Part 2 (15 min)
Lecture 2.3 - First Steps Toward Programming Part 3 (8:57)
Lecture 2.4 - First Steps Toward Programming Part 4 (9:58)

Week Two

In this module, we'll be taking a look at Data Structures and Ifs and Loops.

Lecture 3.1: Data Structures Part 1 (11:58)
Lecture 3.2: Data Structures Part 2 (10:41)
Lecture 4.1: Ifs and Loops Part 1 (11:26)
Lecture 4.2: Ifs and Loops Part 2 (15:28)

Week Three

In this module, we have a long three-part lecture on Functions as well as a 10-minute look at Modules and Packages.

Lecture 5.1: Functions Part 1 (5:54)
Lecture 5.2: Functions Part 2 (8:20)
Lecture 5.3: Functions Part 3 (13:24)
Lecture 6: Modules and Packages (10:32)

Week Four

In this module, we have another long three-part lecture, this time about Communicating with the Outside, as well as a final lecture about Biopython.

Lecture 7.1: Communicating with the Outside Part 1 (6:41)
Lecture 7.2: Communicating with the Outside Part 2 (7:38)
Lecture 7.3: Communicating with the Outside Part 3 (17:42)
Lecture 8: Biopython (13:32)

Algorithms for DNA Sequencing

We will learn computational methods -- algorithms and data structures -- for analyzing DNA sequencing data. We will learn a little about DNA, genomics, and how DNA sequencing is used. We will use Python to implement key algorithms and data structures and to analyze real genomes and DNA sequencing datasets.

DNA sequencing, strings and matching

In this module, we begin our exploration of algorithms for analyzing DNA sequencing data. We'll discuss DNA sequencing technology, its past and present, and how it works.

Module 1 Introduction (1 min)
Lecture: Why study this? (4 min)
Lecture: DNA sequencing past and present (3 min)
Lecture: Genomes as strings, reads as substrings (5 min)
Lecture: String definitions and Python examples (3 min)
Practical: String basics (7 min)
Practical: Manipulating DNA strings (7 min)
Practical: Downloading and parsing a genome (6 min)
Lecture: How DNA gets copied (3 min)
Optional lecture: How second-generation sequencers work (7 min)
Optional lecture: Sequencing errors and base qualities (6 min)
Lecture: Sequencing reads in FASTQ format (4 min)
Practical: Working with sequencing reads (11 min)
Practical: Analyzing reads by position (6 min)
Lecture: Sequencers give pieces to genomic puzzles (5 min)
Lecture: Read alignment and why it's hard (3 min)
Lecture: Naive exact matching (10 min)
Practical: Matching artificial reads (6 min)
Practical: Matching real reads (7 min)
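
The naive exact matching and DNA string manipulation topics in this module can be illustrated with a minimal sketch. This is not the course's own code; the function names `naive_exact_match` and `reverse_complement` are chosen here for illustration, and the example strings are made up.

```python
def naive_exact_match(pattern, text):
    """Return every offset in text where pattern occurs exactly."""
    occurrences = []
    # Try each alignment of pattern against text, left to right.
    for i in range(len(text) - len(pattern) + 1):
        if text[i:i + len(pattern)] == pattern:
            occurrences.append(i)
    return occurrences


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


if __name__ == "__main__":
    genome = "AGCTTAGATAGC"   # toy "genome", not real data
    read = "AG"               # toy "read"
    print(naive_exact_match(read, genome))
    print(reverse_complement("ACCGT"))
```

Because a read may come from either strand, a matcher like this is typically run on both the read and its reverse complement. The naive approach checks every alignment, which is the baseline that the later indexing lectures improve on.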

Preprocessing, indexing and approximate matching

In this module, we learn useful and flexible new algorithms for solving the exact and approximate matching problems. We'll start by learning Boyer-Moore, a fast and very widely used algorithm for exact matching.

Week 2 Introduction (1 min)
Lecture: Boyer-Moore basics (8 min)
Lecture: Boyer-Moore: putting it all together (6 min)
Lecture: Diversion: Repetitive elements (5 min)
Practical: Implementing Boyer-Moore (10 min)
Lecture: Preprocessing (7 min)
Lecture: Indexing and the k-mer index (10 min)
Lecture: Ordered structures for indexing (8 min)
Lecture: Hash tables for indexing (7 min)
Practical: Implementing a k-mer index (7 min)
Lecture: Variations on k-mer indexes (9 min)
Lecture: Genome indexes used in research (9 min)
Lecture: Approximate matching, Hamming and edit distance (6 min)
Lecture: Pigeonhole principle (6 min)
Practical: Implementing the pigeonhole principle (9 min)
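
As a rough illustration of the k-mer index idea listed above, the sketch below builds a hash-table index of all k-mers in a text and uses it to find exact matches. It is an assumption-laden toy rather than the course's implementation: the names `build_kmer_index` and `query_index` are hypothetical, and the query assumes the pattern is at least k characters long.

```python
from collections import defaultdict


def build_kmer_index(text, k):
    """Map every length-k substring of text to the list of offsets where it starts."""
    index = defaultdict(list)
    for i in range(len(text) - k + 1):
        index[text[i:i + k]].append(i)
    return index


def query_index(pattern, text, index, k):
    """Find exact occurrences of pattern (len(pattern) >= k assumed).

    The first k-mer of the pattern is looked up in the index to get
    candidate offsets; each candidate is then verified against the text.
    """
    hits = []
    for i in index.get(pattern[:k], []):
        if text[i:i + len(pattern)] == pattern:
            hits.append(i)
    return hits


if __name__ == "__main__":
    text = "GCTACGATCTAGAATCTA"   # toy text, not real data
    index = build_kmer_index(text, 3)
    print(query_index("TCTA", text, index, 3))
```

The index is built once (preprocessing the text) and then reused for many patterns, so each query only touches the offsets where its first k-mer occurs instead of scanning the whole text.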

Edit distance, assembly, overlaps

This week we finish our discussion of read alignment by learning about algorithms that solve both the edit distance problem and related biosequence analysis problems, like global and local alignment.

Module 3 Introduction (1 min)
Lecture: Solving the edit distance problem (12 min)
Lecture: Using dynamic programming for edit distance (12 min)
Practical: Implementing dynamic programming for edit distance (6 min)
Lecture: A new solution to approximate matching (9 min)
Lecture: Meet the family: global and local alignment (10 min)
Practical: Implementing global alignment (8 min)
Lecture: Read alignment in the field (4 min)
Lecture: Assembly: working from scratch (2 min)
Lecture: First and second laws of assembly (8 min)
Lecture: Overlap graphs (8 min)
Practical: Overlaps between pairs of reads (4 min)
Practical: Finding and representing all overlaps (3 min)
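
The dynamic programming approach to edit distance named in this module can be sketched as follows. This is a standard textbook formulation with unit costs for substitutions, insertions, and deletions, written here as an illustration; the function name `edit_distance` is an assumption, not taken from the course materials.

```python
def edit_distance(x, y):
    """Return the edit (Levenshtein) distance between strings x and y."""
    rows, cols = len(x) + 1, len(y) + 1
    # D[i][j] holds the edit distance between x[:i] and y[:j].
    D = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        D[i][0] = i          # delete all i characters of x[:i]
    for j in range(cols):
        D[0][j] = j          # insert all j characters of y[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            mismatch = 0 if x[i - 1] == y[j - 1] else 1
            D[i][j] = min(
                D[i - 1][j - 1] + mismatch,  # match or substitution
                D[i - 1][j] + 1,             # deletion from x
                D[i][j - 1] + 1,             # insertion into x
            )
    return D[-1][-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

Filling the table takes time proportional to `len(x) * len(y)`, in contrast to a naive recursive solution that recomputes the same subproblems many times.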

Algorithms for assembly

In the last module we began our discussion of the assembly problem and we saw a couple of basic principles behind it. In this module, we'll learn a few ways to solve the assembly problem.

Lecture: The shortest common superstring problem (8 min)
Practical: Implementing shortest common superstring (4 min)
Lecture: Greedy shortest common superstring (7 min)
Practical: Implementing greedy shortest common superstring (7 min)
Lecture: Third law of assembly: repeats are bad (5 min)
Lecture: De Bruijn graphs and Eulerian walks (8 min)
Practical: Building a De Bruijn graph (4 min)
Lecture: When Eulerian walks go wrong (9 min)
Lecture: Assemblers in practice (8 min)
Lecture: The future is long? (9 min)
Lecture: Computer science and life science (5 min)
Lecture: Thank yous (43 sec)
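
To make the De Bruijn graph topic from this module concrete, here is a small sketch that turns a string into De Bruijn graph edges: each k-mer contributes one directed edge from its left (k-1)-mer to its right (k-1)-mer. The function name `de_bruijn_edges` and the toy input are assumptions for illustration only.

```python
def de_bruijn_edges(sequence, k):
    """Return the De Bruijn graph of sequence as a list of directed edges.

    Each k-mer yields one edge (left (k-1)-mer, right (k-1)-mer).
    Repeated k-mers yield repeated edges, so the result is a multigraph.
    """
    edges = []
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        edges.append((kmer[:-1], kmer[1:]))
    return edges


if __name__ == "__main__":
    for left, right in de_bruijn_edges("ACGCGTCG", 3):
        print(left, "->", right)
```

An Eulerian walk through such a graph visits every edge exactly once and spells out a candidate reconstruction. Repeats in the genome can allow more than one such walk, which is one way Eulerian walks can go wrong.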

Command Line Tools for Genomic Data Science

Introduces the commands that you need to manage and analyze directories, files, and large sets of genomic data. This is the fourth course in the Genomic Big Data Science Specialization from Johns Hopkins University.

Basic Unix Commands

In this module, you will be introduced to command line tools for genomic data science.

Basic Unix Commands 1: Content Representation (3 min)
Basic Unix Commands 2: Files, Directories, Paths (7 min)
Basic Unix Commands 3: File Naming (4 min)
Basic Unix Commands 4: Content Creation (9 min)
Basic Unix Commands 5: Accessing Content I (6 min)
Basic Unix Commands 6: Accessing Content II (4 min)
Basic Unix Commands 7: Redirecting Content (6 min)
Basic Unix Commands 8: Querying Content (15 min)
Basic Unix Commands 9: Comparing Content (11 min)
Basic Unix Commands 10: Archiving Content (13 min)
Basic Unix Commands 11: Practical Exercises I (13 min)
Basic Unix Commands 12: Practical Exercises II (9 min)

Week Two

In this module, we'll be taking a look at Sequences and Genomic Features in a sequence of 10 presentations.

Sequences and Genomic Features 1: Molecular Bio Primer (6 min)
Sequences and Genomic Features 2: Sequence Representation and Generation (11 min)
Sequences and Genomic Features 3: Annotation (14 min)
Sequences and Genomic Features 4.1: Alignment I (13 min)
Sequences and Genomic Features 4.2: Alignment II (9 min)
Sequences and Genomic Features 5: Recreating Sequences & Features (12 min)
Sequences and Genomic Features 6: Genomic Feature Retrieval (5 min)
Sequences and Genomic Features 7: SAMtools I (11 min)
Sequences and Genomic Features 8: SAMtools II (9 min)
Sequences and Genomic Features 9: BEDtools I (15 min)
Sequences and Genomic Features 10: BEDtools II (15 min)

Week Three

In this module, we'll be going over Alignment and Sequence Variation in another sequence of 8 presentations.

Alignment & Sequence Variation 1: Overview (4 min)
Alignment & Sequence Variation 2: Alignment & Variant Detection Tools (5 min)
Alignment & Sequence Variation 3: VCF (11 min)
Alignment & Sequence Variation 4: Bowtie (9 min)
Alignment & Sequence Variation 5: BWA (4 min)
Alignment & Sequence Variation 6: SAMtools (mpileup) (6 min)
Alignment & Sequence Variation 7: BCFtools (8 min)
Alignment & Sequence Variation 8: Variant Calling (5 min)

Week Four

In this module, we'll be going over Tools for Transcriptomics in a sequence of 6 presentations.

Tools for Transcriptomics 1: Overview (6 min)
Tools for Transcriptomics 2: RNA-seq (7 min)
Tools for Transcriptomics 3.1: Tophat I (9 min)
Tools for Transcriptomics 3.2: Tophat II (8 min)
Tools for Transcriptomics 4: Cufflinks (10 min)
Tools for Transcriptomics 5: Cuffdiff (16 min)
Tools for Transcriptomics 6.1: Integrated Genomics Viewer I (8 min)
Tools for Transcriptomics 6.2: Integrated Genomics Viewer II (6 min)

Bioconductor for Genomic Data Science

Learn to use tools from the Bioconductor project to analyze genomic data. This is the fifth course in the Genomic Big Data Science Specialization from Johns Hopkins University.

Week One

The class will cover how to install and use Bioconductor software. We will discuss common data structures, including ExpressionSets, SummarizedExperiment and GRanges used across several types of analyses.

Installing R on Windows (3 min)
Installing R on a Mac (2 min)
Installing R Studio on a Mac (1 min)
What is Bioconductor (7 min)
Installing Bioconductor (3 min)
The Bioconductor Website (9 min)
Useful Online Resources (5 min)
R Base Types (18 min)
GRanges - Overview (4 min)
IRanges - Basic Usage (12 min)
GenomicRanges - GRanges (8 min)
GenomicRanges - Basic GRanges Usage (8 min)
GenomicRanges - seqinfo (6 min)
AnnotationHub (8 min)
Usecase: AnnotationHub and GRanges, part 1 (12 min)
Usecase: AnnotationHub and GRanges, part 2 (13 min)

Week Two

In this week we will learn how to represent and compute on biological sequences, both at the whole-genome level and at the level of millions of short reads.

Biostrings (7 min)
BSgenome (6 min)
Biostrings - Matching (6 min)
BSgenome - Views (9 min)
GenomicRanges - Rle (12 min)
GenomicRanges - Lists (6 min)
GenomicFeatures (18 min)
rtracklayer - Data Import (14 min)

Week Three

In this week we will cover Basic Data Types, ExpressionSet, biomaRt, and R S4.

Basic Data Types (4 min)
Annotation Overview (4 min)
ExpressionSet Overview (4 min)
ExpressionSet (9 min)
SummarizedExperiment (7 min)
GEOquery (6 min)
biomaRt (13 min)
R S4 Classes (16 min)
R S4 Methods (10 min)

Week Four

This week, we will cover getting data into Bioconductor, Rsamtools, oligo, limma, and minfi.

Getting data into Bioconductor (6 min)
ShortRead (4 min)
Rsamtools (12 min)
oligo (14 min)
limma (16 min)
minfi (11 min)
Count-based RNA-seq analysis (15 min)

Statistics for Genomic Data Science

An introduction to the statistics behind the most popular genomic data science projects. This is the sixth course in the Genomic Big Data Science Specialization from Johns Hopkins University.

Module 1

This course is structured to hit the key conceptual ideas of normalization, exploratory analysis, linear modeling, testing, and multiple testing that arise over and over in genomic studies.

Welcome to Statistics for Genomic Data Science (2 min)
What is Statistics? (2 min)
Finding Statistics You Can Trust (4:44)
Getting Help (3:44)
What is Data? (4:28)
Representing Data (5:23)
Module 1 Overview (1:07)
Reproducible Research (3:42)
Achieving Reproducible Research (5:02)
R Markdown (6:26)
The Three Tables in Genomics (2:10)
The Three Tables in Genomics (in R) (3:46)
Experimental Design: Variability, Replication, and Power (14:17)
Experimental Design: Confounding and Randomization (9:26)
Exploratory Analysis (9:21)
Exploratory Analysis in R Part I (7:22)
Exploratory Analysis in R Part II (10:07)
Exploratory Analysis in R Part III (7:26)
Data Transforms (7:31)
Clustering (8:43)
Clustering in R (9:09)

Module 2

This week we will cover preprocessing, linear modeling, and batch effects.

Module 2 Overview (1:12)
Dimension Reduction (12:13)
Dimension Reduction (in R) (8:48)
Pre-processing and Normalization (11:26)
Quantile Normalization (in R) (4:49)
The Linear Model (6:50)
Linear Models with Categorical Covariates (4:08)
Adjusting for Covariates (4:16)
Linear Regression in R (13:03)
Many Regressions at Once (3:50)
Many Regressions in R (7:21)
Batch Effects and Confounders (7:11)
Batch Effects in R: Part A (8:18)
Batch Effects in R: Part B (3:50)

Module 3

This week we will cover modeling non-continuous outcomes (like binary or count data), hypothesis testing, and multiple hypothesis testing.

Module 3 Overview (1:07)
Logistic Regression (7:03)
Regression for Counts (5:02)
GLMs in R (9:28)
Inference (4:18)
Null and Alternative Hypotheses (4:45)
Calculating Statistics (5:11)
Comparing Models (7:08)
Calculating Statistics in R (9 min)
Permutation (3:26)
Permutation in R (3:33)
P-values (6:04)
Multiple Testing (8:25)
P-values and Multiple Testing in R: Part A (5:58)
P-values and Multiple Testing in R: Part B (4:23)
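
The multiple testing topic in this module can be illustrated with two common p-value adjustments. The course itself works in R; the sketch below uses Python only to stay consistent with the other examples in these notes. The function names and the input p-values are made up for illustration. `bonferroni` controls the family-wise error rate, while `benjamini_hochberg` computes adjusted p-values for false discovery rate control.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    pvals = [0.01, 0.04, 0.03, 0.005]   # made-up p-values
    print(bonferroni(pvals))
    print(benjamini_hochberg(pvals))
```

With thousands of genes tested at once, Bonferroni is usually very conservative, which is one reason false discovery rate methods are widely used in genomic studies.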

Module 4

In this week we will cover a lot of the general pipelines people use to analyze specific data types like RNA-seq, GWAS, ChIP-Seq, and DNA Methylation studies.

Module 4 Overview (1:21)
Gene Set Enrichment (4:19)
More Enrichment (3:59)
Gene Set Analysis in R (7:43)
The Process for RNA-seq (3:59)
The Process for ChIP-Seq (5:25)
The Process for DNA Methylation (5:03)
The Process for GWAS/WGS (6:12)
Combining Data Types (eQTL) (6:04)
eQTL in R (10:36)
Researcher Degrees of Freedom (5:49)
Inference vs. Prediction (8:52)
Knowing When to Get Help (2:31)
Statistics for Genomic Data Science Wrap-Up (1:53)
