Top 10 Free Data Science Courses from Harvard
1. Principles, Statistical and Computational Tools for Reproducible Science
Start Date — April 17th, 2020
Difficulty level — Intermediate
Duration — 8 weeks long
You’ll learn (source: Course syllabus) —
- Fundamentals of reproducible science: why reproducible research matters, key definitions and concepts, and the factors affecting reproducibility
- Key elements required for data provenance and reproducible experimental design
- Statistical methods for reproducible data analysis
- Six modules with several case studies that illustrate the significant impact of reproducible research methods on scientific discovery
- Computational tools for reproducible science using R, RStudio, and Python
- Computational tools for reproducible data analysis and version control (Git/GitHub, Emacs/RStudio/Spyder), reproducible data (data repositories/Dataverse), reproducible dynamic report generation (R Markdown/R Notebook/Jupyter/Pandoc), and workflows
Taught By —
Curtis Huttenhower, Associate Professor of Computational Biology and Bioinformatics, Harvard University
John Quackenbush, Professor of Computational Biology and Bioinformatics, Harvard University
Lorenzo Trippa, Associate Professor of Biostatistics, Harvard University
Christine Choirat, Research Associate, Harvard University
2. Data Science: Linear Regression
Start Date — Jan 28th, 2020
Difficulty level — Beginner
Duration — 8 weeks long
You’ll learn (source: Course syllabus) —
- How Galton originally developed linear regression
- Basics of confounding and how to detect it
- Basics of R
- Learn how to examine the relationships between variables by implementing linear regression in R
Taught By —
Rafael Irizarry, Professor of Biostatistics, Harvard University
3. Data Science: Machine Learning
Start Date — Jan 28th, 2020
Difficulty level — Beginner
Duration — 8 weeks long
You’ll learn (source: Course syllabus) —
- Learn the basics of machine learning
- How to perform cross-validation to avoid overtraining
- Popular machine-learning algorithms
- Basics of regularization
- Learn how to build a recommendation system from scratch
Taught By —
Rafael Irizarry, Professor of Biostatistics, Harvard University
4. Data Science: Visualization
Start Date — Jan 28th, 2020
Difficulty level — Beginner
Duration — 8 weeks long
You’ll learn (source: Course syllabus) —
- Learn the basics of data visualization principles and how to apply them using ggplot2
- Communicate data-driven findings, motivate analyses, and detect flaws
- How to leverage data to reveal valuable insights and advance your career
Taught By —
Rafael Irizarry, Professor of Biostatistics, Harvard University
5. Data Science: Probability
Start Date — Jan 28th, 2020
Difficulty level — Beginner
Duration — 8 weeks long
You’ll learn (source: Course syllabus) —
- Learn the important concepts in probability theory, including random variables and independence, and how to perform Monte Carlo simulations
- The meaning of expected values and standard errors, and how to compute them in R
- The basics and importance of the Central Limit Theorem
Taught By —
Rafael Irizarry, Professor of Biostatistics, Harvard University
6. Data Science: Inference and Modeling
Start Date — Jan 28th, 2020
Difficulty level — Beginner
Duration — 8 weeks long
You’ll learn (source: Course syllabus) —
- The concepts needed to define estimates and margins of error, including populations, parameters, estimates, and standard errors, and how to use them to make reasonably accurate predictions along with an estimate of each forecast's precision
- How to use models to aggregate data
- Basics of Bayesian statistics and predictive modeling
Taught By —
Rafael Irizarry, Professor of Biostatistics, Harvard University
7. Data Science: R Basics
Start Date — Jan 28th, 2020
Difficulty level — Beginner
Duration — 8 weeks long
You’ll learn (source: Course syllabus) —
- Build a foundation in R and learn how to wrangle, analyze, and visualize data.
- Foundational R programming concepts such as data types, vector arithmetic, and indexing
- Operations in R such as sorting, data wrangling with dplyr, and making plots
Taught By —
Rafael Irizarry, Professor of Biostatistics, Harvard University
8. Introduction to Linear Models and Matrix Algebra
Start Date — April 17th, 2020
Difficulty level — Intermediate
Duration — 4 weeks long
You’ll learn (source: Course syllabus) —
- Basics of matrix algebra including notations and operations
- Learn the application of matrix algebra to data analysis
- How to build and work with linear models
- Learn about QR decomposition
Taught By —
Rafael Irizarry, Professor of Biostatistics, Harvard University
Michael Love, Assistant Professor, Departments of Biostatistics and Genetics, UNC Gillings School of Global Public Health
9. Statistics and R
Start Date — April 17th, 2020
Difficulty level — Intermediate
Duration — 4 weeks long
You’ll learn (source: Course syllabus) —
- Learn by examples that will help you make the connection between concepts and implementation
- Learn in depth about random variables, distributions, inference (p-values and confidence intervals), and non-parametric statistics
- Learn how to do exploratory data analysis using R
- Learn how to use R scripts to analyze data and the basics of reproducible research.
Taught By —
Rafael Irizarry, Professor of Biostatistics, Harvard University
Michael Love, Assistant Professor, Departments of Biostatistics and Genetics, UNC Gillings School of Global Public Health
10. High-Dimensional Data Analysis
Start Date — April 17th, 2020
Difficulty level — Intermediate
Duration — 4 weeks long
You’ll learn (source: Course syllabus) —
- Learn the mathematical definition of distance, how to use the singular value decomposition (SVD) for dimension reduction of high-dimensional data sets, and how multi-dimensional scaling connects to principal component analysis
- Learn the basics of machine learning
- Learn the basics of factor analysis and how to deal with batch effects
- Learn how to implement clustering and heatmaps
Taught By —
Rafael Irizarry, Professor of Biostatistics, Harvard University
Michael Love, Assistant Professor, Departments of Biostatistics and Genetics, UNC Gillings School of Global Public Health