Both Certificates start with foundational courses designed to equip aspiring data science professionals with essential technical skills. During the first half of the Certificate (eight weeks), participants will:
For participants pursuing the Data Science Certificate, the second half (eight weeks) focuses on advanced data science topics, including visualization, algorithms, sampling, and production.
For those pursuing the Machine Learning Software Foundations Certificate, the second half (eight weeks) focuses on advanced linear methods, model assessment, and deep learning with neural networks.
Throughout the entire 16-week Certificate, all participants will actively engage in instructor-led live webinars designed to provide practical job readiness skills, including:
Each Certificate closes with a networking session, giving participants the opportunity to connect in person with prospective employers and secure coveted roles in the fields of data science and machine learning.
All participants in the Data Science and Machine Learning Software Foundations Certificates are required to complete four technical skills courses.
This course provides a comprehensive introduction to the Unix shell, covering file and directory navigation, command usage, script creation, and basic functions involving pipes, filters, and loops. Participants will get started with version control and GitHub, exploring the ethical implications of reproducibility. Topics include Git setup, repository management (recording, viewing, and undoing changes), branch creation, and collaborative workflows. Advanced commands, debugging, and history editing will be introduced. Participants will learn effective problem-solving techniques with Google and Stack Overflow, emphasizing reproducibility and documentation. The course emphasizes ethics and equity considerations in projects, fostering discussion-based learning with pre-class readings and live coding exercises.
Learning Outcomes:
Course delivery:
Two weeks of instructor-led live webinars, each lasting 2.5 hours (totaling 20 hours)
Learners will be assigned to one of the two courses below (Introduction to Python or Introduction to R) based on which programming language they are more familiar with as they enter the Certificate. Knowledge of Python is needed for subsequent Certificate courses and is important in industry.
Introduction to Python
The first half of the course will focus on the essentials of coding in Python and ethical considerations of using algorithms. Participants will learn how to design functions, repeat code using loops, store data in lists, conduct code testing and debugging, and manipulate data using various data analysis and visualization tools such as numpy, pandas, matplotlib, seaborn, and plotly. Participants will take part in a facilitated discussion about the Tuskegee experiment, its long-term effects, and the trustworthiness of AI applications in disparate social systems.
Learning Outcomes
Course delivery
Two weeks of instructor-led live webinars, each lasting 2.5 hours (totaling 20 hours)
Introduction to R
The first part of this course teaches R with a focus on manipulating and visualizing data. Participants will get set up with a functional RStudio workflow, use different file types, transform data tables, import and manipulate data, use functions and loops, create data visualizations, make a Shiny app, and learn how to solve problems with their programming. Both base R and tidyverse methods are taught. To work reproducibly, participants will create R Projects. The second part of the course will cover the ethics of consent, Equity, Diversity & Inclusion (EDI) training, and professional skills including presentation, project management, and data security. Finally, the course will conclude with an industry case study.
Learning Outcomes
Course delivery
Two weeks of instructor-led live webinars, each lasting 2.5 hours (totaling 20 hours)
Much research these days is done using software. Researchers need to develop comfort with building, maintaining and improving high-quality software. The first half of this course focuses on equipping students with the skills to build robust software that can be used to answer research questions. It focuses on how to effectively write short programs, as part of a small team, in a reproducible way. Research software that is built correctly can be used by other teams, not just the researcher who originally wrote it.
SQL is used across the machine learning pipeline and is a fundamental skill for data scientists to master. The second part of this course will focus on the technical skills needed for working with SQL, including ingestion of flat-file datasets (JSON, CSV), query design, and relational database management. Additionally, it will examine common data management concerns, data access management, and data privacy adherence. Learners will be introduced to principles around reproducibility, sharing data, and data ethics (for example, respecting those whose data we use). This course will also cover professional skills such as communication (with a variety of stakeholders) and documentation.
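As a rough illustration of the kind of task described above (not taken from the course materials), the following Python sketch ingests a small CSV dataset into a relational table and runs an aggregate query against it. It uses only the standard-library `csv` and `sqlite3` modules. The table name `staff`, its columns, and the sample rows are hypothetical and exist only for this example.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for a flat file on disk.
CSV_TEXT = """name,department,salary
Ada,Research,120
Grace,Engineering,135
Alan,Research,110
"""


def load_csv(connection, csv_text):
    """Create the (hypothetical) staff table and insert one row per CSV record."""
    connection.execute(
        "CREATE TABLE staff (name TEXT, department TEXT, salary INTEGER)"
    )
    rows = csv.DictReader(io.StringIO(csv_text))
    # Named placeholders keep the values separate from the SQL text.
    connection.executemany(
        "INSERT INTO staff (name, department, salary) "
        "VALUES (:name, :department, :salary)",
        rows,
    )
    connection.commit()


def average_salary_by_department(connection):
    """Return (department, average salary) pairs, sorted by department."""
    cursor = connection.execute(
        "SELECT department, AVG(salary) FROM staff "
        "GROUP BY department ORDER BY department"
    )
    return cursor.fetchall()


connection = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
load_csv(connection, CSV_TEXT)
result = average_salary_by_department(connection)
print(result)
```

An in-memory SQLite database is used here only to keep the sketch self-contained; the same query would apply to any relational database that supports standard SQL aggregation.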
Learning Outcomes
Course delivery
Two weeks of instructor-led live webinars, each lasting 2.5 hours (totaling 20 hours)
This course provides the skills required to design, implement, test, and validate a variety of supervised learning models. It covers the basics of statistical learning, including modelling with the goal of prediction versus inference, the trade-off between prediction accuracy and model interpretability, and the all-important bias-variance trade-off. Each section of this course will address a unique set of methods used for supervised learning on real data sets.
Learning Outcomes
Course delivery
Two weeks of instructor-led live webinars, each lasting 2.5 hours (totaling 20 hours)
Participants pursuing the Data Science Certificate must complete the following four electives:
This course is designed to introduce the fundamentals of sampling, probability, and survey methodology. It covers various topics, including simple probability samples, stratified sampling, cluster sampling, addressing non-response, estimation, and survey quality. Participants will consider the theoretical foundations of different sampling approaches, as well as practical applications of this knowledge in contexts such as market research, political polling, and the Canadian census. Analysis using the R programming language will also be highlighted, drawing on skills developed in Introduction to R.
Learning Outcomes
Course delivery
Two weeks of instructor-led live webinars, each lasting 2.5 hours (totaling 20 hours)
Regardless of the quality of your analyses and data-related findings, if you cannot effectively communicate them, their impact will be severely limited. Technical skills in this course will focus on a step-by-step walkthrough of choosing, creating, and modifying data visualizations in R using ggplot2. Discussions will include general design principles applicable to other data visualization software used in industry and academia (e.g., Python, Tableau, Power BI).
Incorporating case studies and real-world examples, the ethical components of this course will include:
Learning Outcomes
Course delivery
Two weeks of instructor-led live webinars, each lasting 2.5 hours (totaling 20 hours)
Practitioners often need to work out which algorithm or data structure best fits a given practical situation. This course focuses on developing comfort with algorithms and data structures, including Big-O notation and recursive functions.
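To give a flavor of these topics, here is a minimal Python sketch (an illustration only, not drawn from the course materials) that contrasts two ways of finding a value in a list. The function names are hypothetical. A linear scan takes O(n) comparisons in the worst case, while a recursive binary search over a sorted list takes O(log n).

```python
def linear_search(items, target):
    """Check each element in turn: O(n) comparisons in the worst case."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1


def binary_search(sorted_items, target, low=0, high=None):
    """Recursively halve a sorted list: O(log n) comparisons in the worst case."""
    if high is None:
        high = len(sorted_items) - 1
    if low > high:
        return -1  # base case: the search range is empty, so the target is absent
    mid = (low + high) // 2
    if sorted_items[mid] == target:
        return mid
    if sorted_items[mid] < target:
        # The target can only be in the upper half.
        return binary_search(sorted_items, target, mid + 1, high)
    # The target can only be in the lower half.
    return binary_search(sorted_items, target, low, mid - 1)


even_numbers = list(range(0, 100, 2))  # sorted input is required for binary search
print(linear_search(even_numbers, 42), binary_search(even_numbers, 42))
```

Binary search is faster on large inputs but only works when the data is already sorted, which is the kind of trade-off that choosing an algorithm for a practical situation involves.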
Learning Outcomes
Course delivery
Two weeks of instructor-led live webinars, each lasting 2.5 hours (totaling 20 hours)
Building a model differs significantly from creating a model that is usable by others. This course focuses on everything that happens after the model has been put together, specifically addressing machine learning system requirements such as reliability, scalability, maintainability, and adaptability; feature engineering; model development and deployment; monitoring; and infrastructure and tooling.
Learning Outcomes
Course delivery
Two weeks of instructor-led live webinars, each lasting 2.5 hours (totaling 20 hours)
Participants pursuing the Machine Learning Software Foundations Certificate must complete the following two courses:
This course builds upon the statistical foundation provided in Estimation, Testing & Machine Learning, adding theory around linear methods and classification. The course will also focus on model assessment, inference, and boosting, and sets the foundation for deep learning with neural networks and related approaches.
Learning Outcomes
Course delivery
Three weeks of instructor-led live webinars, each lasting 2.5 hours (totaling 25 hours)
In this course, participants will apply machine learning techniques to software applications using the Python language. Successful participants will apply each step of the machine learning workflow in relevant industry applications, as well as be exposed to cutting-edge techniques and the underlying theory. Technical topics include Machine Learning in Practice, Data Preparation and Feature Engineering, Supervised and Unsupervised Learning, Model Evaluation, and Ensemble Learning. Participants will also work on other skills, such as networking, building a community of influence, and mastering the interview process, including behavioral and technical interviews.
Learning Outcomes
Course delivery
Three weeks of instructor-led live webinars, each lasting 2.5 hours (totaling 25 hours)
Both Certificates go beyond technical proficiency, placing a strong emphasis on equipping participants for success in the competitive world of data science and machine learning. With the guidance of dedicated employment and career experts, participants will acquire professional skills, including:
You can take either 16-week Certificate in one of the five sessions below. A typical week will have you in class Monday – Thursday, 6 PM – 8:30 PM, and Saturday mornings, 9 AM – 11:30 AM.
At this time we are only accepting applications for the second session.
Session 1: November 27, 2023 – March 24, 2024
Session 2: January 15, 2024 – May 5, 2024
Session 3: April 22, 2024 – August 11, 2024
Session 4: August 19, 2024 – December 8, 2024
Session 5: November 18, 2024 – March 16, 2025
Course Instructors and Course Support Staff – Interested in joining our technical skills course team?
Please reach out to courses.dsi@utoronto.ca