Certificates and Sessions

Both Certificates start with foundational courses designed to equip aspiring data science professionals with essential technical skills. During the first half of the Certificate (eight weeks), participants will: 

  • Establish a solid understanding of data science principles, including an introduction to Unix Shell, Git, and GitHub
  • Develop practical programming language skills in Python or R 
  • Gain technical proficiency for working with SQL
  • Learn the design, implementation, testing and validation of various supervised learning models 
  • Learn to build, maintain, and improve robust software that can be used to answer research questions 

For participants pursuing the Data Science Certificate, the second half (eight weeks) focuses on advanced data science topics, including: visualization, algorithms, sampling, and production. 

For those pursuing the Machine Learning Software Foundations Certificate, the second half (eight weeks) focuses on advanced linear methods, model assessment, and deep learning with neural networks.

Throughout the entire 16-week Certificate, all participants will actively engage in instructor-led live webinars designed to provide practical job readiness skills, including:  

  • Crafting a resume tailored to data science and machine learning roles 
  • Enhancing interpersonal communication skills 
  • Developing and delivering impactful presentations 
  • Understanding the principles of technical writing
  • Building valuable networks and fostering a community of influence 
  • Preparing for technical interviews 

The Certificates close with a networking session, giving participants the opportunity to connect in person with prospective employers and secure coveted roles in the fields of data science and machine learning.

All participants in the Data Science and Machine Learning Software Foundations Certificates are required to complete four technical skills courses. 

This course provides a comprehensive introduction to the Unix shell, covering file and directory navigation, command usage, script creation, and basic functions involving pipes, filters, and loops. Participants will get started with version control and GitHub, exploring the ethical implications of reproducibility. Topics include Git setup, repository management (recording, viewing, and undoing changes), branch creation, and collaborative workflows. Advanced commands, debugging, and history editing will be introduced. Participants will learn effective problem-solving techniques with Google and Stack Overflow, emphasizing reproducibility and documentation. The course emphasizes ethics and equity considerations in projects, fostering discussion-based learning with pre-class readings and live coding exercises.

Learning Outcomes

  • Develop the ability to comfortably access the terminal and proficiently write scripts using basic commands, variables, pipes, filters, and loops. 
  • Understand how to utilize version control systems effectively for: preserving personal work; accessing and editing previous code versions; collaborating with peers; and identifying and debugging errors in code. 
  • Develop the skills to independently troubleshoot issues by identifying problems, conducting research, and formulating questions using components of reproducibility. 
  • Identify ethical considerations within the field, including scrutinizing the composition of datasets for biases and considering the historical context of power abuses.

Course delivery

Two weeks of instructor-led live webinars, each lasting 2.5 hours (totaling 20 hours) 

Learners will be assigned to either Introduction to Python or Introduction to R based on which programming language they are more familiar with as they enter the Certificate. Knowledge of Python is needed for subsequent Certificate courses and is important in industry.

Introduction to Python

The first half of the course will focus on the essentials of coding in Python and the ethical considerations of using algorithms. Participants will learn how to design functions, repeat code using loops, store data in lists, test and debug code, and manipulate data using data analysis and visualization tools such as numpy, pandas, matplotlib, seaborn, and plotly. Participants will take part in a facilitated discussion about the Tuskegee experiment, its long-term effects, and the trustworthiness of AI applications in disparate social systems.

Learning Outcomes 

  • Understand various Python data types and their role in coding, including differentiating and evaluating expressions using numeric types (integers and floating-point numbers), Booleans, strings, and lists.
  • Implement the Function Design Recipe to create functions in Python and reduce duplication.  
  • Utilize numpy and pandas to analyze a dataset and use these libraries to manipulate numerical and tabular data in Python.  
  • Interact with databases using Python, and visualize data with libraries such as matplotlib, seaborn, and plotly.
  • Learn debugging and testing techniques to troubleshoot errors, select test cases, and ensure code correctness, reliability, and robustness.
  • Understand the ethical issues with software and build awareness of case studies involving software failures.
  • Prepare to confidently answer technical job interview questions.  

Course delivery 

Two weeks of instructor-led live webinars, each lasting 2.5 hours (totaling 20 hours) 


Introduction to R

The first part of this course teaches R with a focus on manipulating and visualizing data. Participants will get set up with a functional RStudio workflow, use different file types, transform data tables, import and manipulate data, use functions and loops, create data visualizations, make a Shiny app, and learn how to solve problems with their programming. Both base R and tidyverse methods are taught. To work reproducibly, participants will create R Projects. The second part of the course will cover the ethics of consent, Equity, Diversity & Inclusion (EDI) training, and professional skills including presentation, project management, and data security. Finally, the course will conclude with an industry case study. 

Learning Outcomes 

  • Gain proficiency in utilizing R, including RStudio and coding best practices, and understand R’s role in data science. 
  • Apply manipulation and wrangling techniques to describe and define the characteristics of datasets.  
  • Develop the ability to identify and describe data structures, reshape datasets through manipulation techniques, detect missing values, clean data, summarize and export data, and report findings.
  • Produce a wide range of informative and visually appealing charts, graphs, and plots to effectively communicate insights and patterns hidden within the data. 
  • Develop the skills and strategies necessary to diagnose and fix errors in R code, interpret error messages, and employ effective debugging strategies. 
  • Demonstrate an understanding of data consent principles, addressing ethical and legal considerations for data collection and usage. 
  • Acquire the skills and knowledge required to create compelling presentations and effectively manage projects using R, including visualizations and code documentation. 

Course delivery 

Two weeks of instructor-led live webinars, each lasting 2.5 hours (totaling 20 hours) 

Much research these days is done using software. Researchers need to develop comfort with building, maintaining and improving high-quality software. The first half of this course focuses on equipping students with the skills to build robust software that can be used to answer research questions. It focuses on how to effectively write short programs, as part of a small team, in a reproducible way. Research software that is built correctly can be used by other teams, not just the researcher who originally wrote it.  

SQL is used across the machine learning pipeline and is a fundamental skill for data scientists to master. The second part of this course will focus on the technical skills needed for working with SQL, including ingestion of flat-file datasets (JSON, CSV), query design, and relational database management. Additionally, it will examine common data management concerns, data access management, and data privacy adherence. Learners will be introduced to principles around reproducibility, sharing data, and data ethics (for example, respecting those whose data we use). This course will also cover professional skills such as communication (with a variety of stakeholders) and documentation.

Learning Outcomes 

  • Learn how to work as a team within a Git/GitHub setting, specifically with branching, merging, conflicts, and pull requests.
  • Know how to create bug reports and prioritize requests.
  • Develop comfort with using makefiles and configuring programs.
  • Proficiently test software, handle errors, and track provenance.
  • Know how to create Python packages.
  • Develop comfort with calling APIs.
  • Develop comfort with Docker.
  • Develop a better understanding of the structure of databases.
  • Save and transport data in CSV and JSON file formats.
  • Gain familiarity with the essentials of querying and manipulating data in SQL, and learn how to search for answers to future questions.
  • Gain familiarity with the legal framework around sharing data.
  • Analyze data requirements and work with different stakeholders such as analysts and managers. 

Course delivery 

Two weeks of instructor-led live webinars, each lasting 2.5 hours (totaling 20 hours) 

This course provides the skills required to design, implement, test, and validate a variety of supervised learning models. It covers the basics of statistical learning, including modelling with the goal of prediction versus inference, the trade-off between prediction accuracy and model interpretability, and the all-important bias-variance trade-off. Each section of this course will address a unique set of methods used for supervised learning on real data sets.

 

Learning Outcomes 

  • Understand, implement and interpret the results from several supervised learning approaches for regression and classification 
  • Utilize resampling methods to extract more information from a data set and to choose the best model 
  • Perform exploratory data analysis for unsupervised learning 
  • Understand what is required for reproducible learning 
  • Appreciate the uncertainties associated with model results and the ethical consequences of acting on these results 

Course delivery 

Two weeks of instructor-led live webinars, each lasting 2.5 hours (totaling 20 hours) 

Participants pursuing the Data Science Certificate must complete the following four electives: 

This course is designed to introduce the fundamentals of sampling, probability, and survey methodology. It covers various topics, including simple probability samples, stratified sampling, cluster sampling, addressing non-response, estimation, and survey quality. Participants will consider the theoretical foundations of different sampling approaches, as well as practical applications of this knowledge in contexts such as market research, political polling, and the Canadian census. Analysis using the R programming language will also be highlighted, drawing on skills developed in Introduction to R.

Learning Outcomes 

  • Gain proficiency in executing simple probability samples. 
  • Understand complex sampling procedures and the trade-offs involved.
  • Acquire the skills to identify and address sources of error or inaccuracies in data stemming from sampling strategies.
  • Develop an intuition around survey quality.
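
As a rough illustration of one topic named above, the following is a minimal sketch of proportional stratified sampling. It is written in Python using only the standard library, whereas the course itself highlights R; the function name `stratified_sample` and its parameters are hypothetical and chosen only for this example, not taken from course materials.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Draw a proportional stratified sample (illustrative sketch only).

    records  -- list of items forming the population
    key      -- function mapping an item to its stratum label
    fraction -- share of each stratum to sample, between 0 and 1
    seed     -- seed for a local random generator, so runs are reproducible
    """
    rng = random.Random(seed)

    # Group the population into strata.
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)

    # Sample the same fraction from every stratum (at least one item each),
    # drawing without replacement within the stratum.
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical population: 80 urban and 20 rural respondents.
population = [("urban", i) for i in range(80)] + [("rural", i) for i in range(20)]
picked = stratified_sample(population, key=lambda r: r[0], fraction=0.1)
```

Because each stratum is sampled at the same rate, the sample keeps the 80/20 split of the population, which a simple random sample of the same size would only match on average.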

Course delivery 

Two weeks of instructor-led live webinars, each lasting 2.5 hours (totaling 20 hours) 

Regardless of the quality of your analyses and data-related findings, if you cannot effectively communicate them, their impact will be severely limited. Technical skills in this course will focus on a step-by-step walkthrough of choosing, creating, and modifying data visualizations in R using ggplot2. Discussions will include general design principles applicable to other data visualization software used in industry and academia (e.g., Python, Tableau, Power BI).

Incorporating case studies and real-world examples, the ethical components of this course will include:  

  • ensuring reproducibility with data visualization 
  • building awareness of the decision-making that goes into sharing data visually 
  • addressing inequity in data visualization by focusing on accessible design.  

Learning Outcomes 

  • Acquire the skills to create and customize data visualizations from start to finish using R
  • Gain insights into the general design principles for creating accessible and equitable data visualizations in R and other software
  • Develop an understanding of data visualization as a purposeful way of telling a story, along with its ethical and professional implications

Course delivery 

Two weeks of instructor-led live webinars, each lasting 2.5 hours (totaling 20 hours) 

There is often a need to work out which algorithm or data structure should be used in a given practical situation. This course focuses on developing comfort with algorithms and data structures, covering Big-O notation and recursive functions.

Learning Outcomes 

  • Assess options and choices around fundamental algorithms and data structures using Big-O notation. 
  • Develop comfort with recursive functions. 
  • Identify appropriate data structures.
  • Transform a client-led problem into an optimization challenge and identify opportunities for improvement.
  • Identify causes for slow-running code and implement strategies to optimize performance.
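
To give a sense of the trade-offs these outcomes refer to, here is a minimal Python sketch (not drawn from course materials; all function names are hypothetical) contrasting a naive recursive function with a memoized version, and list membership with set membership.

```python
from functools import lru_cache


def fib_naive(n: int) -> int:
    """Plain recursion: the number of calls grows exponentially with n."""
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)


@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    """Same recursion with cached results: each n is computed once, O(n)."""
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)


def contains_list(items: list, target) -> bool:
    """Membership in a list scans elements one by one: O(n)."""
    return target in items


def contains_set(items: set, target) -> bool:
    """Membership in a set uses hashing: O(1) on average."""
    return target in items


small = fib_naive(10)   # fine for small inputs
large = fib_memo(50)    # impractical with fib_naive, quick with caching
```

The two Fibonacci functions return the same values; only the amount of repeated work differs, which is the kind of difference Big-O notation is used to describe.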

Course delivery 

Two weeks of instructor-led live webinars, each lasting 2.5 hours (totaling 20 hours) 

Building a model differs significantly from creating a model that is usable by others. This course focuses on everything that happens after the model has been put together, specifically addressing machine learning system requirements such as reliability, scalability, maintainability, and adaptability; feature engineering; model development and deployment; monitoring; and infrastructure and tooling.

Learning Outcomes 

  • Design machine learning systems that are reliable, scalable, maintainable, and adaptable. 
  • Apply feature engineering techniques to optimize machine learning models. 
  • Deploy machine learning models in production using various strategies.
  • Implement monitoring and alerting systems to troubleshoot and diagnose issues in production. 

Course delivery 

Two weeks of instructor-led live webinars, each lasting 2.5 hours (totaling 20 hours) 

Participants pursuing the Machine Learning Software Foundations Certificate must complete the following two courses:

This course builds upon the statistical foundation provided in the Estimation, Testing & Machine Learning course, adding theory around linear methods and classification. The course will also focus on model assessment, inference, and boosting, and sets the foundation for deep learning with neural networks and related approaches.

Learning Outcomes 

  • Apply advanced linear methods such as Lasso and Ridge regression for feature selection and regularization, and understand their theoretical underpinnings. 
  • Evaluate machine learning models with techniques such as hypothesis testing and confidence intervals, and interpret the results in the context of the problem domain. 
  • Apply boosting algorithms such as XGBoost and LightGBM to improve the performance of machine learning models on large and complex datasets. 
  • Implement neural network architectures such as multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs) in Python and/or R, and understand how to tune their hyperparameters for optimal performance. 
  • Discuss ethical considerations in machine learning, such as fairness, accountability, and transparency, and identify potential biases and issues that may arise in the development and deployment of machine learning models.  

Course delivery 

Three weeks of instructor-led live webinars, each lasting 2.5 hours (totaling 25 hours) 

In this course, participants will apply machine learning techniques to software applications using the Python language. Successful participants will apply each step of the machine learning workflow in relevant industry applications, and will be exposed to cutting-edge techniques and the underlying theory. Technical topics include Machine Learning in Practice, Data Preparation and Feature Engineering, Supervised and Unsupervised Learning, Model Evaluation, and Ensemble Learning. Participants will also work on other skills, such as networking, building a community of influence, and mastering the interview process, including behavioral and technical interviews.

Learning Outcomes 

  • Propose and present a business pitch for a machine learning project in a real-world business setting through a business case study with an industry partner or at their current organization. 
  • Design, formulate, and construct a comprehensive, full-lifecycle machine learning project as a course project, while managing a timeline.
  • Apply, deploy, and implement each step of the machine learning lifecycle in the Python programming language, debugging errors and iterating on improvements.

Course delivery 

Three weeks of instructor-led live webinars, each lasting 2.5 hours (totaling 25 hours) 

 

Both Certificates go beyond technical proficiency, placing a strong emphasis on equipping participants for success in the competitive world of data science and machine learning. With the guidance of dedicated employment and career experts, participants will acquire professional skills, including: 

  • Elevating communication and presentation skills: Participants will develop impactful communication and presentation abilities to help convey complex ideas effectively to a broad range of audiences. 
  • Crafting a persuasive resume: Participants will learn how to highlight data-driven results and transferable skills that employers value. 
  • Mastering networking skills: Participants will understand key elements of becoming an effective networker to build strong connections, increase professional visibility and expand their reach to identify new opportunities. 
  • Acing the interview process: Participants will gain insights into the interview process, including preparation for both behavioral and technical interviews.

You can take either 16-week certificate in one of the five sessions below. A typical week will have you in class Monday – Thursday, 6 PM – 8:30 PM, and Saturday mornings, 9 AM – 11:30 AM.

At this time we are only accepting applications for the second session. 

Session 1: November 27, 2023 – March 24, 2024 
Session 2: January 15, 2024 – May 5, 2024 
Session 3: April 22, 2024 – August 11, 2024
Session 4: August 19, 2024 – December 8, 2024
Session 5: November 18, 2024 – March 16, 2025

Course Instructors and Course Support Staff – Interested in joining our technical skills course team?

Please reach out to courses.dsi@utoronto.ca.