Data Sciences Institute (DSI)

Polygenic Risk Score Grant Winners Announced: Advancing Genomic Medicine Through Innovative Research

The Data Sciences Institute (DSI) is pleased to announce the recipients of the DSI-McLaughlin Centre Polygenic Risk Score Grant competition. This grant, created in partnership with the University of Toronto’s McLaughlin Centre and the Dalla Lana School of Public Health, aims to support emerging research and build capacity in the field of polygenic risk score studies. Polygenic risk scores enable researchers to use multiple genetic factors to estimate an individual’s genetic risk for complex diseases, providing important information for predicting, preventing and treating diseases. 
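As a rough illustration of how multiple genetic factors combine into one estimate, a polygenic risk score is commonly computed as a weighted sum: each variant's risk-allele count (0, 1 or 2) is multiplied by an effect-size weight and the products are added up. The sketch below assumes that simple additive form; the variant names, weights and function name are hypothetical and are not drawn from any of the funded projects.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of risk-allele counts over the variants that have a weight.

    dosages: risk-allele count (0, 1 or 2) per variant for one individual.
    weights: effect-size weight per variant (hypothetical values here).
    Variants missing from `dosages` are skipped.
    """
    return sum(weights[v] * dosages[v] for v in weights if v in dosages)


# Hypothetical variants and effect sizes, for illustration only.
weights = {"variant_a": 0.5, "variant_b": -0.25, "variant_c": 1.0}
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = polygenic_risk_score(person, weights)
print(score)
```

In practice the weights are estimated from large genetic association studies, and how well those estimates are obtained largely determines how useful the score is.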

Professor France Gagnon, Chair of the adjudication committee and Associate Dean Research at the Dalla Lana School of Public Health, expressed enthusiasm for the wide range of proposals received from researchers across the University and partner institutions. These proposals demonstrate the potential for innovative methodologies in polygenic risk scores to impact a wide range of fields. “We are thrilled to support this cutting-edge research and look forward to seeing its impact on the field of precision population health and medicine,” said Gagnon. 

Two of the grant recipients are Professors Frank Wendt and Esteban Parra, Department of Anthropology at the University of Toronto Mississauga. They are taking a new approach to the study of major depressive disorder (MDD) and hippocampus volume. Their research aims to improve the accuracy of polygenic risk score predictions for this disorder and expand our understanding of its biology. Wendt and Parra said, “By taking a tandem repeat aware approach to risk scores, we hope to uncover new insights into the biology of major depressive disorder, improve prediction accuracy, and develop scores that better translate across population groups. We are thrilled to contribute to this important area of research that takes an interdisciplinary approach to pressing matters in genomic medicine.” 

Grant recipients Lei Sun and Ziang Zhang from the Department of Statistical Sciences, Faculty of Arts & Science, are collaborating with Dr. Andrew Paterson from The Hospital for Sick Children on a project to develop polygenic risk scores for binary traits, which are traits that can only take on two possible outcomes, such as the presence or absence of a particular disease. Their research aims to investigate how the estimated effects of different genetic factors can be biased and propose a new way to adjust for this bias to improve the accuracy of the polygenic risk scores. “Because of DSI’s emphasis on interdisciplinary research, all team members with complementary expertise worked closely to define and develop a research project with statistical rigor and practical impact. This grant also provides graduate students in Statistical Sciences a unique opportunity to lead a grant application, which is rare in our discipline,” said Sun. These projects have the potential to improve our understanding of complex diseases and advance the fields of precision medicine and population health. 
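One well-known source of such bias in logistic regression is the non-collapsibility of the odds ratio, which is named in the project title listed below. The sketch that follows is a generic textbook-style illustration, not the team's method: it assumes a binary genetic factor G and a second binary risk factor Z that is independent of G, with made-up coefficients. Even without confounding, the odds ratio for G changes when Z is left out of the model.

```python
import math


def expit(x: float) -> float:
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def odds(p: float) -> float:
    return p / (1.0 - p)


# Hypothetical model: logit P(Y=1 | G, Z) = a + b*G + c*Z
a, b, c = -1.0, 1.0, 2.0


def p_disease(g: int, z: int) -> float:
    return expit(a + b * g + c * z)


# Odds ratio for G within a fixed level of Z (the same for Z=0 and Z=1).
conditional_or = odds(p_disease(1, 0)) / odds(p_disease(0, 0))


def p_marginal(g: int) -> float:
    """Average over Z, assuming P(Z=1) = 0.5 and Z independent of G."""
    return 0.5 * p_disease(g, 0) + 0.5 * p_disease(g, 1)


# Odds ratio for G when Z is ignored.
marginal_or = odds(p_marginal(1)) / odds(p_marginal(0))

print(f"conditional OR: {conditional_or:.3f}")
print(f"marginal OR:    {marginal_or:.3f}")
```

Under these assumed numbers the marginal odds ratio is closer to 1 than the conditional one, so an effect estimated without Z understates the effect that applies within each level of Z.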

Congratulations to all the DSI – McLaughlin Centre Polygenic Risk Score Grant collaborative research teams! 

A Multimodal AI Solution for Improved Outcome Prediction using Polygenic Scores and EHR  

  • Zahra Shakeri (Institute of Health Policy, Management, and Evaluation, Dalla Lana School of Public Health, U of T); Kuan Liu (Institute of Health Policy, Management, and Evaluation, Dalla Lana School of Public Health, U of T) 

Addressing non-collapsibility in logistic regression when constructing polygenic risk scores for binary traits 

  • Lei Sun (Department of Statistical Sciences, Faculty of Arts & Science, U of T), Andrew Paterson (Genetics and Genome Biology, The Hospital for Sick Children), and Ziang Zhang (Department of Statistical Sciences, Faculty of Arts & Science, U of T) 

Inclusive Trans-ancestry Polygenic Genetic Risk Scores (iPRS) via Robust Transfer Learning 

  • Jessica Gronsbell (Department of Statistical Sciences, Faculty of Arts & Science, U of T); Jianhui Gao (Department of Statistical Sciences, Faculty of Arts & Science, U of T)

Tandem repeat aware risk scores linking major depression and hippocampus volume 

  • Frank Wendt (Department of Anthropology, University of Toronto Mississauga); Esteban Parra (Department of Anthropology, University of Toronto Mississauga) 

Revolutionizing Neuroscience with DSI Catalyst Grant: UTSC Professors Harness the Power of Machine Learning

Professors Guillaume Filion and Minoru Koyama, DSI members from the University of Toronto Scarborough’s Department of Biological Sciences, are advancing neuroscience with an innovative approach through the help of the Data Sciences Institute Catalyst Grant. Their work repurposes technology found in Google Translate and DeepL to translate images of brain activity into movements, offering a powerful understanding of the relationship between the brain and behaviour. 


One main goal of neuroscience is to understand how complex connections between neurons lead to behaviour when reacting to stimuli. The researchers note that it is now possible to record the activity of tens of thousands of neurons simultaneously in behaving animals. However, there’s still a need for better analytical methods. Their project aims to develop a new approach to understanding the brain by exploring the relationship between neural activity and behaviour.  The team will record the brain activity and movements of zebrafish and use a cross-attention mechanism to interpret the data. The ultimate goal is to change experimental research by introducing a different approach that goes beyond solving short-term problems or answering specific questions related to fish behaviour. 
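For readers unfamiliar with cross-attention, the following is a minimal sketch of the general mechanism, assuming the standard scaled dot-product form and omitting the learned projection matrices a real model would have. It is not the researchers' model: the toy "neural frames" and "movement queries" are invented, and the function names are hypothetical. Each query from one sequence is compared with every element of the other sequence, and the resulting weights show which elements the query draws on.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(
    queries: List[Vector], keys: List[Vector], values: List[Vector]
) -> Tuple[List[Vector], List[Vector]]:
    """Scaled dot-product attention with queries from one sequence and
    keys/values from another. Returns (outputs, attention_weights)."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [sum(wj * v[i] for wj, v in zip(w, values)) for i in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(w)
    return outputs, all_weights


# Toy data: three hypothetical neural-activity frames and two movement queries.
neural_frames = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
movement_queries = [[10.0, 0.0], [0.0, 10.0]]

outputs, weights = cross_attention(movement_queries, neural_frames, neural_frames)
for i, w in enumerate(weights):
    print(f"query {i} attends most to frame {w.index(max(w))}")
```

The attention weights are the part of interest for interpretation, since they indicate which inputs contributed most to each output.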


“As researchers, we are striving to use machine learning for scientific discovery by exploring how machines can teach us the things they figure out about nature. However, it is important to note that not all machine learning techniques are equally helpful in advancing our understanding of the brain. An AI that simply predicts behaviour from the activity of the brain may not give any insight into brain function,” explained Filion and Koyama. They emphasized that for an AI to help understand the brain, it must be programmed with explanation mechanisms from the start. This is crucial for advancing our understanding of the brain and ultimately developing treatments for brain-related disorders.


Interdisciplinary collaborations are key for advancing knowledge and discovery, according to the researchers. They emphasize the value of combining expertise from different disciplines to unlock new insights. “The support from the DSI is making a significant difference by allowing us to invest in new research avenues,” they noted. The developments in this area could have revolutionary implications for experimental neuroscience.


Gary Bader, DSI’s Associate Director, Research & Software, said, “This ground-breaking research is pushing the boundaries of interactions between machines and humans in the field of neuroscience. We are thrilled to support their innovative work and look forward to seeing its impact.”

Breaking Down the Walls: Data Sciences Institute Explores the Future of the Metaverse

The metaverse may be the next big thing, but its success hinges on transparent data practices. On February 24, the Data Sciences Institute (DSI) ventured deeper into the virtual world with “Data and the Metaverse,” an event hosted at the University of Toronto Mississauga (UTM) as part of DSI@UTM’s focus on Responsible Data Science. The Data Sciences Institute supports research activity in Responsible Data Science and encourages innovative data science methodology development and application. This includes initiatives such as this event, which aimed to explore the implications of data creation, collection, analysis, and deployment in virtual reality (VR) and augmented reality (AR). 

With VR/AR providing opportunities for research and innovation, speakers explored the future possibilities, challenges and implications of data in the metaverse, which is described as the “universe of universes.” However, according to Associate Director of DSI@UTM, Bree McEwan, the VR industry seems to be stuck in the “walled garden” phase of this potentially revolutionary interactive technology. “Until the VR industry figures out how to move beyond these walled gardens, the metaverse may never live up to the hype,” she says. A walled garden is a mediated environment that restricts users to specific content within a website or social media platform. “But for the technology to work, it has to collect data about us. In the metaverse you are data,” McEwan added. 

The event featured several panel discussions on related topics, such as the potential improvements or deployments of VR/AR technologies, the data infrastructure of VR for researchers, and policies for the future regulation of the metaverse. The panelists, including Sun Joo (Grace) Ahn of the Grady College of Journalism and Mass Communication at the University of Georgia, James McCrae of Magic Leap, Luke Stark of the Faculty of Information and Media Studies at Western University, and Daniel Wigdor of the Department of Computer Science at the University of Toronto, discussed pragmatic solutions for VR data to satisfy industry needs and protect users, potential concerns with collecting data from VR users, and how policy makers should approach data collection from metaverse users.   

One of the attendees, Pinyao Liu, came all the way from Simon Fraser University in Vancouver to learn more about the intersection of data science and the metaverse. He says, “I liked the talk on representation matters, particularly regarding API representation, it was insightful in emphasizing the importance of designing with intention to minimize harm – a valuable lesson for designers, researchers, and large corporations.” 

The event also included an opportunity for attendees to experience VR/AR. Third-year UTM Research Opportunity Program student Fahim Kamal shares, “This event is to introduce people to the VR world. I have been involved in courses in immersive environment design where we had the opportunity to design different 3D assets for VR spaces, so this is relevant.”  

One of the key themes of the event was the need for a balance between interoperability and privacy. McEwan emphasized the need to protect individual instances of social interaction from surveillance and commodification while ensuring data can contribute to interoperability. “By building VR systems that are interoperable across platforms and devices, we can create a VR ecosystem that truly serves the public and fosters innovation,” she says. 

The event ended with a call to action by DSI for those in the virtual reality industry and research community to collaborate on solutions to move beyond the current walled garden state of the industry. “DSI programs and initiatives are designed to facilitate collaboration as well as the development and application of new data science methodologies and tools in a training-focused environment to push into new frontiers,” says Lisa Strug, Director of the DSI. This event proved to be a valuable opportunity for the VR community to come together to discuss the future of data and the metaverse. 

Data Sciences Institute Catalyst Grants support transformative data science research

by Chris Sasaki

The Data Sciences Institute (DSI) is pleased to announce the 2023 recipients of the annual DSI Catalyst Grant competition.

Catalyst Grants are awarded to multidisciplinary collaborative research teams focused on harnessing the transformative nature of the data sciences. The grants are given to teams working on the development of novel statistical or computational tools or the use of existing methodology in innovative ways to address questions of major societal importance and effect positive social change. Two of this year’s Catalyst Grants are co-funded by the Temerty Centre for AI Research and Education in Medicine at the University (T-CAIREM). T-CAIREM seeks to establish new AI tools and data-driven projects that integrate clinical, translational and basic research into real-world applications.

“The recipients of 2023 grants are an illustration of the Institute’s mission to bring researchers together from across disciplines, divisions, campuses, as well as from the greater community, to address today’s critical challenges,” says Gary Bader, the Institute’s Associate Director, Research & Software. “It’s a truly remarkable list of impactful projects and an inspiring group of people.”

Thirteen interdisciplinary teams spanning all three campuses and external funding partners received grants (full list below), including three collaborations tackling diverse problems regarding harmful social media content, machine learning in healthcare and how marketing affects children’s health:

Reducing harmful content on social media through community-powered AI

Misinformation and hateful social media content have emerged as a critical threat, with several damaging impacts on our health, environment and society. While most social media platforms have taken measures to moderate and identify harmful content, and limit its spread, human moderators and AI algorithms often fail to identify it correctly and take proper actions. Historically marginalized groups are most affected by these failings because they are less represented among the human moderators and their data are less available to the algorithms.

With their project, Syed Ishtiaque Ahmed from the Faculty of Arts & Science’s Department of Computer Science, Shion Guha from U of T’s Faculty of Information, and Shohini Bhattasali from University of Toronto Scarborough’s Department of Language Studies aim to improve the content moderation process by involving the communities affected by harmful or hateful content and by using a more pluralistic contestability framework that allows multiple perspectives.

Ahmed, Guha, and Bhattasali will design, develop, deploy and evaluate the proposed system to address potentially Islamophobic and Sinophobic posts on Twitter in support of two Canadian non-profit organizations: the Chinese Canadian National Council for Social Justice (CCNC-SJ) and the Islam Unravelled Anti-Racism Initiative.

“Annotating data becomes challenging when the annotators are divided in their opinions,” says Ahmed. “Democratically resolving this issue requires representing diverse values through the participation of different communities, which is currently absent in data science practices.

“This project addresses the issue by designing, developing and evaluating a pluralistic framework of justification and contestation in data science while working with two historically marginalized communities in Toronto.”


Tackling the labelling bottleneck in machine learning in healthcare

In many domains, labelling of objects — a fundamental step in machine learning — can be done by non-experts; for example, labelling can be as simple as drawing a box around a cyclist in an image.

“But in machine learning for healthcare (ML4HC) applications, the labeller needs to be a clinical domain expert,” says Sebastian Goodfellow, from the Department of Civil & Mineral Engineering, Faculty of Applied Science & Engineering.

“The time of clinician domain experts is scarce, and labelling becomes the rate-limiting step for ML4HC projects, preventing evaluation of the utility of machine learning in many medical applications. How to address this bottleneck is a knowledge gap in healthcare that leads to a translation gap, which our team will address.”

The goal of this project is to develop and evaluate two possible solutions to better understand this bottleneck. The first is a methodology for crowdsourcing labels from non-experts. The second is a novel framework for labelling medical waveform data by using a “human-in-the-loop,” semi-supervised learning pipeline and an interactive visualization approach.

The project is co-funded by T-CAIREM. The team comprises Goodfellow and, from The Hospital for Sick Children, Mjaye Mazwi (Translational Medicine Labs), Anica Bulic (Translational Medicine Labs) and Melissa McCradden (Genetics & Genome Biology Labs).

Says Goodfellow, “Our team is grateful to the DSI for this funding which will enable us to address an important challenge that affects many industries beyond healthcare. We’re excited to get started!”


Using deep learning and image recognition to measure child-directed food marketing

Childhood obesity and nutrition-related chronic disease are urgent global public health concerns. One of the factors contributing to the problem is that highly processed, energy-dense and nutrient-poor food is being marketed to children through tactics such as cartoon characters, toys and other fun enticements — all of which affect children’s attitudes, preferences and consumption behaviours.

But measuring child-directed marketing is time- and labour-intensive, requires in-depth training and validation, and is often subjective. As a result, there is a paucity of data with which to guide national and global legislation and policies aimed at protecting children from harmful industry practices.

The goal of this project is to develop new systems to capture food labels; develop methodologies using image recognition and deep learning technology to measure indicators of child-directed marketing on food and beverage packaging; and evaluate the relationship between child-directed marketing on food packaging, nutritional quality and price. These methodologies will enable evaluation of how child-directed food marketing may be perpetuating existing dietary and health inequities and whether policies are in fact reducing these disparities and protecting children’s right to health.

The team comprises experts from across four U of T academic divisions: Mary R. L’Abbé, Department of Nutritional Sciences, Temerty Faculty of Medicine; David Soberman, Joseph L. Rotman School of Management; Laura Rosella, Dalla Lana School of Public Health; and Steve Mann, Edward S. Rogers Sr. Department of Electrical & Computer Engineering, Faculty of Applied Science & Engineering.

“This project will help guide the development, implementation and evaluation of national and global food policies aimed at protecting children from harmful food industry marketing practices,” says L’Abbé. “Parliament is in the process of finalizing Bill C-252 to restrict the marketing of unhealthy foods to children and this grant can have a huge policy impact as part of our program on Food and Nutrition Policy for Population Health.” 


Launched in 2021, the DSI is the University of Toronto’s hub and incubator for data science research, training and partnerships, unifying research across the University, its affiliated institutes and external partners. 

Congratulations to all the 2023 DSI Catalyst Grant collaborative research teams!

A computational sociolinguistic approach for studying gender inequities in social media interactions

  • Suzanne Stevenson (Department of Computer Science, Faculty of Arts & Science, U of T); Barend Beekhuizen (Department of Language Studies, University of Toronto Mississauga)

A high-throughput data and AI-driven mRNA transfection (HART) platform for immune cell engineering

  • Bowen Li (Leslie Dan Faculty of Pharmacy, U of T); Bo Wang (Department of Laboratory Medicine and Pathobiology, Temerty Faculty of Medicine, U of T)

Accelerating machine learning in healthcare: Solving the labelling bottleneck

  • Project co-funded by T-CAIREM
  • Sebastian Goodfellow (Department of Civil & Mineral Engineering, Faculty of Applied Science & Engineering, U of T); Mjaye Mazwi (Translational Medicine Labs, The Hospital for Sick Children); Anica Bulic (Translational Medicine Labs, The Hospital for Sick Children); Melissa McCradden (Genetics & Genome Biology Labs, The Hospital for Sick Children)

Automating sedation state assessments

  • Project co-funded by T-CAIREM
  • Aaron Conway (Lawrence S. Bloomberg Faculty of Nursing, U of T); Babak Taati (Toronto Rehabilitation Institute, KITE, University Health Network); Sebastian Mafeld (Toronto General Hospital Research Institute, University Health Network)

DynaMELD and DynaCOMP: Using machine learning to revamp pre-and-post transplant care

  • Rahul Krishnan (Department of Computer Science, Faculty of Arts & Science, U of T); Mamatha Bhat (Toronto General Hospital Research Institute, University Health Network)

Inequality in childcare: The case of nannies in Canada

  • Ito Peng (Department of Sociology, Faculty of Arts & Science, U of T); Monica Alexander (Department of Statistical Sciences, Faculty of Arts & Science, U of T)

Machine-learning-assisted screening of metallo-cyanines as light-absorbing and transport layers for organic and perovskite photovoltaics

  • Oleksandr Voznyy (Department of Physical & Environmental Sciences, University of Toronto Scarborough); Timothy Bender (Department of Chemical Engineering and Applied Chemistry, Faculty of Applied Science & Engineering)

Providing data to improve representation by public officials

  • Peter Loewen (Munk School of Global Affairs & Public Policy, Faculty of Arts & Science, U of T); Rohan Alexander (Faculty of Information, U of T); Aya Mitani (Dalla Lana School of Public Health, U of T); Elena Tuzhilina (Department of Statistical Sciences, Faculty of Arts & Science)

Something in the air: Is there an association between exposure to unconventional natural gas development (UNGD) and exacerbations of asthma in northeastern British Columbia?

  • Élyse Caron-Beaudoin (Department of Health & Society, University of Toronto Scarborough); Marianne Hatzopoulou (Department of Civil & Mineral Engineering, Faculty of Applied Science & Engineering, U of T)

Spectroscopy by the millions: A fast, reproducible framework to yield chemical compositions of four million stars

  • Joshua Speagle (Department of Statistical Sciences, Faculty of Arts & Science, U of T); Ting Li (David A. Dunlap Department of Astronomy & Astrophysics, Faculty of Arts & Science, U of T)

The rise of social media and the transformation of influence: Joining foundational sociological theory and data science to rethink influence in social systems

  • Peter Marbach (Department of Computer Science, Faculty of Arts & Science, U of T); Vanina Leschziner (Department of Sociology, Faculty of Arts & Science, U of T); Daniel Silver (Department of Sociology, University of Toronto Scarborough)

Toward reducing harmful contents on social media with pluralistic justifications through community-powered AI

  • Syed Ishtiaque Ahmed (Department of Computer Science, Faculty of Arts & Science, U of T); Shion Guha (Faculty of Information, U of T); Shohini Bhattasali (Department of Language Studies, University of Toronto Scarborough)

Using deep learning and image recognition to develop AI technology to measure child-directed marketing on food and beverage packaging and investigate the relationship between marketing, nutritional quality and price

  • Mary R. L’Abbé (Department of Nutritional Sciences, Temerty Faculty of Medicine, U of T); David Soberman (Joseph L. Rotman School of Management, U of T); Laura Rosella (Dalla Lana School of Public Health, U of T); Steve Mann (Edward S. Rogers Sr. Department of Electrical & Computer Engineering, Faculty of Applied Science & Engineering, U of T)


DSI welcomes Women’s College Hospital as a partner

The Data Sciences Institute (DSI) is excited to announce a new partnership with Women’s College Hospital (WCH).  For more than 100 years WCH has been developing revolutionary advances in healthcare. Today, WCH is a world leader in health equity and Canada’s leading academic ambulatory hospital focused on delivering innovative solutions that address our most pressing issues related to population health, patient experience and health system costs.

“Women’s College Hospital (WCH) is reimagining and redesigning healthcare to enhance access, address inequities, and innovate more readily. To do that, we are leveraging data insights to identify areas for improvement, test new models of care and ultimately improve care for everyone. As a research leader in the field of data science, this collaboration with DSI will enable our teams to further their work, pursue new opportunities, and expand our partnership network,” said Dr. Rulan Parekh, vice president of Academics at Women’s College Hospital.

DSI collaborates with organizations eager to support world-class researchers, educators, and trainees advancing data sciences. We facilitate inclusive research connections, supporting foundational research in data science, as well as supporting the training of a diverse group of highly qualified personnel for their success in interdisciplinary environments. Because WCH is one of our external funding partners, its researchers can apply for research grants, training and networking opportunities at the DSI.   

“We are delighted to announce this partnership with WCH. Our goal is to create a hub to elevate data science research, training, and partnerships. By connecting data science researchers, data and computational platforms, and external partners, the DSI advances research and nurtures the next generation of data- and computationally focused researchers. We are very excited to have WCH researchers join the DSI community.” - Lisa Strug, Director, Data Sciences Institute