We are in the midst of a data revolution. New technologies in every facet of academic and daily life are producing increasingly large-scale, complex data: sequencers are generating the entire 3 billion base pairs of the human genetic code across populations to study complex causes of disease; sophisticated telescopes are collecting and processing terabytes of data per second; and the internet is a rich, complex source from which we can learn about social interaction and opinion. These data have the potential to inform some of the most pressing societal questions in completely new ways. With increasingly large and complex data comes the challenge of collecting, manipulating, storing, visualizing, and extracting information from the data in reproducible, fair, and ethical ways. This is the definition of data science.
Data scientists include our world-class scholars in the foundational fields of computer, information, and statistical sciences, mathematics, and engineering. Entirely new fields of data science are also forming at the intersection of foundational and applied fields, such as astroinformatics, genomic data science, and computational social science. Scholars in these areas are asking questions that are pushing the frontiers of their fields forward.
The Data Science Initiative (DSI) creates a cross-divisional community for these researchers and offers the programming and support needed to grow in these areas of research.
The DSI facilitates research connections through programming and events; supports foundational research in data science that arises in the context of application through Collaborative Research Teams (CRTs); provides support for data access and computation; and supports the co-training of an inclusive group of highly qualified personnel to find success in interdisciplinary environments.
The DSI will also run ongoing Thematic Programs, spanning both methodologies and applications, to support focussed efforts that enable the next data science breakthrough. Initial Thematic Programs will be in Inequity and Reproducibility, encouraging innovative methodology development and application that will address questions such as: How can genomic analyses incorporate multi-ethnic populations to minimize disease risk for all? How can we foster trust in data-informed research? The DSI will emphasize fair and ethical tools and reproducible and inclusive scientific practices, which in turn will be part of a broader culture of equity, diversity, and inclusion from inception.
The DSI provides the leadership and capacity to catalyze the transformative nature of data sciences in disciplines, in fair and ethical ways, leveraging and strengthening U of T’s pre-eminence in data sciences to solve society’s complex and pressing problems. The downstream impacts of the DSI will be significant: increasingly competitive, cross-disciplinary funding; increased research output with co-authors spanning academic divisions; and a broad spectrum of external partnerships that can advance data sciences. DSI researchers will advance research frontiers across a broad spectrum of foundational and applied fields.
I look forward to working with colleagues, developing the DSI, and receiving input.
I encourage my colleagues from across disciplines working in/with data science to affiliate with the DSI and participate in our many upcoming events, training, and funding opportunities.