Research Challenges

Massive data acquisition is transforming many areas of society and has the potential to transform many more. This is certainly true within science. In astronomy, parallel improvements in automation, telescope size and capability, and sensor technology have increased both the number of available datasets and their size by several orders of magnitude. Where astronomers once labored for many years to acquire spectra for hundreds of galaxies, the Sloan Digital Sky Survey has returned spectra for millions of galaxies in 3 years, enough to map out 1% of the visible universe. Extragalactic astronomy has gone from a case-by-case study of individual objects to a statistical science.

Similar advances in genomic and biological sensing technologies have fundamentally altered biology and biomedical research. We now have massive datasets characterizing the entire component list of cells and tissues, as well as complete maps of genomes, available in publicly accessible repositories.

Remote sensors can provide detailed real-time information about an environment, which can be fed into ever more complex models. For example, Chesapeake Bay watershed models include 34k land areas, 1,069 river segments, and 4 million household agents.

Again and again, disciplines that were once data-poor now face a “data tsunami” that challenges both their traditional research methodologies and their ability to cope. The Data Science Institute conducts research to address these and many other challenges in developing and using data science as a transformative tool.