At Datapalooza 2017, seven researchers from across UVA Grounds, working in a variety of disciplines, presented how they use data science to make new discoveries and improve the world we live in.

Here, some participants share their experiences from last year.

Visualizing the Holocaust in Eastern Europe

Waitman Beorn (Department of History, UVA College of Arts & Sciences)

I was honored to be asked to present my work on the Lvov Ghetto during Datapalooza 2017. The ghetto project is part of a larger set of digital humanities projects that I am working on with an interdisciplinary team of scholars and undergraduates, in collaboration with the UVA Scholars' Lab.

With the help of the Scholars' Lab, our research team (Taras Nazaruk, Drew MacQueen, Chris Gist, and I) was able to map and then visualize the addresses of 17,000 ghetto inhabitants using GIS. The map shows where individuals lived and where they were used as forced labor, and its filters let us visualize a large amount of demographic information.

Presenting at Datapalooza 2017 was a great opportunity not only to highlight my team's work but also to introduce a digital humanities project to a much broader audience and, hopefully, to show that data science can be applied in historical contexts as well as in more science-specific ones. The format was also valuable because it forced me to boil down the topic, methods, and outcomes into a five-minute presentation for non-specialists. This really helped me consider the biggest and most important questions the project raises.

Watching other presentations also inspired me to consider social network analysis, which has since become part of our team's work. I look forward to future events and perhaps an even greater focus on humanities applications of data and digital methodologies.

Twitter: @waitmanb

Biomedical Data Science Hackathon: Emerging Techniques in Molecular Biology

Organizers: Dr. Nathan Sheffield, Dr. Stefan Bekiranov, Dr. Jason Papin (Biomedical Data Science training grant Principal Investigator), and biomedical engineering grants administrator Margo Jacobson.

Participants: Basel Al-Barghouthi, Derek Bivona, John Lawson, Greg Medlock, Daniel Mietchen, Cassie Robertson, and Jeffrey Xing

For Datapalooza 2017, the Biomedical Data Science Training Grant hosted a hackathon focused on developing analysis methods for an emerging technique in molecular biology. The hackathon gave trainees from diverse backgrounds an opportunity to apply their own skills and knowledge to an open problem in the biomedical sciences. The participants chose to tackle the problem together, drawing on individual strengths and learning from one another.

The goal of the hackathon was to develop new methods for interpreting and handling single-cell RNA sequencing (scRNA-seq) data. RNA molecules are chains of smaller units called nucleotides, synthesized when cellular machinery reads DNA. RNA goes on to serve many functions, the best-studied of which is the translation of RNA sequences into proteins. Because of this relationship between RNA and proteins, measuring the abundance of different RNA sequences can hint at a cell's current activity, which makes it a valuable technique across many areas of biology. Before methods such as scRNA-seq were developed, researchers had to collect a population of cells, mix their contents, and measure the abundance of each unique RNA sequence in the mixture. This yielded only the average abundance of each RNA sequence across all cells.

View their presentation on YouTube, and view the participants' code in the GitHub repository created for the hackathon.

However, many problems in biology involve cell-to-cell variation, which is hidden when cells are mixed prior to sequencing. This is where scRNA-seq comes in: it uses microfluidics to isolate single cells and DNA barcoding to trace each sequencing read back to the cell it came from, so it can measure the abundance of RNA sequences in individual cells. scRNA-seq is still in its infancy, but it has already been applied to study cell-to-cell variation in cancer and the immune system, and the technology is the focus of successful startups such as 10X Genomics, Inc.
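To make the contrast between mixed-cell and single-cell measurements concrete, here is a minimal sketch with made-up numbers (not from the hackathon dataset). It assumes six cells that fall into two subpopulations for one RNA sequence: three silent cells and three highly active ones. The pooled average lands on a value that no individual cell actually has.

```python
# Hypothetical counts of one RNA sequence in six cells (made-up numbers):
# three cells are silent, three are highly active.
single_cell_counts = [0, 0, 0, 10, 10, 10]

# A mixed-cell ("bulk") measurement only sees the pooled average.
bulk_average = sum(single_cell_counts) / len(single_cell_counts)

# Single-cell measurements reveal the distinct states the average hides.
distinct_states = sorted(set(single_cell_counts))

print(bulk_average)     # 5.0
print(distinct_states)  # [0, 10]
```

No cell in this toy example has a count of 5, so the bulk value describes neither subpopulation; that is the kind of variation scRNA-seq is designed to expose.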

Although scRNA-seq is a powerful technology, it has caveats that need to be considered and addressed. One issue, sometimes referred to as dropout, occurs when RNA sequences that are present in a cell go undetected during sequencing. If this technical issue isn't addressed, RNA sequences that randomly drop out might be incorrectly associated with a particular cell type or disease during analysis.
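The toy simulation below illustrates how dropout can make a sequence look absent from cells that actually contain it. It is a simplified sketch, not a model used by the hackathon participants: it assumes each RNA molecule in a cell is captured independently with a fixed, hypothetical probability (`capture_rate`), and the function name `simulate_dropout` and all counts are illustrative.

```python
import random

def simulate_dropout(true_counts, capture_rate, seed=0):
    """Hypothetical toy model: each RNA molecule in a cell is captured
    independently with probability `capture_rate`; the rest are lost."""
    rng = random.Random(seed)
    observed = []
    for count in true_counts:
        # One Bernoulli trial per molecule (equivalent to a binomial draw).
        captured = sum(1 for _ in range(count) if rng.random() < capture_rate)
        observed.append(captured)
    return observed

# Made-up data: all 10 cells truly express the sequence at low levels.
true_counts = [1, 2, 1, 3, 1, 2, 1, 1, 2, 1]
observed = simulate_dropout(true_counts, capture_rate=0.3)

true_zeros = true_counts.count(0)
observed_zeros = observed.count(0)
print(f"cells truly lacking the sequence: {true_zeros}")
print(f"cells that appear to lack it:     {observed_zeros}")
```

Because lowly expressed sequences have only a few molecules to capture, they are the most likely to read as zero, which is why a naive analysis could mistake dropout for a real biological difference between cells.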

Dealing with the sheer volume of data generated by scRNA-seq is also challenging. The hackathon participants used processed scRNA-seq data from a previous publication. The dataset covered 80,000 single cells, with 16,000 unique RNA sequences quantified in each cell, for a total of over 1.2 billion data points. It also included quantification of cell surface markers involved in immune system function; while these allow new scientific questions to be asked, they also increase the dataset's complexity. Datasets will likely grow in size and complexity as scRNA-seq technology matures, becomes more readily available, and is combined in creative ways with other technologies.
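A quick back-of-the-envelope calculation shows where the "over 1.2 billion" figure comes from and what it might mean for memory. The storage numbers rest on assumptions not stated in the original: one 32-bit float per value for dense storage, and, for the sparse estimate, a purely hypothetical 10% of entries being nonzero, stored in a compressed sparse row (CSR) layout with 32-bit values, 32-bit column indices, and 64-bit row pointers.

```python
n_cells = 80_000
n_sequences = 16_000
data_points = n_cells * n_sequences
print(f"{data_points:,}")  # 1,280,000,000

# Assumption: dense matrix with one 32-bit (4-byte) float per value.
dense_bytes = data_points * 4
print(f"dense: {dense_bytes / 1e9:.2f} GB")  # dense: 5.12 GB

# Hypothetical: only 10% of entries are nonzero, stored in CSR layout
# (4-byte value + 4-byte column index per nonzero, 8-byte row pointers).
nonzero_fraction = 0.10
nnz = int(data_points * nonzero_fraction)
csr_bytes = nnz * (4 + 4) + (n_cells + 1) * 8
print(f"sparse (hypothetical): {csr_bytes / 1e9:.2f} GB")  # sparse (hypothetical): 1.02 GB
```

The exact sparse savings depend entirely on how many entries are truly nonzero in a given dataset, so the 10% figure should be read only as an illustration of why storage format matters at this scale.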

The hackathon lasted two days, after which participants prepared a presentation for the research highlights portion of Datapalooza 2017.

--by Greg Medlock, BME PhD student

Present Your Research at Datapalooza 2018