Reading between the lines — School of Data Science

In today’s increasingly globalized society, groups of all kinds cross cultural, religious, and societal boundaries in person and online. While many of these groups are peaceful and the result is the sharing of information and ideas, a significant number have a tendency towards violence, with dire consequences.

There have been more than 83,000 bombings, 18,000 assassinations, and 11,000 kidnappings attributed to terrorists and terrorist organizations since 1970, according to the Global Terrorism Database.

“If our military could understand the potential violent tendencies of a group before interacting with them,” said MSDS student Ben Greenawald, “the loss of life on both sides could potentially be avoided completely.”

“This is one of those projects that truly could save lives.”

When military or humanitarian organizations worldwide interact with other groups, knowing beforehand how inclined a group is towards violence would be immensely helpful, but no system with the required scale, efficiency, and accuracy is in place. MSDS students Gregory Wert, Elaine Liu, and Greenawald, along with faculty advisors Dr. Don Brown and Dr. Mohammad al Boni, are undertaking a capstone research project sponsored by the U.S. Army Research Laboratory to provide that system, with language at its core.

“I come from a computer science and mathematics background and wanted to pursue a capstone project outside of my academic comfort zone,” Greenawald said. “A project like this, dealing with sociology and linguistics, was just the sort of project to push my boundaries as a student.”

Given that communication is central to all human interaction, it is natural to look towards language to understand the capacity for violence of a particular group or set of groups. But analyzing language comes with an intrinsic set of problems. Human communication is so complicated that traditional models either fail to capture the nuance of language or require so much context-specific knowledge about the speech being analyzed that they do not generalize well.

This project aims to tackle the complexities inherent in studying language by using deep learning models to analyze the high-dimensional data without making any assumptions about the underlying linguistic patterns.

“This project has the potential to show that certain data mining methods, such as neural networks, have the capacity to understand the intricacies of human communication regardless of the language,” said Greenawald.

The goal of this project is to develop a data pipeline implementing a neural network architecture that can classify a group's tendency towards violence from text in any language, given enough labeled data. Arabic will be the primary language used to demonstrate this concept since there are so many value groups for whom Arabic is a primary language. Urdu may also be tested should enough data be collected.
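To make the idea of a language-independent pipeline concrete, here is a minimal sketch, not the team's actual system. It swaps the neural network for a much simpler stand-in: hashed character n-gram features fed to a logistic-regression classifier trained by stochastic gradient descent. Character n-grams need no tokenizer or dictionary, so the same code runs on Arabic, Urdu, or English text. Every name here (`featurize`, `train`, `TOY_SAMPLES`, `N_BUCKETS`) is hypothetical, and the toy sentences and their 0/1 labels are neutral placeholders invented for illustration, not project data.

```python
import math
import zlib

N_BUCKETS = 512  # size of the hashed feature space (arbitrary illustrative choice)


def char_ngrams(text, n=3):
    """Yield overlapping character n-grams; works on any Unicode script."""
    padded = f" {text.strip()} "
    for i in range(len(padded) - n + 1):
        yield padded[i:i + n]


def featurize(text, n=3, buckets=N_BUCKETS):
    """Map text to a sparse, L2-normalised bag of hashed character n-grams."""
    counts = {}
    for gram in char_ngrams(text, n):
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        idx = zlib.crc32(gram.encode("utf-8")) % buckets
        counts[idx] = counts.get(idx, 0.0) + 1.0
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {i: v / norm for i, v in counts.items()}


def predict_proba(weights, bias, feats):
    """Logistic-regression probability that the label is 1."""
    z = bias + sum(weights[i] * v for i, v in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=200, lr=0.5, buckets=N_BUCKETS):
    """Fit weights with plain stochastic gradient descent on the log loss."""
    weights = [0.0] * buckets
    bias = 0.0
    data = [(featurize(text), label) for text, label in samples]
    for _ in range(epochs):
        for feats, label in data:
            err = predict_proba(weights, bias, feats) - label
            bias -= lr * err
            for i, v in feats.items():
                weights[i] -= lr * err * v
    return weights, bias


# Placeholder data: label 1 = sentences about weather, label 0 = about food.
# Real work would need a large labeled corpus, as the article notes.
TOY_SAMPLES = [
    ("the weather is nice today", 1),
    ("the sky is clear this morning", 1),
    ("الطقس جميل اليوم", 1),
    ("السماء صافية اليوم", 1),
    ("i like bread and cheese", 0),
    ("the soup is fresh and warm", 0),
    ("أحب الخبز والجبن", 0),
    ("الحساء طازج ولذيذ", 0),
]

if __name__ == "__main__":
    weights, bias = train(TOY_SAMPLES)
    for text, label in TOY_SAMPLES:
        prob = predict_proba(weights, bias, featurize(text))
        print(f"label={label} p(label=1)={prob:.2f} text={text}")
```

Nothing in the featurization step knows which language it is reading, which is the property the project is after. A deep model would replace the linear classifier but could consume text through a similarly language-agnostic front end.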

“It may be possible to show that we as humans signal intent, such as violent tendencies, in a predictable fashion irrespective of language,” Greenawald said.

B. Greenawald, Y. Liu, G. Wert, M. Al Boni, and D. E. Brown. "A Comparison of Language-Dependent and Language-Independent Models for Violence Prediction." 2018 Systems and Information Engineering Design Symposium (SIEDS), Charlottesville, VA, 2018.


Advisors:

Donald Brown

Senior Associate Dean for Research and Quantitative Foundation Distinguished Professor in Data Science

School of Data Science

Mohammad al Boni

Partner:

U.S. Army Research Laboratory
