Project 3: From Text to Deep Data

With the Internet having secured an increasingly prominent place in society, the current age is characterized by the ability of individuals to transfer and access information about the world. The textual form in which this information is usually represented is rich and complex. Texts contain information about what is happening in the world, where, when, and who is involved. At the same time, they reflect ongoing debates in our society, stances on particular issues (e.g. abortion, vaccinations), and interpretative frames on events and their causes (e.g. conspiracy theories on 9/11). Textual data always convey the specific perspectives of the author and of quoted sources on the information they contain. Mining information from texts thus implies dealing with these perspectives.

In From Text to Deep Data, we are developing a model that provides a representation of things in the (real or assumed) world and allows us to indicate the perspective of different sources on them. In other words, we aim to provide a framework that can represent what is said about a topic, a person or an event and how this is said in and by various sources, making it possible to place alternative perspectives next to each other. We develop software to detect these perspectives in texts and represent the output according to our formal model.

Our formal model is called GRaSP (Grounded Representation and Source Perspective). GRaSP is an overarching model that provides the means to: (1) represent instances (e.g. events, entities) and propositions in the (real or assumed) world, (2) relate them to mentions in text using the Grounded Annotation Framework, and (3) characterize the relation between mentions of sources and targets by means of perspective-related annotations such as attribution, factuality and sentiment.
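To make the three layers more concrete, here is a minimal Python sketch of how instances, mentions and perspective annotations could be linked. It is an illustration only, not the GRaSP implementation: the class names, field names, the `perspectives_on` helper, the identifiers, the character offsets and the annotation values are all hypothetical and invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Layer 1: something in the (real or assumed) world, e.g. an event."""
    uri: str      # hypothetical identifier, not a real GRaSP URI
    label: str


@dataclass(frozen=True)
class Mention:
    """Layer 2: a span of text that refers to an instance."""
    doc_id: str
    start: int    # character offsets, invented for illustration
    end: int
    refers_to: Instance


@dataclass(frozen=True)
class Perspective:
    """Layer 3: how a source presents the target mention."""
    source: str       # who the statement is attributed to
    target: Mention
    factuality: str   # illustrative values such as "certain" / "uncertain"
    sentiment: str    # illustrative values such as "negative" / "neutral"


def perspectives_on(instance, perspectives):
    """Group the perspective annotations about one instance by source."""
    by_source = {}
    for p in perspectives:
        if p.target.refers_to == instance:
            by_source.setdefault(p.source, []).append(p)
    return by_source


# Invented example data: one event, mentioned in two documents,
# each mention carrying the perspective of a different source.
outbreak = Instance("ex:disneyland-outbreak", "Disneyland measles outbreak")
mention_a = Mention("doc1", 10, 38, outbreak)
mention_b = Mention("doc2", 102, 114, outbreak)
annotations = [
    Perspective("source_A", mention_a, "certain", "negative"),
    Perspective("source_B", mention_b, "uncertain", "neutral"),
]

grouped = perspectives_on(outbreak, annotations)
```

Because both mentions point to the same instance, `grouped` holds the annotations of `source_A` and `source_B` side by side, which is the kind of comparison of alternative perspectives the model is meant to support.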

As a first use case, we use the online debate around vaccinations. More specifically, we are collecting and annotating texts about a large measles outbreak in the US that started in December 2014 at Disneyland, California (referred to as ‘the Disneyland measles outbreak’). Our model and software aim to detect and represent the different perspectives on this event and the broader topic of vaccinations, allowing us to answer questions such as:

  • What are the beliefs and sentiments with respect to vaccinations that exist in a certain community?
  • What are the causes of the measles outbreak (low vaccination rates, vaccinations) according to different groups?
  • Which sources do these groups use to support their claims?
  • Who do they blame (the anti-vaccination movement, the government)?
  • What are the side effects associated with vaccines?
  • What are the motivations for participating in the debate (education, commercial reasons)?

The resulting information can be useful for anyone who is interested in critical thinking and balanced information about a relevant topic. This includes both researchers (think of communication scientists, social psychologists, political scientists, historians) and people working outside academia (think of information professionals, decision makers, journalists, people working in advertising).

Project team:

  • Piek Vossen, Computational Linguistics, Faculty of Humanities, VU University Amsterdam
  • Lora Aroyo, Computer Science, Faculty of Science, VU University Amsterdam
  • Chantal van Son, Computational Linguistics, Faculty of Humanities, VU University Amsterdam