An estimated 80% of digital data is in unstructured textual form. Textual data is rich and complex. It does not only contain massive numbers of statements; more importantly, it also reflects our perspective on those statements: our emotions, our opinions, our interpersonal relations and the current social debate. Textual data is therefore not only big but also deep, which adds many layers of complexity.
The QuPiD2 program aims to deliver a framework for deep data representation that makes data provenance, quality and perspective explicit in the way such data is described and consumed. This will ultimately help to indicate bias in factual statements. It will also make it possible to track variations over time and thus enhance our understanding of data and its reliability.
QuPiD2 will apply this framework to a variety of textual sources: social media, newspapers, biographies, encyclopaedias, and literary texts such as novels and songs.
QuPiD2's aims are fourfold:
- modeling of quality and perspectives by providing transparency and reliability measures, and allowing reasoning within social and historic contexts;
- machine-crowd empowered processing of textual sources for populating the QuPiD model;
- collection and analysis of quality factors and perspectives through crowd-expert data interpretation;
- demonstrating the value of data perspectives and quality analysis.
Modeling data quality and perspective variation, which is common in the humanities, is useful for various data science paradigms. In their recent History Manifesto, the historians Armitage and Guldi sound the alarm against the "ghost of short-termism": policy makers and scientists base their analyses and decisions on limited data sets that cover incredibly short periods of time. They make a case for longue durée perspectives for policy makers, entrepreneurs and scientists. Data science will become an important instrument for bridging the gap between the humanities and the other sciences, providing long-term and ‘deep’ perspectives.
The QuPiD2 Team
- Lora Aroyo, Computer Science, Faculty of Science, VU University Amsterdam
- Rens Bod, Computational and Digital Humanities, Faculty of Humanities & Faculty of Science, University of Amsterdam
- Inger Leemans, Cultural History, Faculty of Humanities, VU University Amsterdam
- Julia Noordegraaf, Digital Heritage, Faculty of Humanities, University of Amsterdam
- Piek Vossen, Computational Linguistics, Faculty of Humanities, VU University Amsterdam
- Serge ter Braake, Media and Culture, Faculty of Humanities, University of Amsterdam
- Davide Ceolin, Computer Science, Faculty of Science, VU University Amsterdam
- Chantal van Son, Computational Linguistics, Faculty of Humanities, VU University Amsterdam