
What is non-consumptive data and what can you do with it?

Date
Tuesday, February 14, 2023 12:30 PM EST
Description
Books, journal articles, blog posts, tweets, and more were written to be read by humans, a “consumptive” use. Turning those texts into features that allow computational analysis without sharing the full text (which copyright or licensing restrictions may not permit) creates “non-consumptive” data. These extracted features can be as simple as word counts or as complex as the multidimensional vectors used by deep learning models. New learners of text analysis need data formats and feature types that are easy to use and understand, while advanced practitioners also need rich metadata describing the structure and provenance of multiple kinds of extracted features. Non-consumptive formats must also accommodate the often-intensive storage, compute, and transmission demands of text corpora ranging from thousands to millions of documents. In this session, we’ll describe the features currently provided by ITHAKA’s Constellate platform and the HathiTrust Research Center, some of the formats in which non-consumptive data is delivered to researchers, and current challenges for both the producers of this data and the practitioners who consume it. We’ll then open the floor for a discussion of future standards for non-consumptive data.
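
As a rough illustration of the simplest case mentioned above, word counts, the following Python sketch reduces a passage to token frequencies. The function name `extract_word_counts`, the tokenization rule, and the sample text are assumptions made for this example; this is not the format or pipeline used by Constellate or the HathiTrust Research Center.

```python
import re
from collections import Counter


def extract_word_counts(text):
    """Return a bag-of-words Counter mapping lowercase tokens to frequencies.

    Hypothetical helper for illustration only. Word order is discarded,
    so the original passage cannot be read back from the result.
    """
    # Assumed tokenization: runs of letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


# Made-up sample text standing in for one page of a document.
page = "The cat sat on the mat. The mat was warm."
features = extract_word_counts(page)
print(features.most_common(2))  # → [('the', 3), ('mat', 2)]
```

Only the counts would be shared with a researcher in this sketch; the full text stays with the data provider.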