Exploring Data Science Initiatives Through an International Lens

Nandita S. Mani, Emily P. Jones, Rebecca Carlson, Fidan Limani, Atif Latif, Klaus Tochtermann, Faten Hamad, Christine J. Urquhart, Victoria Lemieux, Sarah Ames, Jenna Bain, Justin M. Clark
DOI: 10.4018/978-1-7998-9702-6.ch001

Abstract

The application of data science tools, techniques, and principles has increased within the field of library and information science. This trend is especially noticeable in academic libraries, where strategic priorities around data science initiatives have been created to further support and add value to the research enterprises at their institutions. This chapter highlights a global outlook on how data science has been addressed in library and information studies through case studies in areas including digital humanities, machine learning, and visual analytics. Increasing awareness of how partners across the globe are addressing data science needs at their institutions can help raise the visibility of how data science can be infused and utilized within a variety of contexts.

Introduction

This chapter provides a global overview of current work in data science (DS) by library and information science (LIS) stakeholders. It includes case studies from the State Library of New South Wales in Australia, Leibniz Information Centre for Economics in Germany, National Library of Scotland, Institute of Evidence-Based Healthcare at Bond University in Australia, and the University of Jordan. These examples of DS work in LIS settings are contextualized within a review of the growth of this field, a discussion of the challenges of global DS projects, and recommendations for increasing information on and access to DS projects around the world. The chapter objectives are to enable discussion on the current state of DS in LIS, identify potential next steps in furthering awareness of DS activities, and provide librarians and information professionals with examples of how others are applying DS in various environments for diverse aims.

Key Terms in this Chapter

Natural Language Processing (NLP): A set of methods (drawing on computational linguistics, statistics, machine learning, and deep learning) through which computers are able to process and interpret written and spoken language in a manner similar to the way a human would.

Optical Character Recognition (OCR): The process of converting static images of text into electronic text that can then be searched, indexed, or otherwise used. The process is usually initiated with human input in the form of training data; over time, machine learning algorithms "learn" to process text more accurately and can function with minimal human input.

Jupyter Notebook: An open-source web application that lets one write and interact with plain text, live code, images, and charts.

Lemming (or Lemmatization): A text normalization technique within the field of Natural Language Processing (NLP) that considers the context in which a word is being used to help define it and link it to a broader list of related terms. Lemmatization is typically seen as a step beyond ‘stemming’ techniques: it attempts to identify the dictionary ‘root’ (lemma) of a word to help contextualize it, rather than just stripping its inflections. For example, the lemma of the word ‘was’ is ‘be’.
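
The idea behind lemmatization can be illustrated with a minimal sketch. It assumes a tiny hand-written lookup table (IRREGULAR_LEMMAS, a hypothetical name with made-up entries) in place of the full lexicon a real lemmatizer would use, and it uses the part of speech as a simple stand-in for "context". It is not the method used in any of the chapter's case studies.

```python
# Toy lemmatizer: a hand-written table stands in for a real lexicon.
# The table name and its entries are hypothetical, for illustration only.
IRREGULAR_LEMMAS = {
    ("was", "VERB"): "be",
    ("is", "VERB"): "be",
    ("were", "VERB"): "be",
    ("better", "ADJ"): "good",
}


def toy_lemmatize(word: str, pos: str) -> str:
    """Return the lemma for (word, part of speech), or the lowercased word."""
    key = (word.lower(), pos)
    return IRREGULAR_LEMMAS.get(key, word.lower())


if __name__ == "__main__":
    for word, pos in [("was", "VERB"), ("Were", "VERB"), ("better", "ADJ")]:
        print(word, "->", toy_lemmatize(word, pos))
```

Keying the table on both the word and its part of speech is what separates this from plain suffix stripping: the same surface form can map to different lemmas depending on how it is used.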

Knowledge Graph: A structured (even semantic) definition of resources and the links between them, resulting in a graph. Knowledge graphs represent knowledge about a domain of interest, including both the factual data and the vocabularies used to describe them.

Handwritten Text Recognition (HTR): Software that uses artificial intelligence to create transcriptions of handwritten documents.

Stemming: A text normalization technique within the field of Natural Language Processing (NLP) that considers the possible prefixes and suffixes of a specific word to identify other words related to it. For example, the word ‘walk’ could be linked to ‘walks’, ‘walking’, ‘walker’, etc.
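
The ‘walk’ example can be sketched in a few lines of code. This is a deliberately naive suffix stripper, assuming a short, hypothetical suffix list (SUFFIXES); it is not the Porter stemmer or any other published algorithm, and it handles suffixes only.

```python
# Toy suffix-stripping stemmer (illustrative only; not a published algorithm).
SUFFIXES = ("ing", "er", "ed", "s")  # hypothetical suffix list


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


if __name__ == "__main__":
    for word in ["walk", "walks", "walking", "walker"]:
        print(word, "->", toy_stem(word))
```

All four words reduce to the same stem, which is how a search index can treat them as related. A rule-based approach like this has no notion of meaning, so it will also strip letters from words where they are not a true suffix.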

Resource Description Framework (RDF): A knowledge representation framework whose model uses relatively simple statements of the form subject-predicate-object. The model not only captures the “resources” of a situation, but also allows the specification of how these resources link to each other. Multiple such statements can be made to represent a certain situation, thus forming an RDF graph.
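
The subject-predicate-object model can be sketched with plain Python tuples. The resource names below (the "ex:" identifiers) and the helper objects_of are hypothetical and exist only for illustration; real RDF uses IRIs and is usually handled with a dedicated library, which this sketch does not attempt to reproduce.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical statements; together they form a small graph.
TRIPLES: List[Triple] = [
    ("ex:Library", "ex:holds", "ex:Manuscript"),
    ("ex:Library", "ex:holds", "ex:Newspaper"),
    ("ex:Manuscript", "ex:transcribedBy", "ex:HTRModel"),
]


def objects_of(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


if __name__ == "__main__":
    print(objects_of(TRIPLES, "ex:Library", "ex:holds"))
    print(objects_of(TRIPLES, "ex:Manuscript", "ex:transcribedBy"))
```

Because "ex:Manuscript" appears as the object of one statement and the subject of another, the statements chain together, which is what turns a list of triples into a graph.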
