Making Data Right: Embedding Ethics and Data Management in Data Science Instruction

Wanda Marsolek (University of Minnesota Libraries, USA), Katie Barrick (University of Minnesota Libraries, USA), Alicia Kubas (University of Minnesota Libraries, USA), Jenny McBurney (University of Minnesota Libraries, USA), and Alicia Hofelich Mohr (University of Minnesota LATIS, USA)
DOI: 10.4018/978-1-7998-9702-6.ch010
Learning how to wield data ethically and responsibly is a critical skill for data scientists, but one that is often missing from traditional curricula. Libraries have a long history of teaching data stewardship and sharing, and, in collaboration with campus research support units, are well placed to introduce data ethics to students engaged in data science. This chapter presents four case studies on how the University of Minnesota Libraries and its partners have deeply integrated ethics into data management instruction. The case studies cover ethics in general data management instruction for undergraduate and graduate students from various disciplines, de-identification of human subjects data, qualitative methods and data sharing, and biodiversity location data. Together, they show that libraries and their partners are a natural fit to advance data science curricula when it comes to managing data and the many ethical considerations that accompany this work.
Introduction And Background

Ethics is a critical part of data science education, because analytical and algorithmic decisions can have significant consequences, both helpful and harmful. This chapter describes approaches to embedding data ethics within data management education and presents four case studies that demonstrate how academic libraries can successfully collaborate with partners on campus to incorporate these topics into data science instruction.

Interest and enrollment in data science programs in the United States have grown rapidly over the last decade (Tang & Sae-Lim, 2016), driven by rising employer demand for data-specific technical and analytical skills. Many campuses that offer data science courses or programs draw on faculty from several departments, such as math, statistics, computer science, and business. This is true at the authors’ own institution, which offers a Data Science program in the College of Science and Engineering but draws on faculty from departments across the entire university. Similarly, the “doing” of data science is not restricted to data science programs. According to Schutt and O’Neil (2013), a data scientist is anyone “who works with large amounts of data, and must grapple with computational problems posed by the structure, size, messiness, and the complexity and nature of the data” (p. 15). The work of data science is interdisciplinary and happens across departments and programs, as datasets, and the methods used to make sense of them, grow larger and more complex. As supporters of research across campus, academic libraries and research support offices are well positioned to reach and educate developing data scientists wherever they are. Many staff in these offices already perform a range of duties that can be categorized as data science support: managing research data repositories, leading workshops and classroom lectures, and providing research consultations.

Robinson and Nolis (2020) define data science as the “practice of using data to try to understand and solve real-world problems” (ch. 1.1 para. 1). To solve real-world problems with integrity, it is essential to recognize and heed ethical considerations about the collection, use, and sharing of data. Together with the organization of data throughout the research lifecycle, these practices make up what the authors refer to as data management. Data management is the decision-making process for how research data will be collected, organized, preserved, and shared. It is crucial that data scientists, and anyone doing data science work, not only understand what ethical issues might arise during the research process but also know how to prevent them. For example, when data science teams include representatives from the participant community in data collection, data analysis, and data preservation (and data destruction), the team can mitigate risks it might not have anticipated on its own, and the community gains agency in this work (Research Data Alliance International Indigenous Data Sovereignty Interest Group, 2019).

Key Terms in this Chapter

Data Steward: Anyone working with data who uses standards to maintain data quality, access, and documentation.

Data Management: An ongoing process that involves decisions about collecting, organizing, preserving, and sharing data to make data findable and accessible throughout the research lifecycle.

Reusability: When data are sufficiently documented, findable, and in a format that allows reuse in new or different contexts.

Research Lifecycle: The stages of data within the research process from start to finish: plan, acquire, process, analyze, preserve, share results, and reuse.

Reproducibility: The ability of someone else to independently obtain the same results as the original research using the same data and tools.

Dataset: A collection of data files for a specific research project.

Data Ethics: Responsible use of data at all stages of the research lifecycle.

Reuse: The analysis or application of data by someone other than the original data producer or collector.