Applying Machine Learning to Online Data?: Beware! Computational Social Science Requires Care

Ulya Bayram (Çanakkale Onsekiz Mart University, Turkey)

Source Title: Opportunities and Challenges for Computational Social Science Methods

DOI: 10.4018/978-1-7998-8553-5.ch005

Abstract

The immense impact of social media on contemporary cultural evolution is undeniable, which makes these platforms an essential data source for computational social science studies. Alongside advances in natural language processing and machine learning, computational social science researchers continuously adapt new techniques to data collected from social media. Although these developments are imperative for studying sociological transformations in many communities, they bring some inconspicuous problems. This chapter addresses issues that may arise from the use of social media data, such as biased models. It also discusses various obstacles associated with machine learning methods and offers possible solutions and recommendations for overcoming them from an interdisciplinary perspective. In the long term, this chapter aims to guide computational social science researchers in their future studies, from pitfalls in data collection to assembling an accurate experimental design.

Chapter Preview

Introduction

In the current digital age, the availability of online data is growing exponentially. Consequently, interest is rising in computational social science (CSS) research, which spans many disciplines (social psychology, anthropology, economics, political science, and sociology) and various levels of analysis (Oboler, Welsh, & Cruz, 2012). Studies conducted on past and recent Congressional records of many countries, including the United States (Bayram, Pestian, Santel, & Minai, 2019; Diermeier, Godbout, Yu, & Kaufmann, 2012; Gentzkow, Shapiro, & Taddy, 2016; Iyyer, Enns, Boyd-Graber, & Resnik, 2014; Jensen et al., 2012; Lauderdale & Herzog, 2016; Thomas, Pang, & Lee, 2006; Yu, Kaufmann, & Diermeier, 2008), political records from the British Parliament (Peterson & Spirling, 2018), and the Irish Dáil debates (Lauderdale & Herzog, 2016) are among the plethora of studies that became possible thanks to the online availability of these collections. Similarly, old and new newspaper articles (Burley et al., 2020; Neresini, 2017), speeches of political figures, including those from past centuries (Jackson, Watts, List, Drabble, & Lindquist, 2021; Savoy, 2010), digital books (Brooke, Hammond, & Hirst, 2015), and many other online textual data sources have been a main focus of CSS research. While all these online data sources contribute richly to the broad range of CSS research areas, one domain requires special attention: social media data.

Data collected from social media platforms open up many possibilities, such as answering serious social science questions and finding insights into both individual-level and anthropological phenomena (Harford, 2014; Lazer et al., 2009; Olteanu, Castillo, Diaz, & Kiciman, 2019). There is also a growing consensus that data collected from these domains can provide more than simple observations (Olteanu et al., 2019); social media domains are among the most valuable data sources for CSS research areas. The widespread public use of social media, the absence of data use restrictions, the simplicity of data acquisition through application programming interfaces (APIs), and the valuable content of the data make these platforms attractive to researchers. Social media and social network platforms (e.g., Twitter, Facebook, Reddit, Wikipedia, and other forums) can readily capture the evolution of sociological norms and rapid cultural and global changes. For example, recent movements such as “Me Too” and “Black Lives Matter” could not have expanded globally without these platforms. This fact makes these platforms one of the principal sources of information for CSS research on such events. Recent studies utilize social media data to evaluate the effects of crises like the mass killings in the United States (Burley et al., 2020).

Key Terms in this Chapter

Noise: In the context of machine learning, noise corresponds to the type of data or features that do not contain meaningful patterns related to the problem of interest and have a possibility of disrupting and harming the learning process.

Underfitting: Corresponds to the event when a machine learning model does not properly learn the patterns present within the training set, for reasons such as incorrect parameter selection or, in the case of neural networks, too few training epochs. A machine learning model that suffers from underfitting would fail to return acceptable results in both the within-dataset experiments and the generalization experiments.

Overfitting: Corresponds to the event when a machine learning model memorizes the training set data instead of learning the patterns present within it for accurate generalization. When a model overfits data, the model can return high within-dataset performance while it fails to generalize to other data.

Bias: Prejudice towards or against a person, a group, or a class. In a machine learning context, there are various types of biases. Each bias can affect a model differently.

Language Generation: The process of automatically generating text with artificial intelligence models. The goal of this process is to auto-generate texts that appear to be created by humans.

Tuning: The set of operations within machine learning algorithms that improve the performance of classification and prediction. Some tuning procedures happen during the learning process, while it is also possible to tune a model after the initial run of the training process.

Interdisciplinary Research: A type of research that employs knowledge, data, techniques, and theories from multiple disciplines. The main goal is to analyze or solve a specific problem of interest using the strengths of these different disciplines.

Machine Learning: A subfield of artificial intelligence where models can learn patterns from the data in a supervised or unsupervised fashion and tune themselves during the learning process.

Deep Learning: A subfield of machine learning that works with artificial neural networks containing many hidden layers and complex structures.
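
The definitions of noise, underfitting, overfitting, and tuning above can be illustrated with a small experiment. The sketch below is not taken from the chapter: it assumes a synthetic regression task (a sine curve plus Gaussian noise) and uses a hand-written k-nearest-neighbor regressor as a stand-in model. All names (`make_data`, `knn_predict`, `mse`) and the chosen values of `k` are hypothetical and serve only as an illustration.

```python
import math
import random


def make_data(n, rng, noise_sd=0.5):
    """Draw n points from y = sin(x) + Gaussian noise, with x uniform on [0, 2*pi]."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def knn_predict(train_x, train_y, x, k):
    """Predict by averaging the targets of the k training points closest to x."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(train_x, train_y, xs, ys, k):
    """Mean squared error of the k-NN regressor on the points (xs, ys)."""
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_data(100, rng)
valid_x, valid_y = make_data(500, rng)  # held-out data the model never trains on

# k = 1 memorizes the training set (overfitting), k = 100 always predicts
# the global mean of the training targets (underfitting), k = 9 sits in between.
results = {}
for k in (1, 9, 100):
    train_error = mse(train_x, train_y, train_x, train_y, k)
    valid_error = mse(train_x, train_y, valid_x, valid_y, k)
    results[k] = (train_error, valid_error)
    print(f"k={k:3d}  train MSE={train_error:.3f}  held-out MSE={valid_error:.3f}")

# Tuning in its simplest form: keep the setting with the lowest held-out error.
best_k = min(results, key=lambda k: results[k][1])
print("selected k:", best_k)
```

With `k = 1` the model reproduces every training target, noise included, so its within-dataset error is zero while its held-out error is inflated. With `k = 100` it ignores the pattern entirely and performs poorly on both sets. Selecting `k` on held-out data rather than on the training set is what keeps the memorizing model from looking like the best one.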
