Introduction
Approximately 80% of all data today exists in digital form as unstructured text (e.g., news, e-mails, social media feeds, contracts, memos, clinical notes, and legal briefs) (Raghavan et al., 2004). The Internet is the premier digital platform for online news content and unstructured text (Chung, 2008). The ability to unlock hidden structure and latent meanings in unstructured text is an important area of research because text is a fundamental device of communication and human interaction for expressing real-world issues.
“The formation and transmittal of group standards, values, attitudes, and skills are accomplished largely by means of verbal communication” (Cartwright, 1953). As a consequence, efforts to understand social interaction, cooperation and influence require the study of text. Given the widespread use of unstructured text for individual and mass communication, such as email, online news and social media feeds, it is particularly important to understand how social influence and information about complex socio-environmental issues spread through online content.
Framing is the use of strategic devices to present salient aspects and perspectives of an issue, including particular keywords and stereotyped images and sentences, in order to convey latent meanings about that issue (Entman, 1993). The framing of news stories can shape public interpretation of social and environmental news (Druckman & Bolsen, 2002; Scheufele & Tewksbury, 2007; Tewksbury & Scheufele, 2009). Public opinion, attitudes, beliefs, and behaviors can be influenced by how an issue is framed, particularly when framing comes from elites (Druckman & Bolsen, 2002).
The causal effect of media communication, specifically the way the words and frames used by the media shape public perceptions of social and environmental issues, has been studied extensively. Traditionally, however, it has been examined using content analysis, which was developed specifically to aid in the interpretation of social discourse or text for communication research (Holsti, 1969; Krippendorff, 1980). Content analysis involves methodical evaluation and categorization of text (Riffe et al., 2005). Current approaches to content analysis require scholars and researchers to examine documents thoroughly in search of patterns in the text. Because this approach depends on human readers, it is difficult to apply to large-scale unstructured text. Thus, a more effective tool for unlocking latent meanings found in unstructured text could enhance our understanding of online behaviors, responses to online advertising, and media influence on public perceptions.
Text mining, also known as text data mining (Hearst, 1997), is a multidisciplinary field involving information retrieval, text analysis, information extraction, clustering, categorization, visualization, database technology, and machine learning. Text mining, coupled with an interest in understanding social influence and information diffusion for document summarization (i.e., topic modeling, sentiment analysis, and opinion mining), is an active area of research (Bindela et al., 2015; Kempe et al., 2003). When combined with machine learning algorithms, techniques, and methodologies, text mining adds value to data integration tasks: it highlights the similarities between heterogeneous sources and text features, which reduces uncertainty and risk exposure when performing those tasks.
The widespread availability of large data repositories, such as collections of online news articles, creates an opportunity to develop new methods of text mining. These methods use machine learning algorithms and mathematical techniques to select, organize, and evaluate large quantities of unstructured text. While the use of frames and their effect on public attitudes in traditional news venues has been widely explored, the identification and analysis of frames in unstructured online news has received minimal attention.