COVID-19-Related Predictions Using NER on News Headlines

Bidyut B. Hazarika, Urvish Trivedi, Harshita Dahiya, Nishtha Nandwani, Aakriti Gupta
DOI: 10.4018/978-1-7998-3299-7.ch012

Abstract

Newspapers and online news outlets are currently flooded with coverage of the coronavirus disease (COVID-19). The virus has spread across the globe at an alarming rate, and people need to stay up to date on news about the ongoing pandemic. To that end, named entity recognition (NER) is applied to extract important information from news headlines and articles, and the extracted information is then used for further COVID-19-related applications in India. This chapter uses the spaCy library to categorize the tokens extracted from a news headlines database into various predefined tags. Four machine learning models (CRF, LSTM, LightGBM, and AdaBoost) are then applied to perform the tagging. These tags are in turn used to predict different kinds of information about COVID-19. The applications include finding nearby hospitals and pharmacies, predicting potential future hotspots in India, identifying the worst-affected states of India, making gender-based and age group-based comparisons, and analyzing the area-based spread of the virus.

Introduction

Today the world has become so technologically advanced that the Internet is the central point for activities occurring around the globe, and it plays a very important role in people's lives (Papacharissi & Rubin, 2000). The Internet holds an enormous amount of data, which may be HTML pages, images, text, audio, video, etc., and this information grows every day. Maintaining, updating, and storing all of this data is therefore a huge task. Many technologies are available today for carrying out these processes (Chang, Kayed, Girgis & Shaalan, 2006). We can retrieve important information, summarize a body of information, identify its topic, and perform many other operations on the data available on the Internet (Soderland, 1997).

Thus, we can use various Machine Learning and Natural Language Processing (NLP) techniques to manipulate the data as required. One of the main applications of NLP is Named Entity Recognition (NER) (Marrero, Urbano, Sanchez-Cuadrado, Morato & Gomez-Berbis, 2013; Mayfield, McNamee & Piatko, 2013; Lin & Hung, 2007). Any real-world object that has a particular meaning attached to it is known as an entity. Entities that tend to be more important, informative, and useful than others are known as named entities.

NER is therefore the process of identifying important and useful entities and recognizing them so that they can be used in further applications (“Named Entity Recognition: Applications and Use Cases”, 2021). These entities can be names of persons, organizations, locations, cities, states, numeric values, languages, etc. Newer NER systems can classify named entities into a wide range of categories and tags. This process of classifying entities into specific categories is known as tagging (Tkachenko & Simanovsky, 2012).
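The idea of tagging can be illustrated with a minimal sketch. It is not the chapter's spaCy pipeline or any of its trained models: it swaps in a plain dictionary lookup (a gazetteer) so that the mapping from tokens to tags is easy to follow. The gazetteer entries, the label names (ORG, GPE, CARDINAL, and O for "not an entity"), the function names, and the sample headline are all assumptions made up for illustration.

```python
import re

# Hypothetical gazetteer: a tiny hand-made lookup table, not the chapter's data.
# Keys are matched case-sensitively so that "WHO" is not confused with "who".
GAZETTEER = {
    "WHO": "ORG",
    "India": "GPE",
    "China": "GPE",
    "Wuhan": "GPE",
}

# A token is either a run of letters or a number such as 562, 11,393 or 21.5.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+|\d+(?:,\d+)*(?:\.\d+)?")


def tag_headline(headline):
    """Return (token, tag) pairs; tokens that are not entities get the tag 'O'."""
    tagged = []
    for token in TOKEN_PATTERN.findall(headline):
        if token in GAZETTEER:
            tagged.append((token, GAZETTEER[token]))
        elif token[0].isdigit():
            tagged.append((token, "CARDINAL"))
        else:
            tagged.append((token, "O"))
    return tagged


def named_entities(headline):
    """Keep only the tokens that received an entity tag."""
    return [(token, tag) for token, tag in tag_headline(headline) if tag != "O"]


if __name__ == "__main__":
    # Made-up headline, used only to exercise the functions.
    print(named_entities("WHO reports 562 new cases in India"))
```

A lookup table like this cannot handle unseen names or ambiguous words, which is why statistical taggers such as the models named in the abstract are used in practice.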

COVID-19, the disease behind the ongoing pandemic, is an infectious respiratory disease that originated in the city of Wuhan, China, in December 2019 and has since spread to almost all countries of the world (Singhal, 2020). The WHO (World Health Organization), which has declared the outbreak a pandemic, regularly issues guidelines and preventive measures. The disease generally has mild symptoms and spreads through the air and through contact. The virus has spread very quickly and has brought the lives of people around the globe to a standstill.

Cases in India have crossed the 2-lakh (200,000) mark, while more than 6.4 million cases have been recorded worldwide (“India Coronavirus: 11,727,733 Cases and 160,437 Deaths - Worldometer”, 2021). A nationwide lockdown was imposed in India, which limited the spread of the coronavirus to some extent compared with other countries. Table 1 summarizes the spread in India across the lockdown periods.

Table 1. Spread of coronavirus in India based on lockdown period

Lockdown Start Date    Total Cases    Deaths    Growth Rate (%)
Part 1 (Mar 24)        562            11        21.5
Part 2 (Apr 14)        11,393         392       11.4
Part 3 (May 3)         42,456         1,390     6.2
Part 4 (May 17)        95,679         3,023     5.2
