Tapering Malicious Language for Identifying Fake Web Content

Shyamala Devi N., Sharmila K.
DOI: 10.4018/978-1-6684-6444-1.ch011

Abstract

Recent occurrences, the pandemic, and the global crisis entail the extensive use of web portals to unfurl information. While this has built the cognizance of the common man, the largely unnoticed proliferation of malicious content on the web has escalated copiously. Spurious data and fake information have done more harm than is actually unraveled to the public; however, taking scrupulously meticulous measures to trace their source and delve into mitigating such data has become quite a challenge. This chapter delves into a step-wise analysis of identifying the hoax through systematically programmed algorithms using natural language processing.

Introduction

The proposed methodology for identifying malicious fake web content comprises text mining from the malicious website. The tools and techniques of natural language processing and deep learning are applied in the methodology to detect the malicious web content, and the framework is represented in Figure 1 below.

Figure 1. Framework to classify the malicious fake web content

Text analysis and the process of detecting fake content require an elaborate approach due to the intricacies involved. The initial step is to obtain the data to be analyzed. This dataset holds commingled authentic and spurious content for further processing, and it necessitates a web scraping method for extracting the appropriate data from the gargantuan web content that is available. In the next phase, the scraped text, which is a single large cluster of text, is separated into individual tokens. This is implemented using lexical parsing, which scans the text clusters and transforms them into sequences of tokens. The next step, stop-word removal, discards words that carry little meaning so that the spurious content stands out. Now that the content holds adequate correlation, a certain amount of pre-processing through normalization is applied to scale the data. Normalization is carried out through stemming and lemmatization, which help in obtaining relevant content for Part-of-Speech (POS) tagging. The subsequent step is word vectorization, which constructs an environment ready for effective detection of anomalies. The final step, identifying and extracting the malicious content, is executed through BERT (Bidirectional Encoder Representations from Transformers) in order to capture the associative correlation between words and render augmented precision in detecting the web hoax that is disseminated. The relationships between the words of a text thereby evince the malicious web content through this series of processing steps simulated using Python programming, as sketched below.
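To make the pipeline concrete, the following is a minimal sketch of the preprocessing stages (lexical parsing into tokens, stop-word removal, lemmatization, POS tagging, and word vectorization), assuming the NLTK and scikit-learn libraries are installed; the two example texts are invented purely for illustration and are not from the chapter's dataset.

import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer

# One-time downloads of the NLTK resources used below
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")
nltk.download("averaged_perceptron_tagger")

# Illustrative corpus: one spurious-looking text and one authentic-looking text
docs = [
    "Breaking: miracle cure discovered, click here before it is banned!",
    "The health ministry released its quarterly epidemiology report today.",
]

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

cleaned = []
for doc in docs:
    tokens = word_tokenize(doc.lower())                   # lexical parsing into tokens
    tokens = [t for t in tokens if t.isalpha()]           # drop punctuation and digits
    tokens = [t for t in tokens if t not in stop_words]   # stop-word removal
    tokens = [lemmatizer.lemmatize(t) for t in tokens]    # normalization
    print(nltk.pos_tag(tokens))                           # Part-of-Speech tagging
    cleaned.append(" ".join(tokens))

# Word vectorization (TF-IDF is used here as one common choice)
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(cleaned)
print(features.shape)

For the final BERT stage, a hedged sketch using the Hugging Face transformers library follows; the model name bert-base-uncased is a generic pretrained placeholder, and in practice a model fine-tuned on a labelled authentic-versus-fake corpus would be substituted.

from transformers import pipeline

# "bert-base-uncased" is used here only as a placeholder; a model
# fine-tuned for fake-content detection is assumed in practice.
classifier = pipeline("text-classification", model="bert-base-uncased")
print(classifier("Breaking: miracle cure discovered, click here before it is banned!"))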


Platform Used for Identifying the Fake Web Content

Spyder (Python 3.6)

Spyder, the Scientific Python Development Environment, is an open-source integrated development environment (IDE) that is included with Anaconda. It incorporates editing, interactive testing, debugging, and introspection features. After you have installed Anaconda, start Spyder on Windows, macOS, or Linux by running the spyder command. Spyder is also pre-installed in Anaconda Navigator, which is included with Anaconda. On the Navigator Home tab, click the Spyder icon.

Jupyter Notebook

The Jupyter Notebook is an open-source web application that you can use to create and share documents that contain live code, equations, visualizations, and text. Jupyter Notebook is maintained by the people at Project Jupyter.

Jupyter Notebooks are a spin-off from the IPython project, which used to have an IPython Notebook project itself. The name Jupyter comes from the core programming languages that it supports: Julia, Python, and R. Jupyter ships with the IPython kernel, which allows you to write your programs in Python.


Website Scraping

Web scraping is a process of collecting data from a website using an application programming interface. For the extraction process, Python code is written to query a web server and request the data from the web page, extracting the data that is needed; a short scraping sketch follows the installation steps below.

Initial steps to install the Python Beautiful Soup library for scraping websites

Beautiful Soup is a Python library for pulling data out of HTML and XML documents. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.

  • 1.

    Building the web scraper

    • Installing Beautiful Soup

      • Beautiful Soup is a Python library used for web scraping.

      • The basic method for installing on the Linux platform:

      • $ sudo apt-get install python-bs4

      • For macOS, install pip first and then the library:

      • $ sudo easy_install pip

      • $ pip install beautifulsoup4
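Once Beautiful Soup is installed, a minimal scraping sketch might look as follows; the URL, the use of the requests package, and the choice to collect paragraph text are illustrative assumptions, not the chapter's prescribed code.

import requests
from bs4 import BeautifulSoup

url = "https://example.com/news-article"  # placeholder URL for illustration
response = requests.get(url, timeout=10)  # query the web server for the page
response.raise_for_status()               # stop early on an HTTP error

soup = BeautifulSoup(response.text, "html.parser")

# Extract the visible text of every paragraph for the later NLP stages
paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
print("\n".join(paragraphs))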
