Cancer Prediction Using Graph Database

Ansh Gulati, Ameya Taneja, Saurabh Rawat, Anushree Sah
Copyright: © 2024 | Pages: 11
DOI: 10.4018/979-8-3693-5271-7.ch005

Abstract

This research chapter provides a comprehensive overview of cancer cases and rates across the states of the United States. It explores trends and patterns in cancer incidence and mortality in the country, examines how they vary with factors such as age, sex/gender, and cancer type (lung or breast cancer), and considers the factors that contribute to the development and progression of the disease. The chapter reviews the latest statistics on cancer rates, mainly for breast and lung cancer, across population groups defined by age, sex/gender, and geographical location (the different states of the USA). By analyzing these data, the project aims to provide insights and predictions related to the occurrence of cancer in the US. The accompanying Python code uses the pandas and Matplotlib libraries to visualize cancer data for the US states: the dataset is read into a pandas DataFrame, and scatter plots and bar graphs are produced from it. The scatter plot shows the rates of lung and breast cancer in the various states, and the bar graphs show the total numbers of breast and lung cancer cases, as well as the cancer rates for different age groups in each state. These visualizations allow cancer rates and totals to be compared across states and age groups, helping to identify the states with higher cancer rates and any underlying trends or patterns. The chapter concludes by discussing the challenges and opportunities for cancer prevention, early detection, and treatment in the United States, and the implications for public health policy and practice. Potential applications of this analysis include informing cancer prevention and treatment strategies for different states and age groups. The project could also inform public health policy and advance the understanding of cancer and its impact on society.
Overall, this chapter aims to provide a comprehensive and up-to-date picture of the burden of cancer in the United States and to identify areas for further research and action.

Introduction

About Cancer

Cancer is a complex and life-threatening disease that has become a significant health concern worldwide. It is a disease in which the body's cells grow out of control and multiply throughout the body. More precisely, cancer is a group of conditions in which cells in the body begin to divide and multiply in an unregulated manner. The underlying cellular damage may be inherited or may be caused by errors that occur during normal cell division (Bevilacqua et al., 2006). Cancer cells can travel to other parts of the body, where they multiply and form new tumors. This process is called metastasis, and it happens when cancer cells enter the bloodstream (Shandilya & Chandankhede, 2018).

How Common is Cancer?

In the US, cancer is the second leading cause of death, accounting for nearly one in every four deaths. Cancer incidence and the distribution of cancer types are affected by many factors, including age, gender, and ethnicity (Rawat & Sah, 2013b). Lung cancer is the most commonly diagnosed cancer in men, followed by oral cavity and laryngeal cancers, whereas breast cancer and cervical cancer are the most commonly diagnosed cancers in women (Mathur et al., 2023). It is therefore crucial to identify the factors that drive cancer development and progression and to find ways to predict its occurrence (Da Cruz et al., 2018).

Aim of This Project

Machine learning and data analysis techniques have proven effective in identifying patterns and predicting outcomes in many fields, including cancer research (Rawat & Kumar, 2020). In this project, we aim to use these techniques to analyze cancer rates in the various states of the USA and to predict the likelihood of future breast and lung cancer development. By analyzing state-level cancer data, the project aims to provide insights and predictions related to the occurrence of cancer in the US (Bevilacqua et al., 2006; H. Chen et al., 2016).

Methodology of the Project

About the Dataset

We will use a publicly available dataset (a CSV file) that contains the total rates and numbers of breast cancer and lung cancer cases in the different US states, as well as the cancer rates for different age groups. Using this dataset, we will apply data analysis and visualization techniques to identify patterns or correlations among factors such as age, sex/gender, and cancer type (lung or breast) and the corresponding rates (Oluyide et al., 2018).
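
The loading step can be sketched with pandas as follows. This is a minimal illustration, not the authors' code: the file name `cancer_by_state.csv` and the column names (`State`, `Breast_Rate`, `Lung_Rate`, `Breast_Total`, `Lung_Total`, and one rate column per age group prefixed with `Age_`) are hypothetical, because the chapter does not give the dataset's schema.

```python
import pandas as pd

# Hypothetical column names; adjust them to match the real CSV header.
RATE_COLUMNS = ["Breast_Rate", "Lung_Rate"]
TOTAL_COLUMNS = ["Breast_Total", "Lung_Total"]


def load_cancer_data(path_or_buffer):
    """Read the per-state cancer CSV into a DataFrame indexed by state name."""
    df = pd.read_csv(path_or_buffer)
    # Remove stray whitespace around state names before using them as the index.
    df["State"] = df["State"].str.strip()
    df = df.set_index("State")
    # Coerce the numeric columns so unparseable entries become NaN rather than text.
    age_columns = [c for c in df.columns if c.startswith("Age_")]
    numeric_cols = RATE_COLUMNS + TOTAL_COLUMNS + age_columns
    df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")
    return df


def top_states(df, column, n=5):
    """Return the n states with the highest values in `column`, largest first."""
    return df[column].nlargest(n)
```

For example, `df = load_cancer_data("cancer_by_state.csv")` followed by `top_states(df, "Lung_Rate")` would list the five states with the highest lung cancer rate, provided the real header matches the assumed names.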

Use of Python Programming Language and Its Specific Libraries in the Project

The project uses the pandas library in Python to read and manipulate the data, and the Matplotlib library to create graphs and visualizations (such as scatter plots and bar graphs) that help reveal trends and patterns in the data. The PrettyTable module is also used to create a table that maps the full name of each state to its abbreviation (Sah et al., 2020; Sah, Bhadula, et al., 2018). Specifically, the code creates scatter plots and bar graphs to represent the total rates and numbers of breast cancer and lung cancer cases in the different US states, as well as the cancer rates of different age groups (Lamine et al., 2017; Rodriguez-Mier et al., 2016).
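
The scatter plot and bar graphs described above might be produced roughly as follows. This sketch assumes a DataFrame indexed by full state name with hypothetical columns `Breast_Rate`, `Lung_Rate`, `Breast_Total`, `Lung_Total`, and one `Age_`-prefixed rate column per age group. `STATE_ABBREVIATIONS` is a hypothetical, deliberately partial dictionary standing in for the PrettyTable name-to-abbreviation mapping, and `plot_cancer_overview` is an illustrative name rather than a function from the chapter.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical partial mapping; the chapter builds the full table with PrettyTable.
STATE_ABBREVIATIONS = {
    "Alabama": "AL",
    "Alaska": "AK",
    "Arizona": "AZ",
    "Arkansas": "AR",
    "California": "CA",
}


def plot_cancer_overview(df: pd.DataFrame):
    """Scatter of breast vs lung rates, plus bar graphs of totals and age-group rates."""
    fig, (ax_scatter, ax_totals, ax_age) = plt.subplots(1, 3, figsize=(18, 5))

    # Scatter: one point per state, labelled with its abbreviation when one is known.
    ax_scatter.scatter(df["Breast_Rate"], df["Lung_Rate"])
    for state, row in df.iterrows():
        label = STATE_ABBREVIATIONS.get(state, state)
        ax_scatter.annotate(label, (row["Breast_Rate"], row["Lung_Rate"]))
    ax_scatter.set_xlabel("Breast cancer rate")
    ax_scatter.set_ylabel("Lung cancer rate")

    # Grouped bars: total breast and lung cancer cases per state.
    df[["Breast_Total", "Lung_Total"]].plot.bar(ax=ax_totals)
    ax_totals.set_ylabel("Total cases")

    # Grouped bars: the rate in each age-group column per state.
    age_columns = [c for c in df.columns if c.startswith("Age_")]
    df[age_columns].plot.bar(ax=ax_age)
    ax_age.set_ylabel("Rate")

    fig.tight_layout()
    return fig
```

Calling `plot_cancer_overview(df).savefig("overview.png")` would write all three panels to one image. Drawing them as subplots of a single figure keeps the per-state comparisons side by side; separate figures would work equally well.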
