Performance Analysis of Machine Learning Algorithms for Cervical Cancer Detection

Sanjay Kumar Singh, Anjali Goyal
DOI: 10.4018/978-1-6684-7136-4.ch019

Abstract

Cervical cancer is the second most prevalent cancer in women worldwide, and the Pap smear is one of the most popular techniques used to diagnose it at an early stage. Developing countries such as India face the challenge of handling a growing number of cases every day. In this article, various online and offline machine learning algorithms are applied to benchmark data sets to detect cervical cancer. The article also addresses the problem of segmentation with hybrid techniques and optimizes the number of features using extra tree classifiers. Accuracy, precision, recall, and F1 score increase with the proportion of data used for training and reach up to 100% for some algorithms. An algorithm such as logistic regression with L1 regularization attains an accuracy of 100%, but it is far more costly in CPU time than some algorithms that obtain 99% accuracy with less CPU time. The key finding of this article is the selection of the best machine learning algorithm with the highest accuracy. Cost-effectiveness in terms of CPU time is also analysed.

1. Introduction

Cervical cancer has been one of the most common cancers in women for the last 30 years or more, and almost every woman is at risk (Office on Women's Health, U.S. Department of Health and Human Services, 2018). The best way to prevent cervical cancer is to get an HPV vaccine, get regular Pap tests, and be monogamous. In cervical cancer, malignant (cancerous) cells form in the tissues of the cervix, and human papillomavirus (HPV) infection is the major risk factor for the disease. The Pap test is one of the methods used to detect and diagnose cervical cancer. In this test, a piece of cotton and a brush are used to collect cells from the surface of the cervix and vagina. The cells are examined under a microscope to check for abnormalities; this procedure is sometimes called a Pap smear (National Institutes of Health, 2018). Detecting cervical cancer is challenging because women show no symptoms at the early stages of the disease (American Cancer Society, 2018). A well-proven way to prevent cervical cancer is screening to find pre-cancers before they can turn into invasive cancer.

According to the International Agency for Research on Cancer (2018), the number of new cancer cases predicted for 2020 across all age groups is shown in Figure 1. The prediction corresponds to an increase of approximately 15.48% in patients of all ages worldwide.

Figure 1. World cervix uteri: number of new cancers in 2020, all ages

Hence, a reliable computer-based system for the detection of cervical cancer is recommended, as the number of patients is increasing day by day. Machine learning algorithms are well suited to implementing such a system and can help detect cervical cancer at an early stage.

Machine learning algorithms can be used at two different stages of cervical cancer detection. In the first, they are applied after meaningful features have been extracted: the algorithms learn from these features and then predict and detect the cancer. In the second, the machine learning algorithms themselves extract the important features, learn from them, and then predict and detect cervical cancer. In this paper, both online and offline machine learning algorithms are evaluated on benchmark data sets. Online machine learning algorithms learn from data samples one at a time, which is also called incremental learning, whereas offline machine learning algorithms, also called batch learning algorithms, learn from batches or groups of data at a time. Online machine learning algorithms are relatively fast compared with offline machine learning algorithms.
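
The difference between online and offline learning can be illustrated with a minimal sketch. The code below is not the chapter's implementation: it trains a plain logistic regression classifier on a hypothetical toy data set in two ways, updating the weights after every single sample (online) and updating them once per pass over the whole data set (offline). The function names, the learning rate, the number of epochs, and the feature values are all assumptions chosen for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(weights, x):
    # weights[0] is the bias term; the rest pair up with the features.
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))

def predict(weights, x):
    # Returns 1 (abnormal) or 0 (normal).
    return 1 if sigmoid(score(weights, x)) >= 0.5 else 0

def train_online(samples, labels, lr=0.1, epochs=50, seed=0):
    # Online (incremental) learning: one weight update per sample.
    rng = random.Random(seed)
    weights = [0.0] * (len(samples[0]) + 1)
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = sigmoid(score(weights, samples[i])) - labels[i]
            weights[0] -= lr * err
            for j, xi in enumerate(samples[i]):
                weights[j + 1] -= lr * err * xi
    return weights

def train_offline(samples, labels, lr=0.1, epochs=200):
    # Offline (batch) learning: the gradient is averaged over the
    # whole data set, giving one weight update per pass.
    n = len(samples)
    weights = [0.0] * (len(samples[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(samples, labels):
            err = sigmoid(score(weights, x)) - y
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights

# Hypothetical, already standardized features for eight cells
# (0 = normal, 1 = abnormal); not taken from any benchmark data set.
X_TOY = [[-1.0, -0.8], [-1.5, -1.2], [-0.9, -1.1], [-1.2, -0.6],
         [1.0, 1.2], [1.5, 0.8], [0.8, 1.1], [1.2, 1.4]]
Y_TOY = [0, 0, 0, 0, 1, 1, 1, 1]

w_online = train_online(X_TOY, Y_TOY)
w_offline = train_offline(X_TOY, Y_TOY)
print([predict(w_online, x) for x in X_TOY])
print([predict(w_offline, x) for x in X_TOY])
```

In the online variant a newly arriving sample can be absorbed with a single cheap update, while the offline variant has to revisit the full data set for every update. This toy comparison says nothing about the CPU times or accuracies reported in the chapter.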

Cervical cell classification is commonly posed as a 2-class, 3-class, 4-class, or 7-class problem. The 2-class problem differentiates normal cells from abnormal cells. The 3-class problem differentiates among normal cells, low-grade squamous intra-epithelial lesions (LSIL), and high-grade squamous intra-epithelial lesions (HSIL). The 4-class problem differentiates between normal cells and three categories of abnormal cells, namely mildly, moderately, and severely dysplastic cells. The 7-class problem differentiates every individual class of cells. Minimum distance classifiers and ant colony optimization have been used to classify these cells into seven classes (Jantzen & Dounias, 2006). A metaheuristic algorithm combined with a genetic algorithm for feature selection has been used to classify Pap smear images into two or more classes (Marinakis, Dounias & Jantzen, 2009). Various competitive intelligent techniques, such as clustering algorithms, inductive machine learning, genetic programming, ANFIS, and second-order neural networks, have been applied to classify cervical cells into two or more classes (Dounias et al., 2006; Sing & Pandey, 2014). It has been observed that the classification rate for the 2-class problem is better than that for the 7-class problem. These algorithms have been used together with feature selection on the extracted features.
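
Of the techniques mentioned above, the minimum distance classifier is the simplest to illustrate. The following sketch is an illustration under stated assumptions, not the method of Jantzen & Dounias (2006): it computes one mean feature vector (centroid) per class and assigns a new cell to the class whose centroid is nearest in Euclidean distance. The function names, the two features, and all feature values are hypothetical.

```python
import math
from collections import defaultdict

def fit_centroids(samples, labels):
    # Compute the mean feature vector (centroid) of each class.
    sums = {}
    counts = defaultdict(int)
    for x, label in zip(samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        for j, value in enumerate(x):
            sums[label][j] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def classify(centroids, x):
    # Assign x to the class with the nearest centroid.
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

# Hypothetical two-feature training cells for the 2-class problem
# (values are made up and scaled to the range 0..1).
TRAIN_X = [[0.20, 0.10], [0.30, 0.20], [0.25, 0.15],
           [0.80, 0.70], [0.90, 0.60], [0.85, 0.75]]
TRAIN_Y = ["normal", "normal", "normal",
           "abnormal", "abnormal", "abnormal"]

CENTROIDS = fit_centroids(TRAIN_X, TRAIN_Y)
print(classify(CENTROIDS, [0.30, 0.10]))
print(classify(CENTROIDS, [0.70, 0.80]))
```

Because the code only depends on the labels it is given, the same two functions apply unchanged to the 3-class, 4-class, and 7-class formulations; only the label set in the training data changes.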
