Exploratory Cluster Analysis Using Self-Organizing Maps: Algorithms, Methodologies, and Framework

Nuno C. Marques, Bruno Silva
Copyright: © 2023 | Pages: 27
DOI: 10.4018/978-1-6684-9591-9.ch010

Abstract

As the volume and complexity of data streams continue to grow, exploratory cluster analysis is becoming increasingly important. In this chapter, the authors explore the use of artificial neural networks (ANNs), particularly self-organizing maps (SOMs), for this purpose. They propose additional methodologies, including concept drift detection and distributed and collaborative learning strategies, and introduce a new open-source Java ANN library designed to support practical applications of SOMs across various domains. By following the tutorial, users will gain practical insights into visualizing and analyzing these challenging datasets, enabling them to apply the approach in their own projects. Overall, this chapter aims to give readers a comprehensive understanding of SOMs and their place within the broader context of artificial neural networks, and it offers practical guidance on developing and using these models in real-world applications.
Chapter Preview

Introduction

Self-organizing maps (SOMs), also known as Kohonen maps (Kohonen, 1982), represent a significant advancement in artificial intelligence, providing a robust framework for data visualization and analysis. Their ability to transform complex, high-dimensional data into simple, comprehensible visual representations has made them an indispensable tool in fields ranging from computer science to bioinformatics and finance. Since Teuvo Kohonen introduced them in the early 1980s, SOMs have contributed significantly to our understanding of complex systems, pattern recognition, and data representation (Kohonen, 2001). By organizing high-dimensional input data into a low-dimensional grid, they facilitate the identification of hidden patterns, clusters, and relationships that might otherwise be difficult to discern (Vesanto, 1999).

In many scientific and engineering domains, researchers encounter datasets with numerous variables and complex relationships. SOMs address this challenge by projecting such data onto a map that can be visualized and interpreted directly. Their ability to uncover latent patterns has played a pivotal role in data mining and knowledge discovery: by identifying clusters and similarities, SOMs support the exploration of large datasets and have proven instrumental in discovering new relationships, trends, and correlations, giving researchers insight into data distributions and spatial relationships (Oja, Kaski, & Kohonen, 2003). Today, SOMs continue to support informed decision-making and drive innovation across diverse domains, including genomics, finance, environmental sciences, marketing, healthcare, and social sciences.

SOMs have served as a crucial milestone in the development of artificial neural networks. By demonstrating how a simplified model of the brain's self-organization could be applied to data analysis, SOMs laid the foundation for subsequent advancements in neural network research. They have contributed to the evolution of deep learning, reinforcement learning, and other neural network architectures, leading to breakthroughs in fields such as computer vision, natural language understanding, and robotics (Goodfellow, Bengio, & Courville, 2016).

This chapter aims to guide readers through the role of SOMs in machine learning and introduce a publicly available neural network framework for exploratory cluster analysis. We will explore the practical application of these techniques using the Wine Dataset, a popular choice for machine learning and data mining tasks. By following along, readers will gain practical insights into the power of SOMs in handling complex, multi-dimensional data.

Key Terms in this Chapter

Ubiquitous Self-Organizing Maps (ubiSOMs): An extension of the SOM algorithm designed for generating local SOM models over potentially unbounded, non-stationary data streams.

Best Matching Unit (BMU): The neuron whose prototype is closest to the input data point, as determined by a distance metric such as the Euclidean distance.

Learning Rate: Determines the extent to which the prototypes are adjusted during each iteration.

Self-Organizing Maps (SOMs): Also known as Kohonen maps, SOMs are grids of prototype units whose values are adjusted by the SOM learning algorithm over a dataset.

Data Stream: A continuous and potentially infinite sequence of examples, each represented as a fixed-dimensional vector of values.

Unified Distance Matrix (U-Matrix): Represents distances between adjacent neurons. It is presented with different colorings following a temperature-like color scale: warmer coloring (e.g., red) corresponds to a large distance, while cooler coloring (e.g., blue) indicates that codebook vectors are close to each other in the input space.

Prototype Unit (also known as codebook vector): A prototype vector with the same dimensionality as the input data. The set of all NxM prototype units makes up the SOM grid of neurons (the lattice) for a map of size NxM units.

Neighborhood Radius: Determines the range of neighboring neurons that are updated along with the winning neuron.
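
The key terms above can be tied together in a short sketch. The Python code below is an illustration only and is not the API of the chapter's Java library: every function name is hypothetical, and the Gaussian neighborhood function, the linear decay schedule, and the 4-connected U-Matrix are assumptions chosen for brevity. It shows how the BMU is found, how the learning rate and neighborhood radius shape a single online update of the prototype units, and how U-Matrix values can be derived from distances between adjacent prototypes.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_grid(rows, cols, dim, seed=0):
    """Create a rows x cols lattice of random prototype (codebook) vectors."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(dim)] for _ in range(cols)]
            for _ in range(rows)]


def find_bmu(grid, x):
    """Return the (row, col) of the unit whose prototype is closest to x."""
    best, best_dist = (0, 0), float("inf")
    for r, row in enumerate(grid):
        for c, proto in enumerate(row):
            d = squared_distance(proto, x)
            if d < best_dist:
                best, best_dist = (r, c), d
    return best


def update(grid, x, learning_rate, radius):
    """One online step: pull the BMU and its lattice neighbors toward x."""
    br, bc = find_bmu(grid, x)
    for r, row in enumerate(grid):
        for c, proto in enumerate(row):
            lattice_d2 = (r - br) ** 2 + (c - bc) ** 2
            # Assumed Gaussian neighborhood: 1 at the BMU, decaying with
            # distance on the lattice (not in the input space).
            h = math.exp(-lattice_d2 / (2.0 * radius ** 2))
            for i in range(len(proto)):
                proto[i] += learning_rate * h * (x[i] - proto[i])
    return br, bc


def train(grid, data, epochs=20, lr0=0.5, radius0=2.0):
    """Train with an assumed linear decay of learning rate and radius."""
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data:
            frac = step / total
            update(grid, x, lr0 * (1.0 - frac),
                   max(radius0 * (1.0 - frac), 0.5))
            step += 1
    return grid


def u_matrix(grid):
    """Mean input-space distance from each unit to its 4-connected neighbors."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = [math.sqrt(squared_distance(grid[r][c], grid[nr][nc]))
                     for nr, nc in ((r - 1, c), (r + 1, c),
                                    (r, c - 1), (r, c + 1))
                     if 0 <= nr < rows and 0 <= nc < cols]
            out[r][c] = sum(dists) / len(dists) if dists else 0.0
    return out
```

A typical use would be `grid = train(init_grid(5, 5, dim), data)` followed by `u_matrix(grid)`; large values in the result mark likely cluster boundaries, which is what the warm colors of a U-Matrix plot convey. A larger radius spreads each update over more of the lattice, while a larger learning rate moves the affected prototypes further toward the input.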
