Conditional Random Fields for Modeling Structured Data

Tom Burr, Alexei Skurikhin
Copyright © 2015 | Pages: 10
DOI: 10.4018/978-1-4666-5888-2.ch608


Introduction

In statistical science and machine learning, pattern recognition is the task of using measurements from an input domain X to predict labels from an output domain Y. A typical setting is binary classification, where Y = {0,1}, with 0 the label for “objects of type A” and 1 the label for “objects of type B.” For example, measurements of sepal and petal length and width from each of many iris plants have been used to predict the species from among the three species of iris studied by Fisher (1936). The inferred label for one iris plant is independent of the inferred label for any other plant, so this task does not have structure (defined below) in the output domain.
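
As a concrete sketch of this unstructured setting, the following Python snippet fits an off-the-shelf classifier to Fisher's iris measurements, assuming scikit-learn is available; each plant is labeled independently of all others.

```python
# A minimal sketch of unstructured pattern recognition on Fisher's iris
# data, assuming scikit-learn is available. Each plant is labeled
# independently of the others: no structure in the output domain.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # 4 measurements per plant, 3 species
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=200).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```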

In general, there can be dependence among samples in the input domain, the output domain, both, or neither. For example, fraud detection in credit card transactions relies partly on pattern recognition methods that exploit relations among the samples (transactions) in both the input and output domains (Bolton & Hand, 2002). Pattern recognition is commonly applied to unstructured data, for which the samples are independent with respect to both the input and output domains, as in Fisher’s iris data. In contrast, when there is structure in the input and output domains, i.e., sample inputs and sample labels are not independent, the problem is referred to as structured machine learning or structured prediction. Examples, among many others, include fraud detection in credit card transactions, labeling pixel patches in images, and named entity recognition (NER) in text. NER is the task of identifying and classifying proper names in text, including locations (e.g., New York, China) and names of people, companies, and organizations. One challenge in NER is that correctly recognizing named entities requires context: information on the co-occurrence of other entities in the surrounding text. Similar situations often occur in higher-level image interpretation. For example, recognizing a scene as an office environment increases the expectation of detecting a computer inside that scene; in contrast, the expectation of a computer inside a grocery store scene is not as high as in an office scene.
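
To make the role of context concrete, here is a minimal, hypothetical Python sketch of the kind of features a structured NER model might consume; the token_features helper and its feature names are illustrative inventions, not from the chapter.

```python
# Illustrative only: context features for NER over a pre-tokenized
# sentence. A structured model (e.g., a linear-chain CRF) would score
# such features jointly over the whole label sequence rather than
# classifying each token in isolation.
def token_features(tokens, i):
    word = tokens[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "prev.word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next.word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

tokens = "Burr visited New York".split()
# Features for "New" include the neighboring token "york" as context,
# which helps disambiguate it as part of a location name.
print(token_features(tokens, 2))
```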

In addition to providing some general discussion, this chapter mainly uses image analysis applications in which there is dependence among samples (pixels or pixel patches) in the output domain, because nearby pixels tend to have similar labels, such as “natural” or “manmade,” as explained below. Figure 1 illustrates the role of context in disambiguating object recognition. In Figure 1a, spatial context determines whether the central element is read as 13 in the context of numbers or as B in the context of letters (Bruner & Minturn, 1955); in Figure 1b, what may appear to be spiders, when viewed out of the surrounding context, becomes the top burner grates of a gas range when viewed in the context of the whole scene.

Figure 1.

An illustration of context improving recognition: (a) the central element, read as 13 in the context of numbers vs. B in the context of letters; (b) what may appear to be spiders, viewed out of the surrounding context, becomes the top burner grates of a gas range when viewed in the context of the whole scene

Probabilistic graphical models (PGMs) are increasingly used to model problems having a structured domain. PGMs fall into two main categories: directed graphical models, known as Bayesian Networks (BNs), and undirected graphical models, known as Markov Random Fields (MRFs) and Conditional Random Fields (CRFs). PGMs are used to express dependencies between the input and output domains, as well as dependencies within each domain, and to enable probabilistic inference, such as answering queries using the probabilistic model (e.g., a model based on a CRF, an MRF, or a BN) of the problem (such as NER in text, image segmentation, or object recognition in images). A key task is to compute the probability distribution over the variables of interest (for a test sample, called the query) given the observed values of other random variables (the evidence).
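
As an illustration of such a query, the following sketch computes P(y2 | x) in a tiny chain-structured model by brute-force enumeration; the unary and pairwise scores are made-up numbers chosen only to illustrate the computation, and real systems would use dynamic programming (e.g., forward-backward) instead of enumeration.

```python
# A minimal sketch of probabilistic inference in a tiny chain model:
# answer the query P(y2 | x) by enumerating all label sequences.
# The scores below are invented for illustration.
import itertools, math

labels = [0, 1]          # e.g., 0 = "natural", 1 = "manmade"
n = 3                    # three output variables y1, y2, y3
unary = [[0.2, 1.0], [0.6, 0.4], [1.5, 0.1]]   # score(y_i = k | x_i)
pairwise = [[1.0, 0.0], [0.0, 1.0]]            # reward equal neighbors

def score(y):
    s = sum(unary[i][y[i]] for i in range(n))
    s += sum(pairwise[y[i]][y[i + 1]] for i in range(n - 1))
    return s

Z = sum(math.exp(score(y)) for y in itertools.product(labels, repeat=n))
for k in labels:
    p = sum(math.exp(score(y))
            for y in itertools.product(labels, repeat=n) if y[1] == k) / Z
    print(f"P(y2 = {k} | x) = {p:.3f}")
```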

Key Terms in this Chapter

Markov Chain Monte Carlo (MCMC): MCMC is a broad term for a class of numerical algorithms used to estimate a posterior distribution, often the posterior distribution of model parameters. Even more generally, MCMC algorithms are used for sampling from probability distributions based on constructing a Markov chain that has the desired distribution as its equilibrium distribution.
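
A minimal random-walk Metropolis sketch, assuming only that the target density can be evaluated up to a normalizing constant; the standard normal target here is chosen purely for illustration.

```python
# A minimal Metropolis sketch: sample from a target density known only
# up to a constant. The target is a standard normal, for illustration.
import math, random

def log_target(x):
    return -0.5 * x * x          # log density up to a constant

x, samples = 0.0, []
for _ in range(10_000):
    proposal = x + random.gauss(0.0, 1.0)      # symmetric random walk
    delta = log_target(proposal) - log_target(x)
    if random.random() < math.exp(min(0.0, delta)):
        x = proposal                           # accept the move
    samples.append(x)                          # else keep current state

print("sample mean (should be near 0):", sum(samples) / len(samples))
```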

Inference: Inference is a term that does not have a consistent definition. To many statisticians, inference includes all aspects of model formulation: choosing a model, choosing a neighborhood structure, estimating the corresponding model parameters, and the final step of inferring the class label of a sample. In the areas of probabilistic graphical models and machine learning, inference is defined as answering queries using a probabilistic model of the problem domain. That is, inference refers to computing the probability distribution over the variables of interest (the query) given the observed values of other random variables (the evidence). Marginal inference then refers to computing the distribution of a subset of variables. More specifically, given a graphical model, we often wish to answer probabilistic queries of the form P(y | x), where y is the “label” variable and x denotes the “input” variables.

Pattern Recognition: The task of assigning each input value to one of a given set of classes (for example, inferring whether a given pixel belongs to a natural or manmade object).

Structured Data: Structured data are data for which there is structure in the output domain, i.e., the labels of the individual samples are interdependent, as often occurs when pixels in an image have to be labeled. Typically, the class label y and/or the pixel’s feature data x are not spatially independent, because pixels in the neighborhood of a given pixel i tend to have similar (or systematically dissimilar) labels and/or feature data.

Conditional Random Field (CRF): A CRF is a probabilistic graphical model that is often applied in pattern recognition for structured prediction. Whereas an ordinary classifier predicts a label for a single sample without regard to neighboring samples’ labels, a CRF can take the labels of neighboring samples, as well as features corresponding to neighboring samples (context), into account while predicting a label for a given sample, as the sketch below illustrates.
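
A toy contrast between independent classification and CRF-style joint decoding on a three-sample chain; the scores are invented for illustration, and the exhaustive max stands in for proper Viterbi decoding.

```python
# Toy contrast: independent per-sample argmax vs. CRF-style joint
# decoding on a 3-sample chain. With a smoothness reward between equal
# neighbors, the jointly decoded middle label flips relative to its
# independent argmax. All scores are made-up numbers.
import itertools

unary = [[2.0, 0.0], [0.9, 1.0], [2.0, 0.0]]   # per-sample label scores
smooth = 1.5                                    # reward for equal neighbors

independent = [max((0, 1), key=lambda k: u[k]) for u in unary]

def joint_score(y):
    s = sum(unary[i][y[i]] for i in range(3))
    s += sum(smooth for i in range(2) if y[i] == y[i + 1])
    return s

joint = max(itertools.product((0, 1), repeat=3), key=joint_score)
print("independent:", independent)  # [0, 1, 0]
print("joint (CRF):", joint)        # (0, 0, 0): context flips the middle label
```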
