Implementation Details of Decision Tree Algorithms Using Dataflow


DOI: 10.4018/978-1-7998-8350-0.ch007

Abstract

This chapter presents the dataflow paradigm in general, with data streams, offsets, and different number representations as key points for acceleration, and discusses implementation details of the C4.5 algorithm on dataflow accelerators. The limited number range of the algorithm makes it suitable for dataflow implementation using custom data types, and its iterative nature enables the utilization of data streams. It is shown how part of the algorithm (the information gain and entropy computation) can be migrated using advanced optimization constructs.

Algorithm

This section presents implementation details of the C4.5 algorithm (Hssina, Merbouha, Ezzikouri, & Erritali, 2014; Dai & Ji, 2014) for decision tree construction. Listing 1 presents the pseudocode of the algorithm, which uses information-theoretic methods to choose the attribute of the training data that most effectively splits the instances into subsets described by class labels.

Listing 1. Pseudocode of the C4.5 algorithm. The information gain ratio normalizes the information gain of an attribute by its split information, which is computed from the ratio of the number of observations in each subset to the total number of observations.
•	Check for the base cases (e.g., all instances belong to the same class, or no attributes remain to split on).
•	For each attribute a, find the normalized information gain ratio from splitting on a.
•	Let a_best be the attribute with the highest normalized information gain.
•	Create a decision node that splits on a_best.
•	Recurse on the sublists obtained by splitting on a_best, and add those nodes as children of node.
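As an illustration of the splitting criterion used in the steps above, the following is a minimal host-side sketch in plain Java that computes the entropy, information gain, and gain ratio for one categorical attribute; the class and method names are illustrative and not taken from the chapter.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GainRatio {

    // Shannon entropy of a list of class labels.
    static double entropy(List<Integer> labels) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        double h = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / labels.size();
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // Information gain ratio for splitting 'labels' by the categorical
    // attribute values in 'attribute' (same length as 'labels').
    static double gainRatio(List<Integer> attribute, List<Integer> labels) {
        double baseEntropy = entropy(labels);

        // Group the class labels by attribute value.
        Map<Integer, List<Integer>> partitions = new HashMap<>();
        for (int i = 0; i < labels.size(); i++) {
            partitions.computeIfAbsent(attribute.get(i), k -> new ArrayList<>())
                      .add(labels.get(i));
        }

        double splitEntropy = 0.0;   // weighted entropy after the split
        double splitInfo = 0.0;      // intrinsic (split) information
        for (List<Integer> part : partitions.values()) {
            double weight = (double) part.size() / labels.size();
            splitEntropy += weight * entropy(part);
            splitInfo -= weight * (Math.log(weight) / Math.log(2));
        }

        double gain = baseEntropy - splitEntropy;
        return splitInfo == 0.0 ? 0.0 : gain / splitInfo;
    }
}

The attribute with the highest value returned by gainRatio would then be chosen as a_best in the pseudocode above.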

Figure 1 presents an example of a decision tree created by the C4.5 algorithm. According to the training data, the attribute outlook has the best information gain ratio and is therefore at the root of the tree. Leaves of the tree represent the class, or output, of the algorithm, which in this case predicts whether an action will be taken.

Figure 1.

An example of a decision tree which predicts whether an action will be taken according to the weather data.
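To make the structure in Figure 1 concrete, below is a minimal sketch of how such a tree could be represented and queried on the host machine (plain Java; the node layout and field names are assumptions for illustration only).

import java.util.HashMap;
import java.util.Map;

// Hypothetical node type: an inner node splits on one attribute and maps each
// attribute value to a child; a leaf carries only the predicted class label.
class TreeNode {
    String splitAttribute;                          // null for a leaf
    String label;                                   // set only for a leaf
    Map<String, TreeNode> children = new HashMap<>();

    // Follow the branches selected by the instance's attribute values until a
    // leaf is reached, then return its class label. Assumes every attribute
    // value of the instance has a matching branch in the tree.
    static String classify(TreeNode node, Map<String, String> instance) {
        while (node.splitAttribute != null) {
            node = node.children.get(instance.get(node.splitAttribute));
        }
        return node.label;
    }
}

For the weather data, an instance with outlook set to sunny would first follow the sunny branch of the root before the remaining attributes are tested.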


Implementation Details

In order to implement such an algorithm on a dataflow accelerator, the first step is to encode categorical values as numerical ones. The numerical values can then be stored either on the host machine or in on-card memory and streamed to the accelerator in order to compute the information gain for each attribute.
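A possible host-side sketch of such an encoding is shown below (plain Java); the first-seen ordering of the integer codes is an assumption, as any consistent coding would do.

import java.util.HashMap;
import java.util.Map;

public class CategoricalEncoder {

    // Maps each distinct categorical value to a small integer code,
    // assigned in order of first appearance.
    static int[] encode(String[] values) {
        Map<String, Integer> codes = new HashMap<>();
        int[] encoded = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            Integer code = codes.get(values[i]);
            if (code == null) {
                code = codes.size();
                codes.put(values[i], code);
            }
            encoded[i] = code;
        }
        return encoded;
    }

    public static void main(String[] args) {
        // e.g. the outlook attribute from the weather data
        String[] outlook = {"sunny", "overcast", "rain", "sunny", "rain"};
        int[] codes = encode(outlook);   // {0, 1, 2, 0, 2}
    }
}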

Data streams connect two endpoints and transfer data between them in ticks (Pell, Mencer, Tsoi, & Luk, 2013). Figure 2 illustrates the data streams used to move variables to the accelerator and back to the host machine or to on-card memory. In the first tick, the head of the stream holds the value 1; in the second tick the stream shifts and the next element, 2, becomes the head.

Figure 2.

An example of data streams in dataflow accelerators
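A minimal kernel sketch of this behavior is shown below, written in MaxJ-style Java (assuming a Maxeler MaxCompiler v2 API; the package paths, the io.input/io.output calls, and the dfeUInt type constructor follow the usual MaxCompiler conventions rather than anything given in the chapter). On every tick one element enters from the input stream and one result leaves on the output stream.

import com.maxeler.maxcompiler.v2.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v2.kernelcompiler.KernelParameters;
import com.maxeler.maxcompiler.v2.kernelcompiler.types.base.DFEVar;

// Pass-through kernel: in each tick the head of the input stream "x"
// is read and written unchanged to the output stream "y".
class PassThroughKernel extends Kernel {
    PassThroughKernel(KernelParameters parameters) {
        super(parameters);
        DFEVar x = io.input("x", dfeUInt(32));   // head of the input stream
        io.output("y", x, dfeUInt(32));          // head of the output stream
    }
}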


Stream offsets enable access to neighboring elements relative to the head of the stream. For calculating information gain, accessing neighboring elements of a stream can be beneficial because it allows the result to be produced faster. Offsets are used in cases where neighboring elements are required to produce the result, as illustrated in Figure 3.

Figure 3.

An example of stream offsets utilization in a matrix


In order to use offsets, it is important to determine which type of offset is needed (a sketch of a static offset follows the list):

  • static offsets have a size fixed at compile time

  • variable offsets have a size set at run time, before a stream is processed

  • dynamic offsets have a size set at run time, during stream processing
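As a concrete example of a static offset, the MaxJ-style sketch below (under the same MaxCompiler API assumptions as the earlier kernel sketch) reads the elements one position behind and one position ahead of the stream head and produces a three-point sum, the classic moving-window pattern; note that arithmetic operators on DFEVar require Maxeler's MaxJ extension of Java.

import com.maxeler.maxcompiler.v2.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v2.kernelcompiler.KernelParameters;
import com.maxeler.maxcompiler.v2.kernelcompiler.types.base.DFEVar;

// Static offsets: the distances -1 and +1 are fixed at compile time, so in
// each tick the kernel sees the previous and the next element of the stream
// in addition to its head.
class WindowSumKernel extends Kernel {
    WindowSumKernel(KernelParameters parameters) {
        super(parameters);
        DFEVar x = io.input("x", dfeFloat(8, 24));
        DFEVar prev = stream.offset(x, -1);   // element one tick behind
        DFEVar next = stream.offset(x, 1);    // element one tick ahead
        DFEVar sum = prev + x + next;         // MaxJ operator overloading on DFEVar
        io.output("sum", sum, dfeFloat(8, 24));
    }
}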

Key Terms in this Chapter

Offset: Dataflow technique for accessing neighboring elements in a stream.

Fixed-Point: Number representation frequently used in the dataflow paradigm.

Floating-Point: Number representation frequently used in the controlflow paradigm.

Stream: Dataflow technique for moving data between the host machine and the DFE (dataflow engine).

C4.5: Decision tree algorithm.
