Abstractive Turkish Text Summarization and Cross-Lingual Summarization Using Transformer

Eymen Kagan Taspinar, Yusuf Burak Yetis, Onur Cihan
DOI: 10.4018/978-1-6684-6001-6.ch011

Abstract

Abstractive summarization aims to comprehend a text semantically and reconstruct it briefly and concisely, where the summary may contain words that do not appear in the original text. This chapter studies the abstractive Turkish text summarization problem using a transformer-based attention mechanism. It also examines in detail the differences between the transformer architecture and other architectures, as well as the attention block that lies at the heart of this architecture. Three summarization datasets were generated from text data available on various news websites for training abstractive summarization models. It is shown that the trained models achieve higher or comparable ROUGE scores relative to existing studies, and that the summaries they generate have better structural properties. An English-to-Turkish translation model was also created and used in a cross-lingual summarization model whose ROUGE score is comparable to existing studies. The summarization structure proposed in this study is the first example of cross-lingual English-to-Turkish text summarization.
Chapter Preview

Background

A number of models utilizing the sequence-to-sequence architecture have recently been presented for abstractive summarization. The transformer model, which relies exclusively on the attention mechanism, was introduced by Vaswani et al. (2017). The attention mechanism was further utilized by researchers to obtain promising results in summarization (Lewis et al., 2019; Raffel et al., 2020). Lewis et al. (2019) proposed the BART model, which contains both a bidirectional encoder and an autoregressive decoder; in BART, random noise is added to the text data and the original text is reconstructed using a sequence-to-sequence architecture. Raffel et al. (2020) introduced the T5 model, a text-to-text framework based on an attention mechanism that can be used for various text processing tasks, including translation, classification, and summarization. These models are remarkably successful at making sense of sentences since they contain both the encoder and decoder structures of the Transformer language model, which makes them preferred for translation and summarization problems.
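As an illustration of how such encoder-decoder models are typically applied to abstractive summarization in practice, the following minimal sketch uses the Hugging Face transformers library. The checkpoint name and the T5-style task prefix are placeholders, not the chapter's own trained model; a checkpoint fine-tuned for summarization is assumed.

# Minimal sketch: generating an abstractive summary with an
# encoder-decoder (seq2seq) Transformer via Hugging Face transformers.
# The checkpoint name is a placeholder for a summarization-fine-tuned model.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "google/mt5-small"  # assumption: any multilingual seq2seq checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

text = "..."  # Turkish news article to be summarized
inputs = tokenizer("summarize: " + text, return_tensors="pt",
                   truncation=True, max_length=512)

# Beam search decoding produces the abstractive summary token by token.
summary_ids = model.generate(**inputs, num_beams=4, max_length=128,
                             no_repeat_ngram_size=3)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))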

The English text summarization problem has been examined by many authors in the literature (Rush et al., 2015; Chopra et al., 2016; Lin et al., 2018). Rush et al. (2015) used a convolutional, attention-based encoder for summarization. Chopra et al. (2016) utilized RNN cells to create the decoder block. Nallapati et al. (2016) suggested an abstractive summarization system for English texts using RNN cells in both the encoder and decoder blocks. However, these attention-based structures lead to grammatical errors, semantic irrelevance, and repetition. Lin et al. (2018) provided a solution to this problem using CNN filters and LSTM cells. Studies containing both the encoder and decoder structures of the Transformer architecture show higher performance in perceiving text and produce better texts (Raffel et al., 2020; Lewis et al., 2019). Zhang et al. (2019) built a pre-trained model for English with the C4 corpus proposed by Raffel et al. (2020); the fine-tuning stage was performed with ready-to-use datasets such as XSum, CNN, NEWSROOM, and Multi-News. Researchers have recently concentrated on cross-lingual text summarization as a result of the popularity of English text summarization models. Wan et al. (2010) applied the "first summarize, then translate" principle to summarize Chinese texts; while they stated that the advantage of this method is translating less text, the results were not very satisfactory. Zhu et al. (2020) utilized the Transformer and a translation layer together in their model. Most cross-lingual summarization studies in the literature have been conducted to summarize Chinese texts.
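The two pipeline orderings discussed above can be sketched as follows. Here summarize() and translate_en_to_tr() are hypothetical stand-ins for a monolingual summarizer and an English-to-Turkish translation model, not the chapter's actual implementations.

# Sketch of the two pipeline orderings for English-to-Turkish
# cross-lingual summarization. Both helpers are hypothetical placeholders.

def summarize(text: str) -> str:
    """Placeholder for a monolingual abstractive summarizer."""
    raise NotImplementedError

def translate_en_to_tr(text: str) -> str:
    """Placeholder for an English-to-Turkish translation model."""
    raise NotImplementedError

def summarize_then_translate(document_en: str) -> str:
    # Cheaper: the translator only sees the (short) summary.
    return translate_en_to_tr(summarize(document_en))

def translate_then_summarize(document_en: str) -> str:
    # Costlier: the full document is translated before summarization,
    # but the summarizer works directly in the target language.
    return summarize(translate_en_to_tr(document_en))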

Key Terms in this Chapter

Self-Attention Mechanism: The mechanism that expresses how much attention the words in a sequence pay to each other within the encoder and decoder layers. After the matrices produced by the Query, Key, and Value linear layers are multiplied, the amount of attention the words pay to each other is determined; the resulting attention map is used when transforming the token representations.
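A minimal NumPy sketch of the scaled dot-product attention computation described above; the sequence length, model dimension, and random projection matrices are illustrative assumptions.

# Scaled dot-product self-attention: Query/Key/Value come from linear
# projections of the same token embeddings, and the row-wise softmax of
# Q K^T / sqrt(d_k) is the "attention map" describing how much each word
# attends to the others.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V, weights                        # output, attention map

# Toy example: 4 tokens with model dimension 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
output, attention_map = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)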

FLOPS: Floating Point Operations, the unit that expresses computational cost. It can also be used to evaluate the cost of training and expresses how many basic mathematical operations on floating-point numbers are performed while executing the algorithm.
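As a simple illustration of how such a cost estimate is obtained, the sketch below counts the floating-point operations of a single linear layer; the dimensions are illustrative assumptions, not figures from the chapter.

# Back-of-the-envelope FLOP count for one dense (linear) layer:
# multiplying an (n x d_in) input by a (d_in x d_out) weight matrix takes
# roughly 2 * n * d_in * d_out floating-point operations
# (one multiply and one add per term).
def linear_layer_flops(seq_len: int, d_in: int, d_out: int) -> int:
    return 2 * seq_len * d_in * d_out

print(linear_layer_flops(seq_len=512, d_in=768, d_out=768))  # ~6.0e8 FLOPs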

ROUGE-L: Refers to scoring based on the longest common subsequence between the generated and reference texts.

ROUGE-1: Measures the overlap of single words (unigrams) between the generated and reference texts.

ROUGE: Recall-Oriented Understudy for Gisting Evaluation, a scoring algorithm that evaluates the similarity of texts. It is used for summarization and translation: a generated summary or translation is compared to one or more reference texts. Different ROUGE measures are defined based on the comparison mechanism.

ROUGE-2: Measures the overlap of word pairs (bigrams) between the generated and reference texts.
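A simplified sketch of the ROUGE measures defined above, assuming plain whitespace tokenization and no stemming (full implementations such as the rouge-score package add more preprocessing).

# ROUGE-N recall via n-gram overlap and ROUGE-L recall via the longest
# common subsequence, for a candidate summary against one reference.
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n):
    """ROUGE-N recall: overlapping n-grams / n-grams in the reference."""
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())
    return overlap / max(sum(ref.values()), 1)

def rouge_l(candidate, reference):
    """ROUGE-L recall based on the longest common subsequence."""
    c, r = candidate.split(), reference.split()
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i, cw in enumerate(c, 1):
        for j, rw in enumerate(r, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if cw == rw else max(dp[i-1][j], dp[i][j-1])
    return dp[len(c)][len(r)] / max(len(r), 1)

reference = "the transformer model relies on the attention mechanism"
candidate = "the transformer relies on attention"
print(rouge_n(candidate, reference, 1), rouge_n(candidate, reference, 2),
      rouge_l(candidate, reference))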

BLEU: Bilingual Evaluation Understudy Score, a scoring algorithm for comparing a candidate translation of a text to a reference translation.
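A brief illustration of BLEU scoring using NLTK's sentence_bleu; the tokenized sentences are toy examples, not data from the chapter.

# Sentence-level BLEU with smoothing for short sentences.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["bu", "model", "metni", "özetler"]]   # tokenized reference translation
candidate = ["bu", "model", "metni", "özetliyor"]   # tokenized system output

score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")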
