Abstract
Text summarization refers to creating a summary of a document without detracting from or distorting its message. The content of the document must be presented in a concise and clear manner. The need for text summarization has been felt for more than a decade as the volume of available data has increased. This paper attempts to understand the summarization methods proposed by various scholars. Five papers have been selected for analysis and review. The aim of the study is to understand the process of text summarization.
Introduction
As the volume of digital data increases, so does the need for automation. Automatic text summarization is one such application: computer software is used to produce a summary of a written work. Automated summarization is a relatively new technology that has yet to be refined. Many researchers and scholars have studied the technology and its approaches. For the purpose of this paper, we have selected five studies or surveys conducted on the subject. We summarize and analyze the chosen works herein.
Each of the five selected studies examines the methodology for automated summarization. New algorithms, methods and techniques for summarization are explored. Each new technique or method attempts to overcome the limitations of the traditional extraction and abstraction methods, often by combining the two, in order to produce the best possible summary of the document.
Majid Ramezani and Mohammad-Reza Feizi-Derakhshi
This article by Ramezani and Feizi-Derakhshi focuses on automated text summarization, a task within Natural Language Processing (NLP). With the growing amount of research being conducted and the large volumes of data available on the internet, the need for software that can summarize large volumes of text has been on the rise. The authors present three approaches to text summarization: extraction, abstraction, and a mixed approach in which extracted text is summarized further. They study the effectiveness of these approaches, review some existing systems, and describe the methods used to evaluate effectiveness. The authors conclude that although automated summarization has come a long way since its inception, much research is still ongoing in the field, and new approaches and evaluation techniques may be forthcoming.
M. Kalaiselvan and A. Vijaya Kathiravan
As is evident from the title of their paper, Kalaiselvan and Kathiravan present a pioneering tool for text summarization based on the relevance of certain repeated words. The study is concisely summarized in the abstract, which presents the methods and conclusion. After a brief introduction, they expound the problems in the extraction and abstraction approaches to summarization, stating that neither approach can give a true summary of the text. One leads to too much information and problems of coherence, while the other may not be able to present a true summary because of the system's inherent inability to capture certain representations.
They conclude that the star map approach retains the advantage of the text extraction approach, and that the weightage given to words and sentences allows the most relevant portions of the text to be extracted. They claim that their system produced a summary comparable to manual summarization.
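The idea of weighting repeated words and extracting the highest-scoring sentences can be illustrated with a minimal sketch. This is not the authors' star map method, whose details are not reproduced here; it is a generic frequency-based extractor written under assumptions of our own. The stop-word list, the naive sentence splitter, and the function names (`split_sentences`, `content_words`, `summarize_by_frequency`) are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "it", "on", "for"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize_by_frequency(text, num_sentences=2):
    """Return the top-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    # Weight of a word = how often it occurs in the whole document.
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        # Average weight, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return [sentences[i] for i in keep]
```

Averaging the word weights rather than summing them is one possible design choice; a real system would also need a proper sentence splitter and stop-word list.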
Sankar K and Sobha L
Sankar and Sobha, authors of “An Approach to Text Summarization”, propose a new summarization technique that involves identifying coherent blocks in the document and ranking them. Although the title does not state what the new technique is, it does indicate that the paper offers a new approach to text summarization. In the abstract, Sankar and Sobha note that their technique exploits the lexical relationship between sentences. The introduction elaborates on the theory, discussing the extraction of chunks and the rules for selection. Here they state that the technique is a hybrid approach involving word frequency and position. The authors go on to explain the technique as follows.
First they explain the term “Lexical Cohesion” and then expound the rules for selection of the chunks. They then go on to give a detailed and technical explanation of the technique.
In conclusion, Sankar and Sobha refer to their system as a “coherence chunker”. They claim that their approach is highly portable to any language or domain because of the ranking and coherent chunks selected through recognition of lexical cohesion.
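A rough sketch may help make the notion of cohesion-based chunking and ranking concrete. It is not the authors' coherence chunker: the grouping rule (consecutive sentences that share a content word), the scoring formula, the `position_weight` parameter, and all function names are assumptions introduced here purely for illustration of a hybrid of word frequency and position.

```python
import re
from collections import Counter

# Hypothetical stop-word list (an assumption for illustration).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "it", "on", "for"}


def content_words(sentence):
    """Set of lowercased word tokens with stop words removed."""
    return {w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS}


def cohesive_chunks(sentences, min_shared=1):
    """Group consecutive sentences that share at least `min_shared` content words."""
    chunks = []
    for sentence in sentences:
        if chunks and len(content_words(chunks[-1][-1]) & content_words(sentence)) >= min_shared:
            chunks[-1].append(sentence)  # lexically tied to the previous sentence
        else:
            chunks.append([sentence])    # no shared vocabulary: start a new chunk
    return chunks


def rank_chunks(chunks, position_weight=0.5):
    """Score = mean sentence frequency of the chunk's words + a bonus for early position."""
    # freq[w] = number of sentences in the document that contain w.
    freq = Counter(w for chunk in chunks for s in chunk for w in content_words(s))
    scored = []
    for index, chunk in enumerate(chunks):
        words = [w for s in chunk for w in content_words(s)]
        frequency_score = sum(freq[w] for w in words) / len(words) if words else 0.0
        position_score = position_weight / (index + 1)
        scored.append((frequency_score + position_score, chunk))
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

Because the sketch relies only on word overlap and position, it carries no language-specific resources beyond the stop-word list, which is in the spirit of the portability the authors claim.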
Rasim Alguliev and Ramiz Aliguliyev
In their paper “Evolutionary Algorithm for Extractive Text Summarization”, Alguliev and Aliguliyev propose a new method for text summarization that extracts clusters of sentences from the document. They use a differential evolution algorithm and claim that their method shows better results than most summarization systems. They explain the concept of summarization and divide the process into three phases: analysis, transformation, and synthesis. In the first phase the document is analysed, in the second phase a rough summary is prepared, and in the final phase the summary is presented to the user. The authors credit other researchers with similar approaches and opine that the centroid-based method of clustering is the best. They explain the mathematical functions used for clustering and selection, and conclude that in their system sentences are first clustered, and then the important sentences that represent each cluster are extracted. These sentences are then used to summarize the document. They state that they have evaluated their method using the ROUGE-1, ROUGE-2, and ROUGE-SU4 metrics.
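For readers unfamiliar with the ROUGE metrics mentioned above, the following is a simplified sketch of ROUGE-N recall, the n-gram overlap measure underlying ROUGE-1 and ROUGE-2. It assumes plain whitespace tokenization and a single reference summary, and it omits the preprocessing options of the official toolkit as well as the skip-bigram variant ROUGE-SU4. The function names are our own.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Fraction of the reference's n-grams that also appear in the candidate.

    Matches are clipped: an n-gram counts at most as many times as it
    occurs in the candidate summary.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

For the reference "the cat sat on the mat" and the candidate "the cat lay on the mat", this gives a ROUGE-1 recall of 5/6 and a ROUGE-2 recall of 3/5.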
Mohammed Salem Binwahlan, Naomie Salim, and Ladda Suanmali
Binwahlan et al. present a new model for text summarization based on swarm intelligence. They claim that their model shows 43% similarity to human summaries, as against 39% shown by MS Word. They attempt to use particle swarm optimization (PSO), a technique used in machine learning, for text summarization. They credit Ziegler and Skubacz with attempting to use the technique on HTML pages. In this technique, sentences that cross a predetermined PSO threshold are deemed significant. In the first pass, punctuation marks and stop words are removed. Features are then computed for each sentence and assembled into vectors. These vectors are used for PSO scoring. In their concluding paragraph they state that the purpose of using PSO is to allocate proper importance to each sentence in the text, and that selection is based on order of importance. The authors state that they will continue to research their summarization technique further.
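The pipeline described above (cleaning, feature vectors, scoring against a threshold) can be sketched as follows. This is only an illustration under our own assumptions, not the authors' model: the three features, the stop-word list, the function names, and the fixed `weights` are hypothetical. In a PSO-based system the weights would be found by the swarm search rather than set by hand, and that search is not implemented here.

```python
import string

# Hypothetical stop-word list (an assumption for illustration).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "it"}


def preprocess(sentence):
    """Strip punctuation, lowercase, and drop stop words."""
    cleaned = sentence.lower().translate(str.maketrans("", "", string.punctuation))
    return [w for w in cleaned.split() if w not in STOP_WORDS]


def feature_vector(index, tokens, title_tokens, total_sentences, max_length):
    """Three illustrative features, each scaled to the range [0, 1]."""
    position = 1.0 - index / total_sentences           # earlier sentences score higher
    length = len(tokens) / max_length if max_length else 0.0
    title_overlap = (
        len(set(tokens) & title_tokens) / len(title_tokens) if title_tokens else 0.0
    )
    return [position, length, title_overlap]


def select_sentences(sentences, title, weights, threshold):
    """Keep sentences whose weighted feature score reaches the threshold."""
    tokenized = [preprocess(s) for s in sentences]
    title_tokens = set(preprocess(title))
    max_length = max((len(t) for t in tokenized), default=0)
    selected = []
    for index, tokens in enumerate(tokenized):
        features = feature_vector(index, tokens, title_tokens, len(sentences), max_length)
        # `weights` stands in for values an optimizer such as PSO would tune.
        score = sum(w * f for w, f in zip(weights, features))
        if score >= threshold:
            selected.append(sentences[index])
    return selected
```

A call such as `select_sentences(sentences, title, weights=[0.2, 0.3, 0.5], threshold=0.5)` returns the sentences deemed significant under those placeholder weights.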
Conclusion
Many attempts have been made to automate text summarization. Researchers have attempted to formalize the selection of the important portions of a document so that the reader can read only the highlights without having to spend time on the entire text. The underlying methodology in all of the techniques is extraction, abstraction, or a combination of the two. Text summarization techniques need further research and refinement, and researchers will continue to explore new methodologies and approaches in the future.