1. Introduction And Problematic
Day by day, the body of electronic textual information grows, and it becomes increasingly difficult to reach relevant information without specific tools that give rapid and effective access to the content of texts. Software engineering has matured, so application development is no longer the obstacle it once was, and hardware has advanced as well: today's personal machines are powerful. For these reasons, finding a specific method to access the content of texts has become a necessary task.
A summary is an effective way to represent the content of a text and allows quick access to it. The purpose of automatic summarization is to produce a short text covering the essential content of the source text. “We cannot imagine our daily life without summary,” says Inderjeet Mani [Mani, 2001].
Headlines, the first paragraph of a newspaper article, newsletters, weather reports, tables of sports results, and library catalogues are all summaries. Even in research, authors must accompany their scientific papers with summaries (abstracts) that they write themselves.
Automatic summaries can be used to reduce the time needed to find relevant documents, or to reduce the processing of large texts by identifying key information. The procedure suggested by Luhn, H. P. (1958) rests on the principle that high-frequency words in a document are important words.
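The frequency principle can be illustrated with a minimal sketch. It is a simplification for illustration, not Luhn's full procedure: the stop-word list, the function names, and the choice to score a sentence as the sum of its word frequencies are all assumptions made here.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "on", "it"}

def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def word_frequencies(sentences):
    """Count how often each non-stop word occurs in the whole document."""
    counts = Counter()
    for sentence in sentences:
        counts.update(w for w in tokenize(sentence) if w not in STOP_WORDS)
    return counts

def score_sentence(sentence, frequencies):
    """Score a sentence as the sum of the document frequencies of its words."""
    return sum(frequencies[w] for w in tokenize(sentence) if w not in STOP_WORDS)

# Toy document (assumed data, for illustration only).
sentences = [
    "The cat sat on the mat.",
    "The cat chased the mouse.",
    "Rain is expected tomorrow.",
]
frequencies = word_frequencies(sentences)
scores = [score_sentence(s, frequencies) for s in sentences]
```

In this toy document the word "cat" occurs twice, so the two sentences that contain it score higher than the third.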
The current literature presents three approaches to automatic summarization:
- Automatic summarization by extraction, which relies on three essential techniques: scoring, similarity, or sentence prototype. Edmundson, H. P. (1969) and Van Dijk, T. A. (1985).
- Automatic summarization by understanding, which uses methods of semantic analysis. Salton, G., et al. (1997) and Kintsch, W., & Van Dijk, T. A. (1978).
- Automatic summarization by automatic classification, which uses the method of bi-classification. Litvak, M., & Last, M. (2008, August).
In this paper we work on automatic summarization by extraction, because it is simple to implement and gives good results. In previous work, however, extractive summaries were produced using a single technique at a time: scoring, similarity, or sentence prototype.
Scoring generally gives good results, but its weak point is a limited ability to eliminate similar sentences. If a sentence X passes the scoring filter, a sentence Y that is similar to X will probably obtain a score that lets it pass the filter too, which puts a redundant sentence in the summary, something a summary should not contain. The similarity technique, in contrast, is good at eliminating redundant sentences, but it cannot ensure that the sentences it keeps carry a high weight. Indeed, the longer a sentence is, the more likely it is to have similar sentences, yet long sentences tend to carry more information.
This work aims to apply the two techniques one after the other, so that each covers the weakness of the other and contributes its strength to the overall approach. To assess the impact of this proposal, we tested our approach experimentally and compared its results with those obtained using a single technique.
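The idea of chaining the two techniques can be sketched as follows. This is a minimal illustration under stated assumptions, not the method evaluated in the paper: it assumes scoring is applied first and similarity second, uses Jaccard word overlap as the similarity measure, and takes precomputed sentence scores as input. The function names and both thresholds are hypothetical.

```python
import re

def word_set(sentence):
    """Return the set of lowercase alphabetic words in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))

def jaccard(a, b):
    """Jaccard similarity of two sentences (assumed measure, chosen for simplicity)."""
    wa, wb = word_set(a), word_set(b)
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

def summarize(sentences, scores, min_score, max_similarity):
    """Two-stage extraction: score filter first, then similarity filter."""
    # Stage 1 (scoring): keep only sentences whose score reaches the threshold.
    candidates = [i for i, s in enumerate(scores) if s >= min_score]
    # Stage 2 (similarity): visit candidates from highest to lowest score and
    # drop any that is too similar to a sentence already kept.
    kept = []
    for i in sorted(candidates, key=lambda i: -scores[i]):
        if all(jaccard(sentences[i], sentences[j]) <= max_similarity for j in kept):
            kept.append(i)
    # Return the surviving sentences in their original document order.
    return [sentences[i] for i in sorted(kept)]

# Toy input (assumed data and scores, for illustration only).
sentences = [
    "The cat sat on the mat.",
    "The cat sat on a mat.",
    "Rain is expected tomorrow.",
    "Hello.",
]
scores = [5, 4, 3, 1]
summary = summarize(sentences, scores, min_score=2, max_similarity=0.5)
```

Here the second sentence passes the score filter but is removed by the similarity filter because it nearly duplicates the first, while the last sentence is removed by the score filter alone. Visiting candidates in descending score order means that, of two near-duplicates, the higher-scoring one is kept.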
Automatic summarization appeared early as a field of research in computer science, within NLP (Natural Language Processing). In 1958, H. P. Luhn [Luhn 1958] proposed a first approach to producing automatic abstracts by extracting sentences.
In the early 1960s, H. P. Edmundson and other participants in the TRW (Thompson Ramo Wooldridge Inc) project [Edmundson 1963] proposed a new automatic summarization system that combined several criteria to assess the relevance of the sentences to extract.
This work identified the fundamental issues in automatic summarization, such as the problems caused by building summaries through extraction (redundancy, incompleteness, breaks, etc.), the theoretical inadequacy of relying on statistics, and the difficulty of understanding a text (through semantic analysis) in order to summarize it.