Keyword Extraction

DOI: 10.4018/978-1-7998-3772-5.ch006

Abstract

Keywords are defined as phrases that capture the main topics discussed in a document. As they offer a brief yet precise summary of document content, they can be utilized for various applications. In an IR (information retrieval) environment, they serve as an indication of document relevance for users, as the list of keywords can quickly help to determine whether a given document is relevant to their interest. As keywords reflect a document's main topics, they can be utilized to classify documents into groups by measuring the overlap between the keywords assigned to them. Keywords are also used proactively in information retrieval (i.e., in indexing).

Terminology And Notations Used

The general terminology used in this chapter is briefly summarized in Table 1.

Table 1.
Terminology used for keyword extraction
  Notation | Term | Meaning
  ---------|------|--------
  D | Document | A text document consisting of a set of words
  W | Word | A sequence of non-blank characters
  WL | Word List | A list of meaningful words
  SW | Stop Words | A collection of stop words
  S | Stemmed Word | The stem of the word
  FC | Frequency count of a word | The number of times the word is found in the document
  T | Frequency threshold | User input criteria to find dense words
  N | Document size | Total number of words found in the document
  M | Extracted words size | Total number of words remaining in the document after removing stop words from the N words
  MS | Min Support | T * (N / M)
  DW | Dense Word | A word whose frequency count (FC) in the document is greater than or equal to the Min Support (MS)
  CWx | Candidate Word phrase of length x in document D | A sequence of x Dense Words that could be a Frequent Word phrase for document D
  FWx | Frequent Word phrase of length x in document D | A Candidate Word phrase of length x whose FC is greater than or equal to the Min Support and which can be considered a keyword phrase of length x for document D
  KW[i][j] | Keyword Set | A table (two-dimensional array) of frequent word phrases FWi, i.e., the ith row consists of all frequent word phrases of length i. KW[i][j] represents the jth FWi, the jth frequent word phrase of length i
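
The definitions in Table 1 can be read together as a small procedure: count words, derive the Min Support, keep the dense words, and then grow phrases from them. The following Python sketch is one possible reading of those definitions, not the chapter's own implementation. Several details are assumptions made for illustration only: the function names `tokenize` and `extract_keywords`, the tiny stop-word list, the punctuation stripping, the `max_len` cap on phrase length, and the choice to count phrases over the word list after stop words are removed. Stemming (S) is omitted for brevity.

```python
from collections import Counter

# Assumption: a tiny illustrative stop-word list (SW); the chapter's list is not shown.
STOP_WORDS = {"a", "an", "and", "the", "of", "is", "in", "to", "for", "from"}


def tokenize(text):
    """Split D into words W (sequences of non-blank characters).

    Lowercasing and stripping surrounding punctuation are assumptions.
    """
    words = []
    for raw in text.split():
        word = raw.strip(".,;:!?()[]\"'").lower()
        if word:
            words.append(word)
    return words


def extract_keywords(text, threshold, stop_words=STOP_WORDS, max_len=3):
    """Return (MS, KW), where KW maps phrase length x to the sorted list of FWx.

    `threshold` plays the role of T; `max_len` is a hypothetical cap on x.
    """
    words = tokenize(text)
    n = len(words)                                          # N: document size
    word_list = [w for w in words if w not in stop_words]   # WL
    m = len(word_list)                                      # M: extracted words size
    if m == 0:
        return 0.0, {}

    min_support = threshold * (n / m)                       # MS = T * (N / M)

    # Dense words (DW): FC >= MS. These are also the phrases of length 1.
    fc = Counter(word_list)
    dense = {w for w, count in fc.items() if count >= min_support}
    kw = {}
    if dense:
        kw[1] = sorted((w,) for w in dense)

    # Candidate phrases CWx: runs of x consecutive dense words in WL
    # (assumption: adjacency is measured after stop-word removal).
    for x in range(2, max_len + 1):
        candidates = Counter()
        for i in range(m - x + 1):
            window = tuple(word_list[i:i + x])
            if all(w in dense for w in window):
                candidates[window] += 1
        # Frequent phrases FWx: candidates whose FC >= MS.
        frequent = sorted(p for p, count in candidates.items() if count >= min_support)
        if not frequent:
            break
        kw[x] = frequent

    return min_support, kw


if __name__ == "__main__":
    sample = ("Keyword extraction finds keyword phrases. Keyword extraction "
              "is the task of keyword selection from the text.")
    ms, kw = extract_keywords(sample, threshold=1.0)
    print(ms)
    for length, phrases in kw.items():
        print(length, phrases)
```

For the sample sentence with T = 1, the document has N = 16 words, of which M = 11 remain after stop-word removal, so MS = 16 / 11, roughly 1.45. Only "keyword" (FC = 4) and "extraction" (FC = 2) meet that support, and the only frequent phrase of length 2 is "keyword extraction", which occurs twice.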
