Notation | Term | Meaning |
D | Document | A text document consisting of a set of words |
W | Word | A sequence of non-blank characters |
WL | Word List | A list of meaningful words |
SW | Stop Words | A collection of stop words |
S | Stemmed Word | The stem of the word |
FC | Frequency count of a word | The number of times the word is found in the document |
T | Frequency threshold | User-supplied criterion used to find dense words |
N | Document size | Total number of words found in the document |
M | Extracted words size | Number of words remaining in the document after stop words are removed from the N words |
MS | Min Support | T * (N / M) |
DW | Dense Word | A word whose frequency count (FC) in the document is greater than or equal to the Min Support (MS) |
CWx | Candidate Word phrase of length x in document D | A sequence of x Dense Words that could be a Frequent Word phrase for document D |
FWx | Frequent Word phrase of length x in document D | A Candidate Word phrase of length x whose FC is greater than or equal to the Min Support and which can therefore be considered a keyword phrase of length x for document D |
KW[i][j] | Keyword Set | A table (two-dimensional array) of frequent word phrases FWi, i.e., the ith row consists of all frequent word phrases of length i. KW[i][j] represents the jth FWi, the jth frequent word phrase of length i |
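
To make the relationships between these quantities concrete, the following is a minimal Python sketch of how N, M, MS, DW, FWx, and KW could be computed from the definitions in the table. It is an illustration under stated assumptions, not the authors' implementation: the function names `dense_words` and `keyword_set` are hypothetical, words are lowercased before comparison, stemming (S) is skipped, and a candidate phrase is assumed to be a run of x dense words that are adjacent in the word list WL after stop-word removal.

```python
from collections import Counter


def dense_words(document, stop_words, threshold):
    """Return (WL, DW with counts, MS) following the notation table.

    Assumptions (not from the source): words are lowercased and
    stemming is omitted.
    """
    words = document.split()                 # W: sequences of non-blank characters
    n = len(words)                           # N: document size
    word_list = [w.lower() for w in words if w.lower() not in stop_words]  # WL
    m = len(word_list)                       # M: extracted words size
    if m == 0:
        return word_list, {}, 0.0
    min_support = threshold * (n / m)        # MS = T * (N / M)
    counts = Counter(word_list)              # FC of every word in WL
    dense = {w: c for w, c in counts.items() if c >= min_support}  # DW
    return word_list, dense, min_support


def keyword_set(word_list, dense, min_support, max_len):
    """Build KW as a dict: phrase length x -> sorted list of FWx tuples.

    Assumption (not from the source): a candidate phrase CWx is a run
    of x dense words that are adjacent in WL.
    """
    kw = {1: [(w,) for w in sorted(dense)]}
    for x in range(2, max_len + 1):
        windows = (tuple(word_list[i:i + x])
                   for i in range(len(word_list) - x + 1))
        # CWx: keep only windows made up entirely of dense words
        counts = Counter(p for p in windows if all(w in dense for w in p))
        frequent = sorted(p for p, c in counts.items() if c >= min_support)
        if not frequent:
            break                            # no FWx, so no longer phrases either
        kw[x] = frequent
    return kw


text = "data mining finds patterns in data and data mining finds rules in data"
wl, dw, ms = dense_words(text, stop_words={"in", "and"}, threshold=1.5)
kw = keyword_set(wl, dw, ms, max_len=4)
```

In this example N = 13 and M = 10, so MS = 1.5 * (13 / 10) = 1.95. The dense words are "data" (FC 4), "mining" (FC 2), and "finds" (FC 2), and the longest frequent phrase found is ("data", "mining", "finds"). Storing KW as a dictionary keyed by phrase length is a convenience of this sketch; a list of lists indexed as KW[i][j] would match the table's notation more literally.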