Anatomizing Lexicon With Natural Language Tokenizer Toolkit 3

Simran Kaur Jolly (Manav Rachna International Institute of Research and Studies, India) and Rashmi Agrawal (Manav Rachna International Institute of Research and Studies, India)
Copyright: © 2019 |Pages: 35
DOI: 10.4018/978-1-5225-6117-0.ch011


NLTK (the Natural Language Toolkit) is an API platform built in the Python language for processing natural (human) language. NLTK was created in 2001 as part of a computational linguistics course at the University of Pennsylvania, and it has been tested and developed ever since. The very first version of NLTK (1.4.3) was released in 2005 and was compatible with Python 2.4. The latest version at the time of writing, NLTK 3.2.5, was released in September 2017 and incorporated features such as Arabic stemmers, NIST evaluation, the MOSES tokenizer, the Stanford segmenter, the treebank detokenizer, VerbNet, and VADER. The important packages of this system are 1) corpus builder, 2) tokenizer, 3) collocation, 4) tagging, 5) parsing, 6) metrics, and 7) probability distribution system. NLTK was built to meet four primary requirements: 1) Simplicity: a substantive framework of building blocks; 2) Consistency: a consistent interface; 3) Extensibility: it can be easily scaled; and 4) Modularity: all modules are independent of each other.
Chapter Preview

Installing Python And NLTK

To install Python 3.4, go to the Python download page and install version 3.4 (Figure 1).

Figure 1. Installation of Python 3.4


After installing Python 3.4, install version 3.0 of the NLTK toolkit.

  1. First, install NLTK (Natural Language Toolkit): pip install nltk

  2. Optionally, install the NumPy package if it is needed: pip install numpy

  3. To test the installation, open the Python GUI and enter the command:

     >>> import nltk

     This loads the installed nltk package (pip installs it from a wheel, or .whl, file). If no error is reported, the installation succeeded.
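The check in step 3 can also be scripted. The following is a minimal sketch, not part of the original chapter: the helper name `is_installed` is hypothetical, and it relies only on the standard-library function `importlib.util.find_spec`, which locates a top-level package without importing it.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be located on the import path."""
    # find_spec returns None when no such top-level package exists.
    return importlib.util.find_spec(package_name) is not None


# Report the status of the two packages named in the steps above.
for name in ("nltk", "numpy"):
    if is_installed(name):
        print("{}: installed".format(name))
    else:
        print("{}: missing (try: pip install {})".format(name, name))
```

Running this after the pip commands should report both packages as installed. A "missing" line usually means pip installed into a different Python interpreter than the one running the script.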

Two versions of Python are in use, Python 2.7 and Python 3.4, and they are largely incompatible with each other. The Python 3.y versions are more coherent and more consistent, and a user-friendly GUI is provided. Instructions written for Python 2.x may not run in Python 3.y, and if they do run, the output of the code may differ between the two versions. Not all organizations have updated to Python 3.y; some still rely on Python 2.x because their existing services depend on it and it has an established track record.
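A few well-known differences illustrate the kind of incompatibility meant here. The snippet below is an illustrative sketch written for Python 3 and not taken from the chapter. The comments describe how Python 2 would behave on the same expressions.

```python
# Written for Python 3; comments note the corresponding Python 2 behaviour.

# Division: "/" is true division in Python 3, but integer division
# for two ints in Python 2 (where 7 / 2 evaluates to 3).
true_quotient = 7 / 2
floor_quotient = 7 // 2  # floor division gives 3 in both versions

# print is a function in Python 3; the Python 2 statement form
# (print "hello") is a syntax error here.
print(true_quotient, floor_quotient)

# Strings are Unicode text in Python 3, and bytes are a separate type.
text = "na\u00efve"            # five characters
encoded = text.encode("utf-8")  # six bytes, since the accented letter needs two
print(type(text).__name__, len(text))
print(type(encoded).__name__, len(encoded))
```

Differences like these are why the same script can run under both versions and still produce different output.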


Installing Python 3.4 On Windows Systems

There are two variants of Python 3.4 for Windows: a 32-bit version and a 64-bit version. The 64-bit version requires a 64-bit Windows computer. Fortunately, most Windows PCs sold over the past few years are 64-bit. The 32-bit version of Python, however, can run on both 64-bit and 32-bit Windows PCs.

This chapter uses the 32-bit version of Python on Windows 8, and the reader should do the same. The reason is that the official release of NumPy (numerical Python) is currently available for Windows only in 32-bit format.
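To confirm which variant is actually installed, the interpreter can report its own word size. This is a small sketch using only the standard library. The function name `interpreter_bits` is a hypothetical helper, not something from the chapter.

```python
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running Python build (32 or 64)."""
    # struct.calcsize("P") is the size in bytes of a C pointer for this build.
    return struct.calcsize("P") * 8


print("Python", sys.version.split()[0], "is a", interpreter_bits(), "bit build")
```

A 32-bit Python installed on 64-bit Windows reports 32 here, because the value describes the interpreter build and not the operating system.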

To install the correct version of Python 3.4, click the hyperlink python-3.4.1.msi and download the installer file, or browse to the Python download page and download it from there.

Double-click the file python-3.4.1.msi to start the installation. The following dialog box appears on the screen (Figure 2).

Figure 2. Python Installation Completed


If a dialog box resembling the one below then appears for any version of Python 3, select Remove Python for that version.

Removing Python will take several minutes and may require confirmation in one or more additional dialog boxes.

After the previous version of Python has been removed, click Finish and start over. After clicking Next, the user should see a dialog box resembling the following:
