using NLTK

I want to build a project that uses the NLTK module (pos_tag), but I get the following error when calling nltk.pos_tag():

Resource 'taggers/maxent_treebank_pos_tagger/english.pickle' not found. Please use the NLTK Downloader to obtain the resource:

nltk.download()

I tried the NLTK downloader, but I don't know which package nltk.pos_tag() needs. Using nltk.download('all') doesn't solve the problem either, since I'm on a beginner free account, which only has 512 MB of disk space. Any solution? Thanks

That's very odd. I just created a simple Bottle app with a view like this:

from bottle import route
import nltk

@route('/')
def hello_world():
    words = " this is a test sentence"
    return str(nltk.word_tokenize(words))

...and it returns ['this', 'is', 'a', 'test', 'sentence'], as you'd expect. Could you give a bit more detail about what exactly you're doing in your Django view?

I just realized that the problem is nltk.pos_tag(), not nltk.word_tokenize(). I probably forgot to reload the web app, so the nltk.pos_tag() call had not actually been commented out yet. I can now display the result using word_tokenize, but nltk.pos_tag() still fails. Here is the updated view of my application:

# Create your views here.
from django.shortcuts import render
from django import forms

import nltk

class DiscourseForm(forms.Form):
    Input_discourse= forms.CharField(widget= forms.Textarea)

def pos_tagger(myText):
    tempText= nltk.word_tokenize(myText)
    tokens= nltk.pos_tag(tempText)
    return tokens

def discourse_req(request):
    if request.method=='POST':
        form= DiscourseForm(request.POST)
        if form.is_valid():
            txt= form.cleaned_data['Input_discourse']
            txt_pos_tag=pos_tagger(txt)
            context= {'result': txt_pos_tag}
            return render(request, 'result.html', context)
    else:
        form= DiscourseForm()
    # Fall through for GET requests and invalid POSTs so the view never returns None.
    return render(request, 'discourse.html',{'form': form})

Here is the error message:

LookupError at /discourse/result/

**********************************************************************
  Resource 'taggers/maxent_treebank_pos_tagger/english.pickle' not
  found.  Please use the NLTK Downloader to obtain the resource:
  >>> nltk.download()
  Searched in:
    - '/home/salacceovanz/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************

Request Method:     POST
Request URL:    http://salacceovanz.pythonanywhere.com/discourse/result/
Django Version:     1.3.7
Exception Type:     LookupError
Exception Value:

**********************************************************************
  Resource 'taggers/maxent_treebank_pos_tagger/english.pickle' not
  found.  Please use the NLTK Downloader to obtain the resource:
  >>> nltk.download()
  Searched in:
    - '/home/salacceovanz/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************

Exception Location:     /usr/local/lib/python2.7/dist-packages/nltk/data.py in find, line 467
Python Executable:  /usr/local/bin/uwsgi
Python Version:     2.7.5
Python Path:

['/var/www',
 '.',
 '',
 '/usr/local/lib/python2.7/dist-packages/setuptools-5.4.2-py2.7.egg',
 '/usr/local/lib/python2.7/dist-packages/snappy-2.2-py2.7-linux-x86_64.egg',
 '/usr/local/lib/python2.7/dist-packages/cypari-1.1-py2.7-linux-x86_64.egg',
 '/usr/local/lib/python2.7/dist-packages/spherogram-1.3-py2.7-linux-x86_64.egg',
 '/usr/local/lib/python2.7/dist-packages/pypng-0.0.17-py2.7.egg',
 '/usr/local/lib/python2.7/dist-packages/plink-1.7-py2.7.egg',
 '/usr/local/lib/python2.7/dist-packages/decorator-3.4.0-py2.7.egg',
 '/usr/local/lib/python2.7/dist-packages/Orange_Text-1.2a1-py2.7-linux-x86_64.egg',
 '/usr/local/lib/python2.7/dist-packages/Orange-2.7.5-py2.7-linux-x86_64.egg',
 '/usr/local/lib/python2.7/dist-packages/matplotlib-1.3.1-py2.7-linux-x86_64.egg',
 '/usr/local/lib/python2.7/dist-packages/backports.ssl_match_hostname-3.4.0.2-py2.7.egg',
 '/usr/local/lib/python2.7/dist-packages/certifi-14.05.14-py2.7.egg',
 '/usr/local/lib/python2.7/dist-packages/pyhdf-0.8.3-py2.7-linux-x86_64.egg',
 '/usr/local/lib/python2.7/dist-packages/texcaller-0-py2.7-linux-x86_64.egg',
 '/usr/local/lib/python2.7/dist-packages',
 '/var/www',
 '/usr/lib/python2.7',
 '/usr/lib/python2.7/plat-x86_64-linux-gnu',
 '/usr/lib/python2.7/lib-tk',
 '/usr/lib/python2.7/lib-old',
 '/usr/lib/python2.7/lib-dynload',
 '/usr/local/lib/python2.7/dist-packages/PIL',
 '/usr/lib/python2.7/dist-packages',
 '/usr/lib/pymodules/python2.7',
 '/home/salacceovanz']

Server time:    Fri, 26 Sep 2014 09:16:41 -0500

The only thing I can think of is that it has something to do with the actual value that you're passing in to word_tokenize. So the tests that you're doing on the console don't need the resource, but whatever is being passed in through the web app needs to load the tokenizer.

I have edited my post. Sorry, I just found that the error was caused by nltk.pos_tag(), not nltk.word_tokenize(). nltk.pos_tag() is the call that raises the LookupError. Any solution? Thanks for the feedback.

I downloaded the resource called maxent_treebank_pos_tagger using nltk.download() and can now use nltk.pos_tag() successfully in my project. Problem solved, thanks for your help. :)
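
For anyone who hits the same LookupError on a small disk quota, here is a rough sketch of fetching only the missing package instead of 'all'. It assumes the package id is the second component of the resource path shown in the error (true for the path in this thread, but treat it as a heuristic). The helper names package_id_from_resource and ensure_resource are made up for illustration; nltk.data.find() and nltk.download() are the real NLTK calls. Newer NLTK releases may ask for a different tagger resource, so use whatever path your own error message prints.

```python
def package_id_from_resource(resource_path):
    """Guess the downloader package id from an NLTK resource path.

    Assumption: paths look like '<category>/<package id>/<file>', as in
    'taggers/maxent_treebank_pos_tagger/english.pickle'.
    """
    parts = resource_path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError("unexpected resource path: %r" % resource_path)
    return parts[1]


def ensure_resource(resource_path):
    """Download a single NLTK package only if it is not already installed."""
    import nltk  # imported lazily so the helper above works without NLTK

    try:
        # Raises LookupError when the resource is not on nltk.data.path
        nltk.data.find(resource_path)
    except LookupError:
        nltk.download(package_id_from_resource(resource_path))


# Hypothetical usage, e.g. once from a console before reloading the web app:
# ensure_resource('taggers/maxent_treebank_pos_tagger/english.pickle')
```

Running ensure_resource() once from a console under the same account should be enough, since '/home/<username>/nltk_data' is one of the directories listed under "Searched in:" in the error above.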