Local file path

I'm trying to get text processing working with NLTK. I have a simple program running that reads a text file from the web, but I can't figure out how to point it at one of my own files. I've looked at the PA advice about setting a path to one of my own files, but if I start the address with /home I get "ValueError: unknown url type:". I think that's because it doesn't start with http?

Here's the code. It works fine if the URL is, say, a Project Gutenberg text file address. How can I substitute one of my uploaded files?

#!/usr/bin/python2.7

import nltk, re, pprint
from nltk import word_tokenize
import urllib2

response = urllib2.urlopen("/home/siopold/nltk_playing/amrampuns.txt")
raw = response.read().decode("utf-8-sig").encode("utf-8")

print type(raw)
print raw[:74]

tokens = word_tokenize(raw)
print type(tokens)
print tokens[:10]
text = nltk.Text(tokens)

print text.concordance("pasta")

urllib is a library for accessing URLs, not files. You need to pass it a valid URL (one with a scheme such as http://), not a bare file path, which is why you're seeing "unknown url type".

Use the standard Python open() function to open the file.
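
Something along these lines might work as a replacement for the urlopen/read/decode lines. It's only a minimal sketch: the helper name read_local_text is made up for illustration, and it uses io.open (rather than the plain built-in open) on the assumption that you still want the "utf-8-sig" decoding from your original code, since io.open accepts an encoding argument on both Python 2.7 and Python 3.

```python
import io


def read_local_text(path):
    """Return the contents of a local text file as unicode text.

    Hypothetical helper, not part of NLTK or the standard library.
    The "utf-8-sig" codec decodes UTF-8 and drops a leading BOM if present.
    """
    with io.open(path, encoding="utf-8-sig") as f:
        return f.read()
```

You would then call raw = read_local_text("/home/siopold/nltk_playing/amrampuns.txt") and pass raw to word_tokenize as before. Note that raw is unicode text here rather than the UTF-8 byte string your original code produced, which I'd expect word_tokenize to handle fine, but I haven't tried it with your file.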

Awesome. Thanks.