Text file with all English words and their part of speech

  • Thread starter: Superposed_Cat
  • Tags: English, File, Text
AI Thread Summary
The discussion centers on finding a comprehensive text file containing all English words along with their parts of speech for natural language processing (NLP) purposes. It highlights that while there are over a million words in English, most speakers use a limited vocabulary of 6,000 to 20,000 words. Suggested resources include the Rantionary dictionary, which provides pronunciation, and WordNet, a large lexical database that groups words into synsets based on meaning. The Brown Corpus is also recommended as a well-known resource for part-of-speech tagging. Additionally, it is noted that there is no consensus among computational linguists on a part-of-speech inventory, and depending on the NLP task, this data may not be necessary. The Python Natural Language Toolkit (NLTK) is mentioned as a valuable library that includes these resources.
Superposed_Cat
Messages: 388 · Reaction score: 5
Hey all, I've been wanting to get into NLP (natural language processing), but I need a text file listing all English words (not their definitions) with a tag indicating each word's part of speech. I know such a file exists because I had one on my old laptop, but I can't seem to find it again. Any help appreciated.
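(For what it's worth, the files I've seen like this are just one entry per line. A minimal sketch of loading such a file, assuming a hypothetical tab-separated `word<TAB>POS` layout — the actual format of the file I lost may differ:)

```python
# Sketch: load a word/part-of-speech list into a dict.
# Assumes (hypothetically) one entry per line, tab-separated: "word<TAB>POS".
import io

SAMPLE = "cat\tnoun\nrun\tverb\nquickly\tadverb\n"

def load_pos_file(handle):
    """Map each word to its part-of-speech tag."""
    pos = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word, tag = line.split("\t", 1)
        pos[word] = tag
    return pos

tags = load_pos_file(io.StringIO(SAMPLE))
print(tags["run"])  # verb
```

Note this keeps only one tag per word; a real list would need multiple tags for words like "run" (noun and verb).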
 
Superposed_Cat said:
Hey all, I've been wanting to get into NLP (natural language processing), but I need a text file listing all English words (not their definitions) with a tag indicating each word's part of speech. I know such a file exists because I had one on my old laptop, but I can't seem to find it again. Any help appreciated.
ALL the words in English? That's going to be one hell of a file, and mostly useless. Of the 1,000,000+ words in English (depending on whom you believe), an average speaker has a vocabulary of about 6,000 to 8,000 words, and even a highly educated one has under 20,000 — so highly educated English speakers actively use less than 2% of the words in the language (with "receptive" knowledge of perhaps another 1%). I suspect your list probably had 20,000 to 30,000 words, not "all" the words in English.
 
I won't be able to help you find your file, but if you want a dictionary file to work with, https://github.com/TheBerkin/Rantionary/blob/master/Prepositions.dic is one. It includes pronunciation as well.
 
http://wordnet.princeton.edu/
WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations

These guys are often used as corpora for natural language, and their database is downloadable (free). Python NLTK uses this, as do a lot of other NLP libraries.
 
You might want to search for the 'Brown Corpus', one of the earliest and best-known corpora tagged with parts of speech. I don't think any two groups of computational linguists agree on a part-of-speech inventory, and depending on what you're doing, you may not even need part-of-speech data at all.
 
http://www.nltk.org/nltk_data/

That's the complete list of data sources used by the Python Natural Language Toolkit (NLTK). WordNet and the Brown Corpus are in there, along with others. It's quite a good library.
 