Text file with all English words and their part of speech

  • Thread starter: Superposed_Cat
  • Tags: English, File, Text
AI Thread Summary
The discussion centers on finding a comprehensive text file containing all English words along with their parts of speech for natural language processing (NLP) purposes. It highlights that while there are over a million words in English, most speakers use a limited vocabulary of 6,000 to 20,000 words. Suggested resources include the Rantionary dictionary, which provides pronunciation, and WordNet, a large lexical database that groups words into synsets based on meaning. The Brown Corpus is also recommended as a well-known resource for part-of-speech tagging. Additionally, it is noted that there is no consensus among computational linguists on a part-of-speech inventory, and depending on the NLP task, this data may not be necessary. The Python Natural Language Toolkit (NLTK) is mentioned as a valuable library that includes these resources.
Superposed_Cat
Messages: 388 · Reaction score: 5
Hey all, I've been wanting to get into NLP (natural language processing), but I need a text file listing all English words (not their definitions) with a tag indicating each word's part of speech. I know such a file exists because I had one on my old laptop, but I can't seem to find it again. Any help appreciated.
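(For what it's worth, the files I've seen like this are just one entry per line. A minimal sketch of loading such a file, assuming a hypothetical tab-separated `word<TAB>POS` layout — the actual format of the file I lost may differ:)

```python
# Sketch: load a word/part-of-speech list into a dict.
# Assumes (hypothetically) one entry per line, tab-separated: "word<TAB>POS".
import io

SAMPLE = "cat\tnoun\nrun\tverb\nquickly\tadverb\n"

def load_pos_file(handle):
    """Map each word to its part-of-speech tag."""
    pos = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word, tag = line.split("\t", 1)
        pos[word] = tag
    return pos

tags = load_pos_file(io.StringIO(SAMPLE))
print(tags["run"])  # verb
```

Note this keeps only one tag per word; a real list would need multiple tags for words like "run" (noun and verb).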
 
Superposed_Cat said:
Hey all, I've been wanting to get into NLP (natural language processing), but I need a text file listing all English words (not their definitions) with a tag indicating each word's part of speech. I know such a file exists because I had one on my old laptop, but I can't seem to find it again. Any help appreciated.
ALL the words in English? That's going to be one hell of a file, and mostly useless. Of the 1,000,000+ words in English (depending on whom you believe), an average speaker has a vocabulary of about 6,000 to 8,000 words, and even a highly educated one has under 20,000 — so highly educated English speakers actively use less than 2% of the words in the language (with "receptive" knowledge of perhaps another 1%). I suspect your list probably had 20,000 to 30,000 words, not "all" the words in English.
 
I won't be able to help you find your file, but if you want a dictionary file to work with, https://github.com/TheBerkin/Rantionary/blob/master/Prepositions.dic is one. It includes pronunciation as well.
 
http://wordnet.princeton.edu/
WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations

These guys are often used as corpora for natural language, and their database is downloadable (free). Python NLTK uses this, as do a lot of other NLP libraries.
 
You might want to search for the 'Brown Corpus', one of the earliest and best-known corpora tagged with parts of speech. I don't think any two groups of computational linguists agree on a part-of-speech inventory, and depending on what you're doing, you may not even need part-of-speech data at all.
 
http://www.nltk.org/nltk_data/

That's the complete list of data sources used by the Python Natural Language Toolkit (NLTK). WordNet and the Brown Corpus are in there, along with others. It's quite a good library.
 