Calculating Word Lengths in Fiction & Non-Fiction Books

  • Thread starter: Natasha1
  • Tags: Books, Fiction
SUMMARY

This discussion focuses on calculating word lengths in fiction and non-fiction books using statistical methods. The original question proposes samples of 100,000 words for fiction and 80,000 for non-fiction, from which to find the mean, median, and modal word length and the standard deviation. The full assignment turns out to ask only for a reasonable sample size, and a reply suggests about 40-50 randomly chosen words per book. Replies mention a C program or a Tcl script, or a tool like Excel, for the calculations. A random-sampling technique is also proposed to avoid bias toward longer words.

PREREQUISITES
  • Basic understanding of statistical concepts such as mean, median, mode, and standard deviation.
  • Familiarity with programming in C or Tcl for automated calculations.
  • Proficiency in using Excel for statistical functions and data analysis.
  • Knowledge of random sampling techniques to ensure unbiased data collection.
NEXT STEPS
  • Learn how to implement statistical calculations in C programming.
  • Explore Tcl scripting for data analysis tasks.
  • Study Excel functions for calculating mean, median, mode, and standard deviation.
  • Research random sampling methods to improve data collection accuracy.
USEFUL FOR

This discussion is beneficial for data analysts, educators, writers, and anyone interested in understanding word usage patterns in literature through statistical analysis.

Natasha1
Take two books, of different authors, one fiction, one non-fiction. Choose a reasonable sample size of words from each (say 100 000 words for the fiction one and 80 000 words for the non-fiction one) and find the mean, median, modal word-length in each and standard deviation.
 
Pretty easy. Just write a C program, or even just a Tcl script.
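
For anyone wondering what such a script might look like, here is a minimal sketch. It is written in Python rather than the C or Tcl mentioned above, purely for brevity. The function names `word_lengths` and `summarize` are made up for this illustration, and the definition of a "word" (a run of letters, with hyphenated words split into parts) is an assumption you may want to change.

```python
import re
import statistics

def word_lengths(text):
    """Return the length of every word in text.

    Assumption: a word is a run of letters, optionally with internal
    apostrophes, so "don't" is one word of length 5 (apostrophe counted)
    and "non-fiction" is split into "non" and "fiction".
    """
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    return [len(w) for w in words]

def summarize(lengths):
    """Mean, median, mode(s) and sample standard deviation of word lengths."""
    return {
        "n": len(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        # multimode returns every length tied for most frequent
        "modes": statistics.multimode(lengths),
        # stdev uses the n - 1 (sample) formula
        "stdev": statistics.stdev(lengths),
    }

# Tiny demonstration on one sentence; a real run would read the book's text.
sample = "Take two books, of different authors, one fiction, one non-fiction."
print(summarize(word_lengths(sample)))
```

To use it on a whole book you would read the text from a file and pass it to `word_lengths`, which is where a sample of 100,000 words stops being a problem.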
 
berkeman said:
Pretty easy. Just write a C program, or even just a Tcl script.

I don't know anything about programming, I'm afraid.
 
Well how in the world are you supposed to calculate those stats? By hand?!

Maybe what they are asking is how few words can you use as your sample size in order to get those stats to within some amount of error?

I guess you could do it in Excel if you had to... mighty big spreadsheet, though.
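
On the "how few words" question, one rough way to get a feel for it is the usual normal-approximation formula for the mean, n ≥ (z·s/E)², where s is a guess at the standard deviation of word length and E is the error you will tolerate. The sketch below is only that: the function name is invented here, the 2.5-letter standard deviation is a placeholder rather than a measured value, and the formula covers the mean only, not the median or mode.

```python
import math

def sample_size_for_mean(stdev_guess, margin, z=1.96):
    """Smallest n such that z * stdev_guess / sqrt(n) <= margin.

    z = 1.96 corresponds to roughly 95% confidence under a normal
    approximation; stdev_guess is an assumption supplied by the caller.
    """
    return math.ceil((z * stdev_guess / margin) ** 2)

# Placeholder guess of 2.5 letters for the standard deviation of word length.
for margin in (0.5, 0.25):
    print(margin, sample_size_for_mean(2.5, margin))
```

Under those assumptions, halving the tolerated error roughly quadruples the number of words you need.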
 
berkeman said:
Well how in the world are you supposed to calculate those stats? By hand?!

Maybe what they are asking is how few words can you use as your sample size in order to get those stats to within some amount of error?

I guess you could do it in Excel if you had to... mighty big spreadsheet, though.

OK, this is how the whole question is asked:

Any help would be much appreciated. Thanks in advance :-)

Would you expect word-length in general to differ in fiction and non-fiction books?

Take two books, of different authors, one fiction, one non-fiction. Choose a reasonable sample size of words from each, and find the mean, median, modal word-length in each and standard deviation. Make it clear how you have done the various calculations without presenting detailed arithmetic.

In the light of the figures found, comment on the initial question (no formal inferential work needed) just an informed view from the figures found.
 
Well, it obviously depends a lot on the books (some non-fiction books will by their nature use bigger words). But now that the sample size is no longer on the order of 100,000 words, the task becomes much more doable by hand.

Do you know how to calculate those statistics? Do you have a statistics calculator? If not, do you know how to use these functions in Excel? If not, do you have access to Excel? (just use the Help feature to show you how to enter the functions)
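
If you tally the sample by hand (how many words of length 1, 2, 3, and so on), all four statistics can be read off the tally. The sketch below shows one way to do that; the function name and the example tally are invented for illustration and are not data from any book.

```python
import math

def stats_from_tally(tally):
    """tally maps word length -> number of sampled words with that length."""
    n = sum(tally.values())

    # Mean: sum of (length * count) divided by the number of words.
    mean = sum(length * count for length, count in tally.items()) / n

    # Mode(s): the length(s) with the largest count.
    top = max(tally.values())
    modes = sorted(length for length, count in tally.items() if count == top)

    # Median: walk the cumulative counts until the middle rank is reached.
    ordered = sorted(tally.items())

    def value_at(rank):
        """Length of the word at a 1-based rank in the sorted sample."""
        running = 0
        for length, count in ordered:
            running += count
            if running >= rank:
                return length

    if n % 2 == 1:
        median = value_at((n + 1) // 2)
    else:
        median = (value_at(n // 2) + value_at(n // 2 + 1)) / 2

    # Sample standard deviation (divide by n - 1).
    variance = sum(count * (length - mean) ** 2
                   for length, count in tally.items()) / (n - 1)
    return {"n": n, "mean": mean, "median": median,
            "modes": modes, "stdev": math.sqrt(variance)}

# Made-up tally: 3 two-letter words, 4 three-letter words, and so on.
print(stats_from_tally({2: 3, 3: 4, 4: 2, 5: 1}))
```

The same layout (one row per word length, one column for the count) is what you would set up in a spreadsheet or on paper.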

BTW, to get a good statistical sample without having to enter too many word sizes into your Excel spreadsheet, I'd use the close-the-eyes, flip-open-randomly, and poke-a-word technique for choosing about 40-50 words randomly from each book.
 
Well, flip it open randomly and instead of poking a word choose say the fifth word. If you poke a word you would be biased towards selecting longer words.
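
To see why poking biases the sample, here is a small sketch under one simplifying assumption: a finger lands on a word with probability proportional to its letter count, whereas picking by position (say, the fifth word) treats every word equally. The function names and the example line of text are made up for the illustration.

```python
from fractions import Fraction

def mean_length_by_position(words):
    """Expected word length when every word is equally likely to be chosen."""
    return Fraction(sum(len(w) for w in words), len(words))

def mean_length_poked(words):
    """Expected word length when a word is chosen with probability
    proportional to its length (the assumed model of poking)."""
    total_letters = sum(len(w) for w in words)
    return Fraction(sum(len(w) ** 2 for w in words), total_letters)

line = "the cat sat on an extraordinarily comfortable mat".split()
print(float(mean_length_by_position(line)))
print(float(mean_length_poked(line)))
```

Under this model the poked average can never be below the by-position average, and it is strictly larger whenever the words are not all the same length.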
 
0rthodontist said:
Well, flip it open randomly and instead of poking a word choose say the fifth word. If you poke a word you would be biased towards selecting longer words.

I don't really get it. What am I supposed to do exactly? Can someone just start the problem for me to set me in the right direction :-)
 
