Calculating Word Lengths in Fiction & Non-Fiction Books

Natasha1 · May 10, 2006

Take two books by different authors, one fiction and one non-fiction. Choose a reasonable sample size of words from each (say 100 000 words for the fiction one and 80 000 words for the non-fiction one) and find the mean, median, and modal word length, and the standard deviation, for each.

berkeman · May 10, 2006

Pretty easy. Just write a C program, or even just a Tcl script.
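
As a rough illustration of what such a program might look like, here is a minimal Python sketch (Python rather than C or Tcl, purely for brevity). The helper names `word_lengths` and `summarize` are made up for this example, and the definition of a "word" (a run of letters, with apostrophes not counted) is an assumption; the assignment does not say how to treat hyphens, numbers, or contractions.

```python
import re
import statistics

def word_lengths(text):
    """Return the letter count of each word in text.

    Assumption: a word is a run of letters, optionally followed by an
    apostrophe and more letters (e.g. "don't"); the apostrophe is not counted.
    """
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    return [len(w.replace("'", "")) for w in words]

def summarize(lengths):
    """Compute the four statistics the question asks for."""
    return {
        "n": len(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        # multimode returns every value tied for most frequent (Python 3.8+)
        "modes": statistics.multimode(lengths),
        # stdev is the sample standard deviation (divides by n - 1);
        # use statistics.pstdev for the population version
        "stdev": statistics.stdev(lengths),
    }

sample = "The quick brown fox jumps over the lazy dog"
stats = summarize(word_lengths(sample))
print(stats)
```

For a real book, `sample` would be replaced by the text read from a file. Whether the sample or the population standard deviation is wanted depends on how the course defines it.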

Natasha1 · May 10, 2006

berkeman said:

Pretty easy. Just write a C program, or even just a Tcl script.

I don't know anything about programming, I'm afraid.

berkeman · May 10, 2006

Well how in the world are you supposed to calculate those stats? By hand?!

Maybe what they are asking is how few words can you use as your sample size in order to get those stats to within some amount of error?

I guess you could do it in Excel if you had to... mighty big spreadsheet, though.

Natasha1 · May 10, 2006

berkeman said:

Well how in the world are you supposed to calculate those stats? By hand?!

Maybe what they are asking is how few words can you use as your sample size in order to get those stats to within some amount of error?

I guess you could do it in Excel if you had to... mighty big spreadsheet, though.

OK, this is how the whole question is asked:

Any help would be much appreciated. Thanks in advance :-)

Would you expect word-length in general to differ in fiction and non-fiction books?

Take two books by different authors, one fiction and one non-fiction. Choose a reasonable sample size of words from each, and find the mean, median, and modal word length, and the standard deviation, for each. Make it clear how you have done the various calculations without presenting detailed arithmetic.

In the light of the figures found, comment on the initial question. No formal inferential work is needed, just an informed view based on the figures found.

berkeman · May 10, 2006

Well, it obviously depends a lot on the books (some non-fiction books will by their nature use bigger words). But now that the sample size no longer has to be on the order of 100,000 words, the task becomes much more doable by hand.

Do you know how to calculate those statistics? Do you have a statistics calculator? If not, do you know how to use these functions in Excel? If not, do you have access to Excel? (just use the Help feature to show you how to enter the functions)

BTW, to get a good statistical sample without having to enter too many word sizes into your Excel spreadsheet, I'd use the close-the-eyes, flip-open-randomly, and poke-a-word technique for choosing about 40-50 words randomly from each book.
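
To get a feel for how precise a sample of 40-50 words might be, one common rule of thumb is the standard error of the mean, s / sqrt(n), which assumes the words are drawn independently and at random. The sketch below is only illustrative: the function name `standard_error` is made up here, and the standard deviation of 2.5 letters is a hypothetical placeholder, not a figure measured from any book.

```python
import math

def standard_error(stdev, n):
    """Standard error of the sample mean for n independent draws."""
    return stdev / math.sqrt(n)

# Hypothetical value for illustration only; replace it with the
# standard deviation actually computed from your own sample.
assumed_stdev = 2.5

for n in (25, 50, 100):
    print(n, round(standard_error(assumed_stdev, n), 3))
```

Because the error shrinks with the square root of n, quadrupling the sample size only halves the uncertainty in the mean, which is why a few dozen words can already give a usable estimate.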

0rthodontist · May 10, 2006

Well, flip it open randomly and, instead of poking a word, choose, say, the fifth word. If you poke a word you would be biased towards selecting longer words.
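
The bias can be made concrete with a small Python sketch. It rests on a simplifying assumption that is not stated in the thread: a blind poke lands on a word with probability proportional to the space it occupies, taken here to be its letter count. Under that assumption the expected length of a poked word is sum(L^2) / sum(L), compared with the ordinary mean sum(L) / n when every word is equally likely. The function names and the toy list of lengths are invented for the example.

```python
def mean_length_uniform(lengths):
    """Expected length when every word is equally likely to be chosen,
    e.g. by always taking the fifth word on a random page."""
    return sum(lengths) / len(lengths)

def mean_length_poked(lengths):
    """Expected length when the chance of hitting a word is proportional
    to its length (the assumed model of a blind poke)."""
    return sum(n * n for n in lengths) / sum(lengths)

# Toy page with three short words and one long one.
lengths = [1, 2, 3, 10]

print(mean_length_uniform(lengths))  # 4.0
print(mean_length_poked(lengths))    # 7.125
```

In this toy case the long word takes up 10 of the 16 letters on the page, so poking picks it far more often than one time in four, and the estimated mean word length is inflated.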

Natasha1 · May 10, 2006

0rthodontist said:

Well, flip it open randomly and, instead of poking a word, choose, say, the fifth word. If you poke a word you would be biased towards selecting longer words.

I don't really get it. What am I supposed to do exactly? Can someone just start the problem for me to set me in the right direction? :-)
