1. May 10, 2006

### Natasha1

Take two books by different authors, one fiction and one non-fiction. Choose a reasonable sample size of words from each (say 100 000 words for the fiction one and 80 000 words for the non-fiction one) and find the mean, median and modal word length, and the standard deviation, for each.
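
For reference, these are the standard definitions of the four statistics. Write $x_1, \dots, x_n$ for the lengths (in letters) of the $n$ sampled words and $x_{(1)} \le \dots \le x_{(n)}$ for the same lengths sorted. For hand work it is usually easier to tally $f_k$, the number of sampled words with exactly $k$ letters, and use the grouped forms on the right:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{\sum_k k\,f_k}{\sum_k f_k}, \qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2} = \sqrt{\frac{\sum_k f_k\,(k-\bar{x})^2}{n-1}}$$

$$\text{median} = \begin{cases} x_{((n+1)/2)} & n \text{ odd} \\ \tfrac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right) & n \text{ even} \end{cases}, \qquad \text{mode} = \arg\max_k f_k$$

Some textbooks and calculators divide by $n$ rather than $n-1$ in the standard deviation; with a sample of any reasonable size the difference is small, but it is worth stating which one was used.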

2. May 10, 2006

### Staff: Mentor

Pretty easy. Just write a C program, or even just a Tcl script.
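
As a rough illustration of what such a program might look like, here is a short Python sketch (Python rather than C or Tcl, purely for brevity). It assumes the book is available as a plain-text file and that a "word" is a run of English letters, possibly joined by apostrophes; both the regular expression and the file name in the usage comment are illustrative assumptions, not part of the question.

```python
import re
import statistics
import sys

# Assumption: a "word" is a run of A-Z letters, optionally joined by
# apostrophes (e.g. "don't"); digits and punctuation are ignored.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def word_lengths(text):
    """Return the number of letters in each word of text (apostrophes not counted)."""
    return [sum(ch.isalpha() for ch in w) for w in WORD_RE.findall(text)]


def summarize(lengths):
    """Mean, median, mode(s) and sample standard deviation of word lengths."""
    return {
        "n": len(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        "modes": statistics.multimode(lengths),  # a list, in case of ties
        "stdev": statistics.stdev(lengths),      # divides by n - 1; needs n >= 2
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python wordstats.py book.txt   (hypothetical file name)
    with open(sys.argv[1], encoding="utf-8") as f:
        for name, value in summarize(word_lengths(f.read())).items():
            print(f"{name}: {value}")
```

Running it once on each book's text gives the figures to compare side by side.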

3. May 10, 2006

### Natasha1

I don't know anything about programming, I'm afraid.

4. May 10, 2006

### Staff: Mentor

Well how in the world are you supposed to calculate those stats? By hand?!

Maybe what they are asking is how few words you can use as your sample size and still get those stats to within some amount of error?
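
One standard way to make "how few words" concrete, sketched here under the usual assumptions of simple random sampling and an approximately normal sample mean (central limit theorem): the standard error of the mean is $s/\sqrt{n}$, so estimating the mean word length to within $\pm E$ at about 95% confidence needs roughly $n \ge (1.96\,s/E)^2$. The function names below are illustrative, and $s$ would have to come from a small pilot sample.

```python
import math
import statistics


def standard_error(lengths):
    """Estimated standard error of the mean word length: s / sqrt(n)."""
    return statistics.stdev(lengths) / math.sqrt(len(lengths))


def required_sample_size(std_dev, margin, z=1.96):
    """Smallest n with z * std_dev / sqrt(n) <= margin.

    Assumes simple random sampling and a roughly normal sample mean;
    z = 1.96 corresponds to about 95% confidence.
    """
    return math.ceil((z * std_dev / margin) ** 2)
```

For example, if a pilot sample suggested $s \approx 2$ letters (a made-up figure) and you wanted the mean to within half a letter, `required_sample_size(2.0, 0.5)` returns 62.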

I guess you could do it in Excel if you had to... a mighty big spreadsheet, though.

5. May 10, 2006

### Natasha1

OK, this is how the whole question is asked:

Any help would be very much appreciated. Thanks in advance :-)

Would you expect word-length in general to differ in fiction and non-fiction books?

Take two books by different authors, one fiction and one non-fiction. Choose a reasonable sample size of words from each, and find the mean, median and modal word length, and the standard deviation, for each. Make it clear how you have done the various calculations without presenting detailed arithmetic.

In the light of the figures found, comment on the initial question (no formal inferential work is needed, just an informed view from the figures found).

6. May 10, 2006

### Staff: Mentor

Well, it obviously depends a lot on the books (some non-fiction books will, by their nature, use longer words). But now that the 100,000-word sample sizes are no longer part of the question, the task becomes much more doable by hand.

Do you know how to calculate those statistics? Do you have a statistics calculator? If not, do you know how to use these functions in Excel? If not, do you have access to Excel? (just use the Help feature to show you how to enter the functions)

BTW, to get a good statistical sample without having to enter too many word lengths into your Excel spreadsheet, I'd use the close-your-eyes, flip-open-randomly, poke-a-word technique for choosing about 40-50 words at random from each book.
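
If the text happens to be available electronically, the computer equivalent of picking words at random is a simple random sample without replacement. This is a minimal sketch; the function name and the default of 45 words (the middle of the suggested 40-50) are illustrative choices.

```python
import random


def random_word_sample(words, k=45, seed=None):
    """Simple random sample of k words, without replacement.

    Every word in `words` has the same chance of being picked, which is
    what flipping to a random page is trying to approximate by hand.
    """
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return rng.sample(words, k)
```

Passing a fixed `seed` lets you (or a marker) reproduce exactly the same sample later; the word lengths are then just `[len(w) for w in random_word_sample(words)]`.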

7. May 10, 2006

### 0rthodontist

Well, flip it open randomly and, instead of poking a word, choose, say, the fifth word. If you poke a word, you will be biased towards selecting longer words, because they take up more space on the page.
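
The bias can be seen in a small simulation. The sketch below models poking as choosing a word with probability proportional to its length (assuming only letters take up space and ignoring spaces), and models "random page, then the fifth word" as giving every word roughly the same chance. The word list is a toy example, not data from any book.

```python
import random
import statistics


def poke_sample(words, k, rng):
    """Model of poking the page: a word is hit with probability proportional
    to its length (assumption: only letters take up space)."""
    return rng.choices(words, weights=[len(w) for w in words], k=k)


def position_sample(words, k, rng):
    """Model of 'random page, then take the fifth word': each word about equally likely."""
    return [rng.choice(words) for _ in range(k)]


def mean_length(sample):
    return statistics.mean(len(w) for w in sample)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy text: 900 one-letter words and 100 ten-letter words.
    words = ["a"] * 900 + ["abcdefghij"] * 100
    print("true mean:", mean_length(words))  # 1.9
    print("poke:     ", mean_length(poke_sample(words, 10_000, rng)))
    print("position: ", mean_length(position_sample(words, 10_000, rng)))
```

In this toy text the long words are only 10% of the words but about 53% of the letters, so poke sampling gives a mean near 5.7 letters while position sampling stays close to the true 1.9.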

8. May 10, 2006

### Natasha1

I don't really get it. What am I supposed to do exactly? Can someone just start the problem for me to set me in the right direction? :-)
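
To show the kind of working the question is after, here are the calculations on five made-up word lengths, 3, 5, 2, 7, 3, chosen only for illustration and not taken from any book. The mean is

$$\bar{x} = \frac{3+5+2+7+3}{5} = \frac{20}{5} = 4.$$

Sorted, the lengths are 2, 3, 3, 5, 7, so the median (the middle value) is 3, and the mode (the most frequent value) is also 3. The sample standard deviation is

$$s = \sqrt{\frac{(3-4)^2+(5-4)^2+(2-4)^2+(7-4)^2+(3-4)^2}{5-1}} = \sqrt{\frac{1+1+4+9+1}{4}} = \sqrt{\frac{16}{4}} = 2.$$

Dividing by $n = 5$ instead of $n - 1 = 4$ would give $\sqrt{16/5} \approx 1.79$. With a real sample of 40-50 words per book the steps are the same, only longer, which is where a tally of how many words have each length helps.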