Java Text Analysis Script: Count Word Frequency & Sentences

  • Context: Java
  • Thread starter: Franco
  • Tags: Java

Discussion Overview

The discussion revolves around creating a Java application to analyze text files by counting word frequencies and the number of sentences. Participants explore methods for reading sentences and words from a text file and generating an output file with the results, while addressing challenges related to handling punctuation and multiple lines in the input file.

Discussion Character

  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant seeks help with a Java program that reads sentences and words from a text file and writes a new file with word frequencies and sentence counts.
  • Another participant suggests using the StringTokenizer class for parsing the text.
  • A different participant points out that while StringTokenizer is functional, its use is discouraged in favor of the split method or regex for new code.
  • One participant shares their experience with StringTokenizer, recommending it for separating sentences and words, while also advising the removal of punctuation marks from words.
  • Another participant agrees that StringTokenizer works well but reiterates the recommendation to use split or regex as preferred methods.

Areas of Agreement / Disagreement

Participants express differing opinions on the use of StringTokenizer versus the split method or regex, indicating a lack of consensus on the best approach for text parsing in this context.

Contextual Notes

Participants note challenges related to handling punctuation and the structure of the input text file, which contains multiple lines, but do not resolve these issues.

Franco
Hello everyone, thanks for taking the time to read this post.
I need help with something in Java: designing an application that reads the sentences and words from a txt file and creates a new txt file listing every word from the original file once. Next to each word should be its frequency (how often the word appears in the txt file) and the number of sentences it appears in. Assume each sentence ends with a full stop; commas, question marks, etc. can be ignored.

Example:
Input file:
This is a simple simple example test. Another test.

Output file:
this 1 1
is 1 1
a 1 1
simple 2 1
test 2 2
example 1 1
another 1 1
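
Reading the example above, the first number looks like the total number of occurrences and the second like the number of sentences that contain the word ("simple" occurs twice but in one sentence, "test" occurs once in each of two sentences). Below is a minimal sketch of that counting rule, assuming the text has already been split into lower-cased words per sentence. The names WordStats, count and format are made up for illustration and do not come from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class, not part of the original post.
class WordStats {

    // Maps each word to {total occurrences, number of sentences containing it}.
    // Input: one inner list of lower-cased words per sentence.
    static Map<String, int[]> count(List<List<String>> sentences) {
        // LinkedHashMap keeps the words in first-seen order.
        Map<String, int[]> stats = new LinkedHashMap<String, int[]>();
        for (List<String> sentence : sentences) {
            // Words already met in the current sentence.
            Set<String> seenHere = new HashSet<String>();
            for (String word : sentence) {
                int[] entry = stats.get(word);
                if (entry == null) {
                    entry = new int[2];
                    stats.put(word, entry);
                }
                entry[0]++;                  // every occurrence counts
                if (seenHere.add(word)) {
                    entry[1]++;              // first time in this sentence
                }
            }
        }
        return stats;
    }

    // Builds one output line per word: "word frequency sentences".
    static List<String> format(Map<String, int[]> stats) {
        List<String> lines = new ArrayList<String>();
        for (Map.Entry<String, int[]> e : stats.entrySet()) {
            lines.add(e.getKey() + " " + e.getValue()[0] + " " + e.getValue()[1]);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<List<String>> sentences = new ArrayList<List<String>>();
        sentences.add(Arrays.asList("this", "is", "a", "simple", "simple", "example", "test"));
        sentences.add(Arrays.asList("another", "test"));
        for (String line : format(count(sentences))) {
            System.out.println(line);
        }
    }
}
```

The sketch lists words in the order they are first seen, so the line order can differ slightly from the hand-written example.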


So far I have only written this much of my program:


import java.io.*;
import java.util.*;

public class Analysis {

    public static void main(String[] args) throws IOException {

        // The text to analyze and the file that receives the results.
        File inputFile = new File("Analysis_source.txt");
        File outputFile = new File("Analysis_output.txt");

        FileReader in = new FileReader(inputFile);
        FileWriter out = new FileWriter(outputFile);

        int c; // will hold each character read from 'in' (not used yet)

        // TODO: read the text, count words and sentences, write the results to 'out'.

        in.close();
        out.close();
    }
}



I'm not sure how to convert the sentences into ArrayLists (use charAt, looking for the full stop?)
or how to convert the words of each sentence into a sub-ArrayList (use charAt, looking for the spaces in between?).

Also, the original txt file contains more than one line;
the sentences are not all packed into a single line.



THX FOR READING
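
On the multi-line point: one possible approach is to read the file line by line and join the lines with a space, so a sentence that continues on the next line stays in one piece. This is only a sketch under that assumption; TextLoader and readAll are illustrative names, and it relies on try-with-resources (Java 7 or later).

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper class, not part of the original post.
class TextLoader {

    // Returns the whole file as one string, with line breaks replaced by spaces.
    static String readAll(String path) throws IOException {
        StringBuilder text = new StringBuilder();
        // The reader is closed automatically, even if reading fails.
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (text.length() > 0) {
                    text.append(' ');   // readLine() drops the line terminator
                }
                text.append(line);
            }
        }
        return text.toString();
    }
}
```

The resulting string can then be split into sentences and words without caring where the line breaks were.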
 
Use the StringTokenizer class.
 
so-crates said:
Use the StringTokenizer class.
From the API documentation (http://java.sun.com/j2se/1.4.2/docs/api/java/util/StringTokenizer.html):
StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.

:smile:
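
For comparison, here is a rough sketch of the split/regex route the documentation recommends. It assumes a sentence ends at every full stop and that a word is a run of ASCII letters (so "don't" would come out as "don" and "t"). SplitTokenizer and its method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original thread.
class SplitTokenizer {

    // Assumption: a word is one or more lower-case ASCII letters.
    private static final Pattern WORD = Pattern.compile("[a-z]+");

    // Splits the text at full stops. String.split takes a regular
    // expression, so the dot has to be escaped.
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<String>();
        for (String part : text.split("\\.")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    // Extracts the lower-cased words; commas, question marks and other
    // non-letters are simply never matched.
    static List<String> words(String sentence) {
        List<String> result = new ArrayList<String>();
        Matcher m = WORD.matcher(sentence.toLowerCase(Locale.ROOT));
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }
}
```

Matching the words instead of splitting on separators means punctuation never ends up attached to a word, so no separate clean-up step is needed.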
 
abhishek said:
From the API documentation (http://java.sun.com/j2se/1.4.2/docs/api/java/util/StringTokenizer.html):


:smile:

I've been using StringTokenizer myself with jsdk1.5 and it works fine. You just initialize it with the String and with the separator and the class does the job for you.
As a suggestion, use ". " as the separator for sentences and " " for words. Also, to make the word separation more robust, if a word ends with a punctuation mark such as ".", "," or "?", remove it.
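
A small sketch of this StringTokenizer approach follows. One caveat: StringTokenizer treats every character of the delimiter string as a separate delimiter, so passing ". " would split on each full stop and on each space, not on the two-character sequence. The sketch therefore uses "." for sentences and whitespace for words, then strips trailing punctuation. TokenizerDemo and its method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.StringTokenizer;

// Hypothetical helper class, not part of the original thread.
class TokenizerDemo {

    // Removes punctuation marks from the end of a word.
    static String stripTrailingPunctuation(String word) {
        int end = word.length();
        while (end > 0 && ".,?!;:".indexOf(word.charAt(end - 1)) >= 0) {
            end--;
        }
        return word.substring(0, end);
    }

    // Returns one list of lower-cased words per sentence.
    static List<List<String>> parse(String text) {
        List<List<String>> sentences = new ArrayList<List<String>>();
        // Each character of the delimiter string is its own delimiter.
        StringTokenizer sentenceTokens = new StringTokenizer(text, ".");
        while (sentenceTokens.hasMoreTokens()) {
            List<String> words = new ArrayList<String>();
            StringTokenizer wordTokens =
                    new StringTokenizer(sentenceTokens.nextToken(), " \t\r\n");
            while (wordTokens.hasMoreTokens()) {
                String word = stripTrailingPunctuation(wordTokens.nextToken());
                if (!word.isEmpty()) {
                    words.add(word.toLowerCase(Locale.ROOT));
                }
            }
            if (!words.isEmpty()) {
                sentences.add(words);   // skip fragments with no words
            }
        }
        return sentences;
    }
}
```

Since only full stops end a sentence here, "Is it so? Yes, it is." is treated as a single sentence, which matches the assumption stated in the original question.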
 
ramollari said:
I've been using StringTokenizer myself with jsdk1.5 and it works fine. You just initialize it with the String and with the separator and the class does the job for you.


Yes, it will work fine. The text I quoted implies that clearly. I'm only pointing out that Sun would rather have you use split or regex. :smile:
 
