Java Word Counter: Define Method & HashMap Return

  • Comp Sci
  • Thread starter prosteve037
In summary: the conversation is about a homework assignment in which the student has to define part of a method that takes in a String, segments it into words, keeps track of the number of each word, and returns a HashMap<String, Integer> that shows the count of each word. The student has written some code but is still getting out-of-bounds exceptions and is unsure whether the HashMap is being used correctly.
  • #1
prosteve037

Homework Statement


I have to define part of a method that will take in a String, segment the input into words, keep track of how many of each word there are in the string, and then return a HashMap<String, Integer> that shows how many of each word there are in the string.

A separate helper class named CharacterFromFileReader "reads" the String, character-by-character, and can iterate through the string. Here's its definition:

Code:
package util.general;

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Iterator;

public class CharacterFromFileReader implements Iterator<Character> {

	private final static int EOF_VALUE = -1; 

	private FileReader inputStream;
	private int lastRead;

	public CharacterFromFileReader(String path) {
		try {
			inputStream = new FileReader(path);
			read();
		} catch (FileNotFoundException e) {
			e.printStackTrace();
			finish();
		}
	}

	private void finish() {
		lastRead = EOF_VALUE;
		if (inputStream != null) {
			try {
				inputStream.close();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
	}

	private void read() {
		try {
			lastRead = inputStream.read();
		} catch (IOException e) {
			finish();
		}
	}

	@Override
	public boolean hasNext() {
		return lastRead != EOF_VALUE;
	}

	@Override
	public Character next() {
		char c = (char) lastRead;
		read();
		return c;
	}

	@Override
	public void remove() {
		throw new UnsupportedOperationException();
	}
}

Homework Equations



-----------------------------------------------------------------------

The Attempt at a Solution



Here's the code I've written so far and am confident with:

Code:
package hw3;

import java.util.HashMap;
import util.general.CharacterFromFileReader;

public class Homework3Class {
	
	public Homework3Class() {}
	
	public HashMap<String, Integer> wordCounter(String inputPath) {
		
		HashMap<String, Integer> hm = new HashMap<String, Integer>();
		
		CharacterFromFileReader cffr = new CharacterFromFileReader(inputPath);
		
		String s = new String();
		
		while(cffr.hasNext()) {
			char c = cffr.next();
			if (this.characterChecker(c)) {
				.
				.
				.
			}
			
			else if (this.characterChecker(c) == false) {
				s = new String();
			}
		}
		
		return hm; 
	}
	
	public boolean characterChecker(char c) {
		if (c != ' ' && c != '\t' && c != '\n' && c != ',' && c != '.') {
			return true;
		}
		
		else {
			return false;
		}
	}

}

Not much at all so far :P

I was thinking that in the dotted space there should be code that takes the character stored in reference c and adds it to the string s... I just don't know how :/
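For what it's worth, a char can be appended to a String with the + operator, or collected in a StringBuilder. The sketch below only illustrates those two options; the class and method names are made up for the example and are not part of the assignment.

```java
// Hypothetical demo class; not part of the assignment code.
class AppendCharSketch {

    // Appends c to s with string concatenation. Strings are immutable,
    // so this creates a new String and returns it.
    static String appendWithConcat(String s, char c) {
        return s + c;
    }

    // Builds a word from several characters using a StringBuilder,
    // which avoids creating a new String for every character.
    static String buildWord(char... chars) {
        StringBuilder sb = new StringBuilder();
        for (char c : chars) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "";
        s = appendWithConcat(s, 'h');
        s = appendWithConcat(s, 'i');
        System.out.println(s);                             // prints "hi"
        System.out.println(buildWord('w', 'o', 'r', 'd')); // prints "word"
    }
}
```

Inside the loop from the post, the concatenation form would amount to s = s + c; (or s += c;).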
 
  • #2
Okay I worked a bit more on it since last post. Here's what I have now:

Code:
package hw3;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

import util.general.CharacterFromFileReader;

public class Homework3Class {
	
	private List<String> _list;
	
	public Homework3Class() {}
	
	public HashMap<String, Integer> wordCounter(String inputPath) {		
		HashMap<String, Integer> hm = new HashMap<String, Integer>();
		
		CharacterFromFileReader cffr = new CharacterFromFileReader(inputPath);
		
		List<String> list = new ArrayList<String>();
		_list = list;
		
		String st = new String();
		
		_list.add(st);
		
		while (cffr.hasNext()) {
			char c = cffr.next();
			
			if (this.characterChecker(c)) {
				String s = _list.get(_list.size());
				StringBuffer sb = new StringBuffer(s);
				sb.insert(s.length() - 1, c);
			}
			
			else if(this.characterChecker(c) == false) {
				this.newString();
			}
		}
		
		for (int n = 0; n < _list.size(); n++) {
			String s = _list.get(n);
			
			if (hm.containsValue(s)) {
				hm.put(s, hm.get(s));
			}
			
			else if (hm.containsValue(s) == false) {
				hm.put(s, hm.size());
			}
		}
		
		return hm; 
	}
	
	private boolean characterChecker(char c) {
		if (c != ' ' && c != '\t' && c != '\n' && c != ',' && c != '.') {
			return true;
		}
		
		else {
			return false;
		}
	}
	
	private void newString() {
		String s = new String();
		_list.add(s);
	}

}

It still doesn't green-bar when I test them though :/ I'm getting out-of-bounds exceptions, but I can't seem to find what's wrong with my code...
 
  • #3
Do you have any indication of where in your code you're getting the exceptions? That would be helpful information.

I'm guessing that the places to look are where you are accessing the list, such as here:
Code:
String s = _list.get(_list.size());
If you have a list with size() elements in it, the indexes run from 0 through size() - 1, so by attempting to access the element at index size(), you are out of bounds of the list.

Caveat: I haven't written any Java code for lo, these many years, so I could be wrong here.
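
To illustrate the indexing rule, here is a minimal sketch. The class and method names are invented for the example. The StringBuffer insert call on the following line of the posted code may be worth a look too, since s.length() - 1 is -1 when s is empty.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the assignment code.
class LastElementSketch {

    // Returns the last element of a non-empty list.
    // Valid indexes run from 0 to size() - 1.
    static String last(List<String> list) {
        return list.get(list.size() - 1);
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>();
        list.add("first");
        list.add("second");
        System.out.println(last(list)); // prints "second"
        try {
            list.get(list.size()); // index 2 is one past the end
        } catch (IndexOutOfBoundsException e) {
            System.out.println("out of bounds"); // this line is printed
        }
    }
}
```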
 
  • #4
Mark44 said:
Do you have any indication of where in your code you're getting the exceptions? That would be helpful information.

I'm guessing that the places to look are where you are accessing the list, such as here:
Code:
String s = _list.get(_list.size());
If you have a list with size() elements in it, the indexes run from 0 through size() - 1, so by attempting to access the element at index size(), you are out of bounds of the list.

Caveat: I haven't written any Java code for lo, these many years, so I could be wrong here.

Ahh yes, thank you. I changed s to hold the element at index _list.size() - 1 and the tests now seem to be giving feedback. Still not accomplishing the task, however :frown:

Maybe I'm not using the HashMap class correctly? Here I have the HashMap put the strings in the list; if the HashMap already has the string, I use the HashMap's put method to put the String key into the HashMap in the position where it already exists. However, this is where I think I may be thinking the wrong way.

Here's a shot of the method description for HashMap's put method:

[Image: HashMap.jpg, a screenshot of the method description for HashMap's put method]


It says there on the screenshot that the put method takes in a key and a value.

I thought that this was the index number of the key, but now I'm starting to think it's actually the number of times the key exists in the HashMap.

Am I thinking the wrong way again here? Or is the value the "count" of the specified key?
 
  • #5
I will recommend that you take a step back and reformulate (using pseudo-code if that is easier for you) the algorithm you want to implement. Your algorithm needs to build up words one character at a time and then store the count for each such word.
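
On the question about put: a HashMap has no positions or indexes. put(key, value) associates the value with the key and replaces whatever value was stored under that key before, so for a word counter the value can simply be the count. A minimal sketch, with made-up class and method names:

```java
import java.util.HashMap;

// Hypothetical demo class; not part of the assignment code.
class PutSketch {

    // Records one more occurrence of word in counts.
    static void countOnce(HashMap<String, Integer> counts, String word) {
        if (counts.containsKey(word)) {
            counts.put(word, counts.get(word) + 1); // replaces the old count
        } else {
            counts.put(word, 1);                    // first time this word is seen
        }
    }

    public static void main(String[] args) {
        HashMap<String, Integer> counts = new HashMap<String, Integer>();
        countOnce(counts, "the");
        countOnce(counts, "cat");
        countOnce(counts, "the");
        System.out.println(counts.get("the")); // prints 2
        System.out.println(counts.get("cat")); // prints 1
    }
}
```

Note that the check uses containsKey, because the words are the keys; containsValue would look for the word among the counts.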

While you may of course implement such an algorithm in many ways you should be able to make an implementation that only uses a StringBuilder and a Map for state management. You probably also want to consider using one of the Character classification methods instead of your own characterChecker method.
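
As a rough illustration of the Character classification methods, the sketch below uses Character.isLetterOrDigit to decide what belongs to a word. The class and the isWordChar helper are hypothetical. This is also a different rule from characterChecker: here an apostrophe or hyphen would end a word, whereas characterChecker only treats space, tab, newline, comma and period as separators.

```java
// Hypothetical demo class; not part of the assignment code.
class ClassifySketch {

    // Treats letters and digits as word characters; everything else
    // (spaces, tabs, newlines, punctuation) counts as a separator.
    static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c);
    }

    public static void main(String[] args) {
        System.out.println(isWordChar('a'));              // prints true
        System.out.println(isWordChar('7'));              // prints true
        System.out.println(isWordChar(' '));              // prints false
        System.out.println(isWordChar(','));              // prints false
        System.out.println(Character.isWhitespace('\t')); // prints true
    }
}
```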
 
  • #6
Filip Larsen said:
While you may of course implement such an algorithm in many ways you should be able to make an implementation that only uses a StringBuilder and a Map for state management. You probably also want to consider using one of the Character classification methods instead of your own characterChecker method.

I wish I could use one of the Character classification methods instead of the one I wrote but unfortunately I can't since we haven't covered it in lecture :frown:
 
  • #7
Okay. I've rewritten a bit of it (the if statement in the while loop):

Code:
package hw3;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

import util.general.CharacterFromFileReader;

public class Homework3Class {
	
	private List<String> _list;
	
	public Homework3Class() {}
	
	public HashMap<String, Integer> wordCounter(String inputPath) {		
		HashMap<String, Integer> hm = new HashMap<String, Integer>();
		
		CharacterFromFileReader cffr = new CharacterFromFileReader(inputPath);
		
		List<String> list = new ArrayList<String>();
		_list = list;
		
		String st = new String();
		
		_list.add(st);
		
		while (cffr.hasNext()) {
			char c = cffr.next();
			
			int lastStringIndex = _list.size() - 1;
			
			if (this.characterChecker(c)) {
				String s = _list.get(lastStringIndex);
				String newString = s + c;
				_list.remove(lastStringIndex);
				_list.add(newString);
			}
			
			else if(this.characterChecker(c) == false) {
				this.newString();
			}
		}
		
		for (int n = 0; n < _list.size(); n++) {
			String s = _list.get(n);
						
			if (hm.containsKey(s)) {
				hm.put(s, hm.get(s) + 1);
			}
			
			else if (hm.containsKey(s) == false) {
				hm.put(s, 1);
			}
		}
		
		return hm; 
	}
	
	private boolean characterChecker(char c) {
		if (c != ' ' && c != '\t' && c != '\n' && c != ',' && c != '.') {
			return true;
		}
		
		else {
			return false;
		}
	}
	
	private void newString() {
		String s = new String();
		_list.add(s);
	}

}

All the test results are the same now so I know that I'm almost there. What it's saying, however, is confusing :P

It says that my computed answer (the returned HashMap) has the extra key ""...

I didn't understand what this meant. I thought, "That shouldn't be happening. "" isn't a character." So, I emailed my professor earlier today and he replied that the code may be counting the space between two separator characters as an empty string... ?

I'm so confused :frown:
 
  • #8
Try to trace your code when there are two or more consecutive white-space characters in the input, or when your input ends with a white-space character. (Hint: you are adding an empty string to your list for every white-space character in the input).
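
The effect can be reproduced on a plain String instead of a file. The sketch below is a simplified stand-in for the list-based loop in the post, with invented names, and is only meant to show where the empty strings come from.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; mirrors the list-based approach on a String.
class EmptyWordSketch {

    static boolean isSeparator(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == ',' || c == '.';
    }

    // Starts a new (empty) string at every separator, whether or not
    // a word was in progress, just like the posted loop does.
    static List<String> splitLikeThePost(String input) {
        List<String> words = new ArrayList<String>();
        words.add("");
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            int last = words.size() - 1;
            if (isSeparator(c)) {
                words.add("");
            } else {
                words.set(last, words.get(last) + c);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // The comma is followed by a space, and the input ends with a period.
        System.out.println(splitLikeThePost("a, b.")); // prints [a, , b, ]
    }
}
```

The list ends up holding "a", "", "b" and "", so the map built from it gets an extra key "".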

You should also know that your code is rather "clumsy" (like your "if-else-if" constructions) and wasteful of resources (like creating a new string for each input character and re-evaluating the same expression). If a person (as opposed to a test) is going to rate your code it may pay off to clean it up first.
 
  • #9
Filip Larsen said:
Try to trace your code when there are two or more consecutive white-space characters in the input, or when your input ends with a white-space character. (Hint: you are adding an empty string to your list for every white-space character in the input).

Ahh okay, thank you! It works now :smile: All it needed was a hm.remove(""); method call :tongue:

Filip Larsen said:
You should also know that your code is rather "clumsy" (like your "if-else-if" constructions) and wasteful of resources (like creating a new string for each input character and re-evaluating the same expression). If a person (as opposed to a test) is going to rate your code it may pay off to clean it up first.

You're absolutely right, I'm going to need to "optimize" my code somehow. I'm not able to change the CharacterFromFileReader class's definition so I'm not sure how much that limits my ability to make the code any better :P Any hints?
 
  • #10
prosteve037 said:
Any hints?

Things you may want to consider (in no particular order):

  • Change statements matching if (expression) { ... } else if (expression == false) { ... } into if (expression) { ... } else { ... }.
  • Use a StringBuilder to build up each word instead of appending strings.
  • Insert each word directly into the map instead of collecting them in a list first.
  • If you choose to keep the list, then define it in the method used and pass it around as a parameter if necessary instead of having it as member variable on your class (having the list as a member variable makes instances of your class thread-unsafe for no particular reason and unless you clear it before returning you are also keeping a lot of strings from being garbage collected. Of course, if your class has a lot of state to keep track of or has to track state between multiple method invocations, then member variables are the way forward).
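
Put together, the hints above could look roughly like the sketch below. It is only one possible shape, not the reference solution: the class name, the flush and charsOf helpers, and the use of a plain Iterator<Character> as a stand-in for CharacterFromFileReader are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch; names are invented for illustration.
class WordCounterSketch {

    // Same separators as characterChecker in the posts above.
    static boolean isWordChar(char c) {
        return c != ' ' && c != '\t' && c != '\n' && c != ',' && c != '.';
    }

    // Adds the word in sb (if any) to counts and clears sb.
    // Empty words are skipped, so no "" key is ever created.
    static void flush(StringBuilder sb, Map<String, Integer> counts) {
        if (sb.length() > 0) {
            String word = sb.toString();
            Integer old = counts.get(word);
            counts.put(word, old == null ? 1 : old + 1);
            sb.setLength(0);
        }
    }

    // All state is local: one StringBuilder and one map.
    static HashMap<String, Integer> countWords(Iterator<Character> chars) {
        HashMap<String, Integer> counts = new HashMap<String, Integer>();
        StringBuilder sb = new StringBuilder();
        while (chars.hasNext()) {
            char c = chars.next();
            if (isWordChar(c)) {
                sb.append(c);
            } else {
                flush(sb, counts);
            }
        }
        flush(sb, counts); // the input may end in the middle of a word
        return counts;
    }

    // Stand-in for CharacterFromFileReader: iterates over a String.
    static Iterator<Character> charsOf(final String text) {
        return new Iterator<Character>() {
            private int i = 0;
            public boolean hasNext() { return i < text.length(); }
            public Character next() { return text.charAt(i++); }
            public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(charsOf("the cat, the  dog."));
        System.out.println(counts.get("the")); // prints 2
    }
}
```

Since CharacterFromFileReader implements Iterator<Character>, an instance of it could be passed to countWords in place of charsOf(...).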
 
  • #11
Thank you all for the help! :smile:

I received my next assignment in the class and I figured instead of starting an entirely new thread on it, I'd just post again on here. This works well because I think these two assignments are similar.

For this new assignment, the objective is to have the implementation return a HashMap that contains, as the values, the counts of the number of times that an author's name is read in a file. The file is passed in as an argument and read through using the same CharacterFromFileReader class that was pre-defined for us in the last assignment.

An author name is defined in this assignment as a contiguous sequence of characters that appear between the tags <AU> and </AU>.

Here's the code I have so far:

Code:
package hw4;

import java.util.HashMap;
import util.general.CharacterFromFileReader;

public class Homework4Class {

	private String _author;
	
	public Homework4Class() {}
	
	public HashMap<String, Integer> authorFinder(String inputPath) {
		HashMap<String, Integer> hm = new HashMap<String, Integer>();
		
		CharacterFromFileReader cffr = new CharacterFromFileReader(inputPath);
		
		int state = 0;
		
		while (cffr.hasNext()) {
			char c = cffr.next();
			
			switch (state) {
				case 0:
					if (c == '<') {state = 1;}
					
					break;
				
				case 1:
					if (c == 'A') {state = 2;}
					
					else if (c == '<') {}
					
					else {state = 0;}
					
					break;
				
				case 2:
					if (c == 'U') {state = 3;}
					
					else if (c == '<') {state = 1;}
					
					else {state = 0;}
					
					break;
					
				case 3:
					if (c == '>') {state = 4;}
					
					else if (c == '<') {state = 1;}
					
					else {state = 0;}
					
					break;
					
				case 4:
					if (c == '<') {state = 5;}
					
					else {_author += c;}
					
					break;
					
				case 5:
					if (c == '/') {state = 6;}
					
					else if (c == '<') {_author += c;}
					
					else {
						_author += "<" + c;
						state = 4;
					}
					
					break;
					
				case 6:
					if (c == 'A') {state = 7;}
					
					else if (c == '<') {
						_author += "/";
						state = 5;
					}
					
					else {
						_author += "/" + c;
						state = 3;
					}
					
					break;
					
				case 7:
					if (c == 'U') {state = 8;}
					
					else if (c == '<') {
						_author += "A";
						state = 5;
					}
					
					else {
						_author += "A" + c;
						state = 4;
					}
					
					break;
					
				case 8:
					if (c == '>') {state = 9;}
					
					else if (c == '<') {
						_author += "U";
						
						state = 5;
					}
					
					else {
						_author += "U" + c;
						state = 4;
					}
					
					break;
					
				case 9:
					if (hm.containsKey(_author)) {
						hm.put(_author, hm.get(_author) + 1);
					}
					
					if (!hm.containsKey(_author)) {
						hm.put(_author, 1);
					}
					
					if (c == '<') {
						state = 1;
						_author = "";
					}
					
					if (c != '<') {
						state = 0;
						_author = ""; 
					}
					
					break;
			}
		}
		
		return hm;
	}
}

Now just like the last assignment, the packaged assignment came with reference tests. Running these tests helped me identify the key problems I had with the code I'd written:

1.) The results of the failed reference tests show that my current algorithm doesn't take into account the possible instances in which we have an author name in the form of:

Example:
Charles Dickens</AU

Right now with my current algorithm, this author name would return a HashMap value of 1 for the key Charles DickensU

2.) My current code will also create the problem of the empty string, which I ran into in the last assignment as well.

----------------------------------------------------------------------------------------

My original plans to solve these problems were very inefficient and resembled my method for the last assignment; my thinking goes the same route, using another string field or an entire ArrayList to store all of the characters read from the input file.

Any help/hints would be awesome.

Thanks
 
  • #12
I'm not sure I understand some of what you're asking. Are you saying that it's possible to have a malformed line in the file that looks like
.
.
.
Charles Dickens </AU
.
.
?

IOW, without the leading <AU> tag, and with an incomplete </AU> tag? That seems to be in conflict with what you wrote before:
prosteve037 said:
An author name is defined in this assignment as a contiguous sequence of characters that appear between the tags <AU> and </AU>.

Instead of storing all of the characters in the file in a single ArrayList, I would be inclined to store just a single line in the file (I'm assuming that the file consists of lines of text terminated by CR/LF character pairs).
 
  • #13
Hey Steve look at your case 8:
case 8:
if (c == '>') {state = 9;}

else if (c == '<') {
_author += "U";

state = 5;
}

else {
_author += "U" + c; //try to change to _author += "</AU";
state = 4;
}

And at your private variable:
private String _author; // change to private String _author = "";
 
  • #14
haichau6990 said:
Hey Steve look at your case 8:
case 8:
if (c == '>') {state = 9;}

else if (c == '<') {
_author += "U";

state = 5;
}

else {
_author += "U" + c; //try to change to _author += "</AU";
state = 4;
}

And at your private variable:
private String _author; // change to private String _author = "";

Unfortunately, this didn't work :/ Thanks though

Mark44 said:
I'm not sure I understand some of what you're asking. Are you saying that it's possible to have a malformed line in the file that looks like
.
.
.
Charles Dickens </AU
.
.
?

IOW, without the leading <AU> tag, and with an incomplete </AU> tag? That seems to be in conflict with what you wrote before.

Instead of storing all of the characters in the file in a single ArrayList, I would be inclined to store just a single line in the file (I'm assuming that the file consists of lines of text terminated by CR/LF character pairs).

Anything between the tags <AU> and </AU> is counted as an author.

So if the CharacterFromFileReader reads the line "Charles Dickens </AU" from the input file, it would count as an author.

Does that make sense?
 
  • #15
case 8:
if (c == '>') {state = 9;}

else if (c == '<') {
_author += "U"; // change to _author += "</AU"
//I forgot to change this line in the last post. Take <AU>Dickens</AU</AU> as an example:
//the author name is supposed to be "Dickens</AU", but when your code reaches the second "<"
//it only adds "U" to the existing "Dickens".

state = 5;
}

else {
_author += "U" + c; //try to change to _author += "</AU";
state = 4;
}
 
  • #16
prosteve037 said:
Unfortunately, this didn't work :/ Thanks though



Anything between the tags <AU> and </AU> is counted as an author.

So if the CharacterFromFileReader reads the line "Charles Dickens </AU" from the input file, it would count as an author.

Does that make sense?
I understand what you are saying, but how robust does your program need to be? Can't you assume that there is some error checking on the front end where the data is entered, and the data in the file is well-formed XML?

You haven't given the problem statement for this assignment, so I don't know how fault-tolerant your code needs to be, but it seems to me that all you should have to do is look for a pair of <AU> </AU> tags, and what's in between is the author. Same for the other fields.
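
A minimal sketch of that idea, under two assumptions that are not in the problem as posted: the file contents have already been collected into one String (for example by appending each character from the reader to a StringBuilder), and empty names are not counted. The class and method names are invented.

```java
import java.util.HashMap;

// Hypothetical sketch; assumes the whole input is available as a String.
class AuthorTagSketch {

    static final String OPEN = "<AU>";
    static final String CLOSE = "</AU>";

    // Counts every run of characters between <AU> and the next </AU>.
    static HashMap<String, Integer> countAuthors(String text) {
        HashMap<String, Integer> counts = new HashMap<String, Integer>();
        int from = 0;
        while (true) {
            int open = text.indexOf(OPEN, from);
            if (open < 0) {
                break; // no more opening tags
            }
            int start = open + OPEN.length();
            int close = text.indexOf(CLOSE, start);
            if (close < 0) {
                break; // unterminated tag: nothing more to count
            }
            String author = text.substring(start, close);
            if (!author.isEmpty()) { // assumption: skip empty names
                Integer old = counts.get(author);
                counts.put(author, old == null ? 1 : old + 1);
            }
            from = close + CLOSE.length();
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "<AU>Charles Dickens</AU> and <AU>Jane Austen</AU>"
                + "<AU>Charles Dickens</AU>";
        System.out.println(countAuthors(text).get("Charles Dickens")); // prints 2
    }
}
```

Because the search looks for the complete closing tag, an incomplete one such as </AU without the > simply stays part of the name.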
 
  • #17
Mark44 said:
I understand what you are saying, but how robust does your program need to be? Can't you assume that there is some error checking on the front end where the data is entered, and the data in the file is well-formed XML?

You haven't given the problem statement for this assignment, so I don't know how fault-tolerant your code needs to be, but it seems to me that all you should have to do is look for a pair of <AU> </AU> tags, and what's in between is the author. Same for the other fields.

Nevermind, I fixed this problem! :smile:

Code:
package hw4;

import java.util.HashMap;
import util.general.CharacterFromFileReader;

public class Homework4Class {

	private String _author;
	
	private int _tally;
	
	public Homework4Class() {}
	
	public HashMap<String, Integer> authorFinder(String inputPath) {
		HashMap<String, Integer> hm = new HashMap<String, Integer>();
		
		CharacterFromFileReader cffr = new CharacterFromFileReader(inputPath);
		
		int state = 0;
		
		while (cffr.hasNext()) {
			char c = cffr.next();
			
			switch (state) {
				case 0:
					if (c == '<') {state = 1;}
					
					break;
				
				case 1:
					if (c == 'A') {state = 2;}
					
					else if (c == '<') {}
					
					else {state = 0;}
					
					break;
				
				case 2:
					if (c == 'U') {state = 3;}
					
					else if (c == '<') {state = 1;}
					
					else {state = 0;}
					
					break;
					
				case 3:
					if (c == '>') {state = 4;}
					
					else if (c == '<') {state = 1;}
					
					else {state = 0;}
					
					break;
					
				case 4:
					if (c == '<') {
						_tally = 1;
						state = 5;
					}
					
					else {_author += c;}
					
					break;
					
				case 5:
					if (c == '/') {
						_tally = 2;
						state = 6;
					}
					
					else if (c == '<') {stringModifier();}
					
					else {
						stringModifier();
						_author += c;
						state = 4;
					}
					
					break;
					
				case 6:
					if (c == 'A') {
						_tally = 3;
						state = 7;
					}
					
					else if (c == '<') {
						stringModifier();
						state = 5;
					}
					
					else {
						stringModifier();
						_author += c;
						state = 4;
					}
					
					break;
					
				case 7:
					if (c == 'U') {
						_tally = 4;
						state = 8;
					}
					
					else if (c == '<') {
						stringModifier();
						state = 5;
					}
					
					else {
						stringModifier();
						_author += c;
						state = 4;
					}
					
					break;
					
				case 8:
					if (c == '>') {state = 9;}
					
					else if (c == '<') {
						stringModifier();
						state = 5;
					}
					
					else {
						stringModifier();
						_author += c;
						state = 4;
					}
					
					break;
					
				case 9:
					if (hm.containsKey(_author)) {
						hm.put(_author, hm.get(_author) + 1);
					}
					
					if (!hm.containsKey(_author)) {
						hm.put(_author, 1);
					}
					
					if (c == '<') {
						state = 1;
						_author = "";
					}
					
					if (c != '<') {
						state = 0;
						_author = ""; 
					}
					
					break;
			}
		}
		
		return hm;
	}
	
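	private void stringModifier() {
		// Append the partial closing tag that turned out to be part of the name.
		// Each case needs a break; without it, execution falls through and
		// the later cases append their text as well.
		switch (_tally) {
			case 1:
				_author += "<";
				break;
				
			case 2:
				_author += "</";
				break;
				
			case 3:
				_author += "</A";
				break;
				
			case 4:
				_author += "</AU";
				break;
		}
		
		_tally = 1;
	}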
}

I added another int field and wrote a private helper method that appended the _author string based on the value of that field.

Each time the character was found to be a "tag" character, I set the value of the int field to the corresponding character(s).

In other words, an int value of 1 adds "<" to the _author string, a value of 2 adds "</", a value of 3 adds "</A", and a value of 4 adds "</AU".

Now I have to deal with the problem of the null string :grumpy:
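
A possible cause, judging only from the code above: a String field that is never assigned starts out as null, and appending to it with += turns that null into the text "null". The sketch below shows the effect in isolation; the class and its fields are made up for the example.

```java
// Hypothetical demo class; shows what += does to a null String field.
class NullConcatSketch {

    private String uninitialized;    // defaults to null, like _author
    private String initialized = ""; // starts as the empty string

    String appendToUninitialized(char c) {
        uninitialized += c; // null is converted to the text "null" first
        return uninitialized;
    }

    String appendToInitialized(char c) {
        initialized += c;
        return initialized;
    }

    public static void main(String[] args) {
        NullConcatSketch demo = new NullConcatSketch();
        System.out.println(demo.appendToUninitialized('D')); // prints "nullD"
        System.out.println(demo.appendToInitialized('D'));   // prints "D"
    }
}
```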
 

1. What is a Java Word Counter?

A Java Word Counter is a program that counts the number of occurrences of each word in a given text or string. It allows for efficient analysis and statistics on the use of words in a document.

2. What does the "Define Method" mean in a Java Word Counter?

"Define Method" in the title refers to the assignment's task of writing (defining) a Java method, in this case the wordCounter method that counts the words in a given text.

3. How does the Java Word Counter use HashMap to return the word count?

The word counter stores each word as a key in a HashMap<String, Integer>, with the word's count as the value. Looking up and updating a key takes constant time on average, so each word can be counted as soon as it has been read.

4. Can the Java Word Counter handle different languages or special characters?

Java strings are Unicode, so the counter can handle text in other languages as long as the file is read with the correct character encoding. Note that the characterChecker method in this thread only treats space, tab, newline, comma, and period as separators, so any other punctuation stays attached to the word next to it.

5. Is the Java Word Counter case-sensitive?

By default, the Java Word Counter is case-sensitive, meaning that it will count words with different capitalization as separate words. However, this can be changed by modifying the method to convert all words to lowercase before counting.
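
A minimal sketch of that change, assuming the words have already been split out. The class and method names are hypothetical, and Locale.ROOT is used so that the lowercasing does not depend on the machine's default locale.

```java
import java.util.HashMap;
import java.util.Locale;

// Hypothetical sketch; counts words while ignoring capitalization.
class CaseInsensitiveCountSketch {

    static HashMap<String, Integer> countIgnoringCase(String[] words) {
        HashMap<String, Integer> counts = new HashMap<String, Integer>();
        for (String word : words) {
            // Normalize the key so "The", "the" and "THE" share one entry.
            String key = word.toLowerCase(Locale.ROOT);
            Integer old = counts.get(key);
            counts.put(key, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = {"The", "the", "THE", "cat"};
        System.out.println(countIgnoringCase(words).get("the")); // prints 3
    }
}
```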
