Java Word Counter: Define Method & HashMap Return

prosteve037 · Feb 8, 2012

Homework Statement

I have to define part of a method that will take in a String, segment the input into words, keep track of how many of each word there are in the string, and then return a HashMap<String, Integer> that shows how many of each word there are in the string.

A separate helper class named CharacterFromFileReader "reads" the String, character-by-character, and can iterate through the string. Here's its definition:

Code:

package util.general;

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Iterator;

public class CharacterFromFileReader implements Iterator<Character> {

	private final static int EOF_VALUE = -1; 

	private FileReader inputStream;
	private int lastRead;

	public CharacterFromFileReader(String path) {
		try {
			inputStream = new FileReader(path);
			read();
		} catch (FileNotFoundException e) {
			e.printStackTrace();
			finish();
		}
	}

	private void finish() {
		lastRead = EOF_VALUE;
		if (inputStream != null) {
			try {
				inputStream.close();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
	}

	private void read() {
		try {
			lastRead = inputStream.read();
		} catch (IOException e) {
			finish();
		}
	}

	@Override
	public boolean hasNext() {
		return lastRead != EOF_VALUE;
	}

	@Override
	public Character next() {
		char c = (char) lastRead;
		read();
		return c;
	}

	@Override
	public void remove() {
		throw new UnsupportedOperationException();
	}
}

Homework Equations

-----------------------------------------------------------------------

The Attempt at a Solution

Here's the code I've written so far and am confident with:

Code:

package hw3;

import java.util.HashMap;
import util.general.CharacterFromFileReader;

public class Homework3Class {
	
	public Homework3Class() {}
	
	public HashMap<String, Integer> wordCounter(String inputPath) {
		
		HashMap<String, Integer> hm = new HashMap<String, Integer>();
		
		CharacterFromFileReader cffr = new CharacterFromFileReader(inputPath);
		
		String s = new String();
		
		while(cffr.hasNext()) {
			char c = cffr.next();
			if (this.characterChecker(c)) {
				.
				.
				.
			}
			
			else if (this.characterChecker(c) == false) {
				s = new String();
			}
		}
		
		return hm; 
	}
	
	public boolean characterChecker(char c) {
		if (c != ' ' && c != '\t' && c != '\n' && c != ',' && c != '.') {
			return true;
		}
		
		else {
			return false;
		}
	}

}

Not much at all so far :P

I was thinking that in the dotted space there should be code that takes the character stored in reference c and adds it to the string s... I just don't know how :/

prosteve037 · Feb 10, 2012

Okay I worked a bit more on it since last post. Here's what I have now:

Code:

package hw3;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

import util.general.CharacterFromFileReader;

public class Homework3Class {
	
	private List<String> _list;
	
	public Homework3Class() {}
	
	public HashMap<String, Integer> wordCounter(String inputPath) {		
		HashMap<String, Integer> hm = new HashMap<String, Integer>();
		
		CharacterFromFileReader cffr = new CharacterFromFileReader(inputPath);
		
		List<String> list = new ArrayList<String>();
		_list = list;
		
		String st = new String();
		
		_list.add(st);
		
		while (cffr.hasNext()) {
			char c = cffr.next();
			
			if (this.characterChecker(c)) {
				String s = _list.get(_list.size());
				StringBuffer sb = new StringBuffer(s);
				sb.insert(s.length() - 1, c);
			}
			
			else if(this.characterChecker(c) == false) {
				this.newString();
			}
		}
		
		for (int n = 0; n < _list.size(); n++) {
			String s = _list.get(n);
			
			if (hm.containsValue(s)) {
				hm.put(s, hm.get(s));
			}
			
			else if (hm.containsValue(s) == false) {
				hm.put(s, hm.size());
			}
		}
		
		return hm; 
	}
	
	private boolean characterChecker(char c) {
		if (c != ' ' && c != '\t' && c != '\n' && c != ',' && c != '.') {
			return true;
		}
		
		else {
			return false;
		}
	}
	
	private void newString() {
		String s = new String();
		_list.add(s);
	}

}

It still doesn't green-bar when I test them though :/ I'm getting out-of-bounds exceptions, but I can't seem to find what's wrong with my code...

Mark44 · Feb 10, 2012

Do you have any indication of where in your code you're getting the exceptions? That would be helpful information.

I'm guessing that the places to look are where you are accessing the list, such as here:

Code:

String s = _list.get(_list.size());

If you have a list with size() elements in it, the indexes run from 0 through size() - 1, so by attempting to access the element at index size(), you are out of bounds of the list.
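
To make the off-by-one concrete, here is a minimal sketch. The class name LastElementDemo and its helper methods are made up for illustration and are not part of the assignment. It shows that the last element sits at index size() - 1 and that asking for index size() throws an IndexOutOfBoundsException.

```java
import java.util.ArrayList;
import java.util.List;

class LastElementDemo {

    // Returns the last element; valid indexes run from 0 through size() - 1.
    static String last(List<String> list) {
        return list.get(list.size() - 1);
    }

    // Returns true if reading index size() throws IndexOutOfBoundsException.
    static boolean readingAtSizeThrows(List<String> list) {
        try {
            list.get(list.size());
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<String>();
        words.add("alpha");
        words.add("beta");
        System.out.println(last(words));                // beta
        System.out.println(readingAtSizeThrows(words)); // true
    }
}
```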

Caveat: I haven't written any Java code for lo, these many years, so I could be wrong here.

prosteve037 · Feb 11, 2012

Mark44 said:
Do you have any indication of where in your code you're getting the exceptions? That would be helpful information.

I'm guessing that the places to look are where you are accessing the list, such as here:
Code:
String s = _list.get(_list.size());
If you have a list with size() elements in it, the indexes run from 0 through size() - 1, so by attempting to access the element at index size(), you are out of bounds of the list.

Caveat: I haven't written any Java code for lo, these many years, so I could be wrong here.

Ahh yes, thank you. I changed s to hold the element at index _list.size() - 1, and the tests now seem to be giving feedback. The code still isn't accomplishing the task, however.

Maybe I'm not using the HashMap class correctly? Here I have the HashMap put the strings in the list; if the HashMap already has the string, I use the HashMap's put method to put the String key into the HashMap in the position where it already exists. However, this is where I think I may be thinking the wrong way.

Here's a shot of the method description for HashMap's put method:

It says there on the screenshot that the put method takes in a key and a value.

I thought that this was the index number of the key, but now I'm starting to think it's actually the number of times the key exists in the HashMap.

Am I thinking the wrong way again here? Or is the value the "count" of the specified key?
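
A small sketch of how put behaves may help here. The names PutDemo and increment are illustrative only. The second argument to put is whatever value should be associated with the key, so for a word counter it is the running count, and putting an existing key again replaces its old value.

```java
import java.util.HashMap;

class PutDemo {

    // Adds one to the count stored under word, starting from 1 for a new word.
    static void increment(HashMap<String, Integer> counts, String word) {
        if (counts.containsKey(word)) {
            counts.put(word, counts.get(word) + 1); // replaces the old count
        } else {
            counts.put(word, 1);
        }
    }

    public static void main(String[] args) {
        HashMap<String, Integer> counts = new HashMap<String, Integer>();
        increment(counts, "the");
        increment(counts, "cat");
        increment(counts, "the");
        System.out.println(counts.get("the")); // 2
        System.out.println(counts.get("cat")); // 1
        System.out.println(counts.size());     // 2
    }
}
```

Note that containsValue(s) asks whether s appears among the stored values (the Integers), while containsKey(s) is the lookup by word.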

Filip Larsen · Feb 11, 2012

I will recommend that you take a step back and reformulate (using pseudo-code if that is easier for you) the algorithm you want to implement. Your algorithm needs to build up words one character at a time and then store the count for each such word.

While you may of course implement such an algorithm in many ways you should be able to make an implementation that only uses a StringBuilder and a Map for state management. You probably also want to consider using one of the Character classification methods instead of your own characterChecker method.
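
As a rough illustration of the last suggestion, the sketch below compares a hand-written check in the style of characterChecker with Character.isLetterOrDigit from the standard library. The class and method names are invented for this example, and which definition of "word character" the assignment's tests expect is an assumption either way.

```java
class SeparatorDemo {

    // Hand-written check in the style of characterChecker from the thread.
    static boolean isWordCharManual(char c) {
        return c != ' ' && c != '\t' && c != '\n' && c != ',' && c != '.';
    }

    // Library-based alternative: letters and digits are word characters,
    // everything else (whitespace, punctuation) separates words.
    static boolean isWordCharLibrary(char c) {
        return Character.isLetterOrDigit(c);
    }

    public static void main(String[] args) {
        System.out.println(isWordCharManual('\r'));  // true: '\r' is not in the hand-written list
        System.out.println(isWordCharLibrary('\r')); // false
        System.out.println(isWordCharLibrary('a'));  // true
        System.out.println(isWordCharLibrary(';'));  // false
    }
}
```

The two checks disagree on characters such as carriage returns, semicolons, apostrophes and hyphens, so they are not interchangeable without checking the assignment's definition of a word.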

prosteve037 · Feb 11, 2012

Filip Larsen said:

While you may of course implement such an algorithm in many ways you should be able to make an implementation that only uses a StringBuilder and a Map for state management. You probably also want to consider using one of the Character classification methods instead of your own characterChecker method.

I wish I could use one of the Character classification methods instead of the one I wrote but unfortunately I can't since we haven't covered it in lecture

prosteve037 · Feb 12, 2012

Okay. I've rewritten a bit of it (the if statement in the while loop):

Code:

package hw3;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

import util.general.CharacterFromFileReader;

public class Homework3Class {
	
	private List<String> _list;
	
	public Homework3Class() {}
	
	public HashMap<String, Integer> wordCounter(String inputPath) {		
		HashMap<String, Integer> hm = new HashMap<String, Integer>();
		
		CharacterFromFileReader cffr = new CharacterFromFileReader(inputPath);
		
		List<String> list = new ArrayList<String>();
		_list = list;
		
		String st = new String();
		
		_list.add(st);
		
		while (cffr.hasNext()) {
			char c = cffr.next();
			
			int lastStringIndex = _list.size() - 1;
			
			if (this.characterChecker(c)) {
				String s = _list.get(lastStringIndex);
				String newString = s + c;
				_list.remove(lastStringIndex);
				_list.add(newString);
			}
			
			else if(this.characterChecker(c) == false) {
				this.newString();
			}
		}
		
		for (int n = 0; n < _list.size(); n++) {
			String s = _list.get(n);
						
			if (hm.containsKey(s)) {
				hm.put(s, hm.get(s) + 1);
			}
			
			else if (hm.containsKey(s) == false) {
				hm.put(s, 1);
			}
		}
		
		return hm; 
	}
	
	private boolean characterChecker(char c) {
		if (c != ' ' && c != '\t' && c != '\n' && c != ',' && c != '.') {
			return true;
		}
		
		else {
			return false;
		}
	}
	
	private void newString() {
		String s = new String();
		_list.add(s);
	}

}

All the test results are the same now so I know that I'm almost there. What it's saying, however, is confusing :P

It says that my computed answer (the returned HashMap) has the extra key ""...

I didn't understand what this meant. I thought, "That shouldn't be happening. "" isn't a character." So, I emailed my professor earlier today and he replied that the code may be counting the space between two separator characters as an empty string... ?

I'm so confused

Filip Larsen · Feb 13, 2012

Try to trace your code when there are two or more consecutive white-space characters in the input, or when your input ends with a white-space character. (Hint: you are adding an empty string to your list for every white-space character in the input).
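
The trace suggested above can be reproduced with a small sketch. It follows the same strategy as the posted loop (start a new, empty entry on every separator) but reads from a String instead of a file so that it runs on its own. The class name EmptyWordTrace and the method splitLikeThePost are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class EmptyWordTrace {

    static boolean isWordChar(char c) {
        return c != ' ' && c != '\t' && c != '\n' && c != ',' && c != '.';
    }

    // Same strategy as the posted loop: add a new empty entry on every separator.
    static List<String> splitLikeThePost(String input) {
        List<String> list = new ArrayList<String>();
        list.add("");
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            int last = list.size() - 1;
            if (isWordChar(c)) {
                list.set(last, list.get(last) + c);
            } else {
                list.add("");
            }
        }
        return list;
    }

    public static void main(String[] args) {
        // "a, b." has a comma followed by a space, and ends with a period.
        System.out.println(splitLikeThePost("a, b.")); // [a, , b, ]
    }
}
```

The two empty entries in the result are the ones that end up in the map under the key "".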

You should also know that your code is rather "clumsy" (like your "if-else-if" constructions) and wasteful of resources (like creating a new string for each input character and re-evaluating the same expression). If a person (as opposed to a test) is going to rate your code, it may pay off to clean it up first.

prosteve037 · Feb 13, 2012

Filip Larsen said:

Try to trace your code when there are two or more consecutive white-space characters in the input, or when your input ends with a white-space character. (Hint: you are adding an empty string to your list for every white-space character in the input).

Ahh okay, thank you! It works now

All it needed was a hm.remove(""); method call :P

Filip Larsen said:

You should also know that your code is rather "clumsy" (like your "if-else-if" constructions) and wasteful of resources (like creating a new string for each input character and re-evaluating the same expression). If a person (as opposed to a test) is going to rate your code, it may pay off to clean it up first.

You're absolutely right, I'm going to need to "optimize" my code somehow. I'm not able to change the CharacterFromFileReader class's definition so I'm not sure how much that limits my ability to make the code any better :P Any hints?

Filip Larsen · Feb 13, 2012

prosteve037 said:

Any hints?

Things you may want to consider (in no particular order):

Change statements matching if (expression) { ... } else if (expression == false) { ... } into if (expression) { ... } else { ... }.
Use a StringBuilder to build up each word instead of appending strings.
Insert each word directly into the map instead of collecting them in a list first.
If you choose to keep the list, define it as a local variable in the method that uses it and pass it around as a parameter where necessary, instead of keeping it as a member variable of your class. Having the list as a member variable makes instances of your class thread-unsafe for no particular reason, and unless you clear it before returning you also keep a lot of strings from being garbage collected. Of course, if your class has a lot of state to keep track of, or has to track state between multiple method invocations, then member variables are the way forward.
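
Put together, the hints above might look something like the following sketch. It is only one possible shape, not the expected solution. The names WordCounterSketch, flush and charsOf are invented here, and charsOf is a stand-in that lets the example run without a file. Since CharacterFromFileReader implements Iterator<Character>, an instance of it could be passed to countWords in the same way.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class WordCounterSketch {

    static boolean isWordChar(char c) {
        return c != ' ' && c != '\t' && c != '\n' && c != ',' && c != '.';
    }

    // Records the word held in the builder (if any) and empties the builder.
    static void flush(StringBuilder word, Map<String, Integer> counts) {
        if (word.length() == 0) {
            return; // consecutive separators: nothing to record
        }
        String key = word.toString();
        Integer old = counts.get(key);
        counts.put(key, old == null ? 1 : old + 1);
        word.setLength(0);
    }

    // Accepts any Iterator<Character>, such as the thread's CharacterFromFileReader.
    static HashMap<String, Integer> countWords(Iterator<Character> chars) {
        HashMap<String, Integer> counts = new HashMap<String, Integer>();
        StringBuilder word = new StringBuilder();
        while (chars.hasNext()) {
            char c = chars.next();
            if (isWordChar(c)) {
                word.append(c);
            } else {
                flush(word, counts);
            }
        }
        flush(word, counts); // the input may end without a trailing separator
        return counts;
    }

    // Hypothetical helper so the sketch runs without a file on disk.
    static Iterator<Character> charsOf(final String text) {
        return new Iterator<Character>() {
            private int i = 0;
            public boolean hasNext() { return i < text.length(); }
            public Character next() { return text.charAt(i++); }
            public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    public static void main(String[] args) {
        System.out.println(countWords(charsOf("the cat, the  hat.")));
    }
}
```

Because empty builders are never flushed into the map, the "" key does not appear and no hm.remove("") call is needed.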

prosteve037 · Feb 19, 2012

Thank you all for the help!

I received my next assignment in the class and I figured instead of starting an entirely new thread on it, I'd just post again on here. This works well because I think these two assignments are similar.

For this new assignment, the objective is to have the implementation return a HashMap that contains, as the values, the counts of the number of times that an author's name is read in a file. The file is passed in as an argument and read through using the same CharacterFromFileReader class that was pre-defined for us in the last assignment.

An author name is defined in this assignment as a contiguous sequence of characters that appear between the tags <AU> and </AU>.

Here's the code I have so far:

Code:

package hw4;

import java.util.HashMap;
import util.general.CharacterFromFileReader;

public class Homework4Class {

	private String _author;
	
	public Homework4Class() {}
	
	public HashMap<String, Integer> authorFinder(String inputPath) {
		HashMap<String, Integer> hm = new HashMap<String, Integer>();
		
		CharacterFromFileReader cffr = new CharacterFromFileReader(inputPath);
		
		int state = 0;
		
		while (cffr.hasNext()) {
			char c = cffr.next();
			
			switch (state) {
				case 0:
					if (c == '<') {state = 1;}
					
					break;
				
				case 1:
					if (c == 'A') {state = 2;}
					
					else if (c == '<') {}
					
					else {state = 0;}
					
					break;
				
				case 2:
					if (c == 'U') {state = 3;}
					
					else if (c == '<') {state = 1;}
					
					else {state = 0;}
					
					break;
					
				case 3:
					if (c == '>') {state = 4;}
					
					else if (c == '<') {state = 1;}
					
					else {state = 0;}
					
					break;
					
				case 4:
					if (c == '<') {state = 5;}
					
					else {_author += c;}
					
					break;
					
				case 5:
					if (c == '/') {state = 6;}
					
					else if (c == '<') {_author += c;}
					
					else {
						_author += "<" + c;
						state = 4;
					}
					
					break;
					
				case 6:
					if (c == 'A') {state = 7;}
					
					else if (c == '<') {
						_author += "/";
						state = 5;
					}
					
					else {
						_author += "/" + c;
						state = 3;
					}
					
					break;
					
				case 7:
					if (c == 'U') {state = 8;}
					
					else if (c == '<') {
						_author += "A";
						state = 5;
					}
					
					else {
						_author += "A" + c;
						state = 4;
					}
					
					break;
					
				case 8:
					if (c == '>') {state = 9;}
					
					else if (c == '<') {
						_author += "U";
						
						state = 5;
					}
					
					else {
						_author += "U" + c;
						state = 4;
					}
					
					break;
					
				case 9:
					if (hm.containsKey(_author)) {
						hm.put(_author, hm.get(_author) + 1);
					}
					
					if (!hm.containsKey(_author)) {
						hm.put(_author, 1);
					}
					
					if (c == '<') {
						state = 1;
						_author = "";
					}
					
					if (c != '<') {
						state = 0;
						_author = ""; 
					}
					
					break;
			}
		}
		
		return hm;
	}
}

Now just like the last assignment, the packaged assignment came with reference tests. Running these tests helped me identify the key problems I had with the code I'd written:

1.) The results of the failed reference tests show that my current algorithm doesn't take into account the possible instances in which we have an author name in the form of:

Example:
Charles Dickens</AU

Right now with my current algorithm, this author name would return a HashMap value of 1 for the key Charles DickensU

2.) My current code will also create the problem of the empty string, which I ran into in the last assignment as well.

----------------------------------------------------------------------------------------

My original plans to solve these problems were very inefficient and resembled my method for the last assignment; my thinking goes the same route, using another string field or an entire ArrayList to store all of the characters read from the input file.

Any help/hints would be awesome.

Thanks

Mark44 · Feb 19, 2012

I'm not sure I understand some of what you're asking. Are you saying that it's possible to have a malformed line in the file that looks like
.
.
.
Charles Dickens </AU
.
.
?

IOW, without the leading <AU> tag, and with an incomplete </AU> tag? That seems to be in conflict with what you wrote before:

prosteve037 said:

An author name is defined in this assignment as a contiguous sequence of characters that appear between the tags <AU> and </AU>.

Instead of storing all of the characters in the file in a single ArrayList, I would be inclined to store just a single line in the file (I'm assuming that the file consists of lines of text terminated by CR/LF character pairs).

haichau6990 · Feb 19, 2012

Hey Steve look at your case 8:
case 8:
if (c == '>') {state = 9;}

else if (c == '<') {
_author += "U";

state = 5;
}

else {
_author += "U" + c; //try to change to _author += "</AU";
state = 4;
}

And at your private variable:
private String _author; // change to private String _author = "";

prosteve037 · Feb 19, 2012

haichau6990 said:

Hey Steve look at your case 8:
case 8:
if (c == '>') {state = 9;}

else if (c == '<') {
_author += "U";

state = 5;
}

else {
_author += "U" + c; //try to change to _author += "</AU";
state = 4;
}

And at your private variable:
private String _author; // change to private String _author = "";

Unfortunately, this didn't work :/ Thanks though

Mark44 said:

I'm not sure I understand some of what you're asking. Are you saying that it's possible to have a malformed line in the file that looks like
.
.
.
Charles Dickens </AU
.
.
?

IOW, without the leading <AU> tag, and with an incomplete </AU> tag? That seems to be in conflict with what you wrote before:

Instead of storing all of the characters in the file in a single ArrayList, I would be inclined to store just a single line in the file (I'm assuming that the file consists of lines of text terminated by CR/LF character pairs).

Anything between the tags <AU> and </AU> is counted as an author.

So if the CharacterFromFileReader reads the line "Charles Dickens </AU" from the input file, it would count as an author.

Does that make sense?

haichau6990 · Feb 19, 2012

case 8:
if (c == '>') {state = 9;}

else if (c == '<') {
_author += "U"; // change to _author += "</AU"
// I forgot to change this line in the last post. Take the example <AU>Dickens</AU</AU>:
// the author name is supposed to be "Dickens</AU", and what follows it is "<".
// But once your code reaches that "<", it only adds "U" to the existing "Dickens".

state = 5;
}

else {
_author += "U" + c; //try to change to _author += "</AU";
state = 4;
}

Mark44 · Feb 19, 2012

prosteve037 said:

Unfortunately, this didn't work :/ Thanks though

Anything between the tags <AU> and </AU> is counted as an author.

So if the CharacterFromFileReader reads the line "Charles Dickens </AU" from the input file, it would count as an author.

Does that make sense?

I understand what you are saying, but how robust does your program need to be? Can't you assume that there is some error checking on the front end where the data is entered, and the data in the file is well-formed XML?

You haven't given the problem statement for this assignment, so I don't know how fault-tolerant your code needs to be, but it seems to me that all you should have to do is look for a pair of <AU> </AU> tags, and what's in between is the author. Same for the other fields.
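
A minimal sketch of that idea follows, under the assumption that the file's text has already been collected into a single String. The class name AuthorTagSketch and the method countAuthors are invented for illustration. It finds each <AU> ... </AU> pair with indexOf and counts whatever lies between them.

```java
import java.util.HashMap;

class AuthorTagSketch {

    private static final String OPEN = "<AU>";
    private static final String CLOSE = "</AU>";

    // Counts each author string found between <AU> and </AU> in text.
    static HashMap<String, Integer> countAuthors(String text) {
        HashMap<String, Integer> counts = new HashMap<String, Integer>();
        int from = 0;
        while (true) {
            int open = text.indexOf(OPEN, from);
            if (open < 0) {
                break; // no more opening tags
            }
            int start = open + OPEN.length();
            int close = text.indexOf(CLOSE, start);
            if (close < 0) {
                break; // unterminated tag: ignore the rest
            }
            String author = text.substring(start, close);
            Integer old = counts.get(author);
            counts.put(author, old == null ? 1 : old + 1);
            from = close + CLOSE.length();
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample = "<AU>Charles Dickens</AU> text <AU>Jane Austen</AU><AU>Charles Dickens</AU>";
        HashMap<String, Integer> counts = countAuthors(sample);
        System.out.println(counts.get("Charles Dickens")); // 2
        System.out.println(counts.get("Jane Austen"));     // 1
    }
}
```

Because the search is for the complete closing tag, an input such as <AU>Dickens</AU</AU> yields the author "Dickens</AU" without any special handling. An empty pair <AU></AU> would be recorded under the key "", so a length check would be needed if the tests reject that.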

prosteve037 · Feb 19, 2012

Mark44 said:

I understand what you are saying, but how robust does your program need to be? Can't you assume that there is some error checking on the front end where the data is entered, and the data in the file is well-formed XML?

You haven't given the problem statement for this assignment, so I don't know how fault-tolerant your code needs to be, but it seems to me that all you should have to do is look for a pair of <AU> </AU> tags, and what's in between is the author. Same for the other fields.

Nevermind, I fixed this problem!

Code:

package hw4;

import java.util.HashMap;
import util.general.CharacterFromFileReader;

public class Homework4Class {

	private String _author;
	
	private int _tally;
	
	public Homework4Class() {}
	
	public HashMap<String, Integer> authorFinder(String inputPath) {
		HashMap<String, Integer> hm = new HashMap<String, Integer>();
		
		CharacterFromFileReader cffr = new CharacterFromFileReader(inputPath);
		
		int state = 0;
		
		while (cffr.hasNext()) {
			char c = cffr.next();
			
			switch (state) {
				case 0:
					if (c == '<') {state = 1;}
					
					break;
				
				case 1:
					if (c == 'A') {state = 2;}
					
					else if (c == '<') {}
					
					else {state = 0;}
					
					break;
				
				case 2:
					if (c == 'U') {state = 3;}
					
					else if (c == '<') {state = 1;}
					
					else {state = 0;}
					
					break;
					
				case 3:
					if (c == '>') {state = 4;}
					
					else if (c == '<') {state = 1;}
					
					else {state = 0;}
					
					break;
					
				case 4:
					if (c == '<') {
						_tally = 1;
						state = 5;
					}
					
					else {_author += c;}
					
					break;
					
				case 5:
					if (c == '/') {
						_tally = 2;
						state = 6;
					}
					
					else if (c == '<') {stringModifier();}
					
					else {
						stringModifier();
						_author += c;
						state = 4;
					}
					
					break;
					
				case 6:
					if (c == 'A') {
						_tally = 3;
						state = 7;
					}
					
					else if (c == '<') {
						stringModifier();
						state = 5;
					}
					
					else {
						stringModifier();
						_author += c;
						state = 4;
					}
					
					break;
					
				case 7:
					if (c == 'U') {
						_tally = 4;
						state = 8;
					}
					
					else if (c == '<') {
						stringModifier();
						state = 5;
					}
					
					else {
						stringModifier();
						_author += c;
						state = 4;
					}
					
					break;
					
				case 8:
					if (c == '>') {state = 9;}
					
					else if (c == '<') {
						stringModifier();
						state = 5;
					}
					
					else {
						stringModifier();
						_author += c;
						state = 4;
					}
					
					break;
					
				case 9:
					if (hm.containsKey(_author)) {
						hm.put(_author, hm.get(_author) + 1);
					}
					
					if (!hm.containsKey(_author)) {
						hm.put(_author, 1);
					}
					
					if (c == '<') {
						state = 1;
						_author = "";
					}
					
					if (c != '<') {
						state = 0;
						_author = ""; 
					}
					
					break;
			}
		}
		
		return hm;
	}
	
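	private void stringModifier() {
		// Append the part of the closing tag that was read but turned out not to be a tag.
		// Each case needs its own break; without it, execution falls through into the cases below.
		switch (_tally) {
			case 1:
				_author += "<";
				break;
				
			case 2:
				_author += "</";
				break;
				
			case 3:
				_author += "</A";
				break;
				
			case 4:
				_author += "</AU";
				break;
		}
		
		_tally = 1;
	}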
}

I added another int field and wrote a private helper method that appended the _author string based on the value of that field.

Each time a character was found to be part of a possible closing tag, I set the int field to record how much of the tag had been read so far.

In other words, an int value of 1 would add "<" to the _author string, a value of 2 would add "</", a value of 3 would add "</A", and a value of 4 would add "</AU".

Now I have to deal with the problem of the null string.
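
If "null string" here means keys that start with the literal text null, one likely cause (an assumption, since the failing test output is not shown) is that _author is never initialized. A String field defaults to null, and appending to a null reference with += produces the characters "null" followed by the appended text. The class NullStringDemo below is a made-up illustration of that behavior.

```java
class NullStringDemo {

    // Appends c to s the same way `_author += c` does.
    static String append(String s, char c) {
        s += c;
        return s;
    }

    public static void main(String[] args) {
        String uninitialized = null; // default value of an unassigned String field
        String empty = "";
        System.out.println(append(uninitialized, 'C')); // nullC
        System.out.println(append(empty, 'C'));         // C
    }
}
```

Initializing the field to "" before the loop starts avoids the problem.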
