Java I want to better write this JavaScript procedure

AI Thread Summary
The discussion revolves around optimizing a JavaScript function, text2words, which extracts words from a string based on character definitions. Key points include the inefficiency of adding characters to strings one at a time, which can lead to performance issues due to dynamic memory allocation. Suggestions include using a substring approach instead of concatenation to improve efficiency. The is_letter function's limitations are highlighted, particularly its inability to handle Unicode characters effectively, suggesting a need for a more robust solution. The conversation also touches on the immutability of JavaScript strings, explaining that each addition creates a new string, which can be costly in terms of memory and processing time. Overall, while optimization is acknowledged as interesting, the practicality of such efforts is questioned, especially for modern computing capabilities.
Jamin2112
Messages
973
Reaction score
12
As you may have figured out, I'm obsessed with spending exorbitant amounts of time making optimizations to my code. Can anyone help me cut down on the number of comparisons in the following text2words function? The intent of the function should be easy to figure out from the comments.

Code:
		function is_letter(c)
		{
		    /* Returns whether a character c is in the set {'a', 'b', ..., 'z', 'A', 'B', ..., 'Z'} 
		    */
		    return ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'));
		    
		}
		
		function text2words(S)
		{
		    /* Given a string S, return an array A of all the words
		       in S, a word being defined as a sequence of consecutive
		       characters in the set {'a', 'b', ..., 'z', 'A', 'B', ..., 'Z'}.
		    */
		    var A = new Array();
		    var thisword = new String();
		    var i = 0, j = S.length;
		    while (i < j)
		    {
		       if (is_letter(S[i]))
		       {
		           while (i < j && is_letter(S[i]))
                        thisword += S[i++];
		           A.push(thisword);
		           thisword = "";
		       }
		       else
		       {
		           ++i;
		       }
		    }
		    return A;
		}
 
Technology news on Phys.org
Intellectually, this kind of optimization can be interesting, but in practical terms, unless this code is going to run on ZILLIONS of records, ever night and needs to be done by 6am, you are wasting time optimizing it, given the speed of modern computers. I used to do exactly the same thing just on general principles 'cause I grew up with computers when they were slow, but I gave it up years ago as a waste of time.
 
I agree optimizing code can be rather pointless, but the best trick is to learn to write reasonably optimal code first time around. Most programmers of a certain age learned to do that from necessity!

Adding characters to the string "thisword" one at a time could be slow, and hammer dynamic memory allocation if the implementation isn't very clever. It doesn't complicate the code much to search for the end of the word and then deal with the substring of S[] just once.

If your isletter() function is only working on 8-bit characters, the quickest way is to set up an array of true/false flags with 256 elements, instead of doing a function call, up to 4 comparisons, and some logical operations.

If you are reading Unicode text, your isletter function is completely broken anyway. There are of "letters" that are not encoded in 7-bit ascii, even in European languages.
 
AlephZero said:
Adding characters to the string "thisword" one at a time could be slow

What faster way is there of adding characters to the end of a string? JavaScript's string class doesn't have any kind of push() function.
 
Jamin2112 said:
That accomplishes the same task, but why is it more efficient than string addition? Is the implementation different?

Javascript strings are effectively immutable (I say "effectively" because pedants might jump on a definition of "immutable" which doesn't quite apply to Javascriot.)

So every time you add one character to a string, you actually create a new string. For short strings, you lose because allocating the new memory area is expensive compared with copying a few bytes of data. For long strings, you lose because you are repeatedly copying large amounts of data, which not only eats up CPU clock cycles, but also eats up the CPU's fast memory cache.
 

Similar threads

Replies
9
Views
1K
Replies
4
Views
1K
Replies
10
Views
3K
Replies
9
Views
2K
Replies
6
Views
3K
Replies
3
Views
2K
Replies
3
Views
3K
Back
Top