How Can I Parse Non-Uniform Substrings in VBA Excel?

Saladsamurai · Aug 9, 2009

This is for Excel VBA.

Here is what I am trying to do. I have a MathML file saved as a .txt file. It is similar to XML.

From the XML file, we have a bunch of text that looks something like:

<mname>Salad</mname><mrow>xyz</mrow>

I would ultimately like to have an array whose elements are the substrings:

Array(1) = <mname>
Array(2) = Salad
Array(3) = </mname>
...

So far, this is what I have done:

Load the entire text file into one continuous string.

I have then fed the string into an array called ElementaryArray such that each individual character of the giant string is an element of the array.
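For reference, the two loading steps described above could look roughly like the sketch below. The helper names ReadAllText and ToCharArray are made up for illustration, and the sketch assumes the file is plain ANSI/ASCII text and is not empty.

```vba
Option Explicit

' Reads a whole text file into one string.
' Hypothetical helper; assumes ANSI/ASCII content.
Function ReadAllText(ByVal filePath As String) As String
    Dim fileNum As Integer
    Dim buffer As String

    fileNum = FreeFile
    Open filePath For Binary Access Read As #fileNum
    buffer = Space$(LOF(fileNum))   ' pre-size the buffer to the file length
    Get #fileNum, , buffer          ' fill the buffer in a single read
    Close #fileNum

    ReadAllText = buffer
End Function

' Copies each character of a non-empty string into a 1-based array.
' Hypothetical helper; the caller must not pass an empty string.
Function ToCharArray(ByVal source As String) As String()
    Dim chars() As String
    Dim i As Long

    ReDim chars(1 To Len(source))
    For i = 1 To Len(source)
        chars(i) = Mid$(source, i, 1)
    Next i

    ToCharArray = chars
End Function
```

The character array is optional: Mid$ and InStr can index the string directly, so the scan can run on the string itself.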

Now I would like to sweep through the array, element by element, and determine the start and end positions of each substring.

If it were simply a bunch of substrings like <mmm><mmm><rrr><rrr><ooo><ooo> it would be easy enough. I could simply find the first "<" and then find its corresponding closing ">" and then restart the loop at the ">" position.

The problem is that not all of the strings start and end with the "<" & ">" characters.

I need a way to determine whether the character after a ">" is another "<" or not. If it is not another "<", I must mark that character's position and then find the next occurrence of "<", which will be the end position (+1) of the substring that does not start with a "<". Does that all make sense?

The tricky part is relating the two different cases via the counter such that I do not get any overlap.
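One way to keep the two cases from overlapping is to use a single position counter that always jumps to the character right after the token just read. The sketch below is only an illustration of that idea: the function name TokenizeMarkup is made up, it works on the string directly with InStr and Mid$ rather than on the character array, and it assumes no ">" appears inside a tag (so comments, CDATA sections, and quoted attribute values containing ">" are not handled).

```vba
Option Explicit

' Splits markup into tag tokens ("<...>") and the text runs between them.
' Returns a 1-based Collection of strings. Hypothetical helper.
Function TokenizeMarkup(ByVal source As String) As Collection
    Dim tokens As New Collection
    Dim pos As Long        ' start of the token currently being read
    Dim stopAt As Long     ' last character of that token

    pos = 1
    Do While pos <= Len(source)
        If Mid$(source, pos, 1) = "<" Then
            ' Case 1: a tag runs through its closing ">"
            stopAt = InStr(pos, source, ">")
            If stopAt = 0 Then stopAt = Len(source)    ' unterminated tag: take the rest
        Else
            ' Case 2: plain text runs up to, but not including, the next "<"
            stopAt = InStr(pos, source, "<") - 1
            If stopAt < pos Then stopAt = Len(source)  ' no further tag: take the rest
        End If

        tokens.Add Mid$(source, pos, stopAt - pos + 1)
        pos = stopAt + 1   ' resume right after this token, so tokens never overlap
    Loop

    Set TokenizeMarkup = tokens
End Function
```

Calling TokenizeMarkup("<mname>Salad</mname><mrow>xyz</mrow>") yields six items: the four tags plus "Salad" and "xyz". Any line breaks or spaces between tags in the real file would come back as text tokens of their own and would need to be filtered out.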

Any ideas?

Saladsamurai · Aug 9, 2009

I am of course also open to suggestions of alternative approaches to this problem.

mXSCNT · Aug 9, 2009

Don't try to hack up a parser by yourself. Use an XML parser library (MathML is XML).

Saladsamurai · Aug 9, 2009

mXSCNT said:

Don't try to hack up a parser by yourself. Use an XML parser library (MathML is XML).

Is that your suggestion? Because that was not what I had in mind when I said I was open to suggestions.

Also, what does that mean?

mXSCNT · Aug 9, 2009

What does what mean? Maybe a couple of links will help:
http://en.wikipedia.org/wiki/Parsing
http://msdn.microsoft.com/en-us/library/aa163921(office.10).aspx
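As a rough illustration of the parser-library suggestion, the sketch below uses the MSXML DOM through late binding, so no project reference is needed. It assumes MSXML 6.0 is installed (it normally is on Windows). The procedure names WalkNode and ParseWithMsxml are made up, and the <math> root element is added here only because the sample fragment from the question has no single root.

```vba
Option Explicit

' Prints every element name and text node in document order,
' indented by nesting depth. Hypothetical helper.
Sub WalkNode(ByVal node As Object, ByVal depth As Long)
    Dim child As Object

    Select Case node.nodeType
        Case 1  ' NODE_ELEMENT
            Debug.Print Space$(2 * depth) & "<" & node.nodeName & ">"
        Case 3  ' NODE_TEXT
            Debug.Print Space$(2 * depth) & node.nodeValue
    End Select

    For Each child In node.childNodes
        WalkNode child, depth + 1
    Next child
End Sub

Sub ParseWithMsxml()
    Dim doc As Object
    Set doc = CreateObject("MSXML2.DOMDocument.6.0")
    doc.async = False   ' finish parsing before loadXML returns

    ' The wrapping <math> root is an assumption for this sample.
    If Not doc.loadXML("<math><mname>Salad</mname><mrow>xyz</mrow></math>") Then
        Debug.Print "Parse error: " & doc.parseError.reason
        Exit Sub
    End If

    WalkNode doc.documentElement, 0
End Sub
```

To read the saved file instead of a literal string, doc.Load with the file path can replace doc.loadXML. If the real MathML file starts with a DOCTYPE declaration, MSXML 6.0 may reject it under its default settings, and parseError.reason will report why.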
