How Can I Parse Non-Uniform Substrings in VBA Excel?

  • Thread starter: Saladsamurai
  • Tags: Logic, Program
AI Thread Summary
The discussion concerns processing a MathML file, saved as a .txt file, using VBA in Excel. The user wants to split the file's contents, held as one continuous string, into an array of substrings: opening tags, text content, and closing tags. The approach so far loads the entire text file into a string and then copies it into an array with one character per element. The difficulty is finding the start and end positions of each substring, because not every substring is enclosed in "<" and ">". The user wants a way to check whether the character following a closing ">" is another opening "<" and, if it is not, to mark that position and find the next occurrence of "<". One reply recommends against hand-writing a parser and suggests an XML parser library instead, since MathML is a form of XML. The user, although open to alternative approaches, replies that this was not what they had in mind and asks what the recommendation means. Additional resources are provided to aid understanding of parsing concepts.
Saladsamurai
This is VBA Excel:

Here is what I am trying to do. I have a MathML file saved as a .txt file. It is similar to XML.

From the XML file, we have a bunch of text that looks something like:

<mname>Salad</mname><mrow>xyz</mrow>

I would ultimately like to have an array whose elements are the substrings:

Array(1) = <mname>
Array(2) = Salad
Array(3) = </mname>
...

So far, this is what I have done:

Load the entire text file into one continuous string.
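
For reference, the loading step could look something like the sketch below. The function name `ReadAllText` is hypothetical, and the code assumes the file uses a single-byte (ANSI) encoding; a UTF-8 file containing multi-byte characters would need a different approach.

```vba
Option Explicit

' Hypothetical helper: read a whole text file into one string.
' Assumes a single-byte (ANSI) encoded file that fits in memory.
Public Function ReadAllText(ByVal filePath As String) As String
    Dim fileNum As Integer
    Dim buffer As String

    fileNum = FreeFile                      ' next unused file number
    Open filePath For Binary Access Read As #fileNum
    buffer = Space$(LOF(fileNum))           ' size the buffer to the file length
    Get #fileNum, , buffer                  ' fill the buffer in one read
    Close #fileNum

    ReadAllText = buffer
End Function
```

Opening the file in Binary mode and filling a pre-sized buffer avoids reading line by line and keeps any line breaks in the string exactly as they are in the file.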

I have then fed the string into an array called ElementaryArray such that each individual character of the giant string is an element of the array.

Now I would like to sweep through the array, element by element, and determine the start and end positions of each substring.

If it were simply a bunch of substrings like <mmm><mmm><rrr><rrr><ooo><ooo> it would be easy enough. I could simply find the first "<" and then find its corresponding closing ">" and then restart the loop at the ">" position.

The problem is that not all of the strings start and end with the "<" & ">" characters.

I need a way to determine whether the character after a ">" is another "<". If it is not, I must mark that character's position and then find the next occurrence of "<", whose position is one past the end of the substring that does not start with a "<".

Does that all make sense? :smile: The tricky part is relating the two different cases via the counter so that I do not get any overlap.
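
One way to keep the two cases from overlapping is to use a single position counter that always points at the first unread character, and to let each case advance it to just past the token it found. The following is a minimal sketch of that idea, not a real XML parser: the names `SplitMarkup` and `DemoSplit` are hypothetical, and it assumes that "<" and ">" never appear inside attribute values, comments, or CDATA sections. It works directly on the string with `Mid$` and `InStr`, so the character array is not needed.

```vba
Option Explicit

' Hypothetical helper: split markup such as "<mname>Salad</mname>"
' into a collection of tags and text runs, in document order.
Public Function SplitMarkup(ByVal src As String) As Collection
    Dim tokens As New Collection
    Dim pos As Long        ' 1-based index of the first unread character
    Dim stopAt As Long     ' index of the last character of the current token

    pos = 1
    Do While pos <= Len(src)
        If Mid$(src, pos, 1) = "<" Then
            ' Case 1: a tag, running up to and including the next ">"
            stopAt = InStr(pos, src, ">")
            If stopAt = 0 Then stopAt = Len(src)    ' unterminated tag: take the rest
        Else
            ' Case 2: text, running up to but not including the next "<"
            stopAt = InStr(pos, src, "<") - 1
            If stopAt < 0 Then stopAt = Len(src)    ' no more tags: take the rest
        End If
        tokens.Add Mid$(src, pos, stopAt - pos + 1)
        pos = stopAt + 1                            ' resume just past this token
    Loop

    Set SplitMarkup = tokens
End Function

' Example use: print each token from the sample line in the Immediate window.
Public Sub DemoSplit()
    Dim token As Variant
    For Each token In SplitMarkup("<mname>Salad</mname><mrow>xyz</mrow>")
        Debug.Print token
    Next token
End Sub
```

Because both branches end by setting `pos = stopAt + 1`, every character belongs to exactly one token. Note that whitespace or line breaks between tags would come out as text tokens of their own, so they may need to be filtered afterwards.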

Any ideas?
 
I am of course also open to suggestions of alternative approaches to this problem.
 
Don't try to hack up a parser by yourself. Use an XML parser library (MathML is XML).
 
mXSCNT said:
Don't try to hack up a parser by yourself. Use an XML parser library (MathML is XML).

Is that your suggestion? Because that was not what I had in mind when I said I was open to suggestions. :wink:

Also, what does that mean?
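
To illustrate what using an XML parser library could mean in VBA, here is a minimal sketch. It assumes a Windows machine with MSXML 6.0 installed and uses late binding, so no project reference is required. The procedure names `WalkNode` and `DemoXmlParser` are hypothetical, and the sample line is wrapped in a `<math>` root element because an XML document needs exactly one root.

```vba
Option Explicit

' Hypothetical helper: print each element name and text node, indented by depth.
Private Sub WalkNode(ByVal node As Object, ByVal depth As Long)
    Const NODE_ELEMENT As Long = 1      ' DOM node type for elements
    Const NODE_TEXT As Long = 3         ' DOM node type for text
    Dim child As Object

    Select Case node.nodeType
        Case NODE_ELEMENT
            Debug.Print String$(depth * 2, " ") & "<" & node.nodeName & ">"
        Case NODE_TEXT
            Debug.Print String$(depth * 2, " ") & node.nodeValue
    End Select

    ' Recurse into children; text nodes simply have none.
    For Each child In node.childNodes
        WalkNode child, depth + 1
    Next child
End Sub

Public Sub DemoXmlParser()
    Dim doc As Object
    Set doc = CreateObject("MSXML2.DOMDocument.6.0")   ' assumes MSXML 6.0 is available
    doc.async = False                                  ' parse synchronously

    If doc.loadXML("<math><mname>Salad</mname><mrow>xyz</mrow></math>") Then
        WalkNode doc.documentElement, 0
    Else
        Debug.Print "Parse error: " & doc.parseError.reason
    End If
End Sub
```

The parser does the work of locating tags and text, and it reports malformed input through `parseError` instead of silently producing wrong substrings. To read from a file rather than a string, `doc.Load` with a file path can be used in place of `doc.loadXML`.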
 