How Can I Parse Non-Uniform Substrings in VBA Excel?

  • Thread starter: Saladsamurai
  • Tags: Logic Program

Discussion Overview

The discussion revolves around parsing non-uniform substrings from a MathML file saved as a .txt file using VBA in Excel. Participants explore methods to extract specific elements from a string representation of XML-like data.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Exploratory

Main Points Raised

  • One participant describes their approach of loading a MathML text file into a string and then into an array, aiming to identify start and end positions of substrings based on the presence of "<" and ">".
  • Another participant suggests using an XML parser library instead of manually parsing the data, emphasizing that MathML is a form of XML.
  • A later reply questions the suggestion of using an XML parser, indicating a preference for alternative methods and seeking clarification on the suggestion itself.
  • Additional links are provided by another participant to resources on parsing, but the relevance of these links to the original question is not clarified.

Areas of Agreement / Disagreement

Participants do not reach a consensus on the best approach to parsing the data, with some advocating for manual parsing and others suggesting the use of an XML parser library. The discussion remains unresolved regarding the preferred method.

Contextual Notes

Participants express uncertainty about the implications of using an XML parser versus manual parsing, and the discussion does not clarify the limitations or assumptions underlying the proposed methods.

Saladsamurai
This is VBA Excel:

Here is what I am trying to do. I have a MathML file saved as a .txt file. It is similar to XML.

From the XML file, we have a bunch of text that looks something like:

<mname>Salad</mname><mrow>xyz</mrow>

I would ultimately like to have an array whose elements are the substrings:

Array(1) = <mname>
Array(2) = Salad
Array(3) = </mname>
...

So far what I have done is this:

Load the entire text file into one continuous string.

I have then fed the string into an array called ElementaryArray such that each individual character of the giant string is an element of the array.
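
For reference, the loading step described above might look roughly like the sketch below. It assumes an ANSI-encoded text file, and the names ReadAllText and ToCharArray are hypothetical helpers, not anything from the original post. Since Mid$ can read any single character of a string directly, the character array is optional.

```vba
Option Explicit

' Hypothetical helper: reads a whole text file into one string.
' Assumes an ANSI-encoded file; a UTF-8 or UTF-16 file needs different handling.
Function ReadAllText(ByVal filePath As String) As String
    Dim fileNum As Integer
    fileNum = FreeFile
    Open filePath For Binary Access Read As #fileNum
    ReadAllText = Space$(LOF(fileNum))   ' pre-size the buffer to the file length
    Get #fileNum, , ReadAllText          ' fill the buffer in one read
    Close #fileNum
End Function

' Hypothetical helper: copies each character of s into a 1-based array.
' Returns an unallocated array when s is empty.
Function ToCharArray(ByVal s As String) As String()
    Dim chars() As String
    Dim i As Long
    If Len(s) = 0 Then Exit Function
    ReDim chars(1 To Len(s))
    For i = 1 To Len(s)
        chars(i) = Mid$(s, i, 1)
    Next i
    ToCharArray = chars
End Function
```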

Now I would like to sweep through the array, element by element, and determine the start and end positions of each substring.

If it were simply a bunch of substrings like <mmm><mmm><rrr><rrr><ooo><ooo> it would be easy enough. I could simply find the first "<" and then find its corresponding closing ">" and then restart the loop at the ">" position.

The problem is that not all of the strings start and end with the "<" & ">" characters.

I need a way to determine whether the character after a ">" is another "<" or not. If it is not another "<", I must mark that character's position and then find the next occurrence of "<", which will be the end position (+1) of the substring that does not start with a "<".

Does that all make sense? :smile: The tricky part is relating the two different cases via the counter such that I do not get any overlap.

Any ideas?
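
One possible way to handle both cases with a single counter is sketched below. It works on the string directly with InStr and Mid$ rather than on a character array, and the names TokenizeMarkup and DemoTokenize are made up for illustration. At each step the character at the current position decides which case applies, and the position is then advanced to just past the token that was consumed, so the tokens cannot overlap.

```vba
Option Explicit

' Splits markup text into tags ("<...>") and the text runs between them.
Function TokenizeMarkup(ByVal src As String) As Collection
    Dim tokens As New Collection
    Dim pos As Long, ltPos As Long, gtPos As Long

    pos = 1
    Do While pos <= Len(src)
        If Mid$(src, pos, 1) = "<" Then
            ' Tag token: runs up to and including the next ">".
            gtPos = InStr(pos, src, ">")
            If gtPos = 0 Then gtPos = Len(src)      ' unterminated tag: take the rest
            tokens.Add Mid$(src, pos, gtPos - pos + 1)
            pos = gtPos + 1
        Else
            ' Text token: runs up to, but not including, the next "<".
            ltPos = InStr(pos, src, "<")
            If ltPos = 0 Then ltPos = Len(src) + 1  ' no more tags: take the rest
            tokens.Add Mid$(src, pos, ltPos - pos)
            pos = ltPos
        End If
    Loop

    Set TokenizeMarkup = tokens
End Function

Sub DemoTokenize()
    Dim tok As Variant
    For Each tok In TokenizeMarkup("<mname>Salad</mname><mrow>xyz</mrow>")
        Debug.Print tok
    Next tok
End Sub
```

For the sample line from the post, this yields six tokens: <mname>, Salad, </mname>, <mrow>, xyz, </mrow>. Note that it is only a sketch: line breaks or spaces between tags come out as text tokens of their own, and comments or CDATA sections that contain ">" are not handled.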
 
I am of course also open to suggestions of alternative approaches to this problem.
 
Don't try to hack up a parser by yourself. Use an XML parser library (MathML is XML).
 
mXSCNT said:
Don't try to hack up a parser by yourself. Use an XML parser library (MathML is XML).

Is that your suggestion? Because that was not what I had in mind when I said I was open to suggestions. :wink:

Also, what does that mean?
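
To illustrate what "use an XML parser library" could mean in VBA: Windows ships with the MSXML library, which can be created from VBA and does the tag matching itself. The sketch below is only an assumption about how that suggestion might be applied here. It wraps the fragment from the first post in a single <math> root element, because an XML document must have exactly one root, and DemoXmlParser is a made-up name.

```vba
Option Explicit

Sub DemoXmlParser()
    Dim doc As Object
    Dim node As Object

    ' Late binding, so no project reference has to be set.
    Set doc = CreateObject("MSXML2.DOMDocument.6.0")
    doc.async = False

    ' Assumption: the sample fragment wrapped in one <math> root element.
    If Not doc.loadXML("<math><mname>Salad</mname><mrow>xyz</mrow></math>") Then
        Debug.Print "Parse error: " & doc.parseError.reason
        Exit Sub
    End If

    ' Walk the children of the root and print each tag name with its text.
    For Each node In doc.documentElement.childNodes
        Debug.Print node.nodeName & " -> " & node.Text
    Next node
End Sub
```

Run against that wrapped fragment, this should print "mname -> Salad" and "mrow -> xyz" in the Immediate window. A real MathML file may carry a DOCTYPE declaration or namespaces, which can need extra settings on the parser before it loads.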
 
