HTML/CSS Parsing HTML and Searching Text

  • Thread starter: NerseC
  • Tags: Html, Text
AI Thread Summary
To parse HTML and search for specific text, such as "XXX," within any HTML tags, users can utilize Java, JavaScript, or PHP, all of which have DOM parsers available. The discussion highlights the need for clarity on where the HTML is sourced from and the intended use of the CSS attributes once retrieved. Python is suggested as a more straightforward option for this task, particularly with libraries like Beautiful Soup and regular expressions. Understanding the Document Object Model (DOM) is emphasized as essential for effectively navigating and manipulating HTML structures. Ultimately, the choice of programming language will depend on the specific requirements and context of the task.
NerseC
I need to parse HTML and search for some text. For example, a text containing XXX (just random input) needs to be searched in HTML. I need to parse HTML, search this XXX and get its CSS attributes. However, this XXX text can be in any HTML tags.


Java/JavaScript/PHP are all available for this task. Can anybody help me on this?
 
Nerse said:
I need to parse HTML and search for some text. For example, a text containing XXX (just random input) needs to be searched in HTML. I need to parse HTML, search this XXX and get its CSS attributes. However, this XXX text can be in any HTML tags.


Java/JavaScript/PHP are all available for this task. Can anybody help me on this?

Can you clarify your requirements a little? Where is the HTML that you wish to search? Where are you outputting the results?
 
<html>
<head>
<title>asd
</title>
</head>
<body>
<div class="abc"> xxx</div>
<div class="yyy"> foo</div>
<div class="zzz"> zoo</div>
</body>
</html>

For example, this is the HTML file. I'm searching for xxx, but I don't know whether it is in a div, a span, or an a element. In this example, it is in the div with class abc. However, it could be in any HTML tag. I need to find xxx and get its CSS styles (e.g., bold, italic, font size).
 
It's a pity you're not using Python. This is something that could be solved trivially using a combination of Python, regular expressions, and Beautiful Soup.
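Here is a minimal sketch of that idea. To keep it runnable without installing anything, it swaps Beautiful Soup for the standard library's html.parser, so it is not the Beautiful Soup approach itself. The names TextFinder, inline_styles, and VOID_TAGS are made up for this example, and the extra span line in SAMPLE is a hypothetical addition to the sample from the thread. It only reports the enclosing tag, its class, and any inline style attribute; working out which stylesheet rules apply to a class would need a CSS engine or a browser.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# The sample from the thread (tidied), plus one hypothetical <span> with an
# inline style, added here only to show the style parsing.
SAMPLE = """<html>
<head><title>asd</title></head>
<body>
<div class="abc"> xxx</div>
<div class="yyy"> foo</div>
<p>intro <span style="font-weight: bold; font-size: 12px">XXX again</span></p>
</body>
</html>"""


class TextFinder(HTMLParser):
    """Record the innermost enclosing element of each text run containing `needle`."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle.lower()  # case-insensitive search
        self.stack = []               # currently open elements as (tag, attrs)
        self.matches = []             # (tag, attrs) of elements whose text matched

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this name; ignore strays.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.stack and self.needle in data.lower():
            self.matches.append(self.stack[-1])


def inline_styles(attrs):
    """Split an inline style attribute into a {property: value} dict."""
    items = (attrs.get("style") or "").split(";")
    pairs = (item.split(":", 1) for item in items)
    return {p[0].strip(): p[1].strip() for p in pairs if len(p) == 2}


finder = TextFinder("xxx")
finder.feed(SAMPLE)
finder.close()
for tag, attrs in finder.matches:
    print(tag, attrs.get("class"), inline_styles(attrs))
```

The parser keeps a stack of open elements, so whatever is on top when matching text arrives is the innermost tag containing it. That is why it does not matter whether xxx sits in a div, a span, or an a.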
 
Nerse said:
<html>
<head>
<title>asd
</title>
</head>
<body>
<div class="abc"> xxx</div>
<div class="yyy"> foo</div>
<div class="zzz"> zoo</div>
</body>
</html>

For example, this is the HTML file. I'm searching for xxx, but I don't know whether it is in a div, a span, or an a element. In this example, it is in the div with class abc. However, it could be in any HTML tag. I need to find xxx and get its CSS styles (e.g., bold, italic, font size).

That doesn't really answer my questions. Are you writing the HTML files yourself? Are they given to you in a directory somewhere, or are you getting them from one or more websites? What do you plan to do with the CSS info once you find it?

The answers to these questions will help you choose between Java, JavaScript, and PHP.
 
Each language has a DOM parser, and a quick Google search turns up many tutorials on how to read XML or HTML tags programmatically.

If you don't know what the DOM is yet, that's the first thing you should find out.
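To give a rough idea of what working with the DOM looks like, here is a small sketch in Python using xml.dom.minidom from the standard library. It is only an illustration: minidom accepts well-formed XML only, so it handles a tidy file like the sample in this thread but would reject much real-world HTML. The function name find_text_parents is made up for this example. The same walk (visit child nodes, check text nodes, look at the parent element) carries over to the DOM APIs in Java, JavaScript, and PHP.

```python
from xml.dom import minidom

# The sample from the thread, tidied so that it is well-formed XML.
SAMPLE = """<html>
<head><title>asd</title></head>
<body>
<div class="abc"> xxx</div>
<div class="yyy"> foo</div>
<div class="zzz"> zoo</div>
</body>
</html>"""


def find_text_parents(node, needle):
    """Yield the parent element of every text node that contains `needle`."""
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            if needle in child.data:
                yield node  # `node` is the parent of this text node
        else:
            # Recurse into elements; other node types have no children.
            yield from find_text_parents(child, needle)


doc = minidom.parseString(SAMPLE)
for element in find_text_parents(doc, "xxx"):
    print(element.tagName, element.getAttribute("class"))
```

In the DOM, the text xxx is its own node and the div is its parent. Once you have the parent element you can read its tag name and attributes, whatever kind of tag it turns out to be.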
 