HTML/CSS Parsing HTML and Searching Text

  • Thread starter: NerseC
  • Tags: Html, Text
AI Thread Summary
To parse HTML and search for specific text, such as "XXX," within any HTML tags, users can utilize Java, JavaScript, or PHP, all of which have DOM parsers available. The discussion highlights the need for clarity on where the HTML is sourced from and the intended use of the CSS attributes once retrieved. Python is suggested as a more straightforward option for this task, particularly with libraries like Beautiful Soup and regular expressions. Understanding the Document Object Model (DOM) is emphasized as essential for effectively navigating and manipulating HTML structures. Ultimately, the choice of programming language will depend on the specific requirements and context of the task.
NerseC
I need to parse HTML and search it for some text. For example, I want to find text containing XXX (an arbitrary placeholder) and then get the CSS attributes that apply to it. The XXX text can appear inside any HTML tag.

Java, JavaScript, and PHP are all available for this task. Can anybody help me with this?
 
Can you clarify your requirements a little? Where is the HTML that you wish to search? Where are you outputting the results?
 
<html>
<head>
<title>asd
</title>
</head>
<body>
<div class ="abc"> xxx</div>
<div class ="yyy"> foo</div>
<div class ="zzz"> zoo</div>
</body>
</html>

For example, this is the HTML file. I'm searching for xxx, but I don't know whether it is in a div, a span, or an a element. In this example it is in the div with class abc, but it could be in any HTML tag. I need to find xxx and get its CSS styles (e.g. bold, italic, font size).
 
It's a pity you're not using Python. This is something that could be solved trivially using a combination of Python, regular expressions, and Beautiful Soup.
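
As a rough sketch of what that could look like in Python: the version below deliberately uses the standard library's html.parser instead of Beautiful Soup, so it runs without installing anything. The names TextFinder and find_text are made up for this illustration, and the match is a plain case-insensitive substring test (a regular expression could be swapped in). It reports the innermost tag that directly contains the text, together with that tag's attributes.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

SAMPLE = """<html>
<head>
<title>asd
</title>
</head>
<body>
<div class ="abc"> xxx</div>
<div class ="yyy"> foo</div>
<div class ="zzz"> zoo</div>
</body>
</html>"""


class TextFinder(HTMLParser):
    """Hypothetical helper: records the tag enclosing each matching text run."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle.lower()
        self.stack = []    # currently open elements as (tag, attrs) pairs
        self.matches = []  # (tag, attrs) for each element whose text matches

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.stack and self.needle in data.lower():
            self.matches.append(self.stack[-1])


def find_text(html, needle):
    finder = TextFinder(needle)
    finder.feed(html)
    finder.close()
    return finder.matches


matches = find_text(SAMPLE, "xxx")
for tag, attrs in matches:
    print(tag, attrs)  # prints: div {'class': 'abc'}
```

Note that this only gives you the tag and its attributes (here the class abc). What bold, italic, or font size that class actually means still has to be looked up in the stylesheet.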
 
That doesn't really answer my questions. Are you writing the HTML files yourself? Are they given to you in a directory somewhere, or are you getting them from one or more websites? What do you plan to do with the CSS info once you find it?

The answers to these questions will help you choose between Java, JavaScript, and PHP.
 
Each language has a DOM parser, and a quick Google search turns up many tutorials on how to read XML or HTML tags programmatically.

If you don't know what the DOM is yet, that's the first thing you should find out.
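
To give a feel for the DOM idea, here is a minimal sketch in Python using xml.dom.minidom. Big assumption: minidom is an XML parser, so this only works because the sample file above happens to be well-formed; real-world HTML usually needs an HTML-aware parser. The helper names (iter_text_nodes, ancestors, describe) are invented for this example. The point is just that the document becomes a tree, and once you find the text node you can walk up through its parents.

```python
from xml.dom import minidom

SAMPLE = """<html>
<head>
<title>asd
</title>
</head>
<body>
<div class ="abc"> xxx</div>
<div class ="yyy"> foo</div>
<div class ="zzz"> zoo</div>
</body>
</html>"""


def iter_text_nodes(node):
    """Yield every text node below `node`, depth first."""
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            yield child
        else:
            yield from iter_text_nodes(child)


def ancestors(node):
    """Return the enclosing elements, from the root down to the direct parent."""
    chain = []
    current = node.parentNode
    while current is not None and current.nodeType == current.ELEMENT_NODE:
        chain.append(current)
        current = current.parentNode
    return list(reversed(chain))


def describe(element):
    """Format an element as tag.class, or just tag if it has no class."""
    cls = element.getAttribute("class")  # empty string when absent
    return f"{element.tagName}.{cls}" if cls else element.tagName


doc = minidom.parseString(SAMPLE)
paths = []
for text in iter_text_nodes(doc.documentElement):
    if "xxx" in text.data:
        paths.append(" > ".join(describe(el) for el in ancestors(text)))

print(paths)  # prints: ['html > body > div.abc']
```

The ancestor chain matters for the CSS part of the question, because styles can come from any enclosing element, not only the direct parent. If you need the final computed values rather than the raw class and style attributes, that is normally done in a browser, where JavaScript's getComputedStyle is available.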
 