HTML/CSS Parsing HTML and Searching Text

  • Thread starter: NerseC
  • Tags: Html, Text
The thread asks how to parse HTML, find a given piece of text (such as "XXX") in whichever tag contains it, and retrieve that element's CSS attributes. Java, JavaScript, and PHP, the languages available to the poster, all have DOM parsers that can do this. Replies ask where the HTML comes from and what the CSS information will be used for, suggest Python with Beautiful Soup and regular expressions as a more straightforward option, and stress that understanding the Document Object Model (DOM) is essential for navigating and manipulating HTML. The choice of language ultimately depends on the specific requirements and context of the task.
NerseC
I need to parse HTML and search it for some text. For example, a text containing XXX (just an arbitrary input) needs to be found in the HTML, and then I need to get its CSS attributes. However, this XXX text can be inside any HTML tag.


Java, JavaScript, and PHP are all available for this task. Can anybody help me with this?
 
Can you clarify your requirements a little? Where is the HTML that you wish to search? Where are you outputting the results?
 
<html>
<head>
<title>asd
</title>
</head>
<body>
<div class ="abc"> xxx</div>
<div class ="yyy"> foo</div>
<div class ="zzz"> zoo</div>
</body>
</html>

For example, this is the HTML file. I'm searching for xxx, but I don't know whether it is in a div, a span, or an a tag. In this example, it is in the div with class abc, but it could be in any HTML tag. I need to find xxx and get its CSS styles (e.g. bold, italic, font size).
 
It's a pity you're not using Python. This is something that could be solved trivially using a combination of Python, regular expressions, and Beautiful Soup.
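A minimal sketch of the Python route, assuming only the standard library: `html.parser.HTMLParser` stands in for Beautiful Soup so it runs without third-party packages, and `re` does the text matching. `TextFinder`, `VOID_TAGS`, and `SAMPLE` are hypothetical names. The parser tracks which elements are open and records the innermost one whose text matches. Note that it can only recover what is in the markup itself (the tag name, its class, and any inline `style` attribute); the actual bold/italic/font-size values usually come from stylesheet rules, and resolving those needs a CSS engine, such as a browser's `window.getComputedStyle` in JavaScript.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextFinder(HTMLParser):
    """Record the innermost open element whose text matches a regex."""

    def __init__(self, pattern, flags=0):
        super().__init__()
        self.pattern = re.compile(pattern, flags)
        self.stack = []    # currently open elements as (tag, attrs dict)
        self.matches = []  # one (tag, attrs, text) tuple per matching text run

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Pop back to the most recent open tag of this name, which also
        # discards any elements that were left unclosed inside it.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.stack and self.pattern.search(data):
            tag, attrs = self.stack[-1]
            self.matches.append((tag, attrs, data.strip()))


SAMPLE = """<html>
<head>
<title>asd
</title>
</head>
<body>
<div class ="abc"> xxx</div>
<div class ="yyy"> foo</div>
<div class ="zzz"> zoo</div>
</body>
</html>"""

if __name__ == "__main__":
    finder = TextFinder(r"xxx", re.IGNORECASE)
    finder.feed(SAMPLE)
    finder.close()
    for tag, attrs, text in finder.matches:
        # The class and any inline style="..." are all the markup itself holds.
        print(tag, attrs.get("class"), attrs.get("style"), repr(text))
```

On the sample file this should report the `div` with class `abc`. Passing `re.IGNORECASE` lets the same search match both "XXX" and "xxx", since the thread uses both spellings.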
 
Nerse said:
<html>
<head>
<title>asd
</title>
</head>
<body>
<div class ="abc"> xxx</div>
<div class ="yyy"> foo</div>
<div class ="zzz"> zoo</div>
</body>
</html>

For example, this is the HTML file. I'm searching for xxx, but I don't know whether it is in a div, a span, or an a tag. In this example, it is in the div with class abc, but it could be in any HTML tag. I need to find xxx and get its CSS styles (e.g. bold, italic, font size).

That doesn't really answer my questions. Are you writing the HTML files yourself? Are they given to you in a directory somewhere, or are you getting them from one or more websites? What do you plan to do with the CSS info once you find it?

The answers to these questions will help you choose between Java, JavaScript, and PHP.
 
Each language has a DOM parser; a quick Google search turns up many tutorials on how to read XML or HTML tags programmatically.
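To illustrate the DOM idea, here is a minimal sketch in Python, assuming the standard-library `xml.dom.minidom`. It builds the same kind of W3C DOM tree of element and text nodes that Java's `org.w3c.dom` and browser JavaScript expose, so the walk translates fairly directly. `elements_containing` and `SAMPLE` are hypothetical names. Caveat: `minidom` is an XML parser, so this works only because the sample HTML happens to be well-formed; typical real-world HTML (unclosed tags, `<br>` without a slash, entities such as `&nbsp;`) will usually make it raise an error, which is why a tolerant HTML parser is normally preferred.

```python
from xml.dom import minidom

SAMPLE = """<html>
<head>
<title>asd
</title>
</head>
<body>
<div class ="abc"> xxx</div>
<div class ="yyy"> foo</div>
<div class ="zzz"> zoo</div>
</body>
</html>"""


def elements_containing(node, needle):
    """Depth-first DOM walk yielding each element whose own text nodes contain needle."""
    found_here = False
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            # Yield the parent element once, even if several text nodes match.
            if needle in child.data and not found_here:
                found_here = True
                yield node
        elif child.nodeType == child.ELEMENT_NODE:
            yield from elements_containing(child, needle)


if __name__ == "__main__":
    doc = minidom.parseString(SAMPLE)  # raises ExpatError on non-well-formed markup
    for element in elements_containing(doc.documentElement, "xxx"):
        print(element.tagName, element.getAttribute("class"))
```

Once the matching element is in hand, its attributes (here `class="abc"`) are what you would use to look up the relevant CSS rules.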

If you don't know what the DOM is yet, that's the first thing you should find out.
 