Parsing HTML and Searching Text

  • Context: HTML/CSS 
  • Thread starter: NerseC
  • Tags: Html, Text

Discussion Overview

The discussion revolves around the task of parsing HTML to search for specific text (e.g., "XXX") and retrieving its associated CSS attributes. Participants explore various programming languages available for this task, including Java, JavaScript, and PHP, while also considering alternative approaches such as Python.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Homework-related

Main Points Raised

  • One participant seeks assistance in parsing HTML to find text and retrieve CSS attributes, noting that the text could be in any HTML tag.
  • Another participant requests clarification on the requirements, asking about the source of the HTML and the intended output for the results.
  • A participant provides an example HTML structure and expresses uncertainty about the tag type containing the text, emphasizing the need to find the text and its CSS styles.
  • One participant suggests that using Python with regular expressions and Beautiful Soup would simplify the task.
  • A follow-up post reiterates the example HTML and questions the original poster about the source of the HTML files and the purpose of the CSS information.
  • Another participant mentions that each programming language has a DOM parser and suggests searching for tutorials on reading XML or HTML programmatically.
  • One participant advises that understanding the DOM is essential for this task.

Areas of Agreement / Disagreement

Participants express varying preferences for programming languages and approaches, with no consensus on the best method to achieve the task. The discussion remains unresolved regarding the specifics of the implementation.

Contextual Notes

Participants have not fully defined the context of the HTML source, the output requirements, or the specific CSS attributes of interest. There are also unresolved questions about the programming environment and the nature of the HTML files.

NerseC
I need to parse HTML and search it for some text. For example, I need to find text containing XXX (an arbitrary string) and get the CSS attributes applied to it. However, the XXX text can be inside any HTML tag.

Java/JavaScript/PHP are all available for this task. Can anybody help me with this?
 
Can you clarify your requirements a little? Where is the HTML that you wish to search? Where are you outputting the results?
 
<html>
<head>
<title>asd
</title>
</head>
<body>
<div class ="abc"> xxx</div>
<div class ="yyy"> foo</div>
<div class ="zzz"> zoo</div>
</body>
</html>

For example, this is the HTML file. I'm searching for xxx, but I don't know whether it is in a div, a span, or an a element. In this example, it is in the div with class abc. However, it could be in any HTML tag. I need to find xxx and get its CSS styles (e.g., bold, italic, font size).
 
It's a pity you're not using Python. This is something that could be solved trivially using a combination of Python, regular expressions, and Beautiful Soup.
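As a rough illustration of that idea, here is a minimal sketch that uses only Python's standard-library `html.parser` instead of Beautiful Soup (a deliberate substitution so the example has no third-party dependencies). The class name `TextFinder`, the `VOID_TAGS` set, and the `SAMPLE` string are assumptions made for this example; the markup is the sample posted above. It reports the attributes written on the enclosing tag (`class`, inline `style`), not the styles a browser would compute from a stylesheet.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TextFinder(HTMLParser):
    """Record the enclosing tag and attributes of each text node containing a needle."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle.lower()
        self.stack = []    # currently open elements as (tag, attrs) pairs
        self.matches = []  # (tag, attrs) for each element whose text matched

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; ignore stray closers.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Case-insensitive substring match against the text node.
        if self.stack and self.needle in data.lower():
            self.matches.append(self.stack[-1])


SAMPLE = """<html>
<head>
<title>asd
</title>
</head>
<body>
<div class ="abc"> xxx</div>
<div class ="yyy"> foo</div>
<div class ="zzz"> zoo</div>
</body>
</html>"""

finder = TextFinder("xxx")
finder.feed(SAMPLE)
finder.close()
for tag, attrs in finder.matches:
    print(tag, attrs.get("class"), attrs.get("style"))
```

For the sample above this prints `div abc None`: the text sits in a `div` with class `abc` and no inline `style`. Turning that class into actual bold/italic/font-size values would still require matching it against the stylesheet rules, which this sketch does not attempt.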
 
That doesn't really answer my questions. Are you writing the HTML files yourself? Are they given to you in a directory somewhere, or are you getting them from one or more websites? What do you plan to do with the CSS info once you find it?

The answers to these questions will help you choose between Java, JavaScript, and PHP.
 
Each language has a DOM parser. A quick Google search reveals many tutorials on how to read XML or HTML tags programmatically.

If you don't know what the DOM is yet, that's the first thing you should find out.
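To make the DOM idea concrete, here is a small sketch in Python using the standard-library `xml.dom.minidom`. It is only an illustration under assumptions: `minidom` is an XML parser, so it works here because the sample posted earlier in the thread happens to be well-formed, and real-world HTML often is not. The function name `find_text_parents` and the `SAMPLE` string are made up for this example. The same walk-the-tree approach applies to the DOM parsers in Java, JavaScript, or PHP.

```python
from xml.dom import minidom

SAMPLE = """<html>
<head>
<title>asd
</title>
</head>
<body>
<div class ="abc"> xxx</div>
<div class ="yyy"> foo</div>
<div class ="zzz"> zoo</div>
</body>
</html>"""


def find_text_parents(node, needle):
    """Yield the parent element of every text node whose text contains needle."""
    for child in node.childNodes:
        if child.nodeType == minidom.Node.TEXT_NODE:
            if needle in child.data:
                yield node
        elif child.nodeType == minidom.Node.ELEMENT_NODE:
            # Recurse so the text can be found at any depth and in any tag.
            yield from find_text_parents(child, needle)


doc = minidom.parseString(SAMPLE)
for element in find_text_parents(doc.documentElement, "xxx"):
    print(element.tagName, element.getAttribute("class"))
```

The key point is that the search never names a tag: it visits every text node and then asks the DOM for that node's parent element, whose tag name and attributes (here `div` and `abc`) are the starting point for looking up its CSS.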
 
