Parsing HTML and Searching Text

  • Thread starter NerseC
  • #1

Main Question or Discussion Point

I need to parse HTML and search it for some text. For example, I need to find text containing XXX (an arbitrary placeholder) in the HTML and then get its CSS attributes. However, the XXX text can be inside any HTML tag.


Java, JavaScript, and PHP are all available for this task. Can anybody help me with this?
 

Answers and Replies

  • #2
gabbagabbahey
Homework Helper
Gold Member
Can you clarify your requirements a little? Where is the HTML that you wish to search? Where are you outputting the results?
 
  • #3
<html>
<head>
<title>asd</title>
</head>
<body>
<div class="abc"> xxx</div>
<div class="yyy"> foo</div>
<div class="zzz"> zoo</div>
</body>
</html>

For example, this is the HTML file. I'm searching for xxx, but I don't know whether it is in a div, a span, or an a element. In this example, it is in the div with class abc, but it could be in any HTML tag. I need to find xxx and get its CSS styles (e.g. bold, italic, font size).
 
  • #4
It's a pity you're not using Python. This is something that could be solved trivially using a combination of Python, regular expressions, and Beautiful Soup.
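
As a rough illustration of the Python route, here is a minimal sketch that uses only the standard library's html.parser rather than Beautiful Soup, so it runs without extra installs. The TextFinder class and find_text function are hypothetical names made up for this example. It only reports the enclosing tag and its class/style attributes; working out the actual computed styles (bold, font size, and so on) would still require resolving the stylesheet, which this sketch does not attempt.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextFinder(HTMLParser):
    """Hypothetical helper: records the enclosing tag and attributes of
    every text node that contains the search term (case-insensitive)."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle.lower()
        self.stack = []    # currently open elements as (tag, attrs dict)
        self.matches = []  # (tag, attrs dict) for each matching text node

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; tolerate sloppy markup.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.needle in data.lower() and self.stack:
            self.matches.append(self.stack[-1])


def find_text(html, needle):
    """Return a list of (tag, attrs) for elements directly containing needle."""
    parser = TextFinder(needle)
    parser.feed(html)
    parser.close()
    return parser.matches


# The sample page from post #3.
SAMPLE = """<html>
<head><title>asd</title></head>
<body>
<div class ="abc"> xxx</div>
<div class ="yyy"> foo</div>
<div class ="zzz"> zoo</div>
</body>
</html>"""

for tag, attrs in find_text(SAMPLE, "xxx"):
    print(tag, attrs.get("class"), attrs.get("style"))  # prints: div abc None
```

Because the parser tracks a stack of open tags, the search term can sit inside a div, span, a, or anything else, and the innermost enclosing element is what gets reported.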
 
  • #5
gabbagabbahey
Homework Helper
Gold Member
That doesn't really answer my questions. Are you writing the HTML files yourself? Are they given to you in a directory somewhere, or are you getting them from one or more websites? What do you plan to do with the CSS info once you find it?

The answers to these questions will help you choose between Java, JavaScript, and PHP.
 
  • #6
Each of those languages has a DOM parser, and a quick Google search turns up many tutorials on how to read XML or HTML tags programmatically.

If you don't know what the DOM is yet, that's the first thing you should find out.
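
To make the DOM idea concrete, here is a small sketch in Python using the standard library's xml.dom.minidom. It assumes the page is well-formed XML, which happens to be true of the sample posted earlier in the thread but often is not true of real-world HTML; in that case an HTML-aware DOM parser would be needed instead. The find_text_nodes function is a hypothetical helper written for this example. The same walk (visit every node, test the text nodes, look at the parent element) carries over to the DOM APIs in Java, JavaScript, and PHP.

```python
from xml.dom import minidom, Node

# The sample page from earlier in the thread (assumed to be well-formed XML).
SAMPLE = """<html>
<head><title>asd</title></head>
<body>
<div class ="abc"> xxx</div>
<div class ="yyy"> foo</div>
<div class ="zzz"> zoo</div>
</body>
</html>"""


def find_text_nodes(node, needle):
    """Yield every DOM text node under `node` whose data contains `needle`."""
    for child in node.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            if needle in child.data:
                yield child
        else:
            # Recurse into elements (other node types simply have no children).
            yield from find_text_nodes(child, needle)


doc = minidom.parseString(SAMPLE)
for text in find_text_nodes(doc, "xxx"):
    element = text.parentNode  # the element that directly wraps the text
    print(element.tagName, element.getAttribute("class"))  # prints: div abc
```

Once the parent element is known, its class and style attributes are the starting point for looking up the matching CSS rules.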
 
