Why Does My Python URL Extraction Code Not Work?

AI Thread Summary
The posted code searches for the substring ' " ' (space, double quote, space) instead of just '"', so both calls to find() return -1. The fix suggested in the thread is to remove the extra spaces from the search strings. The replies also recommend debugging by printing intermediate variable values or by stepping through the code with Python's built-in debugger, pdb.
doktorwho
Thread moved from a technical forum, so homework template is missing
I am supposed to write code that prints out the URL of the link given below. The URL is defined to start where the first " appears and end where the last " appears, counting from start_link. It's actually the last project from lecture 1 on Udacity and my first code, but it's wrong, haha.
Here it goes:
# Write Python code that assigns to the
# variable url a string that is the value
# of the first URL that appears in a link
# tag in the string page.
# Your code should print http://udacity.com
# Make sure that if page were changed to

# page = '<a href="http://udacity.com">Hello world</a>'

# that your code still assigns the same value to the variable 'url',
# and therefore still prints the same thing.

# page = contents of a web page
page =('<div id="top_bin"><div id="top_content" class="width960">'
'<div class="udacity float-left"><a href="http://udacity.com">')

start_link = page.find('<a href=')
new_page=page[start_link:]
num_ofstart=new_page.find(' " ')
new_page1=new_page[(num_ofstart+1):]
num_ofend=new_page1.find(' " ')
url=new_page[(num_ofstart):(num_ofend)]
print(url)
It prints out only "http://ud
What's wrong?
 
doktorwho said:
num_ofend=new_page1.find(' " ')
url=new_page[(num_ofstart):(num_ofend)]
In between these two lines add code to print out the values of new_page, num_ofstart, and num_ofend to make sure that you and your code are operating in sync.
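A minimal sketch of that check, using the variable names and the search string exactly as they appear in the original post. The print labels and the use of repr() are only illustrative choices, not something the poster wrote.

```python
# Debugging sketch: print the intermediate values before the final slice.
page = ('<div id="top_bin"><div id="top_content" class="width960">'
        '<div class="udacity float-left"><a href="http://udacity.com">')

start_link = page.find('<a href=')
new_page = page[start_link:]
num_ofstart = new_page.find(' " ')   # search string exactly as posted
new_page1 = new_page[(num_ofstart + 1):]
num_ofend = new_page1.find(' " ')

# Inspect what the code actually computed.
# repr() makes quotes and empty strings visible in the output.
print('new_page    =', repr(new_page))
print('num_ofstart =', num_ofstart)
print('num_ofend   =', num_ofend)

url = new_page[num_ofstart:num_ofend]
print('url         =', repr(url))
```

A value of -1 from find() means the substring was not found, which is a strong hint that the search string itself is the thing to look at next.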
 
In addition to what @NascentOxygen said, you have a problem with these two lines of code:
Python:
num_ofstart=new_page.find(' " ')
.
.
.
num_ofend=new_page1.find(' " ')
In each case, the character you should be searching for is ". What you are actually doing is searching for <space>"<space>. In other words, in the argument to the find() function, you have extra space characters before and after the double-quote. The string you're searching in doesn't contain a substring of <space>"<space>, so both calls to find() are returning -1.
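As a rough illustration of the point above, the sketch below compares what find() returns for the two search strings and then extracts the URL with the spaces removed. The names with_spaces, without_spaces, quote_start, rest, and quote_end are hypothetical and do not come from the original post. One assumption goes beyond this reply: once the search strings are fixed, the end index is measured from the start of the shortened string, so the final slice here is taken from that string rather than from new_page.

```python
page = ('<div id="top_bin"><div id="top_content" class="width960">'
        '<div class="udacity float-left"><a href="http://udacity.com">')

start_link = page.find('<a href=')
new_page = page[start_link:]

# find() returns -1 when the substring does not occur.
with_spaces = new_page.find(' " ')   # space, quote, space: not present
without_spaces = new_page.find('"')  # just the quote: present

# Hypothetical names for the corrected extraction.
quote_start = new_page.find('"')
rest = new_page[quote_start + 1:]    # text after the opening quote
quote_end = rest.find('"')           # index relative to rest, not new_page
url = rest[:quote_end]
print(url)
```

Slicing new_page[quote_start:quote_end] instead would give "http://ud, the output reported in the question, because quote_end counts from the start of rest and not from the start of new_page.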

One more thing - when you post code, especially Python code, surround your code with code tags.
What I did above looks like this:
[code=python]
num_ofstart=new_page.find(' " ')
.
.
.
num_ofend=new_page1.find(' " ')
[/code]
 
