Why Does My Python URL Extraction Code Not Work?

doktorwho · Nov 3, 2016

I am supposed to write code that prints out the URL of the link given below. The URL is defined to start where the first " appears and end where the closing " of the URL appears, searching from start_link. It's actually the last project from lecture 1 in Udacity and the first code of mine... but it's wrong, haha.
Here it goes:
# Write Python code that assigns to the
# variable url a string that is the value
# of the first URL that appears in a link
# tag in the string page.
# Your code should print http://udacity.com
# Make sure that if page were changed to

# page = '<a href="http://udacity.com">Hello world</a>'

# that your code still assigns the same value to the variable 'url',
# and therefore still prints the same thing.

# page = contents of a web page
page =('<div id="top_bin"><div id="top_content" class="width960">'
'<div class="udacity float-left"><a href="http://udacity.com">')

start_link = page.find('<a href=')
new_page=page[start_link:]
num_ofstart=new_page.find(' " ')
new_page1=new_page[(num_ofstart+1):]
num_ofend=new_page1.find(' " ')
url=new_page[(num_ofstart):(num_ofend)]
print(url)
It prints out only "http://ud
What's wrong?

NascentOxygen · Nov 3, 2016

doktorwho said:

num_ofend=new_page1.find(' " ')
url=new_page[(num_ofstart):(num_ofend)]

In between these two lines add code to print out the values of new_page, num_ofstart, and num_ofend to make sure that you and your code are operating in sync.
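A minimal sketch of that check, reusing the variable names from the posted code. The print labels and the use of repr() are only illustrative choices:

```python
page = ('<div id="top_bin"><div id="top_content" class="width960">'
        '<div class="udacity float-left"><a href="http://udacity.com">')

start_link = page.find('<a href=')
new_page = page[start_link:]
num_ofstart = new_page.find(' " ')
new_page1 = new_page[(num_ofstart + 1):]
num_ofend = new_page1.find(' " ')

# Inspect the intermediate values before they are used for slicing.
print('new_page    =', repr(new_page))
print('num_ofstart =', num_ofstart)
print('num_ofend   =', num_ofend)
```

Since find() returns -1 when the substring is not present, a -1 in either index is worth a closer look.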

Mark44 · Nov 3, 2016

In addition to what @NascentOxygen said, you have a problem with these two lines of code:

Python:

num_ofstart=new_page.find(' " ')
.
.
.
num_ofend=new_page1.find(' " ')

In each case, the character you should be searching for is ". What you are actually doing is searching for <space>"<space>. In other words, in the argument to the find() function, you have extra space characters before and after the double-quote. The string you're searching in doesn't contain a substring of <space>"<space>, so both calls to find() are returning -1.
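The difference can be seen in a short sketch. The variable names with_spaces and without_spaces are made up for this illustration, and new_page is set directly to the link tag from the posted page:

```python
new_page = '<a href="http://udacity.com">'

# Searching for <space>"<space>: no such substring exists, so find() returns -1.
with_spaces = new_page.find(' " ')

# Searching for the bare double-quote character finds the opening quote.
without_spaces = new_page.find('"')

print(with_spaces)     # -1
print(without_spaces)  # 8
```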

One more thing - when you post code, especially Python code, surround your code with code tags.
What I did above looks like this:

[code=python]
num_ofstart=new_page.find(' " ')
.
.
.
num_ofend=new_page1.find(' " ')
[/code]

Mark44 · Nov 3, 2016

When you're writing code, an important skill to develop is learning how to use a debugger. Python has a built-in debugger module, pdb. It's somewhat primitive, but it's useful enough. I wrote a couple of Insights articles on this debugger last year.
https://www.physicsforums.com/insights/simple-python-debugging-pdb-part-1/
https://www.physicsforums.com/insights/simple-python-debugging-pdb-part-2/
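As a rough illustration of how pdb can be used (this is not taken from the linked articles, and the function name first_quote_index is hypothetical), a breakpoint can be placed just before the line you want to inspect:

```python
def first_quote_index(text):
    # Uncomment the next line to pause here and get the (Pdb) prompt.
    # import pdb; pdb.set_trace()
    index = text.find('"')
    return index

print(first_quote_index('<a href="http://udacity.com">'))
```

At the (Pdb) prompt, n runs the next line, p index prints the value of a variable, and c continues execution. A script can also be started under the debugger with python -m pdb followed by the script name.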

Using this debugger I was able to get your code working.
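The thread does not show the corrected code, so the following is only one possible fix and not necessarily the version arrived at here. It assumes the two-argument form of find(), which takes a starting index, and the names start_quote and end_quote are introduced for illustration. Besides dropping the extra spaces, it keeps both indices relative to the same string. In the posted code, num_ofend is measured within new_page1 but is then used to slice new_page.

```python
page = ('<div id="top_bin"><div id="top_content" class="width960">'
        '<div class="udacity float-left"><a href="http://udacity.com">')

start_link = page.find('<a href=')
# Search for the bare quote character, starting at the link tag.
start_quote = page.find('"', start_link)
# Search for the closing quote, starting just past the opening one.
end_quote = page.find('"', start_quote + 1)
# Both indices are relative to page, so slice page itself.
url = page[start_quote + 1:end_quote]
print(url)  # http://udacity.com
```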
