Guidance for a beginner to extract data from a website

  • Thread starter: Mr.Husky

Discussion Overview

The discussion revolves around a beginner's goal to extract data from a website related to a competitive exam ranking system. The participant seeks guidance on whether it is feasible to find the name of a person based on their rank and what skills are necessary to achieve this, particularly in the context of computer science and programming.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Homework-related

Main Points Raised

  • One participant suggests that the most effective methods to find the name associated with a rank are to either ask the marking body or inquire among students, noting that the rank might only refer to the participant themselves.
  • Another participant raises concerns about the feasibility of trying random names to find a match, highlighting potential website restrictions after multiple failed attempts.
  • It is mentioned that the total number of students who participated in the exam is over 137,000, complicating the search for a specific name.
  • Some participants discuss the possibility of using a public API if available, but note that most websites do not offer this option, leading to the need for web scraping techniques.
  • BeautifulSoup is mentioned as a common tool in Python for web scraping, indicating a potential learning path for the original poster.
  • One participant expresses that hacking is not a primary focus in computer science, which may imply ethical considerations in data extraction.
  • The original poster acknowledges the need to learn Python and expresses interest in pursuing this to achieve their goal.

Areas of Agreement / Disagreement

Participants express differing views on the feasibility and ethics of trying to extract data from the website. There is no consensus on the best approach to achieve the original poster's goal, and the discussion remains unresolved regarding the ethical implications of data extraction.

Contextual Notes

The discussion includes limitations regarding the assumptions about the website's design, the availability of public APIs, and the ethical considerations of data scraping. These factors remain unresolved.

Who May Find This Useful

Individuals interested in web scraping, data extraction techniques, and beginners in computer science and programming may find this discussion relevant.

Mr.Husky
TL;DR: Guidance to extract data from a website for a beginner.
Hello!
I just completed high school and am about to major in computer science and engineering. I thought it would be better to set myself a goal to keep me interested in the field. It is simple, concrete, and I think it is doable. I need someone to guide me because I know nothing about CS.

My goal is to find the name of a person whose "rank" in a competitive exam is known. That's it. Let me expand on that. Recently, the body that conducts the exam released the results indexed by name. That means you don't have to enter any other details or verify your identity to see your own or anyone else's result; knowing the full name is enough. The result page also shows some data related to your own rank. I got rank 9307 in this exam, and the data said, "no. of students with same rank, boys - 0 and girls - 1". My goal is to find out who got that rank.

If you know the name, you just enter it and see the rank. If I know the rank, can I conversely find the name? Is it possible? I know nothing about web applications. Do you think it is doable? If so, how should I approach it? What skills do I need? If you know how to do this task, please don't spell out the process, but guide me so that I can do it myself. I recently opened a book that said to type print("hello world!") in Python, and boom, I got the same words on the next line. Then I stopped learning programming because I didn't find it exciting. Maybe this task will teach me something.

Thank you!
Ganesh kumara.
 
The most effective ways of doing what you want are to either:

a) ask the marking body who belongs to that ranking, or

b) inquire amongst the students being ranked to see who matches.

However, bear in mind that the phrase "no. of students with same rank" may not mean "no. of other students with same rank", i.e. one person is in that rank: presumably you.
 
So you have no idea what the names are of the others who took the exam, and want to keep trying random names until you find a score equal to your 9307?

If the website is designed well, it will lock you out after you have tried 3-4 random names with no match to the database.
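
To make the lockout idea concrete, here is a minimal sketch of how a site might count failed lookups per client and refuse further queries after a few misses. Everything in it is hypothetical: `LookupThrottle`, `lookup`, the threshold of 3, and the dictionary standing in for the database are assumptions for illustration, not how any particular site works.

```python
class LookupThrottle:
    """Track failed lookups per client and lock out after too many misses."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures  # assumed threshold, purely illustrative
        self._failures = {}               # client id -> consecutive failed lookups

    def is_locked(self, client):
        return self._failures.get(client, 0) >= self.max_failures

    def record(self, client, found):
        """Record one lookup: a hit resets the counter, a miss increments it."""
        if found:
            self._failures[client] = 0
        else:
            self._failures[client] = self._failures.get(client, 0) + 1


def lookup(throttle, database, client, name):
    """Return the stored value for name, or None if it is missing or the client is locked out."""
    if throttle.is_locked(client):
        return None
    result = database.get(name)
    throttle.record(client, result is not None)
    return result
```

A real site would more likely key the counter on an IP address or session and let the lock expire after some time, but the effect on someone guessing names is the same: after a handful of misses, even correct names stop returning results.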
 
berkeman said:
So you have no idea what the names are of the others who took the exam, and want to keep trying random names until you find a score equal to your 9307?

If the website is designed well, it will lock you out after you have tried 3-4 random names with no match to the database.
Well, that's not the case, sir. I don't know whether it is ethical or not, but I have checked the results of more than 30 people whose names I know (some from the exam hall, some from my college).

The problem is that trying random names doesn't work, because the total number of students who participated is 137,000+.
 
hmmm27 said:
The most effective ways of doing what you want are to either:

a) ask the marking body who belongs to that ranking, or

b) inquire amongst the students being ranked to see who matches.

However, bear in mind that the phrase "no. of students with same rank" may not mean "no. of other students with same rank", i.e. one person is in that rank: presumably you.
For option b, the total number of students who participated is 137,000+.

Thanks. I rechecked the analytics they provided, and it said, "No. of Girls (Equal your Rank)=1". Since I am a boy, there must be a girl with the same rank. But my interest is not in figuring out who that is but in understanding what I can and can't do in computer science. I just got this idea and want to know: is it possible to conversely find the data from a website?
 
Mr.Husky said:
is it possible to conversely find the data from a website?
If you're lucky and the website has a public API, you can just use that.

Most websites don't, though, so your only option other than manually browsing is to scrape the data--write a program to automatically download web pages and extract data from the html. My usual go-to in Python for doing that is BeautifulSoup.
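
As a rough illustration of the "extract data from the html" step, here is a minimal sketch that uses Python's standard-library `html.parser` in place of BeautifulSoup, so it runs without installing anything. The HTML snippet, `TableCellParser`, and `extract_rows` are made up for this example; a real page would need its own parsing rules, and the download step is left out entirely.

```python
from html.parser import HTMLParser

# Made-up HTML standing in for a downloaded page; not taken from any real site.
SAMPLE_HTML = """
<html><body>
  <table id="planets">
    <tr><th>Planet</th><th>Moons</th></tr>
    <tr><td>Earth</td><td>1</td></tr>
    <tr><td>Mars</td><td>2</td></tr>
  </table>
</body></html>
"""


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows hold only <th> cells, so they are skipped
                self.rows.append([cell.strip() for cell in self._row])
            self._row = None


def extract_rows(html):
    """Parse an HTML string and return its table data rows as lists of strings."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    for name, moons in extract_rows(SAMPLE_HTML):
        print(name, moons)
```

With BeautifulSoup installed, the same extraction is usually shorter: build a `BeautifulSoup(html, "html.parser")` object, then loop over `find_all("tr")` and read each cell's text. The hand-written parser above only shows what such a library is doing underneath.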
 
Mr.Husky said:
But my interest is not in figuring out who that is but in understanding what I can and can't do in computer science.
Hacking is generally not considered a main component of computer sciences.
 
PeterDonis said:
If you're lucky and the website has a public API, you can just use that.

Most websites don't, though, so your only option other than manually browsing is to scrape the data--write a program to automatically download web pages and extract data from the html. My usual go-to in Python for doing that is BeautifulSoup.
So I have to learn Python now. Thanks for mentioning BeautifulSoup; I had not heard of it before. I will learn how to code in Python, and maybe after a few months I will find out who got the same rank.
 
This is really creepy.

Thread closed at least temporarily for moderator discussion
 
