[Python] Keeping the session alive

  • Thread starter: adjacent
  • Tags: Python
AI Thread Summary
The discussion revolves around a Python script designed to interact with a CAPTCHA system hosted on a local server. The script retrieves a CAPTCHA image from a PHP file, uses Tesseract OCR to extract the text, and then submits this text along with a manually entered number to another PHP file. The expected outcome is a generated HTML table based on the submitted data. However, the script fails to produce the table, likely due to session management issues, as it starts a new session when posting to the second PHP file. A participant suggests that a missing ".txt" extension in the system call might be causing issues with the text file generation, but it's clarified that Tesseract automatically saves the text file. Ultimately, the original poster resolves the issue by encapsulating the requests in a session context, ensuring the session remains consistent throughout the process.
adjacent
I am experimenting with CAPTCHA images. I have a Capatcha.php on my localhost that generates an image, and that image is placed into a form.

Here is my Python code to fetch the image, extract the text from it, send it back to the form, and finally save the resulting page as HTML.
The form has two fields, Number and Code, and it returns a table.
But it's not working: the HTML saved by Python does not contain the table. I don't see any problem with the code.
Code:
import os
import requests

p = requests.session()
q = p.get('http://localhost/Test/Capatcha.php')
with open('data/a.png', 'wb') as f:
    f.write(q.content)
os.system("tesseract C:\\Users\\Me\\Desktop\\Test\\data\\a.png C:\\Users\\Me\\Desktop\\Test\\data\\a")
with open("data\\a.txt") as cap:
    capData = cap.read()
print("Capatcha line:"+capData)
num = input("Please enter the number :")
payload = {
    'Code': capData,
    'q': num
}

url = "http://localhost/Test/index.php"
r = p.post(url, data=payload)


with open("data\\log.html", "w") as file:
    log = file.write(r.text)
 
I have no idea what you are talking about, but is there, by any chance, a missing ".txt" extension in the system call? That might be the reason your a.txt file does not contain what it is supposed to.
 
gsal said:
I have no idea what you are talking about;
Why? Can you tell me what you didn't understand?
I have a folder on my localhost called 'Test'. In that folder I have two PHP files, Capatcha.php and index.php, and a subfolder called 'data'.
Capatcha.php starts a session and generates a random image containing numbers.

index.php has two input fields, one called 'Code' and the other called 'q'.
What I am trying to do is download the CAPTCHA image from Capatcha.php and read the text in it with an OCR engine (Tesseract).
The command in os.system() saves the recognized text in a text file called a.txt in the data folder, and I then read that text into a variable called 'capData'.
Then the Python program asks for a number, which I enter manually. It is stored in a variable called 'num'.
Then the Python program connects to index.php and submits 'q' with the value of 'num' and 'Code' with the value of 'capData'.
If 'Code' matches the text in the CAPTCHA image, the PHP file generates a table containing a name and the number.

My problem is that it does not generate the table. I think the reason is that when the Python program connects to index.php, it is no longer in the session started by Capatcha.php; a new session is started instead.

So my question is how to keep the session alive. I don't see any problem in my code above, yet it does not work.
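For anyone unsure what "being in the session" means here: PHP normally ties a session to a cookie (named PHPSESSID by default) that the client must send back on later requests. The sketch below is only an illustration using the standard library. The tiny server, its two paths and the cookie value are made-up stand-ins for Capatcha.php and index.php, not the real pages.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the two PHP pages."""

    def do_GET(self):
        if self.path == "/captcha":
            # Like session_start(): hand the client a session cookie.
            self.send_response(200)
            self.send_header("Set-Cookie", "PHPSESSID=demo123; Path=/")
            self.end_headers()
            self.wfile.write(b"image bytes would go here")
        else:
            # Like the form page: the session is found only if the cookie comes back.
            known = "PHPSESSID=demo123" in (self.headers.get("Cookie") or "")
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"same session" if known else b"new session")

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_both(opener, base):
    """Request the captcha first, then the form page, with the same opener."""
    opener.open(base + "/captcha").read()
    return opener.open(base + "/index").read().decode()


def demo():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    no_proxy = urllib.request.ProxyHandler({})  # ignore any proxy settings
    try:
        # This client stores cookies and sends them back.
        with_jar = urllib.request.build_opener(
            no_proxy,
            urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
        # This client forgets the cookie between requests.
        without_jar = urllib.request.build_opener(no_proxy)
        return fetch_both(with_jar, base), fetch_both(without_jar, base)
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns one answer per client; only the client that kept the cookie is recognised on the second request.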


gsal said:
is there, by any chance, a missing ".txt" extension in the system call? That might be the reason your a.txt file does not contain what it is supposed to.
Tesseract (the OCR engine) appends ".txt" to the output name automatically, so the result is saved as a.txt.
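To make that concrete, here is a minimal sketch of the OCR step, assuming the tesseract executable is on the PATH. The function names are made up for illustration. It also strips the trailing whitespace that OCR output usually carries, much like the trim() in the PHP version, since a stray newline could make the submitted code differ from the expected one.

```python
import subprocess
from pathlib import Path


def tesseract_job(image, out_base):
    """Return the command to run and the text file Tesseract will create."""
    # Tesseract is given an output *base* name and appends ".txt" itself.
    command = ["tesseract", str(image), str(out_base)]
    return command, Path(str(out_base) + ".txt")


def clean_ocr_text(raw):
    """Remove surrounding whitespace such as a trailing newline or form feed."""
    return raw.strip()


def read_captcha(image, out_base):
    """Run Tesseract on `image` and return the recognised text."""
    command, text_file = tesseract_job(image, out_base)
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(command, check=True)
    return clean_ocr_text(text_file.read_text())
```

With the paths from the first post this would be called as read_captcha("data/a.png", "data/a"), which reads data/a.txt.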



Here is some PHP code that does the same thing I am trying to do.
Code:
<?php
include("simple_html_dom.php");
$tmp_fname = tempnam("data", "COOKIE");

$header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
$header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
$header[] = "Cache-Control: max-age=0";
$header[] = "Connection: keep-alive";
$header[] = "Keep-Alive: 300";
$header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
$header[] = "Accept-Language: en-us,en;q=0.5";
$header[] = "Pragma: ";

$curl_handle = curl_init("http://localhost/Test/Capatcha.php");
curl_setopt($curl_handle, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0');
curl_setopt($curl_handle, CURLOPT_HTTPHEADER, $header);
curl_setopt($curl_handle, CURLOPT_REFERER, 'http://localhost/');
curl_setopt($curl_handle, CURLOPT_COOKIEJAR, $tmp_fname);
curl_setopt($curl_handle, CURLOPT_COOKIEFILE, $tmp_fname);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($curl_handle);

file_put_contents("data/a.png", $output);
$path = "C:/xampp/htdocs/edi/";
try
{
   exec($path."Tesseract-OCR/tesseract.exe ".$path."data/a.png ".$path."data/a", $msg);
   $capture = trim(file_get_contents("data/a.txt"));
   echo $capture;
   unlink("data/a.txt");
}
catch (Exception $e)
{
   echo $e;
}

$curl_handle = curl_init("http://localhost/Test/index.php");
curl_setopt($curl_handle, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0');
curl_setopt($curl_handle, CURLOPT_HTTPHEADER, $header);
curl_setopt($curl_handle, CURLOPT_REFERER, 'http://localhost/');
curl_setopt($curl_handle, CURLOPT_COOKIEFILE, $tmp_fname);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_handle, CURLOPT_POSTFIELDS, array("captcha"=>$capture, "number"=>$_REQUEST['q'], "submit"=>""));
$output = curl_exec($curl_handle);

$html = str_get_html($output);

?>
 
I solved this by putting everything inside a "with requests.session() as s:" block.
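The final code was not posted, so the following is only a rough sketch of that pattern, not the actual fix. The function name and its parameters are made up. session_factory is meant to be requests.Session (or requests.session), and read_captcha is whatever OCR step you use. The URLs and field names are the ones from the first post.

```python
def fetch_and_submit(session_factory, read_captcha, number,
                     captcha_url="http://localhost/Test/Capatcha.php",
                     form_url="http://localhost/Test/index.php"):
    """GET the captcha and POST the form through one and the same session."""
    with session_factory() as s:
        # The server sets its session cookie on this first response.
        image = s.get(captcha_url)
        # OCR step supplied by the caller; it receives the raw image bytes.
        code = read_captcha(image.content)
        # Same session object, so the cookie is sent back automatically.
        reply = s.post(form_url, data={"Code": code, "q": number})
        return reply.text


# Intended use (needs the third-party requests package):
#   import requests
#   html = fetch_and_submit(requests.Session, my_ocr_function, "42")
```

The point is simply that both the GET and the POST go through the same session object, so whatever cookie Capatcha.php sets is still there when index.php is requested.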

Thanks anyway. :D
 