[Python] Keeping the session alive

  • Context: Python 
  • Thread starter: adjacent
  • Tags: Python

Discussion Overview

The discussion revolves around a Python script intended to interact with a local PHP application that generates captcha images. Participants explore issues related to maintaining session state while submitting form data, particularly focusing on the failure to retrieve a generated table from the PHP script.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Exploratory

Main Points Raised

  • One participant describes their Python code that downloads a captcha image, uses Tesseract to extract text, and submits this along with a user-provided number to a PHP script.
  • Another participant questions whether a missing file extension in the system call could be causing issues with the text file generated by Tesseract.
  • A participant elaborates on their setup, explaining the roles of the PHP files and the expected behavior of the captcha system, suggesting that the session may not be maintained across requests.
  • There is a mention of a PHP code snippet that achieves similar functionality using cURL, which includes session handling through cookies.
  • A later reply indicates that the original issue was resolved by encapsulating the requests in a session context manager.

Areas of Agreement / Disagreement

Participants express varying levels of understanding regarding the original problem, with some uncertainty about the specifics of the Python code and its interaction with the PHP backend. The discussion includes both suggestions for troubleshooting and a resolution that one participant found effective.

Contextual Notes

There are unresolved assumptions regarding the behavior of the captcha system and the specifics of session management between the Python script and the PHP application.

adjacent
I am experimenting with captcha images. I have a Capatcha.php on my localhost that generates an image, and that image is placed into a form.

Here is my Python code to get the image, extract the text in it, and send it back to the form. Finally, it saves the resulting page as HTML.
The form has two fields, Number and Code, and it returns a table of results.
But it's not working. The HTML saved by Python does not contain the table, and I don't see any problem with the code.
Code:
import os
import requests

p = requests.session()
q = p.get('http://localhost/Test/Capatcha.php')
with open('data/a.png', 'wb') as f:
    f.write(q.content)
os.system("tesseract C:\\Users\\Me\\Desktop\\Test\\data\\a.png C:\\Users\\Me\\Desktop\\Test\\data\\a")
with open("data\\a.txt") as cap:
    capData = cap.read()
print("Capatcha line:"+capData)
num = input("Please enter the number :")
payload = {
    'Code': capData,
    'q': num
}

url = "http://localhost/Test/index.php"
r = p.post(url, data=payload)


with open("data\\log.html", "w") as file:
    log = file.write(r.text)
 
I have no idea what you are talking about, but is there, by any chance, a missing ".txt" extension in the system call? That might be why your a.txt file does not contain what it is supposed to.
 
gsal said:
I have no idea what you are talking about;
Why? Can you tell me what you didn't understand?
I have a folder on my localhost called 'Test'. In that folder I have two PHP files, Capatcha.php and index.php, and a subfolder called 'data'.
Capatcha.php starts a session and generates a random image containing numbers.

index.php has two input fields, one called 'Code' and the other called 'q'.
What I am trying to do is download the captcha image from Capatcha.php and read the text in it with an OCR engine (Tesseract).
The command in os.system() saves the resulting text in a file called a.txt in the data folder, and I read that text into a variable called 'capData'.
Then the Python program asks for a number, which I enter manually. It is stored in a variable called 'num'.
Then the Python program connects to index.php and submits 'q' with the value of 'num' and 'Code' with the value of 'capData'.
If 'Code' matches the text in the captcha image, the PHP file generates a table containing a name and the number.

My problem is that it does not generate the table. I think the reason is that when the Python program connects to index.php, it is no longer in the captcha session; it has started a new one.

So my question is how to keep the session alive. I don't see any problem in my code above, yet it does not work.
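For readers following along, what "staying in the same session" means at the HTTP level can be shown with a small self-contained sketch. It uses only the standard library so that it runs without the PHP application. The toy server, its /captcha and /check paths, and the sid cookie name are all invented for illustration and are not part of the original setup. A client that keeps a cookie jar sends the session cookie back on the second request, while a client without one looks like a new visitor each time.

```python
import http.cookiejar
import threading
import urllib.request
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer

SESSIONS = set()  # session ids the toy server has handed out


class ToyHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the two PHP pages (not the real application)."""

    def do_GET(self):
        # Pull the value of the "sid" cookie out of the request, if present.
        cookie = self.headers.get("Cookie", "")
        sid = cookie.partition("sid=")[2].split(";")[0]
        self.send_response(200)
        if self.path == "/captcha":
            # Like the captcha page: start a session and set its cookie.
            sid = uuid.uuid4().hex
            SESSIONS.add(sid)
            self.send_header("Set-Cookie", f"sid={sid}")
            body = b"image bytes would go here"
        else:
            # Like the form page: only recognise clients that send the cookie back.
            body = b"same session" if sid in SESSIONS else b"new session"
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(opener, url):
    with opener.open(url) as resp:
        return resp.read().decode()


def demo():
    server = HTTPServer(("127.0.0.1", 0), ToyHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    no_proxy = urllib.request.ProxyHandler({})  # talk to the local server directly
    try:
        # Client 1 keeps cookies between requests.
        jar = http.cookiejar.CookieJar()
        with_jar = urllib.request.build_opener(
            no_proxy, urllib.request.HTTPCookieProcessor(jar)
        )
        fetch(with_jar, base + "/captcha")
        kept = fetch(with_jar, base + "/check")

        # Client 2 has no cookie jar, so the cookie is never sent back.
        without_jar = urllib.request.build_opener(no_proxy)
        fetch(without_jar, base + "/captcha")
        lost = fetch(without_jar, base + "/check")
    finally:
        server.shutdown()
        server.server_close()
    return kept, lost
```

Calling demo() returns the two /check responses as a pair, first for the client with the cookie jar and then for the one without. The same idea is what the cookie options in a cURL script or a requests session object are meant to provide.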


gsal said:
is there, by any chance, a missing ".txt" extension in the system call? That might be why your a.txt file does not contain what it is supposed to.
Tesseract (the OCR engine) automatically adds the .txt extension to the output name, so the result is saved as a.txt.
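As a hedged illustration of the point about the output file: Tesseract is given an output base name and adds the .txt extension itself, and the text it writes typically ends with trailing whitespace such as a newline. The PHP snippet later in the thread trims the text before posting it, and a Python equivalent might look like the sketch below. The helper names tesseract_command, read_ocr_text, and ocr_image are hypothetical, and the sketch assumes the tesseract executable is on the PATH.

```python
import subprocess
from pathlib import Path


def tesseract_command(image_path, output_base):
    """Build the argument list; Tesseract appends '.txt' to output_base itself."""
    return ["tesseract", str(image_path), str(output_base)]


def read_ocr_text(output_base):
    """Read '<output_base>.txt' and strip surrounding whitespace."""
    text_path = Path(str(output_base) + ".txt")
    return text_path.read_text().strip()


def ocr_image(image_path, output_base):
    """Run Tesseract on the image and return the cleaned text."""
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(tesseract_command(image_path, output_base), check=True)
    return read_ocr_text(output_base)
```

With the paths from the original script, ocr_image("data/a.png", "data/a") would read data/a.txt. Stripping matters because a stray newline in the submitted code could be enough to make a string comparison on the server fail.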



Here is some PHP code that does the same thing I am trying to do.
Code:
<?php
include("simple_html_dom.php");
$tmp_fname = tempnam("data", "COOKIE");

$header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
$header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
$header[] = "Cache-Control: max-age=0";
$header[] = "Connection: keep-alive";
$header[] = "Keep-Alive: 300";
$header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
$header[] = "Accept-Language: en-us,en;q=0.5";
$header[] = "Pragma: ";

$curl_handle = curl_init("http://localhost/Test/Capatcha.php");
curl_setopt($curl_handle, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0');
curl_setopt($curl_handle, CURLOPT_HTTPHEADER, $header);
curl_setopt($curl_handle, CURLOPT_REFERER, 'http://localhost/');
curl_setopt($curl_handle, CURLOPT_COOKIEJAR, $tmp_fname);
curl_setopt($curl_handle, CURLOPT_COOKIEFILE, $tmp_fname);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($curl_handle);

file_put_contents("data/a.png", $output);
$path = "C:/xampp/htdocs/edi/";
try
{
   exec($path."Tesseract-OCR/tesseract.exe ".$path."data/a.png ".$path."data/a", $msg);
   $capture = trim(file_get_contents("data/a.txt"));
   echo $capture;
   unlink("data/a.txt");
}
catch (Exception $e)
{
   echo $e;
}

$curl_handle = curl_init("http://localhost/Test/index.php");
curl_setopt($curl_handle, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0');
curl_setopt($curl_handle, CURLOPT_HTTPHEADER, $header);
curl_setopt($curl_handle, CURLOPT_REFERER, 'http://localhost/');
curl_setopt($curl_handle, CURLOPT_COOKIEFILE, $tmp_fname);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_handle, CURLOPT_POSTFIELDS, array("captcha"=>$capture, "number"=>$_REQUEST['q'], "submit"=>""));
$output = curl_exec($curl_handle);

$html = str_get_html($output);

?>
 
I solved this by putting everything inside a "with requests.session() as s:" block.

Thanks anyway. :D
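The fix described above might look roughly like the following sketch. The helper name submit_captcha and the read_captcha callback (standing in for the Tesseract step) are hypothetical; the URLs and the 'Code' and 'q' field names are taken from the original script. Note that a requests session object carries cookies from one call to the next whether or not it is used in a with block; the with form additionally closes the session when the block ends.

```python
import requests


def submit_captcha(session, captcha_url, form_url, read_captcha, number):
    """Fetch the captcha and post the form through the same session object."""
    image = session.get(captcha_url)
    image.raise_for_status()
    # read_captcha is a hypothetical callable: image bytes in, recognised text out.
    code = read_captcha(image.content).strip()
    reply = session.post(form_url, data={"Code": code, "q": number})
    reply.raise_for_status()
    return reply.text


def main(read_captcha):
    number = input("Please enter the number: ")
    # Every request inside the block shares one cookie jar.
    with requests.Session() as s:
        html = submit_captcha(
            s,
            "http://localhost/Test/Capatcha.php",
            "http://localhost/Test/index.php",
            read_captcha,
            number,
        )
    with open("data/log.html", "w", encoding="utf-8") as f:
        f.write(html)
```

Passing the session in as a parameter keeps both requests on one object and makes the function easy to exercise with a stand-in session when the PHP pages are not running.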