Python Match Numbers in Two Files & Get Results: Results of Comparing .lw & .pw Files

AI Thread Summary
The discussion centers on a Python script intended to match numbers from two files with extensions .lw and .pw, specifically targeting the identifier "X2d12G". The user, Balaji, provides examples of the data format in both files and expresses difficulty in obtaining the expected output when running the script. The output should list matched entries in a specific format, but the script currently produces no results, leading to a 0kb output file.

Key points of feedback include concerns about the method of parsing the input files using fixed-width slices, which may lead to errors if the data format varies. Suggestions are made to utilize the `split()` method for more reliable parsing, as well as to consider using regular expressions for improved accuracy. Additionally, the code's repetitive structure is highlighted, indicating a need for refactoring to enhance readability and maintainability. The discussion emphasizes the importance of clarifying what output is currently being generated versus the expected results to diagnose the issue effectively.
Bala06
Dear Members

I would like to match numbers in two files of extensions .lw & .pw and put the results according to matching numbers.

For example, the .lw file contains data as

59880 SPC X2d12G 4714 UNK X 900B

and .pw file has

59474 SPC X2c8bG 991 ILE A 118B
59726 SPC X2cdfG 1803 SER A 168B
59876 SPC X2d11G 4055 ASP A 356B
59879 SPC X2d12G 3849 ASN A 344B

I want to match on the identifier "X2d12G" and write the matching entries to the output.

For example, like this (result):
431-hydrogen-bond-frame.dat.c.d.pw [(4714, 'UNK', 'X 900B', 59880, 'SPC', 'X2d12G', 59879, 'SPC', 'X2d12G', 4186, 'ASN', 'A 344B')]
453-hydrogen-bond-frame.dat.c.d.pw [(4714, 'UNK', 'X 900B', 59880, 'SPC', 'X2d12G', 59879, 'SPC', 'X2d12G', 4186, 'ASN', 'A 344B')]

Since attachments are limited, I couldn't attach the file 453-hydrogen-bond-frame_lw.txt. It contains the same data as 431-hydrogen-bond-frame_lw.txt.

Whenever I run my Python code, I don't get the expected result.

I'm running the Python script as (python water_cont.py > summary.txt)

I'm posting the Python code for your reference.

Code:
#! /usr/bin/env python
import sys, os, math, glob
#  Run as:  python water_cont.py 
#  This script will provide the summary of waters along the trajectory.  
#  Use it after the run of python_water_cont.py
#  Bala 28 May. 2011
#

def read_lig_wat(file):
    file = open (file, "r")
    data=file.readlines()
    atom_number1 = map(lambda x: int(x[0:7]), data)
    resname1 = map(lambda x: x[10:13], data)
    res_number1=map(lambda x: x[17:23], data)
    atom_number2=map(lambda x: int(x[27:36]), data)
    resname2=map(lambda x: x[37:40], data)
    res_number2= map(lambda x: x[44:50], data)
    return atom_number1, resname1, res_number1, atom_number2, resname2, res_number2

def read_prot_wat(file1):
    file1 = open (file, "r")
    data1=file.readlines()
    atom_number11 = map(lambda x: int(x[0:7]), data)
    resname11 = map(lambda x: x[10:13], data)
    res_number11=map(lambda x: x[17:23], data)
    atom_number22=map(lambda x: int(x[27:36]), data)
    resname22=map(lambda x: x[37:40], data)
    res_number22= map(lambda x: x[44:50], data)
    return atom_number11, resname11, res_number11, atom_number22, resname22, res_number22 

 
for filename in glob.glob1("/home/water", "*.lw"):
   atom_number1, resname1, res_number1, atom_number2, resname2, res_number2 =read_lig_wat(filename)
#   column_file=summary+".lw"
#   file2=open( column_file, "w")
   text=len(atom_number1)

for filename in glob.glob1("/home/water", "*.pw"):
   atom_number11, resname11, res_number11, atom_number22, resname22, res_number22 =read_lig_wat(filename)
#   column_file=filename+".lw"
#   file2=open( column_file, "w")
   text1=len(atom_number11)

   List=[]

   for i in range(text):
      for j in range(text1):
          
#         print  res_number2[i], res_number22[j]
#         if res_number2[i]==res_number11[j]:
#             print res_number1[i], res_number2[i]
         if res_number1[i]==res_number11[j] or res_number1[i]==res_number22[j]\
            or res_number2[i]==res_number11[j] or res_number2[i]==res_number22[j]:
#            print atom_number1[i], resname1[i], res_number1[i], atom_number2[i], resname2[i], res_number2[i]

            List.append((atom_number1[i], resname1[i], res_number1[i], atom_number2[i], resname2[i], res_number2[i], atom_number11[j], resname11[j], res_number11[j], atom_number22[j], resname22[j], res_number22[j]))
#            print List
            print filename, List

#             file2.write("%5i%8s%11s%8i%8s%11s%5i%8s%11s%8i%8s%11s \n" % (atom_number1[i], resname1[i], res_number1[i], atom_number2[i], resname2[i], res_number2[i], atom_number11[i], resname11[i], res_number11[i], atom_number22[i], resname22[i], res_number22[i]  ))

Kindly advise.

Many Thanks
Balaji
 

First off: I think using slices in this way is a very bad idea, and not the way most Python programmers would do it. Rather than expecting fixed-width character fields (!), a more sensible approach is to do the readlines, then call split() on each line, which returns a list of the whitespace-delimited tokens in that line. For starters, what if ONE LINE in your file is malformed and has, say, one whitespace character too many? Also, if I try to run the program "in my head" (I haven't tried to run it on disk yet), the very first thing I notice is that your sample inputs begin with a five-character ID, yet when you parse the files you first attempt to grab a seven-character token from the beginning of the string. Are you sure this is correct?
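As a minimal sketch of the split() approach described above (Python 3), assuming every line has the layout of the .pw sample in the first post: water atom number, residue name, water ID, then partner atom number, residue name, chain letter and residue number. The function name parse_line and the seven-token check are illustrative choices, not part of the original script.

```python
def parse_line(line):
    """Split one record into a water triple and a partner triple."""
    tokens = line.split()  # splits on any run of whitespace
    if len(tokens) != 7:
        # Fail loudly on a malformed line instead of silently mis-slicing it
        raise ValueError(f"unexpected field count in line: {line!r}")
    water = (int(tokens[0]), tokens[1], tokens[2])
    # The partner residue ID is two tokens (chain letter + number), e.g. "A 344B"
    partner = (int(tokens[3]), tokens[4], " ".join(tokens[5:7]))
    return water, partner


sample = "59879 SPC X2d12G 3849 ASN A 344B"
water, partner = parse_line(sample)
print(water)    # (59879, 'SPC', 'X2d12G')
print(partner)  # (3849, 'ASN', 'A 344B')
```

Extra spaces between fields do not change the result, which is the robustness the reply is asking for. By contrast, on the sample line exactly as posted, `sample[0:7]` is `'59879 S'`, so `int(sample[0:7])` would raise a ValueError.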

Second off, even if you were to use slices in this way, I think there is something bad about your repeated "map lambda" construction. A rule of thumb: if you find yourself repeating yourself in a computer program, that is a good place to factor the repetition out into a function. I would be very uncomfortable if this were my program until I had moved that repeated map lambda construction into a separate slice_out_field(0, 7, data) function. The problem is that you've copied and pasted this so many times; what if there was an error in one of your copies? It would be very easy to overlook.
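One way the slice_out_field helper mentioned above might look, as a sketch only. The `convert` parameter is an addition made here for illustration, and the two fixed-width lines are made up so that the column offsets from the original script (0:7, 10:13, 17:23) actually line up.

```python
def slice_out_field(start, stop, data, convert=str.strip):
    """Return the [start:stop] slice of every line, passed through convert."""
    return [convert(line[start:stop]) for line in data]


# Hypothetical fixed-width lines, invented to fit the offsets in the script
data = [
    "  59879   SPC    X2d12G",
    "  59876   SPC    X2d11G",
]

atom_numbers = slice_out_field(0, 7, data, int)  # int() tolerates padding
resnames = slice_out_field(10, 13, data)
res_ids = slice_out_field(17, 23, data)
print(atom_numbers, resnames, res_ids)
```

With the slicing in one place, a wrong offset only has to be found and fixed once rather than in twelve near-identical lines.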

Third off: you say "I'm not getting the result as expected". What result are you getting instead?

I think you should start by just rewriting this to use a more conventional text parsing method like split(), or even better a regular expression (these are easy to use in Python and well suited to your problem). You have what looks to me like error-prone code and you are trying to chase a mysterious error in it... cleaning things up is a good first step. It is probably fixable as is, though, if you give us some more information (what is it doing now instead of working, and why is it 0:7 then 10:13 and not 0:5 and then 6:9?).
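A rough sketch of the regular-expression route (Python 3), under the assumption that a water record always appears as an atom number, the residue name SPC, and then the water ID, wherever it sits on the line. The pattern, the helper name water_ids, and the use of in-memory lists instead of files are all illustrative choices; the sample lines are taken from the posts in this thread.

```python
import re

# Assumption: a water record is "<atom number> SPC <water ID>"
WATER = re.compile(r"(\d+)\s+SPC\s+(\S+)")


def water_ids(lines):
    """Map each water ID to the (stripped) line it appears on.

    Assumes at most one line per water ID; a later line would
    overwrite an earlier one with the same ID.
    """
    found = {}
    for line in lines:
        m = WATER.search(line)
        if m:
            found[m.group(2)] = line.strip()
    return found


lw_lines = [" 4714 UNK X 900B 59880 SPC X2d12G"]
pw_lines = [
    "59876 SPC X2d11G 4055 ASP A 356B",
    "59879 SPC X2d12G 3849 ASN A 344B",
]

lw = water_ids(lw_lines)
pw = water_ids(pw_lines)

# Keep only the water IDs present in both files
matches = [(wid, lw[wid], pw[wid]) for wid in lw if wid in pw]
for wid, lw_line, pw_line in matches:
    print(wid, "|", lw_line, "|", pw_line)
```

Because the pattern searches for the SPC record rather than relying on column positions, it does not matter whether the water comes first or last on the line, and the dictionary lookup replaces the nested loop over every pair of lines.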
 
Dear Python Users

I have to match the text "X2d12G" in two files, .lw and .pw, and write the contents of both files as output.

A small correction to the .lw file. It should look like this:
" 4714 UNK X 900B 59880 SPC X2d12G"

In the attachment the format was not correct.

For example, like this:

431-hydrogen-bond-frame.dat.c.d.pw [(4714, 'UNK', 'X 900B', 59880, 'SPC', 'X2d12G', 59879, 'SPC', 'X2d12G', 4186, 'ASN', 'A 344B')]

Now, when I run the script it doesn't produce any output for me (the output file size is 0 KB).

Kindly advise

Many Thanks
Balaji
 