Match Numbers in Two Files & Get Results: Results of Comparing .lw & .pw Files

Bala06 · May 29, 2011

Dear Members

I would like to match numbers in two files with the extensions .lw and .pw and output the lines whose numbers match.

For example, the .lw file contains data such as

59880 SPC X2d12G 4714 UNK X 900B

and the .pw file has

59474 SPC X2c8bG 991 ILE A 118B
59726 SPC X2cdfG 1803 SER A 168B
59876 SPC X2d11G 4055 ASP A 356B
59879 SPC X2d12G 3849 ASN A 344B

I want to match lines on the identifier "X2d12G" and write the matching lines to the output, for example like this (result):
431-hydrogen-bond-frame.dat.c.d.pw [(4714, 'UNK', 'X 900B', 59880, 'SPC', 'X2d12G', 59879, 'SPC', 'X2d12G', 4186, 'ASN', 'A 344B')]
453-hydrogen-bond-frame.dat.c.d.pw [(4714, 'UNK', 'X 900B', 59880, 'SPC', 'X2d12G', 59879, 'SPC', 'X2d12G', 4186, 'ASN', 'A 344B')]

Because the number of attachments is limited, I couldn't attach the file 453-hydrogen-bond-frame_lw.txt. It contains the same data as 431-hydrogen-bond-frame_lw.txt.

Whenever I run my Python code, I don't get the expected result.

I'm running the Python script as (python water_cont.py > summary.txt)

I'm posting the Python code for your reference.

Code:

#! /usr/bin/env python
import sys, os, math, glob
#  Run as:  python water_cont.py 
#  This script will provide the summary of waters along the trajectory.  
#  Use it after the run of python_water_cont.py
#  Bala 28 May. 2011
#

def read_lig_wat(file):
    file = open (file, "r")
    data=file.readlines()
    atom_number1 = map(lambda x: int(x[0:7]), data)
    resname1 = map(lambda x: x[10:13], data)
    res_number1=map(lambda x: x[17:23], data)
    atom_number2=map(lambda x: int(x[27:36]), data)
    resname2=map(lambda x: x[37:40], data)
    res_number2= map(lambda x: x[44:50], data)
    return atom_number1, resname1, res_number1, atom_number2, resname2, res_number2

def read_prot_wat(file1):
    file1 = open (file, "r")
    data1=file.readlines()
    atom_number11 = map(lambda x: int(x[0:7]), data)
    resname11 = map(lambda x: x[10:13], data)
    res_number11=map(lambda x: x[17:23], data)
    atom_number22=map(lambda x: int(x[27:36]), data)
    resname22=map(lambda x: x[37:40], data)
    res_number22= map(lambda x: x[44:50], data)
    return atom_number11, resname11, res_number11, atom_number22, resname22, res_number22 

 
for filename in glob.glob1("/home/water", "*.lw"):
   atom_number1, resname1, res_number1, atom_number2, resname2, res_number2 =read_lig_wat(filename)
#   column_file=summary+".lw"
#   file2=open( column_file, "w")
   text=len(atom_number1)

for filename in glob.glob1("/home/water", "*.pw"):
   atom_number11, resname11, res_number11, atom_number22, resname22, res_number22 =read_lig_wat(filename)
#   column_file=filename+".lw"
#   file2=open( column_file, "w")
   text1=len(atom_number11)

   List=[]

   for i in range(text):
      for j in range(text1):
          
#         print  res_number2[i], res_number22[j]
#         if res_number2[i]==res_number11[j]:
#             print res_number1[i], res_number2[i]
         if res_number1[i]==res_number11[j] or res_number1[i]==res_number22[j]\
            or res_number2[i]==res_number11[j] or res_number2[i]==res_number22[j]:
#            print atom_number1[i], resname1[i], res_number1[i], atom_number2[i], resname2[i], res_number2[i]

            List.append((atom_number1[i], resname1[i], res_number1[i], atom_number2[i], resname2[i], res_number2[i], atom_number11[j], resname11[j], res_number11[j], atom_number22[j], resname22[j], res_number22[j]))
#            print List
            print filename, List

#             file2.write("%5i%8s%11s%8i%8s%11s%5i%8s%11s%8i%8s%11s \n" % (atom_number1[i], resname1[i], res_number1[i], atom_number2[i], resname2[i], res_number2[i], atom_number11[i], resname11[i], res_number11[i], atom_number22[i], resname22[i], res_number22[i]  ))

Kindly advise.

Many Thanks
Balaji

Coin · May 30, 2011

First off: I think using slices in this way is a very bad idea and not the way most Python programmers would do it. Rather than expecting fixed-width character fields (!), a more sensible approach would be to do the readlines, then call split() on each line, which returns a list of the whitespace-delimited tokens in that line. For starters, what if ONE LINE in your file is malformed and has, say, one whitespace character too many? Also, if I try to run the program "in my head" (I haven't tried to actually run it yet), the very first thing I notice is that your sample inputs begin with a five-character ID, yet when you parse the files you first attempt to grab a seven-character token from the beginning of the string. Are you sure this is correct?
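A minimal sketch of the split() approach, assuming every line has exactly seven whitespace-separated tokens like the sample lines in this thread. The helper name split_fields and the seven-field count are assumptions for illustration, not something taken from the real files:

```python
def split_fields(line, expected=7):
    """Split one line on whitespace and check the token count."""
    tokens = line.split()
    if len(tokens) != expected:
        # A malformed line is reported instead of being silently mis-sliced.
        raise ValueError("expected %d fields, got %d: %r"
                         % (expected, len(tokens), line))
    return tokens

sample = "59879 SPC X2d12G 3849 ASN A 344B"
# The same line with extra spaces and a trailing newline.
extra_space = "59879  SPC X2d12G   3849 ASN A 344B\n"

tokens = split_fields(sample)
tokens_extra = split_fields(extra_space)
```

Both calls give the same token list, so a stray space no longer shifts every field. Note that a residue ID such as "A 344B" comes back as two tokens and has to be rejoined if it is needed as one value.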

Second off, even if you were to use slices in this way, I think there is something bad about your repeated "map lambda" construction. A rule of thumb: if you find yourself repeating yourself in a computer program, this is a good place to introduce a function. I would be very uncomfortable if this were my program until I had moved that repeated map lambda construction into a separate method, something like slice_out_field(0, 7, data). The problem is that you've copied and pasted this so many times; what if there was an error in one of your copies? It would be very easy to overlook.
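Roughly what such a helper might look like. This is only a sketch: the convert parameter is an addition for illustration, and the offsets 0:5 and 6:9 are read off the sample line posted in this thread, so they may not match the real files:

```python
def slice_out_field(start, end, data, convert=None):
    """Return line[start:end] for every line in data, optionally converted."""
    fields = [line[start:end] for line in data]
    if convert is not None:
        fields = [convert(field) for field in fields]
    return fields

# One sample line from the thread; real data would come from readlines().
data = ["59879 SPC X2d12G 3849 ASN A 344B"]

atom_numbers = slice_out_field(0, 5, data, int)   # five-character ID
resnames = slice_out_field(6, 9, data)            # three-letter residue name
```

With the slicing in one place, a wrong offset only has to be fixed once rather than in every copied line.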

Third off-- you say "I'm not getting the result as expected". What result are you getting instead?

I think you should start by just rewriting this to use a more conventional text parsing method like split(), or even better a regular expression (these are easy to use in Python and well suited to your problem). You have what looks to me like error-prone code and you are trying to chase a mysterious error in it... cleaning things up is a good first step. It is probably fixable as is, though, if you give us some more information (what is it doing now instead of working, and why is it 0:7 then 10:13 and not 0:5 and then 6:9?).
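A sketch of the regular-expression route for a .pw line, assuming the layout of the sample lines in this thread (atom number, residue name, water ID, atom number, residue name, chain, residue number). The group names and the function parse_pw_line are made up for illustration:

```python
import re

# Seven whitespace-separated fields; any amount of spacing is accepted.
PW_LINE = re.compile(
    r"^\s*(?P<water_atom>\d+)\s+(?P<water_name>\S+)\s+(?P<water_id>\S+)"
    r"\s+(?P<prot_atom>\d+)\s+(?P<prot_name>\S+)"
    r"\s+(?P<chain>\S+)\s+(?P<resnum>\S+)\s*$"
)

def parse_pw_line(line):
    """Return a 6-tuple for a well-formed line, or None if it does not match."""
    m = PW_LINE.match(line)
    if m is None:
        return None
    return (int(m.group("water_atom")), m.group("water_name"),
            m.group("water_id"), int(m.group("prot_atom")),
            m.group("prot_name"),
            m.group("chain") + " " + m.group("resnum"))

record = parse_pw_line("59879 SPC X2d12G 3849 ASN A 344B\n")
bad = parse_pw_line("not a data line")
```

Lines that do not fit the pattern come back as None, so they can be skipped or logged instead of raising inside int().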

Bala06 · May 30, 2011

Dear Python Users

I have to match the text "X2d12G" in the two files (.lw and .pw) and output the contents of both files for the matching lines.

A small correction to the .lw file: it should look like this:
" 4714 UNK X 900B 59880 SPC X2d12G"

The format in the attachment was not correct.

The expected output is, for example, like this:

431-hydrogen-bond-frame.dat.c.d.pw [(4714, 'UNK', 'X 900B', 59880, 'SPC', 'X2d12G', 59879, 'SPC', 'X2d12G', 4186, 'ASN', 'A 344B')]
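The matching described here could be sketched as a dictionary lookup keyed on the water ID. This is only an illustration under assumptions: the field order is read off the sample lines in this thread (corrected .lw layout: ligand fields first, water fields last; .pw layout: water fields first), and the function names are hypothetical:

```python
def parse_lw_line(line):
    # Assumed .lw layout: atom, resname, chain, resnum, water atom, water name, water ID
    t = line.split()
    return (int(t[0]), t[1], t[2] + " " + t[3], int(t[4]), t[5], t[6])

def parse_pw_line(line):
    # Assumed .pw layout: water atom, water name, water ID, atom, resname, chain, resnum
    t = line.split()
    return (int(t[0]), t[1], t[2], int(t[3]), t[4], t[5] + " " + t[6])

def match_by_water(lw_lines, pw_lines):
    """Pair every .lw record with the .pw records sharing its water ID."""
    by_water = {}
    for line in pw_lines:
        rec = parse_pw_line(line)
        by_water.setdefault(rec[2], []).append(rec)
    matches = []
    for line in lw_lines:
        lig = parse_lw_line(line)
        for prot in by_water.get(lig[5], []):
            matches.append(lig + prot)
    return matches

lw_lines = [" 4714 UNK X 900B 59880 SPC X2d12G"]
pw_lines = [
    "59474 SPC X2c8bG 991 ILE A 118B",
    "59726 SPC X2cdfG 1803 SER A 168B",
    "59876 SPC X2d11G 4055 ASP A 356B",
    "59879 SPC X2d12G 3849 ASN A 344B",
]
matches = match_by_water(lw_lines, pw_lines)
```

With the sample lines posted earlier, the single .lw record pairs only with the .pw line that carries X2d12G.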

Now, when I run the script, it doesn't produce any output (the output file size is 0 KB).

Kindly advise

Many Thanks
Balaji
