How to Import and Manipulate Data from a Text File in Python?

  • Context: Python
  • Thread starter: Avatrin
  • Tags: Matrix, Numpy, Python

Discussion Overview

The discussion revolves around methods for importing and manipulating data from a text file in Python, specifically focusing on extracting specific rows and columns to create a matrix. The scope includes practical coding approaches and considerations for data handling in Python.

Discussion Character

  • Technical explanation
  • Exploratory
  • Homework-related

Main Points Raised

  • One participant describes a method using basic file handling and numpy to read a text file, extract every p'th row, and select the second and third columns to form a matrix.
  • Another participant proposes an alternative approach using numpy's genfromtxt function to read the entire file into a numpy array, then selects the desired rows and columns.
  • There is a mention of the need to define the value of p and a note about the potential issue of m/p not being an integer in certain cases.
  • Some participants suggest using pandas for reading tables from files, indicating it as a potential alternative method.
  • One participant expresses a desire to learn pandas but indicates they have not yet done so.

Areas of Agreement / Disagreement

Participants present multiple approaches to the problem, with no consensus on a single method. There is a general agreement on the utility of numpy, but pandas is also suggested as a viable option, reflecting differing familiarity and preferences among participants.

Contextual Notes

Participants assume familiarity with Python and numpy, and the first reply notes that reading the whole text file into memory only suits files small enough to fit. The discussion does not resolve what the output shape should be when m/p is not an integer.

Avatrin
Hi

Let's say I have a text file with m rows and n columns of numbers of the form:

1 .12222E+01 .27300E+01 -.41442E+01 -.49391E+00
1 .80375E+00 .15953E+00 .47715E+00 -.10432E+01
2 .11046E+01 .79376E-01 -.12639E+00 -.10389E+00
2 -.95291E-02 -.54210E+01 .36199E+00 -.13710E+01
2 .16524E+00 -.27779E+01 .74098E+00 -.34125E+00

Let's say I want to take every p-th row, keep its second and third columns, and turn the result into a $\frac{m}{p} \times 2$ matrix. How would I go about doing that?
 
These things are very easy in Python. Below is the way I would do this, but there are many other ways. I'm assuming that you know m and p in advance. You must know p, but if you don't know m, you can read this file first and get m from len(lines). I'm also assuming that the text file is small enough that you can read the whole thing into memory, which is the easiest thing to do. If the file is too big for that, there are ways to read it line by line.
Code:
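import numpy as np

textfile = 'data.txt'  # example path; point this at your file
p = 2                  # example value; keep every p-th row

# Read the whole file into memory (fine for small files).
with open(textfile, 'r') as file:
    lines = file.readlines()

m = len(lines)  # number of rows, taken from the file itself
# Integer (ceiling) division: Python 3's m/p is a float and cannot be a shape.
data = np.zeros([(m + p - 1) // p, 2])

for i, line in enumerate(lines):
    if i % p == 0:
        items = line.split()
        data[i // p, 0] = float(items[1])  # second column
        data[i // p, 1] = float(items[2])  # third column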
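For a file too large to hold in memory, the line-by-line route mentioned above could look roughly like the following. This is only a sketch: the function name read_every_pth_row and its cols parameter are illustrative and not from the thread, and it assumes the file has no blank or malformed lines.

```python
import numpy as np

def read_every_pth_row(path, p, cols=(1, 2)):
    """Stream a whitespace-separated file, keeping `cols` of every p-th row."""
    rows = []
    with open(path, 'r') as f:
        # Iterating over the file object yields one line at a time,
        # so the whole file is never held in memory at once.
        for i, line in enumerate(f):
            if i % p == 0:
                items = line.split()
                rows.append([float(items[c]) for c in cols])
    return np.array(rows)
```

Only the selected rows are stored, so memory use scales with m/p rather than with m, and m does not need to be known in advance.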
 
My approach would be the following:
Code:
import numpy as np
# Import the data as a numpy array of arrays (a matrix). The elements of the array are the rows of the matrix.
raw_data = np.genfromtxt('data.txt', delimiter=' ', dtype=float)

# Select p'th rows
p_data = raw_data[::p]

# Select the 2nd and 3rd columns
data = p_data[:,[1,2]]

Of course, don't forget to assign a value to p first. For p=2, data comes out as
Code:
array([[ 1.2222  ,  2.73    ],
       [ 1.1046  ,  0.079376],
       [ 0.16524 , -2.7779  ]])

I don't know whether this is what you were looking for, especially because m=5 and p=2, so m/p is not an integer, yet I obtain a 3x2 matrix. You could easily condense the above code into three lines if you don't mind dropping the comments (bad coding practice, though) and losing some readability.
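The 3x2 shape follows from how slicing works: a step of p keeps rows 0, p, 2p, ..., which is ceil(m/p) rows whenever m is not a multiple of p. A small sketch of that, using the five sample rows from the question held in a string (an assumption made so the snippet runs without a file) and genfromtxt's usecols argument for a condensed variant:

```python
import io
import math
import numpy as np

# The five sample rows from the question, inlined for illustration.
sample = """1 .12222E+01 .27300E+01 -.41442E+01 -.49391E+00
1 .80375E+00 .15953E+00 .47715E+00 -.10432E+01
2 .11046E+01 .79376E-01 -.12639E+00 -.10389E+00
2 -.95291E-02 -.54210E+01 .36199E+00 -.13710E+01
2 .16524E+00 -.27779E+01 .74098E+00 -.34125E+00
"""
p = 2

# Condensed form: load only columns 2 and 3, then take every p-th row.
# With no delimiter given, genfromtxt splits on any run of whitespace.
data = np.genfromtxt(io.StringIO(sample), usecols=(1, 2))[::p]

m = sample.count("\n")                # 5 rows in the sample
assert data.shape == (math.ceil(m / p), 2)

# If exactly floor(m/p) rows are wanted instead, drop the surplus row.
trimmed = data[: m // p]
```

Whether the extra row belongs in the result depends on what "every p-th row" should mean when m is not divisible by p; the thread leaves that open.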
 
ChrisVer said:
Can't you use pandas (http://pandas.pydata.org/) to read tables from a file?

Well, Pandas is on my list of things I have to learn. However, I am not quite there yet. :)
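For reference, the pandas route suggested in the quote might look something like this. It is a hedged sketch rather than anything posted in the thread: it assumes pandas is installed, treats the file as header-less and whitespace-separated, and inlines the sample rows in a string so that it runs on its own.

```python
import io
import pandas as pd

# The five sample rows from the question, inlined for illustration.
sample = """1 .12222E+01 .27300E+01 -.41442E+01 -.49391E+00
1 .80375E+00 .15953E+00 .47715E+00 -.10432E+01
2 .11046E+01 .79376E-01 -.12639E+00 -.10389E+00
2 -.95291E-02 -.54210E+01 .36199E+00 -.13710E+01
2 .16524E+00 -.27779E+01 .74098E+00 -.34125E+00
"""
p = 2

# sep=r"\s+" splits on any whitespace; header=None because the file has no header row.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None)

# Every p-th row, second and third columns, converted to a plain numpy array.
data = df.iloc[::p, [1, 2]].to_numpy()
```

For a real file, the io.StringIO(sample) argument would be replaced by the file path.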
 
