Python equivalent of MATLAB textscan?

In summary, the author found a way to read an entire array of floats from a text file using numpy and the genfromtxt function.
  • #1
JesseC
Is there one?

Or do I really have to write something like this:

Code:
from numpy import *

with open('file.txt','r') as f:
    #read only data, ignore headers
    lines = f.readlines()[31:]
    
    # convert strings to floats and put into arrays
    for i in xrange(len(lines)):
        s = lines[i].split()
        y1[i] = float(s[0])
        y2[i] = float(s[1])
        y3[i] = float(s[2])
 
  • #2
A quick Google search suggests that - slightly to my surprise - there isn't a built-in parser like textscan (similar to fscanf, for C/C++ aficionados). The canonical solution is to learn regular expressions and use the re module - a useful skill, by the way. However, if you're just reading in lists of floats, your way is good enough.
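As a rough illustration of the re-module route mentioned above, here is a minimal sketch that pulls every float-like token out of a line. The pattern, the names FLOAT_RE and parse_floats, and the sample line are assumptions made for illustration; a real file may need a different pattern.

```python
import re

# Matches plain or scientific-notation numbers such as 3, -0.5, .25, 1.013e5.
# This pattern is an assumption about the data; adjust it to the actual file format.
FLOAT_RE = re.compile(r'[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?')

def parse_floats(line):
    """Return every float-like token found in a line of text."""
    return [float(tok) for tok in FLOAT_RE.findall(line)]

# Hypothetical line mixing labels, units and numbers
sample = "T=273.15 K, p=1.013e5 Pa, x=-0.5"
values = parse_floats(sample)
print(values)
```

Unlike split(), this tolerates labels and units mixed in with the numbers, at the cost of silently ignoring anything that does not look like a number.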

A slightly neater version of what you've done is:
Code:
with open('file.txt','r') as f:
    # read only data, ignore headers
    lines = f.readlines()[31:]
    # create the lists (you forgot this step)
    y1 = [0.0] * len(lines)
    y2 = [0.0] * len(lines)
    y3 = [0.0] * len(lines)
    # convert strings to floats and store them; the unpacking
    # raises ValueError unless the line holds exactly three values
    for i in range(len(lines)):
        y1[i], y2[i], y3[i] = (float(s) for s in lines[i].split())
which has the added advantage of erroring out if your assumption that there are exactly three floats per line is wrong.
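To make the "erroring out" behaviour concrete, here is a small self-contained sketch. The helper name parse_three and the sample lines are made up for illustration; the point is only that tuple unpacking raises ValueError when a line has too few or too many fields.

```python
def parse_three(line):
    """Parse exactly three floats from a line; raises ValueError otherwise."""
    a, b, c = (float(s) for s in line.split())
    return a, b, c

# A well-formed line, including the trailing newline that readlines() keeps
good = parse_three("1.0 2.5 -3e2\n")

def fails(line):
    """Return True if the line is rejected with a ValueError."""
    try:
        parse_three(line)
    except ValueError:
        return True
    return False

too_few_rejected = fails("1.0 2.5\n")
too_many_rejected = fails("1.0 2.5 3.0 4.0\n")
```

A non-numeric field is rejected the same way, since float() also raises ValueError.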

A more pythonic way of doing it, if you aren't too wedded to your variable names, is:
Code:
with open('file.txt','r') as f:
    #read only data, ignore headers
    y=[[float(s) for s in line.split()] for line in f.readlines()[31:]]
which gives you a list of lists, one inner list per line of the file. What you called y1[i] is now y[i][0], y2[i] is y[i][1], and y3[i] is y[i][2]. Presumably you're just going to make a numpy array anyway, so you can transpose() it if you'd rather have one row per variable. Note that this version doesn't care if the data file isn't in the right format and will happily load a ragged list if that's what's in the file - so the first option may be better.
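A sketch of the conversion to a numpy array mentioned above, assuming every line really does hold three values. The rows list here is a made-up stand-in for what the list comprehension would produce from the file.

```python
import numpy as np

# Stand-in for the list of lists read from the file: one inner list per line
rows = [[1.0, 10.0, 100.0],
        [2.0, 20.0, 200.0]]

data = np.array(rows)      # shape (2, 3): one row per line of the file
y1, y2, y3 = data.T        # transposing gives one row per column of the file

print(y1, y2, y3)
```

With a ragged list the np.array() call would not produce a clean 2-D float array, so this step is also where badly formatted data tends to surface.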

Note also that readlines() does not strip trailing newlines, but that shouldn't matter here: split() with no arguments discards all whitespace, newlines included, so float() never sees them.

Edit: Note that I haven't actually run any of the code above - it looks right, but caveat programptor.
 
  • #3
Thanks Ibix, that's really helpful. I need to keep variable names because in reality they are not actually y1, 2 etc but are physical variables like temperature and pressure. Guess I'll stick to what I've done for the moment.
 
  • #4
Fair enough. I'd still suggest my first bit of code as a slight improvement - it gives you a bit better chance of spotting messed-up data (it'll fail if there's too much data on a line as well as too little) when you try to load it instead of when the results make no sense. If you trust your data, it makes no difference.
 
  • #5
I don't know MATLAB or textscan, but in Python, numpy's genfromtxt ("generate from text") function can read an entire array at a time. It has options to skip header lines, skip trailing lines, select only the columns you want, assign default values to missing entries, and so on. Read up on it and see if it is something you can use.
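The options described above correspond to the genfromtxt keyword arguments skip_header, skip_footer, usecols and filling_values. The sketch below uses a made-up, comma-separated sample held in memory rather than a real file, and the fill value -999.0 is an arbitrary choice.

```python
import io
import numpy as np

# Hypothetical file contents: two header lines, three data lines, one footer line
text = io.StringIO(
    "Run 42\n"
    "time,temp,pressure\n"
    "0.0,273.1,1.01\n"
    "1.0,,1.02\n"            # the temperature is missing on this line
    "2.0,275.0,1.03\n"
    "mean,274.05,1.02\n"
)

data = np.genfromtxt(
    text,
    delimiter=",",
    skip_header=2,           # skip the two heading lines
    skip_footer=1,           # skip the trailing summary line
    usecols=(1, 2),          # keep only the temp and pressure columns
    filling_values=-999.0,   # value substituted for missing entries
)
print(data)
```

Without filling_values, the missing entry would come back as nan instead.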
 
  • #6
Thanks gsal that is exactly what I was looking for! It has shortened my code massively:

Code:
import numpy as np

# load the test data, skipping the 31 header lines
testdata = np.genfromtxt('file.txt', dtype=float, skip_header=31)
y1, y2, y3 = testdata[:, 0], testdata[:, 1], testdata[:, 2]

Not sure how I missed that in my google searching, I guess the function name is kinda unusual.
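Since the real variables are physical quantities rather than y1, y2 and y3, one possible shortcut is genfromtxt's unpack=True argument, which transposes the result so each column can be assigned straight to its own name. The column names and values below are invented for illustration, and the data is read from an in-memory string instead of a file.

```python
import io
import numpy as np

# Hypothetical whitespace-separated data with a single header line
src = io.StringIO(
    "temperature pressure density\n"
    "273.0 1.01 1.29\n"
    "283.0 1.02 1.25\n"
)

# unpack=True returns the columns, so they can be unpacked directly into names
temperature, pressure, density = np.genfromtxt(src, skip_header=1, unpack=True)

print(temperature, pressure, density)
```

This only works when the number of names on the left matches the number of columns in the file, which doubles as a basic format check.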
 

What is the Python equivalent of MATLAB's textscan function?

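Python has no exact equivalent of MATLAB's textscan function. The closest are NumPy's numpy.loadtxt() and numpy.genfromtxt() functions; the thread above settled on genfromtxt. Both read data from a text file and store it in a NumPy array, and genfromtxt can additionally handle missing values.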

How do I get textscan-like behavior in Python?

To use numpy.loadtxt() or numpy.genfromtxt(), you first need to import the NumPy library. Then call the function with the file name and any other relevant parameters, such as the delimiter or data type.
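A minimal sketch of such a call, using an in-memory string in place of a file name. The sample contents are made up; delimiter, skiprows and dtype are standard loadtxt parameters.

```python
import io
import numpy as np

# Hypothetical comma-separated contents with one header line
src = io.StringIO("x,y\n1,2\n3,4\n")

arr = np.loadtxt(src, delimiter=",", skiprows=1, dtype=float)
print(arr.shape)
print(arr)
```

Passing a path such as 'file.txt' instead of the StringIO object reads from disk in the same way.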

What are the advantages of using NumPy's readers over MATLAB's textscan?

One advantage of numpy.loadtxt() and numpy.genfromtxt() over MATLAB's textscan is that they are part of a larger ecosystem of scientific computing libraries, making it easier to integrate with other tools and packages. Additionally, Python and NumPy are open source, so there is a large and active community providing support and updates.

Can I read a text file with multiple columns this way?

Yes, numpy.loadtxt() can handle text files with multiple columns. Columns are split on whitespace by default, or you can specify the delimiter in the function's parameters. The result is a two-dimensional NumPy array with one row per line of the file and one column per field.

Are there any alternative functions to textscan in Python?

Yes, there are alternative functions to numpy.loadtxt() in Python for reading in text data, such as csv.reader() and pandas.read_csv(). It is important to consider the specific needs of your project and the format of your data when choosing which function to use.
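For comparison, here is a sketch of the csv.reader() route using only the standard library. The sample data is made up, and the conversion to float is done by hand because csv.reader() returns every field as a string.

```python
import csv
import io

# Hypothetical comma-separated contents with one header line
src = io.StringIO("time,temp\n0.0,273.1\n1.0,274.2\n")

reader = csv.reader(src)
header = next(reader)                                   # first row holds the column names
rows = [[float(x) for x in row] for row in reader]      # remaining rows hold the data

print(header)
print(rows)
```

This avoids a NumPy dependency but leaves type conversion and error handling to the caller.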
