Can't read strange mixed format text file in Matlab

In summary, the file is hard to read in Matlab because its first three fields (year, month and day) are fixed-width, 4, 2 and 2 characters, and are not reliably separated by spaces: lines such as "1948 9 1", "1948 910" and "19481010" all occur. fscanf skips whitespace before applying a field width, so a format like %4d%2d%2d reads "1948 910" as month 91 and day 0. The solution that worked in the thread reads the first 8 characters of each line as text (%8c with textread) and converts the year, month and day by character position with str2num. In R, read.fwf does the same job with explicit field widths.
  • #1
madtraveller

Homework Statement

I have to read a text file with a fixed format. The first column is supposed to be the year, the 2nd the month and the 3rd the day.
The file has the following format: 4d 2d 2d f9.4 f9.4 f9.4 f9.4 4d 4d 4d 4d 4d 4d

Code:
2000 218   0.4546   0.2394   0.0761   0.1167  55  58   1   1   1   1
2000 226  -9.9999  -9.9999   0.3701   0.7276 -99 -99 100 100 100 100
2000 3 5   0.4571   0.2612   0.3657   0.8069  54  56 100 100 100 100
2000 313  -9.9999  -9.9999   0.3310   0.7816 -99 -99 100 100 100 100
2000 321   0.5156   0.2806   0.3777   0.8762  56  56  97  97  98  98
2000 329  -9.9999  -9.9999   0.4171   1.0047 -99 -99 100 100 100 100
2000 4 6   0.5190   0.3154   0.4273   1.0269  58  54 100 100 100 100
2000 414  -9.9999  -9.9999   0.4521   1.1319 -99 -99 100 100 100 100
2000 422   0.5845   0.3109   0.4627   1.1363  56  56 100 100 100 100

The Attempt at a Solution

I tried to use textscan and fscanf, but it just didn't work. The program always had a problem with the first 3 columns.

E.g.
Code:
clc
clear all
fid = fopen('data.txt','r'); 
data1 = fscanf(fid, '%4d %2d %2d %9.4f %9.4f %9.4f %9.4f %4d %4d %4d %4d %4d %4d')
data2 = fscanf(fid, '%d %d %f %f %f %f %d %d %d %d %d %d')
data3 = fscanf(fid, '%4d%2d%2d%9.4f%9.4f%9.4f%9.4f%4d%4d%4d%4d%4d%4d')
fclose(fid);

Please suggest a better solution. Thank you very much.

madtraveller
 
  • #2
What's going on with rows 3 and 7? Instead of a 3-digit number in the second column like all the others, these have two 1-digit numbers.

So the number of elements on each line is not consistent.

Also, why are you using 3 fscanf statements?
 
  • #3
P.S. You don't need to specify the number of digits when reading, e.g. you can use %d instead of %4d.
 
  • #4
jbunniii, thank you for your answer

As I stated: The first column is supposed to be year, 2nd is month and 3rd is day
So the 1st row is 2000 Feb 18th
The 2nd row is 2000 Feb 26th
The 3rd row is 2000 March 5th
...
The 7th row is 2000 April 6th

So the format is fixed: 4d 2d 2d
That's the nasty thing about this file format :)

I tried 3 fscanf statements to see which one worked. Apparently none of them did.

madtraveller
 
  • #5
OK, thanks for the clarification.

Try the following:

data = fscanf(fid, '%d %1d%2d %f %f %f %f %d %d %d %d %d %d');
 
  • #6
P.S. That will read all the data into a single column vector. If you would rather read it into a matrix of the dimensions of the original file, try this:

data = fscanf(fid, '%d %1d%2d %f %f %f %f %d %d %d %d %d %d', [13,inf]);

This will reverse your rows and columns, so follow it with

data = data';

to get the same dimensions as the file.
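You can check the format from #5 and #6 without a file by running it through sscanf on two lines copied from the first data set. This is a minimal sketch: the two-line string stands in for data.txt, and sscanf applies the same format rules as fscanf.

```matlab
% Two lines copied from the first data set: "2000 218" is 2000-02-18
% and "2000 3 5" is 2000-03-05.
txt = sprintf('%s\n', ...
    '2000 218   0.4546   0.2394   0.0761   0.1167  55  58   1   1   1   1', ...
    '2000 3 5   0.4571   0.2612   0.3657   0.8069  54  56 100 100 100 100');

% %d reads the year. %1d takes exactly one digit for the month. %2d then
% takes up to two digits for the day (whitespace before it is skipped).
fmt = '%d %1d%2d %f %f %f %f %d %d %d %d %d %d';

% Fill a 13-by-N array column by column, then transpose so that each
% row of data corresponds to one line of the file.
data = sscanf(txt, fmt, [13, Inf])';
disp(data(:, 1:3))   % year, month, day
```

The single-digit %1d makes this format work for this data set. It is also why the format cannot handle a two-digit month: for a line such as "194810 1", the leading %d would read 194810 as the year.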
 
  • #7
Thank you so much jbunniii. It works perfectly now

Have a great weekend

madtraveller
 
  • #8
I've just realized that the code suggested above doesn't work correctly for longer data like this. Again, the first column is the year (4d format), the 2nd column is the month (2d) and the 3rd column is the day (2d).

Code:
1948 9 1    0.0000    5.3940    0.0292   35.4944   19.2500
1948 9 2    0.0000    5.3600    0.0292   36.0056   16.7556
1948 9 3    0.1000    5.3250    0.0292   35.9722   17.0000
1948 9 4    0.4400    5.2900    0.0292   36.7500   17.4556
1948 9 5    0.0000    5.2550    0.0255   37.1611   17.2167
1948 9 6    0.2900    5.2190    0.0255   35.9556   19.5778
1948 9 7    3.3200    5.1830    0.0255   35.9056   20.8333
1948 9 8    6.3700    5.1460    0.0255   34.6500   20.8778
1948 9 9   23.0600    5.1100    0.0371   25.6667   17.8500
1948 910    2.6800    5.0720    0.0398   23.1889   16.7889
1948 911    0.1500    5.0350    0.0371   27.3222   15.7556
1948 912    0.0000    4.9970    0.0318   30.1833   14.0278
1948 913    0.1000    4.9590    0.0292   31.0444   16.0889
1948 914    0.0000    4.9210    0.0255   28.8167   19.4833
1948 915    0.0800    4.8820    0.0255   31.3000   18.2389
1948 916    0.9500    4.8440    0.0255   31.8278   15.5722
1948 917    6.7100    4.8050    0.0292   31.1611   18.2667
1948 918    1.7100    4.7650    0.0292   29.8278   20.2389
1948 919    0.5500    4.7260    0.0292   30.6389   20.0222
1948 920    0.0000    4.6860    0.0255   32.1167   19.3500
1948 921    5.5700    4.6460    0.0255   32.9111   19.0333
1948 922    3.0300    4.6060    0.0255   32.3833   18.4056
1948 923    0.2700    4.5660    0.0255   32.2278   17.2389
1948 924    0.0000    4.5260    0.0255   32.6889   17.2500
1948 925    0.0000    4.4850    0.0255   31.5056   16.0833
1948 926    0.0000    4.4440    0.0223   27.6333   12.4778
1948 927    0.0000    4.4040    0.0191   26.4000    9.7000
1948 928    0.0000    4.3630    0.0223   26.4556    8.4333
1948 929    0.0000    4.3220    0.0223   28.6778    9.8444
1948 930    0.0000    4.2810    0.0223   30.8222    9.3389
194810 1    0.0000    4.2400    0.0255   31.9000    9.9222
194810 2    0.0000    4.1990    0.0292   31.7000   10.1222
194810 3    0.0000    4.1570    0.0292   31.0056   10.8778
194810 4    0.3400    4.1160    0.0255   31.2278    9.7722
194810 5    0.0000    4.0750    0.0255   30.4222   10.1944
194810 6    0.0000    4.0330    0.0255   32.3389   12.8778
194810 7    0.0000    3.9920    0.0223   27.8222   16.2833
194810 8    0.0000    3.9510    0.0223   29.3611   13.8444
194810 9    0.0000    3.9100    0.0223   30.7722   14.2833
19481010    0.0000    3.8680    0.0223   29.8833   17.7722
19481011    0.0000    3.8270    0.0223   29.3444   17.6722
19481012    0.0000    3.7860    0.0223   30.2444   12.3389
19481013    0.1200    3.7450    0.0223   31.6056   12.5056
19481014    0.0400    3.7040    0.0223   33.6556   14.0667
19481015    2.4400    3.6630    0.0223   32.7222   16.8278
19481016   23.7700    3.6230    0.0260   32.3333   17.8333
19481017   41.4600    3.5820    1.3760   19.4667    7.4500
19481018    0.0000    3.5410    0.2757   17.9889    4.1556
19481019    0.1600    3.5010    0.0954   21.5944    4.4667
19481020    0.8900    3.4610    0.0530   23.5389    6.6056
19481021    0.4100    3.4210    0.0451   22.0556   11.4222
19481022    0.1000    3.3810    0.0424   24.8889   13.4333
19481023    0.0000    3.3410    0.0398   24.6778   10.6889
19481024    0.0000    3.3020    0.0371   24.3222   10.3444
19481025    0.0300    3.2630    0.0371   25.0611    8.4167
19481026    0.0400    3.2240    0.0371   24.4944    6.0444
19481027    0.0600    3.1850    0.0371   25.4722    5.8722
19481028    0.0700    3.1470    0.0345   24.6056   10.8500
19481029    0.0100    3.1080    0.0345   26.1556   14.1667
19481030    0.0900    3.0700    0.0345   27.2944   18.0556
19481031    0.0200    3.0330    0.0318   27.9167   18.1278

I tried to modify my code as follows:

Code:
fid = fopen('data.txt','r'); 

data2 = fscanf(fid, '%4d%2d%2d %f %f %f %f %f', [8 inf]);
data2 = data2';

fclose(fid);

It worked for the October data, but it misread the September (month 9) data.

Could anyone help me with a better solution?

Thx

madtraveller
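The misread described in #8 can be reproduced on single lines with sscanf. The three lines in this sketch are copied from the second data set. %2d skips the space in front of a one-digit month before it starts counting digits. That is why the second line, "1948 910", should come out as month 91 and day 0, while the lines where the space does not fall before a two-digit field parse correctly.

```matlab
fmt = '%4d%2d%2d %f %f %f %f %f';   % the format tried in #8

good1 = sscanf('1948 9 1    0.0000    5.3940    0.0292   35.4944   19.2500', fmt)';
bad   = sscanf('1948 910    2.6800    5.0720    0.0398   23.1889   16.7889', fmt)';
good2 = sscanf('19481010    0.0000    3.8680    0.0223   29.8833   17.7722', fmt)';

% First three values of each line: year, month, day
disp([good1(1:3); bad(1:3); good2(1:3)])
```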
 
  • #9
Your columns (year, month, and date) are bleeding into each other. I'd suggest either better delimiting (e.g. using tabs or forcing a fixed number of spaces between columns) in whatever you're using to generate these values, or going in and increasing the delimiting between them, manually or automatically.
 
  • #10
MATLABdude,

Thank you. So there is no way to read that kind of file using Matlab? That's really strange to me tbh :)
That file was provided on a website, and I can read it quite easily using R. However, for some reasons I would like to read it using Matlab, and I don't want to call an R script from Matlab.

madtraveller
 
  • #11
Are you certain that R actually reads it correctly?

I was hoping that textread (replaced by the not-quite drop-in textscan in newer versions of MATLAB) would do the trick, but the problem seems to be in how it treats the whitespace when there is and is not a space between the year and month, or month and date. Too bad, since it almost did the trick (textscan may still, but I don't have access to a newer version of MATLAB on this computer).
http://www.mathworks.com/help/techdoc/ref/textread.html
http://www.mathworks.com/help/techdoc/ref/textscan.html

Actually, I suspect that's standard C behavior: scanf-style reading skips whitespace before applying a field width.

Nevertheless, you can still get around this issue by treating the first 8 characters of every line as a date string and then doing the conversion using some matrix operations and the str2num function:
http://www.mathworks.com/help/techdoc/ref/str2num.html

The following code (for use with your most recent dataset starting in 1948) uses textread, but you'll have to modify it a little for the newer textscan (which returns a cell array instead of forcing you to declare a variable name for every column).
Code:
[date, var1, var2, var3, var4, var5] = textread('sample.txt', '%8c %f %f %f %f %f');

year=date(:, 1:4);		% extracts the first 4 columns of every row of date
year=str2num(year);		% converts them to numbers
month=str2num(date(:, 5:6));	% both operations in one line
day=str2num(date(:, 7:8));

% now horizontally concatenate everything together
format short g;			% just so everything displays properly
big_ol_data = horzcat(year, month, day, var1, var2, var3, var4, var5)

EDIT: Ooops, wrong year: 1948 instead of 1968.
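MathWorks marks textread as not recommended. Below is a sketch of the same idea (slice the first 8 characters of every line by position) that uses only fileread, regexp, str2double and sscanf. The first part writes three lines copied from the 1948 data set to a temporary file, which stands in for sample.txt so the sketch runs on its own. The short names yr, mo and dy are illustrative and avoid shadowing MATLAB's year, month and day functions.

```matlab
% Write three lines from the 1948 data set to a temporary demo file.
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ...
    '1948 9 1    0.0000    5.3940    0.0292   35.4944   19.2500', ...
    '1948 910    2.6800    5.0720    0.0398   23.1889   16.7889', ...
    '19481031    0.0200    3.0330    0.0318   27.9167   18.1278');
fclose(fid);

% Read the whole file, split it into lines, and drop the empty piece
% left after the final newline.
lines = regexp(fileread(fname), '\r?\n', 'split');
lines = lines(~cellfun(@isempty, lines));
C = char(lines);                  % one row per line, padded with spaces
n = size(C, 1);

% Fixed positions: year = 1-4, month = 5-6, day = 7-8.
% str2double ignores the leading space in fields such as ' 9'.
yr = str2double(cellstr(C(:, 1:4)));
mo = str2double(cellstr(C(:, 5:6)));
dy = str2double(cellstr(C(:, 7:8)));

% The remaining columns are whitespace-separated, so sscanf can read them.
% Appending a space to each row keeps numbers from adjacent rows apart.
tail = [C(:, 9:end), repmat(' ', n, 1)]';
vals = sscanf(tail(:)', '%f', [5, Inf])';

big_ol_data = [yr, mo, dy, vals];
delete(fname);
```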
 
  • #12
Thank you very much MATLABdude. Your code really did the trick.

Have a good weekend

madtraveller
 
  • #13
I forgot to add the R code. R really does this very easily with read.fwf :)

Code:
data <- read.fwf("data.txt", widths=c(4,2,2,10,10,10,10,10) )
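Newer MATLAB releases offer a counterpart to read.fwf: import options created with fixedWidthImportOptions and passed to readtable. This sketch makes several assumptions:

- It assumes a release that has fixedWidthImportOptions. The function does not exist in older versions like the one used in this thread.
- It reuses the widths from the R call.
- It assumes readtable trims the padding spaces inside each fixed-width field before converting it to a number.

The variable names and the temporary demo file are illustrative.

```matlab
% Widths match the R call: year 4, month 2, day 2, then five 10-character values.
widths = [4 2 2 10 10 10 10 10];
names  = {'Year', 'Month', 'Day', 'V1', 'V2', 'V3', 'V4', 'V5'};   % illustrative names

opts = fixedWidthImportOptions('NumVariables', numel(widths));
opts.VariableWidths = widths;
opts.VariableNames  = names;
opts.VariableTypes  = repmat({'double'}, 1, numel(widths));
opts.DataLines      = [1 Inf];      % no header line in this file

% Demo file: two lines from the 1948 data set written to a temporary file.
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ...
    '1948 910    2.6800    5.0720    0.0398   23.1889   16.7889', ...
    '19481010    0.0000    3.8680    0.0223   29.8833   17.7722');
fclose(fid);

T = readtable(fname, opts);   % one row per line, one column per field
delete(fname);
disp(T)
```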
 

1. Why am I unable to read a strange mixed format text file in Matlab?

In this thread the file is neither corrupted nor full of unusual characters. The problem is its layout. The year, month and day are fixed-width fields (4, 2 and 2 characters) with no guaranteed separator between them, so "1948 9 1", "1948 910" and "19481010" all appear in the same file. Readers such as fscanf and textread treat whitespace as a separator and skip it before applying a field width. As a result, which characters end up in each field depends on where the padding spaces happen to fall.

2. How can I fix the problem of not being able to read a mixed format text file in Matlab?

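Read the date part by character position instead of by whitespace-separated fields. The solution that worked here uses textread with %8c to take the first 8 characters of every line as text, then converts columns 1-4, 5-6 and 7-8 with str2num. Delimiter-based readers such as importdata or csvread do not help on their own, because they cannot split a fused field like "19481010". If you control how the file is produced, writing an explicit delimiter (for example a tab) between the columns avoids the problem altogether, as suggested in #9.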

3. Can I convert the mixed format text file to a standard format in Matlab?

Yes. Once the columns have been parsed, for example with the %8c approach, you can write them back out with fprintf using an explicit delimiter such as a comma. Delimiter-based functions like csvread or readtable can then read the result. csvread cannot read the original file directly, because it only handles numeric values separated by commas.
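Here is a minimal sketch of that conversion. It assumes the columns have already been parsed into a numeric matrix; the two rows here are typed in by hand from the 1948 data set. Each row is written with fprintf using a comma delimiter and then read back with csvread (newer releases recommend readmatrix instead). The temporary file name is illustrative.

```matlab
% Parsed rows: year, month, day, then five values (copied from the 1948 data).
data = [1948  9 10  2.6800 5.0720 0.0398 23.1889 16.7889
        1948 10 10  0.0000 3.8680 0.0223 29.8833 17.7722];

fname = [tempname, '.csv'];
fid = fopen(fname, 'w');
% fprintf consumes its arguments column by column, so pass data' to write
% the matrix one row per line.
fprintf(fid, '%d,%d,%d,%.4f,%.4f,%.4f,%.4f,%.4f\n', data');
fclose(fid);

back = csvread(fname);   % plain comma-separated numbers, so csvread works now
delete(fname);
```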

4. Is there a limit to the size of a mixed format text file that Matlab can read?

MATLAB imposes no special limit for this kind of file; the practical limit is the memory available for the text and the resulting arrays. For very large files, process the data in pieces instead of all at once. For example, fgetl reads the file one line at a time, and each line can be parsed with the same fixed-width logic, so only the parsed numbers need to stay in memory.
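Here is a sketch of reading line by line with fgetl and slicing each line by character position, so the whole text never has to be held as one string. The demo file is generated inside the sketch. The block size used to grow the output (blockSize) is an arbitrary choice for illustration.

```matlab
% Build a small demo file from two lines of the 1948 data set.
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ...
    '1948 930    0.0000    4.2810    0.0223   30.8222    9.3389', ...
    '194810 1    0.0000    4.2400    0.0255   31.9000    9.9222');
fclose(fid);

blockSize = 1000;                 % rows allocated at a time (arbitrary)
data = zeros(blockSize, 8);
nRows = 0;

fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)               % fgetl returns -1 (not a char) at end of file
    if ~isempty(tline)
        nRows = nRows + 1;
        if nRows > size(data, 1)  % grow the buffer one block at a time
            data = [data; zeros(blockSize, 8)]; %#ok<AGROW>
        end
        % Fixed positions for year, month and day; the rest is whitespace-separated.
        data(nRows, 1) = str2double(tline(1:4));
        data(nRows, 2) = str2double(tline(5:6));
        data(nRows, 3) = str2double(tline(7:8));
        data(nRows, 4:8) = sscanf(tline(9:end), '%f', [1, 5]);
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);

data = data(1:nRows, :);          % drop unused preallocated rows
```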

5. Are there any alternative methods for reading a mixed format text file in Matlab?

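Yes. Besides the %8c/str2num approach, there are three options:

- fscanf with explicit field widths. This works only when the padding happens to line up with the format. The %d %1d%2d format from #5 handles the first data set, whose months are all single digits, but not a month such as 10.
- readtable with import options built by fixedWidthImportOptions, available in newer MATLAB releases. This is the closest counterpart to R's read.fwf.
- fileread, which returns the whole file as one character vector that you can split into lines and slice by character position.

Which one fits best depends on your MATLAB version and on how regular the file is.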
