
Can't read strange mixed format text file in Matlab

by madtraveller
Tags: file, format, matlab, mixed, strange, text
madtraveller
#1
Jun3-11, 05:41 PM
P: 28
1. The problem statement, all variables and given/known data

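I have to read a fixed-format text file like the one below.
The first column is the year, the 2nd is the month and the 3rd is the day.
The file has the following format: 4d 2d 2d f9.4 f9.4 f9.4 f9.4 4d 4d 4d 4d 4d 4d (integer fields 4, 2 and 2 characters wide, then four 9-character floats with 4 decimals, then six 4-character integers)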

2000 218   0.4546   0.2394   0.0761   0.1167  55  58   1   1   1   1
2000 226  -9.9999  -9.9999   0.3701   0.7276 -99 -99 100 100 100 100
2000 3 5   0.4571   0.2612   0.3657   0.8069  54  56 100 100 100 100
2000 313  -9.9999  -9.9999   0.3310   0.7816 -99 -99 100 100 100 100
2000 321   0.5156   0.2806   0.3777   0.8762  56  56  97  97  98  98
2000 329  -9.9999  -9.9999   0.4171   1.0047 -99 -99 100 100 100 100
2000 4 6   0.5190   0.3154   0.4273   1.0269  58  54 100 100 100 100
2000 414  -9.9999  -9.9999   0.4521   1.1319 -99 -99 100 100 100 100
2000 422   0.5845   0.3109   0.4627   1.1363  56  56 100 100 100 100
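Taking those widths literally, the first 8 characters of each row split into year, month and day purely by position. A sketch on the date fields of rows 1 to 3, copied from the sample above (datenum is used only as a sanity check, not as part of the reading):

```matlab
% First 8 characters of rows 1, 2 and 3 of the sample file
fields = ['2000 218'; '2000 226'; '2000 3 5'];

% Fixed positions: year = chars 1-4, month = chars 5-6, day = chars 7-8
y = str2double(cellstr(fields(:, 1:4)));
m = str2double(cellstr(fields(:, 5:6)));   % ' 2', ' 2', ' 3'
d = str2double(cellstr(fields(:, 7:8)));   % '18', '26', ' 5'

disp([y m d]);                      % one row per date: year, month, day
disp(diff(datenum(y, m, d)).');     % gaps in days between consecutive rows
```

The rows decode to 2000-02-18, 2000-02-26 and 2000-03-05, which are 8 days apart (2000 is a leap year), so the positional reading matches the regular spacing of the data.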
2. Relevant equations



3. The attempt at a solution

I tried to use textscan and fscanf, but it just didn't work. The program always had problems with the first 3 columns.

E.g.

clc
clear all
fid = fopen('data.txt','r'); 
data1 = fscanf(fid, '%4d %2d %2d %9.4f %9.4f %9.4f %9.4f %4d %4d %4d %4d %4d %4d')
data2 = fscanf(fid, '%d %d %f %f %f %f %d %d %d %d %d %d')
data3 = fscanf(fid, '%4d%2d%2d%9.4f%9.4f%9.4f%9.4f%4d%4d%4d%4d%4d%4d')
fclose(fid);
Please suggest a better solution. Thank you very much.

madtraveller
jbunniii
#2
Jun3-11, 06:30 PM
Sci Advisor
HW Helper
PF Gold
P: 3,169
What's going on with rows 3 and 7? Instead of a 3-digit number in the second column like all the others, these have two 1-digit numbers.

So the number of elements on each line is not consistent.
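A quick way to see this, sketched with rows 1 and 3 copied from the sample as strings so no file is needed: reading every whitespace-separated number with sscanf gives a different count for the two rows.

```matlab
% Rows 1 and 3 of the sample data, copied verbatim
row1 = '2000 218   0.4546   0.2394   0.0761   0.1167  55  58   1   1   1   1';
row3 = '2000 3 5   0.4571   0.2612   0.3657   0.8069  54  56 100 100 100 100';

% '%f' is reapplied until the string runs out, so this counts
% the blank-separated numbers on each row
n1 = numel(sscanf(row1, '%f'));   % '218' is read as one number
n3 = numel(sscanf(row3, '%f'));   % '3' and '5' are read as two numbers

fprintf('row 1: %d numbers, row 3: %d numbers\n', n1, n3);
```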

Also, why are you using 3 fscanf statements?
jbunniii
#3
Jun3-11, 06:31 PM
Sci Advisor
HW Helper
PF Gold
P: 3,169
P.S. You don't need to specify the number of digits when reading, e.g. you can use %d instead of %4d.
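One caveat, shown as a small sketch with a packed date field such as '19481010' (year 1948, month 10, day 10): without a width, %d reads as many digits as it can, so digits with no blanks between them are only split into year, month and day when the widths are given.

```matlab
packed = '19481010';   % year, month and day with no separators

noWidth   = sscanf(packed, '%d');          % swallows all 8 digits as one integer
withWidth = sscanf(packed, '%4d%2d%2d');   % widths cap the digits taken by each field

disp(noWidth.');
disp(withWidth.');
```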

madtraveller
#4
Jun3-11, 06:38 PM
P: 28

jbunniii, thank you for your answer

As I stated, the first column is the year, the 2nd is the month and the 3rd is the day.
So the 1st row is 2000 Feb 18th
The 2nd row is 2000 Feb 26th
The 3rd row is 2000 March 5th
...
The 7th row is 2000 April 6th

So the format is fixed: 4d 2d 2d
That's the nasty thing about this file format :)

I tried 3 fscanf statements to see which one worked. Apparently none of them did.

madtraveller
jbunniii
#5
Jun3-11, 06:49 PM
Sci Advisor
HW Helper
PF Gold
P: 3,169
OK, thanks for the clarification.

Try the following:

data = fscanf(fid, '%d %1d%2d %f %f %f %f %d %d %d %d %d %d');
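To see what that format does, here is a sketch that applies the same format string with sscanf to rows 1 and 3 copied from the sample (fscanf on the file walks through the lines the same way). The space after the first %d skips the separator, %1d takes a single digit for the month, and %2d skips any blank before reading the day:

```matlab
fmt = '%d %1d%2d %f %f %f %f %d %d %d %d %d %d';

% Rows 1 and 3 of the sample data, copied verbatim
row1 = '2000 218   0.4546   0.2394   0.0761   0.1167  55  58   1   1   1   1';
row3 = '2000 3 5   0.4571   0.2612   0.3657   0.8069  54  56 100 100 100 100';

v1 = sscanf(row1, fmt).';   % 13 values: '2' then '18' for month and day
v3 = sscanf(row3, fmt).';   % 13 values: '3' then ' 5' for month and day

disp(v1(1:3));   % year, month, day of row 1
disp(v3(1:3));   % year, month, day of row 3
```

Because %1d allows only one digit for the month, this format assumes every month in the file is between 1 and 9.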
jbunniii
#6
Jun3-11, 06:54 PM
Sci Advisor
HW Helper
PF Gold
P: 3,169
P.S. That will read all the data into a single column vector. If you would rather read it into a matrix of the dimensions of the original file, try this:

data = fscanf(fid, '%d %1d%2d %f %f %f %f %d %d %d %d %d %d', [13,inf]);

This will reverse your rows and columns, so follow it with

data = data';

to get the same dimensions as the file.
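A minimal in-memory sketch of the size argument and the transpose, assuming two sample rows joined into one string with newlines (sscanf takes the same size argument as fscanf):

```matlab
fmt = '%d %1d%2d %f %f %f %f %d %d %d %d %d %d';

% Rows 1 and 2 of the sample file, separated by newlines as in the file
txt = sprintf('%s\n%s\n', ...
    '2000 218   0.4546   0.2394   0.0761   0.1167  55  58   1   1   1   1', ...
    '2000 226  -9.9999  -9.9999   0.3701   0.7276 -99 -99 100 100 100 100');

% [13 Inf] fills a 13-row matrix column by column: one column per record
data = sscanf(txt, fmt, [13 Inf]);
disp(size(data));   % 13 rows, 2 columns

% Transpose so that each row of data matches one line of the file
data = data.';
disp(size(data));   % 2 rows, 13 columns
```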
madtraveller
#7
Jun3-11, 07:10 PM
P: 28
Thank you so much jbunniii. It works perfectly now

Have a great weekend

madtraveller
madtraveller
#8
Jun10-11, 02:06 AM
P: 28
I've just realized that the code suggested above doesn't work correctly on longer data like this. Again, the first column is the year (4d format), the 2nd column is the month (2d) and the 3rd column is the day (2d).


1948 9 1    0.0000    5.3940    0.0292   35.4944   19.2500
1948 9 2    0.0000    5.3600    0.0292   36.0056   16.7556
1948 9 3    0.1000    5.3250    0.0292   35.9722   17.0000
1948 9 4    0.4400    5.2900    0.0292   36.7500   17.4556
1948 9 5    0.0000    5.2550    0.0255   37.1611   17.2167
1948 9 6    0.2900    5.2190    0.0255   35.9556   19.5778
1948 9 7    3.3200    5.1830    0.0255   35.9056   20.8333
1948 9 8    6.3700    5.1460    0.0255   34.6500   20.8778
1948 9 9   23.0600    5.1100    0.0371   25.6667   17.8500
1948 910    2.6800    5.0720    0.0398   23.1889   16.7889
1948 911    0.1500    5.0350    0.0371   27.3222   15.7556
1948 912    0.0000    4.9970    0.0318   30.1833   14.0278
1948 913    0.1000    4.9590    0.0292   31.0444   16.0889
1948 914    0.0000    4.9210    0.0255   28.8167   19.4833
1948 915    0.0800    4.8820    0.0255   31.3000   18.2389
1948 916    0.9500    4.8440    0.0255   31.8278   15.5722
1948 917    6.7100    4.8050    0.0292   31.1611   18.2667
1948 918    1.7100    4.7650    0.0292   29.8278   20.2389
1948 919    0.5500    4.7260    0.0292   30.6389   20.0222
1948 920    0.0000    4.6860    0.0255   32.1167   19.3500
1948 921    5.5700    4.6460    0.0255   32.9111   19.0333
1948 922    3.0300    4.6060    0.0255   32.3833   18.4056
1948 923    0.2700    4.5660    0.0255   32.2278   17.2389
1948 924    0.0000    4.5260    0.0255   32.6889   17.2500
1948 925    0.0000    4.4850    0.0255   31.5056   16.0833
1948 926    0.0000    4.4440    0.0223   27.6333   12.4778
1948 927    0.0000    4.4040    0.0191   26.4000    9.7000
1948 928    0.0000    4.3630    0.0223   26.4556    8.4333
1948 929    0.0000    4.3220    0.0223   28.6778    9.8444
1948 930    0.0000    4.2810    0.0223   30.8222    9.3389
194810 1    0.0000    4.2400    0.0255   31.9000    9.9222
194810 2    0.0000    4.1990    0.0292   31.7000   10.1222
194810 3    0.0000    4.1570    0.0292   31.0056   10.8778
194810 4    0.3400    4.1160    0.0255   31.2278    9.7722
194810 5    0.0000    4.0750    0.0255   30.4222   10.1944
194810 6    0.0000    4.0330    0.0255   32.3389   12.8778
194810 7    0.0000    3.9920    0.0223   27.8222   16.2833
194810 8    0.0000    3.9510    0.0223   29.3611   13.8444
194810 9    0.0000    3.9100    0.0223   30.7722   14.2833
19481010    0.0000    3.8680    0.0223   29.8833   17.7722
19481011    0.0000    3.8270    0.0223   29.3444   17.6722
19481012    0.0000    3.7860    0.0223   30.2444   12.3389
19481013    0.1200    3.7450    0.0223   31.6056   12.5056
19481014    0.0400    3.7040    0.0223   33.6556   14.0667
19481015    2.4400    3.6630    0.0223   32.7222   16.8278
19481016   23.7700    3.6230    0.0260   32.3333   17.8333
19481017   41.4600    3.5820    1.3760   19.4667    7.4500
19481018    0.0000    3.5410    0.2757   17.9889    4.1556
19481019    0.1600    3.5010    0.0954   21.5944    4.4667
19481020    0.8900    3.4610    0.0530   23.5389    6.6056
19481021    0.4100    3.4210    0.0451   22.0556   11.4222
19481022    0.1000    3.3810    0.0424   24.8889   13.4333
19481023    0.0000    3.3410    0.0398   24.6778   10.6889
19481024    0.0000    3.3020    0.0371   24.3222   10.3444
19481025    0.0300    3.2630    0.0371   25.0611    8.4167
19481026    0.0400    3.2240    0.0371   24.4944    6.0444
19481027    0.0600    3.1850    0.0371   25.4722    5.8722
19481028    0.0700    3.1470    0.0345   24.6056   10.8500
19481029    0.0100    3.1080    0.0345   26.1556   14.1667
19481030    0.0900    3.0700    0.0345   27.2944   18.0556
19481031    0.0200    3.0330    0.0318   27.9167   18.1278
I tried to modify my code as follows:

fid = fopen('data.txt','r'); 

data2 = fscanf(fid, '%4d%2d%2d %f %f %f %f %f', [8 inf]);
data2 = data2';

fclose(fid);
It works for the October data, but it misreads the September data (month ' 9').
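The September misread can be reproduced in isolation. In this sketch (date fields copied from the rows above), %2d skips the blank in ' 9' before it starts counting its 2-character width, so it reads '91'; when the blank sits in the day field instead, as in '194810 1', the result happens to come out right:

```matlab
fmt = '%4d%2d%2d';

sep = sscanf('1948 910', fmt).';   % blank before the month digit
oct = sscanf('194810 1', fmt).';   % blank before the day digit

disp(sep);   % month and day come out wrong: '91' and '0'
disp(oct);   % 1948, 10, 1 as intended
```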

Could anyone help me with a better solution?

Thx

madtraveller
MATLABdude
#9
Jun10-11, 02:37 AM
Sci Advisor
P: 1,724
Your columns (year, month, and day) are bleeding into each other. I'd suggest either better delimiting (e.g. using tabs or forcing a fixed number of spaces between columns) in whatever you're using to generate these values, or going in and adding delimiters between the columns, manually or automatically.
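An automatic version of that suggestion could look like the sketch below. It assumes the date always occupies exactly the first 8 characters of a line, and addBlanks is a hypothetical anonymous helper, not a MATLAB built-in. Inserting a blank after character 4 and after character 6 guarantees that year, month and day are separated, after which the line reads as plain whitespace-separated numbers:

```matlab
% Hypothetical helper: put a blank after the year and after the month
addBlanks = @(s) [s(1:4) ' ' s(5:6) ' ' s(7:end)];

% Two lines copied from the 1948 sample
lineSep = '1948 910    2.6800    5.0720    0.0398   23.1889   16.7889';
lineOct = '19481010    0.0000    3.8680    0.0223   29.8833   17.7722';

vSep = sscanf(addBlanks(lineSep), '%f').';   % '1948  9 10    2.6800 ...'
vOct = sscanf(addBlanks(lineOct), '%f').';   % '1948 10 10    0.0000 ...'

disp(vSep(1:3));   % year, month, day
disp(vOct(1:3));
```

In a real script you would apply addBlanks to each line returned by fgetl before calling sscanf.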
madtraveller
#10
Jun10-11, 02:51 AM
P: 28
MATLABdude,

Thank you. So there is no way to read that kind of file in MATLAB? That's really strange to me, tbh :)
That file was provided on a website, and I can read it quite easily with R. However, for some reasons I would like to read it in MATLAB and don't want to call an R script from MATLAB.

madtraveller
MATLABdude
#11
Jun10-11, 08:20 AM
Sci Advisor
P: 1,724
Are you certain that R actually reads it correctly?

I was hoping that textread (replaced by the not-quite-drop-in textscan in newer versions of MATLAB) would do the trick, but the problem seems to be in how it treats whitespace when there is or isn't a space between the year and month, or the month and day. Too bad, since it almost did the trick (textscan may still work, but I don't have access to a newer version of MATLAB on this computer).
http://www.mathworks.com/help/techdoc/ref/textread.html
http://www.mathworks.com/help/techdoc/ref/textscan.html

Actually, I suspect that's just standard C scanf behaviour: numeric conversions skip leading whitespace, and the field width only counts the characters after it.

Nevertheless, you can still get around this issue by treating the first 8 characters of every line as a date string and then converting it with some matrix indexing and the str2num function:
http://www.mathworks.com/help/techdoc/ref/str2num.html

The following code (for use with your most recent dataset starting in 1948) uses textread, but you'll have to modify it a little for the newer textscan (it produces cell arrays instead of forcing you to explicitly declare variable names for every column).
[datechars, var1, var2, var3, var4, var5] = textread('sample.txt', '%8c %f %f %f %f %f');
% datechars is an N-by-8 char array holding the date field of each line
% (named datechars rather than date so MATLAB's built-in date function is not shadowed)

year=datechars(:, 1:4);		% extracts the first 4 columns of every row of datechars
year=str2num(year);		% converts them to numbers
month=str2num(datechars(:, 5:6));	% both operations in one line
day=str2num(datechars(:, 7:8));

% now horizontally concatenate everything together
format short g;			% just so everything displays properly
big_ol_data = horzcat(year, month, day, var1, var2, var3, var4, var5)
EDIT: Oops, wrong year: 1948, not 1968.
madtraveller
#12
Jun10-11, 01:41 PM
P: 28
Thank you very much MATLABdude. Your code really did the trick.

Have a good weekend

madtraveller
madtraveller
#13
Jun10-11, 01:50 PM
P: 28
I forgot to add the R code. R really does this very easily with read.fwf :)

data <- read.fwf("data.txt", widths=c(4,2,2,10,10,10,10,10) )
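For comparison, a rough MATLAB counterpart of that read.fwf call can be sketched by slicing every line at fixed character positions. This is not a built-in MATLAB function: the widths vector is copied from the R call, and a few 1948 lines are held in a cell array so the example runs without data.txt (with the real file you would collect the lines with fgetl first).

```matlab
% Field widths copied from the read.fwf call: year, month, day, 5 values
widths = [4 2 2 10 10 10 10 10];
starts = cumsum([1 widths(1:end-1)]);   % first character of each field
stops  = cumsum(widths);                % last character of each field

% A few lines from the 1948 sample (in practice, read them with fgetl)
txtLines = { ...
    '1948 9 1    0.0000    5.3940    0.0292   35.4944   19.2500'
    '1948 910    2.6800    5.0720    0.0398   23.1889   16.7889'
    '194810 1    0.0000    4.2400    0.0255   31.9000    9.9222'
    '19481031    0.0200    3.0330    0.0318   27.9167   18.1278'};

C = char(txtLines);                      % stack the lines into a char matrix
data = zeros(size(C, 1), numel(widths));
for k = 1:numel(widths)
    % Take the characters of field k from every row and convert them
    data(:, k) = str2double(cellstr(C(:, starts(k):stops(k))));
end

disp(data(:, 1:3));   % year, month, day
```

str2double(cellstr(...)) is used here instead of str2num because it converts each row independently and does not evaluate the text as MATLAB code.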

