## Can't read strange mixed format text file in Matlab

1. The problem statement, all variables and given/known data

I have to read a text file with a fixed-width format, shown below.
The first column is the year, the 2nd is the month and the 3rd is the day.
The file has the following format: 4d 2d 2d f9.4 f9.4 f9.4 f9.4 4d 4d 4d 4d 4d 4d

Code:
2000 218   0.4546   0.2394   0.0761   0.1167  55  58   1   1   1   1
2000 226  -9.9999  -9.9999   0.3701   0.7276 -99 -99 100 100 100 100
2000 3 5   0.4571   0.2612   0.3657   0.8069  54  56 100 100 100 100
2000 313  -9.9999  -9.9999   0.3310   0.7816 -99 -99 100 100 100 100
2000 321   0.5156   0.2806   0.3777   0.8762  56  56  97  97  98  98
2000 329  -9.9999  -9.9999   0.4171   1.0047 -99 -99 100 100 100 100
2000 4 6   0.5190   0.3154   0.4273   1.0269  58  54 100 100 100 100
2000 414  -9.9999  -9.9999   0.4521   1.1319 -99 -99 100 100 100 100
2000 422   0.5845   0.3109   0.4627   1.1363  56  56 100 100 100 100
I tried to use textscan and fscanf, but neither worked. The program always had problems with the first 3 columns.

E.g.
Code:
clc
clear all
fid = fopen('data.txt','r');
data1 = fscanf(fid, '%4d %2d %2d %9.4f %9.4f %9.4f %9.4f %4d %4d %4d %4d %4d %4d')
data2 = fscanf(fid, '%d %d %f %f %f %f %d %d %d %d %d %d')
data3 = fscanf(fid, '%4d%2d%2d%9.4f%9.4f%9.4f%9.4f%4d%4d%4d%4d%4d%4d')
fclose(fid);
Please suggest a better solution. Thank you very much.

What's going on with rows 3 and 7? Instead of a 3-digit number in the second column like all the others, these have two 1-digit numbers, so the number of elements on each line is not consistent. Also, why are you using 3 fscanf statements?
P.S. You don't need to specify the number of digits when reading, e.g. you can use %d instead of %4d.

As I stated, the first column is the year, the 2nd is the month and the 3rd is the day.
So the 1st row is 2000 Feb 18th
The 2nd row is 2000 Feb 26th
The 3rd row is 2000 March 5th
...
The 7th row is 2000 April 6th

So the format is fixed: 4d 2d 2d

I tried 3 fscanf statements to see which one worked. Apparently none of them did.

OK, thanks for the clarification. Try the following: data = fscanf(fid, '%d %1d%2d %f %f %f %f %d %d %d %d %d %d');
P.S. That will read all the data into a single column vector. If you would rather read it into a matrix with the dimensions of the original file, try this: data = fscanf(fid, '%d %1d%2d %f %f %f %f %d %d %d %d %d %d', [13,inf]); This will swap your rows and columns, so follow it with data = data'; to get the same dimensions as the file.
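The suggestion above can be tried without a file by passing two records from the posted data to sscanf, which takes the same format string and size argument as fscanf. This is only a sketch: the names `sample` and `fmt` are made up for illustration, and it relies on every month in the file being a single digit, as in the posted sample.

```matlab
% Two records from the posted data: a two-digit day and a one-digit day.
sample = sprintf('%s\n%s\n', ...
    '2000 218   0.4546   0.2394   0.0761   0.1167  55  58   1   1   1   1', ...
    '2000 3 5   0.4571   0.2612   0.3657   0.8069  54  56 100 100 100 100');

% %1d takes exactly one digit for the month, %2d takes up to two for the day.
fmt = '%d %1d%2d %f %f %f %f %d %d %d %d %d %d';

data = sscanf(sample, fmt, [13, inf]);  % 13 fields per record, one record per column
data = data';                           % transpose so each record is a row

disp(data(:, 1:3))                      % year, month, day
```

The size argument [13, inf] fills the matrix column by column, which is why the transpose is needed to recover the file's layout.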
Thank you so much, jbunniii. It works perfectly now. Have a great weekend.
madtraveller
I've just realized that the code suggested above doesn't work correctly with a longer data set like this one. Again, the first column is Year (4d format), the 2nd column is Month (2d) and the 3rd column is Day (2d).

Code:
1948 9 1 0.0000 5.3940 0.0292 35.4944 19.2500
1948 9 2 0.0000 5.3600 0.0292 36.0056 16.7556
1948 9 3 0.1000 5.3250 0.0292 35.9722 17.0000
1948 9 4 0.4400 5.2900 0.0292 36.7500 17.4556
1948 9 5 0.0000 5.2550 0.0255 37.1611 17.2167
1948 9 6 0.2900 5.2190 0.0255 35.9556 19.5778
1948 9 7 3.3200 5.1830 0.0255 35.9056 20.8333
1948 9 8 6.3700 5.1460 0.0255 34.6500 20.8778
1948 9 9 23.0600 5.1100 0.0371 25.6667 17.8500
1948 910 2.6800 5.0720 0.0398 23.1889 16.7889
1948 911 0.1500 5.0350 0.0371 27.3222 15.7556
1948 912 0.0000 4.9970 0.0318 30.1833 14.0278
1948 913 0.1000 4.9590 0.0292 31.0444 16.0889
1948 914 0.0000 4.9210 0.0255 28.8167 19.4833
1948 915 0.0800 4.8820 0.0255 31.3000 18.2389
1948 916 0.9500 4.8440 0.0255 31.8278 15.5722
1948 917 6.7100 4.8050 0.0292 31.1611 18.2667
1948 918 1.7100 4.7650 0.0292 29.8278 20.2389
1948 919 0.5500 4.7260 0.0292 30.6389 20.0222
1948 920 0.0000 4.6860 0.0255 32.1167 19.3500
1948 921 5.5700 4.6460 0.0255 32.9111 19.0333
1948 922 3.0300 4.6060 0.0255 32.3833 18.4056
1948 923 0.2700 4.5660 0.0255 32.2278 17.2389
1948 924 0.0000 4.5260 0.0255 32.6889 17.2500
1948 925 0.0000 4.4850 0.0255 31.5056 16.0833
1948 926 0.0000 4.4440 0.0223 27.6333 12.4778
1948 927 0.0000 4.4040 0.0191 26.4000 9.7000
1948 928 0.0000 4.3630 0.0223 26.4556 8.4333
1948 929 0.0000 4.3220 0.0223 28.6778 9.8444
1948 930 0.0000 4.2810 0.0223 30.8222 9.3389
194810 1 0.0000 4.2400 0.0255 31.9000 9.9222
194810 2 0.0000 4.1990 0.0292 31.7000 10.1222
194810 3 0.0000 4.1570 0.0292 31.0056 10.8778
194810 4 0.3400 4.1160 0.0255 31.2278 9.7722
194810 5 0.0000 4.0750 0.0255 30.4222 10.1944
194810 6 0.0000 4.0330 0.0255 32.3389 12.8778
194810 7 0.0000 3.9920 0.0223 27.8222 16.2833
194810 8 0.0000 3.9510 0.0223 29.3611 13.8444
194810 9 0.0000 3.9100 0.0223 30.7722 14.2833
19481010 0.0000 3.8680 0.0223 29.8833 17.7722
19481011 0.0000 3.8270 0.0223 29.3444 17.6722
19481012 0.0000 3.7860 0.0223 30.2444 12.3389
19481013 0.1200 3.7450 0.0223 31.6056 12.5056
19481014 0.0400 3.7040 0.0223 33.6556 14.0667
19481015 2.4400 3.6630 0.0223 32.7222 16.8278
19481016 23.7700 3.6230 0.0260 32.3333 17.8333
19481017 41.4600 3.5820 1.3760 19.4667 7.4500
19481018 0.0000 3.5410 0.2757 17.9889 4.1556
19481019 0.1600 3.5010 0.0954 21.5944 4.4667
19481020 0.8900 3.4610 0.0530 23.5389 6.6056
19481021 0.4100 3.4210 0.0451 22.0556 11.4222
19481022 0.1000 3.3810 0.0424 24.8889 13.4333
19481023 0.0000 3.3410 0.0398 24.6778 10.6889
19481024 0.0000 3.3020 0.0371 24.3222 10.3444
19481025 0.0300 3.2630 0.0371 25.0611 8.4167
19481026 0.0400 3.2240 0.0371 24.4944 6.0444
19481027 0.0600 3.1850 0.0371 25.4722 5.8722
19481028 0.0700 3.1470 0.0345 24.6056 10.8500
19481029 0.0100 3.1080 0.0345 26.1556 14.1667
19481030 0.0900 3.0700 0.0345 27.2944 18.0556
19481031 0.0200 3.0330 0.0318 27.9167 18.1278

I tried to modify my code as follows:

Code:
fid = fopen('data.txt','r');
data2 = fscanf(fid, '%4d%2d%2d %f %f %f %f %f', [8 inf]);
data2 = data2';
fclose(fid);

It worked for the October data, but it misread the September data (month " 9"). Could anyone help me with a better solution?

Thx
madtraveller
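A quick check with sscanf on three of the date prefixes from this data suggests why September goes wrong. Numeric conversions skip leading whitespace before the field width is counted, so the blank that pads the month is not treated as part of the 2-character field. The variable names below are illustrative only.

```matlab
fmt = '%4d%2d%2d';

sep10 = sscanf('1948 910', fmt)';   % intended: 1948, 9, 10
oct01 = sscanf('194810 1', fmt)';   % intended: 1948, 10, 1
sep01 = sscanf('1948 9 1', fmt)';   % intended: 1948, 9, 1

disp(sep10)   % 1948 91 0: the blank is skipped, so "91" fills the month field
disp(oct01)   % 1948 10 1: a two-digit month leaves nothing to skip
disp(sep01)   % 1948 9 1: correct only because a blank ends each field early
```

So a blank-padded month followed by a two-digit day is the case that breaks, which matches the misread September rows.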
Your columns (year, month, and day) are bleeding into each other. I'd suggest either better delimiting (e.g. using tabs or forcing a fixed number of spaces between columns) in whatever you're using to generate these values, or going in and manually or automatically increasing the delimiting between them.
MATLABdude, thank you. So there is no way to read that kind of file using Matlab? That's really strange to me, tbh :) The file was provided on a website, and I can read it quite easily using R. However, for various reasons I would like to read it using Matlab, and I don't want to call an R script from Matlab.
madtraveller
Are you certain that R actually reads it correctly?

I was hoping that textread (replaced by the not-quite drop-in textscan in newer versions of MATLAB) would do the trick, but the problem seems to be in how it treats whitespace when there is and is not a space between the year and month, or month and day. Too bad, since it almost did the trick (textscan may still, but I don't have access to a newer version of MATLAB on this computer).

http://www.mathworks.com/help/techdoc/ref/textread.html
http://www.mathworks.com/help/techdoc/ref/textscan.html

Actually, I suspect that this whitespace handling is probably inherited from the C standard. Nevertheless, you can still get around the issue by treating the first 8 characters of every line as a date string and then converting it using some matrix operations and the str2num function:

http://www.mathworks.com/help/techdoc/ref/str2num.html

The following code (for use with your most recent dataset starting in 1948) uses textread, but you'll have to modify it a little for the newer textscan (it produces cell arrays instead of forcing you to explicitly declare variable names for every column).

Code:
[date, var1, var2, var3, var4, var5] = textread('sample.txt', '%8c %f %f %f %f %f');
year=date(:, 1:4); % extracts the first 4 columns of every row of date
year=str2num(year); % converts them to numbers
month=str2num(date(:, 5:6)); % both operations in one line
day=str2num(date(:, 7:8));
% now horizontally concatenate everything together
format short g; %just so everything looks properly...
big_ol_data = horzcat(year, month, day, var1, var2, var3, var4, var5)

EDIT: Ooops, wrong year: 1948 instead of 1968.
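The conversion step can be tried on its own, without a data file, by putting a few 8-character date prefixes from the 1948 data into a char matrix. This is a minimal sketch of the slicing idea only; `dateChars`, `yr`, `mo`, `dy` and `ymd` are illustrative names, chosen to avoid shadowing MATLAB's built-in date functions.

```matlab
% Date prefixes covering all four spacing cases in the 1948 data.
dateChars = ['1948 9 1'; '1948 910'; '194810 1'; '19481010'];

yr = str2num(dateChars(:, 1:4));   % characters 1-4: year
mo = str2num(dateChars(:, 5:6));   % characters 5-6: month, possibly blank-padded
dy = str2num(dateChars(:, 7:8));   % characters 7-8: day, possibly blank-padded

ymd = [yr, mo, dy];                % one row per record: year, month, day
disp(ymd)
```

Because the slices are taken by character position first, the blank padding never gets a chance to shift digits from one field into the next.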
Thank you very much, MATLABdude. Your code really did the trick. Have a good weekend.
madtraveller
I forgot to add the R code. R really does this very easily with read.fwf :)

Code:
data <- read.fwf("data.txt", widths=c(4,2,2,10,10,10,10,10) )
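For comparison, a rough MATLAB analogue of the read.fwf idea is to slice each line at cumulative column widths and convert each slice separately. The sketch below applies this to two records from the first post, using the widths from the format stated there (4d 2d 2d, four f9.4 fields, six 4d fields). The names `records`, `widths`, `starts` and `stops` are made up for illustration, and real code would read the lines from the file instead of hard-coding them.

```matlab
% Two records from the first post, stored as a cell array of char rows.
records = { ...
    '2000 218   0.4546   0.2394   0.0761   0.1167  55  58   1   1   1   1', ...
    '2000 226  -9.9999  -9.9999   0.3701   0.7276 -99 -99 100 100 100 100'};

widths = [4 2 2 9 9 9 9 4 4 4 4 4 4];   % column widths from the stated format
stops  = cumsum(widths);                % last character of each field
starts = stops - widths + 1;            % first character of each field

data = zeros(numel(records), numel(widths));
for r = 1:numel(records)
    for c = 1:numel(widths)
        field = records{r}(starts(c):stops(c));   % fixed-width slice
        data(r, c) = sscanf(field, '%f');         % blanks before the number are skipped
    end
end

disp(data(:, 1:3))   % year, month, day
```

This assumes every field holds exactly one number and every line is at least as long as the summed widths; a blank or short field would need extra handling.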