Can't read strange mixed format text file in Matlab |
| Jun3-11, 05:41 PM | #1 |
|
|
1. The problem statement, all variables and given/known data
I have to read a text file with a fixed format, shown below. The first column is the year, the 2nd is the month and the 3rd is the day.
The file has the following format: 4d 2d 2d f9.4 f9.4 f9.4 f9.4 4d 4d 4d 4d 4d 4d
Code:
2000 218 0.4546 0.2394 0.0761 0.1167 55 58 1 1 1 1
2000 226 -9.9999 -9.9999 0.3701 0.7276 -99 -99 100 100 100 100
2000 3 5 0.4571 0.2612 0.3657 0.8069 54 56 100 100 100 100
2000 313 -9.9999 -9.9999 0.3310 0.7816 -99 -99 100 100 100 100
2000 321 0.5156 0.2806 0.3777 0.8762 56 56 97 97 98 98
2000 329 -9.9999 -9.9999 0.4171 1.0047 -99 -99 100 100 100 100
2000 4 6 0.5190 0.3154 0.4273 1.0269 58 54 100 100 100 100
2000 414 -9.9999 -9.9999 0.4521 1.1319 -99 -99 100 100 100 100
2000 422 0.5845 0.3109 0.4627 1.1363 56 56 100 100 100 100
3. The attempt at a solution
I tried to use textscan and fscanf, but neither worked. The program always had a problem with the first 3 columns. E.g.
Code:
clc
clear all
fid = fopen('data.txt','r');
data1 = fscanf(fid, '%4d %2d %2d %9.4f %9.4f %9.4f %9.4f %4d %4d %4d %4d %4d %4d')
data2 = fscanf(fid, '%d %d %f %f %f %f %d %d %d %d %d %d')
data3 = fscanf(fid, '%4d%2d%2d%9.4f%9.4f%9.4f%9.4f%4d%4d%4d%4d%4d%4d')
fclose(fid);
madtraveller |
| Jun3-11, 06:30 PM | #2 |
|
|
What's going on with rows 3 and 7? Instead of a 3-digit number in the second column like all the others, these have two 1-digit numbers.
So the number of elements on each line is not consistent. Also, why are you using 3 fscanf statements? |
| Jun3-11, 06:31 PM | #3 |
|
|
P.S. You don't need to specify the number of digits when reading, e.g. you can use %d instead of %4d.
|
| Jun3-11, 06:38 PM | #4 |
|
|
jbunniii, thank you for your answer.
As I stated, the first column is supposed to be the year, the 2nd the month and the 3rd the day. So:
The 1st row is 2000 Feb 18th
The 2nd row is 2000 Feb 26th
The 3rd row is 2000 March 5th
...
The 7th row is 2000 April 6th
So the format is fixed: 4d 2d 2d. That's the nasty thing about this file format :)
I tried 3 fscanf statements to see which one worked. Apparently none of them did.
madtraveller |
| Jun3-11, 06:49 PM | #5 |
|
|
OK, thanks for the clarification.
Try the following:
data = fscanf(fid, '%d %1d%2d %f %f %f %f %d %d %d %d %d %d'); |
| Jun3-11, 06:54 PM | #6 |
|
|
P.S. That will read all the data into a single column vector. If you would rather read it into a matrix of the dimensions of the original file, try this:
data = fscanf(fid, '%d %1d%2d %f %f %f %f %d %d %d %d %d %d', [13,inf]);
This will reverse your rows and columns, so follow it with
data = data';
to get the same dimensions as the file. |
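To see how the [13,inf] size argument and the transpose fit together, here is a minimal sketch that parses two rows of the sample data from post #1. It assumes sscanf on an in-memory string behaves like fscanf on the file; the names txt and fmt are only illustrative and do not appear in the original code.

```matlab
% Two rows copied from the sample data in post #1, joined by a newline.
% txt and fmt are illustrative names, not part of the original code.
txt = ['2000 218 0.4546 0.2394 0.0761 0.1167 55 58 1 1 1 1', char(10), ...
       '2000 3 5 0.4571 0.2612 0.3657 0.8069 54 56 100 100 100 100'];
fmt = '%d %1d%2d %f %f %f %f %d %d %d %d %d %d';

% sscanf fills the output column by column, so each line of text
% ends up as one column of a 13-row matrix.
data = sscanf(txt, fmt, [13, inf]);
disp(size(data));

% Transpose so that each line of text is one row again.
data = data';
disp(data(:, 1:3));    % year, month, day of each row
```

Note that the %1d in this format only fits months with a single digit, which is the case for every row of this first data set.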
| Jun3-11, 07:10 PM | #7 |
|
|
Thank you so much jbunniii. It works perfectly now
Have a great weekend madtraveller |
| Jun10-11, 02:06 AM | #8 |
|
|
I've just realized that the code suggested above doesn't work correctly on a longer data set like this one. Again, the first column is the year (4d format), the 2nd column is the month (2d) and the 3rd column is the day (2d).
Code:
1948 9 1 0.0000 5.3940 0.0292 35.4944 19.2500
1948 9 2 0.0000 5.3600 0.0292 36.0056 16.7556
1948 9 3 0.1000 5.3250 0.0292 35.9722 17.0000
1948 9 4 0.4400 5.2900 0.0292 36.7500 17.4556
1948 9 5 0.0000 5.2550 0.0255 37.1611 17.2167
1948 9 6 0.2900 5.2190 0.0255 35.9556 19.5778
1948 9 7 3.3200 5.1830 0.0255 35.9056 20.8333
1948 9 8 6.3700 5.1460 0.0255 34.6500 20.8778
1948 9 9 23.0600 5.1100 0.0371 25.6667 17.8500
1948 910 2.6800 5.0720 0.0398 23.1889 16.7889
1948 911 0.1500 5.0350 0.0371 27.3222 15.7556
1948 912 0.0000 4.9970 0.0318 30.1833 14.0278
1948 913 0.1000 4.9590 0.0292 31.0444 16.0889
1948 914 0.0000 4.9210 0.0255 28.8167 19.4833
1948 915 0.0800 4.8820 0.0255 31.3000 18.2389
1948 916 0.9500 4.8440 0.0255 31.8278 15.5722
1948 917 6.7100 4.8050 0.0292 31.1611 18.2667
1948 918 1.7100 4.7650 0.0292 29.8278 20.2389
1948 919 0.5500 4.7260 0.0292 30.6389 20.0222
1948 920 0.0000 4.6860 0.0255 32.1167 19.3500
1948 921 5.5700 4.6460 0.0255 32.9111 19.0333
1948 922 3.0300 4.6060 0.0255 32.3833 18.4056
1948 923 0.2700 4.5660 0.0255 32.2278 17.2389
1948 924 0.0000 4.5260 0.0255 32.6889 17.2500
1948 925 0.0000 4.4850 0.0255 31.5056 16.0833
1948 926 0.0000 4.4440 0.0223 27.6333 12.4778
1948 927 0.0000 4.4040 0.0191 26.4000 9.7000
1948 928 0.0000 4.3630 0.0223 26.4556 8.4333
1948 929 0.0000 4.3220 0.0223 28.6778 9.8444
1948 930 0.0000 4.2810 0.0223 30.8222 9.3389
194810 1 0.0000 4.2400 0.0255 31.9000 9.9222
194810 2 0.0000 4.1990 0.0292 31.7000 10.1222
194810 3 0.0000 4.1570 0.0292 31.0056 10.8778
194810 4 0.3400 4.1160 0.0255 31.2278 9.7722
194810 5 0.0000 4.0750 0.0255 30.4222 10.1944
194810 6 0.0000 4.0330 0.0255 32.3389 12.8778
194810 7 0.0000 3.9920 0.0223 27.8222 16.2833
194810 8 0.0000 3.9510 0.0223 29.3611 13.8444
194810 9 0.0000 3.9100 0.0223 30.7722 14.2833
19481010 0.0000 3.8680 0.0223 29.8833 17.7722
19481011 0.0000 3.8270 0.0223 29.3444 17.6722
19481012 0.0000 3.7860 0.0223 30.2444 12.3389
19481013 0.1200 3.7450 0.0223 31.6056 12.5056
19481014 0.0400 3.7040 0.0223 33.6556 14.0667
19481015 2.4400 3.6630 0.0223 32.7222 16.8278
19481016 23.7700 3.6230 0.0260 32.3333 17.8333
19481017 41.4600 3.5820 1.3760 19.4667 7.4500
19481018 0.0000 3.5410 0.2757 17.9889 4.1556
19481019 0.1600 3.5010 0.0954 21.5944 4.4667
19481020 0.8900 3.4610 0.0530 23.5389 6.6056
19481021 0.4100 3.4210 0.0451 22.0556 11.4222
19481022 0.1000 3.3810 0.0424 24.8889 13.4333
19481023 0.0000 3.3410 0.0398 24.6778 10.6889
19481024 0.0000 3.3020 0.0371 24.3222 10.3444
19481025 0.0300 3.2630 0.0371 25.0611 8.4167
19481026 0.0400 3.2240 0.0371 24.4944 6.0444
19481027 0.0600 3.1850 0.0371 25.4722 5.8722
19481028 0.0700 3.1470 0.0345 24.6056 10.8500
19481029 0.0100 3.1080 0.0345 26.1556 14.1667
19481030 0.0900 3.0700 0.0345 27.2944 18.0556
19481031 0.0200 3.0330 0.0318 27.9167 18.1278
Code:
fid = fopen('data.txt','r');
data2 = fscanf(fid, '%4d%2d%2d %f %f %f %f %f', [8 inf]);
data2 = data2';
fclose(fid);
Could anyone help me with a better solution? Thx madtraveller |
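A likely reason the '%4d%2d%2d' format breaks on this data: assuming sscanf and fscanf follow the C scanf convention, a numeric conversion skips leading blanks first and only then counts its field width. The sketch below applies that format to the date part of three rows from the data above; fmt, a, b and c are illustrative names only.

```matlab
% fmt is the date part of the format used in the attempt above.
fmt = '%4d%2d%2d';

% Blank-padded month and day: each %2d skips the blank, then reads one digit.
a = sscanf('1948 9 9', fmt)';

% Blank-padded month, two-digit day: the first %2d skips the blank and then
% takes two digits ('91'), leaving only '0' for the last %2d.
b = sscanf('1948 910', fmt)';

% Two-digit month, blank-padded day: parsed as intended.
c = sscanf('194810 1', fmt)';

disp(a); disp(b); disp(c);
```

So rows such as "1948 910" are the ones that come out wrong, while rows such as "194810 1" and "19481010" still parse as intended.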
| Jun10-11, 02:37 AM | #9 |
|
|
Your columns (year, month, and day) are bleeding into each other. I'd suggest either better delimiting (e.g. using tabs or forcing a fixed number of spaces between columns) in whatever you're using to generate these values, or going in and manually or automatically increasing the delimiting between them.
|
| Jun10-11, 02:51 AM | #10 |
|
|
MATLABdude,
Thank you. So there is no way to read that kind of file using Matlab??? That's really strange to me tbh :)
That file was provided on a website and I can read it quite easily using R. However, for some reasons I would like to read it using Matlab and don't want to call an R script from Matlab.
madtraveller |
| Jun10-11, 08:20 AM | #11 |
|
|
Are you certain that R actually reads it correctly?
I was hoping that textread (replaced by the not-quite drop-in textscan in newer versions of MATLAB) would do the trick, but the problem seems to be in how it treats whitespace when there is or is not a space between the year and month, or between the month and day. Too bad, since it almost did the trick (textscan may still, but I don't have access to a newer version of MATLAB on this computer).
http://www.mathworks.com/help/techdoc/ref/textread.html
http://www.mathworks.com/help/techdoc/ref/textscan.html
Actually, I have a suspicion that this behavior probably comes from the C standard. Nevertheless, you can still get around the issue by treating the first 8 characters of every line as a date string and then converting them using some matrix operations and the str2num function:
http://www.mathworks.com/help/techdoc/ref/str2num.html
The following code (for use with your most recent dataset starting in 1948) uses textread, but you'll have to modify it a little for the newer textscan (which produces cell arrays instead of forcing you to explicitly declare variable names for every column).
Code:
[date, var1, var2, var3, var4, var5] = textread('sample.txt', '%8c %f %f %f %f %f');
year=date(:, 1:4); % extracts the first 4 columns of every row of date
year=str2num(year); % converts them to numbers
month=str2num(date(:, 5:6)); % both operations in one line
day=str2num(date(:, 7:8));
% now horizontally concatenate everything together
format short g; %just so everything looks properly...
big_ol_data = horzcat(year, month, day, var1, var2, var3, var4, var5)
|
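The same fixed-width idea can also be applied one line at a time, which avoids textread altogether. The following is only a sketch under stated assumptions: the cell array rows is a hypothetical stand-in holding a few lines copied from the 1948 data set (in practice the lines would come from the file, e.g. via fgetl), and str2double replaces str2num for the three date fields.

```matlab
% Hypothetical in-memory sample: four lines copied from the 1948 data set.
rows = { ...
    '1948 9 9 23.0600 5.1100 0.0371 25.6667 17.8500', ...
    '1948 910 2.6800 5.0720 0.0398 23.1889 16.7889', ...
    '194810 1 0.0000 4.2400 0.0255 31.9000 9.9222', ...
    '19481010 0.0000 3.8680 0.0223 29.8833 17.7722'};

n = numel(rows);
data = zeros(n, 8);            % year, month, day + 5 values per line
for k = 1:n
    s = rows{k};
    % Slice the date by character position, not by whitespace.
    ymd = [str2double(s(1:4)), str2double(s(5:6)), str2double(s(7:8))];
    % Everything after column 8 is whitespace-separated, so %f is enough.
    vals = sscanf(s(9:end), '%f')';
    data(k, :) = [ymd, vals];
end
disp(data(:, 1:3));
```

Slicing by position is what makes "1948 910" come out as month 9, day 10: the blank inside the month field is part of the field instead of a separator.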
| Jun10-11, 01:41 PM | #12 |
|
|
Thank you very much MATLABdude. Your code really did the trick.
Have a good weekend madtraveller |
| Jun10-11, 01:50 PM | #13 |
|
|
Forgot to add the R code. R really does this very easily with read.fwf :)
Code:
data <- read.fwf("data.txt", widths=c(4,2,2,10,10,10,10,10) )
|