Setting Up File.txt Position for Writing Function

In summary, the dibble-dabble method discussed in this thread converts a number from one base to another by repeatedly dividing it by the target base and collecting the remainders. However, the whole input number must be held in memory, which is a problem for very large files.
  • #1
Matt654
Hi,

Is it possible to position the cursor at the end of file.txt for the write function? I can use only the read, close, and open functions from /fcntl.h/.

Also, does anybody know how to convert a number from one number system to another and write the result to the file in the correct order? E.g. I need to convert [123456]Z7 to [xxxx]Z5, and I need the Z5 digits in the correct order to write them to the file. (The input number is very big: a couple of gigabytes.)


Thanks for any help :)
 
  • #2
(I am using the dibble-dabble method for converting between number systems, but for large data I would like to write partial results to the file and then continue with the next part of the data. My problem is writing the partial results. E.g. for [188]Dec -> (Hex) the algorithm first gives me "C" and then "B", so if I use the write function the file will contain CB, but it should be BC.)

How can I solve this?
 
  • #3
Welcome to PhysicsForums!

If you use fopen() to open a file stream, you should also be able to use fseek() to offset to a particular location (both are in the stdio library). From A Little C Primer and Wikipedia:
http://www.vectorsite.net/tscpp_3.html#m2
http://en.wikipedia.org/wiki/Fseek

Note that you can use the "a" (append) mode with fopen, so that you skip all the way to the end of the file and can start appending data.
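A minimal sketch of appending through the stdio library, assuming a text file and a hypothetical helper name `append_line` (neither comes from the thread):

```c
#include <stdio.h>

/* Append one line of text to the file at `path`.
 * Returns 0 on success, -1 on failure. */
int append_line(const char *path, const char *text)
{
    FILE *fp = fopen(path, "a");   /* "a": every write goes to the end of the file */
    if (fp == NULL)
        return -1;

    int ok = (fputs(text, fp) != EOF) && (fputc('\n', fp) != EOF);
    if (fclose(fp) != 0)           /* fclose flushes, so check it as well */
        ok = 0;
    return ok ? 0 : -1;
}
```

If only the low-level POSIX calls are allowed, the comparable facility is the O_APPEND flag passed to open(), which makes each write() land at the current end of the file.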

I don't completely understand what you mean by converting numbers (I have no context for Z5 vs Z7), but for outputting hexadecimal (binary, octal, etc.) you can use formatted output along with fprintf():
http://en.wikibooks.org/wiki/C_Prog...put_functions:_the_printf_family_of_functions

If you're doing conversion using this dibble-dabble method, you just have to write yourself a loop to keep track of the values right to left, and then output the result right to left after exiting the loop. That or have a temporary character array wherein you store the values in the appropriate position.
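The temporary-array idea could look roughly like the sketch below. It assumes the value fits in an unsigned long, and the function name `to_base` and the digit alphabet 0-9, A-Z are choices made here for illustration:

```c
#include <stddef.h>

/* Convert `value` to text in `base` (2..36), using buf[0..size-1] as storage.
 * Remainders come out least significant digit first, so the buffer is
 * filled from its right end. Returns a pointer to the first digit inside
 * buf, or NULL if the base is out of range or the buffer is too small. */
char *to_base(unsigned long value, unsigned base, char *buf, size_t size)
{
    static const char digits[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    if (base < 2 || base > 36 || size < 2)
        return NULL;

    char *p = buf + size - 1;
    *p = '\0';
    do {
        if (p == buf)
            return NULL;               /* ran out of room */
        *--p = digits[value % base];   /* remainder is the next digit to the left */
        value /= base;
    } while (value != 0);
    return p;
}
```

With a 40-byte buffer, `to_base(188, 16, buf, sizeof buf)` yields "BC" rather than "CB", because the "C" is stored before the "B" but to its right.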
 
  • #4
Thanks for the answer and for the welcome to Physics Forums :)

I am trying to use only low-level functions (read, write, open, close).
My input (a file) is a couple of gigabytes of data in a specified number base, e.g. hexadecimal (marked as Z16), and I need to transform this data to another number system, e.g. Z34.
I am trying to save memory (and also not use any temporary files), so I want to read part of the input data, convert it, and then read the next part into the buffer, until the end of the input.
On the internet I found dibble-dabble algorithms, but they need the whole input in memory (remember, the input is a couple of GB). The method converts the input characters, in order, into an int representation, so I would have to use a queue or linked list of structs, and that still takes a lot of memory.

Do you know a better algorithm for the conversion, or any link on how to extend the int type?

I think that in this case the conversion feature of fprintf doesn't help...
 
  • #5
Can you post an example of the input file, and the desired output?

Regardless of how big the file is, these are the simple methods--the point of having a file stream is precisely that you don't have to store everything in memory.

While I've never used the dibble-dabble method, it looks like you only need to store the single value that you're trying to convert in memory. Unless the whole file is a single value, in which case, you'll have to come up with something more clever. That or store everything backwards first in one file, and then use it to construct a 'correct' forwards file.
 
  • #6
An example of the input data:
[1012222121310101]4=32

The output should be: [13AJ78H]32

I am using the dibble-dabble method only because it first converts the input data to its decimal value, which in our case is 1185520913 and takes only 4 bytes, whereas in the input each digit/char takes 1 byte (16 bytes in our case). But for big input data this decimal value is also long, and I need to malloc space for it (I need to keep the whole decimal value in memory for the next calculation :( ). The next issue is that I need to convert this decimal value into the output number system, which the dibble-dabble method does by dividing the decimal value, and that also takes a lot of memory...

I don't know of any better (more memory-saving) method.
If a better method or set of steps exists, please send it to me...

Or how can I make an int value of "infinite" size and do the usual math operations (division, addition) on it in ANSI C?
 
  • #7
While you may take a small amount of memory for each operation, you do not need to allocate enough memory for every operation. You can just reuse the same memory for every number you convert. YOU DO NOT NEED ANY 'INFINITE' SIZED VARIABLES, since you're only dealing with a single finite-sized input at a time. Deal with a single line of input at one time.

I suspect that English isn't your first language, as I'm having difficulty understanding your proposed solution.

It looks like you're already inputting the data as strings and then tokenizing to extract the actual number and the base. And then you're converting the string into an int that the computer can work with. (e.g. the string '12' in base 10 is converted to an int via 1*10^1 + 2*10^0) Is this the case?

If this is so, you cannot do the dibble dabble method on it--the computer does not store the numbers in decimal or hexadecimal representation. What you can do is to divide by powers of your new base, and keeping track of remainders (the modulo--e.g. 179%16 = 3):
179/16^2 = 0, Remainder 179
179/16^1 = 11, Remainder 3
3 / 16^0 = 3

You then need to pack the quotients into another string, for your output.
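The 179 example above can be sketched in C as follows. This is only an illustration for values that fit in an unsigned long; the function name `to_base_msb_first` and the 0-9, A-Z digit alphabet are assumptions made here:

```c
#include <stddef.h>

/* Write `value` in `base` (2..36), most significant digit first, by
 * dividing by descending powers of the base. Returns the number of digits
 * written (not counting the terminator), or 0 on error. */
size_t to_base_msb_first(unsigned long value, unsigned base,
                         char *out, size_t size)
{
    static const char digits[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    if (base < 2 || base > 36 || size == 0)
        return 0;

    /* Largest power of the base that does not exceed value (1 if value is 0). */
    unsigned long power = 1;
    while (value / power >= base)
        power *= base;

    size_t n = 0;
    while (power > 0) {
        if (n + 1 >= size)
            return 0;                      /* no room for digit plus terminator */
        out[n++] = digits[value / power];  /* the quotient is the next digit */
        value %= power;                    /* the remainder carries on */
        power /= base;
    }
    out[n] = '\0';
    return n;
}
```

For 179 in base 16 the quotients are 11 and then 3, so the output is "B3". Because the digits appear left to right, they could be written to the file as they are produced, which is the ordering the original question asks for.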

The method is described more thoroughly here:
http://www.mathsisfun.com/base-conversion-method.html

If you insist on using the dibble-dabble method, you can omit the string-to-integer conversion above (though you will need to convert characters to integers using something like the atoi function).

Again, the key is not to try to work with the entire file, but one line of the file at a time.
 
  • #8
In reply to Matt654's opening post (#1):

If you're using something similar to the C standard library, use fseek and seek to the end of the file and then use ftell to get the size of your file.
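A small sketch of the fseek/ftell idiom, with a hypothetical helper name `stream_size`. It assumes the file is small enough for its size to fit in a long, which may not hold for multi-gigabyte files on every platform:

```c
#include <stdio.h>

/* Return the size in bytes of an open stream, or -1 on error.
 * The stream's previous position is restored before returning. */
long stream_size(FILE *fp)
{
    long saved = ftell(fp);                    /* remember where we were */
    if (saved < 0 || fseek(fp, 0L, SEEK_END) != 0)
        return -1;

    long size = ftell(fp);                     /* offset of the end = size */
    if (fseek(fp, saved, SEEK_SET) != 0)       /* go back */
        return -1;
    return size;
}
```

With the low-level POSIX calls, lseek(fd, 0, SEEK_END) plays the same role: it moves the file offset to the end and returns that offset.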

As for converting numbers between different bases, you basically use a DIV/MOD loop that gets the remainder of numbers with respect to a base and then these numbers (working backwards) give you the number in a certain base.

Here's a good tutorial about how to convert any number to an arbitrary base:

http://www.purplemath.com/modules/numbbase.htm

I should also tell you that if the whole number fits within a machine register then the algorithm will be fine. If it doesn't, you'll have to use a big integer library and use the library's divide and modulus functions in place of the standard / and % operators in your code.
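For a sense of what such a library does internally, here is a hedged sketch of the DIV/MOD loop applied to a number stored as an array of digits (schoolbook long division by a small divisor). The names `convert_base` and `digit_value` are made up for this example, and the whole number is still assumed to fit in memory:

```c
#include <stddef.h>
#include <string.h>

/* Map '0'-'9' and 'A'-'Z' to 0..35; return -1 for anything else. */
static int digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
    return -1;
}

/* Convert the digit string `in` (base in_base) to base out_base.
 * `in` is overwritten and used as scratch space. The result is built right
 * to left in out[0..size-1]; a pointer to its first digit is returned, or
 * NULL on bad input or insufficient room. Both bases must be in 2..36. */
char *convert_base(char *in, unsigned in_base, unsigned out_base,
                   char *out, size_t size)
{
    static const char digits[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    size_t len = strlen(in);
    if (in_base < 2 || in_base > 36 || out_base < 2 || out_base > 36)
        return NULL;
    if (len == 0 || size < 2)
        return NULL;

    for (size_t i = 0; i < len; i++) {       /* characters -> numeric values */
        int v = digit_value(in[i]);
        if (v < 0 || (unsigned)v >= in_base)
            return NULL;
        in[i] = (char)v;
    }

    char *p = out + size - 1;
    *p = '\0';
    size_t start = 0;                        /* first nonzero digit of the quotient */
    do {
        unsigned rem = 0;
        for (size_t i = start; i < len; i++) {   /* one pass: in = in / out_base */
            unsigned cur = rem * in_base + (unsigned char)in[i];
            in[i] = (char)(cur / out_base);
            rem = cur % out_base;
        }
        if (p == out)
            return NULL;                     /* output buffer too small */
        *--p = digits[rem];                  /* remainder = next digit, right to left */
        while (start < len && in[start] == 0)
            start++;                         /* skip leading zeros of the quotient */
    } while (start < len);
    return p;
}
```

Each pass over the array produces one output digit, so the running time grows with the square of the input length. That is tolerable for illustration but would be very slow for gigabytes of digits, which is one reason dedicated libraries use more elaborate algorithms.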
 
  • #9
Thanks for your patience, MATLABdude, mainly with my poor English.

You are right, the key is not to try to work with the entire file, but one line of the file at a time.
I would like to ask whether there is any conversion method with which I can do that.
(I found the dibble-dabble method, which can do that but has another limitation. It is true that with this method I can read the input in parts and convert it in parts, but I need to keep a running total of all these partial conversions, which again takes a lot of memory, and moreover I need to apply / and % to this number while converting it to the target number system. I would like to avoid this method because I would have to develop this big integer type and the math operations on it. So I am asking whether somebody knows another method for converting between number systems, with which I can convert the partially read input directly to the target number system without allocating a big integer.)

Thanks, chiro, for the link. I found the GMP library and will try it...
Anyway, do you have a link to C source code for a big integer type and a couple of math operations on it? (I would like to see how it works...)

Thanks a lot...
 
  • #10
Both chiro and I have posted a good, workable algorithm. You seem to have a mix of more advanced concepts (e.g. malloc) along with holes in your working knowledge (e.g. working with arrays, and input/output)--I don't say that to insult you, but rather to let you know that you may need to go back and fill these in order to actually implement something that works. We may also have taken for granted that you can do certain things (e.g. the parsing, and single character reading).

If you know how big the largest number you're working with is (both in terms of the actual number itself, and how many characters it takes up in whatever base you're using) then you can pre-allocate these, via an array of unsigned integers, longs, chars, or whatever works best. Yes, you need to keep track of partial operations, but only for the number that you're converting. Again, arrays can take care of this.

Why don't you post what you already have, and perhaps we can give you some pointers?

EDIT: if you do, please put [CODE] and [ /CODE] tags (no space between the slash and the bracket in the closing tag) around your code--this greatly increases the readability.
 
  • #11
here is code:
Code:
.
.
int main(int argc, char *argv[]) {
  char *readBuffer;
  char *ch2;
  int i;
  int temp;
  long readedChars;
  int readedCharsRestSeq;

  fd = open("mreadme.txt", O_RDONLY);
  if (fd == -1) { perror(NULL); return 1; }

  readBuffer = (char *)malloc(sizeOfBuffer * sizeof(char));
  if (readBuffer == NULL)
  {
    fprintf(stderr, "Memory error!");
    close(fd);
    return 1;   /* report failure, as for the open() error above */
  }

.//some tests
.
while(readedChars>0){
             countRead++;
             readedChars = read(fd,readBuffer,sizeOfBuffer-1);
.
.
.
//This while loop ends when I reach the end of the input file.
//I first have to read the whole input data (input stream) because the information about
//the number systems (e.g. 2 and 34) is at the end of the file (input format: []2=34).
//After this I know the number system of the input data (e.g. 2, written in decimal) and
//the requested output number system (e.g. 34).
//NOTE: sizeOfBuffer is set to e.g. 10, which means that I can read 8 chars at once...
//Later I want to add some logic for increasing this size with each pass of the loop...

//So the next step is to start reading the file from the beginning again and to convert, but...

.
.
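while (readedChars) {
             countRead++;
             readedChars = read(fd, readBuffer, sizeOfBuffer - 1);
             if (readedChars <= 0) {   /* 0 = end of file, -1 = read error */
              break;
             }
             readBuffer[readedChars] = '\0';   /* read() does not terminate the buffer */
             convertNumbers(readBuffer);
        }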
.
.
.

// Here is a link to the C source of the dibble-dabble method: hxtp://www.uloz.to/8985919/numstr-zip
// This method first converts the readBuffer contents to the decimal number system. I was thinking
// that I could use a linked list (of some struct?) to sum the partial results. A linked list,
// because this decimal value will need many bytes and I cannot say in advance how many
// ints (longs) I will need to store the subtotals, so I will probably need to grow the list
// at runtime. The dibble-dabble method then formats this decimal value in the requested
// number system by applying % and / to it. For this I will probably need another linked list,
// because the result comes out in the opposite order and has to be written backwards:
// the conversion starts from the lowest digits... check the C code of dibble-dabble...

So the question is: is there any simpler conversion method with which I can convert the partially read readBuffer directly to the requested number system? If not, is a linked list the best way to keep the sum of the subtotals?

P.S.: I have set sizeOfBuffer to 10, but that is just for testing. We can assume that the size will be about 255, but the input file will still be a couple of GB, so the program will have to read the file as a stream of data through readBuffer anyway... I hope I wrote that without too many grammar mistakes... :)
 
  • #12
In reply to the code Matt654 posted in #11:

where do you set the variable sizeOfBuffer? I'm assuming its dynamic since you are using a malloc statement. I'd recommend fixing this problem before you do any more code.
 
  • #13
I'd like to make a few suggestions.

Use fopen() and fscanf() or fgets() instead of open() and read().
Don't worry about performance overhead, since file I/O is way slower than memory I/O or calculations.

Read and write lines 1 at a time.

Don't use dynamically allocated buffers.

Use the type "unsigned long long" if your numbers are too big to fit into an "unsigned long".

The resulting program would be shorter, simpler, and easier to understand.
 
  • #14
- Unfortunately I have to use only the open, read, malloc, and free functions...
- There are no lines... I mean there is one stream of data between the [] brackets, so I need to use a buffer of a specified size.
- unsigned long long is 64 bits (20 digits). I thought about that, but e.g. I have number system 34 (which means letters and digits) and I want the decimal system:
e.g. [ZZZZZZZZZZZZZZZZZ - couple GBs -]34=10

unsigned long long res;
res = 0;

first Z: res = res*34+('Z' - 'A' + 10)
res = 0*34+(90-65+10) ascii
res = 35

second Z:
res = 35*34+35
res = 1225

third Z:
res = 1225*34+35
res = 41685
etc...
For the third 'Z' the subtotal already needs 2 bytes, and it keeps growing with every digit, so I assume that for a couple of GB of 'Z's it will take a couple of GB, and I think that unsigned long long is not enough...

I don't know whether the conversion can somehow be split up so that it uses less memory... :(
 
  • #15
Well, this changes things a fair bit. All of us were under the impression that you were converting millions of values with small numbers of digits, not a single value with millions of digits.

If this is the case, I'd go back to the Dibble-Dabble method, and consider long division. You only ever need to buffer one or two characters at a time, and as long as you write the quotients out to your output file, you don't have to buffer more than one or two values, either.
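One way to read this suggestion is as a single long-division pass over a stream of digits, sketched below. It uses FILE streams only for brevity (the same loop works over a buffer filled by read()), and the function name `divide_stream` is an assumption made for this example:

```c
#include <stdio.h>

/* One long-division pass over a stream of digit characters.
 * Reads base-`base` digits ('0'-'9', 'A'-'Z') from `in`, writes the digits
 * of the quotient (number / divisor) to `out`, and returns the remainder.
 * Returns -1 on an invalid digit or a write error. Only one input digit is
 * held in memory at a time. Assumes 2 <= base <= 36 and 1 <= divisor <= 36. */
int divide_stream(FILE *in, FILE *out, int base, int divisor)
{
    static const char digits[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    int rem = 0, c, started = 0;

    while ((c = fgetc(in)) != EOF) {
        int v;
        if (c >= '0' && c <= '9')      v = c - '0';
        else if (c >= 'A' && c <= 'Z') v = c - 'A' + 10;
        else return -1;
        if (v >= base)
            return -1;

        int cur = rem * base + v;      /* bring down the next digit */
        int q = cur / divisor;
        rem = cur % divisor;
        if (q != 0 || started) {       /* suppress leading zeros */
            if (fputc(digits[q], out) == EOF)
                return -1;
            started = 1;
        }
    }
    if (!started && fputc('0', out) == EOF)   /* quotient was zero */
        return -1;
    return rem;
}
```

Dividing the decimal stream "188" by 16 writes the quotient "11" and returns remainder 12 (the digit C); a second pass over "11" returns 11 (the digit B). The remainders therefore still arrive least significant first, and each pass has to run over the quotient written by the previous one, so this trades memory for repeated passes over intermediate data.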

Is there a reason you can't use fopen? Is this an assignment / homework with these constraints?

EDIT: this might be a dumb question, but why is this base 34 instead of base 36?
0, 1, 2, 3, ..., 9, A, B, ..., Y, Z
 
  • #16
For starters I'd add the O_BINARY flag to the open() call.
If you're on Linux/Unix it won't matter, but in Windows it'll prevent an obscure and hard-to-track crash-bug.


I'd suggest to use an algorithm like the following.
Note I did not program it out or verify it. It's just a suggested approach.

Code:
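uint64_t lo = 0, hi = 0;
/* lo stays below boundary, so lo * input_base cannot overflow 64 bits */
const uint64_t boundary = UINT64_MAX / 256;

for each char in input buffer
{
    unsigned int digit = convert_digit(input character);
    lo = lo * input_base + digit;
    hi = hi * input_base;        /* the high word must be scaled by the base as well */
    if (lo >= boundary)
    {
        hi += lo / boundary;     /* carry into the high word */
        lo %= boundary;
    }
}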

// equivalently fill the output buffer
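The same multiply-and-add step can be extended from two words to an array of words. The sketch below is one possible generalisation, not the code from the post above; the name `bignum_mul_add` and the choice of 32-bit words stored least significant first are assumptions:

```c
#include <stddef.h>
#include <stdint.h>

/* In place: number = number * mul + add, where the number is stored as
 * limbs[0..n-1], least significant 32-bit word first.
 * Returns the carry out of the top word; a nonzero return means the
 * array was too small to hold the result. */
uint32_t bignum_mul_add(uint32_t *limbs, size_t n, uint32_t mul, uint32_t add)
{
    uint64_t carry = add;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)limbs[i] * mul + carry;  /* cannot overflow 64 bits */
        limbs[i] = (uint32_t)t;    /* low 32 bits stay in this word */
        carry = t >> 32;           /* high bits move on to the next word */
    }
    return (uint32_t)carry;
}
```

Calling it once per input digit, with `mul` set to the input base and `add` set to the digit value, accumulates the whole number in binary. The array still has to be large enough for the complete value, so this handles the arithmetic but not the memory concern raised earlier in the thread.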
 
  • #17
In reply to Matt654's post #14 (the base-34 example above):

I can't off the top of my head think of an algorithm that converts between bases that is stream-based. By stream-based I mean that there is some word size small enough that you can convert from (in your case) base 34 to 10.

There are some counterexamples for example when one base is a power of another (like converting base 2 to base 16 for example), but other than that I've never heard of it.
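As an illustration of the power-of-a-base case, here is a sketch for base 2 to base 16, where every hexadecimal digit depends only on its own group of four bits. The function name `bin_to_hex` is made up for this example:

```c
#include <stddef.h>
#include <string.h>

/* Convert a string of '0'/'1' characters to hexadecimal text.
 * Returns 0 on success, or -1 on empty input, a bad character, or an
 * output buffer that is too small. */
int bin_to_hex(const char *bin, char *hex, size_t size)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t len = strlen(bin);
    size_t ndigits = (len + 3) / 4;
    if (len == 0 || size < ndigits + 1)
        return -1;

    /* The first group may be short so that the rest line up with the right end. */
    size_t group = (len % 4 != 0) ? len % 4 : 4;
    size_t i = 0, o = 0;
    while (i < len) {
        unsigned v = 0;
        for (size_t k = 0; k < group; k++, i++) {
            if (bin[i] != '0' && bin[i] != '1')
                return -1;
            v = v * 2 + (unsigned)(bin[i] - '0');
        }
        hex[o++] = digits[v];      /* this digit is final; nothing later changes it */
        group = 4;
    }
    hex[o] = '\0';
    return 0;
}
```

Since each output digit is final as soon as its group has been read, the digits can be written left to right with a small fixed buffer. The one catch is that the total length must be known up front (for a file, from its size) in order to size the first group.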

If my assumption above is wrong I'd be very happy if someone would point that out since I am very interested myself.

If you assume that there is no streaming mechanism that takes a word of your entire number and converts it block by block, you will have to first load in your number to memory using the kind of method you have described above, and then use the method I've outlined above to convert it to base 10 and hence produce a string.

Like I said unless you get an algorithm that is guaranteed to work on fixed word sizes, you will need to allocate whatever memory you need. Remember that the amount of memory is related to the logarithm of the highest number you need, so even really really high numbers (like those used in cryptography) should easily fit in a modern day PC's RAM.
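To put rough numbers on this (a back-of-the-envelope estimate; the symbols $n$, $b_{\text{in}}$ and $b_{\text{out}}$ are introduced here only for illustration): a number with $n$ digits in base $b_{\text{in}}$ needs about

$$ n \log_2 b_{\text{in}} \ \text{bits} $$

when stored as a binary integer, and about

$$ n \, \frac{\log b_{\text{in}}}{\log b_{\text{out}}} $$

digits when written in base $b_{\text{out}}$. For example, $n = 2 \times 10^9$ base-34 digits come to roughly $2 \times 10^9 \times 5.09 \approx 1.0 \times 10^{10}$ bits, or about 1.3 GB, and to roughly $2 \times 10^9 \times 1.53 \approx 3.1 \times 10^9$ decimal digits. Memory proportional to the logarithm of the value is therefore the same thing as memory proportional to the length of the input.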
 
  • #18
I am happy that my English got through :) but I will keep working on it :)

Yes, it is an assignment (with these constraints)... it looked easy, so I took it, but...

About buffering only one or two values: I need to buffer more data at a time, because otherwise there will be too many system calls to the read function and that slows down the whole process...

About the example I wrote above...
The dibble-dabble method uses the decimal number system as an intermediate step, from which it can convert the value to any other number system, so for this reason I need to have the whole number in memory (decimal is easily convertible to other number systems)...
I don't know any other method that can convert directly from the input data (stream) in one number system to the requested number system (without any intermediate number system...)

EDIT: yes, you are true...
0...9 = 10 char
A...Z = 25 char
som max number system for 'Z' is 35 (number system)
( because we start from '0' char. so the max. char for 35number system is 'Z')
- 36 is too much - the last char for 36number system is according by ascii table the '[' char.

P.S.: I used 'Z' in the example because this char has the highest ASCII value, so I think it will take the most memory...
 
  • #19
Hi chiro,

Yes, you are right, in IT people use only the 2, 4, 8, 16 and 10 (decimal) number systems..., but my assignment is to write a program that can convert between number systems from 2 to 35, and the next issue is that we have a multi-GB input stream...

Maybe there is some super algorithm in cryptography that can eliminate the memory consumption mentioned above, but then I think I will need some directions on what I should look for... because I have never worked with cryptography.
 
  • #20
chiro said:
where do you set the variable sizeOfBuffer? I'm assuming its dynamic since you are using a malloc statement. I'd recommend fixing this problem before you do any more code.

sizeOfBuffer is a long...
 
  • #21
I am going to try ILSe's solution...
 
  • #22
Matt654 said:
Hi, thanks for the example code... it works.
- I was thinking about producing the output in order from the MSB to the LSB without allocating any big array (buffer) for the remainders... I would like to write the result to the file from left to right...
(During any conversion the result comes out in the opposite character order...)

I think that my knowledge and skill with number systems are insufficient...
Do you have any thoughts on how to do it, or in which direction I should continue...?
Thanks a lot

I'd suggest to declare a local char buffer (of sufficient size) to contain the output buffer.
Fill it from right to left, and when you're done, write it to the output stream.
 
  • #23
Thank you, everybody... mainly ILSe for the optimisation code (a good piece of code); it has moved me a little bit forward...

Now I am thinking about an optimised output buffer...
 

