Pointers, RAM, Hard Drives and Databases

John Creighto · Aug 28, 2009

A long time ago, I was told that relying too heavily on databases in programming can produce slow programs. Databases provide a nice way to store and organize information, but they may not retrieve and write information quickly. Clearly RAM is the fastest way to access memory, but the size of RAM is limited and there is overhead associated with transferring information between RAM and the hard drive.

One way to store information is to serialize it. For instance, you can serialize objects. You often have the option to select between several formats, and I presume that one format represents the way the information is stored in RAM. So my first question concerns pointers. In programming languages that support pointers, does a pointer care whether it points to RAM or to non-volatile memory, or is it necessary to deal with virtual memory in order to treat non-volatile memory as RAM? This could be handy if you have a large object that would take considerable time to load (deserialize) into memory.

I'm thinking that if you have a data source that is a binary file, and you want to access that information from several sources (processes, threads, separate programming languages), a client-server model might be good (though I'm not sure whether this would be slow). The server decides whether to keep the object on the hard disk or load it into RAM depending on the size of the object, the amount of free memory available and the demand for that object.

I've seen some bridge programs between languages that use client-server models (com connect, for instance), and they seem to pass strings around to tell the object which method to use. This strikes me as not the most efficient way to do things. I'm also wondering about dynamic link libraries: are these accessible from multiple sources? I need to do more research, but I'd be interested in any comments people might have on managing large amounts of data between programs.

Perhaps a good idea is to use a model like the one used in com connect for programming languages that don't support pointers, but to give the option of getting a pointer in programming languages that do. So for instance, on the Java side of com connect you can use the client-server model, but on the COM side (Visual Basic, C, etc.) you can get a pointer and treat it as a native object.

chroot · Aug 28, 2009

John Creighto said:

Databases provide a nice way to store and organize information but they may not retrieve and write information quickly.

To a user, a database is just an interface for storing and retrieving data. That data could be stored on a disk, or it could be stored in RAM.
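As a rough illustration of that point, here is a minimal sketch using Python's built-in sqlite3 module. The table name, the rows and the helper function store_and_fetch are made up for the example; the only thing it is meant to show is that the same statements run unchanged whether the database lives in RAM or in a file on disk.

```python
import os
import sqlite3
import tempfile


def store_and_fetch(target):
    """Run the same statements against whatever database `target` names."""
    conn = sqlite3.connect(target)
    try:
        # Hypothetical table and rows, used only for illustration.
        conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany(
            "INSERT INTO items (name) VALUES (?)", [("alpha",), ("beta",)]
        )
        conn.commit()
        return [row[0] for row in conn.execute("SELECT name FROM items ORDER BY id")]
    finally:
        conn.close()


# ":memory:" keeps the whole database in RAM.
in_ram = store_and_fetch(":memory:")

# A file path keeps the database on disk instead.
with tempfile.TemporaryDirectory() as tmp:
    on_disk = store_and_fetch(os.path.join(tmp, "items.db"))
```

The calling code cannot tell the two cases apart except by speed, which is the sense in which the database is "just an interface."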

In programming languages that support pointers, does a pointer care whether it points to RAM or to non-volatile memory, or is it necessary to deal with virtual memory in order to treat non-volatile memory as RAM?

Pointers are (literally!) just addresses. They're just numbers. What the addresses actually mean is arbitrary. Typically, the addresses are in some very large virtual memory space. Pages of memory that are not often used will eventually be sent to the disk, while pages that are used frequently will remain in RAM. The program has no direct knowledge of where exactly each page of memory currently exists.
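To make "just numbers" concrete, here is a small sketch using Python's ctypes module (C would be the more natural language for this, but the idea is the same). The variable names are arbitrary. The address it obtains is a virtual address; nothing in the program says whether the page behind it is currently in RAM or has been paged out.

```python
import ctypes

value = ctypes.c_int(42)

# addressof() hands back the object's virtual address as a plain Python int.
address = ctypes.addressof(value)

# Turn the bare number back into a typed pointer and read through it.
ptr = ctypes.cast(ctypes.c_void_p(address), ctypes.POINTER(ctypes.c_int))
read_back = ptr.contents.value

# Writing through the pointer changes the original object.
ptr.contents.value = 7
```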

I'm thinking that if you have a data source that is a binary file, and you want to access that information from several sources (processes, threads, separate programming languages), a client-server model might be good (though I'm not sure whether this would be slow). The server decides whether to keep the object on the hard disk or load it into RAM depending on the size of the object, the amount of free memory available and the demand for that object.

All modern operating systems include a facility called "memory mapping," which maps a range of addresses in the program's virtual address space to a file. If you read from those addresses, you'll get data from the file. It is up to the operating system to determine whether to load the data into RAM all at once, or to read it from the disk in chunks as necessary.
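A minimal sketch of memory mapping, using Python's standard mmap module. The temporary file and its contents are invented so the example is self-contained; a real program would map an existing data file.

```python
import mmap
import os
import tempfile

# Hypothetical data file, created here only so the sketch can run on its own.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"header--" + b"x" * 4096 + b"--footer")

with open(path, "r+b") as f:
    # Length 0 maps the whole file into this process's address space.
    with mmap.mmap(f.fileno(), 0) as mapped:
        first = bytes(mapped[:8])    # reads look like ordinary slicing
        last = bytes(mapped[-8:])
        mapped[0:6] = b"HEADER"      # writes through the mapping go to the file
        mapped.flush()

# Re-read the file the ordinary way to confirm the write landed on disk.
with open(path, "rb") as f:
    on_disk = f.read(8)

os.remove(path)
```

The program never issues an explicit read for the middle of the file; the operating system pages in only the parts that are touched.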

I need to do more research, but I'd be interested in any comments people might have for managing large amounts of data between programs.

If you're trying to share large amounts of memory between two programs running on the same computer, you should note that all modern operating systems provide mechanisms for shared memory. These shared memory segments can be mapped into the virtual address space of multiple programs simultaneously. Two or more programs can read or write to the shared memory exactly as if it were normal, private memory. (But you should include some thread-safety mechanisms, like mutexes, to make sure your programs won't step on each other's toes.)
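Here is a minimal sketch of a named shared memory segment, using multiprocessing.shared_memory from the Python standard library (Python 3.8 or later). To keep it short, both handles live in one process; in real use the second handle would be opened by a different program that has been given the segment's name, and access would be guarded by a lock or similar mechanism, which this sketch leaves out.

```python
from multiprocessing import shared_memory

# One side creates a named segment of 64 bytes.
writer = shared_memory.SharedMemory(create=True, size=64)

# The other side attaches to the same segment by name.
# (Assumption: normally this line runs in a separate process.)
reader = shared_memory.SharedMemory(name=writer.name)

# A write through one handle is visible through the other without any copying.
writer.buf[:5] = b"hello"
seen = bytes(reader.buf[:5])

# Each side closes its own handle; the creator also removes the segment.
reader.close()
writer.close()
writer.unlink()
```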

If you're trying to share large amounts of memory between programs running on separate computers, use MPI or some other multi-processing library.

- Warren

harborsparrow · Sep 7, 2009

Since you mention COM, I assume that by client-server communication, you mean the ability to call a subroutine across a network. There are two basic approaches to this: binary serialization and text serialization.

COM (Microsoft) and CORBA (an OMG standard, common on UNIX/Linux) are binary serialization technologies. COM is Windows-specific, while CORBA was designed to work across operating systems and languages, although both ends still need compatible interface definitions and ORB implementations.

So-called "web services" are an OS-independent way of calling across a network, where the call and return information is serialized as text. This approach generally has worse performance, but it can be platform and version independent, which can sometimes be very useful.
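To give a feel for the trade-off between the two approaches, here is a small sketch that encodes the same hypothetical call, add(3, 4), once as a packed binary message and once as JSON text. The message layout, the method id and the field names are all invented for the example; they are not the wire format of COM, CORBA or any web service standard.

```python
import json
import struct

# Hypothetical numeric id for an "add" method.
METHOD_ADD = 1

# Binary: a fixed layout both sides must agree on in advance
# (network byte order, one unsigned 16-bit id, two signed 32-bit arguments).
binary_msg = struct.pack("!Hii", METHOD_ADD, 3, 4)

# Text: self-describing and readable on any platform, but larger on the wire.
text_msg = json.dumps({"method": "add", "args": {"a": 3, "b": 4}}).encode("utf-8")

# Decoding each form on the receiving side.
method_id, a, b = struct.unpack("!Hii", binary_msg)
decoded = json.loads(text_msg)
```

The binary message is 10 bytes and means nothing without the agreed layout; the text message is several times larger but can be read by anything that parses JSON.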

Modern databases such as SQL Server, MySQL and Oracle are very efficient at caching information in memory and moving it efficiently across a network, though you do have to be careful about what kind of pre-processing you ask the database to do (i.e., what kind of query you send it).

Hope this helps.
