What on earth is DATA anyways?

  1. Aug 6, 2006 #1
    Hi all

    As I was burning some files onto a DVD, I asked myself.. what exactly is "IT" that I am burning onto this thin disc. And then I kinda got freaked out.. hehe. Cause what the heck is data anyways?

    It's this... thing that you can't see but when put into the right hardware, can play music, videos and everything in between. I just can't get past how people can sit down and say, "Let's invent a smaller storage device". I mean, how do these people start working? How do you store something that doesn't exist? Store it on what? It's easy to understand storing cookies in a jar but data?

    It's amazing how this data got "invented" in the first place.

    It'll be great if anyone can enlighten me. Cause this is really bugging me. And I access loads of data everyday.
  2. jcsd
  3. Aug 6, 2006 #2
    Data is used in a wide range of fields.. But as for the actual data on your computer..

    Data is stored as bits, and a bit is a digit in the binary numeral system.
    A bit can be either 1 or 0, whereas an analog signal can take any value in a continuous range.
    A bit is the smallest unit of data used in computing.

    For example, to store one ASCII character you need 7 bits, almost a byte.
    Each character is assigned a code made of seven 1s and 0s.
    The actual appearance of the character on screen is not decided by those 7 bits, though; that is up to other things (such as the font) to take care of.
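
    The 7-bit codes mentioned above can be printed directly. This is only a minimal sketch: the function name `ascii_bits` and the sample string are made up for illustration.

```python
# Illustrative sketch: show the 7-bit ASCII code for each character of a string.
def ascii_bits(text):
    """Return a list of (character, 7-bit string) pairs for ASCII text."""
    pairs = []
    for ch in text:
        code = ord(ch)  # numeric code of the character
        if code > 127:
            raise ValueError(f"{ch!r} is not an ASCII character")
        pairs.append((ch, format(code, "07b")))  # pad to 7 binary digits
    return pairs


for ch, bits in ascii_bits("Hi!"):
    print(ch, bits)
```

    The code only says which character is meant; how it is drawn is a separate matter.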

    Larger documents like an image file, say a 200-kilobyte JPEG, then contain about 1,600,000 1s and 0s: 200,000 bytes at 8 bits per byte.

    The CPU in your computer, the processor, uses machine code to actually process these 1s and 0s, which are then represented as pixels on your monitor.
    The data itself is simply bits that can be stored magnetically on a hard drive, or optically on a DVD.

    I'm still a newbie myself but I *think* I got it right.
  4. Aug 6, 2006 #3
    Google is your friend.
  5. Aug 6, 2006 #4
    What a non-constructive post, away with you! :devil:
  6. Aug 6, 2006 #5
    Contrary to popular belief, the fastest way any individual can get information or data is to go after it himself. Using Google is way faster than waiting for a response from a forum as it is not real-time. :cool:
  7. Aug 6, 2006 #6
    What makes you think he wanted it as fast as possible? Maybe he just wanted to discuss the topic...Have you tried having a discussion with google?

    Through discussion we become observers of our own thinking.
    Last edited: Aug 6, 2006
  8. Aug 6, 2006 #7


    Staff: Mentor

    Heh, google is still our friend. :biggrin:
  9. Aug 6, 2006 #8


    Science Advisor

    Data is an abstraction for the states that a physical object (like a CD) or medium (like electric impulses over a wire) can be in. It has the characteristic that things can be meaningfully computed or deduced from the data.

    A painting contains visual data. The state of the painting is the arrangement of colors on the painting. Humans can read this data by looking at the painting and determining that the painting indicates a blue sky, some people threshing wheat, fine, short brushstrokes, etc. Let's say we wanted to store the same data in less space. Then we might put the painting on microfilm, so that now it takes up much less space. But it still contains the same data, since it can be read with a microfilm reader and someone can deduce all the things about the painting that he could have deduced from the full-size painting.

    A computer mostly reads, as octelgocopod noted, bits instead of colors. A bit is the state of a small region in time and space, where this region can only be in one of two states. If you are a museum guard and all you care about is the presence or absence of a painting (absence might mean it has been stolen) then each place for a painting gives you one bit of information: it tells you which of two states the place for the painting is in, that of containing a painting or that of not containing a painting. If there are fifty spots for paintings in a particular room, then for you the museum guard, the room contains fifty bits.
    Now let's say that these are very small paintings, basically thumbnails. Then the fifty bits are stored in a much smaller space than fifty large paintings. The fifty bits have been compressed. The "storage device" for them has been made smaller: instead of being stored in fifty large painting spots, maybe fifty square meters altogether, the bits are stored in fifty small painting spots, maybe one square meter altogether.

    The data on a DVD is not presence or absence of a painting, but presence or absence of microscopic pits just under the coating of the DVD. Each place where there could be a pit is thus a bit, since it can take one of two states. The DVD drive looks at the pits by focusing a laser onto them and determining from the reflection whether or not there is a pit there, thus reading the bit. This data is converted into a high or low voltage, which is just another way of encoding the bit, and from there all the circuitry inside the computer starts computing with it. It might interpret the bit as part of the color of one pixel of an image. Of course, the way to encode more data on the DVD is to make the pits smaller so you can pack more of them into a smaller area. Smaller pits might also be harder to read, so you might need a different kind of laser to go with the smaller pits.

    The description of anything in the world is data.
    Last edited: Aug 6, 2006
  10. Aug 6, 2006 #9


    Staff Emeritus
    Science Advisor
    Gold Member

    "Data" is simply ones and zeros. The ones and zeros can represent english text (every letter is one of the 256 different values a byte can have), the color of pixels on a screen (i.e. image data), or sound pressure at successive points in time (i.e. sound data), etc.

    - Warren
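
    The point that the same ones and zeros can stand for text, a pixel color, or sound can be sketched in a few lines. The four byte values and the three interpretations below are arbitrary choices for illustration, not a real file format.

```python
import struct

raw = bytes([72, 105, 33, 0])  # four arbitrary bytes, chosen for illustration

# 1. As text: treat the first three bytes as ASCII character codes.
as_text = raw[:3].decode("ascii")

# 2. As a pixel: treat the four bytes as red, green, blue, alpha levels (0-255).
as_pixel = tuple(raw)

# 3. As sound: treat the bytes as two signed 16-bit little-endian samples.
as_samples = struct.unpack("<2h", raw)

print(as_text)
print(as_pixel)
print(as_samples)
```

    Nothing in the bytes themselves says which reading is the right one; that is decided by the program that uses them.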
  11. Aug 6, 2006 #10


    Science Advisor

    It's an interesting topic. Data can be represented in any format you want, but binary (0's and 1's) is the simplest form. Binary data is also the easiest to store, because then for storing a digit, all you need is a physical object that has two states. If we were to represent data in decimal notation, for example, then for storing each digit you'd need something with 10 states.
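
    As a rough sketch of this trade-off, the same number can be written out in base 2 or base 10: binary needs more cells, but each cell only has to distinguish two states. The helper `to_digits` and the number 2006 are illustrative choices.

```python
def to_digits(n, base):
    """Return the digits of a non-negative integer n in the given base."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(remainder)
    return digits[::-1]  # most significant digit first


binary = to_digits(2006, 2)    # each cell needs 2 distinguishable states
decimal = to_digits(2006, 10)  # each cell needs 10 distinguishable states

print(len(binary), "two-state cells:", binary)
print(len(decimal), "ten-state cells:", decimal)
```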

    For example, floppy disks are covered with a coating of iron oxide (if I remember correctly), and divided into concentric circular tracks, which are partitioned into sectors. For storing a bit in a region of the floppy disk you use magnetism. For example, the orientation of the magnetic field will tell you whether it is a 1 or a 0 that is stored in that region. Say, north is a 1 and south is a 0. Hence, that region of the disk has two possible states, which you can use to store a single bit. You could have it store more than just either a 0 or 1. Say North is 0, South is 1, West is 2 and East is 3. The disadvantage of this is that now you are more prone to errors because the difference between the states is not as clear-cut, and so the chances of the hardware reading the data incorrectly are higher.

    A similar idea is used to send data across a network link or a circuit. Circuits, like the ones in your machine's processor, use transistors. Then, for example, 0 volts represents a 0 and 5 volts represents a 1 (I'm making up the voltages, I don't remember the actual values). Because there is always the chance of some "noise", it's better to have, for example, anything from 0 to 1 volts count as a 0, and anything from 4 to 5 volts count as a 1.
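
    A minimal sketch of that noise-tolerant reading rule, using the same made-up thresholds as the post (1 volt and 4 volts are not real hardware values, and `read_bit` is a hypothetical name):

```python
def read_bit(voltage, low_max=1.0, high_min=4.0):
    """Map a measured voltage to a bit, using made-up illustrative thresholds.

    Returns 0 for a low voltage, 1 for a high voltage, and None when the
    voltage falls in the undefined band between the two ranges.
    """
    if voltage <= low_max:
        return 0
    if voltage >= high_min:
        return 1
    return None  # too noisy to decide


noisy_signal = [0.2, 4.8, 4.3, 0.7, 2.5]
print([read_bit(v) for v in noisy_signal])
```

    The gap between the two ranges is what absorbs the noise: a 0 that arrives as 0.7 volts is still read as a 0.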

    This data then flows through the circuit in this manner. We use "logic gates" to perform computation. For example, a simple gate, the AND gate, has two inputs, each of which will be either a 0 or a 1, and a single output which is defined by:
    - if both inputs are 1, then output a 1
    - otherwise output a 0.

    The output of this gate then flows across the circuit to other gates, etc, and this is how computation is performed. Some other basic gates are the OR and the NOT gates, and these are enough to perform any possible computation, as complex as you want.
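
    To give a feel for how gates combine into computation, here is a small sketch that models AND, OR and NOT as functions and wires them into a half adder (a circuit that adds two bits). The wiring shown is one common way to do it, not the only one.

```python
def AND(a, b):
    """Output 1 only if both inputs are 1."""
    return 1 if a == 1 and b == 1 else 0


def OR(a, b):
    """Output 1 if at least one input is 1."""
    return 1 if a == 1 or b == 1 else 0


def NOT(a):
    """Output the opposite of the input."""
    return 1 - a


def XOR(a, b):
    """Exclusive OR, built only from the three gates above."""
    return AND(OR(a, b), NOT(AND(a, b)))


def half_adder(a, b):
    """Add two bits; return (sum bit, carry bit)."""
    return XOR(a, b), AND(a, b)


for a in (0, 1):
    for b in (0, 1):
        print(a, "+", b, "->", half_adder(a, b))
```

    Chaining adders like this one, bit by bit, is how a circuit can add whole numbers.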

    Also, anything at all can be represented by a large enough string of bits. Any image, word document, etc, can be encoded into binary, and this data is real and is in there somewhere.

    I'm probably repeating what other people have said, but I like talking about it.
  12. Aug 6, 2006 #11
    Thanks everyone.

    Orthodontist and Jobs' posts were very comprehensive, though I had to read them twice before I started to get it. I like the analogy of the paintings.

    I understand the 1s and 0s and the "pits" being the places where the data is stored. So would I be correct to assume that what is being stored, data, is magnetism? (I know that's a grammatical error but how else should I put it?)

    What's going into these pits? The 1s and 0s are displayed as "1" and "0" on my monitor, but surely a microscopic "1" and "0" isn't going into those pits. That I am still a little confused about.

    But anyhow, everybody has been of great help. Thanks again.
  13. Aug 6, 2006 #12


    Staff: Mentor

    Huh? No. The pits in a CD are literally pits - indentations in the CD. The laser in the CD drive literally looks at the disk to see the pattern of indentations.

    In magnetic storage, the data is stored as magnetized areas on the disk. The read/write heads detect which polarization (north or south) each bit is.
    No, the 1s and 0s are not displayed as 1s and 0s on your monitor. This was explained in the above posts. Just as stringing ordinary numbers or letters together creates words and sentences, stringing 1s and 0s together creates letters, numbers, words, and sentences. The only difference is that with only two symbols instead of 36 (plus punctuation), the words need to be a lot longer to carry the same information. For example, the letter "a" is represented by the binary word "01100001" in the ASCII standard.
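
    Going in the other direction, a string of 1s and 0s can be cut into 8-digit binary words and turned back into letters. This is a small sketch under the assumption of one 8-bit word per ASCII character; `text_to_bits` and `bits_to_text` are hypothetical helper names.

```python
def text_to_bits(text):
    """Encode ASCII text as space-separated 8-bit binary words."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


def bits_to_text(bits):
    """Decode a string of 1s and 0s (spaces optional) back into ASCII text."""
    bits = bits.replace(" ", "")
    if len(bits) % 8 != 0:
        raise ValueError("expected a whole number of 8-bit words")
    words = [bits[i:i + 8] for i in range(0, len(bits), 8)]
    return "".join(chr(int(word, 2)) for word in words)


encoded = text_to_bits("data")
print(encoded)
print(bits_to_text(encoded))
```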

    The most basic way graphics data is stored is breathtakingly simple. An image is simply a collection of colored dots called pixels. Each pixel on your monitor has a color assigned to it using a binary "word" usually 32 digits long. The image file is just a list of the pixel colors.
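
    A rough sketch of the idea: each pixel's color is packed into one 32-bit number, and the image is a list of those numbers. The alpha-red-green-blue byte order used here is an assumption for illustration; real image formats differ in layout.

```python
def pack_pixel(red, green, blue, alpha=255):
    """Pack four 0-255 channel values into one 32-bit word (ARGB order assumed)."""
    for value in (red, green, blue, alpha):
        if not 0 <= value <= 255:
            raise ValueError("channel values must be in the range 0-255")
    return (alpha << 24) | (red << 16) | (green << 8) | blue


def unpack_pixel(word):
    """Recover (red, green, blue, alpha) from a 32-bit ARGB word."""
    return ((word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF, (word >> 24) & 0xFF)


# A tiny 2x2 "image": a flat list of pixel words, read row by row.
image = [
    pack_pixel(255, 0, 0), pack_pixel(0, 255, 0),
    pack_pixel(0, 0, 255), pack_pixel(255, 255, 255),
]

for word in image:
    print(format(word, "032b"), unpack_pixel(word))
```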
    Last edited: Aug 6, 2006
  14. Aug 6, 2006 #13


    Science Advisor

    More accurately, magnetic storage is a medium for data. Data is an abstraction. It's hard to say right out what it is an abstraction for, but you can get the idea through example.

    By the way, the paintings is not an analogy--the colors on a painting ARE data, and so is the presence or absence of a painting that the security guard cares about. Data is more than just what is on a computer.

    Let's say you want to track your running performance over time. Every day you run the same route, and you time how long it took you to do so in minutes. After a week of this you might have a list like 20, 21, 19, 20, 25, 19, 18. These numbers are data. You might also represent your times as a line graph to see your performance over time at a glance. This graph displays the same data that the list of numbers does. The data is independent of its representation: whether you store it as some numbers written on a page, or as a hand drawn line graph, or maybe as a file in a computer, it's still the same data.
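
    The independence of data from its representation can be checked directly. In this sketch the week of running times from above is stored three ways (plain text, JSON, and raw bytes, all chosen just for illustration) and read back to the same list each time.

```python
import json
import struct

times = [20, 21, 19, 20, 25, 19, 18]  # minutes per run, from the example above

# Three different representations of the same data.
as_text = ", ".join(str(t) for t in times)  # numbers written on a page
as_json = json.dumps(times)                 # a structured text file
as_bytes = struct.pack("7B", *times)        # seven raw bytes, one per run

# Reading each representation back recovers the same list.
from_text = [int(s) for s in as_text.split(", ")]
from_json = json.loads(as_json)
from_bytes = list(struct.unpack("7B", as_bytes))

print(from_text == from_json == from_bytes == times)
```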

    1's and 0's are just a human way of talking about computer data storage. The actual computer stores data in many ways--pits on a CD, polarizations of a magnetic disk such as your hard drive, the state of circuits called flip-flops that make up part of your computer's memory. 1's and 0's are a way of talking about what all these things do. For example, let's say that the pits and lands on the CD are in an order like Pit, Land, Land, Pit, Land, Pit, Land. The way the CD works, each time there is a switch from pit to land or from land to pit, the drive produces a high voltage. And each time there is no such switch, the drive produces a low voltage. So the drive reads
    Pit, Land, Land, Pit, Land, Pit, Land
    into
    High voltage, Low voltage, High voltage, High voltage, High voltage, High voltage
    which it's convenient to abbreviate like
    H L H H H H
    1's and 0's are just a conventional way to talk about that instead of saying "high" or "low" voltage. So you would say that the drive produces
    1 0 1 1 1 1
    when it reads Pit, Land, Land, Pit, Land, Pit, Land.
    The numbers never actually appear anywhere in the computer--1's and 0's are just a convenient way for humans to talk about data, in this case voltages.
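
    The reading rule described in this post (a change between pit and land gives a high voltage, no change gives a low one) can be sketched as follows. The function name `read_track` is made up, and real drives add further encoding layers that are left out here.

```python
def read_track(surface):
    """Turn a sequence of 'pit'/'land' readings into bits.

    A change from one reading to the next counts as 1 (high voltage);
    no change counts as 0 (low voltage).
    """
    bits = []
    for previous, current in zip(surface, surface[1:]):
        bits.append(1 if previous != current else 0)
    return bits


track = ["pit", "land", "land", "pit", "land", "pit", "land"]
print(read_track(track))
```

    Seven readings give six bits, because each bit comes from comparing two neighbouring spots.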
    Last edited: Aug 6, 2006
  15. Aug 6, 2006 #14
    It seems there's two things to discuss here..
    1. What data IS and
    2. How that data is stored and processed by the computer technically

    1. Data can be an executed idea, information, or simply the unprocessed and raw information.
    Data is simply the state of something.
    Data can be an abstract idea, the state of a physical object or objects etc.
    Note that as 0rth said, the data itself doesn't exist anywhere but as potential information.
    It's when you put it on a DVD, or a hard drive, that the data manifests itself, but the idea of the data remains the same, that is, the data can be said to be a logical abstraction stemming from physical things, that are also logical. Data is simply the unprocessed and raw state of these objects. But data doesn't even need to be abstract, it can simply be.. anything..
    You see?

    As for #2, how data manifests itself physically in a computer has been said above but I will reiterate for completion of this post.
    Simply put, the data you download and read on your screen, is simply a logical state of the workings of your computer.
    The data itself doesn't mean anything without a human to read it or use it, that is to say, this paragraph that you are reading consists of pixels on your screen, a .php file downloaded into your browser cache, your eyes, your brain, the computer processor, the ram in your computer, the html code, and finally the 1s and 0s that make up this .php file, which then again results from hardware and voltages, like 0rth explained.

    These things above all make up so that you can actually read the data on your monitor, the data itself is simply the information, the content of this post that I am writing. So as you can see, in this case, I am typing the data, abstract data, and the computer with its physical complexity transforms it, via protocols and standards, into information that we can share.

    If you want to learn how computers work, and how exactly your cpu works, your circuits and your ram, then that's too much for us to type up here, so I suggest you buy some books or read wikipedia to get a fuller understanding.

    If I typed anything wrong in this post someone correct me.. :P
    Last edited: Aug 6, 2006
  16. Aug 7, 2006 #15


    Homework Helper

    The original form of data was probably paintings or sculptures. Eventually this turned into text written onto scrolls and later books. Humans can store data as text, with the basic unit being a letter, and the number of letters dependent on the language. Some Asian languages use symbols for words. These are non-binary methods for storing data. To "compress" the data, you use smaller text.

    Another type of data is analog data. In the case of vinyl records, the waveform of the music is cut into the groove of the record. To get stereo, the two channels are cut into the two walls of the groove, each at 45 degrees, so the needle moves both side to side and up and down.

    Music-oriented magnetic media work in the same way: the strength of the field determines the amplitude. AM radio works in a similar way: the strength of the carrier wave is modulated so that its amplitude follows the waveform of the sound being broadcast. FM radio instead shifts the frequency of the carrier to represent the amplitude of the sound.

    Most video tapes record precisely timed analog information; the classic formats include VHS and Beta. LaserDisc also uses an analog video signal.

    Another example of analog data was that used in analog computers, usually a voltage in the range from -100 to +100 volts. It wasn't highly accurate, as conversion to digital only resulted in 0.1 volt accuracy. Analog computers were more oriented to "solving" differential equations, instead of the typical type of programs seen in digital computers. Numerical integration with better speed and accuracy has long since replaced analog computers. There may be some robotic movements based on the old mechanical analog computers, but I'm not aware of any other current uses.

    Digital computers, being composed of circuitry that has two states, on/off, or plus/minus, use binary to store most of their data, as already mentioned. This includes magnetic media (tapes, disks, link tape, drums, core memories, ...), optical media (cd-roms, dvd-roms).

    Now there are also digital video tapes, like Digital8, MiniDV, and hi-def digital video.

    Not all digital data is stored as binary, though. Optical character readers read human-written text. UPC codes represent each digit, 0 through 9, with a pattern of 4 alternating bars and spaces whose widths range from 1 to 4 units. There was a format where a pattern was printed on plain paper in books or magazines that could be read with a scanner, but this didn't last long and was replaced by CD-ROMs, which held more data. I don't think the pattern was binary-based, but one that used patterns of varying-width squares to represent data, similar to UPC code.
    Last edited: Aug 7, 2006
  17. Aug 7, 2006 #16
    YES! That was what I was trying to ask! Sorry if I wasn't able to elucidate but the Orthodontist got my idea. If the pit was a canvas, then what is IN or ON the canvas? On a painting, it's paint - and that's what makes the painting - but on a DVD? So it's a series of pits AND land - not just the pit. That's what the 1s and 0s actually ARE. I remember Octelcogopod and Job mentioning it but it didn't hit me until I read this post.

    That was the main question. Or am I still completely off in some way? Ha! Anyhow, I think I get it a little bit and that's good enough for me.

    Thanks a lot. I'm going to go off and lecture my friends. :smile:
  18. Aug 9, 2006 #17


    Science Advisor
    Homework Helper

    My favorite course in University so far has been assembly and computer architecture. We actually learned how, given the right parts, to build a very, very simple computer.

    It's important to remember that the 1's and 0's don't mean anything to the computer. It's just making this wire go to a high voltage because those two wires are high voltage, and making this other wire go to a low voltage because some other wires are some voltage. It's just 'moving 1s and 0s around'.

    We're the ones who give the computations meaning. We're the ones who make certain bits correspond to colors on a monitor or sounds from the speaker. We're the ones who input bits using the keyboard. It's very important to understand that the computer doesn't understand. It's just a machine.
  19. Aug 9, 2006 #18


    Science Advisor

    I agree with that. If we wanted we could build a CPU with water flow, rather than electron flow, using tubes and plumbing tools. From this perspective a city's water network "might" be a complex and powerful CPU, but we use it for water transportation, rather than computation.
  20. Aug 12, 2006 #19
    It would be much more accurate to say "bytes" when referring to colors, sounds or typed characters from the keyboard.

    1 byte = 8 bits
    Characters take up only a single byte.
    Most forms of data that store color (obviously not black and white) take up 4 bytes -- the standard integer type.
    In fact, the word "bits" should never be used when referring to data. Only in very low machine code and some very low level languages can you actually manipulate data by the bit.
  21. Aug 12, 2006 #20


    Science Advisor

    That isn't true--even in Java or C++ you can manipulate bit arrays, whose members are stored as individual bits within a byte.
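
    The post names Java and C++; as a minimal sketch of the same idea in Python (a swapped-in language, with hypothetical helper names), individual bits inside one byte can be set, cleared and read with shift and mask operators:

```python
def set_bit(byte, position):
    """Return byte with the bit at the given position (0-7) turned on."""
    return byte | (1 << position)


def clear_bit(byte, position):
    """Return byte with the bit at the given position (0-7) turned off."""
    return byte & ~(1 << position) & 0xFF


def get_bit(byte, position):
    """Return the bit (0 or 1) at the given position (0-7)."""
    return (byte >> position) & 1


flags = 0                  # one byte holding eight separate yes/no flags
flags = set_bit(flags, 0)
flags = set_bit(flags, 7)
print(format(flags, "08b"), [get_bit(flags, i) for i in range(8)])
flags = clear_bit(flags, 0)
print(format(flags, "08b"))
```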