What is Data: Definition and 998 Discussions

Data are units of information, often numeric, that are collected through observation. In a more technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects, while a datum (singular of data) is a single value of a single variable. Although the terms "data" and "information" are often used interchangeably, these terms have distinct meanings. In some popular publications, data are sometimes said to be transformed into information when they are viewed in context or in post-analysis. However, in academic treatments of the subject, data are simply units of information. Data are used in scientific research, business management (e.g., sales data, revenue, profits, stock price), finance, governance (e.g., crime rates, unemployment rates, literacy rates), and in virtually every other form of human organizational activity (e.g., censuses of the number of homeless people by non-profit organizations).
Data are measured, collected, reported, and analyzed, and are used to produce data visualizations such as graphs, tables, or images. Data as a general concept refers to the fact that some existing information or knowledge is represented or coded in some form suitable for better usage or processing. Raw data ("unprocessed data") is a collection of numbers or characters before it has been "cleaned" and corrected by researchers. Raw data needs to be corrected to remove outliers or obvious instrument or data entry errors (e.g., a thermometer reading from an outdoor Arctic location recording a tropical temperature). Data processing commonly occurs by stages, and the "processed data" from one stage may be considered the "raw data" of the next stage. Field data is raw data that is collected in an uncontrolled "in situ" environment. Experimental data is data that is generated within the context of a scientific investigation by observation and recording.
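As a rough illustration of the cleaning and staging idea described above, here is a minimal Python sketch. It drops physically implausible thermometer readings and then passes the cleaned values to a second stage. The function names, the sample readings, and the plausibility bounds are all hypothetical and chosen only for this example; real cleaning rules depend on the instrument and the site.

```python
from statistics import mean

# Hypothetical plausibility bounds for an outdoor Arctic thermometer, in deg C.
# These limits are illustrative assumptions, not values from any standard.
PLAUSIBLE_MIN_C = -70.0
PLAUSIBLE_MAX_C = 25.0


def clean_readings(raw_readings):
    """Stage 1: drop readings outside the plausible range
    (likely instrument or data entry errors)."""
    return [r for r in raw_readings if PLAUSIBLE_MIN_C <= r <= PLAUSIBLE_MAX_C]


def summarize(cleaned_readings):
    """Stage 2: the processed output of stage 1 is the raw input of this stage."""
    return {"count": len(cleaned_readings), "mean_c": mean(cleaned_readings)}


# Made-up sample data; 34.2 stands in for an implausible tropical reading.
raw = [-31.5, -29.8, 34.2, -30.1, -28.6]
cleaned = clean_readings(raw)
summary = summarize(cleaned)
print(cleaned)
print(summary)
```

A fixed range check is only one possible rule. Statistical outlier tests or manual review may be more appropriate when a plausible range is not known in advance.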
Data has been described as the new oil of the digital economy.

View More On Wikipedia.org
  1. Arman777

    Why are all values NaN after mapping 'player_name' column in Pandas Data Frame?

    I have two data frames, df1 and df2. df1 has two columns, 'player_name' and 'player_id'. Similarly, df2 has a 'player_id' column. From this configuration I want to pass the 'player_name' column to df2 by using 'player_id'. For this reason I have tried something like this: df2['player_name'] =...
  2. person123

    I Fitting Data to Grafted Distribution

    I have a set of data (representing the strength distribution of samples), and I would like to fit a normal-Weibull grafted distribution. To the left of a specified graft point, the distribution is Weibull, and to the right it's normal. At the graft point, the value and the first derivative are...
  3. N

    Write a Linear Model For this Data

    The average weight of a male child’s brain is 970 grams at age 1 and 1270 grams at age 3. (Source: American Neurological Association) (a) Assuming that the relationship between brain weight y and age t is linear, write a linear model for the data. (b) What is the slope and what does it tell...
  4. K

    I Extract error from simulated data

    Hello! Say I have some measurements ##y_i = f(x_i|a_1,a_2,...,a_n)## for different values of ##x_i##. Here ##a_i##'s are the parameters of the function I want to fit for. For example for a linear function I would just have ##y_i=ax_i+b=f(x_i|a,b)##. I want to see how the errors on the ##y_i##'s...
  5. chwala

    Discrete data vs continuous data in statistics

    I would like to seek your take on the two terms, discrete and continuous, in this context. In my understanding, when we look at the height of individuals (in cm), this measure in general or by definition implies continuous data. If we are to look at a specific math problem that involves the height of say...
  6. F

    MATLAB Can I calculate the covariance matrix of a large set of data?

    Hello everyone. I want to calculate the covariance matrix of a stochastic process using Matlab as cov(listOfUVValues) being the dimensions of listOfUVValues 211302*50. I get the following error: Requested 211302x211302 (332.7GB) array exceeds maximum array size preference. Creation of...
  7. W

    Transferring Data from Old PC to New PC (Both Win 10)

    Hi all, So my old Win 10 PC is slowing down and I am buying a new one. I want to transfer data from the old to the new without having to remove the HD. Will just connecting the two PCs with a SATA cable be enough to do the job? Is there a simpler, faster way?
  8. tworitdash

    Rotating radar time domain data

    I want to simulate the time domain data for a rotating radar. I assume that the space around the radar is filled up with a very big extended object and it moves with a constant speed in one direction. Picture attached. I don't take range information here. I am only concerned about the velocity as...
  9. K

    I Extracting data the right way

    Hello! I have a relationship of the form ##y_i=ax_i##. In my case ##y_i## is a frequency and ##x_i## is a mass. For each mass ##x_i##, I measure ##y_i## (and I get a central value and an error). The difference between ##x_i##'s is about 1 (in some arbitrary units). Is it possible to extract the...
  10. B

    From weather station data to solar irradiation on a specific surface

    So, at home I have a weather station which measures solar irradiation in the east, south and west in lux. I can read in this measured sensor data. From these measurements, I would like to calculate the solar irradiance on my windows. The solar energy formulas could help me with that. The work of...
  11. Orenshved

    What's needed to use quantum entanglement for FTL data transfer?

    Hey all, I need help with the book I'm currently writing. What would it take (even theoretically) to use quantum entanglement for FTL data transfer? From what I understand, the state of entangled particles can not be changed without breaking the entanglement. Do you think this would ever be...
  12. M

    A Globular clusters in SDSS data

    I am searching for Globular Clusters in SDSS data. From what I have learned after sifting through documentation and the data structures, there is no identifier by which I could directly search for GCs. Nearby clusters are catalogued as individual stars with no apparent link to parent cluster...
  13. C

    Is there an online data resource for electrolyte conductivities?

    Is there an online data resource for electrolyte conductivities showing molar strengths and temperatures? thanks.
  14. M

    MATLAB Averaging two data with different domains

    Hi PF! Suppose I have three pieces of data: x1 = 0:3:12 and x2 = 0:4:16 and x3 = 0:5:20 with corresponding functions y1 = x1.^2 and y2 = x2.^3 and y3 = x2.^4. How would you average these "functions" with data (x1,y1) and (x2,y2) and (x3,y3)? My thoughts are: 1) linearly interpolate y1, y2...
  15. anorlunda

    Impressive Video Data Compression

    https://arxiv.org/pdf/2011.15126.pdf https://nvlabs.github.io/face-vid2vid/ https://wandb.ai/ayush-thakur/face-vid2vid/reports/Overview-of-One-Shot-Free-View-Neural-Talking-Head-Synthesis-for-Video-Conferencing--Vmlldzo1MzU4ODc One thing in this modern world seems to be ubiquitous; the demand...
  16. S

    Prob/Stats Any good math-theory-focused books on neural networks and data science?

    Hi. I'm looking for books on data science, preferably leaning towards neural networks, that focus on mathematical rigor. For example, theorems on optimization, minimum number of layers to accomplish a task efficiently, etc. Most books I've seen seem to hand wave this stuff. Anyone know any juicy...
  17. TimeSkip

    Steganographic Data File Searches

    <moved to General Discussion, posts that ask for thoughts are not hard science> Summary:: Soon? I've been thinking whether in the present time if not near future, or even already something common nowadays amongst intelligence agencies would be the use of steganographic data file searches...
  18. G

    Looking for large dataset of non image-centric physics data

    How did you find PF?: Google search Hi all, I'm looking for a good example of a large dataset of non image-centric physics data (e.g. astronomy, particles, ...) so I can add an example to this section of my documentation (formal announcement for the Gallia library: see Scala users forum). I...
  19. D

    Java Read a csv file and process the data

    public static Region fromFile(String name, String file) { Region r = new Region(name); List<String> lines = readData(file); String[] headers = lines.remove(0).split(";"); Map<String, Integer> h2i = new HashMap<>(); for (int i=0; i<headers.length...
  20. Arman777

    I Question about data analysis and error

    Let us suppose we have one constant variable ##b \pm \delta b = 20 \pm 1 ## and one function that depends on ##x ##, such as, ##a(x) \pm \delta a ## The problem is I want the difference between ##a(x) ## and ##b ## to be ##0 ##. Let me denote this difference as ##c \pm \delta c ##. To...
  21. Arman777

    A What are the parameters that are the same in CMB data for multiple models?

    In cosmology, the CMB tells a lot about which cosmological models are acceptable or not. For instance, we know that whatever cosmological model we use, the ##\theta_*## parameter will always be the same. Are there any other parameters listed in this picture that are **model-independent**? (i.e...
  22. L

    Determining the uncertainty in Geiger Counter data

    From what I understand thus far, the counting follows Poisson statistics, so the uncertainty is just the square root of the counts, correct? But when I take the square root of the counts, it produces a very small number compared to the count, which makes it insignificant, therefore the error bars...
  23. R

    Admissions No Data Available on my "check status" page for SULI

    I recently checked my SULI application status just out of curiosity today and I found that the page that leads to the login was down (https://science.osti.gov/wdts/suli). After a google search I found the login page and logged into my account but upon checking my status, it said that there was...
  24. Arman777

    Python How to Write Python Output into an Excel CSV File?

    I have a code like from random import randint from string import ascii_lowercase file = open("data.txt", "w") numbers = [] for j in range(26): # generating a list of random number random_num = [randint(0, 7) for i in range(120)] numbers.append(random_num) # matching these numbers...
  25. ClimberT8

    Physicist and Data Scientist in Ed Tech

    How did you find PF?: Searching for "high school and college physics teachers discussion forums" I am a "kiddo" in Physics, like Dr. Jill Biden ;-) and following a highly anti-recommended circuitous path have ended up doing data seance. That should tell you everything you need to know about my...
  26. J

    A How could I recognize patterns in streaming data?

    I have an electronic device that sends me a signal but there is also a lot of noise. My question is, in general, How could I identify that pattern in the data I am receiving? So I would like to read articles or books talking about this topic. First thing I thought was using Fourier but I was...
  27. M

    I Regression Prediction with Time Series Data

    Hi, I am not sure what the correct forum is for this question. Question: When do we need to remove seasonality from time series data to do a regression analysis? Context: I am planning to conduct a prediction analysis where I want to find out how a device performs. I hope to estimate a...
  28. F

    Variable, Data Type and Array Data Structures

    Hello, In Python and other programming languages, data can be of different types: integer, float, string, Boolean. On the other hand, Data structures are containers of data items which can (or not) have the same data type. A variable, when created, has: a) name (a piece of data stored in...
  29. Wizard

    Recommendations: a Book/Portfolio of data visualisations

    Hello all, I was just thinking how much I would love a book, or even better a portfolio, on different types of data visualisations. I'm sure there are a few such books out there, but I'm looking for something for technically minded people, which organises its examples by the type of data being...
  30. person123

    "Expected Result" Bias in Research

    Hi. I'm an undergraduate student and I've been doing research in Civil Engineering for a couple years now. One thing which I've thought about repeatedly is a bias toward the desired result. Of course sometimes people might be biased for some ulterior motive or sloppy work, but the bias I'm...
  31. U

    Looking for physics based simulation for data generation

    Good day, I am working on physics based neural networks and I am investigating the role of integrating physics based loss function on the network's training. So I am looking for physics based simulation for data generation (online tools) to generate data from experiments. I know I may...
  32. I

    Turboprop engine propeller efficiency data

    Hello, I'm looking for data on turboprop aircraft engine propeller efficiencies. I'm hoping for a table of available modern engines with propeller efficiencies for comparison. I've tried Googling but not much luck. I can't even seem to find propeller efficiencies on the individual engine...
  33. lomidrevo

    What is a Data Lake? Understanding the Buzzword

    I think the basic idea is quite clear, as for example defined by wikipedia: But when I google more about this "technology", I am getting quite various ideas about what is considered as data lake. Some of them: just a synonym to ETL approach to data processing a distributed file system, like...
  34. F

    I Understanding 2 equivalent formulations of both data set measures

    Two independent experiments have measured ##\tau_{1},\sigma_{1}## and ##\tau_{2},\sigma_{2}##, with ##\sigma_{i}## representing the errors on the measurements. From these two measurements, assuming the errors are Gaussian, we want to get the estimate of ##\tau## and its error (i.e. with a combination of two...
  35. omega_minus

    I Best way to quantify how well a model fits data

    Hello, I have recorded time-domain data from an experiment. I am using a finite difference time domain algorithm to simulate the experiment. The model has been tuned to resemble the data fairly well but I'd like to be able to quantify the fit. Most of what I see online seems to be about...
  36. S

    Models for weather that include data from amateur stations?

    I have the impression (I've not actually researched the matter) that some online weather services publish data from weather stations set up by amateurs. Do any organizations incorporate such data in their weather prediction models? I'd think that using such data would involve a sophisticated...
  37. chwala

    Probability distribution for discrete data

    This is a textbook problem shared on a WhatsApp group by a colleague... I have no problem in finding the value of ##k=0.08##; I have a problem with part (ii) of the problem. I have attached the solution here; how did they arrive at the probability distribution of ##y##? Attached below is...
  38. G

    A Can Data Travel Faster Than Light Computation?

    Hi, bear with me, I have a conundrum I want to ask you about. If data traveled at many times the speed of light, could the results of a decrypted cipher message be computed quicker than with any system we currently have? For instance, if we sent a burst of data at many times the speed of light across the solar...
  39. I

    Molecular Bio/Genetics YouTube playlist needed for Genomic Data Scienc

    I am from a physics background and I need some background in genetics to learn Genomic Data Science. So, can people here suggest some YouTube playlists? I have not studied biology for a long time, so I have no idea about biology. Suggestions welcome. Thanks
  40. PhysicsTest

    Understanding Motor Data Sheets

    I am trying to understand the Hurst motor data sheet, but I am not clear on the table provided. Please help me to understand the data provided. System Input: Volts is all in the range of 24V, so it is a 24V BLDC motor, I understand. Amps: It is the current drawn from the DC source, I understand...
  41. Eclair_de_XII

    Python How do data scientists resolve name discrepancies like this?

    Suppose you have a .csv file that resembles something like this: Name,Profession, ... Mike Jones, Driver, ... And now suppose you have many .csv files pertaining to information about people who drive for a living. Profession,Qualified to Operate Motor Vehicle,... [variation of...
  42. quantumCircuit

    Lotka Volterra estimate parameters from experimental data

    Namely, in the system, I have obtained the value of parameters L, M, A and D, because I treat the other organism as equal to zero, i.e., it doesn't exist, but I am struggling about the values of B and C, that are coupled with the product of x and y. Can anyone help me how to obtain those values...
  43. M

    Using Tablets & Phones instead of a PC for data storage

    My view of tablets and phones is that they are "companion" devices and are not a suitable replacement of a computer. They are there to help you perform certain tasks which you would previously have had to use a computer for and are generally very good at it but are still limited. They are there...
  44. BWV

    Testing Scientific Theories Against Data

    [Moderator's Note: Thread spun off from previous thread in General Discussion since it is more specifically about particular scientific theories and how to test them against data.] So in your opinion, everything in the list below is quantitative, not qualitative...
  45. I

    Learning Data Structures in Java: Circular Queue

    Summary:: I have recently started with data structures in Java and I tried doing this program, but I have a few confusions. 1. How do I write the main() of the program? 2. What are we supposed to do with the value returned by function pop()? If anyone could point out if there are any errors and...
  46. A

    Tablet charging problem and data recovery

    My tablet is an IKALL N9. Recently, I started facing a problem connecting it to the charger, so I would adjust the charger a bit until it would show charging. But a few days back it stopped charging altogether. It won't show the charging sign with either the AC source or the power bank. I even changed the...
  47. Whipley Snidelash

    Where do I fit in the history of data compression?

    In 1982 I was given a ZX Spectrum by a Timex employee. It was one of only three English units that had been converted to NTSC from PAL. They worked for Timex but had their own company on the side to write software for the Timex computer and for the Spectrum. They hired me to do the title screen...
  48. AN630078

    Plotting a Scatter Diagram from a Large Data Set

    So I have attempted to plot the scatter diagram. My first query is does the question intend for you to include both subsets of data on one axis, (which I have plotted on the x-axis) or rather does it demand two separate diagrams to investigate if there is any correlation, or a single diagram? I...
  49. E

    Asymmetric uncertainty intervals in astrophysical data

    My initial guess was to calculate the upper and lower value, and then average those two values, but I don't know whether this is correct to make the uncertainty interval symmetric. After I calculated the average value, I subtracted it from the upper and lower value, and obtained the symmetric...