What is Data analysis: Definition and 97 Discussions

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively.

Data mining is a particular data analysis technique that focuses on statistical modeling and knowledge discovery for predictive rather than purely descriptive purposes, while business intelligence covers data analysis that relies heavily on aggregation, focusing mainly on business information. In statistical applications, data analysis can be divided into descriptive statistics, exploratory data analysis (EDA), and confirmatory data analysis (CDA). EDA focuses on discovering new features in the data while CDA focuses on confirming or falsifying existing hypotheses. Predictive analytics focuses on the application of statistical models for predictive forecasting or classification, while text analytics applies statistical, linguistic, and structural techniques to extract and classify information from textual sources, a species of unstructured data. All of the above are varieties of data analysis.

Data integration is a precursor to data analysis, and data analysis is closely linked to data visualization and data dissemination.

View More On Wikipedia.org
  1. J

    I Plotting the orbits of the planets from Ephemeris data

    I would like to plot the positions of the planets around the Sun for different dates and describe their orbits. I got the ephemeris data for each planet from the JPL Horizons System and got: Date__(UT)__HR:MN R.A.___(ICRF)___DEC Ang-diam ObsSub-LON ObsSub-LAT ObsEcLon ObsEcLat however...
  2. BiGyElLoWhAt

    Data analysis by guessing, checking and fixing

    Hi, I'm trying to come up with a section of an optics-based physics lab designed for 2nd-year calc-based college students. Calc 2 is a co-req. There are 2 labs that are intimately linked together. The first effectively revolves around taking ##d_i(d_o)## data from a lens, source and screen...
  3. D

    Water tank fault Detection

    I am assigned to work on a project where I am required to perform multiple types of data analyses on process data using Python or Matlab. The analyses chosen must relate to at least one process objective (e.g., fault detection). I am required to choose one basic linear technique and one more...
  4. L

    Need some help reading this graph and getting data from it (wind turbine simulation)

    Hello all. Most of the time I'm pretty good at analyzing data, but today I am struggling, so I need some extra help. I used an open-source program called openFAST to generate this data. openFAST is wind turbine simulation software. The graph below is of a time series calculation from a...
  5. H

    I How to find the value of a constant experimentally?

    Hi, First of all, sorry if this is not the right place to post my question; I was not sure where exactly to post this kind of question. I'm wondering how I can find the value of a constant experimentally. For instance, I have an equation ##I = AB^{4/3}##, with a set of data for ##I## and ##B##...
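
One common way to estimate a constant like the prefactor in a power law with a known exponent (here 4/3) is a least-squares fit of a straight line through the origin, using ##B^{4/3}## as the independent variable. The sketch below is only an illustration: the function name `fit_prefactor` and the data values are hypothetical and do not come from the thread.

```python
def fit_prefactor(b_values, i_values, exponent=4.0 / 3.0):
    """Least-squares estimate of A in I = A * B**exponent.

    With x = B**exponent the model is a line through the origin,
    so the estimate is A = sum(x*y) / sum(x*x).
    """
    x = [b ** exponent for b in b_values]
    sxy = sum(xi * yi for xi, yi in zip(x, i_values))
    sxx = sum(xi * xi for xi in x)
    return sxy / sxx

# Hypothetical, noise-free data generated with A = 2.5 (not from the thread).
B = [1.0, 2.0, 3.0, 4.0]
I = [2.5 * b ** (4.0 / 3.0) for b in B]

A_est = fit_prefactor(B, I)
print(A_est)
```

With real, noisy measurements the same formula applies; plotting ##I## against ##B^{4/3}## is a quick visual check that the points fall on a straight line through the origin.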
  6. ForTheLoveOfPhysics

    B Data needed - Related bodies and their stats

    I’m analysing the gravitational relationships between astronomical bodies of different masses and am getting sick of having to individually google and document these. Are there data sets out there that list pairs/sets of objects and include their mass and distance from each other? Including...
  7. P

    B Calculating g with a Conical Pendulum

    In analysing the conical pendulum, it can be shown that the period is given by ##T = 2\pi\sqrt{L\cos\phi/g}## and that therefore ##g = 4\pi^2 L\cos\phi/T^2##. ##L## is the pendulum length, and ##\phi## is measured at the top of the pendulum (at the point of suspension). Graphing ##\cos\phi## vs ##T^2## should produce...
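
The rearranged formula quoted above can be checked numerically. This is a minimal sketch: the function name and the numbers (a 1.0 m pendulum at 30 degrees) are hypothetical, and the period is generated from the period formula itself rather than measured.

```python
import math

def g_from_conical_pendulum(length_m, phi_rad, period_s):
    """Return g from g = 4*pi^2 * L * cos(phi) / T^2."""
    return 4.0 * math.pi ** 2 * length_m * math.cos(phi_rad) / period_s ** 2

# Hypothetical values, not taken from the thread.
L = 1.0                      # pendulum length in metres
phi = math.radians(30.0)     # angle at the point of suspension
g_true = 9.81                # value used only to synthesise a period

# Period predicted by T = 2*pi*sqrt(L*cos(phi)/g)
T = 2.0 * math.pi * math.sqrt(L * math.cos(phi) / g_true)

g_est = g_from_conical_pendulum(L, phi, T)
print(g_est)
```

Because the period was generated from the same model, the round trip recovers the input value of g; with measured periods the spread of `g_est` across runs gives a feel for the experimental uncertainty.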
  8. W

    Cleaning/Reordering towards regression

    I have quantitative data on all countries on two variables, say A, B, in Excel and I am trying to regress A on B. The problem is that the data are ordered by the magnitude of A and B, rather than alphabetically by country. Is there a reasonable way of ordering by country for each and then regressing A on B? If I...
  9. W

    A Choice of Pipelines for Data Analysis

    Hi, So say I have some data to process. I am trying, say, Linear/Multilinear Regression. I know how to do this within Python Pandas. I can learn how with Tensorflow (TF). Would TF produce the same output given the "right" choice of Activation Functions *? Or would it output a model that is...
  10. G

    Comparing Near-Infrared Spectra: What Stat Method?

    I'd like to compare 2 or more near-infrared spectra. The data consists of measured light intensity at different wavelengths (range 600 nm to 1100 nm). I'm wondering which statistical method would be appropriate? I noticed when searching online that Pearson correlation might be inaccurate as...
  11. Amitkumarr

    I Finding bias of the coin from noise corrupted signals

    Suppose there are two persons A and B such that both have a personal communication system which can transmit and receive bits. B has a biased coin whose bias is not known. A asks B to toss the coin 2000 times, send a 0 when a tail comes up and a 1 when a head comes up. It is known that whatever...
  12. Arman777

    I Question about data analysis and error

    Let us suppose we have one constant variable ##b \pm \delta b = 20 \pm 1 ## and one function that depends on ##x ##, such as, ##a(x) \pm \delta a ## The problem is I want the difference between ##a(x) ## and ##b ## to be ##0 ##. Let me denote this difference as ##c \pm \delta c ##. To...
  13. SymNeric

    Analysing a ##C_M## graph (pitching moment data)

    Hi guys, I hope everyone is safe and well. I'm currently nearing the end of my third year dissertation, and I'm looking at analysing pitching moment coefficient (CM) data over a full range of angles of attack for airfoils with different serrations on the trailing edge. What are things to look...
  14. F

    I Estimating decay yields from fits to these distributions

    I'm currently reading various papers on the violation of Lepton Flavour Universality in rare B-decays and I would appreciate some help in understanding the methodology for measuring the ratios in these decays. Here is a quote from a recent paper from the LHCb collaboration (p.5): My question...
  15. K

    I Rebinning for Data analysis

    Hello! I am working on a spectroscopy experiment and for each wavelength of a laser I have some counts. For the purpose of my question I will make up some data to illustrate my problem, in the table below (these are just numbers, without any relevance for the physical reality of the experiment)...
  16. A

    Solenoid Lift Force Results -- Need Data Analysis

    Good Day, I am trying to pick up small ferromagnetic (neodymium) magnets vertically with a solenoid. I want to know how much magnetic force the solenoid can pick up. The formula I tested and actual numbers for my solenoid are in the image below. I know that the magnetic field of a solenoid is given...
  17. person123

    MATLAB MATLAB API For Wave Flume Data Analysis

    (As a quick note, 'wave flume' should be taken rather generally. I basically just mean the sort of experiments involved in the flow of water which may use instruments such as pressure gauges, load cells, wave gauges, and ADVs. I know that they're not always done in a flume per se -- the...
  18. astroman707

    For astronomers, what software/languages do you use to handle data?

    For all the astronomers and astrophysicists out there, what are your preferred methods of dealing with large swaths of data? What are your go-to programming languages and software?
  19. T

    I Data analysis for theoretical vs experimental results

    Hello, My question is what types of data analysis can I perform on a set of theoretical and experimental results? For example, I have v(x) = cos(x) and I plot my observed data to v(x). Thanks!
  20. A

    A What is the best free software for analyzing powder XRD data?

    I studied an elementary course as an introduction to Solid State Physics. Now I have powder XRD data to practice crystal structure determination. I am going to do this alone on my computer. I want to know what software to use (free software), with a manual of course. Any help please?
  21. R

    I Principal component analysis (PCA) coefficients

    I am trying to use PCA to classify various spectra. I measured several samples to get an estimate of the population standard deviation (here I've shown only 7 measurements): I combined all these data into a matrix where each measurement corresponded to a column. I then used the pca(...)...
  22. S

    How do I find refractive index uncertainties?

    I've been doing an experiment where I've used prisms and a spectrometer to find the exact angles inside the prisms and the refractive index of the prisms by finding the minimum angle of deviation. I have attached a picture of the formula I've been using to find the refractive indices. Where...
  23. T

    MATLAB Numerical approximation of the area under curve

    I am very new to MATLAB and how it all works, but I am having trouble understanding from which axis the numerical integration is occurring on the graph that I plotted. So I am currently doing an experiment in gamma ray spectroscopy and due to an issue with the software we found it hard to...
  24. David Browning

    I Any Advice Analyzing Root Data

    I've recently started my new RA position, and I've been given the task of analyzing a root data file. I'm not completely lost, but I don't exactly know what I'm doing. The point of my post is not to ask for answers, merely advice. A place I could go for info on data analytics. Pointers on how to...
  25. Guilherme Franco

    A Little help interpreting spectral data from an article

    I'm trying to find a good database of absorption or reflection spectra in visible light for pigments. I've found a wonderful database in this article: http://e-conservation.org/issue-2/36-FORS-spectral-database#CSV It's almost exactly what I needed, except I don't understand the data. The...
  26. S

    Struggling to choose my PhD topic

    <Moderator note: Moved from academic guidance to career guidance on Scott92's request. Reason: The question primarily addresses job opportunities in dependence of PhD subject.> Hey everyone, I'm a student who is currently undertaking a Master of Physics (coursework & research) at the...
  27. F

    Edge Data Center Size Cost Estimation in Bakersfield, CA

    I want to do a cost estimation for an edge data center in Bakersfield, California. I don't know how big the center should be, and I do not know how I can view the data traffic. Can anyone help?
  28. B

    Parallel Axis Theorem Experiment

    Homework Statement I am currently working on a physics experiment to confirm the parallel axis theorem. To do this, I have the following setup: In this experiment I change the distance between the centre of the rotating disc and the central axis. I record the time for 5 complete rotations...
  29. scottdave

    Python Interesting article how to use Pandas with Excel sheets

    I came across this article about using Pandas in Python to read in a multi-tab spreadsheet to Python, and work with the data then write back to an Excel spreadsheet...
  30. S

    Split Hopkinson pressure bar data

    Currently coding a wave separation script for the split hopkinson pressure bar. Is there a place I can get raw data for the split hopkinson pressure bar to test my code?
  31. FallenApple

    Studying How to learn Topological Data Analysis

    It's a really interesting idea. I think I want to eventually add this to my toolbox. But how? I've heard it uses ideas from algebraic topology. But how much of theoretical topology do I actually need to learn? Are proofs important? I don't care about developing the algorithms from scratch. I...
  32. R

    A F-test regression test, when and how?

    I am aware that F-tests can be used to check the null hypothesis when comparing regression models if the models are nested. What I am confused about is whether I can apply an F-test to compare the following (and if so, what is the best way). I have two regression laws Y = a1*X1 + a2*X2 + b Y =...
  33. F

    How to linearize this data?

    Hi, I'm supposed to linearize this set of data: "Below is a data set which includes information about the motion of the objects in the solar system. Note: the periods are listed in Earth years (time it takes the Earth to complete one orbit around the Sun) and the average distances are reported...
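
One common way to linearize period-versus-distance data of this kind is to take logarithms of both columns: if ##T = k a^n##, then ##\log T = \log k + n \log a##, which is a straight line with slope ##n##. The sketch below assumes distances in astronomical units and uses approximate textbook values for six planets; these are not the data set from the thread, and the helper `fit_line` is a hypothetical name.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Approximate values (assumed, not from the thread):
# (average distance in AU, period in Earth years)
planets = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.86),
    "Saturn": (9.537, 29.46),
}

log_a = [math.log10(a) for a, _ in planets.values()]
log_T = [math.log10(T) for _, T in planets.values()]

slope, intercept = fit_line(log_a, log_T)
print(slope, intercept)
```

A slope close to 1.5 corresponds to Kepler's third law, ##T^2 \propto a^3##. Plotting ##T^2## against ##a^3## is an alternative linearization that gives a straight line through the origin.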
  34. franciobr

    Python workflow for experimental data analysis

    Hello! I have been using Python for my data analysis and processing needs as an experimental physicist for a year now. I have used MATLAB and OriginPro before, and Python provides me everything I need. But I am not satisfied with my workflow, specifically for plotting needs. I often find myself...
  35. W

    Segue for Presentation: SQL Server/Database and Data Mining

    Hi All, I need to do a presentation in the subject areas of SQL Server, general Database into the area of Data Mining for a job interview. Any Ideas? First thought is the use of SSAS (Analysis Service) and SSDT (Data Tools) from SQL Server in Data Mining. But this does not seem...
  36. S

    Anomaly detection in cybersecurity

    This question is primarily directed to @bapowell, but I encourage others to please add any thoughts or suggestions. Brian, I just saw your bio while reading the CMB primers, and thought you may have some ideas on cybersecurity data analytics. Some background: I've been in cybersecurity since...
  37. C

    I Participant non-compliance: How to include data in analysis

    Hello, I hope this is the appropriate section to post this question: I conducted a study in which participants were to do one of three treatment options and then write a test. The intention was to separate the participants based on the option they selected and using a one-way ANOVA, analyze...
  38. L

    Solving for Minimum point Excel

    Homework Statement Hello, I am trying to solve for the minimum value that the x and y components reach in plot (A). I know that you can perform this using IgorPro but I would like to know how to solve for it using Excel. The plot was constructed using the following data points: 110...
  39. E

    One Variable Statistics Homework Question

    Homework Statement A set of eight numbers has a median of 19. a) What is the sum of the fourth and fifth data points b) What is the sum of the fourth and sixth data points (no answers available) Homework Equations (N+1)/2 -- Position of the median (N+1)/4 -- Position of Q1 3(N+1)/4 --...
  40. MartinTheStudent

    I Do I use instrument error or arithmetic mean error?

    Hi. Let's say I have data which I have measured. For example, I measured the length of an object and the measurement was repeated 5 times. The instrument which I used to measure has an error, the value of which I know. My options are either to just go with the instrument error (probably not, right?)...
  41. B

    XRD Data Analysis Question

    The major diffraction peaks of my sample have essentially the same 2θ values as the reference (graphically), but have different heights. Can it still count as conclusive evidence that my sample matches the reference? Or does it suggest that my sample is a different substance? Also, as a side...
  42. henry wang

    How to calculate uncertainty of gradient of straight line?

    Mod note: Moved from a technical forum section, so missing the homework template. I am analysing the data from my undergrad experiment, the aim of which is to find Planck's constant by scattering x-rays off a NaCl crystal and using Bragg's law. The straight-line equation is as follows...
  43. T

    I Linear regression on data collection error

    Hi, I've collected a few sets of data and obtained significantly different linear regression results (R^2) in 2 particular sets of data. Does that indicate the 2 sets of data are not valid, which might be due to data collection error? For example, 20 sets of data contain linear regression of 0.900+...
  44. kelvin490

    MATLAB How to extract data from existing JPEG/TIFF graph?

    Is there any method to extract data from a graph in JPEG/TIFF format, using simple software such as MATLAB? I have a graph as shown below: I only have this picture but no raw data and I want to plot the black curve in this graph. How to get approximate numerical data of the x and y axis...
  45. CassiopeiaA

    A Take FFT to find time period for eclipsing binaries

    I am trying to use Kepler Data for Eclipsing Binaries to estimate time period, and then other parameters such as mass, eccentricity, semi-major axis, distance, etc. of the binary systems. I want to write code in MATLAB which will use FFT to find the time period. The available data has the...
  46. lep11

    MATLAB Data Analysis: Observation Model Problem

    The observation model will be ##z=H\theta+v##, where ##z## is column vector containing the results, ##H## is the observation matrix, ##\theta## is the parameter vector and ##v## is a vector containing random additive noise. b.) In this case ##z##=[1 1 0; 1 0 1; 0 1 1; 1 1 1]*[##m_1## ##m_2##...
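
For a linear observation model of this form, a standard estimate of the parameter vector is the ordinary least-squares solution of the normal equations ##H^T H \theta = H^T z##. The sketch below uses the observation matrix quoted in the post. Everything else is an assumption for illustration: the parameter vector is taken to have three entries (matching the three columns of ##H##), the measurement values in `z` are hypothetical and noise-free, and the helper names are made up.

```python
def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[pivot] = m[pivot], m[k]
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= factor * m[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        tail = sum(m[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (m[k][n] - tail) / m[k][k]
    return x

# Observation matrix from the post; z is hypothetical (theta = [1, 2, 3], no noise).
H = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
z = [3.0, 4.0, 5.0, 6.0]

Ht = transpose(H)
HtH = matmul(Ht, H)                                   # 3x3 normal matrix
Htz = [sum(h * zi for h, zi in zip(row, z)) for row in Ht]

theta_hat = solve(HtH, Htz)
print(theta_hat)
```

With noisy measurements the same estimate minimises the sum of squared residuals; if the noise variances differ between observations, a weighted version of the normal equations would be the usual next step.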
  47. N

    A Data Analysis: consistency of e.g. two measurements

    How do we check consistency of measured data, e.g. x1 = 1 ± 0.1 x2 = 1.4 ± 0.3 I can do this for two different samples to check for significance in terms of different means, but how do we check the internal consistency of a single set? How can we do this using: a) standard normal probability table...
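
One simple check for two measurements like these is to compare their difference with its combined uncertainty and read the result against the standard normal distribution. The sketch below assumes the two uncertainties are independent, Gaussian, one-sigma values; the function name `consistency` is hypothetical.

```python
import math

def consistency(x1, sigma1, x2, sigma2):
    """Return (z, p) for the difference of two independent measurements.

    z is the difference in units of its combined standard deviation;
    p is the two-sided probability of a difference at least this large
    if both measurements share the same true value.
    """
    combined_sigma = math.hypot(sigma1, sigma2)
    z = abs(x1 - x2) / combined_sigma
    p = math.erfc(z / math.sqrt(2.0))
    return z, p

# Values quoted in the post: x1 = 1 ± 0.1, x2 = 1.4 ± 0.3
z_score, p_value = consistency(1.0, 0.1, 1.4, 0.3)
print(z_score, p_value)
```

Under these assumptions the two values differ by about 1.3 combined standard deviations, which would not usually be taken as evidence of inconsistency.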
  48. B

    Math Career Advice -- Wanting to become a data scientist

    Hi, so currently I am stuck in a situation where I have accepted a job offer and had 5 other interviews, none of which I have heard back from yet. (I will have to say no if I hear back, I guess, and there was one I really was hoping to get but it's going to be too late now). These were for data...
  49. fluidistic

    How can scientists trust closed source programs?

    I wonder how scientists can trust closed source programs/software. How can they be sure there aren't bugs that return a wrong output every now and then? Assuming they use some kind of extensive tests that figure out whether the program behaves as it should, how can they be sure that the...
  50. U

    Background rejection / data analysis

    Hi, How can one calculate background rejection from a background sample by applying cuts?