How can I visualize this data?

  • Thread starter DaveC426913
  • Tags
    Data
In summary, the conversation revolved around extracting useful information from questionnaire data on 14,000 participants stored in a MySQL database. The initial queries filter out incomplete questionnaires and dummy users, but the process becomes tedious when filtering by gender and age. Suggestions included R for statistical analysis and Gnuplot for visualization, as well as exporting the data to a CSV file and using a pivot table in Excel for summaries. Opinions differed on how easy Gnuplot is to use, but R was generally considered a good option for serious statistical analysis. Tips were given on how to filter the data and generate plots using R.
  • #1
DaveC426913
I've acquired data on about 14,000 users who participated in my questionnaire. I have their questionnaire answers (98 questions, rated 1 to 5) as well as some user demographic data such as gender, age and honesty.

It's all sitting in a MySQL database on my webhost. I can write some queries on it, and I can export it to one of many formats (OpenDoc Calc or simply CSV), but I'm a bit overwhelmed about how I might extract all the useful, interesting information from it.

My initial queries filter out incomplete questionnaires and dummy users. Then I can filter on gender by age:
Code:
SELECT COUNT(*) FROM `Users` WHERE `Completed` = 7 AND `Dummy` IS NULL AND `Gender`= 1 AND `Age` = 2
This gets me the number of male users in their 20s.

But this is going to get tedious. Additionally, I want to display these results visually using graphs. (I've tried to figure out OpenDoc Calc's graphing feature but it's not going well.)

I know there are a million ways to do what I want, but I'd like your opinions on how to do it easily. Basic is better than advanced. I don't plan to do too much fancy manipulation.
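
On the tedium of one query per gender/age pair: a single query with GROUP BY can return the count for every combination at once. The following is only a minimal sketch under stated assumptions. It uses Python's built-in sqlite3 module with an in-memory table as a stand-in for the real MySQL database, and the sample rows are invented. The column names and the `Completed = 7` / `Dummy IS NULL` filters are taken from the query above.

```python
import sqlite3

# In-memory stand-in for the real MySQL table; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (Completed INTEGER, Dummy INTEGER, Gender INTEGER, Age INTEGER)"
)
sample_rows = [
    (7, None, 1, 2),
    (7, None, 1, 2),
    (7, None, 2, 2),
    (7, None, 2, 3),
    (3, None, 1, 2),   # incomplete questionnaire, filtered out
    (7, 1, 2, 3),      # dummy user, filtered out
]
conn.executemany("INSERT INTO Users VALUES (?, ?, ?, ?)", sample_rows)

# One grouped query replaces a separate COUNT(*) query per gender/age pair.
query = """
    SELECT Gender, Age, COUNT(*) AS n
    FROM Users
    WHERE Completed = 7 AND Dummy IS NULL
    GROUP BY Gender, Age
    ORDER BY Gender, Age
"""
counts = conn.execute(query).fetchall()
for gender, age, n in counts:
    print(f"gender={gender} age={age} count={n}")
```

The SELECT statement itself is plain SQL, so it should carry over to MySQL, and its result is a small table that can be exported and graphed directly.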
 
  • #2
I would export the complete data, e.g. to CSV, and import it into R. Since R is a very powerful tool for statistical computations, I would use it to analyze the data. In R, you can select data according to categories, values, etc. See, e.g.
http://cran.r-project.org/doc/contrib/Verzani-SimpleR.pdf
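
To make the "export, import, select by category" idea concrete, here is a rough sketch of the same workflow. It is written in Python with only the standard library rather than in R, purely for illustration. The CSV text is invented, and the column names `Q1` and `Q2` are hypothetical; `Gender` and `Age` and their codes follow the query in post #1.

```python
import csv
import io

# Invented stand-in for the CSV export; a real file would be read with
# open("export.csv", newline="") instead of io.StringIO.
sample_csv = """Gender,Age,Q1,Q2
1,2,5,3
2,2,4,4
1,3,2,5
1,2,3,1
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Select by category: gender code 1 and age code 2.
# csv.DictReader yields strings, so the codes are compared as text.
selected = [r for r in rows if r["Gender"] == "1" and r["Age"] == "2"]

# Pull one column out of the selection and summarize it.
q1_answers = [int(r["Q1"]) for r in selected]
mean_q1 = sum(q1_answers) / len(q1_answers)
print(len(selected), "rows selected; mean Q1 answer =", mean_q1)
```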
 
  • #3
For simple plotting without too complex analysis, try Gnuplot on the CSV export. It is easy to use, well documented and for this kind of analysis I always like to write scripts that I can copy/paste and modify to my liking rather than click-orgies. Gnuplot can produce publication-quality PDF or EPS. It is free and runs on Windows and Linux.

If you want to do real statistical analysis, correlations etc. then R sounds like the way to go, but I have not used that myself.
 
  • #4
I think he's looking for advice on the best way to visualize it. Presumably he doesn't find it difficult to actually generate the plots.

With data that high-dimensional, visualization is difficult. You could plot individual questions or pairs of questions, but I imagine that's not what you're interested in. I have very little experience with analyzing questionnaires, but I'd probably do some sort of factor or cluster analysis and look for large-scale differences between groups. Visualizing every question at once seems too difficult.
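
A much cruder stand-in for "looking for large-scale differences between groups" is to rank the questions by how far apart two groups' mean answers are, and plot the top few first. To be clear, this is not factor or cluster analysis, just a first-pass screen. The sketch below assumes invented data with three questions and two gender codes.

```python
from statistics import mean

# Invented sample: each row is (gender code, [answers to Q1..Q3]).
rows = [
    (1, [5, 3, 2]),
    (1, [4, 3, 1]),
    (2, [2, 3, 2]),
    (2, [1, 3, 3]),
]

n_questions = len(rows[0][1])
gaps = []
for q in range(n_questions):
    mean_1 = mean(ans[q] for g, ans in rows if g == 1)
    mean_2 = mean(ans[q] for g, ans in rows if g == 2)
    gaps.append((abs(mean_1 - mean_2), q + 1))

# Largest gap first: these are the questions worth plotting before the rest.
ranked = [q for gap, q in sorted(gaps, reverse=True)]
print("questions ranked by gap between groups:", ranked)
```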
 
  • #5
With your goal of keeping it simple, I would export the data to a CSV file, import it into Excel, and use a pivot table for summaries. If you want serious statistical analysis, R is a good choice.
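
For anyone unsure what a pivot table produces, the sketch below builds the same kind of summary by hand: counts cross-tabulated by age (rows) and gender (columns). It is an illustration in plain Python, not what Excel does internally, and the records are invented.

```python
from collections import Counter

# Invented (gender, age) codes standing in for two columns of the CSV export.
records = [(1, 2), (1, 2), (2, 2), (2, 3), (1, 3), (2, 3)]

cells = Counter(records)                      # count per (gender, age) pair
genders = sorted({g for g, _ in records})
ages = sorted({a for _, a in records})

# Rows are age codes, columns are gender codes.
pivot = {a: [cells[(g, a)] for g in genders] for a in ages}

# Print the table with a row total in the last column.
print("age", *[f"g{g}" for g in genders], "total", sep="\t")
for a in ages:
    print(a, *pivot[a], sum(pivot[a]), sep="\t")
```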
 
  • #6
Number Nine said:
I think he's looking for advice on the best way to visualize it. Presumably he doesn't find it difficult to actually generate the plots.
Not a good assumption. :smile:

I really am looking for ways to generate the plots. Not quite Graphing for Dummies - the software, but something with not too steep a learning curve. This is new for me.

Number Nine said:
With data that high-dimensional, visualization is difficult. You could plot individual questions or pairs of questions, but I imagine that's not what you're interested in. I have very little experience with analyzing questionnaires, but I'd probably do some sort of factor or cluster analysis and look for large-scale differences between groups. Visualizing every question at once seems too difficult.
No, nothing too complex; a simple breakdown is fine. I'll want to show, for example:
- the total breakdown by age and gender
- the breakdown of answers for the given questions by age and gender, etc.
But there are lots of these. I can see generating a hundred graphs or more, easy. (After all, there are 98 questions.)
awkward said:
With your goal of keeping it simple, I would export the data to a CSV file, import it into Excel, and use a pivot table for summaries. If you want serious statistical analysis, R is a good choice.
I brought it into OpenOffice Calc.

I will look up what a pivot table is, and how to generate one in OOC.

Thanks.
 
  • #7
M Quack said:
For simple plotting without too complex analysis, try Gnuplot on the CSV export. It is easy to use, well documented and for this kind of analysis I always like to write scripts that I can copy/paste and modify to my liking rather than click-orgies. Gnuplot can produce publication-quality PDF or EPS. It is free and runs on Windows and Linux.
OK, well one of the nice things about click-orgies is that I can get past File:Open without having to sit down with the manual... :grumpy:

You have a strange definition of "easy to use"... :tongue:
 
  • #8
Hey DaveC426913.

I would also recommend R and Gnuplot, particularly R if you want to get a plot up very quickly.

To read in a CSV, you use the command read.csv. You can then create new objects that hold entire columns, using something like [,1], which will grab the entire first column. So you can grab any two columns, put them into a 2D data set, and use plot to plot the data.

With regard to filtering the data by your specific questions, you can write simple functions that retain data according to characteristics. One way to do this is to write a function that takes your filtering requirements, copies only the data that fits them into a new data object, and then plots that.

Once you've got the function to generate the data and the plots, write a text file that calls the function once for each variation, generating a data object and a plot for that particular 'question', and then execute that file in the R console. You might even be able to pipe the output to a bitmap file, which means that once you run the code it will automatically save all the results for you, but I don't know if you can do this (if you can, it will make your life very, very easy compared with doing it by hand).

Also if you need to customize anything of the plot, you can supply your custom function with extra arguments for title data, axis data, scale data, color data and so on.
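
The filter-function-plus-batch-file approach described above might look roughly like this. It is a sketch in standard-library Python rather than R, the helper names (`keep_matching`, `answer_counts`, `text_bars`) and the field names are hypothetical, the rows are invented, and a crude text bar chart stands in for a real plot.

```python
from collections import Counter

def keep_matching(rows, **criteria):
    """Return the rows whose fields equal every keyword criterion."""
    return [r for r in rows if all(r[k] == v for k, v in criteria.items())]

def answer_counts(rows, question):
    """Count how many rows gave each answer 1..5 to one question."""
    tally = Counter(r[question] for r in rows)
    return [tally[a] for a in range(1, 6)]

def text_bars(counts, title):
    """Render counts as a crude text bar chart, one line per answer value."""
    lines = [title]
    for answer, n in enumerate(counts, start=1):
        lines.append(f"{answer} | {'#' * n} {n}")
    return "\n".join(lines)

# Invented sample rows; field names are hypothetical.
rows = [
    {"Gender": 1, "Age": 2, "Q1": 5, "Q2": 3},
    {"Gender": 1, "Age": 2, "Q1": 4, "Q2": 3},
    {"Gender": 2, "Age": 2, "Q1": 5, "Q2": 1},
    {"Gender": 1, "Age": 3, "Q1": 2, "Q2": 4},
]

# Filter once, then loop over the questions instead of repeating the work by hand.
men_in_20s = keep_matching(rows, Gender=1, Age=2)
results = {q: answer_counts(men_in_20s, q) for q in ("Q1", "Q2")}
for q, counts in results.items():
    print(text_bars(counts, f"{q}, Gender=1, Age=2"))
```

Extra keyword arguments for titles, axis labels and colors could be added to the plotting helper in the same way.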

I have uploaded a short reference card for R and if you decide to use it, you'll find this pdf will help you get things done faster.
 

Attachments

  • Short-refcard.pdf
    71.3 KB · Views: 175
  • #9
DaveC426913 said:
You have a strange definition of "easy to use"... :tongue:
IMO it is easy to use, but it's not easy to learn, especially if you want instant gratification.

One big win with a command line interface is that after you have got one plot looking the way you want it, it's easy to create more that look the same - and repeat the process on different data sets - though you probably won't need that till you get the results of your follow-up survey...

A tip for working with big datasets: if you want to produce 98 (or more!) similar plots, make a script file that outputs them all to a PDF. Then you can browse through them, bookmark the interesting ones, etc, with your favorite PDF viewer, without the hassle of retyping the gnuplot commands.
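
One way to avoid typing 98 near-identical plot commands is to generate the script file itself. The sketch below is an assumption-heavy illustration: it presumes a hypothetical pre-aggregated file `summary.csv` with no header row, whose first column is the answer value (1 to 5) and whose columns 2 to 99 hold the respondent counts for questions 1 to 98. The function name and file names are made up. As far as I know, gnuplot's pdfcairo terminal starts a new page for each plot command, which is what puts all the plots into one PDF.

```python
def gnuplot_script(data_file, n_questions, pdf_name):
    """Build a gnuplot script with one bar-chart page per question."""
    lines = [
        "set terminal pdfcairo",
        f"set output '{pdf_name}'",
        "set datafile separator ','",
        "set style fill solid",
        "set boxwidth 0.8",
        "set xlabel 'Answer'",
        "set ylabel 'Respondents'",
        "set yrange [0:*]",
    ]
    for q in range(1, n_questions + 1):
        lines.append(f"set title 'Question {q}'")
        # Column 1 is the answer value; column q + 1 holds the counts for question q.
        lines.append(f"plot '{data_file}' using 1:{q + 1} with boxes notitle")
    lines.append("unset output")   # close the PDF file
    return "\n".join(lines) + "\n"

script = gnuplot_script("summary.csv", 98, "all_questions.pdf")
print(script.count("\nplot "), "plot commands generated")
```

Saving the returned text to a file such as `all_questions.gp` and running `gnuplot all_questions.gp` would then produce the PDF to browse.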
 
  • #10
The biggest drawback of File:Open is that there is no way to prepare operations on plots and data in advance. Of course, you can open a file and click here and there 1e3 times, and it is (usually) intuitive, but you need to repeat it for every new data file. Even if you want to change something trivial in the data, you often need to start from scratch, throwing all your plots away. That's where scripting wins. Once you learn some basics, which indeed takes some *short* time (half a day at most to learn the principles), you become very effective. Repeatedly. This is an investment in the future :-)
 
  • #11
DaveC426913 said:
OK, well one of the nice things about click-orgies is that I can get past File:Open without having to sit down with the manual... :grumpy:

You have a strange definition of "easy to use"... :tongue:

Code:
 set datafile separator ','
 plot 'data.csv' using 1:2
sounds pretty easy to me :-)
Code:
 help plot
gets you the help text, and you can google up plenty of examples.

What I like about scripts is that you can start out very basic and then add complexity, cosmetics etc around it until your figure is ready for PRL.

The second advantage kicks in after you've used it a bit: you can recycle plots and analysis scripts (gnuplot is pretty good for fitting, too), especially if you have to fit the same scan for 100 different temperatures. Click that!

Third, examples of scripts are easier to follow (or copy/paste) than descriptions of where to click when.

But whatever program you use most is always the easiest to use.
 

