How can I visualize this data?


by DaveC426913
DaveC426913
#1
Apr10-12, 09:27 PM
I've acquired data on about 14,000 users who participated in my questionnaire. I have their questionnaire answers (98 questions, rated 1 to 5) as well as some user demographic data such as gender, age and honesty.

It's all sitting in a MySQL database on my webhost. I can write some queries on it, and I can export it to one of many formats (OpenDoc Calc or simply CSV), but I'm a bit overwhelmed by how I might extract all the useful, interesting information from it.

My initial queries filter out incomplete questionnaires and dummy users. Then I can filter on gender by age:
SELECT COUNT(*) FROM `Users` WHERE `Completed` = 7 AND `Dummy` IS NULL AND `Gender`= 1 AND `Age` = 2
This gets me the number of male users in their 20s.

But this is going to get tedious. Additionally, I want to display these results visually using graphs. (I've tried to figure out OpenOffice Calc's graphing feature but it's not going well.)

I know there are a million ways to do what I want, but I'd like your opinions on how to do it easily. Basic is better than advanced. I don't plan to do too much fancy manipulation.
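
One way to avoid writing a separate query for every gender/age combination is a single GROUP BY query that returns all the counts at once. The sketch below is only an illustration: it runs the query against an in-memory SQLite table filled with made-up rows rather than the real MySQL database. The column names come from the query in the post; the sample data and every code value other than Gender 1 = male and Age 2 = 20s are assumptions.

```python
import sqlite3

# Stand-in for the real MySQL table: an in-memory SQLite database with
# made-up rows. Column names follow the query in the post; the sample
# values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (Completed INTEGER, Dummy INTEGER, Gender INTEGER, Age INTEGER)"
)
sample_rows = [
    (7, None, 1, 2),
    (7, None, 1, 2),
    (7, None, 2, 2),
    (7, None, 2, 3),
    (3, None, 1, 2),  # incomplete questionnaire, filtered out
    (7, 1, 2, 3),     # dummy user, filtered out
]
conn.executemany("INSERT INTO Users VALUES (?, ?, ?, ?)", sample_rows)

# One query returns the count for every gender/age combination.
query = """
    SELECT Gender, Age, COUNT(*) AS n
    FROM Users
    WHERE Completed = 7 AND Dummy IS NULL
    GROUP BY Gender, Age
    ORDER BY Gender, Age
"""
counts = conn.execute(query).fetchall()
for gender, age, n in counts:
    print(gender, age, n)
```

The SELECT statement itself is plain SQL, so the same text should also work when run directly against the MySQL table, and the result is already in the shape a spreadsheet or plotting tool wants: one row per group.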
camillio
#2
Apr11-12, 07:10 AM
I would export the complete data, e.g. to CSV, and import it into R. R is a very powerful tool for statistical computation, so I would use it to analyze the data. In R, you can select data by category, value, etc. See, e.g.
http://cran.r-project.org/doc/contri...ni-SimpleR.pdf
M Quack
#3
Apr11-12, 07:19 AM
For simple plotting without too complex analysis, try Gnuplot on the CSV export. It is easy to use, well documented and for this kind of analysis I always like to write scripts that I can copy/paste and modify to my liking rather than click-orgies. Gnuplot can produce publication-quality PDF or EPS. It is free and runs on Windows and Linux.

If you want to do real statistical analysis, correlations etc. then R sounds like the way to go, but I have not used that myself.

Number Nine
#4
Apr11-12, 12:56 PM
I think he's looking for advice on the best way to visualize it. Presumably he doesn't find it difficult to actually generate the plots.

With data that high-dimensional, visualization is difficult. You could plot individual questions or pairs of questions, but I imagine that's not what you're interested in. I have very little experience with analyzing questionnaires, but I'd probably do some sort of factor or cluster analysis and look for large-scale differences between groups. Visualizing every question at once seems too difficult.
awkward
#5
Apr11-12, 05:07 PM
With your goal of keeping it simple, I would export the data to a CSV file, import it into Excel, and use a pivot table for summaries. If you want serious statistical analysis, R is a good choice.
DaveC426913
#6
Apr11-12, 05:42 PM
Quote by Number Nine:
I think he's looking for advice on the best way to visualize it. Presumably he doesn't find it difficult to actually generate the plots.
Not a good assumption.

I really am looking for ways to generate the plots. Not quite Graphing for Dummies - the software, but something with not too steep a learning curve. This is new for me.

Quote by Number Nine:
With data that high-dimensional, visualization is difficult. You could plot individual questions or pairs of questions, but I imagine that's not what you're interested in. I have very little experience with analyzing questionnaires, but I'd probably do some sort of factor or cluster analysis and look for large-scale differences between groups. Visualizing every question at once seems too difficult.
No, nothing too complex; a simple breakdown is fine. I'll want to show, for example:
- total breakdown by age and gender
- the breakdown of answers for the given questions by age and gender, etc.
But there are lots of these. I can see generating a hundred graphs or more, easily. (After all, there are 98 questions.)
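
The per-question breakdown by age and gender can be sketched with nothing but the Python standard library, reading the CSV export and tallying answers per group. This is a minimal sketch under stated assumptions: the column names Gender, Age and Q1 and the sample rows are hypothetical stand-ins for the real export, and the "graph" is just a row of # characters so that no plotting library is needed.

```python
import csv
import io
from collections import Counter, defaultdict

# Made-up sample standing in for the CSV export; the column names
# Gender, Age and Q1 are assumptions, not the real schema.
sample_csv = """Gender,Age,Q1
1,2,5
1,2,4
1,2,5
2,2,3
2,3,1
"""


def answer_breakdown(csv_file, question):
    """Count how often each answer was given, per (gender, age) group."""
    breakdown = defaultdict(Counter)
    for row in csv.DictReader(csv_file):
        group = (row["Gender"], row["Age"])
        breakdown[group][int(row[question])] += 1
    return breakdown


result = answer_breakdown(io.StringIO(sample_csv), "Q1")
for group, answers in sorted(result.items()):
    # One text bar per possible answer (1 to 5).
    for value in range(1, 6):
        print(group, value, "#" * answers[value])
```

Calling answer_breakdown once per question column in a loop would cover all 98 questions; with a real file, open("export.csv", newline="") would replace the io.StringIO wrapper (the file name is again hypothetical).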


Quote by awkward:
With your goal of keeping it simple, I would export the data to a CSV file, import it into Excel, and use a pivot table for summaries. If you want serious statistical analysis, R is a good choice.
I brought it into OpenOffice Calc.

I will look up what a pivot table is, and how to generate one in OOC.

Thanks.
DaveC426913
#7
Apr11-12, 08:55 PM
Quote by M Quack:
For simple plotting without too complex analysis, try Gnuplot on the CSV export. It is easy to use, well documented and for this kind of analysis I always like to write scripts that I can copy/paste and modify to my liking rather than click-orgies. Gnuplot can produce publication-quality PDF or EPS. It is free and runs on Windows and Linux.
OK, well one of the nice things about click-orgies is that I can get past File:Open without having to sit down with the manual...

You have a strange definition of "easy to use"...
chiro
#8
Apr11-12, 09:39 PM
Hey DaveC426913.

I would also recommend R and Gnuplot, particularly R if you want to get a plot up very quickly.

To read in a CSV, you use the command read.csv. You can then pull entire columns into new objects using something like [,1], which grabs the whole first column. So you can grab any two columns, put them into a 2D data set and use plot to plot the data.

To filter the data for your specific questions, you can write simple functions that keep rows according to their characteristics. One way to do this is to write a function that takes your filtering requirements, copies the input to a new data object with every row that doesn't fit the requirements removed, and then plots that.

Once you've got the function that generates the data and the plots, write a text file that calls it once for each variation (one data object and one plot per 'question') and then execute that in the R console. You might even be able to send the output to a bitmap file, which means that once you run the code it will automatically save all the results for you. I don't know if you can do this, but if you can it will make your life very, very easy.

Also, if you need to customize anything about the plot, you can supply your custom function with extra arguments for title data, axis data, scale data, color data and so on.
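
The workflow described above (read the CSV, filter by requirements, then loop over every variation) can be sketched as follows. Note that this is Python rather than R, swapped in purely for illustration; the column names, the sample rows and the helper names load_rows, keep_matching and mean_answer are all hypothetical. The sketch prints a mean answer per subset instead of drawing a plot.

```python
import csv
import io

# Made-up rows standing in for the CSV export; the column names are
# assumptions, not the real schema.
sample_csv = """Gender,Age,Q1,Q2
1,2,5,1
1,3,4,2
2,2,3,3
2,2,2,5
"""


def load_rows(csv_file):
    """Read the whole CSV into a list of dicts keyed by column name."""
    return list(csv.DictReader(csv_file))


def keep_matching(rows, **criteria):
    """Return a new list with only the rows that meet every criterion."""
    return [
        row
        for row in rows
        if all(row[column] == str(wanted) for column, wanted in criteria.items())
    ]


def mean_answer(rows, question):
    """Average answer for one question, or None if no rows are left."""
    values = [int(row[question]) for row in rows]
    return sum(values) / len(values) if values else None


rows = load_rows(io.StringIO(sample_csv))

# Each dict is one variation of the filter; the loop replaces the
# text file of repeated calls.
variations = [{"Gender": 1}, {"Gender": 2}, {"Gender": 2, "Age": 2}]
for criteria in variations:
    subset = keep_matching(rows, **criteria)
    for question in ("Q1", "Q2"):
        print(criteria, question, mean_answer(subset, question))
```

Because keep_matching returns a new list, the original rows stay untouched, which matches the idea of copying the filtered data to a new object before summarizing or plotting it.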

I have uploaded a short reference card for R and if you decide to use it, you'll find this pdf will help you get things done faster.
Attached file: Short-refcard.pdf (PDF, 71.3 KB)
AlephZero
#9
Apr11-12, 09:56 PM
Quote by DaveC426913:
You have a strange definition of "easy to use"...
IMO it is easy to use, but it's not easy to learn, especially if you want instant gratification.

One big win with a command-line interface is that after you have got one plot looking the way you want it, it's easy to create more that look the same - and repeat the process on different data sets - though you probably won't need that till you get the results of your follow-up survey...

A tip for working with big datasets: if you want to produce 98 (or more!) similar plots, make a script file that outputs them all to a PDF. Then you can browse through them, bookmark the interesting ones, etc, with your favorite PDF viewer, without the hassle of retyping the gnuplot commands.
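
One way to produce such a script file without typing 98 near-identical commands is to generate it. The sketch below builds the gnuplot script as text; it is an illustration under several assumptions: the CSV is comma-separated with no header row (or one gnuplot can skip), the first question sits in column 4, the installed gnuplot has the pdfcairo terminal, and the file names are hypothetical. The function name build_gnuplot_script is made up for this example.

```python
def build_gnuplot_script(data_file, first_column, question_count, output_pdf):
    """Return gnuplot commands that draw one answer histogram per question."""
    lines = [
        "set terminal pdfcairo",
        f"set output '{output_pdf}'",
        "set datafile separator ','",
        "set style fill solid",
        "set boxwidth 0.8",
        "set xrange [0.5:5.5]",
        "set yrange [0:*]",
    ]
    for i in range(question_count):
        column = first_column + i
        lines.append(f"set title 'Question {i + 1}'")
        # "smooth frequency" sums the 1.0 weights for each distinct
        # answer value, which gives a count per answer.
        lines.append(
            f"plot '{data_file}' using {column}:(1.0) "
            "smooth frequency with boxes notitle"
        )
    # An empty "set output" closes the PDF file.
    lines.append("set output")
    return "\n".join(lines) + "\n"


script = build_gnuplot_script(
    "data.csv", first_column=4, question_count=98, output_pdf="all_questions.pdf"
)
print(script[:200])
```

Saving the returned text to a file (say plots.gp) and running gnuplot on it should put each plot on its own page of the PDF, which can then be browsed and bookmarked as described above.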
camillio
#10
Apr12-12, 03:43 AM
The biggest drawback of File:Open is that you cannot prepare operations on plots and data in advance. Of course, you can open a file and click here and there 1e3 times, and it is (usually) intuitive, but you need to repeat it for every new data file. Even if you want to change something trivial in the data, you often need to start from scratch, throwing all plots away. That's where scripting wins. Once you learn some basics, which indeed takes some *short* time (half a day at most for learning principles), you start to be very effective. Repeatedly. This is an investment into the future :-)
M Quack
#11
Apr12-12, 08:49 AM
Quote by DaveC426913:
OK, well one of the nice things about click-orgies is that I can get past File:Open without having to sit down with the manual...

You have a strange definition of "easy to use"...
 set datafile separator ','
 plot 'data.csv' using 1:2
sounds pretty easy to me :-) (The first line tells gnuplot that the columns are comma-separated.)
 help plot
gets you the help text, and you can google up plenty of examples.

What I like about scripts is that you can start out very basic and then add complexity, cosmetics etc around it until your figure is ready for PRL.

The second advantage kicks in after you've used it a bit: you can recycle plots and analysis scripts (gnuplot is pretty good for fitting, too), especially if you have to fit the same scan for 100 different temperatures. Click that!

Third, examples of scripts are easier to follow (or copy/paste) than descriptions of where to click when.

But whatever program you use most is always the easiest to use.

