Determining the Importance of Certain Data Types (PCA?)

  • Context: Undergrad 
  • Thread starter: Pighead
  • Tags: Data, PCA

Discussion Overview

The discussion revolves around identifying important features from music files for input into a neural network aimed at predicting audio quality scores based on human evaluations. Participants explore various mathematical techniques and methodologies for feature selection and classification.

Discussion Character

  • Exploratory
  • Technical explanation
  • Mathematical reasoning

Main Points Raised

  • One participant seeks advice on determining which features are most important for predicting human scores, expressing a desire to understand the relationships between input data sets.
  • Another participant questions the definition of "most important" and asks for clarification on the specific goals of the analysis.
  • A later reply suggests that multiple regression could be a suitable approach for finding the best linear combination of features for prediction, recommending both linear and logistic regression methods.
  • There is mention of classifiers, such as Bayesian classifiers, which may help in determining the relevance of inputs and improving prediction accuracy.

Areas of Agreement / Disagreement

Participants have not reached a consensus on the best approach to feature selection and classification, with various methods being proposed and discussed without resolution.

Contextual Notes

Participants express varying levels of familiarity with mathematical concepts, and there is uncertainty regarding the specific methodologies to employ for feature importance assessment and classification.

Pighead
Hello Forum,

My first post...

I'm doing a project that extracts certain features from music files. These features will (or may) become the inputs to a neural network. I have 12 features in total, which will correspond to a maximum of 12 inputs to the neural network.

Essentially, I will have 12 columns of data, one column per feature. For example, 10 music files will produce 10 rows of data in each feature column, and amplitude could be column 1.

Anyway, here comes my maths question. I am not an expert at maths, as I've only done basic maths at university, but I am willing to learn and am a fast learner.

--------------------
I want to decide which input features (columns of data) are the most important and what relationships exist between them. Maybe some sort of classification as well, but I am not sure.

I have been told that PCA (Principal Component Analysis) could be the best way of doing this. I don't have any knowledge of it, but a Google search tells me that it involves working out standard deviations (SD) and other parameters.
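For reference, PCA works from the covariance matrix of the feature columns (the correlation matrix, once each column is standardized): its eigenvectors are the principal components, and each eigenvalue divided by the sum of all eigenvalues is the share of variance that component explains. Below is a minimal pure-Python sketch, assuming the data are a list of rows (one per music file) with one column per feature; the function names are illustrative, and it uses power iteration with deflation rather than a library eigen-solver. Note that PCA looks only at the inputs, not at the human scores, so it reveals redundancy among features rather than which ones predict quality.

```python
import math
import random
import statistics

def standardize(X):
    """Rescale each column (feature) of X to mean 0 and standard deviation 1.
    Rows are music files, columns are features."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
            for row in X]

def covariance(Z):
    """Covariance matrix of column-centred data; for standardized Z this is
    the correlation matrix of the original features."""
    n, p = len(Z), len(Z[0])
    return [[sum(row[i] * row[j] for row in Z) / n for j in range(p)] for i in range(p)]

def _mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def top_eigenpair(A, iters=1000, seed=0):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix (power iteration)."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in A]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = _mat_vec(A, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left in the remaining directions
            return 0.0, v
        v = [x / norm for x in w]
    eigval = sum(a * b for a, b in zip(v, _mat_vec(A, v)))  # Rayleigh quotient
    return eigval, v

def pca(X, n_components):
    """Return [(explained_variance_ratio, loadings), ...] for the leading
    components; loadings show how strongly each feature contributes."""
    A = covariance(standardize(X))
    total = sum(A[i][i] for i in range(len(A)))
    components = []
    for _ in range(n_components):
        lam, v = top_eigenpair(A)
        components.append((lam / total, v))
        # Deflation: subtract the component just found so the next pass finds the next one.
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(len(v))] for i in range(len(v))]
    return components
```

If the first few ratios add up to nearly 1, most of the variation in the 12 columns is captured by a few combinations of features, and features with large loadings on the same component are largely telling you the same thing.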

Also, I have been told that classifiers such as Bayesian classifiers could be worth a look.
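A Bayesian classifier needs discrete classes, so using one here assumes the human scores are first binned into labels (the labels "poor" and "good" below are hypothetical). This is a minimal Gaussian naive Bayes sketch: for each class it estimates a prior and a per-feature mean and variance, then picks the class with the highest log posterior, treating the features as independent given the class. This sketch uses every feature it is given; it does not select features by itself.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Per class: log prior plus per-feature mean and variance (Gaussian assumption)."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # Floor the variance so a feature that is constant within a class cannot divide by zero.
        variances = [max(sum((x - m) ** 2 for x in c) / len(c), var_floor)
                     for c, m in zip(cols, means)]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Return the label with the highest log posterior, treating features as
    independent given the class (the 'naive' assumption)."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, means, variances)
        )
    return max(model, key=log_posterior)
```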

I'm just looking for advice from the maths experts on here. How would you tackle the problem, and what techniques would you use? Is it important to look at the relationships between the input data sets?
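One simple way to look at the relationships between input columns is their pairwise Pearson correlation: a pair with |r| close to 1 carries largely the same linear information, so one of the two may be redundant. A minimal sketch follows; apart from amplitude, the feature names and the 0.9 threshold are illustrative assumptions.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 0.0  # correlation is undefined for a constant column; treated as 0 here
    return cov / (sa * sb)

def redundant_pairs(X, names, threshold=0.9):
    """Feature pairs whose |r| is at least `threshold` (rows of X are music files)."""
    cols = list(zip(*X))
    pairs = []
    for i, j in combinations(range(len(cols)), 2):
        r = pearson(cols[i], cols[j])
        if abs(r) >= threshold:
            pairs.append((names[i], names[j], r))
    return pairs
```

For example, `redundant_pairs(X, ["amplitude", "loudness", "tempo"])` would flag amplitude and a hypothetical loudness column if the two rise and fall together across files.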
 
DaleSpam
Hi Pighead, welcome to PF!

When you say "most important" what do you mean? In other words, what is the question you are trying to answer or the task you are trying to accomplish with your data?
 
DaleSpam said:
Hi Pighead, welcome to PF!

When you say "most important" what do you mean? In other words, what is the question you are trying to answer or the task you are trying to accomplish with your data?

Thanks.

The output of the neural network will be a prediction of the quality of the music. The training data for the neural network will be human scores for certain audio files, i.e. people grade the quality of the audio files and give each one a score. The NN will try to predict the score a human would give.

The inputs will be the features extracted from the same files the humans graded, and the outputs used to train the network will be the scores recorded from the humans.

I want to know three things:

1. How do I assess which inputs are most important in giving an accurate prediction of the human scores? I have the inputs and the expected outputs of the neural network, so how do I analyse the inputs to see which ones are most important?

2. Which inputs should be removed because they have no importance?

3. Are there any other ways of improving the accuracy of the system, e.g. classifiers that group some of the inputs in some way? I am not sure about this. Maybe I could have a different neural network for each class. I think I read that a naive Bayesian classifier can independently decide which inputs to use?

Thanks for any help.
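For questions 1 and 2, one model-agnostic option (not one named in the thread) is permutation importance: shuffle a single feature column, re-run the trained model, and measure how much the prediction error rises. A feature whose shuffling barely changes the error is a candidate for removal. The sketch assumes `model` is any callable mapping a feature row to a predicted score, for example a hypothetical wrapper around the trained network, and uses mean squared error; ideally X and y would be held-out files that were not used for training.

```python
import random

def mean_squared_error(model, X, y):
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each feature column is shuffled in turn.

    model: any callable mapping a feature row to a predicted score (assumption).
    X: list of feature rows (one per audio file); y: the human scores."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, X, y)
    importances = []
    for j in range(len(X[0])):
        total_increase = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the scores
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            total_increase += mean_squared_error(model, X_perm, y) - baseline
        importances.append(total_increase / n_repeats)
    return importances
```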
 
It sounds to me like you want a multiple regression. That will give you the best linear combination of your features for predicting the scores. You should probably try both a linear regression and a logistic regression.
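A minimal sketch of a multiple linear regression fitted by ordinary least squares: prepend a column of ones for the intercept and solve the normal equations (A^T A) b = A^T y, here with plain Gaussian elimination. The function names are illustrative; in practice a numerical library's least-squares routine is more robust than forming A^T A explicitly.

```python
def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: some features may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def fit_linear_regression(X, y):
    """Ordinary least squares. Returns [intercept, b_1, ..., b_p]."""
    A = [[1.0] + list(row) for row in X]  # column of ones for the intercept
    p = len(A[0])
    AtA = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    Aty = [sum(r[i] * t for r, t in zip(A, y)) for i in range(p)]
    return solve(AtA, Aty)

def predict(coef, row):
    return coef[0] + sum(b * x for b, x in zip(coef[1:], row))
```

If the features are standardized first, the coefficient magnitudes give a rough indication of relative importance, although this becomes unreliable when features are strongly correlated with each other. A logistic regression, the other option suggested, would need the scores rescaled to the range 0 to 1 or thresholded into classes, and is fitted iteratively rather than in closed form.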

There are also specific methods for including or excluding features as predictors, such as forward selection, backward elimination, and stepwise regression.
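One such method is greedy forward selection; backward elimination is the mirror image, starting from all features and removing them one at a time. In the sketch below, `evaluate` is a hypothetical callable returning a validation error for a given list of feature indices, for instance the held-out error of a regression fitted on those columns. It must also handle the empty list, e.g. by predicting the mean score.

```python
def forward_selection(n_features, evaluate, min_improvement=0.0):
    """Greedy forward selection over feature indices 0..n_features-1.

    Starts from the empty set and repeatedly adds the single feature that lowers
    evaluate(subset) the most, stopping when no addition improves the error by
    more than min_improvement. Returns (selected_indices, best_error)."""
    selected = []
    best_error = evaluate(selected)
    remaining = list(range(n_features))
    while remaining:
        # Error obtained by adding each remaining feature to the current subset.
        trial = {j: evaluate(selected + [j]) for j in remaining}
        j_best = min(remaining, key=trial.get)
        if best_error - trial[j_best] <= min_improvement:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best_error = trial[j_best]
    return selected, best_error
```

Because the subset is chosen by repeatedly consulting the same validation error, that error tends to be optimistic; checking the final subset on a separate set of files gives a fairer estimate.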
 

Similar threads

  • · Replies 29 ·
Replies
29
Views
7K
  • · Replies 6 ·
Replies
6
Views
2K
  • · Replies 19 ·
Replies
19
Views
2K
  • · Replies 4 ·
Replies
4
Views
1K
  • · Replies 1 ·
Replies
1
Views
2K
  • · Replies 2 ·
Replies
2
Views
2K
Replies
7
Views
3K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 1 ·
Replies
1
Views
2K
Replies
7
Views
3K