NLP Algorithm Evaluation: Help?

    Hello guys,

    We're beginners here, so please excuse us if our questions are inappropriate in any way. We have a statistics-related problem, and we'll explain it as briefly as possible.

    Here it goes. We're working on an algorithm that analyses sentences of text (using keywords and heuristic rules). The algorithm is supposed to judge the sentences the way humans would. That is why we want to COMPARE THE RESULTS given by the PROGRAM with the results given by HUMANS. The more they match, the better.

    We ran a study on this: we have around 150 sentences that were analysed both by the program and by a group of students.

    There are 6 parameters that should be evaluated for each sentence. Each parameter can take a value between 0 and 1 (0 <= x <= 1). For each sentence we collected:

    (a) 6 values given by the algorithm;

    (b) around 30 x 6 values given by students (we asked around 30 students to evaluate each sentence, because a single person's answer would be too subjective).
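
    To make the data layout concrete, here is a rough sketch in Python. The names (algo, human) are just ours for illustration, the numbers are random placeholders rather than our real results, and we assume exactly 30 students per sentence for simplicity, although in reality the count varies a little.

```python
import random

N_SENTENCES = 150  # "around 150" in our study
N_PARAMS = 6       # parameters evaluated per sentence
N_RATERS = 30      # "around 30"; assumed constant here for simplicity

rng = random.Random(0)  # placeholder data only, NOT our real results

# algo[s][p]: the algorithm's value for parameter p of sentence s
algo = [[rng.random() for _ in range(N_PARAMS)]
        for _ in range(N_SENTENCES)]

# human[s][r][p]: student r's value for parameter p of sentence s
human = [[[rng.random() for _ in range(N_PARAMS)]
          for _ in range(N_RATERS)]
         for _ in range(N_SENTENCES)]

print(len(algo), len(algo[0]))                    # sentences x parameters
print(len(human), len(human[0]), len(human[0][0]))  # sentences x students x parameters
```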

    Now we want to summarize the human ratings for each sentence and then compare those aggregated human results with the algorithm's results.
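
    To show what we mean, here is a minimal sketch in Python of one possible approach. The choices in it are only our first guesses and may well be wrong: we take the mean over students as the summary, and then, for each parameter, compute the Pearson correlation and the mean absolute error between the human means and the algorithm values across sentences. The toy data (3 sentences, 2 students, 2 parameters) is made up, and all function names are our own.

```python
from math import sqrt
from statistics import mean

def aggregate_humans(ratings):
    """ratings[r][p] for one sentence -> per-parameter mean over students."""
    n_params = len(ratings[0])
    return [mean(r[p] for r in ratings) for p in range(n_params)]

def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either list is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mean_abs_error(xs, ys):
    """Average absolute difference between paired values."""
    return mean(abs(x - y) for x, y in zip(xs, ys))

# Made-up toy data: human[s][r][p] and algo[s][p]
human = [
    [[0.2, 0.8], [0.4, 0.6]],
    [[0.5, 0.5], [0.7, 0.3]],
    [[0.9, 0.1], [0.9, 0.3]],
]
algo = [[0.3, 0.6], [0.6, 0.4], [0.9, 0.2]]

# Step 1: summarize the students' ratings for each sentence
human_mean = [aggregate_humans(sentence) for sentence in human]

# Step 2: compare per parameter, across sentences
results = []
for p in range(len(algo[0])):
    h = [row[p] for row in human_mean]
    a = [row[p] for row in algo]
    results.append((pearson(h, a), mean_abs_error(h, a)))

for p, (r, mae) in enumerate(results):
    print(f"parameter {p}: r = {r:.3f}, MAE = {mae:.3f}")
```

    We are not sure whether the mean is the right summary (the median might be more robust), or whether correlation is the right measure at all, since it ignores how much the students disagree among themselves.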

    What do you suggest is the best way to do it? Which statistical test should we use?

    We plan to use SPSS, but if you know of better software, please say so.


    Friends from the University of Belgrade.
