
Creating a Word Index (matlab)

  1. Jul 3, 2014 #1

    Maylis

    Gold Member

    1. The problem statement, all variables and given/known data
    I am working on Problem #2 in the attached PDF


    2. Relevant equations



    3. The attempt at a solution
    When I run this right now, I get an error in my first conditional, the one that checks whether the word is missing from the index. How do I test that a word does not exist in the index? Are there any other problems?

    Code (Text):
    function Index = InsertDoc(Index, newDoc, DocNum)
    IndexWords = {Index.Word};
    for i = 1:numel(newDoc)
       % If word is not in the Index
        if isempty(Index) || Index(i).Words ~= IndexWords
           Index(numel(IndexWords)+1).Word = newDoc{i};
           Index(numel(IndexWords)+1).Documents = DocNum;
           Index(numel(IndexWords)+1).Locations{end+1} = i;
        else
        end
        % If the word does exist in the Index, but the occurrence is unknown
        % 1st occurrence
        if Index(i).Documents == DocNum
            WordInIndex = strcmpi(newDoc{i},{Index.Word});
            Index(WordInIndex).Documents = [Index(WordInIndex).Documents,DocNum];
            Index(WordInIndex).Locations{end}(end+1)=i
          % 2nd occurrence or later
        else
            Index(WordInIndex.Locations{numel(Index(WordInIndex).Locations)+1}) = i;
        end
    end
     


  3. Jul 4, 2014 #2
    What is the error that you get?
    What does that error tell you about what is going on in the code?
    If you were asked to do this by hand, how would you go about it? And how would you then translate that into code?
     
  4. Jul 4, 2014 #3

    Maylis

    Gold Member

    Yes, here is the error I get

    Code (Text):
    Error in InsertDoc (line 4)
        if isempty(Index) || Index(i).Words ~= DocNum
    I believe something is wrong with the second part in particular. How do I say that the word does not exist in the Index?
     
  5. Jul 4, 2014 #4
    I'm not super familiar with Matlab code so take this with a grain of salt.
    It looks like you're exceeding the bounds of Index which would cause the error.

    As for how to say it: you first need to work out the logic properly, which I don't think you have done yet.
    What your check appears to say is:
    If the index is empty OR the current spot I'm checking isn't the word, then go ahead and add it.

    Think about this. Say you have a set of trading cards and your friend gives you a new one. How would you go through your cards to see whether you already have that one?
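
    One way to turn the trading-card check into MATLAB, offered as a sketch rather than the assignment's intended solution: compare the new word against every stored word with strcmpi, then collapse the result with any. The variable names and sample words below are made up for illustration. Note that ~= on two char arrays compares them character by character and generally errors when their lengths differ, which is why a string-comparison function is used instead.

    ```matlab
    % Hypothetical sample data, used only for illustration
    IndexWords = {'Matlab', 'is', 'awesome'};   % words already in the collection
    newWord = 'MATLAB';

    % strcmpi compares newWord against every cell, ignoring case,
    % and returns one logical value per cell
    matches = strcmpi(newWord, IndexWords);     % [true false false]
    alreadyThere = any(matches);                % true: at least one stored word matches

    % An empty collection needs no special case: any() of an empty result is false
    emptyMatches = strcmpi(newWord, {});
    inEmpty = any(emptyMatches);                % false
    ```

    With a test like this, "the word is not in the index" becomes ~any(strcmpi(word, {Index.Word})), and the separate isempty(Index) check is no longer needed.
    
    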
     
  6. Jul 4, 2014 #5

    Maylis

    Gold Member

    I'm trying this problem completely from scratch, so I am trying to test and debug only the first case

    Here is where I am now
    This is my code to create the index as an empty 1x0 struct array. I want the loop to look at every element of Doc1 and, if Index is empty (as it should be for i=1) or the word Doc1(i) is not in Index, append the word to the index. I am having trouble writing a conditional statement that does exactly that.

    In my test, Index should contain 'Matlab' 'is' 'awesome'.
    Code (Text):

    function Index = InitializeIndex()
    c10 = cell(1,0);
    Index = struct('Word', c10, 'Documents', c10, 'Locations', c10);
    Here is my input
    Code (Text):
    Doc1 = {'Matlab','is','awesome'};
    E7 = InitializeIndex;
    E7 = InsertDoc(E7,Doc1,1);
    Code (Text):
    function Index = InsertDoc(Index, newDoc, DocNum)
    % Index is a struct array in which each element corresponds to a unique
    % word in a group of documents. In each element, the word is stored in
    % the Word field, the numbers of the documents that contain the word are
    % stored in the Documents field, and the locations of the word in each
    % document are stored in the Locations field.
    Index = {Index.Word};
    for i = 1:numel(newDoc)
        % IndexWord is either empty or the word is not present in IndexWord
        if isempty(Index) || strcmpi(Index{i},newDoc(i))
            Index{end+1} = newDoc(i);
            Index(i).Documents = DocNum(i);
        end
    end
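
    A note on indexing that may explain part of the trouble in the code above. This is a small sketch that reuses Doc1 from the post; the other variable names are invented for illustration. Parentheses on a cell array return a smaller cell array, while braces return the contents of the cell.

    ```matlab
    Doc1 = {'Matlab', 'is', 'awesome'};   % same sample input as in the post

    asCell = Doc1(2);    % parentheses return a 1x1 cell that contains 'is'
    asChar = Doc1{2};    % braces return the char array 'is' itself

    classOfParen = class(asCell);   % 'cell'
    classOfBrace = class(asChar);   % 'char'

    % A struct field meant to hold a word should receive the brace-indexed value
    entry.Word = Doc1{2};
    ```

    Separately, Index = {Index.Word} replaces the struct array with a plain cell array of words, so a later Index(i).Documents no longer refers to a struct field.
    
    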
     
    Last edited: Jul 4, 2014
  7. Jul 4, 2014 #6

    AlephZero

    Science Advisor
    Homework Helper

    Code (Text):

    function Index = InsertDoc(Index, newDoc, DocNum)
     
    Using the same name (Index) for both an input argument and the output argument doesn't seem like a good plan. I can't find anything in the Matlab documentation that says whether it is legal or not, but even if it is legal, it's confusing.
     
  8. Jul 4, 2014 #7

    Maylis

    Gold Member

    Hey AZ. These are the directions given in the problem. The purpose of the problem is to modify the input Index. We are supposed to modify Index by adding words until all unique words are contained in the output.
     
  9. Jul 4, 2014 #8

    AlephZero

    Science Advisor
    Homework Helper

    There is no problem calling a function with something like Index = InsertDoc(Index, newDoc, DocNum). The function takes the value of Index, computes something, and then overwrites Index with it.

    But if you look at all the simple tutorial examples of how to write a function, they don't use the same variable name inside the definition of the function.

    If changing one of the names doesn't help, sorry, I don't use Matlab much these days so I'm not an expert!
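
    For what it is worth, reusing one name for an input and the output is legal in MATLAB. The function works on its own copy of the argument, and the caller only sees a change through the returned value. A minimal sketch, where AppendOne is a hypothetical function invented for this illustration:

    ```matlab
    function x = AppendOne(x)
    % AppendOne is a hypothetical name used only for this illustration.
    % x is both the input and the output: the function modifies its local
    % copy, and that copy is what gets returned to the caller.
    x(end+1) = 1;
    end
    ```

    Called as w = AppendOne([5 6]), this should return [5 6 1]. A caller's variable changes only when the result is assigned back to it, as in v = AppendOne(v), which is the same pattern as Index = InsertDoc(Index, newDoc, DocNum).
    
    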
     
  10. Jul 4, 2014 #9

    Maylis

    Gold Member

    Actually the bigger issue was that I initialized Index with the first function, then overwrote it with Index = {Index.Word}. I deleted that part, but I still can't get my conditional statement right.

    Edit: I realize what you are telling me. This is a problem, because the output (which has the same name as the input) has to appear in the function definition every time. So it seems to contradict what you were saying: the input name should not be used inside the function definition, yet the output has to be assigned there, and the output name is the same as the input name.

    I don't have a choice in the name, because I have to follow the template given.
     
    Last edited: Jul 4, 2014
  11. Jul 5, 2014 #10

    Maylis

    Gold Member

    Here is where I am at now

    Code (Text):
    function Index = InsertDoc(Index, newDoc, DocNum)
    % Index is a struct array in which each element corresponds to a unique
    % word in a group of documents. In each element, the word is stored in
    % the Word field, the numbers of the documents that contain the word are
    % stored in the Documents field, and the locations of the word in each
    % document are stored in the Locations field.
    for i = 1:numel(newDoc)
        % IndexWord is either empty or the word is not present in IndexWord
        if isempty(Index)|| strcmpi({Index.Word},newDoc{i})
            Index(end + 1).Word = newDoc{i};
        end
    end
    This is the input I am running
    Code (Text):
    Doc1 = {'Matlab','is','awesome'};
    E7 = InitializeIndex;
    E7 = InsertDoc(E7,Doc1,1);
    and here is my output, which was not expected. I was expecting E7(2) to be 'is', not a matrix dimension error.

    Code (Text):
    E7(1)

    ans =

             Word: 'Matlab'
        Documents: []
        Locations: []

    EDU>> E7(2)
    Index exceeds matrix dimensions.
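
    A likely reason only 'Matlab' ends up in E7: strcmpi({Index.Word}, newDoc{i}) is true when the word is already in the index. On the first pass isempty(Index) is true, so 'Matlab' is added. After that the condition is false for 'is' and 'awesome', so nothing more is appended and E7(2) does not exist. Negating the test with ~any(...) would add the words that are missing.

    Below is a sketch of one way the whole function could look. The problem PDF is not reproduced in this thread, so the data layout is an assumption based on the earlier attempts: Documents is a numeric vector of document numbers, and Locations is a cell array holding one vector of word positions per entry in Documents. Matching is case-insensitive only because the posts use strcmpi.

    ```matlab
    function Index = InsertDoc(Index, newDoc, DocNum)
    % Sketch only. Assumed layout (not confirmed by the problem statement):
    %   Index(k).Word      - char array, one unique word
    %   Index(k).Documents - numeric vector of document numbers containing the word
    %   Index(k).Locations - cell array, one vector of positions per document
    for i = 1:numel(newDoc)
        word = newDoc{i};                           % braces give the char array
        k = find(strcmpi(word, {Index.Word}), 1);   % position in the index, [] if absent
        if isempty(k)
            % Case 1: the word is not in the index yet, so append a new element
            k = numel(Index) + 1;
            Index(k).Word = word;
            Index(k).Documents = DocNum;
            Index(k).Locations = {i};
        else
            d = find(Index(k).Documents == DocNum, 1);
            if isempty(d)
                % Case 2: known word, first occurrence in this document
                Index(k).Documents(end+1) = DocNum;
                Index(k).Locations{end+1} = i;
            else
                % Case 3: known word, repeated within this document
                Index(k).Locations{d}(end+1) = i;
            end
        end
    end
    end
    ```

    Starting from the empty index returned by InitializeIndex, inserting {'Matlab','is','awesome'} as document 1 should give three elements under these assumptions, so E7(2).Word would be 'is'.
    
    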
