Deleting Duplicate Values Of Matrix In Mathematica?

  1. Jun 25, 2012 #1
    Hey guys,

    I have an x-row by 3-column matrix (it's big, and the number of rows varies).

    I want to...

    ...Keep ONLY pairs of rows with the same value in their first column and delete the rest. Meaning if a row has a unique first column value or if three rows have the same value in their first column, they should be deleted.

    I think this is pretty simple, but I have read and searched for hours and just could not figure it out. Any help would be greatly appreciated! I attached a sample matrix. Thanks in advance!
     

  3. Jun 25, 2012 #2
    Sort the entries into order on the first element.
    Note: If you can guarantee that your list will always be sorted on the first element, you can skip the sort step.
    Split the entries into groups on the first element.
    Select those groups with exactly two items.
    Flatten to get rid of the extra layer of {}.
    Note: Sort changes the order. If the original order must be maintained, an extra layer of code will be needed to put the rows back into that order.
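
    The steps above can be traced one at a time on a tiny list. This is only a sketch: the list s and the intermediate names are made up for illustration, and SortBy stands in for the Sort call (it breaks ties using the canonical order of the whole row, which may differ from the original order on other data).

    ```mathematica
    (* hypothetical sample data: the first column is the grouping key *)
    s = {{3, 1, 10}, {1, 5, 20}, {2, 7, 30}, {1, 6, 40}, {2, 8, 50}, {2, 9, 60}};

    (* step 1: bring rows with equal first elements next to each other *)
    sorted = SortBy[s, First];

    (* step 2: split into runs of rows that share the same first element *)
    groups = Split[sorted, First[#1] == First[#2] &];

    (* step 3: keep only the runs that contain exactly two rows *)
    pairs = Select[groups, Length[#] == 2 &];

    (* step 4: remove the extra layer of {} introduced by Split *)
    result = Flatten[pairs, 1]
    (* {{1, 5, 20}, {1, 6, 40}} *)
    ```

    Here key 3 occurs once and key 2 occurs three times, so only the two rows with key 1 survive.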

    In[1]:=
    t={{0,26,4728}, {0,9,7111}, {1,18,4292}, {1,16,4069}, {1,15,4092}, {1,14,4931},
    {1,12,3937}, {1,10,4768}, {1,9,6860}, {1,3,11304}, {2,56,12727}, {2,34,9427},
    {2,26,4954}, {2,9,7827}, {3,9,6099}, {4,9,8408}, {5,9,7023}, {6,26,6290}, {6, 9,5565},
    {7,26,4630}, {8,57,12798}, {8,56,12633}, {8,34,11512}, {8, 9,4905}, {9,9,5863},
    {10,26,4386}, {10,10,4640}, {10,9,6889}, {11,9,5841}, {12,26,4335}, {12,24,11688},
    {12,23, 11793}, {12,9,6523}, {13,9,5137}, {14,9,6660}, {15,10,4901}, {15,9,7152},
    {16,34,11659}, {16,26,5339}, {16,25,12489}, {16,24,11601}, {16,23,11824}, {16,22, 12250},
    {16,21,11903}, {16,19,12927}, {16,9,5692}, {16,2,11727}, {16,1,11276}, {17,26,5864},
    {17,9,5809}, {18,9, 6683}, {19,26,4788}, {19,9,8497}, {20,9,6001}, {21,26, 5338},
    {21,22,11620}, {21,9,7638}, {22,26,5644}, {22,9,6466}, {23,57,9669}, {23,26,5039},
    {23,16,8929}, {23,9,8135}, {24,26,4805}, {25,56,12404}, {25,9,5348}, {26,9,5425},
    {27,57,12543}, {27, 26,4470}, {27,9,5067}, {28,57,9897}, {28,9,6840}, {29, 26,4515},
    {29,10,4640}, {29,9,8759}, {30,9,6432}, {31,9,7121}, {32,9, 5464}, {33,66,13433},
    {33,64,11891}, {33,9,6022}, {34,10,4599}, {34,9,5505}, {35,60,3928}, {35,18,4152},
    {35,14,4329}, {35, 9,8435}, {35,3,9568}, {36,26,4377}, {36,9,5208}, {37,10, 7191},
    {37,9,5856}, {38,9,9040}, {39,9,5963}, {40,10,4692}, {40,9,5291}, {41,12,4730},
    {41,10,4791}, {41,9,4963}, {42,26,4355}, {42,18,5017}, {42,10,7506}, {42, 9,6221},
    {43,10,6732}, {43,9,6788}, {44,10,9567}, {45,65,10079}, {45,10,8591}, {45,9,7140},
    {46,26,4544}, {46,10,6186}, {46,9,5624}, {47,26,4352}, {47,10,5918}, {47,9,7523},
    {48,10,5316}, {48,9,4968}, {49,56,12455}, {49,10,5012}, {49,9,6620}, {50,10,8305},
    {50,9,6685}, {51,26,4762}, {51,10,6331}, {51,9,5342}, {52,10,6211}, {52,9,5373},
    {53,26,4461}, {53,9,8686}, {54,26,4909}, {54,9,9899}, {55,64,11378}, {55,9,5085},
    {56,9,6115}, {57,9,6345}, {58,64,12127}, {58, 26,4638}, {58,9,8528}, {59,26,4664},
    {59,9,5191}, {60,24,11795}, {60,21,11849}, {60,9,6786}, {61,26,4789}, {61,23,11513},
    {61,9,7752}, {62,26,5046}, {62,24, 11795}, {62,9,7135}, {63,9,7123}, {64,26,4368},
    {64,9,8483}, {65,26,4912}, {65,9,5939}, {66,34,9153}, {66,9,5376}, {67,26,4361}, {67,9, 6403},
    {68,26,4458}, {69,10,4925}, {69,9,5038}, {70,26,4354}, {70,9,6615}, {70,2,11643},
    {71,26,4492}, {71,10,4540}, {71,9,6835}, {72,9,6554}, {73,10,4822}, {73,9,6071},
    {74,67,3729}, {74,25,10844}, {74,9,7038}, {75,65,12234}, {75,64,9213}, {75,9,5115},
    {76,67,4839}, {76,65,11794}, {76,57,12793}, {76,56,12737}, {76,26,5259}, {76,9,4920},
    {77,9,7081}, {78,26,5034}, {78,9,6640}, {79,68,5495}, {79,23,10883}, {79,9,5438},
    {80,9,5324}, {81, 26,4393}, {82,26,5050}, {82,18,4161}, {82,16,4011}, {82,15, 3997},
    {82,14,4632}, {82,10,4574}, {82,9,6427}, {83,26,4417}, {83,9,5113}, {84,9,7614},
    {85,66,12358}, {85,60,5821}, {85,53,6901}, {85,19,12958}, {85,18,4446}, {85,9,7819},
    {86,19,11022}, {86,9, 5690}, {87,9,5806}, {88,9,5846}, {89, 9,5583}, {90,64,11487},
    {90,24,10515}, {90,22,12108}, {90,9,6876}, {91,9, 9245}, {92,56,12772}, {92,34,11759},
    {92,24,11653}, {92, 23,11753}, {92,22,12229}, {92,21,11898}, {92,20,12019}, {92,19,12982},
    {92,9,9065}, {93,57, 12778}, {93,56,12772}, {93,19,12941}, {93,9,8544}, {94,9, 6319},
    {95,24,11772}, {95,22,9014}, {95,10,7549}, {95,9,7275}, {96,26, 4406}, {96,9,5441},
    {97,26,4691}, {98,26,4549}, {98,9,6709}, {99,9,5199}, {100,9,5564}, {101,23,11165}, {101,9,6683}};
    Flatten[Select[Split[Sort[t,OrderedQ[{First[#1], First[#2]}]&],First[#1]==First[#2]&],Length[#]==2&] ,1]

    Out[2]=
    {{0,26,4728},{0,9,7111},{6,26,6290},{6,9,5565}, {15,10,4901},{15,9,7152},{17,26,5864},{17,9,5809},
    {19,26,4788},{19,9,8497},{22,26,5644},{22,9,6466}, {25,56,12404},{25,9,5348},{28,57,9897},{28,9,6840},
    {34,10,4599},{34,9,5505},{36,26,4377},{36,9,5208}, {37,10,7191},{37,9,5856},{40,10,4692},{40,9,5291},
    {43,10,6732},{43,9,6788},{48,10,5316},{48,9,4968}, {50,10,8305},{50,9,6685},{52,10,6211},{52,9,5373},
    {53,26,4461},{53,9,8686},{54,26,4909},{54,9,9899}, {55,64,11378},{55,9,5085},{59,26,4664},{59,9,5191},
    {64,26,4368},{64,9,8483},{65,26,4912},{65,9,5939}, {66,34,9153},{66,9,5376},{67,26,4361},{67,9,6403},
    {69,10,4925},{69,9,5038},{73,10,4822},{73,9,6071}, {78,26,5034},{78,9,6640},{83,26,4417},{83,9,5113},
    {86,19,11022},{86,9,5690},{96,26,4406},{96,9,5441}, {98,26,4549},{98,9,6709},{101,23,11165},{101,9,6683}}
     
    Last edited: Jun 25, 2012
  4. Jun 25, 2012 #3
    Another method, perhaps simpler to understand.

    Select[t, Count[t, {First[#], __}] == 2 &]

    This will probably be slower for really large lists because it will have to do n passes over the list and do n pattern match comparisons on each pass.
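
    If the repeated passes become a problem, one possible workaround is to count each first-column value once with Tally and then look the count up per row. This is a sketch under assumptions, not a benchmarked replacement: the list s and the name countRules are made up here, and how much faster it is will depend on the data. It also leaves the rows in their original order, since nothing is sorted.

    ```mathematica
    (* hypothetical sample data: the first column is the grouping key *)
    s = {{3, 1, 10}, {1, 5, 20}, {2, 7, 30}, {1, 6, 40}, {2, 8, 50}, {2, 9, 60}};

    (* Tally gives {value, count} pairs; turn them into value -> count rules *)
    countRules = Dispatch[Rule @@@ Tally[s[[All, 1]]]];

    (* keep the rows whose first-column value occurs exactly twice *)
    result = Select[s, (First[#] /. countRules) == 2 &]
    (* {{1, 5, 20}, {1, 6, 40}} *)
    ```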
     
  5. Jun 25, 2012 #4
    Bill Simpson, you're amazing! I knew there was probably an easy way to do this but for the life of me could not figure it out! If I may ask, how did you know to use those specific functions?



    I have one last thing I need help with. Now that the data has been parsed into pairs of rows (sharing the same 1st column value), I need to:

    Check the second column integer (which refers to a channel number) of each of the paired rows:

    The row is categorized as "ANODE" if the 2nd column value is within: {1 to 35}
    The row is categorized as "CATHODE" if the 2nd column value is within: {36 to 70}

    Now, what I need Mathematica to do is KEEP every pair of rows (sharing the same 1st column value) whose 2nd column values are "ANODE and CATHODE" or vice versa, and delete the rest. Any pair that is "CATHODE and CATHODE" or "ANODE and ANODE" should be deleted.

    Seriously, thanks in advance...I'm new to Mathematica and have been working on this for hours!
     
  6. Jun 26, 2012 #5
    Hundreds and thousands of hours of studying and sharpening the tools.

    I remember 35 years ago reading a thin little book on FORTRAN 4 in half an hour. It was less than 1 cm. thick. I think it might have been yellow. I would buy one for old times if I could remember the title or recognize the cover and find a copy on the net.

    After I was done I went and asked others, "Was that it? Was that all there was to this programming thing?" When they told me that was it, I replied, "How do you get anything done if that is all there is?" I have my copy of the last printed Mathematica reference manual. It is 6 cm. thick, and it covered only a small fraction of the language as it was a decade ago. Since that time they have added a thousand or more new commands with every new version, and they have said it is impossible to ever print a new reference manual; it would take up a whole shelf.

    Why am I telling you this? Because the language is huge. Buy yourself a few good books and read them cover to cover a few times. "Mathematica Navigator" is good. "The Mathematica Cookbook" is good. "Applied Mathematica: Getting Started, Getting It Done" is very old and should be available really cheaply from the discount book dealers. I learned a lot from that fifteen years ago. This much reading should help get you started with the first few hundred hours of starting to learn the tool. Trott's epic tomes are likely too advanced and specialized for almost any reader and there are a number of other really really bad Mathematica books out there.

    Take almost exactly what we had before to find your pairs, but don't break them up into individual rows yet.

    r1 = Select[Split[Sort[t, OrderedQ[{First[#1], First[#2]}] &], First[#1] == First[#2] &], Length[#] == 2 &]

    Next filter out the acceptable pairs.

    r2 = Select[r1, #[[1, 2]] < 35.5 && #[[2,2]] > 35.5 || #[[1, 2]] > 35.5 && #[[2, 2]] < 35.5 &]

    And finally strip off one layer of extra {}, which could be done several different ways.

    r3 = Flatten[r2, 1]

    leaving you with

    {{25, 56, 12404}, {25, 9, 5348}, {28, 57, 9897}, {28, 9, 6840}, {55, 64,11378}, {55, 9, 5085}}
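
    The same filter can also be written with a small helper that names each category explicitly, which may be easier to read than the 35.5 comparisons. This is a sketch only: kind, mixedQ, and pairsSample are hypothetical names invented here, the ranges are the 1 to 35 and 36 to 70 given in the question, and anything outside them is labeled "UNKNOWN" so such pairs are dropped.

    ```mathematica
    (* hypothetical helper: label a channel number using the ranges from the question *)
    kind[ch_Integer] := Which[1 <= ch <= 35, "ANODE", 36 <= ch <= 70, "CATHODE", True, "UNKNOWN"];

    (* a pair passes only if it holds one ANODE row and one CATHODE row, in either order *)
    mixedQ[{a_, b_}] := Sort[{kind[a[[2]]], kind[b[[2]]]}] === {"ANODE", "CATHODE"};

    (* two pairs copied from the output above, still grouped as pairs *)
    pairsSample = {{{25, 56, 12404}, {25, 9, 5348}}, {{0, 26, 4728}, {0, 9, 7111}}};

    kept = Select[pairsSample, mixedQ]
    (* {{{25, 56, 12404}, {25, 9, 5348}}} *)

    (* Join @@ is one of the other ways to strip the extra layer of {} *)
    flat = Join @@ kept
    (* {{25, 56, 12404}, {25, 9, 5348}} *)
    ```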
     
  7. Jun 27, 2012 #6
    Awesome...once again, thank you! I will be taking a careful look at this code over the next few days to make sure I know exactly what is going on!
     
  8. Jun 27, 2012 #7
    Thanks for your help Bill! Now, I am curious...

    What if I wanted to define the ANODE and CATHODE values differently, as lists that are not contiguous ranges? For example:

    -The row is categorized as "ANODE" if the 2nd column value is within: {5,6,7,8,9,10,11,12,13,14,15,16,17,18,26,27,28,29,30,31,32,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,60,61,62,63,67,68,69,70}

    -The row is categorized as "CATHODE" if the 2nd column value is within: {1,2,3,4,19,20,21,22,23,24,25,33,34,37,38,39,40,56,57,58,59,64,65,66}


    I've tried...

    A. Replacing the inequality with a double equals and pasting the values:


    r2 = Select[r1, #[[1, 2]] =={VALUES1} && #[[2,2]]=={VALUES2} || #[[1, 2]] =={VALUES2} && #[[2, 2]]== {VALUES1} &]


    B. Defining an anode and cathode variable with those values:

    r2 = Select[r1, #[[1, 2]] ==var_an && #[[2,2]] ==var_cat || #[[1, 2]] ==var_cat 35.5 && #[[2, 2]] ==var_an &]


    C. Just pasting the list as I did in A but with || "ors" instead of commas.

    D. I used arrows instead of double equals, which seem to not give errors, but not the desired result (gave null result of {}).


    None of these worked haha.
     
    Last edited: Jun 27, 2012
  9. Jun 27, 2012 #8
  10. Jun 27, 2012 #9
    Thanks for the hint...very helpful and I doubt I would have ever guessed to use MemberQ.

    Anyways, here is what worked for me:


    - Defining an anode variable (an) and a cathode variable (cat).

    - Filtering the pairs with:

    r2 = Select[r1, MemberQ[an, #[[1, 2]]] && MemberQ[cat, #[[2, 2]]] || MemberQ[cat, #[[1, 2]]] && MemberQ[an, #[[2, 2]]] &];
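
    For reference, here is a self-contained sketch of the same idea on a few pairs taken from the earlier output. The Range calls are just a shorter way to type the two lists from post #7, and anodeCathodeQ and sample are hypothetical names made up for this example. The == attempts did not work because a single channel number is never equal to a whole list, whereas MemberQ tests whether the number appears in the list.

    ```mathematica
    (* the ANODE and CATHODE channel lists from post #7, written with Range *)
    an = Join[Range[5, 18], Range[26, 32], Range[41, 55], Range[60, 63], Range[67, 70]];
    cat = Join[Range[1, 4], Range[19, 25], {33, 34}, Range[37, 40], Range[56, 59], Range[64, 66]];

    (* hypothetical helper: True when one row is an ANODE channel and the other a CATHODE channel *)
    anodeCathodeQ[{a_, b_}] :=
      (MemberQ[an, a[[2]]] && MemberQ[cat, b[[2]]]) ||
      (MemberQ[cat, a[[2]]] && MemberQ[an, b[[2]]]);

    (* three pairs copied from the earlier output, still grouped as pairs *)
    sample = {{{25, 56, 12404}, {25, 9, 5348}},
              {{0, 26, 4728}, {0, 9, 7111}},
              {{101, 23, 11165}, {101, 9, 6683}}};

    kept = Select[sample, anodeCathodeQ]
    (* {{{25, 56, 12404}, {25, 9, 5348}}, {{101, 23, 11165}, {101, 9, 6683}}} *)
    ```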





    I attached the file in its entirety. Do you have any helpful suggestions or corrections?
     


  11. Jun 27, 2012 #10
    There is no 35 or 36 in an or cat. Other than that, I have nothing to add except to suggest that you carefully and extensively document your code so you will understand it next time.
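
    A quick way to check this kind of thing is to compare the two lists against the full set of channels. This sketch assumes the channels run from 1 to 70, as in the earlier ranges, and uses Range only as shorthand for the lists from post #7; missing and overlap are names made up here.

    ```mathematica
    (* the ANODE and CATHODE channel lists from post #7, written with Range *)
    an = Join[Range[5, 18], Range[26, 32], Range[41, 55], Range[60, 63], Range[67, 70]];
    cat = Join[Range[1, 4], Range[19, 25], {33, 34}, Range[37, 40], Range[56, 59], Range[64, 66]];

    (* channels between 1 and 70 that appear in neither list *)
    missing = Complement[Range[70], an, cat]
    (* {35, 36} *)

    (* channels that were put in both lists by mistake *)
    overlap = Intersection[an, cat]
    (* {} *)
    ```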
     
  12. Jun 27, 2012 #11
    Good eye! But that was done on purpose...Yes, I can't say I fully understand how r1 is done...but I will be studying it. Hope you don't mind if I bug you with more questions hahahaha :)


    Thanks for all your help though. You seriously saved me hours of work and the hairs on my head!
     