# Spreadsheet help: compare two lists to find exceptions

Gold Member
TL;DR Summary
Compare a column of each of two spreadsheets to pull out exceptions
I have coerced two lists of patients into two sheets on a spreadsheet.

I need to determine who is missing from each list.

So, schematically, something like this output:

List 1:
Alice
Bob
Carol
Dave

List 2:
Alice
Bill
Bob
Dave

or something like this is fine too:

List 1:
Carol
List 2:
Bill

I'm not really picky. I am compiling my report manually, so "cheap and cheerful" is the word-of-the-day. The only reason I need to automate parts of it is because of the sheer volume: 70 doctors, totaling 16,000 patients.

I'm not sure how to Google search for such operations - whether it's "conditional formatting" or "compare sheets", etc. Which is why I'm asking for direction/inspiration.

As for comparing sheets, note that the two lists are not currently normalized:
One list might be:
Last Name, First name, Address, Gender, etc.
The other list list might be:
Doctor, Phone#, Full name, Gender, etc.

(The point I'm making is that a straight up "compare two sheets" is not really what this is. A "compare two sheets" operation is likely expecting a majority of overlap with only data/content differences.) Normalizing them could be done, but that's even more work. I'm sure I will have to at least normalize the name field(s).

Also: No hurry. Anytime in the next half hour will be fine.

Staff Emeritus
Excel has something called an intersection operator. You could use that. Syntx is tricky, as white space is significant.

Homework Helper
Gold Member
VLOOKUP seems like a nice candidate.
In my experience, MATCH is almost always a better solution than any of the (V/H/)LOOKUP family (often, as here, with the 'match-type' option set to 0). The answer here is probably to introduce a column doing the MATCHing and using that for conditional formatting (which is always a bit of a faff) or with a bit more work, ingenuity and intermediate calculations, extracting the matching rows.

DaveC426913
Gold Member
OK, making progress with the MATCH function.

I've abandoned using names because one list has Bobbie Smith and the other has Bobbie Jo Smith.
But insurance numbers are unique, so that's a lot easier.

But one list has ON 1234 567 890 YZ while the other has 1234 567 890, so now I have to see if there's a way of trimming/truncating a field...

Ah .. MID()

Gold Member
Nope. Not working.

Here's a sample:
Sheet 'H-RCP':

 7072​ Alice​ 4154​ Andy​ etc.

Sheet 'H-PSS' (sorted):

 7033​ Bill​ 7058​ Dave​ 7072​ Alice​ 7103​ Charlie​ 7113​ Annie​ etc.

Sheet 'Output', Cell A1:

 =MATCH('H-RCP'.A1; 'H-PSS'.A1:A5; 1)​

If I am using this correctly, MATCH is
looking for H-RCP cell 1: 7072
in the array H-PSS cells 1-5

It should return row 3, but it actually returns #N/A, meaning it didn't find it.

Homework Helper
Gold Member
Use 0 as the third argument; you don't need to sort them. Also make sure you are not comparing numbers with text that looks like a number (you might need to convert with VALUE or TEXT).

scottdave
Gold Member
Use 0 as the third argument; you don't need to sort them. Also make sure you are not comparing numbers with text that looks like a number (you might need to convert with VALUE or TEXT).
Yup, and yup.
Tried them as numbers and as text.

The only other difference I can think of is that one of them is, itself, the output of a function: the MID function, to strip off the prefix and suffix, thus:

B1 =MID(A1;3;10)
 ON6323609633ET​ 6323609633​

Staff Emeritus
How many names are there? If it's mall, and you don't need to repeat it often, it might be aster and cheaper to have two sorted lists side by side.

Gold Member
How many names are there? If it's mall, and you don't need to repeat it often, it might be aster and cheaper to have two sorted lists side by side.
16,000 names for 70 doctors.

Homework Helper
Gold Member
The only other difference I can think of is that one of them is, itself, the output of a function: the MID function, to strip off the prefix and suffix, thus:

B1 =MID(A1;3;10)
Yes, that will give a string (text). Convert it to a number with VALUE(MID(A1;3;10))[icode].

Gold Member
Yes, that will give a string (text). Convert it to a number with VALUE(MID(A1;3;10))[icode].
I'm comparing text.

Staff Emeritus
Give the problem, along with your worked example to ChatGPT. It can be very good at Excel. You can suggest using MATCH.

Gold Member
Really? 🤔

Staff Emeritus
Really? 🤔
ChatGPT Q & A results.
Q: I have a spreadsheet file. Sheet A has a list of ID numbers in column A, and names in column B. Sheet B has a list of ID numbers in column A, and names in column B. Write an Excel expression to join the two and return only the matches in Sheet C.
A: In Excel, you can use the VLOOKUP function ... [cut off]
Q: No, use the MATCH function.
A: You can use the MATCH function in Excel to find the row number of a specific value in a range, and then use that row number to retrieve the corresponding data. Here's how to use the MATCH function in Sheet C to find the matching ID numbers in Sheet A and Sheet B:

In Cell A2 of Sheet C, enter the following formula:

=MATCH(A2,SheetA!A:A,0)

In Cell B2 of Sheet C, enter the following formula:

=MATCH(A2,SheetB!A:A,0)

These formulas will return the row number of the matching ID number in Sheet A and Sheet B, respectively. To retrieve the corresponding name for each ID, use the INDEX function, like this:

In Cell C2 of Sheet C, enter the following formula:

=INDEX(SheetA!B:B,A2)

In Cell D2 of Sheet C, enter the following formula:

=INDEX(SheetB!B:B,B2)

These formulas will return the name corresponding to the matching ID number in Sheet A and Sheet B, respectively.

You can then drag these formulas down to the last row of your data to apply them to all the IDs in Sheet C.

Mentor
TL;DR Summary: Compare a column of each of two spreadsheets to pull out exceptions

(The point I'm making is that a straight up "compare two sheets" is not really what this is. A "compare two sheets" operation is likely expecting a majority of overlap with only data/content differences.)
Just as an aside question, have you considered just using a typical scripting programming language to do this type of task instead of Excel? Have you used Python/Perl/Tcl at all before? I will often use Excel for simple stuff, but as soon as things get a little complicated, I'm inclined to just write a program to do it instead...

Staff Emeritus
I was also going to suggest an honest to goodness database, but I suspect that the additional ease of that operation would be cancelled out by the trouble getting the data in and out.

DaveC426913
Homework Helper
Gold Member
I'm comparing text.
You sure? 6323609633 looks like a number to me (and to Excel).

Also, if you have Excel I'd be inclined to use Excel, not OpenOffice which is mostly compatible but you can never be sure how edge cases are handled.

Homework Helper
Gold Member
ChatGPT Q & A results.
Which are broken: a human expert can spot this instantly (SheetC!A2 will be a circular reference).

You really need to stop believing that a language tool can ever provide a reliable answer to technical problems.

Wrichik Basu and jack action
Gold Member
You sure? 6323609633 looks like a number to me (and to Excel).
By setting the format to text explicitly.

Also, if you have Excel I'd be inclined to use Excel, not OpenOffice which is mostly compatible but you can never be sure how edge cases are handled.
You're right, apparently this utility may have to survive me, which means I'll have to use their tools, not mine.

Gold Member
Just as an aside question, have you considered just using a typical scripting programming language to do this type of task instead of Excel? Have you used Python/Perl/Tcl at all before? I will often use Excel for simple stuff, but as soon as things get a little complicated, I'm inclined to just write a program to do it instead...
*sigh* Yeah. I made* a Python script for a very similar utility. (There's lots of data massaging that needs to happen in this office because of old. obsolete and unfriendly Record Management software).

The problem with such utilities is how to make them survive me.

It's hard enough writing several pages of instructions for my successor to export/massage/import/process the data, but if I use a python program, that adds another bunch of pages in the "Office Procedures" documentation for how to install and use Python and environment. And that drops any likelihood this program will ever be used beyond my exit to less than 1%.

pasmith
Homework Helper
Gold Member
By setting the format to text explicitly.
No, a number formatted as text is not text, it is still a number (you can verify this by adding 1 to it). This is a common source of problems when extracting numbers from strings in spreadsheets. The most reliable way around it is to make sure that anything that looks like a number is a number with VALUE, not to try to make numbers look like text.

DaveE, DaveC426913 and berkeman
Homework Helper
2022 Award
Since you have access to Excel, I would suggest taking advantage of Power Query. Load every table into your data model, apply any necessary transformations to get each dataset into a consistent format, and then analyse them.

(You will need the actual desktop version of Excel, not the Office365 app.)

Gold Member
Well this is disappointing.

It appears that a derived field does not want to be formatted as a number, so I can't MATCH it.

My Search data:

My Array to be searched:

Column B is derived from column A.
Column C is flat numbers (as a control).

The results:

Column B cannot find a MATCH to the derived number, even though they're both formatted as 'number'.
Column C (the control) MATCHes fine.

I am stumped.

Test file attached.

Gold Member
Yes, that will give a string (text). Convert it to a number with VALUE(MID(A1;3;10)).
Ah. @pbuk solved that already! Enclose it in VALUE(). (It also works to add +0)

Mentor
Dave, what was the formula you used in cell D1 of post #25? I'm using =MATCH(D1, B1:B5), but it just gives the (correct) index of the value I have in D1, of the column range B1 - B5.

I can't figure out how to use the index of a range to get the value there.

Homework Helper
Gold Member
I'm using =MATCH(D1, B1:B5), but it just gives the (correct) index of the value I have in D1, of the column range B1 - B5.
Tips for MATCH:
• Always specify the final argument, which you most often want to be 0.
• It is often best to MATCH on a whole column e.g. MATCH(D1,B:B), this avoids mistakes when you add more items to the list (and you should never put that formula in D1, or anywhere in column B as it would be a circular reference).
I can't figure out how to use the index of a range to get the value there.
You could try INDEX

Gold Member
• It is often best to MATCH on a whole column e.g. MATCH(D1,B:B), this avoids mistakes
Indeed. Too bad it doesn't work in OpenOffice Calc.

Gold Member
Indeed. Too bad it doesn't work in OpenOffice Calc.
Yes, it does:

Homework Helper
Gold Member
Indeed. Too bad it doesn't work in OpenOffice Calc.
In that case I strongly recommend you use guard rows (the narrow rows in the sheet below - I downloaded the latest OpenOffice just for you!)

Coming up: the formulas to achieve this.

Homework Helper
Gold Member
Coming up: the formulas to achieve this.
Doesn't seem easy in OpenOffice, I'll give you the entries in row 3, the rest just copy down.
Code:
C3 =MATCH(A3;B$2:B$25;0)
D3 =IF(ISNA(C3);0;MAX(D2:D$2)+1) E3 =E2+1 F3 =MATCH(E3;D$2:D$25;0) G3 =IF(ISNA(F3);"";INDEX(A$2:A$25;$F3))

Homework Helper
Gold Member
Yes, it does:
It didn't for me (OpenOffice on Windows) - that looks like LibreOffice on Linux to me.

jack action and DaveC426913
Homework Helper
Gold Member
Yeah, works in LibreOffice on Windows too, and we can display the formulas - see below (different formulas as we don't need the guard rows now).

Can you persuade your client to move to LibreOffice, apart from these there are a number of technical and commercial reasons to drop OpenOffice since the fork?

Gold Member
In my case (LibreOffice), after I begin writing a function, when I click on a column heading, it writes B1:B1048576, and then it transforms into B:B after I get out of the cell.