I need to come up with an optimal algorithm for playing a popular guessing game which you might know by a handful of names like Bulls and Cows, Codebreaker, Guess-the-number, Mastermind(without repeating digits though).
Here's the general definitions from the wiki page:

I have thought about it and I do have a certain strategy I play with, but I kinda fail to condense it into something specific and I don't think it works always and I also doubt its the most optimal.
Any tips, resources would be helpful.

Are you trying to optimize it with respect to the number of guesses you have to make, or for time (as in, does performing what may be a computationally large amount of logic not affect the algorithm's optimality)?

If it's the guessing thing, then perhaps a scheme like this...

List the 10 P 4 = 5040 possibilities.

Inquire as to the first one in the list... say, 0123.

Then,...

if B=4, you've won the game.

if B=3, eliminate this password only.

if B=2, eliminate from your list all passwords that have at least 3 ordered digits in common with the password you guessed.

if B=1, eliminate from your list all passwords that have at least 2 ordered digits in common with the password you guessed.

if B=0, eliminate from your list all passwords that have at least 1 ordered digit in common with the password you guessed.

Then guess the next remaining number in the list.

For instance, let's have the list ordered like: 0123, 0124, 0125, ..., etc. Say your secret password is 7628.

I guess 0123. B=1, so I eliminate anything looking like 01xx, 0x2x, 0xx3, x12x, x1x3, xx23. This is 270 numbers off the list.

I then guess 0213. B=0, so I eliminate anything looking like 0xxx, x2xx, xx1x, xxx3. This will be reduce the list further, by something between 0 and 360 elements (my guess is somewhere in the neighborhood of 300).

I then guess 1032. B=0, so I eliminate anything looking like 1xxx, x0xx, xx3x, xxx2.

I then guess 2103. B=0, so I eliminate anything looking like 2xxx, x1xx, xx0x, xxx3.

I then guess 3410. B=0, so I eliminate anything looking like 3xxx, x4xx, xx1x, xxx0.

I then guess 4321. B=1, so I eliminate anything looking like 43xx, 4x2x, 4xx1, x32x, x3x1, xx21.

etc. you get the idea. It's hard to do in my head, but once you get a program down for it it can run through this algorithm like a champ.

Notice that I don't even make reference to the C's. Can the idea be improved using the C's? Or must an entirely different method be adopted if one is to use the C's (and is the other method more efficient?) Thought experiment...

Is this in the ballpark of what you thought might be interesting?

I believe the algorithm is correct, in that in principle you check all numbers (either ask about them or eliminate them if they are impossible) and you can't eliminate a possible number. It always halts for finite input and the only way it can return is if B=4... and since the list will contain all possibilities, this should be alright.

The beauty of the game is that everything carries some information (which most of the time isn't that obvious too) - including C's and stuff. Obviously the more information there is, the better.
Well since its a game I am trying to optimize for number of guesses.

I usually guess the number in 7 or less turns(if I don't make a mistake that is). General strategy is going through all the numbers first.
Ex:
Num = 2407
Guesses:
1. 1234 - 2C
2. 5678 - 1C //By now I know that either 9 or 0 is also present.
3. 5690 - 1C //Now I know it isn't 5 or 6
4. 1596 - X //Eliminate those, so we now know that there is a 0 and it isn't last !(xxx0)
...and so forth...

However you do give me an idea here - while its kinda difficult for humans to brute-force, computers can do that in a snap. But is brute-forcing ever a good strategy?
One option is enumerating all permutations and eliminating the ones not possible.
Thought I'd rather not resort to brute-forcing...

And then there's also the challenge of coming up with next number or series of numbers that give you the most information, given what you have so far.

Also it has to be general, working not just for decimal with 4 numbers to guess.

Actually, my method generalizes to 10 numbers... and it's not complete in the sense that the information about Cs can be incorporated into the algorithm to make it converge faster.

Also, what I have down can be improved considerably as-is.

It's "brute force" only in the sense that you do a lot of work in between asking...the asking part isn't really so very brute force. A "brute force" algorithm with respect to what you're optimizing would be to just ask random numbers from the list, removing them if they're wrong. Then you could expect to ask [(10 C r) / 2] times on the average and (10 C r) times in the worst case, where r is the password length.

A nice thing about the general strategy - making a list, iterating through the list and removing impossibilities - is that it is clearly and provably correct. And if you use every bit of information to its fullest extent, you can do a lot better than what's above.

Also, you mention what the next best guess is. If you have some way of determining that, well, you can further improve it by sorting the numbers in the list based on how much information they may possibly add. The algorithm given already partially does that, and depending on how you put in the C handling logic, it may not even be necessary.

The above is just the idea, not the final algorithm.
Think about it... it needs some more work, but the idea is sound.

Does this answer your questions? I didn't just want to do your work for you.

Basically, to be both correct AND optimal, you will have to prove that you consider the possibility of each number, and you do so a minimal number of times. The list elimination method is nice in that respect since you can remove impossible passwords (and show your algorithm only removes those which are impossible), that the algorithm must produce the correct answer (because it checks all possible passwords until it finds the correct one), and that it's optimal (because a faster method would somehow know it was one of several possible solutions... without asking).

I see your point. It seems to me that humans go about solving this in a different way, not so methodical, hence the feeling that this is still a bit brute-force. However it may be just that we are better at picking up on complex patterns than computers are :P

But what we have here is pretty good. I'll test this method now, then I'll do some tech work on a program(interface and stuff), and then I'll think about implementing the algorithm.
I am still wondering how to come up with the best guess(or for the purposes of different difficulty settings - differently informative ones).

I'm not satisfied. The method in post #2 is clearly nonoptimal, and I expect that the method in post #6 is also suboptimal. Either would take too long on large instances of the problem: the space needed for the game with 10 letters or digits would be ~115 TB.

The problem seems structured enough to admit a better solution.

Consider a generalization of the C&B game where the first player can give hints ("It's not 1234"). This allows the list to be in any state at a given point. Suppose it has:
1234
4567
4568

Algorithm #6 takes two guesses in the worst case, but an optimal algorithm can determine the answer in one guess.

Showing that #6 is suboptimal in the (non-generalized) C&B game could presumably be done by finding a way to reach a state like this with actual guesses.

What? 115TB???
Er... I do need it to be computationally efficient too! Enough so I can also keep history of guesses too because humans are prone to errors and I need to be able to go back and correct them.
It also needs to scale well only to a maximum of 16 P 6.

If you have an alphabet of 16 characters and select 6, there are 5,765,760 entries for the list, so it takes up 5,765,760 bits (about 700 kB). That's not too bad, but it's not good either.

And I still think that a better algorithm exists by altering the order of guesses.

Hey guys, I'm new here... but I like this problem.

I agree that the algorithm in post #6 isn't optimal... or at least it seems like it could be better.

But it looks promising... could the basic structure not be improved upon by reordering the list, or if space is a consideration, by generating a "rule" set and generating new passwords on the fly, and checking them against the rules? Then all entries wouldn't have to be kept in storage at the same time. Also, this would make it easier to come up with the next "best guess"...

Ok what about a minimax approach to minimize the number of guesses required.

Maintain a list of possibilities and eliminate all that are inconsistent with the clues so far. At each step you choose the guess that would maximize the worst case number of possibilities that would be eliminated, i.e. let n(X,Y) be the number of possibilities that would be eliminated if X is guessed and Y is the answer; let n(X) be the minimum of n(X,Y) over all possibilities Y and choose as your next guess an X that maximizes n(X).

Obviously different Y's will yield the same clue and thus the same n(X,Y) so we can determine n(X) from all possible distinct clues.

I wonder, is there ever a situation where an optimal guess comes from outside the list of possibilities?

Somehow I get the feeling that we shouldn't talk about "the guess with the highest chance of getting the most information". I don't want a probabilistic approach, even if its statistically better between games. I think randomness should only help, not harm the algo's efforts.

The idea came about as a side project to a programing task. I used to play that game a lot in class and I find it interesting. So in the light of a programing task we were given which seemed relatively easy and useless by itself I thought, what if along with the required technical part I also do this which puts the task to practical use and isn't so useless(might even be publicly releasable; Hell, some people would even charge money for something like this).

In that regard we are mixing up the practical and theoretical domain of the problem. I'll be worrying about doing the programing. Considering the maximum is 16 P 6 with a careful choice of service code algorithms I believe the program will be snappy enough.

Here we need to come up with the best theoretic deterministic algorithm math allows.

It seems like, in any event, there needs to be a way to find the "value" of a guess in terms of how much information it will give in the best case, the worst case, or on average.

The naive method - assuming every possible response in terms of B and C and then calculating the eliminated possibilities for each password seems unwieldy.

Actually that is a pretty valid issue. I often use numbers that I know not to be present to clarify information about other numbers.
Say you have
1234 - 1C
5678 - 2C
1290 - 2C
1394 - X - intentionally using the clearly not present 3/4 to determine the presence of other numbers.

It could just be, though, that this is just the most straightforward to get information, not the "best". Again, I think there needs to be a way to characterize a guess in terms of how much information it will give for a certain list of possible passwords.

Information is easy to characterize: the base-two logarithm of the number of remaining states is called "bits", which is a familiar unit. But just knowing which guess will maximize information is not enough, since perhaps one state will give more information but further guesses from that position will yield less, with a worse worst-case scenario.

Sadly I haven't really had time to busy myself a lot with that issue in the past few days and it does not seem likely I will be able to in the near future too.

However from the recent replies here an unsuspecting viewer might easily decide we are talking about chess or a chess-like game here.
So maybe there are some techniques we can borrow from chess that apply here. Any resources you can think of maybe?