This post by @jack action
jack action said:
That is pretty easy to accomplish. Here is the method I used when doing it without any list:
- IRATE - 5 most popular letters (in English);
- CLONS - 5 next most popular letters;
- DUMPY - 5 next most popular letters (including all vowels at this point);
- BiGHa - 3 next most popular letters;
- WaKFs - 3 next most popular letters;
- The only letters left are JQVXZ
With these guesses, it is impossible not to get the solutions within 6 guesses if you know the remaining words after each guess.
If you don't check any list, it is possible to miss because of a word you cannot think of, but it rarely occurred to me - and English is not even my mother tongue. My average was somewhere between 3 and 4.
One word I remember struggling with was WALTZ. The word just did not pop into my head, even on the 6th guess. But if I had the full list for WORDLE, I would have isolated it after the 2nd guess. Even with the full list of 5-letter English words, I would have isolated it with my 3rd guess.
and @Orodruin's response
Orodruin said:
It is impossible not to know what the letters are (effectively), but does it guarantee a solution? What about words that have the same letters but in different order? Different double letters? Etc.
prompted me to investigate whether there exists a set of starting guesses that can guarantee a path to any word on the list of 2,331. As always, each guess adds 1 point to the score and the score should not exceed 6. Clearly, guess 6 must be made with 100% certainty. In summary, I established by brute force that it can be done using the same 3 words for starting guesses. That is, I found paths for all 2,331 words using 3 or fewer guesses after the initial 3.
My initial 3 words were ORATE, LIMNS, DUCHY. They are orthogonal in the sense that they have no shared letters. This turned out to be a good choice. Here are the stats.
After applying this 3-word filter,
- 63.3% of the words can be guessed with 100% certainty for a score of 4.
- 20.6% of the words formed 50-50 pairs. I gave these pairs the expectation value score of 4.5
After a 4th guess chosen on the basis of the information from the 3-word filter,
- 13.3% of the words can be guessed with 100% certainty for a score of 5.
- 2.6% of the words formed 50-50 pairs for the expectation value score of 5.5
- 0.2% (4 words) resulted in a certain score of 6 (see specific actual example below)
Example: After applying the 3-word filter and the 4th guess I ended up with BOBBY BOOBY BOOZY. If I choose any one of the 3 as the fifth guess and it fails, then the correct word can be entered as guess 6. This results in a "legitimate" score of 6 as opposed to a "wrong choice" 6 from a 50-50 pair.
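For anyone who wants to experiment with the 3-word filter, here is a minimal Python sketch of the idea. It assumes the standard Wordle coloring rules, and the names `feedback`, `signature` and `group_by_signature` are my own, not part of any simulator mentioned in this thread. The word list is a tiny illustrative sample, not the list of 2,331.

```python
from collections import Counter, defaultdict

# The three fixed starting guesses; they share no letters (15 distinct letters).
STARTERS = ("ORATE", "LIMNS", "DUCHY")


def feedback(guess: str, answer: str) -> str:
    """Color pattern for one guess: g = green, y = yellow, b = gray."""
    pattern = ["b"] * len(guess)
    # Answer letters that were not matched in place; only these can give yellows.
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = "g"
        elif unmatched[g] > 0:
            pattern[i] = "y"
            unmatched[g] -= 1
    return "".join(pattern)


def signature(word: str, guesses=STARTERS) -> tuple:
    """The colors a word would show after all the fixed guesses are played."""
    return tuple(feedback(g, word) for g in guesses)


def group_by_signature(words, guesses=STARTERS) -> dict:
    """Bundle together the words that the fixed guesses cannot tell apart."""
    groups = defaultdict(list)
    for w in words:
        groups[signature(w, guesses)].append(w)
    return dict(groups)


# Illustrative sample only (an assumption, not the real answer list).
sample = ["BOBBY", "BOOBY", "BOOZY", "JAZZY", "WALTZ"]
groups = group_by_signature(sample)
for sig, members in groups.items():
    print(sig, members)
```

In this sample BOBBY, BOOBY and BOOZY land in the same group because all three show only a yellow O and a green Y after the three starters, so a further guess is needed to separate them.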
The candidate group populations are also of interest. When the number of candidates is 5 or fewer, it is likely that one of the candidates can be used as a filter that will distinguish all the rest. A notable example is the group SPILT SPLIT STILL STILT. Any one of the four words can be used as a filter that will find the answer in no more than 2 additional guesses.
It can get a bit tricky when the number of candidates increases. My heart sank when a group of 11 candidates (the largest there is with this initial filter) popped up:
BAKER EAGER GAZER PAPER PARER RARER REBAR WAFER WAGER WAVER ZEBRA
I will spare you the details of my thought process and give you the streamlined version. For the fourth guess I used GAWPS. It is the third person singular of the verb to gawp. When one gawps, one is simultaneously gawking and gaping, kinda like watching TikTok videos for hours on end.
As you can see from the tree in the figure on the right, that would do the trick. The dashed rectangle encloses all the possible outcomes of using GAWPS as guess 4. The input group of 11 words results in an output of four 50-50 groups (gray) with scores of 5.5 and three one-word groups (green) with scores of 5.
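The split produced by GAWPS can be checked with a short Python sketch. It assumes the standard Wordle coloring rules. The helper names `feedback` and `partition` are hypothetical, chosen just for this illustration.

```python
from collections import Counter, defaultdict


def feedback(guess: str, answer: str) -> str:
    """Color pattern for one guess: g = green, y = yellow, b = gray."""
    pattern = ["b"] * len(guess)
    # Answer letters that were not matched in place; only these can give yellows.
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = "g"
        elif unmatched[g] > 0:
            pattern[i] = "y"
            unmatched[g] -= 1
    return "".join(pattern)


def partition(guess: str, candidates) -> dict:
    """Split a candidate group by the color pattern the guess would produce."""
    parts = defaultdict(list)
    for word in candidates:
        parts[feedback(guess, word)].append(word)
    return dict(parts)


GROUP_OF_11 = (
    "BAKER EAGER GAZER PAPER PARER RARER REBAR WAFER WAGER WAVER ZEBRA".split()
)

parts = partition("GAWPS", GROUP_OF_11)
# Print the one-word groups first, then the pairs.
for pattern, words in sorted(parts.items(), key=lambda kv: (len(kv[1]), kv[1])):
    print(pattern, words)
```

Run on the 11 words, this gives seven patterns: three that single out one word each (EAGER, GAZER, WAGER) and four that leave a pair, which matches the three green and four gray boxes in the tree.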
The mean score was 4.28, i.e. roughly 1.3 guesses beyond the baseline of 3. I should point out that this mean is only about one point higher than the mean of 3.24 I got with the other survey (see post #6,064). That is to be expected because in that survey I used a one-word filter (SLATE), took pains to optimize the filters and excluded words that were already used.
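As a sanity check, the mean follows from the percentages quoted above if each category is weighted by its score. The sketch below only redoes that arithmetic with the rounded percentages from this post. The variable names are mine.

```python
# (fraction of the 2,331 words, score assigned to that category),
# using the rounded percentages quoted in the stats.
outcomes = [
    (0.633, 4.0),  # certain on guess 4
    (0.206, 4.5),  # 50-50 pair after the 3-word filter
    (0.133, 5.0),  # certain on guess 5
    (0.026, 5.5),  # 50-50 pair after the 4th guess
    (0.002, 6.0),  # certain score of 6
]

total_fraction = sum(fraction for fraction, _ in outcomes)
mean_score = sum(fraction * score for fraction, score in outcomes) / total_fraction

print(f"fractions sum to {total_fraction:.3f}")
print(f"mean score = {mean_score:.2f}")
```

The weighted sum comes out at 4.28 to two decimals, in agreement with the quoted mean, so the rounded percentages are consistent with it.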
Doing this gave me a new perspective because my goal was not to find the answer using the fewest guesses and tricks of the trade. At the start, all the words have equal probability. When one applies the 3-filter operator ##\mathcal F_3 \mathcal F_2 \mathcal F_1## to any one of the 2,331 words, one reduces the 2,331-fold degeneracy to N-fold, where N is the number of candidates. Which specific words will be bundled together in the same group has nothing to do with whether a word has popular or unpopular letters but depends solely on the filters.
For example, the guess sequence to find the quintessential rare-letter word JAZZY is
ORATE → DUCHY → LIMNS → BAGGY → JAZZY
How did that come about?
##\mathcal F_3\mathcal F_2\mathcal F_1\text{(All 2331 words)}=~##(BAGGY, GAWKY, JAZZY) ##\oplus ~\mathcal S.## Here, ##\mathcal S## is the unique signature of the group shared by each of its members.
Choose ##\mathcal F_4= ## BAGGY. Then the paths to any of the three words are shown in the simulator output below. Note that the top three rows in each of the ##6\times 5## arrays are the signature of this group. The fourth row is the unique signature of each word within the group when ##\mathcal F_4## is applied to it.
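The row structure described here can be reproduced with a few lines of Python. This is only a sketch under the standard Wordle coloring rules, not the simulator that produced the output. The names `feedback` and `path_rows` are hypothetical, and the guess order follows the JAZZY sequence given above.

```python
from collections import Counter


def feedback(guess: str, answer: str) -> str:
    """Color pattern for one guess: g = green, y = yellow, b = gray."""
    pattern = ["b"] * len(guess)
    # Answer letters that were not matched in place; only these can give yellows.
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = "g"
        elif unmatched[g] > 0:
            pattern[i] = "y"
            unmatched[g] -= 1
    return "".join(pattern)


def path_rows(answer: str, guesses) -> list:
    """Rows of (guess, colors) played against one answer, stopping once solved."""
    rows = []
    for guess in guesses:
        rows.append((guess, feedback(guess, answer)))
        if guess == answer:
            break
    return rows


FILTERS = ("ORATE", "DUCHY", "LIMNS", "BAGGY")  # F1, F2, F3 and the chosen F4
GROUP = ("BAGGY", "GAWKY", "JAZZY")

# After the four filters, the final guess is the word itself.
paths = {word: path_rows(word, FILTERS + (word,)) for word in GROUP}

for word, rows in paths.items():
    print(word)
    for guess, colors in rows:
        print("   ", guess, colors)
```

The first three rows come out identical for all three words (the group signature), while the BAGGY row differs for each of them, which is what lets guess 5 be made with certainty.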
So for the daily puzzle I will (at least for a while)
- Apply ##\mathcal F_3\mathcal F_2\mathcal F_1## to the daily word to identify its group. I already know that for over 80% of the words the ensuing group is either a single word or a 50-50 pair, which makes finding the answer trivial.
- For the remaining words (under 20%), look up my previously discovered filters for the group and apply them. Since I've already done all the work, this part is also trivial.
- Post the solution with a day's delay. The next one is #1,312 two days from now.
I suspect that I will find it boring because I've already been there - done that. Nevertheless, the method needs to be field-tested.
Disclaimer
I understand that this method is not efficient. My goal was to establish that it is sufficient to guarantee a solution for all words.