Self-consistency also builds on CoT, aiming to replace the “naive” greedy decoding used in standard CoT prompting.
The intuition is really simple: a model can generate several plausible reasoning paths for a math question that all arrive at the same correct answer; it can also produce incorrect reasoning paths, but those are much less likely to converge on the same answer.
The procedure:
- Prompt the model with a set of manually written chain-of-thought exemplars
- Sample a diverse set of candidate outputs from the LLM’s decoder (temperature > 0)
- Aggregate the final answers and choose the most consistent one, e.g. by majority vote
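The aggregation step above can be sketched as a majority vote over the final answers parsed from each sampled reasoning path. Everything here is illustrative: the `extract_answer` heuristic assumes each chain ends with "The answer is ...", and the sample strings stand in for real decoder samples.

```python
from collections import Counter

def extract_answer(response: str) -> str:
    # Assumption: each sampled chain ends with "The answer is <x>."
    return response.rsplit("The answer is", 1)[-1].strip(" .")

def self_consistency(responses: list[str]) -> str:
    # Majority vote over the final answers of all sampled reasoning paths.
    answers = [extract_answer(r) for r in responses]
    return Counter(answers).most_common(1)[0][0]

# Hypothetical decoder samples (temperature > 0) for one math question.
samples = [
    "She has 3 + 4 = 7 apples, eats 2, so 5 remain. The answer is 5.",
    "3 + 4 = 7; 7 - 2 = 5. The answer is 5.",
    "First 3 + 4 = 7, then 7 - 2 = 5. The answer is 5.",
    "She ends with 3 + 4 - 1 = 6 apples. The answer is 6.",  # faulty path
]

print(self_consistency(samples))  # → 5
```

Note that the vote is over the extracted *answers* only, so two chains with different intermediate reasoning still reinforce each other as long as they agree on the final result.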