DeepMind finds AI language models can self-optimize prompts.
When developers build new deep learning models, particularly models that learn relevant features from data on their own, they rely heavily on optimization algorithms, or optimizers, to reach a sufficiently high level of accuracy. A key challenge, however, arises when applying derivative-based optimizers to real-world problems, where the gradient information these methods depend on is often unavailable.
In a recent research paper, researchers at DeepMind introduced an approach called ‘Optimization by PROmpting’ (OPRO). OPRO uses large language models (LLMs) as optimizers; what distinguishes it is that the optimization task is described in natural language rather than through a formal mathematical formulation.
The researchers describe the approach as follows: ‘Rather than formally defining the optimization problem and deducing the update procedure via a programmed solver, we describe the optimization challenge in natural language. We then guide the LLM to iteratively generate novel solutions based on the problem description and prior solutions.’
The method is highly adaptable: by modifying the problem description or adding specific instructions, users can steer the LLM toward a wide range of problems.
The researchers found that on small-scale optimization tasks, LLMs can produce effective solutions through prompting alone, occasionally matching or even surpassing expert-designed heuristic algorithms. The true potential of OPRO, however, lies in its ability to optimize the prompts given to LLMs themselves, squeezing maximum accuracy out of these models.
The Mechanics of Optimization by PROmpting:
OPRO begins with a ‘meta-prompt’ as input: a natural-language description of the task, along with example problems, placeholders for prompt instructions, and corresponding solutions.
Throughout the optimization process, the large language model generates candidate solutions based on the problem description and the previous solutions included in the meta-prompt.
OPRO then evaluates these candidates and assigns each a quality score. The best solutions and their scores are added to the meta-prompt, enriching the context for the next round of generation. The process repeats until the model stops proposing better solutions.
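The loop can be summarized in a short Python sketch. This is a minimal illustration under stated assumptions: generate_with_llm and score_solution are hypothetical placeholders for an actual model call and a task-specific evaluator, and the meta-prompt wording is not the exact template from the paper.

```python
# A minimal sketch of the OPRO loop, assuming hypothetical helpers for the LLM
# call and for scoring; neither is part of DeepMind's released implementation.

def generate_with_llm(meta_prompt: str, n: int = 4) -> list[str]:
    """Placeholder for an optimizer-LLM call that returns n candidate solutions."""
    return [f"candidate {i}" for i in range(n)]  # a real call would query the model

def score_solution(solution: str) -> float:
    """Placeholder for a task-specific evaluator (e.g., loss or accuracy)."""
    return len(solution) / 100.0  # dummy score for illustration

def build_meta_prompt(task_description: str, trajectory: list[tuple[str, float]]) -> str:
    """Assemble the meta-prompt: the task description plus the best solutions so far."""
    history = "\n".join(f"solution: {s}\nscore: {v:.2f}" for s, v in trajectory)
    return (
        f"{task_description}\n\n"
        f"Previously proposed solutions and their scores:\n{history}\n\n"
        "Propose a new solution, different from those above, with a higher score."
    )

def opro_loop(task_description: str, steps: int = 10, keep_top: int = 5) -> tuple[str, float]:
    trajectory: list[tuple[str, float]] = []
    for _ in range(steps):
        meta_prompt = build_meta_prompt(task_description, trajectory)
        for candidate in generate_with_llm(meta_prompt):
            trajectory.append((candidate, score_solution(candidate)))
        # Keep only the highest-scoring solutions as context for the next round.
        trajectory = sorted(trajectory, key=lambda item: item[1])[-keep_top:]
    return trajectory[-1]  # the best (solution, score) pair found

print(opro_loop("Find an instruction that maximizes accuracy on math word problems."))
```

In a real setup, the stopping condition would typically be convergence (no higher-scoring solution appearing for several rounds) rather than a fixed number of steps.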
The primary advantage of using LLMs for optimization is their ability to understand natural language, which lets users describe optimization tasks without formal specifications. Users can specify target metrics such as ‘accuracy’ while adding supplementary instructions, such as a request for solutions that are concise and broadly applicable.
OPRO also takes advantage of LLMs’ ability to detect in-context patterns, which lets the model infer an optimization trajectory from the examples included in the meta-prompt. The researchers note that, ‘Including the optimization trajectory in the meta-prompt allows the LLM to identify similarities among solutions with high scores, encouraging the LLM to build upon existing effective solutions and potentially generate superior ones without explicit instructions on how to update the solutions.’
To assess the effectiveness of OPRO, the researchers ran experiments on two well-known mathematical optimization problems: linear regression and the ‘traveling salesman problem.’ While OPRO is not the best-suited method for solving these problems, the results were encouraging. The researchers reported that, ‘On both tasks, we observed LLMs adeptly capturing the optimization pathways in small-scale problems, primarily based on the optimization trajectory outlined in the meta-prompt.’
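For the linear regression case, for instance, a meta-prompt carrying the optimization trajectory might read roughly as follows. The wording and numbers here are invented for illustration and are not taken from the paper.

```python
# An invented example of a meta-prompt carrying the optimization trajectory for a
# two-parameter regression task; wording and loss values are illustrative only.
META_PROMPT_EXAMPLE = """\
You are helping to find values of w and b that minimize a loss function.
Below are previously proposed (w, b) pairs and their losses, where lower is better:

(w=15, b=10)  loss=1261.0
(w=17, b=18)  loss=529.0
(w=16, b=20)  loss=308.0

Give a new (w, b) pair, different from all pairs above, with a lower loss.
"""
print(META_PROMPT_EXAMPLE)
```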
Optimizing LLM Prompts with OPRO:
Experiments demonstrated that prompt engineering can significantly influence a model’s output. For example, appending the phrase ‘let’s think step by step’ to a prompt could guide the model to adopt a more reasoned approach, leading to more accurate results.
However, it’s important to emphasize that this doesn’t mean LLMs possess human-like reasoning abilities. Their responses depend heavily on the format of the prompt, and semantically similar prompts can yield substantially different outcomes. As the DeepMind researchers underscored, ‘Optimal prompt formats can be model-specific and task-specific.’
The real potential of Optimization by PROmpting lies in optimizing prompts for LLMs such as OpenAI’s ChatGPT and Google’s PaLM, searching for the instruction that maximizes a model’s accuracy on a given task.
To illustrate, consider the task of finding the best prompt for solving word-based math problems. An ‘optimizer LLM’ is given a meta-prompt containing instructions and examples, with placeholders for the optimization prompt (e.g., ‘Let’s think step by step’). The model generates a set of candidate optimization prompts, which a ‘scorer LLM’ then evaluates by testing them on example problems. The best-performing prompts and their scores are added to the top of the meta-prompt, and the process repeats; a sketch of this loop follows below.
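A minimal sketch of this loop, under the assumption of two hypothetical helpers (call_optimizer_llm and call_scorer_llm) standing in for real model APIs, might look like this; the meta-prompt text is illustrative rather than the exact template used in the paper.

```python
# A rough sketch of prompt optimization with an optimizer LLM and a scorer LLM.
# call_optimizer_llm and call_scorer_llm are hypothetical stand-ins for real API calls.

def call_optimizer_llm(meta_prompt: str, n: int = 8) -> list[str]:
    """Placeholder: ask the optimizer LLM for n new candidate instructions."""
    return [f"Candidate instruction #{i}" for i in range(n)]

def call_scorer_llm(instruction: str, question: str) -> str:
    """Placeholder: the scorer LLM answers the question with the instruction prepended."""
    return "42"  # dummy answer

def score_instruction(instruction: str, dev_set: list[tuple[str, str]]) -> float:
    """Accuracy of the scorer LLM on a small problem set when using this instruction."""
    correct = sum(call_scorer_llm(instruction, q).strip() == a for q, a in dev_set)
    return correct / len(dev_set)

def optimize_prompt(dev_set: list[tuple[str, str]], steps: int = 20, keep_top: int = 10) -> str:
    seed = "Let's solve the problem."
    scored = [(seed, score_instruction(seed, dev_set))]
    for _ in range(steps):
        history = "\n".join(f"text: {ins}\nscore: {acc:.2f}" for ins, acc in scored)
        meta_prompt = (
            "Your task is to write an instruction for solving math word problems.\n"
            f"Previous instructions and their accuracies:\n{history}\n"
            "Write a new instruction, different from those above, with higher accuracy."
        )
        for candidate in call_optimizer_llm(meta_prompt):
            scored.append((candidate, score_instruction(candidate, dev_set)))
        # Keep only the best instructions as context for the next round.
        scored = sorted(scored, key=lambda item: item[1])[-keep_top:]
    return scored[-1][0]  # the highest-scoring instruction found

example_dev_set = [("What is 6 * 7?", "42"), ("What is 40 + 2?", "42")]
print(optimize_prompt(example_dev_set))
```

The key design choice is separating the two roles: the optimizer LLM only ever sees instructions and their scores, while the scorer LLM is the model whose task performance is being measured.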
The researchers tested this technique using several LLMs from the PaLM and GPT families and found that ‘all LLMs in our evaluation are able to serve as optimizers, consistently enhancing the performance of the generated prompts through iterative optimization until convergence.’
For instance, when OPRO was applied with PaLM-2 on the GSM8K benchmark, which consists of grade school math word problems, the model started with the prompt ‘Let’s solve the problem’ and generated strings such as ‘Let’s think carefully about the problem and solve it together,’ ‘Let’s break it down,’ ‘Let’s calculate our way to the solution,’ and ultimately ‘Let’s do the math,’ which yielded the highest accuracy.
In another experiment, the most accurate result was achieved by placing the string ‘Take a deep breath and work on this problem step-by-step’ before the LLM’s answer.
These outcomes are intriguing. To a human reader, these instructions appear nearly synonymous, yet they evoked distinct responses from the LLM. This is a caution against anthropomorphizing LLMs and a reminder of how much we have yet to learn about their inner workings.
The advantage of OPRO, however, is clear. It provides a structured way to explore the vast space of possible LLM prompts and to identify the one best suited to a particular type of problem. While its usefulness in real-world settings remains to be tested, this research is a significant step toward understanding how LLMs behave.