AutoToS is a tool that makes LLM planning easy, accurate and affordable

Large language models (LLMs) have shown promise for solving planning and reasoning problems by searching over possible solutions, but existing methods are often slow, expensive to run and unreliable. Researchers from Cornell University and IBM Research have developed AutoToS, a new technique that combines the planning ability of LLMs with the speed and accuracy of rule-based search algorithms. By removing the need for human intervention, AutoToS reduces the cost of solving planning problems, making it a promising technique for LLM applications that must reason over large solution spaces.

Thought of Search

There is a growing interest in using LLMs to handle planning problems, and researchers have developed several techniques for this purpose. The more successful techniques, such as Tree of Thoughts, use LLMs as a search algorithm that can validate solutions and propose corrections.

Thought of Search (ToS) offers an alternative approach. ToS uses LLMs to generate code for two key components of a search algorithm: the successor function, which determines how the algorithm explores nodes in the search space, and the goal function, which tests whether the algorithm has reached the desired state. This approach is much more efficient than keeping the LLM in the loop during the search process.
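To make the two components concrete, here is a minimal, hypothetical sketch of what LLM-generated successor and goal functions might look like for the 24 Game (this is illustrative, not the paper's actual generated code): a state is the list of numbers still available, a successor combines any two of them with an arithmetic operation, and the goal test checks whether a single number equal to 24 remains.

```python
from itertools import combinations

def successor(state):
    """Return all states reachable by combining two numbers with +, -, *, /."""
    results = []
    for (i, a), (j, b) in combinations(enumerate(state), 2):
        rest = [x for k, x in enumerate(state) if k not in (i, j)]
        candidates = {a + b, a - b, b - a, a * b}
        if b != 0:
            candidates.add(a / b)
        if a != 0:
            candidates.add(b / a)
        for value in candidates:
            results.append(rest + [value])
    return results

def is_goal(state):
    """A state is a goal if exactly one number remains and it equals 24."""
    return len(state) == 1 and abs(state[0] - 24) < 1e-6
```

Once such functions exist as ordinary code, any off-the-shelf search algorithm can call them millions of times without further LLM involvement.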

“Historically, in the planning community, these search components were either manually coded for each new problem or produced automatically via translation from a description in a planning language such as PDDL, which in turn was either manually coded or learned from data,” Michael Katz, principal research staff member at IBM Research, told VentureBeat. “We proposed using the large language model to generate code for the search components from the textual descriptions of the planning problems.” ToS, however, still required a human to review the generated code and refine the output, and this manual review was a bottleneck that slowed the process.

Automating ToS

AutoToS (source: arXiv)

“We felt that in order to automate the process of solving the planning problems provided in natural language, the first step must be to take the human out of that loop,” Katz said.

AutoToS automates the feedback and exception handling process using unit tests and debugging statements, combined with few-shot and chain-of-thought (CoT) prompting techniques.

AutoToS works in multiple steps. First, it provides the LLM with the problem description and prompts it to generate code for the successor and goal functions. Next, it runs unit tests on the goal function and feeds any failures back to the model, which uses that feedback to fix its code. Once the goal function passes its tests, the algorithm runs a limited breadth-first search to check that both functions are complete and sound. The process repeats until the generated functions pass all tests.
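The loop above can be sketched roughly as follows. This is a simplified reconstruction under stated assumptions, not the paper's implementation: `llm` stands in for any callable that returns Python source for the two functions, `GoalTest` and `limited_bfs` are hypothetical helpers, and the real system's prompts and checks are more elaborate.

```python
from collections import namedtuple

# A unit test for the goal function: a state and the expected boolean.
GoalTest = namedtuple("GoalTest", "state expected")

def limited_bfs(successor, is_goal, start, depth_limit):
    """Shallow breadth-first search used only to exercise the functions."""
    frontier = [start]
    for _ in range(depth_limit):
        frontier = [s for state in frontier for s in successor(state)]
        for state in frontier:
            is_goal(state)  # must not raise on any reachable state

def auto_tos(problem, start, goal_tests, llm, max_rounds=10):
    """Re-prompt `llm` with failure feedback until the code passes all checks."""
    feedback = ""
    for _ in range(max_rounds):
        code = llm(problem + feedback)
        ns = {}
        exec(code, ns)  # load the generated is_goal/successor functions
        is_goal, successor = ns["is_goal"], ns["successor"]
        # Step 1: unit-test the goal function, feed failures back.
        bad = [t for t in goal_tests if is_goal(t.state) != t.expected]
        if bad:
            feedback = f"\nFailed goal tests: {bad}\nPrevious code:\n{code}"
            continue
        # Step 2: a limited BFS exercises the successor function.
        try:
            limited_bfs(successor, is_goal, start, depth_limit=2)
        except Exception as err:
            feedback = f"\nSearch raised {err!r}\nPrevious code:\n{code}"
            continue
        return is_goal, successor  # both components passed all checks
    raise RuntimeError("no valid search components within budget")
```

The key design point is that the LLM is only consulted between rounds, never during a search, so a handful of calls suffices no matter how large the search space is.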

Finally, the validated functions are plugged into a classic search algorithm to perform the full search efficiently.
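That final stage is just textbook search. A minimal sketch, using a toy hand-written successor and goal function in place of generated ones, might look like this:

```python
from collections import deque

def bfs(start, successor, is_goal):
    """Classic breadth-first search over the state space that the
    validated successor and goal functions define; no LLM calls here."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if is_goal(state):
            return path
        for nxt in successor(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None  # no solution exists

# Toy components: count down from a number to 0 by subtracting 1 or 2.
def successor(n):
    return [m for m in (n - 1, n - 2) if m >= 0]

def is_goal(n):
    return n == 0
```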

AutoToS in action

The researchers evaluated AutoToS on several planning and reasoning tasks, including BlocksWorld, Mini Crossword and the 24 Game. The 24 Game gives you four integers and asks you to combine them with basic arithmetic into a formula that equals 24; for example, given 4, 7, 8 and 8, one solution is (7 − 8 ÷ 8) × 4. BlocksWorld, a classic AI planning domain, requires rearranging blocks into towers. The researchers tested LLMs from different families, including GPT-4o and DeepSeek Coder, using both the largest and smallest models from each family to evaluate the impact of model size on performance.

Their findings showed that with AutoToS, all models were able to identify and correct errors in their code when given feedback. The larger models generally produced more accurate goal functions and required only a few iterations to refine the successor function. Interestingly, GPT-4o-mini performed surprisingly well in terms of accuracy despite its small size.

“With just a few calls to the language model, we demonstrate that we can obtain the search components without any direct human-in-the-loop feedback, ensuring soundness, completeness, accuracy and nearly 100% accuracy across all models and all domains,” the researchers write.

Compared with other LLM-based planning approaches, AutoToS drastically reduces the number of calls to the LLM. On the 24 Game dataset, which contains 1,362 puzzles, the previous approach would have called GPT-4 roughly 100,000 times; AutoToS needed only 2.2 calls on average to generate sound search components.

“With these components, we can use the standard BFS algorithm to solve all the 1,362 games together in under 2 seconds and get 100% accuracy, neither of which is achievable by the previous approaches,” Katz said.

AutoToS for enterprise applications

AutoToS can have direct implications for enterprise applications that require planning-based solutions. It cuts the cost of using LLMs and reduces the reliance on manual labor, enabling experts to focus on high-level planning and goal specification.

“We hope that AutoToS can help with both the development and deployment of planning-based solutions,” Katz said. By using language models to create verifiable search components, it speeds up development and avoids the reliability issues that come with keeping an LLM in the search loop. “I don’t believe that there is any question about the role hybrid systems will play in the future of AI,” Harsha Kokel, research scientist at IBM, told VentureBeat. “The current language models can be viewed as hybrid systems since they perform a search to obtain the next tokens.”

While ToS and AutoToS show great promise, there is still room for further exploration.

“It is exciting to see how the landscape of planning in natural language evolves and how LLMs improve the integration of planning tools in decision-making workflows, opening up opportunities for intelligent agents of the future,” Kokel and Katz said. “We are interested in general questions of how the world knowledge of LLMs can help improve planning and acting in real-world environments.”

