Go Back   City-Data Forum > General Forums > Science and Technology > AI
08-10-2023, 06:51 PM
 

https://arxiv.org/pdf/2308.03688.pdf
https://github.com/THUDM/AgentBench

AgentBench is a new benchmark for evaluating how LLMs perform as agents in interactive, practical environments. It currently covers eight distinct environments. Five were created specifically for this benchmark: Operating System, Database, Knowledge Graph, Digital Card Game, and Lateral Thinking Puzzles. The remaining three (House-Holding, Web Shopping, and Web Browsing) are adapted from previously published datasets.
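For anyone wondering what "evaluating an LLM as an agent" looks like in practice, here is a rough Python sketch of the general idea: the agent takes turns acting in an environment, sees feedback, and is scored on whether it finishes the task. This is not AgentBench's actual code or API. `ToyEnv`, `make_bisect_agent`, and `success_rate` are made-up names for illustration, and the guessing game is just a stand-in for a real task. Only the environment names come from the post above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# The eight environments named above; True marks the five built for AgentBench,
# False marks the three adapted from earlier datasets.
ENVIRONMENTS: Dict[str, bool] = {
    "Operating System": True,
    "Database": True,
    "Knowledge Graph": True,
    "Digital Card Game": True,
    "Lateral Thinking Puzzles": True,
    "House-Holding": False,
    "Web Shopping": False,
    "Web Browsing": False,
}


@dataclass
class ToyEnv:
    """Hypothetical stand-in for an interactive task: guess a hidden number."""
    target: int
    max_turns: int = 5

    def run(self, agent: Callable[[str], int]) -> bool:
        observation = "start"
        for _ in range(self.max_turns):
            action = agent(observation)  # the agent acts on the latest feedback
            if action == self.target:
                return True              # task solved within the turn budget
            observation = "higher" if action < self.target else "lower"
        return False                     # ran out of turns


def make_bisect_agent(lo: int, hi: int) -> Callable[[str], int]:
    """Hypothetical scripted agent (binary search) standing in for an LLM."""
    state = {"lo": lo, "hi": hi, "last": lo}

    def agent(observation: str) -> int:
        if observation == "higher":
            state["lo"] = state["last"] + 1
        elif observation == "lower":
            state["hi"] = state["last"] - 1
        state["last"] = (state["lo"] + state["hi"]) // 2
        return state["last"]

    return agent


def success_rate(make_agent: Callable[[], Callable[[str], int]],
                 episodes: List[ToyEnv]) -> float:
    """Fraction of episodes solved, using a fresh agent for each episode."""
    wins = sum(env.run(make_agent()) for env in episodes)
    return wins / len(episodes)


if __name__ == "__main__":
    episodes = [ToyEnv(target=t) for t in (3, 7, 12)]
    print(success_rate(lambda: make_bisect_agent(0, 15), episodes))
```

In a real setup the scripted agent would be replaced by a call to the model, and each environment would define its own observations, actions, and scoring.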

Top commercial models like GPT-4 handled a wide range of these tasks well. However, there was a stark gap between them and their open-source peers: although open-source models have posted competitive results on other benchmarks, their performance on the varied tests in AgentBench was noticeably worse.

The toolkit, datasets, and environments are publicly available at the GitHub link above.

