The way we currently measure AI progress is horribly flawed

One goal of the research was to define a set of criteria for what makes a good benchmark. “It is a very important issue to discuss benchmark quality, what we expect from benchmarks, and what we need them to do,” says Ivanova. “The problem is that benchmarks are not defined by a single standard. This paper is an attempt to provide a set of evaluation criteria. That is very useful.” The criteria include whether experts were involved in designing the benchmark, whether the capability it tests is well defined, and whether basics such as a feedback channel or peer review are in place. The MMLU benchmark received the lowest rating. “I disagree with this ranking,” says Dan Hendrycks, director of CAIS, the Center for AI Safety, and one of MMLU’s creators. He was an author of several of the highly ranked papers, but argues that some of the lower-ranked benchmarks are superior to them.

That said, Hendrycks still believes that the best way to move the field forward is to build better benchmarks.

Some think the criteria may be missing the bigger picture. “The paper adds something valuable. All of these criteria are important; they make benchmarks more accurate,” says Marius Hobbhahn, CEO of Apollo Research, a research group that specializes in AI evaluations. “But you could check all of these boxes and still have a terrible benchmark, because it just doesn’t measure the right thing.”

Essentially, even if a benchmark is perfectly designed, one that tests a model’s ability to provide compelling analysis of Shakespeare’s sonnets may be useless to someone who is really concerned about AI’s hacking capabilities. “You’ll find benchmarks that are supposed to measure moral reasoning, but what that means isn’t always clearly defined. Are experts in the domain involved in the development process? Often that isn’t the case,” says Amelia Hardy, another author of the paper and an AI researcher at Stanford University.

There are organizations actively trying to improve the situation. A new benchmark developed by Epoch AI, a research organization, was created with the input of 60 mathematicians and then verified as challenging by two winners of the Fields Medal, the most prestigious award in mathematics. That kind of expert participation is one of the criteria in the BetterBench assessment. The most advanced models can answer only a little more than 2% of the questions, so there is still a long way to go before the benchmark becomes saturated. “We tried to represent the entire breadth and depth of modern math research,” says Tamay Besiroglu, an associate director at Epoch AI. Despite the test’s difficulty, Besiroglu estimates that it will take only around four to five years for AI systems to perform well on it.
