Anthropic's Benchmark Push, SenseNova 5.5 Challenges GPT-4, Google’s Green Dilemma

Plus, techniques for improving LLM stability

Before we start, recommend this week's news to your friends and colleagues for $150 in AWS credits.

Key Takeaways

  • Anthropic is funding the development of comprehensive AI benchmarks, targeting AI safety, societal implications, and performance.

  • Google's Gemini 1.5 Pro and 1.5 Flash models underperform in data analysis, correctly answering only 46.7% of true/false statements about a 520-page book, below the ~50% expected from random guessing.

  • SenseTime unveiled its SenseNova 5.5 model, claiming a 30% performance improvement over its predecessor, surpassing GPT-4 in five key metrics.

  • RankRAG uses a single language model for both context ranking and answer generation, outperforming state-of-the-art models like GPT-4 across multiple benchmarks (see the first sketch after this list).

  • DeepSeek AI's ESFT method tunes only the most relevant experts in sparse-architecture LLMs, improving tuning efficiency and performance while reducing computational resources (see the second sketch after this list).
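
To make the RankRAG takeaway more concrete, here is a minimal sketch of the single-model idea: the same LLM first scores each retrieved passage for relevance, then answers from the top-ranked ones. The `llm_generate` callable, the prompts, and the 0-10 scoring scale are illustrative assumptions, not the actual RankRAG implementation.

```python
# Sketch of the RankRAG idea: one LLM handles both context ranking and answering.
# `llm_generate` is a hypothetical prompt -> completion callable; plug in your own client.
from typing import Callable, List

def rank_then_generate(question: str,
                       passages: List[str],
                       llm_generate: Callable[[str], str],
                       top_k: int = 3) -> str:
    # 1) Ranking: ask the same LLM to score each retrieved passage's relevance.
    scored = []
    for passage in passages:
        prompt = (f"Question: {question}\nPassage: {passage}\n"
                  "On a scale of 0-10, how relevant is the passage to the question? "
                  "Reply with a single number.")
        reply = llm_generate(prompt).strip()
        try:
            score = float(reply.split()[0])
        except (ValueError, IndexError):
            score = 0.0
        scored.append((score, passage))

    # 2) Generation: answer using only the top-ranked contexts.
    top_passages = [p for _, p in sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]]
    context = "\n\n".join(top_passages)
    answer_prompt = ("Use the context below to answer the question.\n\n"
                     f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
    return llm_generate(answer_prompt)
```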
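
And for the ESFT takeaway, a rough sketch of expert-selective tuning for a Mixture-of-Experts model: rank each layer's experts by their average router weight on task data, then leave only the top few trainable. The model attributes (`moe_layers`, `router`, `experts`, `embed`) are hypothetical placeholders, not DeepSeek's actual code.

```python
# Sketch of expert-selective fine-tuning (the ESFT idea): freeze everything,
# then unfreeze only the experts the router uses most on the task data.
import torch

def expert_selective_setup(model, task_batches, top_k=2):
    # Freeze every parameter first; expert weights are re-enabled selectively below.
    for p in model.parameters():
        p.requires_grad = False

    for layer in model.moe_layers:                       # assumed: iterable of MoE layers
        scores = torch.zeros(len(layer.experts))
        with torch.no_grad():
            for batch in task_batches:
                hidden = model.embed(batch)              # assumed: hidden states fed to the router
                gate = torch.softmax(layer.router(hidden), dim=-1)   # (..., n_experts)
                # Accumulate the average routing weight per expert over all tokens.
                scores += gate.reshape(-1, gate.shape[-1]).mean(dim=0).cpu()

        # Keep only the top-k most-used experts trainable in this layer.
        keep = set(scores.topk(top_k).indices.tolist())
        for idx, expert in enumerate(layer.experts):
            if idx in keep:
                for p in expert.parameters():
                    p.requires_grad = True
```

Training then runs a standard optimizer over only the unfrozen parameters, which is where the claimed efficiency gain comes from.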

Got forwarded this newsletter? Subscribe below👇

The Talk of The Day

A16Z, the Silicon Valley heavyweight that recently announced a new $7.2B fund, has unveiled a 20,000+ NVIDIA GPU cluster that it plans to rent out to portfolio companies. We've seen a similar model in the recent Anthropic/Amazon deal (among others), so the approach seems to be catching on beyond the hyperscalers.

The Latest AI News

While Abacus AI's benchmark exposed the current limitations of even the most advanced LLMs, Anthropic decided to fund the development of new benchmarks. 

Meanwhile, the week was mixed for the other tech giants. Google faced scrutiny for releasing an environmental report that made no mention of the energy implications of its AI operations, while Microsoft shipped some impressive updates to its Phi-3 models.
