- GenAI360 - Weekly AI News
Anthropic's Benchmark Push, SenseNova 5.5 Challenges GPT-4, Google’s Green Dilemma
Plus, techniques for improving LLM stability
Before we start, recommend this week's news to your friends and colleagues for $150 in AWS credits.
Key Takeaways
Anthropic is funding the development of comprehensive AI benchmarks, targeting AI safety, societal implications, and performance.
Google's Gemini 1.5 Pro and 1.5 Flash models underperform in data analysis, correctly answering 46.7% of true/false statements about a 520-page book.
SenseTime unveiled its SenseNova 5.5 model, claiming a 30% performance improvement over its predecessor and better results than GPT-4 on five key metrics.
RankRAG uses a single language model for both context ranking and answer generation, outperforming state-of-the-art models like GPT-4 across multiple benchmarks.
DeepSeek AI's ESFT method tunes only the most relevant experts in sparse-architecture LLMs, improving tuning efficiency and performance while reducing computational resources.
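To make the "tune only the most relevant experts" idea concrete, here is a minimal sketch in Python. It is not DeepSeek AI's implementation: the selection rule (keep the highest-scoring experts until they cover a chosen share of routing weight), the function names, the `threshold` parameter, and the example scores are all assumptions made for illustration.

```python
from typing import Dict, List


def select_relevant_experts(gate_scores: Dict[int, float], threshold: float = 0.8) -> List[int]:
    """Return the smallest set of experts whose combined share of routing
    weight reaches `threshold` (an assumed selection rule, for illustration)."""
    total = sum(gate_scores.values())
    if total <= 0:
        return []
    # Rank experts from most to least used on the task's sample data.
    ranked = sorted(gate_scores, key=lambda e: gate_scores[e], reverse=True)
    selected: List[int] = []
    covered = 0.0
    for expert_id in ranked:
        selected.append(expert_id)
        covered += gate_scores[expert_id] / total
        if covered >= threshold:
            break
    return sorted(selected)


def trainable_mask(num_experts: int, selected: List[int]) -> List[bool]:
    """True for experts that would be fine-tuned, False for experts kept frozen."""
    chosen = set(selected)
    return [i in chosen for i in range(num_experts)]


# Hypothetical average gate scores for four experts on one downstream task.
scores = {0: 0.05, 1: 0.60, 2: 0.25, 3: 0.10}
experts = select_relevant_experts(scores, threshold=0.8)
mask = trainable_mask(4, experts)
print(experts, mask)
```

In a real training loop, a mask like this would decide which expert parameters receive gradient updates, which is where the compute savings described above would come from.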
The Talk of The Day
A16Z, the Silicon Valley heavyweight that recently announced a new $7.2B fund, has also announced a cluster of 20,000+ NVIDIA GPUs that it plans to rent out to portfolio companies. We've seen a similar arrangement in the recent Anthropic/Amazon deal (among others), so it seems the model is being adopted beyond the hyperscalers.
The Latest AI News
While Abacus AI's benchmark exposed the current limitations of even the most advanced LLMs, Anthropic decided to fund the development of new benchmarks.
Meanwhile, the other tech giants had a mixed week. Google faced scrutiny for releasing an environmental report without mentioning the energy implications of AI operations, while Microsoft made some impressive updates to the Phi-3 models.