A Small Language Model Week, GPT-4o Mini, Llama 3.1 405B Leaked

Plus, Mistral's new Trifecta of Models, & Lessons from Google Cloud’s Early Missteps

Before we start, share this week's news with a friend or a colleague.

Key Takeaways

  • On the theme of accidents, Llama 3.1 405B was allegedly leaked on HuggingFace (and made available for download on 4chan) yesterday. Read below for what we know about the model ahead of today's official release.

  • OpenAI unveiled GPT-4o mini, a compact and cost-effective AI model for ChatGPT that outperforms leading small AI models on reasoning tasks while being 60% cheaper to operate than GPT-3.5 Turbo.

  • Salesforce released xLAM, a family of models for autonomous task planning and execution, with the 7B model achieving 88.24% on the Berkeley Function-Calling Leaderboard (BFCL).

  • HuggingFace’s SmolLM is a new line of efficient small language models designed for local devices. Available in 135M, 360M, and 1.7B parameter sizes, they outperform similarly sized models like GPT-2 and MobileLM across various benchmarks (see the first sketch after this list).

  • FlashAttention-3 achieves up to a 2× speedup in the attention mechanism by combining producer-consumer asynchrony with hardware-accelerated low-precision (FP8) operations (second sketch below).

  • LMMs-Eval proposes a unified evaluation framework for multimodal AI, balancing task diversity, human alignment, and efficiency to enable standardized model comparisons.
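For the tinkerers, two quick sketches. First, the SmolLM checkpoints are plain causal LMs on the HuggingFace Hub, so running one locally takes a few lines of transformers code. A minimal sketch, assuming the repo IDs from the release announcement (HuggingFaceTB/SmolLM-135M, plus -360M and -1.7B variants):

```python
# Minimal local-inference sketch for SmolLM using HuggingFace transformers.
# Repo ID assumed from the release announcement; swap in -360M or -1.7B as needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM-135M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Small language models are great because", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Second, FlashAttention is a drop-in fused attention kernel. The sketch below uses the stable flash-attn package's flash_attn_func; FlashAttention-3's Hopper beta advertises a similar interface, but we haven't verified its exact import path, so treat this as illustrative:

```python
# Fused-attention sketch via the flash-attn package (needs a CUDA GPU; fp16/bf16 only).
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 4096, 16, 64
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k, v = torch.randn_like(q), torch.randn_like(q)

# Causal (autoregressive) attention, computed without materializing the full
# seqlen x seqlen score matrix; output has the same shape as q.
out = flash_attn_func(q, k, v, causal=True)
```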

Got forwarded this newsletter? Subscribe below👇

The Latest AI News

Sheesh, what a week. As an AI newsletter, we're glad we can remain… unburdened by what has been happening over the past week in the political arena and on the international Blue Screen day. After all, so much has happened in AI, too!

Last week was big for small language models: AI development showcased a push toward efficient, compact models like GPT-4o mini, Arcee-Nova, SmolLM, and xLAM, alongside AI titans pivoting to specialized ventures like Fei-Fei Li's World Labs.

These advancements come amid growing regulatory scrutiny, as evidenced by Meta's EU decision and Altman's "AI-client privilege" proposal, highlighting the tension between innovation and ethical considerations in AI.

We're launching a new certification program. Do you have 45 minutes to test it and give us feedback?

Help shape a new GenAI360 certification test


But first… the talk of the day:

Llama 3.1 405B Leaked on 4chan. What Do We Know Ahead of Today's Release?

Weighing in at almost 820GB, the model was ‘accidentally’ leaked in a HuggingFace repository ahead of the official release. (UPD: you can read Meta's full announcement here).
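If you're wondering where that number comes from: at bf16, every parameter takes 2 bytes, so 405B parameters works out to roughly 810 GB of raw weights, and shard overhead gets you to the ~820GB figure. A quick back-of-the-envelope (the precisions below are illustrative, not Meta's official quants):

```python
# Rough weight-file sizes for a 405B-parameter model at common precisions.
PARAMS = 405e9
for precision, bytes_per_param in [("bf16", 2), ("fp8", 1), ("int4", 0.5)]:
    print(f"{precision}: {PARAMS * bytes_per_param / 1e9:.1f} GB")
# bf16: 810.0 GB, fp8: 405.0 GB, int4: 202.5 GB
```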

We don't know where you can download more RAM to run this, but here's what we do know:

  1. Outperforms GPT-4o and Claude Sonnet on more than 90% of benchmarks, but may fall short on some text-related tasks. It's unclear yet how it fares against the newly released GPT-4o mini.

  2. 128K-token context window

  3. Pre-trained on 15 trillion tokens 😱 (the figure floating around for the OG GPT-4 was 13T)

  4. Multilingual, but not yet multimodal

  5. Fine-tuned and quantized versions expected soon after the base release

  6. Possibly paywalled at some point, judging by this part of the code:

Llama code repo contains upsell prompts

Here are the Llama 3.1 405B benchmarks:

Llama 3.1 vs. closed-source models

Llama 3.1 vs. open-source models

New Models From OpenAI and Mistral Lead the Efficiency Race
