🍓, Multimodal Llamas, the New GPT-4 Model

Plus, Meta’s new AI character creation tool

Before we start, share last week's news with a friend or a colleague:

Key Takeaways

No, seriously, are you keeping up with the AI news? These past two weeks were so packed that we’re splitting this issue into two parts; the second will land in your inbox soon.

  • It was strawberry season for AI Twitter last week, with Altman’s “Project Strawberry” hints. The new model is rumored to have more advanced reasoning abilities than current LLMs (we’re hearing chatter of graduate-level intelligence). We’ll cover this in a special mid-week release in two days, but take a look at @iruletheworldmo and Lily Ashwood on X, who may or may not be real and powered by the new model (the latter even hosted Twitter Spaces).

  • Activeloop, Intel Disruptor Initiative, and Towards AI launched the Impossible GenAI Test, where only 1 in 20 engineers passes the test. Take it for free today.

  • GPT-4o-2024-08-06 launched on Azure with structured outputs support, achieving perfect scores in JSON Schema evaluations.

  • Meta introduced AI Studio for creating AI characters, while discontinuing celebrity AI chatbots due to user feedback calling them "creepy".

  • Idefics3 is a new model that adapts Llama 3 to multimodality and showed drastic improvements in OCR and document understanding over its predecessors.

  • MiniCPM demonstrated comparable performance to larger models with smaller parameter counts (1.3B and 2.7B) through efficient fine-tuning techniques.

Got forwarded this newsletter? Subscribe below👇

Launching The Impossible GenAI Test

As a subscriber, you’ve been the first to know we’ve introduced the Impossible GenAI Test. It covers 6 core GenAI competencies, comprises 25 questions, and is so tough that only 1 in 20 engineers passes (after seeing the preliminary data, we may actually need to update this to… 1 in 40!).

Learn more about the test here. Try it yourself today.

The Latest AI News

Safe to say that OpenAI was in full force last week with all kinds of news, ranging from a highly detailed system card for safety to the introduction of structured outputs in the API and a new GPT-4o model. We also saw a new AI tool from Meta that lets users create their own characters, shortly after the company shut down its AI celebrity chatbots.

GPT-4o’s System Card, New Model Available on Azure, and Structured Outputs in the API

OpenAI detailed the safety measures taken before releasing GPT-4o. (Source)

The GPT-4o System Card details the safety measures, limitations, and evaluation methodologies implemented to mitigate risks associated with GPT-4o’s deployment. It outlines the system’s potential harms and the steps taken to address issues like misinformation, bias, and unintentional harmful outputs.

The system's residual risks, such as occasional unauthorized voice generation or over-refusals in non-English languages, are areas of active improvement. The focus remains on refining these aspects to minimize risks while enhancing the model's utility across diverse contexts.

But that wasn’t the only move we saw from OpenAI last week.

The latest model, GPT-4o-2024-08-06, has been launched on Azure with a focus on developer productivity, supporting structured outputs that conform to developer-supplied JSON Schemas. It even achieved a perfect score on evaluations with Structured Outputs enabled, meaning the model’s generated outputs consistently and accurately follow the complex JSON schema provided.

Structured outputs achieved a 100% score. (Source)
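If you want to try this yourself, here’s a minimal sketch of what calling the new model with structured outputs looks like through the OpenAI Python SDK. The prompt, schema, and client setup are our own illustrative choices (on Azure you’d use the AzureOpenAI client and your deployment name instead), so treat it as a starting point rather than official sample code:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the event details from the user's message."},
        {"role": "user", "content": "Team offsite with Dana and Lee next Friday in Lisbon."},
    ],
    # Structured outputs: the model is constrained to this JSON Schema.
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "calendar_event",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "date": {"type": "string"},
                    "location": {"type": "string"},
                    "participants": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["name", "date", "location", "participants"],
                "additionalProperties": False,
            },
        },
    },
)

print(response.choices[0].message.content)  # JSON that matches the schema above
```

With `strict` enabled, the response should always parse against the supplied schema (refusals aside), so it can feed straight into downstream code without defensive JSON repair.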

The new GPT model is also cheaper to use as a reranker than Cohere’s model: OpenAI’s model is priced at $2.50/1M input tokens and $10/1M output tokens, compared to Cohere’s Command R+ at $3/1M input tokens and $15/1M output tokens.

GPT-4o mini is even more cost-effective than both models at $0.15/1M input tokens and $0.60/1M output tokens. It’s also worth mentioning that the new GPT-4o model supports a larger maximum output of 16K tokens.

Amazon Upgrades Image Gen Tool, Meta Releases New AI Chatbot Tool, and Idefics3 Emerges

Examples of images generated by Titan Image Generator 2. (Source)

The last time we heard from Amazon, they were working on a GPT-killer called Metis, so it’s been a little while since we’ve had news from them. They’ve now released an upgraded version of their image-generation model, Titan Image Generator v2, which can detect and segment multiple objects within the foreground of an image.

It also introduces improved image conditioning capabilities, so users can steer generation toward specific visual characteristics such as edges, object outlines, and structural elements, all of which leads to more detailed image generation. It isn’t quite clear what data Amazon used to train this model, though.
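As a rough illustration of what image conditioning looks like in practice, here’s a sketch of invoking Titan Image Generator v2 through Amazon Bedrock’s boto3 runtime client. The payload field names (conditionImage, controlMode), the model ID, and the file paths are assumptions based on AWS’s published examples as we recall them, so check the Bedrock documentation before relying on this:

```python
import base64
import json

import boto3  # pip install boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Reference image whose edges/outlines should guide the generation (placeholder path).
with open("sketch.png", "rb") as f:
    condition_image = base64.b64encode(f.read()).decode()

body = {
    "taskType": "TEXT_IMAGE",
    "textToImageParams": {
        "text": "a red brick cottage at dusk, photorealistic",
        "conditionImage": condition_image,  # structural reference (assumed field name)
        "controlMode": "CANNY_EDGE",        # follow the reference's edge map (assumed value)
    },
    "imageGenerationConfig": {"numberOfImages": 1, "width": 1024, "height": 1024},
}

response = bedrock.invoke_model(
    modelId="amazon.titan-image-generator-v2:0",  # assumed model ID
    body=json.dumps(body),
)
image_b64 = json.loads(response["body"].read())["images"][0]

with open("generated.png", "wb") as f:
    f.write(base64.b64decode(image_b64))
```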

Meanwhile, Meta’s new tool, AI Studio, lets users create AI versions of themselves or build entirely new AI characters. Relatedly, Google acqui-hired Character.ai’s founders; Character.ai offers a similar kind of technology, letting users talk to personality-driven AI chatbots such as a mental-health helper or an English teacher.

These types of conversational AI platforms have been pretty popular over the last couple of years, with a number of startups in the space growing rapidly, Character.ai among them.

But that wasn’t the case for Meta’s celebrity AI chatbots (which featured celebs like Snoop Dogg and Tom Brady): they fell flat with users, some of whom called them “creepy”, and can no longer be interacted with.

In other news, we saw the release of Idefics3, a model that adapts Llama 3 to multimodality. It’s capable of processing arbitrary sequences of text and image inputs to generate text outputs. It can perform tasks such as visual question answering, image captioning, and story creation based on multiple images.

It builds upon Idefics1 and Idefics2, significantly improving in areas like OCR (Optical Character Recognition), document understanding, and visual reasoning.

Idefics3 outperforms its predecessor, Idefics2, across multiple benchmarks. (Source)

OpenAI’s Board Expansion

In the wake of several high-profile departures, OpenAI decided to appoint Zico Kolter, a prominent professor and director at Carnegie Mellon University's Machine Learning Department, to its board of directors.

There have been some concerns about internal dynamics at OpenAI, especially regarding the allocation of resources for AI safety initiatives. Since Kolter’s research focuses on safety, the appointment looks like a smart move by OpenAI.

Kolter will join OpenAI’s Safety and Security Committee, which includes other directors like Bret Taylor and Adam D’Angelo, as well as technical experts. This committee is tasked with overseeing and making recommendations on the safety and security of all OpenAI projects.

X’s EU Data Pause and Reddit’s AI-Powered Search Results

X has agreed to suspend the use of European users' data for training its AI tool, Grok, following legal action from Ireland's Data Protection Commission (DPC). This suspension covers the period between May 7, 2024, and August 1, 2024, and will remain in effect as the DPC continues to assess the legality of this data processing under the GDPR.

X has publicly criticized the DPC's actions, labeling the injunction as "unwarranted" and "overbroad." The company claims to have implemented privacy settings allowing users to control their data and argues that it has been working with the DPC on these issues since last year.

The DPC, in collaboration with other EU/EEA regulators, is investigating whether X's data processing practices comply with GDPR requirements. This investigation includes examining the potential unlawfulness of AI models trained on data collected without proper consent.

The other social media giant with AI news last week was Reddit, which will soon be testing AI-powered search result pages. The feature builds on Reddit’s recent partnerships with OpenAI and Google, which allow the company to leverage their LLM and AI capabilities.

Humane's Pin Faces More Returns Than Sales

Previously, we mentioned that two Humane executives left to form their own fact-checking company shortly after the release of the Ai Pin failed to go to plan and drew harsh criticism. Things don’t seem to be getting any better: the Ai Pin was reported to have had more returns than sales between May and August.

To make matters worse, Humane can’t refurbish or resell the returned devices because of T-Mobile limitations on reassigning devices to new users. As a result, the returned Pins become e-waste and lost revenue for Humane.

Additionally, Humane experienced significant executive turnover, including departures of key engineering leaders and the director of customer experience. The company also laid off 4% of employees in January as a cost-cutting measure.

Humane is under a lot of financial pressure, having raised $200 million in funding while dealing with low sales numbers and a lot of unhappy customers.

Is the Autonomous Driving Market Ready for a Chinese Challenger?

WeRide, a Chinese autonomous driving startup, officially filed for an IPO with the U.S. Securities and Exchange Commission (SEC) on July 26, which marks the company's intention to go public in the US.

WeRide reported losses of $268 million in the previous year, with only $55 million in revenue. Despite these losses, the company continues to push forward with its IPO, reflecting the high growth potential it sees in the autonomous driving market.

For key players in the self-driving industry like Wayve and NIO, this might mean increased competition, as other Chinese autonomous driving companies could follow with public listings of their own.

The $268 million loss also points to a broader trend: autonomous driving companies are struggling with profitability, given the high upfront costs involved and the fact that profit in this market is a long game rather than a source of immediate returns.

Additionally, Warren Buffett's Berkshire Hathaway has sold more shares of Chinese electric vehicle maker BYD, continuing its gradual reduction in holdings. The sale has sparked speculation about Berkshire's confidence in BYD, given its significant past investments in the company. 

Advancements in AI Research

We saw some notable progress in language model research last week, with CODEXGRAPH providing a means for LLMs to interact with code repositories and MiniCPM as a method for deploying GPT-4V level MLLMs on end devices.

RAGFoundry was another framework that stood out: it combines data creation, training, inference, and evaluation into a single workflow, making RAG implementation a little less complex.

CodexGraph Bridges the Gap Between LLMs and Complex Codebases

CODEXGRAPH overview. (Source) 

CODEXGRAPH is a system that lets LLMs effectively interact with entire code repositories. It addresses the challenge of handling complex, repository-level coding tasks that require understanding cross-file code structures and performing intricate reasoning across large codebases.

To achieve this, CODEXGRAPH integrates LLM agents with graph database interfaces extracted from code repositories. The system uses static analysis to construct code graphs, where nodes represent code symbols and edges represent relationships between them.

LLM agents then generate and execute graph queries to navigate the codebase, allowing for precise, code structure-aware context retrieval.
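To make that concrete, here’s an illustrative sketch of the kind of query an agent might run against such a code graph, using the Neo4j Python driver. The node labels, relationship types, and connection details are our own assumptions for illustration, not CODEXGRAPH’s actual schema or tooling:

```python
from neo4j import GraphDatabase  # pip install neo4j

# Connection details are placeholders.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Asked "where is UserService.authenticate defined and who calls it?", an LLM agent
# might emit a Cypher query like this one (hypothetical schema: CLASS/METHOD nodes,
# HAS_METHOD/CALLS relationships extracted by static analysis).
query = """
MATCH (c:CLASS {name: $cls})-[:HAS_METHOD]->(m:METHOD {name: $method})
OPTIONAL MATCH (caller:METHOD)-[:CALLS]->(m)
RETURN m.file_path AS defined_in, collect(caller.name) AS callers
"""

with driver.session() as session:
    record = session.run(query, cls="UserService", method="authenticate").single()
    print(record["defined_in"], record["callers"])

driver.close()
```

The point is that the agent retrieves exactly the structural context it needs (where a symbol lives, who calls it) instead of stuffing whole files into the prompt.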

The results were impressive, with competitive performance across three challenging repository-level benchmarks: CrossCodeEval, SWE-bench, and EvoCodeBench.

When equipped with GPT-4o, CODEXGRAPH outperforms other retrieval-augmented code generation baselines on CrossCodeEval and EvoCodeBench, while matching state-of-the-art performance on SWE-bench.

NACL's Hybrid Eviction Policy Slashes LLM Memory Usage

NACL is a much more efficient alternative to traditional eviction algorithms that rely on step-by-step greedy search. (Source)

Researchers from the Chinese Academy of Sciences and Baidu have introduced NACL, a framework for key-value (KV) cache eviction in LLMs during inference time. This approach addresses the challenge of managing extensive memory consumption in KV caches, particularly for models with extended context windows, which has been a significant bottleneck in deploying LLMs for long-context tasks.

NACL employs a hybrid eviction policy combining Proxy-Tokens Eviction and Random Eviction. The Proxy-Tokens Eviction utilizes global statistics of attention scores from selected proxy tokens, while Random Eviction incorporates a diversified sampling strategy.
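As a rough mental model of that hybrid policy (a toy sketch, not the paper’s implementation), here’s what the idea could look like in NumPy: part of the KV budget is kept according to the attention received from a few proxy tokens, and the rest is filled by random sampling for diversity. The function name, the 70/30 split, and the single-head simplification are our own assumptions:

```python
import numpy as np

def hybrid_evict(attn, proxy_idx, budget, random_frac=0.3, rng=None):
    """Toy KV-cache eviction: pick `budget` positions to keep out of seq_len.

    attn:        (seq_len, seq_len) attention matrix for one head/layer
    proxy_idx:   indices of proxy tokens whose attention rows score the others
    budget:      number of KV entries to retain
    random_frac: fraction of the budget filled by diversified random sampling
    """
    rng = rng or np.random.default_rng(0)
    seq_len = attn.shape[0]

    # Proxy-Tokens Eviction: score each position by the attention it receives
    # from the proxy tokens, and keep the highest-scoring ones.
    scores = attn[proxy_idx].sum(axis=0)
    n_random = int(budget * random_frac)
    n_top = budget - n_random
    top = np.argsort(scores)[-n_top:]

    # Random Eviction: fill the remaining slots with a random sample of the
    # positions not already kept, for diversity.
    rest = np.setdiff1d(np.arange(seq_len), top)
    rand = rng.choice(rest, size=min(n_random, rest.size), replace=False)

    return np.sort(np.concatenate([top, rand]))  # indices of KV entries to keep
```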

NACL drastically improves performance on both short- and long-text tasks, by 80% and 76% respectively, while reducing the KV cache by up to 5× and maintaining over 95% of performance. This demonstrates a practical approach to managing memory constraints in LLMs, which could enable efficient deployment of these models for long-context applications.

MiniCPM's Two-Stage Fine-Tuning to Deploy GPT-4V Level Models On-Device

Moore’s Law for MLLMs, which shows that deploying GPT-4V level MLLMs on end devices is becoming a reality. (Source)

OpenBMB researchers introduced MiniCPM, a means of improving the performance of multimodal large language models (MLLMs) through efficient fine-tuning techniques. This helps boost their capabilities without having to increase their size or computational requirements, making them more suitable for deployment in resource-constrained environments.

MiniCPM employs a two-stage fine-tuning process. First, it uses a larger teacher model to generate high-quality synthetic data, then it fine-tunes the smaller target model on this data using techniques like LoRA and QLoRA. The approach also incorporates multi-task learning and careful data curation to maximize the efficiency of the fine-tuning process.
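For a sense of what the second stage could look like in code, here’s a minimal sketch of LoRA fine-tuning a small model on teacher-generated text with Hugging Face transformers and peft. The model ID, data format, target modules, and hyperparameters are placeholder assumptions on our part, not the authors’ recipe:

```python
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

# Stage 1 (assumed already done): a larger teacher model has produced
# synthetic instruction/response pairs; a single placeholder example here.
synthetic_texts = ["### Question: What is 17 * 24?\n### Answer: 408"]

model_id = "openbmb/MiniCPM-2B-sft-bf16"  # placeholder target model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Stage 2: LoRA fine-tuning of the small target model on the synthetic data.
# The target_modules names are an assumption about the model's attention layers.
model = get_peft_model(
    model,
    LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"),
)

def tokenize(batch):
    out = tokenizer(batch["text"], truncation=True, padding="max_length", max_length=512)
    out["labels"] = [ids.copy() for ids in out["input_ids"]]
    return out

dataset = Dataset.from_dict({"text": synthetic_texts}).map(
    tokenize, batched=True, remove_columns=["text"]
)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="minicpm-lora", per_device_train_batch_size=1, num_train_epochs=1),
    train_dataset=dataset,
).train()
```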

MiniCPM models with only 1.3B and 2.7B parameters achieve performance comparable to much larger models like GPT-4V in various benchmarks, including common sense reasoning, math problem-solving, and coding tasks.

For instance, the MiniCPM-2.7B model outperforms Llama2-13B on several metrics despite being a lot smaller. As a result, GPT-4V level MLLMs can be deployed on end devices.

Frameworks We Love

Some frameworks that caught our attention in the last week include:

  • SegXAL: Explainable Active Learning (XAL) model designed for semantic segmentation in driving scenes, which integrates human expertise through an explainable AI module and uncertainty measures.

  • RiskAwareBench: Automated framework designed to assess physical risk awareness in LLM-based embodied agents.

  • AggSS: Introduces an Aggregated Self-Supervision approach for class-incremental learning, where image rotations are treated as additional classes to enhance robust feature learning. 

If you want your framework to be featured here, reply to this email and say hi :) 

Conversations We Loved

OpenAI continued the wave of news with Altman dropping a huge hint about a new model announcement that we might see soon. His cryptic post containing a picture of a strawberry was actually referring to “Project Strawberry”, a highly advanced model with better reasoning capabilities than current models. 

Another interesting discussion that popped up was regarding Intel’s discussions with OpenAI in 2017-2018, and what impact this had on the chip industry.

Project Strawberry Announcement Incoming?

Altman’s hint for OpenAI’s newest model. (Source)

Altman's cryptic tweet featuring a strawberry sparked intense speculation about "Project Strawberry," a new AI model reportedly capable of advanced reasoning. This project, an extension of the previously revealed Q* initiative, aims to address one of AI's biggest challenges: multi-step problem-solving and reasoning.

Project Strawberry reportedly builds on OpenAI's existing LLMs, fine-tuning them for enhanced reasoning capabilities. The approach is said to be similar to the Self-Taught Reasoner (STaR) method, which uses iterative self-improvement techniques to boost AI's problem-solving skills.
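For readers unfamiliar with STaR, here’s a toy sketch of the loop it describes. The helper functions (generate_rationale, is_correct, finetune) are placeholders standing in for a real model, verifier, and trainer; nothing here reflects what OpenAI has actually built:

```python
# Toy sketch of a STaR-style self-improvement loop (Zelikman et al., 2022).
def star_iteration(model, problems, finetune, generate_rationale, is_correct, rounds=3):
    for _ in range(rounds):
        keep = []
        for problem in problems:
            # 1. Ask the current model for a chain-of-thought rationale and answer.
            rationale, answer = generate_rationale(model, problem.question)
            # 2. Keep only rationales whose answers match the known solution.
            if is_correct(answer, problem.gold_answer):
                keep.append((problem.question, rationale, answer))
            # (The full method also "rationalizes": it retries with the gold answer
            # as a hint so hard problems still contribute training data.)
        # 3. Fine-tune the model on its own verified reasoning traces, then repeat.
        model = finetune(model, keep)
    return model
```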

Reports suggest impressive capabilities, particularly in math and science – areas that have traditionally been difficult for AI. An anonymous model, possibly related to Project Strawberry, has already demonstrated reasoning abilities surpassing GPT-4 on the AI testing platform Arena.

This follows a pattern similar to GPT-4's pre-release testing, hinting at a potential imminent announcement as early as this week.

OpenAI's Billion-Dollar Opportunity: How Intel's Hesitation Reshaped the AI Landscape

Although Nvidia currently leads the AI chip market, Intel was once the dominant player in the chip industry. In 2017-2018, Intel had discussions with OpenAI about potentially acquiring a 15% stake for $1 billion.

The deal also included provisions for Intel to provide specialized chips at cost to OpenAI, potentially shaping the future of AI computing. However, Intel ultimately decided not to proceed with the investment.

At the time, the company's leadership, including then-CEO Bob Swan, had a different perspective on the near-term market potential of generative AI. This decision came during a period when Intel was navigating the transition from CPU to GPU architecture for AI applications.

Meanwhile, Nvidia's focus on GPUs for AI workloads helped them gain a significant market share.

Money Moving in AI

Investments were plentiful in the AI industry last week, with Recursion Pharmaceuticals involved in a massive deal to acquire Exscientia for $688 million. Meanwhile, Groq secured $640 million in a successful round, and Leonardo.ai was acquired by Canva.

Recursion Pharmaceuticals Ready to Acquire Exscientia in $688 Million Deal

Recursion Pharmaceuticals is set to acquire Exscientia in a $688 million all-stock deal, marking a significant consolidation in the AI-driven drug discovery space. This merger combines Recursion's focus on rare diseases and cancers with Exscientia's AI-powered drug discovery platform, aiming to accelerate drug development and reduce costs. 

Groq Secures $640 Million in Series D Funding

Groq, a leader in fast AI inference, has secured a massive $640 million Series D funding round at a $2.8 billion valuation, led by BlackRock Private Equity Partners with participation from notable investors including Neuberger Berman, Cisco Investments, and Samsung Catalyst Fund. 

Tencent Contributes to $300 Million Funding Round for Moonshot

Tencent has participated in a $300 million-plus financing round for Chinese AI startup Moonshot, valuing the company at $3.3 billion, with Alibaba and Gaorong Capital also joining the investment. This move is part of a larger trend of significant capital inflow into Chinese AI firms, as major tech companies and venture capitalists compete to establish dominance in the AI market and develop alternatives to ChatGPT.

Adept AI Investors to be Paid Back

In a complex deal blurring the lines between acquisition and talent poaching, Amazon has effectively hired away most of Adept's top employees while arranging for the AI startup's investors to recoup their $414 million investment. 

This arrangement, which sees Adept retaining about a third of its workforce and receiving $25 million, has caught the attention of regulators, with the FTC probing whether it circumvents merger notification rules. 

Leonardo.ai Acquired by Canva

Canva, the design platform giant, has acquired Leonardo.ai, a generative AI content startup, in a strategic move to enhance its AI capabilities and expand its Magic Studio suite. We don’t know the full financial terms, but the deal involves a mix of cash and stock, with all 120 Leonardo.ai employees, including the executive team, joining Canva.