A 3B Model May Disrupt the PDF Extraction Industry, Claude 3 Haiku Fine-Tuning

Plus, Efficient Architectures from DeepMind and Shanghai AI Lab

Key Takeaways

  • Microsoft resigned its observer seat on OpenAI's board amid antitrust scrutiny, while US lawmakers raised concerns about Microsoft's $1.5 billion investment in UAE-based AI firm G42 due to potential ties with China.

  • Anthropic announced fine-tuning capabilities for Claude 3 Haiku in Amazon Bedrock, allowing customization for specific business needs.

  • DeepMind's JEST promises to improve energy efficiency and model performance by using a smaller AI model to grade data quality.

  • A new method refines retrieved content before including it in the prompt for generation models, using meta-prompting to optimize instructions.

  • RTMW is a series of high-performance models for 2D/3D whole-body pose estimation that demonstrates strong performance while maintaining high inference efficiency.

The Talk of The Day: The 3B Model That May Disrupt the PDF Extraction Industry Overnight

There's a new kid with impressive benchmarks on the block, folks. ColPali, a new retrieval model architecture, uses vision language models to embed page images directly, without relying on complex text extraction pipelines. Combined with a late interaction matching mechanism, ColPali largely outperforms modern document retrieval pipelines while being drastically faster and end-to-end trainable.

Surprisingly, the difference between working extremely well (81.3% for ColPali) and not working at all (58.8% for BiPali) is the "Col" part of ColPali, i.e., ColBERT-style late interaction.
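To make the "Col" part concrete, here is a minimal sketch of ColBERT-style late interaction (MaxSim) scoring next to a pooled single-vector score. It is a toy illustration, not the authors' implementation: the function names, the 4-dimensional one-hot "embeddings", and the mean-pooled baseline (a simplified stand-in for a single-vector bi-encoder such as BiPali) are all assumptions made for this example.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.clip(norm, 1e-12, None)


def late_interaction_score(query_tokens, page_patches):
    """MaxSim: each query token picks its best-matching page patch; sum the maxima."""
    q = l2_normalize(query_tokens)   # shape (n_query_tokens, dim)
    d = l2_normalize(page_patches)   # shape (n_patches, dim)
    sim = q @ d.T                    # shape (n_query_tokens, n_patches)
    return float(sim.max(axis=1).sum())


def pooled_score(query_tokens, page_patches):
    """Single-vector baseline (assumed here): mean-pool each side, one cosine similarity."""
    q = l2_normalize(query_tokens.mean(axis=0))
    d = l2_normalize(page_patches.mean(axis=0))
    return float(q @ d)


# Hypothetical toy data: one-hot vectors stand in for token and patch embeddings.
basis = np.eye(4)
query = basis[[0, 1]]            # two query tokens
page_a = basis[[0, 1, 2, 3]]     # contains a patch matching each query token
page_b = basis[[2, 3]]           # contains no matching patches

for name, page in [("page_a", page_a), ("page_b", page_b)]:
    print(name, late_interaction_score(query, page), pooled_score(query, page))
```

In this toy setup, late interaction gives every query token full credit for its best patch on page_a, while the pooled score is diluted by the page's unrelated patches. That dilution is one intuition for why keeping per-token and per-patch embeddings can matter, at the cost of storing many vectors per page.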

The ColPali research team demonstrated that it outperforms all other evaluated systems on ViDoRe, a new benchmark they introduced, including baselines where Claude Sonnet is used to caption all visual elements.

Well-funded ML tooling companies (e.g., Unstructured, Surya) exist to facilitate the very extraction pipeline that ColPali could disrupt.

The whole extraction-based indexing process can be slow and error-prone, and it struggles to take into account the more visual elements of a page (figures, images, etc.), the ColPali authors claim. Will this open-source model disrupt the market the same way Segment Anything disrupted the data labelling landscape? We'll see soon.

The Latest AI News

Last week's developments showcased a complex interplay of corporate strategy, technological advancement, and regulatory scrutiny. Microsoft's dual moves on the OpenAI board and G42 investment highlighted the delicate balance tech giants need to maintain.

Meanwhile, innovations like Claude 3 Haiku's fine-tuning capabilities and DeepMind's JEST training method signal a shift towards more efficient and customizable AI solutions.