Introducing PaliGemma 2 mix: A vision-language model for multiple tasks

This past December, we launched PaliGemma 2, an upgraded vision-language model in the Gemma family. The release included pretrained checkpoints of different sizes (3B, 10B, and 28B parameters) that can be easily fine-tuned on a wide range of vision-language tasks and domains, such as image segmentation, short video captioning, scientific question answering, and text-related tasks, with high performance.

Now, we’re thrilled to announce the launch of PaliGemma 2 mix checkpoints. PaliGemma 2 mix models are tuned to a mixture of tasks, so you can explore the model’s capabilities directly and use it out of the box for common use cases.

What’s new in PaliGemma 2 mix?

Multiple tasks with one model: PaliGemma 2 mix can solve tasks such as short and long captioning, optical character recognition (OCR), image question answering, object detection, and segmentation.

Developer-friendly sizes: Use the best model for your needs thanks to the different model sizes (3B, 10B, and 28B parameters) and resolutions (224px and 448px).
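As a rough aid for choosing a size, the sketch below lists the size and resolution combinations and estimates the weight footprint of each. It assumes every size is offered at both resolutions, reuses the naming pattern shown in this post (PaliGemma-2-3b-mix-224), and assumes 16-bit weights; the helper names are hypothetical, and the estimate ignores activations and any runtime overhead.

```python
from itertools import product

# Sizes and resolutions listed in this announcement.
SIZES_B = (3, 10, 28)        # nominal parameter counts, in billions
RESOLUTIONS_PX = (224, 448)  # input image side length, in pixels


def variant_name(size_b: int, resolution_px: int) -> str:
    """Name a variant using the pattern shown in this post."""
    return f"PaliGemma-2-{size_b}b-mix-{resolution_px}"


def approx_weight_gb(size_b: int, bytes_per_param: int = 2) -> float:
    """Rough weight footprint: nominal parameters times bytes per parameter.

    Assumes 16-bit weights by default; real parameter counts differ
    slightly from the nominal sizes, so treat this as a lower bound.
    """
    return size_b * 1e9 * bytes_per_param / 1e9


# Assumption: every size is available at both resolutions.
for size_b, res in product(SIZES_B, RESOLUTIONS_PX):
    print(f"{variant_name(size_b, res):28s} ~{approx_weight_gb(size_b):.0f} GB of weights")
```

The resolution does not change the weight count in this estimate; it mainly affects how many image tokens the model processes per input.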

If you were already using the original PaliGemma mix checkpoints, you can upgrade directly to PaliGemma 2 without making any changes. The model performs different tasks depending on how it’s prompted. You can review the prompt syntax for each task in the official documentation and learn more about how PaliGemma 2 was developed in our technical report.
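Because the task is selected by the prompt, a small helper can keep prompts consistent. The sketch below is illustrative only: `build_prompt` is a hypothetical function, only the `detect` form appears in this post, and the other prefixes and the trailing newline are assumptions that should be checked against the official prompt documentation.

```python
def build_prompt(task: str, text: str = "", lang: str = "en") -> str:
    """Build a task prompt string for a mix checkpoint.

    Assumed prefixes (verify against the official documentation):
    caption, describe, ocr, answer, detect, segment.
    """
    templates = {
        "caption": f"caption {lang}",       # short caption
        "describe": f"describe {lang}",     # longer description
        "ocr": "ocr",                       # read text in the image
        "answer": f"answer {lang} {text}",  # text is the question
        "detect": f"detect {text}",         # text is the object name
        "segment": f"segment {text}",       # text is the object name
    }
    if task not in templates:
        raise ValueError(f"unknown task: {task!r}")
    if task in {"answer", "detect", "segment"} and not text.strip():
        raise ValueError(f"task {task!r} needs a non-empty text argument")
    # Assumption: each prompt ends with a single newline.
    return templates[task].strip() + "\n"


print(repr(build_prompt("detect", "android")))
print(repr(build_prompt("ocr")))
```

Keeping the templates in one place makes it easy to correct a prefix once the documentation has been checked, without touching the calling code.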

Detection

Task: Detection (PaliGemma-2-3b-mix-224)
Input: “detect android\n”

Output: a cow standing on a beach next to a sign that says warning dangerous rip current.
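For the detection task, the model replies with text rather than drawn boxes, so the reply has to be parsed before it can be overlaid on the image. The sketch below assumes the reply format described in the PaliGemma documentation: four `<locNNNN>` tokens per object, in the order y_min, x_min, y_max, x_max, normalized to 1024 bins, followed by the label, with objects separated by `;`. The function name and the sample reply are made up for illustration and are not real model output.

```python
import re

# Assumed format: "<locYYYY><locXXXX><locYYYY><locXXXX> label ; ..."
_BOX = re.compile(r"((?:<loc\d{4}>){4})\s*([^;<]+)")
_LOC = re.compile(r"<loc(\d{4})>")


def parse_detections(reply: str, width: int, height: int, bins: int = 1024):
    """Convert location tokens into pixel boxes as (x0, y0, x1, y1)."""
    detections = []
    for locs, label in _BOX.findall(reply):
        # Tokens are assumed to be ordered y_min, x_min, y_max, x_max.
        y0, x0, y1, x1 = (int(v) / bins for v in _LOC.findall(locs))
        detections.append({
            "label": label.strip(),
            "box": (round(x0 * width), round(y0 * height),
                    round(x1 * width), round(y1 * height)),
        })
    return detections


# Made-up reply for illustration; a real reply comes from the model.
sample = "<loc0256><loc0128><loc0768><loc0896> android"
print(parse_detections(sample, width=448, height=448))
# [{'label': 'android', 'box': (56, 112, 392, 336)}]
```

The pixel coordinates are scaled to the size of the image you pass in, so the same parser works for both the 224px and 448px checkpoints as long as you supply the dimensions of the image you want to draw on.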

Optical Character Recognition (OCR)

Output: A cow standing on a beach next to a warning sign.

Output:

WARNING DANGEROUS

RIP CURRENT

Get Started Today

Ready to discover the potential of PaliGemma 2? Here is how you can explore the mix model capabilities:

Try out the mix model with a few clicks: Explore the mix model capabilities directly on the Hugging Face demo.

Learn how to run the model: Try out the Keras inference notebook directly in Google Colab or locally.

While PaliGemma 2 mix performs strongly across multiple tasks, you will get the best results by fine-tuning PaliGemma 2 on your own task or domain. To learn how to do it, dive into our comprehensive documentation, check our official example notebooks for Keras and JAX, or use the Hugging Face transformers example. We’re looking forward to seeing what you build with it!