Salesforce has launched a new model called BLIP, which can perform image captioning, visual question answering, and image-text matching.

To learn more, read the LinkedIn post by Younes Belkada. You can also read the original paper or try the model yourself.
