Salesforce has launched a new model called BLIP, which can perform image captioning, visual question answering, and image-text matching.
To learn more, please read the LinkedIn post by Younes Belkada. You can also read the original paper here, or play with the model yourself by clicking here.