vision language model (VLM) - Pump