Bunny MMPROJ w/ Llama.cpp (Python)


Bunny is an amazing multimodal model that can be used on its own or in conjunction with your other LLM projects. How does it work?

You upload an image and ask it a question – the model can accurately describe the image, its “feel”, its style, and so on. It can even read and translate text! In fact, you can keep probing the model about an image in an almost forensic manner. It runs fast and it runs accurately.

For this tutorial project we are using the Bunny-Llama-3-4B-V GGUF model – there are also 8B and 3B variants (be careful with the 8B, as it can crash your machine if you do not have enough VRAM).
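
If you do not have the GGUF files yet, one way to fetch them is with the huggingface_hub package. This is only a sketch – the repo id below is a placeholder, so take the exact repo and filenames from the Bunny model card on Hugging Face:

# Sketch: download the Bunny GGUF weights and the mmproj projector file.
# The repo id is a placeholder -- replace it with the real Bunny GGUF repo;
# the filenames match the ones used in the commands below.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(repo_id="REPLACE-WITH/bunny-gguf-repo",
                             filename="ggml-model-Q4_K_M.gguf")
mmproj_path = hf_hub_download(repo_id="REPLACE-WITH/bunny-gguf-repo",
                              filename="mmproj-model-f16.gguf")
print(model_path)   # pass this path to -m
print(mmproj_path)  # pass this path to --mmproj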

We can run Bunny from the command line as follows (from within .\llama.cpp):

llava-cli -m ".\path\to\model\ggml-model-Q4_K_M.gguf" --mmproj ".\path\to\model\mmproj-model-f16.gguf" --image ".\path\to\image\image.jpg" -c 4096 -e -p "A chat between a curious user and an artificial intelligence assistant. The assistant uses descriptive language to describe images and answer user queries. USER: <image>\nDescribe this image in detail. ASSISTANT:" --temp 0.0 -ngl 32

If you have recently updated your llama.cpp, note that “main.exe” and “llava-cli” no longer exist.
Instead, go to the following folder: \build\bin\release
From there, “llama-cli” replaces “main”, and “llama-llava-cli” replaces “llava-cli”:

llama-llava-cli -m ".\path\to\model\ggml-model-Q4_K_M.gguf" --mmproj ".\path\to\model\mmproj-model-f16.gguf" --image ".\path\to\image\image.jpg" -c 4096 -e -p "A chat between a curious user and an artificial intelligence assistant. The assistant uses descriptive language to describe images and answer user queries. USER: <image>\nDescribe this image in detail. ASSISTANT:" --temp 0.0 -ngl 32

Changing the image each time becomes tedious. What if we put together a quick and dirty web app?

Below are the instructions and the code:

Assuming that you already have llama.cpp installed:

1) cd llama.cpp   # note: with newer builds you must change the executable name in the code to “llama-llava-cli”
2) python -m venv venv
3) call .\venv\scripts\activate.bat
4) pip install gradio
5) notepad run.py
6a) copy and paste the code below (be sure to keep the indentation)
6b) change the paths to your models and save
7) python run.py

import gradio as gr
import subprocess
import os

def process_image_and_prompt(image, prompt):
    image_path = 'temp_image.jpg'

    # Save the uploaded image to a temporary file (convert to RGB first,
    # since PNG uploads with an alpha channel cannot be written as JPEG)
    image.convert('RGB').save(image_path)

    # Construct the command
    # Note: on newer llama.cpp builds the binary is 'llama-llava-cli'
    # (found under .\build\bin\release) -- change the first entry accordingly.
    # Append '-ngl', '32' if you want to offload layers to the GPU, as in the
    # command-line examples above.
    command = [
        'llava-cli',
        '-m', r'D:\path\to\model\ggml-model-Q4_K_M.gguf',
        '--mmproj', r'D:\path\to\model\mmproj-model-f16.gguf',
        '--image', image_path,
        '-c', '4096',
        '-e',
        '-p', prompt,
        '--temp', '0.0'
    ]

    # Run the command and capture the output
    result = subprocess.run(command, capture_output=True, text=True)

    # Clean up the temporary file
    os.remove(image_path)

    # Surface stderr if the command failed, otherwise return the model's reply
    if result.returncode != 0:
        return result.stderr
    return result.stdout

# Default prompt
default_prompt = (
    "A chat between a curious user and a helpful artificial intelligence assistant. "
    "The assistant answers all questions with descriptive and detailed responses. "
    "USER: <image>\nDescribe this image in detail. ASSISTANT:"
)

# Create the Gradio interface
with gr.Blocks() as demo:
    gr.Markdown("# LLaVA CLI Interface")
    image_input = gr.Image(type="pil", label="Select Image")
    prompt_input = gr.Textbox(value=default_prompt, lines=10, label="Prompt")
    output_display = gr.Textbox(lines=10, label="Output")

    process_button = gr.Button("Process")
    process_button.click(
        fn=process_image_and_prompt,
        inputs=[image_input, prompt_input],
        outputs=[output_display]
    )

# Launch the Gradio interface
demo.launch()
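
Run python run.py and open the local URL that Gradio prints (by default http://127.0.0.1:7860). Drop in an image, adjust the prompt if you like, and click Process. If you want to reach the app from another machine on your network, standard Gradio launch options will do it – a small optional tweak to the last line:

# Optional: bind to all interfaces so other machines on the LAN can reach the app;
# share=True would instead create a temporary public gradio.live link.
demo.launch(server_name="0.0.0.0", server_port=7860)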
