Images and PDFs – Kirby Copilot

The Kirby Copilot plugin lets you select files to enrich the user prompt with additional information. For example, you can select product images and ask the AI to describe the product visually or select PDF files and let the AI draw conclusions from the documents.

Analyzing Images

GPT-4 with Vision and Anthropic Claude models are powerful tools for processing images and text in the same prompt. To use the OpenAI Vision model with Kirby Copilot, you need to have an OpenAI API key and access to the gpt-4o model. Same applies to the Anthropic Claude models claude-3-5-sonnet-20241022 and claude-3-5-sonnet-20240620, which both support image processing.

Unlike the user prompt, images are not written to local storage. Thus, a Panel refresh will remove all selected images.

If you configure the Copilot plugin to use a different AI provider that does not support image processing, such as Mistral AI, the images will be ignored.

Image Resizing

When the Panel user selects files, the plugin will automatically downscale all images to a maximum size of 2048px. On the one hand, this ensures that the AI can process the images quickly, and on the other hand, it complies with the OpenAI image size limit of fitting into a 2048 pixel bounding box.

The resize browse happens directly in the browser.

Panel File as Context Source

On a Kirby file page, you can set the value to auto to pre-select the current file as context for the user prompt. This is useful if you want to generate text for a specific file, such as an image or PDF, that has already been uploaded to the Panel:

sections/copilot.yml

type: copilot
field: alt
# Use the uploaded file as context
files: auto
# Optional: Provide a pre-defined user prompt
userPrompt: Write an alternative text for the provided image. Use a maximum of 10 words.
# Optional: Disable the editing of the user prompt
editable: false

The section above will use the current file of a Kirby file model as context for the user prompt:

Extracting Text From PDFs

A PDF file can be used to support user prompts with information that the AI can incorporate to generate the answer. For example, you can select a PDF with a product description and ask the AI to summarize the product.

When selecting PDF files with the Select files button, the plugin will extract the text from the PDF. This extraction process happens directly in the browser with Mozilla's PDF.js library and does not require any server-side processing.

Before the instruction is sent to your AI provider, the extracted text is added to the user prompt.

For example, if this is your user prompt:

Summarize the product in 200 words.

Then, each PDF file will be added to the prompt, resulting in the final prompt:

Summarize the product in 200 words.

<pdf_document_1>
(Extracted text from PDF document 1)
</pdf_document_1>

<pdf_document_2>
(Extracted text from PDF document 2)
</pdf_document_2>

Fields as Context

Use fields as placeholders in user prompts.

Toolbar Buttons

Kickstart your toolbar button experience with these blueprint examples.