Images and PDFs
The Kirby Copilot plugin lets you select files to enrich the user prompt with additional information. For example, you can select product images and ask the AI to describe the product visually or select PDF files and let the AI draw conclusions from the documents.
Analyzing Images With GPT Vision
GPT-4 Vision is a powerful tool for processing images and text in the same prompt. To use the OpenAI GPT-4 Vision with Kirby Copilot, you need to have an OpenAI API key and access to the gpt-4-turbo
or gpt-4o
(recommended) model. Released in May 2024, GPT-4o is the new flagship model from OpenAI and superseded the GPT-4 Turbo model.
Unlike the user prompt, images are not written to local storage. Thus, a Panel refresh will remove all selected images.
Panel File as Context Source
On a Kirby file page, you can set the value to auto
to pre-select the current file as context for the user prompt. This is useful if you want to generate text for a specific file, such as an image or PDF, that has already been uploaded to the Panel:
type: copilot
field: alt
# Use the uploaded file as context
files: auto
# Optional: Provide a pre-defined user prompt
userPrompt: Summarize the provided image in maximum 10 words.
# Optional: Disable the editing of the user prompt
editable: false
The section above will use the current file of a Kirby file model as context for the user prompt:
Image Resizing
When the Panel user selects files, the plugin will automatically downscale all images to a maximum size of 2048px
. On the one hand, this ensures that the AI can process the images quickly, and on the other hand, it complies with the OpenAI image size limit of fitting into a 2048 pixel bounding box.
The resize browse happens directly in the browser.
Extracting Text From PDFs
A PDF file can be used to support user prompts with information that the AI can incorporate to generate the answer. For example, you can select a PDF with a product description and ask the AI to summarize the product.
When selecting PDF files with the Select files button, the plugin will extract the text from the PDF. This extraction process happens directly in the browser with Mozilla's PDF.js library and does not require any server-side processing.
Before the instruction is sent to your AI provider, the extracted text is added to the user prompt.
For example, if this is your user prompt:
Summarize the product in 200 words.
Then, each PDF file will be added to the prompt, resulting in the final prompt:
Summarize the product in 200 words.
Additional context from the following PDF documents has been processed and made available to you. Include the information from these documents as applicable.
PDF document 1: <extracted text will be added here>
PDF document 2: <extracted text will be added here>