Images and PDFs
The Kirby Copilot plugin lets you select files to enrich the user prompt with additional information. For example, you can select product images and ask the AI to describe the product visually or select PDF files and let the AI draw conclusions from the documents.
Analyzing Images With GPT Vision
GPT-4 with Vision is a powerful tool for processing images and text in the same prompt. To use the OpenAI Vision model with Kirby Copilot, you need to have an OpenAI API key and access to the gpt-4o
model.
Unlike the user prompt, images are not written to local storage. Thus, a Panel refresh will remove all selected images.
Image Resizing
When the Panel user selects files, the plugin will automatically downscale all images to a maximum size of 2048px
. On the one hand, this ensures that the AI can process the images quickly, and on the other hand, it complies with the OpenAI image size limit of fitting into a 2048 pixel bounding box.
The resize browse happens directly in the browser.
Panel File as Context Source
On a Kirby file page, you can set the value to auto
to pre-select the current file as context for the user prompt. This is useful if you want to generate text for a specific file, such as an image or PDF, that has already been uploaded to the Panel:
type: copilot
field: alt
# Use the uploaded file as context
files: auto
# Optional: Provide a pre-defined user prompt
userPrompt: Write an alternative text for the provided image. Use a maximum of 10 words.
# Optional: Disable the editing of the user prompt
editable: false
The section above will use the current file of a Kirby file model as context for the user prompt:
Extracting Text From PDFs
A PDF file can be used to support user prompts with information that the AI can incorporate to generate the answer. For example, you can select a PDF with a product description and ask the AI to summarize the product.
When selecting PDF files with the Select files button, the plugin will extract the text from the PDF. This extraction process happens directly in the browser with Mozilla's PDF.js library and does not require any server-side processing.
Before the instruction is sent to your AI provider, the extracted text is added to the user prompt.
For example, if this is your user prompt:
Summarize the product in 200 words.
Then, each PDF file will be added to the prompt, resulting in the final prompt:
Summarize the product in 200 words.
<pdf_document_1>
(Extracted text from PDF document 1)
</pdf_document_1>
<pdf_document_2>
(Extracted text from PDF document 2)
</pdf_document_2>