Nicolas Mowen d24b96d3bb
Early 0.18 work (#22138)
* Update version

* Create scaffolding for case management (#21293)

* implement case management for export apis (#21295)

* refactor vainfo to search for first GPU (#21296)

use existing LibvaGpuSelector to pick appropritate libva device

* Case management UI (#21299)

* Refactor export cards to match existing cards in other UI pages

* Show cases separately from exports

* Add proper filtering and display of cases

* Add ability to edit and select cases for exports

* Cleanup typing

* Hide if no unassigned

* Cleanup hiding logic

* fix scrolling

* Improve layout

* Camera connection quality indicator (#21297)

* add camera connection quality metrics and indicator

* formatting

* move stall calcs to watchdog

* clean up

* change watchdog to 1s and separately track time for ffmpeg retry_interval

* implement status caching to reduce message volume

* Export filter UI (#21322)

* Get started on export filters

* implement basic filter

* Implement filtering and adjust api

* Improve filter handling

* Improve navigation

* Cleanup

* handle scrolling

* Refactor temperature reporting for detectors and implement Hailo temp reading (#21395)

* Add Hailo temperature retrieval

* Refactor `get_hailo_temps()` to use ctxmanager

* Show Hailo temps in system UI

* Move hailo_platform import to get_hailo_temps

* Refactor temperatures calculations to use within detector block

* Adjust webUI to handle new location

---------

Co-authored-by: tigattack <10629864+tigattack@users.noreply.github.com>

* Camera-specific hwaccel settings for timelapse exports (correct base) (#21386)

* added hwaccel_args to camera.record.export config struct

* populate camera.record.export.hwaccel_args with a cascade up to camera then global if 'auto'

* use new hwaccel args in export

* added documentation for camera-specific hwaccel export

* fix c/p error

* missed an import

* fleshed out the docs and comments a bit

* ruff lint

* separated out the tips in the doc

* fix documentation

* fix and simplify reference config doc

* Add support for GPU and NPU temperatures (#21495)

* Add rockchip temps

* Add support for GPU and NPU temperatures in the frontend

* Add support for Nvidia temperature

* Improve separation

* Adjust graph scaling

* Exports Improvements (#21521)

* Add images to case folder view

* Add ability to select case in export dialog

* Add to mobile review too

* Add API to handle deleting recordings  (#21520)

* Add recording delete API

* Re-organize recordings apis

* Fix import

* Consolidate query types

* Add media sync API endpoint (#21526)

* add media cleanup functions

* add endpoint

* remove scheduled sync recordings from cleanup

* move to utils dir

* tweak import

* remove sync_recordings and add config migrator

* remove sync_recordings

* docs

* remove key

* clean up docs

* docs fix

* docs tweak

* Media sync API refactor and UI (#21542)

* generic job infrastructure

* types and dispatcher changes for jobs

* save data in memory only for completed jobs

* implement media sync job and endpoints

* change logs to debug

* websocket hook and types

* frontend

* i18n

* docs tweaks

* endpoint descriptions

* tweak docs

* use same logging pattern in sync_recordings as the other sync functions (#21625)

* Fix incorrect counting in sync_recordings (#21626)

* Update go2rtc to v1.9.13 (#21648)

Co-authored-by: Eugeny Tulupov <eugeny.tulupov@spirent.com>

* Refactor Time-Lapse Export (#21668)

* refactor time lapse creation to be a separate API call with ability to pass arbitrary ffmpeg args

* Add CPU fallback

* Optimize empty directory cleanup for recordings (#21695)

The previous empty directory cleanup did a full recursive directory
walk, which can be extremely slow. This new implementation only removes
directories which have a chance of being empty due to a recent file
deletion.

* Implement llama.cpp GenAI Provider (#21690)

* Implement llama.cpp GenAI Provider

* Add docs

* Update links

* Fix broken mqtt links

* Fix more broken anchors

* Remove parents in remove_empty_directories (#21726)

The original implementation did a full directory tree walk to find and remove
empty directories, so this implementation should remove the parents as well,
like the original did.

* Implement LLM Chat API with tool calling support (#21731)

* Implement initial tools definiton APIs

* Add initial chat completion API with tool support

* Implement other providers

* Cleanup

* Offline preview image (#21752)

* use latest preview frame for latest image when camera is offline

* remove frame extraction logic

* tests

* frontend

* add description to api endpoint

* Update to ROCm 7.2.0 (#21753)

* Update to ROCm 7.2.0

* ROCm now works properly with JinaV1

* Arcface has compilation error

* Add live context tool to LLM (#21754)

* Add live context tool

* Improve handling of images in request

* Improve prompt caching

* Add networking options for configuring listening ports (#21779)

* feat: add X-Frame-Time when returning snapshot (#21932)

Co-authored-by: Florent MORICONI <170678386+fmcloudconsulting@users.noreply.github.com>

* Improve jsmpeg player websocket handling (#21943)

* improve jsmpeg player websocket handling

prevent websocket console messages from appearing when player is destroyed

* reformat files after ruff upgrade

* Allow API Events to be Detections or Alerts, depending on the Event Label (#21923)

* - API created events will be alerts OR detections, depending on the event label, defaulting to alerts
- Indefinite API events will extend the recording segment until those events are ended
- API event start time is the actual start time, instead of having a pre-buffer of record.event_pre_capture

* Instead of checking for indefinite events on a camera before deciding if we should end the segment, only update last_detection_time and last_alert_time if frame_time is greater, which should have the same effect

* Add the ability to set a pre_capture number of seconds when creating a manual event via the API. Default behavior unchanged

* Remove unnecessary _publish_segment_start() call

* Formatting

* handle last_alert_time or last_detection_time being None when checking them against the frame_time

* comment manual_info["label"].split(": ")[0] for clarity

* ffmpeg Preview Segment Optimization for "high" and "very_high" (#21996)

* Introduce qmax parameter for ffmpeg preview encoding

Added PREVIEW_QMAX_PARAM to control ffmpeg encoding quality.

* formatting

* Fix spacing in qmax parameters for preview quality

* Adapt to new Gemini format

* Fix frame time access

* Remove exceptions

* Cleanup

---------

Co-authored-by: Josh Hawkins <32435876+hawkeye217@users.noreply.github.com>
Co-authored-by: tigattack <10629864+tigattack@users.noreply.github.com>
Co-authored-by: Andrew Roberts <adroberts@gmail.com>
Co-authored-by: Eugeny Tulupov <zhekka3@gmail.com>
Co-authored-by: Eugeny Tulupov <eugeny.tulupov@spirent.com>
Co-authored-by: John Shaw <1753078+johnshaw@users.noreply.github.com>
Co-authored-by: Eric Work <work.eric@gmail.com>
Co-authored-by: FL42 <46161216+fl42@users.noreply.github.com>
Co-authored-by: Florent MORICONI <170678386+fmcloudconsulting@users.noreply.github.com>
Co-authored-by: nulledy <254504350+nulledy@users.noreply.github.com>
2026-02-26 21:16:10 -07:00

9.7 KiB
Raw Blame History

id title
genai_config Configuring Generative AI

Configuration

A Generative AI provider can be configured in the global config, which will make the Generative AI features available for use. There are currently 4 native providers available to integrate with Frigate. Other providers that support the OpenAI standard API can also be used. See the OpenAI section below.

To use Generative AI, you must define a single provider at the global level of your Frigate configuration. If the provider you choose requires an API key, you may either directly paste it in your configuration, or store it in an environment variable prefixed with FRIGATE_.

Ollama

:::warning

Using Ollama on CPU is not recommended, high inference times make using Generative AI impractical.

:::

Ollama allows you to self-host large language models and keep everything running locally. It is highly recommended to host this server on a machine with an Nvidia graphics card, or on a Apple silicon Mac for best performance.

Most of the 7b parameter 4-bit vision models will fit inside 8GB of VRAM. There is also a Docker container available.

Parallel requests also come with some caveats. You will need to set OLLAMA_NUM_PARALLEL=1 and choose a OLLAMA_MAX_QUEUE and OLLAMA_MAX_LOADED_MODELS values that are appropriate for your hardware and preferences. See the Ollama documentation.

Model Types: Instruct vs Thinking

Most vision-language models are available as instruct models, which are fine-tuned to follow instructions and respond concisely to prompts. However, some models (such as certain Qwen-VL or minigpt variants) offer both instruct and thinking versions.

  • Instruct models are always recommended for use with Frigate. These models generate direct, relevant, actionable descriptions that best fit Frigate's object and event summary use case.
  • Thinking models are fine-tuned for more free-form, open-ended, and speculative outputs, which are typically not concise and may not provide the practical summaries Frigate expects. For this reason, Frigate does not recommend or support using thinking models.

Some models are labeled as hybrid (capable of both thinking and instruct tasks). In these cases, Frigate will always use instruct-style prompts and specifically disables thinking-mode behaviors to ensure concise, useful responses.

Recommendation: Always select the -instruct or documented instruct/tagged variant of any model you use in your Frigate configuration. If in doubt, refer to your model providers documentation or model library for guidance on the correct model variant to use.

Supported Models

You must use a vision capable model with Frigate. Current model variants can be found in their model library. Note that Frigate will not automatically download the model you specify in your config, Ollama will try to download the model but it may take longer than the timeout, it is recommended to pull the model beforehand by running ollama pull your_model on your Ollama server/Docker container. Note that the model specified in Frigate's config must match the downloaded model tag.

:::info

Each model is available in multiple parameter sizes (3b, 4b, 8b, etc.). Larger sizes are more capable of complex tasks and understanding of situations, but requires more memory and computational resources. It is recommended to try multiple models and experiment to see which performs best.

:::

:::tip

If you are trying to use a single model for Frigate and HomeAssistant, it will need to support vision and tools calling. qwen3-VL supports vision and tools simultaneously in Ollama.

:::

The following models are recommended:

Model Notes
qwen3-vl Strong visual and situational understanding, higher vram requirement
Intern3.5VL Relatively fast with good vision comprehension
gemma3 Strong frame-to-frame understanding, slower inference times
qwen2.5-vl Fast but capable model with good vision comprehension

:::note

You should have at least 8 GB of RAM available (or VRAM if running on GPU) to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.

:::

Ollama Cloud models

Ollama also supports cloud models, where your local Ollama instance handles requests from Frigate, but model inference is performed in the cloud. Set up Ollama locally, sign in with your Ollama account, and specify the cloud model name in your Frigate config. For more details, see the Ollama cloud model docs.

Configuration

genai:
  provider: ollama
  base_url: http://localhost:11434
  model: qwen3-vl:4b
  provider_options: # other Ollama client options can be defined
    keep_alive: -1
    options:
      num_ctx: 8192 # make sure the context matches other services that are using ollama

llama.cpp

llama.cpp is a C++ implementation of LLaMA that provides a high-performance inference server. Using llama.cpp directly gives you access to all native llama.cpp options and parameters.

:::warning

Using llama.cpp on CPU is not recommended, high inference times make using Generative AI impractical.

:::

It is highly recommended to host the llama.cpp server on a machine with a discrete graphics card, or on an Apple silicon Mac for best performance.

Supported Models

You must use a vision capable model with Frigate. The llama.cpp server supports various vision models in GGUF format.

Configuration

genai:
  provider: llamacpp
  base_url: http://localhost:8080
  model: your-model-name
  provider_options:
    temperature: 0.7
    repeat_penalty: 1.05
    top_p: 0.8
    top_k: 40
    min_p: 0.05
    seed: -1

All llama.cpp native options can be passed through provider_options, including temperature, top_k, top_p, min_p, repeat_penalty, repeat_last_n, seed, grammar, and more. See the llama.cpp server documentation for a complete list of available parameters.

Google Gemini

Google Gemini has a free tier for the API, however the limits may not be sufficient for standard Frigate usage. Choose a plan appropriate for your installation.

Supported Models

You must use a vision capable model with Frigate. Current model variants can be found in their documentation.

Get API Key

To start using Gemini, you must first get an API key from Google AI Studio.

  1. Accept the Terms of Service
  2. Click "Get API Key" from the right hand navigation
  3. Click "Create API key in new project"
  4. Copy the API key for use in your config

Configuration

genai:
  provider: gemini
  api_key: "{FRIGATE_GEMINI_API_KEY}"
  model: gemini-2.5-flash

:::note

To use a different Gemini-compatible API endpoint, set the provider_options with the base_url key to your provider's API URL. For example:

genai:
  provider: gemini
  ...
  provider_options:
    base_url: https://...

Other HTTP options are available, see the python-genai documentation.

:::

OpenAI

OpenAI does not have a free tier for their API. With the release of gpt-4o, pricing has been reduced and each generation should cost fractions of a cent if you choose to go this route.

Supported Models

You must use a vision capable model with Frigate. Current model variants can be found in their documentation.

Get API Key

To start using OpenAI, you must first create an API key and configure billing.

Configuration

genai:
  provider: openai
  api_key: "{FRIGATE_OPENAI_API_KEY}"
  model: gpt-4o

:::note

To use a different OpenAI-compatible API endpoint, set the OPENAI_BASE_URL environment variable to your provider's API URL.

:::

:::tip

For OpenAI-compatible servers (such as llama.cpp) that don't expose the configured context size in the API response, you can manually specify the context size in provider_options:

genai:
  provider: openai
  base_url: http://your-llama-server
  model: your-model-name
  provider_options:
    context_size: 8192 # Specify the configured context size

This ensures Frigate uses the correct context window size when generating prompts.

:::

Azure OpenAI

Microsoft offers several vision models through Azure OpenAI. A subscription is required.

Supported Models

You must use a vision capable model with Frigate. Current model variants can be found in their documentation.

Create Resource and Get API Key

To start using Azure OpenAI, you must first create a resource. You'll need your API key, model name, and resource URL, which must include the api-version parameter (see the example below).

Configuration

genai:
  provider: azure_openai
  base_url: https://instance.cognitiveservices.azure.com/openai/responses?api-version=2025-04-01-preview
  model: gpt-5-mini
  api_key: "{FRIGATE_OPENAI_API_KEY}"