Compare commits

3 Commits

Author SHA1 Message Date
Nicolas Mowen
68382d89b4
Cleanup detection (#17785)
* Fix yolov9 NMS

* Improve batched yolo NMS

* Consolidate grids and strides calculation

* Use existing variable

* Remove

* Ensure init is called
2025-04-18 10:26:34 -06:00
Josh Hawkins
14a32a6472
LPR tweaks (#17783)
* clarify docs

* improve debugging messages

* don't run any lpr postprocessing

* wording
2025-04-18 07:45:37 -06:00
Nicolas Mowen
19aaa64fe9
Add support for yolox models to onnx detector (#17773)
2025-04-18 06:40:06 -05:00
13 changed files with 243 additions and 119 deletions

View File

@ -184,7 +184,7 @@ cameras:
ffmpeg: ... # add your streams
detect:
enabled: True
fps: 5 # increase to 10 if vehicles move quickly across your frame. Higher than 10 is unnecessary and is not recommended.
fps: 5 # increase to 10 if vehicles move quickly across your frame. Higher than 15 is unnecessary and is not recommended.
min_initialized: 2
width: 1920
height: 1080
@ -267,7 +267,7 @@ With this setup:
- Review items will always be classified as a `detection`.
- Snapshots will always be saved.
- Zones and object masks are **not** used.
- The `frigate/events` MQTT topic will **not** publish tracked object updates, though `frigate/reviews` will if recordings are enabled.
- The `frigate/events` MQTT topic will **not** publish tracked object updates with the license plate bounding box and score, though `frigate/reviews` will publish if recordings are enabled. If a plate is recognized as a known plate, publishing will occur with an updated `sub_label` field. If characters are recognized, publishing will occur with an updated `recognized_license_plate` field.
- License plate snapshots are saved at the highest-scoring moment and appear in Explore.
- Debug view will not show `license_plate` bounding boxes.
@ -280,7 +280,7 @@ With this setup:
| Object Detection | Standard Frigate+ detection applies | Bypasses standard object detection |
| Zones & Object Masks | Supported | Not supported |
| Debug View | May show `license_plate` bounding boxes | May **not** show `license_plate` bounding boxes |
| MQTT `frigate/events` | Publishes tracked object updates | Does **not** publish tracked object updates |
| MQTT `frigate/events` | Publishes tracked object updates | Publishes limited updates |
| Explore | Recognized plates available in More Filters | Recognized plates available in More Filters |
By selecting the appropriate configuration, users can optimize their dedicated LPR cameras based on whether they are using a Frigate+ model or the secondary LPR pipeline.

View File

@ -659,7 +659,7 @@ YOLOv3, YOLOv4, YOLOv7, and [YOLOv9](https://github.com/WongKinYiu/yolov9) model
:::tip
The YOLO detector has been designed to support YOLOv3, YOLOv4, YOLOv7, and YOLOv9 models, but may support other YOLO model architectures as well.
The YOLO detector has been designed to support YOLOv3, YOLOv4, YOLOv7, and YOLOv9 models, but may support other YOLO model architectures as well. See [the models section](#downloading-yolo-models) for more information on downloading YOLO models for use in Frigate.
:::
@ -682,6 +682,29 @@ model:
Note that the labelmap uses a subset of the complete COCO label set that has only 80 objects.
#### YOLOx
[YOLOx](https://github.com/Megvii-BaseDetection/YOLOX) models are supported, but not included by default. See [the models section](#downloading-yolo-models) for more information on downloading the YOLOx model for use in Frigate.
After placing the downloaded onnx model in your config folder, you can use the following configuration:
```yaml
detectors:
onnx:
type: onnx
model:
model_type: yolox
width: 416 # <--- should match the imgsize set during model export
height: 416 # <--- should match the imgsize set during model export
input_tensor: nchw_denorm
input_dtype: float
path: /config/model_cache/yolox_tiny.onnx
labelmap_path: /labelmap/coco-80.txt
```
Note that the labelmap uses a subset of the complete COCO label set that has only 80 objects.
#### RF-DETR
[RF-DETR](https://github.com/roboflow/rf-detr) is a DETR based model. The ONNX exported models are supported, but not included by default. See [the models section](#downloading-rf-detr-model) for more information on downloading the RF-DETR model for use in Frigate.
@ -962,6 +985,10 @@ The input image size in this notebook is set to 320x320. This results in lower C
### Downloading YOLO Models
#### YOLOx
YOLOx models can be downloaded [from the YOLOx repo](https://github.com/Megvii-BaseDetection/YOLOX/tree/main/demo/ONNXRuntime).
#### YOLOv3, YOLOv4, and YOLOv7
To export as ONNX:

View File

@ -513,10 +513,14 @@ class FrigateConfig(FrigateBaseModel):
)
# Warn if detect fps > 10
if camera_config.detect.fps > 10:
if camera_config.detect.fps > 10 and camera_config.type != "lpr":
logger.warning(
f"{camera_config.name} detect fps is set to {camera_config.detect.fps}. This does NOT need to match your camera's frame rate. High values could lead to reduced performance. Recommended value is 5."
)
if camera_config.detect.fps > 15 and camera_config.type == "lpr":
logger.warning(
f"{camera_config.name} detect fps is set to {camera_config.detect.fps}. This does NOT need to match your camera's frame rate. High values could lead to reduced performance. The recommended value for LPR cameras is between 5 and 15."
)
# Default min_initialized configuration
min_initialized = int(camera_config.detect.fps / 2)
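
The threshold logic in this hunk can be sketched on its own as follows. This is a minimal illustration, not the Frigate implementation: `detect_fps_warning` is a hypothetical helper, and the limits (10 for regular cameras, 15 for `lpr` cameras) are taken from the two conditions above.

```python
from typing import Optional


def detect_fps_warning(name: str, fps: int, camera_type: str = "generic") -> Optional[str]:
    """Return a warning string when detect fps exceeds the limit for the camera type.

    Hypothetical helper: the limits mirror the hunk above
    (10 for regular cameras, 15 for dedicated LPR cameras).
    """
    limit = 15 if camera_type == "lpr" else 10
    if fps <= limit:
        return None  # within the recommended range, nothing to warn about

    recommended = "between 5 and 15" if camera_type == "lpr" else "5"
    return f"{name} detect fps is set to {fps}; recommended value: {recommended}."


# A regular camera at 12 fps warns, an LPR camera at 12 fps does not.
print(detect_fps_warning("front_door", 12))
print(detect_fps_warning("driveway_lpr", 12, camera_type="lpr"))
```

Keeping the limit in one place, keyed on camera type, would avoid the two near-identical `if` branches, at the cost of a slightly less explicit message.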

View File

@ -490,10 +490,6 @@ class LicensePlateProcessingMixin:
merged_boxes.append(current_box)
current_box = next_box
logger.debug(
f"Provided plate_width: {plate_width}, max_gap: {max_gap}, horizontal_gap: {horizontal_gap}"
)
# Add the last box
merged_boxes.append(current_box)
@ -1133,7 +1129,7 @@ class LicensePlateProcessingMixin:
# 4. Log the comparison
logger.debug(
f"Plate comparison - Current: {top_plate} (score: {curr_score:.3f}, min_conf: {curr_min_conf:.2f}) vs "
f"Previous: {prev_plate} (score: {prev_score:.3f}, min_conf: {prev_min_conf:.2f})\n"
f"Previous: {prev_plate} (score: {prev_score:.3f}, min_conf: {prev_min_conf:.2f}) "
f"Metrics - Length: {len(top_plate)} vs {len(prev_plate)} (scores: {curr_length_score:.2f} vs {prev_length_score:.2f}), "
f"Area: {top_area} vs {prev_area}, "
f"Avg Conf: {avg_confidence:.2f} vs {prev_avg_confidence:.2f}, "
@ -1263,6 +1259,15 @@ class LicensePlateProcessingMixin:
)
return
# don't run for objects with no position changes
# this is the initial state after registering a new tracked object
# LPR will run 2 frames after detect.min_initialized is reached
if obj_data.get("position_changes", 0) == 0:
logger.debug(
f"{camera}: Plate detected in {self.config.cameras[camera].detect.min_initialized + 1} concurrent frames, LPR frame threshold ({self.config.cameras[camera].detect.min_initialized + 2})"
)
return
license_plate: Optional[dict[str, any]] = None
if "license_plate" not in self.config.cameras[camera].objects.track:
@ -1401,6 +1406,8 @@ class LicensePlateProcessingMixin:
license_plate_frame,
)
logger.debug(f"{camera}: Running plate recognition.")
# run detection, returns results sorted by confidence, best first
start = datetime.datetime.now().timestamp()
license_plates, confidences, areas = self._process_license_plate(

View File

@ -54,6 +54,9 @@ class LicensePlatePostProcessor(LicensePlateProcessingMixin, PostProcessorApi):
Returns:
None.
"""
# don't run LPR post processing for now
return
event_id = data["event_id"]
camera_name = data["camera"]

View File

@ -16,7 +16,7 @@ class DetectionApi(ABC):
@abstractmethod
def __init__(self, detector_config: BaseDetectorConfig):
self.detector_config = detector_config
self.thresh = 0.5
self.thresh = 0.4
self.height = detector_config.model.height
self.width = detector_config.model.width
@ -24,58 +24,21 @@ class DetectionApi(ABC):
def detect_raw(self, tensor_input):
pass
def post_process_yolonas(self, output):
"""
@param output: output of inference
expected shape: [np.array(1, N, 4), np.array(1, N, 80)]
where N depends on the input size e.g. N=2100 for 320x320 images
def calculate_grids_strides(self) -> None:
grids = []
expanded_strides = []
@return: best results: np.array(20, 6) where each row is
in this order (class_id, score, y1/height, x1/width, y2/height, x2/width)
"""
# decode and orient predictions
strides = [8, 16, 32]
hsizes = [self.height // stride for stride in strides]
wsizes = [self.width // stride for stride in strides]
N = output[0].shape[1]
for hsize, wsize, stride in zip(hsizes, wsizes, strides):
xv, yv = np.meshgrid(np.arange(wsize), np.arange(hsize))
grid = np.stack((xv, yv), 2).reshape(1, -1, 2)
grids.append(grid)
shape = grid.shape[:2]
expanded_strides.append(np.full((*shape, 1), stride))
boxes = output[0].reshape(N, 4)
scores = output[1].reshape(N, 80)
class_ids = np.argmax(scores, axis=1)
scores = scores[np.arange(N), class_ids]
args_best = np.argwhere(scores > self.thresh)[:, 0]
num_matches = len(args_best)
if num_matches == 0:
return np.zeros((20, 6), np.float32)
elif num_matches > 20:
args_best20 = np.argpartition(scores[args_best], -20)[-20:]
args_best = args_best[args_best20]
boxes = boxes[args_best]
class_ids = class_ids[args_best]
scores = scores[args_best]
boxes = np.transpose(
np.vstack(
(
boxes[:, 1] / self.height,
boxes[:, 0] / self.width,
boxes[:, 3] / self.height,
boxes[:, 2] / self.width,
)
)
)
results = np.hstack(
(class_ids[..., np.newaxis], scores[..., np.newaxis], boxes)
)
return np.resize(results, (20, 6))
def post_process(self, output):
if self.detector_config.model.model_type == ModelTypeEnum.yolonas:
return self.post_process_yolonas(output)
else:
raise ValueError(
f'Model type "{self.detector_config.model.model_type}" is currently not supported.'
)
self.grids = np.concatenate(grids, 1)
self.expanded_strides = np.concatenate(expanded_strides, 1)
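
For reference, the consolidated grid and stride calculation can be exercised outside the detector class. The sketch below is a standalone version of the method above, with `width` and `height` passed as arguments instead of read from `self` (an assumption made only so the example is self-contained).

```python
import numpy as np


def calculate_grids_strides(width: int, height: int, strides=(8, 16, 32)):
    """Build the YOLOX cell-offset grid and per-cell stride arrays."""
    grids = []
    expanded_strides = []

    for stride in strides:
        hsize, wsize = height // stride, width // stride
        # xv holds the column index and yv the row index of every cell
        xv, yv = np.meshgrid(np.arange(wsize), np.arange(hsize))
        grid = np.stack((xv, yv), 2).reshape(1, -1, 2)
        grids.append(grid)
        # one stride value per cell, shaped (1, cells, 1) for broadcasting
        expanded_strides.append(np.full((*grid.shape[:2], 1), stride))

    return np.concatenate(grids, 1), np.concatenate(expanded_strides, 1)


grids, expanded_strides = calculate_grids_strides(416, 416)
print(grids.shape, expanded_strides.shape)
```

For a 416x416 input the three strides give 52x52, 26x26, and 13x13 cells, so both arrays have 2704 + 676 + 169 = 3549 entries along the second axis.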

View File

@ -31,6 +31,7 @@ class InputTensorEnum(str, Enum):
class InputDTypeEnum(str, Enum):
float = "float"
float_denorm = "float_denorm" # non-normalized float
int = "int"

View File

@ -14,6 +14,7 @@ from frigate.util.model import (
post_process_dfine,
post_process_rfdetr,
post_process_yolo,
post_process_yolox,
)
logger = logging.getLogger(__name__)
@ -30,6 +31,8 @@ class ONNXDetector(DetectionApi):
type_key = DETECTOR_KEY
def __init__(self, detector_config: ONNXDetectorConfig):
super().__init__(detector_config)
try:
import onnxruntime as ort
@ -51,13 +54,14 @@ class ONNXDetector(DetectionApi):
path, providers=providers, provider_options=options
)
self.h = detector_config.model.height
self.w = detector_config.model.width
self.onnx_model_type = detector_config.model.model_type
self.onnx_model_px = detector_config.model.input_pixel_format
self.onnx_model_shape = detector_config.model.input_tensor
path = detector_config.model.path
if self.onnx_model_type == ModelTypeEnum.yolox:
self.calculate_grids_strides()
logger.info(f"ONNX: {path} loaded")
def detect_raw(self, tensor_input: np.ndarray):
@ -66,10 +70,12 @@ class ONNXDetector(DetectionApi):
None,
{
"images": tensor_input,
"orig_target_sizes": np.array([[self.h, self.w]], dtype=np.int64),
"orig_target_sizes": np.array(
[[self.height, self.width]], dtype=np.int64
),
},
)
return post_process_dfine(tensor_output, self.w, self.h)
return post_process_dfine(tensor_output, self.width, self.height)
model_input_name = self.model.get_inputs()[0].name
tensor_output = self.model.run(None, {model_input_name: tensor_input})
@ -91,14 +97,22 @@ class ONNXDetector(DetectionApi):
detections[i] = [
class_id,
confidence,
y_min / self.h,
x_min / self.w,
y_max / self.h,
x_max / self.w,
y_min / self.height,
x_min / self.width,
y_max / self.height,
x_max / self.width,
]
return detections
elif self.onnx_model_type == ModelTypeEnum.yologeneric:
return post_process_yolo(tensor_output, self.w, self.h)
return post_process_yolo(tensor_output, self.width, self.height)
elif self.onnx_model_type == ModelTypeEnum.yolox:
return post_process_yolox(
tensor_output[0],
self.width,
self.height,
self.grids,
self.expanded_strides,
)
else:
raise Exception(
f"{self.onnx_model_type} is currently not supported for onnx. See the docs for more info on supported models."

View File

@ -38,6 +38,7 @@ class OvDetector(DetectionApi):
]
def __init__(self, detector_config: OvDetectorConfig):
super().__init__(detector_config)
self.ov_core = ov.Core()
self.ov_model_type = detector_config.model.model_type
@ -133,25 +134,7 @@ class OvDetector(DetectionApi):
break
self.num_classes = tensor_shape[2] - 5
logger.info(f"YOLOX model has {self.num_classes} classes")
self.set_strides_grids()
def set_strides_grids(self):
grids = []
expanded_strides = []
strides = [8, 16, 32]
hsize_list = [self.h // stride for stride in strides]
wsize_list = [self.w // stride for stride in strides]
for hsize, wsize, stride in zip(hsize_list, wsize_list, strides):
xv, yv = np.meshgrid(np.arange(wsize), np.arange(hsize))
grid = np.stack((xv, yv), 2).reshape(1, -1, 2)
grids.append(grid)
shape = grid.shape[:2]
expanded_strides.append(np.full((*shape, 1), stride))
self.grids = np.concatenate(grids, 1)
self.expanded_strides = np.concatenate(expanded_strides, 1)
self.calculate_grids_strides()
## Takes in class ID, confidence score, and array of [x, y, w, h] that describes detection position,
## returns an array that's easily passable back to Frigate.

View File

@ -4,6 +4,7 @@ import re
import urllib.request
from typing import Literal
import numpy as np
from pydantic import Field
from frigate.const import MODEL_CACHE_DIR
@ -150,6 +151,62 @@ class Rknn(DetectionApi):
'Make sure to set the model input_tensor to "nhwc" in your config.'
)
def post_process_yolonas(self, output: list[np.ndarray]):
"""
@param output: output of inference
expected shape: [np.array(1, N, 4), np.array(1, N, 80)]
where N depends on the input size e.g. N=2100 for 320x320 images
@return: best results: np.array(20, 6) where each row is
in this order (class_id, score, y1/height, x1/width, y2/height, x2/width)
"""
N = output[0].shape[1]
boxes = output[0].reshape(N, 4)
scores = output[1].reshape(N, 80)
class_ids = np.argmax(scores, axis=1)
scores = scores[np.arange(N), class_ids]
args_best = np.argwhere(scores > self.thresh)[:, 0]
num_matches = len(args_best)
if num_matches == 0:
return np.zeros((20, 6), np.float32)
elif num_matches > 20:
args_best20 = np.argpartition(scores[args_best], -20)[-20:]
args_best = args_best[args_best20]
boxes = boxes[args_best]
class_ids = class_ids[args_best]
scores = scores[args_best]
boxes = np.transpose(
np.vstack(
(
boxes[:, 1] / self.height,
boxes[:, 0] / self.width,
boxes[:, 3] / self.height,
boxes[:, 2] / self.width,
)
)
)
results = np.hstack(
(class_ids[..., np.newaxis], scores[..., np.newaxis], boxes)
)
return np.resize(results, (20, 6))
def post_process(self, output):
if self.detector_config.model.model_type == ModelTypeEnum.yolonas:
return self.post_process_yolonas(output)
else:
raise ValueError(
f'Model type "{self.detector_config.model.model_type}" is currently not supported.'
)
def detect_raw(self, tensor_input):
output = self.rknn.inference(
[

View File

@ -77,6 +77,8 @@ class LocalObjectDetector(ObjectDetector):
if self.dtype == InputDTypeEnum.float:
tensor_input = tensor_input.astype(np.float32)
tensor_input /= 255
elif self.dtype == InputDTypeEnum.float_denorm:
tensor_input = tensor_input.astype(np.float32)
return self.detect_api.detect_raw(tensor_input=tensor_input)
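
The difference between the `float` and `float_denorm` input types can be seen in isolation with the sketch below. `prepare_input` is a hypothetical stand-in for the branch above and takes the dtype as a plain string rather than an `InputDTypeEnum` member.

```python
import numpy as np


def prepare_input(tensor_input: np.ndarray, dtype: str) -> np.ndarray:
    """Cast a uint8 frame tensor according to the configured input dtype."""
    if dtype == "float":
        # normalized float: values scaled into [0, 1]
        return tensor_input.astype(np.float32) / 255
    if dtype == "float_denorm":
        # non-normalized float: cast only, values stay in [0, 255]
        return tensor_input.astype(np.float32)
    # "int": pass the tensor through unchanged
    return tensor_input


frame = np.full((1, 3, 2, 2), 255, dtype=np.uint8)
print(prepare_input(frame, "float").max())         # 1.0
print(prepare_input(frame, "float_denorm").max())  # 255.0
```

This matches the YOLOx configuration shown in the docs change, which sets `input_dtype: float` together with `input_tensor: nchw_denorm`.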

View File

@ -138,11 +138,13 @@ class TrackedObject:
if not self.false_positive and has_valid_frame:
# determine if this frame is a better thumbnail
if self.thumbnail_data is None or is_better_thumbnail(
if self.thumbnail_data is None or (
better_thumb := is_better_thumbnail(
self.obj_data["label"],
self.thumbnail_data,
obj_data,
self.camera_config.frame_shape,
)
):
# use the current frame time if the object's frame time isn't in the frame cache
selected_frame_time = (
@ -150,6 +152,13 @@ class TrackedObject:
if obj_data["frame_time"] not in self.frame_cache.keys()
else obj_data["frame_time"]
)
if (
obj_data["frame_time"] not in self.frame_cache.keys()
and not better_thumb
):
logger.warning(
f"Frame time {obj_data['frame_time']} not in frame cache, using current frame time {selected_frame_time}"
)
self.thumbnail_data = {
"frame_time": selected_frame_time,
"box": obj_data["box"],

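One Python detail is worth keeping in mind when reading the hunk above: in `a or (name := f())`, the assignment expression is skipped whenever `a` is truthy, so `name` is bound only if it was assigned earlier. Whether `better_thumb` is initialized outside the lines shown here is not visible in this diff. The sketch below uses hypothetical names and pre-initializes the flag so that it is defined on both paths.

```python
def pick_thumbnail(thumbnail_data, candidate_is_better):
    """Decide whether to replace the thumbnail and report why.

    Hypothetical sketch: `better` is set to False up front because the
    walrus assignment below does not run when thumbnail_data is None.
    """
    better = False
    if thumbnail_data is None or (better := candidate_is_better()):
        return "replace", better
    return "keep", better


# No thumbnail yet: the comparison is skipped and the flag stays False.
print(pick_thumbnail(None, lambda: True))
# Existing thumbnail: the comparison runs and its result is captured.
print(pick_thumbnail({"frame_time": 1.0}, lambda: True))
```
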
View File

@ -148,27 +148,17 @@ def __post_process_multipart_yolo(
bw = ((dw * 2.0) ** 2) * anchor_w
bh = ((dh * 2.0) ** 2) * anchor_h
x1 = max(0, bx - bw / 2) / width
y1 = max(0, by - bh / 2) / height
x2 = min(width, bx + bw / 2) / width
y2 = min(height, by + bh / 2) / height
x1 = max(0, bx - bw / 2)
y1 = max(0, by - bh / 2)
x2 = min(width, bx + bw / 2)
y2 = min(height, by + bh / 2)
all_boxes.append([x1, y1, x2, y2])
all_scores.append(conf)
all_class_ids.append(class_id)
formatted_boxes = [
[
int(x1 * width),
int(y1 * height),
int((x2 - x1) * width),
int((y2 - y1) * height),
]
for x1, y1, x2, y2 in all_boxes
]
indices = cv2.dnn.NMSBoxes(
bboxes=formatted_boxes,
bboxes=all_boxes,
scores=all_scores,
score_threshold=0.4,
nms_threshold=0.4,
@ -181,13 +171,25 @@ def __post_process_multipart_yolo(
class_id = all_class_ids[idx]
conf = all_scores[idx]
x1, y1, x2, y2 = all_boxes[idx]
results[i] = [class_id, conf, y1, x1, y2, x2]
results[i] = [
class_id,
conf,
y1 / height,
x1 / width,
y2 / height,
x2 / width,
]
return np.array(results, dtype=np.float32)
def __post_process_nms_yolo(predictions: np.ndarray, width, height) -> np.ndarray:
predictions = np.squeeze(predictions).T
predictions = np.squeeze(predictions)
# transpose the output so it has order (inferences, class_ids)
if predictions.shape[0] < predictions.shape[1]:
predictions = predictions.T
scores = np.max(predictions[:, 4:], axis=1)
predictions = predictions[scores > 0.4, :]
scores = scores[scores > 0.4]
@ -195,9 +197,14 @@ def __post_process_nms_yolo(predictions: np.ndarray, width, height) -> np.ndarra
# Rescale box
boxes = predictions[:, :4]
boxes_xyxy = np.ones_like(boxes)
boxes_xyxy[:, 0] = boxes[:, 0] - boxes[:, 2] / 2
boxes_xyxy[:, 1] = boxes[:, 1] - boxes[:, 3] / 2
boxes_xyxy[:, 2] = boxes[:, 0] + boxes[:, 2] / 2
boxes_xyxy[:, 3] = boxes[:, 1] + boxes[:, 3] / 2
boxes = boxes_xyxy
input_shape = np.array([width, height, width, height])
boxes = np.divide(boxes, input_shape, dtype=np.float32)
# run NMS
indices = cv2.dnn.NMSBoxes(boxes, scores, score_threshold=0.4, nms_threshold=0.4)
detections = np.zeros((20, 6), np.float32)
for i, (bbox, confidence, class_id) in enumerate(
@ -209,10 +216,10 @@ def __post_process_nms_yolo(predictions: np.ndarray, width, height) -> np.ndarra
detections[i] = [
class_id,
confidence,
bbox[1] - bbox[3] / 2,
bbox[0] - bbox[2] / 2,
bbox[1] + bbox[3] / 2,
bbox[0] + bbox[2] / 2,
bbox[1] / height,
bbox[0] / width,
bbox[3] / height,
bbox[2] / width,
]
return detections
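
The two changes to `__post_process_nms_yolo` (orientation check, then converting center-format boxes to corners before NMS) can be sketched with NumPy alone. The function names below are hypothetical, and the 84-column layout (4 box values plus 80 class scores) is an assumption based on the COCO-80 labelmap mentioned in the docs.

```python
import numpy as np


def orient_predictions(predictions: np.ndarray) -> np.ndarray:
    """Return predictions as (inferences, values) regardless of export layout."""
    predictions = np.squeeze(predictions)
    # fewer rows than columns means the export is (values, inferences)
    if predictions.shape[0] < predictions.shape[1]:
        predictions = predictions.T
    return predictions


def xywh_to_xyxy(boxes: np.ndarray) -> np.ndarray:
    """Convert (center_x, center_y, width, height) rows to (x1, y1, x2, y2)."""
    boxes_xyxy = np.ones_like(boxes)
    boxes_xyxy[:, 0] = boxes[:, 0] - boxes[:, 2] / 2
    boxes_xyxy[:, 1] = boxes[:, 1] - boxes[:, 3] / 2
    boxes_xyxy[:, 2] = boxes[:, 0] + boxes[:, 2] / 2
    boxes_xyxy[:, 3] = boxes[:, 1] + boxes[:, 3] / 2
    return boxes_xyxy


# Both export layouts end up with one row per inference.
print(orient_predictions(np.zeros((1, 84, 8400))).shape)
print(orient_predictions(np.zeros((1, 8400, 84))).shape)
print(xywh_to_xyxy(np.array([[100.0, 100.0, 40.0, 20.0]])))
```

The shape heuristic relies on there being more inferences than values per inference, which holds for typical YOLO exports but is not guaranteed for every model.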
@ -225,6 +232,53 @@ def post_process_yolo(output: list[np.ndarray], width: int, height: int) -> np.n
return __post_process_nms_yolo(output[0], width, height)
def post_process_yolox(
predictions: np.ndarray,
width: int,
height: int,
grids: np.ndarray,
expanded_strides: np.ndarray,
) -> np.ndarray:
predictions[..., :2] = (predictions[..., :2] + grids) * expanded_strides
predictions[..., 2:4] = np.exp(predictions[..., 2:4]) * expanded_strides
# process organized predictions
predictions = predictions[0]
boxes = predictions[:, :4]
scores = predictions[:, 4:5] * predictions[:, 5:]
boxes_xyxy = np.ones_like(boxes)
boxes_xyxy[:, 0] = boxes[:, 0] - boxes[:, 2] / 2
boxes_xyxy[:, 1] = boxes[:, 1] - boxes[:, 3] / 2
boxes_xyxy[:, 2] = boxes[:, 0] + boxes[:, 2] / 2
boxes_xyxy[:, 3] = boxes[:, 1] + boxes[:, 3] / 2
cls_inds = scores.argmax(1)
scores = scores[np.arange(len(cls_inds)), cls_inds]
indices = cv2.dnn.NMSBoxes(
boxes_xyxy, scores, score_threshold=0.4, nms_threshold=0.4
)
detections = np.zeros((20, 6), np.float32)
for i, (bbox, confidence, class_id) in enumerate(
zip(boxes_xyxy[indices], scores[indices], cls_inds[indices])
):
if i == 20:
break
detections[i] = [
class_id,
confidence,
bbox[1] / height,
bbox[0] / width,
bbox[3] / height,
bbox[2] / width,
]
return detections
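
The decoding step at the top of `post_process_yolox` can be checked by hand on a single prediction. The sketch below is illustrative only: `decode_yolox` is a hypothetical name, it copies its input instead of modifying it in place as the function above does, and the one-cell grid and stride are made-up values.

```python
import numpy as np


def decode_yolox(
    predictions: np.ndarray, grids: np.ndarray, expanded_strides: np.ndarray
) -> np.ndarray:
    """Map raw YOLOX outputs from grid-cell units to input-image pixels."""
    decoded = predictions.astype(np.float32, copy=True)
    # center = (raw offset + cell index) * stride
    decoded[..., :2] = (decoded[..., :2] + grids) * expanded_strides
    # size = exp(raw size) * stride
    decoded[..., 2:4] = np.exp(decoded[..., 2:4]) * expanded_strides
    return decoded


# One prediction in cell (x=1, y=2) of the stride-8 grid:
# raw offsets 0.5, raw sizes 0, objectness 0.9, one class score 0.8.
raw = np.array([[[0.5, 0.5, 0.0, 0.0, 0.9, 0.8]]], dtype=np.float32)
cell = np.array([[[1, 2]]])
stride = np.array([[[8]]])

decoded = decode_yolox(raw, cell, stride)
final_score = decoded[0, :, 4:5] * decoded[0, :, 5:]  # objectness * class score
print(decoded[0, 0, :4], final_score)
```

Here the center works out to (1.5 * 8, 2.5 * 8) = (12, 20), the box is 8 pixels wide and tall, and the combined score is 0.9 * 0.8 = 0.72.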
### ONNX Utilities