mirror of
https://github.com/navidrome/navidrome.git
synced 2026-06-02 07:01:36 +00:00
* feat(transcoding): add MaxConcurrent and MaxConcurrentPerUser config

Introduce Transcoding.MaxConcurrent (default NumCPU()*2) and Transcoding.MaxConcurrentPerUser (default 3) to support upcoming concurrency limits on the streaming pipeline. No behavior change yet.

Refs #5246

* feat(transcoding): add TranscodeLimiter with global and per-user caps

Introduce a non-blocking limiter that gates concurrent transcodes. Returns ErrTooManyTranscodes immediately when the cap is reached so callers can translate it into a 429 response, rather than queuing requests. The per-user reservation is taken first to avoid burning a global slot that would only be rolled back when the per-user cap rejects the caller. Release is idempotent so wrapping the transcoder reader's Close is safe.

Refs #5246

* feat(transcoding): cap concurrent transcodes in media streamer

Acquire a TranscodeLimiter slot before spawning ffmpeg in the transcoding cache's read function, and release it when the resulting reader is closed. Raw streams and cache hits bypass the limiter so a single saturating client cannot block ordinary playback. When the cap is reached, ErrTooManyTranscodes bubbles up through cache.Get, ready for the HTTP layer to translate into a 429 response.

Refs #5246

* feat(transcoding): return HTTP 429 with Retry-After when transcode cap is hit

Map stream.ErrTooManyTranscodes to HTTP 429 in both the Subsonic API (/stream, /download) and the public share endpoint, including a 5s Retry-After hint. The Subsonic response still carries a failed-status envelope so clients that ignore HTTP codes also see the failure.

Refs #5246

* feat(transcoding): default MaxConcurrent to 0 (disabled)

Ship the limiter opt-in so existing installations are not affected by a behavior change on upgrade. Users hitting the DoS reported in #5246 can enable it by setting Transcoding.MaxConcurrent to a positive value (NumCPU()*2 is a reasonable starting point).
Refs #5246

* fix(transcoding): make global and per-user caps independent

Previously the limiter short-circuited to a no-op whenever MaxConcurrent was zero, silently ignoring a configured MaxConcurrentPerUser. Treat each cap independently so an operator can throttle per-user without enforcing a global ceiling (or vice versa), and only fall back to the no-op limiter when both caps are disabled.

* fix(archiver): abort archive download when the transcode limiter rejects

The album/artist/playlist zip writers were silently producing zip entries with headers but no data when ms.NewStream returned ErrTooManyTranscodes, because the per-file error was discarded by `_ = a.addFileToZip(...)`. The client received HTTP 200 with a corrupt zip and no indication that the server was rate-limited. Now the zip loop bails out as soon as it sees ErrTooManyTranscodes, and the Download handler swallows the error (the response status and Content-Disposition are already flushed by the time the limit is hit, so no 429 can be sent). The truncated zip surfaces the problem to the client; operators see a clear "transcode cap reached" warning in the server logs.

Refs #5246

* fix(transcoding): release limiter slot on client close, not ffmpeg EOF

Previously the slot was wrapped around the ffmpeg source reader, so it was only released by the cache's background copyAndClose goroutine when ffmpeg finished producing the file, meaning a client that disconnected after a single byte still held the slot for the full transcode duration. Under MaxConcurrent=N this serialized fresh requests behind abandoned encodes for minutes. Hand the release function back from the cache producer via the streamJob struct and wire it into the consumer-side Stream.Close. The HTTP handler already runs `defer stream.Close()`, so disconnect now frees the slot immediately.
Cache hits never enter the producer and still pay no slot, and singleflight waiters on the same key correctly inherit no release (only the original producer's job holds the slot).

Refs #5246

* fix(transcoding): skip per-user cap for anonymous requests

Public share viewers have no user in context, so userName(ctx) returned the literal string "UNKNOWN" and the limiter mapped every anonymous viewer to the same bucket. With MaxConcurrentPerUser=N, only N unrelated anonymous clients could stream a viral share at any time, the opposite of the fairness the per-user cap is meant to provide. Introduce a limiterKey(ctx) helper that returns "" for anonymous callers (userName(ctx) is unchanged for logs), and teach Acquire to skip the per-user reservation when the key is empty. The global cap is still enforced for anonymous traffic and remains the protection against runaway anonymous load.

Refs #5246

* refactor(transcoding): tidy limiter struct and centralize Retry-After

Per review feedback:

- Drop the redundant maxConcurrent field on transcodeLimiter; the channel capacity already enforces the global cap and the field was only used inside the constructor.
- Only allocate the perUser map when MaxConcurrentPerUser > 0.
- Move the Retry-After value into core/stream as RetryAfterSeconds so the Subsonic API and public-share handlers cannot drift if the window is later tuned.

* fix(transcoding): do not log limiter rejections as cache failures

NewStream was emitting an error-level "Error accessing transcoding cache" log whenever cache.Get returned anything non-nil, including the limiter's ErrTooManyTranscodes, even though the producer had already logged the rejection at warn level. The result was double logging and a misleading "cache failure" classification that buries real cache problems. Skip the error log when the cause is ErrTooManyTranscodes; the warn line from the producer is the canonical signal.
* fix(archiver): open stream before writing zip entry header

Per review: addFileToZip previously called z.CreateHeader before NewStream, so when the limiter rejected a transcode the zip already contained a 0-byte entry for that track. Open the source first and only write the header once the read side is ready; rejections now skip the entry entirely. The truncation comment in handleArchiveErr was also misleading: z.Close finalises the central directory, so the client receives a well-formed zip containing only the tracks written before the rejection, not a "truncated" archive. Reword to match reality.

* fix(transcoding): hold slot for ffmpeg lifetime, force cancellable ctx

The previous release-on-consumer-close design let a client open many unique transcodes, disconnect immediately, and still spawn the configured cap's worth of ffmpeg processes: the cache writer goroutine continued draining ffmpeg to disk after the client disappeared, defeating the DoS protection the limiter is meant to provide. Move the release back onto the source reader so the slot is freed only when ffmpeg actually exits (either EOF or context cancellation). To keep disconnects from leaking slots for the full transcode duration, force the request context into ffmpeg whenever the limiter is enabled, so that client disconnect cancels the process and frees the slot promptly. When the limiter is disabled, the legacy EnableTranscodingCancellation behavior is preserved unchanged.

Reported by codex and Copilot reviewers on #5522.