how does `default_response_timeout` work? · pytorch/serve#2452

6 comments · 0 reactions · 0 assignees · Java · 790 forks

Labels: documentation, good first issue, triaged

Repository metrics

Stars: 3,844
PR merge metrics: no merged PRs in the last 30 days

Description

📚 The doc issue

I set `default_response_timeout` to 4, i.e. 4 seconds. Roughly 4 seconds after the model starts loading, this error is raised:

org.pytorch.serve.wlm.WorkerInitializationException: Backend worker did not respond in given time

My guess is that the worker gets killed because the model takes more than 4 seconds to load. Is there a way to set a larger initial delay, i.e. to differentiate these two scenarios:

1. Account for the initial model load with a timeout different from `default_response_timeout`.
2. If the model doesn't respond within `default_response_timeout` after the initial load, kill the worker.
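
To make the two scenarios concrete, here is a minimal Python sketch of the behavior being asked for. It is not TorchServe code: `TimeoutPolicy`, `check_deadline`, `WorkerTimeout`, and the `initial_load_timeout` field are all hypothetical names made up for illustration, and the 60-second load limit is an arbitrary example value.

```python
from dataclasses import dataclass


class WorkerTimeout(Exception):
    """Raised when a worker exceeds the deadline for its current phase."""


@dataclass
class TimeoutPolicy:
    # Hypothetical fields: the issue only mentions default_response_timeout;
    # initial_load_timeout is the separate setting being requested.
    initial_load_timeout: float  # seconds allowed for the first model load
    response_timeout: float      # seconds allowed per request after loading

    def limit_for(self, model_loaded: bool) -> float:
        """Pick the deadline that applies to the worker's current phase."""
        return self.response_timeout if model_loaded else self.initial_load_timeout


def check_deadline(policy: TimeoutPolicy, model_loaded: bool, elapsed: float) -> None:
    """Raise WorkerTimeout if `elapsed` seconds exceeds the phase's limit."""
    limit = policy.limit_for(model_loaded)
    if elapsed > limit:
        phase = "inference" if model_loaded else "model load"
        raise WorkerTimeout(f"{phase} took {elapsed:.1f}s, limit is {limit:.1f}s")


# A 10-second model load is tolerated, but a 10-second response is not.
policy = TimeoutPolicy(initial_load_timeout=60.0, response_timeout=4.0)
check_deadline(policy, model_loaded=False, elapsed=10.0)  # within the load limit
try:
    check_deadline(policy, model_loaded=True, elapsed=10.0)
except WorkerTimeout as err:
    print(f"worker would be killed: {err}")
```

With a single shared timeout, as described above, both calls would be checked against the same 4-second limit, which is why the slow initial load fails.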

Suggest a potential alternative/fix

No response

Contributor guide

Research direction: Check the issue comments for maintainer suggestions or workarounds. Look at the TorchServe source code, particularly the worker initialization code (the exception above comes from the Java package `org.pytorch.serve.wlm`), to understand how `default_response_timeout` is currently applied. Propose a separate configuration parameter, such as an initial load timeout, to distinguish model loading from subsequent request timeouts. Update the documentation to clarify this behavior.
Tech stack: Python, PyTorch, Java
Domain: documentation, backend
Issue type: Documentation
Difficulty: 2
Estimated time: 1-3 hours
Activity status: Stale
Clarity: Clear
Prerequisites: PyTorch Serve, configuration
Newbie friendliness: 80
