feat: GPU model lazy-load/unload lifecycle management

- Domain: add ModelState, ModelStateEvent, ModelNotReady, ManageModelLifecycle (in-port), ModelLoader and ModelStateEventBus (out-ports)
- Application: InMemoryModelStateEventBus; ModelLifecycleService with state machine (ReentrantLock), lazy load on first request, idle-timeout auto-unload (configurable via trueref.embedding.idle-timeout-seconds, default 300 s), job-guard (skips unload while ingestion is running), platform-thread CUDA executor
- Adapters: OnnxModelLoader wires embedder + reranker start/stop; remove @PostConstruct/@PreDestroy from OnnxEmbeddingService and OnnxRerankerService; requireStarted() now throws ModelNotReady instead of IllegalStateException
- REST: GET /api/model/status, POST /api/model/unload (409 when jobs are running, force=true to override), GET /api/model/status/stream (SSE)
- GlobalExceptionHandler: ModelNotReady -> 503 + Retry-After header
- HybridSearchService: calls lifecycle.ensureReady() before every search so both REST and MCP paths get ModelNotReady (-> 503 / MCP error) when unloaded
- TrueRefMcpTools: catches ModelNotReady, returns retry hint in MCP error text
- Tests: InMemoryModelStateEventBusTest, ModelLifecycleServiceTest (10 cases), OnnxModelLoaderTest, GlobalExceptionHandlerTest; all 41 tests green

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
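
The lifecycle described in the commit message (lock-guarded state machine, lazy load on first request, idle-timeout unload with a job guard) can be sketched roughly as follows. This is a minimal illustration, not the ModelLifecycleService from this commit: the class name `LifecycleSketch`, the three-state enum, the constructor parameters, and `unloadIfIdle(Instant)` are all assumptions made for the example. It also assumes that the first request triggers the load and is itself rejected with ModelNotReady, which is how the 503 + Retry-After behaviour reads.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Executor;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: names and signatures are assumptions, not the commit's code.
public class LifecycleSketch {

    enum ModelState { UNLOADED, LOADING, READY }

    /** Stand-in for the ModelLoader out-port. */
    interface ModelLoader {
        void load();
        void unload();
    }

    static class ModelNotReady extends RuntimeException {
        ModelNotReady(String message) { super(message); }
    }

    private final ReentrantLock lock = new ReentrantLock();
    private final ModelLoader loader;
    private final Executor loadExecutor;      // e.g. a dedicated platform thread
    private final BooleanSupplier jobsRunning; // job guard: true while ingestion runs
    private final Duration idleTimeout;
    private ModelState state = ModelState.UNLOADED;
    private Instant lastUsed = Instant.now();

    LifecycleSketch(ModelLoader loader, Executor loadExecutor,
                    BooleanSupplier jobsRunning, Duration idleTimeout) {
        this.loader = loader;
        this.loadExecutor = loadExecutor;
        this.jobsRunning = jobsRunning;
        this.idleTimeout = idleTimeout;
    }

    /** Returns normally only when READY; otherwise starts a load if needed and throws. */
    void ensureReady() {
        boolean startLoad = false;
        lock.lock();
        try {
            if (state == ModelState.READY) {
                lastUsed = Instant.now();
                return;
            }
            if (state == ModelState.UNLOADED) {
                state = ModelState.LOADING; // only one caller wins this transition
                startLoad = true;
            }
        } finally {
            lock.unlock();
        }
        if (startLoad) {
            loadExecutor.execute(this::loadModel); // submitted outside the lock
        }
        throw new ModelNotReady("model is loading, retry shortly");
    }

    private void loadModel() {
        boolean loaded = false;
        try {
            loader.load();
            loaded = true;
        } finally {
            lock.lock();
            try {
                // Fall back to UNLOADED if the load failed, so a later request can retry.
                state = loaded ? ModelState.READY : ModelState.UNLOADED;
                lastUsed = Instant.now();
            } finally {
                lock.unlock();
            }
        }
    }

    /** Unloads only if READY, idle for at least the timeout, and no job is running. */
    boolean unloadIfIdle(Instant now) {
        lock.lock();
        try {
            boolean idle = Duration.between(lastUsed, now).compareTo(idleTimeout) >= 0;
            if (state != ModelState.READY || !idle || jobsRunning.getAsBoolean()) {
                return false;
            }
            loader.unload();
            state = ModelState.UNLOADED;
            return true;
        } finally {
            lock.unlock();
        }
    }

    ModelState state() {
        lock.lock();
        try {
            return state;
        } finally {
            lock.unlock();
        }
    }
}
```

In this sketch a scheduler would call `unloadIfIdle(Instant.now())` periodically. Passing the timestamp in as a parameter keeps the idle check testable without sleeping.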
@@ -7,6 +7,7 @@ import com.trueref.application.observability.InMemoryJobEventBus;
 import com.trueref.application.observability.JobObservationService;
 import com.trueref.application.resolve.LibraryResolver;
 import com.trueref.application.search.HybridSearchService;
+import com.trueref.domain.port.in.ManageModelLifecycle;
 import com.trueref.domain.port.out.ChunkStore;
 import com.trueref.domain.port.out.CodeParser;
 import com.trueref.domain.port.out.EmbeddingCache;
@@ -76,10 +77,11 @@ public class ApplicationBeans {
             EmbeddingService embedder,
             RerankerService reranker,
             RepositoryStore repos,
+            ManageModelLifecycle lifecycle,
             @Value("${trueref.search.rrf-k:60}") int rrfK,
             @Value("${trueref.reranker.top-k:50}") int rerankTopK,
             @Value("${trueref.search.final-top-k:20}") int finalTopK) {
-        return new HybridSearchService(chunks, embedder, reranker, repos, rrfK, rerankTopK, finalTopK);
+        return new HybridSearchService(chunks, embedder, reranker, repos, lifecycle, rrfK, rerankTopK, finalTopK);
     }
 
     @Bean
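
The hunk above only injects the lifecycle port into HybridSearchService. The guard the commit message describes (ensureReady() before every search) might look roughly like the sketch below. `GuardedSearchSketch`, the `pipeline` function, and the local `ManageModelLifecycle` and `ModelNotReady` types are stand-ins invented for this example, not the project's actual classes.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of "check readiness first, then search"; not the real service.
public class GuardedSearchSketch {

    /** Stand-in for the in-port; only the method named in the commit message. */
    interface ManageModelLifecycle {
        void ensureReady();
    }

    static class ModelNotReady extends RuntimeException {
        ModelNotReady(String message) { super(message); }
    }

    private final ManageModelLifecycle lifecycle;
    // Placeholder for the embed / retrieve / rerank pipeline.
    private final Function<String, List<String>> pipeline;

    GuardedSearchSketch(ManageModelLifecycle lifecycle,
                        Function<String, List<String>> pipeline) {
        this.lifecycle = lifecycle;
        this.pipeline = pipeline;
    }

    List<String> search(String query) {
        // Throws ModelNotReady before any model-backed work is attempted,
        // so every caller (REST or MCP) sees the same failure mode.
        lifecycle.ensureReady();
        return pipeline.apply(query);
    }
}
```

Placing the check inside the service rather than in each controller means new entry points inherit the guard without extra wiring.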