ZONE

Title

Description.

jama×faiss
×azure

把 Jama 需求库 转成 AI 能直接查的语义索引—— 离线侧每小时跑一次：Azure Pipeline 拉差量数据 → 用 sentence-transformers 的 all-MiniLM-L6-v2 模型（384 维、CPU 推理） 把每条 Item 转成向量 → 写进项目对应的 .faiss 文件 → 推到 Azure Artifacts。 在线侧用户在 Jama 页面里问一句，Azure OpenAI gpt-5 走 ReAct 循环（最多 3 轮）自己挑工具——做 FAISS 关键词检索、或顺着 Jama 关系图抓子项 / 上下游 / 测试记录 / 评论——直到觉得够了再生成最终回答，逐字流回对话面板。 Turn the Jama requirements library into a semantic index AI can query directly. Offline, every hour: Azure Pipeline pulls a diff → the sentence-transformers all-MiniLM-L6-v2 model (384-dim, CPU) turns each Item into a vector → written into that project's .faiss → pushed to Azure Artifacts. Online, the user asks a question inside Jama; Azure OpenAI gpt-5 drives a ReAct loop (≤3 rounds) where the model itself picks tools — FAISS keyword search, or walking the Jama relationship graph for children / upstream-downstream / test runs / comments — until it has enough, then streams the final answer back token-by-token into the chat panel.

8层strata architectural strata

1h incremental cadence

2flows offline · online

3fallbacks graceful degradation

点击图中任意节点查看其职责与数据形态；切换 §02 / §03 / §04 标签深入八层架构、流程拆解与关键机制。 Click any node in the diagram to see its role and data shape; switch §02 / §03 / §04 to dive into the eight strata, the walkthroughs, and the design mechanisms.

§ 02 · Strata

八层体系架构 Eight Strata

L·01

Data Source数据采集层acquisitionSOURCE

全量 / 增量拉取 Jama 中的需求、用例、任务等 Item 元数据；首次全量快照，之后基于更新时间戳与版本号做差量。Full and incremental pull of Jama Items — requirements, test cases, tasks. First a full snapshot, then differential by update timestamp and version.

Jama API同步脚本sync scriptOAuth Token

L·02

Code & Data Repository代码仓储层repositorySTORE

原始 Jama CSV、同步脚本与向量化脚本均托管于 automation 组的 Git 仓库——版本化数据归档与流水线代码源合一。Raw Jama CSV, sync and vectorization scripts all live in the automation team's Git repo — versioned data archive and pipeline source-of-truth in one.

auto_ansible GitCSVscripts

L·03

Orchestration调度编排层orchestrationPIPE

每小时定时触发，串行执行 fetch → merge → embed → publish 全链路；任一环节失败保留上一版可用产物。Triggered every hour, runs the full fetch → merge → embed → publish chain serially; on failure, the previous good artifact is preserved.

Azure Pipelinecron · 1h

L·04

Vectorization向量处理层embeddingCOMPUTE

从每条 Item 抽出标题、描述、备注等核心文本，拼成一段字符串，喂给 sentence-transformers/all-MiniLM-L6-v2 模型，得到一串 384 维浮点向量——这串数字就是"这条 Item 的语义指纹"，意思越接近的两条 Item，向量越像。模型是开源轻量级（~80MB），跑在 Azure VM 的 CPU 上，不调任何外部 API、不打外网。向量按 item_id 追加进项目对应的 .faiss 文件——只动新增 / 修改过的条目，不重做全量。From each Item, the title, description, and notes are concatenated into one string and fed to sentence-transformers/all-MiniLM-L6-v2, producing a 384-dim float vector — the "semantic fingerprint" of that Item. Items with similar meaning have similar vectors. The model is open-source and lightweight (~80MB), runs on the Azure VM's CPU, makes no external API calls. Vectors are appended by item_id into the project's .faiss file — only added or modified Items are touched; nothing is recomputed.

all-MiniLM-L6-v2384 dimFAISSCPU · no GPUid ↔ vector

L·05

Artifacts Repository制品仓库层artifactsSTORE

FAISS 索引 + ID 映射元数据按时间 / 版本号打标，统一归档于 Azure Artifacts；运行端按需拉取，支持回滚。FAISS index + ID-mapping metadata are tagged by time and version, archived in Azure Artifacts; runtime pulls on demand, with rollback supported.

Azure ArtifactsSemVerimmutable

L·06

Retrieval Service检索服务层retrievalRUNTIME

常驻 FastAPI 服务（Azure VM, 端口 8080），启动时自动检查本地 FAISS 是否缺失/落后 → 缺则去 Azure Artifacts 拉最新版 → 全部加载进内存。对外暴露两类端点：
① 检索类——/api/search（单 query，给 AI Search 模式用）、/api/search_multi（一次接 3-5 个关键短语，合并成一个候选池而不是各搜各的，再用 Cohere-rerank-v4.0-pro 拿用户原问题整段打分——这是 ReAct 主路径）。
② 图导航类——/api/jama/{item, children, relationships, testruns, comments}，代理 Jama 原生 API，统一软处理 404（让 LLM 看到 total:0 自然换工具，而不是报错）。Long-running FastAPI service (Azure VM, port 8080). On start, self-checks local FAISS — pulls from Azure Artifacts if missing/stale, loads everything into memory. Exposes two endpoint families:
① Retrieval — /api/search (single query, used by AI Search mode); /api/search_multi takes 3-5 phrases at once and merges hits into one pool instead of querying each phrase separately, then Cohere-rerank-v4.0-pro reranks against the user's original question as a whole (this is the ReAct main path).
② Graph navigation — /api/jama/{item, children, relationships, testruns, comments}, proxies Jama's native API with uniform soft-404 handling (LLM sees total:0 and naturally tries another tool instead of getting an error).

FastAPI · :8080/search_multiCohere-rerank-v4.0-pro/jama/* graph nav公网可达 *public net *

L·07

Frontend Integration前端集成层frontendFRONT

Manifest V3 Edge 扩展，只注入到 jabra.jamacloud.com/*。content.js 往 Jama 顶栏塞一个"Jama AI"按钮；点开后 chat.js 注入右侧栏（两种模式 tab：AI Agent 走 ReAct，AI Search 直接出 FAISS 命中）；ReAct 主循环 runAgentLoop 写在 api.js 里，直接在 content script 里发 SSE 调 gpt-5——绕开 service worker，避免 MV3 那个 30 秒空闲超时把流式中断。同一条 Jama item 的详情拉取也在前端发起。Manifest V3 Edge extension, injected only into jabra.jamacloud.com/*. content.js drops a "Jama AI" button into Jama's top nav; clicking it has chat.js mount the right-side panel (two tabs: AI Agent uses ReAct, AI Search shows raw FAISS hits). The ReAct main loop runAgentLoop lives in api.js and streams SSE to gpt-5 directly from the content script — bypassing the service worker to dodge MV3's 30-second idle timeout that would otherwise kill the stream. Per-Item detail fetches are also issued from the frontend.

Manifest V3content.js / chat.js / api.jsrunAgentLoopSSE 绕开 SWSSE bypasses SW

L·08

AI ReasoningAI 推理层reasoningAI

Azure OpenAI 企业租户内的 gpt-5，承担一条用户问题的全部推理工作——既负责"想下一步该查什么"，也负责"看完所有材料后怎么回答"。具体走 ReAct 循环（最多 3 轮）：每轮 LLM 看到当前 messages + 7 个可用工具的说明 → 自己挑要调哪些工具（同一轮可并行调多个）→ 工具结果以 role:'tool' 灌回 → 进入下一轮。直到 LLM 自己调 finish_answer 工具，第二段 SSE 流出最终回答。跨轮自动去重已展示过的 item id（这样它可以放心地"同样的关键词、更大的 top_k"扩展搜索）。gpt-5, hosted in Azure OpenAI's enterprise tenant, does all the reasoning for one user question — both "what should I look up next" and "now that I've seen everything, how do I answer". It runs a ReAct loop (≤3 rounds): each round the LLM sees the current messages + 7 tool definitions → picks which tool(s) to call (parallel calls allowed within one round) → tool results come back as role:'tool' messages → next round begins. When the LLM calls finish_answer, a second SSE stream produces the final answer. Already-emitted item IDs are auto-deduped across rounds, so the model can safely "re-search the same phrase with a larger top_k" to expand without seeing duplicates.

Azure OpenAI gpt-5单模型 · 决策+终答one model · decisions+answer7 tools≤3 rounds跨轮 item 去重cross-round dedup

§ 03 · Walkthrough

逐步展开流程 Step through each flow

ModeOFFLINE · 每小时增量hourly incremental 5 stages

触发调度Trigger

Azure Pipeline 定时触发器拉起任务。Azure Pipeline's scheduled trigger fires the job.

每小时整点流水线被自动唤起，运行环境与凭据由 Azure 托管，无需人工介入。The pipeline wakes itself on the hour; the runtime environment and credentials are Azure-managed — no human intervention needed.

增量数据拉取Incremental fetch

基于更新时间戳 / 版本号差量同步。Diff-sync by update timestamp / version.

调用 Jama API，仅拉取上次成功同步以来发生新增、修改、删除的 Item，保留变更类型标记。Calls the Jama API, fetches only Items added / modified / deleted since the last successful sync, with the change-type flag preserved.

CSV 数据更新CSV update

合并 → 去重 → 提交至 auto_ansible。Merge → dedupe → commit to auto_ansible.

增量数据与历史 CSV 合并、去重、变更标记，自动提交进入 Git 版本流——CSV 既是数据源也是审计追踪。Incremental rows are merged with the historical CSV, deduped, and tagged, then auto-committed into the Git stream — CSV is both data source and audit trail.

批量文本向量化Batch vectorization

all-MiniLM-L6-v2（384 维、CPU）增量写入 FAISS。all-MiniLM-L6-v2 (384-dim, CPU) writes incrementally into FAISS.

抽取标题 / 描述 / 备注等核心文本，拼成字符串后过 sentence-transformers/all-MiniLM-L6-v2 模型得到 384 维向量，按 item_id 写回项目对应的 .faiss——只动变更条目。模型本身~80MB、跑在 Azure VM 的 CPU 上，不打外网。Title / description / notes are concatenated and fed to sentence-transformers/all-MiniLM-L6-v2 to produce a 384-dim vector, written back into the project's .faiss by item_id — only changed Items are touched. The model itself is ~80MB and runs on the Azure VM's CPU; no external calls.

向量制品打包上传Publish artifact

FAISS + ID 映射推至 Azure Artifacts。FAISS + ID map pushed to Azure Artifacts.

打版本标 → 推送至 Artifacts → 保留历史版本以支持回滚；记录条数日志，异常自动告警，下游零感知。Version-tag → push to Artifacts → keep history for rollback; row counts logged, exceptions alert automatically, downstream stays unaware.

ModeONLINE · 一次提问的 ReAct 全过程one question, end-to-end ReAct 7 hops

用户在 Jama 页面里发问User asks inside Jama

点 Jama 顶栏的"Jama AI"按钮 → 右侧栏展开 → 输入问题。Clicks "Jama AI" in Jama's top nav → side panel opens → types a question.

扩展早就把按钮注入到 Jama 顶栏了。chat.js 把当前 Item ID、用户问题、过去几轮对话历史打包，交给 api.js 的 runAgentLoop。The extension dropped the button into Jama's top nav at load time. chat.js packages the current Item ID, the user's question, and recent chat history, and hands it to runAgentLoop in api.js.

第 1 轮 ReAct：LLM 思考 + 挑工具ReAct round 1: think + pick tools

gpt-5 边吐"思考"文本，边吐 tool_calls。gpt-5 streams "thought" text plus tool_calls.

content script 直接 fetch 走 SSE 调 gpt-5（绕开 service worker，避免 MV3 30 秒空闲超时把流卡断）。LLM 看到 7 个工具的描述后，先吐一段思考（UI 显示为 🧠 Thought 气泡），再吐一个或多个 tool_calls——同一轮里可以并行调多个工具。The content script fetches gpt-5 over SSE directly (bypassing the service worker, which has a 30-second idle timeout that would kill streaming). The LLM sees 7 tool descriptions, streams a thought segment first (rendered as a 🧠 Thought bubble), then one or more tool_calls — multiple tools can be invoked in parallel within a round.

工具调用：FAISS 检索 / Jama 图导航Tools: FAISS search / Jama graph nav

search_jama 走 /api/search_multi；图导航工具走 /api/jama/*。search_jama hits /api/search_multi; graph-nav tools hit /api/jama/*.

如果 LLM 选了 search_jama：扩展把 3-5 个英文关键短语打包成 HTTPS 请求送到 /api/search_multi。后端把每个短语分别过 all-MiniLM-L6-v2 转 384 维向量、各自做 FAISS 召回，合并去重成单一候选池（≤80），对这个池只调一次 Cohere-rerank-v4.0-pro 按用户原问题整段打分，取 top_k 返回。如果 LLM 选了图导航工具（如 get_item_testruns），则走 /api/jama/testruns，后端代理 Jama 原生 API，限速 + 软处理 404。该轮可能并行触发多个工具，前端用 Promise.all 一起发。If the LLM chose search_jama: the extension sends 3-5 English keyword phrases to /api/search_multi. The backend embeds each phrase separately via all-MiniLM-L6-v2 into a 384-dim vector, runs FAISS recall for each, merges hits into one deduped candidate pool (≤80), runs Cohere-rerank-v4.0-pro once against the user's original full question, and returns the top_k. If the LLM chose a graph-nav tool (e.g. get_item_testruns), it goes to /api/jama/testruns — the backend proxies Jama's native API with rate-limit + soft-404 handling. Multiple tools can fire in parallel in one round; the frontend uses Promise.all.

索引自检 → FAISS 命中 → 拉详情Self-check → FAISS hit → fetch details

向量服务命中 item id 后，由后端顺手把每条 Jama 详情也拉了。After FAISS hits, the backend itself fetches each Item's full details.

本地 .faiss 缺失 / 落后 → 从 Azure Artifacts 拉新版加载入内存；命中得到 item id 列表后，后端代用户的 OAuth token 去 Jama 拉每条详情（缓存在 jama_item_detail/），把 documentKey / fields / parents 等都填齐再交给 rerank。和老版本"前端单独拉详情"的设计不一样了——现在 rerank 用的就是带详情的候选条目。If the local .faiss is missing or stale → pull the latest from Azure Artifacts and load into memory; once item IDs come back, the backend itself fetches each Item's full details via the OAuth token (cached in jama_item_detail/) — filling in documentKey, fields, parents, etc. before handing the pool to rerank. This differs from the older "frontend fetches details separately" design: now rerank operates on enriched items.

第 2-3 轮 ReAct：观察 → 决策Rounds 2-3: observe → decide

工具结果灌回 LLM，它继续挑工具，或调 finish_answer 收尾。Tool results feed back; the LLM picks more tools, or calls finish_answer.

工具结果以 role:'tool' 消息追加到 messages 数组，新一轮 SSE 重发整段 messages。LLM 可能继续挖（"看到 testcase 失败了，再查相关 bug"），也可能觉得材料够了直接调 finish_answer。已经露给 LLM 的 item id 跨轮自动去重——它可以放心地"同样的关键词加大 top_k"扩展搜索，不会看到重复条目。硬上限 3 轮；第 4 轮强制 tool_choice:'none' 收尾。Tool results are appended as role:'tool' messages; a new SSE round re-sends the whole messages array. The LLM might keep digging ("the testcase failed — let me check related bugs") or decide it has enough and call finish_answer. Already-emitted item IDs are auto-deduped across rounds, so the model can safely "re-search the same phrase with a bigger top_k" without seeing duplicates. Hard cap: 3 rounds; round 4 forces tool_choice:'none'.

finish_answer → 第二段 SSE 流出终答finish_answer → second SSE stream

同一个 gpt-5，关掉工具，逐字吐答案。Same gpt-5, tools off, streams tokens.

LLM 调 finish_answer 后，前端立刻开第二段 SSE，tool_choice:'none'，messages 仍然带着所有工具观察结果——gpt-5 这次只输出 delta.content，不再选工具。前端把 delta 按 token 渲染进对话面板（蓝色 ✅ Final 卡片），同时高亮所有被引用过的 documentKey 作为来源徽标。After finish_answer, the frontend opens a second SSE stream with tool_choice:'none'; messages still carry all tool observations. gpt-5 now only streams delta.content, no more tool calls. The frontend renders deltas token-by-token into the chat panel (the blue ✅ Final card) and highlights every referenced documentKey as a source badge.

UI 步骤气泡 + 日志落盘UI step bubbles + log persistence

每个工具结果对应一个彩色气泡；调试日志后台落 jsonl。Every tool result becomes a colored bubble; debug logs land in jsonl on the backend.

整个过程中 UI 会渐次冒出彩色气泡：🧠 Thought / 🔎 Search / 🌳 Children / 🔗 Relationships / 🧪 TestRuns / 📄 Detail / 💬 Comments / ✅ Final——分别对应 7 个工具的执行轨迹。每条会话的完整 ReAct 轨迹（messages / tool_calls / observation / 时延）会被 fire-and-forget POST 到 /api/log，按天落到 Jama_backend/logs/react_YYYY-MM-DD.jsonl，供事后复盘和调优。Throughout the run, colored bubbles surface in the UI: 🧠 Thought / 🔎 Search / 🌳 Children / 🔗 Relationships / 🧪 TestRuns / 📄 Detail / 💬 Comments / ✅ Final — one per tool execution. The full ReAct trace (messages / tool_calls / observation / latencies) is also fire-and-forget POSTed to /api/log and persisted per day in Jama_backend/logs/react_YYYY-MM-DD.jsonl for post-hoc review and tuning.

§ 04 · Design Mechanisms

四组关键机制 Four design mechanisms

M·01

FAISS 文件版本化托管 Version-pinned FAISS storage

向量文件统一托管 Azure Artifacts，做版本化管理 + 回滚；VM 端检索服务开机自启 + 自动校验，缺失 / 落后则按需拉取。增量向量化只更新变更 Item。 Vector files are centrally stored in Azure Artifacts with versioning and rollback; the VM-side retrieval service starts on boot and self-validates — pulling on demand when missing or stale. Incremental vectorization only touches changed Items.

versionedself-healingincremental

M·02

数据链路双向隔离 Bidirectional path isolation

离线层：Git 存原始 CSV，Artifacts 存向量文件，业务原始数据与 AI 向量数据存储分离。在线层：检索只返回 Item ID，详情由前端直连 Jama API，向量服务零详情压力。 Offline: Git holds raw CSV, Artifacts holds vector files — raw business data and AI vectors are stored separately. Online: retrieval returns only Item IDs, details flow from the frontend straight to Jama's API — the vector service carries zero detail-fetch load.

storage isolationthin retrievalfrontend fan-out

M·03 · 校正M·03 · CORRECTION

关于权限与安全边界—— ~~Azure 虚拟机走内网~~ Azure 虚拟机走公网，向量检索 API 由公网可达；Azure Pipeline 与 Azure Artifacts 仍走内网私有链路，Azure OpenAI gpt-5 部署在企业自己的 tenant 里，不对外暴露——所有 LLM 流量都从公网 VM 走 HTTPS 进企业 endpoint，问题和 Jama 数据不出公司账户范围。 On security boundaries — ~~the Azure VM uses a private network~~ the Azure VM is on the public internet, the vector retrieval API is publicly reachable; Azure Pipeline and Azure Artifacts remain on private internal links; Azure OpenAI gpt-5 is deployed inside the company's own tenant and not exposed externally — all LLM traffic goes from the public VM over HTTPS to the enterprise endpoint, so questions and Jama data never leave the corporate account scope.

M·04

多级容错兜底 Multi-tier fallback

① 流水线增量拉取失败 → 保留上一版 CSV / 向量文件，在线检索不受影响。② 向量服务拉 Artifacts 失败 → 降级用本地旧版 FAISS。③ Cohere rerank 调用失败 → 自动退化成"按 FAISS 向量分数排序"返回（在 cohere_rerank() 里有兜底）。④ Jama 图导航子资源 404（比如这条 itemType 没有 testruns）→ 后端软处理为 {total:0, items:[]}，LLM 自然换工具，不报错也不中断流程。⑤ ReAct 第 3 轮还没收尾 → 第 4 轮强制 tool_choice:'none'，让 LLM 拿现有材料生成回答，避免无限循环。 ① Pipeline incremental fetch fails → previous CSV / vector files are kept; online retrieval is unaffected. ② Vector service can't pull Artifacts → falls back to the local old FAISS. ③ Cohere rerank call fails → auto-degrades to "rank by raw FAISS similarity score" (built into cohere_rerank()). ④ Jama graph-nav sub-resource returns 404 (e.g. this itemType has no testruns) → backend soft-normalizes to {total:0, items:[]}; the LLM naturally tries another tool, no error, no flow interruption. ⑤ ReAct still hasn't finished by round 3 → round 4 forces tool_choice:'none', the LLM must produce an answer from what it has — preventing infinite loops.

previous-goodlocal-fallbackrerank-bypasssoft-404round-cap

Title

jama×faiss ×azure

八层 体系架构 Eight Strata

逐步 展开 流程 Step through each flow

四组 关键机制 Four design mechanisms

FAISS 文件 版本化托管 Version-pinned FAISS storage

数据链路 双向隔离 Bidirectional path isolation

多级 容错兜底 Multi-tier fallback

jama×faiss
×azure

八层体系架构 Eight Strata

逐步展开流程 Step through each flow

四组关键机制 Four design mechanisms

FAISS 文件版本化托管 Version-pinned FAISS storage

数据链路双向隔离 Bidirectional path isolation

多级容错兜底 Multi-tier fallback