# AstroResearch Database Schema

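AstroResearch uses **SQLite**, a lightweight, zero-configuration database, for persistent storage. By default the database file is `astro_research.db` in the project root (configurable through `DATABASE_URL` in `.env`). The Rust `sqlx` driver manages the database and runs migrations automatically.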

---

## 1. Entity-Relationship Diagram

```mermaid
erDiagram
    PAPERS {
        text bibcode PK
        text title
        text authors "JSON Array"
        text year
        text pub "Journal/Publisher"
        text keywords "JSON Array"
        text abstract
        text doi
        text arxiv_id
        integer citation_count
        integer reference_count
        text doctype "Document type"
        text pdf_path "PDF file path or error:diagnostic"
        text html_path "HTML file path or error:diagnostic"
        text markdown_path "Markdown file path"
        text translation_path "Translation file path"
        datetime created_at
    }

    NOTES {
        integer id PK
        text bibcode FK
        integer paragraph_index
        text note_text
        text highlight_color
        text selected_text
        datetime created_at
    }

    CITATIONS_REFERENCES {
        text source_bibcode PK
        text target_bibcode PK
    }

    SYNC_QUERIES {
        integer id PK
        text query "Search keywords"
        text source "Data source"
        integer limit_count "Fetch limit"
        datetime last_run "Last run time"
        datetime created_at "Creation time"
        unique query_source_limit "Composite UNIQUE(query, source, limit_count) constraint"
    }

    PAPERS ||--o{ NOTES : "has"
    PAPERS ||--o{ CITATIONS_REFERENCES : "cites / cited_by"
```

---

## 2. Table Schema Details

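### 2.1 papers Table (Paper Metadata)
Stores each paper's core metadata and the paths of its locally stored files.
- **Special fields**:
  - `pdf_path` / `html_path`: normally hold a relative path (e.g. `library/PDF/2024arXiv.pdf`). When a download fails, they instead hold diagnostic text prefixed with `error:` (e.g. `error:Cloudflare 拦截`, meaning "blocked by Cloudflare"). The special value `error:no_resource` means the user has manually marked the paper as having no usable full-text resource.
  - `doctype`: document type identifier, such as `article`, `eprint`, `proceedings`, `phdthesis`, `catalog`, `software`, `circular`, or `book`.
- **Indexes**:
  - `idx_papers_doi` -> on `doi`
  - `idx_papers_arxiv_id` -> on `arxiv_id`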
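
The column list and indexes above can be tried out against an in-memory SQLite database. The following is only a sketch: the DDL is reconstructed from the ER diagram and is not the project's actual migration, and the `DEFAULT CURRENT_TIMESTAMP` clause, the sample bibcode, and the sample DOI are assumptions made for illustration. It shows how a JSON-array column such as `authors` round-trips and how a paper is looked up by `doi`.

```python
import json
import sqlite3

# Hypothetical DDL reconstructed from the ER diagram; the real migration may differ.
PAPERS_DDL = """
CREATE TABLE papers (
    bibcode          TEXT PRIMARY KEY,
    title            TEXT,
    authors          TEXT,      -- JSON array
    year             TEXT,
    pub              TEXT,      -- journal / publisher
    keywords         TEXT,      -- JSON array
    abstract         TEXT,
    doi              TEXT,
    arxiv_id         TEXT,
    citation_count   INTEGER,
    reference_count  INTEGER,
    doctype          TEXT,
    pdf_path         TEXT,
    html_path        TEXT,
    markdown_path    TEXT,
    translation_path TEXT,
    created_at       DATETIME DEFAULT CURRENT_TIMESTAMP  -- assumed default
);
CREATE INDEX idx_papers_doi ON papers(doi);
CREATE INDEX idx_papers_arxiv_id ON papers(arxiv_id);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(PAPERS_DDL)

# Placeholder values; authors is serialized to a JSON array before storage.
conn.execute(
    "INSERT INTO papers (bibcode, title, authors, doi, doctype) VALUES (?, ?, ?, ?, ?)",
    ("2024Test..001A", "An example paper",
     json.dumps(["Doe, J.", "Roe, R."]), "10.0000/example", "article"),
)

# Look the paper up by DOI and decode the JSON column again.
row = conn.execute(
    "SELECT bibcode, authors FROM papers WHERE doi = ?", ("10.0000/example",)
).fetchone()
authors = json.loads(row[1])
```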

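### 2.2 citations_references Table (Citation and Reference Topology)
A many-to-many join table that stores the citation network between papers (the underlying data for the topology "galaxy" graph).
- **Composite primary key**: `(source_bibcode, target_bibcode)`
- **Indexes**:
  - `idx_citations_ref_source` -> speeds up reference lookups by `source_bibcode`
  - `idx_citations_ref_target` -> speeds up cited-by lookups by `target_bibcode`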
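
As an illustration of the two lookup directions, here is a minimal sketch against an in-memory database. It assumes that a row `(source, target)` means "source cites target", which matches the index descriptions above but is not confirmed by the migration itself. The `NOT NULL` clauses, the helper names `references_of` and `cited_by`, and the one-letter bibcodes are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE citations_references (
    source_bibcode TEXT NOT NULL,
    target_bibcode TEXT NOT NULL,
    PRIMARY KEY (source_bibcode, target_bibcode)
);
CREATE INDEX idx_citations_ref_source ON citations_references(source_bibcode);
CREATE INDEX idx_citations_ref_target ON citations_references(target_bibcode);
""")

# Assumed edge direction: source cites target. Bibcodes are placeholders.
edges = [("A", "B"), ("A", "C"), ("D", "A")]
conn.executemany("INSERT INTO citations_references VALUES (?, ?)", edges)


def references_of(bibcode):
    """Papers that `bibcode` cites (lookup by source_bibcode)."""
    rows = conn.execute(
        "SELECT target_bibcode FROM citations_references "
        "WHERE source_bibcode = ? ORDER BY target_bibcode",
        (bibcode,),
    )
    return [r[0] for r in rows]


def cited_by(bibcode):
    """Papers that cite `bibcode` (lookup by target_bibcode)."""
    rows = conn.execute(
        "SELECT source_bibcode FROM citations_references "
        "WHERE target_bibcode = ? ORDER BY source_bibcode",
        (bibcode,),
    )
    return [r[0] for r in rows]
```

With the sample edges, `references_of("A")` returns the two papers that A cites, and `cited_by("A")` returns the single paper that cites A.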

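### 2.3 notes Table (Highlights and Reading Notes)
Stores the highlights and notes that researchers create on specific paragraphs in the reader.
- **Foreign key**: `bibcode`, with cascading delete (`ON DELETE CASCADE`).
- **Indexes**:
  - `idx_notes_bibcode` -> speeds up listing the notes for a single paper.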
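
The cascade can be checked with a short sketch. SQLite only enforces foreign keys on connections that have turned enforcement on, so the example sets `PRAGMA foreign_keys = ON` first. The trimmed `papers` table, the `AUTOINCREMENT` and `NOT NULL` clauses, and the sample values are assumptions for illustration, not the project's migration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign-key enforcement off unless the connection enables it.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE papers (bibcode TEXT PRIMARY KEY, title TEXT);  -- trimmed for the demo
CREATE TABLE notes (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    bibcode         TEXT NOT NULL REFERENCES papers(bibcode) ON DELETE CASCADE,
    paragraph_index INTEGER,
    note_text       TEXT,
    highlight_color TEXT,
    selected_text   TEXT,
    created_at      DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_notes_bibcode ON notes(bibcode);
""")

conn.execute("INSERT INTO papers VALUES (?, ?)", ("2024Test..001A", "Example"))
conn.execute(
    "INSERT INTO notes (bibcode, paragraph_index, note_text, highlight_color) "
    "VALUES (?, ?, ?, ?)",
    ("2024Test..001A", 3, "Check this derivation", "yellow"),
)
notes_before = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]

# Deleting the paper removes its notes through ON DELETE CASCADE.
conn.execute("DELETE FROM papers WHERE bibcode = ?", ("2024Test..001A",))
notes_after = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
```

Without the `PRAGMA`, the `DELETE` would still succeed but the note would be left behind as an orphan row.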

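### 2.4 sync_queries Table (Saved Sync Searches)
Stores the batch-sync search criteria that users save, so a sync can be rerun quickly.
- **Unique constraint**: `UNIQUE(query, source, limit_count)` ensures that a search with identical criteria is not saved twice.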
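
One way to rely on this constraint when saving a search is `INSERT OR IGNORE`, sketched below. The helper `save_query`, the `NOT NULL` clauses, and the sample source value `"ads"` are assumptions. The application may handle duplicates differently, for example by catching the constraint error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sync_queries (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    query       TEXT NOT NULL,
    source      TEXT NOT NULL,
    limit_count INTEGER NOT NULL,
    last_run    DATETIME,
    created_at  DATETIME DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (query, source, limit_count)
);
""")


def save_query(query, source, limit_count):
    """Insert the search if it is new; return True when a row was added."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO sync_queries (query, source, limit_count) "
        "VALUES (?, ?, ?)",
        (query, source, limit_count),
    )
    return cur.rowcount == 1


first = save_query("exoplanet atmospheres", "ads", 50)
duplicate = save_query("exoplanet atmospheres", "ads", 50)         # same triple
different_limit = save_query("exoplanet atmospheres", "ads", 100)  # new triple
total = conn.execute("SELECT COUNT(*) FROM sync_queries").fetchone()[0]
```

Because the constraint spans all three columns, the same keywords saved with a different `limit_count` count as a separate search.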

---

## 3. Database Migrations
Migration scripts live in `migrations/`. On service startup, `src/main.rs` calls `sqlx::migrate!().run(&pool).await` to apply them automatically:

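| Migration file | Description |
|:---|:---|
| `20260608000000_init.sql` | Creates the initial `papers` and `citations_references` structures. |
| `20260608000001_notes.sql` | Adds the `notes` table for notes and highlights, with cascading delete on its foreign key. |
| `20260608000002_add_doctype.sql` | Adds the `doctype` document-type column to `papers`. |
| `20260608000003_sync_features.sql` | Adds the `sync_queries` table of saved sync searches, with a unique constraint for deduplication. |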

---

## 4. Error Diagnostic Storage Convention

The `pdf_path` and `html_path` columns of `papers` carry a dual meaning: each holds either a normal file path or an error diagnostic:

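| Value pattern | Meaning | Frontend display |
|:---|:---|:---|
| `NULL` | Download not yet attempted | Amber "Not downloaded" badge |
| `library/PDF/xxx.pdf` | Download succeeded; the value is the file path | Blue "Downloaded" badge |
| `error:<reason>` | Download failed; the reason is the text after the prefix | Red "Download failed" badge; reason shown on hover |
| `error:no_resource` | Manually marked by the user as having no usable full-text resource | Gray "No resource" badge |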

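In `--fix` mode, the `health_check` tool removes corrupted files and resets their paths to `NULL`, but it does **not** clear records with the `error:` prefix (so the diagnostic clues are preserved).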
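
The convention can be summarized as a small classifier. This is a sketch of the rules in the table, not the project's code: `PathState`, `classify`, and `value_after_fix` are hypothetical names, and `error:HTTP 403` is an invented sample diagnostic. The exact-match test for `error:no_resource` has to come before the generic `error:` prefix test, otherwise it would be reported as an ordinary failure.

```python
from enum import Enum
from typing import Optional, Tuple


class PathState(Enum):
    NOT_DOWNLOADED = "not downloaded"
    DOWNLOADED = "downloaded"
    FAILED = "download failed"
    NO_RESOURCE = "no resource"


ERROR_PREFIX = "error:"
NO_RESOURCE_VALUE = "error:no_resource"


def classify(value: Optional[str]) -> Tuple[PathState, Optional[str]]:
    """Map a pdf_path/html_path value to a state plus a path or failure reason."""
    if value is None:
        return PathState.NOT_DOWNLOADED, None
    if value == NO_RESOURCE_VALUE:  # checked before the generic prefix
        return PathState.NO_RESOURCE, None
    if value.startswith(ERROR_PREFIX):
        return PathState.FAILED, value[len(ERROR_PREFIX):]
    return PathState.DOWNLOADED, value


def value_after_fix(value: Optional[str], file_is_intact: bool) -> Optional[str]:
    """Hypothetical model of --fix: only real paths with broken files are reset."""
    state, _ = classify(value)
    if state is PathState.DOWNLOADED and not file_is_intact:
        return None   # corrupted file: path goes back to NULL
    return value      # NULL and error: records are left untouched
```

For example, `classify("error:HTTP 403")` yields the failed state with `"HTTP 403"` as the reason, while `value_after_fix("error:HTTP 403", False)` returns the value unchanged.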