feat: initialize AstroResearch core system code and refactor technical documentation

Asfmq 2026-06-08 17:23:27 +08:00
commit 307a1c0cee
53 changed files with 45076 additions and 0 deletions

26
.env.example Normal file

@ -0,0 +1,26 @@
# AstroResearch Configuration Template
# Copy this to .env and fill in your details
# NASA ADS API Key (Get from ui.adsabs.harvard.edu)
ADS_API_KEY=your_ads_api_key_here
# LLM Translation Provider Settings (OpenAI-compatible endpoints)
LLM_API_KEY=your_llm_api_key_here
LLM_API_BASE=https://api.deepseek.com/v1
# Examples: deepseek-chat, gpt-4o-mini, gemini-1.5-flash
LLM_MODEL=deepseek-chat
# Qiniu Cloud Storage Config (For hosting PDF-extracted layout images)
QINIU_AK=your_qiniu_access_key_here
QINIU_SK=your_qiniu_secret_key_here
QINIU_BUCKET=your_bucket_name
QINIU_DOMAIN=http://your_cdn_domain.com
# MinerU PDF Layout Extractor Remote API (If not using HTML)
MINERU_API_URL=http://mineru.remote-api.com/api/v1/extract
MINERU_API_KEY=your_mineru_api_key
# Local Data Paths
LIBRARY_DIR=./library
PORT=8000
DATABASE_URL=sqlite://astro_research.db

21
.gitignore vendored Normal file

@ -0,0 +1,21 @@
# Rust build artifacts
/target/
# Local configuration
.env
# SQLite local database
/astro_research.db
/astro_research.db-journal
/astro_research.db-shm
/astro_research.db-wal
# Local literature storage (contains downloaded PDFs, HTML, Markdowns and Translations)
/library/
# IDEs and OS files
.vscode/
.idea/
.DS_Store
*.suo
*.swp

3554
Cargo.lock generated Normal file

File diff suppressed because it is too large

29
Cargo.toml Normal file

@ -0,0 +1,29 @@
[package]
name = "astroresearch"
version = "0.1.0"
edition = "2021"
[dependencies]
tokio = { version = "1", features = ["full"] }
axum = { version = "0.7", features = ["macros"] }
tower-http = { version = "0.5", features = ["cors", "fs"] }
sqlx = { version = "0.7", features = ["runtime-tokio-rustls", "sqlite", "chrono", "json"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
reqwest = { version = "0.12", features = ["json", "stream", "multipart", "cookies"] }
dotenvy = "0.15"
quick-xml = { version = "0.31", features = ["serialize"] }
anyhow = "1.0"
thiserror = "1.0"
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
futures-util = { version = "0.3", features = ["io"] }
rand = "0.8"
regex = "1.10"
chrono = { version = "0.4", features = ["serde"] }
sha1 = "0.10"
hmac = "0.12"
base64 = "0.22"
urlencoding = "2.1"
url = "2.5"
html2md = "0.2"

81
README.md Normal file

@ -0,0 +1,81 @@
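# AstroResearch: Astronomy Research Assistant System
AstroResearch is an integrated research assistant for astronomy literature, built on a **Rust (Axum)** backend and a **React (Vite + TypeScript)** frontend.
---
## 1. Overview
AstroResearch gives astronomers and researchers a one-stop solution for literature management and intelligent reading. Core features:
- 🌌 **Unified academic search**: searches NASA ADS and arXiv in one step and shows deduplicated metadata as cards.
- 📥 **Smart dual-channel download**: mimics browser request headers and delays to get past publishers' anti-scraping measures; prefers the official HTML and falls back to ar5iv.
- 📝 **Structured document parsing**: parses HTML, or calls MinerU (PDF fallback parsing), to produce standard GFM Markdown, with placeholder protection for LaTeX formulas.
- 🗣️ **LLM bilingual translation**: builds a translation glossary from a local astronomy vocabulary (Trie longest match) to guide the LLM toward formula-accurate Chinese-English translation.
- 🪐 **Citation network galaxy map**: high-performance force-directed topology rendering on HTML5 Canvas; double-click a node to explore its citations in depth.
- ✍️ **Text highlighting and notes**: select text in the bilingual reader, highlight it in multiple colors, and record research notes; notes and papers are linked in both directions.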
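
As a rough illustration of the "Trie longest match" idea behind the translation glossary, the sketch below scans a text word by word and keeps the longest vocabulary term that starts at each position. The real glossary is built by the Rust backend, which is not shown here; the names `TrieNode`, `insertTerm`, and `extractGlossary`, the word-level tokenization, and the sample terms are all assumptions made for this example.

```typescript
// Illustrative sketch only: names and tokenization are assumptions,
// not the backend's actual implementation.
interface TrieNode {
  children: Map<string, TrieNode>;
  // Set only on the node that ends a complete glossary term.
  translation?: string;
}

function createNode(): TrieNode {
  return { children: new Map() };
}

// Insert a term word by word so that matches respect word boundaries.
function insertTerm(root: TrieNode, term: string, translation: string): void {
  let node = root;
  for (const word of term.toLowerCase().split(/\s+/)) {
    let next = node.children.get(word);
    if (!next) {
      next = createNode();
      node.children.set(word, next);
    }
    node = next;
  }
  node.translation = translation;
}

// Scan left to right; at each position keep the longest term that matches.
function extractGlossary(root: TrieNode, text: string): Map<string, string> {
  const words: string[] = text.toLowerCase().match(/[a-z0-9-]+/g) ?? [];
  const found = new Map<string, string>();
  let i = 0;
  while (i < words.length) {
    let node = root;
    let bestEnd = -1;
    let bestTranslation = '';
    for (let j = i; j < words.length; j++) {
      const next = node.children.get(words[j]);
      if (!next) break;
      node = next;
      if (node.translation !== undefined) {
        bestEnd = j; // remember the longest complete term seen so far
        bestTranslation = node.translation;
      }
    }
    if (bestEnd >= i) {
      found.set(words.slice(i, bestEnd + 1).join(' '), bestTranslation);
      i = bestEnd + 1; // continue after the matched term
    } else {
      i += 1;
    }
  }
  return found;
}
```

With `black hole` and `supermassive black hole` both in the vocabulary, the phrase "supermassive black hole" is matched as a single term rather than as the shorter `black hole`, which is the point of longest-match lookup.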
---
## 2. Quickstart
### 2.1 Configure environment variables
Copy `.env.example` from the repository root and rename the copy to `.env`:
```bash
cp .env.example .env
```
Open it in an editor and fill in the credentials for the third-party services, such as `ADS_API_KEY`, `LLM_API_KEY`, and the `QINIU_` settings.
### 2.2 Run the services
#### Option 1: Local development mode
In development mode, start the backend API service and the frontend HMR dev server separately:
1. **Start the backend (Rust Axum)**
```bash
cargo run
```
The backend listens on `http://localhost:8000` by default. On startup it initializes the local SQLite database and runs the migrations automatically.
2. **Start the frontend (React Vite)**
```bash
cd dashboard
npm install
npm run dev
```
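The frontend dev server runs on `http://localhost:5173` by default. A reverse proxy is configured for all frontend `/api` requests, so they are forwarded to the backend on port `8000`.
#### Option 2: Production build and single-binary deployment
In production mode, build the frontend static files first; the backend process then serves them:
1. **Build the frontend**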
```bash
cd dashboard
npm install
npm run build
```
The static assets are written to `dashboard/dist/` under the project root.
2. **Run the backend service**
```bash
cd ..
cargo run --release
```
Once it is running, open `http://localhost:8000` directly. All React pages and backend APIs are served by the Rust process, so there is no need to start Vite separately.
---
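## 3. Documentation Directory
Detailed technical and deployment design documents are collected in the `docs/` directory:
- 🏗️ **[Architecture](docs/architecture.md)**: system-level flowcharts and sequence diagrams.
- 🌐 **[API reference](docs/api.md)**: backend Axum routes and HTTP interface formats.
- 🗄️ **[Database design](docs/database.md)**: SQLite table schemas, ER diagram, and index optimization.
- 🎨 **[Visual and interaction design](docs/design.md)**: light/dark frosted-glass themes and notes on the in-house Canvas graph engine.
- 🛠️ **[Troubleshooting](docs/troubleshooting.md)**: fixes for common problems such as CAPTCHA challenges and parsing failures.
- 🚀 **[Build and deployment guide](docs/deployment.md)**: single-executable packaging and release process.
- 🤝 **[Contributing guide](docs/contributing.md)**: development conventions and unit tests.
---
## 4. Component READMEs
- 💻 **Frontend React console**: see [dashboard/README.md](dashboard/README.md)
- ⚙️ **Backend Rust source**: see [src/README.md](src/README.md)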

24
dashboard/.gitignore vendored Normal file

@ -0,0 +1,24 @@
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*
node_modules
dist
dist-ssr
*.local
# Editor directories and files
.vscode/*
!.vscode/extensions.json
.idea
.DS_Store
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?

45
dashboard/README.md Normal file

@ -0,0 +1,45 @@
# AstroResearch Dashboard / Frontend Console
This module is the AstroResearch frontend, built with **React (TypeScript) + Vite + Tailwind CSS**.
---
## 1. Setup & Dev
### 1.1 Install dependencies
Make sure Node.js (v18+) is installed, then run the following in this directory:
```bash
npm install
```
### 1.2 Start the dev server
```bash
npm run dev
```
The dev server runs on `http://localhost:5173` by default. Vite is configured to proxy requests to the backend at `http://localhost:8000`, so cross-origin requests are routed automatically.
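
The proxy described above might be configured roughly as follows. This is a minimal sketch, not the repository's actual `vite.config.ts` (which is not reproduced here); the `/api` prefix and the two ports come from the READMEs, while `changeOrigin` and the explicit `port` entry are assumptions.

```typescript
// vite.config.ts (sketch under the assumptions stated above)
import { defineConfig } from 'vite';
import react from '@vitejs/plugin-react';

export default defineConfig({
  plugins: [react()],
  server: {
    // Vite's default dev port, written out here for clarity.
    port: 5173,
    proxy: {
      // Forward every request whose path starts with /api to the Axum backend.
      '/api': {
        target: 'http://localhost:8000',
        changeOrigin: true,
      },
    },
  },
});
```

With a setup like this, frontend code can call relative paths such as `/api/library` and never needs to hard-code the backend host.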
---
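## 2. Core Modules
- `src/components/layout/`: sidebar and basic layout components.
- `src/components/CitationGalaxyCanvas.tsx`: in-house Canvas force-directed renderer for the citation galaxy map.
- `src/features/search/`: unified search panel with cross-source search and bookmarking.
- `src/features/library/`: library management cards with live download-status monitoring and re-download actions.
- `src/features/reader/`: side-by-side two-column reader with built-in highlight notes and a trigger for LLM re-translation.
- `src/types.ts`: global TypeScript type definitions.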
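
Judging only from how `App.tsx` and `CitationGalaxyCanvas.tsx` use them, two of the shared types might look roughly like the sketch below. The field lists are inferred from usage and are likely incomplete; the helper names `upsertPaper` and `pushHistory` are hypothetical and simply restate, as pure functions, the state updates that `App.tsx` performs after a download and after loading a citation network.

```typescript
// Hypothetical shapes inferred from usage; the real src/types.ts may differ.
interface StandardPaper {
  bibcode: string;
  has_markdown?: boolean;
  has_translation?: boolean;
}

interface CitationNetwork {
  bibcode: string;
  references: string[];
  citations: string[];
}

// Replace the entry with the same bibcode, or prepend the paper when it is new.
function upsertPaper(list: StandardPaper[], paper: StandardPaper): StandardPaper[] {
  return list.some(p => p.bibcode === paper.bibcode)
    ? list.map(p => (p.bibcode === paper.bibcode ? paper : p))
    : [paper, ...list];
}

// Append a network to the multi-hop history unless it is already present;
// with reset = true the history restarts from this network.
function pushHistory(
  history: CitationNetwork[],
  net: CitationNetwork,
  reset = false,
): CitationNetwork[] {
  if (reset) return [net];
  return history.some(h => h.bibcode === net.bibcode) ? history : [...history, net];
}
```

Keeping such updates as pure functions makes them easy to unit-test without rendering any component.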
---
## 3. Build
To produce static files for production:
```bash
npm run build
```
The bundle is written to `dist/` in this directory, and the Axum backend serves it directly as static files.
---
## 4. Coding Standards
- For this module's detailed frontend conventions, see [前端开发规范-React篇.md](前端开发规范-React篇.md) in this directory.


@ -0,0 +1,22 @@
import js from '@eslint/js'
import globals from 'globals'
import reactHooks from 'eslint-plugin-react-hooks'
import reactRefresh from 'eslint-plugin-react-refresh'
import tseslint from 'typescript-eslint'
import { defineConfig, globalIgnores } from 'eslint/config'
export default defineConfig([
globalIgnores(['dist']),
{
files: ['**/*.{ts,tsx}'],
extends: [
js.configs.recommended,
tseslint.configs.recommended,
reactHooks.configs.flat.recommended,
reactRefresh.configs.vite,
],
languageOptions: {
globals: globals.browser,
},
},
])

13
dashboard/index.html Normal file

@ -0,0 +1,13 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<link rel="icon" type="image/svg+xml" href="/favicon.svg" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>dashboard</title>
</head>
<body>
<div id="root"></div>
<script type="module" src="/src/main.tsx"></script>
</body>
</html>

4675
dashboard/package-lock.json generated Normal file

File diff suppressed because it is too large

42
dashboard/package.json Normal file

@ -0,0 +1,42 @@
{
"name": "dashboard",
"private": true,
"version": "0.0.0",
"type": "module",
"scripts": {
"dev": "vite",
"build": "tsc -b && vite build",
"lint": "eslint .",
"preview": "vite preview"
},
"dependencies": {
"axios": "^1.17.0",
"clsx": "^2.1.1",
"framer-motion": "^12.40.0",
"katex": "^0.17.0",
"lucide-react": "^1.17.0",
"react": "^19.2.6",
"react-dom": "^19.2.6",
"react-markdown": "^10.1.0",
"rehype-katex": "^7.0.1",
"remark-math": "^6.0.0",
"tailwind-merge": "^3.6.0"
},
"devDependencies": {
"@eslint/js": "^10.0.1",
"@tailwindcss/typography": "^0.5.19",
"@tailwindcss/vite": "^4.3.0",
"@types/node": "^24.12.3",
"@types/react": "^19.2.14",
"@types/react-dom": "^19.2.3",
"@vitejs/plugin-react": "^6.0.1",
"eslint": "^10.3.0",
"eslint-plugin-react-hooks": "^7.1.1",
"eslint-plugin-react-refresh": "^0.5.2",
"globals": "^17.6.0",
"tailwindcss": "^4.3.0",
"typescript": "~6.0.2",
"typescript-eslint": "^8.59.2",
"vite": "^8.0.12"
}
}

File diff suppressed because one or more lines are too long



@ -0,0 +1,24 @@
<svg xmlns="http://www.w3.org/2000/svg">
<symbol id="bluesky-icon" viewBox="0 0 16 17">
<g clip-path="url(#bluesky-clip)"><path fill="#08060d" d="M7.75 7.735c-.693-1.348-2.58-3.86-4.334-5.097-1.68-1.187-2.32-.981-2.74-.79C.188 2.065.1 2.812.1 3.251s.241 3.602.398 4.13c.52 1.744 2.367 2.333 4.07 2.145-2.495.37-4.71 1.278-1.805 4.512 3.196 3.309 4.38-.71 4.987-2.746.608 2.036 1.307 5.91 4.93 2.746 2.72-2.746.747-4.143-1.747-4.512 1.702.189 3.55-.4 4.07-2.145.156-.528.397-3.691.397-4.13s-.088-1.186-.575-1.406c-.42-.19-1.06-.395-2.741.79-1.755 1.24-3.64 3.752-4.334 5.099"/></g>
<defs><clipPath id="bluesky-clip"><path fill="#fff" d="M.1.85h15.3v15.3H.1z"/></clipPath></defs>
</symbol>
<symbol id="discord-icon" viewBox="0 0 20 19">
<path fill="#08060d" d="M16.224 3.768a14.5 14.5 0 0 0-3.67-1.153c-.158.286-.343.67-.47.976a13.5 13.5 0 0 0-4.067 0c-.128-.306-.317-.69-.476-.976A14.4 14.4 0 0 0 3.868 3.77C1.546 7.28.916 10.703 1.231 14.077a14.7 14.7 0 0 0 4.5 2.306q.545-.748.965-1.587a9.5 9.5 0 0 1-1.518-.74q.191-.14.372-.293c2.927 1.369 6.107 1.369 8.999 0q.183.152.372.294-.723.437-1.52.74.418.838.963 1.588a14.6 14.6 0 0 0 4.504-2.308c.37-3.911-.63-7.302-2.644-10.309m-9.13 8.234c-.878 0-1.599-.82-1.599-1.82 0-.998.705-1.82 1.6-1.82.894 0 1.614.82 1.599 1.82.001 1-.705 1.82-1.6 1.82m5.91 0c-.878 0-1.599-.82-1.599-1.82 0-.998.705-1.82 1.6-1.82.893 0 1.614.82 1.599 1.82 0 1-.706 1.82-1.6 1.82"/>
</symbol>
<symbol id="documentation-icon" viewBox="0 0 21 20">
<path fill="none" stroke="#aa3bff" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.35" d="m15.5 13.333 1.533 1.322c.645.555.967.833.967 1.178s-.322.623-.967 1.179L15.5 18.333m-3.333-5-1.534 1.322c-.644.555-.966.833-.966 1.178s.322.623.966 1.179l1.534 1.321"/>
<path fill="none" stroke="#aa3bff" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.35" d="M17.167 10.836v-4.32c0-1.41 0-2.117-.224-2.68-.359-.906-1.118-1.621-2.08-1.96-.599-.21-1.349-.21-2.848-.21-2.623 0-3.935 0-4.983.369-1.684.591-3.013 1.842-3.641 3.428C3 6.449 3 7.684 3 10.154v2.122c0 2.558 0 3.838.706 4.726q.306.383.713.671c.76.536 1.79.64 3.581.66"/>
<path fill="none" stroke="#aa3bff" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.35" d="M3 10a2.78 2.78 0 0 1 2.778-2.778c.555 0 1.209.097 1.748-.047.48-.129.854-.503.982-.982.145-.54.048-1.194.048-1.749a2.78 2.78 0 0 1 2.777-2.777"/>
</symbol>
<symbol id="github-icon" viewBox="0 0 19 19">
<path fill="#08060d" fill-rule="evenodd" d="M9.356 1.85C5.05 1.85 1.57 5.356 1.57 9.694a7.84 7.84 0 0 0 5.324 7.44c.387.079.528-.168.528-.376 0-.182-.013-.805-.013-1.454-2.165.467-2.616-.935-2.616-.935-.349-.91-.864-1.143-.864-1.143-.71-.48.051-.48.051-.48.787.051 1.2.805 1.2.805.695 1.194 1.817.857 2.268.649.064-.507.27-.857.49-1.052-1.728-.182-3.545-.857-3.545-3.87 0-.857.31-1.558.8-2.104-.078-.195-.349-1 .077-2.078 0 0 .657-.208 2.14.805a7.5 7.5 0 0 1 1.946-.26c.657 0 1.328.092 1.946.26 1.483-1.013 2.14-.805 2.14-.805.426 1.078.155 1.883.078 2.078.502.546.799 1.247.799 2.104 0 3.013-1.818 3.675-3.558 3.87.284.247.528.714.528 1.454 0 1.052-.012 1.896-.012 2.156 0 .208.142.455.528.377a7.84 7.84 0 0 0 5.324-7.441c.013-4.338-3.48-7.844-7.773-7.844" clip-rule="evenodd"/>
</symbol>
<symbol id="social-icon" viewBox="0 0 20 20">
<path fill="none" stroke="#aa3bff" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.35" d="M12.5 6.667a4.167 4.167 0 1 0-8.334 0 4.167 4.167 0 0 0 8.334 0"/>
<path fill="none" stroke="#aa3bff" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.35" d="M2.5 16.667a5.833 5.833 0 0 1 8.75-5.053m3.837.474.513 1.035c.07.144.257.282.414.309l.93.155c.596.1.736.536.307.965l-.723.73a.64.64 0 0 0-.152.531l.207.903c.164.715-.213.991-.84.618l-.872-.52a.63.63 0 0 0-.577 0l-.872.52c-.624.373-1.003.094-.84-.618l.207-.903a.64.64 0 0 0-.152-.532l-.723-.729c-.426-.43-.289-.864.306-.964l.93-.156a.64.64 0 0 0 .412-.31l.513-1.034c.28-.562.735-.562 1.012 0"/>
</symbol>
<symbol id="x-icon" viewBox="0 0 19 19">
<path fill="#08060d" fill-rule="evenodd" d="M1.893 1.98c.052.072 1.245 1.769 2.653 3.77l2.892 4.114c.183.261.333.48.333.486s-.068.089-.152.183l-.522.593-.765.867-3.597 4.087c-.375.426-.734.834-.798.905a1 1 0 0 0-.118.148c0 .01.236.017.664.017h.663l.729-.83c.4-.457.796-.906.879-.999a692 692 0 0 0 1.794-2.038c.034-.037.301-.34.594-.675l.551-.624.345-.392a7 7 0 0 1 .34-.374c.006 0 .93 1.306 2.052 2.903l2.084 2.965.045.063h2.275c1.87 0 2.273-.003 2.266-.021-.008-.02-1.098-1.572-3.894-5.547-2.013-2.862-2.28-3.246-2.273-3.266.008-.019.282-.332 2.085-2.38l2-2.274 1.567-1.782c.022-.028-.016-.03-.65-.03h-.674l-.3.342a871 871 0 0 1-1.782 2.025c-.067.075-.405.458-.75.852a100 100 0 0 1-.803.91c-.148.172-.299.344-.99 1.127-.304.343-.32.358-.345.327-.015-.019-.904-1.282-1.976-2.808L6.365 1.85H1.8zm1.782.91 8.078 11.294c.772 1.08 1.413 1.973 1.425 1.984.016.017.241.02 1.05.017l1.03-.004-2.694-3.766L7.796 5.75 5.722 2.852l-1.039-.004-1.039-.004z" clip-rule="evenodd"/>
</symbol>
</svg>


184
dashboard/src/App.css Normal file

@ -0,0 +1,184 @@
.counter {
font-size: 16px;
padding: 5px 10px;
border-radius: 5px;
color: var(--accent);
background: var(--accent-bg);
border: 2px solid transparent;
transition: border-color 0.3s;
margin-bottom: 24px;
&:hover {
border-color: var(--accent-border);
}
&:focus-visible {
outline: 2px solid var(--accent);
outline-offset: 2px;
}
}
.hero {
position: relative;
.base,
.framework,
.vite {
inset-inline: 0;
margin: 0 auto;
}
.base {
width: 170px;
position: relative;
z-index: 0;
}
.framework,
.vite {
position: absolute;
}
.framework {
z-index: 1;
top: 34px;
height: 28px;
transform: perspective(2000px) rotateZ(300deg) rotateX(44deg) rotateY(39deg)
scale(1.4);
}
.vite {
z-index: 0;
top: 107px;
height: 26px;
width: auto;
transform: perspective(2000px) rotateZ(300deg) rotateX(40deg) rotateY(39deg)
scale(0.8);
}
}
#center {
display: flex;
flex-direction: column;
gap: 25px;
place-content: center;
place-items: center;
flex-grow: 1;
@media (max-width: 1024px) {
padding: 32px 20px 24px;
gap: 18px;
}
}
#next-steps {
display: flex;
border-top: 1px solid var(--border);
text-align: left;
& > div {
flex: 1 1 0;
padding: 32px;
@media (max-width: 1024px) {
padding: 24px 20px;
}
}
.icon {
margin-bottom: 16px;
width: 22px;
height: 22px;
}
@media (max-width: 1024px) {
flex-direction: column;
text-align: center;
}
}
#docs {
border-right: 1px solid var(--border);
@media (max-width: 1024px) {
border-right: none;
border-bottom: 1px solid var(--border);
}
}
#next-steps ul {
list-style: none;
padding: 0;
display: flex;
gap: 8px;
margin: 32px 0 0;
.logo {
height: 18px;
}
a {
color: var(--text-h);
font-size: 16px;
border-radius: 6px;
background: var(--social-bg);
display: flex;
padding: 6px 12px;
align-items: center;
gap: 8px;
text-decoration: none;
transition: box-shadow 0.3s;
&:hover {
box-shadow: var(--shadow);
}
.button-icon {
height: 18px;
width: 18px;
}
}
@media (max-width: 1024px) {
margin-top: 20px;
flex-wrap: wrap;
justify-content: center;
li {
flex: 1 1 calc(50% - 8px);
}
a {
width: 100%;
justify-content: center;
box-sizing: border-box;
}
}
}
#spacer {
height: 88px;
border-top: 1px solid var(--border);
@media (max-width: 1024px) {
height: 48px;
}
}
.ticks {
position: relative;
width: 100%;
&::before,
&::after {
content: '';
position: absolute;
top: -4.5px;
border: 5px solid transparent;
}
&::before {
left: 0;
border-left-color: var(--border);
}
&::after {
right: 0;
border-right-color: var(--border);
}
}

377
dashboard/src/App.tsx Normal file

@ -0,0 +1,377 @@
// dashboard/src/App.tsx
import { useState, useEffect, useCallback } from 'react';
import axios from 'axios';
import { Sidebar } from './components/layout/Sidebar';
import { SearchPanel } from './features/search/SearchPanel';
import { LibraryPanel } from './features/library/LibraryPanel';
import { ReaderPanel } from './features/reader/ReaderPanel';
import { CitationPanel } from './features/citation/CitationPanel';
import { SettingsPanel } from './features/settings/SettingsPanel';
import type { StandardPaper, CitationNetwork, NoteRecord } from './types';
export default function App() {
const [activeTab, setActiveTab] = useState<'search' | 'library' | 'reader' | 'citation' | 'settings'>('search');
// Shared data state
const [library, setLibrary] = useState<StandardPaper[]>([]);
const [selectedPaper, setSelectedPaper] = useState<StandardPaper | null>(null);
// Search page state
const [searchQuery, setSearchQuery] = useState('');
const [searchSource, setSearchSource] = useState<'all' | 'ads' | 'arxiv'>('all');
const [searchResults, setSearchResults] = useState<StandardPaper[]>([]);
const [searching, setSearching] = useState(false);
const [exportingList, setExportingList] = useState<string[]>([]);
const [bibtexContent, setBibtexContent] = useState<string | null>(null);
const [exporting, setExporting] = useState(false);
// Reader page state
const [englishText, setEnglishText] = useState('');
const [chineseText, setChineseText] = useState('');
const [parsing, setParsing] = useState(false);
const [translating, setTranslating] = useState(false);
// Citation galaxy data state
const [citationNetwork, setCitationNetwork] = useState<CitationNetwork | null>(null);
const [loadingCitations, setLoadingCitations] = useState(false);
const [citationHistory, setCitationHistory] = useState<CitationNetwork[]>([]); // multi-hop history
// Notes state
const [notes, setNotes] = useState<NoteRecord[]>([]);
const [showNotesPanel, setShowNotesPanel] = useState(false);
const [newNoteText, setNewNoteText] = useState('');
const [newNoteColor, setNewNoteColor] = useState('yellow');
const [selectedParagraphIdx, setSelectedParagraphIdx] = useState<number | null>(null);
const [selectedText, setSelectedText] = useState('');
// Download progress state
const [downloadingBibcodes, setDownloadingBibcodes] = useState<Record<string, boolean>>({});
// 1. Load the local library on startup
useEffect(() => {
fetchLibrary();
}, []);
const fetchLibrary = async () => {
try {
const res = await axios.get<StandardPaper[]>('/api/library');
setLibrary(res.data);
} catch (e) {
console.error('Failed to load the local library', e);
}
};
// 2. Search for papers
const handleSearch = async (e: React.FormEvent) => {
e.preventDefault();
if (!searchQuery.trim()) return;
setSearching(true);
setBibtexContent(null);
try {
const res = await axios.get<StandardPaper[]>('/api/search', {
params: { q: searchQuery, source: searchSource, rows: 15 }
});
setSearchResults(res.data);
} catch (e) {
console.error('Paper search failed', e);
alert('Search failed. Check the backend connection and the API key configuration.');
} finally {
setSearching(false);
}
};
// 3. Trigger the dual-format paper download
const handleDownload = async (bibcode: string, force = false) => {
setDownloadingBibcodes(prev => ({ ...prev, [bibcode]: true }));
try {
const res = await axios.post<StandardPaper>('/api/download', { bibcode, force });
// Update the paper's status in the library and in the search results
setSearchResults(prev => prev.map(p => p.bibcode === bibcode ? res.data : p));
setLibrary(prev => {
if (prev.some(p => p.bibcode === bibcode)) {
return prev.map(p => p.bibcode === bibcode ? res.data : p);
} else {
return [res.data, ...prev];
}
});
if (selectedPaper?.bibcode === bibcode) {
setSelectedPaper(res.data);
}
} catch (e) {
console.error('Paper download failed', e);
alert('Paper download failed. Check ADS network restrictions and your network proxy.');
} finally {
setDownloadingBibcodes(prev => ({ ...prev, [bibcode]: false }));
}
};
// 4. Parse the paper into Markdown (HTML first, then PDF via MinerU)
const handleParse = async (bibcode: string, force = false) => {
setParsing(true);
try {
const res = await axios.post<{ markdown: string }>('/api/parse', { bibcode, force });
setEnglishText(res.data.markdown);
// Update the paper's status
setLibrary(prev => prev.map(p => p.bibcode === bibcode ? { ...p, has_markdown: true } : p));
if (selectedPaper?.bibcode === bibcode) {
setSelectedPaper(prev => prev ? { ...prev, has_markdown: true } : null);
}
} catch (e) {
console.error('Paper parsing failed', e);
alert('Layout parsing failed. Check that the HTML/PDF download has finished and that the MinerU API endpoint is configured.');
} finally {
setParsing(false);
}
};
// 5. Translate the text for side-by-side reading (with astronomy terminology correction)
const handleTranslate = async (bibcode: string, force = false) => {
setTranslating(true);
try {
const res = await axios.post<{ translation: string }>('/api/translate', { bibcode, force });
setChineseText(res.data.translation);
setLibrary(prev => prev.map(p => p.bibcode === bibcode ? { ...p, has_translation: true } : p));
if (selectedPaper?.bibcode === bibcode) {
setSelectedPaper(prev => prev ? { ...prev, has_translation: true } : null);
}
} catch (e) {
console.error('Paper translation failed', e);
alert('Translation failed. Check the LLM API key and endpoint settings in .env.');
} finally {
setTranslating(false);
}
};
// 6. Load the paper's citation network
const loadCitations = useCallback(async (bibcode: string, reset = false) => {
setLoadingCitations(true);
try {
const res = await axios.get<CitationNetwork>('/api/citations', {
params: { bibcode }
});
setCitationNetwork(res.data);
setCitationHistory(prev => {
if (reset) {
return [res.data];
}
if (prev.some(net => net.bibcode === res.data.bibcode)) {
return prev;
}
return [...prev, res.data];
});
} catch (e) {
console.error('Failed to load the citation topology', e);
} finally {
setLoadingCitations(false);
}
}, []);
// 7. Open the reader
const openReader = async (paper: StandardPaper) => {
setSelectedPaper(paper);
setEnglishText('');
setChineseText('');
setNotes([]);
setShowNotesPanel(false);
setActiveTab('reader');
// Fetch details (including any existing source text and translation)
try {
const res = await axios.get<{ paper: StandardPaper, english_content?: string, translation_content?: string }>('/api/paper', {
params: { bibcode: paper.bibcode }
});
if (res.data.english_content) {
setEnglishText(res.data.english_content);
}
if (res.data.translation_content) {
setChineseText(res.data.translation_content);
}
} catch (e) {
console.error('Failed to load paper details', e);
}
// Load all notes for this paper
try {
const nRes = await axios.get<NoteRecord[]>('/api/notes', { params: { bibcode: paper.bibcode } });
setNotes(nRes.data);
} catch (e) {
console.error('Failed to load notes', e);
}
};
// 8. Batch-export BibTeX
const handleExportBibtex = async () => {
if (exportingList.length === 0) return;
setExporting(true);
try {
const res = await axios.post<{ bibtex: string }>('/api/export', { bibcodes: exportingList });
setBibtexContent(res.data.bibtex);
} catch (e) {
console.error('BibTeX export failed', e);
alert('BibTeX export failed. Check the ADS token.');
} finally {
setExporting(false);
}
};
// Toggle a paper in the batch-export selection
const toggleExportItem = (bibcode: string) => {
setExportingList(prev =>
prev.includes(bibcode) ? prev.filter(b => b !== bibcode) : [...prev, bibcode]
);
};
// Note operations
const handleCreateNote = async () => {
if (!selectedPaper || selectedParagraphIdx === null || !newNoteText.trim()) return;
try {
const res = await axios.post<NoteRecord>('/api/notes', {
bibcode: selectedPaper.bibcode,
paragraph_index: selectedParagraphIdx,
note_text: newNoteText.trim(),
highlight_color: newNoteColor,
selected_text: selectedText,
});
setNotes(prev => [...prev, res.data]);
setNewNoteText('');
setSelectedParagraphIdx(null);
setSelectedText('');
} catch (e) {
console.error('Failed to save note', e);
}
};
const handleDeleteNote = async (id: number) => {
try {
await axios.delete('/api/notes', { params: { id } });
setNotes(prev => prev.filter(n => n.id !== id));
} catch (e) {
console.error('Failed to delete note', e);
}
};
// Show the add-note panel when text is selected
const handleTextSelection = (paragraphIdx: number) => {
const sel = window.getSelection();
if (sel && sel.toString().trim().length > 3) {
setSelectedText(sel.toString().trim());
setSelectedParagraphIdx(paragraphIdx);
setShowNotesPanel(true);
}
};
return (
<div className="flex h-screen overflow-hidden text-slate-800">
{/* Subtle decorative background */}
<div className="absolute inset-0 bg-[#f8fafc] z-[-10]">
<div className="absolute top-[10%] left-[20%] w-[300px] h-[300px] bg-purple-500/5 rounded-full blur-[120px] animate-pulse-slow" />
<div className="absolute bottom-[10%] right-[20%] w-[400px] h-[400px] bg-blue-500/5 rounded-full blur-[150px] animate-pulse-slow" style={{ animationDelay: '1.5s' }} />
</div>
{/* Left navigation sidebar */}
<Sidebar
activeTab={activeTab}
setActiveTab={setActiveTab}
selectedPaper={selectedPaper}
loadCitations={loadCitations}
/>
{/* Main workspace */}
<main className="flex-1 flex flex-col overflow-hidden">
{/* Top status bar */}
<header className="h-16 border-b border-slate-200/60 px-8 flex items-center justify-between bg-white/40 backdrop-blur-md">
<div className="flex items-center gap-2">
<span className="h-2 w-2 rounded-full bg-emerald-500 animate-pulse" />
<span className="text-xs text-slate-600 font-medium"></span>
</div>
<div className="text-xs text-slate-600">
: {library.length}
</div>
</header>
{/* Tab container */}
<div className="flex-1 overflow-y-auto p-8">
{activeTab === 'search' && (
<SearchPanel
searchQuery={searchQuery}
setSearchQuery={setSearchQuery}
searchSource={searchSource}
setSearchSource={setSearchSource}
searching={searching}
handleSearch={handleSearch}
searchResults={searchResults}
exportingList={exportingList}
toggleExportItem={toggleExportItem}
handleExportBibtex={handleExportBibtex}
exporting={exporting}
bibtexContent={bibtexContent}
downloadingBibcodes={downloadingBibcodes}
handleDownload={handleDownload}
selectedPaper={selectedPaper}
setSelectedPaper={setSelectedPaper}
openReader={openReader}
setActiveTab={setActiveTab}
loadCitations={loadCitations}
/>
)}
{activeTab === 'library' && (
<LibraryPanel
library={library}
fetchLibrary={fetchLibrary}
openReader={openReader}
setSelectedPaper={setSelectedPaper}
setActiveTab={setActiveTab}
loadCitations={loadCitations}
downloadingBibcodes={downloadingBibcodes}
handleDownload={handleDownload}
/>
)}
{activeTab === 'reader' && selectedPaper && (
<ReaderPanel
selectedPaper={selectedPaper}
parsing={parsing}
handleParse={handleParse}
translating={translating}
handleTranslate={handleTranslate}
showNotesPanel={showNotesPanel}
setShowNotesPanel={setShowNotesPanel}
notes={notes}
englishText={englishText}
chineseText={chineseText}
handleTextSelection={handleTextSelection}
selectedParagraphIdx={selectedParagraphIdx}
setSelectedParagraphIdx={setSelectedParagraphIdx}
selectedText={selectedText}
setSelectedText={setSelectedText}
newNoteColor={newNoteColor}
setNewNoteColor={setNewNoteColor}
newNoteText={newNoteText}
setNewNoteText={setNewNoteText}
handleCreateNote={handleCreateNote}
handleDeleteNote={handleDeleteNote}
/>
)}
{activeTab === 'citation' && (
<CitationPanel
selectedPaper={selectedPaper}
loadingCitations={loadingCitations}
citationNetwork={citationNetwork}
citationHistory={citationHistory}
loadCitations={loadCitations}
/>
)}
{activeTab === 'settings' && (
<SettingsPanel />
)}
</div>
</main>
</div>
);
}

Binary file not shown.



@ -0,0 +1 @@
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" class="iconify iconify--logos" width="35.93" height="32" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 228"><path fill="#00D8FF" d="M210.483 73.824a171.49 171.49 0 0 0-8.24-2.597c.465-1.9.893-3.777 1.273-5.621c6.238-30.281 2.16-54.676-11.769-62.708c-13.355-7.7-35.196.329-57.254 19.526a171.23 171.23 0 0 0-6.375 5.848a155.866 155.866 0 0 0-4.241-3.917C100.759 3.829 77.587-4.822 63.673 3.233C50.33 10.957 46.379 33.89 51.995 62.588a170.974 170.974 0 0 0 1.892 8.48c-3.28.932-6.445 1.924-9.474 2.98C17.309 83.498 0 98.307 0 113.668c0 15.865 18.582 31.778 46.812 41.427a145.52 145.52 0 0 0 6.921 2.165a167.467 167.467 0 0 0-2.01 9.138c-5.354 28.2-1.173 50.591 12.134 58.266c13.744 7.926 36.812-.22 59.273-19.855a145.567 145.567 0 0 0 5.342-4.923a168.064 168.064 0 0 0 6.92 6.314c21.758 18.722 43.246 26.282 56.54 18.586c13.731-7.949 18.194-32.003 12.4-61.268a145.016 145.016 0 0 0-1.535-6.842c1.62-.48 3.21-.974 4.76-1.488c29.348-9.723 48.443-25.443 48.443-41.52c0-15.417-17.868-30.326-45.517-39.844Zm-6.365 70.984c-1.4.463-2.836.91-4.3 1.345c-3.24-10.257-7.612-21.163-12.963-32.432c5.106-11 9.31-21.767 12.459-31.957c2.619.758 5.16 1.557 7.61 2.4c23.69 8.156 38.14 20.213 38.14 29.504c0 9.896-15.606 22.743-40.946 31.14Zm-10.514 20.834c2.562 12.94 2.927 24.64 1.23 33.787c-1.524 8.219-4.59 13.698-8.382 15.893c-8.067 4.67-25.32-1.4-43.927-17.412a156.726 156.726 0 0 1-6.437-5.87c7.214-7.889 14.423-17.06 21.459-27.246c12.376-1.098 24.068-2.894 34.671-5.345a134.17 134.17 0 0 1 1.386 6.193ZM87.276 214.515c-7.882 2.783-14.16 2.863-17.955.675c-8.075-4.657-11.432-22.636-6.853-46.752a156.923 156.923 0 0 1 1.869-8.499c10.486 2.32 22.093 3.988 34.498 4.994c7.084 9.967 14.501 19.128 21.976 27.15a134.668 134.668 0 0 1-4.877 4.492c-9.933 8.682-19.886 14.842-28.658 17.94ZM50.35 144.747c-12.483-4.267-22.792-9.812-29.858-15.863c-6.35-5.437-9.555-10.836-9.555-15.216c0-9.322 
13.897-21.212 37.076-29.293c2.813-.98 5.757-1.905 8.812-2.773c3.204 10.42 7.406 21.315 12.477 32.332c-5.137 11.18-9.399 22.249-12.634 32.792a134.718 134.718 0 0 1-6.318-1.979Zm12.378-84.26c-4.811-24.587-1.616-43.134 6.425-47.789c8.564-4.958 27.502 2.111 47.463 19.835a144.318 144.318 0 0 1 3.841 3.545c-7.438 7.987-14.787 17.08-21.808 26.988c-12.04 1.116-23.565 2.908-34.161 5.309a160.342 160.342 0 0 1-1.76-7.887Zm110.427 27.268a347.8 347.8 0 0 0-7.785-12.803c8.168 1.033 15.994 2.404 23.343 4.08c-2.206 7.072-4.956 14.465-8.193 22.045a381.151 381.151 0 0 0-7.365-13.322Zm-45.032-43.861c5.044 5.465 10.096 11.566 15.065 18.186a322.04 322.04 0 0 0-30.257-.006c4.974-6.559 10.069-12.652 15.192-18.18ZM82.802 87.83a323.167 323.167 0 0 0-7.227 13.238c-3.184-7.553-5.909-14.98-8.134-22.152c7.304-1.634 15.093-2.97 23.209-3.984a321.524 321.524 0 0 0-7.848 12.897Zm8.081 65.352c-8.385-.936-16.291-2.203-23.593-3.793c2.26-7.3 5.045-14.885 8.298-22.6a321.187 321.187 0 0 0 7.257 13.246c2.594 4.48 5.28 8.868 8.038 13.147Zm37.542 31.03c-5.184-5.592-10.354-11.779-15.403-18.433c4.902.192 9.899.29 14.978.29c5.218 0 10.376-.117 15.453-.343c-4.985 6.774-10.018 12.97-15.028 18.486Zm52.198-57.817c3.422 7.8 6.306 15.345 8.596 22.52c-7.422 1.694-15.436 3.058-23.88 4.071a382.417 382.417 0 0 0 7.859-13.026a347.403 347.403 0 0 0 7.425-13.565Zm-16.898 8.101a358.557 358.557 0 0 1-12.281 19.815a329.4 329.4 0 0 1-23.444.823c-7.967 0-15.716-.248-23.178-.732a310.202 310.202 0 0 1-12.513-19.846h.001a307.41 307.41 0 0 1-10.923-20.627a310.278 310.278 0 0 1 10.89-20.637l-.001.001a307.318 307.318 0 0 1 12.413-19.761c7.613-.576 15.42-.876 23.31-.876H128c7.926 0 15.743.303 23.354.883a329.357 329.357 0 0 1 12.335 19.695a358.489 358.489 0 0 1 11.036 20.54a329.472 329.472 0 0 1-11 20.722Zm22.56-122.124c8.572 4.944 11.906 24.881 6.52 51.026c-.344 1.668-.73 3.367-1.15 5.09c-10.622-2.452-22.155-4.275-34.23-5.408c-7.034-10.017-14.323-19.124-21.64-27.008a160.789 160.789 0 0 1 5.888-5.4c18.9-16.447 36.564-22.941 
44.612-18.3ZM128 90.808c12.625 0 22.86 10.235 22.86 22.86s-10.235 22.86-22.86 22.86s-22.86-10.235-22.86-22.86s10.235-22.86 22.86-22.86Z"></path></svg>


File diff suppressed because one or more lines are too long



@ -0,0 +1,272 @@
// dashboard/src/components/CitationGalaxyCanvas.tsx
import { useEffect, useRef } from 'react';
import type { CitationNetwork } from '../types';
interface CanvasProps {
networks: CitationNetwork[];
activeNetwork: CitationNetwork;
onNodeClick: (bibcode: string) => void;
}
interface Node {
id: string;
label: string;
x: number;
y: number;
vx: number;
vy: number;
radius: number;
color: string;
type: 'center' | 'reference' | 'citation';
}
interface Link {
source: string;
target: string;
}
export function CitationGalaxyCanvas({ networks, activeNetwork, onNodeClick }: CanvasProps) {
const canvasRef = useRef<HTMLCanvasElement | null>(null);
useEffect(() => {
const canvas = canvasRef.current;
if (!canvas) return;
const ctx = canvas.getContext('2d');
if (!ctx) return;
// Scale the canvas for high-DPI displays (device pixel ratio)
const dpr = window.devicePixelRatio || 1;
const rect = canvas.getBoundingClientRect();
canvas.width = rect.width * dpr;
canvas.height = rect.height * dpr;
ctx.scale(dpr, dpr);
// Merge the nodes of all networks, deduplicated, capped at 50
const MAX_NODES = 50;
const allIds = new Set<string>();
const nodes: Node[] = [];
const links: Link[] = [];
networks.forEach((net, netIdx) => {
const isActive = net.bibcode === activeNetwork.bibcode;
// Add the center node
if (!allIds.has(net.bibcode) && nodes.length < MAX_NODES) {
allIds.add(net.bibcode);
nodes.push({
id: net.bibcode,
label: net.bibcode,
x: rect.width / 2 + (netIdx === 0 ? 0 : (Math.random() - 0.5) * 200),
y: rect.height / 2 + (netIdx === 0 ? 0 : (Math.random() - 0.5) * 200),
vx: 0,
vy: 0,
radius: isActive ? 24 : 16,
color: isActive ? '#a855f7' : '#6366f1',
type: 'center',
});
}
// Add reference nodes (papers this one cites)
net.references.forEach((ref, idx) => {
if (nodes.length >= MAX_NODES) return;
if (!allIds.has(ref)) {
allIds.add(ref);
const angle = (idx / Math.max(1, net.references.length)) * Math.PI * 2;
const dist = 140 + Math.random() * 30;
const centerNode = nodes.find(n => n.id === net.bibcode);
nodes.push({
id: ref,
label: ref,
x: (centerNode?.x ?? rect.width / 2) + Math.cos(angle) * dist,
y: (centerNode?.y ?? rect.height / 2) + Math.sin(angle) * dist,
vx: 0,
vy: 0,
radius: 12,
color: '#d97706',
type: 'reference',
});
}
if (allIds.has(ref)) {
links.push({ source: ref, target: net.bibcode });
}
});
// Add citation nodes (papers that cite this one)
net.citations.forEach((cit, idx) => {
if (nodes.length >= MAX_NODES) return;
if (!allIds.has(cit)) {
allIds.add(cit);
const angle = (idx / Math.max(1, net.citations.length)) * Math.PI * 2 + Math.PI;
const dist = 160 + Math.random() * 40;
const centerNode = nodes.find(n => n.id === net.bibcode);
nodes.push({
id: cit,
label: cit,
x: (centerNode?.x ?? rect.width / 2) + Math.cos(angle) * dist,
y: (centerNode?.y ?? rect.height / 2) + Math.sin(angle) * dist,
vx: 0,
vy: 0,
radius: 12,
color: '#4f46e5',
type: 'citation',
});
}
if (allIds.has(cit)) {
links.push({ source: net.bibcode, target: cit });
}
});
});
let animationFrameId: number;
let hoveredNode: Node | null = null;
// One iteration of a classic force-directed layout
const updatePhysics = () => {
// 1. Repulsion: any two nodes closer than minDist push each other apart
for (let i = 0; i < nodes.length; i++) {
for (let j = i + 1; j < nodes.length; j++) {
let dx = nodes[j].x - nodes[i].x;
let dy = nodes[j].y - nodes[i].y;
let dist = Math.sqrt(dx * dx + dy * dy) || 1;
let minDist = nodes[i].radius + nodes[j].radius + 50;
if (dist < minDist) {
let force = (minDist - dist) * 0.08;
let fx = (dx / dist) * force;
let fy = (dy / dist) * force;
// Other nodes never push the large active center node
if (nodes[i].type !== 'center' || nodes[i].id !== activeNetwork.bibcode) {
nodes[i].vx -= fx;
nodes[i].vy -= fy;
}
if (nodes[j].type !== 'center' || nodes[j].id !== activeNetwork.bibcode) {
nodes[j].vx += fx;
nodes[j].vy += fy;
}
}
}
}
// 2. Attraction: nodes joined by a link pull toward each other like a spring
links.forEach(link => {
const sourceNode = nodes.find(n => n.id === link.source);
const targetNode = nodes.find(n => n.id === link.target);
if (sourceNode && targetNode) {
let dx = targetNode.x - sourceNode.x;
let dy = targetNode.y - sourceNode.y;
let dist = Math.sqrt(dx * dx + dy * dy) || 1;
let force = dist * 0.003; // spring constant
let fx = (dx / dist) * force;
let fy = (dy / dist) * force;
if (sourceNode.type !== 'center' || sourceNode.id !== activeNetwork.bibcode) {
sourceNode.vx += fx;
sourceNode.vy += fy;
}
if (targetNode.type !== 'center' || targetNode.id !== activeNetwork.bibcode) {
targetNode.vx -= fx;
targetNode.vy -= fy;
}
}
});
// 3. Integrate velocities and apply damping to limit runaway acceleration
nodes.forEach(node => {
if (node.id !== activeNetwork.bibcode) {
node.x += node.vx;
node.y += node.vy;
node.vx *= 0.85; // damping
node.vy *= 0.85;
}
});
};
// Canvas render loop
const render = () => {
updatePhysics();
ctx.clearRect(0, 0, rect.width, rect.height);
// Draw links
ctx.lineWidth = 1;
links.forEach(link => {
const sourceNode = nodes.find(n => n.id === link.source);
const targetNode = nodes.find(n => n.id === link.target);
if (sourceNode && targetNode) {
ctx.beginPath();
ctx.moveTo(sourceNode.x, sourceNode.y);
ctx.lineTo(targetNode.x, targetNode.y);
ctx.strokeStyle = sourceNode.type === 'reference' ? 'rgba(245, 158, 11, 0.25)' : 'rgba(129, 140, 248, 0.25)';
ctx.stroke();
}
});
// Draw nodes
nodes.forEach(node => {
const isHovered = hoveredNode?.id === node.id;
ctx.beginPath();
ctx.arc(node.x, node.y, node.radius + (isHovered ? 4 : 0), 0, Math.PI * 2);
ctx.fillStyle = node.color;
ctx.fill();
// Draw the surrounding halo ring
ctx.beginPath();
ctx.arc(node.x, node.y, node.radius + (isHovered ? 8 : 4), 0, Math.PI * 2);
ctx.strokeStyle = node.color + '40'; // append a hex alpha of 40 for a translucent halo
ctx.lineWidth = 2;
ctx.stroke();
// Draw the bibcode text label
ctx.fillStyle = isHovered ? '#0f172a' : '#64748b';
ctx.font = isHovered ? 'bold 10px monospace' : '9px monospace';
ctx.textAlign = 'center';
ctx.fillText(node.label, node.x, node.y + node.radius + (isHovered ? 18 : 14));
});
animationFrameId = requestAnimationFrame(render);
};
render();
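// Mouse interaction listeners
const handleMouseMove = (e: MouseEvent) => {
// Re-read the bounding rect so hit-testing stays correct after scrolling or layout shifts
const bounds = canvas.getBoundingClientRect();
const mouseX = e.clientX - bounds.left;
const mouseY = e.clientY - bounds.top;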
let found: Node | null = null;
for (const node of nodes) {
let dx = node.x - mouseX;
let dy = node.y - mouseY;
let dist = Math.sqrt(dx * dx + dy * dy);
if (dist < node.radius + 5) {
found = node;
break;
}
}
hoveredNode = found;
canvas.style.cursor = found ? 'pointer' : 'default';
};
const handleCanvasClick = () => {
if (hoveredNode && hoveredNode.id !== activeNetwork.bibcode) {
onNodeClick(hoveredNode.id);
}
};
canvas.addEventListener('mousemove', handleMouseMove);
canvas.addEventListener('click', handleCanvasClick);
return () => {
cancelAnimationFrame(animationFrameId);
canvas.removeEventListener('mousemove', handleMouseMove);
canvas.removeEventListener('click', handleCanvasClick);
};
}, [networks, activeNetwork, onNodeClick]);
return (
<canvas
ref={canvasRef}
className="w-full h-full min-h-[450px]"
style={{ display: 'block' }}
/>
);
}
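
The physics step above looks up each link's endpoints with `nodes.find` on every frame. A minimal sketch of the same three phases (repulsion, spring attraction, damped integration) using a `Map` for the lookups is given below. The names `SimNode`, `SimLink`, `pinned`, and `stepSimulation` are hypothetical and are not part of the component; the constants are copied from the code above.

```typescript
// Hypothetical, simplified shapes; not the component's actual Node/Link interfaces.
interface SimNode {
  id: string;
  x: number;
  y: number;
  vx: number;
  vy: number;
  radius: number;
  pinned: boolean; // stands in for "is the active center node"
}

interface SimLink {
  source: string;
  target: string;
}

function stepSimulation(nodes: SimNode[], links: SimLink[], damping = 0.85): void {
  // Index nodes once so each link lookup is O(1) instead of a linear scan.
  const byId = new Map<string, SimNode>();
  for (const n of nodes) byId.set(n.id, n);

  // 1. Repulsion between pairs that are closer than their minimum distance.
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      const a = nodes[i];
      const b = nodes[j];
      const dx = b.x - a.x;
      const dy = b.y - a.y;
      const dist = Math.sqrt(dx * dx + dy * dy) || 1;
      const minDist = a.radius + b.radius + 50;
      if (dist >= minDist) continue;
      const force = (minDist - dist) * 0.08;
      const fx = (dx / dist) * force;
      const fy = (dy / dist) * force;
      if (!a.pinned) { a.vx -= fx; a.vy -= fy; }
      if (!b.pinned) { b.vx += fx; b.vy += fy; }
    }
  }

  // 2. Spring attraction along links; (dx / dist) * (dist * k) simplifies to dx * k.
  const k = 0.003;
  for (const link of links) {
    const s = byId.get(link.source);
    const t = byId.get(link.target);
    if (!s || !t) continue;
    const dx = t.x - s.x;
    const dy = t.y - s.y;
    if (!s.pinned) { s.vx += dx * k; s.vy += dy * k; }
    if (!t.pinned) { t.vx -= dx * k; t.vy -= dy * k; }
  }

  // 3. Integrate positions, then damp velocities; pinned nodes never move.
  for (const n of nodes) {
    if (n.pinned) continue;
    n.x += n.vx;
    n.y += n.vy;
    n.vx *= damping;
    n.vy *= damping;
  }
}
```

Keeping the step as a pure function over plain data would also let it be exercised without a canvas, which is the main reason for sketching it this way.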


@ -0,0 +1,77 @@
// dashboard/src/components/layout/Sidebar.tsx
import { Search, BookOpen, GitFork, Library, Settings } from 'lucide-react';
import type { StandardPaper } from '../../types';
interface SidebarProps {
activeTab: 'search' | 'library' | 'reader' | 'citation' | 'settings';
setActiveTab: (tab: 'search' | 'library' | 'reader' | 'citation' | 'settings') => void;
selectedPaper: StandardPaper | null;
loadCitations: (bibcode: string) => void;
}
export function Sidebar({ activeTab, setActiveTab, selectedPaper, loadCitations }: SidebarProps) {
return (
<aside className="w-64 glass border-r border-slate-200/80 flex flex-col justify-between py-6">
<div>
{/* Logo */}
<div className="px-6 mb-8 flex items-center gap-3">
<div className="w-9 h-9 rounded-xl bg-gradient-to-br from-purple-500 to-indigo-600 flex items-center justify-center shadow-lg shadow-purple-500/20">
<span className="font-extrabold text-white text-lg tracking-wider">A</span>
</div>
<div>
<h1 className="text-lg font-bold text-slate-800 leading-none font-outfit">AstroResearch</h1>
<span className="text-xs text-slate-500"></span>
</div>
</div>
{/* Tab navigation */}
<nav className="px-4 space-y-1.5">
{[
{ id: 'search', label: '统一检索', icon: Search },
{ id: 'library', label: '馆藏管理', icon: Library },
{ id: 'reader', label: '双语阅读', icon: BookOpen, disabled: !selectedPaper },
{ id: 'citation', label: '引用星系', icon: GitFork, disabled: !selectedPaper },
{ id: 'settings', label: '系统设置', icon: Settings },
].map(tab => {
const Icon = tab.icon;
const isActive = activeTab === tab.id;
return (
<button
key={tab.id}
disabled={tab.disabled}
onClick={() => {
setActiveTab(tab.id as any);
if (tab.id === 'citation' && selectedPaper) {
loadCitations(selectedPaper.bibcode);
}
}}
className={`w-full flex items-center gap-3 px-4 py-3 rounded-xl text-sm font-medium transition-all ${
isActive
? 'bg-gradient-to-r from-purple-600/10 to-indigo-600/10 text-purple-600 border border-purple-500/20'
: tab.disabled
? 'opacity-40 cursor-not-allowed text-slate-400'
: 'text-slate-600 hover:bg-slate-100 hover:text-slate-900'
}`}
>
<Icon className="w-4 h-4" />
{tab.label}
</button>
);
})}
</nav>
</div>
{/* Current-paper card at the bottom */}
{selectedPaper && (
<div className="mx-4 p-4 rounded-xl bg-slate-100/50 border border-slate-200/80">
<span className="text-[10px] text-purple-600 font-bold uppercase tracking-wider block mb-1"></span>
<h4 className="text-xs text-slate-800 font-medium line-clamp-2 mb-2">{selectedPaper.title}</h4>
<div className="flex items-center justify-between text-[10px] text-slate-500">
<span>{selectedPaper.year}</span>
<span className="truncate max-w-[100px] text-slate-400">{selectedPaper.bibcode}</span>
</div>
</div>
)}
</aside>
);
}
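
The sidebar passes `tab.id as any` to `setActiveTab`, which bypasses the tab union declared in `SidebarProps`. One possible way to keep that union checked is sketched below; `TAB_IDS`, `TabId`, and `isTabId` are hypothetical names that do not appear in the repository.

```typescript
// Hypothetical single source of truth for the tab ids used across the panels.
const TAB_IDS = ['search', 'library', 'reader', 'citation', 'settings'] as const;

// Union type derived from the array: 'search' | 'library' | 'reader' | 'citation' | 'settings'
type TabId = (typeof TAB_IDS)[number];

// Runtime guard for values that arrive as plain strings.
function isTabId(value: string): value is TabId {
  return (TAB_IDS as readonly string[]).includes(value);
}

// Declaring the tab list with the TabId type lets a typo in an id fail at compile time.
const exampleTabs: { id: TabId; label: string }[] = [
  { id: 'search', label: 'Search' },
  { id: 'citation', label: 'Citations' },
];
```

With the list typed this way, `setActiveTab(tab.id)` would type-check without a cast, assuming the prop keeps its current signature.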


@ -0,0 +1,114 @@
// dashboard/src/features/citation/CitationPanel.tsx
import { Loader, GitFork, RotateCcw } from 'lucide-react';
import { CitationGalaxyCanvas } from '../../components/CitationGalaxyCanvas';
import type { StandardPaper, CitationNetwork } from '../../types';
interface CitationPanelProps {
selectedPaper: StandardPaper | null;
loadingCitations: boolean;
citationNetwork: CitationNetwork | null;
citationHistory: CitationNetwork[];
loadCitations: (bibcode: string, reset?: boolean) => void;
}
export function CitationPanel({
selectedPaper,
loadingCitations,
citationNetwork,
citationHistory,
loadCitations,
}: CitationPanelProps) {
return (
<div className="max-w-7xl mx-auto h-[calc(100vh-140px)] flex flex-col space-y-4 w-full">
<div className="flex items-center justify-between">
<div>
<h2 className="text-lg font-bold text-slate-800"></h2>
<p className="text-xs text-slate-500"> (References) - - (Citations)</p>
</div>
{selectedPaper && (
<button
onClick={() => loadCitations(selectedPaper.bibcode, true)}
className="px-4 py-2 rounded-xl bg-slate-100 border border-slate-200 text-xs text-slate-600 hover:bg-slate-200 flex items-center gap-2"
>
<RotateCcw className="w-3.5 h-3.5" />
</button>
)}
</div>
{loadingCitations ? (
<div className="glass rounded-2xl flex-1 flex flex-col items-center justify-center text-slate-500">
<Loader className="w-8 h-8 animate-spin text-purple-500 mb-2" />
<p className="text-xs"> SQLite ...</p>
</div>
) : citationNetwork ? (
<div className="flex-1 glass rounded-2xl border border-slate-200/80 overflow-hidden relative flex">
{/* Visualization canvas */}
<div className="flex-1 relative">
<CitationGalaxyCanvas
networks={citationHistory}
activeNetwork={citationNetwork}
onNodeClick={(bibcode) => {
loadCitations(bibcode, false);
}}
/>
</div>
{/* Relationship detail panel on the right */}
<div className="w-80 border-l border-slate-200 bg-white/50 p-6 space-y-6 overflow-y-auto">
<div>
<span className="text-[10px] text-purple-600 font-bold uppercase tracking-wider block mb-1"></span>
<h4 className="text-sm font-semibold text-slate-800 leading-snug">{citationNetwork.title}</h4>
<p className="text-xs text-slate-500 font-mono mt-1">{citationNetwork.bibcode}</p>
</div>
<div>
<span className="text-[10px] text-amber-700 font-bold uppercase tracking-wider block mb-2">
(References: {citationNetwork.references.length})
</span>
<div className="space-y-1 max-h-48 overflow-y-auto">
{citationNetwork.references.length === 0 ? (
<span className="text-xs text-slate-400 italic"> bibcode </span>
) : (
citationNetwork.references.map(bib => (
<div
key={bib}
onClick={() => loadCitations(bib)}
className="text-xs text-slate-600 hover:text-purple-600 cursor-pointer font-mono truncate py-1 hover:bg-slate-100 px-2 rounded"
>
{bib}
</div>
))
)}
</div>
</div>
<div>
<span className="text-[10px] text-indigo-700 font-bold uppercase tracking-wider block mb-2">
(Citations: {citationNetwork.citations.length})
</span>
<div className="space-y-1 max-h-48 overflow-y-auto">
{citationNetwork.citations.length === 0 ? (
<span className="text-xs text-slate-400 italic"> bibcode </span>
) : (
citationNetwork.citations.map(bib => (
<div
key={bib}
onClick={() => loadCitations(bib)}
className="text-xs text-slate-600 hover:text-purple-600 cursor-pointer font-mono truncate py-1 hover:bg-slate-100 px-2 rounded"
>
{bib}
</div>
))
)}
</div>
</div>
</div>
</div>
) : (
<div className="glass rounded-2xl flex-1 flex flex-col items-center justify-center text-slate-400">
<GitFork className="w-12 h-12 mb-3" />
<p className="text-xs"></p>
</div>
)}
</div>
);
}


@ -0,0 +1,131 @@
// dashboard/src/features/library/LibraryPanel.tsx
import { Library, RotateCw, Download, Loader, RefreshCw } from 'lucide-react';
import type { StandardPaper } from '../../types';
interface LibraryPanelProps {
library: StandardPaper[];
fetchLibrary: () => void;
openReader: (paper: StandardPaper) => void;
setSelectedPaper: (paper: StandardPaper | null) => void;
setActiveTab: (tab: 'search' | 'library' | 'reader' | 'citation' | 'settings') => void;
loadCitations: (bibcode: string) => void;
downloadingBibcodes: Record<string, boolean>;
handleDownload: (bibcode: string, force?: boolean) => void;
}
export function LibraryPanel({
library,
fetchLibrary,
openReader,
setSelectedPaper,
setActiveTab,
loadCitations,
downloadingBibcodes,
handleDownload,
}: LibraryPanelProps) {
return (
<div className="max-w-5xl mx-auto space-y-6">
<div className="flex items-center justify-between mb-4">
<div>
<h2 className="text-lg font-bold text-slate-800"></h2>
<p className="text-xs text-slate-500"></p>
</div>
<button
onClick={fetchLibrary}
className="px-4 py-2 rounded-xl bg-slate-100 border border-slate-200 text-xs text-slate-600 hover:bg-slate-200 flex items-center gap-2"
>
<RotateCw className="w-3.5 h-3.5" />
</button>
</div>
{library.length === 0 ? (
<div className="glass p-12 rounded-2xl text-center">
<Library className="w-12 h-12 text-slate-400 mx-auto mb-4" />
<h3 className="font-bold text-slate-800 mb-1"></h3>
<p className="text-xs text-slate-500 mb-6">线</p>
<button
onClick={() => setActiveTab('search')}
className="px-6 py-2.5 rounded-xl bg-purple-600 text-white font-semibold hover:bg-purple-500 text-xs"
>
</button>
</div>
) : (
<div className="grid grid-cols-1 md:grid-cols-2 gap-4">
{library.map(paper => {
const isDownloading = downloadingBibcodes[paper.bibcode] || false;
return (
<div key={paper.bibcode} className="glass p-6 rounded-2xl border border-slate-200/80 hover:border-slate-300 flex flex-col justify-between">
<div>
<div className="flex justify-between items-start gap-4 mb-2">
<h3
className="font-bold text-sm text-slate-800 line-clamp-2 hover:text-purple-600 cursor-pointer"
onClick={() => openReader(paper)}
>
{paper.title}
</h3>
{paper.is_downloaded ? (
<span className="px-2 py-0.5 rounded bg-emerald-50 text-emerald-600 border border-emerald-200 text-[9px] font-bold uppercase shrink-0"></span>
) : (
<span className="px-2 py-0.5 rounded bg-amber-50 text-amber-600 border border-amber-200 text-[9px] font-bold uppercase shrink-0"></span>
)}
</div>
<div className="text-xs text-slate-500 mb-3">
{paper.authors.slice(0, 2).join(', ')}{paper.authors.length > 2 ? ' et al.' : ''} | {paper.year}
</div>
<p className="text-xs text-slate-600 line-clamp-3 leading-relaxed mb-4">{paper.abstract_text}</p>
</div>
<div className="flex items-center justify-between border-t border-slate-200/60 pt-4 mt-auto">
<div className="flex flex-wrap gap-2">
<button
onClick={() => openReader(paper)}
className="px-3.5 py-1.5 rounded-lg bg-purple-600 text-white text-xs hover:bg-purple-500 transition-all font-semibold"
>
</button>
<button
onClick={() => {
setSelectedPaper(paper);
setActiveTab('citation');
loadCitations(paper.bibcode);
}}
className="px-3.5 py-1.5 rounded-lg bg-slate-100 border border-slate-200 text-slate-600 hover:bg-slate-200 text-xs font-semibold"
>
</button>
{paper.is_downloaded ? (
<button
onClick={() => { if (confirm('确定要强制重新下载吗?这会覆盖本地文件。')) handleDownload(paper.bibcode, true); }}
disabled={isDownloading}
className="px-3.5 py-1.5 rounded-lg bg-slate-100 border border-slate-200 text-amber-600 hover:bg-slate-200 text-xs font-semibold flex items-center gap-1"
>
{isDownloading ? <Loader className="w-3 h-3 animate-spin" /> : <RefreshCw className="w-3 h-3" />}
{isDownloading ? '下载中' : '重新下载'}
</button>
) : (
<button
onClick={() => handleDownload(paper.bibcode)}
disabled={isDownloading}
className="px-3.5 py-1.5 rounded-lg bg-blue-600 text-white text-xs hover:bg-blue-500 transition-all font-semibold flex items-center gap-1 shadow-lg shadow-blue-500/10"
>
{isDownloading ? <Loader className="w-3 h-3 animate-spin" /> : <Download className="w-3 h-3" />}
{isDownloading ? '下载中' : '下载 PDF/HTML'}
</button>
)}
</div>
<div className="flex items-center gap-1.5">
<span className={`w-1.5 h-1.5 rounded-full ${paper.has_markdown ? 'bg-purple-400' : 'bg-slate-350'}`} title="解析状态" />
<span className={`w-1.5 h-1.5 rounded-full ${paper.has_translation ? 'bg-emerald-400' : 'bg-slate-350'}`} title="翻译状态" />
<span className="text-[10px] font-mono text-slate-400">{paper.bibcode}</span>
</div>
</div>
</div>
);
})}
</div>
)}
</div>
);
}


@ -0,0 +1,287 @@
// dashboard/src/features/reader/ReaderPanel.tsx
import ReactMarkdown from 'react-markdown';
import remarkMath from 'remark-math';
import rehypeKatex from 'rehype-katex';
import 'katex/dist/katex.min.css';
import { FileText, Loader, Languages, RotateCw, Pencil, X, PlusCircle, Trash2 } from 'lucide-react';
import type { StandardPaper, NoteRecord } from '../../types';
interface ReaderPanelProps {
selectedPaper: StandardPaper;
parsing: boolean;
handleParse: (bibcode: string, force?: boolean) => void;
translating: boolean;
handleTranslate: (bibcode: string, force?: boolean) => void;
showNotesPanel: boolean;
setShowNotesPanel: (show: boolean) => void;
notes: NoteRecord[];
englishText: string;
chineseText: string;
handleTextSelection: (paragraphIdx: number) => void;
selectedParagraphIdx: number | null;
setSelectedParagraphIdx: (idx: number | null) => void;
selectedText: string;
setSelectedText: (text: string) => void;
newNoteColor: string;
setNewNoteColor: (color: string) => void;
newNoteText: string;
setNewNoteText: (text: string) => void;
handleCreateNote: () => void;
handleDeleteNote: (id: number) => void;
}
const NOTE_COLORS: Record<string, { bg: string; border: string; label: string }> = {
yellow: { bg: 'bg-yellow-500/20', border: 'border-yellow-500/40', label: '黄色' },
green: { bg: 'bg-emerald-500/20', border: 'border-emerald-500/40', label: '绿色' },
blue: { bg: 'bg-blue-500/20', border: 'border-blue-500/40', label: '蓝色' },
pink: { bg: 'bg-pink-500/20', border: 'border-pink-500/40', label: '粉色' },
};
export function ReaderPanel({
selectedPaper,
parsing,
handleParse,
translating,
handleTranslate,
showNotesPanel,
setShowNotesPanel,
notes,
englishText,
chineseText,
handleTextSelection,
selectedParagraphIdx,
setSelectedParagraphIdx,
selectedText,
setSelectedText,
newNoteColor,
setNewNoteColor,
newNoteText,
setNewNoteText,
handleCreateNote,
handleDeleteNote,
}: ReaderPanelProps) {
return (
<div className="max-w-7xl mx-auto h-[calc(100vh-140px)] flex flex-col space-y-4 w-full">
{/* Header controls */}
<div className="flex items-center justify-between">
<div>
<h2 className="text-base font-bold text-slate-800 line-clamp-1 leading-snug">{selectedPaper.title}</h2>
<div className="flex items-center gap-2 text-xs text-slate-500">
<span>: {selectedPaper.pub_journal}</span>
<span></span>
<span>Bibcode: {selectedPaper.bibcode}</span>
</div>
</div>
<div className="flex gap-2">
{!selectedPaper.has_markdown ? (
<button
onClick={() => handleParse(selectedPaper.bibcode)}
disabled={parsing}
className="px-4 py-2 rounded-xl bg-purple-600 text-white text-xs hover:bg-purple-500 font-semibold flex items-center gap-2 shadow-lg shadow-purple-500/10"
>
{parsing ? <Loader className="w-3.5 h-3.5 animate-spin" /> : <FileText className="w-3.5 h-3.5" />}
{parsing ? '提取英文正文中...' : '第一步:解析 PDF/HTML 正文'}
</button>
) : (
<button
onClick={() => { if (confirm('确定要重新解析正文吗?这会覆盖本地已解析的 Markdown。')) handleParse(selectedPaper.bibcode, true); }}
disabled={parsing}
className="px-4 py-2 rounded-xl bg-slate-100 border border-slate-200 text-slate-600 text-xs hover:bg-slate-200 font-semibold flex items-center gap-2"
title="覆盖本地解析结果,重新从 HTML/PDF 转换为 Markdown"
>
{parsing ? <Loader className="w-3.5 h-3.5 animate-spin" /> : <RotateCw className="w-3.5 h-3.5" />}
</button>
)}
{selectedPaper.has_markdown && (
<button
onClick={() => handleTranslate(selectedPaper.bibcode)}
disabled={translating}
className="px-4 py-2 rounded-xl bg-emerald-600 text-white text-xs hover:bg-emerald-500 font-semibold flex items-center gap-2 shadow-lg shadow-emerald-500/10"
>
{translating ? <Loader className="w-3.5 h-3.5 animate-spin" /> : <Languages className="w-3.5 h-3.5" />}
{translating ? 'LLM 学术术语修正翻译中...' : '第二步:大模型对比翻译'}
</button>
)}
{selectedPaper.has_translation && (
<button
onClick={() => handleTranslate(selectedPaper.bibcode, true)}
disabled={translating}
className="px-4 py-2 rounded-xl bg-slate-100 border border-slate-200 text-slate-600 text-xs hover:bg-slate-200 font-semibold flex items-center gap-2"
title="清除翻译缓存并重新生成大模型对照翻译"
>
{translating ? <Loader className="w-3.5 h-3.5 animate-spin" /> : <RotateCw className="w-3.5 h-3.5" />}
</button>
)}
</div>
</div>
{/* Side-by-side bilingual reader */}
<div className="flex-1 grid gap-6 overflow-hidden w-full" style={{ gridTemplateColumns: showNotesPanel ? '1fr 1fr 320px' : '1fr 1fr' }}>
{/* English source text panel */}
<div className="glass rounded-2xl p-6 overflow-y-auto border border-slate-200 relative flex flex-col">
<div className="flex items-center justify-between mb-3 border-b border-slate-200/60 pb-2">
<span className="text-[10px] text-purple-650 font-bold uppercase tracking-wider"> (Markdown/LaTeX)</span>
<button
onClick={() => setShowNotesPanel(!showNotesPanel)}
className={`flex items-center gap-1 text-[10px] px-2 py-1 rounded-lg transition-all ${
showNotesPanel ? 'bg-amber-100 text-amber-800 border border-amber-300' : 'text-slate-600 hover:text-slate-900 bg-slate-100 border border-slate-200'
}`}
>
<Pencil className="w-3 h-3" />
({notes.length})
</button>
</div>
{parsing ? (
<div className="flex-1 flex flex-col items-center justify-center text-slate-500 space-y-2">
<Loader className="w-8 h-8 animate-spin text-purple-500" />
<p className="text-xs"> HTML MD退 MinerU PDF ...</p>
</div>
) : englishText ? (
<div className="prose prose-sm max-w-none leading-relaxed text-slate-700 prose-headings:text-purple-700 prose-strong:text-slate-900 prose-code:text-emerald-700 prose-blockquote:border-purple-500/50 prose-blockquote:text-slate-500 prose-img:max-w-full prose-img:rounded-lg">
{englishText.split('\n\n').map((para, idx) => (
<div
key={idx}
onMouseUp={() => handleTextSelection(idx)}
className={`cursor-text relative rounded px-1 -mx-1 transition-colors ${
notes.some(n => n.paragraph_index === idx)
? `${NOTE_COLORS[notes.find(n => n.paragraph_index === idx)!.highlight_color]?.bg || ''} ${NOTE_COLORS[notes.find(n => n.paragraph_index === idx)!.highlight_color]?.border || ''} border`
: 'hover:bg-slate-100'
}`}
>
<ReactMarkdown
remarkPlugins={[remarkMath]}
rehypePlugins={[rehypeKatex]}
>
{para}
</ReactMarkdown>
</div>
))}
</div>
) : (
<div className="flex-1 flex flex-col items-center justify-center text-slate-400">
<FileText className="w-12 h-12 mb-3" />
<p className="text-xs"> Markdown </p>
<p className="text-[11px] text-slate-500 mt-1"></p>
</div>
)}
</div>
{/* Chinese translation panel */}
<div className="glass rounded-2xl p-6 overflow-y-auto border border-slate-200 relative flex flex-col">
<span className="text-[10px] text-emerald-700 font-bold uppercase tracking-wider block mb-3 border-b border-slate-200/60 pb-2"> ()</span>
{translating ? (
<div className="flex-1 flex flex-col items-center justify-center text-slate-500 space-y-2">
<Loader className="w-8 h-8 animate-spin text-emerald-500" />
<p className="text-xs"> prompt ...</p>
</div>
) : chineseText ? (
<div className="prose prose-sm max-w-none leading-relaxed text-slate-700 prose-headings:text-emerald-700 prose-strong:text-slate-900 prose-code:text-blue-700 prose-blockquote:border-emerald-500/50 prose-img:max-w-full prose-img:rounded-lg">
<ReactMarkdown
remarkPlugins={[remarkMath]}
rehypePlugins={[rehypeKatex]}
>
{chineseText}
</ReactMarkdown>
</div>
) : (
<div className="flex-1 flex flex-col items-center justify-center text-slate-400">
<Languages className="w-12 h-12 mb-3" />
<p className="text-xs"></p>
<p className="text-[11px] text-slate-500 mt-1"></p>
</div>
)}
</div>
{/* Notes sidebar */}
{showNotesPanel && (
<div className="glass rounded-2xl border border-amber-200 bg-amber-50/20 flex flex-col overflow-hidden">
<div className="px-5 py-4 border-b border-amber-200 flex items-center justify-between">
<span className="text-xs text-amber-700 font-bold"> ({notes.length})</span>
<button onClick={() => setShowNotesPanel(false)} className="text-slate-500 hover:text-slate-800">
<X className="w-4 h-4" />
</button>
</div>
{/* New-note input area */}
{selectedParagraphIdx !== null && (
<div className="px-4 py-3 border-b border-slate-200 bg-slate-50">
<div className="text-[10px] text-slate-500 mb-2"> #{selectedParagraphIdx + 1}</div>
{selectedText && <div className="text-[10px] text-slate-650 italic line-clamp-2 mb-2 bg-white border border-slate-200 px-2 py-1 rounded">"{selectedText}"</div>}
{/* Highlight color picker */}
<div className="flex gap-1.5 mb-2">
{Object.entries(NOTE_COLORS).map(([color, style]) => (
<button
key={color}
onClick={() => setNewNoteColor(color)}
className={`w-5 h-5 rounded-full border-2 transition-transform ${
newNoteColor === color ? 'border-slate-700 scale-110' : 'border-transparent'
} ${style.bg.replace('/20', '')}`}
title={style.label}
/>
))}
</div>
<textarea
value={newNoteText}
onChange={e => setNewNoteText(e.target.value)}
placeholder="输入笔记内容..."
rows={3}
className="w-full bg-white border border-slate-350 rounded-lg text-xs text-slate-800 placeholder-slate-400 px-3 py-2 resize-none focus:outline-none focus:border-amber-500 mb-2"
/>
<div className="flex gap-2">
<button
onClick={handleCreateNote}
className="flex-1 px-3 py-1.5 rounded-lg bg-amber-600 text-white text-xs hover:bg-amber-500 font-semibold flex items-center justify-center gap-1"
>
<PlusCircle className="w-3 h-3" />
</button>
<button
onClick={() => { setSelectedParagraphIdx(null); setSelectedText(''); setNewNoteText(''); }}
className="px-3 py-1.5 rounded-lg bg-slate-150 text-slate-600 text-xs hover:bg-slate-200"
>
</button>
</div>
</div>
)}
{/* Notes list */}
<div className="flex-1 overflow-y-auto p-4 space-y-3">
{notes.length === 0 ? (
<div className="text-center text-slate-400 text-xs py-8">
<Pencil className="w-8 h-8 mx-auto mb-2 opacity-40" />
</div>
) : (
notes.map(note => (
<div
key={note.id}
className={`p-3 rounded-xl border text-xs ${NOTE_COLORS[note.highlight_color]?.bg || 'bg-slate-50'} ${NOTE_COLORS[note.highlight_color]?.border || 'border-slate-200'}`}
>
<div className="text-slate-500 text-[10px] mb-1"> #{note.paragraph_index + 1}</div>
{note.selected_text && <div className="text-slate-600 italic line-clamp-1 mb-1 text-[10px]">"{note.selected_text}"</div>}
<p className="text-slate-800 leading-relaxed">{note.note_text}</p>
<div className="flex items-center justify-between mt-2">
<span className="text-[10px] text-slate-400">{note.created_at.split('T')[0]}</span>
<button
onClick={() => handleDeleteNote(note.id)}
className="text-slate-400 hover:text-red-600 transition-colors"
>
<Trash2 className="w-3 h-3" />
</button>
</div>
</div>
))
)}
</div>
</div>
)}
</div>
</div>
);
}
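
The English panel above calls `notes.some` once and `notes.find` twice for every paragraph on every render. A small sketch of a precomputed lookup that preserves the same "first matching note wins" behavior follows; `NoteLike` and `firstNoteColorByParagraph` are hypothetical names, and `NoteLike` only mirrors the two `NoteRecord` fields the highlight logic reads.

```typescript
// Hypothetical subset of NoteRecord: only the fields used for paragraph highlighting.
interface NoteLike {
  paragraph_index: number;
  highlight_color: string;
}

// Map each paragraph index to the color of the first note attached to it,
// matching what Array.prototype.find would return for that paragraph.
function firstNoteColorByParagraph(notes: NoteLike[]): Map<number, string> {
  const colors = new Map<number, string>();
  for (const note of notes) {
    if (!colors.has(note.paragraph_index)) {
      colors.set(note.paragraph_index, note.highlight_color);
    }
  }
  return colors;
}
```

Inside the component the map could be built once per `notes` change (for example with `useMemo`) and read with `colors.get(idx)` while rendering each paragraph.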


@ -0,0 +1,230 @@
// dashboard/src/features/search/SearchPanel.tsx
import React from 'react';
import { Search, Loader, CheckCircle, Copy, Download, RefreshCw } from 'lucide-react';
import type { StandardPaper } from '../../types';
interface SearchPanelProps {
searchQuery: string;
setSearchQuery: (query: string) => void;
searchSource: 'all' | 'ads' | 'arxiv';
setSearchSource: (src: 'all' | 'ads' | 'arxiv') => void;
searching: boolean;
handleSearch: (e: React.FormEvent) => void;
searchResults: StandardPaper[];
exportingList: string[];
toggleExportItem: (bibcode: string) => void;
handleExportBibtex: () => void;
exporting: boolean;
bibtexContent: string | null;
downloadingBibcodes: Record<string, boolean>;
handleDownload: (bibcode: string, force?: boolean) => void;
selectedPaper: StandardPaper | null;
setSelectedPaper: (paper: StandardPaper | null) => void;
openReader: (paper: StandardPaper) => void;
setActiveTab: (tab: 'search' | 'library' | 'reader' | 'citation' | 'settings') => void;
loadCitations: (bibcode: string, reset?: boolean) => void;
}
export function SearchPanel({
searchQuery,
setSearchQuery,
searchSource,
setSearchSource,
searching,
handleSearch,
searchResults,
exportingList,
toggleExportItem,
handleExportBibtex,
exporting,
bibtexContent,
downloadingBibcodes,
handleDownload,
selectedPaper,
setSelectedPaper,
openReader,
setActiveTab,
loadCitations,
}: SearchPanelProps) {
return (
<div className="space-y-6 max-w-5xl mx-auto">
{/* Search box */}
<div className="glass p-6 rounded-2xl">
<form onSubmit={handleSearch} className="space-y-4">
<div className="relative">
<Search className="absolute left-4 top-1/2 -translate-y-1/2 text-slate-400 w-5 h-5" />
<input
type="text"
value={searchQuery}
onChange={e => setSearchQuery(e.target.value)}
placeholder="检索天文学文献 (支持关键字、作者、年份范围检索,如 'hot subdwarf year:2020-2023')"
className="w-full pl-12 pr-4 py-4 rounded-xl bg-white/60 border border-slate-200 text-slate-800 placeholder-slate-400 focus:outline-none focus:border-purple-500 focus:ring-1 focus:ring-purple-500 transition-all text-sm"
/>
<button
type="submit"
disabled={searching}
className="absolute right-3 top-1/2 -translate-y-1/2 px-5 py-2 rounded-lg bg-gradient-to-r from-purple-600 to-indigo-600 text-white text-xs font-semibold hover:from-purple-500 hover:to-indigo-500 transition-all flex items-center gap-2"
>
{searching ? <Loader className="w-3.5 h-3.5 animate-spin" /> : null}
{searching ? '检索中' : '开始检索'}
</button>
</div>
<div className="flex items-center justify-between">
<div className="flex gap-4">
{[
{ id: 'all', label: '全部数据源' },
{ id: 'ads', label: 'NASA ADS' },
{ id: 'arxiv', label: 'arXiv 预印本' },
].map(src => (
<label key={src.id} className="flex items-center gap-2 cursor-pointer text-xs">
<input
type="radio"
name="searchSource"
checked={searchSource === src.id}
onChange={() => setSearchSource(src.id as any)}
className="text-purple-600 focus:ring-purple-500 border-slate-300 bg-white"
/>
<span className={searchSource === src.id ? 'text-purple-600 font-medium' : 'text-slate-500'}>
{src.label}
</span>
</label>
))}
</div>
{exportingList.length > 0 && (
<button
type="button" // avoid submitting the enclosing search form when exporting
onClick={handleExportBibtex}
disabled={exporting}
className="px-4 py-1.5 rounded-lg bg-slate-100 border border-slate-200 text-xs text-slate-600 hover:bg-slate-200 hover:text-slate-800 flex items-center gap-1.5"
>
{exporting ? <Loader className="w-3 h-3 animate-spin" /> : null}
({exportingList.length}) BibTeX
</button>
)}
</div>
</form>
</div>
{/* BibTeX export result */}
{bibtexContent && (
<div className="glass p-6 rounded-2xl relative">
<h3 className="text-sm font-bold text-purple-600 mb-3 flex items-center gap-2">
<CheckCircle className="w-4 h-4 text-emerald-500" /> BibTeX
</h3>
<pre className="bg-slate-50 p-4 rounded-xl border border-slate-200 text-xs text-slate-700 font-mono overflow-x-auto whitespace-pre-wrap max-h-60">
{bibtexContent}
</pre>
<button
onClick={() => {
navigator.clipboard.writeText(bibtexContent);
alert('已复制至剪贴板');
}}
className="absolute top-6 right-6 text-slate-600 hover:text-slate-900 flex items-center gap-1 text-xs bg-slate-100 px-3 py-1 rounded-lg border border-slate-200"
>
<Copy className="w-3 h-3" />
</button>
</div>
)}
{/* Search results list */}
<div className="space-y-4">
{searchResults.map(paper => {
const isDownloading = downloadingBibcodes[paper.bibcode] || false;
const isSelected = selectedPaper?.bibcode === paper.bibcode;
return (
<div
key={paper.bibcode}
className={`glass p-6 rounded-2xl transition-all border ${
isSelected ? 'border-purple-500/50 bg-purple-50/50' : 'border-slate-200/80 hover:border-slate-300'
}`}
>
<div className="flex justify-between items-start gap-4 mb-2">
<h3
className="font-bold text-base text-slate-800 line-clamp-1 leading-snug hover:text-purple-600 cursor-pointer"
onClick={() => openReader(paper)}
>
{paper.title}
</h3>
<div className="flex gap-2 shrink-0">
{paper.is_downloaded ? (
<div className="flex items-center gap-2">
<span className="px-2 py-0.5 rounded bg-emerald-50 text-emerald-600 border border-emerald-200 text-[10px] font-bold uppercase"></span>
<button
onClick={() => { if (confirm('确定要强制重新下载吗?这会覆盖本地文件。')) handleDownload(paper.bibcode, true); }}
disabled={isDownloading}
className="px-3 py-1 rounded bg-slate-100 border border-slate-200 text-xs text-amber-600 hover:bg-slate-200 hover:text-amber-700 flex items-center gap-1 font-semibold"
>
{isDownloading ? <Loader className="w-3 h-3 animate-spin" /> : <RefreshCw className="w-3 h-3" />}
{isDownloading ? 'Downloading' : 'Re-download'}
</button>
</div>
) : (
<button
onClick={() => handleDownload(paper.bibcode)}
disabled={isDownloading}
className="px-3 py-1 rounded bg-slate-100 border border-slate-200 text-xs text-slate-600 hover:bg-slate-200 hover:text-slate-900 flex items-center gap-1"
>
{isDownloading ? <Loader className="w-3 h-3 animate-spin" /> : <Download className="w-3 h-3" />}
{isDownloading ? 'Downloading' : 'Download PDF/HTML'}
</button>
)}
<input
type="checkbox"
checked={exportingList.includes(paper.bibcode)}
onChange={() => toggleExportItem(paper.bibcode)}
className="rounded text-purple-600 border-slate-300 bg-white focus:ring-purple-500"
/>
</div>
</div>
<div className="flex flex-wrap items-center gap-x-4 gap-y-1.5 text-xs text-slate-500 mb-3">
<span className="font-medium text-slate-700">
{paper.authors.slice(0, 3).join(', ')}{paper.authors.length > 3 ? ' et al.' : ''}
</span>
<span className="text-slate-300"></span>
<span>{paper.year}</span>
<span className="text-slate-300"></span>
<span className="text-slate-500">{paper.pub_journal}</span>
{paper.citation_count > 0 && (
<>
<span className="text-slate-300"></span>
<span className="text-amber-600 font-medium">: {paper.citation_count}</span>
</>
)}
</div>
<p className="text-xs text-slate-600 line-clamp-3 mb-4 leading-relaxed">{paper.abstract_text}</p>
<div className="flex items-center justify-between">
<div className="flex gap-2">
<button
onClick={() => openReader(paper)}
className="px-4 py-1.5 rounded-lg bg-gradient-to-r from-purple-50 to-indigo-50 border border-purple-200 text-xs text-purple-600 hover:from-purple-100 hover:to-indigo-100 hover:border-purple-300"
>
</button>
<button
onClick={() => {
setSelectedPaper(paper);
setActiveTab('citation');
loadCitations(paper.bibcode, true);
}}
className="px-4 py-1.5 rounded-lg bg-slate-100 border border-slate-200 text-xs text-slate-600 hover:bg-slate-200 hover:text-slate-900"
>
</button>
</div>
<div className="text-[10px] text-slate-400 flex gap-4 font-mono">
{paper.doi && <span>DOI: {paper.doi}</span>}
<span>Bibcode: {paper.bibcode}</span>
</div>
</div>
</div>
);
})}
</div>
</div>
);
}


@ -0,0 +1,25 @@
// dashboard/src/features/settings/SettingsPanel.tsx
export function SettingsPanel() {
return (
<div className="max-w-3xl mx-auto space-y-6">
<div>
<h2 className="text-lg font-bold text-slate-800"></h2>
<p className="text-xs text-slate-500">AstroResearch </p>
</div>
<div className="glass p-6 rounded-2xl space-y-6">
<div className="border-b border-slate-200 pb-4">
<h3 className="text-sm font-semibold text-slate-800 mb-1"> (.env) </h3>
<p className="text-xs text-slate-500 leading-relaxed">
API Token <code className="text-xs text-purple-600 font-semibold bg-purple-50 px-1 py-0.5 rounded">.env</code>
</p>
</div>
<div className="bg-purple-50 border border-purple-200 p-4 rounded-xl text-xs text-slate-600 leading-relaxed">
<strong></strong> 使 <code className="text-xs text-purple-600 font-semibold bg-purple-50 px-1 py-0.5 rounded">/home/fmq//astrodict_241020/astrodict241020_ec.txt</code>
</div>
</div>
</div>
);
}

54
dashboard/src/index.css Normal file

@ -0,0 +1,54 @@
@import url('https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&family=Outfit:wght@400;500;600;700&display=swap');
@import "tailwindcss";
@plugin "@tailwindcss/typography";
:root {
font-family: 'Inter', system-ui, -apple-system, sans-serif;
color-scheme: light;
}
body {
margin: 0;
background-color: #f8fafc;
color: #0f172a;
min-height: 100vh;
}
/* Custom premium scrollbar styling */
::-webkit-scrollbar {
width: 6px;
height: 6px;
}
::-webkit-scrollbar-track {
background: #f1f5f9;
}
::-webkit-scrollbar-thumb {
background: #cbd5e1;
border-radius: 3px;
}
::-webkit-scrollbar-thumb:hover {
background: #94a3b8;
}
/* Glassmorphism utility for light mode */
.glass {
background: rgba(255, 255, 255, 0.45);
backdrop-filter: blur(16px);
-webkit-backdrop-filter: blur(16px);
border: 1px solid rgba(255, 255, 255, 0.6);
box-shadow: 0 4px 30px rgba(0, 0, 0, 0.03);
}
.glass-accent {
border: 1px solid rgba(192, 132, 252, 0.2);
}
/* Animations */
@keyframes pulse-slow {
0%, 100% { opacity: 0.4; }
50% { opacity: 0.8; }
}
.animate-pulse-slow {
animation: pulse-slow 3s cubic-bezier(0.4, 0, 0.6, 1) infinite;
}

10
dashboard/src/main.tsx Normal file

@ -0,0 +1,10 @@
import { StrictMode } from 'react'
import { createRoot } from 'react-dom/client'
import './index.css'
import App from './App.tsx'
createRoot(document.getElementById('root')!).render(
<StrictMode>
<App />
</StrictMode>,
)

37
dashboard/src/types.ts Normal file

@ -0,0 +1,37 @@
// dashboard/src/types.ts
export interface StandardPaper {
bibcode: string;
title: string;
authors: string[];
year: string;
pub_journal: string;
keywords: string[];
abstract_text: string;
doi: string;
arxiv_id: string;
citation_count: number;
reference_count: number;
is_downloaded: boolean;
has_markdown: boolean;
has_translation: boolean;
}
export interface CitationNetwork {
bibcode: string;
title: string;
citation_count: number;
reference_count: number;
references: string[];
citations: string[];
}
export interface NoteRecord {
id: number;
bibcode: string;
paragraph_index: number;
note_text: string;
highlight_color: string;
selected_text: string;
created_at: string;
}


@ -0,0 +1,25 @@
{
"compilerOptions": {
"tsBuildInfoFile": "./node_modules/.tmp/tsconfig.app.tsbuildinfo",
"target": "es2023",
"lib": ["ES2023", "DOM"],
"module": "esnext",
"types": ["vite/client"],
"skipLibCheck": true,
/* Bundler mode */
"moduleResolution": "bundler",
"allowImportingTsExtensions": true,
"verbatimModuleSyntax": true,
"moduleDetection": "force",
"noEmit": true,
"jsx": "react-jsx",
/* Linting */
"noUnusedLocals": true,
"noUnusedParameters": true,
"erasableSyntaxOnly": true,
"noFallthroughCasesInSwitch": true
},
"include": ["src"]
}

7
dashboard/tsconfig.json Normal file

@ -0,0 +1,7 @@
{
"files": [],
"references": [
{ "path": "./tsconfig.app.json" },
{ "path": "./tsconfig.node.json" }
]
}


@ -0,0 +1,24 @@
{
"compilerOptions": {
"tsBuildInfoFile": "./node_modules/.tmp/tsconfig.node.tsbuildinfo",
"target": "es2023",
"lib": ["ES2023"],
"module": "esnext",
"types": ["node"],
"skipLibCheck": true,
/* Bundler mode */
"moduleResolution": "bundler",
"allowImportingTsExtensions": true,
"verbatimModuleSyntax": true,
"moduleDetection": "force",
"noEmit": true,
/* Linting */
"noUnusedLocals": true,
"noUnusedParameters": true,
"erasableSyntaxOnly": true,
"noFallthroughCasesInSwitch": true
},
"include": ["vite.config.ts"]
}

19
dashboard/vite.config.ts Normal file

@ -0,0 +1,19 @@
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'
import tailwindcss from '@tailwindcss/vite'
// https://vite.dev/config/
export default defineConfig({
plugins: [
react(),
tailwindcss(),
],
server: {
proxy: {
'/api': {
target: 'http://localhost:8000',
changeOrigin: true,
}
}
}
})


@ -0,0 +1,580 @@
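# Frontend Development Standards: React
React Development Practice Guide
V1\.0 \| May 2026
# 1\. Overview and General Principles
This document unifies the front-end team's technical practices to safeguard code quality, maintainability, and collaboration efficiency. It covers the core development scenarios of the React stack, along with general guidelines for the HTML/CSS and JavaScript/TypeScript foundations.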
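## 1\.1 Scope
This standard applies to all front-end projects built on the React stack, including but not limited to:
1. Web applications using React 18\+
2. Server-side rendered projects based on meta-frameworks such as Next\.js or Remix
3. Cross-platform mobile projects using React Native \(partially applicable\)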
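## 1\.2 Requirement Levels
Rules are grouped into three levels by how binding they are; developers should follow them as appropriate for the project:
|**Level**|**Tag**|**Description**|
|---|---|---|
|Must|\[M\]|Mandatory for every project; always checked in code review|
|Should|\[S\]|Strongly recommended; may be adjusted for special cases after evaluation|
|May|\[O\]|Adopt selectively according to project needs|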
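## 1\.3 Technology Stack Versions
New projects should prefer the versions below; existing projects should upgrade gradually within their iteration cycles:
|**Technology**|**Recommended version**|**Notes**|
|---|---|---|
|React|18\.x / 19\.x|Use the latest stable release|
|TypeScript|5\.5\+|Enable strict mode|
|Vite|6\.x|Preferred build tool|
|Next\.js|15\.x|For SSR/SSG scenarios|
|Tailwind CSS|4\.x|Utility-first \(atomic\) CSS|
|ESLint|9\.x|Flat config format|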
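# 2\. React Development Standards
React is the core focus of this standard. This chapter defines React best practices along five dimensions: component design, Hooks usage, state management, TypeScript typing, and performance optimization.
## 2\.1 Component Design
### 2\.1\.1 Component Categories and Organization
React components should be divided by responsibility into the categories below and kept in a clear directory structure:
|**Component type**|**Path**|**Responsibility**|
|---|---|---|
|Page components|app/ or pages/|Route-level pages; responsible for data fetching and page-level layout|
|Layout components|components/layout/|Page layout frames such as Header, Sidebar, and Footer|
|UI components|components/ui/|Basic UI elements: purely presentational components such as Button, Input, and Modal|
|Feature components|features/\*/components/|Business components tightly coupled to a specific feature domain|
|HOCs / utilities|components/hoc/|Higher-order components and rendering utilities \(render props\)|
### 2\.1\.2 Prefer Function Components
Since Hooks arrived in React 16\.8, function components have been the officially recommended style. All new components must be function components; class components are allowed only when maintaining legacy code.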
```TypeScript
// Recommended: function component + Hooks
import { useState, useCallback } from 'react';
interface UserCardProps {
user: User;
onSelect: (id: string) => void;
}
export function UserCard({ user, onSelect }: UserCardProps) {
const [expanded, setExpanded] = useState(false);
const handleClick = useCallback(() => {
onSelect(user.id);
setExpanded(prev => !prev);
}, [onSelect, user.id]);
return (
<Card onClick={handleClick}>
<Avatar src={user.avatar} />
<UserName>{user.name}</UserName>
{expanded && <UserDetail user={user} />}
</Card>
);
}
```
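### 2\.1\.3 Props Design Principles
The design of a component's Props interface directly affects its reusability and maintainability. Follow these principles:
1. Single responsibility: each component receives only the minimal data it needs to render; avoid passing redundant data
2. Explicit interfaces: define Props with a TypeScript interface; implicit any is forbidden
3. Defaults: give optional Props sensible default values, or use destructuring defaults to simplify handling
4. Event naming: prefix custom event handlers with on \(e\.g\. onSelect, onValueChange\), following React's native event naming convention
5. Avoid over-forwarding: do not simply spread all of a parent's Props into a child; declare the required properties explicitly
### 2\.1\.4 Component File Structure
Organize each component as follows to keep concerns separated and the code testable: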
```TypeScript
// components/UserCard/index.tsx
export { UserCard } from './UserCard';
export type { UserCardProps } from './types';
// components/UserCard/UserCard.tsx
import type { UserCardProps } from './types';
import { useUserCard } from './useUserCard';
import * as S from './styles';
export function UserCard({ user, onSelect }: UserCardProps) {
  const { expanded, handleClick } = useUserCard(user, onSelect);
  return (
    <S.Card onClick={handleClick}>...</S.Card>
  );
}
// components/UserCard/types.ts
export interface UserCardProps {
  user: User;
  onSelect: (id: string) => void;
}
// components/UserCard/useUserCard.ts
import { useCallback, useState } from 'react';
export function useUserCard(user: User, onSelect: (id: string) => void) {
  // Business logic extracted into a custom Hook
  const [expanded, setExpanded] = useState(false);
  const handleClick = useCallback(() => {
    onSelect(user.id);
    setExpanded(prev => !prev);
  }, [onSelect, user.id]);
  return { expanded, handleClick };
}
// components/UserCard/styles.ts (styled-components / CSS Modules)
```
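## 2\.2 Hooks Usage
### 2\.2\.1 Basic Rules of Hooks
Hooks, introduced in React 16\.8, must follow the rules below; breaking them can cause unpredictable behavior:
1. Call Hooks only at the top level: never inside loops, conditions, or nested functions
2. Call Hooks only from React functions: function components or custom Hooks, not ordinary JavaScript functions
3. Name custom Hooks with a use prefix so the ESLint plugin can recognize them
4. Be honest about dependencies: the dependency arrays of useEffect, useMemo, and useCallback must list every dependency
### 2\.2\.2 useEffect Best Practices
useEffect is one of the most used Hooks and also one of the most easily misused. Follow these practices: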
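```JavaScript
// Recommended: a single-responsibility Effect
useEffect(() => {
  const controller = new AbortController();
  fetchUser(userId, { signal: controller.signal })
    .then(setUser)
    .catch(err => {
      // Aborting during cleanup rejects with an AbortError; that is not a real failure
      if (err.name !== 'AbortError') setError(err);
    });
  return () => controller.abort();
}, [userId]); // The dependency array must be complete
// Recommended: split unrelated logic into separate Effects
useEffect(() => {
  // Data fetching
}, [params]);
useEffect(() => {
  // DOM work or subscriptions
  return () => { /* cleanup */ };
}, []);
// Forbidden: missing dependency
useEffect(() => {
  fetchData(page); // page is not in the dependency array!
}, []); // eslint-disable-line is only a stopgap; fix it promptly
```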
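### 2\.2\.3 useMemo and useCallback
Use the performance Hooks only when there is a demonstrated performance problem, and avoid premature optimization:
1. useMemo: caches the result of an expensive computation; use it only when the computation clearly costs more than the caching overhead
2. useCallback: caches event handlers; mainly paired with React\.memo to avoid unnecessary child re-renders
3. Avoid overuse: simple computations and event handlers need no memoization; React's rendering is usually faster than expected
4. Complete dependency arrays: as with useEffect, the dependency array must be complete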
```TypeScript
// Recommended: useMemo for an expensive data transformation
const filteredUsers = useMemo(() => {
  return users
    .filter(u => u.active)
    .sort((a, b) => b.score - a.score)
    .slice(0, 100);
}, [users]);
// Recommended: useCallback paired with React.memo
const handleSubmit = useCallback((values: FormData) => {
  api.submit(values).then(onSuccess);
}, [onSuccess]);
// Forbidden: useMemo for a trivial value
const fullName = useMemo(
  () => `${firstName} ${lastName}`,
  [firstName, lastName] // String concatenation is cheap; no caching needed
);
```
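## 2\.3 State Management
### 2\.3\.1 State Management Strategy
Choose a state solution according to the scope and complexity of the state, and avoid over-engineering:
|**State type**|**Solution**|**Use case**|
|---|---|---|
|Local UI state|useState|Transient state inside a component, such as form input or expand/collapse|
|Derived state|useMemo / computation|Values that can be computed from existing state|
|Shared state|Context / Props|State passed across 2\-3 component levels|
|Global state|Zustand / Jotai|Application-wide shared state, such as user info or theme settings|
|Server state|TanStack Query|Caching and synchronizing server data|
|Form state|React Hook Form|State and validation for complex forms|
### 2\.3\.2 Context Usage
React Context suits passing data across component levels, but misuse causes performance problems:
1. Split Contexts: put frequently and rarely changing state in separate Contexts to avoid unnecessary re-renders
2. Avoid overuse: use Context only when data truly crosses several levels; simple parent-child communication should still use Props
3. Combine with useReducer: for complex state logic, Context plus useReducer gives a lightweight Redux\-like solution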
```TypeScript
// Recommended: split Contexts to avoid re-renders
const ThemeContext = createContext<Theme>('light');
const UserContext = createContext<User | null>(null);
// When ThemeProvider updates, only components consuming ThemeContext re-render
// When UserProvider updates, only components consuming UserContext re-render
// Recommended: Context combined with useReducer
type Action = { type: 'increment' } | { type: 'decrement' };
const CounterContext = createContext<{
  state: number;
  dispatch: React.Dispatch<Action>;
} | null>(null);
```
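The snippet above declares the `Action` union and the Context shape but not the reducer itself. As a minimal sketch, a reducer for that counter might look like the following; `counterReducer` is a hypothetical name, and in a component it would be passed to `useReducer` and exposed through `CounterContext`.

```typescript
// Hypothetical reducer matching the Action union from the Context example.
type Action = { type: 'increment' } | { type: 'decrement' };

function counterReducer(state: number, action: Action): number {
  switch (action.type) {
    case 'increment':
      return state + 1;
    case 'decrement':
      return state - 1;
    default: {
      // Exhaustiveness check: adding a new Action variant without a matching
      // case turns this assignment into a compile-time error.
      const unreachable: never = action;
      return unreachable;
    }
  }
}
```

Keeping the reducer a pure function means it can be unit-tested without rendering any component.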
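### 2\.3\.3 External State Management \(Zustand/Jotai\)
For medium and large enterprise applications, a lightweight state library such as Zustand or Jotai is recommended:
1. Zustand: suits a modular store architecture; minimal API and no Provider wrapping
2. Jotai: suits fine-grained atomic state, with automatic dependency tracking
3. Avoid overusing Redux: consider Redux Toolkit only when you need Redux DevTools time-travel debugging or complex middleware chains
## 2\.4 TypeScript Typing
### 2\.4\.1 Component Props Types
All component Props must be defined explicitly with a TypeScript interface; using any to bypass type checking is forbidden: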
```TypeScript
// Recommended: an explicit Props interface
interface ButtonProps {
variant?: 'primary' | 'secondary' | 'ghost';
size?: 'sm' | 'md' | 'lg';
disabled?: boolean;
loading?: boolean;
onClick?: (event: React.MouseEvent<HTMLButtonElement>) => void;
children: React.ReactNode;
}
export function Button({
variant = 'primary',
size = 'md',
disabled = false,
loading = false,
onClick,
children,
}: ButtonProps) {
// Implementation...
}
// Forbidden: implicit any or missing types
// function Button(props) { // Error: props is implicitly any
// return <button>{props.label}</button>;
// }
```
### 2\.4\.2 Generic Components
For data display components, use generics to build type-safe reusable components:
```TypeScript
// Recommended: a generic table component
// Minimal column shape assumed by this example
interface ColumnDef<T> {
  key: string;
  render: (item: T) => React.ReactNode;
}
interface DataTableProps<T> {
  data: T[];
  columns: ColumnDef<T>[];
  keyExtractor: (item: T) => string;
  onRowClick?: (item: T) => void;
}
export function DataTable<T>({
  data, columns, keyExtractor, onRowClick,
}: DataTableProps<T>) {
  return (
    <table>
      <tbody>
        {data.map(item => (
          <tr key={keyExtractor(item)}
              onClick={() => onRowClick?.(item)}>
            {columns.map(col => (
              <td key={col.key}>{col.render(item)}</td>
            ))}
          </tr>
        ))}
      </tbody>
    </table>
  );
}
```
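### 2\.4\.3 Event Types
React event handlers should use the generic event types React provides rather than native DOM event types:
|**Event**|**React type**|
|---|---|
|Click|React\.MouseEvent\<HTMLButtonElement\>|
|Input change|React\.ChangeEvent\<HTMLInputElement\>|
|Form submit|React\.FormEvent\<HTMLFormElement\>|
|Keyboard|React\.KeyboardEvent\<HTMLInputElement\>|
|Drag|React\.DragEvent\<HTMLDivElement\>|
|Touch|React\.TouchEvent\<HTMLDivElement\>|
|Generic|React\.SyntheticEvent|
## 2\.5 Performance Optimization
### 2\.5\.1 Render Optimization
Approach render optimization systematically along these dimensions:
1. React\.memo: wrap purely presentational components in React\.memo so a shallow props comparison can skip unnecessary re-renders
2. useMemo / useCallback: cache expensive computations and callbacks passed to children
3. Virtual lists: render long lists with react\-window or react\-virtualized
4. Code splitting: use React\.lazy \+ Suspense for route-level and component-level lazy loading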
```TypeScript
import React, { Suspense } from 'react';
import { Routes, Route } from 'react-router-dom';
// Recommended: React.memo with a custom comparison function
export const UserList = React.memo(function UserList({
  users,
  onSelect,
}: UserListProps) {
  return (
    <ul>
      {users.map(user => (
        <UserItem key={user.id} user={user} onSelect={onSelect} />
      ))}
    </ul>
  );
  // Compare every prop the component uses; ignoring onSelect would leave a stale callback
}, (prev, next) => prev.users === next.users && prev.onSelect === next.onSelect);
// Recommended: code splitting with React.lazy
const Dashboard = React.lazy(() => import('./Dashboard'));
const Settings = React.lazy(() => import('./Settings'));
function App() {
  return (
    <Suspense fallback={<LoadingSpinner />}>
      <Routes>
        <Route path="/dashboard" element={<Dashboard />} />
        <Route path="/settings" element={<Settings />} />
      </Routes>
    </Suspense>
  );
}
```
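### 2\.5\.2 State Update Optimization
How state is updated directly affects render performance. Follow these practices:
1. Batching: React 18 batches state updates automatically, so manual merging is unnecessary
2. Immutable data: always update immutably so that useMemo/React\.memo can rely on reference comparison
3. State splitting: split independently changing state into separate useState calls to avoid unnecessary joint updates
4. Derived state: compute derived values with useMemo rather than storing computable values in state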
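To illustrate the immutable-update rule, the sketch below toggles one item in a list without mutating the original array. The `Todo` shape and `toggleTodo` helper are made up for the example; the point is that untouched items keep their object identity, which is what lets `React.memo` and `useMemo` skip work through reference comparison.

```typescript
// Hypothetical data shape, used only for illustration.
interface Todo {
  id: number;
  title: string;
  done: boolean;
}

// Returns a new array; only the matching item is replaced with a new object.
function toggleTodo(todos: readonly Todo[], id: number): Todo[] {
  return todos.map(todo =>
    todo.id === id ? { ...todo, done: !todo.done } : todo,
  );
}

const prevTodos: Todo[] = [
  { id: 1, title: 'read paper', done: false },
  { id: 2, title: 'write notes', done: false },
];
const nextTodos = toggleTodo(prevTodos, 2);
```

In a component, the result would be passed to the state setter, for example `setTodos(prev => toggleTodo(prev, id))`.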
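# 3\. HTML/CSS Standards
HTML and CSS are the foundation of front-end development; sound markup and styling practices are a prerequisite for maintainable applications.
## 3\.1 Semantic HTML
Semantic HTML benefits accessibility \(A11y\), SEO, and code readability:
1. Use appropriate semantic elements: header, nav, main, article, section, aside, footer
2. Form controls must be associated with a label; use aria\-label or aria\-labelledby to supplement the description
3. Images must have meaningful alt text; decorative images use alt=""
4. Follow heading order \(h1 → h2 → h3\) without skipping levels
## 3\.2 CSS Architecture
CSS Modules are recommended to avoid polluting the global namespace: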
```JavaScript
// Recommended: CSS Modules
import styles from './Button.module.css';
export function Button({ children }) {
return <button className={styles.button}>{children}</button>;
}
/* Button.module.css */
.button {
padding: 8px 16px;
border-radius: 4px;
background: var(--color-primary);
color: white;
}
.button:hover {
background: var(--color-primary-dark);
}
```
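## 3\.3 Responsive Design
Every interface must support at least three breakpoints, designed mobile-first:
|**Breakpoint**|**Width**|**Strategy**|
|---|---|---|
|Mobile|\< 768px|Single-column layout, touch-friendly interaction, simplified navigation|
|Tablet|768px \- 1024px|Two-column layout, collapsible sidebar, touch support|
|Desktop|\> 1024px|Full multi-column layout, hover interactions, fixed sidebar|
# 4\. General JavaScript/TypeScript Standards
Beyond the React-specific rules, the team should follow these general JavaScript/TypeScript coding standards.
## 4\.1 Naming
1. File names: PascalCase for component files \(UserCard\.tsx\), camelCase for utility files \(formatDate\.ts\)
2. Component names: PascalCase, matching the file name
3. Hook names: start with use, followed by PascalCase \(useUserData\)
4. Constants: UPPER\_SNAKE\_CASE \(MAX\_RETRY\_COUNT\)
5. Booleans: use prefixes such as is, has, or should \(isLoading, hasError\)
## 4\.2 Code Style
Use ESLint \+ Prettier for formatting and quality checks, and keep their configuration files under version control: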
```JavaScript
// .eslintrc.cjs
// Note: this is the legacy eslintrc format; ESLint 9 defaults to flat config (eslint.config.js)
module.exports = {
  extends: [
    'eslint:recommended',
    'plugin:@typescript-eslint/recommended',
    'plugin:react-hooks/recommended',
  ],
  rules: {
    '@typescript-eslint/no-explicit-any': 'error',
    '@typescript-eslint/explicit-function-return-type': 'warn',
    'react-hooks/exhaustive-deps': 'error',
  },
};
// .prettierrc
{
  "semi": true,
  "singleQuote": true,
  "tabWidth": 2,
  "trailingComma": "all"
}
```
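## 4\.3 Type Safety
1. strict mode: the TypeScript configuration must enable strict: true
2. No any: any is forbidden in principle; where necessary, use unknown together with type narrowing
3. Return types: public functions should declare their return types explicitly, except in edge cases that rely on type inference
4. Exporting types: export a component's Props interface alongside the component for reuse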
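The "unknown plus narrowing" rule can be sketched as follows. `ApiUser`, `isApiUser`, and `parseUser` are hypothetical names chosen for the example; the pattern is to accept untrusted data as `unknown` and let a type guard prove its shape before use.

```typescript
// Hypothetical payload shape, used only for illustration.
interface ApiUser {
  id: string;
  name: string;
}

// Type guard: narrows unknown to ApiUser after runtime checks.
function isApiUser(value: unknown): value is ApiUser {
  if (typeof value !== 'object' || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record['id'] === 'string' && typeof record['name'] === 'string';
}

// Parses untrusted JSON and returns null instead of an unchecked value.
function parseUser(json: string): ApiUser | null {
  try {
    const parsed: unknown = JSON.parse(json);
    return isApiUser(parsed) ? parsed : null;
  } catch {
    return null; // malformed JSON
  }
}
```

Compared with casting to `any`, the compiler here refuses property access until the guard has run.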
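# 5\. Engineering and Project Structure
A sound project structure and tooling setup are the cornerstone of team collaboration.
## 5\.1 Directory Structure
The following layout is recommended, favoring a feature\-based structure: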
```text
src/
├── app/                # Route pages (Next.js / React Router)
│   ├── layout.tsx
│   ├── page.tsx
│   └── dashboard/
│       └── page.tsx
├── components/         # Shared components
│   ├── ui/             # Basic UI components (Button, Input, Modal)
│   └── layout/         # Layout components (Header, Sidebar, Footer)
├── features/           # Feature modules
│   └── auth/           # Authentication feature
│       ├── api/        # API requests
│       ├── components/ # Feature components
│       ├── hooks/      # Feature Hooks
│       ├── stores/     # State management
│       └── types.ts    # Feature types
├── hooks/              # Globally shared Hooks
├── lib/                # Utilities and configuration
│   ├── api.ts          # Axios instance configuration
│   └── utils.ts        # General utility functions
├── types/              # Global type definitions
└── styles/             # Global styles and theme configuration
```
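## 5\.2 Development Workflow
1. Git branching: use Git Flow or trunk\-based development; name branches feature/description or fix/description
2. Code review: every change must go through a Pull Request and be approved by at least one reviewer before merging
3. Commit convention: follow Conventional Commits \(prefixes such as feat:, fix:, docs:, refactor:, test:\)
4. CI/CD: integrate automated tests and code quality checks \(ESLint, TypeScript compilation\) into the CI pipeline
# 6\. Performance Optimization and Best Practices
Performance optimization is an essential part of front-end development and should run through the whole development cycle.
## 6\.1 Load Performance
1. Compression: enable Gzip/Brotli compression and serve images as WebP/AVIF
2. Lazy loading: lazy-load routes, images, and below-the-fold components
3. Preloading: use rel=preload for critical resources and rel=prefetch for upcoming routes
4. Bundle analysis: regularly analyze bundle size with @next/bundle\-analyzer or webpack\-bundle\-analyzer
## 6\.2 Runtime Performance
1. Avoid frequent state updates: control high-frequency events with debounce and throttle
2. Web Workers: offload heavy computation to a Web Worker to avoid blocking the main thread
3. Memory management: clean up timers, event listeners, and subscriptions promptly to prevent leaks
4. Virtualization: use virtual scrolling for long lists, and pagination or virtual tables for large data sets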
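As one possible reading of the throttle advice, here is a minimal leading-edge throttle. The `throttle` helper and its injectable `now` clock are assumptions made for this sketch (the clock parameter exists only to make the behavior deterministic in tests); production code would more likely use an established utility library.

```typescript
// Leading-edge throttle: runs fn at most once per intervalMs.
// `now` is injectable so the timing can be controlled in tests.
function throttle<Args extends unknown[]>(
  fn: (...args: Args) => void,
  intervalMs: number,
  now: () => number = Date.now,
): (...args: Args) => void {
  let lastCall = Number.NEGATIVE_INFINITY;
  return (...args: Args) => {
    const current = now();
    if (current - lastCall >= intervalMs) {
      lastCall = current;
      fn(...args);
    }
    // Calls arriving inside the interval are dropped.
  };
}
```

A scroll handler could then be registered as `window.addEventListener('scroll', throttle(onScroll, 100))`, where `onScroll` stands for whatever handler the page defines.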

30094
dictionary.txt Executable file

File diff suppressed because it is too large

274
docs/api.md Normal file

@ -0,0 +1,274 @@
# AstroResearch REST API Documentation
The AstroResearch backend runs on the Rust Axum framework. The default base URL is `http://localhost:8000/api`.
---
## 1. Shared Type Definitions (TypeScript)
To keep front-end and back-end types consistent, the main TypeScript data interfaces are defined below:
```typescript
// Standard paper metadata
export interface StandardPaper {
bibcode: string;
title: string;
authors: string[];
year: string;
pub_journal: string;
keywords: string[];
abstract_text: string;
doi: string;
arxiv_id: string;
citation_count: number;
reference_count: number;
is_downloaded: boolean;
has_markdown: boolean;
has_translation: boolean;
}
// Note record
export interface NoteRecord {
id: number;
bibcode: string;
paragraph_index: number;
note_text: string;
highlight_color: string; // 'yellow' | 'green' | 'blue' | 'pink'
selected_text: string;
created_at: string;
}
```
---
## 2. API Endpoints by Module
### 2.1 Search & Citation Export
#### 2.1.1 Unified Cross-Source Search
- **Endpoint**: `GET /api/search`
- **Description**: Queries NASA ADS and the arXiv XML API at the same time and returns deduplicated, normalized paper metadata.
- **Query Parameters**:
  - `q` (string, required): Search keywords.
  - `source` (string, optional): Source to query, one of `all` | `ads` | `arxiv`; defaults to `all`.
  - `rows` (number, optional): Maximum number of results to return.
- **Response Schema (`Vec<StandardPaper>`)**:
  - HTTP `200 OK`
- **cURL example**:
```bash
curl -G "http://localhost:8000/api/search" \
--data-urlencode "q=Hertzsprung-Russell diagram" \
--data-urlencode "source=all"
```
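For front-end callers, the same request could be assembled in TypeScript. This is a sketch only: `SearchParams` and `buildSearchUrl` are hypothetical helpers, not part of the documented API, and the base URL is the default stated at the top of this document.

```typescript
type SearchSource = 'all' | 'ads' | 'arxiv';

// Mirrors the documented query parameters of GET /api/search.
interface SearchParams {
  q: string;
  source?: SearchSource;
  rows?: number;
}

// Builds the request URL; optional parameters are omitted when not set.
function buildSearchUrl(
  params: SearchParams,
  base = 'http://localhost:8000/api',
): string {
  const pairs: string[] = [`q=${encodeURIComponent(params.q)}`];
  if (params.source !== undefined) pairs.push(`source=${params.source}`);
  if (params.rows !== undefined) pairs.push(`rows=${params.rows}`);
  return `${base}/search?${pairs.join('&')}`;
}
```

The resulting URL can be passed to `fetch`, and the JSON response typed as `StandardPaper[]`.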
#### 2.1.2 Batch BibTeX Export
- **Endpoint**: `POST /api/export`
- **Description**: Submits the selected Bibcodes to the NASA ADS API in one batch and returns the concatenated standard BibTeX text.
- **Request Body**:
```json
{
"bibcodes": ["2024arXiv241011663H", "1984AJ.....89..374B"]
}
```
- **Response Schema**:
```json
{
"bibtex": "@ARTICLE{2024arXiv241011663H, ...}\n\n@ARTICLE{1984AJ.....89..374B, ...}"
}
```
- **cURL example**:
```bash
curl -X POST "http://localhost:8000/api/export" \
-H "Content-Type: application/json" \
-d '{"bibcodes": ["2024arXiv241011663H"]}'
```
---
### 2.2 Library & Local Storage
#### 2.2.1 List Library Papers
- **Endpoint**: `GET /api/library`
- **Description**: Lists every paper saved in the local SQLite database. The backend **checks whether the physical files exist at request time** and corrects boolean flags such as `is_downloaded` / `has_markdown` accordingly.
- **Response Schema (`Vec<StandardPaper>`)**:
  - HTTP `200 OK`
- **cURL example**:
```bash
curl "http://localhost:8000/api/library"
```
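#### 2.2.2 Trigger Parallel Paper Download
- **Endpoint**: `POST /api/download`
- **Description**: Starts a background thread that fetches the paper's PDF and HTML. For arXiv papers the official HTML is tried first, with ar5iv as the fallback. A forced refresh is supported via `force`.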
- **Request Body**:
```json
{
"bibcode": "2024arXiv241011663H",
"force": false
}
```
- **Response Schema (`StandardPaper`)**: Returns the updated paper structure with `is_downloaded: true`.
- **cURL example**:
```bash
curl -X POST "http://localhost:8000/api/download" \
-H "Content-Type: application/json" \
-d '{"bibcode": "2024arXiv241011663H", "force": true}'
```
#### 2.2.3 Trigger Structured Parsing
- **Endpoint**: `POST /api/parse`
- **Description**: Cleans the locally downloaded HTML/PDF into Markdown. Set `force` to re-run the parse.
- **Request Body**:
```json
{
"bibcode": "2024arXiv241011663H",
"force": false
}
```
- **Response Schema**:
```json
{
"markdown": "# Paper title\n\n## 1. Introduction\n..."
}
```
- **cURL example**:
```bash
curl -X POST "http://localhost:8000/api/parse" \
-H "Content-Type: application/json" \
-d '{"bibcode": "2024arXiv241011663H", "force": false}'
```
---
### 2.3 Reader & LLM Translation
#### 2.3.1 Get Paper Reading Details
- **Endpoint**: `GET /api/paper`
- **Description**: Returns a paper's standard metadata together with the cached English Markdown body and the translated Markdown.
- **Query Parameters**:
  - `bibcode` (string, required): Unique identifier of the paper.
- **Response Schema**:
```json
{
"paper": { ... },
"english_content": "# Abstract...", // null if not yet parsed
"translation_content": "# 摘要..." // null if not yet translated
}
```
- **cURL example**:
```bash
curl "http://localhost:8000/api/paper?bibcode=2024arXiv241011663H"
```
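#### 2.3.2 Trigger LLM Side-by-Side Translation
- **Endpoint**: `POST /api/translate`
- **Description**: Extracts the terms found in the local dictionary from the English Markdown, injects them into the prompt as a Glossary, calls the LLM for an academic translation, then writes the result to a local file and records it in the database.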
- **Request Body**:
```json
{
"bibcode": "2024arXiv241011663H",
"force": false
}
```
- **Response Schema**:
```json
{
"translation": "# 翻译结果..."
}
```
- **cURL example**:
```bash
curl -X POST "http://localhost:8000/api/translate" \
-H "Content-Type: application/json" \
-d '{"bibcode": "2024arXiv241011663H"}'
```
---
### 2.4 Citation Galaxy Map
#### 2.4.1 Query a Paper's Citation Topology
- **Endpoint**: `GET /api/citations`
- **Description**: Returns the Bibcode arrays of a paper's references and of the papers citing it, used to render the citation network.
- **Query Parameters**:
  - `bibcode` (string, required): Bibcode of the target paper.
- **Response Schema**:
```json
{
"bibcode": "2024arXiv241011663H",
"title": "...",
"citation_count": 12,
"reference_count": 48,
"references": ["bibcode1", "bibcode2"],
"citations": ["bibcode3", "bibcode4"]
}
```
- **cURL example**:
```bash
curl "http://localhost:8000/api/citations?bibcode=2024arXiv241011663H"
```
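To show how this response might feed a graph renderer, the sketch below turns it into a node list and directed edges. `GraphEdge`, `CitationGraph`, and `toGraph` are hypothetical names for this example and say nothing about how the project's canvas component actually consumes the data; edges point from the citing paper to the cited paper.

```typescript
// Same shape as the documented /api/citations response.
interface CitationNetwork {
  bibcode: string;
  title: string;
  citation_count: number;
  reference_count: number;
  references: string[];
  citations: string[];
}

interface GraphEdge {
  source: string; // citing paper
  target: string; // cited paper
}

interface CitationGraph {
  nodes: string[];
  edges: GraphEdge[];
}

function toGraph(network: CitationNetwork): CitationGraph {
  const nodes = new Set<string>([network.bibcode]);
  const edges: GraphEdge[] = [];
  for (const ref of network.references) {
    nodes.add(ref);
    edges.push({ source: network.bibcode, target: ref });
  }
  for (const citing of network.citations) {
    nodes.add(citing);
    edges.push({ source: citing, target: network.bibcode });
  }
  // The Set removes duplicates when a paper is both cited and citing.
  return { nodes: Array.from(nodes), edges };
}
```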
---
### 2.5 Notes & Highlights
#### 2.5.1 Create a Note and Highlight
- **Endpoint**: `POST /api/notes`
- **Description**: Creates a highlighted selection at a specific paragraph of a paper and records a text note.
- **Request Body**:
```json
{
"bibcode": "2024arXiv241011663H",
"paragraph_index": 12,
"note_text": "This is an important physical model",
"highlight_color": "yellow", // 'yellow' | 'green' | 'blue' | 'pink'
"selected_text": "the standard model of galaxy formation"
}
```
- **Response Schema (`NoteRecord`)**: Returns the created note details containing auto-incremented `id` and creation timestamp.
- **cURL example**:
```bash
curl -X POST "http://localhost:8000/api/notes" \
-H "Content-Type: application/json" \
-d '{"bibcode": "2024arXiv241011663H", "paragraph_index": 12, "note_text": "My Note", "highlight_color": "green", "selected_text": "original text"}'
```
#### 2.5.2 List All Notes for a Paper
- **Endpoint**: `GET /api/notes`
- **Description**: Returns every note attached to a paper.
- **Query Parameters**:
  - `bibcode` (string, required): Target paper.
- **Response Schema (`Vec<NoteRecord>`)**:
  - HTTP `200 OK`
- **cURL example**:
```bash
curl "http://localhost:8000/api/notes?bibcode=2024arXiv241011663H"
```
#### 2.5.3 Delete a Note
- **Endpoint**: `DELETE /api/notes`
- **Description**: Permanently deletes the note/highlight record with the given ID.
- **Query Parameters**:
  - `id` (number, required): Unique auto-increment id of the note record.
- **Response Schema**:
```json
{
"status": "success"
}
```
- **cURL example**:
```bash
curl -X DELETE "http://localhost:8000/api/notes?id=5"
```
---
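## 3. Common HTTP Status Codes and Error Handling
Errors are reported with standard HTTP status codes; the response body is usually a plain-text message (String):
| Status code | Error type | Common triggers |
| :--- | :--- | :--- |
| **`400 Bad Request`** | Invalid request | - Calling `translate` before the paper is downloaded/parsed.<br>- Calling `export` without `ADS_API_KEY` set in `.env`. |
| **`404 Not Found`** | Resource not found | - No saved record for the Bibcode in the database. |
| **`500 Internal Server Error`** | Server error | - Timeout or error response from the third-party LLM / ADS API.<br>- Local disk I/O failure (e.g. no permission to write a file).<br>- Database query error. |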
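Because error bodies are plain text rather than JSON, a client may want to classify failures by status code before showing them. The helpers below are a hypothetical sketch of that idea (`classifyStatus` and `formatApiError` are not part of the documented API), covering only the three codes listed in the table.

```typescript
type ApiErrorKind = 'bad_request' | 'not_found' | 'server_error' | 'unknown';

// Maps the documented status codes to a coarse error kind.
function classifyStatus(status: number): ApiErrorKind {
  if (status === 400) return 'bad_request';
  if (status === 404) return 'not_found';
  if (status >= 500 && status <= 599) return 'server_error';
  return 'unknown';
}

// Combines the status and the plain-text body into one display string.
function formatApiError(status: number, body: string): string {
  const detail = body.trim() || '(empty response body)';
  return `[${classifyStatus(status)}] HTTP ${status}: ${detail}`;
}
```

With `fetch`, the body would come from `await response.text()` whenever `response.ok` is false.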

228
docs/architecture.md Normal file

@ -0,0 +1,228 @@
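# AstroResearch Architecture
AstroResearch is a research assistant for astronomy that combines literature search, dual-channel download, structured parsing, side-by-side Chinese-English academic translation, and a citation galaxy map.
## 1. Overall Architecture
AstroResearch uses a **client-server (C/S)** architecture consisting of a React single-page front end and an Axum HTTP back end. The core flow and layers are shown below: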
```mermaid
graph TD
subgraph Frontend ["React 前端 (Port 5173 / 8000)"]
UI[仪表盘 UI / ReaderPanel]
Canvas[引文 Canvas 拓扑图]
API_Client[Axum API 客户端]
end
subgraph Backend ["Rust Axum 后端 (Port 8000)"]
Router[Axum 路由与中间件]
Handlers[业务处理器 handlers.rs]
Parser[解析器 parser.rs]
Downloader[下载器 download.rs]
Translator[翻译器 translation.rs]
Qiniu[七牛云客户端 qiniu.rs]
DB[("SQLite / astro_research.db")]
end
subgraph External [外部第三方服务]
ADS[NASA ADS API]
arXiv[arXiv Atom XML API]
MinerU[MinerU PDF 解析服务]
QiniuCDN[七牛云对象存储 CDN]
LLM[LLM API]
end
UI -->|用户操作| API_Client
API_Client -->|RESTful APIs| Router
Router --> Handlers
Handlers -->|查询/保存元数据| DB
Handlers -->|文献下载| Downloader
Handlers -->|结构化清洗| Parser
Handlers -->|LLM学术翻译| Translator
Downloader -->|代理请求| ADS
Downloader -->|直连或 ar5iv| arXiv
Parser -->|图文降级解析| MinerU
Parser -->|托管插图| Qiniu
Qiniu -->|上传图片| QiniuCDN
Translator -->|天文术语翻译| LLM
Canvas -->|引文网络请求| Handlers
```
---
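## 2. Core Workflows
### 2.1 Download Flow
This flow implements dual-channel streaming download of papers, with multi-level fallback and handling of anti-scraping defenses. The steps and interactions are as follows: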
```mermaid
sequenceDiagram
participant U as 用户 (React 前端)
participant H as 处理器 (handlers.rs)
participant D as 下载器 (download.rs)
participant DB as 本地数据库 (SQLite)
U->>H: 1. 发起下载请求 (POST /api/download, 含 bibcode, force)
H->>DB: 2. 查询文献元数据 (获取 arxiv_id, doi 等)
alt force == true
H->>DB: 3. 重置本地下载路径字段为 NULL
end
H->>D: 4. 调度下载器执行物理拉取
alt 文献含有 arxiv_id (通道 AarXiv 直连优先)
D->>D: 5a. 去除版本号 (strip_arxiv_version, v2 -> 无版本)
D->>D: 5b. 随机延时 (maybe_delay: 500-2000ms) 并伪装 UA
D->>D: 5c. 下载 PDF 并校验文件头 (%PDF + %%EOF)
D->>D: 5d. 优先请求官方 HTML (arxiv.org/html/)
note over D: 若官方 HTML 返回 404/错误
D->>D: 5e. 自动降级回退请求 ar5iv HTML (ar5iv.labs.arxiv.org)
D->>D: 5f. 校验 HTML 内容 (detect_anti_bot 检测反爬)
else 无 arxiv_id (通道 BADS 路由回退)
D->>D: 6a. 跟踪 ADS Link Gateway 重定向路由
note over D: 若遇到 validate.perfdrive.com 拦截
D->>D: 6b. 自动解析并解码 ssc 参数提取直链
note over D: 若指向 IOPscience / Springer
D->>D: 6c. IOP 专属策略:预热主页写入 Cookie带 Referer 下载 PDF
D->>D: 6d. Springer 专属策略:使用 Chrome 头下载 HTML 页
note over D: 若网关均失败且存在 DOI
D->>D: 6e. CrossRef 兜底:请求 CrossRef API 获取 PDF URL 并直连下载
end
D-->>H: 7. 返回下载好的本地物理 PDF & HTML 路径
H->>DB: 8. 更新 pdf_path & html_path 记录
H-->>U: 9. 返回最新文献状态 (is_downloaded: true)
```
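#### Download details
1. **Request handling**: when `force` is `true`, the backend `download_paper` endpoint clears the stored file paths in the database and re-fetches the physical files without using the cache.
2. **Anti-scraping disguise**: the `Downloader` sends requests with a rotating, dynamically generated Firefox/Chrome User-Agent and sleeps for a random 500-2000 ms before every HTTP request to mimic human browsing.
3. **Content integrity checks**:
   - PDFs are checked strictly: the first four bytes must be `%PDF` and the tail must contain the `%%EOF` terminator. This catches login walls and error pages masquerading as PDFs.
   - HTML text goes through the `detect_anti_bot` pipeline, which looks for blocking signatures such as "cloudflare", "captcha", and "robot check".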
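The checks above live in the Rust `download.rs` module, which is not shown here. Purely as an illustration of the described logic, the same ideas can be sketched in TypeScript; the function names, the 1024-byte tail window, and the marker list are assumptions of this sketch, not the project's implementation.

```typescript
// Removes a trailing arXiv version suffix such as "v2".
function stripArxivVersion(arxivId: string): string {
  return arxivId.replace(/v\d+$/, '');
}

// Decodes a byte range as ASCII (enough for the PDF markers checked below).
function asciiSlice(bytes: Uint8Array, start: number, end: number): string {
  let out = '';
  for (let i = Math.max(0, start); i < Math.min(bytes.length, end); i++) {
    out += String.fromCharCode(bytes[i] ?? 0);
  }
  return out;
}

// A file counts as a PDF if it starts with %PDF and has %%EOF near the end.
// The tail window size is an assumption made for this sketch.
function looksLikePdf(bytes: Uint8Array, tailWindow = 1024): boolean {
  const hasHeader = asciiSlice(bytes, 0, 4) === '%PDF';
  const tail = asciiSlice(bytes, bytes.length - tailWindow, bytes.length);
  return hasHeader && tail.includes('%%EOF');
}

// Signatures quoted from the text above; matching is case-insensitive.
const ANTI_BOT_MARKERS = ['cloudflare', 'captcha', 'robot check'];

function detectAntiBot(html: string): boolean {
  const lower = html.toLowerCase();
  return ANTI_BOT_MARKERS.some(marker => lower.includes(marker));
}
```

A plain substring test like this can misfire on legitimate pages that merely mention one of the markers, so a real pipeline would likely need more context than this sketch uses.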
---
### 2.2 Parse Flow
This flow converts the locally downloaded HTML or PDF into high-fidelity Markdown. The steps and interactions are as follows:
```mermaid
sequenceDiagram
participant U as 用户 (React 前端)
participant H as 处理器 (handlers.rs)
participant P as 解析器 (parser.rs)
participant M as MinerU (PDF解析服务)
participant Q as 七牛云 (对象存储)
participant DB as 本地数据库 (SQLite)
U->>H: 1. 发起解析请求 (POST /api/parse, 含 bibcode, force)
H->>DB: 2. 查询文献物理路径 (pdf_path, html_path, markdown_path)
alt force == false 且本地已存在 Markdown 物理缓存
H->>H: 3. 读取本地 Markdown 物理文件
H-->>U: 4. 直接返回缓存 Markdown流程结束
end
H->>P: 5. 触发结构化文献解析
alt 本地存在 HTML 文件
P->>P: 6a. 剥离广告/导航栏与尾页页脚噪声
P->>P: 6b. 公式保护:利用占位符隔离 MathJax/LaTeX 公式段
P->>P: 6c. 标签规范:还原 LaTeXML 特定 span 为标准 table/tr/td修正上下标
P->>P: 6d. 插图处理:把相对图像路径替换为绝对 CDN 外链地址
P->>P: 6e. 转换 GFM Markdown 并恢复 LaTeX 公式
P->>P: 6f. 后处理:清除冗余的 margin 空白与前导缩进
else 仅有 PDF 文件 (PDF 降级解析)
P->>M: 7a. Multipart 格式上传 PDF 至 MinerU 服务
M-->>P: 7b. 返回大模型解析出的 Markdown 文本及插图包
loop 遍历每一个提取的插图
P->>Q: 7c. 上传插图文件并获取七牛云 CDN 域名外链
end
P->>P: 7d. 在 Markdown 中重写插图链接为七牛云 CDN 绝对路径
end
P-->>H: 8. 返回清洗转换出的标准英文 Markdown 文本
H->>P: 9. 写入本地物理缓存 Markdown/ 目录
H->>DB: 10. 更新数据库 markdown_path 记录
H-->>U: 11. 返回标准 Markdown 内容渲染展示
```
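#### Parsing details
1. **Protecting formulas during HTML-to-Markdown conversion**: MathJax/LaTeX is easily escaped as ordinary text during Markdown conversion (for example, `_` turns into italics or `\` line breaks stop working). Before parsing the HTML, the parser uses regular expressions to replace everything inside `$` / `$$` and `\(` / `\[` with UUID placeholders, converts to standard Markdown, and then substitutes the formulas back so that LaTeX renders losslessly.
2. **PDF fallback for complex layouts**: for older papers with no extractable HTML, MinerU performs layout analysis and formula extraction, and Qiniu object storage handles automatic figure extraction, image hosting, and rewriting of the links in the body.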
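The placeholder technique from item 1 can be sketched as a protect/restore pair. This is an illustration rather than the `parser.rs` code: it uses indexed `@@MATHn@@` tokens instead of UUIDs, and the regular expression is a simplification that assumes formulas are well delimited.

```typescript
interface ProtectedText {
  text: string;
  formulas: string[];
}

// Display math ($$...$$, \[...\]) is tried before inline math so that
// "$$" is never split into two inline delimiters.
const MATH_PATTERN = /\$\$[\s\S]+?\$\$|\\\[[\s\S]+?\\\]|\\\([\s\S]+?\\\)|\$[^$\n]+?\$/g;

// Replaces every formula with an opaque token and remembers the original.
function protectMath(source: string): ProtectedText {
  const formulas: string[] = [];
  const text = source.replace(MATH_PATTERN, match => {
    formulas.push(match);
    return `@@MATH${formulas.length - 1}@@`;
  });
  return { text, formulas };
}

// Puts the original formulas back after the Markdown conversion step.
function restoreMath(converted: string, formulas: string[]): string {
  return converted.replace(
    /@@MATH(\d+)@@/g,
    (whole, index: string) => formulas[Number(index)] ?? whole,
  );
}
```

The conversion step in between only ever sees the tokens, so characters like `_` and `\` inside formulas cannot be escaped or reinterpreted.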
---
### 2.3 Side-by-Side Translation Flow
This flow implements professional LLM translation guided by an astronomy-specific glossary. The steps and interactions are as follows:
```mermaid
sequenceDiagram
participant U as 用户 (React 前端)
participant H as 处理器 (handlers.rs)
participant T as 翻译器 (translation.rs)
participant D as 天文词典 (dictionary.rs)
participant L as 大模型 (LLM API)
participant DB as 本地数据库 (SQLite)
U->>H: 1. 请求文献对比翻译 (POST /api/translate, 含 bibcode, force)
H->>DB: 2. 查询文献路径及状态
alt force == false 且本地已存在翻译缓存文件
H->>H: 3. 读取本地 Translation/{bibcode}_zh.md 物理文件
H-->>U: 4. 直接返回缓存译文,流程结束
end
H->>H: 5. 读取对应的英文解析 Markdown 物理文件
H->>T: 6. 调度翻译器执行翻译工作流
T->>D: 7. 加载本地 dictionary.txt 并初始化 Trie 树结构
T->>D: 8. 执行英文 Markdown 文本分词匹配
D->>D: 9a. 进行前缀匹配检索
D->>D: 9b. 遵循“最长匹配优先”原则,过滤子词去重
D-->>T: 10. 返回该篇文献提取出的天文学名词对照 (Glossary)
loop 针对英文 Markdown 进行段落分块 (Token 长度控制)
T->>L: 11. 携带 Glossary + 英文原文段落发送 Prompt 请求
note over L: LLM 遵循系统 Prompt 约束:<br>1. 专业词汇严格对应 Glossary 译出<br>2. 严禁改变 LaTeX 公式及 Markdown 标签<br>3. 保持中英段落高度对齐
L-->>T: 12. 返回学术级双语对照翻译段落
end
T->>T: 13. 拼接所有段落,生成完整的对照 Markdown
T->>H: 14. 写入本地物理缓存 Translation/ 目录
H->>DB: 15. 更新数据库中的 translation_path 字段
H-->>U: 16. 返回翻译后 Markdown 渲染展示
```
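#### Step details
1. **Tiered translation cache**:
   - First level: if `force` is not set and the translation file already exists on disk, it is read and returned directly, avoiding unnecessary LLM API calls.
   - Second level (precondition): the English Markdown must already have been parsed; otherwise the endpoint returns `400`, prompting the user to parse the body first.
2. **Trie-based extraction of astronomy terms**:
   - The `Dictionary` type loads the local astronomy word list `dictionary.txt` (about 30,000 lines).
   - To stop short terms from shadowing long ones (e.g. `Hertzsprung` shadowing `Hertzsprung-Russell diagram`), matching uses longest-prefix matching on a Trie. When a long term matches, the sub-terms it contains are ignored.
   - Only terms that actually occur in the paper are kept, deduplicated, and serialized as JSON into a Glossary that is injected into the LLM prompt.
3. **Strongly constrained LLM prompt**:
   - The System Prompt assigns the model the role of a professional astronomy translator.
   - Format requirements are mandatory: all LaTeX formulas (`$` / `$$`) must be preserved verbatim, and Markdown syntax such as headings (`#`), lists (`-`), and bold (`**`) must not be broken, so the front end can parse the bilingual structure and render it aligned side by side.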
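The longest-match rule in step 2 can be sketched with a small character Trie. This is a TypeScript illustration of the described behavior, not the Rust `Dictionary` implementation; the function names, the word-boundary rule, and the sample entries in the usage note are assumptions of the sketch.

```typescript
interface TrieNode {
  children: Map<string, TrieNode>;
  translation?: string;
}

function createNode(): TrieNode {
  return { children: new Map<string, TrieNode>() };
}

function insertTerm(root: TrieNode, term: string, translation: string): void {
  let node = root;
  const key = term.toLowerCase();
  for (let i = 0; i < key.length; i++) {
    const ch = key.charAt(i);
    let next = node.children.get(ch);
    if (next === undefined) {
      next = createNode();
      node.children.set(ch, next);
    }
    node = next;
  }
  node.translation = translation;
}

function isWordChar(ch: string): boolean {
  return /^[a-z0-9]$/.test(ch);
}

// Scans the text and keeps the longest dictionary term at each position.
function extractGlossary(root: TrieNode, text: string): Record<string, string> {
  const lower = text.toLowerCase();
  const glossary: Record<string, string> = {};
  let i = 0;
  while (i < lower.length) {
    // Only start a match at a word boundary.
    if (i > 0 && isWordChar(lower.charAt(i - 1))) {
      i += 1;
      continue;
    }
    let node: TrieNode | undefined = root;
    let bestEnd = -1;
    let bestTranslation = '';
    for (let j = i; j < lower.length && node !== undefined; j++) {
      node = node.children.get(lower.charAt(j));
      const endsAtBoundary =
        j + 1 === lower.length || !isWordChar(lower.charAt(j + 1));
      if (node !== undefined && node.translation !== undefined && endsAtBoundary) {
        bestEnd = j + 1; // keep extending: a longer term may still match
        bestTranslation = node.translation;
      }
    }
    if (bestEnd > i) {
      glossary[lower.slice(i, bestEnd)] = bestTranslation;
      i = bestEnd; // skip the matched span so contained sub-terms are ignored
    } else {
      i += 1;
    }
  }
  return glossary;
}
```

For example, with both `Hertzsprung` and `Hertzsprung-Russell diagram` inserted, a sentence containing the full phrase yields only the longer entry, and `JSON.stringify(glossary)` gives the JSON form that would be placed in the prompt.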
---
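## 3. Core Modules
- **[src/download.rs](../src/download.rs)**:
  - Browser header spoofing and request delay control.
  - Follows ADS Link Gateway redirects and decodes `validate.perfdrive.com` protection pages.
  - Prefers the official `arxiv.org/html` with `ar5iv` as the fallback, stripping version suffixes automatically.
- **[src/parser.rs](../src/parser.rs)**:
  - Converts the HTML tree to GFM Markdown, using placeholders to keep MathJax/LaTeX formulas from being misparsed.
  - Normalizes relative figure links and integrates MinerU PDF parsing.
- **[src/translation.rs](../src/translation.rs)**:
  - Matches the source text against the local bilingual astronomy dictionary (`dictionary.txt`, about 30,000 lines) and injects the matches into the system prompt so the LLM can produce a precise academic translation.
- **[dashboard/src/components/CitationGalaxyCanvas.tsx](../dashboard/src/components/CitationGalaxyCanvas.tsx)**:
  - A lightweight, high-performance force-directed "galaxy" physics engine built on native HTML5 Canvas, used to visualize the citation network topology.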

60
docs/contributing.md Normal file
View File

@ -0,0 +1,60 @@
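# AstroResearch Contributing Guide
Community contributions to the development and improvement of AstroResearch are welcome. This guide covers local development and debugging, coding conventions, and testing.
---
## 1. Developer Setup
### Backend (Rust)
1. Install a Rust toolchain (the crate uses Edition 2021).
2. Install the SQLx CLI (optional, used to generate migration files):
   ```bash
   cargo install sqlx-cli --no-default-features --features sqlite
   ```
3. Start the Rust service in development mode:
   ```bash
   cargo run
   ```
### Frontend (React + TypeScript)
1. Enter the `dashboard` directory and install dependencies:
   ```bash
   cd dashboard
   npm install
   ```
2. Start the development server (with HMR and API request proxying):
   ```bash
   npm run dev
   ```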
---
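## 2. Coding Style Guidelines
### Rust (Backend)
- Follow the official Rust style. Run `cargo fmt` and `cargo clippy` before every commit.
- Comments and system logs should be written in Chinese so that developers can trace and read them easily.
- Errors raised in API handlers should be structured with `anyhow` or `thiserror`.
### React & TypeScript (Frontend)
- Use `React 18/19` function components throughout and manage state with React Hooks.
- To keep production builds passing, leave the type-safety restrictions enabled (for example, use `import type { ... }` explicitly when importing type-only symbols).
- For styling, use Tailwind CSS with the shared glassmorphism and responsive layout conventions. All spacing and colors must go through CSS variables so that theme switching keeps working.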
---
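## 3. Testing
### Running backend unit tests
The download, parsing, dictionary matching, and API extraction modules each come with unit tests. Run them with:
```bash
cargo test
```
### Running frontend checks
```bash
cd dashboard
npm run build # runs the TypeScript type check and the Vite production build
```
Submit a PR only after the build completes without errors or warnings.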

77
docs/database.md Normal file
View File

@ -0,0 +1,77 @@
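# AstroResearch Database Schema
AstroResearch uses a lightweight, zero-configuration **SQLite** database for persistent storage. By default the database file is `astro_research.db` in the project root. It is managed through the Rust `sqlx` driver, which also runs the migrations automatically.
---
## 1. Entity-Relationship Diagram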
```mermaid
erDiagram
PAPERS {
text bibcode PK
text title
text authors "JSON Array"
text year
text pub "Journal/Publisher"
text keywords "JSON Array"
text abstract
text doi
text arxiv_id
integer citation_count
integer reference_count
text pdf_path
text html_path
text markdown_path
text translation_path
datetime created_at
}
NOTES {
integer id PK
text bibcode FK
integer paragraph_index
text note_text
text highlight_color
text selected_text
datetime created_at
}
CITATIONS_REFERENCES {
text source_bibcode PK
text target_bibcode PK
}
PAPERS ||--o{ NOTES : "has"
PAPERS ||--o{ CITATIONS_REFERENCES : "cites / cited_by"
```
---
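## 2. Table Schema Details
### 2.1 `papers` table (paper metadata)
Stores each paper's core metadata and the local storage paths of its files.
- **Indexes**:
  - `idx_papers_doi` -> on `doi`
  - `idx_papers_arxiv_id` -> on `arxiv_id`
### 2.2 `citations_references` table (citation and reference topology)
A many-to-many link table that stores the citation network between papers (the underlying data for the citation galaxy graph).
- **Composite primary key**: `(source_bibcode, target_bibcode)`
- **Indexes**:
  - `idx_citations_ref_source` -> speeds up looking up references by `source_bibcode`
  - `idx_citations_ref_target` -> speeds up looking up citing papers by `target_bibcode`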
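The two indexes correspond to the two directions in which the link table is read. A minimal in-memory sketch of those lookups follows; `CitationGraph` and its methods are hypothetical and stand in for the SQL queries, which are not shown in this document.

```rust
use std::collections::HashMap;

/// In-memory mirror of `citations_references` rows: (source_bibcode, target_bibcode).
#[derive(Default)]
pub struct CitationGraph {
    references: HashMap<String, Vec<String>>, // source -> papers it cites
    cited_by: HashMap<String, Vec<String>>,   // target -> papers citing it
}

impl CitationGraph {
    pub fn from_rows(rows: &[(&str, &str)]) -> Self {
        let mut g = CitationGraph::default();
        for (source, target) in rows {
            g.references
                .entry(source.to_string())
                .or_default()
                .push(target.to_string());
            g.cited_by
                .entry(target.to_string())
                .or_default()
                .push(source.to_string());
        }
        g
    }

    /// Mirrors a lookup by `source_bibcode` (served by `idx_citations_ref_source`).
    pub fn references_of(&self, bibcode: &str) -> Vec<String> {
        self.references.get(bibcode).cloned().unwrap_or_default()
    }

    /// Mirrors a lookup by `target_bibcode` (served by `idx_citations_ref_target`).
    pub fn citations_of(&self, bibcode: &str) -> Vec<String> {
        self.cited_by.get(bibcode).cloned().unwrap_or_default()
    }
}
```

Each row is a directed edge, so one table is enough to answer both "what does this paper cite" and "who cites this paper".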
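### 2.3 `notes` table (highlights and reading notes)
Stores the highlights and notes that readers attach to specific paragraphs in the reader.
- **Foreign key**: `bibcode` references `papers(bibcode)` with cascading delete (`ON DELETE CASCADE`).
- **Indexes**:
  - `idx_notes_bibcode` -> speeds up listing the notes of a single paper.
---
## 3. Database Migrations
Migration scripts live under `migrations/`. At startup the service (`src/main.rs`) calls `sqlx::migrate!().run(&pool).await` to apply them automatically:
1. `20260608000000_init.sql`: creates the `papers` and `citations_references` tables.
2. `20260608000001_notes.sql`: adds the `notes` table for notes and highlights, with cascading delete on its foreign key.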

46
docs/deployment.md Normal file
View File

@ -0,0 +1,46 @@
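# AstroResearch Deployment Guide
The AstroResearch backend compiles to a single Rust executable that also serves the static build output of the React frontend, which keeps production deployment simple.
---
## 1. Requirements
- **Operating system**: Linux / macOS / Windows
- **Toolchain**:
  - Node.js (v18+) to build the React frontend assets
  - Rust (1.75+) to compile the Axum backend
  - SQLite (bundled, no separate installation needed)
---
## 2. Production Build Steps
### Step 1: Build the React frontend assets
Enter the `dashboard` folder, install dependencies, and run the build. The output is written to `dashboard/dist`:
```bash
cd dashboard
npm install
npm run build
```
### Step 2: Compile the Rust backend binary
Return to the project root and build the release executable with Cargo. The compiled program bundles all static assets under `dashboard/dist`:
```bash
cd ..
cargo build --release
```
The build output is located at `target/release/astroresearch`.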
---
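## 3. Running in Production
1. Copy the compiled `target/release/astroresearch` binary to the target server.
2. In the same directory as the binary, create and fill in the `.env` configuration file (the `.env.example` in the project root can be used as a template).
3. Make sure the bilingual astronomy dictionary `dictionary.txt` is present at the expected relative path.
4. Start the backend service:
   ```bash
   ./astroresearch
   ```
5. Once started, the process listens on port 8000 by default (`http://localhost:8000`, configurable through `PORT` in `.env`). You can use Nginx as a reverse proxy to expose it on public port 80/443.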

44
docs/design.md Normal file
View File

@ -0,0 +1,44 @@
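# AstroResearch Design System and Interaction
The AstroResearch frontend aims for a futuristic look combined with an immersive reading experience for academic work.
---
## 1. Visual Palette
### 1.1 Dual-theme palette
AstroResearch supports both dark and light modes and uses soft HSL colors instead of harsh saturated ones:
| Mode | Background | Primary text | Card container | Glassmorphism |
| :--- | :--- | :--- | :--- | :--- |
| **Dark** | Midnight aurora black (`#090d16`) | Pure snow white (`#f8fafc`) | Frosted dark gray (`bg-slate-900/60`) | Border: `border-slate-800/80`, blur: `backdrop-blur-md` |
| **Light** | Stone gray (`#f8fafc`) | Deep slate (`#0f172a`) | Frosted bright white (`bg-white/60`) | Border: `border-slate-200/80`, blur: `backdrop-blur-md` |
---
## 2. Key Interactive Components
### 2.1 Citation Galaxy Map
- **Underlying technology**: a custom force-directed algorithm built on HTML5 `<canvas>`, with no dependency on large third-party libraries such as D3 or G6.
- **Physics**: supports node repulsion, central gravity, and drag damping. Double-clicking a node expands the graph outward by further levels (capped at 50 nodes to keep the layout readable).
- **Color effects**: the central node has a bright halo, references and citing papers are drawn with gradient edges, and hovering produces a smooth highlight animation.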
```mermaid
graph LR
C((Central paper)) -->|Cites| R1((Reference A))
C -->|Cites| R2((Reference B))
B1((Citing paper X)) -->|Cites| C
B2((Citing paper Y)) -->|Cites| C
classDef center fill:#ec4899,stroke:#db2777,stroke-width:2px;
classDef ref fill:#3b82f6,stroke:#2563eb,stroke-width:1px;
classDef cite fill:#10b981,stroke:#059669,stroke-width:1px;
class C center;
class R1,R2 ref;
class B1,B2 cite;
```
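### 2.2 Split Reader
- **Structured layout**: English and Chinese paragraphs are aligned in two columns, with `rehype-katex` formula rendering and `html2md` image embedding integrated.
- **Selection annotations and highlights**: selecting any text in the reader immediately shows a bubble menu (with 4 highlight colors).
- **Floating term tooltips**: astronomy terms detected in the English text are underlined automatically, and hovering over them shows the Chinese equivalent.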

44
docs/troubleshooting.md Normal file
View File

@ -0,0 +1,44 @@
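# AstroResearch Troubleshooting Guide
This guide lists problems you may run into when using AstroResearch, together with the steps to resolve them.
---
## 1. Download Issues
### 1.1 A download task reports "检测到 Cloudflare / 人机验证页面" (Cloudflare or CAPTCHA page detected)
- **Cause**: some publishers apply aggressive IP blocking and Cloudflare checks to frequent automated download requests.
- **Resolution**:
  1. The downloader already inserts a random delay between requests via `maybe_delay()` (500 ms to 2000 ms) so that its behavior looks less mechanical.
  2. If blocking is frequent, try configuring a local proxy, or check that the `LIBRARY_DIR` path in `.env` is correct.
  3. For ADS Link Gateway routes that redirect to `validate.perfdrive.com`, the downloader decodes the `ssc` parameter to extract the direct link. This happens automatically. If the extraction stops working because the encoding scheme has changed, the console prints a `warn` log.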
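The `ssc` extraction in item 3 can be sketched with the standard library alone. The downloader itself relies on the `url` and `urlencoding` crates; `percent_decode` and `extract_perfdrive_target` below are hypothetical stand-ins that decode the value once, and every parameter name in the example URL other than `ssc` is invented.

```rust
/// Converts one ASCII hex digit to its value.
fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Decodes `%XX` escapes; malformed escapes are copied through unchanged.
pub fn percent_decode(value: &str) -> String {
    let bytes = value.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}

/// Returns the decoded `ssc` target if `final_url` is a perfdrive validation page.
pub fn extract_perfdrive_target(final_url: &str) -> Option<String> {
    if !final_url.contains("validate.perfdrive.com") {
        return None;
    }
    let query = final_url.split_once('?')?.1;
    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(key, _)| *key == "ssc")
        .map(|(_, value)| percent_decode(value))
}
```

If the function returns `None` for a perfdrive URL, the page no longer carries a usable `ssc` parameter, which is the situation the `warn` log is meant to flag.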
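### 1.2 Official HTML (arxiv.org/html) download returns 404
- **Cause**: arXiv serves official HTML full text by default only for papers submitted after **December 2023**. For older papers, requesting the official HTML returns 404.
- **Built-in handling**: when the official HTML request fails, `download_arxiv_html_with_fallback` **falls back automatically** to the `ar5iv.labs.arxiv.org` service.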
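The fallback order can be sketched as an ordered list of candidate URLs that are tried in turn. This is an illustration only: `strip_version`, `html_candidates`, and `first_success` are hypothetical helpers, and the fetch step is passed in as a closure rather than performing a real HTTP request.

```rust
/// Drops a trailing version suffix such as `v2` from an arXiv ID.
pub fn strip_version(arxiv_id: &str) -> &str {
    if let Some(pos) = arxiv_id.rfind('v') {
        let suffix = &arxiv_id[pos + 1..];
        if pos > 0 && !suffix.is_empty() && suffix.bytes().all(|b| b.is_ascii_digit()) {
            return &arxiv_id[..pos];
        }
    }
    arxiv_id
}

/// Candidate HTML URLs in priority order: official rendering first, ar5iv second.
pub fn html_candidates(arxiv_id: &str) -> Vec<String> {
    let id = strip_version(arxiv_id);
    vec![
        format!("https://arxiv.org/html/{id}"),
        format!("https://ar5iv.labs.arxiv.org/html/{id}"),
    ]
}

/// Tries each candidate in order; `fetch` returns `Ok(body)` or `Err(reason)`.
pub fn first_success<F>(candidates: &[String], mut fetch: F) -> Option<(String, String)>
where
    F: FnMut(&str) -> Result<String, String>,
{
    for url in candidates {
        if let Ok(body) = fetch(url.as_str()) {
            return Some((url.clone(), body));
        }
    }
    None
}
```

With this shape, a 404 from the official endpoint is just an `Err` from the first candidate, and the loop moves on to ar5iv without any special casing.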
---
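## 2. Parse & Translation Issues
### 2.1 A translation request returns nothing or fails with "LLM API KEY Missing"
- **Cause**: there is no correctly configured `.env` file in the project root, or the LLM endpoint or model name is wrong.
- **Troubleshooting steps**:
  1. Confirm that `.env` exists in the project root and contains both `LLM_API_KEY` and `LLM_API_BASE`.
  2. Run `cargo run` in a terminal and check the startup log for warnings about loading the environment configuration.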
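### 2.2 PDF parsing loses figures or corrupts formulas
- **Cause**: the PDF format carries no structured semantics, so extracting text directly loses formulas and figures.
- **Built-in handling**:
  1. If the paper is available as HTML/ar5iv, the system parses the HTML first, which preserves the LaTeX formulas.
  2. If only a PDF is available, the system falls back to the local or remote **MinerU** PDF layout model. Make sure the MinerU service is running with the expected API format and that `MINERU_API_URL` is set correctly in `.env`.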
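The priority described above amounts to a small decision function. The sketch below is hypothetical (`ParseSource` and `choose_parse_source` do not appear in the source) and assumes that an empty `MINERU_API_URL` means the MinerU fallback is not configured.

```rust
/// Which input the parser should use for a paper (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum ParseSource<'a> {
    /// Parse the stored HTML, which keeps LaTeX formulas intact.
    Html(&'a str),
    /// Send the stored PDF to the MinerU service.
    PdfViaMineru(&'a str),
    /// Nothing usable: no HTML, and either no PDF or no MinerU endpoint.
    Unavailable,
}

pub fn choose_parse_source<'a>(
    html_path: Option<&'a str>,
    pdf_path: Option<&'a str>,
    mineru_api_url: &str,
) -> ParseSource<'a> {
    match (html_path, pdf_path) {
        // HTML always wins when it exists.
        (Some(html), _) => ParseSource::Html(html),
        // PDF is used only when a MinerU endpoint is configured.
        (None, Some(pdf)) if !mineru_api_url.trim().is_empty() => ParseSource::PdfViaMineru(pdf),
        _ => ParseSource::Unavailable,
    }
}
```

An `Unavailable` result for a paper that does have a PDF points at a missing `MINERU_API_URL`, which is the first thing to check when figures or formulas are lost.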
---
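## 3. Runtime & Database Issues
### 3.1 Startup reports "Database Migration Failed"
- **Cause**: the local SQLite database file `astro_research.db` is locked by concurrent access, or its schema version conflicts with the migrations.
- **Resolution**:
  1. Back up the `astro_research.db` file in the project root, then remove it temporarily.
  2. Restart the service with `cargo run`. The system re-runs every SQL migration script under `migrations/` to build the latest schema.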

View File

@ -0,0 +1,32 @@
-- migrations/20260608000000_init.sql
CREATE TABLE IF NOT EXISTS papers (
bibcode TEXT PRIMARY KEY,
title TEXT NOT NULL,
authors TEXT, -- JSON array of author names
year TEXT,
pub TEXT, -- journal / publisher
keywords TEXT, -- JSON array of keywords
abstract TEXT,
doi TEXT,
arxiv_id TEXT,
citation_count INTEGER DEFAULT 0,
reference_count INTEGER DEFAULT 0,
pdf_path TEXT,
html_path TEXT,
markdown_path TEXT,
translation_path TEXT,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS citations_references (
source_bibcode TEXT NOT NULL,
target_bibcode TEXT NOT NULL,
PRIMARY KEY (source_bibcode, target_bibcode)
);
-- Indexes for performance
CREATE INDEX IF NOT EXISTS idx_papers_doi ON papers(doi);
CREATE INDEX IF NOT EXISTS idx_papers_arxiv_id ON papers(arxiv_id);
CREATE INDEX IF NOT EXISTS idx_citations_ref_source ON citations_references(source_bibcode);
CREATE INDEX IF NOT EXISTS idx_citations_ref_target ON citations_references(target_bibcode);

View File

@ -0,0 +1,15 @@
-- migrations/20260608000001_notes.sql
-- Notes/highlights table: each row is an annotation on one paragraph of a paper
CREATE TABLE IF NOT EXISTS notes (
id INTEGER PRIMARY KEY AUTOINCREMENT,
bibcode TEXT NOT NULL,
paragraph_index INTEGER NOT NULL, -- index of the paragraph in the paper's Markdown (0-based)
note_text TEXT NOT NULL DEFAULT '',
highlight_color TEXT NOT NULL DEFAULT 'yellow', -- 'yellow' | 'green' | 'blue' | 'pink'
selected_text TEXT NOT NULL DEFAULT '', -- the original text fragment that was highlighted
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (bibcode) REFERENCES papers(bibcode) ON DELETE CASCADE
);
CREATE INDEX IF NOT EXISTS idx_notes_bibcode ON notes(bibcode);

37
src/README.md Normal file
View File

@ -0,0 +1,37 @@
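# AstroResearch Backend
This module is the backend of AstroResearch, built on **Rust + Axum + SQLx (SQLite)**.
---
## 1. Source Code Structure
- **[main.rs](main.rs)**: service entry point. Registers the global CORS middleware, connects to the SQLite database, and runs the initial SQL migrations.
- **[config.rs](config.rs)**: loads the local `.env` environment variables via `dotenvy` and falls back to defaults for missing values.
- **[handlers.rs](handlers.rs)**: dispatches the Axum API routes and contains the core business logic.
- **[download.rs](download.rs)**: downloader with multi-level fallback and handling for anti-bot interception.
- **[parser.rs](parser.rs)**: converts papers into structured GFM Markdown, protecting LaTeX formulas with placeholders.
- **[translation.rs](translation.rs)**: extracts bilingual astronomy terms by dictionary matching and combines them with the system prompt to call the LLM for scholarly translation.
- **[dictionary.rs](dictionary.rs)**: term-matching dictionary based on longest-prefix matching over a Trie.
- **[ads.rs](ads.rs)**: adapter for the NASA ADS API.
- **[arxiv.rs](arxiv.rs)**: adapter for the arXiv Atom XML API.
- **[qiniu.rs](qiniu.rs)**: Qiniu Cloud upload client, used to host the figures produced by MinerU PDF parsing in object storage.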
---
## 2. Testing
The core backend functions and services have unit tests. Run all of them locally with:
```bash
cargo test
```
---
## 3. Usage
Make sure `.env` and `dictionary.txt` are set up in the parent of this directory (the project root), then run from the project root:
```bash
cargo run
```
The service starts at `http://localhost:8000` and creates or opens the `astro_research.db` database in the project root.

166
src/ads.rs Normal file
View File

@ -0,0 +1,166 @@
// src/ads.rs
use serde::{Deserialize, Serialize};
use reqwest::header::{HeaderMap, HeaderValue, AUTHORIZATION, CONTENT_TYPE};
use tracing::{info, error};
// Public document structure for a paper returned by the ADS API
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct AdsPaperDoc {
pub bibcode: String,
pub title: Option<Vec<String>>,
pub author: Option<Vec<String>>,
pub year: Option<String>,
#[serde(rename = "pub")]
pub pub_journal: Option<String>,
pub keyword: Option<Vec<String>>,
pub abstract_text: Option<String>,
pub doi: Option<Vec<String>>,
pub citation_count: Option<i32>,
pub reference_count: Option<i32>,
pub reference: Option<Vec<String>>,
pub citation: Option<Vec<String>>,
pub identifier: Option<Vec<String>>,
}
#[derive(Debug, Deserialize)]
pub struct AdsResponseDocs {
pub docs: Vec<AdsPaperDoc>,
}
#[derive(Debug, Deserialize)]
pub struct AdsSearchResponse {
pub response: AdsResponseDocs,
}
#[derive(Debug, Deserialize)]
pub struct AdsExportResponse {
pub export: String,
}
// Client for the ADS API
pub struct AdsClient {
api_key: String,
client: reqwest::Client,
}
impl AdsClient {
pub fn new(api_key: String) -> Self {
AdsClient {
api_key,
client: reqwest::Client::new(),
}
}
// Build the authorization headers
fn headers(&self) -> HeaderMap {
let mut headers = HeaderMap::new();
headers.insert(
AUTHORIZATION,
HeaderValue::from_str(&format!("Bearer {}", self.api_key)).unwrap_or_else(|_| HeaderValue::from_static("")),
);
headers.insert(CONTENT_TYPE, HeaderValue::from_static("application/json"));
headers
}
// Call the ADS search endpoint and return a list of paper metadata
pub async fn search(&self, query: &str, rows: i32) -> anyhow::Result<Vec<AdsPaperDoc>> {
let url = "https://api.adsabs.harvard.edu/v1/search/query";
// `fl` lists the fields to return, including the `reference` and `citation` arrays and `identifier`
let fl = "bibcode,title,author,year,pub,keyword,abstract,doi,citation_count,reference_count,reference,citation,identifier";
info!("正在发送检索请求到 ADS 平台: 查询词='{}', 数量={}", query, rows);
let response = self.client
.get(url)
.headers(self.headers())
.query(&[("q", query), ("rows", &rows.to_string()), ("fl", fl)])
.send()
.await?;
if !response.status().is_success() {
let status = response.status();
let err_body = response.text().await.unwrap_or_default();
error!("ADS 检索请求失败: 状态码={}, 返回错误={}", status, err_body);
return Err(anyhow::anyhow!("ADS API 接口返回错误码: {}", status));
}
let raw_res: RawSearchResponse = response.json().await?;
let docs = raw_res.response.docs.into_iter().map(|d| {
AdsPaperDoc {
bibcode: d.bibcode,
title: d.title,
author: d.author,
year: d.year,
pub_journal: d.pub_journal,
keyword: d.keyword,
abstract_text: d.abstract_field,
doi: d.doi,
citation_count: d.citation_count,
reference_count: d.reference_count,
reference: d.reference,
citation: d.citation,
identifier: d.identifier,
}
}).collect();
Ok(docs)
}
// Call the ADS Export endpoint and return the BibTeX text
pub async fn export_bibtex(&self, bibcodes: Vec<String>) -> anyhow::Result<String> {
let url = "https://api.adsabs.harvard.edu/v1/export/bibtex";
info!("正在向 ADS 请求导出 {} 篇文献的 BibTeX 数据", bibcodes.len());
let payload = serde_json::json!({
"bibcode": bibcodes
});
let response = self.client
.post(url)
.headers(self.headers())
.json(&payload)
.send()
.await?;
if !response.status().is_success() {
let status = response.status();
let err_body = response.text().await.unwrap_or_default();
error!("ADS 导出 BibTeX 失败: 状态码={}, 返回信息={}", status, err_body);
return Err(anyhow::anyhow!("ADS 导出接口返回错误码: {}", status));
}
let res_data: AdsExportResponse = response.json().await?;
Ok(res_data.export)
}
}
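// Internal deserialization helper: `abstract` and `pub` are Rust keywords, so those JSON fields are mapped to differently named struct fields via serde renames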
#[derive(Debug, Deserialize)]
struct RawDoc {
bibcode: String,
title: Option<Vec<String>>,
author: Option<Vec<String>>,
year: Option<String>,
#[serde(rename = "pub")]
pub_journal: Option<String>,
keyword: Option<Vec<String>>,
#[serde(rename = "abstract")]
abstract_field: Option<String>,
doi: Option<Vec<String>>,
citation_count: Option<i32>,
reference_count: Option<i32>,
reference: Option<Vec<String>>,
citation: Option<Vec<String>>,
identifier: Option<Vec<String>>,
}
#[derive(Debug, Deserialize)]
struct RawSearchResponse {
response: RawDocs,
}
#[derive(Debug, Deserialize)]
struct RawDocs {
docs: Vec<RawDoc>,
}

167
src/arxiv.rs Normal file
View File

@ -0,0 +1,167 @@
// src/arxiv.rs
use serde::{Deserialize, Serialize};
use tracing::{info, error};
use regex::Regex;
// Intermediate structure for a paper returned by arXiv
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ArxivPaper {
pub id: String, // cleaned arXiv ID, e.g. 2301.00001
pub title: String,
pub authors: Vec<String>,
pub year: String,
pub abstract_text: String,
pub doi: Option<String>,
pub pdf_url: String,
}
// Client for the arXiv API
pub struct ArxivClient {
client: reqwest::Client,
}
impl ArxivClient {
pub fn new() -> Self {
ArxivClient {
client: reqwest::Client::new(),
}
}
// Query the official arXiv export API and parse the response
pub async fn search(&self, query: &str, max_results: i32) -> anyhow::Result<Vec<ArxivPaper>> {
let url = "http://export.arxiv.org/api/query";
info!("正在发送检索请求到 arXiv 平台: 查询词='{}', 数量={}", query, max_results);
let response = self.client
.get(url)
.query(&[
("search_query", query),
("max_results", &max_results.to_string()),
])
.send()
.await?;
if !response.status().is_success() {
let status = response.status();
error!("arXiv 请求失败: 状态码={}", status);
return Err(anyhow::anyhow!("arXiv 接口返回错误码: {}", status));
}
let xml_content = response.text().await?;
let papers = parse_arxiv_xml(&xml_content);
Ok(papers)
}
}
// Extract fields from the XML with regular expressions, which avoids deserialization problems caused by differing namespace prefixes
fn parse_arxiv_xml(xml: &str) -> Vec<ArxivPaper> {
let mut papers = Vec::new();
let entry_re = Regex::new(r"(?s)<entry>(.*?)</entry>").unwrap();
let id_re = Regex::new(r"<id>http://arxiv.org/abs/(.*?)(?:v\d+)?</id>").unwrap();
let title_re = Regex::new(r"(?s)<title>(.*?)</title>").unwrap();
let summary_re = Regex::new(r"(?s)<summary>(.*?)</summary>").unwrap();
let published_re = Regex::new(r"<published>(\d{4})-\d{2}-\d{2}").unwrap();
let author_re = Regex::new(r"(?s)<author>\s*<name>(.*?)</name>").unwrap();
let doi_re = Regex::new(r"<arxiv:doi[^>]*>(.*?)</arxiv:doi>").unwrap();
for cap in entry_re.captures_iter(xml) {
let entry_content = &cap[1];
// Extract and clean the ID
let id = id_re.captures(entry_content)
.map(|c| c[1].trim().to_string())
.unwrap_or_else(|| {
let fallback_id_re = Regex::new(r"<id>(.*?)</id>").unwrap();
fallback_id_re.captures(entry_content)
.map(|c| c[1].trim().to_string())
.unwrap_or_default()
});
if id.is_empty() {
continue;
}
// Extract the title and collapse line breaks and repeated whitespace
let title = title_re.captures(entry_content)
.map(|c| c[1].to_string())
.unwrap_or_default();
let title = title.split_whitespace().collect::<Vec<_>>().join(" ");
// Extract the abstract and normalize its whitespace the same way
let abstract_text = summary_re.captures(entry_content)
.map(|c| c[1].to_string())
.unwrap_or_default();
let abstract_text = abstract_text.split_whitespace().collect::<Vec<_>>().join(" ");
// Extract the publication year
let year = published_re.captures(entry_content)
.map(|c| c[1].to_string())
.unwrap_or_else(|| "未知".to_string());
// Extract the author list
let mut authors = Vec::new();
for auth_cap in author_re.captures_iter(entry_content) {
let author_name = auth_cap[1].trim().to_string();
if !author_name.is_empty() {
authors.push(author_name);
}
}
// Extract the associated DOI, if any
let doi = doi_re.captures(entry_content)
.map(|c| c[1].trim().to_string());
let pdf_url = format!("https://arxiv.org/pdf/{}.pdf", id);
papers.push(ArxivPaper {
id,
title,
authors,
year,
abstract_text,
doi,
pdf_url,
});
}
papers
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_parse_arxiv_xml() {
let xml_data = r#"<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<entry>
<id>http://arxiv.org/abs/2301.00001v2</id>
<title>A Beautiful Title of Astro Research Paper</title>
<summary>This is the abstract. It spans multiple lines.</summary>
<published>2023-01-08T10:00:00Z</published>
<author>
<name>John Doe</name>
</author>
<author>
<name>Jane Smith</name>
</author>
<arxiv:doi xmlns:arxiv="http://arxiv.org/schemas/atom">10.1000/xyz123</arxiv:doi>
</entry>
</feed>"#;
let papers = parse_arxiv_xml(xml_data);
assert_eq!(papers.len(), 1);
let paper = &papers[0];
assert_eq!(paper.id, "2301.00001");
assert_eq!(paper.title, "A Beautiful Title of Astro Research Paper");
assert_eq!(paper.authors, vec!["John Doe".to_string(), "Jane Smith".to_string()]);
assert_eq!(paper.year, "2023");
assert_eq!(paper.abstract_text, "This is the abstract. It spans multiple lines.");
assert_eq!(paper.doi, Some("10.1000/xyz123".to_string()));
assert_eq!(paper.pdf_url, "https://arxiv.org/pdf/2301.00001.pdf");
}
}

101
src/config.rs Normal file
View File

@ -0,0 +1,101 @@
// src/config.rs
use std::env;
use std::path::PathBuf;
// System configuration loaded from environment variables or the .env file
#[derive(Clone, Debug)]
pub struct Config {
pub database_url: String, // SQLite database connection URL
pub ads_api_key: String, // NASA ADS API access token
pub llm_api_key: String, // LLM API key
pub llm_api_base: String, // LLM API base URL
pub llm_model: String, // name of the LLM used for translation
pub qiniu_ak: String, // Qiniu Cloud access key
pub qiniu_sk: String, // Qiniu Cloud secret key
pub qiniu_bucket: String, // Qiniu Cloud bucket name
pub qiniu_domain: String, // Qiniu Cloud public CDN domain
pub mineru_api_url: String, // remote MinerU PDF parsing API URL
pub mineru_api_key: String, // MinerU API token
pub library_dir: PathBuf, // root directory of the local paper library
pub port: u16, // port the backend service listens on
}
impl Config {
// Load configuration from environment variables, falling back to defaults
pub fn from_env() -> Self {
dotenvy::dotenv().ok();
let database_url = env::var("DATABASE_URL")
.unwrap_or_else(|_| "sqlite://astro_research.db".to_string());
let ads_api_key = env::var("ADS_API_KEY").unwrap_or_default();
let llm_api_key = env::var("LLM_API_KEY").unwrap_or_default();
let llm_api_base = env::var("LLM_API_BASE")
.unwrap_or_else(|_| "https://api.openai.com/v1".to_string());
let llm_model = env::var("LLM_MODEL")
.unwrap_or_else(|_| "gpt-4o-mini".to_string());
let qiniu_ak = env::var("QINIU_AK").unwrap_or_default();
let qiniu_sk = env::var("QINIU_SK").unwrap_or_default();
let qiniu_bucket = env::var("QINIU_BUCKET").unwrap_or_default();
let qiniu_domain = env::var("QINIU_DOMAIN").unwrap_or_default();
let mineru_api_url = env::var("MINERU_API_URL").unwrap_or_default();
let mineru_api_key = env::var("MINERU_API_KEY").unwrap_or_default();
let library_dir_str = env::var("LIBRARY_DIR").unwrap_or_else(|_| "./library".to_string());
let library_dir = PathBuf::from(library_dir_str);
let port = env::var("PORT")
.unwrap_or_else(|_| "8000".to_string())
.parse::<u16>()
.unwrap_or(8000);
Config {
database_url,
ads_api_key,
llm_api_key,
llm_api_base,
llm_model,
qiniu_ak,
qiniu_sk,
qiniu_bucket,
qiniu_domain,
mineru_api_url,
mineru_api_key,
library_dir,
port,
}
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_config_from_env() {
// Save the current environment variables so the test does not disturb them
let orig_port = std::env::var("PORT").ok();
let orig_db = std::env::var("DATABASE_URL").ok();
std::env::set_var("PORT", "9999");
std::env::set_var("DATABASE_URL", "sqlite://test.db");
let config = Config::from_env();
assert_eq!(config.port, 9999);
assert_eq!(config.database_url, "sqlite://test.db");
// Restore the environment variables
if let Some(p) = orig_port {
std::env::set_var("PORT", p);
} else {
std::env::remove_var("PORT");
}
if let Some(db) = orig_db {
std::env::set_var("DATABASE_URL", db);
} else {
std::env::remove_var("DATABASE_URL");
}
}
}

713
src/download.rs Normal file
View File

@ -0,0 +1,713 @@
// src/download.rs
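//! Paper download module
//!
//! Modeled on the design of datasheel/node/src/download. It implements:
//! - Random User-Agent plus full Sec-Fetch header impersonation
//! - Streaming downloads (stream_download)
//! - PDF/HTML content validation and anti-bot detection
//! - Multi-level fallback: arXiv direct → ADS PUB → ADS EPRINT → CrossRef API
//! - Session warm-up strategies for specific publishers such as IOP and Springer
//! - Random delay between requests (500-2000 ms) to reduce the risk of triggering anti-bot measures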
use std::fs;
use std::path::{Path, PathBuf};
use reqwest::header::{HeaderMap, HeaderValue};
use tokio::io::AsyncWriteExt;
use url::Url;
use tracing::{info, warn};
use anyhow::{Context, Result};
// ─── Browser impersonation helpers ─────────────────────────────
/// Generate a random Firefox User-Agent (modeled on SearXNG's useragents.json)
fn gen_useragent() -> String {
use rand::seq::SliceRandom;
let os_list = [
"Windows NT 10.0; Win64; x64",
"X11; Linux x86_64",
"Macintosh; Intel Mac OS X 10_15_7",
];
let versions = ["137.0", "136.0", "135.0", "134.0", "133.0"];
let os = os_list.choose(&mut rand::thread_rng()).unwrap();
let v = versions.choose(&mut rand::thread_rng()).unwrap();
format!("Mozilla/5.0 ({os}; rv:{v}) Gecko/20100101 Firefox/{v}")
}
/// Build a full set of browser HTTP headers (modeled on SearXNG's online.py)
fn build_browser_headers() -> HeaderMap {
let mut h = HeaderMap::new();
if let Ok(ua) = HeaderValue::from_str(&gen_useragent()) {
h.insert("User-Agent", ua);
}
h.insert("Accept", HeaderValue::from_static(
"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
));
h.insert("Accept-Language", HeaderValue::from_static("en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7"));
h.insert("Accept-Encoding", HeaderValue::from_static("gzip, deflate, br"));
h.insert("DNT", HeaderValue::from_static("1"));
h.insert("Connection", HeaderValue::from_static("keep-alive"));
h.insert("Upgrade-Insecure-Requests", HeaderValue::from_static("1"));
h.insert("Sec-Fetch-Dest", HeaderValue::from_static("document"));
h.insert("Sec-Fetch-Mode", HeaderValue::from_static("navigate"));
h.insert("Sec-Fetch-Site", HeaderValue::from_static("none"));
h.insert("Sec-Fetch-User", HeaderValue::from_static("?1"));
h
}
/// Build Chrome-style HTTP headers (for stricter publishers such as IOP)
fn build_chrome_headers(referer: Option<&str>) -> HeaderMap {
let mut h = HeaderMap::new();
h.insert("User-Agent", HeaderValue::from_static(
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/143.0.0.0 Safari/537.36",
));
h.insert("Accept", HeaderValue::from_static(
"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
));
h.insert("Accept-Language", HeaderValue::from_static("en-US,en;q=0.9"));
h.insert("Accept-Encoding", HeaderValue::from_static("gzip, deflate, br, zstd"));
h.insert("Sec-Ch-Ua", HeaderValue::from_static(
"\"Google Chrome\";v=\"143\", \"Chromium\";v=\"143\", \"Not A(Brand\";v=\"24\"",
));
h.insert("Sec-Ch-Ua-Mobile", HeaderValue::from_static("?0"));
h.insert("Sec-Ch-Ua-Platform", HeaderValue::from_static("\"Windows\""));
h.insert("Sec-Fetch-Dest", HeaderValue::from_static("document"));
h.insert("Sec-Fetch-Mode", HeaderValue::from_static("navigate"));
h.insert("Sec-Fetch-Site", HeaderValue::from_static("same-origin"));
h.insert("Sec-Fetch-User", HeaderValue::from_static("?1"));
if let Some(r) = referer {
if let Ok(v) = HeaderValue::from_str(r) {
h.insert("Referer", v);
}
}
h
}
// ─── Content validation ────────────────────────────────────────
/// Unified CAPTCHA / anti-bot detection (modeled on SearXNG's exception handling)
fn detect_anti_bot(content: &str, url: Option<&str>) -> Result<()> {
let lower = content.to_lowercase();
let cf_patterns = [
"checking your browser", "please wait while we verify",
"cf-browser-verification", "cf_chl_opt", "just a moment",
"enable javascript and cookies", "_cf_chl_tk",
];
for p in &cf_patterns {
if lower.contains(p) {
anyhow::bail!("检测到 Cloudflare 挑战页面(特征: {})", p);
}
}
let captcha_patterns = ["captcha", "recaptcha", "hcaptcha", "verify you are human", "robot check"];
for p in &captcha_patterns {
if lower.contains(p) {
anyhow::bail!("检测到人机验证页面(包含: {})", p);
}
}
let access_denied = [
"login required", "please log in", "subscription required",
"access denied", "you do not have access", "purchase this article",
"sign in to access", "client challenge",
];
for p in &access_denied {
if lower.contains(p) {
anyhow::bail!("检测到出版商访问限制(特征: {})", p);
}
}
if let Some(u) = url {
if u.contains("sorry.google.com") || u.contains("/sorry") {
anyhow::bail!("检测到 Google 验证码页面");
}
}
Ok(())
}
/// Check that the response bytes form a valid PDF (magic number, minimum size, EOF marker)
fn validate_pdf_content(bytes: &[u8]) -> Result<()> {
if !bytes.starts_with(b"%PDF") {
if bytes.starts_with(b"<!") || bytes.starts_with(b"<html") || bytes.starts_with(b"<HTML") {
let text = String::from_utf8_lossy(&bytes[..bytes.len().min(2048)]);
detect_anti_bot(&text, None)?;
anyhow::bail!("响应内容是 HTML 而非 PDF,可能需要登录或验证");
}
anyhow::bail!("响应不是有效的 PDF 文件(缺少 %PDF 魔数)");
}
if bytes.len() < 5000 {
anyhow::bail!("PDF 文件过小({} 字节),可能是错误页面", bytes.len());
}
let scan_len = std::cmp::min(1024, bytes.len());
let tail = &bytes[bytes.len() - scan_len..];
if !tail.windows(5).any(|w| w == b"%%EOF") {
anyhow::bail!("PDF 文件损坏或不完整(未找到尾部 %%EOF 标记)");
}
Ok(())
}
/// Check that the HTML is a real article page (not an error page or login wall)
fn validate_html_content(text: &str) -> Result<()> {
detect_anti_bot(text, None)?;
if text.len() < 2000 {
let lower = text.to_lowercase();
for kw in &["error", "404", "not found", "forbidden", "access denied"] {
if lower.contains(kw) {
anyhow::bail!("响应是错误页面(包含: {})", kw);
}
}
warn!("HTML 内容较短({} 字节),可能不完整", text.len());
}
Ok(())
}
// ─── Downloader ────────────────────────────────────────────────
/// Asynchronous download manager for both paper formats (PDF and HTML)
pub struct Downloader {
client: reqwest::Client,
}
impl Downloader {
pub fn new() -> Self {
let mut headers = HeaderMap::new();
headers.insert(reqwest::header::ACCEPT, HeaderValue::from_static(
"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
));
headers.insert(reqwest::header::ACCEPT_LANGUAGE, HeaderValue::from_static("en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7"));
headers.insert("DNT", HeaderValue::from_static("1"));
headers.insert(reqwest::header::CONNECTION, HeaderValue::from_static("keep-alive"));
headers.insert("Upgrade-Insecure-Requests", HeaderValue::from_static("1"));
let client = reqwest::Client::builder()
.user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
.default_headers(headers)
.cookie_store(true) // enable the cookie store so session state is kept across requests
.redirect(reqwest::redirect::Policy::limited(10))
.timeout(std::time::Duration::from_secs(60))
.build()
.expect("Failed to create HTTP client");
Downloader { client }
}
// ─── Helpers ───────────────────────────────────────────────────
/// Sleep for a random 500-2000 ms before a request to mimic human browsing intervals and reduce anti-bot triggers
async fn maybe_delay() {
let delay_ms = 500 + (rand::random::<u64>() % 1500);
tokio::time::sleep(std::time::Duration::from_millis(delay_ms)).await;
}
/// Stream an HTTP response to a local file (written chunk by chunk, so large files are supported)
async fn stream_download(&self, response: reqwest::Response, target_path: &Path) -> Result<()> {
use futures_util::StreamExt;
if let Some(parent) = target_path.parent() {
fs::create_dir_all(parent)?;
}
let mut file = tokio::fs::File::create(target_path)
.await
.context("创建目标文件失败")?;
let mut stream = response.bytes_stream();
while let Some(chunk) = stream.next().await {
let bytes = chunk.context("读取响应流时出错")?;
file.write_all(&bytes).await?;
}
file.flush().await?;
Ok(())
}
/// Resolve an ADS Link Gateway route; if it lands on the perfdrive protection page, extract the real target from the `ssc` parameter
async fn resolve_ads_gateway(&self, gateway_url: &str) -> Result<String> {
info!("解析 ADS 网关: {}", gateway_url);
// Follow redirects with a HEAD request; if the HEAD request itself errors out (some publishers block HEAD), fall back to GET
let response = match self.client.head(gateway_url).send().await {
Ok(resp) => resp,
Err(_) => self.client.get(gateway_url).send().await
.context(format!("请求 ADS 网关失败: {}", gateway_url))?,
};
let final_url = response.url().as_str().to_string();
info!("网关解析结果: {}", final_url);
// If redirected to validate.perfdrive.com, extract the real URL from the `ssc` parameter
if final_url.contains("validate.perfdrive.com") {
if let Ok(parsed) = Url::parse(&final_url) {
if let Some(ssc) = parsed.query_pairs().find(|(k, _)| k == "ssc").map(|(_, v)| v.into_owned()) {
if let Ok(decoded) = urlencoding::decode(&ssc) {
let real_url = decoded.into_owned();
info!("检测到 perfdrive 拦截,解码真实地址: {}", real_url);
return Ok(real_url);
}
}
}
}
// Reject the case where resolution failed and we are still on link_gateway
if final_url.contains("link_gateway") || final_url.is_empty() {
anyhow::bail!("ADS 网关未能解析到有效目标(仍在 link_gateway)");
}
Ok(final_url)
}
/// Read the first 512 bytes of a file for content sniffing
async fn read_file_header(path: &Path) -> Result<Vec<u8>> {
use tokio::io::AsyncReadExt;
let mut file = tokio::fs::File::open(path).await?;
let mut buf = vec![0u8; 512];
let n = file.read(&mut buf).await?;
buf.truncate(n);
Ok(buf)
}
// ─── Publisher-specific download strategies ────────────────────
/// IOP Science PDF download
/// Following datasheel's iop.rs: visit the article page first to establish a session and obtain cookies, then request the PDF
async fn download_iop_pdf(&self, doi: &str, dest_path: &Path) -> Result<()> {
let main_url = format!("https://iopscience.iop.org/article/{}", doi);
let pdf_url = format!("https://iopscience.iop.org/article/{}/pdf", doi);
// Step 1: visit the article page to establish a cookie session
info!("[IOP] 预热主页: {}", main_url);
Self::maybe_delay().await;
match self.client.get(&main_url)
.headers(build_chrome_headers(None))
.send().await
{
Ok(r) => info!("[IOP] 主页响应: {}", r.status()),
Err(e) => warn!("[IOP] 主页访问失败(继续尝试): {:?}", e),
}
// Step 2: download the PDF with the Referer header set
info!("[IOP] 下载 PDF: {}", pdf_url);
Self::maybe_delay().await;
let response = self.client.get(&pdf_url)
.headers(build_chrome_headers(Some(&main_url)))
.send().await
.context("IOP PDF 请求失败")?;
let status = response.status();
if !status.is_success() {
anyhow::bail!("[IOP] 返回 HTTP {}", status);
}
self.stream_download(response, dest_path).await?;
// Step 3: validate the downloaded content
let bytes = tokio::fs::read(dest_path).await?;
validate_pdf_content(&bytes)?;
info!("[IOP] PDF 下载成功: {:?}", dest_path);
Ok(())
}
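/// Springer/Nature HTML download (described as including session warm-up; this function issues a single request with browser headers after a random delay)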
async fn download_springer_html(&self, doi: &str, dest_path: &Path) -> Result<()> {
let url = format!("https://link.springer.com/article/{}", doi);
info!("[Springer] 下载 HTML: {}", url);
Self::maybe_delay().await;
let response = self.client.get(&url)
.headers(build_browser_headers())
.send().await
.context("Springer HTML 请求失败")?;
let status = response.status();
if !status.is_success() {
anyhow::bail!("[Springer] 返回 HTTP {}", status);
}
self.stream_download(response, dest_path).await?;
let sniff = Self::read_file_header(dest_path).await?;
let text = String::from_utf8_lossy(&sniff);
validate_html_content(&text)?;
info!("[Springer] HTML 下载成功: {:?}", dest_path);
Ok(())
}
/// Generic direct-link PDF download (with random delay and content validation)
async fn download_pdf_direct(&self, url: &str, dest_path: &Path, label: &str) -> Result<()> {
info!("[{}] 下载 PDF: {}", label, url);
Self::maybe_delay().await;
let response = self.client.get(url)
.headers(build_browser_headers())
.send().await
.context(format!("[{}] PDF 请求失败", label))?;
let status = response.status();
if !status.is_success() {
anyhow::bail!("[{}] 返回 HTTP {}", label, status);
}
self.stream_download(response, dest_path).await?;
let bytes = tokio::fs::read(dest_path).await?;
validate_pdf_content(&bytes)?;
info!("[{}] PDF 下载成功: {:?}", label, dest_path);
Ok(())
}
/// Generic direct-link HTML download (with random delay and anti-bot detection)
async fn download_html_direct(&self, url: &str, dest_path: &Path, label: &str) -> Result<()> {
info!("[{}] 下载 HTML: {}", label, url);
Self::maybe_delay().await;
let response = self.client.get(url)
.headers(build_browser_headers())
.send().await
.context(format!("[{}] HTML 请求失败", label))?;
let status = response.status();
if !status.is_success() {
anyhow::bail!("[{}] 返回 HTTP {}", label, status);
}
self.stream_download(response, dest_path).await?;
let sniff = Self::read_file_header(dest_path).await?;
let text = String::from_utf8_lossy(&sniff);
validate_html_content(&text)?;
info!("[{}] HTML 下载成功: {:?}", label, dest_path);
Ok(())
}
// ─── CrossRef fallback channel ─────────────────────────────
/// Look up a PDF link through the CrossRef API and download it
async fn download_crossref_pdf(&self, doi: &str, dest_path: &Path) -> Result<()> {
let api_url = format!("https://api.crossref.org/works/{}", doi);
info!("[CrossRef] Looking up PDF link: {}", api_url);
let data: serde_json::Value = self.client.get(&api_url)
.header("Accept", "application/json")
.send().await
.context("CrossRef API request failed")?
.json().await
.context("Failed to parse CrossRef API response")?;
let links = data["message"]["link"].as_array()
.context("CrossRef response has no link array")?;
// Prefer a link explicitly typed as PDF; fall back to an "unspecified" link only if none exists
let pdf_url = links.iter()
.find(|l| l["content-type"].as_str().unwrap_or("").contains("pdf"))
.or_else(|| links.iter().find(|l| l["content-type"].as_str() == Some("unspecified")))
.and_then(|l| l["URL"].as_str())
.context("CrossRef returned no PDF link")?;
info!("[CrossRef] PDF link: {}", pdf_url);
self.download_pdf_direct(pdf_url, dest_path, "CrossRef").await
}
// ─── Public entry points ───────────────────────────────────
/// Download the PDF (from arxiv.org) and the HTML (official html/ first, ar5iv as fallback) directly by arXiv ID
///
/// HTML download priority:
/// 1. Official `arxiv.org/html/{id}` (available since 2023-12; same quality as ar5iv and more stable)
/// 2. ar5iv `ar5iv.labs.arxiv.org/html/{id}` (skipped when conversion fails, which affects roughly 3% of papers)
pub async fn download_arxiv_direct(&self, arxiv_id: &str, library_dir: &Path) -> (Option<PathBuf>, Option<PathBuf>) {
// Strip the version suffix (v1/v2/v3): arxiv.org/html/ and ar5iv are both queried for the latest rendered version
let clean_id = strip_arxiv_version(arxiv_id);
let pdf_url = format!("https://arxiv.org/pdf/{}", clean_id);
let pdf_dest = library_dir.join("PDF").join(format!("{}.pdf", arxiv_id));
let html_dest = library_dir.join("HTML").join(format!("{}.html", arxiv_id));
let mut pdf_ok = None;
let mut html_ok = None;
// PDF download
match self.download_pdf_direct(&pdf_url, &pdf_dest, "arXiv").await {
Ok(_) => pdf_ok = Some(pdf_dest),
Err(e) => warn!("[arXiv] PDF download failed: {:?}", e),
}
// HTML download: official arxiv.org/html/ first
let official_html_url = format!("https://arxiv.org/html/{}", clean_id);
match self.download_html_direct(&official_html_url, &html_dest, "arXiv-HTML").await {
Ok(_) => html_ok = Some(html_dest.clone()),
Err(e) => {
warn!("[arXiv-HTML] Official HTML download failed, falling back to ar5iv: {:?}", e);
// ar5iv fallback: roughly 97% success rate, and renderings may lag behind
let ar5iv_url = format!("https://ar5iv.labs.arxiv.org/html/{}", clean_id);
match self.download_html_direct(&ar5iv_url, &html_dest, "ar5iv").await {
Ok(_) => html_ok = Some(html_dest),
Err(e2) => warn!("[ar5iv] HTML download failed as well: {:?}", e2),
}
}
}
(pdf_ok, html_ok)
}
/// Download arXiv HTML: official arxiv.org/html/ first, ar5iv as fallback
/// `arxiv_id` must already have its version suffix stripped
async fn download_arxiv_html_with_fallback(&self, arxiv_id: &str, dest_path: &Path) -> Result<()> {
let official_url = format!("https://arxiv.org/html/{}", arxiv_id);
match self.download_html_direct(&official_url, dest_path, "arXiv-HTML").await {
Ok(()) => Ok(()),
Err(e) => {
warn!("[arXiv-HTML] Official HTML failed, falling back to ar5iv: {:?}", e);
let ar5iv_url = format!("https://ar5iv.labs.arxiv.org/html/{}", arxiv_id);
self.download_html_direct(&ar5iv_url, dest_path, "ar5iv").await
}
}
}
/// Download the PDF and HTML for an ADS bibcode using a multi-level fallback strategy
///
/// PDF fallback order:
/// 1. ADS PUB_PDF gateway → routed by the resolved host (IOP uses a dedicated strategy, everything else the generic downloader)
/// 2. ADS EPRINT_PDF gateway
/// 3. CrossRef API PDF link (requires a DOI)
///
/// HTML fallback order:
/// 1. ADS PUB_HTML gateway (publisher pages → Springer-specific strategy where applicable, otherwise direct download; arXiv abs → official arXiv HTML, then ar5iv)
/// 2. ADS EPRINT_HTML gateway (arXiv abs → official arXiv HTML, then ar5iv)
pub async fn download_paper(&self, bibcode: &str, doi: Option<&str>, library_dir: &Path) -> (Option<PathBuf>, Option<PathBuf>) {
let base = "https://ui.adsabs.harvard.edu/link_gateway";
let pdf_dest = library_dir.join("PDF").join(format!("{}.pdf", bibcode));
let html_dest = library_dir.join("HTML").join(format!("{}.html", bibcode));
let mut pdf_ok: Option<PathBuf> = None;
let mut html_ok: Option<PathBuf> = None;
// ── PDF download ───────────────────────────────────────
info!("[download] Starting PDF download: {}", bibcode);
'pdf: {
// 1a. ADS PUB_PDF gateway
let gw = format!("{}/{}/PUB_PDF", base, bibcode);
match self.resolve_ads_gateway(&gw).await {
Ok(resolved) => {
let result = if resolved.contains("iopscience.iop.org") {
// Extract the DOI path segment and use the IOP-specific strategy
let iop_doi = resolved
.trim_start_matches("https://iopscience.iop.org/article/")
.trim_end_matches("/pdf")
.trim_end_matches('/');
self.download_iop_pdf(iop_doi, &pdf_dest).await
} else if resolved.contains("link.springer.com") || resolved.contains("nature.com") {
// Springer/Nature: HTML is the more reliable format, so the PDF uses the generic strategy
self.download_pdf_direct(&resolved, &pdf_dest, "Springer").await
} else {
self.download_pdf_direct(&resolved, &pdf_dest, "PUB_PDF").await
};
match result {
Ok(_) => { pdf_ok = Some(pdf_dest.clone()); break 'pdf; }
Err(e) => warn!("[PUB_PDF] Download failed: {:?}", e),
}
}
Err(e) => warn!("[PUB_PDF] Gateway resolution failed: {:?}", e),
}
// 1b. ADS EPRINT_PDF gateway
let gw = format!("{}/{}/EPRINT_PDF", base, bibcode);
match self.resolve_ads_gateway(&gw).await {
Ok(resolved) => {
match self.download_pdf_direct(&resolved, &pdf_dest, "EPRINT_PDF").await {
Ok(_) => { pdf_ok = Some(pdf_dest.clone()); break 'pdf; }
Err(e) => warn!("[EPRINT_PDF] Download failed: {:?}", e),
}
}
Err(e) => warn!("[EPRINT_PDF] Gateway resolution failed: {:?}", e),
}
// 1c. CrossRef API fallback (requires a DOI)
if let Some(doi_str) = doi {
match self.download_crossref_pdf(doi_str, &pdf_dest).await {
Ok(_) => { pdf_ok = Some(pdf_dest.clone()); }
Err(e) => warn!("[CrossRef] PDF download failed: {:?}", e),
}
}
}
// ── HTML download ──────────────────────────────────────
info!("[download] Starting HTML download: {}", bibcode);
'html: {
// 2a. ADS PUB_HTML gateway
let gw = format!("{}/{}/PUB_HTML", base, bibcode);
match self.resolve_ads_gateway(&gw).await {
Ok(resolved) => {
let result = if resolved.contains("link.springer.com") {
// Springer: extract the DOI path and use the dedicated HTML strategy
let doi_part = resolved
.trim_start_matches("https://link.springer.com/article/")
.trim_end_matches('/');
self.download_springer_html(doi_part, &html_dest).await
} else if resolved.contains("nature.com") {
// Nature article URLs carry an article ID rather than a full DOI,
// so a link.springer.com URL cannot be rebuilt from them; fetch the resolved URL directly
self.download_html_direct(&resolved, &html_dest, "Nature").await
} else if let Some(arxiv_id) = extract_arxiv_id_from_url(&resolved) {
// The ADS gateway points at an arXiv abs page → official HTML first, ar5iv as fallback
self.download_arxiv_html_with_fallback(&arxiv_id, &html_dest).await
} else {
self.download_html_direct(&resolved, &html_dest, "PUB_HTML").await
};
match result {
Ok(_) => { html_ok = Some(html_dest.clone()); break 'html; }
Err(e) => warn!("[PUB_HTML] Download failed: {:?}", e),
}
}
Err(e) => warn!("[PUB_HTML] Gateway resolution failed: {:?}", e),
}
// 2b. ADS EPRINT_HTML gateway (most astronomy papers have an arXiv eprint)
let gw = format!("{}/{}/EPRINT_HTML", base, bibcode);
match self.resolve_ads_gateway(&gw).await {
Ok(resolved) => {
let result = if let Some(arxiv_id) = extract_arxiv_id_from_url(&resolved) {
self.download_arxiv_html_with_fallback(&arxiv_id, &html_dest).await
} else {
self.download_html_direct(&resolved, &html_dest, "EPRINT_HTML").await
};
match result {
Ok(_) => { html_ok = Some(html_dest.clone()); }
Err(e) => warn!("[EPRINT_HTML] Download failed: {:?}", e),
}
}
Err(e) => warn!("[EPRINT_HTML] Gateway resolution failed: {:?}", e),
}
}
(pdf_ok, html_ok)
}
}
fn strip_arxiv_version(arxiv_id: &str) -> String {
use regex::Regex;
static RE_VERSION: std::sync::OnceLock<Regex> = std::sync::OnceLock::new();
let re = RE_VERSION.get_or_init(|| Regex::new(r"v\d+$").unwrap());
re.replace(arxiv_id, "").to_string()
}
fn extract_arxiv_id_from_url(url: &str) -> Option<String> {
let patterns = [
"arxiv.org/abs/",
"arxiv.org/pdf/",
"arxiv.org/html/",
"ar5iv.labs.arxiv.org/html/",
"ar5iv.org/abs/",
"ar5iv.org/html/",
];
for pat in &patterns {
if let Some(pos) = url.find(pat) {
let id_raw = &url[pos + pat.len()..];
let mut id_clean = id_raw.split('?').next().unwrap_or(id_raw)
.split('#').next().unwrap_or(id_raw)
.trim_end_matches('/')
.to_string();
if id_clean.to_lowercase().ends_with(".pdf") {
id_clean.truncate(id_clean.len() - 4);
}
if id_clean.to_lowercase().ends_with(".html") {
id_clean.truncate(id_clean.len() - 5);
}
let id = strip_arxiv_version(&id_clean);
if !id.is_empty() {
return Some(id);
}
}
}
None
}
#[cfg(test)]
mod tests {
use super::*;
use axum::{Router, routing::get, response::Redirect};
#[tokio::test]
async fn test_resolve_ads_gateway_perfdrive() {
let listener = std::net::TcpListener::bind("127.0.0.1:0").unwrap();
listener.set_nonblocking(true).unwrap();
let port = listener.local_addr().unwrap().port();
let target_ssc = "https%3A%2F%2Fexample.com%2Ftarget.pdf";
let redirect_to = format!("https://validate.perfdrive.com/?ssc={}", target_ssc);
let app = Router::new().route("/gate", get(move || {
let r = redirect_to.clone();
async move { Redirect::to(&r) }
}));
let server = axum::serve(
tokio::net::TcpListener::from_std(listener).unwrap(),
app,
);
tokio::spawn(async move { let _ = server.await; });
let downloader = Downloader::new();
let gateway_url = format!("http://127.0.0.1:{}/gate", port);
let result = downloader.resolve_ads_gateway(&gateway_url).await;
assert_eq!(result.unwrap(), "https://example.com/target.pdf");
}
#[test]
fn test_validate_pdf_content_valid() {
let mut pdf = b"%PDF-1.7 ".to_vec();
pdf.extend(vec![0u8; 5100]);
pdf.extend(b"%%EOF");
assert!(validate_pdf_content(&pdf).is_ok());
}
#[test]
fn test_validate_pdf_content_html() {
let html = b"<html><body>please log in</body></html>".to_vec();
let result = validate_pdf_content(&html);
assert!(result.is_err());
}
#[test]
fn test_detect_anti_bot_cloudflare() {
let result = detect_anti_bot("just a moment please", None);
assert!(result.is_err());
}
#[test]
fn test_detect_anti_bot_clean() {
let result = detect_anti_bot("<html><body><h1>Abstract</h1><p>We study...</p></body></html>", None);
assert!(result.is_ok());
}
#[test]
fn test_strip_arxiv_version() {
assert_eq!(strip_arxiv_version("2101.00001v2"), "2101.00001");
assert_eq!(strip_arxiv_version("hep-th/9901001v3"), "hep-th/9901001");
assert_eq!(strip_arxiv_version("2101.00001"), "2101.00001");
}
#[test]
fn test_extract_arxiv_id_from_url() {
assert_eq!(
extract_arxiv_id_from_url("https://arxiv.org/abs/2101.00001v2"),
Some("2101.00001".to_string())
);
assert_eq!(
extract_arxiv_id_from_url("https://arxiv.org/pdf/hep-th/9901001v3.pdf"),
Some("hep-th/9901001".to_string())
);
assert_eq!(
extract_arxiv_id_from_url("https://ar5iv.labs.arxiv.org/html/2101.00001"),
Some("2101.00001".to_string())
);
}
}
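
The ordered fallback used by `download_paper` (PUB_PDF, then EPRINT_PDF, then CrossRef) can be sketched in isolation. The following is a minimal, synchronous, std-only illustration, not the project's implementation: the helper name `first_success` and its signature are assumptions introduced here, and the real code is async and uses labeled blocks with `break` instead.

```rust
/// Hypothetical helper (not part of the original source): try each source
/// label in order and stop at the first success. Returns the winning
/// `(label, value)` pair, if any, plus the errors collected from the
/// sources that were tried before it.
pub fn first_success<T, E>(
    sources: &[&str],
    mut attempt: impl FnMut(&str) -> Result<T, E>,
) -> (Option<(String, T)>, Vec<(String, E)>) {
    let mut failures = Vec::new();
    for &source in sources {
        match attempt(source) {
            // First success wins; later sources are never contacted
            Ok(value) => return (Some((source.to_string(), value)), failures),
            // Record the failure and fall through to the next source
            Err(err) => failures.push((source.to_string(), err)),
        }
    }
    (None, failures)
}
```

Calling it with `&["PUB_PDF", "EPRINT_PDF", "CrossRef"]` and a closure that performs one download attempt per label would mirror the PDF chain; returning the accumulated failures keeps the per-source warnings available to the caller rather than only logging them.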

1014
src/handlers.rs Normal file

File diff suppressed because it is too large

147
src/main.rs Normal file

@ -0,0 +1,147 @@
// src/main.rs
mod config;
mod qiniu;
mod ads;
mod arxiv;
mod download;
mod translation;
mod parser;
mod handlers;
use std::net::SocketAddr;
use std::sync::Arc;
use axum::{
routing::{get, post},
Router,
};
use tower_http::cors::{Any, CorsLayer};
use tower_http::services::ServeDir;
use sqlx::sqlite::SqlitePoolOptions;
use tracing::{info, error};
use crate::config::Config;
use crate::translation::Dictionary;
use crate::qiniu::QiniuClient;
use crate::ads::AdsClient;
use crate::arxiv::ArxivClient;
use crate::download::Downloader;
use crate::handlers::AppState;
#[tokio::main]
async fn main() -> anyhow::Result<()> {
// 1. Initialize the logger
tracing_subscriber::fmt()
.with_env_filter(
tracing_subscriber::EnvFilter::try_from_default_env()
.unwrap_or_else(|_| tracing_subscriber::EnvFilter::new("info,astroresearch=debug")),
)
.init();
info!("Starting the AstroResearch astronomy literature assistant backend...");
// 2. Load configuration from environment variables
let config = Config::from_env();
info!("Configuration loaded. Local SQLite connection string: {}", config.database_url);
// Create the on-disk library layout; create_dir_all also creates library_dir itself.
// Errors are propagated instead of ignored so that a read-only or missing volume fails fast.
for sub in ["PDF", "HTML", "Markdown", "Translation"] {
std::fs::create_dir_all(config.library_dir.join(sub))?;
}
// 3. Initialize the local SQLite database file and connection pool
if let Some(db_path) = config.database_url.strip_prefix("sqlite://") {
if !db_path.contains(":memory:") {
let path = std::path::Path::new(db_path);
if !path.exists() {
// A bare file name has an empty parent, which needs no directory
if let Some(parent) = path.parent().filter(|p| !p.as_os_str().is_empty()) {
std::fs::create_dir_all(parent)?;
}
std::fs::File::create(path)?;
info!("Created local SQLite database file: {:?}", path);
}
}
}
let pool = SqlitePoolOptions::new()
.max_connections(5)
.connect(&config.database_url)
.await?;
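info!("SQLite database connection established.");
// 4. Run the database migrations automatically
info!("Running SQL schema migrations...");
sqlx::migrate!("./migrations")
.run(&pool)
.await?;
info!("Database migrations complete; tables are ready.");
// 5. Load the astronomy terminology glossary (a synchronous file read at startup)
let mut dict = Dictionary::new();
if let Err(e) = dict.load_from_file("dictionary.txt") {
error!("Failed to load the astronomy glossary: {}", e);
}
// 6. Initialize and configure all API and download clients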
let qiniu = QiniuClient::new(
config.qiniu_ak.clone(),
config.qiniu_sk.clone(),
config.qiniu_bucket.clone(),
config.qiniu_domain.clone(),
);
let ads = AdsClient::new(config.ads_api_key.clone());
let arxiv = ArxivClient::new();
let downloader = Downloader::new();
let app_state = Arc::new(AppState {
config: config.clone(),
db: pool,
dict,
qiniu,
ads,
arxiv,
downloader,
});
// 7. Set up the Axum routes, CORS headers, and static hosting for the React dashboard
let cors = CorsLayer::new()
.allow_origin(Any)
.allow_methods(Any)
.allow_headers(Any);
let api_routes = Router::new()
.route("/search", get(handlers::search_papers))
.route("/download", post(handlers::download_paper))
.route("/parse", post(handlers::parse_paper))
.route("/translate", post(handlers::translate_paper))
.route("/citations", get(handlers::get_citation_network))
.route("/paper", get(handlers::get_paper_detail))
.route("/library", get(handlers::get_library))
.route("/export", post(handlers::export_citations))
.route("/notes", post(handlers::create_note))
.route("/notes", get(handlers::get_notes))
.route("/notes", axum::routing::delete(handlers::delete_note));
// Static file hosting: once the frontend is built into dashboard/dist, it is served from the root route
let serve_dir = ServeDir::new("dashboard/dist")
.fallback(tower_http::services::ServeFile::new("dashboard/dist/index.html"));
let app = Router::new()
.nest("/api", api_routes)
.fallback_service(serve_dir)
.layer(cors)
.with_state(app_state);
let addr = SocketAddr::from(([0, 0, 0, 0], config.port));
let listener = tokio::net::TcpListener::bind(addr).await?;
// Log only after the bind has succeeded
info!("AstroResearch service listening on http://localhost:{}", config.port);
axum::serve(listener, app).await?;
Ok(())
}
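
Step 3 of `main` derives the database file path from `DATABASE_URL` before the pool connects. A small std-only sketch of that derivation is shown here; `sqlite_file_path` is a hypothetical helper name, and stripping a `?query` suffix is an extra assumption that the original code does not make.

```rust
use std::path::PathBuf;

/// Hypothetical helper (not part of the original source): extract the
/// on-disk file path from a `sqlite://` URL. Returns `None` for
/// non-SQLite URLs and for in-memory databases, where no file is needed.
pub fn sqlite_file_path(database_url: &str) -> Option<PathBuf> {
    // `strip_prefix` removes only a leading scheme, unlike `replace`,
    // which would also rewrite later occurrences in the string
    let rest = database_url.strip_prefix("sqlite://")?;
    // Assumption: drop connection options such as `?mode=rwc`
    let path = rest.split('?').next().unwrap_or(rest);
    if path.is_empty() || path.contains(":memory:") {
        return None;
    }
    Some(PathBuf::from(path))
}
```

With the template value `sqlite://astro_research.db` from `.env.example` this yields the relative path `astro_research.db`, whose parent directory is empty and therefore needs no `create_dir_all` call.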

427
src/parser.rs Normal file

@ -0,0 +1,427 @@
// src/parser.rs
use std::fs;
use std::path::Path;
use serde::Deserialize;
use reqwest::multipart;
use tracing::{info, warn};
use regex::Regex;
use base64::Engine;
use crate::config::Config;
use crate::qiniu::QiniuClient;
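// Clean up the HTML structure, keep only the article body, and convert it to standard Markdown
pub fn html_to_markdown(html_path: &Path) -> anyhow::Result<String> {
info!("Parsing local HTML and extracting Markdown: {:?}", html_path);
let html_content = fs::read_to_string(html_path)?;
// Cut off the footer and everything after it so unrelated content does not interfere with parsing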
let mut truncated_html = html_content.as_str();
if let Some(end) = html_content.find("<div class=\"ar5iv-footer\">") {
truncated_html = &html_content[..end];
} else if let Some(end) = html_content.find("<footer") {
truncated_html = &html_content[..end];
}
let mut main_html = truncated_html;
// Locate the main content block to drop headers, sidebars, and ads that precede it
if let Some(start) = truncated_html.find("<div class=\"ltx_page_main\">") {
main_html = &truncated_html[start..];
} else if let Some(start) = truncated_html.find("<main") {
main_html = &truncated_html[start..];
} else if let Some(start) = truncated_html.find("<article") {
main_html = &truncated_html[start..];
} else if let Some(start) = truncated_html.find("<body") {
main_html = &truncated_html[start..];
}
// Preprocess: remove navigation bars, headers, sidebars, and other non-body structures
let nav_re = Regex::new(r#"(?s)<nav[^>]*>.*?</nav>"#).unwrap();
let preprocessed_html = nav_re.replace_all(main_html, "").to_string();
let ltx_nav_re = Regex::new(r#"(?s)<div[^>]*class="[^"]*ltx_(?:page_navbar|header|navigation)[^"]*"[^>]*>.*?</div>"#).unwrap();
let preprocessed_html = ltx_nav_re.replace_all(&preprocessed_html, "").to_string();
// Preprocess: replace <math ...>...</math> formulas with placeholders up front so that their LaTeX is not mangled by heading parsing, caption parsing, or html2md
let mut formulas = Vec::new();
let math_re = Regex::new(r#"(?s)<math\s+([^>]*?)>(.*?)</math>"#).unwrap();
let mut placeholder_counter = 0;
let preprocessed_html = math_re.replace_all(&preprocessed_html, |caps: &regex::Captures| {
let attrs = &caps[1];
let alttext_re = Regex::new(r#"alttext="([^"]*)""#).unwrap();
let mut alttext = alttext_re.captures(attrs)
.map(|c| c[1].to_string())
.unwrap_or_default();
// If alttext is empty, fall back to the LaTeX in <annotation encoding="application/x-tex">
if alttext.is_empty() {
let annotation_re = Regex::new(r#"(?s)<annotation\s+[^>]*encoding="application/x-tex"[^>]*>(.*?)</annotation>"#).unwrap();
if let Some(ann_caps) = annotation_re.captures(&caps[2]) {
alttext = ann_caps[1].trim().to_string();
}
}
let is_block = attrs.contains("display=\"block\"") || attrs.contains("display='block'");
formulas.push((alttext, is_block));
let placeholder = format!(" MATHPLACEHOLDER{} ", placeholder_counter);
placeholder_counter += 1;
placeholder
}).to_string();
// Preprocess: convert ltx_section title markup into Markdown headings of the matching level
// ## for section, ### for subsection, #### for subsubsection
let sec_re = Regex::new(r#"(?s)<(?:h[1-6])[^>]*class="[^"]*ltx_title_section[^"]*"[^>]*>(.*?)</(?:h[1-6])>"#).unwrap();
let preprocessed_html = sec_re.replace_all(&preprocessed_html, |caps: &regex::Captures| {
let inner = strip_html_tags(&caps[1]);
format!("\n\n## {}\n\n", inner.trim())
}).to_string();
let subsec_re = Regex::new(r#"(?s)<(?:h[1-6])[^>]*class="[^"]*ltx_title_subsection[^"]*"[^>]*>(.*?)</(?:h[1-6])>"#).unwrap();
let preprocessed_html = subsec_re.replace_all(&preprocessed_html, |caps: &regex::Captures| {
let inner = strip_html_tags(&caps[1]);
format!("\n\n### {}\n\n", inner.trim())
}).to_string();
let subsubsec_re = Regex::new(r#"(?s)<(?:h[1-6])[^>]*class="[^"]*ltx_title_subsubsection[^"]*"[^>]*>(.*?)</(?:h[1-6])>"#).unwrap();
let preprocessed_html = subsubsec_re.replace_all(&preprocessed_html, |caps: &regex::Captures| {
let inner = strip_html_tags(&caps[1]);
format!("\n\n#### {}\n\n", inner.trim())
}).to_string();
// Preprocess: convert figcaption elements into Markdown caption blocks
let figcaption_re = Regex::new(r#"(?s)<figcaption[^>]*>(.*?)</figcaption>"#).unwrap();
let preprocessed_html = figcaption_re.replace_all(&preprocessed_html, |caps: &regex::Captures| {
let inner = strip_html_tags(&caps[1]);
format!("\n> **Figure:** {}\n", inner.trim())
}).to_string();
// Preprocess: convert ltx_caption (LaTeXML figure/table captions) into caption blocks
let ltx_caption_re = Regex::new(r#"(?s)<(?:span|div|p)[^>]*class="[^"]*ltx_caption[^"]*"[^>]*>(.*?)</(?:span|div|p)>"#).unwrap();
let preprocessed_html = ltx_caption_re.replace_all(&preprocessed_html, |caps: &regex::Captures| {
let inner = strip_html_tags(&caps[1]);
format!("\n> **Caption:** {}\n", inner.trim())
}).to_string();
// Preprocess: convert the ltx_title_document article title into an h1 heading
let title_re = Regex::new(r#"(?s)<(?:h[1-6])[^>]*class="[^"]*ltx_title_document[^"]*"[^>]*>(.*?)</(?:h[1-6])>"#).unwrap();
let preprocessed_html = title_re.replace_all(&preprocessed_html, |caps: &regex::Captures| {
let inner = strip_html_tags(&caps[1]);
format!("\n# {}\n\n", inner.trim())
}).to_string();
// Preprocess sup, sub, and inf tags into cleaner ^{...} / _{...} notation, since html2md does not convert superscripts and subscripts that carry attributes
let sup_re = Regex::new(r#"(?s)<sup[^>]*>(.*?)</sup>"#).unwrap();
let preprocessed_html = sup_re.replace_all(&preprocessed_html, "^{$1}").to_string();
let sub_re = Regex::new(r#"(?s)<sub[^>]*>(.*?)</sub>"#).unwrap();
let preprocessed_html = sub_re.replace_all(&preprocessed_html, "_{$1}").to_string();
let inf_re = Regex::new(r#"(?s)<inf[^>]*>(.*?)</inf>"#).unwrap();
let preprocessed_html = inf_re.replace_all(&preprocessed_html, "_{$1}").to_string();
// Preprocess: strip <cite> tags so html2md does not turn them into blockquotes (>) that break inline citations across lines
let cite_start_re = Regex::new(r#"(?s)<cite[^>]*>"#).unwrap();
let preprocessed_html = cite_start_re.replace_all(&preprocessed_html, "").to_string();
let cite_end_re = Regex::new(r#"(?s)</cite>"#).unwrap();
let preprocessed_html = cite_end_re.replace_all(&preprocessed_html, "").to_string();
// Preprocess <img> tags: expand root-relative image paths into absolute ar5iv URLs and emit standard Markdown image syntax ![alt](url)
let img_re = Regex::new(r#"(?s)<img\s+([^>]*?)>"#).unwrap();
let preprocessed_html = img_re.replace_all(&preprocessed_html, |caps: &regex::Captures| {
let attrs = &caps[1];
let src_re = Regex::new(r#"src="([^"]*)""#).unwrap();
let src = src_re.captures(attrs)
.map(|c| c[1].to_string())
.unwrap_or_default();
let alt_re = Regex::new(r#"alt="([^"]*)""#).unwrap();
let alt = alt_re.captures(attrs)
.map(|c| c[1].to_string())
.unwrap_or_else(|| "image".to_string());
let absolute_src = if src.starts_with('/') {
format!("https://ar5iv.labs.arxiv.org{}", src)
} else {
src
};
format!("![{}]({})", alt, absolute_src)
}).to_string();
// Preprocess LaTeXML's span-based table markup: convert the spans that emulate tabular/tr/td/th into real <table> elements so the Markdown layout holds
let td_re = Regex::new(r#"(?s)<span\s+([^>]*?class="[^"]*ltx_t[dh][^"]*"[^>]*?)>(.*?)</span>"#).unwrap();
let preprocessed_html = td_re.replace_all(&preprocessed_html, " <td>$2</td> ").to_string();
let tr_re = Regex::new(r#"(?s)<span\s+([^>]*?class="[^"]*ltx_tr[^"]*"[^>]*?)>(.*?)</span>"#).unwrap();
let preprocessed_html = tr_re.replace_all(&preprocessed_html, " <tr>$2</tr> ").to_string();
let table_re = Regex::new(r#"(?s)<span\s+([^>]*?class="[^"]*ltx_tabular[^"]*"[^>]*?)>(.*?)</span>"#).unwrap();
let preprocessed_html = table_re.replace_all(&preprocessed_html, " <table>$2</table> ").to_string();
let mut markdown = html2md::parse_html(&preprocessed_html);
// Restore the formula placeholders as clean LaTeX ($...$ or $$...$$), in reverse index order to avoid prefix collisions (e.g. MATHPLACEHOLDER1 matching the start of MATHPLACEHOLDER10)
for i in (0..formulas.len()).rev() {
let (ref alttext, is_block) = formulas[i];
let placeholder = format!("MATHPLACEHOLDER{}", i);
let replacement = if is_block {
format!(" $${}$$ ", alttext)
} else {
format!(" ${}$ ", alttext)
};
markdown = markdown.replace(&placeholder, &replacement);
}
let cleaned = postprocess_markdown(&markdown);
Ok(cleaned)
}
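// Remove leftover markup from the Markdown and clean up each line
fn postprocess_markdown(text: &str) -> String {
// Trim leading/trailing whitespace line by line, preserving indentation inside fenced code blocks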
let mut clean_lines = Vec::new();
let mut in_code_block = false;
for line in text.lines() {
let trimmed = line.trim();
if trimmed.starts_with("```") {
in_code_block = !in_code_block;
}
if in_code_block {
clean_lines.push(line.to_string());
} else {
clean_lines.push(trimmed.to_string());
}
}
let mut md = clean_lines.join("\n");
let div_re = Regex::new(r"</?div[^>]*>").unwrap();
let span_re = Regex::new(r"</?span[^>]*>").unwrap();
md = div_re.replace_all(&md, "").to_string();
md = span_re.replace_all(&md, "").to_string();
let empty_brackets = Regex::new(r"\[\]").unwrap();
md = empty_brackets.replace_all(&md, "").to_string();
let excessive_newlines = Regex::new(r"\n{4,}").unwrap();
md = excessive_newlines.replace_all(&md, "\n\n\n").to_string();
// Undo the escaping that html2md applies to heading, blockquote, and bold markers
let unescape_h1 = Regex::new(r"\\#\s+").unwrap();
let unescape_h2 = Regex::new(r"\\##\s+").unwrap();
let unescape_h3 = Regex::new(r"\\###\s+").unwrap();
let unescape_h4 = Regex::new(r"\\####\s+").unwrap();
let unescape_quote = Regex::new(r"\\>\s+").unwrap();
let unescape_bold = Regex::new(r"\\\*\\\*").unwrap();
md = unescape_h1.replace_all(&md, "# ").to_string();
md = unescape_h2.replace_all(&md, "## ").to_string();
md = unescape_h3.replace_all(&md, "### ").to_string();
md = unescape_h4.replace_all(&md, "#### ").to_string();
md = unescape_quote.replace_all(&md, "> ").to_string();
md = unescape_bold.replace_all(&md, "**").to_string();
// Decode HTML entities so math symbols such as < and > render correctly in Markdown/LaTeX
md = md
.replace("&lt;", "<")
.replace("&gt;", ">")
.replace("&amp;", "&")
.replace("&quot;", "\"")
.replace("&#39;", "'");
md.trim().to_string()
}
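// Strip HTML tags and return plain text (used when extracting headings and captions)
fn strip_html_tags(html: &str) -> String {
let tag_re = Regex::new(r"<[^>]+>").unwrap();
let text = tag_re.replace_all(html, "").to_string();
// Decode common HTML entities. `&amp;` is handled last so that `&amp;lt;`
// yields the literal text `&lt;` instead of being decoded twice into `<`.
text.replace("&lt;", "<")
.replace("&gt;", ">")
.replace("&quot;", "\"")
.replace("&nbsp;", " ")
.replace("&#39;", "'")
.replace("&amp;", "&")
}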
// Parse a PDF through the remote MinerU API, then upload the extracted images to Qiniu and rewrite their links
pub async fn parse_pdf_via_mineru(
pdf_path: &Path,
qiniu_client: &QiniuClient,
config: &Config
) -> anyhow::Result<String> {
info!("Requesting MinerU to parse local PDF: {:?}", pdf_path);
if config.mineru_api_url.is_empty() {
return Err(anyhow::anyhow!("MINERU_API_URL is not set in the .env environment"));
}
let pdf_bytes = fs::read(pdf_path)?;
let filename = pdf_path.file_name()
.and_then(|f| f.to_str())
.unwrap_or("paper.pdf")
.to_string();
let file_part = multipart::Part::bytes(pdf_bytes).file_name(filename);
let form = multipart::Form::new()
.part("file", file_part);
info!("Sending PDF bytes to the MinerU endpoint: {}", config.mineru_api_url);
let client = reqwest::Client::new();
let mut request = client.post(&config.mineru_api_url).multipart(form);
if !config.mineru_api_key.is_empty() {
request = request.header("Authorization", format!("Bearer {}", config.mineru_api_key));
}
let response = request.send().await?;
if !response.status().is_success() {
return Err(anyhow::anyhow!("MinerU parse endpoint returned an error status: {}", response.status()));
}
// The MinerU service responds with JSON containing the converted Markdown body and an image map
#[derive(Deserialize)]
struct MinerUResponse {
markdown: String,
images: Option<std::collections::HashMap<String, String>>, // image file name -> Base64 string
}
let result: MinerUResponse = response.json().await?;
let mut markdown = result.markdown;
// Upload the images and rewrite the Markdown links
if let Some(images) = result.images {
if qiniu_client.is_configured() {
info!("MinerU extracted {} figure(s); uploading them to Qiniu...", images.len());
for (img_name, base64_data) in images {
if let Ok(img_bytes) = base64::engine::general_purpose::STANDARD.decode(base64_data) {
match qiniu_client.upload_buffer(img_bytes, &img_name).await {
Ok(qiniu_url) => {
// Replace the local temporary image path in the Markdown with the Qiniu CDN URL
let escaped_img_name = regex::escape(&img_name);
let link_re = Regex::new(&format!(r"\(([^)]*?){}\)", escaped_img_name)).unwrap();
markdown = link_re.replace_all(&markdown, |_: &regex::Captures| {
format!("({})", qiniu_url)
}).to_string();
},
Err(e) => warn!("Failed to upload image {} to Qiniu: {}", img_name, e),
}
} else {
// Previously a malformed payload was dropped silently
warn!("Skipping image {}: invalid Base64 payload", img_name);
}
}
} else {
warn!("Qiniu is not configured; extracted images keep their temporary paths and will not preview externally or in Obsidian");
}
}
Ok(markdown)
}
#[cfg(test)]
mod tests {
use super::*;
use std::io::Write;
#[test]
fn test_postprocess_markdown() {
let dirty = "<div>Hello</div> <span class=\"abc\">World</span> [] &lt;math&gt;\n\n\n\n\nNew Paragraph";
let cleaned = postprocess_markdown(dirty);
assert_eq!(cleaned, "Hello World <math>\n\n\nNew Paragraph");
}
#[test]
fn test_html_to_markdown() -> anyhow::Result<()> {
let html_content = r#"
<!DOCTYPE html>
<html>
<body>
<div class="ltx_page_main">
<h1>Test Document</h1>
<p>This is a <strong>test</strong> paragraph.</p>
</div>
</body>
</html>
"#;
let mut path = std::env::temp_dir();
path.push("test_doc.html");
{
let mut file = std::fs::File::create(&path)?;
file.write_all(html_content.as_bytes())?;
}
let md = html_to_markdown(&path);
let _ = std::fs::remove_file(&path);
let md_content = md?;
assert!(md_content.contains("Test Document"));
assert!(md_content.contains("This is a **test** paragraph."));
Ok(())
}
#[test]
fn test_html_to_markdown_math_and_table() -> anyhow::Result<()> {
let html_content = r#"
<div class="ltx_page_main">
<p>Here is math: <math alttext="\approx" display="inline"><semantics><mo></mo><annotation-xml><approx></approx></annotation-xml><annotation>\approx</annotation></semantics></math> and block <math alttext="\sum_{i=1}^n" display="block">...</math></p>
<span class="ltx_tabular">
<span class="ltx_tr">
<span class="ltx_td">sdB</span>
<span class="ltx_td">subdwarf B</span>
</span>
</span>
</div>
"#;
let mut path = std::env::temp_dir();
path.push("test_math_table.html");
{
let mut file = std::fs::File::create(&path)?;
file.write_all(html_content.as_bytes())?;
}
let md = html_to_markdown(&path);
let _ = std::fs::remove_file(&path);
let md_content = md?;
// Verify that the formulas are restored as unescaped LaTeX
assert!(md_content.contains(r#"$\approx$"#));
assert!(md_content.contains(r#"$$\sum_{i=1}^n$$"#));
// Verify that the table cell contents survive the conversion
assert!(md_content.contains("sdB"));
assert!(md_content.contains("subdwarf B"));
Ok(())
}
#[test]
fn test_html_to_markdown_math_in_headings_and_captions() -> anyhow::Result<()> {
let html_content = r#"
<div class="ltx_page_main">
<h2 class="ltx_title_section">Heading with math <math alttext="\theta_{eff}" display="inline"><semantics><mo></mo><annotation>\theta_{eff}</annotation></semantics></math></h2>
<figcaption>Figure caption with inline math <math alttext="M_\odot" display="inline"><semantics><mo></mo><annotation>M_\odot</annotation></semantics></math> details.</figcaption>
</div>
"#;
let mut path = std::env::temp_dir();
path.push("test_math_heading_caption.html");
{
let mut file = std::fs::File::create(&path)?;
file.write_all(html_content.as_bytes())?;
}
let md = html_to_markdown(&path);
let _ = std::fs::remove_file(&path);
let md_content = md?;
println!("Markdown content:\n{}", md_content);
// Verify that placeholders inside headings and captions survive strip_html_tags and are restored as correct LaTeX
assert!(md_content.contains("## Heading with math"));
assert!(md_content.contains(r#"$\theta_{eff}$"#));
assert!(md_content.contains("> **Figure:** Figure caption with inline math"));
assert!(md_content.contains(r#"$M_\odot$"#));
Ok(())
}
}
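
`html_to_markdown` protects formulas with numbered placeholders and restores them in reverse index order. The sketch below shows that technique on its own using only the standard library; the function names `protect` and `restore` and the plain-string delimiters are assumptions for illustration, whereas the original works on `<math>` elements with the `regex` crate.

```rust
const TOKEN: &str = "MATHPLACEHOLDER";

/// Hypothetical helper: replace every `open ... close` span with a numbered
/// token and return the rewritten text plus the saved span contents.
pub fn protect(text: &str, open: &str, close: &str) -> (String, Vec<String>) {
    let mut out = String::new();
    let mut saved = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find(open) {
        let after_open = &rest[start + open.len()..];
        // An unterminated span is left untouched
        let end = match after_open.find(close) {
            Some(end) => end,
            None => break,
        };
        out.push_str(&rest[..start]);
        out.push_str(&format!(" {}{} ", TOKEN, saved.len()));
        saved.push(after_open[..end].to_string());
        rest = &after_open[end + close.len()..];
    }
    out.push_str(rest);
    (out, saved)
}

/// Hypothetical helper: put the saved spans back as inline `$...$` math.
/// Highest index first, so token 1 is never matched inside token 10.
pub fn restore(text: &str, saved: &[String]) -> String {
    let mut out = text.to_string();
    for (i, original) in saved.iter().enumerate().rev() {
        out = out.replace(&format!("{}{}", TOKEN, i), &format!("${}$", original));
    }
    out
}
```

Restoring in ascending order would rewrite the `MATHPLACEHOLDER1` prefix of `MATHPLACEHOLDER10` first and leave a stray `0` behind; an alternative design is a delimited token such as `MATHPLACEHOLDER10END`, which removes the ordering constraint at the cost of a longer marker.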

146
src/qiniu.rs Normal file

@ -0,0 +1,146 @@
use sha1::Sha1;
use hmac::{Hmac, Mac};
use base64::{Engine as _, engine::general_purpose::URL_SAFE};
use reqwest::multipart;
use tracing::{info, error};
type HmacSha1 = Hmac<Sha1>;
// Qiniu Cloud object-storage client
pub struct QiniuClient {
access_key: String,
secret_key: String,
bucket: String,
domain: String,
client: reqwest::Client,
}
impl QiniuClient {
pub fn new(access_key: String, secret_key: String, bucket: String, domain: String) -> Self {
QiniuClient {
access_key,
secret_key,
bucket,
domain,
client: reqwest::Client::new(),
}
}
// Whether the required settings are all present
pub fn is_configured(&self) -> bool {
!self.access_key.is_empty() && !self.secret_key.is_empty() && !self.bucket.is_empty()
}
// Build an upload token per the Qiniu spec: HMAC-SHA1 over the URL-safe Base64 put policy
fn generate_upload_token(&self, key: &str) -> String {
// The token expires in one hour
let deadline = chrono::Utc::now().timestamp() + 3600;
let policy = serde_json::json!({
"scope": format!("{}:{}", self.bucket, key),
"deadline": deadline
});
let policy_str = policy.to_string();
// Qiniu's URL-safe Base64 keeps the trailing `=` padding, so the padded engine is used
let encoded_policy = URL_SAFE.encode(policy_str.as_bytes());
let mut mac = HmacSha1::new_from_slice(self.secret_key.as_bytes())
.expect("HMAC accepts keys of any length");
mac.update(encoded_policy.as_bytes());
let result = mac.finalize();
let signature = result.into_bytes();
let encoded_signature = URL_SAFE.encode(&signature);
format!("{}:{}:{}", self.access_key, encoded_signature, encoded_policy)
}
// Upload a byte buffer (such as an image) to Qiniu and return its public CDN URL
pub async fn upload_buffer(&self, buffer: Vec<u8>, filename: &str) -> anyhow::Result<String> {
if !self.is_configured() {
return Err(anyhow::anyhow!("Qiniu settings are missing or incomplete in the local .env file"));
}
// Prefix the key with a millisecond timestamp to avoid overwriting files with the same name
let timestamp = chrono::Utc::now().timestamp_millis();
let key = format!("astroresearch_{}_{}", timestamp, filename);
let token = self.generate_upload_token(&key);
info!("Uploading extracted figure to Qiniu: key='{}'", key);
let form = multipart::Form::new()
.text("token", token)
.text("key", key.clone())
.part("file", multipart::Part::bytes(buffer).file_name(filename.to_string()));
let upload_url = "https://up.qiniu.com";
let response = self.client.post(upload_url)
.multipart(form)
.send()
.await?;
if !response.status().is_success() {
let status = response.status();
let body = response.text().await.unwrap_or_default();
error!("Qiniu image upload failed: status={}, body={}", status, body);
return Err(anyhow::anyhow!("Qiniu upload failed with status {}", status));
}
// Build the public access URL, defaulting to http:// when the domain has no scheme
let mut base_domain = self.domain.clone();
if !base_domain.starts_with("http://") && !base_domain.starts_with("https://") {
base_domain = format!("http://{}", base_domain);
}
if base_domain.ends_with('/') {
base_domain.pop();
}
let file_url = format!("{}/{}", base_domain, key);
info!("Qiniu image upload succeeded. URL: {}", file_url);
Ok(file_url)
}
}
#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn test_qiniu_configuration() {
        let client = QiniuClient::new(
            "".to_string(),
            "".to_string(),
            "".to_string(),
            "".to_string(),
        );
        assert!(!client.is_configured());
        let client2 = QiniuClient::new(
            "ak".to_string(),
            "sk".to_string(),
            "bucket".to_string(),
            "domain".to_string(),
        );
        assert!(client2.is_configured());
    }
    #[test]
    fn test_qiniu_token_generation() {
        let client = QiniuClient::new(
            "test_ak".to_string(),
            "test_sk".to_string(),
            "test_bucket".to_string(),
            "test_domain".to_string(),
        );
        let token = client.generate_upload_token("test_key.png");
        assert!(token.starts_with("test_ak:"));
        let parts: Vec<&str> = token.split(':').collect();
        assert_eq!(parts.len(), 3);
        assert_eq!(parts[0], "test_ak");
        assert!(!parts[1].is_empty());
        assert!(!parts[2].is_empty());
    }
}

232
src/translation.rs Normal file

@ -0,0 +1,232 @@
// src/translation.rs
use std::collections::{HashMap, HashSet};
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::path::Path;
use serde::Deserialize;
use tracing::{info, warn, error};
use crate::config::Config;
// English-to-Chinese glossary of astronomy terms, with phrase matching.
#[derive(Clone, Debug)]
pub struct Dictionary {
    // Lowercased English term -> standard Chinese translation.
    terms: HashMap<String, String>,
}
impl Dictionary {
    pub fn new() -> Self {
        Dictionary {
            terms: HashMap::new(),
        }
    }
    // Load the glossary from a local tab-separated text file (English<TAB>Chinese, one pair per line).
    pub fn load_from_file<P: AsRef<Path>>(&mut self, path: P) -> anyhow::Result<()> {
        let path_ref = path.as_ref();
        if !path_ref.exists() {
            warn!("Dictionary file not found; check the configured path: {:?}", path_ref);
            return Ok(());
        }
        info!("Loading astronomy glossary: {:?}", path_ref);
        let file = File::open(path_ref)?;
        let reader = BufReader::new(file);
        let mut count = 0;
        for line in reader.lines() {
            let line = line?;
            let parts: Vec<&str> = line.split('\t').collect();
            if parts.len() >= 2 {
                let english = parts[0].trim().to_lowercase();
                let chinese = parts[1].trim().to_string();
                if !english.is_empty() && !chinese.is_empty() {
                    self.terms.insert(english, chinese);
                    count += 1;
                }
            }
        }
        info!("Astronomy glossary loaded: {} term pairs imported", count);
        Ok(())
    }
    // Find glossary terms that occur in the English text and return (English, Chinese) pairs,
    // which are passed to the LLM as translation hints.
    pub fn match_text(&self, text: &str) -> Vec<(String, String)> {
        if self.terms.is_empty() {
            return Vec::new();
        }
        // Basic tokenization: keep letters, digits, hyphens and apostrophes; turn everything else into a space.
        let clean_text = text
            .chars()
            .map(|c| if c.is_alphanumeric() || c == '-' || c == '\'' || c == ' ' { c } else { ' ' })
            .collect::<String>();
        let words: Vec<&str> = clean_text.split_whitespace().collect();
        let mut matched = HashSet::new();
        let mut results = Vec::new();
        // Longest phrase considered, in words (astronomy terms rarely exceed six English words).
        let max_span: usize = 6;
        let n = words.len();
        let mut i = 0;
        while i < n {
            // Number of words consumed at position i; stays 1 when nothing matches.
            let mut advance = 1;
            // Try the longest candidate first and skip past it on a hit, so a match on
            // 'active galactic nucleus' is not followed by separate matches on its sub-terms.
            for len in (1..=max_span.min(n - i)).rev() {
                let original_phrase = words[i..i + len].join(" ");
                let phrase = original_phrase.to_lowercase();
                if let Some(chinese) = self.terms.get(&phrase) {
                    // Report each distinct term only once.
                    if matched.insert(phrase) {
                        results.push((original_phrase, chinese.clone()));
                    }
                    advance = len;
                    break;
                }
            }
            i += advance;
        }
        // List longer phrases first so shorter, overlapping terms do not mislead the LLM.
        results.sort_by(|a, b| b.0.len().cmp(&a.0.len()));
        results
    }
}
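// Illustrative sketch, not part of the original commit: the per-line parsing inside
// `load_from_file` written as a pure function, which makes the accepted file format easy to
// test in isolation. The name `parse_term_line` is hypothetical, and the sketch assumes the
// same tab-separated layout ("english<TAB>chinese") that `load_from_file` reads.
#[allow(dead_code)]
fn parse_term_line(line: &str) -> Option<(String, String)> {
    // Only the first two tab-separated fields are used; extra columns are ignored.
    let mut fields = line.split('\t');
    let english = fields.next()?.trim().to_lowercase();
    let chinese = fields.next()?.trim().to_string();
    // Reject lines where either side is blank, as the loader does.
    if english.is_empty() || chinese.is_empty() {
        return None;
    }
    Some((english, chinese))
}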
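// Collect glossary hints for the document, then call the LLM for a formula-preserving academic translation.
pub async fn translate_markdown(
    markdown_content: &str,
    dict: &Dictionary,
    config: &Config
) -> anyhow::Result<String> {
    if config.llm_api_key.is_empty() {
        return Err(anyhow::anyhow!("LLM_API_KEY is missing from the local configuration"));
    }
    // Scan the English text for terms present in the astronomy glossary.
    let matched_terms = dict.match_text(markdown_content);
    let mut terms_instruction = String::new();
    if !matched_terms.is_empty() {
        // Prompt text is sent in Chinese: "When translating, follow this astronomy glossary
        // and use the given Chinese terms exactly."
        terms_instruction.push_str("\n\n在翻译时,请遵循以下天文学名词对照表(严格使用对应的中文译名):\n");
        for (en, zh) in matched_terms.iter().take(50) { // Inject at most 50 entries to keep the prompt short.
            // Prompt line: "<en>" must be translated as "<zh>".
            terms_instruction.push_str(&format!("- \"{}\" 必须翻译为 \"{}\"\n", en, zh));
        }
    }
    let system_prompt = format!(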
"你是一位专业的天文学家和学术翻译家。请将以下英文天文学文献段落翻译成中文。\n\
\n\
1. \n\
2. ** LaTeX $...$ $$...$$ Markdown **\n\
3. {}\n\
",
        terms_instruction
    );
    info!("Requesting English-to-Chinese translation from the LLM. Model: {}", config.llm_model);
    let client = reqwest::Client::new();
    let url = format!("{}/chat/completions", config.llm_api_base);
    let payload = serde_json::json!({
        "model": config.llm_model,
        "messages": [
            {
                "role": "system",
                "content": system_prompt
            },
            {
                "role": "user",
                "content": markdown_content
            }
        ],
        "temperature": 0.3
    });
    let response = client.post(&url)
        .header("Authorization", format!("Bearer {}", config.llm_api_key))
        .header("Content-Type", "application/json")
        .json(&payload)
        .send()
        .await?;
    if !response.status().is_success() {
        let status = response.status();
        let body = response.text().await.unwrap_or_default();
        error!("LLM translation request failed: status={}, body={}", status, body);
        return Err(anyhow::anyhow!("LLM API returned an error status: {}", status));
    }
    // Minimal view of an OpenAI-compatible chat completion response; other fields are ignored.
    #[derive(Deserialize)]
    struct Message {
        content: String,
    }
    #[derive(Deserialize)]
    struct Choice {
        message: Message,
    }
    #[derive(Deserialize)]
    struct LLMResponse {
        choices: Vec<Choice>,
    }
    let res_data: LLMResponse = response.json().await?;
    if let Some(choice) = res_data.choices.first() {
        Ok(choice.message.content.clone())
    } else {
        Err(anyhow::anyhow!("LLM response contained no choices"))
    }
}
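// Illustrative sketch, not part of the original commit: `translate_markdown` joins
// LLM_API_BASE and "/chat/completions" with a plain `format!`, so a base URL that ends in
// '/' would yield a double slash. A small helper along these lines could normalize it first.
// The name `chat_completions_url` is hypothetical.
#[allow(dead_code)]
fn chat_completions_url(api_base: &str) -> String {
    // Strip any trailing slashes before appending the endpoint path.
    format!("{}/chat/completions", api_base.trim_end_matches('/'))
}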
#[cfg(test)]
mod tests {
    use super::*;
    use std::io::Write;
    #[test]
    fn test_dictionary_match() {
        let mut dict = Dictionary::new();
        // Mock glossary entries, including nested sub-terms.
        dict.terms.insert("active galactic nucleus".to_string(), "活动星系核".to_string());
        dict.terms.insert("galactic nucleus".to_string(), "星系核".to_string());
        dict.terms.insert("nucleus".to_string(), "核心".to_string());
        dict.terms.insert("black hole".to_string(), "黑洞".to_string());
        let text = "We study the active galactic nucleus and its central black hole.";
        let matched = dict.match_text(text);
        let phrases: Vec<String> = matched.iter().map(|(en, _)| en.clone()).collect();
        assert!(phrases.contains(&"active galactic nucleus".to_string()));
        assert!(phrases.contains(&"black hole".to_string()));
        // The longest phrase must come first.
        assert_eq!(matched[0].0, "active galactic nucleus");
    }
    #[test]
    fn test_load_from_file() -> anyhow::Result<()> {
        let mut path = std::env::temp_dir();
        path.push("test_astrodict.txt");
        {
            let mut file = File::create(&path)?;
            writeln!(file, "active galactic nucleus\t活动星系核")?;
            writeln!(file, "black hole\t黑洞")?;
        }
        let mut dict = Dictionary::new();
        let res = dict.load_from_file(&path);
        // Remove the temporary file before propagating any load error.
        let _ = std::fs::remove_file(&path);
        res?;
        assert_eq!(dict.terms.get("active galactic nucleus").unwrap(), "活动星系核");
        assert_eq!(dict.terms.get("black hole").unwrap(), "黑洞");
        Ok(())
    }
}