2026-06-08 14:54:56 +08:00

# Diary News · Private Multi-Source News Translation System

> Fetch authoritative overseas sources → automatic Chinese-English parallel text → AI formatting / classification / illustration / commentary.

> Runs on a Hong Kong VPS with 2 vCPUs, 2 GB RAM and a 30 GB disk; used by the owner plus family and a small circle of friends.

---

## Table of Contents

1. [Project Goals](#project-goals)
2. [Key Features](#key-features)
3. [Architecture Overview](#architecture-overview)
4. [Tech Stack](#tech-stack)
5. [Repository Layout](#repository-layout)
6. [Data Model](#data-model)
7. [Quick Start](#quick-start)
8. [Feature Details](#feature-details)
9. [LLM Enrichment](#llm-enrichment)
10. [API Overview](#api-overview)
11. [Dev-Deploy Workflow](#dev-deploy-workflow)
12. [Ops Tools](#ops-tools)
13. [Troubleshooting](#troubleshooting)
14. [Roadmap](#roadmap)

---
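
## Project Goals

**Why it exists**:

| Pain point | Solution |
|---|---|
| Filter bubbles (recommendation algorithms keep showing you one kind of content) | Sources shown side by side, no personalized ranking, purely chronological |
| Language barrier (English articles are hard going) | Automatic translation (Tencent Cloud TMT, with local NLLB as fallback) |
| Content is hard to preserve (pages 404, tweets get deleted) | Articles stored on fetch with full text + translation, searchable indefinitely |
| Cost-sensitive single server | The full stack fits on a 30 GB disk; monthly cost ≤ 50 HKD |

**Who it's for**:

- **owner**: the sole administrator; manages sources and prompts and watches the health dashboard
- **member**: family members / friends; log in to read articles, bookmark them and subscribe to keywords

---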
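
## Key Features

- 🌍 **Multi-source RSS fetching**: Reuters / BBC / Al Jazeera / NHK / DW, with failure backoff (after 3 consecutive failures the interval doubles, capped at 12 hours)
- 🌐 **Smart translation**: Tencent Cloud TMT (5 million characters/month quota) → falls back to local NLLB-200; a 30-day Redis cache avoids re-translating the same text
- 🤖 **LLM enrichment** *(new)*: once translation finishes, 4 LLM tasks run automatically: formatting / classification / illustration / commentary
- 🎨 **AI illustrations**: a text-to-image model generates an illustration for every article (via the Agnes platform, rate-limited)
- 👤 **Two-role auth**: JWT (access 60 min + refresh 14 d) + API Tokens (stored as sha256 hashes, revocable, reserved for Android)
- 📌 **Bookmarks + keyword subscriptions**: per-user bookmarks; the server periodically matches articles against subscribed keywords (push via a reserved Telegram channel, not sent in the MVP)
- 📊 **Admin dashboard**: source health / translation quota / LLM status, all visualized
- 🔄 **Hot reload**: editing sources or prompts needs no restart; the worker rebuilds its jobs every day at 00:30
- 🚀 **One-step deploy**: push an SSH public key once, then a one-command `git pull` flow
- 🔒 **Secure defaults**: bcrypt passwords, hashed API Tokens, protection against SQL injection (SQLAlchemy 2.0 parameterized queries)

---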

## Architecture Overview

```
┌──────────────────────────────────────────────────────────────┐
│  VPS (HK-News · 2C/2G/30G · Ubuntu 24)                       │
│                                                              │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐                    │
│  │ postgres │  │  redis   │  │  caddy   │ ← the only         │
│  └──────────┘  └──────────┘  │ :80/443  │   public entry     │
│       ▲             ▲        └────┬─────┘                    │
│       │             │             │                          │
│  ┌────┴─────┐  ┌────┴─────┐  ┌────┴───────────┐              │
│  │   api    │  │  worker  │  │    frontend    │              │
│  │ FastAPI  │  │ APSched  │  │ (nginx + SPA)  │              │
│  │          │  │  + jobs  │  │                │              │
│  └────┬─────┘  └────┬─────┘  └────────────────┘              │
│       │             │                                        │
│       └──────┬──────┘                                        │
│              ▼                                               │
│  ┌────────────────────────────┐                              │
│  │ RSS fetch (feedparser)     │                              │
│  │ Translation (Tencent+NLLB) │                              │
│  │ LLM enrichment (Agnes)     │ ← format / classify /        │
│  │ url_hash dedup             │   illustrate / comment       │
│  └────────────────────────────┘                              │
└──────────────────────────────────────────────────────────────┘
        │
        │ HTTPS / push / pull
        ▼
┌──────────────────────┐
│ Gitea (code hosting) │
│ http://...:3000      │
└──────────────────────┘
```

**Data flow** (the life of a single article):

```
RSS Feed → feedparser → FetchedItem
  ↓
url_hash = SHA1(url) + ON CONFLICT DO NOTHING   ← dedup
  ↓ (new article stored with translation_status=pending)
  ↓
[translation_loop] 1 article/sec, Semaphore(1)
  ↓ call Tencent TMT → body_zh_text/html
  ↓ status: pending → ok (or partial / failed)
  ↓
[enrichment_loop] scans translated articles whose *_status=pending
  ↓ call Agnes LLM: format → classify → illustrate → comment
  ↓ each of the 4 tasks has its own try/except; they share chat_sem + image_sem rate limiting
  ↓ status: ok / failed
  ↓
[article detail endpoint] original + translation + AI-formatted version + categories + illustration + commentary, all displayed
```
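
The dedup step relies on a deterministic key plus an idempotent insert. The sketch below illustrates the idea with Python's standard library only: SQLite (which accepts the same `ON CONFLICT ... DO NOTHING` clause since 3.24) stands in for PostgreSQL, and `insert_if_new` and the trimmed-down `articles` schema are hypothetical, not the project's actual code.

```python
import hashlib
import sqlite3


def url_hash(url: str) -> str:
    """SHA1 hex digest of the article URL (40 hex chars), used as the dedup key."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


def insert_if_new(conn: sqlite3.Connection, url: str, title: str) -> bool:
    """Insert a pending article; return True only if this URL was not stored yet."""
    cur = conn.execute(
        "INSERT INTO articles (url_hash, url, title, translation_status) "
        "VALUES (?, ?, ?, 'pending') "
        "ON CONFLICT (url_hash) DO NOTHING",
        (url_hash(url), url, title),
    )
    # rowcount is 0 when the conflict clause skipped the insert.
    return cur.rowcount == 1


# Demo against an in-memory database (stand-in for PostgreSQL).
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE articles ("
    "url_hash TEXT PRIMARY KEY, url TEXT, title TEXT, translation_status TEXT)"
)
print(insert_if_new(demo, "https://example.com/a", "A"))        # True
print(insert_if_new(demo, "https://example.com/a", "A again"))  # False
```

Because the second insert is a no-op rather than an error, re-fetching a feed that still lists old items is harmless, which is what makes the fetch step safe to re-run.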

---

## Tech Stack

### Backend

| Layer | Choice |
|---|---|
| Language | Python 3.12 |
| Web framework | FastAPI 0.115 + Uvicorn |
| ORM | SQLAlchemy 2.0 (asyncio) + Alembic |
| Database | PostgreSQL 16 (asyncpg + psycopg2) |
| Cache / rate limiting | Redis 7 (256 MB, LRU eviction) |
| HTTP client | httpx 0.28 (async) |
| RSS parsing | feedparser 6 |
| HTML extraction | trafilatura 2 + BeautifulSoup4 + lxml |
| Translation (primary) | Tencent Cloud TMT (SDK: tencentcloud-sdk-python) |
| Translation (fallback) | transformers + NLLB-200-distilled-600M |
| LLM | Agnes (OpenAI-compatible): agnes-2.0-flash / agnes-image-2.1-flash |
| Scheduling | APScheduler 3.10 (AsyncIO + Cron + Interval) |
| Auth | passlib[bcrypt] 1.7 + bcrypt 4.0.1 + PyJWT 2.10 |

### Frontend

| Layer | Choice |
|---|---|
| Framework | Vue 3.5 + Vite 5 |
| UI library | Naive UI 2.40 |
| State | Pinia 2.2 |
| Routing | vue-router 4.4 |
| HTTP | axios 1.7 (automatic token refresh on 401) |
| Dates | dayjs 1.11 |

### Deployment

| Layer | Choice |
|---|---|
| Containers | Docker Compose (7 services) |
| Reverse proxy | Caddy 2 (alpine) |
| Static files | nginx 1.27-alpine (SPA fallback) |
| Build | pyproject.toml (setuptools) + npm |

---

## Repository Layout

```
diary-news/
├── backend/                         # FastAPI backend
│   ├── app/
│   │   ├── api/                     # HTTP routes
│   │   │   ├── auth.py              # /auth/login, /auth/refresh
│   │   │   ├── me.py                # /me, /me/usage (translation quota)
│   │   │   ├── articles.py          # /articles list + detail + cursor pagination
│   │   │   ├── sources.py           # /sources read-only list
│   │   │   ├── bookmarks.py         # /bookmarks
│   │   │   ├── subscriptions.py     # /subscriptions keyword subscriptions
│   │   │   ├── admin.py             # /admin/sources, /admin/health, /admin/refresh
│   │   │   └── admin_llm.py         # /admin/llm/settings, /admin/llm/enrich/{id}
│   │   ├── core/
│   │   │   ├── security.py          # bcrypt + JWT + API Token
│   │   │   └── deps.py              # get_current_user, require_owner
│   │   ├── models/                  # SQLAlchemy 2.0 ORM
│   │   │   ├── user.py
│   │   │   ├── source.py
│   │   │   ├── article.py
│   │   │   ├── bookmark.py
│   │   │   ├── subscription.py
│   │   │   ├── api_token.py
│   │   │   └── llm_setting.py       # LLM settings (single-row table)
│   │   ├── schemas/                 # Pydantic v2 I/O models
│   │   │   ├── auth.py
│   │   │   ├── article.py
│   │   │   ├── source.py
│   │   │   ├── llm.py               # LLM settings schemas + default prompts
│   │   │   └── misc.py
│   │   ├── services/
│   │   │   ├── fetchers/            # RSS / HTML / Telegram fetchers
│   │   │   ├── translation/         # Tencent TMT + local NLLB + quota facade
│   │   │   └── llm/                 # Agnes client + enrichment
│   │   │       ├── client.py
│   │   │       └── enrichment.py
│   │   ├── workers/
│   │   │   ├── __main__.py          # APScheduler + translation_loop + enrichment_loop
│   │   │   └── pipeline.py          # fetch_one_source / translate_article
│   │   ├── scripts/                 # bootstrap scripts (create_user / seed_sources)
│   │   ├── config.py                # Pydantic Settings (read from .env)
│   │   ├── database.py              # async SQLAlchemy engine
│   │   ├── redis_client.py          # Redis singleton
│   │   └── main.py                  # FastAPI entry point
│   ├── alembic/
│   │   ├── env.py
│   │   ├── versions/
│   │   │   ├── 0001_initial.py      # 6 tables + enums
│   │   │   └── 0002_llm_settings_and_articles_ai.py
│   │   └── alembic.ini
│   ├── Dockerfile                   # python:3.12-slim
│   ├── pyproject.toml
│   └── .env.example
│
├── frontend/                        # Vue 3 + Vite + Naive UI
│   ├── src/
│   │   ├── main.ts                  # createApp + Pinia + Router
│   │   ├── App.vue
│   │   ├── router.ts                # route guards (requiresAuth / ownerOnly)
│   │   ├── api/
│   │   │   ├── client.ts            # axios + 401-refresh interceptor
│   │   │   └── articles.ts          # articles / sources / me / bookmarks / admin
│   │   ├── stores/auth.ts           # Pinia auth state (persisted in localStorage)
│   │   ├── components/
│   │   │   └── AppLayout.vue        # top bar + sidebar + quota bar
│   │   └── views/
│   │       ├── Login.vue
│   │       ├── Feed.vue             # last-24h list + cursor pagination
│   │       ├── ArticleDetail.vue    # original / translation / AI layout / categories / illustration / commentary
│   │       ├── Bookmarks.vue
│   │       ├── Sources.vue
│   │       ├── AdminSources.vue     # owner: source CRUD
│   │       └── AdminLlmSettings.vue # owner: LLM prompts + connection test + manual trigger
│   ├── Dockerfile
│   ├── nginx.conf
│   ├── vite.config.ts
│   ├── tsconfig.json
│   └── package.json
│
├── docs/
│   ├── architecture.md              # implementation-level architecture (77 lines)
│   └── acceptance.md                # MVP acceptance checklist
│
├── scripts/                         # tooling scripts
│   ├── deploy_pull.py               # passwordless deploy: clone/pull + rollback on failure
│   ├── server_init.py               # remote server bootstrap (push public key + 7 system-level ops tasks)
│   ├── push_ssh_key.py              # push a public key only (deduplicated by fingerprint)
│   ├── _*.py                        # throwaway debug scripts (gitignored)
│   └── deploy_remote.sh             # remote one-shot deploy (legacy script)
│
├── .env.example                     # config template (safe to commit)
├── .gitignore                       # ignores .env / scripts/_*.py / node_modules, etc.
├── Caddyfile                        # reverse-proxy rules
├── docker-compose.yml               # 7 services
├── DEPLOY.md                        # full deployment manual (165 lines)
├── README.md                        # this file
└── news-aggregator-plan.md          # design proposal v0.1, 613 lines (decision background)
```

---

## Data Model

6 tables + 1 configuration table (all in PostgreSQL):

| Table | Key fields | Description |
|---|---|---|
| **users** | role (enum: owner/member), password_hash | Users + roles |
| **sources** | slug (unique), kind (rss/html_list/tg_channel), priority, fetch_interval_min, consecutive_failures | Fetch sources |
| **articles** | url_hash (unique), translation_status (pending/ok/partial/failed), category, commentary, body_zh_formatted, image_ai_url, `*_status`, entities (JSONB), sentiment, topic_id, bias | Articles + translations + LLM enrichment |
| **bookmarks** | (user_id, article_id) UNIQUE | Bookmarks |
| **subscriptions** | keyword, match_in (any/title/body), channel | Keyword subscriptions |
| **api_tokens** | token_hash (sha256), expires_at, revoked_at | Reserved for Android |
| **llm_settings** | format_prompt, classify_prompt, commentary_prompt, image_prompt_template, image_size, chat_model, image_model, interval_sec, enabled | LLM prompts and settings (single row) |

ER relationships:

- `users` 1:N → `bookmarks` / `subscriptions` / `api_tokens`
- `sources` 1:N → `articles` (cascade delete)
- `articles` 1:N → `bookmarks`; self-reference `duplicate_of` (dedup chain)

---

## Quick Start

### Local development (Linux / macOS / WSL2)

```bash
# 1. Clone
git clone http://<gitea>/xiaji/diary-news.git
cd diary-news

# 2. Configure
cp .env.example .env
# Edit .env and fill in:
#   POSTGRES_PASSWORD / REDIS_PASSWORD / JWT_SECRET (generate with: openssl rand -hex 32)
#   TENCENTCLOUD_SECRET_ID / TENCENTCLOUD_SECRET_KEY
#   AGNES_API_KEY

# 3. Start
docker compose up -d --build

# 4. Initialize
docker compose exec api alembic upgrade head
docker compose exec api python -m app.scripts.create_user --username owner --password YOUR_PASS
docker compose exec api python -m app.scripts.seed_sources

# 5. Trigger one fetch
docker compose exec api python -c "import asyncio; from app.workers.pipeline import run_once; asyncio.run(run_once())"

# 6. Open
open http://localhost/       # macOS
xdg-open http://localhost/   # Linux
```

### Windows (native, no WSL)

Install Python 3.12 and Node 20 directly:

```powershell
# Backend
cd backend
py -3.12 -m venv .venv
.venv\Scripts\Activate.ps1
pip install -e .
# Start postgres / redis (with docker run -d, or as local services)
$env:DATABASE_URL = 'postgresql+asyncpg://...'
alembic upgrade head
uvicorn app.main:app --reload

# Frontend (in a second terminal, since uvicorn keeps this one busy; run from the repo root)
cd frontend
npm install
npm run dev
```

### Remote deployment

See [`DEPLOY.md`](./DEPLOY.md) for the full procedure. **On a machine that has already been deployed once**, only 2 steps are needed:

```bash
# 1. On your local machine: push the code, then run deploy_pull.py
git push origin main
python scripts/deploy_pull.py   # clone/pull automatically, roll back on failure

# 2. On the server: restart the app (the code changed)
ssh hknews
cd /root/diary-news
docker compose restart api worker
docker compose exec api alembic upgrade head
```

---

## Feature Details

### RSS fetching

- **5 seed sources**: Reuters / BBC / Al Jazeera / NHK / DW
- **Fetch frequency**: every 60 minutes per source (configurable); after 3 consecutive failures the interval doubles (capped at 12 hours)
- **Dedup**: `url_hash = SHA1(url)` plus PostgreSQL `ON CONFLICT DO NOTHING`, so inserts are idempotent
- **HTML extraction**: trafilatura extracts the full text; when the RSS summary is too short, the full page is fetched automatically
- **Visibility**: `/admin/sources` + `/admin/health` dashboards
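
The README does not say whether the interval doubles once or keeps doubling, but the 12-hour cap suggests repeated doubling. A minimal sketch under that assumption (every failure from the third onward doubles the interval again); `next_interval_min` is a hypothetical helper, not the project's function:

```python
def next_interval_min(
    base_min: int,
    consecutive_failures: int,
    threshold: int = 3,
    cap_min: int = 12 * 60,
) -> int:
    """Fetch interval in minutes for a source, given its consecutive failure count.

    Assumption: below `threshold` failures the base interval is used; from the
    threshold on, each additional failure doubles the interval, up to `cap_min`.
    """
    if consecutive_failures < threshold:
        return base_min
    doublings = consecutive_failures - threshold + 1
    return min(base_min * 2 ** doublings, cap_min)


for failures in range(8):
    print(failures, next_interval_min(60, failures))
```

With the default 60-minute interval this gives 60, 60, 60, 120, 240, 480, 720, 720 minutes for 0 to 7 failures. Resetting `consecutive_failures` to 0 after a successful fetch (and with it the interval) is likewise an assumption here.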

### Translation (quota + cache + fallback)

```
[no translation yet]
  ↓ Semaphore(1), 1 article/sec
  ↓
[cache hit?]
  ├─ yes → return it directly (valid for 30 days)
  └─ no ↓
[primary engine (Tencent TMT) within quota?]
  ├─ yes → call TMT → write cache + increment counter
  └─ no ↓
[local NLLB enabled?]
  ├─ yes → call NLLB → write cache
  └─ no → original text + "[本条未翻译:配额耗尽]" marker ("not translated: quota exhausted")
```
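
The decision chain in the diagram can be sketched as one function. This is an illustration only: the real code lives in `services/translation/` and caches in Redis with a 30-day TTL, whereas here a plain dict stands in for Redis, and `translate_with_fallback` and its callable parameters are hypothetical names.

```python
from typing import Callable, Dict, Optional, Tuple

# Marker text as documented above ("not translated: quota exhausted").
UNTRANSLATED_MARK = "[本条未翻译:配额耗尽]"


def translate_with_fallback(
    text: str,
    cache: Dict[str, str],
    primary_has_quota: Callable[[int], bool],
    primary: Callable[[str], str],
    record_usage: Callable[[int], None],
    local: Optional[Callable[[str], str]] = None,
) -> Tuple[str, str]:
    """Return (translated_text, engine) following cache -> TMT -> NLLB -> marker."""
    # 1. Cache hit: no API call, no quota consumed.
    if text in cache:
        return cache[text], "cache"
    # 2. Primary engine (Tencent TMT) while the monthly quota allows it.
    if primary_has_quota(len(text)):
        result = primary(text)
        cache[text] = result
        record_usage(len(text))
        return result, "tmt"
    # 3. Local NLLB fallback, if enabled; it does not count against the quota.
    if local is not None:
        result = local(text)
        cache[text] = result
        return result, "nllb"
    # 4. Nothing available: keep the original text and flag it.
    return f"{text}\n{UNTRANSLATED_MARK}", "none"


used = []
cache: Dict[str, str] = {}
fake_tmt = lambda s: "ZH:" + s  # stand-in for the real TMT call
print(translate_with_fallback("hello", cache, lambda n: True, fake_tmt, used.append))
print(translate_with_fallback("hello", cache, lambda n: False, fake_tmt, used.append))
```

The second call is served from the cache even though the quota check would now fail, so only the first call is recorded in `used`. Checking the cache before the quota is what keeps repeated text free.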
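
Usage is counted in the Redis counter `translation:month:YYYYMM`; because the key contains the month, the count rolls over automatically every month. `TENCENT_TMT_QUOTA_BUFFER=0.05` means that once usage reaches 95% of the quota, translation switches to the local engine (so the quota is never overrun).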
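
A small sketch of the key naming and the buffer arithmetic. The helper names are hypothetical, which clock (UTC or +08:00) decides the month is not stated, and whether the check includes the incoming characters is an assumption; in Redis itself the counter would typically be bumped with `INCRBY` on that key.

```python
from datetime import datetime, timezone

MONTHLY_QUOTA_CHARS = 5_000_000  # monthly TMT quota stated in this README


def quota_key(now: datetime) -> str:
    """Redis counter key for the given month, e.g. translation:month:202606."""
    return f"translation:month:{now:%Y%m}"


def primary_has_quota(
    used_chars: int,
    incoming_chars: int,
    quota: int = MONTHLY_QUOTA_CHARS,
    buffer: float = 0.05,
) -> bool:
    """True while this request keeps usage at or below (1 - buffer) of the quota."""
    limit = quota - round(quota * buffer)  # buffer 0.05 -> 4,750,000 chars (95%)
    return used_chars + incoming_chars <= limit


print(quota_key(datetime(2026, 6, 8, tzinfo=timezone.utc)))  # translation:month:202606
print(primary_has_quota(4_749_000, 1_000))  # True
print(primary_has_quota(4_749_000, 1_001))  # False
```

Raising the buffer to 0.1, as suggested under Troubleshooting, moves the switch-over point to 4,500,000 characters.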

### Users / auth

- **JWT**: access 60 min + refresh 14 d, HS256-signed
- **API Token**: for the Android client; stored as a sha256 hash (revocable, with expiry)
- **Roles**: `owner` has full permissions; `member` can read articles, bookmark and subscribe
- **Passwords**: bcrypt 4.0.1 (version pinned for compatibility with passlib)
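
Storing only a hash means a database leak does not expose usable API tokens. A fast hash such as sha256 is adequate here because the tokens are long random strings, unlike passwords, which need a slow hash like bcrypt. A minimal stdlib sketch; `StoredToken`, `issue_token`, `verify_token`, the 32-byte token size and the constant-time comparison are illustrative assumptions, not the code in `core/security.py`.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


@dataclass
class StoredToken:
    """What an api_tokens row keeps: only the hash, never the raw token."""
    token_hash: str
    expires_at: Optional[datetime] = None
    revoked_at: Optional[datetime] = None


def hash_token(raw: str) -> str:
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def issue_token(ttl: Optional[timedelta] = None) -> Tuple[str, StoredToken]:
    """Return the raw token (shown to the user once) and the row to store."""
    raw = secrets.token_urlsafe(32)
    expires_at = datetime.now(timezone.utc) + ttl if ttl else None
    return raw, StoredToken(token_hash=hash_token(raw), expires_at=expires_at)


def verify_token(raw: str, stored: StoredToken, now: Optional[datetime] = None) -> bool:
    now = now or datetime.now(timezone.utc)
    if stored.revoked_at is not None:
        return False
    if stored.expires_at is not None and now >= stored.expires_at:
        return False
    # Compare hashes in constant time.
    return hmac.compare_digest(hash_token(raw), stored.token_hash)


raw, row = issue_token(timedelta(days=30))
print(verify_token(raw, row), verify_token("wrong", row))  # True False
```

Revoking a token is then just setting `revoked_at`; the raw value cannot be recovered from the table.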
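
### Bookmarks / keyword subscriptions

- Bookmarks: click the star to add the article to `bookmarks`; each bookmark has a `note` field
- Keyword subscriptions: `articles.body_text` / `title` are scanned, and on a match `subscriptions.last_hit_at` is updated. A Telegram channel is reserved (the MVP does not send notifications)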
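
The `match_in` column (any / title / body) decides which fields a keyword is checked against. A sketch of that rule as case-insensitive substring matching; the matching semantics, the helper name `keyword_hits`, and whether the original or the translated text is scanned are assumptions (the real check might equally be done in SQL with PostgreSQL's `ILIKE`).

```python
def keyword_hits(keyword: str, title: str, body: str, match_in: str = "any") -> bool:
    """Case-insensitive substring match of a subscription keyword."""
    needle = keyword.casefold()
    in_title = needle in (title or "").casefold()
    in_body = needle in (body or "").casefold()
    if match_in == "title":
        return in_title
    if match_in == "body":
        return in_body
    if match_in == "any":
        return in_title or in_body
    raise ValueError(f"unknown match_in: {match_in!r}")


print(keyword_hits("ukraine", "Ukraine talks resume", "", "title"))      # True
print(keyword_hits("ukraine", "Markets rally", "...Ukraine...", "title"))  # False
```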

---

## LLM Enrichment

> **New feature** (added 2026-06-08). Once translation finishes, 4 independent tasks are run automatically through the Agnes LLM.

### The 4 tasks

| Task | Output field | LLM type | Purpose |
|---|---|---|---|
| **Formatting** | `articles.body_zh_formatted` | chat | Rewrites the translation into web-ready layout (paragraphs / bold / lists) |
| **Classification** | `articles.category` | chat (returns JSON) | Tags the article with 1-3 categories |
| **Illustration** | `articles.image_ai_url` | image | Text-to-image; the English prompt is built from `title` |
| **Commentary** | `articles.commentary` | chat | A 100-200 character commentary, objective and substantive |

### Rate limiting

- A single `LlmClient` with internal `chat_sem` and `image_sem`, each allowing 1 concurrent call
- After every call: `await asyncio.sleep(LLM_INTERVAL_SEC)` (default 2.0 s)
- Chat and image calls do not block each other
- The 4 tasks run **serially** inside `enrich_article` (the client already applies the rate limit)
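
The limiter described above can be sketched with two `asyncio.Semaphore(1)` objects and a pause taken while the semaphore is still held, so calls of the same kind are spaced out while chat and image calls overlap. `ThrottledLlmClient` and its fake requests are hypothetical stand-ins for `LlmClient` and the real HTTP calls to Agnes.

```python
import asyncio
import time
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class ThrottledLlmClient:
    """Hypothetical stand-in for LlmClient: one chat and one image call at a time."""

    def __init__(self, interval_sec: float = 2.0) -> None:
        self.interval_sec = interval_sec
        self.chat_sem = asyncio.Semaphore(1)
        self.image_sem = asyncio.Semaphore(1)

    async def _throttled(self, sem: asyncio.Semaphore, call: Callable[[], Awaitable[T]]) -> T:
        async with sem:
            try:
                return await call()
            finally:
                # Pause while still holding the semaphore, so same-kind calls are spaced out.
                await asyncio.sleep(self.interval_sec)

    async def chat(self, prompt: str) -> str:
        async def fake_request() -> str:  # placeholder for the real HTTP call
            return f"chat:{prompt}"
        return await self._throttled(self.chat_sem, fake_request)

    async def image(self, prompt: str) -> str:
        async def fake_request() -> str:  # placeholder for the real HTTP call
            return f"image:{prompt}"
        return await self._throttled(self.image_sem, fake_request)


async def demo(interval: float = 0.05):
    client = ThrottledLlmClient(interval)  # created inside the running event loop
    start = time.perf_counter()
    results = await asyncio.gather(client.chat("a"), client.chat("b"), client.image("c"))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(demo())
    print(results, f"{elapsed:.2f}s")
```

With a 0.05 s interval the two chat calls take about 0.1 s back to back, while the image call runs alongside them instead of adding another 0.05 s.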

### Settings page (owner only)

`/admin/llm` → 4 textareas plus a few inputs:

- **Master switch**: `enabled`
- **4 prompts** (each can be reset to its default): `format_prompt` / `classify_prompt` / `commentary_prompt` / `image_prompt_template`
- **Models**: `chat_model` / `image_model` (defaults: agnes-2.0-flash / agnes-image-2.1-flash)
- **Illustration size**: `image_size` (default 1024x768)
- **Rate limit**: `interval_sec` (default 2.0)
- **Test connection**: sends a `ping` chat request; it should return OK within 1-2 seconds
- **Manual trigger**: `POST /admin/llm/enrich/{article_id}` runs all 4 tasks on one article

### Default prompts

Defined in `DEFAULT_PROMPTS` in `backend/app/schemas/llm.py`; the following placeholders are supported:

- `{body}`: translated body text
- `{title}`: translated title
- `{summary}`: summary
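
If a prompt contains literal braces (a JSON example in `classify_prompt`, say), Python's `str.format` would raise a `KeyError`. A sketch that substitutes only the three documented placeholders, in a single pass, sidesteps that. How the project actually fills its templates is not stated; `render_prompt` is hypothetical.

```python
import re

_PLACEHOLDER = re.compile(r"\{(body|title|summary)\}")


def render_prompt(template: str, *, body: str = "", title: str = "", summary: str = "") -> str:
    """Fill {body}/{title}/{summary} in one pass; all other braces are left untouched."""
    values = {"body": body, "title": title, "summary": summary}
    # A function replacement is inserted literally, so backslashes in the text are safe.
    return _PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


template = 'Title: {title}\nReturn JSON like {"categories": ["..."]}\n\n{body}'
print(render_prompt(template, title="Oil prices rise", body="Article text..."))
```

Doing it in one pass also means a body that happens to contain the text `{title}` is not substituted a second time.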

### Failure isolation

Each task has its own try/except; a failed task is marked `*_status='failed'` and does **not** affect the other tasks.

`enrichment_loop` picks up articles whose `*_status` is `pending`, `failed` or `n/a`, so failed tasks are retried automatically.
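
The isolation rule can be sketched as a serial loop with one try/except per task. The status column names come from the SQL reset snippet in this README; skipping tasks that are already `ok` on a retry pass, and the names `enrich_article_sketch` and `needs_enrichment`, are assumptions for illustration.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Dict, Optional

log = logging.getLogger("enrichment")

TASKS = ("format", "classify", "image_ai", "commentary")
RETRY_STATUSES = {"pending", "failed", "n/a"}

Runner = Callable[[dict], Awaitable[None]]


async def enrich_article_sketch(
    article: dict,
    runners: Dict[str, Runner],
    current: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Run the 4 tasks serially; one failure never stops the others."""
    statuses = dict(current or {})
    for task in TASKS:
        column = f"{task}_status"
        if statuses.get(column) == "ok":
            continue  # finished in an earlier pass; only redo the rest
        try:
            await runners[task](article)
            statuses[column] = "ok"
        except Exception:  # isolate the failure to this one task
            log.exception("enrich task %s failed for article %s", task, article.get("id"))
            statuses[column] = "failed"
    return statuses


def needs_enrichment(statuses: Dict[str, str]) -> bool:
    """Mirror of the loop's scan: any task still pending / failed / n/a."""
    return any(statuses.get(f"{t}_status") in RETRY_STATUSES for t in TASKS)
```

For example, if the classify call times out, the article ends up with `classify_status='failed'` and the other three `ok`; the next scan selects it again and only classification is re-run.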

### Batch-enriching historical articles

The new feature **only** applies to articles whose translation completes after it went live. For articles that were already translated, reset their statuses manually:

```sql
UPDATE articles
SET format_status='pending', classify_status='pending',
    image_ai_status='pending', commentary_status='pending'
WHERE translation_status='ok';
```

---

## API Overview

All APIs live under the `/api/v1` prefix (the paths below are relative to it); the full definitions are at `http://<host>/api/docs` (available in DEBUG mode).

### Public

- `POST /auth/login`: username + password → access + refresh tokens
- `POST /auth/refresh`: refresh token → new access token

### Login required (any role)

- `GET /me`: current user info
- `GET /me/usage`: translation quota (used / total / percentage)
- `GET /articles?since=...&source=...&q=...&cursor=...`: list (cursor pagination)
- `GET /articles/{id}`: detail
- `GET /sources`: source list (read-only)
- `GET /bookmarks` / `POST /bookmarks` / `DELETE /bookmarks/{id}`
- `GET /subscriptions` / `POST /subscriptions` / `DELETE /subscriptions/{id}`
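
The README only says that `/articles` uses cursor pagination. A common scheme is keyset pagination over `(published_at, id)` with an opaque, base64-encoded cursor; the sketch below assumes that scheme, and the cursor format and helper names are hypothetical. In PostgreSQL the filter would correspond to `WHERE (published_at, id) < (:p, :i) ORDER BY published_at DESC, id DESC LIMIT :n`.

```python
import base64
import json
from typing import List, Optional, Tuple

Row = Tuple[str, int, str]  # (published_at as ISO-8601 UTC, id, title)


def encode_cursor(published_at: str, article_id: int) -> str:
    raw = json.dumps([published_at, article_id]).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> Tuple[str, int]:
    published_at, article_id = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    return published_at, int(article_id)


def page(rows: List[Row], cursor: Optional[str] = None, limit: int = 20) -> Tuple[List[Row], Optional[str]]:
    """rows must be sorted newest first by (published_at, id)."""
    if cursor:
        key = decode_cursor(cursor)
        # Keyset filter: strictly older than the last row of the previous page.
        rows = [r for r in rows if (r[0], r[1]) < key]
    items = rows[:limit]
    next_cursor = encode_cursor(items[-1][0], items[-1][1]) if len(rows) > limit else None
    return items, next_cursor
```

Unlike `OFFSET`, a keyset cursor neither skips nor repeats articles when new ones arrive between page loads, and deep pages cost no more than the first one. The `id` tie-breaker keeps the order total when two articles share a timestamp.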

### Owner only (`/admin/*`)

- `GET /admin/sources` / `POST` / `PATCH /{id}` / `DELETE /{id}`: source CRUD
- `POST /admin/refresh/{source_id}`: trigger a fetch immediately
- `POST /admin/translation/rerun/{article_id}`: re-translate
- `GET /admin/health`: source health dashboard
- `POST /admin/translation/quota/reset`: reset this month's quota
- `GET /admin/llm/settings` / `PUT` / `POST /reset` / `POST /test`: LLM settings
- `POST /admin/llm/enrich/{article_id}`: manually trigger enrichment for one article

---

## Dev-Deploy Workflow

```
┌─────────────────────────────────────────┐
│ Local Windows machine (you)             │
│  - edit code                            │
│  - git add / commit                     │
│  - git push origin main                 │
└────────────────────┬────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────┐
│ Gitea (code hosting)                    │
└────────────────────┬────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────┐
│ Run python scripts/deploy_pull.py       │
│  - passwordless login (SSH key)         │
│  - git clone/pull                       │
│  - success: keep + report               │
│  - failure: git reset --hard <prev sha> │
└────────────────────┬────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────┐
│ Remote HK-News                          │
│  - code is up to date                   │
│  - (if needed) docker compose restart   │
│  - (if needed) alembic upgrade head     │
└─────────────────────────────────────────┘
```
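
The "record the old SHA, pull, reset on failure" step can be sketched as follows. The real `deploy_pull.py` does this over SSH and also clones on first run; this sketch runs git locally, and `local_git` and `pull_with_rollback` are hypothetical names. The git runner is injected so the logic can be exercised without a repository.

```python
import subprocess
from typing import Callable, Sequence

Git = Callable[[Sequence[str]], str]


def local_git(repo_dir: str) -> Git:
    """Run git in repo_dir and return stripped stdout; raises CalledProcessError on failure."""
    def run(args: Sequence[str]) -> str:
        proc = subprocess.run(
            ["git", "-C", repo_dir, *args],
            check=True, capture_output=True, text=True,
        )
        return proc.stdout.strip()
    return run


def pull_with_rollback(git: Git, branch: str = "main") -> str:
    """Fast-forward to origin/<branch>; on any failure, reset to the previous commit."""
    before = git(["rev-parse", "HEAD"])
    try:
        git(["pull", "--ff-only", "origin", branch])
    except Exception:
        git(["reset", "--hard", before])  # back to the known-good SHA
        raise
    return git(["rev-parse", "HEAD"])


# Usage on the server (hypothetical): pull_with_rollback(local_git("/root/diary-news"))
```

A failed `git pull --ff-only` usually leaves `HEAD` untouched, so the reset is mainly a safety net; it matters more if further post-pull steps are added inside the same `try`.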

**Full cycle** (run on your local machine):

```bash
git add -A
git commit -m "feat: ..."
git push origin main
python scripts/deploy_pull.py
```

---

## Ops Tools

| Script | Purpose | How to run |
|---|---|---|
| `scripts/deploy_pull.py` | Passwordless pull + rollback on failure | `python scripts/deploy_pull.py` |
| `scripts/server_init.py` | System-level server bootstrap (push public key + 7 ops tasks) | `REMOTE_PASS=xxx python scripts/server_init.py` |
| `scripts/push_ssh_key.py` | Push the SSH public key only (deduplicated by key fingerprint) | `REMOTE_PASS=xxx python scripts/push_ssh_key.py` |
| `docker compose logs -f worker` | Follow the worker logs | On the server |
| `docker compose exec api alembic upgrade head` | Run migrations | On the server |

### deploy_pull.py full options

```bash
python scripts/deploy_pull.py \
  --host 207.57.129.228 \
  --port 19717 \
  --user root \
  --repo-dir /root/diary-news \
  --repo-url http://124.223.26.33:3000/xiaji/diary-news.git \
  --branch main

# Dry run
python scripts/deploy_pull.py --dry-run

# Manual rollback
python scripts/deploy_pull.py --rollback <sha>
```

Environment variables are also supported: `DEPLOY_HOST` / `DEPLOY_PORT` / `DEPLOY_USER` / `DEPLOY_REPO_DIR` / `DEPLOY_REPO_URL` / `DEPLOY_SSH_KEY`.
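
One way to combine flags with these variables is to use the environment as the argparse defaults, so an explicit flag always wins. This is a sketch, not the script's actual parser: the built-in defaults (port 22, user `root`) and the precedence order are assumptions, and since no CLI flag for the SSH key is documented, it is read from `DEPLOY_SSH_KEY` only.

```python
import argparse
import os
from typing import Mapping


def build_parser(env: Mapping[str, str] = os.environ) -> argparse.ArgumentParser:
    """CLI flags win; otherwise fall back to DEPLOY_* environment variables."""
    p = argparse.ArgumentParser(prog="deploy_pull.py")
    p.add_argument("--host", default=env.get("DEPLOY_HOST"))
    p.add_argument("--port", type=int, default=int(env.get("DEPLOY_PORT", "22")))
    p.add_argument("--user", default=env.get("DEPLOY_USER", "root"))
    p.add_argument("--repo-dir", default=env.get("DEPLOY_REPO_DIR"))
    p.add_argument("--repo-url", default=env.get("DEPLOY_REPO_URL"))
    p.add_argument("--branch", default="main")
    p.add_argument("--dry-run", action="store_true")
    p.add_argument("--rollback", metavar="SHA")
    # No documented flag for the key, so it comes from the environment only.
    p.set_defaults(ssh_key=env.get("DEPLOY_SSH_KEY"))
    return p


args = build_parser({"DEPLOY_HOST": "10.0.0.5", "DEPLOY_PORT": "2222"}).parse_args(["--dry-run"])
print(args.host, args.port, args.dry_run)  # 10.0.0.5 2222 True
```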

---

## Troubleshooting

### Q: Translation keeps failing?

1. `docker compose logs worker | grep -E "translate|tencent"`
2. Check whether the Redis counter `translation:month:YYYYMM` has reached the limit
3. Set `TENCENT_TMT_QUOTA_BUFFER=0.1` in `.env` for a larger safety margin

### Q: An RSS source keeps failing?

1. Check the `consecutive_failures` field in `/admin/health`
2. `docker compose logs worker | grep <source_slug>`
3. Most likely the RSS URL is stale or blocked by anti-scraping measures. Pause the source with `enabled=false`, edit it in `/admin/sources`, then restart the worker

### Q: LLM enrichment isn't working?

1. `/admin/llm` → click "Test connection"
2. Check the backend logs: `docker compose logs worker | grep -E "enrich|chat"`
3. Make sure `AGNES_API_KEY` is set in `.env`

### Q: Server disk almost full?

```sql
DELETE FROM articles WHERE published_at < now() - interval '90 day' AND duplicate_of IS NULL;
```

PostgreSQL does not shrink table files after a `DELETE`: the freed space becomes reusable once `VACUUM` runs (autovacuum does this eventually), and only `VACUUM FULL`, which locks the table, returns it to the operating system.

### Q: A deploy failed and you want to roll back?

```bash
python scripts/deploy_pull.py --rollback <last known-good sha>
# or manually
ssh hknews "cd /root/diary-news && git reset --hard <sha>"
```

### Q: Chinese usernames come out garbled?

On Chinese-locale Windows, PowerShell defaults to GBK (code page 936); run `chcp 65001` first to switch to UTF-8.

---

## Roadmap

- [x] **Phase 1 (MVP)**: 5 RSS sources + translation + web UI + admin CRUD
- [x] **Phase 1.5**: LLM enrichment (formatting / classification / illustration / commentary) ✅ 2026-06-08
- [ ] **Phase 2**: PWA offline cache / keyword-subscription push (Telegram)
- [ ] **Phase 3**: Android client (API Tokens already reserved)
- [ ] **Phase 4**: automatic classification / commentary / entity recognition (currently one-shot LLM calls, no ML pipeline)
- [ ] **Phase 5**: cross-source stance comparison / topic clustering

---

## Documentation

- 📄 [DEPLOY.md](./DEPLOY.md): full deployment manual (from a fresh machine to a working site)
- 📐 [docs/architecture.md](./docs/architecture.md): implementation-level architecture (77 lines)
- ✅ [docs/acceptance.md](./docs/acceptance.md): MVP acceptance checklist
- 📜 [news-aggregator-plan.md](./news-aggregator-plan.md): design proposal v0.1, 613 lines (background on technology choices)
- 🛠 [scripts/deploy_pull.py](./scripts/deploy_pull.py): deployment tool source

---

## Design Principles

- **Lightweight**: runs on a single 30 GB machine; no pile of heavyweight services
- **Controllable**: source management / translation quota / fetch scheduling / LLM prompts are all visible and adjustable in the UI
- **Extensible**: the ML fields (category/commentary/entities/sentiment/bias) already exist, so later work only writes values, with no schema changes
- **No anti-scraping arms race**: if a site bans our IP, so be it; compliance comes first
- **Transparent failures**: errors are never silently swallowed; every stage records a `*_status` field
- **Idempotent and re-runnable**: all ops scripts (server_init / deploy_pull) are idempotent, so running them repeatedly has no side effects

---

**License**: Private use only.