2026-03-06 15:07:51 +08:00
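
# Scrolling Screenshot OCR Tool

An intelligent scrolling screenshot OCR tool that automatically detects content blocks (divs) on a page, captures screenshots while scrolling, and runs OCR text recognition on each capture.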
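
## Features

- 🎯 **Smart region detection**: uses a grayscale threshold plus a consecutive-row test to automatically identify content blocks (divs) and the blank gaps between them
- 📜 **Automatic scrolling capture**: computes the scroll distance from the content height and captures screens continuously
- 🔤 **OCR text recognition**: supports Umi-OCR and custom HTTP OCR services
- 🖥️ **Graphical interface**: a polished PySide6 GUI that makes the tool easier to operate
- ⌨️ **Hotkey trigger**: press `Ctrl+F9` to start quickly
- 🖱️ **Region selection**: drag the mouse to select the capture region
- 🛑 **Smart stop**: stops automatically when repeated content is detected
- 📱 **System tray**: minimizes to the tray instead of occupying the taskbar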

## Use Cases

- OCR of long web pages via scrolling screenshots
- Exporting chat history
- Extracting content from long documents
- Any content that has to be scrolled to be read in full

## Installation

### 1. Clone the repository

```bash
git clone <remote-repository-URL>
cd long-screen-cut
```

### 2. Install dependencies

```bash
pip install -r requirements.txt
```

Dependencies:
- opencv-python >= 4.8.0
- numpy >= 1.24.0
- pillow >= 10.0.0
- pyautogui >= 0.9.54
- keyboard >= 0.13.5
- mouse >= 0.7.1
- requests >= 2.31.0
- loguru >= 0.7.0
- pyside6 >= 6.5.0

### 3. Install an OCR engine (choose one)

#### Option A: Umi-OCR (recommended)

1. Download [Umi-OCR](https://github.com/hiroi-sora/Umi-OCR/releases)
2. Extract the archive and run `Umi-OCR.exe`
3. Go to **Settings → HTTP Interface**
4. Check **Enable HTTP service**
5. Make sure the port is `1224` (the default)

#### Option B: Custom HTTP OCR service

Use `ocr_server_example.py` as a reference for implementing your own OCR service, or change the configuration to point at another OCR API.

## Usage

### GUI mode (recommended)

```bash
python gui.py
```
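
Interface features:
- **Start Capture** button: starts the capture-and-OCR workflow
- **Stop** button: manually stops the current task
- **Clear Log** button: clears the log display area
- **Log display**: colored log output with timestamps
- **Progress bar**: shows the progress of the current task
- **Status label**: shows the current run state
- **System tray**: closing the window minimizes it to the tray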

### Command-line mode

```bash
python main.py
```

### Procedure

1. **Start the program**: run `python gui.py` or `python main.py`
2. **Trigger a capture**:
   - GUI: click the **Start Capture** button
   - Command line: press `Ctrl+F9`
3. **Service check**: the program checks whether the OCR service is running
4. **Select a region**: hold the left mouse button and drag to select the region to capture
5. **Automatic processing**: the program then automatically:
   - captures the current screen
   - analyzes the content blocks (divs)
   - runs OCR on the text
   - computes the scroll distance
   - scrolls to the next screen
   - repeats the steps above
6. **Automatic stop**: the program stops when it detects repeated content
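
Steps 5 and 6 can be sketched roughly as the loop below. This is a minimal illustration rather than the code in `main.py`: the function name `run_capture_loop` and the `capture`, `recognize`, and `scroll` callables are hypothetical stand-ins, and the sketch assumes that "repeated content" means the OCR text of a screen is identical to the text of the previous screen.

```python
def run_capture_loop(capture, recognize, scroll, max_scroll_count=100):
    """Capture, OCR, and scroll until the recognized text repeats.

    capture, recognize, and scroll are hypothetical callables supplied by
    the caller; block analysis and scroll-distance calculation are assumed
    to happen inside them.
    """
    results = []
    previous_text = None
    for _ in range(max_scroll_count):
        text = recognize(capture())
        # Assumption: an exact match with the previous screen means the
        # page did not move, so the end has been reached.
        if text == previous_text:
            break
        results.append(text)
        previous_text = text
        scroll()
    return results
```

The `max_scroll_count` argument mirrors `MAX_SCROLL_COUNT` from the configuration and bounds the loop even if the text never repeats.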

### Output

- Screenshots are saved in the `output/` directory
- OCR results are saved to `output/all_results_<timestamp>.json`
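
To post-process a run, the newest results file can be located by its name pattern. The sketch below assumes the files are UTF-8 encoded JSON and that the most recently modified file is the one wanted. `load_latest_results` is a hypothetical helper, not part of the project, and because the JSON structure is not documented here it simply returns whatever the file contains.

```python
import json
from pathlib import Path


def load_latest_results(output_dir="output"):
    """Return the parsed content of the newest all_results_*.json, or None."""
    candidates = Path(output_dir).glob("all_results_*.json")
    # Assumption: modification time identifies the latest run.
    latest = max(candidates, key=lambda p: p.stat().st_mtime, default=None)
    if latest is None:
        return None
    with latest.open(encoding="utf-8") as fh:
        return json.load(fh)
```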

## Configuration

Edit the `Config` class in `main.py`:

```python
class Config:
    # Hotkey
    HOTKEY = "ctrl+f9"

    # Image analysis parameters
    GRAY_THRESHOLD = 240       # Grayscale threshold (0-255)
    CONSECUTIVE_LINES = 3      # Number of consecutive blank rows that count as a gap
    WHITE_PIXEL_RATIO = 0.9    # White-pixel ratio threshold

    # OCR settings
    OCR_ENGINE = "umi"         # "umi" or "http"
    OCR_API_URL = "http://localhost:8000/ocr"  # Used in HTTP mode
    OCR_TIMEOUT = 30           # OCR request timeout (seconds)

    # Umi-OCR settings
    UMI_OCR_HOST = "127.0.0.1"
    UMI_OCR_PORT = 1224

    # Scrolling settings
    SCROLL_DELAY = 0.5         # Time to wait for rendering after each scroll (seconds)
    MAX_SCROLL_COUNT = 100     # Maximum number of scrolls

    # Output settings
    OUTPUT_DIR = "output"
```
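
## Core Algorithms

### Content block detection

1. **Grayscale conversion**: convert the screenshot to a grayscale image
2. **Row-by-row scan**: compute the proportion of white pixels in each row
3. **Blank-row test**: a row is considered blank if more than `WHITE_PIXEL_RATIO` (default 90%) of its pixels have a gray value > `GRAY_THRESHOLD` (default 240)
4. **Consecutive-row test**: `CONSECUTIVE_LINES` (default 3) or more consecutive blank rows are treated as a gap
5. **Block segmentation**: the regions that are not gaps are treated as content blocks (divs)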
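
The steps above can be sketched as follows. This is an illustrative version under stated assumptions, not the project's implementation: `find_content_blocks` is a hypothetical name, the input is assumed to be an already-grayscale image given as rows of 0-255 values, and plain Python loops are used to keep the example self-contained (the project lists OpenCV and NumPy as dependencies, and a vectorized version would be faster).

```python
def find_content_blocks(gray_rows, gray_threshold=240,
                        white_pixel_ratio=0.9, consecutive_lines=3):
    """Return (start, end) row ranges, end exclusive, of content blocks."""
    # Steps 2-3: a row is blank when more than white_pixel_ratio of its
    # pixels are brighter than gray_threshold.
    blank = [
        sum(1 for p in row if p > gray_threshold) > white_pixel_ratio * len(row)
        for row in gray_rows
    ]

    # Step 4: only runs of at least consecutive_lines blank rows are gaps.
    is_gap = [False] * len(blank)
    run_start = None
    for i, b in enumerate(blank + [False]):  # sentinel closes a trailing run
        if b and run_start is None:
            run_start = i
        elif not b and run_start is not None:
            if i - run_start >= consecutive_lines:
                for j in range(run_start, i):
                    is_gap[j] = True
            run_start = None

    # Step 5: content blocks are the maximal runs of non-gap rows.
    blocks = []
    start = None
    for i, g in enumerate(is_gap + [True]):  # sentinel closes a trailing block
        if not g and start is None:
            start = i
        elif g and start is not None:
            blocks.append((start, i))
            start = None
    return blocks
```

In this sketch, blank runs shorter than `consecutive_lines` stay inside the surrounding content block, which is one way to read the consecutive-row test: a couple of white rows between text lines should not split a block.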

### Scroll distance calculation

```
scroll distance = height of the first div + height of the blank gap after it - overlap
```

The overlap preserves continuity between consecutive captures; by default it is 1/4 of the div height.
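
As a worked example of the formula, here is a minimal sketch. The function name `scroll_distance` and the `overlap_ratio` parameter are hypothetical, heights are assumed to be in pixels, and rounding the overlap down to a whole pixel is a choice made for this example.

```python
def scroll_distance(first_block_height, gap_height, overlap_ratio=0.25):
    """Scroll distance in pixels for the next capture."""
    # The overlap defaults to 1/4 of the first block's height.
    overlap = int(first_block_height * overlap_ratio)
    return first_block_height + gap_height - overlap


# A 200 px block followed by a 40 px gap: 200 + 40 - 50 = 190 px.
print(scroll_distance(200, 40))
```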

## Project Structure

```
long-screen-cut/
├── main.py                 # Main program (command-line version)
├── gui.py                  # GUI program (PySide6)
├── umi_ocr_client.py       # Umi-OCR HTTP client
├── ocr_server_example.py   # Example OCR service (Flask)
├── requirements.txt        # Python dependencies
├── .gitignore              # Git ignore rules
└── README.md               # This file
```

## API Documentation

### UmiOCRClient

```python
from umi_ocr_client import UmiOCRClient

client = UmiOCRClient(host="127.0.0.1", port=1224)

# Check the service status
if client.is_service_running():
    print("Service is running")

# Recognize a screenshot
text = client.recognize_screenshot()

# Recognize an image file
text = client.recognize_image("/path/to/image.png")

# Recognize a batch of images
texts = client.recognize_images(["1.png", "2.png", "3.png"])
```

## FAQ

### Q: The program reports "Umi-OCR service is not running"

A: Make sure that:
1. Umi-OCR is running
2. You have opened **Settings → HTTP Interface**
3. **Enable HTTP service** is checked
4. The port is set to `1224`
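
To confirm that the settings above took effect, it can help to check whether anything is listening on the port. A minimal sketch using only the standard library follows. `is_port_open` is a hypothetical helper, not part of `umi_ocr_client.py`, and an open port only shows that some process accepts TCP connections there, not that the process is Umi-OCR.

```python
import socket


def is_port_open(host="127.0.0.1", port=1224, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("port 1224 open:", is_port_open())
```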
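
### Q: Detected regions are inaccurate

A: Adjust the image analysis parameters in `Config`:
- `GRAY_THRESHOLD`: lower it so that off-white or light gray backgrounds also count as blank
- `CONSECUTIVE_LINES`: raise it to reduce false gap detections
- `WHITE_PIXEL_RATIO`: lower it to tolerate more stray non-white pixels in a blank row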

### Q: Scrolling is too fast or too slow

A: Adjust `SCROLL_DELAY`:
- Slow-loading web pages: increase the delay
- Local applications: the delay can be reduced

### Q: How do I stop the program?

A:
- Normal stop: press `Ctrl+C`
- Forced stop: close the terminal window
- GUI: click the **Stop** button to stop the current task

## Roadmap

- [ ] Support more OCR engines (PaddleOCR, Tesseract, etc.)
- [x] GUI
- [ ] Horizontal scrolling support
- [ ] Smart deduplication (similarity-based)
- [ ] Export to multiple formats (Markdown, Word, PDF)

## License

MIT License

## Acknowledgements

- [Umi-OCR](https://github.com/hiroi-sora/Umi-OCR) - an excellent offline OCR application