Compare commits: 25da264413...main (80 commits)
| SHA1 | Author | Date | |
|---|---|---|---|
| 498d5110e9 | |||
| 851d536b59 | |||
| adc9c76864 | |||
| 624e158be9 | |||
| 5bc40abbc1 | |||
| bd2c457f54 | |||
| 179bfa327b | |||
| c2357ffb67 | |||
| 0d287e7c1f | |||
| 674ee1e1e2 | |||
| 0cf231f9f7 | |||
| f82da3bab1 | |||
| 22a50ad5c6 | |||
| 0d9e427a34 | |||
| ec68b83827 | |||
| 130bbfb090 | |||
| 6e83136dc6 | |||
| f6f4da7d07 | |||
| a2be43d42a | |||
| a4c106fa5a | |||
| f24ca9aa29 | |||
| a537d3825b | |||
| e67931c3ca | |||
| b7cd03434d | |||
| a9d6c4699d | |||
| 3984b81f86 | |||
| d62cd2fcca | |||
| d44a294bf7 | |||
| 57e0029eb1 | |||
| a2ecc7f451 | |||
| 6ae10c9d36 | |||
| 20b2f46533 | |||
| 43ec564daa | |||
| 8cc25b7c2e | |||
| a158e3d6bf | |||
| 71bef2bd06 | |||
| b62d4ff40d | |||
| 272f4440fd | |||
| 1693c1963f | |||
| e614bfcf93 | |||
| 28ea813110 | |||
| 18aff6b945 | |||
| 9c48648b26 | |||
| afeb00ccc4 | |||
| deea6764cf | |||
| 9e20d439bf | |||
| 389486ad6e | |||
| f24acb18cf | |||
| c2836428ca | |||
| 9026aa8f4b | |||
| ff7e114324 | |||
| 1c91dd45ed | |||
| b32549c5df | |||
| 8fcf3bcfe2 | |||
| 33f0e48bf5 | |||
| 344a0a9c93 | |||
| 8cbd6462d3 | |||
| 4c2ee60431 | |||
| cf64532b16 | |||
| 4fc896977d | |||
| fb1d3d5a56 | |||
| 4a48b9a9cb | |||
| 9088939701 | |||
| ee308c6d6f | |||
| 12b57f1c57 | |||
| 8258c3532d | |||
| 40df4ee171 | |||
| 1da5501e55 | |||
| 74dfa978cf | |||
| e851d0d5fb | |||
| d07017cf11 | |||
| 256850f752 | |||
| d6ec1eadc9 | |||
| 1507416806 | |||
| d5344aaa4a | |||
| 5f05a62419 | |||
| 06d2d07165 | |||
| 8f56db7d86 | |||
| 4dda80aa4c | |||
| d79051cb24 |
68
.gitignore
vendored
Normal file
@@ -0,0 +1,68 @@
# Python
__pycache__/
*.py[cod]
*$py.class

# Logs
*.log
integrated_product_system.log

# Databases
*.db
*.sqlite

# IDE
.trae/
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Test files
*test*.py
*Test*.py
pytest_cache/
.tox/
.coverage
coverage.xml

# Temporary files
*.tmp
*.temp
temp*.txt
*.bak
*.swp
*.swo
*~
temp_*.txt

# Bug and debug files
*debug*.png
*bug*.txt

# Batch files
*.bat

# Output files
*.out
*.output

# Environment
.env
.env.local
.env.*.local

# Documentation build
_build/
build/
dist/
*.egg-info/

# Other
2025年12月*.txt
*.png
68
.trae/documents/实现用户关注数转换功能.md
Normal file
@@ -0,0 +1,68 @@
## Implementation Plan

### 1. Update the Database Schema

* **Modify the `init_database` method**: add a `follows` column to the `product_analysis` table to store the converted follower count

### 2. Add a Follower-Count Conversion Method

* **Create the `convert_user_count_to_number` method**: use the Ollama API to convert the `user_count` text into a number
  * Handle different formats: "53 followers" → 53, "1.9K followers" → 1900
  * Call the Ollama API to perform the conversion
  * Return the converted number

### 3. Integrate into the Existing Analysis Flow

* **Modify the `get_product_data` method**: include the `user_count` and `url` columns in the query
* **Update the `analyze_products` method**:
  * Extend return-value handling to include `user_count` and `url`
  * Call the conversion method on the follower count during analysis
  * Pass the converted number to the save method

### 4. Update the Save Method

* **Modify the `save_analysis_result` method**: add a `follows` parameter and save the converted follower count to the database

### 5. Add a Follower-Count Analysis Update

* **Create the `analyze_follower_counts` method**:
  * Query all products and their analysis records
  * For each product, convert `user_count` and update `product_analysis.follows`
  * Update the follower count on existing analysis records

### 6. Complete the Workflow

* **Update the `run_full_workflow_async` method**: add a step 4 that runs the follower-count analysis update

## Expected Results

* The new `product_analysis` table will include a `follows` column holding the converted numeric follower count
* Newly analyzed products will have their follower count converted and saved automatically
* Existing products will have their follower count updated by the extra step
* The Ollama API is used to ensure conversion accuracy

## Key Technical Points

* SQLite table schema changes
* Ollama API calls and response parsing
* Intelligent text-to-number conversion
* Seamless integration with existing code
* Batch data processing and updates
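The plan above delegates the text-to-number conversion to the Ollama API. As a point of comparison, the two formats it names ("53 followers" and "1.9K followers") can also be handled deterministically. The sketch below is a regex-based stand-in, not the planned `convert_user_count_to_number` method; the function name `parse_follower_count` and the support for an `M` suffix and thousands separators are assumptions made for illustration.

```python
import re

# Hypothetical helper: a deterministic stand-in for the Ollama-based conversion.
# Multipliers for the optional magnitude suffix (the "m" entry is an assumption).
_SUFFIX_MULTIPLIERS = {"": 1, "k": 1_000, "m": 1_000_000}

# A number, optional whitespace, then an optional K/M suffix that must end a word,
# so the "m" in "5 members" is not mistaken for "million".
_COUNT_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*([kKmM]\b)?")


def parse_follower_count(user_count):
    """Return the follower count as an int, or None if no number is found."""
    if not user_count:
        return None
    match = _COUNT_PATTERN.search(user_count.replace(",", ""))
    if match is None:
        return None
    value, suffix = match.groups()
    multiplier = _SUFFIX_MULTIPLIERS[(suffix or "").lower()]
    return round(float(value) * multiplier)
```

A fallback of this kind could run before any model call, leaving the API only for strings the pattern does not recognize.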
5875
2025年11月9日131545.txt
File diff suppressed because it is too large
5790
2026年1月15日1991.txt
Normal file
File diff suppressed because it is too large
5820
2026年1月17日16419.txt
Normal file
File diff suppressed because it is too large
5800
2026年1月18日9249.txt
Normal file
File diff suppressed because it is too large
5840
2026年1月21日19238.txt
Normal file
File diff suppressed because it is too large
5795
2026年1月22日18556.txt
Normal file
File diff suppressed because it is too large
5855
2026年1月29日20470.txt
Normal file
File diff suppressed because it is too large
5795
2026年1月31日91239.txt
Normal file
File diff suppressed because it is too large
5800
2026年3月10日183431.txt
Normal file
File diff suppressed because it is too large
5810
2026年3月8日18119.txt
Normal file
File diff suppressed because it is too large
286
README.md
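@@ -1,21 +1,60 @@
# TopHub Data Processing System
# TopHub Data Processing and Product Analysis System

This project processes the temporary files scraped from the TopHub website, classifies the data, and stores it in a SQLite database.
This project contains two core modules:
1. TopHub website data scraping and processing system
2. ProductHunt product scraping and AI analysis system

## Features

1. **File parsing**: reads temporary files (named "date+time.txt"), treating every 5 lines as one data unit
2. **Data extraction**: extracts the title and link from each data unit
3. **Smart classification**: calls a local API (Ollama) to classify titles automatically
4. **Deduplication**: checks whether the title + date already exists in the database to avoid duplicate entries
5. **Progress display**: shows processing progress with a progress bar
6. **Category standardization**: merges similar categories into standard categories
### TopHub Data Scraping and Processing
- **Website scraping**: scrapes data from tophub.today, with support for iterating over a range of node IDs
- **Smart filtering**: automatically skips the columns named in the filter list
- **Data storage**: saves scraped data to a SQLite database
- **Classification**: calls the local API for automatic classification
- **Deduplication**: avoids duplicate entries
- **Category standardization**: similar categories are merged automatically

### ProductHunt Product Analysis
- **Product scraping**: scrapes detailed product information from ProductHunt
- **AI analysis**: calls the Ollama API to assess product development difficulty
- **Data management**: full management of the product database
- **Follower-count conversion**: converts follower counts from text to numbers
- **Difficulty scoring**: automatically computes a development difficulty score
- **Missing-data backfill**: automatically fills in missing product links and scores

### Data Visualization
- **GUI viewer**: a visual data viewer built with PySide6
- **Search and filtering**: supports keyword search and category filtering
- **Category statistics**: displays category statistics in real time
- **Data operations**: supports batch deletion, marking items as interesting, and score adjustment

## File Descriptions

### Core Scripts

1. **process_temp_files.py** - Main processing script
1. **tophub_scraper.py** - TopHub website scraping script
   - Scrapes data from tophub.today
   - Filters content according to the filter list
   - Saves data to temporary files
   - Calls the data import script

2. **product/integrated_product_system.py** - Full-featured product scraping and analysis system
   - Integrates product scraping and AI analysis
   - Queries ProductHunt links from the tophub database
   - Scrapes detailed product information with Playwright
   - Calls the Ollama API to assess product development difficulty
   - Manages the product database
   - Provides the complete workflow

3. **db_viewer.py** - TopHub data viewer
   - PySide6 GUI application
   - Displays the scraped data stored in the SQLite database
   - Supports search, filtering, and category statistics
   - Supports clickable links and data operations

### Helper Scripts

1. **process_temp_files.py** - Temporary file processing script
   - Parses temporary files
   - Calls the API for classification
   - Stores results in the database
@@ -28,30 +67,76 @@
   - Merges similar categories into standard categories
   - Provides category mapping rules

### Helper Scripts
4. **run_viewer.py** - Database viewer launcher script
   - Checks required packages
   - Launches the SQLite database viewer

1. **check_db.py** - Database structure check script
2. **test_api.py** - API test script
3. **view_categories.py** - Script for viewing category examples
5. **check_db.py** - Database structure check script
6. **test_api.py** - API test script
7. **view_categories.py** - Script for viewing category examples

## Usage

### 1. Process Temporary Files
### 1. TopHub Data Scraping

```bash
python process_temp_files.py
python tophub_scraper.py
```

This script will:
- Scan all temporary files in the current directory (named "date+time.txt")
- Parse the file contents and extract titles and links
- Call the local API to classify titles
- Check for and avoid duplicate data
- Store results in the tophub_data.db database
- Scrape data from tophub.today
- Filter content according to the filter list (configurable via tophub_ban_column.txt)
- Save scraped data as a temporary file (format: YYYY年MM月DD日HHMMSS.txt)
- Call the data import script to process the scraped results

### 2. Clean Up and Standardize Categories
### 2. ProductHunt Product Scraping and Analysis

```bash
# Run the full workflow: scraping + analysis + data backfill
python product/integrated_product_system.py

# Analyze only, without scraping
python product/integrated_product_system.py --analyze-only

# Limit the maximum number of products to analyze
python product/integrated_product_system.py --max-products 100
```

Main features:
- Queries ProductHunt links from the tophub database
- Scrapes detailed product information with Playwright
- Calls the Ollama API to assess product development difficulty
- Automatically computes the difficulty score
- Converts follower counts to numeric form
- Fills in missing product links
- Re-analyzes invalid difficulty scores
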
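The two flags shown in the commands above, `--analyze-only` and `--max-products`, map naturally onto `argparse`. The following is a minimal sketch under stated assumptions: the function names `build_parser` and `plan_steps`, the help texts, and the step names are hypothetical and are not taken from `integrated_product_system.py`.

```python
import argparse


def build_parser():
    """Build a parser for the two flags shown in the usage commands."""
    parser = argparse.ArgumentParser(
        description="ProductHunt scraping and analysis (illustrative sketch)"
    )
    parser.add_argument(
        "--analyze-only",
        action="store_true",
        help="skip scraping and only run the AI analysis",
    )
    parser.add_argument(
        "--max-products",
        type=int,
        default=None,
        help="maximum number of products to analyze",
    )
    return parser


def plan_steps(args):
    """Return the workflow steps implied by the flags (assumed mapping)."""
    steps = [] if args.analyze_only else ["scrape"]
    steps += ["analyze", "backfill"]
    return steps
```

Keeping the step selection in a separate function makes the flag handling easy to check without launching a browser or contacting the model.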
### 3. Data Visualization Viewer

```bash
# Launch the database viewer
python db_viewer.py
```

Or use the launcher script:

```bash
python run_viewer.py
```

Viewer features:
- Displays the scraped data in the database
- Supports keyword search and category filtering
- Shows category statistics in real time
- Opens links in the browser when clicked
- Supports batch deletion and score adjustment

### 4. Category Processing

```bash
# Process temporary files
python process_temp_files.py

# Remove special characters from categories
python cleanup_categories.py

@@ -59,74 +144,118 @@ python cleanup_categories.py
python standardize_categories.py
```

### 3. View Data

```bash
# View category examples
python view_categories.py

# Check the database structure
python check_db.py
```

## Database Structure

The database file is `tophub_data.db`, which contains the following tables:
### 1. TopHub Database (tophub_data.db)

1. **tophub_entries** - Main data table
   - id: primary key
   - text_content: title text (non-null)
   - link: link
   - category: category
   - scrape_time: scrape time
Contains the raw data scraped from the TopHub website:

2. **classification_progress** - Classification progress table
   - id: primary key
   - total_count: total count
   - processed_count: processed count
   - last_updated: last update time
- **articles** - Main data table
  - id: primary key
  - title: title text
  - url: link
  - category: category
  - source_date: source date
  - score: score
  - is_interested: whether the item is marked as interesting

- **classification_progress** - Classification progress table
  - id: primary key
  - total_count: total count
  - processed_count: processed count
  - last_updated: last update time

### 2. Product Analysis Database (products.db)

Contains detailed ProductHunt product information and analysis results:

- **products** - Product information table
  - id: primary key
  - url: product link (unique)
  - name: product name
  - introduction: product introduction
  - user_count: user count
  - maker_link: maker link
  - maker_statement: maker statement
  - created_at: creation time
  - updated_at: update time

- **product_analysis** - Product analysis results table
  - id: primary key
  - original_name: original product name
  - product_intro: product introduction
  - development_difficulty: development difficulty description
  - ai_response: raw AI response
  - difficulty_score: difficulty score
  - product_link: product link
  - follows: follower count
  - created_at: creation time

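Because `follows` is a later addition to `product_analysis`, an existing `products.db` may need the column added in place. The sketch below shows one idempotent way to do that with the standard `sqlite3` module. The function names and the `INTEGER` column type are assumptions; the project's own `init_database` may handle this differently.

```python
import sqlite3


def ensure_follows_column(conn):
    """Add `follows` to product_analysis if missing; return True if it was added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(product_analysis)")]
    if "follows" in columns:
        return False
    # INTEGER is an assumed type for the numeric follower count.
    conn.execute("ALTER TABLE product_analysis ADD COLUMN follows INTEGER")
    conn.commit()
    return True


def update_follows(conn, analysis_id, follows):
    """Store a converted follower count on an existing analysis record."""
    conn.execute(
        "UPDATE product_analysis SET follows = ? WHERE id = ?",
        (follows, analysis_id),
    )
    conn.commit()
```

Checking `PRAGMA table_info` first matters because SQLite raises an error when `ALTER TABLE ... ADD COLUMN` names a column that already exists.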
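## API Configuration

The scripts use the local Ollama API for classification:
- API endpoint: http://localhost:11434/api/generate
- Model: gemma3:4b
- Request format: JSON
The project uses the local Ollama API for AI-related tasks:
- **API endpoint**: http://localhost:11434/api/generate
- **Model**: qwen3:8b
- **Request format**: JSON

Main uses:
1. **TopHub data classification**: classifies scraped titles automatically
2. **Product development difficulty analysis**: assesses the development difficulty of ProductHunt products
3. **Follower-count conversion**: converts follower counts from text to numbers
4. **Difficulty score calculation**: automatically computes the product development difficulty score
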
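A request to the endpoint listed above can be sketched with the standard library alone. The endpoint and model name come from this README; the prompt wording and the function names are hypothetical, and the project itself lists `requests` as its HTTP dependency, so this is an illustration of the request shape rather than the project's code.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # endpoint from the README
MODEL = "qwen3:8b"  # model from the README


def build_payload(title, model=MODEL):
    """Build a non-streaming request body (the prompt wording is hypothetical)."""
    prompt = (
        "Classify this headline into one category and reply with the "
        f"category only: {title}"
    )
    return {"model": model, "prompt": prompt, "stream": False}


def extract_text(body):
    """Pull the generated text out of a non-streaming response body."""
    return json.loads(body).get("response", "").strip()


def classify(title, timeout=30):
    """Send one classification request; needs a running local Ollama service."""
    data = json.dumps(build_payload(title)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_text(response.read())
```

Setting `"stream": False` asks the service for a single JSON object, so the reply can be parsed in one step instead of line by line.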
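## Core Dependencies

### Basic Dependencies
- requests: HTTP request handling
- sqlite3: database operations
- loguru: logging
- tqdm: progress bars

### Product Analysis Dependencies
- asyncio: asynchronous programming
- playwright: web scraping
- PySide6: GUI (viewer only)

## Log Files

The system generates the following log files:
- **tophub_scraper.log** - TopHub scraping log
- **integrated_product_system.log** - Product analysis system log
- **process_temp_files.log** - Temporary file processing log
- **cleanup_categories.log** - Category cleanup log
- **standardize_categories.log** - Category standardization log

## Classification Standard

The system supports the following standard categories:

1. Technology (科技) - emerging technology, the internet, etc.
2. Society (社会) - social news, lifestyle services, etc.
3. Sports (体育) - sports news, football, etc.
4. History (历史) - historical events, historical figures, etc.
5. Security (安全) - security vulnerabilities, security technology, etc.
6. Military (军事) - military news, national defense, etc.
7. Finance (金融) - financial news, market analysis, etc.
8. Shopping (购物) - e-commerce, shopping, etc.
9. Games (游戏) - game news, etc.
10. Entertainment (娱乐) - celebrity gossip, music, etc.
11. Health (健康) - healthcare, healthy living, etc.
1. Technology (科技) - emerging technology, the internet, artificial intelligence, etc.
2. Society (社会) - social news, lifestyle services, trending events, etc.
3. Sports (体育) - sports news, football, basketball, etc.
4. History (历史) - historical events, historical figures, archaeological discoveries, etc.
5. Security (安全) - security vulnerabilities, network security, data security, etc.
6. Military (军事) - military news, national defense, weapons and equipment, etc.
7. Finance (金融) - financial news, market analysis, investment, etc.
8. Shopping (购物) - e-commerce, shopping, consumer spending, etc.
9. Games (游戏) - game news, game development, game reviews, etc.
10. Entertainment (娱乐) - celebrity gossip, music, film and TV, etc.
11. Health (健康) - healthcare, healthy living, fitness, etc.
12. Other (其他) - other uncategorized content
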
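Merging similar labels into the standard categories listed above amounts to a lookup with a fallback. The sketch below is illustrative only: the alias table is a hypothetical sample built from the sub-topics named in the list, and it does not reproduce the mapping rules in `standardize_categories.py`.

```python
# The twelve standard category labels from the list above.
STANDARD_CATEGORIES = {
    "科技", "社会", "体育", "历史", "安全", "军事",
    "金融", "购物", "游戏", "娱乐", "健康", "其他",
}

# Hypothetical alias table: sub-topic label -> standard category.
CATEGORY_ALIASES = {
    "互联网": "科技",
    "人工智能": "科技",
    "足球": "体育",
    "篮球": "体育",
    "网络安全": "安全",
    "电商": "购物",
    "影视": "娱乐",
}


def standardize_category(raw):
    """Map a raw label to a standard category, falling back to 其他 (Other)."""
    label = (raw or "").strip()
    if label in STANDARD_CATEGORIES:
        return label
    return CATEGORY_ALIASES.get(label, "其他")
```

Falling back to 其他 keeps the set of stored categories closed even when the model returns a label the table does not know.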
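## Notes

1. Make sure the local Ollama service is running and reachable
2. Temporary files must be named "date+time.txt"
3. Each data unit contains 5 lines: node ID, category, title, link, and a separator line
4. The database file is created automatically; there is no need to create it manually

## Log Files

The system generates the following log files:
- process_temp_files.log - Main processing log
- cleanup_categories.log - Category cleanup log
- standardize_categories.log - Category standardization log
1. **Ollama service**: make sure the local Ollama service is running and reachable (default port 11434)
2. **Chrome browser**: product scraping requires an already running Chrome instance (debugging port 9222)
3. **Temporary file format**: temporary files generated by the TopHub scraper are named "YYYY年MM月DD日HHMMSS.txt"
4. **Data unit structure**: each data unit contains 5 lines: node ID, category, title, link, and a separator line
5. **Automatic database creation**: all database files are created automatically; there is no need to create them manually
6. **Dependency installation**: before using the GUI viewer, install its dependencies: `pip install -r requirements_gui.txt`
7. **Filter list configuration**: edit tophub_ban_column.txt to configure which columns are filtered out
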
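The 5-line data unit described in the notes above can be read by stepping through a file in fixed-size groups. This sketch assumes the line order given there (node ID, category, title, link, separator) and returns the title and link lines as they appear; the function name `parse_units` is hypothetical, and any label prefixes the real files carry on each line are left untouched.

```python
def parse_units(lines, unit_size=5):
    """Group lines into fixed-size units and return (title, link) pairs.

    Assumed line order per unit: node ID, category, title, link, separator.
    A trailing incomplete unit is ignored.
    """
    entries = []
    for start in range(0, len(lines) - unit_size + 1, unit_size):
        unit = [line.strip() for line in lines[start:start + unit_size]]
        title, link = unit[2], unit[3]
        if title and link:
            entries.append((title, link))
    return entries
```

For a real file, the input would typically be `open(path, encoding="utf-8").read().splitlines()`.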
## Examples

### Temporary File Format Example
### TopHub Scraper Temporary File Example

```
节点ID: 102
@@ -141,9 +270,18 @@ python check_db.py
--------------------------------------------------
```

### Processing Result Example
### Product Analysis Result Example

```
Title '女机器人' classified as: 科技
Title '这个应该属于底盘不行吗' classified as: 其他
```
Product 'AI Assistant' analysis complete
- Difficulty description: moderate difficulty; requires some AI development experience
- Difficulty score: 60/100
- Followers: 1500
```

### Database Viewer Interface

- Displays all scraped data, with real-time search and filtering
- Category statistics are shown at the top
- Clicking a link opens it directly in the browser
- The context menu supports batch operations and score adjustment
BIN
__pycache__/selenium_scraper.cpython-38.pyc
Normal file
Binary file not shown.
BIN
__pycache__/sqlite_viewer.cpython-313.pyc
Normal file
Binary file not shown.
@@ -1,72 +0,0 @@
#!/usr/bin/env python3
"""
Script that adds the "interested" flag column.
Adds an is_interested column to the articles table, with a default value of 0.
"""

import sqlite3
import os
from loguru import logger

def add_interested_field():
    """Add the is_interested column to the articles table."""
    # Resolve the database path relative to this script's directory
    script_dir = os.path.dirname(os.path.abspath(__file__))
    db_path = os.path.join(script_dir, "tophub_data.db")

    # Check that the database file exists
    if not os.path.exists(db_path):
        logger.error(f"Database file does not exist: {db_path}")
        return False

    try:
        # Connect to the database
        conn = sqlite3.connect(db_path)
        cursor = conn.cursor()

        # Check whether the is_interested column already exists
        cursor.execute("PRAGMA table_info(articles)")
        columns = cursor.fetchall()
        column_names = [column[1] for column in columns]

        if "is_interested" in column_names:
            logger.info("is_interested column already exists; nothing to add")
            conn.close()
            return True

        # Add the is_interested column with a default value of 0
        logger.info("Adding the is_interested column...")
        cursor.execute("ALTER TABLE articles ADD COLUMN is_interested INTEGER DEFAULT 0")

        # Commit the change
        conn.commit()
        logger.info("Successfully added the is_interested column")

        # Verify that the column was added
        cursor.execute("PRAGMA table_info(articles)")
        columns = cursor.fetchall()
        column_names = [column[1] for column in columns]

        if "is_interested" in column_names:
            logger.info("Verification succeeded: is_interested column added to the articles table")
        else:
            logger.error("Verification failed: is_interested column was not added")
            conn.close()
            return False

        conn.close()
        return True

    except sqlite3.Error as e:
        logger.error(f"Database operation failed: {str(e)}")
        return False
    except Exception as e:
        logger.error(f"Error while adding the column: {str(e)}")
        return False

if __name__ == "__main__":
    logger.add("db_modify.log", rotation="10 MB", level="INFO")
    if add_interested_field():
        logger.info("Database modification complete")
    else:
        logger.error("Database modification failed")
2596
db_modify.log
File diff suppressed because it is too large
@@ -1,6 +0,0 @@
2025-11-07 23:49:35.277 | INFO | __main__:modify_database_structure:44 - Adding the score column...
2025-11-07 23:49:35.281 | INFO | __main__:modify_database_structure:48 - Converting is_interested data into the score column...
2025-11-07 23:49:35.288 | INFO | __main__:modify_database_structure:63 - Successfully added the score column and converted the data
2025-11-07 23:49:35.289 | INFO | __main__:modify_database_structure:71 - Verification succeeded: the score column has been added to the articles table
2025-11-07 23:49:35.289 | INFO | __main__:modify_database_structure:84 - Data conversion result: records with score=7: 1, records with score=5: 1196
2025-11-07 23:49:35.290 | INFO | __main__:<module>:99 - Database structure modification complete
1669
db_modify_zhipu.log
File diff suppressed because it is too large
@@ -114,7 +114,8 @@ class DatabaseViewer(QMainWindow):
# Set column widths
header = self.table.horizontalHeader()
header.setSectionResizeMode(0, QHeaderView.ResizeToContents)  # ID column
header.setSectionResizeMode(1, QHeaderView.Stretch)  # Text content column
header.setSectionResizeMode(1, QHeaderView.Interactive)  # Title column - user-resizable
self.table.setColumnWidth(1, 400)  # Default title column width: 400 pixels
header.setSectionResizeMode(2, QHeaderView.ResizeToContents)  # Link column
header.setSectionResizeMode(3, QHeaderView.ResizeToContents)  # Category column
header.setSectionResizeMode(4, QHeaderView.ResizeToContents)  # Time column
@@ -434,8 +435,8 @@ class DatabaseViewer(QMainWindow):
url = url_item.text() if url_item else ""
date = date_item.text() if date_item else ""

# Join the fields with spaces
info = f"{title} {url} {date}".strip()
# Combine the fields in the required format: "date title\nlink"
info = f"{date} {title}\n{url}".strip()
all_info.append(info)

# Join all entries with newlines
BIN
debug_maker_link_failure.png
Normal file
Binary file not shown.
Size: 526 KiB
@@ -1,55 +0,0 @@
#!/usr/bin/env python3
"""
Fix the method placement problem in db_viewer.py.
Moves the increase_score and decrease_score methods from the end of the file into the DatabaseViewer class.
"""

import re

def fix_db_viewer():
    """Fix the db_viewer.py file."""
    try:
        # Read the original file
        with open('db_viewer.py', 'r', encoding='utf-8') as f:
            content = f.read()

        # Locate the increase_score and decrease_score methods
        increase_score_match = re.search(r'\n\s*def increase_score\(self\):.*?(?=\n\s*def|\n\nclass|\n\ndef|\n\nif __name__|\Z)', content, re.DOTALL)
        decrease_score_match = re.search(r'\n\s*def decrease_score\(self\):.*?(?=\n\s*def|\n\nclass|\n\ndef|\n\nif __name__|\Z)', content, re.DOTALL)

        if not increase_score_match or not decrease_score_match:
            print("increase_score or decrease_score method not found")
            return False

        # Extract the method source
        increase_score_method = increase_score_match.group(0)
        decrease_score_method = decrease_score_match.group(0)

        # Remove both methods from the end of the file
        content = re.sub(r'\n\s*def increase_score\(self\):.*?(?=\n\s*def|\n\nclass|\n\ndef|\n\nif __name__|\Z)', '', content, flags=re.DOTALL)
        content = re.sub(r'\n\s*def decrease_score\(self\):.*?(?=\n\s*def|\n\nclass|\n\ndef|\n\nif __name__|\Z)', '', content, flags=re.DOTALL)

        # Find where mark_as_not_interested ends so the methods can be inserted after it
        mark_as_not_interested_match = re.search(r'(\n\s*def mark_as_not_interested\(self\):.*?(?=\n\s*def|\n\nclass|\n\ndef|\n\nif __name__|\Z))', content, re.DOTALL)

        if not mark_as_not_interested_match:
            print("mark_as_not_interested method not found")
            return False

        # Insert the methods after mark_as_not_interested
        insertion_point = mark_as_not_interested_match.end(1)
        new_content = content[:insertion_point] + increase_score_method + decrease_score_method + content[insertion_point:]

        # Write the fixed file
        with open('db_viewer.py', 'w', encoding='utf-8') as f:
            f.write(new_content)

        print("Successfully fixed db_viewer.py")
        return True

    except Exception as e:
        print(f"Error while fixing the file: {str(e)}")
        return False

if __name__ == "__main__":
    fix_db_viewer()
@@ -1,2 +0,0 @@
2025-11-07 23:39:42.157 | INFO | __main__:<module>:42 - GUI test started
2025-11-07 23:39:47.875 | INFO | __main__:close_app:30 - Test finished, closing the application
25084
integrated_product_system.log
Normal file
File diff suppressed because it is too large
155
jusuan.py
Normal file
@@ -0,0 +1,155 @@
# Ocean Engine Trend Insight (巨量算数), regional guide: scenic spot data
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from loguru import logger
import time
import json
import os

# Configure logging
logger.add("jusuan_scraper.log", rotation="10 MB", level="INFO")

def scrape_jusuan_data():
    """
    Scrape scenic spot data from the Trend Insight web page.
    """
    # Defined before the try block so the finally clause can always reference it
    driver = None
    try:
        # Configure Chrome options with the debugging port (must match the command-line port)
        chrome_options = Options()
        chrome_options.add_experimental_option("debuggerAddress", "localhost:9222")

        # Try to initialize the WebDriver
        logger.info("Connecting to the Chrome browser...")

        # Method 1: try the default Chrome driver
        try:
            driver = webdriver.Chrome(options=chrome_options)
            logger.info("Connected using the default Chrome driver")
        except Exception as e:
            logger.warning(f"Default Chrome driver failed: {str(e)}")

            # Method 2: try webdriver-manager to manage the driver automatically
            try:
                from webdriver_manager.chrome import ChromeDriverManager
                service = Service(ChromeDriverManager().install())
                driver = webdriver.Chrome(service=service, options=chrome_options)
                logger.info("Connected using webdriver-manager")
            except Exception as e2:
                logger.warning(f"webdriver-manager failed: {str(e2)}")

                # Method 3: try common ChromeDriver paths
                common_paths = [
                    r"C:\chromedriver\chromedriver.exe",
                    r"C:\Program Files\Google\Chrome\Application\chromedriver.exe",
                    r"C:\Program Files (x86)\Google\Chrome\Application\chromedriver.exe",
                    os.path.join(os.getcwd(), "chromedriver.exe")
                ]

                for path in common_paths:
                    if os.path.exists(path):
                        try:
                            service = Service(path)
                            driver = webdriver.Chrome(service=service, options=chrome_options)
                            logger.info(f"Connected using path {path}")
                            break
                        except Exception as e3:
                            logger.warning(f"Path {path} failed: {str(e3)}")
                            continue

        if driver is None:
            raise Exception("All methods of connecting to the Chrome browser failed")

        # Open the target page
        target_url = "https://trendinsight.oceanengine.com/area?dates=daily-20251112_weekly-20251109_monthly-202510&area=%5B%2211%22%5D&category_id=3&rankStyle=monthly"
        logger.info(f"Opening page: {target_url}")
        driver.get(target_url)

        # Wait for the page to load
        logger.info("Waiting for the page to finish loading...")
        WebDriverWait(driver, 15).until(
            EC.presence_of_element_located((By.CLASS_NAME, "byted-table-body"))
        )

        # Get the table body
        table_body = driver.find_element(By.CLASS_NAME, "byted-table-body")
        logger.info("Found the table body; starting to scrape data...")

        # Get all rows
        rows = table_body.find_elements(By.TAG_NAME, "div")
        logger.info(f"Found {len(rows)} rows")

        # Holds the scraped data
        scraped_data = []

        # Iterate over each row
        for i, row in enumerate(rows):
            try:
                # Find the scenic spot name (class contains "poiTitle-")
                poi_title_element = row.find_element(By.CSS_SELECTOR, '[class*="poiTitle-"]')
                poi_name = poi_title_element.text.strip()

                # Find the scenic spot category (class contains "categoryIconBox-")
                category_element = row.find_element(By.CSS_SELECTOR, '[class*="categoryIconBox-"]')
                category = category_element.text.strip()

                # Find the popularity index value (class contains "numberValue-")
                heat_index_element = row.find_element(By.CSS_SELECTOR, '[class*="numberValue-"]')
                heat_index = heat_index_element.text.strip()

                # Append the record; keys are index, spot name, spot category, popularity index
                data_entry = {
                    "序号": i + 1,
                    "景区名称": poi_name,
                    "景区分类": category,
                    "热度指数": heat_index
                }
                scraped_data.append(data_entry)

                logger.info(f"Scraped record {i+1}: {poi_name} | {category} | {heat_index}")

            except Exception as e:
                logger.error(f"Error while processing row {i+1}: {str(e)}")
                continue

        # Save the data as a JSON file
        output_file = "jusuan_scenic_spots_data.json"
        with open(output_file, "w", encoding="utf-8") as f:
            json.dump(scraped_data, f, ensure_ascii=False, indent=2)

        logger.info(f"Scraping finished: {len(scraped_data)} records saved to {output_file}")

        # Log the first 5 records as a preview
        logger.info("Preview of the first 5 records:")
        for i, data in enumerate(scraped_data[:5]):
            logger.info(f"{i+1}. {data['景区名称']} | {data['景区分类']} | {data['热度指数']}")

        return scraped_data

    except Exception as e:
        logger.error(f"Error during scraping: {str(e)}")
        return None
    finally:
        # Close the browser connection (but not the browser itself)
        if driver:
            try:
                driver.quit()
            except Exception:
                pass

if __name__ == "__main__":
    logger.info("Starting Trend Insight scenic spot scraping...")
    logger.info("Make sure Chrome has been started with the following command:")
    logger.info('"C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe" --remote-debugging-port=9222 --user-data-dir="C:\\tmp"')

    result = scrape_jusuan_data()
    if result:
        logger.info("Scraping task complete")
    else:
        logger.error("Scraping task failed")
        logger.info("Try installing webdriver-manager: pip install webdriver-manager")
        logger.info("Or download ChromeDriver manually and place it on the system PATH")
BIN
modal_window_debug.png
Normal file
Binary file not shown.
Size: 231 KiB
@@ -1,79 +0,0 @@
import sys
import requests
import json
from PySide6.QtWidgets import QApplication, QMainWindow, QListWidget, QVBoxLayout, QWidget, QLabel, QPushButton
from PySide6.QtCore import Qt
from loguru import logger

class OllamaModelViewer(QMainWindow):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("Ollama Model Viewer")
        self.setGeometry(100, 100, 600, 400)

        # Create the central widget
        self.central_widget = QWidget()
        self.setCentralWidget(self.central_widget)

        # Create the layout
        self.layout = QVBoxLayout()
        self.central_widget.setLayout(self.layout)

        # Create the title label
        self.title_label = QLabel("Installed Ollama models:")
        self.title_label.setStyleSheet("font-weight: bold; font-size: 14px;")
        self.layout.addWidget(self.title_label)

        # Create the list widget
        self.model_list = QListWidget()
        self.model_list.setStyleSheet("font-family: monospace;")
        self.layout.addWidget(self.model_list)

        # Create the refresh button
        self.refresh_button = QPushButton("Refresh model list")
        self.refresh_button.clicked.connect(self.fetch_models)
        self.layout.addWidget(self.refresh_button)

        # Load the models initially
        self.fetch_models()

    def fetch_models(self):
        """Fetch the model list from the Ollama API."""
        self.model_list.clear()

        try:
            logger.info("Fetching the Ollama model list...")
            response = requests.get("http://localhost:11434/api/tags", timeout=5)

            if response.status_code == 200:
                data = response.json()
                models = data.get("models", [])

                if models:
                    for model in models:
                        model_name = model.get("model", "")
                        if model_name:
                            self.model_list.addItem(model_name)
                            logger.info(f"Found model: {model_name}")
                else:
                    self.model_list.addItem("No models found")
                    logger.info("No models found")
            else:
                self.model_list.addItem(f"API request failed, status code: {response.status_code}")
                logger.error(f"API request failed, status code: {response.status_code}")

        except requests.exceptions.RequestException as e:
            self.model_list.addItem("Unable to connect to the Ollama API")
            logger.error(f"Unable to connect to the Ollama API: {str(e)}")
        except json.JSONDecodeError as e:
            self.model_list.addItem("Malformed API response")
            logger.error(f"Malformed API response: {str(e)}")
        except Exception as e:
            self.model_list.addItem(f"An error occurred: {str(e)}")
            logger.error(f"Unknown error: {str(e)}")

if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = OllamaModelViewer()
    window.show()
    sys.exit(app.exec())
213
product/README.md
Normal file
@@ -0,0 +1,213 @@
# Full-Featured Product Scraping and Analysis System

This is a complete system that integrates product scraping and AI analysis, merging the former `integrated_scraper.py` and `product_ai_analysis.py` into a single unified system.

## Features

### Data Scraping
- Queries ProductHunt links from the tophub_data.db database
- Uses playwright to connect to a Chrome browser and scrape product information
- Deduplicates automatically to avoid re-scraping
- Supports batch scraping with progress display
- Saves product information to the products table

### AI Analysis
- Calls the Ollama AI API (qwen3:8b model) to assess product development difficulty
- Automatically parses the AI response to extract the product name, introduction, and development difficulty
- Saves analysis results to the product_analysis table
- Supports resuming analysis where it left off, avoiding repeated analysis
- Adds automatic delays to avoid overloading the API

### System Features
- Unified configuration management (config.py)
- Comprehensive logging (loguru)
- Progress bars (tqdm)
- Error handling and retry mechanism
- Modular design that is easy to extend

## File Structure

```
product/
├── integrated_product_system.py # Main system file (core functionality)
├── run_system.py # Simplified command-line interface
├── config.py # Configuration file
├── README.md # Usage instructions
└── playwright-get-data.py # playwright scraping module (dependency)
```

## Usage

### 1. Basic Usage (Full Mode)
```bash
# Run the full workflow (scraping + analysis)
python run_system.py --mode full

# Or use the main system file
python integrated_product_system.py
```

### 2. Scraping-Only Mode
```bash
# Run scraping only
python run_system.py --mode scraping

# Limit the number of items scraped
python run_system.py --mode scraping --limit 50

# Do not skip duplicate URLs
python run_system.py --mode scraping --no-skip-duplicates
```

### 3. Analysis-Only Mode
```bash
# Run AI analysis only
python run_system.py --mode analysis

# Limit the number of products analyzed
python run_system.py --mode analysis --max-products 100
```

### 4. Advanced Options
```bash
# Specify database paths
python run_system.py --tophub-db /path/to/tophub_data.db --product-db /path/to/products.db

# Specify the Chrome debugging port
python run_system.py --debug-port 9222

# Specify the log file and level
python run_system.py --log-file my_log.log --log-level DEBUG

# Scrape specific URLs
python run_system.py --mode scraping --urls https://www.producthunt.com/posts/example-product
```

|
||||
## 更新日志
|
||||
|
||||
### v1.0.1 (当前版本)
|
||||
- ✅ 修复异步调用问题,支持在已有事件循环中运行
|
||||
- ✅ 优化错误处理和事件循环管理
|
||||
- ✅ 测试验证所有运行模式正常工作
|
||||
|
||||
### v1.0.0
|
||||
- ✨ 合并integrated_scraper.py和product_ai_analysis.py功能
|
||||
- ✨ 添加统一的配置管理
|
||||
- ✨ 提供简化的命令行界面
|
||||
- ✨ 增强错误处理和日志记录
|
||||
- ✨ 支持多种运行模式
|
||||
|
||||
## 数据库结构
|
||||
|
||||
### products表(产品信息)
|
||||
```sql
|
||||
CREATE TABLE products (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
url TEXT NOT NULL UNIQUE,
|
||||
name TEXT,
|
||||
introduction TEXT,
|
||||
user_count TEXT,
|
||||
maker_link TEXT,
|
||||
maker_statement TEXT,
|
||||
created_at TEXT NOT NULL,
|
||||
updated_at TEXT NOT NULL
|
||||
);
|
||||
```
|
||||
|
||||
### product_analysis table (AI analysis results)
```sql
CREATE TABLE product_analysis (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    original_id INTEGER,
    original_name TEXT,
    product_name TEXT,
    product_intro TEXT,
    development_difficulty TEXT,
    difficulty_score INTEGER,
    ai_response TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (original_id) REFERENCES products (id)
);
```

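As a rough illustration of how the two tables relate, the sketch below builds an in-memory copy of the schema and joins `product_analysis` back to `products` through `original_id`. The helper names (`build_demo_db`, `fetch_analysis_report`) and the sample rows, including the difficulty labels `easy` and `hard`, are made up for the example and are not part of the project.

```python
import sqlite3

# Schema copied from the two CREATE TABLE statements in this README.
SCHEMA = """
CREATE TABLE products (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT NOT NULL UNIQUE,
    name TEXT,
    introduction TEXT,
    user_count TEXT,
    maker_link TEXT,
    maker_statement TEXT,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL
);
CREATE TABLE product_analysis (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    original_id INTEGER,
    original_name TEXT,
    product_name TEXT,
    product_intro TEXT,
    development_difficulty TEXT,
    difficulty_score INTEGER,
    ai_response TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (original_id) REFERENCES products (id)
);
"""


def build_demo_db():
    """Create an in-memory database with two hypothetical products."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    now = "2024-01-01 00:00:00"  # placeholder timestamp
    conn.executemany(
        "INSERT INTO products (url, name, created_at, updated_at) VALUES (?, ?, ?, ?)",
        [("https://example.com/a", "Demo A", now, now),
         ("https://example.com/b", "Demo B", now, now)],
    )
    conn.executemany(
        "INSERT INTO product_analysis "
        "(original_id, original_name, development_difficulty, difficulty_score) "
        "VALUES (?, ?, ?, ?)",
        [(1, "Demo A", "easy", 2), (2, "Demo B", "hard", 8)],
    )
    conn.commit()
    return conn


def fetch_analysis_report(conn):
    """Return (name, url, difficulty, score) rows, highest score first."""
    query = """
        SELECT p.name, p.url, a.development_difficulty, a.difficulty_score
        FROM product_analysis AS a
        JOIN products AS p ON p.id = a.original_id
        ORDER BY a.difficulty_score DESC, p.id
    """
    return conn.execute(query).fetchall()
```

Against a real `products.db`, only `fetch_analysis_report(sqlite3.connect("products.db"))` would be needed; the demo database exists so the query can be tried without scraped data.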
## Configuration

Edit `config.py` to change the system configuration:

- **DATABASE_CONFIG**: database paths
- **CHROME_CONFIG**: Chrome browser settings
- **AI_CONFIG**: AI API settings (Ollama)
- **SCRAPING_CONFIG**: scraping settings
- **LOGGING_CONFIG**: logging settings
- **ANALYSIS_CONFIG**: analysis settings

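One way to layer per-run overrides on top of these dictionaries is sketched below. `with_overrides` is a hypothetical helper, not something the project ships, and `SCRAPING_DEFAULTS` simply mirrors the `SCRAPING_CONFIG` values in `config.py`. The sketch compares against `None` instead of using `value or default`, so that explicit `0` or `False` overrides are not silently replaced by the defaults.

```python
# Mirrors SCRAPING_CONFIG in config.py; duplicated here so the sketch is self-contained.
SCRAPING_DEFAULTS = {
    "default_limit": 0,  # 0 means no limit
    "skip_duplicates": True,
    "batch_size": 10,
    "delay_between_requests": 2,
}


def with_overrides(defaults, **overrides):
    """Return a copy of `defaults` where every non-None override wins.

    Hypothetical helper: unknown keys raise KeyError so typos surface early.
    """
    merged = dict(defaults)
    for key, value in overrides.items():
        if key not in defaults:
            raise KeyError(f"unknown setting: {key}")
        if value is not None:  # None means "not provided", keep the default
            merged[key] = value
    return merged
```

For example, `with_overrides(SCRAPING_DEFAULTS, skip_duplicates=False)` keeps every default except `skip_duplicates`.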
## System requirements

- Python 3.7+
- Chrome browser (running, with the remote debugging port enabled)
- Ollama service (running, with the qwen3:8b model installed)
- SQLite database

## Dependencies

```bash
pip install loguru tqdm requests playwright
```

## Running the system

1. **Make sure Chrome is running with the remote debugging port enabled**
   ```bash
   # Windows
   chrome.exe --remote-debugging-port=9222

   # macOS
   /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
   ```

2. **Make sure the Ollama service is running**
   ```bash
   # Start the Ollama service
   ollama serve

   # Install the qwen3:8b model (if it is not installed yet)
   ollama pull qwen3:8b
   ```

3. **Make sure the tophub_data.db database exists**
   - The database must contain an articles table with a url column

4. **Run the system**
   ```bash
   python run_system.py
   ```

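Before step 4, it can help to confirm that both services from steps 1 and 2 are actually listening. The following is a minimal pre-flight sketch, not part of the project: `port_is_open` and `preflight` are hypothetical names, port 9222 is the default debugging port used in this README, and 11434 is the port that appears in the `api_url` setting of `config.py`.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable
        return False


def preflight(chrome_port=9222, ollama_port=11434, host="localhost"):
    """Check the Chrome debugging port and the Ollama API port."""
    return {
        "chrome": port_is_open(host, chrome_port),
        "ollama": port_is_open(host, ollama_port),
    }
```

Calling `preflight()` returns a dictionary such as `{"chrome": True, "ollama": False}`. A `False` entry only shows that nothing accepted a connection on that port; it does not tell you why.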
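## FAQ

### Q: The system reports that it failed to connect to Chrome?
A: Make sure Chrome is running with the remote debugging port enabled (9222 by default).

### Q: AI analysis reports that the API call failed?
A: Make sure the Ollama service is running and the qwen3:8b model is installed.
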
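To tell an Ollama problem apart from a problem in the analysis code, a direct request to the endpoint configured in `config.py` can be useful. The sketch below uses only the standard library; `build_generate_request` and `ask_ollama` are hypothetical helper names, and the prompt in the usage note is arbitrary. It relies on Ollama's `/api/generate` endpoint returning a JSON object with a `response` field when `stream` is `false`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # same value as AI_CONFIG['api_url']
MODEL = "qwen3:8b"  # same value as AI_CONFIG['model']


def build_generate_request(prompt, model=MODEL, url=OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(prompt, timeout=60):
    """Send the prompt and return the generated text (requires a running Ollama)."""
    with urllib.request.urlopen(build_generate_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Running `ask_ollama("Reply with the single word: ok")` from a Python shell should return some text if the service and model are available; a connection error points at the service, while an HTTP error usually points at the model name.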
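### Q: How can I monitor scraping and analysis progress?
A: The system displays a progress bar automatically and also writes detailed information to the log file.

### Q: How do I analyze only a specific number of products?
A: Use the `--max-products` option, for example: `python run_system.py --max-products 50`

### Q: How do I re-analyze products that have already been analyzed?
A: By default the system skips products that have already been analyzed. To re-analyze them, delete the corresponding records from the product_analysis table.
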
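Deleting the records by hand could look roughly like the sketch below. `clear_analysis` is a hypothetical helper, and it assumes that "corresponding records" means the `product_analysis` rows whose `original_id` matches the `products.id` values you want to re-analyze, which is what the foreign key in the schema suggests.

```python
import sqlite3


def clear_analysis(conn, product_ids):
    """Delete product_analysis rows for the given products.id values.

    Returns the number of rows deleted. Assumes `original_id` links an
    analysis row back to its product, as the schema's foreign key suggests.
    """
    ids = list(product_ids)
    if not ids:
        return 0
    placeholders = ",".join("?" for _ in ids)
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            f"DELETE FROM product_analysis WHERE original_id IN ({placeholders})",
            ids,
        )
    return cursor.rowcount
```

With the default database location, `clear_analysis(sqlite3.connect("products.db"), [3, 7])` would make the products with ids 3 and 7 eligible for analysis again on the next run. Backing up the database file first is a sensible precaution.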
## Support

If you run into problems, check the log file for details, or verify that the system configuration is correct.
BIN
product/__pycache__/config.cpython-313.pyc
Normal file
Binary file not shown.
BIN
product/__pycache__/integrated_product_system.cpython-313.pyc
Normal file
Binary file not shown.
BIN
product/__pycache__/playwright-get-data.cpython-313.pyc
Normal file
Binary file not shown.
BIN
product/__pycache__/web_sqlite_viewer.cpython-313.pyc
Normal file
Binary file not shown.
52
product/config.py
Normal file
@@ -0,0 +1,52 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Configuration file for the integrated product system
"""

import os

# Database configuration
DATABASE_CONFIG = {
    'tophub_db_path': os.path.join(os.path.dirname(os.path.dirname(__file__)), "tophub_data.db"),
    'product_db_path': os.path.join(os.path.dirname(__file__), "products.db"),
}

# Chrome debugging configuration
CHROME_CONFIG = {
    'debug_port': 9222,
    'headless': False,
    'timeout': 30,
}

# AI analysis configuration
AI_CONFIG = {
    'api_url': "http://localhost:11434/api/generate",
    'model': "qwen3:8b",
    'timeout': 60,
    'retry_count': 3,
    'retry_delay': 5,
}

# Scraping configuration
SCRAPING_CONFIG = {
    'default_limit': 0,  # 0 means no limit
    'skip_duplicates': True,
    'batch_size': 10,
    'delay_between_requests': 2,
}

# Logging configuration
LOGGING_CONFIG = {
    'log_file': "integrated_product_system.log",
    'log_level': "INFO",
    'log_rotation': "10 MB",
    'log_format': "<green>{time:YYYY-MM-DD HH:mm:ss}</green> | <level>{level: <8}</level> | <cyan>{name}</cyan>:<cyan>{function}</cyan>:<cyan>{line}</cyan> - <level>{message}</level>",
}

# Analysis configuration
ANALYSIS_CONFIG = {
    'max_products': None,  # None means analyze all products
    'batch_size': 1,  # number of products analyzed per batch
    'delay_between_analyses': 2,  # delay between analyses (seconds)
}
1121
product/integrated_product_system.py
Normal file
File diff suppressed because it is too large
656
product/playwright-get-data.py
Normal file
@@ -0,0 +1,656 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Use Playwright to connect to a remote Chrome debugging port and visit Product Hunt pages
"""

import asyncio
from playwright.async_api import async_playwright
from loguru import logger
import sys
from datetime import datetime

# Configure logging
logger.remove()
logger.add(sys.stderr, level="INFO", format="<green>{time:YYYY-MM-DD HH:mm:ss}</green> | <level>{level: <8}</level> | <cyan>{name}</cyan>:<cyan>{function}</cyan>:<cyan>{line}</cyan> - <level>{message}</level>")


class ProductHuntScraper:
    """Product Hunt data scraper"""

    def __init__(self, debug_port=9222):
        self.debug_port = debug_port
        self.browser = None
        self.page = None
        self.click_records = []  # recorded click actions
        self.dom_selection_records = []  # recorded DOM selection actions

    async def connect_to_existing_chrome(self):
        """Connect to an already running Chrome instance"""
        logger.info(f"Connecting to Chrome remote debugging port {self.debug_port}")

        try:
            # Create the Playwright instance and keep a reference to it
            self.playwright = await async_playwright().start()

            # Connect to the running Chrome instance
            self.browser = await self.playwright.chromium.connect_over_cdp(
                f"http://localhost:{self.debug_port}"
            )

            # Use the first context (usually the default one)
            contexts = self.browser.contexts
            if contexts:
                context = contexts[0]
                # Use the first page
                pages = context.pages
                if pages:
                    self.page = pages[0]
                else:
                    # No page exists yet, so create one
                    self.page = await context.new_page()
            else:
                # No context exists yet, so create one
                context = await self.browser.new_context()
                self.page = await context.new_page()

            logger.success("Connected to the Chrome browser")
            return True

        except Exception as e:
            logger.error(f"Failed to connect to Chrome: {e}")
            return False

    async def record_click(self, x, y, selector="", description=""):
        """Record a click action"""
        click_record = {
            "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
            "type": "click",
            "x": x,
            "y": y,
            "selector": selector,
            "description": description
        }
        self.click_records.append(click_record)
        logger.info(f"Recorded click: {description} - coordinates ({x}, {y}) - selector: {selector}")

    async def record_dom_selection(self, selector, description=""):
        """Record a DOM selection action"""
        dom_record = {
            "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
            "type": "dom_selection",
            "selector": selector,
            "description": description
        }
        self.dom_selection_records.append(dom_record)
        logger.info(f"Recorded DOM selection: {description} - selector: {selector}")

    async def save_behavior_records(self):
        """Save the recorded actions to a file"""
        import json

        records = {
            "click_records": self.click_records,
            "dom_selection_records": self.dom_selection_records
        }

        filename = f"playwright_behavior_records_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"

        with open(filename, "w", encoding="utf-8") as f:
            json.dump(records, f, ensure_ascii=False, indent=2)

        logger.success(f"Behavior records saved to: {filename}")

    async def navigate_to_producthunt(self, url):
        """Navigate to a Product Hunt page"""
        if not self.page:
            logger.error("Page is not initialized")
            return False

        try:
            logger.info(f"Visiting: {url}")
            # Allow up to 300 seconds for page navigation
            await self.page.goto(url, wait_until="domcontentloaded", timeout=300000)

            # Poll until the page title contains "Product Hunt" (at most max_wait_time seconds)
            logger.info("Waiting for the page title to contain 'Product Hunt'...")
            max_wait_time = 60  # maximum wait time (seconds)
            wait_interval = 5  # polling interval (seconds)
            waited_time = 0

            while waited_time < max_wait_time:
                # Read the page title
                title = await self.page.title()
                logger.info(f"Current page title: {title}")

                # Check whether the title contains "Product Hunt"
                if "Product Hunt" in title:
                    logger.success(f"Page title contains 'Product Hunt' after waiting {waited_time} seconds")
                    logger.success("Product Hunt opened successfully")
                    return True

                # Check for a Cloudflare challenge ("请稍候" is the Chinese-language challenge title)
                if "Just a moment" in title or "请稍候" in title or "Checking your browser" in title:
                    logger.info("Cloudflare challenge detected, waiting for it to finish...")
                    await asyncio.sleep(10)  # wait 10 seconds
                    waited_time += 10
                    continue

                # Check whether the page content has loaded
                try:
                    # Look for a key element on the page
                    h1_element = await self.page.query_selector("h1")
                    if h1_element:
                        logger.success("Page content appears to be loaded")
                        return True
                except Exception:
                    pass

                # Wait a while before checking again
                await asyncio.sleep(wait_interval)
                waited_time += wait_interval
                logger.info(f"Waited {waited_time} seconds so far, still waiting...")

            # Timed out without seeing the expected title
            logger.warning(f"Timed out after {max_wait_time} seconds; the page title still does not contain 'Product Hunt'")
            logger.info(f"Final page title: {await self.page.title()}")

            # Even after a timeout, return True if the page seems to have loaded normally
            final_title = await self.page.title()
            if final_title and "Not Found" not in final_title and "Error" not in final_title:
                logger.success("Page loaded, but the title is not the expected one")
                return True
            else:
                logger.error("Page failed to load")
                return False

        except Exception as e:
            logger.error(f"Failed to visit the page: {e}")
            return False

    async def extract_maker_statement_from_current_window(self, maker_link, maker_text):
        """Extract the maker's statement in the current window"""
        if not maker_link:
            logger.warning("Maker link is empty")
            return ""

        if not self.page:
            logger.error("Current page is not initialized")
            return ""

        try:
            # Record the click on the maker link. There are no coordinates because the
            # link is opened via navigation, so x and y are passed as None.
            await self.record_click(None, None, selector="maker link", description="Open the maker link in the current window")

            # Remember the current page URL so we can return to it later
            original_url = self.page.url
            logger.info(f"Saved current page URL: {original_url}")

            # Navigate to the maker link in the current page
            logger.info(f"Opening the maker link in the current window: {maker_link}")

            # Use a longer timeout to cope with modal windows
            try:
                await self.page.goto(maker_link, wait_until="domcontentloaded", timeout=60000)
                logger.success("Page navigation succeeded")
            except Exception as e:
                logger.error(f"Page navigation failed: {e}")
                # Try to return to the original page
                try:
                    await self.page.goto(original_url, wait_until="domcontentloaded")
                    logger.success(f"Returned to the original page: {original_url}")
                except Exception as return_error:
                    logger.error(f"Failed to return to the original page: {return_error}")
                return ""

            # Wait for the page to load
            await self.page.wait_for_load_state("networkidle")

            # Detect and handle a possible modal window
            try:
                logger.info("Checking for a modal window...")
                modal_selectors = [
                    "[role='dialog']",
                    ".modal",
                    ".modal-dialog",
                    "[data-testid='modal']",
                    "[class*='modal']",
                    "[class*='overlay']",
                    "[class*='dialog']",
                    "[class*='popup']"
                ]

                for selector in modal_selectors:
                    try:
                        modal_element = await self.page.query_selector(selector)
                        if modal_element:
                            logger.info(f"Modal window detected, selector: {selector}")

                            # Try to close the modal window
                            close_selectors = [
                                "[aria-label='Close']",
                                ".close",
                                ".modal-close",
                                "[data-testid='close']",
                                "button:has-text('Close')",
                                "button:has-text('关闭')",
                                "button:has-text('X')"
                            ]

                            for close_selector in close_selectors:
                                try:
                                    close_button = await modal_element.query_selector(close_selector)
                                    if close_button:
                                        await close_button.click()
                                        logger.success(f"Closed the modal window using selector: {close_selector}")
                                        await self.page.wait_for_timeout(1000)  # wait for the close animation
                                        break
                                except Exception:
                                    continue

                            # Also click outside the modal in case it is still open
                            try:
                                await self.page.mouse.click(10, 10)  # click the top-left corner of the page
                                logger.info("Clicked outside the modal window to try to close it")
                                await self.page.wait_for_timeout(1000)
                            except Exception:
                                pass

                            break
                    except Exception:
                        continue
            except Exception as e:
                logger.warning(f"Error while checking for a modal window: {e}")

            # Quick check of whether the page has loaded
            logger.info("Quickly checking the page load state...")

            # Read the page title right away, without waiting for a specific element
            try:
                title_text = await self.page.title()
                logger.info(f"Page title: {title_text}")
            except Exception as e:
                logger.warning(f"Failed to get the page title: {e}")

            # Quick check of whether the page has any content
            try:
                body_element = await self.page.query_selector("body")
                if body_element:
                    # text_content() may return None, so fall back to an empty string
                    body_text = (await body_element.text_content()) or ""
                    if len(body_text.strip()) > 10:
                        logger.success("Page content has loaded")
                    else:
                        logger.warning("Page content is empty or too short")
            except Exception as e:
                logger.warning(f"Failed to check the page content: {e}")

            # Short pause to let the DOM settle
            logger.info("Waiting for the DOM to settle...")
            await self.page.wait_for_timeout(2000)  # wait 2 seconds

            # Save a screenshot of the modal window for debugging
            modal_screenshot = "modal_window_debug.png"
            await self.page.screenshot(path=modal_screenshot, full_page=True)
            logger.info(f"Modal window debug screenshot saved to: {modal_screenshot}")

            # First inspect the page content
            try:
                page_content = await self.page.content()
                logger.info("Page content retrieved")

                # Check whether the page contains common keywords
                keywords = ['comment', 'discussion', 'maker', 'creator', 'author', 'statement', 'description']
                found_keywords = [kw for kw in keywords if kw in page_content.lower()]
                if found_keywords:
                    logger.info(f"Page contains keywords: {found_keywords}")
                else:
                    logger.warning("No common keywords detected on the page")

            except Exception as e:
                logger.error(f"Failed to get the page content: {e}")

            # Extract the maker's comment, trying several selector strategies for the modal window
            logger.info("Extracting the maker's comment...")

            # Strategy 1: try a series of XPath selectors
            xpath_selectors = [
                # New primary selector: a div with the prose, prose-format and richText classes
                "//div[contains(@class, 'prose') and contains(@class, 'prose-format') and contains(@class, 'richText')]",
                # Fallback selectors
                '//*[@id="comment-4597755"]/div/div[2]/div/div/div',  # original selector
                '//div[contains(@class, "comment")]//div[contains(@class, "text")]',  # generic comment selector
                '//div[contains(@class, "modal")]//div[contains(@class, "content")]',  # modal window content
                '//div[contains(@class, "dialog")]//div[contains(@class, "body")]',  # dialog content
                '//section//div[contains(@class, "text")]',  # text inside a section
                '//div[contains(@class, "launch")]//div[contains(@class, "description")]',  # launch description
                '//article//div[contains(@class, "content")]',  # article content
                '//main//div[contains(@class, "text")]',  # text in the main content area
                # Other fallback selectors
                "//div[contains(@class, 'styles_commentsContainer')]//div[contains(@class, 'styles_comment')]//div[contains(@class, 'styles_commentBody')]//p",
                "//div[contains(@class, 'comment')]//p",
                "//div[contains(@class, 'comments')]//p",
            ]

            for i, xpath in enumerate(xpath_selectors, 1):
                try:
                    logger.info(f"Trying selector {i}/{len(xpath_selectors)}: {xpath}")
                    comment_element = await self.page.query_selector(f'xpath={xpath}')
                    if comment_element:
                        maker_statement = (await comment_element.text_content()).strip()
                        if maker_statement:  # make sure there is content
                            logger.success(f"Selector {i} extracted the maker's comment: {maker_statement[:200]}...")

                            # Return to the original page once extraction is done
                            logger.info("Extraction finished, returning to the original product page...")
                            await self.page.goto(original_url, wait_until="domcontentloaded")
                            logger.success(f"Returned to the original page: {original_url}")

                            return maker_statement
                        else:
                            logger.warning(f"Selector {i} matched but the content is empty")
                except Exception as e:
                    logger.warning(f"Selector {i} failed: {e}")

            # Strategy 2: if every selector failed, fall back to the main text of the page
            logger.info("All selectors failed, trying to extract the main text of the page...")
            try:
                # Get the text of the page body
                body_element = await self.page.query_selector('body')
                if body_element:
                    full_text = (await body_element.text_content()).strip()
                    # Use the first 500 characters as the maker's statement
                    if len(full_text) > 100:
                        maker_statement = full_text[:500]
                        logger.info(f"Extracted the main text of the page: {maker_statement[:200]}...")

                        # Return to the original page once extraction is done
                        logger.info("Extraction finished, returning to the original product page...")
                        await self.page.goto(original_url, wait_until="domcontentloaded")
                        logger.success(f"Returned to the original page: {original_url}")

                        return maker_statement
            except Exception as e:
                logger.error(f"Failed to extract the main text of the page: {e}")

            # Strategy 3: if that also failed, save a screenshot for debugging
            logger.warning("All extraction strategies failed, saving a screenshot for debugging...")
            try:
                screenshot_path = "modal_debug_screenshot.png"
                await self.page.screenshot(path=screenshot_path, full_page=True)
                logger.info(f"Modal window screenshot saved to: {screenshot_path}")
            except Exception as e:
                logger.error(f"Failed to save the screenshot: {e}")

            # Return to the original page even though nothing was found
            logger.info("Returning to the original product page...")
            await self.page.goto(original_url, wait_until="domcontentloaded")
            logger.success(f"Returned to the original page: {original_url}")

            return ""

        except Exception as e:
            logger.error(f"Failed to open the maker link in the current window: {e}")

            # Save a screenshot of the current page for debugging
            try:
                debug_screenshot = "debug_maker_link_failure.png"
                await self.page.screenshot(path=debug_screenshot, full_page=True)
                logger.info(f"Error debug screenshot saved to: {debug_screenshot}")
            except Exception as screenshot_error:
                logger.error(f"Failed to save the debug screenshot: {screenshot_error}")

            # Also try to return to the original page after an exception
            try:
                logger.info("An exception occurred, trying to return to the original product page...")
                await self.page.goto(original_url, wait_until="domcontentloaded")
                logger.success(f"Returned to the original page: {original_url}")
            except Exception as return_error:
                logger.error(f"Failed to return to the original page: {return_error}")

            return ""

    async def _extract_maker_statement_direct_open(self, maker_link, maker_text):
        """Fallback method: open the link directly in a new window"""
        try:
            logger.info("Using the fallback method: opening the link directly in a new window...")
            # Create a new page
            new_page = await self.browser.new_page()

            # Navigate to the maker page
            await new_page.goto(maker_link, wait_until="domcontentloaded", timeout=30000)

            # Wait for the page to load
            await new_page.wait_for_timeout(15000)
            logger.info("Finished waiting for the page to load, starting extraction...")

            # Grab the first section tag
            await self.record_dom_selection('section', "Fallback method: first section tag in the new window")
            first_section = await new_page.query_selector('section')
            if first_section:
                logger.success("Found the first section tag")

                # Inside the section, look for a div without any class
                await self.record_dom_selection('div:not([class])', "Fallback method: class-less div inside the section")
                div_without_class = await first_section.query_selector('div:not([class])')
                if div_without_class:
                    logger.success("Found a div without a class")

                    # Extract all text from the div and its descendants
                    maker_statement = await div_without_class.inner_text()
                    result = maker_statement.strip()

                    logger.info(f"Maker statement (new window): {result[:2000]}...")
                else:
                    logger.warning("No div without a class was found")
                    # Fall back to the text of the section
                    section_text = await first_section.inner_text()
                    result = section_text.strip()
                    logger.info(f"Maker statement (section fallback): {result[:200]}...")
            else:
                logger.warning("No section tag was found")
                # Fall back to the text of the original <a> tag
                result = maker_text
                logger.info(f"Maker statement (<a> tag fallback): {maker_text[:200]}...")

            # Extra delay to make sure the content is fully loaded
            logger.info("Waiting for the content to settle...")
            await new_page.wait_for_timeout(3000)

            # Close the new page
            await new_page.close()
            logger.info("New window closed")

            return result

        except Exception as e:
            logger.error(f"The fallback method also failed: {e}")
            # If the fallback method fails too, return the text of the original <a> tag
            return maker_text

    async def extract_product_info(self):
        """Extract the product information"""
        if not self.page:
            logger.error("Page is not initialized")
            return None

        try:
            product_info = {}

            # Extract the product name (XPath: //h1)
            logger.info("Extracting the product name...")
            try:
                await self.record_dom_selection("//h1", "Product name")
                name_element = await self.page.query_selector("xpath=//h1")
                if name_element:
                    product_info["name"] = (await name_element.text_content()).strip()
                    logger.info(f"Product name: {product_info['name']}")
                else:
                    logger.warning("No element found for XPath //h1")
            except Exception as e:
                logger.error(f"Failed to extract the product name: {e}")

            # Extract the product introduction (XPath: //*[@class=\"relative text-16 font-normal text-gray-700\"]//div)
            logger.info("Extracting the product introduction...")
            try:
                await self.record_dom_selection('//*[@class="relative text-16 font-normal text-gray-700"]//div', "Product introduction")
                intro_element = await self.page.query_selector('xpath=//*[@class="relative text-16 font-normal text-gray-700"]//div')
                if intro_element:
                    product_info["introduction"] = (await intro_element.text_content()).strip()
                    logger.info(f"Product introduction: {product_info['introduction'][:200]}...")
                else:
                    logger.warning("No element found for XPath //*[@class=\"relative text-16 font-normal text-gray-700\"]//div")
            except Exception as e:
                logger.error(f"Failed to extract the product introduction: {e}")

            # Extract the user count (XPath: //*[@class=\"flex flex-row gap-2\"]//div/div[2]/span/p)
            logger.info("Extracting the user count...")
            try:
                await self.record_dom_selection('//*[@class="flex flex-row gap-2"]//div/div[2]/span/p', "User count")
                user_count_element = await self.page.query_selector('xpath=//*[@class="flex flex-row gap-2"]//div/div[2]/span/p')
                if user_count_element:
                    product_info["user_count"] = (await user_count_element.text_content()).strip()
                    logger.info(f"User count: {product_info['user_count']}")
                else:
                    logger.warning("No element found for XPath //*[@class=\"flex flex-row gap-2\"]//div/div[2]/span/p")
            except Exception as e:
                logger.error(f"Failed to extract the user count: {e}")

            # Extract the maker statement link (the parent <a> of //span[contains(@class, \"absolute\")])
            logger.info("Extracting the maker statement link...")
            try:
                # Fixed delay to give the page elements time to finish loading
                logger.info("Waiting for page elements to load...")
                await self.page.wait_for_timeout(20000)  # wait 20 seconds

                # First find the span element whose class contains "absolute"
                await self.record_dom_selection('//span[contains(@class, "absolute")]', "Maker span tag")
                span_element = await self.page.query_selector('xpath=//span[contains(@class, "absolute")]')
                if span_element:
                    # Find the enclosing <a> tag of the span element
                    await self.record_dom_selection('//span[contains(@class, "absolute")]/parent::a', "Maker link")

                    # Resolve the closest ancestor <a> through the DOM, which is more reliable
                    a_handle = await span_element.evaluate_handle('(element) => element.closest("a")')

                    # as_element() returns None when closest() found no <a>;
                    # the raw JSHandle would always be truthy
                    a_element = a_handle.as_element()
                    if a_element:
                        # Extract the text of the <a> tag
                        maker_text = (await a_element.text_content()).strip()
                        # Extract the href attribute of the <a> tag
                        maker_link = await a_element.get_attribute('href')

                        # Build the full URL
                        if maker_link:
                            if not maker_link.startswith('http'):
                                # Relative path: turn it into an absolute URL
                                base_url = "https://www.producthunt.com"
                                if maker_link.startswith('/'):
                                    maker_link = base_url + maker_link
                                else:
                                    maker_link = base_url + '/' + maker_link

                            # Validate the URL (it must not be just the site root)
                            if maker_link == "https://www.producthunt.com/" or maker_link == "https://www.producthunt.com":
                                logger.warning(f"Maker link is invalid, skipping extraction: {maker_link}")
                                product_info["maker_link"] = ""
                                product_info["maker_statement"] = ""
                            else:
                                product_info["maker_link"] = maker_link
                                logger.info(f"Maker link: {maker_link}")

                                # Extract the maker's statement in the current window
                                product_info["maker_statement"] = await self.extract_maker_statement_from_current_window(maker_link, maker_text)
                        else:
                            logger.warning("Could not get the maker link")
                            product_info["maker_link"] = ""
                            product_info["maker_statement"] = ""
                    else:
                        logger.warning("No <a> tag found for the maker link")
                else:
                    logger.warning("No element found for XPath //span[contains(@class, \"absolute\")]")
            except Exception as e:
                logger.error(f"Failed to extract the maker statement link: {e}")

            # Save to a temporary file
            temp_file_path = "temp_product_info.txt"
            with open(temp_file_path, "w", encoding="utf-8") as f:
                f.write("=== Product Hunt Product Information ===\n\n")
                f.write(f"Product name: {product_info.get('name', 'not available')}\n\n")
                f.write(f"Product introduction: {product_info.get('introduction', 'not available')}\n\n")
                f.write(f"Maker statement: {product_info.get('maker_statement', 'not available')}\n\n")
                f.write(f"User count: {product_info.get('user_count', 'not available')}\n\n")
                f.write(f"Extracted at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")

            logger.info(f"Product information saved to temporary file: {temp_file_path}")

            # Take a screenshot of the page
            screenshot_path = "product_screenshot.png"
            await self.page.screenshot(path=screenshot_path, full_page=True)
            logger.info(f"Page screenshot saved to: {screenshot_path}")

            return product_info

        except Exception as e:
            logger.error(f"Failed to extract the product information: {e}")
            return None

    async def close(self):
        """Close the connection"""
        if self.browser:
            await self.browser.close()
            logger.info("Browser connection closed")

        if hasattr(self, 'playwright') and self.playwright:
            await self.playwright.stop()
            logger.info("Playwright instance stopped")


async def main():
    """Main entry point"""
    logger.info("Starting the Product Hunt scraping task")

    # Target URL
    target_url = "https://www.producthunt.com/products/palettebrain"

    # Create the scraper instance
    scraper = ProductHuntScraper(debug_port=9222)

    try:
        # Connect to Chrome
        if not await scraper.connect_to_existing_chrome():
            logger.error("Could not connect to Chrome; make sure Chrome is running with remote debugging enabled")
            return

        # Navigate to the target page
        if not await scraper.navigate_to_producthunt(target_url):
            logger.error("Failed to visit the page")
            return

        # Extract the product information
        product_info = await scraper.extract_product_info()

        if product_info:
            logger.success("Product information extracted")
            # Save the product information to a JSON file
            import json
            with open("product_info.json", "w", encoding="utf-8") as f:
                json.dump(product_info, f, ensure_ascii=False, indent=2)
            logger.info("Product information saved to product_info.json")

            # Save the click and DOM selection records
            # (save_behavior_records logs the timestamped file name it writes)
            await scraper.save_behavior_records()
            logger.info("Behavior records saved")
        else:
            logger.warning("No product information could be extracted")

    except Exception as e:
        logger.error(f"An error occurred during execution: {e}")

    finally:
        # Close the connection
        await scraper.close()
        logger.info("Task finished")


if __name__ == "__main__":
    asyncio.run(main())
BIN
product/product.db
Normal file
Binary file not shown.
BIN
product/products.db
Normal file
Binary file not shown.
127
product/run_system.py
Normal file
@@ -0,0 +1,127 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Run script for the integrated product system
Provides a simplified command-line interface
"""

import argparse
import asyncio
import os
import sys
from loguru import logger

# Import the main system
from integrated_product_system import IntegratedProductSystem
from config import DATABASE_CONFIG, CHROME_CONFIG, AI_CONFIG, SCRAPING_CONFIG, LOGGING_CONFIG, ANALYSIS_CONFIG


def setup_logging(log_file=None, log_level="INFO"):
    """Set up logging"""
    if log_file is None:
        log_file = LOGGING_CONFIG['log_file']

    logger.remove()
    logger.add(sys.stderr, level=log_level, format=LOGGING_CONFIG['log_format'])
    logger.add(log_file, level=log_level, rotation=LOGGING_CONFIG['log_rotation'])

    logger.info("Logging initialized")


def print_system_info():
    """Print system information"""
    logger.info("=== Integrated product scraping and analysis system ===")
    logger.info(f"Database path: {DATABASE_CONFIG['product_db_path']}")
    logger.info(f"Chrome debugging port: {CHROME_CONFIG['debug_port']}")
    logger.info(f"AI model: {AI_CONFIG['model']}")
    logger.info(f"API URL: {AI_CONFIG['api_url']}")
    logger.info("=" * 40)


async def run_scraping_mode(args):
    """Run scraping mode"""
    logger.info("Running scraping mode...")

    system = IntegratedProductSystem(
        tophub_db_path=args.tophub_db or DATABASE_CONFIG['tophub_db_path'],
        product_db_path=args.product_db or DATABASE_CONFIG['product_db_path'],
        debug_port=args.debug_port or CHROME_CONFIG['debug_port'],
        limit=args.limit or SCRAPING_CONFIG['default_limit'],
        # argparse stores --no-skip-duplicates as args.no_skip_duplicates
        skip_duplicates=False if args.no_skip_duplicates else SCRAPING_CONFIG['skip_duplicates']
    )

    # Initialize the database
    system.init_database()

    # Run scraping
    await system.run_scraping(urls=args.urls)


async def run_analysis_mode(args):
    """Run analysis mode"""
    logger.info("Running analysis mode...")

    system = IntegratedProductSystem(
        product_db_path=args.product_db or DATABASE_CONFIG['product_db_path']
    )

    # Initialize the database
    system.init_database()

    # Run analysis
    system.analyze_products(max_products=args.max_products)


async def run_full_mode(args):
    """Run full mode (scraping + analysis)"""
    logger.info("Running full mode (scraping + analysis)...")

    system = IntegratedProductSystem(
        tophub_db_path=args.tophub_db or DATABASE_CONFIG['tophub_db_path'],
        product_db_path=args.product_db or DATABASE_CONFIG['product_db_path'],
        debug_port=args.debug_port or CHROME_CONFIG['debug_port'],
        limit=args.limit or SCRAPING_CONFIG['default_limit'],
        # argparse stores --no-skip-duplicates as args.no_skip_duplicates
        skip_duplicates=False if args.no_skip_duplicates else SCRAPING_CONFIG['skip_duplicates']
    )

    # Run the full workflow
    system.run_full_workflow(max_products=args.max_products)


def main():
    """Main entry point"""
    parser = argparse.ArgumentParser(description="Integrated product scraping and analysis system")

    # Common arguments
    parser.add_argument("--mode", choices=["scraping", "analysis", "full"], default="full",
                        help="Run mode: scraping (scrape only), analysis (analyze only), full (scrape + analyze)")
    parser.add_argument("--tophub-db", help="Path to the tophub database")
    parser.add_argument("--product-db", help="Path to the product database")
    parser.add_argument("--debug-port", type=int, help="Chrome debugging port")
    parser.add_argument("--limit", type=int, help="Maximum number of links to scrape")
    parser.add_argument("--max-products", type=int, help="Maximum number of products to analyze")
    parser.add_argument("--log-file", help="Path to the log file")
    parser.add_argument("--log-level", choices=["DEBUG", "INFO", "WARNING", "ERROR"],
                        default="INFO", help="Log level")
    parser.add_argument("--no-skip-duplicates", action="store_true", help="Do not skip duplicate URLs")
    parser.add_argument("--urls", nargs="+", help="List of URLs to scrape")

    args = parser.parse_args()

    # Set up logging
    setup_logging(args.log_file, args.log_level)

    # Print system information
    print_system_info()

    # Dispatch on the run mode
    if args.mode == "scraping":
        asyncio.run(run_scraping_mode(args))
    elif args.mode == "analysis":
        asyncio.run(run_analysis_mode(args))
    else:  # full mode
        asyncio.run(run_full_mode(args))


if __name__ == "__main__":
    main()
757
product/sqlite_viewer.py
Normal file
@@ -0,0 +1,757 @@
|
||||
#!/usr/bin/env python3
|
||||
# -*- coding: utf-8 -*-
|
||||
"""
|
||||
SQLite数据库查看器 - 基于PySide6
|
||||
功能:打开product目录下的product.db的sqlite文件,显示表和数据的界面
|
||||
"""
|
||||
|
||||
import sys
|
||||
import os
|
||||
import sqlite3
|
||||
from loguru import logger
|
||||
|
||||
from PySide6.QtWidgets import (QApplication, QMainWindow, QVBoxLayout, QHBoxLayout,
|
||||
QWidget, QPushButton, QTableWidget, QTableWidgetItem,
|
||||
QListWidget, QListWidgetItem, QSplitter, QFileDialog,
|
||||
QLabel, QStatusBar, QMessageBox, QHeaderView, QComboBox,
|
||||
QLineEdit, QGroupBox, QTextEdit, QStyledItemDelegate, QMenu,
|
||||
QInputDialog)
|
||||
from PySide6.QtCore import Qt, QSize
|
||||
from PySide6.QtGui import QAction, QFontMetrics
|
||||
|
||||
|
||||
class MultiLineDelegate(QStyledItemDelegate):
    """Multi-line text delegate that supports automatic row-height adjustment."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self.min_height = 30   # minimum row height
        self.max_height = 200  # maximum row height

    def paint(self, painter, option, index):
        """Custom painting with multi-line text support."""
        # Imported locally so this method does not depend on the module-level import list
        from PySide6.QtWidgets import QStyleOptionViewItem

        # Work on a copy: the view owns `option`, so it must not be mutated here
        opt = QStyleOptionViewItem(option)

        # Fetch the cell text
        text = index.data(Qt.DisplayRole)
        text = "" if text is None else str(text)

        # Height needed for the explicit line breaks, plus some padding
        metrics = QFontMetrics(option.font)
        lines = text.count('\n') + 1
        text_height = lines * metrics.lineSpacing() + 10

        # Clamp the height between the minimum and maximum
        text_height = max(self.min_height, min(text_height, self.max_height))

        # Adjust the height of the paint rectangle on the copy
        rect = opt.rect
        rect.setHeight(text_height)
        opt.rect = rect

        # Let the base class do the actual drawing
        super().paint(painter, opt, index)

    def sizeHint(self, option, index):
        """Return the suggested cell size."""
        # Fetch the cell text
        text = index.data(Qt.DisplayRole)
        text = "" if text is None else str(text)

        metrics = QFontMetrics(option.font)

        # Height from the number of explicit lines, plus padding
        lines = text.count('\n') + 1
        text_height = lines * metrics.lineSpacing() + 10

        # Width of the longest line, plus padding (single-line text is just one line)
        max_width = max(metrics.horizontalAdvance(line) + 20 for line in text.split('\n'))

        # Clamp the height
        text_height = max(self.min_height, min(text_height, self.max_height))

        # Enforce a minimum width of 100 pixels
        max_width = max(max_width, 100)

        return QSize(max_width, text_height)
|
||||
|
||||
|
||||
class SQLiteViewer(QMainWindow):
|
||||
"""SQLite数据库查看器主窗口"""
|
||||
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
logger.info("初始化SQLite数据库查看器")
|
||||
self.db_connection = None
|
||||
self.current_table = None
|
||||
self.init_ui()
|
||||
|
||||
def init_ui(self):
|
||||
"""初始化用户界面"""
|
||||
logger.info("设置主窗口界面")
|
||||
self.setWindowTitle("SQLite数据库查看器")
|
||||
self.setGeometry(100, 100, 1200, 800)
|
||||
|
||||
# 创建中央部件
|
||||
central_widget = QWidget()
|
||||
self.setCentralWidget(central_widget)
|
||||
|
||||
# 创建主布局
|
||||
main_layout = QVBoxLayout(central_widget)
|
||||
|
||||
# 创建顶部按钮布局
|
||||
self.create_top_buttons(main_layout)
|
||||
|
||||
# 创建筛选控件区域
|
||||
self.create_filter_section(main_layout)
|
||||
|
||||
# 创建分割器(左侧表列表,右侧数据表格)
|
||||
self.create_splitter(main_layout)
|
||||
|
||||
# 创建状态栏
|
||||
self.create_status_bar()
|
||||
|
||||
# 创建菜单栏
|
||||
self.create_menubar()
|
||||
|
||||
logger.info("界面初始化完成")
|
||||
|
||||
def create_top_buttons(self, layout):
|
||||
"""创建顶部按钮布局"""
|
||||
logger.info("创建顶部按钮")
|
||||
button_layout = QHBoxLayout()
|
||||
|
||||
# 打开数据库按钮
|
||||
self.open_button = QPushButton("打开SQLite数据库")
|
||||
self.open_button.clicked.connect(self.open_database)
|
||||
button_layout.addWidget(self.open_button)
|
||||
|
||||
# 刷新按钮
|
||||
self.refresh_button = QPushButton("刷新")
|
||||
self.refresh_button.clicked.connect(self.refresh_data)
|
||||
self.refresh_button.setEnabled(False)
|
||||
button_layout.addWidget(self.refresh_button)
|
||||
|
||||
# 数据库路径显示
|
||||
self.db_path_label = QLabel("未打开数据库")
|
||||
button_layout.addWidget(self.db_path_label)
|
||||
|
||||
button_layout.addStretch()
|
||||
layout.addLayout(button_layout)
|
||||
|
||||
def create_filter_section(self, layout):
|
||||
"""创建筛选控件区域"""
|
||||
logger.info("创建筛选控件区域")
|
||||
|
||||
# 创建筛选分组框
|
||||
filter_group = QGroupBox("数据筛选")
|
||||
filter_layout = QHBoxLayout(filter_group)
|
||||
|
||||
# 字段选择标签
|
||||
filter_layout.addWidget(QLabel("筛选字段:"))
|
||||
|
||||
# 字段选择下拉框
|
||||
self.field_combo = QComboBox()
|
||||
self.field_combo.setMinimumWidth(150)
|
||||
filter_layout.addWidget(self.field_combo)
|
||||
|
||||
# 筛选条件标签
|
||||
filter_layout.addWidget(QLabel("筛选条件:"))
|
||||
|
||||
# 筛选条件输入框
|
||||
self.filter_input = QLineEdit()
|
||||
self.filter_input.setPlaceholderText("输入筛选条件,如:<75 或 name='test' 或 created_at>'2024-01-01'")
|
||||
self.filter_input.setMinimumWidth(300)
|
||||
filter_layout.addWidget(self.filter_input)
|
||||
|
||||
# 筛选按钮
|
||||
self.filter_button = QPushButton("筛选")
|
||||
self.filter_button.clicked.connect(self.apply_filter)
|
||||
filter_layout.addWidget(self.filter_button)
|
||||
|
||||
# 清除筛选按钮
|
||||
self.clear_filter_button = QPushButton("清除筛选")
|
||||
self.clear_filter_button.clicked.connect(self.clear_filter)
|
||||
self.clear_filter_button.setEnabled(False)
|
||||
filter_layout.addWidget(self.clear_filter_button)
|
||||
|
||||
filter_layout.addStretch()
|
||||
|
||||
# 初始状态下禁用筛选控件
|
||||
self.field_combo.setEnabled(False)
|
||||
self.filter_input.setEnabled(False)
|
||||
self.filter_button.setEnabled(False)
|
||||
|
||||
layout.addWidget(filter_group)
|
||||
|
||||
    def create_splitter(self, layout):
        """Create the splitter UI."""
        logger.info("创建分割器界面")
        splitter = QSplitter(Qt.Horizontal)

        # Left: table list
        left_widget = QWidget()
        left_layout = QVBoxLayout(left_widget)

        left_layout.addWidget(QLabel("数据库表列表:"))
        self.table_list = QListWidget()
        self.table_list.itemClicked.connect(self.on_table_selected)
        left_layout.addWidget(self.table_list)

        # Right: data grid
        right_widget = QWidget()
        right_layout = QVBoxLayout(right_widget)

        right_layout.addWidget(QLabel("表数据:"))
        self.data_table = QTableWidget()
        self.data_table.setAlternatingRowColors(True)

        # Support multi-line cell content
        self.data_table.setItemDelegate(MultiLineDelegate(self.data_table))
        self.data_table.setWordWrap(True)  # enable word wrap
        self.data_table.setTextElideMode(Qt.ElideNone)  # do not elide text

        # Column header: columns can be dragged to reorder
        header = self.data_table.horizontalHeader()
        header.setSectionsMovable(True)  # allow moving columns
        header.setStretchLastSection(False)  # do not stretch the last column

        # Row header: rows stay in place, but the user can drag to resize their height
        self.data_table.verticalHeader().setSectionsMovable(False)
        self.data_table.verticalHeader().setSectionResizeMode(QHeaderView.Interactive)

        # Context-menu support
        self.data_table.setContextMenuPolicy(Qt.CustomContextMenu)
        self.data_table.customContextMenuRequested.connect(self.show_table_context_menu)

        right_layout.addWidget(self.data_table)

        splitter.addWidget(left_widget)
        splitter.addWidget(right_widget)
        splitter.setSizes([300, 900])

        layout.addWidget(splitter)
|
||||
|
||||
def create_status_bar(self):
|
||||
"""创建状态栏"""
|
||||
logger.info("创建状态栏")
|
||||
self.status_bar = QStatusBar()
|
||||
self.setStatusBar(self.status_bar)
|
||||
self.status_bar.showMessage("就绪")
|
||||
|
||||
def create_menubar(self):
|
||||
"""创建菜单栏"""
|
||||
logger.info("创建菜单栏")
|
||||
menubar = self.menuBar()
|
||||
|
||||
# 文件菜单
|
||||
file_menu = menubar.addMenu("文件")
|
||||
|
||||
open_action = QAction("打开数据库", self)
|
||||
open_action.triggered.connect(self.open_database)
|
||||
file_menu.addAction(open_action)
|
||||
|
||||
exit_action = QAction("退出", self)
|
||||
exit_action.triggered.connect(self.close)
|
||||
file_menu.addAction(exit_action)
|
||||
|
||||
    def open_database(self):
        """Open an SQLite database file."""
        logger.info("打开数据库文件对话框")

        # Start the dialog at product/product.db when that file exists
        default_path = os.path.join('product', 'product.db')
        start_path = default_path if os.path.exists(default_path) else ""
        file_path, _ = QFileDialog.getOpenFileName(
            self, "打开SQLite数据库", start_path, "SQLite数据库文件 (*.db *.sqlite *.sqlite3)"
        )

        if file_path:
            logger.info(f"打开数据库文件: {file_path}")
            self.connect_to_database(file_path)
|
||||
|
||||
    def connect_to_database(self, file_path):
        """Connect to the given SQLite database."""
        try:
            if self.db_connection:
                self.db_connection.close()

            self.db_connection = sqlite3.connect(file_path)
            logger.info("数据库连接成功")

            # Forget the table selected in the previously opened database,
            # so a later refresh does not query a table that may not exist here
            self.current_table = None
            self.data_table.setRowCount(0)
            self.data_table.setColumnCount(0)

            self.db_path_label.setText(f"数据库: {os.path.basename(file_path)}")
            self.status_bar.showMessage(f"已连接到数据库: {os.path.basename(file_path)}")
            self.refresh_button.setEnabled(True)

            # Load the table list
            self.load_table_list()

        except sqlite3.Error as e:
            logger.error(f"数据库连接失败: {e}")
            QMessageBox.critical(self, "错误", f"无法打开数据库: {e}")
|
||||
|
||||
def load_table_list(self):
|
||||
"""加载数据库表列表"""
|
||||
if not self.db_connection:
|
||||
return
|
||||
|
||||
try:
|
||||
cursor = self.db_connection.cursor()
|
||||
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
|
||||
tables = cursor.fetchall()
|
||||
|
||||
self.table_list.clear()
|
||||
for table in tables:
|
||||
item = QListWidgetItem(table[0])
|
||||
self.table_list.addItem(item)
|
||||
|
||||
logger.info(f"加载了 {len(tables)} 个表")
|
||||
self.status_bar.showMessage(f"已加载 {len(tables)} 个表")
|
||||
|
||||
except sqlite3.Error as e:
|
||||
logger.error(f"加载表列表失败: {e}")
|
||||
QMessageBox.critical(self, "错误", f"加载表列表失败: {e}")
|
||||
|
||||
def on_table_selected(self, item):
|
||||
"""当表被选中时加载表数据和字段列表"""
|
||||
table_name = item.text()
|
||||
logger.info(f"选中表: {table_name}")
|
||||
self.current_table = table_name
|
||||
self.load_table_data(table_name)
|
||||
self.update_field_combo(table_name)
|
||||
|
||||
    def load_table_data(self, table_name):
        """Load the data of the given table."""
        if not self.db_connection:
            return

        try:
            cursor = self.db_connection.cursor()

            # Quote the table name as an SQL identifier so that names containing
            # spaces, keywords, or quotes do not break the statements below
            quoted_table = '"' + table_name.replace('"', '""') + '"'

            # Fetch the table structure
            cursor.execute(f"PRAGMA table_info({quoted_table})")
            columns = cursor.fetchall()
            column_names = [col[1] for col in columns]

            # Fetch the rows
            cursor.execute(f"SELECT * FROM {quoted_table}")
            data = cursor.fetchall()

            # Set up the grid
            self.data_table.setRowCount(len(data))
            self.data_table.setColumnCount(len(column_names))
            self.data_table.setHorizontalHeaderLabels(column_names)

            # Fill in the data
            for row_idx, row_data in enumerate(data):
                for col_idx, cell_data in enumerate(row_data):
                    # None becomes an empty cell; every other value is shown via str(),
                    # which keeps the original text formatting, including line breaks
                    display_text = "" if cell_data is None else str(cell_data)

                    item = QTableWidgetItem(display_text)
                    item.setToolTip(display_text)  # hover tooltip
                    self.data_table.setItem(row_idx, col_idx, item)

            # Column widths: Interactive mode lets the user resize them manually
            header = self.data_table.horizontalHeader()
            header.setSectionResizeMode(QHeaderView.Interactive)

            # Initial column width follows the content, capped at a maximum
            for col in range(len(column_names)):
                # Widest content in this column
                max_width = 0
                for row in range(min(100, len(data))):  # only check the first 100 rows, for performance
                    item = self.data_table.item(row, col)
                    if item and item.text():
                        text_width = self.data_table.fontMetrics().horizontalAdvance(item.text()) + 20
                        max_width = max(max_width, text_width)

                # Column width: at least 100 pixels, at most 400 pixels
                column_width = min(max(max_width, 100), 400)
                self.data_table.setColumnWidth(col, column_width)

            logger.info(f"加载表 {table_name} 数据完成,共 {len(data)} 行")
            self.status_bar.showMessage(f"表 {table_name}: {len(data)} 行数据")

        except sqlite3.Error as e:
            logger.error(f"加载表数据失败: {e}")
            QMessageBox.critical(self, "错误", f"加载表数据失败: {e}")
|
||||
|
||||
def update_field_combo(self, table_name):
|
||||
"""更新字段选择下拉框"""
|
||||
if not self.db_connection:
|
||||
return
|
||||
|
||||
try:
|
||||
cursor = self.db_connection.cursor()
|
||||
cursor.execute(f"PRAGMA table_info({table_name})")
|
||||
columns = cursor.fetchall()
|
||||
|
||||
# 清空当前字段列表
|
||||
self.field_combo.clear()
|
||||
|
||||
# 添加所有字段到下拉框
|
||||
for column in columns:
|
||||
field_name = column[1] # 字段名在第二个位置
|
||||
self.field_combo.addItem(field_name)
|
||||
|
||||
# 启用筛选控件
|
||||
self.field_combo.setEnabled(True)
|
||||
self.filter_input.setEnabled(True)
|
||||
self.filter_button.setEnabled(True)
|
||||
|
||||
logger.info(f"更新字段下拉框: {table_name}, 共 {len(columns)} 个字段")
|
||||
|
||||
except sqlite3.Error as e:
|
||||
logger.error(f"获取表字段信息失败: {e}")
|
||||
QMessageBox.warning(self, "错误", f"获取表字段信息失败: {e}")
|
||||
|
||||
    def apply_filter(self):
        """Apply the filter condition."""
        if not self.db_connection or not self.current_table:
            return

        selected_field = self.field_combo.currentText()
        filter_condition = self.filter_input.text().strip()

        if not selected_field or not filter_condition:
            QMessageBox.warning(self, "警告", "请选择筛选字段并输入筛选条件")
            return

        try:
            cursor = self.db_connection.cursor()

            # Quote identifiers so unusual table or column names do not break the query
            quoted_table = '"' + self.current_table.replace('"', '""') + '"'
            quoted_field = '"' + selected_field.replace('"', '""') + '"'

            # Numeric comparison? Supports <, >, <=, >=, =, != followed by a number.
            # The pattern only accepts well-formed numbers (e.g. not "1.2.3"),
            # so the float() conversion below cannot raise ValueError.
            import re
            numeric_pattern = r'^\s*([><]=?|!=|=)\s*(-?(?:\d+\.?\d*|\.\d+))\s*$'
            match = re.match(numeric_pattern, filter_condition)

            if match:
                # Numeric comparison
                operator = match.group(1)
                value = match.group(2)
                query = f"SELECT * FROM {quoted_table} WHERE {quoted_field} {operator} ?"
                filter_value = float(value)
            else:
                # Substring match on text
                query = f"SELECT * FROM {quoted_table} WHERE {quoted_field} LIKE ?"
                filter_value = f"%{filter_condition}%"

            # Run the query
            cursor.execute(query, (filter_value,))
            data = cursor.fetchall()

            # Fetch the table structure
            cursor.execute(f"PRAGMA table_info({quoted_table})")
            columns = cursor.fetchall()
            column_names = [col[1] for col in columns]

            # Update the grid
            self.data_table.setRowCount(len(data))
            self.data_table.setColumnCount(len(column_names))
            self.data_table.setHorizontalHeaderLabels(column_names)

            # Fill in the filtered data
            for row_idx, row_data in enumerate(data):
                for col_idx, cell_data in enumerate(row_data):
                    # None becomes an empty cell; every other value is shown via str(),
                    # which keeps the original text formatting, including line breaks
                    display_text = "" if cell_data is None else str(cell_data)

                    item = QTableWidgetItem(display_text)
                    item.setToolTip(display_text)  # hover tooltip
                    self.data_table.setItem(row_idx, col_idx, item)

            # Column widths: Interactive mode lets the user resize them manually
            header = self.data_table.horizontalHeader()
            header.setSectionResizeMode(QHeaderView.Interactive)

            # Initial column width follows the content, capped at a maximum
            for col in range(len(column_names)):
                # Widest content in this column
                max_width = 0
                for row in range(min(100, len(data))):  # only check the first 100 rows, for performance
                    item = self.data_table.item(row, col)
                    if item and item.text():
                        text_width = self.data_table.fontMetrics().horizontalAdvance(item.text()) + 20
                        max_width = max(max_width, text_width)

                # Column width: at least 100 pixels, at most 400 pixels
                column_width = min(max(max_width, 100), 400)
                self.data_table.setColumnWidth(col, column_width)

            # Enable the clear-filter button
            self.clear_filter_button.setEnabled(True)

            if match:
                logger.info(f"应用数值筛选条件: {selected_field} {operator} {value}, 匹配到 {len(data)} 行数据")
                self.status_bar.showMessage(f"筛选结果: {len(data)} 行数据 (条件: {selected_field} {operator} {value})")
            else:
                logger.info(f"应用文本筛选条件: {selected_field} LIKE '%{filter_condition}%', 匹配到 {len(data)} 行数据")
                self.status_bar.showMessage(f"筛选结果: {len(data)} 行数据 (条件: {selected_field} 包含 '{filter_condition}')")

        except sqlite3.Error as e:
            logger.error(f"筛选数据失败: {e}")
            QMessageBox.critical(self, "错误", f"筛选数据失败: {e}")
|
||||
|
||||
def clear_filter(self):
|
||||
"""清除筛选条件,显示所有数据"""
|
||||
if not self.current_table:
|
||||
return
|
||||
|
||||
try:
|
||||
# 重新加载完整数据
|
||||
self.load_table_data(self.current_table)
|
||||
|
||||
# 清空筛选条件
|
||||
self.filter_input.clear()
|
||||
|
||||
# 禁用清除筛选按钮
|
||||
self.clear_filter_button.setEnabled(False)
|
||||
|
||||
logger.info("清除筛选条件,显示所有数据")
|
||||
self.status_bar.showMessage("已清除筛选条件,显示所有数据")
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"清除筛选失败: {e}")
|
||||
QMessageBox.critical(self, "错误", f"清除筛选失败: {e}")
|
||||
|
||||
def refresh_data(self):
|
||||
"""刷新当前数据"""
|
||||
logger.info("刷新数据")
|
||||
if self.current_table:
|
||||
self.load_table_data(self.current_table)
|
||||
else:
|
||||
self.load_table_list()
|
||||
|
||||
def show_table_context_menu(self, position):
|
||||
"""显示表格右键菜单"""
|
||||
menu = QMenu()
|
||||
|
||||
# 添加菜单项
|
||||
auto_resize_action = menu.addAction("自动调整列宽")
|
||||
auto_resize_rows_action = menu.addAction("自动调整行高")
|
||||
|
||||
# 添加行高调整子菜单
|
||||
row_height_menu = menu.addMenu("设置行高")
|
||||
increase_height_action = row_height_menu.addAction("增加行高 (+10px)")
|
||||
decrease_height_action = row_height_menu.addAction("减少行高 (-10px)")
|
||||
reset_height_action = row_height_menu.addAction("重置行高为默认值")
|
||||
custom_height_action = row_height_menu.addAction("自定义行高...")
|
||||
|
||||
copy_action = menu.addAction("复制选中内容")
|
||||
|
||||
# 显示菜单
|
||||
action = menu.exec(self.data_table.mapToGlobal(position))
|
||||
|
||||
if action == auto_resize_action:
|
||||
self.auto_resize_columns()
|
||||
elif action == auto_resize_rows_action:
|
||||
self.auto_resize_rows()
|
||||
elif action == increase_height_action:
|
||||
self.adjust_row_height(10)
|
||||
elif action == decrease_height_action:
|
||||
self.adjust_row_height(-10)
|
||||
elif action == reset_height_action:
|
||||
self.reset_row_height()
|
||||
elif action == custom_height_action:
|
||||
self.set_custom_row_height()
|
||||
elif action == copy_action:
|
||||
self.copy_selected_content()
|
||||
|
||||
def auto_resize_columns(self):
|
||||
"""自动调整所有列宽"""
|
||||
logger.info("自动调整列宽")
|
||||
|
||||
# 遍历所有列
|
||||
for col in range(self.data_table.columnCount()):
|
||||
# 计算该列内容的最大宽度
|
||||
max_width = 0
|
||||
for row in range(min(100, self.data_table.rowCount())): # 只检查前100行
|
||||
item = self.data_table.item(row, col)
|
||||
if item and item.text():
|
||||
text_width = self.data_table.fontMetrics().horizontalAdvance(item.text()) + 20
|
||||
max_width = max(max_width, text_width)
|
||||
|
||||
# 设置列宽,最小100像素,最大500像素
|
||||
column_width = min(max(max_width, 100), 500)
|
||||
self.data_table.setColumnWidth(col, column_width)
|
||||
|
||||
self.status_bar.showMessage("已自动调整列宽")
|
||||
|
||||
def auto_resize_rows(self):
|
||||
"""自动调整所有行高"""
|
||||
logger.info("自动调整行高")
|
||||
|
||||
# 触发重新计算行高
|
||||
self.data_table.resizeRowsToContents()
|
||||
|
||||
self.status_bar.showMessage("已自动调整行高")
|
||||
|
||||
def adjust_row_height(self, delta: int):
|
||||
"""调整选中行的行高"""
|
||||
selected_items = self.data_table.selectedItems()
|
||||
if not selected_items:
|
||||
# 如果没有选中行,调整所有行
|
||||
for row in range(self.data_table.rowCount()):
|
||||
current_height = self.data_table.rowHeight(row)
|
||||
new_height = max(current_height + delta, 20) # 最小行高20像素
|
||||
self.data_table.setRowHeight(row, new_height)
|
||||
self.status_bar.showMessage(f"所有行高已调整 {delta:+d} 像素")
|
||||
else:
|
||||
# 调整选中行
|
||||
selected_rows = set(item.row() for item in selected_items)
|
||||
for row in selected_rows:
|
||||
current_height = self.data_table.rowHeight(row)
|
||||
new_height = max(current_height + delta, 20) # 最小行高20像素
|
||||
self.data_table.setRowHeight(row, new_height)
|
||||
self.status_bar.showMessage(f"已调整 {len(selected_rows)} 行的行高 {delta:+d} 像素")
|
||||
|
||||
def reset_row_height(self):
|
||||
"""重置行高为默认值"""
|
||||
logger.info("重置行高为默认值")
|
||||
|
||||
# 重置为默认行高(30像素)
|
||||
default_height = 30
|
||||
for row in range(self.data_table.rowCount()):
|
||||
self.data_table.setRowHeight(row, default_height)
|
||||
|
||||
self.status_bar.showMessage("行高已重置为默认值")
|
||||
|
||||
def set_custom_row_height(self):
|
||||
"""设置自定义行高"""
|
||||
# 获取当前选中行的行高作为默认值
|
||||
selected_items = self.data_table.selectedItems()
|
||||
if selected_items:
|
||||
current_height = self.data_table.rowHeight(selected_items[0].row())
|
||||
else:
|
||||
current_height = 30
|
||||
|
||||
# 显示输入对话框
|
||||
height, ok = QInputDialog.getInt(
|
||||
self,
|
||||
"设置行高",
|
||||
"请输入行高(像素):",
|
||||
current_height, # 默认值
|
||||
20, # 最小值
|
||||
500 # 最大值
|
||||
)
|
||||
|
||||
if ok:
|
||||
if selected_items:
|
||||
# 设置选中行
|
||||
selected_rows = set(item.row() for item in selected_items)
|
||||
for row in selected_rows:
|
||||
self.data_table.setRowHeight(row, height)
|
||||
self.status_bar.showMessage(f"已设置 {len(selected_rows)} 行的行高为 {height} 像素")
|
||||
else:
|
||||
# 设置所有行
|
||||
for row in range(self.data_table.rowCount()):
|
||||
self.data_table.setRowHeight(row, height)
|
||||
self.status_bar.showMessage(f"所有行高已设置为 {height} 像素")
|
||||
|
||||
def copy_selected_content(self):
|
||||
"""复制选中的内容"""
|
||||
selected_items = self.data_table.selectedItems()
|
||||
if not selected_items:
|
||||
return
|
||||
|
||||
# 按行列组织数据
|
||||
rows = {}
|
||||
for item in selected_items:
|
||||
row = item.row()
|
||||
col = item.column()
|
||||
if row not in rows:
|
||||
rows[row] = {}
|
||||
rows[row][col] = item.text()
|
||||
|
||||
# 构建复制的文本
|
||||
text_lines = []
|
||||
for row in sorted(rows.keys()):
|
||||
row_data = []
|
||||
for col in sorted(rows[row].keys()):
|
||||
row_data.append(rows[row][col])
|
||||
text_lines.append('\t'.join(row_data))
|
||||
|
||||
# 复制到剪贴板
|
||||
clipboard = QApplication.clipboard()
|
||||
clipboard.setText('\n'.join(text_lines))
|
||||
|
||||
self.status_bar.showMessage(f"已复制 {len(selected_items)} 个单元格的内容")
|
||||
|
||||
def closeEvent(self, event):
|
||||
"""关闭事件处理"""
|
||||
logger.info("关闭应用程序")
|
||||
if self.db_connection:
|
||||
self.db_connection.close()
|
||||
event.accept()
|
||||
|
||||
|
||||
def main():
    """Main entry point."""
    # Configure logging first so the startup message also reaches the log file
    logger.add("sqlite_viewer.log", rotation="10 MB", level="INFO")
    logger.info("启动SQLite数据库查看器")

    app = QApplication(sys.argv)

    # Application metadata
    app.setApplicationName("SQLite数据库查看器")
    app.setApplicationVersion("1.0.0")

    viewer = SQLiteViewer()
    viewer.show()

    logger.info("应用程序启动完成")
    sys.exit(app.exec())


if __name__ == "__main__":
    main()
|
||||
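The filter in `apply_filter` either treats the input as a numeric comparison (`<75`, `>=3.5`) or falls back to a `LIKE` substring match, and it interpolates the table and column names into the SQL text. A minimal standalone sketch of that decision follows, assuming hypothetical helpers `quote_identifier`, `build_filter_query`, and `run_filter` and an in-memory `products` table invented for the demo; none of these names exist in `sqlite_viewer.py`.

```python
import re
import sqlite3

# Operator followed by a well-formed number; "1.2.3" deliberately does not match.
NUMERIC_FILTER = re.compile(r'^\s*([><]=?|!=|=)\s*(-?(?:\d+\.?\d*|\.\d+))\s*$')


def quote_identifier(name):
    """Quote a table or column name for use inside an SQL statement."""
    return '"' + name.replace('"', '""') + '"'


def build_filter_query(table, field, condition):
    """Return (sql, params): a numeric comparison if possible, else a LIKE match."""
    base = f"SELECT * FROM {quote_identifier(table)} WHERE {quote_identifier(field)}"
    match = NUMERIC_FILTER.match(condition)
    if match:
        operator, number = match.groups()
        # The operator comes from a fixed whitelist; the value is bound as a parameter.
        return f"{base} {operator} ?", (float(number),)
    return f"{base} LIKE ?", (f"%{condition.strip()}%",)


def run_filter(conn, table, field, condition):
    """Execute the filter and return all matching rows."""
    sql, params = build_filter_query(table, field, condition)
    return conn.execute(sql, params).fetchall()


def demo():
    """Build a throwaway table and run one numeric and one text filter."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, score REAL)")
    conn.executemany(
        "INSERT INTO products VALUES (?, ?)",
        [("alpha tool", 60.0), ("beta app", 80.0), ("gamma tool", 90.5)],
    )
    low = run_filter(conn, "products", "score", "<75")
    tools = run_filter(conn, "products", "name", "tool")
    conn.close()
    return low, tools
```

Values always travel as bound parameters; only identifiers and the whitelisted operator are placed in the SQL text, which is why the identifiers are quoted.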
1
product/start_chrome.bat
Normal file
@@ -0,0 +1 @@
|
||||
"C:\Program Files\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222 --user-data-dir="C:\temp\chrome_debug"
|
||||
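The batch file hard-codes the Chrome path, port 9222, and the profile directory, while the main script accepts a `--debug-port` argument. A hedged sketch of building the same command line from parameters in Python follows; `build_chrome_command` and `devtools_version_url` are hypothetical names, and the default paths are simply copied from the batch file, so they only apply to a machine laid out the same way.

```python
from typing import List

# Defaults copied from start_chrome.bat; adjust them for other machines.
DEFAULT_CHROME = r"C:\Program Files\Google\Chrome\Application\chrome.exe"
DEFAULT_PROFILE = r"C:\temp\chrome_debug"


def build_chrome_command(port: int = 9222,
                         chrome_path: str = DEFAULT_CHROME,
                         user_data_dir: str = DEFAULT_PROFILE) -> List[str]:
    """Return the argument list for launching Chrome with remote debugging."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    return [
        chrome_path,
        f"--remote-debugging-port={port}",
        f"--user-data-dir={user_data_dir}",
    ]


def devtools_version_url(port: int = 9222) -> str:
    """URL of the DevTools HTTP endpoint that reports the browser version."""
    return f"http://127.0.0.1:{port}/json/version"
```

Passing the list to `subprocess.Popen(build_chrome_command(9222))` avoids shell quoting of the path with spaces; the sketch itself does not launch anything.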
670
product/templates/index.html
Normal file
@@ -0,0 +1,670 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="zh-CN">
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<title>SQLite数据库查看器</title>
|
||||
<style>
|
||||
* {
|
||||
margin: 0;
|
||||
padding: 0;
|
||||
box-sizing: border-box;
|
||||
}
|
||||
|
||||
body {
|
||||
font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
|
||||
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
||||
min-height: 100vh;
|
||||
color: #333;
|
||||
}
|
||||
|
||||
.container {
|
||||
width: 100%;
|
||||
margin: 0 auto;
|
||||
padding: 20px;
|
||||
}
|
||||
|
||||
.header {
|
||||
background: rgba(255, 255, 255, 0.95);
|
||||
backdrop-filter: blur(10px);
|
||||
border-radius: 15px;
|
||||
padding: 25px;
|
||||
margin-bottom: 25px;
|
||||
box-shadow: 0 8px 32px rgba(0, 0, 0, 0.1);
|
||||
}
|
||||
|
||||
.header h1 {
|
||||
color: #2c3e50;
|
||||
font-size: 2.5em;
|
||||
margin-bottom: 10px;
|
||||
text-align: center;
|
||||
}
|
||||
|
||||
.controls {
|
||||
display: flex;
|
||||
gap: 20px;
|
||||
align-items: center;
|
||||
flex-wrap: wrap;
|
||||
margin-top: 20px;
|
||||
}
|
||||
|
||||
.control-group {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 8px;
|
||||
}
|
||||
|
||||
.control-group label {
|
||||
font-weight: 600;
|
||||
color: #34495e;
|
||||
font-size: 0.9em;
|
||||
}
|
||||
|
||||
select, input {
|
||||
padding: 12px 15px;
|
||||
border: 2px solid #e0e6ed;
|
||||
border-radius: 8px;
|
||||
font-size: 14px;
|
||||
transition: all 0.3s ease;
|
||||
background: white;
|
||||
}
|
||||
|
||||
select:focus, input:focus {
|
||||
outline: none;
|
||||
border-color: #667eea;
|
||||
box-shadow: 0 0 0 3px rgba(102, 126, 234, 0.1);
|
||||
}
|
||||
|
||||
.btn {
|
||||
padding: 12px 24px;
|
||||
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
||||
color: white;
|
||||
border: none;
|
||||
border-radius: 8px;
|
||||
cursor: pointer;
|
||||
font-weight: 600;
|
||||
transition: all 0.3s ease;
|
||||
text-decoration: none;
|
||||
display: inline-block;
|
||||
}
|
||||
|
||||
.btn:hover {
|
||||
transform: translateY(-2px);
|
||||
box-shadow: 0 5px 15px rgba(0, 0, 0, 0.2);
|
||||
}
|
||||
|
||||
.data-container {
|
||||
background: rgba(255, 255, 255, 0.95);
|
||||
backdrop-filter: blur(10px);
|
||||
border-radius: 15px;
|
||||
padding: 25px;
|
||||
box-shadow: 0 8px 32px rgba(0, 0, 0, 0.1);
|
||||
}
|
||||
|
||||
.table-info {
|
||||
display: flex;
|
||||
justify-content: space-between;
|
||||
align-items: center;
|
||||
margin-bottom: 20px;
|
||||
padding: 15px;
|
||||
background: #f8f9fa;
|
||||
border-radius: 10px;
|
||||
}
|
||||
|
||||
.table-info h2 {
|
||||
color: #2c3e50;
|
||||
font-size: 1.5em;
|
||||
}
|
||||
|
||||
.stats {
|
||||
display: flex;
|
||||
gap: 20px;
|
||||
font-size: 0.9em;
|
||||
color: #7f8c8d;
|
||||
}
|
||||
|
||||
.table-wrapper {
|
||||
overflow-x: auto;
|
||||
border-radius: 10px;
|
||||
box-shadow: 0 4px 16px rgba(0, 0, 0, 0.1);
|
||||
}
|
||||
|
||||
table {
|
||||
width: 100%;
|
||||
border-collapse: collapse;
|
||||
background: white;
|
||||
font-size: 14px;
|
||||
}
|
||||
|
||||
th {
|
||||
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
||||
color: white;
|
||||
padding: 15px 12px;
|
||||
text-align: left;
|
||||
font-weight: 600;
|
||||
position: sticky;
|
||||
top: 0;
|
||||
z-index: 10;
|
||||
}
|
||||
|
||||
td {
|
||||
padding: 12px;
|
||||
border-bottom: 1px solid #ecf0f1;
|
||||
vertical-align: top;
|
||||
}
|
||||
|
||||
tr:nth-child(even) {
|
||||
background-color: #f8f9fa;
|
||||
}
|
||||
|
||||
tr:hover {
|
||||
background-color: #e3f2fd;
|
||||
transition: background-color 0.3s ease;
|
||||
}
|
||||
|
||||
.multiline-cell {
|
||||
white-space: pre-wrap;
|
||||
line-height: 1.6;
|
||||
max-height: 200px;
|
||||
overflow-y: auto;
|
||||
padding: 8px;
|
||||
background: #fff3cd;
|
||||
border-radius: 6px;
|
||||
border-left: 4px solid #ffc107;
|
||||
}
|
||||
|
||||
.normal-cell {
|
||||
white-space: nowrap;
|
||||
overflow: hidden;
|
||||
text-overflow: ellipsis;
|
||||
max-width: 300px;
|
||||
}
|
||||
|
||||
.empty-cell {
|
||||
color: #95a5a6;
|
||||
font-style: italic;
|
||||
}
|
||||
|
||||
.pagination {
|
||||
display: flex;
|
||||
justify-content: center;
|
||||
align-items: center;
|
||||
gap: 10px;
|
||||
margin-top: 25px;
|
||||
flex-wrap: wrap;
|
||||
}
|
||||
|
||||
.page-info {
|
||||
color: #7f8c8d;
|
||||
font-weight: 600;
|
||||
}
|
||||
|
||||
.page-btn {
|
||||
padding: 8px 12px;
|
||||
border: 2px solid #e0e6ed;
|
||||
background: white;
|
||||
border-radius: 6px;
|
||||
cursor: pointer;
|
||||
transition: all 0.3s ease;
|
||||
min-width: 40px;
|
||||
text-align: center;
|
||||
}
|
||||
|
||||
.page-btn:hover {
|
||||
border-color: #667eea;
|
||||
background: #667eea;
|
||||
color: white;
|
||||
}
|
||||
|
||||
.page-btn.active {
|
||||
background: #667eea;
|
||||
color: white;
|
||||
border-color: #667eea;
|
||||
}
|
||||
|
||||
.page-btn:disabled {
|
||||
opacity: 0.5;
|
||||
cursor: not-allowed;
|
||||
}
|
||||
|
||||
.loading {
|
||||
text-align: center;
|
||||
padding: 40px;
|
||||
color: #7f8c8d;
|
||||
font-size: 1.1em;
|
||||
}
|
||||
|
||||
.error {
|
||||
background: #f8d7da;
|
||||
color: #721c24;
|
||||
padding: 15px;
|
||||
border-radius: 8px;
|
||||
border: 1px solid #f5c6cb;
|
||||
margin: 20px 0;
|
||||
}
|
||||
|
||||
.no-data {
|
||||
text-align: center;
|
||||
padding: 40px;
|
||||
color: #7f8c8d;
|
||||
font-size: 1.1em;
|
||||
}
|
||||
|
||||
.analyze-btn {
|
||||
background: linear-gradient(135deg, #28a745, #20c997);
|
||||
color: white;
|
||||
border: none;
|
||||
padding: 8px 16px;
|
||||
border-radius: 6px;
|
||||
font-size: 1em;
|
||||
cursor: pointer;
|
||||
transition: all 0.3s ease;
|
||||
}
|
||||
|
||||
.analyze-btn:hover {
|
||||
transform: translateY(-1px);
|
||||
box-shadow: 0 4px 8px rgba(40, 167, 69, 0.3);
|
||||
}
|
||||
|
||||
.analyze-btn:disabled {
|
||||
background: #6c757d;
|
||||
cursor: not-allowed;
|
||||
transform: none;
|
||||
box-shadow: none;
|
||||
}
|
||||
|
||||
.progress-container {
|
||||
margin-top: 20px;
|
||||
padding: 15px;
|
||||
background: rgba(255, 255, 255, 0.95);
|
||||
backdrop-filter: blur(10px);
|
||||
border-radius: 12px;
|
||||
box-shadow: 0 4px 20px rgba(0, 0, 0, 0.1);
|
||||
display: none;
|
||||
}
|
||||
|
||||
.progress-bar {
|
||||
width: 100%;
|
||||
height: 20px;
|
||||
background: #e9ecef;
|
||||
border-radius: 10px;
|
||||
overflow: hidden;
|
||||
margin: 10px 0;
|
||||
}
|
||||
|
||||
.progress-fill {
|
||||
height: 100%;
|
||||
background: linear-gradient(90deg, #667eea, #764ba2);
|
||||
transition: width 0.3s ease;
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
color: white;
|
||||
font-size: 0.8em;
|
||||
font-weight: 600;
|
||||
}
|
||||
|
||||
@media (max-width: 768px) {
|
||||
.controls {
|
||||
flex-direction: column;
|
||||
align-items: stretch;
|
||||
}
|
||||
|
||||
.table-info {
|
||||
flex-direction: column;
|
||||
gap: 15px;
|
||||
text-align: center;
|
||||
}
|
||||
|
||||
.stats {
|
||||
justify-content: center;
|
||||
}
|
||||
}
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<div class="container">
|
||||
<div class="header">
|
||||
<h1>🗄️ SQLite数据库查看器</h1>
|
||||
<div class="controls">
|
||||
<div class="control-group">
|
||||
<label for="tableSelect">选择数据表:</label>
|
||||
<select id="tableSelect">
|
||||
<option value="">加载中...</option>
|
||||
</select>
|
||||
</div>
|
||||
<div class="control-group">
|
||||
<label for="analyzeScoresBtn">分析:</label>
|
||||
<button id="analyzeScoresBtn" class="analyze-btn">📊 分析缺失分数</button>
|
||||
</div>
|
||||
<div class="control-group">
|
||||
<label for="searchField">筛选字段:</label>
|
||||
<select id="searchField" multiple disabled style="min-height: 80px;">
|
||||
<option value="">所有文本字段</option>
|
||||
</select>
|
||||
</div>
|
||||
<div class="control-group">
|
||||
<label for="searchValue">筛选内容:</label>
|
||||
<input type="text" id="searchValue" placeholder="输入筛选内容..." disabled>
|
||||
</div>
|
||||
<button class="btn" onclick="loadData()">刷新数据</button>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="data-container">
|
||||
<div class="table-info">
|
||||
<h2 id="tableName">请选择数据表</h2>
|
||||
<div class="stats">
|
||||
<span id="recordCount">记录数: 0</span>
|
||||
<span id="pageInfo">第 0 页,共 0 页</span>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div id="dataContainer">
|
||||
<div class="no-data">请选择数据表以查看内容</div>
|
||||
</div>
|
||||
|
||||
<div id="pagination" class="pagination" style="display: none;">
|
||||
<button class="page-btn" onclick="changePage('prev')" id="prevBtn">上一页</button>
|
||||
<span class="page-info" id="pageInfoDetail"></span>
|
||||
<button class="page-btn" onclick="changePage('next')" id="nextBtn">下一页</button>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div id="progressSection" class="progress-container">
|
||||
<h3>📊 分数分析进度</h3>
|
||||
<div class="progress-bar">
|
||||
<div id="progressFill" class="progress-fill" style="width: 0%;">0%</div>
|
||||
</div>
|
||||
<p id="progressText">等待分析开始...</p>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<script>
|
||||
let currentTable = '';
|
||||
let currentPage = 1;
|
||||
let perPage = 50;
|
||||
let totalPages = 1;
|
||||
let currentData = null;
|
||||
|
||||
// 绑定事件
|
||||
document.addEventListener('DOMContentLoaded', function() {
|
||||
loadTables();
|
||||
|
||||
// 绑定事件
|
||||
document.getElementById('tableSelect').addEventListener('change', function() {
|
||||
currentTable = this.value;
|
||||
currentPage = 1;
|
||||
if (currentTable) {
|
||||
loadTableStructure();
|
||||
loadData();
|
||||
}
|
||||
});
|
||||
|
||||
// 分析缺失分数按钮事件
|
||||
document.getElementById('analyzeScoresBtn').addEventListener('click', analyzeMissingScores);
|
||||
});
|
||||
|
||||
// 分析缺失分数
|
||||
async function analyzeMissingScores() {
|
||||
const analyzeBtn = document.getElementById('analyzeScoresBtn');
|
||||
const progressSection = document.getElementById('progressSection');
|
||||
const progressFill = document.getElementById('progressFill');
|
||||
const progressText = document.getElementById('progressText');
|
||||
|
||||
try {
|
||||
// 禁用按钮
|
||||
analyzeBtn.disabled = true;
|
||||
analyzeBtn.textContent = '分析中...';
|
||||
|
||||
// 显示进度条
|
||||
progressSection.style.display = 'block';
|
||||
progressFill.style.width = '0%';
|
||||
progressFill.textContent = '0%';
|
||||
progressText.textContent = '正在启动分析任务...';
|
||||
|
||||
// 启动分析任务
|
||||
const response = await fetch('/api/analyze_missing_scores');
|
||||
const data = await response.json();
|
||||
|
||||
if (data.task_id) {
|
||||
// 定期查询任务状态
|
||||
const interval = setInterval(async () => {
|
||||
try {
|
||||
const statusResponse = await fetch(`/api/update_task_status/${data.task_id}`);
|
||||
const statusData = await statusResponse.json();
|
||||
|
||||
// 更新进度
|
||||
progressFill.style.width = `${statusData.progress}%`;
|
||||
progressFill.textContent = `${statusData.progress}%`;
|
||||
|
||||
if (statusData.status === 'running') {
|
||||
progressText.textContent = `正在分析: ${statusData.completed}/${statusData.total} 个产品`;
|
||||
} else if (statusData.status === 'completed') {
|
||||
progressText.textContent = '🎉 所有缺失分数分析完成!';
|
||||
clearInterval(interval);
|
||||
analyzeBtn.disabled = false;
|
||||
analyzeBtn.textContent = '📊 分析缺失分数';
|
||||
|
||||
// 如果当前正在查看product_analysis表,自动刷新
|
||||
if (currentTable === 'product_analysis') {
|
||||
loadData();
|
||||
}
|
||||
} else if (statusData.status === 'failed') {
|
||||
progressText.textContent = `❌ 分析失败: ${statusData.error}`;
|
||||
clearInterval(interval);
|
||||
analyzeBtn.disabled = false;
|
||||
analyzeBtn.textContent = '📊 分析缺失分数';
|
||||
}
|
||||
} catch (error) {
|
||||
console.error('查询任务状态失败:', error);
|
||||
progressText.textContent = '查询任务状态失败';
|
||||
}
|
||||
}, 2000);
|
||||
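} else {
// No task_id in the response: restore the button so the UI is not stuck in the "analyzing" state
progressText.textContent = '启动分析失败: 未返回任务ID';
analyzeBtn.disabled = false;
analyzeBtn.textContent = '📊 分析缺失分数';
}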
|
||||
} catch (error) {
|
||||
console.error('启动分析任务失败:', error);
|
||||
progressText.textContent = `启动分析失败: ${error.message}`;
|
||||
analyzeBtn.disabled = false;
|
||||
analyzeBtn.textContent = '📊 分析缺失分数';
|
||||
}
|
||||
}
|
||||
|
||||
document.getElementById('searchField').addEventListener('change', loadData);
|
||||
document.getElementById('searchValue').addEventListener('input', debounce(loadData, 500));
|
||||
|
||||
// 防抖函数
|
||||
function debounce(func, wait) {
|
||||
let timeout;
|
||||
return function executedFunction(...args) {
|
||||
const later = () => {
|
||||
clearTimeout(timeout);
|
||||
func(...args);
|
||||
};
|
||||
clearTimeout(timeout);
|
||||
timeout = setTimeout(later, wait);
|
||||
};
|
||||
}
|
||||
|
||||
// 加载表列表
|
||||
async function loadTables() {
|
||||
try {
|
||||
const response = await fetch('/api/tables');
|
||||
const data = await response.json();
|
||||
const select = document.getElementById('tableSelect');
|
||||
select.innerHTML = '<option value="">选择数据表...</option>';
|
||||
|
||||
data.tables.forEach(table => {
|
||||
const option = document.createElement('option');
|
||||
option.value = table;
|
||||
option.textContent = table;
|
||||
select.appendChild(option);
|
||||
});
|
||||
} catch (error) {
|
||||
console.error('加载表列表失败:', error);
|
||||
showError('加载表列表失败: ' + error.message);
|
||||
}
|
||||
}
|
||||
|
||||
// 加载表结构
|
||||
async function loadTableStructure() {
|
||||
if (!currentTable) return;
|
||||
|
||||
try {
|
||||
const response = await fetch(`/api/table/${currentTable}/structure`);
|
||||
const data = await response.json();
|
||||
const searchField = document.getElementById('searchField');
|
||||
|
||||
searchField.innerHTML = '<option value="">所有文本字段</option>';
|
||||
data.structure.forEach(field => {
|
||||
const option = document.createElement('option');
|
||||
option.value = field.name;
|
||||
option.textContent = field.name;
|
||||
searchField.appendChild(option);
|
||||
});
|
||||
|
||||
searchField.disabled = false;
|
||||
document.getElementById('searchValue').disabled = false;
|
||||
} catch (error) {
|
||||
console.error('加载表结构失败:', error);
|
||||
}
|
||||
}
|
||||
|
||||
// 加载数据
|
||||
async function loadData() {
|
||||
if (!currentTable) return;
|
||||
|
||||
const container = document.getElementById('dataContainer');
|
||||
container.innerHTML = '<div class="loading">📊 数据加载中...</div>';
|
||||
|
||||
const searchFieldSelect = document.getElementById('searchField');
|
||||
const searchValue = document.getElementById('searchValue').value;
|
||||
|
||||
try {
|
||||
let url = `/api/table/${currentTable}/data?page=${currentPage}&per_page=${perPage}`;
|
||||
if (searchValue) {
|
||||
// 获取所有选中的字段
|
||||
const selectedFields = Array.from(searchFieldSelect.selectedOptions)
|
||||
.map(option => option.value)
|
||||
.filter(value => value !== '');
|
||||
|
||||
if (selectedFields.length > 0) {
|
||||
// 如果选择了特定字段,传递所有选中的字段
|
||||
selectedFields.forEach(field => {
|
||||
url += `&search_field=${encodeURIComponent(field)}`;
|
||||
});
|
||||
} else {
|
||||
// 否则使用"all"表示所有文本字段
|
||||
url += '&search_field=all';
|
||||
}
|
||||
url += `&search_value=${encodeURIComponent(searchValue)}`;
|
||||
}
|
||||
|
||||
const response = await fetch(url);
|
||||
currentData = await response.json();
|
||||
|
||||
displayData(currentData);
|
||||
updatePagination();
|
||||
|
||||
} catch (error) {
|
||||
console.error('加载数据失败:', error);
|
||||
showError('加载数据失败: ' + error.message);
|
||||
}
|
||||
}
|
||||
|
||||
// 显示数据
|
||||
function displayData(data) {
|
||||
const container = document.getElementById('dataContainer');
|
||||
|
||||
if (!data.rows || data.rows.length === 0) {
|
||||
container.innerHTML = '<div class="no-data">📭 没有找到数据</div>';
|
||||
return;
|
||||
}
|
||||
|
||||
let html = '<div class="table-wrapper"><table><thead><tr>';
|
||||
|
||||
// 表头
|
||||
data.columns.forEach(col => {
|
||||
html += `<th>${escapeHtml(col)}</th>`;
|
||||
});
|
||||
html += '</tr></thead><tbody>';
|
||||
|
||||
// 数据行
|
||||
data.rows.forEach(row => {
|
||||
html += '<tr>';
|
||||
row.forEach((cell, index) => {
|
||||
const colName = data.columns[index];
|
||||
if (cell.type === 'multiline') {
|
||||
html += `<td><div class="multiline-cell">${escapeHtml(cell.value)}</div></td>`;
|
||||
} else if (cell.type === 'empty') {
|
||||
html += '<td><div class="empty-cell">空</div></td>';
|
||||
} else if (colName === 'product_link' && cell.value) {
|
||||
// 渲染为链接
|
||||
html += `<td><div class="normal-cell"><a href="${escapeHtml(cell.value)}" target="_blank" rel="noopener noreferrer">${escapeHtml(cell.value)}</a></div></td>`;
|
||||
} else {
|
||||
html += `<td><div class="normal-cell">${escapeHtml(cell.value)}</div></td>`;
|
||||
}
|
||||
});
|
||||
html += '</tr>';
|
||||
});
|
||||
|
||||
html += '</tbody></table></div>';
|
||||
container.innerHTML = html;
|
||||
|
||||
// 更新统计信息
|
||||
document.getElementById('tableName').textContent = `📋 ${currentTable}`;
|
||||
document.getElementById('recordCount').textContent = `记录数: ${data.total_count}`;
|
||||
document.getElementById('pageInfo').textContent = `第 ${currentPage} 页,共 ${data.total_pages} 页`;
|
||||
}
|
||||
|
||||
// 更新分页
|
||||
function updatePagination() {
|
||||
if (!currentData) return;
|
||||
|
||||
totalPages = currentData.total_pages;
|
||||
const pagination = document.getElementById('pagination');
|
||||
const prevBtn = document.getElementById('prevBtn');
|
||||
const nextBtn = document.getElementById('nextBtn');
|
||||
const pageInfo = document.getElementById('pageInfoDetail');
|
||||
|
||||
if (totalPages <= 1) {
|
||||
pagination.style.display = 'none';
|
||||
return;
|
||||
}
|
||||
|
||||
pagination.style.display = 'flex';
|
||||
|
||||
prevBtn.disabled = currentPage <= 1;
|
||||
nextBtn.disabled = currentPage >= totalPages;
|
||||
|
||||
pageInfo.textContent = `${currentPage} / ${totalPages}`;
|
||||
}
|
||||
|
||||
// 翻页
|
||||
function changePage(direction) {
|
||||
if (direction === 'prev' && currentPage > 1) {
|
||||
currentPage--;
|
||||
loadData();
|
||||
} else if (direction === 'next' && currentPage < totalPages) {
|
||||
currentPage++;
|
||||
loadData();
|
||||
}
|
||||
}
|
||||
|
||||
// HTML转义
|
||||
function escapeHtml(text) {
|
||||
const div = document.createElement('div');
|
||||
div.textContent = text;
|
||||
return div.innerHTML;
|
||||
}
|
||||
|
||||
// 显示错误
|
||||
function showError(message) {
|
||||
const container = document.getElementById('dataContainer');
|
||||
container.innerHTML = `<div class="error">❌ ${escapeHtml(message)}</div>`;
|
||||
}
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
1138
product/web_sqlite_viewer.py
Normal file
File diff suppressed because it is too large
Load Diff
BIN
product_screenshot.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 717 KiB |
@@ -3,4 +3,6 @@ lxml>=4.6.3
 tqdm>=4.61.2
 loguru>=0.5.3
 zhipuai>=2.1.0
-PySide6>=6.0.0
+PySide6
+selenium>=4.15.0
+playwright>=1.40.0
60
run_viewer.py
Normal file
@@ -0,0 +1,60 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Launcher script for the SQLite database viewer.
"""

import sys
import os


def check_dependencies():
    """Check that the required packages can be imported."""
    missing_deps = []

    try:
        import PySide6  # noqa: F401
    except ImportError:
        missing_deps.append("PySide6")

    try:
        import loguru  # noqa: F401
    except ImportError:
        missing_deps.append("loguru")

    if missing_deps:
        print("❌ 缺少以下依赖包:")
        for dep in missing_deps:
            print(f"  - {dep}")
        print("\n请使用以下命令安装:")
        print("pip install -r requirements_gui.txt")
        return False

    return True


def main():
    """Entry point."""
    # Check dependencies first, so that a missing loguru is reported
    # by check_dependencies() instead of crashing at import time.
    if not check_dependencies():
        sys.exit(1)

    from loguru import logger

    logger.info("启动SQLite数据库查看器")

    # Import the main program
    try:
        from sqlite_viewer import main as viewer_main
    except ImportError as e:
        logger.error(f"导入主程序失败: {e}")
        print("❌ 无法导入主程序,请检查文件是否存在")
        sys.exit(1)

    # Run the main program
    try:
        viewer_main()
    except Exception as e:
        logger.error(f"程序运行错误: {e}")
        print(f"❌ 程序运行错误: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
11
temp_product_info.txt
Normal file
@@ -0,0 +1,11 @@
=== Product Hunt Product Information ===

Product name: Greta

Product summary: not retrieved

Maker comment: This is first first proposed project. If you want to support Santiago getting his project built, here are the details.https://onemillionlines.com/proj...

User count: 664 followers

Extracted at: 2026-03-08 20:40:13
670
templates/index.html
Normal file
@@ -0,0 +1,670 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="zh-CN">
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<title>SQLite数据库查看器</title>
|
||||
<style>
|
||||
* {
|
||||
margin: 0;
|
||||
padding: 0;
|
||||
box-sizing: border-box;
|
||||
}
|
||||
|
||||
body {
|
||||
font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
|
||||
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
||||
min-height: 100vh;
|
||||
color: #333;
|
||||
}
|
||||
|
||||
.container {
|
||||
width: 100%;
|
||||
margin: 0 auto;
|
||||
padding: 20px;
|
||||
}
|
||||
|
||||
.header {
|
||||
background: rgba(255, 255, 255, 0.95);
|
||||
backdrop-filter: blur(10px);
|
||||
border-radius: 15px;
|
||||
padding: 25px;
|
||||
margin-bottom: 25px;
|
||||
box-shadow: 0 8px 32px rgba(0, 0, 0, 0.1);
|
||||
}
|
||||
|
||||
.header h1 {
|
||||
color: #2c3e50;
|
||||
font-size: 2.5em;
|
||||
margin-bottom: 10px;
|
||||
text-align: center;
|
||||
}
|
||||
|
||||
.controls {
|
||||
display: flex;
|
||||
gap: 20px;
|
||||
align-items: center;
|
||||
flex-wrap: wrap;
|
||||
margin-top: 20px;
|
||||
}
|
||||
|
||||
.control-group {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 8px;
|
||||
}
|
||||
|
||||
.control-group label {
|
||||
font-weight: 600;
|
||||
color: #34495e;
|
||||
font-size: 0.9em;
|
||||
}
|
||||
|
||||
select, input {
|
||||
padding: 12px 15px;
|
||||
border: 2px solid #e0e6ed;
|
||||
border-radius: 8px;
|
||||
font-size: 14px;
|
||||
transition: all 0.3s ease;
|
||||
background: white;
|
||||
}
|
||||
|
||||
select:focus, input:focus {
|
||||
outline: none;
|
||||
border-color: #667eea;
|
||||
box-shadow: 0 0 0 3px rgba(102, 126, 234, 0.1);
|
||||
}
|
||||
|
||||
.btn {
|
||||
padding: 12px 24px;
|
||||
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
||||
color: white;
|
||||
border: none;
|
||||
border-radius: 8px;
|
||||
cursor: pointer;
|
||||
font-weight: 600;
|
||||
transition: all 0.3s ease;
|
||||
text-decoration: none;
|
||||
display: inline-block;
|
||||
}
|
||||
|
||||
.btn:hover {
|
||||
transform: translateY(-2px);
|
||||
box-shadow: 0 5px 15px rgba(0, 0, 0, 0.2);
|
||||
}
|
||||
|
||||
.data-container {
|
||||
background: rgba(255, 255, 255, 0.95);
|
||||
backdrop-filter: blur(10px);
|
||||
border-radius: 15px;
|
||||
padding: 25px;
|
||||
box-shadow: 0 8px 32px rgba(0, 0, 0, 0.1);
|
||||
}
|
||||
|
||||
.table-info {
|
||||
display: flex;
|
||||
justify-content: space-between;
|
||||
align-items: center;
|
||||
margin-bottom: 20px;
|
||||
padding: 15px;
|
||||
background: #f8f9fa;
|
||||
border-radius: 10px;
|
||||
}
|
||||
|
||||
.table-info h2 {
|
||||
color: #2c3e50;
|
||||
font-size: 1.5em;
|
||||
}
|
||||
|
||||
.stats {
|
||||
display: flex;
|
||||
gap: 20px;
|
||||
font-size: 0.9em;
|
||||
color: #7f8c8d;
|
||||
}
|
||||
|
||||
.table-wrapper {
|
||||
overflow-x: auto;
|
||||
border-radius: 10px;
|
||||
box-shadow: 0 4px 16px rgba(0, 0, 0, 0.1);
|
||||
}
|
||||
|
||||
table {
|
||||
width: 100%;
|
||||
border-collapse: collapse;
|
||||
background: white;
|
||||
font-size: 14px;
|
||||
}
|
||||
|
||||
th {
|
||||
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
||||
color: white;
|
||||
padding: 15px 12px;
|
||||
text-align: left;
|
||||
font-weight: 600;
|
||||
position: sticky;
|
||||
top: 0;
|
||||
z-index: 10;
|
||||
}
|
||||
|
||||
td {
|
||||
padding: 12px;
|
||||
border-bottom: 1px solid #ecf0f1;
|
||||
vertical-align: top;
|
||||
}
|
||||
|
||||
tr:nth-child(even) {
|
||||
background-color: #f8f9fa;
|
||||
}
|
||||
|
||||
tr:hover {
|
||||
background-color: #e3f2fd;
|
||||
transition: background-color 0.3s ease;
|
||||
}
|
||||
|
||||
.multiline-cell {
|
||||
white-space: pre-wrap;
|
||||
line-height: 1.6;
|
||||
max-height: 200px;
|
||||
overflow-y: auto;
|
||||
padding: 8px;
|
||||
background: #fff3cd;
|
||||
border-radius: 6px;
|
||||
border-left: 4px solid #ffc107;
|
||||
}
|
||||
|
||||
.normal-cell {
|
||||
white-space: nowrap;
|
||||
overflow: hidden;
|
||||
text-overflow: ellipsis;
|
||||
max-width: 300px;
|
||||
}
|
||||
|
||||
.empty-cell {
|
||||
color: #95a5a6;
|
||||
font-style: italic;
|
||||
}
|
||||
|
||||
.pagination {
|
||||
display: flex;
|
||||
justify-content: center;
|
||||
align-items: center;
|
||||
gap: 10px;
|
||||
margin-top: 25px;
|
||||
flex-wrap: wrap;
|
||||
}
|
||||
|
||||
.page-info {
|
||||
color: #7f8c8d;
|
||||
font-weight: 600;
|
||||
}
|
||||
|
||||
.page-btn {
|
||||
padding: 8px 12px;
|
||||
border: 2px solid #e0e6ed;
|
||||
background: white;
|
||||
border-radius: 6px;
|
||||
cursor: pointer;
|
||||
transition: all 0.3s ease;
|
||||
min-width: 40px;
|
||||
text-align: center;
|
||||
}
|
||||
|
||||
.page-btn:hover {
|
||||
border-color: #667eea;
|
||||
background: #667eea;
|
||||
color: white;
|
||||
}
|
||||
|
||||
.page-btn.active {
|
||||
background: #667eea;
|
||||
color: white;
|
||||
border-color: #667eea;
|
||||
}
|
||||
|
||||
.page-btn:disabled {
|
||||
opacity: 0.5;
|
||||
cursor: not-allowed;
|
||||
}
|
||||
|
||||
.loading {
|
||||
text-align: center;
|
||||
padding: 40px;
|
||||
color: #7f8c8d;
|
||||
font-size: 1.1em;
|
||||
}
|
||||
|
||||
.error {
|
||||
background: #f8d7da;
|
||||
color: #721c24;
|
||||
padding: 15px;
|
||||
border-radius: 8px;
|
||||
border: 1px solid #f5c6cb;
|
||||
margin: 20px 0;
|
||||
}
|
||||
|
||||
.no-data {
|
||||
text-align: center;
|
||||
padding: 40px;
|
||||
color: #7f8c8d;
|
||||
font-size: 1.1em;
|
||||
}
|
||||
|
||||
.analyze-btn {
|
||||
background: linear-gradient(135deg, #28a745, #20c997);
|
||||
color: white;
|
||||
border: none;
|
||||
padding: 8px 16px;
|
||||
border-radius: 6px;
|
||||
font-size: 1em;
|
||||
cursor: pointer;
|
||||
transition: all 0.3s ease;
|
||||
}
|
||||
|
||||
.analyze-btn:hover {
|
||||
transform: translateY(-1px);
|
||||
box-shadow: 0 4px 8px rgba(40, 167, 69, 0.3);
|
||||
}
|
||||
|
||||
.analyze-btn:disabled {
|
||||
background: #6c757d;
|
||||
cursor: not-allowed;
|
||||
transform: none;
|
||||
box-shadow: none;
|
||||
}
|
||||
|
||||
.progress-container {
|
||||
margin-top: 20px;
|
||||
padding: 15px;
|
||||
background: rgba(255, 255, 255, 0.95);
|
||||
backdrop-filter: blur(10px);
|
||||
border-radius: 12px;
|
||||
box-shadow: 0 4px 20px rgba(0, 0, 0, 0.1);
|
||||
display: none;
|
||||
}
|
||||
|
||||
.progress-bar {
|
||||
width: 100%;
|
||||
height: 20px;
|
||||
background: #e9ecef;
|
||||
border-radius: 10px;
|
||||
overflow: hidden;
|
||||
margin: 10px 0;
|
||||
}
|
||||
|
||||
.progress-fill {
|
||||
height: 100%;
|
||||
background: linear-gradient(90deg, #667eea, #764ba2);
|
||||
transition: width 0.3s ease;
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
color: white;
|
||||
font-size: 0.8em;
|
||||
font-weight: 600;
|
||||
}
|
||||
|
||||
@media (max-width: 768px) {
|
||||
.controls {
|
||||
flex-direction: column;
|
||||
align-items: stretch;
|
||||
}
|
||||
|
||||
.table-info {
|
||||
flex-direction: column;
|
||||
gap: 15px;
|
||||
text-align: center;
|
||||
}
|
||||
|
||||
.stats {
|
||||
justify-content: center;
|
||||
}
|
||||
}
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<div class="container">
|
||||
<div class="header">
|
||||
<h1>🗄️ SQLite数据库查看器</h1>
|
||||
<div class="controls">
|
||||
<div class="control-group">
|
||||
<label for="tableSelect">选择数据表:</label>
|
||||
<select id="tableSelect">
|
||||
<option value="">加载中...</option>
|
||||
</select>
|
||||
</div>
|
||||
<div class="control-group">
|
||||
<label for="analyzeScoresBtn">分析:</label>
|
||||
<button id="analyzeScoresBtn" class="analyze-btn">📊 分析缺失分数</button>
|
||||
</div>
|
||||
<div class="control-group">
|
||||
<label for="searchField">筛选字段:</label>
|
||||
<select id="searchField" multiple disabled style="min-height: 80px;">
|
||||
<option value="">所有文本字段</option>
|
||||
</select>
|
||||
</div>
|
||||
<div class="control-group">
|
||||
<label for="searchValue">筛选内容:</label>
|
||||
<input type="text" id="searchValue" placeholder="输入筛选内容..." disabled>
|
||||
</div>
|
||||
<button class="btn" onclick="loadData()">刷新数据</button>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="data-container">
|
||||
<div class="table-info">
|
||||
<h2 id="tableName">请选择数据表</h2>
|
||||
<div class="stats">
|
||||
<span id="recordCount">记录数: 0</span>
|
||||
<span id="pageInfo">第 0 页,共 0 页</span>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div id="dataContainer">
|
||||
<div class="no-data">请选择数据表以查看内容</div>
|
||||
</div>
|
||||
|
||||
<div id="pagination" class="pagination" style="display: none;">
|
||||
<button class="page-btn" onclick="changePage('prev')" id="prevBtn">上一页</button>
|
||||
<span class="page-info" id="pageInfoDetail"></span>
|
||||
<button class="page-btn" onclick="changePage('next')" id="nextBtn">下一页</button>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div id="progressSection" class="progress-container">
|
||||
<h3>📊 分数分析进度</h3>
|
||||
<div class="progress-bar">
|
||||
<div id="progressFill" class="progress-fill" style="width: 0%;">0%</div>
|
||||
</div>
|
||||
<p id="progressText">等待分析开始...</p>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<script>
|
||||
let currentTable = '';
|
||||
let currentPage = 1;
|
||||
let perPage = 50;
|
||||
let totalPages = 1;
|
||||
let currentData = null;
|
||||
|
||||
// 绑定事件
|
||||
document.addEventListener('DOMContentLoaded', function() {
|
||||
loadTables();
|
||||
|
||||
// 绑定事件
|
||||
document.getElementById('tableSelect').addEventListener('change', function() {
|
||||
currentTable = this.value;
|
||||
currentPage = 1;
|
||||
if (currentTable) {
|
||||
loadTableStructure();
|
||||
loadData();
|
||||
}
|
||||
});
|
||||
|
||||
// 分析缺失分数按钮事件
|
||||
document.getElementById('analyzeScoresBtn').addEventListener('click', analyzeMissingScores);
|
||||
});
|
||||
|
||||
// 分析缺失分数
|
||||
async function analyzeMissingScores() {
|
||||
const analyzeBtn = document.getElementById('analyzeScoresBtn');
|
||||
const progressSection = document.getElementById('progressSection');
|
||||
const progressFill = document.getElementById('progressFill');
|
||||
const progressText = document.getElementById('progressText');
|
||||
|
||||
try {
|
||||
// 禁用按钮
|
||||
analyzeBtn.disabled = true;
|
||||
analyzeBtn.textContent = '分析中...';
|
||||
|
||||
// 显示进度条
|
||||
progressSection.style.display = 'block';
|
||||
progressFill.style.width = '0%';
|
||||
progressFill.textContent = '0%';
|
||||
progressText.textContent = '正在启动分析任务...';
|
||||
|
||||
// 启动分析任务
|
||||
const response = await fetch('/api/analyze_missing_scores');
|
||||
const data = await response.json();
|
||||
|
||||
if (data.task_id) {
|
||||
// 定期查询任务状态
|
||||
const interval = setInterval(async () => {
|
||||
try {
|
||||
const statusResponse = await fetch(`/api/update_task_status/${data.task_id}`);
|
||||
const statusData = await statusResponse.json();
|
||||
|
||||
// 更新进度
|
||||
progressFill.style.width = `${statusData.progress}%`;
|
||||
progressFill.textContent = `${statusData.progress}%`;
|
||||
|
||||
if (statusData.status === 'running') {
|
||||
progressText.textContent = `正在分析: ${statusData.completed}/${statusData.total} 个产品`;
|
||||
} else if (statusData.status === 'completed') {
|
||||
progressText.textContent = '🎉 所有缺失分数分析完成!';
|
||||
clearInterval(interval);
|
||||
analyzeBtn.disabled = false;
|
||||
analyzeBtn.textContent = '📊 分析缺失分数';
|
||||
|
||||
// 如果当前正在查看product_analysis表,自动刷新
|
||||
if (currentTable === 'product_analysis') {
|
||||
loadData();
|
||||
}
|
||||
} else if (statusData.status === 'failed') {
|
||||
progressText.textContent = `❌ 分析失败: ${statusData.error}`;
|
||||
clearInterval(interval);
|
||||
analyzeBtn.disabled = false;
|
||||
analyzeBtn.textContent = '📊 分析缺失分数';
|
||||
}
|
||||
} catch (error) {
|
||||
console.error('查询任务状态失败:', error);
|
||||
progressText.textContent = '查询任务状态失败';
|
||||
}
|
||||
}, 2000);
|
||||
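} else {
// No task_id in the response: restore the button so the UI is not stuck in the "analyzing" state
progressText.textContent = '启动分析失败: 未返回任务ID';
analyzeBtn.disabled = false;
analyzeBtn.textContent = '📊 分析缺失分数';
}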
|
||||
} catch (error) {
|
||||
console.error('启动分析任务失败:', error);
|
||||
progressText.textContent = `启动分析失败: ${error.message}`;
|
||||
analyzeBtn.disabled = false;
|
||||
analyzeBtn.textContent = '📊 分析缺失分数';
|
||||
}
|
||||
}
|
||||
|
||||
document.getElementById('searchField').addEventListener('change', loadData);
|
||||
document.getElementById('searchValue').addEventListener('input', debounce(loadData, 500));
|
||||
|
||||
// 防抖函数
|
||||
function debounce(func, wait) {
|
||||
let timeout;
|
||||
return function executedFunction(...args) {
|
||||
const later = () => {
|
||||
clearTimeout(timeout);
|
||||
func(...args);
|
||||
};
|
||||
clearTimeout(timeout);
|
||||
timeout = setTimeout(later, wait);
|
||||
};
|
||||
}
|
||||
|
||||
// 加载表列表
|
||||
async function loadTables() {
|
||||
try {
|
||||
const response = await fetch('/api/tables');
|
||||
const data = await response.json();
|
||||
const select = document.getElementById('tableSelect');
|
||||
select.innerHTML = '<option value="">选择数据表...</option>';
|
||||
|
||||
data.tables.forEach(table => {
|
||||
const option = document.createElement('option');
|
||||
option.value = table;
|
||||
option.textContent = table;
|
||||
select.appendChild(option);
|
||||
});
|
||||
} catch (error) {
|
||||
console.error('加载表列表失败:', error);
|
||||
showError('加载表列表失败: ' + error.message);
|
||||
}
|
||||
}
|
||||
|
||||
// 加载表结构
|
||||
async function loadTableStructure() {
|
||||
if (!currentTable) return;
|
||||
|
||||
try {
|
||||
const response = await fetch(`/api/table/${currentTable}/structure`);
|
||||
const data = await response.json();
|
||||
const searchField = document.getElementById('searchField');
|
||||
|
||||
searchField.innerHTML = '<option value="">所有文本字段</option>';
|
||||
data.structure.forEach(field => {
|
||||
const option = document.createElement('option');
|
||||
option.value = field.name;
|
||||
option.textContent = field.name;
|
||||
searchField.appendChild(option);
|
||||
});
|
||||
|
||||
searchField.disabled = false;
|
||||
document.getElementById('searchValue').disabled = false;
|
||||
} catch (error) {
|
||||
console.error('加载表结构失败:', error);
|
||||
}
|
||||
}
|
||||
|
||||
// 加载数据
|
||||
async function loadData() {
|
||||
if (!currentTable) return;
|
||||
|
||||
const container = document.getElementById('dataContainer');
|
||||
container.innerHTML = '<div class="loading">📊 数据加载中...</div>';
|
||||
|
||||
const searchFieldSelect = document.getElementById('searchField');
|
||||
const searchValue = document.getElementById('searchValue').value;
|
||||
|
||||
try {
|
||||
let url = `/api/table/${currentTable}/data?page=${currentPage}&per_page=${perPage}`;
|
||||
if (searchValue) {
|
||||
// 获取所有选中的字段
|
||||
const selectedFields = Array.from(searchFieldSelect.selectedOptions)
|
||||
.map(option => option.value)
|
||||
.filter(value => value !== '');
|
||||
|
||||
if (selectedFields.length > 0) {
|
||||
// 如果选择了特定字段,传递所有选中的字段
|
||||
selectedFields.forEach(field => {
|
||||
url += `&search_field=${encodeURIComponent(field)}`;
|
||||
});
|
||||
} else {
|
||||
// 否则使用"all"表示所有文本字段
|
||||
url += '&search_field=all';
|
||||
}
|
||||
url += `&search_value=${encodeURIComponent(searchValue)}`;
|
||||
}
|
||||
|
||||
const response = await fetch(url);
|
||||
currentData = await response.json();
|
||||
|
||||
displayData(currentData);
|
||||
updatePagination();
|
||||
|
||||
} catch (error) {
|
||||
console.error('加载数据失败:', error);
|
||||
showError('加载数据失败: ' + error.message);
|
||||
}
|
||||
}
|
||||
|
||||
// 显示数据
|
||||
function displayData(data) {
|
||||
const container = document.getElementById('dataContainer');
|
||||
|
||||
if (!data.rows || data.rows.length === 0) {
|
||||
container.innerHTML = '<div class="no-data">📭 没有找到数据</div>';
|
||||
return;
|
||||
}
|
||||
|
||||
let html = '<div class="table-wrapper"><table><thead><tr>';
|
||||
|
||||
// 表头
|
||||
data.columns.forEach(col => {
|
||||
html += `<th>${escapeHtml(col)}</th>`;
|
||||
});
|
||||
html += '</tr></thead><tbody>';
|
||||
|
||||
// 数据行
|
||||
data.rows.forEach(row => {
|
||||
html += '<tr>';
|
||||
row.forEach((cell, index) => {
|
||||
const colName = data.columns[index];
|
||||
if (cell.type === 'multiline') {
|
||||
html += `<td><div class="multiline-cell">${escapeHtml(cell.value)}</div></td>`;
|
||||
} else if (cell.type === 'empty') {
|
||||
html += '<td><div class="empty-cell">空</div></td>';
|
||||
} else if (colName === 'product_link' && cell.value) {
|
||||
// 渲染为链接
|
||||
html += `<td><div class="normal-cell"><a href="${escapeHtml(cell.value)}" target="_blank" rel="noopener noreferrer">${escapeHtml(cell.value)}</a></div></td>`;
|
||||
} else {
|
||||
html += `<td><div class="normal-cell">${escapeHtml(cell.value)}</div></td>`;
|
||||
}
|
||||
});
|
||||
html += '</tr>';
|
||||
});
|
||||
|
||||
html += '</tbody></table></div>';
|
||||
container.innerHTML = html;
|
||||
|
||||
// 更新统计信息
|
||||
document.getElementById('tableName').textContent = `📋 ${currentTable}`;
|
||||
document.getElementById('recordCount').textContent = `记录数: ${data.total_count}`;
|
||||
document.getElementById('pageInfo').textContent = `第 ${currentPage} 页,共 ${data.total_pages} 页`;
|
||||
}
|
||||
|
||||
// 更新分页
|
||||
function updatePagination() {
|
||||
if (!currentData) return;
|
||||
|
||||
totalPages = currentData.total_pages;
|
||||
const pagination = document.getElementById('pagination');
|
||||
const prevBtn = document.getElementById('prevBtn');
|
||||
const nextBtn = document.getElementById('nextBtn');
|
||||
const pageInfo = document.getElementById('pageInfoDetail');
|
||||
|
||||
if (totalPages <= 1) {
|
||||
pagination.style.display = 'none';
|
||||
return;
|
||||
}
|
||||
|
||||
pagination.style.display = 'flex';
|
||||
|
||||
prevBtn.disabled = currentPage <= 1;
|
||||
nextBtn.disabled = currentPage >= totalPages;
|
||||
|
||||
pageInfo.textContent = `${currentPage} / ${totalPages}`;
|
||||
}
|
||||
|
||||
// 翻页
|
||||
function changePage(direction) {
|
||||
if (direction === 'prev' && currentPage > 1) {
|
||||
currentPage--;
|
||||
loadData();
|
||||
} else if (direction === 'next' && currentPage < totalPages) {
|
||||
currentPage++;
|
||||
loadData();
|
||||
}
|
||||
}
|
||||
|
||||
// HTML转义
|
||||
function escapeHtml(text) {
|
||||
const div = document.createElement('div');
|
||||
div.textContent = text;
|
||||
return div.innerHTML;
|
||||
}
|
||||
|
||||
// 显示错误
|
||||
function showError(message) {
|
||||
const container = document.getElementById('dataContainer');
|
||||
container.innerHTML = `<div class="error">❌ ${escapeHtml(message)}</div>`;
|
||||
}
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
File diff suppressed because it is too large
Load Diff
@@ -84,19 +84,36 @@ def parse_file_content(file_path):
         return []
 
 def check_duplicate(title, date_str):
-    """Check whether the title + date combination already exists."""
+    """Check whether the title already exists within the last three days (day before yesterday, yesterday, and today)."""
+    from datetime import datetime, timedelta
+
     conn = sqlite3.connect('tophub_data.db')
     cursor = conn.cursor()
 
-    cursor.execute('''
-        SELECT COUNT(*) FROM articles
-        WHERE title = ? AND source_date = ?
-    ''', (title, date_str))
-
-    count = cursor.fetchone()[0]
-    conn.close()
-
-    return count > 0
+    try:
+        # Convert the input date string to a datetime object
+        current_date = datetime.strptime(date_str, '%Y-%m-%d')
+
+        # Compute the dates for yesterday and the day before yesterday
+        yesterday = current_date - timedelta(days=1)
+        day_before_yesterday = current_date - timedelta(days=2)
+
+        # Check whether an article with the same title exists in these three days
+        cursor.execute('''
+            SELECT COUNT(*) FROM articles
+            WHERE title = ? AND source_date IN (?, ?, ?)
+        ''', (title,
+              day_before_yesterday.strftime('%Y-%m-%d'),
+              yesterday.strftime('%Y-%m-%d'),
+              date_str))
+
+        count = cursor.fetchone()[0]
+        logger.info(f"检查标题 '{title}' 在最近三天的重复情况: 找到 {count} 条相同记录")
+
+        return count > 0
+
+    finally:
+        conn.close()
 
 def classify_title(title):
     """Call the API to classify the title."""
|
||||
@@ -188,12 +205,16 @@ def process_temp_files():
             continue
 
         # Process each article
-        for article in tqdm(articles, desc=f"Processing {file_path}"):
+        for i, article in tqdm(enumerate(articles), desc=f"Processing {file_path}", total=len(articles)):
             total_processed += 1
 
+            # Log progress every 10 articles
+            if i % 10 == 0 and i > 0:
+                logger.info(f"Processed {i}/{len(articles)} articles, {i/len(articles)*100:.1f}% complete")
+
             # Check for duplicates
             if check_duplicate(article['title'], source_date):
-                logger.info(f"Skipping duplicate article: {article['title']}")
+                logger.info(f"Skipping duplicate article (already present in the last three days): {article['title']}")
                 continue
 
             # Classify the title
BIN
tophub_data.db
Binary file not shown.
71362
tophub_scraper.log
File diff suppressed because it is too large
@@ -11,6 +11,8 @@ import json
 import time
 import os
 import re
+import subprocess
+import sys
 from datetime import datetime
 from loguru import logger
 
@@ -71,6 +73,32 @@ class TopHubScraper:
             logger.error(f"Failed to fetch page content: {e}")
             raise
 
+    def delete_date_txt_files(self):
+        """
+        Delete all txt files in the local directory whose names start with a date.
+        Matched format: YYYY年MM月DD日HHMMSS.txt
+        """
+        logger.info("Deleting date-named txt files")
+        deleted_count = 0
+
+        # Regular expression for the date-based file name
+        date_pattern = r'^\d{4}年\d{1,2}月\d{1,2}日\d{6}\.txt$'
+
+        try:
+            # Walk all txt files in the current directory
+            for filename in os.listdir('.'):
+                if filename.endswith('.txt') and re.match(date_pattern, filename):
+                    try:
+                        os.remove(filename)
+                        logger.info(f"Deleted file: {filename}")
+                        deleted_count += 1
+                    except Exception as e:
+                        logger.error(f"Failed to delete file {filename}: {e}")
+
+            logger.info(f"Deletion finished; removed {deleted_count} date-named txt file(s)")
+        except Exception as e:
+            logger.error(f"Error while deleting files: {e}")
+
     def scrape_by_node_ids(self):
         """
         Scrape data by node ID range
@@ -79,6 +107,9 @@ class TopHubScraper:
             list: a list containing the scraped data
         """
         try:
+            # Before running the main logic, delete all date-named txt files
+            self.delete_date_txt_files()
+
             # 1. Fetch the page content
             html_content = self.fetch_webpage()
             tree = html.fromstring(html_content)
@@ -204,6 +235,70 @@ class TopHubScraper:
             logger.error(f"Failed to save file: {e}")
             raise
 
+    def call_add_data_script(self):
+        """
+        Run the local tophub_add_data_to_db.py script
+        """
+        logger.info("Preparing to run tophub_add_data_to_db.py")
+
+        try:
+            # Check that tophub_add_data_to_db.py exists
+            if not os.path.exists("tophub_add_data_to_db.py"):
+                logger.error("tophub_add_data_to_db.py does not exist; cannot run it")
+                return
+
+            # Run the tophub_add_data_to_db.py script
+
+
+            logger.info("Running tophub_add_data_to_db.py...")
+
+            # Use Popen to handle possible encoding issues
+            process = subprocess.Popen([sys.executable, "tophub_add_data_to_db.py"],
+                                       stdout=subprocess.PIPE,
+                                       stderr=subprocess.PIPE,
+                                       text=True,
+                                       encoding='utf-8',
+                                       errors='replace')  # replace characters that cannot be decoded
+
+            # Wait for the script to exit and collect its output
+            try:
+                stdout, stderr = process.communicate(timeout=3600)  # 1-hour timeout
+            except subprocess.TimeoutExpired:
+                process.kill()
+                logger.error("tophub_add_data_to_db.py timed out")
+                return
+
+            if process.returncode == 0:
+                logger.info("tophub_add_data_to_db.py finished successfully")
+                if stdout:
+                    logger.info(f"Script output: {stdout}")
+            else:
+                logger.error(f"tophub_add_data_to_db.py failed with return code {process.returncode}")
+                if stderr:
+                    logger.error(f"Error output: {stderr}")
+                if stdout:
+                    logger.info(f"Script output: {stdout}")
+
+        except UnicodeDecodeError as e:
+            logger.error(f"Encoding/decoding error: {e}")
+            logger.info("The script output may contain non-UTF-8 characters; decoding with errors='replace' was attempted")
+        except Exception as e:
+            logger.error(f"Error while running tophub_add_data_to_db.py: {e}")
+
 if __name__ == "__main__":
     scraper = TopHubScraper()
-    scraper.scrape_by_node_ids()
+
+
+    try:
+        # Scrape the data
+        scraped_data = scraper.scrape_by_node_ids()
+
+        # After scraping, run the tophub_add_data_to_db.py script
+        if scraped_data:
+            scraper.call_add_data_script()
+        else:
+            logger.warning("No data was scraped; skipping tophub_add_data_to_db.py")
+
+    except Exception as e:
+        logger.error(f"Program failed: {e}")
+        raise
155
右键菜单功能说明.md
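@@ -1,155 +0,0 @@
# Right-Click Menu Feature Guide

## Overview

The right-click menu in the TopHub data viewer lets users right-click an item in the table to run common operations quickly.

## New Features

### 1. Mark as interested
- **Description**: marks the selected items as interested
- **Database operation**: sets the `is_interested` field of the corresponding record to 1
- **UI display**: shows "Yes" in the "Interested" column, in bold green text

### 2. Mark as not interested
- **Description**: marks the selected items as not interested
- **Database operation**: sets the `is_interested` field of the corresponding record to 0
- **UI display**: shows "No" in the "Interested" column, in the default font and color

### 3. Delete selected items
- **Description**: deletes the selected items
- **Database operation**: deletes the corresponding records from the database
- **UI display**: removes the corresponding rows from the table

## Usage

1. Open the TopHub data viewer
2. Right-click any item in the table
3. Choose an action from the context menu:
   - Click "Mark as interested" to mark the item as interested
   - Click "Mark as not interested" to mark the item as not interested
   - Click "Delete selected items" to delete the selected items

## Technical Implementation

### Right-click menu implementation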
```python
# Enable the custom context menu
self.table.setContextMenuPolicy(Qt.CustomContextMenu)
self.table.customContextMenuRequested.connect(self.show_context_menu)

def show_context_menu(self, position):
    """Show the context menu."""
    # Find the row at the click position
    row = self.table.rowAt(position.y())
    if row < 0:
        return

    # Select that row
    self.table.selectRow(row)

    # Create the context menu
    menu = QMenu(self)

    # Add the "Mark as interested" action
    mark_action = QAction("Mark as interested", self)
    mark_action.triggered.connect(self.mark_as_interested)
    menu.addAction(mark_action)

    # Add the "Mark as not interested" action
    unmark_action = QAction("Mark as not interested", self)
    unmark_action.triggered.connect(self.mark_as_not_interested)
    menu.addAction(unmark_action)

    # Add a separator
    menu.addSeparator()

    # Add the "Delete" action
    delete_action = QAction("Delete selected items", self)
    delete_action.triggered.connect(self.delete_selected_items)
    menu.addAction(delete_action)

    # Show the menu. For item views the position is in viewport coordinates,
    # so map it through the viewport rather than the table widget itself.
    menu.exec_(self.table.viewport().mapToGlobal(position))
```

### Implementation of the mark-as-not-interested method
```python
def mark_as_not_interested(self):
    """Mark the selected items as not interested."""
    # Collect the selected rows
    selected_rows = set()
    for item in self.table.selectedItems():
        selected_rows.add(item.row())

    # Nothing selected: notify the user and return
    if not selected_rows:
        QMessageBox.information(self, "Notice", "Please select the rows to mark first")
        return

    # Ask for confirmation
    reply = QMessageBox.question(
        self,
        "Confirm marking",
        f"Mark the {len(selected_rows)} selected row(s) as not interested?",
        QMessageBox.Yes | QMessageBox.No,
        QMessageBox.Yes
    )

    if reply == QMessageBox.No:
        return

    try:
        # Connect to the database
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()

        # Update the selected rows
        updated_count = 0
        for row in selected_rows:
            # Get the ID
            id_item = self.table.item(row, 0)
            if id_item:
                article_id = id_item.text()
                # Update the is_interested field in the database
                cursor.execute("UPDATE articles SET is_interested = 0 WHERE id = ?", (article_id,))

                # Update the table display
                interested_item = QTableWidgetItem("No")
                # Not-interested items use the default font and color
                self.table.setItem(row, 5, interested_item)

                updated_count += 1

        # Commit the changes
        conn.commit()
        conn.close()

        # Update the status bar
        self.status_bar.showMessage(f"Marked {updated_count} row(s) as not interested")

    except sqlite3.Error as e:
        logger.error(f"Error while marking data: {str(e)}")
        QMessageBox.critical(self, "Database error", f"Error while marking data: {str(e)}")
        self.status_bar.showMessage("Marking failed")
    except Exception as e:
        logger.error(f"Error while marking data: {str(e)}")
        QMessageBox.critical(self, "Error", f"Error while marking data: {str(e)}")
        self.status_bar.showMessage("Marking failed")
```

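## Testing

The test script `test_mark_not_interested.py` verifies the "Mark as not interested" feature. The results show that it works as intended: items are marked as not interested, and both the database and the UI are updated.

## Notes

1. Select the items to operate on before using the right-click menu
2. Deletion cannot be undone; use it with care
3. Marking updates the database immediately, so confirm your selection first
4. In batch operations, all selected items are processed together

## Changelog

- 2023-11-07: Added "Mark as not interested" to the right-click menu
- 2023-11-07: Completed feature testing and documentation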
44
数据库字段添加总结.md
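@@ -1,44 +0,0 @@
# Summary: Adding the Database Field

## Task Overview
Add an "interested" field to the TopHub database viewer so that users can mark the articles they are interested in.

## Implementation Steps

### 1. Database schema change
- Created the `add_interested_field.py` script, which adds the `is_interested` field to the `articles` table
- Field type: INTEGER; default value: 0
- The script checks whether the field already exists, adds it, and verifies the result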
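
The check-then-add logic described for `add_interested_field.py` could look roughly like the sketch below. The script itself is not shown here, so the function name, the `PRAGMA table_info` existence check, and the in-memory demo schema are all assumptions made for illustration.

```python
import sqlite3


def add_interested_field(conn: sqlite3.Connection) -> bool:
    """Add articles.is_interested if it is missing; return True if it was added."""
    # PRAGMA table_info yields one row per column; index 1 holds the column name.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(articles)")]
    if "is_interested" in columns:
        return False
    conn.execute("ALTER TABLE articles ADD COLUMN is_interested INTEGER DEFAULT 0")
    conn.commit()
    return True


# Demo on a throwaway in-memory database (hypothetical minimal schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO articles (title) VALUES ('example')")

added_first = add_interested_field(conn)   # the column is created
added_again = add_interested_field(conn)   # the second run is a no-op
default_value = conn.execute("SELECT is_interested FROM articles").fetchone()[0]
```

Checking for the column first makes the script safe to run more than once, and rows that existed before the change read back with the default value.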
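
### 2. Database verification
- Created the `check_db_structure.py` script to inspect the database schema
- Created the `test_interested_field.py` script to verify that the field works
- Created the `show_data_with_interested.py` script to display records together with their interested status

### 3. GUI changes
- Modified `db_viewer.py` to add the following:
  - An "Interested" column in the table that shows the `is_interested` value
  - A "Mark as interested" button that marks the selected articles as interested
  - Updated queries that include the `is_interested` field
  - An updated filter that covers the "Interested" column

## Test Results
- The database field was added successfully, with a default value of 0
- Records can be marked as interested (value 1)
- The GUI application displays and operates on the interested field correctly
- Statistics work correctly and show the number of interested and not-interested records

## Usage
1. Run `python db_viewer.py` to start the application
2. Select a record in the table
3. Click the "Mark as interested" button to mark the record as interested
4. Use the filter to view interested records
5. The statistics panel shows the number of interested and not-interested records

## File List
- `add_interested_field.py` - script that adds the database field
- `check_db_structure.py` - script that checks the database schema
- `test_interested_field.py` - script that tests the field
- `show_data_with_interested.py` - command-line tool that displays records
- `test_gui.py` - GUI test script
- `db_viewer.py` - the modified main application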
70
评分系统使用说明.md
@@ -1,70 +0,0 @@
# TopHub Data Viewer - Rating System Guide

## Overview

The TopHub data viewer has been upgraded from a simple "interested / not interested" flag to a 10-point rating scale. The new system allows finer-grained evaluation, so you can mark and manage scraped content more precisely.

## About the Rating System

### Score range
- **Minimum**: 0 (not interested at all)
- **Default**: 5 (neutral)
- **Maximum**: 10 (very interested)

### Color coding
To make content quality easy to spot, the system colors each score automatically:
- **Bold green**: 8 and above (high-value content)
- **Blue**: 6-7 (medium-value content)
- **Default color**: 4-5 (average content)
- **Red**: 3 and below (low-value content)
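
The thresholds above translate directly into a lookup from score to display style. The following is a minimal sketch of that mapping; the function name `score_style` and the returned dictionary are hypothetical, since the viewer's actual rendering code is not shown in this guide.

```python
def score_style(score: int) -> dict:
    """Return the display style for a score, following the color table above."""
    if not 0 <= score <= 10:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    if score >= 8:
        return {"color": "green", "bold": True}    # high-value content
    if score >= 6:
        return {"color": "blue", "bold": False}    # medium-value content
    if score >= 4:
        return {"color": None, "bold": False}      # average content, default color
    return {"color": "red", "bold": False}         # low-value content


# Style for every valid score, e.g. to drive a table delegate.
styles = {s: score_style(s) for s in range(11)}
```

Keeping the thresholds in one function means the table, any export, and any statistics view can share the same rule.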
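
## Usage

### Increase the score
1. Select one or more rows in the table
2. Right-click the selected rows
3. Choose "Increase score (+1)" from the menu
4. The score of each selected item goes up by 1, capped at 10

### Decrease the score
1. Select one or more rows in the table
2. Right-click the selected rows
3. Choose "Decrease score (-1)" from the menu
4. The score of each selected item goes down by 1, with a floor of 0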
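
Both menu actions amount to a clamped update of the `score` column. Here is a minimal sketch, assuming the `articles` table and `score` field described in this guide; the helper name `adjust_score` and the in-memory demo data are hypothetical. It relies on SQLite's multi-argument `MIN`/`MAX` scalar functions to keep the result within 0-10.

```python
import sqlite3


def adjust_score(conn, article_ids, delta):
    """Add delta to the score of each listed article, clamping the result to 0-10."""
    conn.executemany(
        "UPDATE articles SET score = MAX(0, MIN(10, score + ?)) WHERE id = ?",
        [(delta, article_id) for article_id in article_ids],
    )
    conn.commit()


# Demo on a throwaway in-memory database (hypothetical minimal schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, score INTEGER DEFAULT 5)")
conn.executemany(
    "INSERT INTO articles (id, score) VALUES (?, ?)", [(1, 10), (2, 0), (3, 5)]
)

adjust_score(conn, [1, 3], +1)   # article 1 is already at the cap
adjust_score(conn, [2], -1)      # article 2 is already at the floor
scores = dict(conn.execute("SELECT id, score FROM articles"))
```

Doing the clamp inside the UPDATE keeps the limit in one place, so a batch adjustment cannot push any row outside the valid range.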
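
### Batch operations
- Multiple rows can be selected and adjusted at once
- The "Select by keyword" feature quickly selects the rows that contain a given keyword
- The right-click menu then adjusts the score of all selected rows

## Data Migration

Existing "interested / not interested" data has been converted to the new rating system automatically:
- Items previously marked "interested" were converted to a score of 7
- Items previously marked "not interested" were converted to a score of 5 (the default)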
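
The conversion rule above can be expressed as a single UPDATE. The migration script itself is not part of this guide, so the sketch below is only an assumption about how it might be written; `migrate_to_score` and the demo rows are hypothetical.

```python
import sqlite3


def migrate_to_score(conn):
    """Add the score column if needed and derive it from is_interested."""
    columns = [row[1] for row in conn.execute("PRAGMA table_info(articles)")]
    if "score" not in columns:
        conn.execute("ALTER TABLE articles ADD COLUMN score INTEGER DEFAULT 5")
    # interested (1) -> 7, everything else -> 5 (the default)
    conn.execute(
        "UPDATE articles SET score = CASE WHEN is_interested = 1 THEN 7 ELSE 5 END"
    )
    conn.commit()


# Demo on a throwaway in-memory database (hypothetical minimal schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, is_interested INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO articles (id, is_interested) VALUES (?, ?)", [(1, 1), (2, 0)]
)

migrate_to_score(conn)
migrated = dict(conn.execute("SELECT id, score FROM articles"))
```

Note that this sketch overwrites every score, so it would only be appropriate as a one-time step before users start rating items.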
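
## Technical Details

### Database schema
- Added a `score` field (INTEGER) that replaces the original `is_interested` field
- `score` defaults to 5 and is restricted to the range 0-10

### UI updates
- The "Interested" column in the table is now a "Score" column that shows the numeric score
- The right-click menu now offers "Increase score (+1)" and "Decrease score (-1)"
- Color coding is applied automatically based on the score

## FAQ

**Q: Why is the default score 5 rather than 0?**
A: A score of 5 represents a neutral stance, which matches everyday rating habits. A score of 0 is usually reserved for content that is completely irrelevant or of very poor quality.

**Q: How can I find high-scoring content quickly?**
A: High-scoring content (8 and above) is shown in bold green, which stands out. You can also sort by the "Score" column.

**Q: Can I set an arbitrary score directly?**
A: The current version only supports adjusting the score in steps of +1/-1, which keeps ratings consistent and traceable.

---

If you have other questions or suggestions, feedback is welcome.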