Initial commit.
Scraping is handled by tophub_scraper.py, loading data into the database by tophub_add_data_to_db.py, and viewing the current data by db_viewer.py.
5875
2025年11月9日131545.txt
Normal file
File diff suppressed because it is too large
149
README.md
Normal file
@@ -0,0 +1,149 @@
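# TopHub Data Processing System

This project processes the temporary files scraped from the TopHub website, classifies the entries, and stores them in a SQLite database.

## Features

1. **File parsing**: reads temporary files (named in the "date+time.txt" format) and treats every 5 lines as one data unit
2. **Data extraction**: extracts the title and link from each data unit
3. **Automatic classification**: calls a local API (Ollama) to classify each title
4. **Deduplication**: checks whether a title+date pair already exists in the database to avoid duplicate entries
5. **Progress display**: shows processing progress with a progress bar
6. **Category standardization**: merges similar categories into standard categories
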
## File Descriptions

### Core Scripts

1. **process_temp_files.py** - main processing script
   - Parses the temporary files
   - Calls the API for classification
   - Stores the results in the database

2. **cleanup_categories.py** - category cleanup script
   - Removes special characters from category names
   - Unifies the category format

3. **standardize_categories.py** - category standardization script
   - Merges similar categories into standard categories
   - Provides the category mapping rules

### Helper Scripts

1. **check_db.py** - database structure check script
2. **test_api.py** - API test script
3. **view_categories.py** - script for viewing sample categories

## Usage

### 1. Process the temporary files

```bash
python process_temp_files.py
```

The script will:
- Scan the current directory for all temporary files (named in the "date+time.txt" format)
- Parse each file and extract the titles and links
- Call the local API to classify each title
- Check for and skip duplicate data
- Store the results in the tophub_data.db database

### 2. Clean up and standardize categories

```bash
# Remove special characters from category names
python cleanup_categories.py

# Standardize the categories
python standardize_categories.py
```

### 3. View the data

```bash
# View sample categories
python view_categories.py

# Check the database structure
python check_db.py
```

## Database Structure

The database file is `tophub_data.db`, which contains the following tables:

1. **tophub_entries** - main data table
   - id: primary key
   - text_content: title text (NOT NULL)
   - link: link
   - category: category
   - scrape_time: scrape time

2. **classification_progress** - classification progress table
   - id: primary key
   - total_count: total number of entries
   - processed_count: number of entries processed
   - last_updated: time of the last update

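The main table described above could be created along the following lines. This is a minimal sketch, not the project's actual code: the README names the columns but not their types, so the SQL types, the `insert_if_new` helper, and the assumption that `scrape_time` is stored as a `YYYY-MM-DD HH:MM:SS` string are all illustrative. It also shows one way the title+date duplicate check could be done.

```python
import sqlite3

# Assumed column types; the README only lists the column names.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tophub_entries (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    text_content TEXT NOT NULL,
    link TEXT,
    category TEXT,
    scrape_time TEXT
)
"""


def insert_if_new(conn, title, link, category, scrape_time):
    """Insert one entry unless the same title already exists for that date.

    Hypothetical helper: assumes scrape_time looks like 'YYYY-MM-DD HH:MM:SS'.
    """
    day = scrape_time[:10]
    duplicate = conn.execute(
        "SELECT 1 FROM tophub_entries "
        "WHERE text_content = ? AND substr(scrape_time, 1, 10) = ?",
        (title, day),
    ).fetchone()
    if duplicate:
        return False
    conn.execute(
        "INSERT INTO tophub_entries (text_content, link, category, scrape_time) "
        "VALUES (?, ?, ?, ?)",
        (title, link, category, scrape_time),
    )
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(SCHEMA)
link = "http://club.kdslife.com/t_11502693.html"
first = insert_if_new(conn, "女机器人", link, "科技", "2025-11-09 13:15:45")
repeat = insert_if_new(conn, "女机器人", link, "科技", "2025-11-09 18:00:00")
```

Comparing only the date part means the same title scraped twice on one day is stored once, while the same title on a later day is stored again.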
## API Configuration

The scripts use the local Ollama API for classification:
- API endpoint: http://localhost:11434/api/generate
- Model: gemma3:4b
- Request format: JSON

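A request against this endpoint might look roughly like the sketch below. The endpoint and model come from the configuration above, and the `model`/`prompt`/`stream` payload and the `response` field match how db_modify.py in this commit calls the same API. The prompt wording, the helper names, and the fallback to 其他 are assumptions made for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "gemma3:4b"


def build_payload(title):
    """Build the JSON body for a non-streaming generate request."""
    # The prompt text is an assumption; the README does not quote the real one.
    prompt = f"对以下标题进行分类,只返回类别名称:'{title}'"
    return {"model": MODEL, "prompt": prompt, "stream": False}


def extract_category(body):
    """Pull the generated text out of the response, falling back to 其他."""
    return body.get("response", "").strip() or "其他"


def classify(title, timeout=30):
    """Send one title to the local Ollama service and return its category."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(title)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_category(json.load(response))
```

Calling `classify("女机器人")` requires the Ollama service to be running locally; `build_payload` and `extract_category` can be exercised without it.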
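## Category Standards

The system supports the following standard categories:

1. 科技 (Technology) - emerging technology, the internet, etc.
2. 社会 (Society) - social news, lifestyle services, etc.
3. 体育 (Sports) - sports news, football, etc.
4. 历史 (History) - historical events, historical figures, etc.
5. 安全 (Security) - security vulnerabilities, security technology, etc.
6. 军事 (Military) - military news, national defense, etc.
7. 金融 (Finance) - financial news, market analysis, etc.
8. 购物 (Shopping) - e-commerce, shopping, etc.
9. 游戏 (Games) - gaming news, etc.
10. 娱乐 (Entertainment) - celebrity gossip, music, etc.
11. 健康 (Health) - healthcare, healthy living, etc.
12. 其他 (Other) - anything not covered above
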
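Merging similar categories into this list could be done with a simple lookup table, as in the sketch below. The standard names are the twelve listed above and the aliases are the example sub-topics from the same list. The `ALIASES` table and the `standardize` function are hypothetical and are not the contents of standardize_categories.py.

```python
STANDARD_CATEGORIES = [
    "科技", "社会", "体育", "历史", "安全", "军事",
    "金融", "购物", "游戏", "娱乐", "健康", "其他",
]

# Hypothetical alias table built from the examples in the list above.
ALIASES = {
    "新质科技": "科技", "互联网": "科技",
    "社会新闻": "社会", "生活服务": "社会",
    "体育新闻": "体育", "足球": "体育",
    "历史事件": "历史", "历史人物": "历史",
    "安全漏洞": "安全", "安全科技": "安全",
    "军事新闻": "军事", "国防": "军事",
    "金融新闻": "金融", "市场分析": "金融",
    "电商": "购物",
    "游戏新闻": "游戏",
    "娱乐八卦": "娱乐", "音乐": "娱乐",
    "健康医疗": "健康", "健康生活": "健康",
}


def standardize(raw_category):
    """Map a raw category name to one of the standard categories."""
    name = (raw_category or "").strip()
    if name in STANDARD_CATEGORIES:
        return name
    # Anything not recognized falls into the catch-all category.
    return ALIASES.get(name, "其他")
```

A flat dictionary keeps the mapping easy to review and extend, at the cost of only matching exact names.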
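## Notes

1. Make sure the local Ollama service is running and reachable
2. Temporary files must be named in the "date+time.txt" format
3. Each data unit consists of 5 lines: node ID, category, title, link, and a separator line
4. The database file is created automatically; there is no need to create it by hand
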
## Log Files

The system generates the following log files:
- process_temp_files.log - main processing log
- cleanup_categories.log - category cleanup log
- standardize_categories.log - category standardization log

## Examples

### Example temporary file format

The field labels are 节点ID (node ID), 分类 (category), 标题 (title), and 链接 (link).

```
节点ID: 102
分类: 宽带山
标题: 女机器人
链接: http://club.kdslife.com/t_11502693.html
--------------------------------------------------
节点ID: 103
分类: 宽带山
标题: 这个应该属于底盘不行吗
链接: http://club.kdslife.com/t_11502686.html
--------------------------------------------------
```

### Example processing result

```
标题 '女机器人' 分类为: 科技
标题 '这个应该属于底盘不行吗' 分类为: 其他
```

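The 5-line unit format shown in the README example can be parsed as in the sketch below. It is an illustration rather than the code of process_temp_files.py: the `parse_units` and `_field_value` names and the returned dictionary keys are assumptions, and a trailing incomplete unit is silently ignored.

```python
FIELD_LABELS = ("节点ID", "分类", "标题", "链接")  # labels as they appear in the file


def _field_value(line, label):
    """Return the text after 'label:' or raise if the line has another label."""
    prefix = label + ":"
    if not line.startswith(prefix):
        raise ValueError(f"expected a line starting with {prefix!r}, got {line!r}")
    return line[len(prefix):].strip()


def parse_units(text):
    """Split the text into 5-line units and return one dict per unit."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    units = []
    # Step through complete groups of five lines; leftovers are ignored.
    for start in range(0, len(lines) - 4, 5):
        node_id, source, title, link, separator = lines[start:start + 5]
        if set(separator) != {"-"}:
            raise ValueError(f"expected a separator line, got {separator!r}")
        units.append({
            "node_id": _field_value(node_id, FIELD_LABELS[0]),
            "source": _field_value(source, FIELD_LABELS[1]),
            "title": _field_value(title, FIELD_LABELS[2]),
            "link": _field_value(link, FIELD_LABELS[3]),
        })
    return units


SAMPLE = """\
节点ID: 102
分类: 宽带山
标题: 女机器人
链接: http://club.kdslife.com/t_11502693.html
--------------------------------------------------
节点ID: 103
分类: 宽带山
标题: 这个应该属于底盘不行吗
链接: http://club.kdslife.com/t_11502686.html
--------------------------------------------------
"""
units = parse_units(SAMPLE)
```

Checking each label explicitly makes a misaligned file fail loudly instead of storing a link as a title.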
BIN
__pycache__/db_viewer.cpython-38.pyc
Normal file
Binary file not shown.
BIN
__pycache__/tophub_add_data_to_db.cpython-38.pyc
Normal file
Binary file not shown.
BIN
__pycache__/tophub_scraper.cpython-38.pyc
Normal file
Binary file not shown.
72
add_interested_field.py
Normal file
@@ -0,0 +1,72 @@
#!/usr/bin/env python3
"""
Script that adds an "interested" flag column.
Adds an is_interested column to the articles table, with a default value of 0.
"""

import sqlite3
import os
from loguru import logger

def add_interested_field():
    """Add the is_interested column to the articles table."""
    # Build the database path relative to this script's directory
    script_dir = os.path.dirname(os.path.abspath(__file__))
    db_path = os.path.join(script_dir, "tophub_data.db")

    # Check that the database file exists
    if not os.path.exists(db_path):
        logger.error(f"数据库文件不存在: {db_path}")
        return False

    try:
        # Connect to the database
        conn = sqlite3.connect(db_path)
        cursor = conn.cursor()

        # Check whether the is_interested column already exists
        cursor.execute("PRAGMA table_info(articles)")
        columns = cursor.fetchall()
        column_names = [column[1] for column in columns]

        if "is_interested" in column_names:
            logger.info("is_interested字段已存在,无需添加")
            conn.close()
            return True

        # Add the is_interested column with a default value of 0
        logger.info("正在添加is_interested字段...")
        cursor.execute("ALTER TABLE articles ADD COLUMN is_interested INTEGER DEFAULT 0")

        # Commit the change
        conn.commit()
        logger.info("成功添加is_interested字段")

        # Verify that the column was added
        cursor.execute("PRAGMA table_info(articles)")
        columns = cursor.fetchall()
        column_names = [column[1] for column in columns]

        if "is_interested" in column_names:
            logger.info("验证成功:is_interested字段已添加到articles表")
        else:
            logger.error("验证失败:is_interested字段未成功添加")
            conn.close()
            return False

        conn.close()
        return True

    except sqlite3.Error as e:
        logger.error(f"数据库操作出错: {str(e)}")
        return False
    except Exception as e:
        logger.error(f"添加字段时出错: {str(e)}")
        return False

if __name__ == "__main__":
    logger.add("db_modify.log", rotation="10 MB", level="INFO")
    if add_interested_field():
        logger.info("数据库修改完成")
    else:
        logger.error("数据库修改失败")
23
check_db.py
Normal file
@@ -0,0 +1,23 @@
#!/usr/bin/env python3
import sqlite3

# Connect to the database
conn = sqlite3.connect('tophub_data.db')
cursor = conn.cursor()

# List all tables
cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
tables = cursor.fetchall()
print('Tables:', tables)

# Show the structure of each table
for table in tables:
    table_name = table[0]
    print(f'\nTable {table_name}:')
    cursor.execute(f'PRAGMA table_info({table_name});')
    columns = cursor.fetchall()
    for col in columns:
        print(col)

# Close the connection
conn.close()
25
check_db_structure.py
Normal file
@@ -0,0 +1,25 @@
#!/usr/bin/env python3
import sqlite3

# Connect to the database
conn = sqlite3.connect('tophub_data.db')
cursor = conn.cursor()

# Fetch all table names
cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
tables = cursor.fetchall()

print("数据库中的表:")
for table in tables:
    print(f" - {table[0]}")

# Print the structure (column name and type) of each table
for table in tables:
    table_name = table[0]
    print(f"\n表 '{table_name}' 的结构:")
    cursor.execute(f"PRAGMA table_info({table_name});")
    columns = cursor.fetchall()
    for column in columns:
        print(f" {column[1]} ({column[2]})")

conn.close()
50
check_interested_values.py
Normal file
@@ -0,0 +1,50 @@
#!/usr/bin/env python3
import sqlite3
from loguru import logger

def check_interested_values():
    """Check the value range of the is_interested column."""
    try:
        # Connect to the database
        conn = sqlite3.connect('tophub_data.db')
        cursor = conn.cursor()

        # Query the minimum, maximum, and average of is_interested
        cursor.execute("SELECT MIN(is_interested), MAX(is_interested), AVG(is_interested) FROM articles")
        result = cursor.fetchone()
        min_val, max_val, avg_val = result

        logger.info(f"is_interested字段统计:")
        logger.info(f" 最小值: {min_val}")
        logger.info(f" 最大值: {max_val}")
        logger.info(f" 平均值: {avg_val:.2f}")

        # Query the distribution of the distinct values
        cursor.execute("SELECT is_interested, COUNT(*) FROM articles GROUP BY is_interested ORDER BY is_interested")
        distribution = cursor.fetchall()

        logger.info("\nis_interested值分布:")
        for value, count in distribution:
            logger.info(f" {value}: {count} 条记录")

        # Query a few sample records
        cursor.execute("SELECT id, title, is_interested FROM articles ORDER BY is_interested DESC LIMIT 5")
        examples = cursor.fetchall()

        logger.info("\n示例记录:")
        for example in examples:
            logger.info(f" ID: {example[0]}, 标题: {example[1][:30]}..., is_interested: {example[2]}")

        conn.close()
        return True

    except sqlite3.Error as e:
        logger.error(f"数据库操作出错: {str(e)}")
        return False
    except Exception as e:
        logger.error(f"查询数据时出错: {str(e)}")
        return False

if __name__ == "__main__":
    logger.add("check_interested_values.log", rotation="10 MB", level="INFO")
    check_interested_values()
2596
db_modify.log
Normal file
File diff suppressed because it is too large
220
db_modify.py
Normal file
@@ -0,0 +1,220 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Open the tophub_data.db database, read the table, and extract all categories.
Call the local Ollama API to shorten each category name to 2-4 characters,
removing spaces, special characters, and similar noise.

"""

import requests
import sqlite3
import re
import time
from loguru import logger

# Configure logging
logger.add("db_modify.log", rotation="10 MB", level="INFO")

class CategoryModifier:
    """Category modifier used to tidy up the category names in the database"""

    def __init__(self, db_path="tophub_data.db"):
        """
        Initialize the category modifier

        Args:
            db_path (str): path to the database
        """
        self.db_path = db_path
        self.ollama_url = "http://localhost:11434/api/generate"
        self.model = "qwen3:8b"

    def get_all_categories(self):
        """
        Fetch all unique categories from the database

        Returns:
            list: list of all unique, non-empty categories
        """
        try:
            conn = sqlite3.connect(self.db_path)
            cursor = conn.cursor()

            cursor.execute("SELECT DISTINCT category FROM articles")
            categories = [row[0] for row in cursor.fetchall() if row[0]]

            conn.close()
            logger.info(f"成功获取 {len(categories)} 个唯一类别")
            return categories
        except Exception as e:
            logger.error(f"获取类别时出错: {e}")
            return []

    def clean_category_name(self, category):
        """
        Clean a category name by removing special characters and whitespace

        Args:
            category (str): original category name

        Returns:
            str: cleaned category name
        """
        # Remove everything except Chinese characters, ASCII letters, and digits
        cleaned = re.sub(r'[^\u4e00-\u9fa5a-zA-Z0-9]', '', category)
        # Remove any remaining whitespace (already covered by the pattern above)
        cleaned = re.sub(r'\s+', '', cleaned)
        return cleaned

    def optimize_category_with_ollama(self, category):
        """
        Shorten a category name with the Ollama API

        Args:
            category (str): original category name

        Returns:
            str: optimized category name
        """
        try:
            # Build the prompt
            prompt = f"请将以下类别名称简化为3-6个汉字,去除空格和特殊符号,更容易理解,并保持原意:'{category}'。" + \
                     "例子一:'新科科技',优化为'新质生产力'。例子二:'产设',优化为'产品设计'。例子三:'史人',优化为'历史人物'。"

            # Prepare the request payload
            data = {
                "model": self.model,
                "prompt": prompt,
                "stream": False
            }

            # Send the request to the Ollama API
            response = requests.post(self.ollama_url, json=data, timeout=30)
            response.raise_for_status()

            # Parse the response
            result = response.json()
            optimized = result.get("response", "").strip()

            # Clean the optimized name
            optimized = self.clean_category_name(optimized)

            # If nothing usable came back, keep the cleaned original name
            # instead of overwriting the category with an empty string
            if not optimized:
                return self.clean_category_name(category)

            logger.info(f"类别 '{category}' 优化为 '{optimized}'")
            return optimized
        except Exception as e:
            logger.error(f"优化类别 '{category}' 时出错: {e}")
            # If the API call fails, return the cleaned original name
            return self.clean_category_name(category)

    def update_category_in_db(self, old_category, new_category):
        """
        Rename a category in the database

        Args:
            old_category (str): original category name
            new_category (str): new category name
        """
        try:
            conn = sqlite3.connect(self.db_path)
            cursor = conn.cursor()

            cursor.execute(
                "UPDATE articles SET category = ? WHERE category = ?",
                (new_category, old_category)
            )

            count = cursor.rowcount
            conn.commit()
            conn.close()

            logger.info(f"成功更新类别 '{old_category}' 为 '{new_category}',影响 {count} 条记录")
        except Exception as e:
            logger.error(f"更新类别 '{old_category}' 时出错: {e}")

    def process_all_categories(self):
        """
        Process all categories
        """
        logger.info("开始处理所有类别...")

        # Fetch all categories
        categories = self.get_all_categories()

        if not categories:
            logger.warning("未找到任何类别")
            return

        # Initialize the progress counters
        total_categories = len(categories)
        processed_count = 0
        unchanged_count = 0
        updated_count = 0
        start_time = time.time()

        logger.info(f"总共需要处理 {total_categories} 个类别")

        # Process each category
        for i, category in enumerate(categories, 1):
            category_start_time = time.time()
            logger.info(f"处理进度: {i}/{total_categories} ({i/total_categories*100:.1f}%) - 类别: {category}")

            # Shorten the category name with the Ollama API
            optimized_category = self.optimize_category_with_ollama(category)

            # Update the database only if the optimized name differs from the original
            if optimized_category != category:
                self.update_category_in_db(category, optimized_category)
                updated_count += 1
                logger.info(f"类别 '{category}' 已更新为 '{optimized_category}'")
            else:
                unchanged_count += 1
                logger.info(f"类别 '{category}' 无需更改")

            processed_count += 1
            category_end_time = time.time()
            category_duration = category_end_time - category_start_time

            # Report the time for this category, the running average, and the estimate
            elapsed_time = time.time() - start_time
            avg_time_per_category = elapsed_time / processed_count
            estimated_remaining = avg_time_per_category * (total_categories - processed_count)

            logger.info(f"类别 '{category}' 处理完成,耗时: {category_duration:.2f}秒")
            logger.info(f"累计处理: {processed_count}/{total_categories} | "
                        f"已更新: {updated_count} | 未更改: {unchanged_count} | "
                        f"平均耗时: {avg_time_per_category:.2f}秒/类别 | "
                        f"预计剩余时间: {estimated_remaining:.2f}秒")

        # Report the overall statistics
        total_duration = time.time() - start_time
        logger.info("="*60)
        logger.info("所有类别处理完成!")
        logger.info(f"总计处理类别数: {total_categories}")
        logger.info(f"更新类别数: {updated_count}")
        logger.info(f"未更改类别数: {unchanged_count}")
        logger.info(f"总耗时: {total_duration:.2f}秒")
        logger.info(f"平均每类别处理时间: {total_duration/total_categories:.2f}秒")
        logger.info("="*60)

def main():
    """Entry point"""
    modifier = CategoryModifier()

    # Check whether the Ollama service is available
    try:
        response = requests.get("http://localhost:11434/api/tags", timeout=5)
        if response.status_code == 200:
            logger.info("Ollama服务可用")
        else:
            logger.warning("Ollama服务不可用,请确保服务已启动")
            return
    except Exception as e:
        logger.warning(f"无法连接到Ollama服务: {e}")
        logger.info("请确保Ollama服务已在本地运行")
        return

    # Process all categories
    modifier.process_all_categories()

if __name__ == "__main__":
    main()
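The effect of the character class used in `clean_category_name` is easiest to see on a few inputs. The sketch below lifts the first regular expression out of the class as a standalone function. The sample strings are made up for illustration.

```python
import re


def clean_category_name(category):
    """Keep only characters in U+4E00-U+9FA5, ASCII letters, and digits."""
    return re.sub(r'[^\u4e00-\u9fa5a-zA-Z0-9]', '', category)


samples = ["新质 生产力!", " AI/科技 2.0 ", "《历史》人物"]
cleaned = [clean_category_name(sample) for sample in samples]
```

Because whitespace is not in the allowed set, spaces are already removed by this single pattern. Punctuation inside a name, such as the dot in "2.0", is dropped as well.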
6
db_modify_score.log
Normal file
@@ -0,0 +1,6 @@
2025-11-07 23:49:35.277 | INFO | __main__:modify_database_structure:44 - 正在添加score字段...
2025-11-07 23:49:35.281 | INFO | __main__:modify_database_structure:48 - 正在转换is_interested数据到score字段...
2025-11-07 23:49:35.288 | INFO | __main__:modify_database_structure:63 - 成功添加score字段并转换数据
2025-11-07 23:49:35.289 | INFO | __main__:modify_database_structure:71 - 验证成功:score字段已添加到articles表
2025-11-07 23:49:35.289 | INFO | __main__:modify_database_structure:84 - 数据转换结果: score=7的记录数: 1, score=5的记录数: 1196
2025-11-07 23:49:35.290 | INFO | __main__:<module>:99 - 数据库结构修改完成
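The script that wrote this log is not part of the visible diff, so the following is only a guess at the kind of migration it records: add a `score` column to `articles`, then derive its value from `is_interested`. The mapping (1 becomes 7, everything else 5) is inferred from the counts in the last data line of the log and is an assumption, as are the function name and the sample rows.

```python
import sqlite3


def add_score_column(conn):
    """Add a score column and fill it from is_interested (assumed mapping)."""
    columns = [row[1] for row in conn.execute("PRAGMA table_info(articles)")]
    if "score" not in columns:
        conn.execute("ALTER TABLE articles ADD COLUMN score INTEGER DEFAULT 5")
    # Assumed conversion: flagged rows get 7, all other rows get 5.
    conn.execute(
        "UPDATE articles SET score = CASE WHEN is_interested = 1 THEN 7 ELSE 5 END"
    )
    conn.commit()
    # Return {score: number_of_rows} so the result can be logged or checked.
    return dict(conn.execute("SELECT score, COUNT(*) FROM articles GROUP BY score"))


conn = sqlite3.connect(":memory:")  # throwaway database with made-up rows
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, is_interested INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO articles (title, is_interested) VALUES (?, ?)",
    [("a", 1), ("b", 0), ("c", 0)],
)
counts = add_score_column(conn)
```

Note that the `UPDATE` runs on every call, so re-running this sketch after scores have been edited by hand would reset them.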
1669
db_modify_zhipu.log
Normal file
File diff suppressed because it is too large
101
db_modify_zhipu.py
Normal file
@@ -0,0 +1,101 @@
# Call the Zhipu API to update the category of every entry
# Read the table from the db file, take the second column (the title), submit it to the API,
# and write the returned category back to the db file

import os
import sqlite3
import time
from loguru import logger
from zhipuai import ZhipuAI

# Configure logging
logger.add("db_modify_zhipu.log", rotation="10 MB", level="INFO")

# Initialize the client. The API key is read from the environment instead of being
# hard-coded in the source; the variable name ZHIPUAI_API_KEY is an assumption.
client = ZhipuAI(api_key=os.environ["ZHIPUAI_API_KEY"])

def get_simplified_category(title):
    """
    Call the Zhipu API to get a simplified category name
    """
    try:
        # Create the chat completion request
        response = client.chat.completions.create(
            model="glm-4-flash",
            messages=[
                {
                    "role": "system",
                    "content": "你是一个专业的分类助手。请根据文章标题,提供一个3-6个汉字的简化分类名称,去除空格和特殊符号,更容易理解,并保持原意。"
                },
                {
                    "role": "user",
                    "content": f"对以下文字内容进行分类,返回结果为类别,如\"社会新闻\",\"机器人\",\"金融\",\"历史\",\"购物\",\"新质生产力\"等等。目的:只返回2-6个汉字,不返回其它内容。内容:'{title}'"
                }
            ],
            temperature=0.7
        )

        # Extract the reply text
        category = response.choices[0].message.content.strip()
        logger.info(f"标题: {title[:30]}... -> 分类: {category}")
        return category

    except Exception as e:
        logger.error(f"获取分类失败: {str(e)}")
        return None

def update_database_categories():
    """
    Update the category information in the database
    """
    # Connect to the database
    conn = sqlite3.connect('tophub_data.db')
    cursor = conn.cursor()

    try:
        # Fetch all records
        cursor.execute("SELECT id, title, category FROM articles")
        records = cursor.fetchall()

        logger.info(f"共找到 {len(records)} 条记录需要处理")

        updated_count = 0
        failed_count = 0

        # Process each record
        for record in records:
            record_id, title, current_category = record

            # Skip categories that are already simplified
            # (at most 6 characters, no spaces or punctuation)
            if current_category and len(current_category) <= 6 and not any(c in current_category for c in " ,.!?;:,。!?;:"):
                logger.info(f"跳过记录 {record_id},分类已简化: {current_category}")
                continue

            logger.info(f"处理记录 {record_id}: {title[:30]}...")

            # Get the new category
            new_category = get_simplified_category(title)

            if new_category:
                # Update the database
                cursor.execute("UPDATE articles SET category = ? WHERE id = ?", (new_category, record_id))
                conn.commit()
                updated_count += 1
                logger.info(f"已更新记录 {record_id} 的分类为: {new_category}")
            else:
                failed_count += 1
                logger.error(f"无法获取记录 {record_id} 的新分类")

            # Pause between calls so the API is not hit too frequently
            time.sleep(1)

        logger.info(f"处理完成! 成功更新 {updated_count} 条记录,失败 {failed_count} 条记录")

    except Exception as e:
        logger.error(f"更新数据库时出错: {str(e)}")
        conn.rollback()
    finally:
        conn.close()

if __name__ == "__main__":
    logger.info("开始更新数据库分类...")
    update_database_categories()
    logger.info("程序执行完成")
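The prompt in db_modify_zhipu.py asks the model to return only 2-6 Chinese characters, but the script stores whatever comes back. One possible safeguard, sketched here as an assumption rather than as part of the commit, is to validate the reply before writing it to the database. The `validate_category` helper is hypothetical; the character range is the same one db_modify.py uses for cleaning.

```python
import re

# 2-6 characters from the CJK range U+4E00-U+9FA5.
CATEGORY_PATTERN = re.compile(r"[\u4e00-\u9fa5]{2,6}")


def validate_category(reply):
    """Return the trimmed reply if it is 2-6 Chinese characters, else None."""
    if reply is None:
        return None
    candidate = reply.strip()
    return candidate if CATEGORY_PATTERN.fullmatch(candidate) else None
```

A caller could treat a `None` result the same way the script already treats a failed API call, counting it as a failure and leaving the existing category untouched.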
835
db_viewer.py
Normal file
@@ -0,0 +1,835 @@
#!/usr/bin/env python3
"""
TopHub data viewer - PySide6 GUI application
Displays the TopHub scraped data stored in the SQLite database
"""

import sys
import os
import sqlite3
import webbrowser
from datetime import datetime
from loguru import logger
from PySide6.QtWidgets import (
    QApplication, QMainWindow, QTableWidget, QTableWidgetItem, QVBoxLayout,
    QHBoxLayout, QWidget, QLabel, QLineEdit, QPushButton, QComboBox,
    QGroupBox, QStatusBar, QMenuBar, QMenu, QMessageBox, QHeaderView,
    QAbstractItemView, QDialog, QFormLayout, QTextEdit, QInputDialog
)
from PySide6.QtCore import Qt, QUrl, QTimer, QEvent
from PySide6.QtGui import QAction, QFont, QIcon, QDesktopServices, QClipboard


class DatabaseViewer(QMainWindow):
    """Main window class that displays the database contents"""

    def __init__(self):
        super().__init__()
        # Build the database path relative to this script's directory
        script_dir = os.path.dirname(os.path.abspath(__file__))
        self.db_path = os.path.join(script_dir, "tophub_data.db")

        # Check that the database file exists
        if not os.path.exists(self.db_path):
            QMessageBox.critical(self, "错误", f"数据库文件不存在: {self.db_path}")
            sys.exit(1)

        self.init_ui()
        self.load_data()

    def init_ui(self):
        """Initialize the user interface"""
        # Set the window properties
        self.setWindowTitle("TopHub数据查看器")
        self.setGeometry(100, 100, 1200, 800)

        # Create the central widget
        central_widget = QWidget()
        self.setCentralWidget(central_widget)

        # Create the main layout
        main_layout = QVBoxLayout(central_widget)

        # Create the search and filter area
        filter_group = QGroupBox("搜索和筛选")
        filter_layout = QHBoxLayout(filter_group)

        # Search box
        self.search_edit = QLineEdit()
        self.search_edit.setPlaceholderText("输入搜索关键词...")
        self.search_edit.textChanged.connect(self.filter_data)
        filter_layout.addWidget(QLabel("搜索:"))
        filter_layout.addWidget(self.search_edit)

        # Category filter
        self.category_combo = QComboBox()
        self.category_combo.addItem("全部分类")
        self.category_combo.currentTextChanged.connect(self.filter_data)
        filter_layout.addWidget(QLabel("分类:"))
        filter_layout.addWidget(self.category_combo)

        # Refresh button
        self.refresh_button = QPushButton("刷新数据")
        self.refresh_button.clicked.connect(self.load_data)
        filter_layout.addWidget(self.refresh_button)

        # Batch-delete controls
        self.select_by_keyword_button = QPushButton("按关键字选中")
        self.select_by_keyword_button.clicked.connect(self.select_by_keyword)
        filter_layout.addWidget(self.select_by_keyword_button)

        self.delete_selected_button = QPushButton("删除选中项")
        self.delete_selected_button.clicked.connect(self.delete_selected_items)
        filter_layout.addWidget(self.delete_selected_button)

        # "Mark as interested" button
        self.mark_interested_button = QPushButton("标记为感兴趣")
        self.mark_interested_button.clicked.connect(self.mark_as_interested)
        filter_layout.addWidget(self.mark_interested_button)

        # Add the filter area to the main layout
        main_layout.addWidget(filter_group)

        # Create the category statistics area
        self.category_stats_group = QGroupBox("分类统计")
        self.category_stats_layout = QHBoxLayout(self.category_stats_group)
        self.category_stats_label = QLabel("暂无数据")
        self.category_stats_layout.addWidget(self.category_stats_label)
        main_layout.addWidget(self.category_stats_group)

        # Create the table
        self.table = QTableWidget()
        self.table.setColumnCount(6)  # 6 columns; the last one shows the score
        self.table.setHorizontalHeaderLabels(["ID", "标题", "链接", "分类", "来源日期", "评分"])

        # Set the table properties
        self.table.setAlternatingRowColors(True)
        self.table.setSelectionBehavior(QAbstractItemView.SelectRows)
        self.table.setEditTriggers(QAbstractItemView.NoEditTriggers)
        self.table.setSortingEnabled(True)

        # Set the table selection mode
        self.table.setSelectionMode(QAbstractItemView.SingleSelection)

        # Set the column widths
        header = self.table.horizontalHeader()
        header.setSectionResizeMode(0, QHeaderView.ResizeToContents)  # ID column
        header.setSectionResizeMode(1, QHeaderView.Stretch)  # title column
        header.setSectionResizeMode(2, QHeaderView.ResizeToContents)  # link column
        header.setSectionResizeMode(3, QHeaderView.ResizeToContents)  # category column
        header.setSectionResizeMode(4, QHeaderView.ResizeToContents)  # date column
        header.setSectionResizeMode(5, QHeaderView.ResizeToContents)  # score column

        # Enable link clicks
        self.table.cellClicked.connect(self.on_cell_clicked)

        # Install an event filter to handle link clicks
        self.table.viewport().installEventFilter(self)

        # Enable the context menu
        self.table.setContextMenuPolicy(Qt.CustomContextMenu)
        self.table.customContextMenuRequested.connect(self.show_context_menu)

        # Add the table to the main layout
        main_layout.addWidget(self.table)

        # Create the status bar
        self.status_bar = QStatusBar()
        self.setStatusBar(self.status_bar)

        # Create the menu bar
        self.create_menu_bar()

    def create_menu_bar(self):
        """Create the menu bar"""
        menubar = self.menuBar()

        # File menu
        file_menu = menubar.addMenu("文件")

        # Refresh action
        refresh_action = QAction("刷新数据", self)
        refresh_action.setShortcut("F5")
        refresh_action.triggered.connect(self.load_data)
        file_menu.addAction(refresh_action)

        # Exit action
        exit_action = QAction("退出", self)
        exit_action.setShortcut("Ctrl+Q")
        exit_action.triggered.connect(self.close)
        file_menu.addAction(exit_action)

        # Help menu
        help_menu = menubar.addMenu("帮助")

        # About action
        about_action = QAction("关于", self)
        about_action.triggered.connect(self.show_about)
        help_menu.addAction(about_action)

    def load_data(self):
        """Load data from the database"""
        try:
            # Connect to the database
            conn = sqlite3.connect(self.db_path)
            cursor = conn.cursor()

            # Check that the table exists
            cursor.execute("SELECT name FROM sqlite_master WHERE type='table' AND name='articles'")
            if not cursor.fetchone():
                QMessageBox.critical(self, "错误", "数据库中不存在articles表")
                conn.close()
                return

            # Query the data, reading the score column rather than is_interested
            cursor.execute('''
                SELECT id, title, url, category, source_date, score
                FROM articles
                ORDER BY id DESC
            ''')

            rows = cursor.fetchall()
            conn.close()

            # Turn sorting off while filling the table; with sorting enabled Qt may
            # move a row as soon as one of its items is set, scrambling later cells
            self.table.setSortingEnabled(False)

            # Update the table
            self.table.setRowCount(len(rows))

            # Collect all categories and their counts
            categories = set()
            category_counts = {}  # number of rows per category

            for row_idx, row in enumerate(rows):
                id_val, title, url, category, source_date, score = row

                # Add to the category set and the count dictionary
                if category:
                    categories.add(category)
                    category_counts[category] = category_counts.get(category, 0) + 1
                else:
                    # Rows without a category are counted as "未分类" (uncategorized)
                    category_counts["未分类"] = category_counts.get("未分类", 0) + 1

                # Set the table items
                self.table.setItem(row_idx, 0, QTableWidgetItem(str(id_val)))
                self.table.setItem(row_idx, 1, QTableWidgetItem(title))

                # Link item, shown in blue and bold
                link_item = QTableWidgetItem(url if url else "")
                if url:
                    link_item.setForeground(Qt.blue)
                    link_item.setFont(QFont("", -1, QFont.Bold))
                self.table.setItem(row_idx, 2, link_item)

                self.table.setItem(row_idx, 3, QTableWidgetItem(category if category else "未分类"))
                self.table.setItem(row_idx, 4, QTableWidgetItem(source_date))

                # Score item
                score_item = QTableWidgetItem(str(score))
                # Color the item by score (no coloring when the score is NULL)
                if score is None:
                    pass
                elif score >= 8:
                    score_item.setForeground(Qt.green)
                    score_item.setFont(QFont("", -1, QFont.Bold))
                elif score >= 6:
                    score_item.setForeground(Qt.blue)
                elif score <= 3:
                    score_item.setForeground(Qt.red)
                self.table.setItem(row_idx, 5, score_item)

            # Turn sorting back on now that every row is filled in
            self.table.setSortingEnabled(True)

            # Update the category drop-down
            current_category = self.category_combo.currentText()
            self.category_combo.clear()
            self.category_combo.addItem("全部分类")
            for cat in sorted(categories):
                self.category_combo.addItem(cat)

            # Restore the previously selected category
            index = self.category_combo.findText(current_category)
            if index >= 0:
                self.category_combo.setCurrentIndex(index)

            # Update the category statistics
            self.update_category_stats(category_counts)

            # Update the status bar
            self.status_bar.showMessage(f"已加载 {len(rows)} 条记录")

        except sqlite3.Error as e:
            logger.error(f"数据库操作出错: {str(e)}")
            QMessageBox.critical(self, "数据库错误", f"数据库操作出错: {str(e)}")
            self.status_bar.showMessage("加载数据失败")
        except Exception as e:
            logger.error(f"加载数据时出错: {str(e)}")
            QMessageBox.critical(self, "错误", f"加载数据时出错: {str(e)}")
            self.status_bar.showMessage("加载数据失败")

    def update_category_stats(self, category_counts):
        """Update the category statistics display"""
        if not category_counts:
            self.category_stats_label.setText("暂无数据")
            return

        # Sort the categories by count, largest first
        sorted_categories = sorted(category_counts.items(), key=lambda x: x[1], reverse=True)

        # Build the statistics text
        stats_text = " | ".join([f"{category}: {count}" for category, count in sorted_categories])

        # Truncate overly long text and add a hint; the tooltip keeps the full text
        if len(stats_text) > 200:
            stats_text = stats_text[:200] + "... (更多分类请查看完整数据)"

        self.category_stats_label.setText(stats_text)
        self.category_stats_label.setToolTip(" | ".join([f"{category}: {count}" for category, count in sorted_categories]))

    def update_category_stats_after_filter(self):
        """Update the category statistics after filtering"""
        # Count the categories of the visible rows
        category_counts = {}

        for row in range(self.table.rowCount()):
            # Skip hidden rows
            if self.table.isRowHidden(row):
                continue

            # Get the category item
            category_item = self.table.item(row, 3)
            if category_item:
                category = category_item.text()
                category_counts[category] = category_counts.get(category, 0) + 1
            else:
                category_counts["未分类"] = category_counts.get("未分类", 0) + 1

        # Update the category statistics display
        self.update_category_stats(category_counts)

    def filter_data(self):
        """Filter the rows by search text and category"""
        search_text = self.search_edit.text().lower()
        selected_category = self.category_combo.currentText()

        # Go through every row
        for row in range(self.table.rowCount()):
            show_row = True

            # Check the search text
            if search_text:
                text_match = False
                for col in range(1, 6):  # check the title, link, category, date, and score columns
                    item = self.table.item(row, col)
                    if item and search_text in item.text().lower():
                        text_match = True
                        break
                show_row = show_row and text_match

            # Check the category
            if selected_category != "全部分类":
                category_item = self.table.item(row, 3)
                category_match = category_item and category_item.text() == selected_category
                show_row = show_row and category_match

            # Show or hide the row
            self.table.setRowHidden(row, not show_row)

        # Count the visible rows
        visible_count = sum(1 for row in range(self.table.rowCount())
                            if not self.table.isRowHidden(row))
        self.status_bar.showMessage(f"显示 {visible_count}/{self.table.rowCount()} 条记录")

        # Recompute and display the category statistics
        self.update_category_stats_after_filter()

    def eventFilter(self, obj, event):
        """Event filter that handles link clicks without triggering row selection"""
        if obj == self.table.viewport() and event.type() == QEvent.MouseButtonPress:
            # Get the click position
            pos = event.position()
            # Get the row and column at the click position
            row = self.table.rowAt(int(pos.y()))
            column = self.table.columnAt(int(pos.x()))

            # If the click landed on the link column (the third column, index 2)
            if column == 2 and row >= 0:
                item = self.table.item(row, column)
                if item and item.text() and item.text().startswith("http"):
                    # Open the link directly
                    webbrowser.open(item.text())
                    # Returning True marks the event as handled, so it is not passed on.
                    # This avoids triggering row selection and the cursor jumping.
                    return True

        # Let the default handler deal with every other event
        return super().eventFilter(obj, event)

    def on_cell_clicked(self, row, column):
        """Handle cell click events"""
        # Clicks on the link column are already handled by eventFilter.
        # Only clicks on other columns reach this point; the default selection behavior is kept.
        if column != 2:
            # Click handling for other columns can be added here
            pass

    def show_context_menu(self, position):
        """Show the right-click context menu."""
        # Row under the click position
        row = self.table.rowAt(position.y())
        if row < 0:
            return

        # Select that row
        self.table.selectRow(row)

        # Create the context menu
        menu = QMenu(self)

        # "Increase score (+1)" action
        increase_score_action = QAction("增加评分(+1)", self)
        increase_score_action.triggered.connect(self.increase_score)
        menu.addAction(increase_score_action)

        # "Decrease score (-1)" action
        decrease_score_action = QAction("减少评分(-1)", self)
        decrease_score_action.triggered.connect(self.decrease_score)
        menu.addAction(decrease_score_action)

        # Separator
        menu.addSeparator()

        # "Copy info" action
        copy_info_action = QAction("复制信息", self)
        copy_info_action.triggered.connect(self.copy_info)
        menu.addAction(copy_info_action)

        # Separator
        menu.addSeparator()

        # "Delete selected" action
        delete_action = QAction("删除选中项", self)
        delete_action.triggered.connect(self.delete_selected_items)
        menu.addAction(delete_action)

        # Show the menu (exec() for consistency with app.exec() in main())
        menu.exec(self.table.mapToGlobal(position))

    def copy_info(self):
        """Copy the title, link and date of the selected rows to the clipboard."""
        # Collect the selected rows
        selected_rows = set()
        for item in self.table.selectedItems():
            selected_rows.add(item.row())

        # Nothing selected: tell the user and return
        if not selected_rows:
            QMessageBox.information(self, "提示", "请先选中要复制信息的行")
            return

        # Gather the info for every selected row
        all_info = []
        for row in sorted(selected_rows):
            # Title, link and date cells
            title_item = self.table.item(row, 1)
            url_item = self.table.item(row, 2)
            date_item = self.table.item(row, 4)

            title = title_item.text() if title_item else ""
            url = url_item.text() if url_item else ""
            date = date_item.text() if date_item else ""

            # Join the fields with spaces
            info = f"{title} {url} {date}".strip()
            all_info.append(info)

        # One line per selected row
        clipboard_text = "\n".join(all_info)

        # Copy to the clipboard
        clipboard = QApplication.clipboard()
        clipboard.setText(clipboard_text)

        # Update the status bar
        self.status_bar.showMessage(f"已复制 {len(selected_rows)} 行信息到剪贴板")

    def show_about(self):
        """Show the About dialog."""
        about_text = """
        <h3>TopHub数据查看器</h3>
        <p>版本: 1.0</p>
        <p>用于查看TopHub网站抓取数据的PySide6应用程序</p>
        <p>功能特性:</p>
        <ul>
        <li>显示SQLite数据库中的抓取数据</li>
        <li>支持点击链接在浏览器中打开</li>
        <li>支持搜索和分类筛选</li>
        <li>支持排序功能</li>
        <li>支持标记感兴趣的项目</li>
        </ul>
        """
        QMessageBox.about(self, "关于", about_text)

    def select_by_keyword(self):
        """Select the rows that contain a keyword."""
        # Ask for the keyword in an input dialog
        keyword, ok = QInputDialog.getText(self, "按关键字选中", "请输入关键字:")

        if not ok or not keyword:
            return

        keyword = keyword.lower()
        selected_count = 0

        # Walk all visible rows
        for row in range(self.table.rowCount()):
            # Skip hidden rows
            if self.table.isRowHidden(row):
                continue

            # Check whether any cell in this row contains the keyword
            match = False
            for col in range(self.table.columnCount()):
                item = self.table.item(row, col)
                if item and keyword in item.text().lower():
                    match = True
                    break

            # Select the row on a match
            if match:
                self.table.selectRow(row)
                selected_count += 1

        # Update the status bar
        self.status_bar.showMessage(f"已选中 {selected_count} 行")

    def delete_selected_items(self):
        """Delete the selected rows from the database and the table."""
        # Collect the selected rows
        selected_rows = set()
        for item in self.table.selectedItems():
            selected_rows.add(item.row())

        # Nothing selected: tell the user and return
        if not selected_rows:
            QMessageBox.information(self, "提示", "请先选中要删除的行")
            return

        # Ask for confirmation (defaults to No)
        reply = QMessageBox.question(
            self,
            "确认删除",
            f"确定要删除选中的 {len(selected_rows)} 行数据吗?此操作不可撤销!",
            QMessageBox.Yes | QMessageBox.No,
            QMessageBox.No
        )

        if reply == QMessageBox.No:
            return

        try:
            # Connect to the database
            conn = sqlite3.connect(self.db_path)
            cursor = conn.cursor()

            # Delete the selected rows
            deleted_count = 0
            for row in sorted(selected_rows, reverse=True):  # bottom-up, so row indices stay valid
                # Read the ID from column 0
                id_item = self.table.item(row, 0)
                if id_item:
                    article_id = id_item.text()
                    # Delete from the database
                    cursor.execute("DELETE FROM articles WHERE id = ?", (article_id,))
                    # Remove the row from the table
                    self.table.removeRow(row)
                    deleted_count += 1

            # Commit the changes
            conn.commit()
            conn.close()

            # Update the status bar
            self.status_bar.showMessage(f"已删除 {deleted_count} 行数据")

            # Reload the data so the category statistics are refreshed
            self.load_data()

        except sqlite3.Error as e:
            logger.error(f"删除数据时出错: {str(e)}")
            QMessageBox.critical(self, "数据库错误", f"删除数据时出错: {str(e)}")
            self.status_bar.showMessage("删除失败")
        except Exception as e:
            logger.error(f"删除数据时出错: {str(e)}")
            QMessageBox.critical(self, "错误", f"删除数据时出错: {str(e)}")
            self.status_bar.showMessage("删除失败")

    def mark_as_interested(self):
        """Mark the selected rows as interested."""
        # Collect the selected rows
        selected_rows = set()
        for item in self.table.selectedItems():
            selected_rows.add(item.row())

        # Nothing selected: tell the user and return
        if not selected_rows:
            QMessageBox.information(self, "提示", "请先选中要标记的行")
            return

        # Ask for confirmation (defaults to Yes)
        reply = QMessageBox.question(
            self,
            "确认标记",
            f"确定要将选中的 {len(selected_rows)} 行标记为感兴趣吗?",
            QMessageBox.Yes | QMessageBox.No,
            QMessageBox.Yes
        )

        if reply == QMessageBox.No:
            return

        try:
            # Connect to the database
            conn = sqlite3.connect(self.db_path)
            cursor = conn.cursor()

            # Update the selected rows
            updated_count = 0
            for row in selected_rows:
                # Read the ID from column 0
                id_item = self.table.item(row, 0)
                if id_item:
                    article_id = id_item.text()
                    # Set the is_interested field in the database
                    cursor.execute("UPDATE articles SET is_interested = 1 WHERE id = ?", (article_id,))

                    # Update the table display
                    interested_item = QTableWidgetItem("是")
                    interested_item.setForeground(Qt.green)
                    interested_item.setFont(QFont("", -1, QFont.Bold))
                    self.table.setItem(row, 5, interested_item)

                    updated_count += 1

            # Commit the changes
            conn.commit()
            conn.close()

            # Update the status bar
            self.status_bar.showMessage(f"已标记 {updated_count} 行为感兴趣")

        except sqlite3.Error as e:
            logger.error(f"标记数据时出错: {str(e)}")
            QMessageBox.critical(self, "数据库错误", f"标记数据时出错: {str(e)}")
            self.status_bar.showMessage("标记失败")
        except Exception as e:
            logger.error(f"标记数据时出错: {str(e)}")
            QMessageBox.critical(self, "错误", f"标记数据时出错: {str(e)}")
            self.status_bar.showMessage("标记失败")


    def mark_as_not_interested(self):
        """Mark the selected rows as not interested."""
        # Collect the selected rows
        selected_rows = set()
        for item in self.table.selectedItems():
            selected_rows.add(item.row())

        # Nothing selected: tell the user and return
        if not selected_rows:
            QMessageBox.information(self, "提示", "请先选中要标记的行")
            return

        # Ask for confirmation (defaults to Yes)
        reply = QMessageBox.question(
            self,
            "确认标记",
            f"确定要将选中的 {len(selected_rows)} 行标记为不感兴趣吗?",
            QMessageBox.Yes | QMessageBox.No,
            QMessageBox.Yes
        )

        if reply == QMessageBox.No:
            return

        try:
            # Connect to the database
            conn = sqlite3.connect(self.db_path)
            cursor = conn.cursor()

            # Update the selected rows
            updated_count = 0
            for row in selected_rows:
                # Read the ID from column 0
                id_item = self.table.item(row, 0)
                if id_item:
                    article_id = id_item.text()
                    # Clear the is_interested field in the database
                    cursor.execute("UPDATE articles SET is_interested = 0 WHERE id = ?", (article_id,))

                    # Update the table display
                    interested_item = QTableWidgetItem("否")
                    # "Not interested" items keep the default font and color
                    self.table.setItem(row, 5, interested_item)

                    updated_count += 1

            # Commit the changes
            conn.commit()
            conn.close()

            # Update the status bar
            self.status_bar.showMessage(f"已标记 {updated_count} 行为不感兴趣")

        except sqlite3.Error as e:
            logger.error(f"标记数据时出错: {str(e)}")
            QMessageBox.critical(self, "数据库错误", f"标记数据时出错: {str(e)}")
            self.status_bar.showMessage("标记失败")
        except Exception as e:
            logger.error(f"标记数据时出错: {str(e)}")
            QMessageBox.critical(self, "错误", f"标记数据时出错: {str(e)}")
            self.status_bar.showMessage("标记失败")


    def increase_score(self):
        """Increase the score of the selected rows by 1 (capped at 10)."""
        # Collect the selected rows
        selected_rows = set()
        for item in self.table.selectedItems():
            selected_rows.add(item.row())

        # Nothing selected: tell the user and return
        if not selected_rows:
            QMessageBox.information(self, "提示", "请先选中要增加评分的行")
            return

        try:
            # Connect to the database
            conn = sqlite3.connect(self.db_path)
            cursor = conn.cursor()

            # Update the selected rows
            updated_count = 0
            for row in selected_rows:
                # Read the ID from column 0
                id_item = self.table.item(row, 0)
                if id_item:
                    article_id = id_item.text()
                    # Read the current score
                    cursor.execute("SELECT score FROM articles WHERE id = ?", (article_id,))
                    result = cursor.fetchone()
                    if result:
                        current_score = result[0]
                        # Add 1, but never exceed 10
                        new_score = min(current_score + 1, 10)
                        # Write the new score back
                        cursor.execute("UPDATE articles SET score = ? WHERE id = ?", (new_score, article_id))

                        # Update the table display
                        score_item = QTableWidgetItem(str(new_score))
                        # Color depends on the score
                        if new_score >= 8:
                            score_item.setForeground(Qt.green)
                            score_item.setFont(QFont("", -1, QFont.Bold))
                        elif new_score >= 6:
                            score_item.setForeground(Qt.blue)
                        elif new_score <= 3:
                            score_item.setForeground(Qt.red)
                        self.table.setItem(row, 5, score_item)

                        updated_count += 1

            # Commit the changes
            conn.commit()
            conn.close()

            # Update the status bar
            self.status_bar.showMessage(f"已增加 {updated_count} 行的评分")

        except sqlite3.Error as e:
            logger.error(f"增加评分时出错: {str(e)}")
            QMessageBox.critical(self, "数据库错误", f"增加评分时出错: {str(e)}")
            self.status_bar.showMessage("增加评分失败")
        except Exception as e:
            logger.error(f"增加评分时出错: {str(e)}")
            QMessageBox.critical(self, "错误", f"增加评分时出错: {str(e)}")
            self.status_bar.showMessage("增加评分失败")

    def decrease_score(self):
        """Decrease the score of the selected rows by 1 (floored at 0)."""
        # Collect the selected rows
        selected_rows = set()
        for item in self.table.selectedItems():
            selected_rows.add(item.row())

        # Nothing selected: tell the user and return
        if not selected_rows:
            QMessageBox.information(self, "提示", "请先选中要减少评分的行")
            return

        try:
            # Connect to the database
            conn = sqlite3.connect(self.db_path)
            cursor = conn.cursor()

            # Update the selected rows
            updated_count = 0
            for row in selected_rows:
                # Read the ID from column 0
                id_item = self.table.item(row, 0)
                if id_item:
                    article_id = id_item.text()
                    # Read the current score
                    cursor.execute("SELECT score FROM articles WHERE id = ?", (article_id,))
                    result = cursor.fetchone()
                    if result:
                        current_score = result[0]
                        # Subtract 1, but never go below 0
                        new_score = max(current_score - 1, 0)
                        # Write the new score back
                        cursor.execute("UPDATE articles SET score = ? WHERE id = ?", (new_score, article_id))

                        # Update the table display
                        score_item = QTableWidgetItem(str(new_score))
                        # Color depends on the score
                        if new_score >= 8:
                            score_item.setForeground(Qt.green)
                            score_item.setFont(QFont("", -1, QFont.Bold))
                        elif new_score >= 6:
                            score_item.setForeground(Qt.blue)
                        elif new_score <= 3:
                            score_item.setForeground(Qt.red)
                        self.table.setItem(row, 5, score_item)

                        updated_count += 1

            # Commit the changes
            conn.commit()
            conn.close()

            # Update the status bar
            self.status_bar.showMessage(f"已减少 {updated_count} 行的评分")

        except sqlite3.Error as e:
            logger.error(f"减少评分时出错: {str(e)}")
            QMessageBox.critical(self, "数据库错误", f"减少评分时出错: {str(e)}")
            self.status_bar.showMessage("减少评分失败")
        except Exception as e:
            logger.error(f"减少评分时出错: {str(e)}")
            QMessageBox.critical(self, "错误", f"减少评分时出错: {str(e)}")
            self.status_bar.showMessage("减少评分失败")


def main():
    """Application entry point."""
    app = QApplication(sys.argv)

    # Set application properties
    app.setApplicationName("TopHub数据查看器")
    app.setOrganizationName("TopHub")

    # Create and show the main window
    viewer = DatabaseViewer()
    viewer.show()

    # Run the application
    sys.exit(app.exec())


if __name__ == "__main__":
    main()
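The score handlers in `db_viewer.py` read the current score, clamp it in Python, and write it back. As a rough sketch only, the same clamp to the 0 to 10 range could be pushed into a single `UPDATE` using SQLite's multi-argument `MIN`/`MAX` scalar functions. The helper name `adjust_score` and the in-memory table are assumptions made for illustration; only the `articles` table and its `score` column come from the code above.

```python
import sqlite3


def adjust_score(conn, article_id, delta):
    """Hypothetical helper: add delta to score, clamped to 0..10; return the new score."""
    conn.execute(
        "UPDATE articles SET score = MAX(0, MIN(10, score + ?)) WHERE id = ?",
        (delta, article_id),
    )
    row = conn.execute(
        "SELECT score FROM articles WHERE id = ?", (article_id,)
    ).fetchone()
    # None signals that no row with this id exists
    return row[0] if row else None


# Throwaway in-memory database, used only for this demonstration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, score INTEGER DEFAULT 5)")
conn.execute("INSERT INTO articles (id, score) VALUES (1, 9)")
conn.execute("INSERT INTO articles (id, score) VALUES (2, 0)")

raised = adjust_score(conn, 1, 1)         # 9 -> 10
raised_again = adjust_score(conn, 1, 1)   # stays at the upper bound
lowered = adjust_score(conn, 2, -1)       # stays at the lower bound
missing = adjust_score(conn, 99, 1)       # unknown id
```

Doing the clamp in SQL would remove the separate `SELECT` per row, at the cost of needing a follow-up query when the UI has to display the new value.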
55
fix_db_viewer.py
Normal file
@@ -0,0 +1,55 @@
#!/usr/bin/env python3
"""
Fix the method placement problem in db_viewer.py.
Moves the increase_score and decrease_score methods from the end of the file into the DatabaseViewer class.
"""

import re

def fix_db_viewer():
    """Patch db_viewer.py in place."""
    try:
        # Read the original file
        with open('db_viewer.py', 'r', encoding='utf-8') as f:
            content = f.read()

        # Locate the increase_score and decrease_score methods
        increase_score_match = re.search(r'\n\s*def increase_score\(self\):.*?(?=\n\s*def|\n\nclass|\n\ndef|\n\nif __name__|\Z)', content, re.DOTALL)
        decrease_score_match = re.search(r'\n\s*def decrease_score\(self\):.*?(?=\n\s*def|\n\nclass|\n\ndef|\n\nif __name__|\Z)', content, re.DOTALL)

        if not increase_score_match or not decrease_score_match:
            print("未找到increase_score或decrease_score方法")
            return False

        # Extract the method text
        increase_score_method = increase_score_match.group(0)
        decrease_score_method = decrease_score_match.group(0)

        # Remove both methods from their current position at the end of the file
        content = re.sub(r'\n\s*def increase_score\(self\):.*?(?=\n\s*def|\n\nclass|\n\ndef|\n\nif __name__|\Z)', '', content, flags=re.DOTALL)
        content = re.sub(r'\n\s*def decrease_score\(self\):.*?(?=\n\s*def|\n\nclass|\n\ndef|\n\nif __name__|\Z)', '', content, flags=re.DOTALL)

        # Find where mark_as_not_interested ends; the methods are inserted right after it
        mark_as_not_interested_match = re.search(r'(\n\s*def mark_as_not_interested\(self\):.*?(?=\n\s*def|\n\nclass|\n\ndef|\n\nif __name__|\Z))', content, re.DOTALL)

        if not mark_as_not_interested_match:
            print("未找到mark_as_not_interested方法")
            return False

        # Insert the two methods after mark_as_not_interested
        insertion_point = mark_as_not_interested_match.end(1)
        new_content = content[:insertion_point] + increase_score_method + decrease_score_method + content[insertion_point:]

        # Write the patched file
        with open('db_viewer.py', 'w', encoding='utf-8') as f:
            f.write(new_content)

        print("成功修复db_viewer.py文件")
        return True

    except Exception as e:
        print(f"修复文件时出错: {str(e)}")
        return False

if __name__ == "__main__":
    fix_db_viewer()
2
gui_test.log
Normal file
@@ -0,0 +1,2 @@
2025-11-07 23:39:42.157 | INFO | __main__:<module>:42 - 开始GUI测试
2025-11-07 23:39:47.875 | INFO | __main__:close_app:30 - 测试完成,关闭应用程序
101
modify_db_to_score.py
Normal file
@@ -0,0 +1,101 @@
#!/usr/bin/env python3
"""
Database schema migration script.
Adds a score field derived from is_interested to implement a 10-point rating system.
"""

import sqlite3
import os
from loguru import logger

def modify_database_structure():
    """Add the score field and fill it from is_interested (is_interested itself is left in place)."""
    # The database file lives next to this script
    script_dir = os.path.dirname(os.path.abspath(__file__))
    db_path = os.path.join(script_dir, "tophub_data.db")

    # Make sure the database file exists
    if not os.path.exists(db_path):
        logger.error(f"数据库文件不存在: {db_path}")
        return False

    try:
        # Connect to the database
        conn = sqlite3.connect(db_path)
        cursor = conn.cursor()

        # Check whether the is_interested field exists
        cursor.execute("PRAGMA table_info(articles)")
        columns = cursor.fetchall()
        column_names = [column[1] for column in columns]

        if "is_interested" not in column_names:
            logger.info("is_interested字段不存在,无需修改")
            conn.close()
            return True

        # Check whether the score field already exists
        if "score" in column_names:
            logger.info("score字段已存在,无需添加")
            conn.close()
            return True

        # Add the score field with a default of 5
        logger.info("正在添加score字段...")
        cursor.execute("ALTER TABLE articles ADD COLUMN score INTEGER DEFAULT 5")

        # Convert the is_interested values to scores
        logger.info("正在转换is_interested数据到score字段...")

        # Fetch all records
        cursor.execute("SELECT id, is_interested FROM articles")
        records = cursor.fetchall()

        # Convert the data
        for record in records:
            article_id, is_interested = record
            # Mapping: is_interested=1 becomes score=7, anything else becomes score=5
            score = 7 if is_interested == 1 else 5
            cursor.execute("UPDATE articles SET score = ? WHERE id = ?", (score, article_id))

        # Commit the changes
        conn.commit()
        logger.info("成功添加score字段并转换数据")

        # Verify that the field was added
        cursor.execute("PRAGMA table_info(articles)")
        columns = cursor.fetchall()
        column_names = [column[1] for column in columns]

        if "score" in column_names:
            logger.info("验证成功:score字段已添加到articles表")
        else:
            logger.error("验证失败:score字段未成功添加")
            conn.close()
            return False

        # Report the conversion result
        cursor.execute("SELECT COUNT(*) FROM articles WHERE score = 7")
        count_7 = cursor.fetchone()[0]

        cursor.execute("SELECT COUNT(*) FROM articles WHERE score = 5")
        count_5 = cursor.fetchone()[0]

        logger.info(f"数据转换结果: score=7的记录数: {count_7}, score=5的记录数: {count_5}")

        conn.close()
        return True

    except sqlite3.Error as e:
        logger.error(f"数据库操作出错: {str(e)}")
        return False
    except Exception as e:
        logger.error(f"修改数据库结构时出错: {str(e)}")
        return False

if __name__ == "__main__":
    logger.add("db_modify_score.log", rotation="10 MB", level="INFO")
    if modify_database_structure():
        logger.info("数据库结构修改完成")
    else:
        logger.error("数据库结构修改失败")
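The migration converts `is_interested` row by row in Python. As a sketch under stated assumptions, the same 1 → 7 / otherwise → 5 mapping could be expressed as one `UPDATE` with a `CASE` expression. The function name `migrate_interest_to_score` and the in-memory table are hypothetical; the column names and the mapping are taken from the script above.

```python
import sqlite3


def migrate_interest_to_score(conn):
    """Hypothetical variant: add score and fill it in one statement. Returns True if it migrated."""
    columns = [row[1] for row in conn.execute("PRAGMA table_info(articles)")]
    if "is_interested" not in columns or "score" in columns:
        # Nothing to do; existing scores are never overwritten
        return False
    conn.execute("ALTER TABLE articles ADD COLUMN score INTEGER DEFAULT 5")
    conn.execute(
        "UPDATE articles SET score = CASE WHEN is_interested = 1 THEN 7 ELSE 5 END"
    )
    conn.commit()
    return True


# Throwaway in-memory database, used only for this demonstration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, is_interested INTEGER)")
conn.executemany(
    "INSERT INTO articles (is_interested) VALUES (?)", [(1,), (0,), (None,)]
)

first_run = migrate_interest_to_score(conn)
second_run = migrate_interest_to_score(conn)  # already migrated, so a no-op
scores = [row[0] for row in conn.execute("SELECT score FROM articles ORDER BY id")]
```

The early return keeps the sketch idempotent, which matters because a second run would otherwise reset scores that were adjusted in the viewer.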
79
ollama_model_viewer.py
Normal file
@@ -0,0 +1,79 @@
import sys
import requests
import json
from PySide6.QtWidgets import QApplication, QMainWindow, QListWidget, QVBoxLayout, QWidget, QLabel, QPushButton
from PySide6.QtCore import Qt
from loguru import logger

class OllamaModelViewer(QMainWindow):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("Ollama 模型查看器")
        self.setGeometry(100, 100, 600, 400)

        # Central widget
        self.central_widget = QWidget()
        self.setCentralWidget(self.central_widget)

        # Layout
        self.layout = QVBoxLayout()
        self.central_widget.setLayout(self.layout)

        # Title label
        self.title_label = QLabel("当前安装的Ollama模型:")
        self.title_label.setStyleSheet("font-weight: bold; font-size: 14px;")
        self.layout.addWidget(self.title_label)

        # Model list
        self.model_list = QListWidget()
        self.model_list.setStyleSheet("font-family: monospace;")
        self.layout.addWidget(self.model_list)

        # Refresh button
        self.refresh_button = QPushButton("刷新模型列表")
        self.refresh_button.clicked.connect(self.fetch_models)
        self.layout.addWidget(self.refresh_button)

        # Initial load
        self.fetch_models()

    def fetch_models(self):
        """Fetch the model list from the Ollama API."""
        self.model_list.clear()

        try:
            logger.info("正在获取Ollama模型列表...")
            response = requests.get("http://localhost:11434/api/tags", timeout=5)

            if response.status_code == 200:
                data = response.json()
                models = data.get("models", [])

                if models:
                    for model in models:
                        model_name = model.get("model", "")
                        if model_name:
                            self.model_list.addItem(model_name)
                            logger.info(f"找到模型: {model_name}")
                else:
                    self.model_list.addItem("未找到任何模型")
                    logger.info("未找到任何模型")
            else:
                self.model_list.addItem(f"API请求失败,状态码: {response.status_code}")
                logger.error(f"API请求失败,状态码: {response.status_code}")

        except requests.exceptions.RequestException as e:
            self.model_list.addItem("无法连接到Ollama API")
            logger.error(f"无法连接到Ollama API: {str(e)}")
        except json.JSONDecodeError as e:
            self.model_list.addItem("API响应格式错误")
            logger.error(f"API响应格式错误: {str(e)}")
        except Exception as e:
            self.model_list.addItem(f"发生错误: {str(e)}")
            logger.error(f"发生未知错误: {str(e)}")

if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = OllamaModelViewer()
    window.show()
    sys.exit(app.exec())
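`fetch_models` mixes the HTTP call, the payload handling, and the widget updates. A small sketch of how the payload handling alone could be isolated so it can be checked without a running Ollama server is shown here; `model_names` and `sample_payload` are made up for illustration, and the payload shape simply mirrors the keys the viewer reads (`models`, then `model` per entry).

```python
def model_names(payload):
    """Return the non-empty "model" values from a decoded /api/tags payload."""
    return [
        entry["model"]
        for entry in payload.get("models", [])
        if entry.get("model")  # skips missing keys and empty strings
    ]


# Hand-written sample payload, not captured from a real server
sample_payload = {
    "models": [
        {"model": "gemma3:4b"},
        {"model": ""},
        {"name": "entry-without-model-key"},
    ]
}

names = model_names(sample_payload)
no_models = model_names({})
```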
6
requirements.txt
Normal file
@@ -0,0 +1,6 @@
requests>=2.25.1
lxml>=4.6.3
tqdm>=4.61.2
loguru>=0.5.3
zhipuai>=2.1.0
PySide6>=6.0.0
8380
tophub_add_data_to_db.log
Normal file
File diff suppressed because it is too large
213
tophub_add_data_to_db.py
Normal file
@@ -0,0 +1,213 @@
#!/usr/bin/env python3
"""
Processes the temporary files and writes them to the database.
Reads temp files in the expected format, extracts titles and links, calls the API to classify them, then writes to the SQLite database.
"""

import sqlite3
import requests
import os
import re
from datetime import datetime
from tqdm import tqdm
from loguru import logger
import glob

# Logging configuration
logger.add("tophub_add_data_to_db.log", rotation="10 MB", level="INFO")

# API configuration
API_URL = "http://localhost:11434/api/generate"
API_MODEL = "gemma3:4b"

def init_database():
    """Initialize the database and create the table if needed."""
    conn = sqlite3.connect('tophub_data.db')
    cursor = conn.cursor()

    cursor.execute('''
        CREATE TABLE IF NOT EXISTS articles (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            title TEXT NOT NULL,
            url TEXT NOT NULL,
            category TEXT,
            source_date TEXT NOT NULL,
            created_at TEXT NOT NULL,
            UNIQUE(title, source_date)
        )
    ''')

    conn.commit()
    conn.close()
    logger.info("数据库初始化完成")

def find_temp_files():
    """Find the temp files whose names match the expected pattern."""
    pattern = "*年*月*日*.txt"
    files = glob.glob(pattern)
    logger.info(f"找到 {len(files)} 个临时文件: {files}")
    return files

def parse_file_content(file_path):
    """Parse a temp file, reading one record per group of 5 lines."""
    articles = []

    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            lines = f.readlines()

        # Parse in groups of 5 lines; an incomplete trailing group is ignored
        for i in range(0, len(lines), 5):
            if i + 4 < len(lines):
                node_id = lines[i].strip()
                category = lines[i+1].strip()
                title = lines[i+2].strip()
                url = lines[i+3].strip()
                separator = lines[i+4].strip()

                # Extract the key fields
                title_match = re.search(r'标题: (.+)', title)
                url_match = re.search(r'链接: (.+)', url)

                if title_match and url_match:
                    articles.append({
                        'title': title_match.group(1),
                        'url': url_match.group(1),
                        'category': category.split(': ')[1] if ': ' in category else '未知'
                    })

        logger.info(f"从文件 {file_path} 解析出 {len(articles)} 条数据")
        return articles

    except Exception as e:
        logger.error(f"解析文件 {file_path} 失败: {e}")
        return []

def check_duplicate(title, date_str):
    """Check whether this title + date combination already exists."""
    conn = sqlite3.connect('tophub_data.db')
    cursor = conn.cursor()

    cursor.execute('''
        SELECT COUNT(*) FROM articles
        WHERE title = ? AND source_date = ?
    ''', (title, date_str))

    count = cursor.fetchone()[0]
    conn.close()

    return count > 0

def classify_title(title):
    """Call the API to classify a title."""
    try:
        prompt = f"目标:对以下文字内容进行分类,返回结果为类别,如\"社会新闻\",\"金融\",\"历史\",\"购物\",\"新质科技\"等等。目的:只返回2-4个字,不返回其它内容。内容:{title}"

        data = {
            "model": API_MODEL,
            "prompt": prompt,
            "stream": False
        }

        response = requests.post(API_URL, json=data, timeout=30)
        response.raise_for_status()

        result = response.json()
        category = result.get('response', '').strip()

        # Validate the length of the returned category; fall back to "其他" (other)
        if len(category) < 2 or len(category) > 8:
            category = '其他'

        logger.info(f"标题 '{title}' 分类为: {category}")
        return category

    except Exception as e:
        logger.error(f"API调用失败,标题 '{title}': {e}")
        return '其他'

def insert_article(title, url, category, source_date):
    """Insert one article into the database."""
    conn = sqlite3.connect('tophub_data.db')
    cursor = conn.cursor()

    try:
        created_at = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        cursor.execute('''
            INSERT INTO articles (title, url, category, source_date, created_at)
            VALUES (?, ?, ?, ?, ?)
        ''', (title, url, category, source_date, created_at))

        conn.commit()
        logger.info(f"成功插入文章: {title}")
        return True

    except sqlite3.IntegrityError:
        # The UNIQUE(title, source_date) constraint rejected a duplicate
        logger.warning(f"文章已存在,跳过: {title}")
        return False

    except Exception as e:
        logger.error(f"插入文章失败: {e}")
        return False

    finally:
        conn.close()

def process_temp_files():
    """Main processing routine."""
    logger.info("开始处理临时文件...")

    # Initialize the database
    init_database()

    # Find the temp files
    temp_files = find_temp_files()

    if not temp_files:
        logger.warning("未找到临时文件")
        return

    total_processed = 0
    total_inserted = 0

    # Process each file
    for file_path in temp_files:
        logger.info(f"处理文件: {file_path}")

        # Extract the date from the filename; fall back to today's date
        date_match = re.search(r'(\d{4})年(\d{1,2})月(\d{1,2})日', file_path)
        if date_match:
            source_date = f"{date_match.group(1)}-{int(date_match.group(2)):02d}-{int(date_match.group(3)):02d}"
        else:
            source_date = datetime.now().strftime('%Y-%m-%d')

        # Parse the file content
        articles = parse_file_content(file_path)

        if not articles:
            continue

        # Process each article
        for article in tqdm(articles, desc=f"处理 {file_path}"):
            total_processed += 1

            # Skip duplicates before spending an API call on them
            if check_duplicate(article['title'], source_date):
                logger.info(f"跳过重复文章: {article['title']}")
                continue

            # Classify the title
            category = classify_title(article['title'])

            # Insert into the database
            if insert_article(article['title'], article['url'], category, source_date):
                total_inserted += 1

    logger.info(f"处理完成! 总计处理: {total_processed}, 成功插入: {total_inserted}")

if __name__ == "__main__":
    try:
        process_temp_files()
    except Exception as e:
        logger.error(f"程序执行失败: {e}")
        raise
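The temp-file format is easiest to see on a concrete record: five lines per entry (node ID, category, title, link, separator). The following sketch parses an in-memory list of lines the same way `parse_file_content` does; the function name `parse_records` and the sample values are assumptions for illustration, while the `标题: ` and `链接: ` prefixes match what the scraper writes.

```python
import re


def parse_records(lines):
    """Parse 5-line records; an incomplete trailing record is ignored."""
    articles = []
    # Stopping at len(lines) - 4 guarantees that lines[i + 4] exists
    for i in range(0, len(lines) - 4, 5):
        title_match = re.search(r'标题: (.+)', lines[i + 2].strip())
        url_match = re.search(r'链接: (.+)', lines[i + 3].strip())
        if title_match and url_match:
            articles.append({
                'title': title_match.group(1),
                'url': url_match.group(1),
            })
    return articles


# One complete record followed by the first line of an incomplete one
sample_lines = [
    "节点ID: 1\n",
    "分类: 示例\n",
    "标题: 示例标题\n",
    "链接: https://example.com/a\n",
    "-" * 50 + "\n",
    "节点ID: 2\n",
]

records = parse_records(sample_lines)
```

Because the format is positional, a single missing or extra line in the temp file shifts every later record, so records that fail the prefix check are skipped rather than guessed at.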
14
tophub_ban_column.txt
Normal file
@@ -0,0 +1,14 @@
淘宝
音乐
电影
猫眼
IMDB
视频
七猫
读书
TapTap
Music
即刻
站酷
App
彩票
BIN
tophub_data.db
Normal file
Binary file not shown.
7559
tophub_scraper.log
Normal file
File diff suppressed because it is too large
209
tophub_scraper.py
Normal file
@@ -0,0 +1,209 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
TopHub scraping script.
Scrapes data from tophub.today, filters it by the configured rules, and saves it.
"""

import requests
from lxml import html
import json
import time
import os
import re
from datetime import datetime
from loguru import logger

# Logging configuration
logger.add("tophub_scraper.log", rotation="10 MB", level="INFO")

class TopHubScraper:
    """Scraper for the TopHub website."""

    def __init__(self):
        """
        Initialize the scraper.
        """
        self.base_url = "https://tophub.today/"
        self.ban_list_file = "tophub_ban_column.txt"
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
        })
        self.ban_list = self.load_ban_list()

    def load_ban_list(self):
        """
        Load the list of columns to filter out.

        Returns:
            set: the column names to skip
        """
        ban_list = set()
        try:
            if os.path.exists(self.ban_list_file):
                with open(self.ban_list_file, 'r', encoding='utf-8') as f:
                    for line in f:
                        line = line.strip()
                        if line:
                            ban_list.add(line)
                logger.info(f"已加载 {len(ban_list)} 个需要过滤的栏目")
            else:
                logger.warning(f"过滤文件 {self.ban_list_file} 不存在,将不过滤任何栏目")
        except Exception as e:
            logger.error(f"加载过滤文件失败: {e}")
        return ban_list

    def fetch_webpage(self):
        """
        Fetch the page content.

        Returns:
            str: the page HTML
        """
        logger.info(f"正在获取网页内容: {self.base_url}")
        try:
            response = self.session.get(self.base_url, timeout=10)
            response.raise_for_status()
            logger.info("网页内容获取成功")
            return response.text
        except requests.RequestException as e:
            logger.error(f"获取网页内容失败: {e}")
            raise

    def scrape_by_node_ids(self):
        """
        Scrape data by walking a range of node IDs.

        Returns:
            list: the scraped records
        """
        try:
            # 1. Fetch the page
            html_content = self.fetch_webpage()
            tree = html.fromstring(html_content)

            # 2. Build the output filename from the current date and time.
            #    The time part is zero-padded (HHMMSS) so the name is unambiguous.
            now = datetime.now()
            output_file = f"{now.year}年{now.month}月{now.day}日{now:%H%M%S}.txt"

            scraped_data = []

            # 3. Walk the node ID range
            for node_id in range(1, 1000):  # 1 through 999
                xpath = f'//*[@id="node-{node_id}"]'
                logger.info(f"正在查找节点: {xpath}")

                # Look up the node
                nodes = tree.xpath(xpath)
                if not nodes:
                    continue  # no such node, move on to the next ID

                node = nodes[0]

                # Find the span tags
                spans = node.xpath('.//span')
                if not spans:
                    logger.info(f"节点 {node_id} 中未找到span标签,跳过")
                    continue

                # Text of the first span (the column name)
                span_text = spans[0].text_content().strip()
                if not span_text:
                    logger.info(f"节点 {node_id} 的span标签为空,跳过")
                    continue

                # Check against the ban list (substring match)
                should_skip = False
                for ban_word in self.ban_list:
                    if ban_word in span_text:
                        logger.info(f"节点 {node_id} 的内容 '{span_text}' 包含过滤词 '{ban_word}',跳过")
                        should_skip = True
                        break

                if should_skip:
                    continue

                logger.info(f"节点 {node_id} 的内容 '{span_text}' 通过过滤,继续处理")

                # Find the a elements
                links = node.xpath('.//a')
                if not links:
                    logger.info(f"节点 {node_id} 中未找到a元素,跳过")
                    continue

                # Extract every link and its text
                for link in links:
                    link_text = link.text_content().strip()
                    href = link.get('href', '')

                    if link_text and href:
                        # Complete relative links
                        if not href.startswith('http'):
                            href = f"https://tophub.today{href}"

                        # Skip entries whose text is just the category name
                        if span_text == link_text:
                            logger.info(f"节点 {node_id} 的分类和标题相同 ({span_text}),跳过")
                            continue

                        scraped_data.append({
                            'node_id': node_id,
                            'category': span_text,
                            'text': link_text,
                            'link': href
                        })

            # 4. Save the data to a file
            if scraped_data:
                self.save_to_file(scraped_data, output_file)
                logger.info(f"成功抓取 {len(scraped_data)} 条数据,保存到 {output_file}")
            else:
                logger.warning("未抓取到任何数据")

            return scraped_data

        except Exception as e:
            logger.error(f"抓取数据时出错: {e}")
            raise

|
||||
    def save_to_file(self, data, filename):
        """
        Save the scraped data to a file.

        Args:
            data (list): Items to save
            filename (str): Output file name
        """
        try:
            with open(filename, 'w', encoding='utf-8') as f:
                for item in data:
                    # The field labels stay in Chinese: they are part of the
                    # on-disk record format (five lines per record).
                    f.write(f"节点ID: {item['node_id']}\n")
                    f.write(f"分类: {item['category']}\n")

                    # Clean the title: drop the leading rank number and extra whitespace
                    title_text = item['text']
                    # Multi-line titles: pull out the actual content
                    lines = title_text.strip().split('\n')
                    if len(lines) >= 2:
                        # The second line usually holds the actual title
                        cleaned_title = lines[1].strip()
                    else:
                        # Single line: try to strip a leading number with a regex
                        match = re.match(r'^\d+\s+(.+)$', title_text.strip(), re.DOTALL)
                        if match:
                            cleaned_title = match.group(1).strip()
                        else:
                            cleaned_title = title_text.strip()

                    f.write(f"标题: {cleaned_title}\n")
                    f.write(f"链接: {item['link']}\n")
                    f.write("-" * 50 + "\n")
            logger.info(f"Data saved to {filename}")
        except Exception as e:
            logger.error(f"Failed to save file: {e}")
            raise

if __name__ == "__main__":
    scraper = TopHubScraper()
    scraper.scrape_by_node_ids()
155
右键菜单功能说明.md
Normal file
@@ -0,0 +1,155 @@
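# Context Menu Feature Guide

## Overview

The context menu in the TopHub data viewer lets users right-click an item in the table to run common operations quickly.

## New Features

### 1. Mark as interested

- **Description**: Marks the selected items as interested
- **Database operation**: Sets the `is_interested` field of the matching records to 1
- **Display**: The "Interested" column shows "Yes" in bold green text

### 2. Mark as not interested

- **Description**: Marks the selected items as not interested
- **Database operation**: Sets the `is_interested` field of the matching records to 0
- **Display**: The "Interested" column shows "No" in the normal font and color

### 3. Delete selected items

- **Description**: Deletes the selected items
- **Database operation**: Deletes the matching records from the database
- **Display**: Removes the matching rows from the table
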
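The three menu actions each reduce to one SQLite statement. The following is a minimal sketch of just the database side, assuming the `articles` table and `is_interested` column named in this document; the helper names `set_interested` and `delete_articles`, the `title` column, and the in-memory database are hypothetical and only serve the illustration.

```python
import sqlite3


def set_interested(conn, article_ids, interested):
    """Set is_interested to 1 (interested) or 0 (not interested) for the given IDs."""
    flag = 1 if interested else 0
    conn.executemany(
        "UPDATE articles SET is_interested = ? WHERE id = ?",
        [(flag, article_id) for article_id in article_ids],
    )
    conn.commit()


def delete_articles(conn, article_ids):
    """Delete the records with the given IDs."""
    conn.executemany(
        "DELETE FROM articles WHERE id = ?",
        [(article_id,) for article_id in article_ids],
    )
    conn.commit()


# Throwaway in-memory database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles ("
    "id INTEGER PRIMARY KEY, title TEXT, is_interested INTEGER DEFAULT 0)"
)
conn.executemany("INSERT INTO articles (title) VALUES (?)", [("a",), ("b",), ("c",)])

set_interested(conn, [1, 2], True)   # mark rows 1 and 2 as interested
set_interested(conn, [2], False)     # change row 2 back to not interested
delete_articles(conn, [3])           # remove row 3

rows = conn.execute("SELECT id, is_interested FROM articles ORDER BY id").fetchall()
print(rows)
```

Passing the IDs as bound parameters rather than formatting them into the SQL string keeps the statements safe regardless of what the table cells contain.
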
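## Usage

1. Open the TopHub data viewer
2. Right-click any item in the table
3. Choose an operation from the context menu:
   - "Mark as interested" marks the item as interested
   - "Mark as not interested" marks the item as not interested
   - "Delete selected items" deletes the selected items

## Technical Implementation

### Context menu implementation
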
```python
# Enable the custom context menu (in the UI setup code)
self.table.setContextMenuPolicy(Qt.CustomContextMenu)
self.table.customContextMenuRequested.connect(self.show_context_menu)

def show_context_menu(self, position):
    """Show the context menu."""
    # Row under the click position
    row = self.table.rowAt(position.y())
    if row < 0:
        return

    # Select that row
    self.table.selectRow(row)

    # Build the context menu
    menu = QMenu(self)

    # "Mark as interested" action
    mark_action = QAction("Mark as interested", self)
    mark_action.triggered.connect(self.mark_as_interested)
    menu.addAction(mark_action)

    # "Mark as not interested" action
    unmark_action = QAction("Mark as not interested", self)
    unmark_action.triggered.connect(self.mark_as_not_interested)
    menu.addAction(unmark_action)

    # Separator
    menu.addSeparator()

    # "Delete" action
    delete_action = QAction("Delete selected items", self)
    delete_action.triggered.connect(self.delete_selected_items)
    menu.addAction(delete_action)

    # Show the menu. For item views the position is given in viewport
    # coordinates, so map it from the viewport rather than the table widget.
    menu.exec_(self.table.viewport().mapToGlobal(position))
```

### "Mark as not interested" method implementation

```python
def mark_as_not_interested(self):
    """Mark the selected items as not interested."""
    # Collect the selected rows
    selected_rows = set()
    for item in self.table.selectedItems():
        selected_rows.add(item.row())

    # Nothing selected: tell the user and return
    if not selected_rows:
        QMessageBox.information(self, "Notice", "Select the rows to mark first")
        return

    # Ask for confirmation
    reply = QMessageBox.question(
        self,
        "Confirm marking",
        f"Mark the {len(selected_rows)} selected row(s) as not interested?",
        QMessageBox.Yes | QMessageBox.No,
        QMessageBox.Yes
    )

    if reply == QMessageBox.No:
        return

    conn = None
    try:
        # Connect to the database
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()

        # Update the selected rows
        updated_count = 0
        for row in selected_rows:
            # The ID is stored in column 0
            id_item = self.table.item(row, 0)
            if id_item:
                article_id = id_item.text()
                # Update the is_interested field in the database
                cursor.execute("UPDATE articles SET is_interested = 0 WHERE id = ?", (article_id,))

                # Update the table display
                interested_item = QTableWidgetItem("No")
                # Not-interested items use the normal font and color
                self.table.setItem(row, 5, interested_item)

                updated_count += 1

        # Commit the changes
        conn.commit()

        # Update the status bar
        self.status_bar.showMessage(f"Marked {updated_count} row(s) as not interested")

    except sqlite3.Error as e:
        logger.error(f"Error while marking data: {str(e)}")
        QMessageBox.critical(self, "Database error", f"Error while marking data: {str(e)}")
        self.status_bar.showMessage("Marking failed")
    except Exception as e:
        logger.error(f"Error while marking data: {str(e)}")
        QMessageBox.critical(self, "Error", f"Error while marking data: {str(e)}")
        self.status_bar.showMessage("Marking failed")
    finally:
        # Close the connection even when an error occurred
        if conn is not None:
            conn.close()
```

## Testing

The test script `test_mark_not_interested.py` verifies the "Mark as not interested" feature. In testing, the feature marked items as not interested and updated both the database and the table display correctly.

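The contents of `test_mark_not_interested.py` are not shown in this document. As a hypothetical illustration of how the database side could be checked without starting the GUI, the sketch below runs the same `UPDATE` statement against an in-memory database and confirms that only the targeted rows change. The function names and the sample rows are assumptions.

```python
import sqlite3


def mark_not_interested(conn, article_ids):
    """GUI-free stand-in for the update performed by the menu action (hypothetical)."""
    for article_id in article_ids:
        conn.execute(
            "UPDATE articles SET is_interested = 0 WHERE id = ?", (article_id,)
        )
    conn.commit()


def run_check():
    """Build a small table, apply the update, and return {id: is_interested}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles (id INTEGER PRIMARY KEY, is_interested INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO articles (id, is_interested) VALUES (?, ?)",
        [(1, 1), (2, 1), (3, 0)],
    )
    mark_not_interested(conn, [1, 3])
    result = dict(conn.execute("SELECT id, is_interested FROM articles"))
    conn.close()
    return result


result = run_check()
# Row 1 is cleared, row 2 is untouched, row 3 stays at 0.
print(result)
```
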
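## Notes

1. Select the items to operate on before using the context menu
2. Deletion cannot be undone; use it with care
3. Marking updates the database immediately, so confirm the selection first
4. In a batch operation, every selected item is processed

## Changelog

- 2023-11-07: Added "Mark as not interested" to the context menu
- 2023-11-07: Completed feature testing and documentation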
44
数据库字段添加总结.md
Normal file
@@ -0,0 +1,44 @@
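# Database Field Addition Summary

## Task Overview
Add an "Interested" field to the TopHub database viewer so that users can mark the articles they are interested in.

## Implementation Steps

### 1. Database schema change
- Created the `add_interested_field.py` script, which adds the `is_interested` field to the `articles` table
- Field type: INTEGER; default value: 0
- The script checks whether the field already exists, adds it if needed, and verifies the result
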
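The source of `add_interested_field.py` is not included here. A minimal sketch of the check-then-add logic described above, assuming only the `articles` table and the `is_interested INTEGER DEFAULT 0` definition from this document (the function name, the `title` column, and the in-memory database are hypothetical), might look like this:

```python
import sqlite3


def add_interested_field(conn):
    """Add is_interested to articles if it is missing.

    Returns True when the column was added, False when it already existed.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(articles)")]
    if "is_interested" in columns:
        return False
    conn.execute("ALTER TABLE articles ADD COLUMN is_interested INTEGER DEFAULT 0")
    conn.commit()
    return True


# Throwaway in-memory database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO articles (title) VALUES ('example')")

first = add_interested_field(conn)    # column is added
second = add_interested_field(conn)   # second run is a no-op
value = conn.execute("SELECT is_interested FROM articles").fetchone()[0]
print(first, second, value)
```

The existence check makes the script safe to run more than once, since SQLite raises an error when `ALTER TABLE ... ADD COLUMN` names a column that already exists.
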
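### 2. Database verification
- Created the `check_db_structure.py` script to inspect the database schema
- Created the `test_interested_field.py` script to verify that the field works
- Created the `show_data_with_interested.py` script to display records together with their interested status

### 3. GUI changes
- Modified `db_viewer.py` to add the following:
  - An "Interested" column in the table that shows the `is_interested` value
  - A "Mark as interested" button that marks the selected article as interested
  - Queries updated to include the `is_interested` field
  - Filtering updated to include the Interested column

## Test Results
- The database field was added successfully with a default value of 0
- Records can be marked as interested (value 1)
- The GUI application displays and edits the interested field correctly
- Statistics work and show the number of interested and not-interested records
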
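The statistics mentioned in the last point can be computed with a single aggregate query. This is a sketch under the assumption that the counts come straight from the `is_interested` column of `articles`; the function name `interest_counts` and the sample rows are hypothetical, and the viewer's actual query is not shown in this document.

```python
import sqlite3


def interest_counts(conn):
    """Return (interested, not_interested) record counts."""
    # In SQLite a comparison evaluates to 1 or 0, so SUM counts the matches.
    # COALESCE turns the NULL that SUM returns for an empty table into 0.
    row = conn.execute(
        "SELECT COALESCE(SUM(is_interested = 1), 0), "
        "COALESCE(SUM(is_interested = 0), 0) FROM articles"
    ).fetchone()
    return row[0], row[1]


# Throwaway in-memory database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, is_interested INTEGER DEFAULT 0)"
)
empty_counts = interest_counts(conn)
conn.executemany(
    "INSERT INTO articles (is_interested) VALUES (?)",
    [(1,), (0,), (0,), (1,), (1,)],
)
counts = interest_counts(conn)
print(empty_counts, counts)
```
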
## Usage
1. Run `python db_viewer.py` to start the application
2. Select a record in the table
3. Click the "Mark as interested" button to mark the record as interested
4. Use the filter to view the interested records
5. The statistics panel shows the number of interested and not-interested records

## File List
- `add_interested_field.py` - script that adds the database field
- `check_db_structure.py` - script that inspects the database schema
- `test_interested_field.py` - script that tests the field
- `show_data_with_interested.py` - command-line tool that displays records
- `test_gui.py` - GUI test script
- `db_viewer.py` - the modified main application
70
评分系统使用说明.md
Normal file
@@ -0,0 +1,70 @@
# TopHub Data Viewer - Scoring System Guide

## Overview

The TopHub data viewer has moved from a simple interested/not-interested flag to a 10-point score (0 to 10). The new system allows finer-grained ratings, so you can mark and manage scraped content more precisely.

## Scoring System

### Score range

- **Minimum**: 0 (not interested at all)
- **Default**: 5 (neutral)
- **Maximum**: 10 (very interested)

### Color coding

To make content quality easy to spot, the viewer colors each score automatically:

- **Bold green**: 8 and above (high-value content)
- **Blue**: 6-7 (medium-value content)
- **Default color**: 4-5 (ordinary content)
- **Red**: 3 and below (low-value content)

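The thresholds above translate directly into a small lookup function. The sketch below is illustrative only: the name `score_style` and the `(color, bold)` return shape are assumptions, and the viewer's real rendering code is not shown in this document.

```python
def score_style(score):
    """Map a score to (color, bold) using the documented thresholds.

    A color of None means "keep the default text color".
    """
    if score >= 8:
        return ("green", True)    # high-value content
    if score >= 6:
        return ("blue", False)    # medium-value content
    if score >= 4:
        return (None, False)      # ordinary content, default color
    return ("red", False)         # low-value content


for s in (10, 7, 5, 3):
    print(s, score_style(s))
```

Checking the thresholds from highest to lowest means each branch only needs a lower bound.
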
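## Usage

### Increasing a score

1. Select one or more rows in the table
2. Right-click the selected rows
3. Choose "Increase score (+1)" from the menu
4. The score of each selected item goes up by 1, capped at 10

### Decreasing a score

1. Select one or more rows in the table
2. Right-click the selected rows
3. Choose "Decrease score (-1)" from the menu
4. The score of each selected item goes down by 1, with a floor of 0
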
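Both menu actions amount to "add a delta, then clamp to the 0-10 range". One way to express that in a single statement uses SQLite's scalar `MIN` and `MAX` functions, as sketched below. The `score` column and `articles` table come from this document; the helper name `adjust_score`, the constants, and the sample rows are hypothetical.

```python
import sqlite3

MIN_SCORE, MAX_SCORE = 0, 10


def adjust_score(conn, article_ids, delta):
    """Add delta to the score of each given row, clamped to [MIN_SCORE, MAX_SCORE]."""
    for article_id in article_ids:
        conn.execute(
            "UPDATE articles SET score = MAX(?, MIN(?, score + ?)) WHERE id = ?",
            (MIN_SCORE, MAX_SCORE, delta, article_id),
        )
    conn.commit()


# Throwaway in-memory database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, score INTEGER DEFAULT 5)")
conn.executemany(
    "INSERT INTO articles (id, score) VALUES (?, ?)", [(1, 10), (2, 0), (3, 5)]
)

adjust_score(conn, [1, 3], +1)   # row 1 stays at 10, row 3 goes to 6
adjust_score(conn, [2, 3], -1)   # row 2 stays at 0, row 3 goes back to 5

scores = dict(conn.execute("SELECT id, score FROM articles"))
print(scores)
```

Clamping inside the `UPDATE` avoids a separate read of the current score before each write.
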
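### Batch operations

- Select multiple rows to adjust their scores together
- Use the "Select by keyword" feature to quickly select rows that contain a given keyword
- Then adjust the scores through the context menu

## Data Migration

Existing interested/not-interested data has been converted to the new scoring system automatically:

- Items previously marked "interested" now have a score of 7
- Items previously marked "not interested" now have a score of 5 (the default)
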
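The migration script itself is not part of this document. The mapping it describes (interested becomes 7, everything else takes the default of 5) could be sketched as follows, assuming an `articles` table that still has the `is_interested` column. The function name `migrate_to_score` is hypothetical, and this sketch leaves the old column in place rather than dropping it.

```python
import sqlite3


def migrate_to_score(conn):
    """Add a score column defaulting to 5 and map is_interested = 1 to a score of 7."""
    conn.execute("ALTER TABLE articles ADD COLUMN score INTEGER DEFAULT 5")
    conn.execute("UPDATE articles SET score = 7 WHERE is_interested = 1")
    conn.commit()


# Throwaway in-memory database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, is_interested INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO articles (id, is_interested) VALUES (?, ?)", [(1, 1), (2, 0), (3, 1)]
)

migrate_to_score(conn)
migrated = conn.execute("SELECT id, score FROM articles ORDER BY id").fetchall()
print(migrated)
```

Rows that were not interested need no explicit update because the new column's default already gives them 5.
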
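## Technical Details

### Database schema

- A new `score` field (INTEGER) replaces the old `is_interested` field
- `score` defaults to 5 and is limited to the range 0-10

### UI updates

- The "Interested" column in the table is now a "Score" column that shows the numeric score
- The context menu now offers "Increase score (+1)" and "Decrease score (-1)"
- Color coding is applied automatically based on the score

## FAQ

**Q: Why is the default score 5 rather than 0?**
A: 5 represents a neutral stance, which matches everyday rating habits. 0 is usually reserved for content that is completely irrelevant or of very poor quality.

**Q: How do I find high-scoring content quickly?**
A: Content scored 8 or higher is shown in bold green and stands out. You can also sort by the Score column.

**Q: Can I set an arbitrary score directly?**
A: The current version only supports adjusting scores in steps of +1/-1, which keeps scoring consistent and traceable.

---

For other questions or suggestions, feedback is welcome.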