Merge integrated_scraper.py and product_ai_analysis.py into a single file
5900
2025年11月26日19179.txt
File diff suppressed because it is too large
5795
2025年11月27日19551.txt
File diff suppressed because it is too large
5
integrated_product_system.log
Normal file
@@ -0,0 +1,5 @@
2025-11-28 21:56:24.885 | INFO | __main__:__init__:69 - Initializing the full-featured product system, database: c:\Users\xiaji\Documents\个人文件夹\夏骥\hothub的抓取\product\products.db
2025-11-28 21:56:24.885 | INFO | __main__:run_full_workflow:555 - === Starting the full-featured product system workflow ===
2025-11-28 21:56:24.886 | INFO | __main__:init_database:83 - Initializing the product database...
2025-11-28 21:56:24.886 | SUCCESS | __main__:init_database:122 - Product database initialized
2025-11-28 21:56:24.887 | INFO | __main__:run_full_workflow:562 - Step 1: Scraping ProductHunt data...
199
product/README.md
Normal file
@@ -0,0 +1,199 @@
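# Full-Featured Product Scraping and Analysis System

This is a complete system that combines product scraping with AI analysis. It merges the former `integrated_scraper.py` and `product_ai_analysis.py` into a single unified system.

## Features

### Data scraping
- Queries ProductHunt links from the tophub_data.db database
- Scrapes product information with playwright connected to a Chrome browser
- Deduplicates automatically to avoid re-scraping
- Supports batch scraping with progress display
- Saves product information to the products table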
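
The link query and deduplication steps above can be sketched as follows. This is a minimal illustration, not the code in `integrated_product_system.py`: the function name `find_new_producthunt_urls` is hypothetical, and the sketch assumes only that `articles` has a `url` column and that `products.url` is unique, as this README states.

```python
import sqlite3


def find_new_producthunt_urls(tophub_conn: sqlite3.Connection,
                              product_conn: sqlite3.Connection,
                              limit: int = 0) -> list:
    """Return ProductHunt URLs found in articles that are not yet in products.

    limit=0 means "no limit" (an assumption mirroring default_limit in config.py).
    """
    rows = tophub_conn.execute(
        "SELECT DISTINCT url FROM articles "
        "WHERE url LIKE '%producthunt.com%' ORDER BY url"
    ).fetchall()
    # URLs already scraped; used to skip duplicates.
    known = {row[0] for row in product_conn.execute("SELECT url FROM products")}
    new_urls = [url for (url,) in rows if url not in known]
    return new_urls[:limit] if limit else new_urls
```

Both connections can point at in-memory databases for a quick check, which keeps the sketch independent of the real `tophub_data.db`.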
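
### AI analysis
- Calls the Ollama AI API (qwen3:8b model) to analyze product development difficulty
- Automatically parses the AI response to extract the product name, introduction, and development difficulty
- Saves analysis results to the product_analysis table
- Supports resuming an interrupted analysis run, so products are not analyzed twice
- Adds an automatic delay between calls to avoid overloading the API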
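
A rough sketch of the request and response handling described above. The endpoint and model name come from `config.py` in this commit; the prompt wording, the function names, and the assumption that the model answers with a 0-100 score (possibly preceded by a `<think>` block) are illustrative only and may differ from the real implementation.

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # AI_CONFIG['api_url']
MODEL = "qwen3:8b"                                  # AI_CONFIG['model']


def build_payload(name: str, introduction: str) -> dict:
    """Build a non-streaming /api/generate request body (prompt text is hypothetical)."""
    prompt = (
        f"Product: {name}\nIntroduction: {introduction}\n"
        "Rate the development difficulty from 0 to 100. Reply with the number only."
    )
    return {"model": MODEL, "prompt": prompt, "stream": False}


def call_ollama(payload: dict, timeout: int = 60) -> str:
    """POST the payload to Ollama and return the generated text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]


def parse_difficulty_score(text: str):
    """Extract the first integer in 0..100, ignoring any <think>...</think> block."""
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    match = re.search(r"\b(100|[1-9]?\d)\b", text)
    return int(match.group(1)) if match else None
```

Keeping the parsing separate from the HTTP call means the extraction logic can be tested without a running Ollama service.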

### System features
- Unified configuration management (config.py)
- Full logging (loguru)
- Progress bars (tqdm)
- Error handling and retry mechanism
- Modular design that is easy to extend
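
As an illustration of the retry mechanism mentioned above, here is a minimal sketch. The helper name `call_with_retry` is hypothetical; the defaults of 3 attempts and a 5-second delay mirror `retry_count` and `retry_delay` in `AI_CONFIG`, but how the real system applies them is not shown in this commit.

```python
import time


def call_with_retry(func, retry_count=3, retry_delay=5, sleep=time.sleep):
    """Call func() up to retry_count times, pausing retry_delay seconds between failures.

    The sleep parameter is injectable so the helper can be tested without waiting.
    """
    if retry_count < 1:
        raise ValueError("retry_count must be at least 1")
    last_error = None
    for attempt in range(1, retry_count + 1):
        try:
            return func()
        except Exception as error:  # broad on purpose: any failure triggers a retry
            last_error = error
            if attempt < retry_count:
                sleep(retry_delay)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

A caller could wrap a single API request, for example `call_with_retry(lambda: send_request(payload))`, where `send_request` stands for whatever function performs the call.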

## File structure

```
product/
├── integrated_product_system.py  # Main system file (core functionality)
├── run_system.py                 # Simplified command-line interface
├── config.py                     # Configuration file
├── README.md                     # Usage instructions
└── playwright-get-data.py        # playwright scraping module (dependency)
```

## Usage

### 1. Basic usage (full mode)
```bash
# Run the full workflow (scraping + analysis)
python run_system.py --mode full

# Or use the main system file
python integrated_product_system.py
```

### 2. Scraping-only mode
```bash
# Run scraping only
python run_system.py --mode scraping

# Limit the number of products to scrape
python run_system.py --mode scraping --limit 50

# Do not skip duplicate URLs
python run_system.py --mode scraping --no-skip-duplicates
```

### 3. Analysis-only mode
```bash
# Run AI analysis only
python run_system.py --mode analysis

# Limit the number of products to analyze
python run_system.py --mode analysis --max-products 100
```

### 4. Advanced options
```bash
# Specify the database paths
python run_system.py --tophub-db /path/to/tophub_data.db --product-db /path/to/products.db

# Specify the Chrome debugging port
python run_system.py --debug-port 9222

# Specify the log file and level
python run_system.py --log-file my_log.log --log-level DEBUG

# Scrape specific URLs
python run_system.py --mode scraping --urls https://www.producthunt.com/posts/example-product
```
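
For readers who want to see how the options above fit together, the following is one plausible `argparse` layout. It is a sketch only: the flag names are taken from this README, but the defaults (for example `full` as the default mode) and the `build_parser` name are assumptions, and `run_system.py` itself may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the flags listed in this README (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Product scraping and analysis")
    parser.add_argument("--mode", choices=["full", "scraping", "analysis"], default="full")
    parser.add_argument("--limit", type=int, default=0, help="0 means no limit")
    parser.add_argument("--no-skip-duplicates", dest="skip_duplicates",
                        action="store_false", help="re-scrape URLs already stored")
    parser.add_argument("--max-products", type=int, default=None)
    parser.add_argument("--tophub-db")
    parser.add_argument("--product-db")
    parser.add_argument("--debug-port", type=int, default=9222)
    parser.add_argument("--log-file", default="integrated_product_system.log")
    parser.add_argument("--log-level", default="INFO")
    parser.add_argument("--urls", nargs="+", help="scrape only these URLs")
    return parser
```

With this layout, `build_parser().parse_args(["--mode", "scraping", "--limit", "50"])` yields `mode="scraping"` and `limit=50`, while `skip_duplicates` stays `True` unless `--no-skip-duplicates` is passed.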

## Database structure

### products table (product information)
```sql
CREATE TABLE products (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT NOT NULL UNIQUE,
    name TEXT,
    introduction TEXT,
    user_count TEXT,
    maker_link TEXT,
    maker_statement TEXT,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL
);
```
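
Because `url` is declared `UNIQUE`, saving a product can be expressed as a single upsert. The sketch below assumes SQLite 3.24 or newer (required for `ON CONFLICT ... DO UPDATE`); the `save_product` helper and the ISO-8601 timestamp format are illustrative choices, not necessarily what the system does.

```python
import sqlite3
from datetime import datetime, timezone

UPSERT_SQL = """
INSERT INTO products
    (url, name, introduction, user_count, maker_link, maker_statement,
     created_at, updated_at)
VALUES
    (:url, :name, :introduction, :user_count, :maker_link, :maker_statement,
     :now, :now)
ON CONFLICT(url) DO UPDATE SET
    name = excluded.name,
    introduction = excluded.introduction,
    user_count = excluded.user_count,
    maker_link = excluded.maker_link,
    maker_statement = excluded.maker_statement,
    updated_at = excluded.updated_at
"""

FIELDS = ("url", "name", "introduction", "user_count", "maker_link", "maker_statement")


def save_product(conn: sqlite3.Connection, product: dict, now: str = None) -> None:
    """Insert a product, or update the existing row that has the same url."""
    params = {field: product.get(field) for field in FIELDS}
    params["now"] = now or datetime.now(timezone.utc).isoformat(timespec="seconds")
    with conn:  # commits on success, rolls back on error
        conn.execute(UPSERT_SQL, params)
```

On a repeated save, `created_at` keeps its original value and only `updated_at` moves forward.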

### product_analysis table (AI analysis results)
```sql
CREATE TABLE product_analysis (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    original_id INTEGER,
    original_name TEXT,
    product_name TEXT,
    product_intro TEXT,
    development_difficulty TEXT,
    difficulty_score INTEGER,
    ai_response TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (original_id) REFERENCES products (id)
);
```
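
Given the `original_id` foreign key, resuming an interrupted analysis run amounts to selecting products that have no row in `product_analysis` yet. A minimal sketch, assuming that is how pending work is detected (the function name is hypothetical):

```python
import sqlite3


def fetch_unanalyzed_products(conn: sqlite3.Connection, max_products=None) -> list:
    """Return (id, name, introduction) for products without an analysis row."""
    sql = """
        SELECT p.id, p.name, p.introduction
        FROM products AS p
        LEFT JOIN product_analysis AS a ON a.original_id = p.id
        WHERE a.id IS NULL
        ORDER BY p.id
    """
    params = ()
    if max_products is not None:  # None means "analyze everything"
        sql += " LIMIT ?"
        params = (max_products,)
    return conn.execute(sql, params).fetchall()
```

Deleting a row from `product_analysis` makes the matching product show up in this query again, which lines up with the re-analysis advice in the FAQ.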

## Configuration

Edit `config.py` to change the system configuration:

- **DATABASE_CONFIG**: database paths
- **CHROME_CONFIG**: Chrome browser settings
- **AI_CONFIG**: AI API settings (Ollama)
- **SCRAPING_CONFIG**: scraping settings
- **LOGGING_CONFIG**: logging settings
- **ANALYSIS_CONFIG**: analysis settings

## System requirements

- Python 3.7+
- Chrome browser (running, with the debugging port open)
- Ollama service (running, with the qwen3:8b model installed)
- SQLite database

## Dependencies

```bash
pip install loguru tqdm requests playwright
```

## Running the system

1. **Make sure Chrome is running with the debugging port open**
   ```bash
   # Windows
   chrome.exe --remote-debugging-port=9222

   # macOS
   /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
   ```

2. **Make sure the Ollama service is running**
   ```bash
   # Start the Ollama service
   ollama serve

   # Install the qwen3:8b model (if it is not installed yet)
   ollama pull qwen3:8b
   ```

3. **Make sure the tophub_data.db database exists**
   - The database must contain an articles table with a url column

4. **Run the system**
   ```bash
   python run_system.py
   ```

## FAQ

### Q: The system reports that the Chrome connection failed?
A: Make sure Chrome is running with the debugging port open (9222 by default).
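
One quick way to confirm this before starting a run is to test whether anything is listening on the debugging port. The helper below is a hypothetical sketch that uses only the standard library; it checks that the TCP port accepts connections, not that the listener is actually Chrome.

```python
import socket


def is_debug_port_open(host: str = "127.0.0.1", port: int = 9222,
                       timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

If this returns `False`, restart Chrome with `--remote-debugging-port=9222` as shown in the run steps.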
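
### Q: The AI analysis reports that the API call failed?
A: Make sure the Ollama service is running and the qwen3:8b model is installed.

### Q: How do I check scraping and analysis progress?
A: The system shows a progress bar automatically and also records detailed information in the log file.

### Q: How do I analyze only a specific number of products?
A: Use the `--max-products` option, for example: `python run_system.py --max-products 50`

### Q: How do I re-analyze products that have already been analyzed?
A: By default the system skips products that have already been analyzed. To re-analyze them, delete the corresponding records from the product_analysis table.

## Changelog

### v1.0.0 (current version)
- ✨ Merged the functionality of integrated_scraper.py and product_ai_analysis.py
- ✨ Added unified configuration management
- ✨ Provided a simplified command-line interface
- ✨ Improved error handling and logging
- ✨ Added support for multiple run modes

## Support

If you run into problems, check the log file for details, or verify that the system configuration is correct.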
BIN
product/__pycache__/config.cpython-313.pyc
Normal file
Binary file not shown.
BIN
product/__pycache__/integrated_product_system.cpython-313.pyc
Normal file
Binary file not shown.
@@ -1,312 +0,0 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Advanced ProductHunt scraper - handles the Cloudflare Turnstile challenge
"""

import asyncio
import sqlite3
from loguru import logger
import os
from urllib.parse import urlparse

class AdvancedProductHuntScraper:
    def __init__(self, db_path="test_product.db"):
        self.db_path = db_path
        self.init_database()

    def init_database(self):
        """Initialize the database"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()

        # Create the products table
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS products (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                name TEXT,
                url TEXT UNIQUE,
                introduction TEXT,
                user_count INTEGER,
                maker_link TEXT,
                maker_statement TEXT,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
            )
        """)

        conn.commit()
        conn.close()
        logger.info(f"Database initialized: {self.db_path}")

    def check_duplicate(self, url):
        """Check whether the URL already exists"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        cursor.execute("SELECT id FROM products WHERE url = ?", (url,))
        result = cursor.fetchone()
        conn.close()
        return result is not None

    def save_product_info(self, product_info):
        """Save product information to the database"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()

        # Check whether the record already exists
        cursor.execute("SELECT id FROM products WHERE url = ?", (product_info['url'],))
        existing = cursor.fetchone()

        if existing:
            # Update the existing record
            cursor.execute("""
                UPDATE products SET
                    name = ?, introduction = ?, user_count = ?,
                    maker_link = ?, maker_statement = ?, updated_at = CURRENT_TIMESTAMP
                WHERE url = ?
            """, (
                product_info['name'], product_info['introduction'],
                product_info['user_count'], product_info['maker_link'],
                product_info['maker_statement'], product_info['url']
            ))
            logger.info(f"Updated product info: {product_info['name']}")
        else:
            # Insert a new record
            cursor.execute("""
                INSERT INTO products (name, url, introduction, user_count, maker_link, maker_statement)
                VALUES (?, ?, ?, ?, ?, ?)
            """, (
                product_info['name'], product_info['url'], product_info['introduction'],
                product_info['user_count'], product_info['maker_link'], product_info['maker_statement']
            ))
            logger.info(f"Saved product info: {product_info['name']}")

        conn.commit()
        conn.close()

    async def scrape_with_stealth(self, url):
        """Scrape product information in stealth mode"""
        try:
            from playwright.async_api import async_playwright

            logger.info(f"Starting advanced scrape: {url}")

            # Create the Playwright instance
            playwright = await async_playwright().start()

            # Use a stealthier browser configuration
            browser = await playwright.chromium.launch(
                headless=False,  # headed mode so the run can be observed
                args=[
                    '--disable-blink-features=AutomationControlled',
                    '--disable-features=VizDisplayCompositor',
                    '--disable-background-timer-throttling',
                    '--disable-backgrounding-occluded-windows',
                    '--disable-renderer-backgrounding',
                    '--disable-web-security',
                    '--disable-features=TranslateUI',
                    '--disable-ipc-flooding-protection',
                    '--no-sandbox',
                    '--disable-setuid-sandbox'
                ]
            )

            # Create the context and page
            context = await browser.new_context(
                viewport={'width': 1920, 'height': 1080},
                user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
                extra_http_headers={
                    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
                    'Accept-Language': 'en-US,en;q=0.9',
                    'Accept-Encoding': 'gzip, deflate, br',
                    'DNT': '1',
                    'Connection': 'keep-alive',
                    'Upgrade-Insecure-Requests': '1',
                }
            )

            page = await context.new_page()

            # Hide automation fingerprints
            await page.add_init_script("""
                Object.defineProperty(navigator, 'webdriver', {
                    get: () => undefined,
                });
                Object.defineProperty(navigator, 'plugins', {
                    get: () => [1, 2, 3, 4, 5],
                });
                Object.defineProperty(navigator, 'languages', {
                    get: () => ['en-US', 'en'],
                });
            """)

            # Set the default timeout
            page.set_default_timeout(300000)  # 5 minutes

            # Navigate to the page
            await page.goto(url, wait_until="domcontentloaded")

            # Check the page state
            page_title = await page.title()
            logger.info(f"Page title: {page_title}")

            # Check whether this is a Cloudflare challenge page
            # ("请稍候" is the title of the Chinese-locale interstitial, "Please wait")
            if "请稍候" in page_title or "Checking" in page_title or "Verifying" in page_title:
                logger.info("Cloudflare challenge page detected, waiting for manual verification...")

                # Wait for the user to complete verification manually
                try:
                    # Wait for the page title to change
                    await page.wait_for_function(
                        """() => {
                            const title = document.title;
                            return !title.includes('请稍候') &&
                                   !title.includes('Checking') &&
                                   !title.includes('Verifying') &&
                                   title !== '请稍候…';
                        }""",
                        timeout=300000  # 5 minutes
                    )
                    logger.info("Cloudflare challenge completed")
                except Exception as e:
                    logger.warning(f"Timed out waiting for the Cloudflare challenge: {e}")

                    # On timeout, try reloading the page
                    await page.reload(wait_until="domcontentloaded")
                    logger.info("Page reloaded")

            # Wait for the page to load
            await page.wait_for_timeout(5000)

            # Get the current page URL
            current_url = page.url
            logger.info(f"Current page URL: {current_url}")

            # Check whether the page redirected elsewhere
            if current_url != url:
                logger.warning(f"Page redirected: {url} -> {current_url}")

            # Try to extract product information
            product_info = {'url': url}

            # Extract the product name
            name_selectors = [
                "h1",
                "[data-test='product-name']",
                ".product-name",
                "title"
            ]

            for selector in name_selectors:
                try:
                    element = await page.query_selector(selector)
                    if element:
                        name = await element.text_content()
                        if name and name.strip() and name.strip() != "www.producthunt.com":
                            product_info['name'] = name.strip()
                            logger.info(f"Extracted product name: {product_info['name']}")
                            break
                except Exception as e:
                    logger.debug(f"Selector {selector} failed: {e}")

            if 'name' not in product_info:
                # Fall back to deriving the product name from the URL
                parsed_url = urlparse(url)
                path_parts = parsed_url.path.split('/')
                if len(path_parts) >= 3 and path_parts[-2] == 'products':
                    product_info['name'] = path_parts[-1].replace('-', ' ').title()
                    logger.info(f"Product name derived from URL: {product_info['name']}")
                else:
                    product_info['name'] = "Unknown Product"
                    logger.warning("Could not extract the product name")

            # Other fields (simplified version: not extracted)
            product_info['introduction'] = None
            product_info['user_count'] = None
            product_info['maker_link'] = None
            product_info['maker_statement'] = None

            # Close the browser
            await browser.close()
            await playwright.stop()

            logger.success(f"Scrape finished: {product_info['name']}")
            return product_info

        except Exception as e:
            logger.error(f"Scrape failed: {e}")
            return {'url': url, 'name': 'Error', 'introduction': None, 'user_count': None, 'maker_link': None, 'maker_statement': None}

    async def run_test(self):
        """Run the test"""
        # Get ProductHunt links from tophub_data.db
        tophub_db_path = os.path.join(os.path.dirname(self.db_path), "..", "tophub_data.db")

        conn = sqlite3.connect(tophub_db_path)
        cursor = conn.cursor()

        # Query links that contain producthunt.com
        cursor.execute("""
            SELECT url FROM articles
            WHERE url LIKE '%producthunt.com%'
            LIMIT 3
        """)

        urls = [row[0] for row in cursor.fetchall()]
        conn.close()

        logger.info(f"Found {len(urls)} ProductHunt links")

        # Process each URL
        for url in urls:
            logger.info(f"Processing URL: {url}")

            # Duplicate check (skip logic commented out to force a re-scrape)
            # if self.check_duplicate(url):
            #     logger.info(f"Link already exists, skipping: {url}")
            #     continue

            # Scrape product information
            product_info = await self.scrape_with_stealth(url)

            # Save to the database
            self.save_product_info(product_info)

        # Summarize the results
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        cursor.execute("SELECT COUNT(*) FROM products")
        count = cursor.fetchone()[0]

        cursor.execute("SELECT name, url FROM products")
        products = cursor.fetchall()
        conn.close()

        logger.success("Test run finished")

        print("\n=== Test result summary ===")
        print(f"Products in the database: {count}")
        print("Scraped products:")
        for name, url in products:
            print(f" - {name}: {url}")

async def main():
    """Main entry point"""
    # Configure logging
    logger.remove()
    logger.add(
        "advanced_scraper.log",
        level="DEBUG",
        format="{time:YYYY-MM-DD HH:mm:ss} | {level:<8} | {name}:{function}:{line} - {message}",
        rotation="10 MB",
        retention="7 days"
    )

    # Create the scraper instance
    scraper = AdvancedProductHuntScraper()

    # Run the test
    await scraper.run_test()

if __name__ == "__main__":
    asyncio.run(main())
@@ -1,245 +0,0 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
ProductHunt API scraper - fetches product information through the API
"""

import asyncio
import sqlite3
import requests
from loguru import logger
import os
import json
from urllib.parse import urlparse

class ProductHuntAPIScraper:
    def __init__(self, db_path="test_product.db"):
        self.db_path = db_path
        self.init_database()

    def init_database(self):
        """Initialize the database"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()

        # Create the products table
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS products (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                name TEXT,
                url TEXT UNIQUE,
                introduction TEXT,
                user_count INTEGER,
                maker_link TEXT,
                maker_statement TEXT,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
            )
        """)

        conn.commit()
        conn.close()
        logger.info(f"Database initialized: {self.db_path}")

    def save_product_info(self, product_info):
        """Save product information to the database"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()

        # Check whether the record already exists
        cursor.execute("SELECT id FROM products WHERE url = ?", (product_info['url'],))
        existing = cursor.fetchone()

        if existing:
            # Update the existing record
            cursor.execute("""
                UPDATE products SET
                    name = ?, introduction = ?, user_count = ?,
                    maker_link = ?, maker_statement = ?, updated_at = CURRENT_TIMESTAMP
                WHERE url = ?
            """, (
                product_info['name'], product_info['introduction'],
                product_info['user_count'], product_info['maker_link'],
                product_info['maker_statement'], product_info['url']
            ))
            logger.info(f"Updated product info: {product_info['name']}")
        else:
            # Insert a new record
            cursor.execute("""
                INSERT INTO products (name, url, introduction, user_count, maker_link, maker_statement)
                VALUES (?, ?, ?, ?, ?, ?)
            """, (
                product_info['name'], product_info['url'], product_info['introduction'],
                product_info['user_count'], product_info['maker_link'], product_info['maker_statement']
            ))
            logger.info(f"Saved product info: {product_info['name']}")

        conn.commit()
        conn.close()

    def extract_product_name_from_url(self, url):
        """Derive the product name from the URL"""
        try:
            parsed_url = urlparse(url)
            path_parts = parsed_url.path.split('/')

            # Look for the "products" path segment
            for i, part in enumerate(path_parts):
                if part == 'products' and i + 1 < len(path_parts):
                    product_slug = path_parts[i + 1]
                    # Convert the slug into a readable name
                    name = product_slug.replace('-', ' ').title()
                    return name

            # If there is no "products" segment, use the last path segment
            if path_parts:
                last_part = path_parts[-1]
                if last_part:
                    name = last_part.replace('-', ' ').title()
                    return name

            return "Unknown Product"
        except Exception as e:
            logger.error(f"Failed to derive the product name from the URL: {e}")
            return "Unknown Product"

    def get_product_info_from_api(self, url):
        """Try to get product information through the API"""
        try:
            # Extract the product slug from the URL
            parsed_url = urlparse(url)
            path_parts = parsed_url.path.split('/')

            product_slug = None
            for i, part in enumerate(path_parts):
                if part == 'products' and i + 1 < len(path_parts):
                    product_slug = path_parts[i + 1]
                    break

            if not product_slug:
                logger.warning(f"Could not extract a product slug from the URL: {url}")
                return None

            # The ProductHunt GraphQL API requires an API key, so this
            # simplified version only fills in basic information.

            product_info = {
                'url': url,
                'name': self.extract_product_name_from_url(url),
                'introduction': f"Product from ProductHunt: {product_slug}",
                'user_count': None,  # requires API access
                'maker_link': None,  # requires API access
                'maker_statement': None  # requires API access
            }

            logger.info(f"Got product info through the API: {product_info['name']}")
            return product_info

        except Exception as e:
            logger.error(f"Failed to get product info through the API: {e}")
            return None

    def get_product_info_fallback(self, url):
        """Fallback: derive basic information from the URL"""
        try:
            product_name = self.extract_product_name_from_url(url)

            product_info = {
                'url': url,
                'name': product_name,
                'introduction': f"Product from ProductHunt: {product_name}",
                'user_count': None,
                'maker_link': None,
                'maker_statement': None
            }

            logger.info(f"Got product info with the fallback method: {product_info['name']}")
            return product_info

        except Exception as e:
            logger.error(f"Fallback method failed to get product info: {e}")
            return None

    def run_test(self):
        """Run the test"""
        # Get ProductHunt links from tophub_data.db
        tophub_db_path = os.path.join(os.path.dirname(self.db_path), "..", "tophub_data.db")

        conn = sqlite3.connect(tophub_db_path)
        cursor = conn.cursor()

        # Query links that contain producthunt.com
        cursor.execute("""
            SELECT url FROM articles
            WHERE url LIKE '%producthunt.com%'
            LIMIT 3
        """)

        urls = [row[0] for row in cursor.fetchall()]
        conn.close()

        logger.info(f"Found {len(urls)} ProductHunt links")

        # Process each URL
        for url in urls:
            logger.info(f"Processing URL: {url}")

            # Try to get product information through the API
            product_info = self.get_product_info_from_api(url)

            # If the API fails, use the fallback method
            if not product_info:
                product_info = self.get_product_info_fallback(url)

            # If both methods fail, create a basic product record
            if not product_info:
                product_info = {
                    'url': url,
                    'name': 'Unknown Product',
                    'introduction': 'Unable to fetch product information',
                    'user_count': None,
                    'maker_link': None,
                    'maker_statement': None
                }

            # Save to the database
            self.save_product_info(product_info)

        # Summarize the results
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        cursor.execute("SELECT COUNT(*) FROM products")
        count = cursor.fetchone()[0]

        cursor.execute("SELECT name, url FROM products")
        products = cursor.fetchall()
        conn.close()

        logger.success("Test run finished")

        print("\n=== Test result summary ===")
        print(f"Products in the database: {count}")
        print("Scraped products:")
        for name, url in products:
            print(f" - {name}: {url}")

def main():
    """Main entry point"""
    # Configure logging
    logger.remove()
    logger.add(
        "api_scraper.log",
        level="DEBUG",
        format="{time:YYYY-MM-DD HH:mm:ss} | {level:<8} | {name}:{function}:{line} - {message}",
        rotation="10 MB",
        retention="7 days"
    )

    # Create the scraper instance
    scraper = ProductHuntAPIScraper()

    # Run the test
    scraper.run_test()

if __name__ == "__main__":
    main()
52
product/config.py
Normal file
@@ -0,0 +1,52 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Configuration file for the full-featured product system
"""

import os

# Database configuration
DATABASE_CONFIG = {
    'tophub_db_path': os.path.join(os.path.dirname(os.path.dirname(__file__)), "tophub_data.db"),
    'product_db_path': os.path.join(os.path.dirname(__file__), "products.db"),
}

# Chrome debugging configuration
CHROME_CONFIG = {
    'debug_port': 9222,
    'headless': False,
    'timeout': 30,
}

# AI analysis configuration
AI_CONFIG = {
    'api_url': "http://localhost:11434/api/generate",
    'model': "qwen3:8b",
    'timeout': 60,
    'retry_count': 3,
    'retry_delay': 5,
}

# Scraping configuration
SCRAPING_CONFIG = {
    'default_limit': 0,  # 0 means no limit
    'skip_duplicates': True,
    'batch_size': 10,
    'delay_between_requests': 2,
}

# Logging configuration
LOGGING_CONFIG = {
    'log_file': "integrated_product_system.log",
    'log_level': "INFO",
    'log_rotation': "10 MB",
    'log_format': "<green>{time:YYYY-MM-DD HH:mm:ss}</green> | <level>{level: <8}</level> | <cyan>{name}</cyan>:<cyan>{function}</cyan>:<cyan>{line}</cyan> - <level>{message}</level>",
}

# Analysis configuration
ANALYSIS_CONFIG = {
    'max_products': None,  # None means analyze all products
    'batch_size': 1,  # number of products per analysis call
    'delay_between_analyses': 2,  # delay between analyses (seconds)
}
@@ -1,818 +0,0 @@
2025-11-27 22:15:02.065 | INFO | __main__:__init__:38 - Initializing the product difficulty scorer, database: products.db
2025-11-27 22:15:02.066 | INFO | __main__:score_products:190 - Starting product difficulty scoring
2025-11-27 22:15:02.066 | SUCCESS | __main__:connect_to_database:44 - Connected to database: products.db
2025-11-27 22:15:02.071 | SUCCESS | __main__:add_difficulty_score_column:62 - Added the difficulty_score column
2025-11-27 22:15:02.074 | INFO | __main__:get_unscored_products:93 - Found 251 unscored products
2025-11-27 22:15:02.074 | INFO | __main__:score_products:207 - Preparing to score 251 products
2025-11-27 22:15:02.074 | INFO | __main__:score_products:212 -
Scoring progress: 1/251 - product ID: 1
2025-11-27 22:15:02.075 | INFO | __main__:call_ollama_for_scoring:139 - Calling the Ollama API for difficulty scoring
2025-11-27 22:15:22.897 | SUCCESS | __main__:call_ollama_for_scoring:157 - Score received: 90
2025-11-27 22:15:22.900 | SUCCESS | __main__:update_difficulty_score:182 - Updated the difficulty score of product ID 1 to: 90
2025-11-27 22:15:22.900 | SUCCESS | __main__:score_products:221 - Scoring complete: 90 points
2025-11-27 22:15:22.900 | INFO | __main__:score_products:226 - Waiting 2 seconds before continuing...
2025-11-27 22:15:24.901 | INFO | __main__:score_products:212 -
Scoring progress: 2/251 - product ID: 2
2025-11-27 22:15:24.901 | INFO | __main__:call_ollama_for_scoring:139 - Calling the Ollama API for difficulty scoring
2025-11-27 22:15:42.061 | SUCCESS | __main__:call_ollama_for_scoring:157 - Score received: 85
2025-11-27 22:15:42.066 | SUCCESS | __main__:update_difficulty_score:182 - Updated the difficulty score of product ID 2 to: 85
2025-11-27 22:15:42.066 | SUCCESS | __main__:score_products:221 - Scoring complete: 85 points
2025-11-27 22:15:42.066 | INFO | __main__:score_products:226 - Waiting 2 seconds before continuing...
2025-11-27 22:15:44.067 | INFO | __main__:score_products:212 -
Scoring progress: 3/251 - product ID: 3
2025-11-27 22:15:44.068 | INFO | __main__:call_ollama_for_scoring:139 - Calling the Ollama API for difficulty scoring
2025-11-27 22:15:59.877 | SUCCESS | __main__:call_ollama_for_scoring:157 - Score received: 75
2025-11-27 22:15:59.882 | SUCCESS | __main__:update_difficulty_score:182 - Updated the difficulty score of product ID 3 to: 75
2025-11-27 22:15:59.882 | SUCCESS | __main__:score_products:221 - Scoring complete: 75 points
2025-11-27 22:15:59.882 | INFO | __main__:score_products:226 - Waiting 2 seconds before continuing...
2025-11-27 22:16:01.883 | INFO | __main__:score_products:212 -
Scoring progress: 4/251 - product ID: 4
2025-11-27 22:16:01.884 | INFO | __main__:call_ollama_for_scoring:139 - Calling the Ollama API for difficulty scoring
2025-11-27 22:16:12.907 | SUCCESS | __main__:call_ollama_for_scoring:157 - Score received: 95
2025-11-27 22:16:12.912 | SUCCESS | __main__:update_difficulty_score:182 - Updated the difficulty score of product ID 4 to: 95
2025-11-27 22:16:12.912 | SUCCESS | __main__:score_products:221 - Scoring complete: 95 points
2025-11-27 22:16:12.912 | INFO | __main__:score_products:226 - Waiting 2 seconds before continuing...
2025-11-27 22:16:14.913 | INFO | __main__:score_products:212 -
Scoring progress: 5/251 - product ID: 5
2025-11-27 22:16:14.914 | INFO | __main__:call_ollama_for_scoring:139 - Calling the Ollama API for difficulty scoring
2025-11-27 22:16:30.206 | SUCCESS | __main__:call_ollama_for_scoring:157 - Score received: 75
2025-11-27 22:16:30.211 | SUCCESS | __main__:update_difficulty_score:182 - Updated the difficulty score of product ID 5 to: 75
2025-11-27 22:16:30.211 | SUCCESS | __main__:score_products:221 - Scoring complete: 75 points
2025-11-27 22:16:30.211 | INFO | __main__:score_products:226 - Waiting 2 seconds before continuing...
2025-11-27 22:16:32.212 | INFO | __main__:score_products:212 -
Scoring progress: 6/251 - product ID: 6
2025-11-27 22:16:32.213 | INFO | __main__:call_ollama_for_scoring:139 - Calling the Ollama API for difficulty scoring
2025-11-27 22:16:37.802 | SUCCESS | __main__:call_ollama_for_scoring:157 - Score received: 60
2025-11-27 22:16:37.806 | SUCCESS | __main__:update_difficulty_score:182 - Updated the difficulty score of product ID 6 to: 60
2025-11-27 22:16:37.806 | SUCCESS | __main__:score_products:221 - Scoring complete: 60 points
2025-11-27 22:16:37.806 | INFO | __main__:score_products:226 - Waiting 2 seconds before continuing...
2025-11-27 22:16:39.807 | INFO | __main__:score_products:212 -
Scoring progress: 7/251 - product ID: 7
2025-11-27 22:16:39.807 | INFO | __main__:call_ollama_for_scoring:139 - Calling the Ollama API for difficulty scoring
2025-11-27 22:16:52.409 | SUCCESS | __main__:call_ollama_for_scoring:157 - Score received: 85
2025-11-27 22:16:52.414 | SUCCESS | __main__:update_difficulty_score:182 - Updated the difficulty score of product ID 7 to: 85
2025-11-27 22:16:52.414 | SUCCESS | __main__:score_products:221 - Scoring complete: 85 points
2025-11-27 22:16:52.414 | INFO | __main__:score_products:226 - Waiting 2 seconds before continuing...
2025-11-27 22:16:54.414 | INFO | __main__:score_products:212 -
Scoring progress: 8/251 - product ID: 8
2025-11-27 22:16:54.416 | INFO | __main__:call_ollama_for_scoring:139 - Calling the Ollama API for difficulty scoring
2025-11-27 22:17:04.041 | SUCCESS | __main__:call_ollama_for_scoring:157 - Score received: 95
2025-11-27 22:17:04.045 | SUCCESS | __main__:update_difficulty_score:182 - Updated the difficulty score of product ID 8 to: 95
2025-11-27 22:17:04.045 | SUCCESS | __main__:score_products:221 - Scoring complete: 95 points
2025-11-27 22:17:04.045 | INFO | __main__:score_products:226 - Waiting 2 seconds before continuing...
2025-11-27 22:17:06.045 | INFO | __main__:score_products:212 -
Scoring progress: 9/251 - product ID: 9
2025-11-27 22:17:06.046 | INFO | __main__:call_ollama_for_scoring:139 - Calling the Ollama API for difficulty scoring
2025-11-27 22:17:24.896 | SUCCESS | __main__:call_ollama_for_scoring:157 - Score received: 60
2025-11-27 22:17:24.900 | SUCCESS | __main__:update_difficulty_score:182 - Updated the difficulty score of product ID 9 to: 60
2025-11-27 22:17:24.900 | SUCCESS | __main__:score_products:221 - Scoring complete: 60 points
2025-11-27 22:17:24.900 | INFO | __main__:score_products:226 - Waiting 2 seconds before continuing...
2025-11-27 22:17:26.901 | INFO | __main__:score_products:212 -
Scoring progress: 10/251 - product ID: 10
2025-11-27 22:17:26.901 | INFO | __main__:call_ollama_for_scoring:139 - Calling the Ollama API for difficulty scoring
2025-11-27 22:17:42.131 | SUCCESS | __main__:call_ollama_for_scoring:157 - Score received: 85
2025-11-27 22:17:42.135 | SUCCESS | __main__:update_difficulty_score:182 - Updated the difficulty score of product ID 10 to: 85
2025-11-27 22:17:42.135 | SUCCESS | __main__:score_products:221 - Scoring complete: 85 points
2025-11-27 22:17:42.136 | INFO | __main__:score_products:226 - Waiting 2 seconds before continuing...
2025-11-27 22:17:44.136 | INFO | __main__:score_products:212 -
Scoring progress: 11/251 - product ID: 11
2025-11-27 22:17:44.137 | INFO | __main__:call_ollama_for_scoring:139 - Calling the Ollama API for difficulty scoring
2025-11-27 22:17:58.158 | SUCCESS | __main__:call_ollama_for_scoring:157 - Score received: 95
2025-11-27 22:17:58.162 | SUCCESS | __main__:update_difficulty_score:182 - Updated the difficulty score of product ID 11 to: 95
2025-11-27 22:17:58.162 | SUCCESS | __main__:score_products:221 - Scoring complete: 95 points
2025-11-27 22:17:58.162 | INFO | __main__:score_products:226 - Waiting 2 seconds before continuing...
2025-11-27 22:18:00.163 | INFO | __main__:score_products:212 -
Scoring progress: 12/251 - product ID: 12
2025-11-27 22:18:00.164 | INFO | __main__:call_ollama_for_scoring:139 - Calling the Ollama API for difficulty scoring
2025-11-27 22:18:08.974 | SUCCESS | __main__:call_ollama_for_scoring:157 - Score received: 100
2025-11-27 22:18:08.977 | SUCCESS | __main__:update_difficulty_score:182 - Updated the difficulty score of product ID 12 to: 100
2025-11-27 22:18:08.977 | SUCCESS | __main__:score_products:221 - Scoring complete: 100 points
2025-11-27 22:18:08.977 | INFO | __main__:score_products:226 - Waiting 2 seconds before continuing...
2025-11-27 22:18:10.978 | INFO | __main__:score_products:212 -
Scoring progress: 13/251 - product ID: 13
2025-11-27 22:18:10.979 | INFO | __main__:call_ollama_for_scoring:139 - Calling the Ollama API for difficulty scoring
2025-11-27 22:18:21.194 | SUCCESS | __main__:call_ollama_for_scoring:157 - Score received: 90
2025-11-27 22:18:21.198 | SUCCESS | __main__:update_difficulty_score:182 - Updated the difficulty score of product ID 13 to: 90
2025-11-27 22:18:21.198 | SUCCESS | __main__:score_products:221 - Scoring complete: 90 points
2025-11-27 22:18:21.198 | INFO | __main__:score_products:226 - Waiting 2 seconds before continuing...
2025-11-27 22:18:23.200 | INFO | __main__:score_products:212 -
Scoring progress: 14/251 - product ID: 14
2025-11-27 22:18:23.201 | INFO | __main__:call_ollama_for_scoring:139 - Calling the Ollama API for difficulty scoring
2025-11-27 22:18:29.891 | SUCCESS | __main__:call_ollama_for_scoring:157 - Score received: 95
2025-11-27 22:18:29.895 | SUCCESS | __main__:update_difficulty_score:182 - Updated the difficulty score of product ID 14 to: 95
2025-11-27 22:18:29.895 | SUCCESS | __main__:score_products:221 - Scoring complete: 95 points
2025-11-27 22:18:29.895 | INFO | __main__:score_products:226 - Waiting 2 seconds before continuing...
2025-11-27 22:18:31.895 | INFO | __main__:score_products:212 -
Scoring progress: 15/251 - product ID: 15
2025-11-27 22:18:31.896 | INFO | __main__:call_ollama_for_scoring:139 - Calling the Ollama API for difficulty scoring
2025-11-27 22:18:45.906 | SUCCESS | __main__:call_ollama_for_scoring:157 - Score received: 75
2025-11-27 22:18:45.910 | SUCCESS | __main__:update_difficulty_score:182 - Updated the difficulty score of product ID 15 to: 75
2025-11-27 22:18:45.910 | SUCCESS | __main__:score_products:221 - Scoring complete: 75 points
2025-11-27 22:18:45.910 | INFO | __main__:score_products:226 - Waiting 2 seconds before continuing...
2025-11-27 22:18:47.911 | INFO | __main__:score_products:212 -
Scoring progress: 16/251 - product ID: 16
2025-11-27 22:18:47.912 | INFO | __main__:call_ollama_for_scoring:139 - Calling the Ollama API for difficulty scoring
2025-11-27 22:18:59.078 | SUCCESS | __main__:call_ollama_for_scoring:157 - Score received: 75
2025-11-27 22:18:59.082 | SUCCESS | __main__:update_difficulty_score:182 - Updated the difficulty score of product ID 16 to: 75
2025-11-27 22:18:59.082 | SUCCESS | __main__:score_products:221 - Scoring complete: 75 points
2025-11-27 22:18:59.082 | INFO | __main__:score_products:226 - Waiting 2 seconds before continuing...
2025-11-27 22:19:01.083 | INFO | __main__:score_products:212 -
Scoring progress: 17/251 - product ID: 17
2025-11-27 22:19:01.083 | INFO | __main__:call_ollama_for_scoring:139 - Calling the Ollama API for difficulty scoring
2025-11-27 22:19:11.227 | SUCCESS | __main__:call_ollama_for_scoring:157 - Score received: 60
2025-11-27 22:19:11.231 | SUCCESS | __main__:update_difficulty_score:182 - Updated the difficulty score of product ID 17 to: 60
2025-11-27 22:19:11.231 | SUCCESS | __main__:score_products:221 - 评分完成: 60分
|
||||
2025-11-27 22:19:11.231 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:19:13.232 | INFO | __main__:score_products:212 -
|
||||
评分进度: 18/251 - 产品ID: 18
|
||||
2025-11-27 22:19:13.232 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:19:27.810 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
|
||||
2025-11-27 22:19:27.813 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 18 的难度评分为: 75
|
||||
2025-11-27 22:19:27.813 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
|
||||
2025-11-27 22:19:27.813 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:19:29.814 | INFO | __main__:score_products:212 -
|
||||
评分进度: 19/251 - 产品ID: 19
|
||||
2025-11-27 22:19:29.814 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:19:38.474 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
|
||||
2025-11-27 22:19:38.478 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 19 的难度评分为: 85
|
||||
2025-11-27 22:19:38.478 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
|
||||
2025-11-27 22:19:38.478 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:19:40.478 | INFO | __main__:score_products:212 -
|
||||
评分进度: 20/251 - 产品ID: 20
|
||||
2025-11-27 22:19:40.479 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:19:56.459 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
|
||||
2025-11-27 22:19:56.463 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 20 的难度评分为: 75
|
||||
2025-11-27 22:19:56.463 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
|
||||
2025-11-27 22:19:56.463 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:19:58.464 | INFO | __main__:score_products:212 -
|
||||
评分进度: 21/251 - 产品ID: 21
|
||||
2025-11-27 22:19:58.464 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:20:08.851 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
|
||||
2025-11-27 22:20:08.855 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 21 的难度评分为: 85
|
||||
2025-11-27 22:20:08.855 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
|
||||
2025-11-27 22:20:08.856 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:20:10.857 | INFO | __main__:score_products:212 -
|
||||
评分进度: 22/251 - 产品ID: 22
|
||||
2025-11-27 22:20:10.858 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:20:28.350 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 95
|
||||
2025-11-27 22:20:28.355 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 22 的难度评分为: 95
|
||||
2025-11-27 22:20:28.355 | SUCCESS | __main__:score_products:221 - 评分完成: 95分
|
||||
2025-11-27 22:20:28.355 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:20:30.356 | INFO | __main__:score_products:212 -
|
||||
评分进度: 23/251 - 产品ID: 23
|
||||
2025-11-27 22:20:30.356 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:20:46.974 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 95
|
||||
2025-11-27 22:20:46.979 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 23 的难度评分为: 95
|
||||
2025-11-27 22:20:46.979 | SUCCESS | __main__:score_products:221 - 评分完成: 95分
|
||||
2025-11-27 22:20:46.979 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:20:48.979 | INFO | __main__:score_products:212 -
|
||||
评分进度: 24/251 - 产品ID: 24
|
||||
2025-11-27 22:20:48.979 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:21:02.432 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 65
|
||||
2025-11-27 22:21:02.437 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 24 的难度评分为: 65
|
||||
2025-11-27 22:21:02.437 | SUCCESS | __main__:score_products:221 - 评分完成: 65分
|
||||
2025-11-27 22:21:02.437 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:21:04.438 | INFO | __main__:score_products:212 -
|
||||
评分进度: 25/251 - 产品ID: 25
|
||||
2025-11-27 22:21:04.438 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:21:10.182 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
|
||||
2025-11-27 22:21:10.187 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 25 的难度评分为: 85
|
||||
2025-11-27 22:21:10.187 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
|
||||
2025-11-27 22:21:10.187 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:21:12.188 | INFO | __main__:score_products:212 -
|
||||
评分进度: 26/251 - 产品ID: 26
|
||||
2025-11-27 22:21:12.189 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:21:25.692 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
|
||||
2025-11-27 22:21:25.696 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 26 的难度评分为: 85
|
||||
2025-11-27 22:21:25.696 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
|
||||
2025-11-27 22:21:25.697 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:21:27.697 | INFO | __main__:score_products:212 -
|
||||
评分进度: 27/251 - 产品ID: 27
|
||||
2025-11-27 22:21:27.698 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:21:42.789 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 95
|
||||
2025-11-27 22:21:42.793 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 27 的难度评分为: 95
|
||||
2025-11-27 22:21:42.793 | SUCCESS | __main__:score_products:221 - 评分完成: 95分
|
||||
2025-11-27 22:21:42.794 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:21:44.794 | INFO | __main__:score_products:212 -
|
||||
评分进度: 28/251 - 产品ID: 28
|
||||
2025-11-27 22:21:44.795 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:21:58.897 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 95
|
||||
2025-11-27 22:21:58.902 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 28 的难度评分为: 95
|
||||
2025-11-27 22:21:58.902 | SUCCESS | __main__:score_products:221 - 评分完成: 95分
|
||||
2025-11-27 22:21:58.902 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:22:00.903 | INFO | __main__:score_products:212 -
|
||||
评分进度: 29/251 - 产品ID: 29
|
||||
2025-11-27 22:22:00.903 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:22:10.583 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
|
||||
2025-11-27 22:22:10.587 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 29 的难度评分为: 85
|
||||
2025-11-27 22:22:10.587 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
|
||||
2025-11-27 22:22:10.587 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:22:12.588 | INFO | __main__:score_products:212 -
|
||||
评分进度: 30/251 - 产品ID: 30
|
||||
2025-11-27 22:22:12.589 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:22:30.462 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
|
||||
2025-11-27 22:22:30.467 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 30 的难度评分为: 75
|
||||
2025-11-27 22:22:30.467 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
|
||||
2025-11-27 22:22:30.467 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:22:32.467 | INFO | __main__:score_products:212 -
|
||||
评分进度: 31/251 - 产品ID: 31
|
||||
2025-11-27 22:22:32.468 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:22:41.026 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
|
||||
2025-11-27 22:22:41.032 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 31 的难度评分为: 75
|
||||
2025-11-27 22:22:41.032 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
|
||||
2025-11-27 22:22:41.032 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:22:43.033 | INFO | __main__:score_products:212 -
|
||||
评分进度: 32/251 - 产品ID: 32
|
||||
2025-11-27 22:22:43.034 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:22:51.204 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
|
||||
2025-11-27 22:22:51.208 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 32 的难度评分为: 85
|
||||
2025-11-27 22:22:51.208 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
|
||||
2025-11-27 22:22:51.208 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:22:53.209 | INFO | __main__:score_products:212 -
|
||||
评分进度: 33/251 - 产品ID: 33
|
||||
2025-11-27 22:22:53.209 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:23:07.564 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 90
|
||||
2025-11-27 22:23:07.568 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 33 的难度评分为: 90
|
||||
2025-11-27 22:23:07.568 | SUCCESS | __main__:score_products:221 - 评分完成: 90分
|
||||
2025-11-27 22:23:07.568 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:23:09.569 | INFO | __main__:score_products:212 -
|
||||
评分进度: 34/251 - 产品ID: 34
|
||||
2025-11-27 22:23:09.570 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:23:21.371 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
|
||||
2025-11-27 22:23:21.375 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 34 的难度评分为: 75
|
||||
2025-11-27 22:23:21.375 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
|
||||
2025-11-27 22:23:21.375 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:23:23.375 | INFO | __main__:score_products:212 -
|
||||
评分进度: 35/251 - 产品ID: 35
|
||||
2025-11-27 22:23:23.376 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:23:38.365 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
|
||||
2025-11-27 22:23:38.368 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 35 的难度评分为: 75
|
||||
2025-11-27 22:23:38.369 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
|
||||
2025-11-27 22:23:38.369 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:23:40.369 | INFO | __main__:score_products:212 -
|
||||
评分进度: 36/251 - 产品ID: 36
|
||||
2025-11-27 22:23:40.369 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:23:50.821 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
|
||||
2025-11-27 22:23:50.826 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 36 的难度评分为: 85
|
||||
2025-11-27 22:23:50.826 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
|
||||
2025-11-27 22:23:50.826 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:23:52.827 | INFO | __main__:score_products:212 -
|
||||
评分进度: 37/251 - 产品ID: 37
|
||||
2025-11-27 22:23:52.827 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:24:07.978 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 95
|
||||
2025-11-27 22:24:07.983 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 37 的难度评分为: 95
|
||||
2025-11-27 22:24:07.983 | SUCCESS | __main__:score_products:221 - 评分完成: 95分
|
||||
2025-11-27 22:24:07.983 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:24:09.983 | INFO | __main__:score_products:212 -
|
||||
评分进度: 38/251 - 产品ID: 38
|
||||
2025-11-27 22:24:09.984 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:24:31.439 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
|
||||
2025-11-27 22:24:31.443 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 38 的难度评分为: 85
|
||||
2025-11-27 22:24:31.443 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
|
||||
2025-11-27 22:24:31.443 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:24:33.444 | INFO | __main__:score_products:212 -
|
||||
评分进度: 39/251 - 产品ID: 39
|
||||
2025-11-27 22:24:33.445 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:25:04.537 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
|
||||
2025-11-27 22:25:04.541 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 39 的难度评分为: 85
|
||||
2025-11-27 22:25:04.541 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
|
||||
2025-11-27 22:25:04.541 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:25:06.541 | INFO | __main__:score_products:212 -
|
||||
评分进度: 40/251 - 产品ID: 40
|
||||
2025-11-27 22:25:06.542 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:25:18.764 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
|
||||
2025-11-27 22:25:18.767 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 40 的难度评分为: 85
|
||||
2025-11-27 22:25:18.767 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
|
||||
2025-11-27 22:25:18.767 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:25:20.768 | INFO | __main__:score_products:212 -
|
||||
评分进度: 41/251 - 产品ID: 41
|
||||
2025-11-27 22:25:20.769 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:25:36.627 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
|
||||
2025-11-27 22:25:36.632 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 41 的难度评分为: 75
|
||||
2025-11-27 22:25:36.632 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
|
||||
2025-11-27 22:25:36.632 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:25:38.632 | INFO | __main__:score_products:212 -
|
||||
评分进度: 42/251 - 产品ID: 42
|
||||
2025-11-27 22:25:38.633 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:26:02.058 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
|
||||
2025-11-27 22:26:02.063 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 42 的难度评分为: 85
|
||||
2025-11-27 22:26:02.063 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
|
||||
2025-11-27 22:26:02.063 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:26:04.064 | INFO | __main__:score_products:212 -
|
||||
评分进度: 43/251 - 产品ID: 43
|
||||
2025-11-27 22:26:04.064 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:26:15.507 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 95
|
||||
2025-11-27 22:26:15.511 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 43 的难度评分为: 95
|
||||
2025-11-27 22:26:15.511 | SUCCESS | __main__:score_products:221 - 评分完成: 95分
|
||||
2025-11-27 22:26:15.511 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:26:17.512 | INFO | __main__:score_products:212 -
|
||||
评分进度: 44/251 - 产品ID: 44
|
||||
2025-11-27 22:26:17.512 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:26:31.613 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
|
||||
2025-11-27 22:26:31.617 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 44 的难度评分为: 85
|
||||
2025-11-27 22:26:31.617 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
|
||||
2025-11-27 22:26:31.617 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:26:33.618 | INFO | __main__:score_products:212 -
|
||||
评分进度: 45/251 - 产品ID: 45
|
||||
2025-11-27 22:26:33.619 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:26:54.906 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
|
||||
2025-11-27 22:26:54.910 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 45 的难度评分为: 85
|
||||
2025-11-27 22:26:54.910 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
|
||||
2025-11-27 22:26:54.910 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:26:56.911 | INFO | __main__:score_products:212 -
|
||||
评分进度: 46/251 - 产品ID: 46
|
||||
2025-11-27 22:26:56.911 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:27:09.484 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
|
||||
2025-11-27 22:27:09.489 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 46 的难度评分为: 85
|
||||
2025-11-27 22:27:09.489 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
|
||||
2025-11-27 22:27:09.489 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:27:11.490 | INFO | __main__:score_products:212 -
|
||||
评分进度: 47/251 - 产品ID: 47
|
||||
2025-11-27 22:27:11.491 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:27:25.136 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 90
|
||||
2025-11-27 22:27:25.140 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 47 的难度评分为: 90
|
||||
2025-11-27 22:27:25.141 | SUCCESS | __main__:score_products:221 - 评分完成: 90分
|
||||
2025-11-27 22:27:25.141 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:27:27.141 | INFO | __main__:score_products:212 -
|
||||
评分进度: 48/251 - 产品ID: 48
|
||||
2025-11-27 22:27:27.142 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:27:52.128 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 90
|
||||
2025-11-27 22:27:52.131 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 48 的难度评分为: 90
|
||||
2025-11-27 22:27:52.131 | SUCCESS | __main__:score_products:221 - 评分完成: 90分
|
||||
2025-11-27 22:27:52.131 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:27:54.132 | INFO | __main__:score_products:212 -
|
||||
评分进度: 49/251 - 产品ID: 49
|
||||
2025-11-27 22:27:54.133 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:28:10.443 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 95
|
||||
2025-11-27 22:28:10.447 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 49 的难度评分为: 95
|
||||
2025-11-27 22:28:10.447 | SUCCESS | __main__:score_products:221 - 评分完成: 95分
|
||||
2025-11-27 22:28:10.448 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:28:12.448 | INFO | __main__:score_products:212 -
|
||||
评分进度: 50/251 - 产品ID: 50
|
||||
2025-11-27 22:28:12.448 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:28:24.343 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 95
|
||||
2025-11-27 22:28:24.348 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 50 的难度评分为: 95
|
||||
2025-11-27 22:28:24.348 | SUCCESS | __main__:score_products:221 - 评分完成: 95分
|
||||
2025-11-27 22:28:24.348 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:28:26.349 | INFO | __main__:score_products:212 -
|
||||
评分进度: 51/251 - 产品ID: 51
|
||||
2025-11-27 22:28:26.350 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:28:41.099 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
|
||||
2025-11-27 22:28:41.104 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 51 的难度评分为: 85
|
||||
2025-11-27 22:28:41.104 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
|
||||
2025-11-27 22:28:41.104 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:28:43.105 | INFO | __main__:score_products:212 -
|
||||
评分进度: 52/251 - 产品ID: 52
|
||||
2025-11-27 22:28:43.106 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:28:55.393 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
|
||||
2025-11-27 22:28:55.397 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 52 的难度评分为: 75
|
||||
2025-11-27 22:28:55.397 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
|
||||
2025-11-27 22:28:55.397 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:28:57.398 | INFO | __main__:score_products:212 -
|
||||
评分进度: 53/251 - 产品ID: 53
|
||||
2025-11-27 22:28:57.398 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:29:10.087 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
|
||||
2025-11-27 22:29:10.091 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 53 的难度评分为: 75
|
||||
2025-11-27 22:29:10.091 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
|
||||
2025-11-27 22:29:10.091 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:29:12.092 | INFO | __main__:score_products:212 -
|
||||
评分进度: 54/251 - 产品ID: 54
|
||||
2025-11-27 22:29:12.092 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:29:23.753 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
|
||||
2025-11-27 22:29:23.755 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 54 的难度评分为: 85
|
||||
2025-11-27 22:29:23.756 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
|
||||
2025-11-27 22:29:23.756 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:29:25.756 | INFO | __main__:score_products:212 -
|
||||
评分进度: 55/251 - 产品ID: 55
|
||||
2025-11-27 22:29:25.756 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:29:37.465 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
|
||||
2025-11-27 22:29:37.469 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 55 的难度评分为: 75
|
||||
2025-11-27 22:29:37.469 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
|
||||
2025-11-27 22:29:37.469 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:29:39.469 | INFO | __main__:score_products:212 -
|
||||
评分进度: 56/251 - 产品ID: 56
|
||||
2025-11-27 22:29:39.470 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:29:53.805 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 70
|
||||
2025-11-27 22:29:53.810 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 56 的难度评分为: 70
|
||||
2025-11-27 22:29:53.810 | SUCCESS | __main__:score_products:221 - 评分完成: 70分
|
||||
2025-11-27 22:29:53.811 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:29:55.812 | INFO | __main__:score_products:212 -
|
||||
评分进度: 57/251 - 产品ID: 57
|
||||
2025-11-27 22:29:55.812 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:30:11.152 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
|
||||
2025-11-27 22:30:11.156 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 57 的难度评分为: 85
|
||||
2025-11-27 22:30:11.156 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
|
||||
2025-11-27 22:30:11.156 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:30:13.157 | INFO | __main__:score_products:212 -
|
||||
评分进度: 58/251 - 产品ID: 58
|
||||
2025-11-27 22:30:13.157 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:30:21.557 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 95
|
||||
2025-11-27 22:30:21.561 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 58 的难度评分为: 95
|
||||
2025-11-27 22:30:21.561 | SUCCESS | __main__:score_products:221 - 评分完成: 95分
|
||||
2025-11-27 22:30:21.561 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:30:23.562 | INFO | __main__:score_products:212 -
|
||||
评分进度: 59/251 - 产品ID: 59
|
||||
2025-11-27 22:30:23.562 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:30:34.610 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 90
|
||||
2025-11-27 22:30:34.613 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 59 的难度评分为: 90
|
||||
2025-11-27 22:30:34.613 | SUCCESS | __main__:score_products:221 - 评分完成: 90分
|
||||
2025-11-27 22:30:34.613 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:30:36.613 | INFO | __main__:score_products:212 -
|
||||
评分进度: 60/251 - 产品ID: 60
|
||||
2025-11-27 22:30:36.614 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:30:53.797 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 70
|
||||
2025-11-27 22:30:53.801 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 60 的难度评分为: 70
|
||||
2025-11-27 22:30:53.801 | SUCCESS | __main__:score_products:221 - 评分完成: 70分
|
||||
2025-11-27 22:30:53.801 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:30:55.802 | INFO | __main__:score_products:212 -
|
||||
评分进度: 61/251 - 产品ID: 61
|
||||
2025-11-27 22:30:55.802 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:31:07.842 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
|
||||
2025-11-27 22:31:07.846 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 61 的难度评分为: 75
|
||||
2025-11-27 22:31:07.846 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
|
||||
2025-11-27 22:31:07.847 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:31:09.847 | INFO | __main__:score_products:212 -
|
||||
评分进度: 62/251 - 产品ID: 62
|
||||
2025-11-27 22:31:09.847 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:31:17.957 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
|
||||
2025-11-27 22:31:17.961 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 62 的难度评分为: 85
|
||||
2025-11-27 22:31:17.961 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
|
||||
2025-11-27 22:31:17.961 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:31:19.962 | INFO | __main__:score_products:212 -
|
||||
评分进度: 63/251 - 产品ID: 63
|
||||
2025-11-27 22:31:19.963 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:31:35.601 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
|
||||
2025-11-27 22:31:35.606 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 63 的难度评分为: 75
|
||||
2025-11-27 22:31:35.606 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
|
||||
2025-11-27 22:31:35.606 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:31:37.606 | INFO | __main__:score_products:212 -
|
||||
评分进度: 64/251 - 产品ID: 64
|
||||
2025-11-27 22:31:37.607 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:31:54.718 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
|
||||
2025-11-27 22:31:54.722 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 64 的难度评分为: 85
|
||||
2025-11-27 22:31:54.722 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
|
||||
2025-11-27 22:31:54.723 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:31:56.723 | INFO | __main__:score_products:212 -
|
||||
评分进度: 65/251 - 产品ID: 65
|
||||
2025-11-27 22:31:56.724 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:32:06.981 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 65
|
||||
2025-11-27 22:32:06.987 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 65 的难度评分为: 65
|
||||
2025-11-27 22:32:06.987 | SUCCESS | __main__:score_products:221 - 评分完成: 65分
|
||||
2025-11-27 22:32:06.987 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:32:08.987 | INFO | __main__:score_products:212 -
|
||||
评分进度: 66/251 - 产品ID: 66
|
||||
2025-11-27 22:32:08.988 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:32:22.253 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
|
||||
2025-11-27 22:32:22.257 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 66 的难度评分为: 75
|
||||
2025-11-27 22:32:22.257 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
|
||||
2025-11-27 22:32:22.257 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:32:24.258 | INFO | __main__:score_products:212 -
|
||||
评分进度: 67/251 - 产品ID: 67
|
||||
2025-11-27 22:32:24.258 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
|
||||
2025-11-27 22:32:42.900 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
|
||||
2025-11-27 22:32:42.906 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 67 的难度评分为: 85
|
||||
2025-11-27 22:32:42.906 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
|
||||
2025-11-27 22:32:42.906 | INFO | __main__:score_products:226 - 等待2秒后继续...
|
||||
2025-11-27 22:32:44.906 | INFO | __main__:score_products:212 -
评分进度: 68/251 - 产品ID: 68
2025-11-27 22:32:44.907 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:32:58.072 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 60
2025-11-27 22:32:58.078 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 68 的难度评分为: 60
2025-11-27 22:32:58.078 | SUCCESS | __main__:score_products:221 - 评分完成: 60分
2025-11-27 22:32:58.078 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:33:00.078 | INFO | __main__:score_products:212 -
评分进度: 69/251 - 产品ID: 69
2025-11-27 22:33:00.079 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:33:17.223 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 100
2025-11-27 22:33:17.228 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 69 的难度评分为: 100
2025-11-27 22:33:17.228 | SUCCESS | __main__:score_products:221 - 评分完成: 100分
2025-11-27 22:33:17.228 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:33:19.229 | INFO | __main__:score_products:212 -
评分进度: 70/251 - 产品ID: 70
2025-11-27 22:33:19.230 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:33:35.768 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
2025-11-27 22:33:35.773 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 70 的难度评分为: 85
2025-11-27 22:33:35.773 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
2025-11-27 22:33:35.773 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:33:37.774 | INFO | __main__:score_products:212 -
评分进度: 71/251 - 产品ID: 71
2025-11-27 22:33:37.774 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:33:50.953 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
2025-11-27 22:33:50.957 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 71 的难度评分为: 75
2025-11-27 22:33:50.957 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
2025-11-27 22:33:50.957 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:33:52.958 | INFO | __main__:score_products:212 -
评分进度: 72/251 - 产品ID: 72
2025-11-27 22:33:52.959 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:34:06.272 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
2025-11-27 22:34:06.278 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 72 的难度评分为: 75
2025-11-27 22:34:06.278 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
2025-11-27 22:34:06.278 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:34:08.278 | INFO | __main__:score_products:212 -
评分进度: 73/251 - 产品ID: 73
2025-11-27 22:34:08.279 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:34:27.380 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 90
2025-11-27 22:34:27.387 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 73 的难度评分为: 90
2025-11-27 22:34:27.387 | SUCCESS | __main__:score_products:221 - 评分完成: 90分
2025-11-27 22:34:27.387 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:34:29.387 | INFO | __main__:score_products:212 -
评分进度: 74/251 - 产品ID: 74
2025-11-27 22:34:29.388 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:34:41.841 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
2025-11-27 22:34:41.844 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 74 的难度评分为: 85
2025-11-27 22:34:41.844 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
2025-11-27 22:34:41.844 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:34:43.845 | INFO | __main__:score_products:212 -
评分进度: 75/251 - 产品ID: 75
2025-11-27 22:34:43.845 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:34:54.980 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
2025-11-27 22:34:54.984 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 75 的难度评分为: 75
2025-11-27 22:34:54.984 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
2025-11-27 22:34:54.984 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:34:56.984 | INFO | __main__:score_products:212 -
评分进度: 76/251 - 产品ID: 76
2025-11-27 22:34:56.985 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:35:08.186 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
2025-11-27 22:35:08.191 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 76 的难度评分为: 75
2025-11-27 22:35:08.191 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
2025-11-27 22:35:08.191 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:35:10.192 | INFO | __main__:score_products:212 -
评分进度: 77/251 - 产品ID: 77
2025-11-27 22:35:10.193 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:35:15.593 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
2025-11-27 22:35:15.597 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 77 的难度评分为: 85
2025-11-27 22:35:15.597 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
2025-11-27 22:35:15.597 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:35:17.597 | INFO | __main__:score_products:212 -
评分进度: 78/251 - 产品ID: 78
2025-11-27 22:35:17.598 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:35:30.231 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
2025-11-27 22:35:30.235 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 78 的难度评分为: 75
2025-11-27 22:35:30.235 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
2025-11-27 22:35:30.235 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:35:32.235 | INFO | __main__:score_products:212 -
评分进度: 79/251 - 产品ID: 79
2025-11-27 22:35:32.236 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:35:45.524 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
2025-11-27 22:35:45.528 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 79 的难度评分为: 75
2025-11-27 22:35:45.528 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
2025-11-27 22:35:45.528 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:35:47.528 | INFO | __main__:score_products:212 -
评分进度: 80/251 - 产品ID: 80
2025-11-27 22:35:47.529 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:36:01.332 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 65
2025-11-27 22:36:01.335 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 80 的难度评分为: 65
2025-11-27 22:36:01.335 | SUCCESS | __main__:score_products:221 - 评分完成: 65分
2025-11-27 22:36:01.335 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:36:03.336 | INFO | __main__:score_products:212 -
评分进度: 81/251 - 产品ID: 81
2025-11-27 22:36:03.337 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:36:15.964 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
2025-11-27 22:36:15.967 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 81 的难度评分为: 85
2025-11-27 22:36:15.967 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
2025-11-27 22:36:15.967 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:36:17.967 | INFO | __main__:score_products:212 -
评分进度: 82/251 - 产品ID: 82
2025-11-27 22:36:17.968 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:36:33.251 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 95
2025-11-27 22:36:33.255 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 82 的难度评分为: 95
2025-11-27 22:36:33.256 | SUCCESS | __main__:score_products:221 - 评分完成: 95分
2025-11-27 22:36:33.256 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:36:35.256 | INFO | __main__:score_products:212 -
评分进度: 83/251 - 产品ID: 83
2025-11-27 22:36:35.256 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:36:49.059 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 90
2025-11-27 22:36:49.063 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 83 的难度评分为: 90
2025-11-27 22:36:49.063 | SUCCESS | __main__:score_products:221 - 评分完成: 90分
2025-11-27 22:36:49.063 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:36:51.064 | INFO | __main__:score_products:212 -
评分进度: 84/251 - 产品ID: 84
2025-11-27 22:36:51.064 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:37:05.285 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
2025-11-27 22:37:05.288 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 84 的难度评分为: 85
2025-11-27 22:37:05.289 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
2025-11-27 22:37:05.289 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:37:07.289 | INFO | __main__:score_products:212 -
评分进度: 85/251 - 产品ID: 85
2025-11-27 22:37:07.290 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:37:19.469 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 90
2025-11-27 22:37:19.473 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 85 的难度评分为: 90
2025-11-27 22:37:19.473 | SUCCESS | __main__:score_products:221 - 评分完成: 90分
2025-11-27 22:37:19.473 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:37:21.473 | INFO | __main__:score_products:212 -
评分进度: 86/251 - 产品ID: 86
2025-11-27 22:37:21.474 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:37:34.519 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
2025-11-27 22:37:34.522 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 86 的难度评分为: 85
2025-11-27 22:37:34.523 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
2025-11-27 22:37:34.523 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:37:36.523 | INFO | __main__:score_products:212 -
评分进度: 87/251 - 产品ID: 87
2025-11-27 22:37:36.524 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:37:50.313 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
2025-11-27 22:37:50.317 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 87 的难度评分为: 85
2025-11-27 22:37:50.317 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
2025-11-27 22:37:50.317 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:37:52.317 | INFO | __main__:score_products:212 -
评分进度: 88/251 - 产品ID: 88
2025-11-27 22:37:52.318 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:37:59.835 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
2025-11-27 22:37:59.839 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 88 的难度评分为: 75
2025-11-27 22:37:59.839 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
2025-11-27 22:37:59.839 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:38:01.839 | INFO | __main__:score_products:212 -
评分进度: 89/251 - 产品ID: 89
2025-11-27 22:38:01.840 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:38:17.211 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 65
2025-11-27 22:38:17.215 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 89 的难度评分为: 65
2025-11-27 22:38:17.215 | SUCCESS | __main__:score_products:221 - 评分完成: 65分
2025-11-27 22:38:17.215 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:38:19.216 | INFO | __main__:score_products:212 -
评分进度: 90/251 - 产品ID: 90
2025-11-27 22:38:19.216 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:38:41.217 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
2025-11-27 22:38:41.221 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 90 的难度评分为: 75
2025-11-27 22:38:41.221 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
2025-11-27 22:38:41.221 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:38:43.222 | INFO | __main__:score_products:212 -
评分进度: 91/251 - 产品ID: 91
2025-11-27 22:38:43.223 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:38:56.247 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 70
2025-11-27 22:38:56.252 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 91 的难度评分为: 70
2025-11-27 22:38:56.252 | SUCCESS | __main__:score_products:221 - 评分完成: 70分
2025-11-27 22:38:56.252 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:38:58.252 | INFO | __main__:score_products:212 -
评分进度: 92/251 - 产品ID: 92
2025-11-27 22:38:58.253 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:39:05.522 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 100
2025-11-27 22:39:05.527 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 92 的难度评分为: 100
2025-11-27 22:39:05.527 | SUCCESS | __main__:score_products:221 - 评分完成: 100分
2025-11-27 22:39:05.527 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:39:07.527 | INFO | __main__:score_products:212 -
评分进度: 93/251 - 产品ID: 93
2025-11-27 22:39:07.528 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:39:22.890 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 60
2025-11-27 22:39:22.894 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 93 的难度评分为: 60
2025-11-27 22:39:22.895 | SUCCESS | __main__:score_products:221 - 评分完成: 60分
2025-11-27 22:39:22.895 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:39:24.895 | INFO | __main__:score_products:212 -
评分进度: 94/251 - 产品ID: 94
2025-11-27 22:39:24.895 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:39:42.951 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
2025-11-27 22:39:42.956 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 94 的难度评分为: 75
2025-11-27 22:39:42.956 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
2025-11-27 22:39:42.956 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:39:44.957 | INFO | __main__:score_products:212 -
评分进度: 95/251 - 产品ID: 95
2025-11-27 22:39:44.958 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:39:58.088 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
2025-11-27 22:39:58.093 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 95 的难度评分为: 85
2025-11-27 22:39:58.093 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
2025-11-27 22:39:58.094 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:40:00.094 | INFO | __main__:score_products:212 -
评分进度: 96/251 - 产品ID: 96
2025-11-27 22:40:00.095 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:40:09.793 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
2025-11-27 22:40:09.797 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 96 的难度评分为: 75
2025-11-27 22:40:09.797 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
2025-11-27 22:40:09.797 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:40:11.797 | INFO | __main__:score_products:212 -
评分进度: 97/251 - 产品ID: 97
2025-11-27 22:40:11.798 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:40:27.589 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
2025-11-27 22:40:27.593 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 97 的难度评分为: 75
2025-11-27 22:40:27.594 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
2025-11-27 22:40:27.594 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:40:29.594 | INFO | __main__:score_products:212 -
评分进度: 98/251 - 产品ID: 98
2025-11-27 22:40:29.595 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:40:42.639 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 95
2025-11-27 22:40:42.645 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 98 的难度评分为: 95
2025-11-27 22:40:42.645 | SUCCESS | __main__:score_products:221 - 评分完成: 95分
2025-11-27 22:40:42.645 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:40:44.646 | INFO | __main__:score_products:212 -
评分进度: 99/251 - 产品ID: 99
2025-11-27 22:40:44.646 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:40:54.784 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
2025-11-27 22:40:54.788 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 99 的难度评分为: 85
2025-11-27 22:40:54.788 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
2025-11-27 22:40:54.788 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:40:56.788 | INFO | __main__:score_products:212 -
评分进度: 100/251 - 产品ID: 100
2025-11-27 22:40:56.789 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:41:12.314 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
2025-11-27 22:41:12.318 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 100 的难度评分为: 85
2025-11-27 22:41:12.318 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
2025-11-27 22:41:12.318 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:41:14.319 | INFO | __main__:score_products:212 -
评分进度: 101/251 - 产品ID: 101
2025-11-27 22:41:14.320 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:41:21.103 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 60
2025-11-27 22:41:21.107 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 101 的难度评分为: 60
2025-11-27 22:41:21.107 | SUCCESS | __main__:score_products:221 - 评分完成: 60分
2025-11-27 22:41:21.107 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:41:23.108 | INFO | __main__:score_products:212 -
评分进度: 102/251 - 产品ID: 102
2025-11-27 22:41:23.109 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:41:33.685 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 95
2025-11-27 22:41:33.689 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 102 的难度评分为: 95
2025-11-27 22:41:33.689 | SUCCESS | __main__:score_products:221 - 评分完成: 95分
2025-11-27 22:41:33.689 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:41:35.690 | INFO | __main__:score_products:212 -
评分进度: 103/251 - 产品ID: 103
2025-11-27 22:41:35.690 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:41:46.143 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
2025-11-27 22:41:46.147 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 103 的难度评分为: 85
2025-11-27 22:41:46.147 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
2025-11-27 22:41:46.147 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:41:48.148 | INFO | __main__:score_products:212 -
评分进度: 104/251 - 产品ID: 104
2025-11-27 22:41:48.148 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:41:59.316 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
2025-11-27 22:41:59.321 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 104 的难度评分为: 85
2025-11-27 22:41:59.321 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
2025-11-27 22:41:59.321 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:42:01.321 | INFO | __main__:score_products:212 -
评分进度: 105/251 - 产品ID: 105
2025-11-27 22:42:01.322 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:42:15.088 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
2025-11-27 22:42:15.093 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 105 的难度评分为: 75
2025-11-27 22:42:15.093 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
2025-11-27 22:42:15.093 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:42:17.094 | INFO | __main__:score_products:212 -
评分进度: 106/251 - 产品ID: 106
2025-11-27 22:42:17.094 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:42:30.720 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 65
2025-11-27 22:42:30.724 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 106 的难度评分为: 65
2025-11-27 22:42:30.724 | SUCCESS | __main__:score_products:221 - 评分完成: 65分
2025-11-27 22:42:30.724 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:42:32.725 | INFO | __main__:score_products:212 -
评分进度: 107/251 - 产品ID: 107
2025-11-27 22:42:32.726 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:42:42.705 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
2025-11-27 22:42:42.710 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 107 的难度评分为: 85
2025-11-27 22:42:42.710 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
2025-11-27 22:42:42.710 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:42:44.711 | INFO | __main__:score_products:212 -
评分进度: 108/251 - 产品ID: 108
2025-11-27 22:42:44.712 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:42:57.337 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
2025-11-27 22:42:57.341 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 108 的难度评分为: 75
2025-11-27 22:42:57.341 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
2025-11-27 22:42:57.341 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:42:59.342 | INFO | __main__:score_products:212 -
评分进度: 109/251 - 产品ID: 109
2025-11-27 22:42:59.342 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:43:10.384 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
2025-11-27 22:43:10.388 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 109 的难度评分为: 85
2025-11-27 22:43:10.388 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
2025-11-27 22:43:10.388 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:43:12.389 | INFO | __main__:score_products:212 -
评分进度: 110/251 - 产品ID: 110
2025-11-27 22:43:12.389 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:43:24.284 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
2025-11-27 22:43:24.287 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 110 的难度评分为: 75
2025-11-27 22:43:24.287 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
2025-11-27 22:43:24.287 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:43:26.288 | INFO | __main__:score_products:212 -
评分进度: 111/251 - 产品ID: 111
2025-11-27 22:43:26.289 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:43:36.921 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
2025-11-27 22:43:36.925 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 111 的难度评分为: 85
2025-11-27 22:43:36.925 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
2025-11-27 22:43:36.925 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:43:38.926 | INFO | __main__:score_products:212 -
评分进度: 112/251 - 产品ID: 112
2025-11-27 22:43:38.926 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:43:46.973 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
2025-11-27 22:43:46.978 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 112 的难度评分为: 85
2025-11-27 22:43:46.978 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
2025-11-27 22:43:46.978 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:43:48.979 | INFO | __main__:score_products:212 -
评分进度: 113/251 - 产品ID: 113
2025-11-27 22:43:48.979 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:44:06.897 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 95
2025-11-27 22:44:06.901 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 113 的难度评分为: 95
2025-11-27 22:44:06.901 | SUCCESS | __main__:score_products:221 - 评分完成: 95分
2025-11-27 22:44:06.901 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:44:08.902 | INFO | __main__:score_products:212 -
评分进度: 114/251 - 产品ID: 114
2025-11-27 22:44:08.902 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:44:31.885 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 85
2025-11-27 22:44:31.890 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 114 的难度评分为: 85
2025-11-27 22:44:31.890 | SUCCESS | __main__:score_products:221 - 评分完成: 85分
2025-11-27 22:44:31.890 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:44:33.891 | INFO | __main__:score_products:212 -
评分进度: 115/251 - 产品ID: 115
2025-11-27 22:44:33.891 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:45:10.222 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 90
2025-11-27 22:45:10.226 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 115 的难度评分为: 90
2025-11-27 22:45:10.226 | SUCCESS | __main__:score_products:221 - 评分完成: 90分
2025-11-27 22:45:10.227 | INFO | __main__:score_products:226 - 等待2秒后继续...
2025-11-27 22:45:12.227 | INFO | __main__:score_products:212 -
评分进度: 116/251 - 产品ID: 116
2025-11-27 22:45:12.228 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分
2025-11-27 22:45:44.910 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 75
2025-11-27 22:45:44.914 | SUCCESS | __main__:update_difficulty_score:182 - 更新产品ID 116 的难度评分为: 75
2025-11-27 22:45:44.914 | SUCCESS | __main__:score_products:221 - 评分完成: 75分
2025-11-27 22:45:44.914 | INFO | __main__:score_products:226 - 等待2秒后继续...
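The timestamps in the log above give a rough measure of how long each Ollama call took: the gap between a `call_ollama_for_scoring:139` line and the `call_ollama_for_scoring:157` line that follows it. Below is a minimal sketch of that calculation, using two lines copied from the entry for product ID 68. The helpers `split_record` and `call_latencies` are hypothetical names introduced for illustration and are not part of the repository.

```python
from datetime import datetime
from typing import Iterable, List, Tuple

# Two records copied from the log above (product ID 68).
LOG_LINES = [
    "2025-11-27 22:32:44.907 | INFO | __main__:call_ollama_for_scoring:139 - 调用Ollama API进行难度评分",
    "2025-11-27 22:32:58.072 | SUCCESS | __main__:call_ollama_for_scoring:157 - 获得评分: 60",
]

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def split_record(line: str) -> Tuple[datetime, str, str]:
    """Split 'timestamp | LEVEL | rest' into its parts, ignoring column padding."""
    timestamp_text, level, rest = (part.strip() for part in line.split("|", 2))
    return datetime.strptime(timestamp_text, TIMESTAMP_FORMAT), level, rest


def call_latencies(lines: Iterable[str]) -> List[float]:
    """Seconds between each INFO 'call started' record and the next SUCCESS record."""
    latencies: List[float] = []
    started = None
    for line in lines:
        # Only records emitted by call_ollama_for_scoring are relevant here.
        if "call_ollama_for_scoring" not in line:
            continue
        timestamp, level, _ = split_record(line)
        if level == "INFO":
            started = timestamp
        elif level == "SUCCESS" and started is not None:
            latencies.append((timestamp - started).total_seconds())
            started = None
    return latencies


latencies = call_latencies(LOG_LINES)
print(latencies)
```

Feeding the whole log file through `call_latencies` would give one latency per scored product. Continuation lines such as the progress message carry no timestamp, so the sketch filters on the function name before parsing.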
@@ -1,250 +0,0 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Product difficulty scoring script.
Reads the product_analysis table, adds a difficulty score column, and scores each product through the Ollama API.
"""

import sqlite3
import os
import time
from typing import List, Tuple, Optional
from loguru import logger
import requests
import json

class DifficultyScorer:
    """Product difficulty scorer."""

    def __init__(self, db_path: str = "products.db"):
        """
        Initialize the scorer.

        Args:
            db_path: Path to the database file.
        """
        self.db_path = db_path
        self.api_url = "http://localhost:11434/api/generate"

        # Check whether the database file exists; fall back to this script's directory
        if not os.path.exists(db_path):
            current_dir_db = os.path.join(os.path.dirname(__file__), db_path)
            if os.path.exists(current_dir_db):
                self.db_path = current_dir_db
                logger.info(f"使用当前目录下的数据库文件: {current_dir_db}")
            else:
                raise FileNotFoundError(f"数据库文件不存在: {db_path} 和 {current_dir_db}")

        logger.info(f"初始化产品难度评分器,数据库: {self.db_path}")

    def connect_to_database(self) -> sqlite3.Connection:
        """Connect to the SQLite database."""
        try:
            conn = sqlite3.connect(self.db_path)
            logger.success(f"成功连接到数据库: {self.db_path}")
            return conn
        except Exception as e:
            logger.error(f"连接数据库失败: {e}")
            raise

    def add_difficulty_score_column(self, conn: sqlite3.Connection):
        """Add the difficulty score column if it is missing."""
        try:
            cursor = conn.cursor()

            # Check whether the column already exists
            cursor.execute("PRAGMA table_info(product_analysis)")
            columns = [row[1] for row in cursor.fetchall()]

            if 'difficulty_score' not in columns:
                cursor.execute("ALTER TABLE product_analysis ADD COLUMN difficulty_score INTEGER")
                conn.commit()
                logger.success("成功添加difficulty_score字段")
            else:
                logger.info("difficulty_score字段已存在")

        except Exception as e:
            logger.error(f"添加难度评分字段失败: {e}")
            raise

    def get_unscored_products(self, conn: sqlite3.Connection) -> List[Tuple]:
        """
        Fetch the products that have not been scored yet.

        Args:
            conn: Database connection.

        Returns:
            A list of product rows, each a tuple of (id, ai_response).
        """
        try:
            cursor = conn.cursor()

            # Query products that have an AI response but no score
            cursor.execute("""
                SELECT id, ai_response
                FROM product_analysis
                WHERE difficulty_score IS NULL
                AND ai_response IS NOT NULL
                AND ai_response != ''
            """)

            products = cursor.fetchall()
            logger.info(f"找到 {len(products)} 个未评分的产品")

            return products

        except Exception as e:
            logger.error(f"获取未评分产品数据失败: {e}")
            raise

    def call_ollama_for_scoring(self, ai_response: str) -> Optional[int]:
        """
        Call the Ollama API to score difficulty.

        Args:
            ai_response: The AI response text to score.

        Returns:
            A score from 0 to 100, or None on failure.
        """
        try:
            # Build the scoring prompt
            prompt = f"""
请根据以下产品开发难度描述,给出一个0-100分的难度评分:

难度描述:{ai_response}

评分标准:
- 90-100分:个人开发极其困难,需要大量专业知识和团队协作
- 70-89分:相对困难,需要较强的技术能力和较多时间
- 50-69分:中等难度,需要一定的技术基础
- 30-49分:相对简单,有基础即可开发
- 10-29分:非常简单,入门级别
- 0-9分:极其简单,几乎无难度

请只返回一个数字,不要有任何其他文字。
"""

            data = {
                "model": "qwen3:8b",
                "prompt": prompt.strip(),
                "stream": False
            }

            headers = {
                "Content-Type": "application/json"
            }

            logger.info(f"调用Ollama API进行难度评分")

            response = requests.post(
                self.api_url,
                headers=headers,
                data=json.dumps(data, ensure_ascii=False),
                timeout=60
            )

            if response.status_code == 200:
                result = response.json()
                score_text = result.get("response", "").strip()

                # Try to parse the score
                try:
                    score = int(score_text)
                    # Clamp the score to the valid range
                    score = max(0, min(100, score))
                    logger.success(f"获得评分: {score}")
                    return score
                except ValueError:
                    logger.error(f"无法解析评分: {score_text}")
                    return None
            else:
                logger.error(f"API调用失败: {response.status_code}, {response.text}")
                return None

        except Exception as e:
            logger.error(f"调用Ollama API时出错: {e}")
            return None

    def update_difficulty_score(self, conn: sqlite3.Connection, product_id: int, score: int):
        """Update a product's difficulty score."""
        try:
            cursor = conn.cursor()

            cursor.execute("""
                UPDATE product_analysis
                SET difficulty_score = ?
                WHERE id = ?
            """, (score, product_id))

            conn.commit()
            logger.success(f"更新产品ID {product_id} 的难度评分为: {score}")

        except Exception as e:
            logger.error(f"更新难度评分失败: {e}")
            raise

    def score_products(self):
        """Score every product that has not been scored yet."""
        logger.info("开始产品难度评分")

        conn = None
        try:
            # Connect to the database
            conn = self.connect_to_database()

            # Add the difficulty score column
            self.add_difficulty_score_column(conn)

            # Fetch the unscored products
            products = self.get_unscored_products(conn)

            if not products:
                logger.info("没有需要评分的产品")
                return

            logger.info(f"准备评分 {len(products)} 个产品")

            # Score the products one at a time
            success_count = 0
            for i, (product_id, ai_response) in enumerate(products, 1):
                logger.info(f"\n评分进度: {i}/{len(products)} - 产品ID: {product_id}")

                # Ask the AI for a score
                score = self.call_ollama_for_scoring(ai_response)

                if score is not None:
                    # Update the database
                    self.update_difficulty_score(conn, product_id, score)
                    success_count += 1
                    logger.success(f"评分完成: {score}分")
                else:
                    logger.error(f"评分失败: 产品ID {product_id}")

                # Pause between calls to avoid overloading the API
                logger.info("等待2秒后继续...")
                time.sleep(2)

            logger.success(f"评分完成! 成功评分 {success_count} 个产品")

        except Exception as e:
            logger.error(f"评分过程中出错: {e}")
        finally:
            if conn:
                conn.close()
                logger.info("数据库连接已关闭")

def main():
    """Entry point."""
    # Configure logging
    logger.add("difficulty_scorer.log", rotation="10 MB", level="INFO")

    # Create the scorer
    scorer = DifficultyScorer()

    # Start scoring
    scorer.score_products()

if __name__ == "__main__":
    main()
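`call_ollama_for_scoring` passes the model's reply straight to `int()`, so any reply that is not a bare integer (for example `85分`) is logged as a parse failure and the product stays unscored. A more tolerant parser could look like the sketch below. It is an illustration only: `extract_score` is a hypothetical helper, and the `<think>...</think>` stripping assumes the model may wrap reasoning text in such tags, which the source does not confirm.

```python
import re
from typing import Optional

# Assumption: reasoning text, if present, is wrapped in <think>...</think>.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
FIRST_INTEGER = re.compile(r"-?\d+")


def extract_score(text: str) -> Optional[int]:
    """Return the first integer in the reply, clamped to 0-100, or None if absent."""
    cleaned = THINK_BLOCK.sub("", text)
    match = FIRST_INTEGER.search(cleaned)
    if match is None:
        return None
    # Same clamping rule as the original script.
    return max(0, min(100, int(match.group())))


for reply in ["85", "评分: 85分", "<think>maybe 70 or 90</think>\n75", "无法评分"]:
    print(repr(reply), "->", extract_score(reply))
```

Taking the first integer rather than the last is a design choice: it handles replies such as `85分(满分100)` correctly, at the cost of misreading a reply that mentions another number before the score.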
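The scorer's database steps (add the `difficulty_score` column only if it is missing, select rows with no score, write the score back) can be exercised against an in-memory SQLite database. The sketch below assumes a two-column `product_analysis` table, which is simpler than the real schema, and uses a hypothetical helper `ensure_column`. The sample rows are made up for the demonstration.

```python
import sqlite3


def ensure_column(conn: sqlite3.Connection, table: str, column: str, decl: str) -> bool:
    """Add the column if it is missing; return True when it was added.

    The table and column names are interpolated into SQL, so they must be
    trusted identifiers and never user input.
    """
    existing = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    conn.commit()
    return True


UNSCORED_QUERY = """
    SELECT id FROM product_analysis
    WHERE difficulty_score IS NULL
    AND ai_response IS NOT NULL
    AND ai_response != ''
"""

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the real table (assumption).
conn.execute("CREATE TABLE product_analysis (id INTEGER PRIMARY KEY, ai_response TEXT)")
conn.executemany(
    "INSERT INTO product_analysis (id, ai_response) VALUES (?, ?)",
    [(1, "needs a team"), (2, ""), (3, None)],
)

added_first = ensure_column(conn, "product_analysis", "difficulty_score", "INTEGER")
added_second = ensure_column(conn, "product_analysis", "difficulty_score", "INTEGER")

# Only row 1 has a non-empty response and no score yet.
unscored_before = conn.execute(UNSCORED_QUERY).fetchall()

conn.execute("UPDATE product_analysis SET difficulty_score = ? WHERE id = ?", (60, 1))
conn.commit()
unscored_after = conn.execute(UNSCORED_QUERY).fetchall()

print(added_first, added_second, unscored_before, unscored_after)
conn.close()
```

Because already-scored rows drop out of the query, an interrupted run can be restarted without scoring any product twice.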
1021
product/integrated_product_system.log
Normal file
File diff suppressed because it is too large
630
product/integrated_product_system.py
Normal file
@@ -0,0 +1,630 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Full-featured product scraping and analysis system.
Combines the functionality of integrated_scraper.py and product_ai_analysis.py.
Scrapes data from ProductHunt and runs AI analysis on it.
"""

import sqlite3
import asyncio
import os
import argparse
from datetime import datetime
from loguru import logger
from tqdm import tqdm
import sys
import requests
import json
import time
from typing import List, Tuple, Optional

# 配置日志
|
||||
logger.remove()
|
||||
logger.add(sys.stderr, level="INFO", format="<green>{time:YYYY-MM-DD HH:mm:ss}</green> | <level>{level: <8}</level> | <cyan>{name}</cyan>:<cyan>{function}</cyan>:<cyan>{line}</cyan> - <level>{message}</level>")
|
||||
|
||||
# 动态导入playwright-get-data.py
|
||||
import importlib.util
|
||||
playwright_data_path = os.path.join(os.path.dirname(__file__), "playwright-get-data.py")
|
||||
spec = importlib.util.spec_from_file_location("playwright_get_data", playwright_data_path)
|
||||
playwright_get_data = importlib.util.module_from_spec(spec)
|
||||
spec.loader.exec_module(playwright_get_data)
|
||||
ProductHuntScraper = playwright_get_data.ProductHuntScraper
|
||||
|
||||
|
||||
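The module above loads `playwright-get-data.py` by path because a hyphenated file name is not a valid module name. The following is a minimal, self-contained sketch of that loading technique. The helper name `load_module_from_path` and the throwaway file `demo-module.py` are hypothetical and only serve the demonstration; they are not part of the project.

```python
import importlib.util
import os
import tempfile


def load_module_from_path(module_name, file_path):
    """Load a Python source file as a module, even if its file name is not a valid identifier."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load {file_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Demo: write a throwaway file with a hyphenated name and load it.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo-module.py")
    with open(demo_path, "w", encoding="utf-8") as fh:
        fh.write("GREETING = 'hello'\n")
    demo = load_module_from_path("demo_module", demo_path)
    print(demo.GREETING)
```

Checking `spec` and `spec.loader` before use gives a clear `ImportError` when the file is missing, instead of an `AttributeError` on `None`.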
class IntegratedProductSystem:
    """Full-featured product scraping and analysis system."""

    def __init__(self, tophub_db_path=None, product_db_path=None, debug_port=9222,
                 limit=0, skip_duplicates=True, api_key=""):
        """
        Initialize the system.

        Args:
            tophub_db_path: Path to the tophub database.
            product_db_path: Path to the product database.
            debug_port: Chrome remote-debugging port.
            limit: Maximum number of links to scrape.
            skip_duplicates: Whether to skip URLs that are already stored.
            api_key: API key (not needed for Ollama).
        """
        # Set the database paths
        if tophub_db_path:
            self.tophub_db_path = tophub_db_path
        else:
            self.tophub_db_path = os.path.join(os.path.dirname(os.path.dirname(__file__)), "tophub_data.db")

        if product_db_path:
            self.product_db_path = product_db_path
        else:
            self.product_db_path = os.path.join(os.path.dirname(__file__), "products.db")

        self.debug_port = debug_port
        self.limit = limit
        self.skip_duplicates = skip_duplicates
        self.api_key = api_key
        self.api_url = "http://localhost:11434/api/generate"
        self.product_urls = []

        logger.info(f"Initializing the integrated product system, database: {self.product_db_path}")

    def connect_to_database(self) -> sqlite3.Connection:
        """Connect to the SQLite product database."""
        try:
            conn = sqlite3.connect(self.product_db_path)
            logger.success(f"Connected to database: {self.product_db_path}")
            return conn
        except Exception as e:
            logger.error(f"Failed to connect to the database: {e}")
            raise

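Several methods in this class open a connection, run a query, and close the connection by hand, which leaks the connection if the query raises. One possible way to tighten this is `contextlib.closing`. Note that a `sqlite3.Connection` used directly in a `with` statement only commits or rolls back the transaction; it does not close the connection. The helper `fetch_one_value` and the temporary `demo.db` file below are hypothetical names used for illustration only.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def fetch_one_value(db_path, sql, params=()):
    """Run a query and return the first column of the first row, or None."""
    # closing() guarantees conn.close() runs, even if the query raises.
    with closing(sqlite3.connect(db_path)) as conn:
        row = conn.execute(sql, params).fetchone()
        return row[0] if row else None


# Demo against a throwaway database file.
with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "demo.db")
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, url TEXT)")
        conn.execute("INSERT INTO products (url) VALUES (?)", ("https://example.com/a",))
        conn.commit()
    product_count = fetch_one_value(db_path, "SELECT COUNT(*) FROM products")
    print(product_count)
```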
    def init_database(self):
        """Initialize the database and create every table the system needs."""
        logger.info("Initializing the product database...")

        conn = None
        try:
            conn = sqlite3.connect(self.product_db_path)
            cursor = conn.cursor()

            # Product table (from integrated_scraper.py)
            cursor.execute('''
                CREATE TABLE IF NOT EXISTS products (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    url TEXT NOT NULL UNIQUE,
                    name TEXT,
                    introduction TEXT,
                    user_count TEXT,
                    maker_link TEXT,
                    maker_statement TEXT,
                    created_at TEXT NOT NULL,
                    updated_at TEXT NOT NULL
                )
            ''')

            # Analysis-result table (from product_ai_analysis.py)
            cursor.execute('''
                CREATE TABLE IF NOT EXISTS product_analysis (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    original_id INTEGER,
                    original_name TEXT,
                    product_name TEXT,
                    product_intro TEXT,
                    development_difficulty TEXT,
                    difficulty_score INTEGER,
                    ai_response TEXT,
                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                    FOREIGN KEY (original_id) REFERENCES products (id)
                )
            ''')

            conn.commit()
            logger.success("Product database initialized")

        except Exception as e:
            logger.error(f"Failed to initialize the database: {e}")
        finally:
            # Close the connection even when table creation fails
            if conn:
                conn.close()

    def query_producthunt_urls(self, limit=None):
        """Query the links that contain producthunt.com.

        A limit of 0 (the default) returns every matching link.
        """
        if limit is None:
            limit = self.limit

        logger.info(f"Querying the tophub_data.db database, limit: {limit}")

        conn = None
        try:
            conn = sqlite3.connect(self.tophub_db_path)
            cursor = conn.cursor()

            # Select the links that contain producthunt.com. The limit was
            # logged but never applied; apply it when it is positive.
            query = "SELECT url FROM articles WHERE url LIKE '%producthunt.com%'"
            params = ()
            if limit and limit > 0:
                query += " LIMIT ?"
                params = (limit,)
            cursor.execute(query, params)

            urls = [row[0] for row in cursor.fetchall()]

            logger.success(f"Found {len(urls)} links containing producthunt.com")
            return urls

        except Exception as e:
            logger.error(f"Failed to query the database: {e}")
            return []
        finally:
            if conn:
                conn.close()

    def check_duplicate(self, url):
        """Check whether the URL is already stored."""
        conn = None
        try:
            conn = sqlite3.connect(self.product_db_path)
            cursor = conn.cursor()

            cursor.execute("SELECT COUNT(*) FROM products WHERE url = ?", (url,))
            count = cursor.fetchone()[0]

            return count > 0

        except Exception as e:
            logger.error(f"Duplicate check failed: {e}")
            return False
        finally:
            if conn:
                conn.close()

    def save_product_info(self, product_info):
        """Save product information to the database."""
        conn = None
        try:
            conn = sqlite3.connect(self.product_db_path)
            cursor = conn.cursor()

            current_time = datetime.now().strftime('%Y-%m-%d %H:%M:%S')

            # Check whether the URL is already stored
            cursor.execute("SELECT id FROM products WHERE url = ?", (product_info['url'],))
            existing = cursor.fetchone()

            if existing:
                # Update the existing record
                cursor.execute('''
                    UPDATE products SET
                        name = ?, introduction = ?, user_count = ?,
                        maker_link = ?, maker_statement = ?, updated_at = ?
                    WHERE url = ?
                ''', (
                    product_info.get('name'),
                    product_info.get('introduction'),
                    product_info.get('user_count'),
                    product_info.get('maker_link'),
                    product_info.get('maker_statement'),
                    current_time,
                    product_info['url']
                ))
                logger.info(f"Updated product: {product_info.get('name', 'unknown')}")
            else:
                # Insert a new record
                cursor.execute('''
                    INSERT INTO products
                    (url, name, introduction, user_count, maker_link, maker_statement, created_at, updated_at)
                    VALUES (?, ?, ?, ?, ?, ?, ?, ?)
                ''', (
                    product_info['url'],
                    product_info.get('name'),
                    product_info.get('introduction'),
                    product_info.get('user_count'),
                    product_info.get('maker_link'),
                    product_info.get('maker_statement'),
                    current_time,
                    current_time
                ))
                logger.info(f"Added product: {product_info.get('name', 'unknown')}")

            conn.commit()
            return True

        except Exception as e:
            logger.error(f"Failed to save product information: {e}")
            return False
        finally:
            # Close the connection on both the success and the failure path
            if conn:
                conn.close()

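`save_product_info` first selects by URL and then either updates or inserts. Because `url` is declared `UNIQUE`, the same effect can probably be achieved with a single statement using SQLite's UPSERT clause (`ON CONFLICT ... DO UPDATE`, available from SQLite 3.24). The sketch below is an illustration under that assumption, not the project's code: `upsert_product` is a hypothetical helper and only a subset of the columns is shown.

```python
import sqlite3
from datetime import datetime


def upsert_product(conn, product_info):
    """Insert a product, or update it in place when the URL already exists."""
    now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    conn.execute(
        """
        INSERT INTO products (url, name, introduction, created_at, updated_at)
        VALUES (:url, :name, :introduction, :now, :now)
        ON CONFLICT(url) DO UPDATE SET
            name = excluded.name,
            introduction = excluded.introduction,
            updated_at = excluded.updated_at
        """,
        {
            "url": product_info["url"],
            "name": product_info.get("name"),
            "introduction": product_info.get("introduction"),
            "now": now,
        },
    )
    conn.commit()


# Demo on an in-memory database with a trimmed-down products table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE products (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        url TEXT NOT NULL UNIQUE,
        name TEXT,
        introduction TEXT,
        created_at TEXT NOT NULL,
        updated_at TEXT NOT NULL
    )
    """
)
upsert_product(conn, {"url": "https://example.com/p", "name": "Old", "introduction": "v1"})
upsert_product(conn, {"url": "https://example.com/p", "name": "New", "introduction": "v2"})
rows = conn.execute("SELECT name, introduction FROM products").fetchall()
print(rows)
conn.close()
```

On conflict, `created_at` is left untouched, which matches the original behavior of only refreshing `updated_at` for existing rows.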
    async def scrape_product_info(self, url):
        """Scrape product information with the scraper from playwright-get-data.py."""
        try:
            logger.info(f"Start scraping: {url}")

            # Create the ProductHuntScraper instance
            scraper = ProductHuntScraper(debug_port=self.debug_port)

            # Connect to the Chrome instance that is already running
            connected = await scraper.connect_to_existing_chrome()
            if not connected:
                logger.error("Failed to connect to Chrome, skipping this URL")
                return None

            try:
                # Navigate to the ProductHunt page
                navigated = await scraper.navigate_to_producthunt(url)
                if not navigated:
                    logger.error("Failed to navigate to the page, skipping this URL")
                    return None

                # Extract the product information
                product_info = await scraper.extract_product_info()
                if product_info:
                    product_info['url'] = url
                    logger.success(f"Extracted product information: {product_info.get('name', 'unknown')}")
                else:
                    logger.error("Failed to extract product information")

                return product_info
            finally:
                # Close the connection, including when navigation or extraction raises
                await scraper.close()

        except Exception as e:
            logger.error(f"Failed to scrape product information: {e}")
            return None

    def get_product_data(self, conn: sqlite3.Connection) -> List[Tuple]:
        """Fetch product data from the database as (id, name, introduction) tuples."""
        try:
            cursor = conn.cursor()

            # Select id, name and introduction from the products table
            cursor.execute("""
                SELECT id, name, introduction
                FROM products
                WHERE name IS NOT NULL AND introduction IS NOT NULL
                AND name != '' AND introduction != ''
            """)

            products = cursor.fetchall()
            logger.info(f"Fetched {len(products)} products from the database")

            # Log the first few products as samples
            # (product_id instead of id, to avoid shadowing the built-in)
            for i, (product_id, name, intro) in enumerate(products[:3], 1):
                logger.info(f"Sample product {i}: ID={product_id}, name='{name}', introduction='{intro[:50]}...'")

            return products

        except Exception as e:
            logger.error(f"Failed to fetch product data: {e}")
            raise

    def call_ollama_ai_api(self, name: str, introduction: str) -> Optional[str]:
        """Call the Ollama API to analyze one product."""
        try:
            # Build the request payload in the Ollama /api/generate format.
            # The prompt stays in Chinese on purpose: it asks the model to translate
            # the introduction into Chinese and to answer in the form
            # "product name/product introduction/development difficulty".
            prompt = f"这个是【{name}】,简介内容是【{introduction}】。请把产品的简介翻译成中文,并返回假设一个人加上AI辅助能否开发这个产品,请详细回答。返回的内容是产品名称/产品简介/开发难度。返回的例子一:notion/这个是笔记产品等等/一个人开发难度较高"

            data = {
                "model": "qwen3:8b",
                "prompt": prompt,
                "stream": False
            }

            logger.info(f"Calling the Ollama API to analyze product: {name}")

            # json= lets requests serialize the payload to JSON bytes and set the
            # Content-Type header itself, so the non-ASCII prompt is never sent
            # as an unencoded str body.
            response = requests.post(self.api_url, json=data, timeout=60)

            if response.status_code == 200:
                result = response.json()
                content = result.get("response", "")
                logger.success(f"API call succeeded: {name}")
                return content
            else:
                logger.error(f"API call failed: {response.status_code}, {response.text}")
                return None

        except Exception as e:
            logger.error(f"Error while calling the Ollama API: {e}")
            return None

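`call_ollama_ai_api` makes a single attempt and returns `None` on any failure. If transient failures are a concern, a caller could wrap it in a small retry helper. The sketch below is one possible shape and is not taken from the project: `call_with_retry`, its parameters, and the exponential delay schedule are all assumptions made for illustration.

```python
import time


def call_with_retry(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() until it returns a non-None value or the attempts run out.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    """
    for attempt in range(attempts):
        result = func()
        if result is not None:
            return result
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return None


# Demo with a fake call that fails twice before succeeding.
# delays.append stands in for time.sleep so the demo runs instantly.
outcomes = iter([None, None, "ok"])
delays = []
result = call_with_retry(lambda: next(outcomes), attempts=3, base_delay=0.5, sleep=delays.append)
print(result, delays)
```

In this codebase the call site might look like `call_with_retry(lambda: self.call_ollama_ai_api(name, introduction))`.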
    def parse_ai_response(self, response: str) -> Tuple[str, str, str]:
        """Parse the AI response into (product name, introduction, difficulty)."""
        try:
            # Split the response on "/"
            parts = response.split('/')

            if len(parts) >= 3:
                product_name = parts[0].strip()
                product_intro = parts[1].strip()
                difficulty = parts[2].strip()

                logger.info(f"Parsed result: name='{product_name}', introduction='{product_intro[:30]}...', difficulty='{difficulty}'")
                return product_name, product_intro, difficulty
            else:
                logger.warning(f"Response format is not as expected: {response}")
                # Fall back to the raw content when the format does not match
                return "", response, ""

        except Exception as e:
            logger.error(f"Failed to parse the AI response: {e}")
            return "", response, ""

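`parse_ai_response` takes the first three `/`-separated parts, so an introduction that itself contains a slash (for example "iOS/Android") is cut short and its remainder ends up in the difficulty field. The sketch below shows one possible heuristic: treat the first part as the name, the last part as the difficulty, and everything in between as the introduction. It also strips a `<think>...</think>` block, on the assumption that the model may emit its reasoning that way. `parse_slash_separated` is a hypothetical helper, and the heuristic is still wrong whenever the difficulty text itself contains a slash.

```python
import re


def parse_slash_separated(response):
    """Split 'name/intro/difficulty' text into three fields.

    Returns ("", cleaned_text, "") when fewer than three fields are present.
    """
    # Assumption: the model may wrap its reasoning in <think>...</think>; drop it.
    cleaned = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()
    parts = [part.strip() for part in cleaned.split("/")]
    if len(parts) < 3:
        return "", cleaned, ""
    # First field is the name, last field is the difficulty,
    # everything in between is treated as the introduction.
    return parts[0], "/".join(parts[1:-1]), parts[-1]


example = "<think>internal notes</think>Notion/A note-taking tool for iOS/Android/Hard for one person"
fields = parse_slash_separated(example)
print(fields)
```

A more dependable option would be to ask the model for JSON output and parse that, at the cost of changing the prompt.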
    def check_product_exists_in_analysis(self, conn: sqlite3.Connection, original_name: str) -> bool:
        """Check whether the product already has a row in the analysis table."""
        try:
            cursor = conn.cursor()
            cursor.execute("""
                SELECT COUNT(*) FROM product_analysis
                WHERE original_name = ?
            """, (original_name,))

            count = cursor.fetchone()[0]
            exists = count > 0

            if exists:
                logger.info(f"Product '{original_name}' already analyzed, skipping")

            return exists

        except Exception as e:
            logger.error(f"Failed to check whether the product exists: {e}")
            return False

    def save_analysis_result(self, conn: sqlite3.Connection,
                             original_id: int, original_name: str,
                             product_name: str, product_intro: str,
                             difficulty: str, ai_response: str):
        """Save one analysis result to the database."""
        try:
            cursor = conn.cursor()

            cursor.execute("""
                INSERT INTO product_analysis
                (original_id, original_name, product_name, product_intro, development_difficulty, ai_response)
                VALUES (?, ?, ?, ?, ?, ?)
            """, (original_id, original_name, product_name, product_intro, difficulty, ai_response))

            conn.commit()
            logger.success(f"Saved analysis result: {product_name}")

        except Exception as e:
            logger.error(f"Failed to save the analysis result: {e}")
            raise

    def analyze_products(self, max_products: Optional[int] = None):
        """Analyze the stored products with the AI model."""
        if max_products is None:
            logger.info("Starting analysis of all products")
        else:
            logger.info(f"Starting product analysis, maximum count: {max_products}")

        conn = None
        try:
            # Connect to the database
            conn = self.connect_to_database()

            # Fetch the product data
            products = self.get_product_data(conn)

            if not products:
                logger.warning("No product data available for analysis")
                return

            # Limit the number of products to analyze
            if max_products is not None:
                products_to_analyze = products[:max_products]
            else:
                products_to_analyze = products

            logger.info(f"Preparing to analyze {len(products_to_analyze)} products")

            # Analyze the products one by one
            success_count = 0
            skip_count = 0
            for i, (original_id, name, introduction) in enumerate(products_to_analyze, 1):
                logger.info(f"\nAnalysis progress: {i}/{len(products_to_analyze)} - {name}")

                # Skip products that were analyzed already
                if self.check_product_exists_in_analysis(conn, name):
                    skip_count += 1
                    logger.info(f"Skipped existing product, progress: {i}/{len(products_to_analyze)}")
                    continue

                # Report the API call status
                logger.info(f"Submitting API request... progress: {i}/{len(products_to_analyze)}")

                # Call the AI API
                ai_response = self.call_ollama_ai_api(name, introduction)

                if ai_response:
                    # Report the processing status
                    logger.info("API call succeeded, processing data...")

                    # Parse the response
                    product_name, product_intro, difficulty = self.parse_ai_response(ai_response)

                    # Save the result
                    self.save_analysis_result(conn, original_id, name,
                                              product_name, product_intro, difficulty, ai_response)
                    success_count += 1

                    # Report completion
                    logger.success(f"Product '{name}' analyzed, progress: {i}/{len(products_to_analyze)}")
                else:
                    logger.error(f"Analysis failed: {name}")

                # Wait 2 seconds after each processed product
                logger.info("Processing done, waiting 2 seconds before continuing...")
                time.sleep(2)

            logger.success(f"Analysis complete! Analyzed {success_count} products, skipped {skip_count} existing products")

        except Exception as e:
            logger.error(f"Error during analysis: {e}")
        finally:
            if conn:
                conn.close()
                logger.info("Database connection closed")

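`analyze_products` applies `max_products` before it skips products that were already analyzed, so a repeated run with the same limit looks at the same leading rows and may analyze nothing new. One possible alternative is to let the database return only products without an analysis row. The sketch below assumes the `products` and `product_analysis` tables defined in `init_database`. It matches on `original_id`, whereas the original check matches on `original_name`, and `get_unanalyzed_products` is a hypothetical helper.

```python
import sqlite3


def get_unanalyzed_products(conn, max_products=None):
    """Return (id, name, introduction) for products that have no analysis row yet."""
    sql = """
        SELECT p.id, p.name, p.introduction
        FROM products AS p
        LEFT JOIN product_analysis AS a ON a.original_id = p.id
        WHERE a.id IS NULL
          AND p.name IS NOT NULL AND p.name != ''
          AND p.introduction IS NOT NULL AND p.introduction != ''
        ORDER BY p.id
    """
    params = ()
    if max_products is not None:
        # The limit now applies to pending products only.
        sql += " LIMIT ?"
        params = (max_products,)
    return conn.execute(sql, params).fetchall()


# Demo: product 1 is already analyzed, so the first pending product is number 2.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, introduction TEXT);
    CREATE TABLE product_analysis (id INTEGER PRIMARY KEY, original_id INTEGER);
    INSERT INTO products (id, name, introduction) VALUES
        (1, 'Alpha', 'first'), (2, 'Beta', 'second'), (3, 'Gamma', 'third');
    INSERT INTO product_analysis (original_id) VALUES (1);
    """
)
pending = get_unanalyzed_products(conn, max_products=1)
print(pending)
conn.close()
```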
    async def run_scraping(self, urls=None):
        """Run the scraping task."""
        logger.info("=== Starting ProductHunt scraping ===")

        # Build the list of URLs to scrape
        if urls is None:
            self.product_urls = self.query_producthunt_urls()
        else:
            self.product_urls = urls

        if not self.product_urls:
            logger.error("No ProductHunt links found to scrape")
            return False

        logger.info(f"Found {len(self.product_urls)} ProductHunt links")

        # Counters for the scraping summary
        success_count = 0
        skip_count = 0
        error_count = 0

        # Show progress with a progress bar
        with tqdm(total=len(self.product_urls), desc="Scraping ProductHunt links") as pbar:
            for url in self.product_urls:
                logger.info(f"Processing URL: {url}")

                # Skip URLs that are already stored
                if self.skip_duplicates and self.check_duplicate(url):
                    logger.info(f"URL already stored, skipping: {url}")
                    skip_count += 1
                    pbar.update(1)
                    continue

                # Scrape the product information
                product_info = await self.scrape_product_info(url)

                if product_info:
                    # Save to the database
                    success = self.save_product_info(product_info)
                    if success:
                        logger.success(f"Saved product information: {product_info.get('name', 'unknown')}")
                        success_count += 1
                    else:
                        logger.error(f"Failed to save product information: {url}")
                        error_count += 1
                else:
                    logger.error(f"Failed to scrape product information: {url}")
                    error_count += 1

                pbar.update(1)

        # Show the scraping summary
        self.show_scraping_results(success_count, skip_count, error_count)

        logger.success("=== ProductHunt scraping finished ===")
        return True

    def show_scraping_results(self, success_count, skip_count, error_count):
        """Log a summary of the scraping results."""
        try:
            conn = sqlite3.connect(self.product_db_path)
            cursor = conn.cursor()

            # Count the products in the database
            cursor.execute("SELECT COUNT(*) FROM products")
            total_count = cursor.fetchone()[0]

            # Fetch the most recently updated products
            cursor.execute("SELECT name, url FROM products ORDER BY updated_at DESC LIMIT 10")
            recent_products = cursor.fetchall()

            conn.close()

            logger.info("=== Scraping summary ===")
            logger.info(f"Scraped successfully: {success_count} products")
            logger.info(f"Skipped duplicates: {skip_count} links")
            logger.info(f"Failed: {error_count} links")
            logger.info(f"Total products in the database: {total_count}")

            if recent_products:
                logger.info("Most recently scraped products:")
                for name, url in recent_products:
                    logger.info(f"  - {name}: {url}")
            else:
                logger.info("No product records in the database yet")

        except Exception as e:
            logger.error(f"Failed to show the scraping summary: {e}")

    async def run_full_workflow_async(self, max_products=None, analyze_only=False):
        """Run the full workflow asynchronously: scraping, then analysis."""
        logger.info("=== Starting the integrated product system workflow ===")

        # Initialize the database
        self.init_database()

        if not analyze_only:
            # Step 1: scrape the data
            logger.info("Step 1: scraping ProductHunt data...")
            await self.run_scraping()
        else:
            logger.info("Skipping the scraping step, going straight to analysis")

        # Step 2: AI analysis
        logger.info("Step 2: running AI analysis on the product data...")
        self.analyze_products(max_products)

        logger.success("=== Integrated product system workflow finished ===")

    def run_full_workflow(self, max_products=None, analyze_only=False):
        """Run the full workflow, scraping then analysis (synchronous entry point)."""
        # asyncio.run creates a fresh event loop, runs the coroutine and closes the loop
        asyncio.run(self.run_full_workflow_async(max_products, analyze_only))


def parse_arguments():
    """Parse the command-line arguments."""
    parser = argparse.ArgumentParser(description="Full-featured product scraping and analysis system")
    parser.add_argument("--tophub-db", help="path to the tophub database", default=None)
    parser.add_argument("--product-db", help="path to the product database", default=None)
    parser.add_argument("--debug-port", type=int, help="Chrome remote-debugging port", default=9222)
    parser.add_argument("--limit", type=int, help="maximum number of links to scrape", default=0)
    parser.add_argument("--no-skip-duplicates", action="store_true", help="do not skip duplicate URLs")
    parser.add_argument("--urls", nargs="+", help="explicit list of URLs to scrape")
    parser.add_argument("--log-file", help="path to the log file", default="integrated_product_system.log")
    parser.add_argument("--max-products", type=int, help="maximum number of products to analyze", default=None)
    parser.add_argument("--analyze-only", action="store_true", help="only run the analysis, do not scrape")

    return parser.parse_args()


async def main():
    """Entry point."""
    args = parse_arguments()

    # Send log output to a file as well
    logger.add(args.log_file, level="INFO", rotation="10 MB")

    # Create the system instance
    system = IntegratedProductSystem(
        tophub_db_path=args.tophub_db,
        product_db_path=args.product_db,
        debug_port=args.debug_port,
        limit=args.limit,
        skip_duplicates=not args.no_skip_duplicates,
        api_key=""  # Ollama does not need an API key
    )

    if args.urls:
        # URLs were given explicitly. run_scraping no longer creates the tables
        # itself, so initialize the database first; otherwise a fresh
        # products.db has no products table to check or write to.
        system.init_database()
        await system.run_scraping(urls=args.urls)
        # Then run the analysis
        system.analyze_products(max_products=args.max_products)
    else:
        # Run the full workflow
        await system.run_full_workflow_async(max_products=args.max_products, analyze_only=args.analyze_only)


if __name__ == "__main__":
    # Run the asynchronous entry point
    asyncio.run(main())
@@ -1,353 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
# -*- coding: utf-8 -*-
|
||||
"""
|
||||
全功能ProductHunt数据抓取器
|
||||
使用playwright-get-data.py中的专业功能绕过Cloudflare挑战
|
||||
"""
|
||||
|
||||
import sqlite3
|
||||
import asyncio
|
||||
import os
|
||||
import argparse
|
||||
from datetime import datetime
|
||||
from loguru import logger
|
||||
from tqdm import tqdm
|
||||
import sys
|
||||
|
||||
# 导入playwright-get-data.py中的功能
|
||||
import importlib.util
|
||||
|
||||
# 动态导入playwright-get-data.py
|
||||
playwright_data_path = os.path.join(os.path.dirname(__file__), "playwright-get-data.py")
|
||||
spec = importlib.util.spec_from_file_location("playwright_get_data", playwright_data_path)
|
||||
playwright_get_data = importlib.util.module_from_spec(spec)
|
||||
spec.loader.exec_module(playwright_get_data)
|
||||
ProductHuntScraper = playwright_get_data.ProductHuntScraper
|
||||
|
||||
# 配置日志
|
||||
logger.remove()
|
||||
logger.add(sys.stderr, level="INFO", format="<green>{time:YYYY-MM-DD HH:mm:ss}</green> | <level>{level: <8}</level> | <cyan>{name}</cyan>:<cyan>{function}</cyan>:<cyan>{line}</cyan> - <level>{message}</level>")
|
||||
|
||||
class ProductHuntScraperFull:
|
||||
"""全功能ProductHunt数据抓取器"""
|
||||
|
||||
def __init__(self, tophub_db_path=None, product_db_path=None, debug_port=9222, limit=0, skip_duplicates=True):
|
||||
"""
|
||||
初始化抓取器
|
||||
|
||||
Args:
|
||||
tophub_db_path: tophub数据库路径
|
||||
product_db_path: 产品数据库路径
|
||||
debug_port: Chrome调试端口
|
||||
limit: 抓取链接数量限制
|
||||
skip_duplicates: 是否跳过已存在的URL
|
||||
"""
|
||||
if tophub_db_path:
|
||||
self.tophub_db_path = tophub_db_path
|
||||
else:
|
||||
self.tophub_db_path = os.path.join(os.path.dirname(os.path.dirname(__file__)), "tophub_data.db")
|
||||
|
||||
if product_db_path:
|
||||
self.product_db_path = product_db_path
|
||||
else:
|
||||
self.product_db_path = os.path.join(os.path.dirname(__file__), "products.db")
|
||||
|
||||
self.debug_port = debug_port
|
||||
self.limit = limit
|
||||
self.skip_duplicates = skip_duplicates
|
||||
self.product_urls = []
|
||||
|
||||
def query_producthunt_urls(self, limit=None):
|
||||
"""查询包含producthunt.com的链接"""
|
||||
if limit is None:
|
||||
limit = self.limit
|
||||
|
||||
logger.info(f"正在查询tophub_data.db数据库,限制: {limit}条")
|
||||
|
||||
try:
|
||||
conn = sqlite3.connect(self.tophub_db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
# 查询包含producthunt.com的链接(去掉LIMIT限制)
|
||||
cursor.execute("SELECT url FROM articles WHERE url LIKE '%producthunt.com%'")
|
||||
|
||||
urls = [row[0] for row in cursor.fetchall()]
|
||||
|
||||
conn.close()
|
||||
|
||||
logger.success(f"找到 {len(urls)} 个包含producthunt.com的链接")
|
||||
return urls
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"查询数据库失败: {e}")
|
||||
return []
|
||||
|
||||
def init_product_database(self):
|
||||
"""初始化产品数据库"""
|
||||
logger.info("正在初始化产品数据库...")
|
||||
|
||||
try:
|
||||
conn = sqlite3.connect(self.product_db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
# 创建产品信息表
|
||||
cursor.execute('''
|
||||
CREATE TABLE IF NOT EXISTS products (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
url TEXT NOT NULL UNIQUE,
|
||||
name TEXT,
|
||||
introduction TEXT,
|
||||
user_count TEXT,
|
||||
maker_link TEXT,
|
||||
maker_statement TEXT,
|
||||
created_at TEXT NOT NULL,
|
||||
updated_at TEXT NOT NULL
|
||||
)
|
||||
''')
|
||||
|
||||
conn.commit()
|
||||
conn.close()
|
||||
logger.success("产品数据库初始化完成")
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"初始化数据库失败: {e}")
|
||||
|
||||
def check_duplicate(self, url):
|
||||
"""检查URL是否已存在"""
|
||||
try:
|
||||
conn = sqlite3.connect(self.product_db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
cursor.execute("SELECT COUNT(*) FROM products WHERE url = ?", (url,))
|
||||
count = cursor.fetchone()[0]
|
||||
|
||||
conn.close()
|
||||
return count > 0
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"检查重复失败: {e}")
|
||||
return False
|
||||
|
||||
def save_product_info(self, product_info):
|
||||
"""保存产品信息到数据库"""
|
||||
try:
|
||||
conn = sqlite3.connect(self.product_db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
current_time = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
|
||||
|
||||
# 检查是否已存在
|
||||
cursor.execute("SELECT id FROM products WHERE url = ?", (product_info['url'],))
|
||||
existing = cursor.fetchone()
|
||||
|
||||
if existing:
|
||||
# 更新现有记录
|
||||
cursor.execute('''
|
||||
UPDATE products SET
|
||||
name = ?, introduction = ?, user_count = ?,
|
||||
maker_link = ?, maker_statement = ?, updated_at = ?
|
||||
WHERE url = ?
|
||||
''', (
|
||||
product_info.get('name'),
|
||||
product_info.get('introduction'),
|
||||
product_info.get('user_count'),
|
||||
product_info.get('maker_link'),
|
||||
product_info.get('maker_statement'),
|
||||
current_time,
|
||||
product_info['url']
|
||||
))
|
||||
logger.info(f"更新产品信息: {product_info.get('name', '未知')}")
|
||||
else:
|
||||
# 插入新记录
|
||||
cursor.execute('''
|
||||
INSERT INTO products
|
||||
(url, name, introduction, user_count, maker_link, maker_statement, created_at, updated_at)
|
||||
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
|
||||
''', (
|
||||
product_info['url'],
|
||||
product_info.get('name'),
|
||||
product_info.get('introduction'),
|
||||
product_info.get('user_count'),
|
||||
product_info.get('maker_link'),
|
||||
product_info.get('maker_statement'),
|
||||
current_time,
|
||||
current_time
|
||||
))
|
||||
logger.info(f"新增产品信息: {product_info.get('name', '未知')}")
|
||||
|
||||
conn.commit()
|
||||
conn.close()
|
||||
return True
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"保存产品信息失败: {e}")
|
||||
return False
|
||||
|
||||
async def scrape_product_info(self, url):
|
||||
"""使用playwright-get-data.py中的专业功能抓取产品信息"""
|
||||
try:
|
||||
logger.info(f"开始抓取: {url}")
|
||||
|
||||
# 创建ProductHuntScraper实例
|
||||
scraper = ProductHuntScraper(debug_port=self.debug_port)
|
||||
|
||||
# 连接到已运行的Chrome实例
|
||||
connected = await scraper.connect_to_existing_chrome()
|
||||
if not connected:
|
||||
logger.error("连接Chrome失败,跳过此URL")
|
||||
return None
|
||||
|
||||
# 导航到ProductHunt页面
|
||||
navigated = await scraper.navigate_to_producthunt(url)
|
||||
if not navigated:
|
||||
logger.error("导航到页面失败,跳过此URL")
|
||||
await scraper.close()
|
||||
return None
|
||||
|
||||
# 提取产品信息
|
||||
product_info = await scraper.extract_product_info()
|
||||
if product_info:
|
||||
product_info['url'] = url
|
||||
logger.success(f"成功提取产品信息: {product_info.get('name', '未知')}")
|
||||
else:
|
||||
logger.error("提取产品信息失败")
|
||||
|
||||
# 关闭连接
|
||||
await scraper.close()
|
||||
|
||||
return product_info
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"抓取产品信息失败: {e}")
|
||||
return None
|
||||
|
||||
async def run_scraping(self, urls=None):
|
||||
"""运行抓取任务"""
|
||||
logger.info("=== 开始ProductHunt数据抓取 ===")
|
||||
|
||||
# 初始化数据库
|
||||
self.init_product_database()
|
||||
|
||||
# 获取要抓取的URL列表
|
||||
if urls is None:
|
||||
self.product_urls = self.query_producthunt_urls()
|
||||
else:
|
||||
self.product_urls = urls
|
||||
|
||||
if not self.product_urls:
|
||||
logger.error("未找到要抓取的ProductHunt链接")
|
||||
return False
|
||||
|
||||
logger.info(f"找到 {len(self.product_urls)} 个ProductHunt链接")
|
||||
|
||||
# 统计抓取结果
|
||||
success_count = 0
|
||||
skip_count = 0
|
||||
error_count = 0
|
||||
|
||||
# 使用进度条显示处理进度
|
||||
with tqdm(total=len(self.product_urls), desc="抓取ProductHunt链接") as pbar:
|
||||
for url in self.product_urls:
|
||||
logger.info(f"处理URL: {url}")
|
||||
|
||||
# 检查是否已存在
|
||||
if self.skip_duplicates and self.check_duplicate(url):
|
||||
logger.info(f"URL已存在,跳过: {url}")
|
||||
skip_count += 1
|
||||
pbar.update(1)
|
||||
continue
|
||||
|
||||
# 抓取产品信息
|
||||
product_info = await self.scrape_product_info(url)
|
||||
|
||||
if product_info:
|
||||
# 保存到数据库
|
||||
success = self.save_product_info(product_info)
|
||||
if success:
|
||||
logger.success(f"成功保存产品信息: {product_info.get('name', '未知')}")
|
||||
success_count += 1
|
||||
else:
|
||||
logger.error(f"保存产品信息失败: {url}")
|
||||
error_count += 1
|
||||
else:
|
||||
logger.error(f"抓取产品信息失败: {url}")
|
||||
error_count += 1
|
||||
|
||||
pbar.update(1)
|
||||
|
||||
# 显示抓取结果统计
|
||||
self.show_scraping_results(success_count, skip_count, error_count)
|
||||
|
||||
logger.success("=== ProductHunt数据抓取完成 ===")
|
||||
return True
|
||||
|
||||
def show_scraping_results(self, success_count, skip_count, error_count):
|
||||
"""显示抓取结果统计"""
|
||||
try:
|
||||
conn = sqlite3.connect(self.product_db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
# 统计数据库中的产品数量
|
||||
cursor.execute("SELECT COUNT(*) FROM products")
|
||||
total_count = cursor.fetchone()[0]
|
||||
|
||||
# 获取最新抓取的产品信息
|
||||
cursor.execute("SELECT name, url FROM products ORDER BY updated_at DESC LIMIT 10")
|
||||
recent_products = cursor.fetchall()
|
||||
|
||||
conn.close()
|
||||
|
||||
logger.info("=== 抓取结果统计 ===")
|
||||
logger.info(f"成功抓取: {success_count} 个产品")
|
||||
logger.info(f"跳过重复: {skip_count} 个链接")
|
||||
logger.info(f"抓取失败: {error_count} 个链接")
|
||||
logger.info(f"数据库中的产品总数: {total_count}")
|
||||
|
||||
if recent_products:
|
||||
logger.info("最新抓取的产品:")
|
||||
for name, url in recent_products:
|
||||
logger.info(f" - {name}: {url}")
|
||||
else:
|
||||
logger.info("数据库中暂无产品记录")
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"显示抓取结果失败: {e}")
|
||||
|
||||
def parse_arguments():
|
||||
"""解析命令行参数"""
|
||||
parser = argparse.ArgumentParser(description="全功能ProductHunt数据抓取器")
|
||||
parser.add_argument("--tophub-db", help="tophub数据库路径", default=None)
|
||||
parser.add_argument("--product-db", help="产品数据库路径", default=None)
|
||||
parser.add_argument("--debug-port", type=int, help="Chrome调试端口", default=9222)
|
||||
parser.add_argument("--limit", type=int, help="抓取链接数量限制", default=0)
|
||||
parser.add_argument("--no-skip-duplicates", action="store_true", help="不跳过重复URL")
|
||||
parser.add_argument("--urls", nargs="+", help="指定要抓取的URL列表")
|
||||
parser.add_argument("--log-file", help="日志文件路径", default="producthunt_scraper.log")
|
||||
|
||||
return parser.parse_args()
|
||||
|
||||
async def main():
|
||||
"""主函数"""
|
||||
args = parse_arguments()
|
||||
|
||||
# 配置日志文件输出
|
||||
logger.add(args.log_file, level="INFO", rotation="10 MB")
|
||||
|
||||
# 创建抓取器实例
|
||||
scraper = ProductHuntScraperFull(
|
||||
tophub_db_path=args.tophub_db,
|
||||
product_db_path=args.product_db,
|
||||
debug_port=args.debug_port,
|
||||
limit=args.limit,
|
||||
skip_duplicates=not args.no_skip_duplicates
|
||||
)
|
||||
|
||||
# 运行抓取任务
|
||||
if args.urls:
|
||||
await scraper.run_scraping(urls=args.urls)
|
||||
else:
|
||||
await scraper.run_scraping()
|
||||
|
||||
if __name__ == "__main__":
|
||||
# 运行异步主函数
|
||||
asyncio.run(main())
|
||||
File diff suppressed because it is too large
@@ -1,342 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
# -*- coding: utf-8 -*-
|
||||
"""
|
||||
产品AI分析脚本
|
||||
读取SQLite数据库中的产品信息,调用Ollama AI API进行分析,并将结果存储到新表中
|
||||
"""
|
||||
|
||||
import sqlite3
|
||||
import os
|
||||
import time
|
||||
from typing import List, Tuple, Optional
|
||||
from loguru import logger
|
||||
|
||||
# 智谱AI API相关
|
||||
import requests
|
||||
import json
|
||||
|
||||
class ProductAIAnalyzer:
|
||||
"""产品AI分析器"""
|
||||
|
||||
def __init__(self, api_key: str = "", db_path: str = "products.db"):
|
||||
"""
|
||||
初始化分析器
|
||||
|
||||
Args:
|
||||
api_key: API密钥(Ollama不需要,保留参数以保持兼容性)
|
||||
db_path: 数据库文件路径
|
||||
"""
|
||||
self.api_key = api_key
|
||||
self.db_path = db_path
|
||||
self.api_url = "http://localhost:11434/api/generate"
|
||||
|
||||
# 检查数据库文件是否存在,支持相对路径和绝对路径
|
||||
if not os.path.exists(db_path):
|
||||
# 尝试在当前目录下查找
|
||||
current_dir_db = os.path.join(os.path.dirname(__file__), db_path)
|
||||
if os.path.exists(current_dir_db):
|
||||
self.db_path = current_dir_db
|
||||
logger.info(f"使用当前目录下的数据库文件: {current_dir_db}")
|
||||
else:
|
||||
raise FileNotFoundError(f"数据库文件不存在: {db_path} 和 {current_dir_db}")
|
||||
|
||||
logger.info(f"初始化产品AI分析器,数据库: {self.db_path}")
|
||||
|
||||
def connect_to_database(self) -> sqlite3.Connection:
|
||||
"""连接到SQLite数据库"""
|
||||
try:
|
||||
conn = sqlite3.connect(self.db_path)
|
||||
logger.success(f"成功连接到数据库: {self.db_path}")
|
||||
return conn
|
||||
except Exception as e:
|
||||
logger.error(f"连接数据库失败: {e}")
|
||||
raise
|
||||
|
||||
    def get_product_data(self, conn: sqlite3.Connection) -> List[Tuple]:
        """
        Load product data from the database.

        Args:
            conn: Database connection.

        Returns:
            List of products, each as (id, name, introduction).
        """
        try:
            cursor = conn.cursor()

            # Select rows from products that have both a name and an introduction
            cursor.execute("""
                SELECT id, name, introduction
                FROM products
                WHERE name IS NOT NULL AND introduction IS NOT NULL
                AND name != '' AND introduction != ''
            """)

            products = cursor.fetchall()
            logger.info(f"Loaded {len(products)} products from the database")

            # Log the first few products as samples
            # (product_id rather than id, to avoid shadowing the builtin)
            for i, (product_id, name, intro) in enumerate(products[:3], 1):
                logger.info(f"Sample product {i}: ID={product_id}, name='{name}', introduction='{intro[:50]}...'")

            return products

        except Exception as e:
            logger.error(f"Failed to load product data: {e}")
            raise

    def call_ollama_ai_api(self, name: str, introduction: str) -> Optional[str]:
        """
        Call the Ollama API to analyze one product.

        Args:
            name: Product name.
            introduction: Product introduction.

        Returns:
            The API response text, or None on failure.
        """
        try:
            # Build the request body in the Ollama API format.
            # The prompt is kept in Chinese on purpose: it asks the model to translate
            # the introduction into Chinese, to judge whether one person with AI
            # assistance could build the product, and to reply as
            # "product name/product introduction/development difficulty".
            prompt = f"这个是【{name}】,简介内容是【{introduction}】。请把产品的简介翻译成中文,并返回假设一个人加上AI辅助能否开发这个产品,请详细回答。返回的内容是产品名称/产品简介/开发难度。返回的例子一:notion/这个是笔记产品等等/一个人开发难度较高"

            data = {
                "model": "qwen3:8b",
                "prompt": prompt,
                "stream": False
            }

            headers = {
                "Content-Type": "application/json"
            }

            logger.info(f"Calling the Ollama API for product: {name}")

            response = requests.post(
                self.api_url,
                headers=headers,
                # Encode explicitly: the body contains non-ASCII text, and a str body
                # may otherwise be encoded with a charset that cannot represent it.
                data=json.dumps(data, ensure_ascii=False).encode("utf-8"),
                timeout=60
            )

            if response.status_code == 200:
                result = response.json()
                content = result.get("response", "")
                logger.success(f"API call succeeded: {name}")
                return content
            else:
                logger.error(f"API call failed: {response.status_code}, {response.text}")
                return None

        except Exception as e:
            logger.error(f"Error while calling the Ollama API: {e}")
            return None

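Review note: the request body built in `call_ollama_ai_api` can be checked without a running Ollama server. The sketch below is a minimal illustration, not code from this commit; `build_generate_payload` and `encode_body` are hypothetical helpers, and the English prompt is a placeholder for the real Chinese prompt.

```python
import json


def build_generate_payload(name: str, introduction: str, model: str = "qwen3:8b") -> dict:
    """Return a non-streaming request body in the shape the method above sends."""
    # Placeholder prompt (an assumption); the real script uses a longer Chinese prompt.
    prompt = f"Product: {name}\nIntroduction: {introduction}"
    return {"model": model, "prompt": prompt, "stream": False}


def encode_body(payload: dict) -> bytes:
    """Serialize to UTF-8 bytes so non-ASCII text does not depend on client defaults."""
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


body = encode_body(build_generate_payload("Raycast", "一个启动器"))
decoded = json.loads(body.decode("utf-8"))
print(decoded["model"], decoded["stream"])
```

Passing bytes rather than a `str` to `requests.post(data=...)` keeps the encoding decision in one visible place.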
    def parse_ai_response(self, response: str) -> Tuple[str, str, str]:
        """
        Parse the AI response.

        Args:
            response: AI response text.

        Returns:
            (product name, product introduction, development difficulty)
        """
        try:
            # Split on '/' at most twice, so that slashes inside the
            # difficulty text do not truncate it
            parts = response.split('/', 2)

            if len(parts) >= 3:
                product_name = parts[0].strip()
                product_intro = parts[1].strip()
                difficulty = parts[2].strip()

                logger.info(f"Parsed: name='{product_name}', introduction='{product_intro[:30]}...', difficulty='{difficulty}'")
                return product_name, product_intro, difficulty
            else:
                logger.warning(f"Response does not match the expected format: {response}")
                # Fall back to returning the raw text as the introduction
                return "", response, ""

        except Exception as e:
            logger.error(f"Failed to parse the AI response: {e}")
            return "", response, ""

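Review note: the `name/introduction/difficulty` format is fragile because the model is free to emit extra text. The sketch below shows one way the parser could be hardened. It assumes the model may wrap its reasoning in `<think>` tags, which is an assumption about the model's output and not something this diff shows; `parse_analysis` is a hypothetical standalone function.

```python
import re
from typing import Tuple

# Assumption: reasoning, if present, is wrapped in <think>...</think>.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def parse_analysis(response: str) -> Tuple[str, str, str]:
    """Split 'name/intro/difficulty' into three fields, keeping later slashes."""
    cleaned = THINK_BLOCK.sub("", response).strip()
    parts = [part.strip() for part in cleaned.split("/", 2)]
    if len(parts) == 3 and all(parts):
        return parts[0], parts[1], parts[2]
    # Unexpected shape: keep the raw text in the introduction slot.
    return "", response, ""


print(parse_analysis("notion/笔记产品/难度较高"))
```

A slash inside the introduction would still shift the fields; a structured reply format such as JSON would avoid that, at the cost of changing the prompt.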
    def create_analysis_table(self, conn: sqlite3.Connection):
        """Create the analysis results table."""
        try:
            cursor = conn.cursor()

            # Create the analysis results table
            cursor.execute("""
                CREATE TABLE IF NOT EXISTS product_analysis (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    original_id INTEGER,
                    original_name TEXT,
                    product_name TEXT,
                    product_intro TEXT,
                    development_difficulty TEXT,
                    ai_response TEXT,
                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                    FOREIGN KEY (original_id) REFERENCES products (id)
                )
            """)

            conn.commit()
            logger.success("Analysis results table created")

        except Exception as e:
            logger.error(f"Failed to create the analysis results table: {e}")
            raise

    def save_analysis_result(self, conn: sqlite3.Connection,
                             original_id: int, original_name: str,
                             product_name: str, product_intro: str,
                             difficulty: str, ai_response: str):
        """Save one analysis result to the database."""
        try:
            cursor = conn.cursor()

            cursor.execute("""
                INSERT INTO product_analysis
                (original_id, original_name, product_name, product_intro, development_difficulty, ai_response)
                VALUES (?, ?, ?, ?, ?, ?)
            """, (original_id, original_name, product_name, product_intro, difficulty, ai_response))

            conn.commit()
            logger.success(f"Saved analysis result: {product_name}")

        except Exception as e:
            logger.error(f"Failed to save the analysis result: {e}")
            raise

    def check_product_exists(self, conn: sqlite3.Connection, original_name: str) -> bool:
        """
        Check whether a product already has a row in the analysis results table.

        Args:
            conn: Database connection.
            original_name: Original product name.

        Returns:
            True if the product already exists, otherwise False.
        """
        try:
            cursor = conn.cursor()
            cursor.execute("""
                SELECT COUNT(*) FROM product_analysis
                WHERE original_name = ?
            """, (original_name,))

            count = cursor.fetchone()[0]
            exists = count > 0

            if exists:
                logger.info(f"Product '{original_name}' already exists, skipping analysis")

            return exists

        except Exception as e:
            logger.error(f"Failed to check whether the product exists: {e}")
            return False

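Review note: `check_product_exists` matches on `original_name`, so two different products that share a name are treated as one. Since `product_analysis.original_id` already references `products.id`, the pending rows could instead be selected in a single query. The sketch below is an illustration under that assumption; `pending_products` is a hypothetical helper and the in-memory tables are trimmed to the columns the query needs.

```python
import sqlite3


def pending_products(conn: sqlite3.Connection):
    """Return (id, name, introduction) rows that have no product_analysis row yet."""
    return conn.execute(
        """
        SELECT p.id, p.name, p.introduction
        FROM products AS p
        LEFT JOIN product_analysis AS a ON a.original_id = p.id
        WHERE a.id IS NULL
          AND p.name IS NOT NULL AND p.name != ''
          AND p.introduction IS NOT NULL AND p.introduction != ''
        ORDER BY p.id
        """
    ).fetchall()


# Trimmed demo schema (assumption): only the columns used by the query.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, introduction TEXT);
    CREATE TABLE product_analysis (id INTEGER PRIMARY KEY, original_id INTEGER);
    INSERT INTO products (id, name, introduction) VALUES
        (1, 'Raycast', 'Launcher'),
        (2, 'Raycast', 'Another product, same name'),
        (3, 'Empty', '');
    INSERT INTO product_analysis (original_id) VALUES (1);
    """
)
rows = pending_products(conn)
print(rows)
```

With this query the per-product existence check becomes unnecessary, and the resume behaviour no longer depends on product names being unique.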
    def analyze_products(self, max_products: Optional[int] = None):
        """
        Analyze product data.

        Args:
            max_products: Maximum number of products to analyze; None means all.
        """
        if max_products is None:
            logger.info("Starting analysis of all products")
        else:
            logger.info(f"Starting analysis, at most {max_products} products")

        conn = None
        try:
            # Connect to the database
            conn = self.connect_to_database()

            # Create the analysis results table
            self.create_analysis_table(conn)

            # Load the product data
            products = self.get_product_data(conn)

            if not products:
                logger.warning("No products available for analysis")
                return

            # Cap the number of products to analyze
            if max_products is not None:
                products_to_analyze = products[:max_products]
            else:
                products_to_analyze = products

            logger.info(f"About to analyze {len(products_to_analyze)} products")

            # Analyze the products one at a time
            success_count = 0
            skip_count = 0
            for i, (original_id, name, introduction) in enumerate(products_to_analyze, 1):
                logger.info(f"\nProgress: {i}/{len(products_to_analyze)} - {name}")

                # Skip products that already have a result
                if self.check_product_exists(conn, name):
                    skip_count += 1
                    logger.info(f"Skipped existing product, progress: {i}/{len(products_to_analyze)}")
                    continue

                # Report that the API request is being sent
                logger.info(f"Submitting API request... progress: {i}/{len(products_to_analyze)}")

                # Call the AI API
                ai_response = self.call_ollama_ai_api(name, introduction)

                if ai_response:
                    # Report that the response is being processed
                    logger.info("API call succeeded, processing the response...")

                    # Parse the response
                    product_name, product_intro, difficulty = self.parse_ai_response(ai_response)

                    # Save the result
                    self.save_analysis_result(conn, original_id, name,
                                              product_name, product_intro, difficulty, ai_response)
                    success_count += 1

                    # Report completion
                    logger.success(f"Product '{name}' analyzed, progress: {i}/{len(products_to_analyze)}")
                else:
                    logger.error(f"Analysis failed: {name}")

                # Wait 2 seconds after each product to avoid overloading the API
                logger.info("Processing done, waiting 2 seconds before continuing...")
                time.sleep(2)

            logger.success(f"Analysis finished! Analyzed {success_count} products, skipped {skip_count} existing products")

        except Exception as e:
            logger.error(f"Error during analysis: {e}")
        finally:
            if conn:
                conn.close()
                logger.info("Database connection closed")

def main():
    """Entry point."""
    # Configure logging
    logger.add("product_ai_analysis.log", rotation="10 MB", level="INFO")

    # Ollama does not need an API key
    api_key = ""

    # Create the analyzer
    analyzer = ProductAIAnalyzer(api_key)

    # Start the analysis (all products by default)
    analyzer.analyze_products(max_products=None)


if __name__ == "__main__":
    main()
@@ -1,7 +0,0 @@
{
  "name": "Raycast",
  "introduction": "A collection of powerful productivity tools all within an extendable launcher. Fast, ergonomic and reliable.",
  "user_count": "17K followers",
  "maker_link": "https://www.producthunt.com/products/raycast/launches/product-hunt-for-raycast",
  "maker_statement": "Raycast for Windows"
}
@@ -1,407 +0,0 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
ProductHunt scraper.

Queries tophub_data.db for links that contain producthunt.com, scrapes the
product details with Playwright, and saves them to product.db.
"""

import sqlite3
import asyncio
import os
from datetime import datetime
from loguru import logger
from tqdm import tqdm
import sys

# Configure logging
logger.remove()
logger.add(sys.stderr, level="INFO", format="<green>{time:YYYY-MM-DD HH:mm:ss}</green> | <level>{level: <8}</level> | <cyan>{name}</cyan>:<cyan>{function}</cyan>:<cyan>{line}</cyan> - <level>{message}</level>")

class ProductHuntScraper:
    """ProductHunt scraper."""

    def __init__(self):
        # tophub_data.db lives one directory above this script; product.db sits next to it
        self.tophub_db_path = os.path.join(os.path.dirname(os.path.dirname(__file__)), "tophub_data.db")
        self.product_db_path = os.path.join(os.path.dirname(__file__), "product.db")
        self.product_urls = []

    def query_producthunt_urls(self):
        """Query the links that contain producthunt.com."""
        logger.info("Querying tophub_data.db...")

        try:
            conn = sqlite3.connect(self.tophub_db_path)
            cursor = conn.cursor()

            # Select the links that contain producthunt.com
            cursor.execute("SELECT url FROM articles WHERE url LIKE '%producthunt.com%'")
            urls = [row[0] for row in cursor.fetchall()]

            conn.close()

            logger.success(f"Found {len(urls)} links that contain producthunt.com")
            return urls

        except Exception as e:
            logger.error(f"Database query failed: {e}")
            return []

    def init_product_database(self):
        """Initialize product.db."""
        logger.info("Initializing product.db...")

        try:
            conn = sqlite3.connect(self.product_db_path)
            cursor = conn.cursor()

            # Create the products table
            cursor.execute('''
                CREATE TABLE IF NOT EXISTS products (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    url TEXT NOT NULL UNIQUE,
                    name TEXT,
                    introduction TEXT,
                    user_count TEXT,
                    maker_link TEXT,
                    maker_statement TEXT,
                    created_at TEXT NOT NULL,
                    updated_at TEXT NOT NULL
                )
            ''')

            conn.commit()
            conn.close()
            logger.success("product.db initialized")

        except Exception as e:
            logger.error(f"Failed to initialize the database: {e}")

    def check_duplicate(self, url):
        """Check whether the URL is already stored."""
        try:
            conn = sqlite3.connect(self.product_db_path)
            cursor = conn.cursor()

            cursor.execute("SELECT COUNT(*) FROM products WHERE url = ?", (url,))
            count = cursor.fetchone()[0]

            conn.close()
            return count > 0

        except Exception as e:
            logger.error(f"Duplicate check failed: {e}")
            return False

    def save_product_info(self, product_info):
        """Save product details to the database."""
        try:
            conn = sqlite3.connect(self.product_db_path)
            cursor = conn.cursor()

            current_time = datetime.now().strftime('%Y-%m-%d %H:%M:%S')

            # Check whether the URL is already stored
            cursor.execute("SELECT id FROM products WHERE url = ?", (product_info['url'],))
            existing = cursor.fetchone()

            if existing:
                # Update the existing row
                cursor.execute('''
                    UPDATE products SET
                    name = ?, introduction = ?, user_count = ?,
                    maker_link = ?, maker_statement = ?, updated_at = ?
                    WHERE url = ?
                ''', (
                    product_info.get('name'),
                    product_info.get('introduction'),
                    product_info.get('user_count'),
                    product_info.get('maker_link'),
                    product_info.get('maker_statement'),
                    current_time,
                    product_info['url']
                ))
                logger.info(f"Updated product: {product_info.get('name', 'unknown')}")
            else:
                # Insert a new row
                cursor.execute('''
                    INSERT INTO products
                    (url, name, introduction, user_count, maker_link, maker_statement, created_at, updated_at)
                    VALUES (?, ?, ?, ?, ?, ?, ?, ?)
                ''', (
                    product_info['url'],
                    product_info.get('name'),
                    product_info.get('introduction'),
                    product_info.get('user_count'),
                    product_info.get('maker_link'),
                    product_info.get('maker_statement'),
                    current_time,
                    current_time
                ))
                logger.info(f"Added product: {product_info.get('name', 'unknown')}")

            conn.commit()
            conn.close()
            return True

        except Exception as e:
            logger.error(f"Failed to save product details: {e}")
            return False

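Review note: the select-then-update-or-insert logic in `save_product_info` could be collapsed into one statement, because `products.url` is declared `UNIQUE`. The sketch below assumes the bundled SQLite is version 3.24 or newer, which is when the `ON CONFLICT ... DO UPDATE` clause was added. `upsert_product` is a hypothetical helper, and the table is trimmed to a few columns for brevity.

```python
import sqlite3
from datetime import datetime

# Trimmed to three data columns (assumption); the real table has more.
UPSERT_SQL = """
INSERT INTO products (url, name, introduction, created_at, updated_at)
VALUES (:url, :name, :introduction, :now, :now)
ON CONFLICT(url) DO UPDATE SET
    name = excluded.name,
    introduction = excluded.introduction,
    updated_at = excluded.updated_at
"""


def upsert_product(conn: sqlite3.Connection, info: dict) -> None:
    """Insert the product, or refresh the existing row that has the same URL."""
    now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    conn.execute(UPSERT_SQL, {
        "url": info["url"],
        "name": info.get("name"),
        "introduction": info.get("introduction"),
        "now": now,
    })
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY AUTOINCREMENT, url TEXT NOT NULL UNIQUE, "
    "name TEXT, introduction TEXT, created_at TEXT NOT NULL, updated_at TEXT NOT NULL)"
)
url = "https://www.producthunt.com/products/raycast"
upsert_product(conn, {"url": url, "name": "Raycast"})
upsert_product(conn, {"url": url, "name": "Raycast", "introduction": "Launcher"})
rows = conn.execute("SELECT name, introduction FROM products").fetchall()
print(rows)
```

On conflict the `created_at` value of the existing row is left untouched, which matches the behaviour of the original `UPDATE` branch.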
    async def scrape_product_info(self, url):
        """Scrape product details with Playwright."""
        try:
            # Import Playwright lazily
            from playwright.async_api import async_playwright

            logger.info(f"Scraping: {url}")

            # Start Playwright and open a page
            playwright = await async_playwright().start()
            browser = await playwright.chromium.launch(headless=True)
            page = await browser.new_page()

            # Default timeout, raised to leave room for Cloudflare challenges
            page.set_default_timeout(120000)

            # Navigate to the page
            await page.goto(url, wait_until="domcontentloaded")

            # Detect a Cloudflare challenge page
            # ("请稍候" is the Chinese-localized challenge page title)
            page_title = await page.title()
            if "请稍候" in page_title or "Checking" in page_title or "Verifying" in page_title:
                logger.info("Cloudflare challenge page detected, waiting for verification...")

                # Wait for the Cloudflare challenge to finish
                try:
                    # Wait until the page title no longer looks like a challenge page
                    await page.wait_for_function(
                        """() => {
                            const title = document.title;
                            return !title.includes('请稍候') &&
                                   !title.includes('Checking') &&
                                   !title.includes('Verifying') &&
                                   title !== '请稍候…';
                        }""",
                        timeout=300000  # 5 minutes
                    )
                    logger.info("Cloudflare challenge passed")
                except Exception as e:
                    logger.warning(f"Timed out waiting for the Cloudflare challenge: {e}")

            # Give the page time to render
            await page.wait_for_timeout(3000)

            product_info = {'url': url}

            # Extract the product name, trying several XPath selectors
            try:
                name_selectors = [
                    "xpath=//h1",
                    "xpath=//h1[@data-test='product-name']",
                    "xpath=//h1[contains(@class, 'text')]",
                    "xpath=//title"
                ]

                for selector in name_selectors:
                    name_element = await page.query_selector(selector)
                    if name_element:
                        # text_content() may return None
                        name_text = ((await name_element.text_content()) or '').strip()
                        # Filter out page-title noise; note that this also rejects
                        # names of five characters or fewer
                        if name_text and 'Product Hunt' not in name_text and len(name_text) > 5:
                            product_info['name'] = name_text
                            logger.info(f"Extracted product name: {product_info['name']}")
                            break

                if 'name' not in product_info:
                    logger.warning("No valid product name element found")

            except Exception as e:
                logger.warning(f"Failed to extract the product name: {e}")

            # Extract the product introduction, trying several XPath selectors
            try:
                intro_selectors = [
                    "xpath=//*[@class='relative text-16 font-normal text-gray-700']//div",
                    "xpath=//p[contains(@class, 'description')]",
                    "xpath=//div[contains(@class, 'description')]",
                    "xpath=//meta[@name='description']"
                ]

                for selector in intro_selectors:
                    intro_element = await page.query_selector(selector)
                    if intro_element:
                        if '//meta' in selector:
                            # A <meta> element has no text; its value is in the content attribute
                            intro_text = await intro_element.get_attribute('content')
                        else:
                            intro_text = await intro_element.text_content()
                        intro_text = (intro_text or '').strip()
                        if intro_text:
                            product_info['introduction'] = intro_text
                            logger.info(f"Extracted product introduction: {product_info['introduction'][:100]}...")
                            break

                if 'introduction' not in product_info:
                    logger.warning("No product introduction element found")

            except Exception as e:
                logger.warning(f"Failed to extract the product introduction: {e}")

            # Extract the user count, trying several XPath selectors
            try:
                user_count_selectors = [
                    "xpath=//*[@class='flex flex-row gap-2']//div/div[2]/span/p",
                    "xpath=//span[contains(text(), 'users')]",
                    "xpath=//span[contains(text(), 'upvotes')]",
                    "xpath=//div[contains(@class, 'stats')]"
                ]

                for selector in user_count_selectors:
                    user_count_element = await page.query_selector(selector)
                    if user_count_element:
                        # text_content() may return None
                        user_count_text = ((await user_count_element.text_content()) or '').strip()
                        if user_count_text:
                            product_info['user_count'] = user_count_text
                            logger.info(f"Extracted user count: {product_info['user_count']}")
                            break

                if 'user_count' not in product_info:
                    logger.warning("No user count element found")

            except Exception as e:
                logger.warning(f"Failed to extract the user count: {e}")

            # Extract the maker link, trying several XPath selectors
            try:
                maker_link_selectors = [
                    "xpath=//span[contains(@class, 'absolute')]",
                    "xpath=//a[contains(@href, 'hunter')]",
                    "xpath=//a[contains(text(), 'hunter')]",
                    "xpath=//a[contains(@class, 'maker')]"
                ]

                for selector in maker_link_selectors:
                    maker_element = await page.query_selector(selector)
                    if maker_element:
                        # Reset per selector so a failed lookup cannot leave the name unbound
                        maker_link = None

                        # For a span, look up the enclosing <a> element
                        if 'span' in selector:
                            a_handle = await maker_element.evaluate_handle('(element) => element.closest("a")')
                            # as_element() returns None when closest() found no <a>
                            a_element = a_handle.as_element()
                            if a_element:
                                maker_link = await a_element.get_attribute('href')
                        else:
                            maker_link = await maker_element.get_attribute('href')

                        if maker_link and not maker_link.startswith('http'):
                            base_url = "https://www.producthunt.com"
                            if maker_link.startswith('/'):
                                maker_link = base_url + maker_link
                            else:
                                maker_link = base_url + '/' + maker_link

                        if maker_link:
                            product_info['maker_link'] = maker_link
                            logger.info(f"Extracted maker link: {maker_link}")
                            break

                if 'maker_link' not in product_info:
                    logger.warning("No maker link element found")

            except Exception as e:
                logger.warning(f"Failed to extract the maker link: {e}")

            # Extract the maker statement (simplified)
            try:
                if product_info.get('maker_link'):
                    # Open the maker link in a new page
                    new_page = await browser.new_page()
                    await new_page.goto(product_info['maker_link'], wait_until="domcontentloaded")
                    await new_page.wait_for_timeout(5000)

                    # Try several selectors for the statement text; the first one
                    # targets a single hard-coded comment id and only matches that page
                    statement_selectors = [
                        "xpath=//*[@id='comment-4597755']/div/div[2]/div/div/div",
                        "xpath=//div[contains(@class, 'comment')]",
                        "xpath=//p[contains(@class, 'comment')]",
                        "xpath=//article"
                    ]

                    for selector in statement_selectors:
                        comment_element = await new_page.query_selector(selector)
                        if comment_element:
                            # text_content() may return None
                            statement_text = ((await comment_element.text_content()) or '').strip()
                            if statement_text and len(statement_text) > 10:
                                product_info['maker_statement'] = statement_text
                                logger.info(f"Extracted maker statement: {product_info['maker_statement'][:100]}...")
                                break

                    await new_page.close()
                else:
                    logger.warning("No maker link, skipping the maker statement")
            except Exception as e:
                logger.warning(f"Failed to extract the maker statement: {e}")

            # Close the browser
            await browser.close()
            await playwright.stop()

            logger.success(f"Scrape finished: {product_info.get('name', 'unknown')}")
            return product_info

        except Exception as e:
            logger.error(f"Failed to scrape product details: {e}")
            return {'url': url}

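Review note: the manual prefixing of relative maker links in `scrape_product_info` could be replaced by the standard library's `urljoin`, which also leaves absolute links untouched. The sketch below is an illustration only; `absolute_link` is a hypothetical helper, and the base URL is the one already hard-coded in the method.

```python
from typing import Optional
from urllib.parse import urljoin

BASE_URL = "https://www.producthunt.com"


def absolute_link(href: Optional[str], base: str = BASE_URL) -> Optional[str]:
    """Resolve an href against the site root; return None for a missing href."""
    if not href:
        return None
    # The trailing slash makes relative paths resolve from the site root.
    return urljoin(base + "/", href)


print(absolute_link("/products/raycast"))
print(absolute_link("products/raycast"))
print(absolute_link("https://example.com/x"))
```

Returning `None` for an empty href keeps the caller's existing `if maker_link:` check meaningful.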
    async def process_urls(self):
        """Process every URL."""
        # Query the URLs
        self.product_urls = self.query_producthunt_urls()

        if not self.product_urls:
            logger.warning("No links that contain producthunt.com were found")
            return

        # Initialize the database
        self.init_product_database()

        logger.info(f"Processing {len(self.product_urls)} product links")

        # Show a progress bar
        with tqdm(total=len(self.product_urls), desc="Progress") as pbar:
            for url in self.product_urls:
                try:
                    # Skip URLs that are already stored
                    if self.check_duplicate(url):
                        logger.info(f"Skipping existing link: {url}")
                        pbar.update(1)
                        continue

                    # Scrape the product details
                    product_info = await self.scrape_product_info(url)

                    # Save to the database
                    if product_info:
                        self.save_product_info(product_info)

                    pbar.update(1)

                except Exception as e:
                    logger.error(f"Failed to process link {url}: {e}")
                    pbar.update(1)

    def run(self):
        """Run the scraper."""
        logger.info("Starting the ProductHunt scraping task")

        try:
            # Run the async task
            asyncio.run(self.process_urls())
            logger.success("Task finished")

        except Exception as e:
            logger.error(f"Program failed: {e}")


def main():
    """Entry point."""
    scraper = ProductHuntScraper()
    scraper.run()


if __name__ == "__main__":
    main()
Binary file not shown.
127
product/run_system.py
Normal file
@@ -0,0 +1,127 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Launcher script for the integrated product system.

Provides a simplified command-line interface.
"""

import argparse
import asyncio
import os
import sys
from loguru import logger

# Import the main system
from integrated_product_system import IntegratedProductSystem
from config import DATABASE_CONFIG, CHROME_CONFIG, AI_CONFIG, SCRAPING_CONFIG, LOGGING_CONFIG, ANALYSIS_CONFIG


def setup_logging(log_file=None, log_level="INFO"):
    """Configure logging."""
    if log_file is None:
        log_file = LOGGING_CONFIG['log_file']

    logger.remove()
    logger.add(sys.stderr, level=log_level, format=LOGGING_CONFIG['log_format'])
    logger.add(log_file, level=log_level, rotation=LOGGING_CONFIG['log_rotation'])

    logger.info("Logging initialized")


def print_system_info():
    """Log the system configuration."""
    logger.info("=== Integrated product scraping and analysis system ===")
    logger.info(f"Database path: {DATABASE_CONFIG['product_db_path']}")
    logger.info(f"Chrome debug port: {CHROME_CONFIG['debug_port']}")
    logger.info(f"AI model: {AI_CONFIG['model']}")
    logger.info(f"API URL: {AI_CONFIG['api_url']}")
    logger.info("=" * 40)


async def run_scraping_mode(args):
    """Run scraping only."""
    logger.info("Running in scraping mode...")

    system = IntegratedProductSystem(
        tophub_db_path=args.tophub_db or DATABASE_CONFIG['tophub_db_path'],
        product_db_path=args.product_db or DATABASE_CONFIG['product_db_path'],
        debug_port=args.debug_port or CHROME_CONFIG['debug_port'],
        limit=args.limit or SCRAPING_CONFIG['default_limit'],
        # argparse stores --no-skip-duplicates as args.no_skip_duplicates; there is no
        # args.skip_duplicates, so the earlier hasattr() check never honored the flag
        skip_duplicates=False if args.no_skip_duplicates else SCRAPING_CONFIG['skip_duplicates']
    )

    # Initialize the database
    system.init_database()

    # Run the scraper
    await system.run_scraping(urls=args.urls)


async def run_analysis_mode(args):
    """Run analysis only."""
    logger.info("Running in analysis mode...")

    system = IntegratedProductSystem(
        product_db_path=args.product_db or DATABASE_CONFIG['product_db_path']
    )

    # Initialize the database
    system.init_database()

    # Run the analysis
    system.analyze_products(max_products=args.max_products)


async def run_full_mode(args):
    """Run the full workflow (scraping + analysis)."""
    logger.info("Running in full mode (scraping + analysis)...")

    system = IntegratedProductSystem(
        tophub_db_path=args.tophub_db or DATABASE_CONFIG['tophub_db_path'],
        product_db_path=args.product_db or DATABASE_CONFIG['product_db_path'],
        debug_port=args.debug_port or CHROME_CONFIG['debug_port'],
        limit=args.limit or SCRAPING_CONFIG['default_limit'],
        # argparse stores --no-skip-duplicates as args.no_skip_duplicates; there is no
        # args.skip_duplicates, so the earlier hasattr() check never honored the flag
        skip_duplicates=False if args.no_skip_duplicates else SCRAPING_CONFIG['skip_duplicates']
    )

    # Run the full workflow
    system.run_full_workflow(max_products=args.max_products)


def main():
    """Entry point."""
    parser = argparse.ArgumentParser(description="Integrated product scraping and analysis system")

    # Common arguments
    parser.add_argument("--mode", choices=["scraping", "analysis", "full"], default="full",
                        help="Run mode: scraping (scrape only), analysis (analyze only), full (scrape + analyze)")
    parser.add_argument("--tophub-db", help="Path to the tophub database")
    parser.add_argument("--product-db", help="Path to the product database")
    parser.add_argument("--debug-port", type=int, help="Chrome debug port")
    parser.add_argument("--limit", type=int, help="Maximum number of links to scrape")
    parser.add_argument("--max-products", type=int, help="Maximum number of products to analyze")
    parser.add_argument("--log-file", help="Path to the log file")
    parser.add_argument("--log-level", choices=["DEBUG", "INFO", "WARNING", "ERROR"],
                        default="INFO", help="Log level")
    parser.add_argument("--no-skip-duplicates", action="store_true", help="Do not skip duplicate URLs")
    parser.add_argument("--urls", nargs="+", help="URLs to scrape")

    args = parser.parse_args()

    # Configure logging
    setup_logging(args.log_file, args.log_level)

    # Log the system configuration
    print_system_info()

    # Dispatch on the mode
    if args.mode == "scraping":
        asyncio.run(run_scraping_mode(args))
    elif args.mode == "analysis":
        asyncio.run(run_analysis_mode(args))
    else:  # full mode
        asyncio.run(run_full_mode(args))


if __name__ == "__main__":
    main()
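Review note: `argparse` derives the attribute name from the option string, so `--no-skip-duplicates` is stored as `no_skip_duplicates` and a `skip_duplicates` attribute never exists on the namespace. The sketch below shows the resolution logic in isolation. `resolve_skip_duplicates` is a hypothetical helper, and `DEFAULT_SKIP_DUPLICATES` stands in for `SCRAPING_CONFIG['skip_duplicates']`, whose real value is not shown in this diff.

```python
import argparse

# Stand-in for SCRAPING_CONFIG['skip_duplicates'] (assumed value).
DEFAULT_SKIP_DUPLICATES = True


def resolve_skip_duplicates(argv):
    """Return the effective skip_duplicates setting for the given command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--no-skip-duplicates", action="store_true")
    args = parser.parse_args(argv)
    # The flag only ever turns skipping off; otherwise the config default applies.
    return False if args.no_skip_duplicates else DEFAULT_SKIP_DUPLICATES


print(resolve_skip_duplicates([]))
print(resolve_skip_duplicates(["--no-skip-duplicates"]))
```

An alternative with the same effect would be `dest="skip_duplicates"` with `action="store_false"`, though that would bypass the config default unless the parser default is also set from the config.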
File diff suppressed because it is too large
Load Diff
BIN
tophub_data.db
Binary file not shown.
1099
tophub_scraper.log
File diff suppressed because it is too large
Load Diff