Automating Daily arXiv Paper Summaries and Slack Notifications

Published: 2025-02-16 20:28:12

This Python script automates a daily paper digest: it fetches recent papers from arXiv, summarizes each abstract with generative AI (specifically Google Gemini), and posts the summaries to a Slack channel.

I. Python Code:

import datetime
import logging
import os
import time

import arxiv
import google.generativeai as genai
from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError

# Configuration (best practice to use environment variables for sensitive data)
PAPER_TYPES = ["cs.AI", "cs.CY", "cs.MA"]  # arXiv category identifiers in their canonical casing
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY")
GEMINI_MODEL = "gemini-2.0-flash"
SLACK_BOT_TOKEN = os.environ.get("SLACK_BOT_TOKEN")
SLACK_CHANNEL = os.environ.get("SLACK_CHANNEL")
MAX_RESULTS = 30

# Logging setup (highly recommended for debugging and monitoring)
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)


def fetch_arxiv_papers(max_results: int = MAX_RESULTS) -> list:
    """Fetches the newest arXiv papers in PAPER_TYPES, keeping those published within 24 hours of the most recent one."""
    query = " OR ".join([f"cat:{paper_type}" for paper_type in PAPER_TYPES])
    client = arxiv.Client()
    # arxiv.Search describes the query; the Client executes it via client.results().
    search = arxiv.Search(
        query=query,
        max_results=max_results,
        sort_by=arxiv.SortCriterion.SubmittedDate,
        sort_order=arxiv.SortOrder.Descending,
    )
    papers = list(client.results(search))

    if not papers:
        logger.warning("No papers found.")
        return []

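    # Results are sorted newest first, so papers[0] is the most recent submission.
    # The 24-hour window is anchored to it rather than to the current time.
    latest_published = papers[0].published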
    threshold = latest_published - datetime.timedelta(hours=24)
    filtered_papers = [paper for paper in papers if paper.published >= threshold]

    return [
        {
            "title": paper.title,
            "summary": paper.summary,
            "pdf_url": paper.pdf_url,
            "published": paper.published,
        } for paper in filtered_papers
    ]


def summarize_paper(abstract_text: str) -> str:
    """Generates a summary of the paper abstract using Google Gemini."""
    try:
        genai.configure(api_key=GEMINI_API_KEY)
        model = genai.GenerativeModel(GEMINI_MODEL)
        prompt = (
            "Summarize the following paper abstract concisely (under 300 characters) for beginners, "
            "including significance and results.  Output only the summary.\n---\n\n"
            f"{abstract_text}"
        )
        response = model.generate_content(prompt)
        return response.text.strip()
    except Exception as e:
        logger.error(f"Error summarizing paper: {e}")
        return "Error generating summary."


def post_to_slack(papers: list) -> None:
    """Posts the paper summaries to the specified Slack channel."""
    if not papers:
        return

    client = WebClient(token=SLACK_BOT_TOKEN)
    messages = []
    for i, paper in enumerate(papers, 1):
        summary = summarize_paper(paper["summary"])
        time.sleep(1)  # Brief pause between Gemini calls to stay under rate limits
        message = (
            f"{i}. *{paper['title']}*\n\n"
            f"{summary}\n\n"
            f"PDF: {paper['pdf_url']}\n"
            f"Published: {paper['published']}\n"
            f"────────────────────────"
        )
        messages.append(message)

    all_messages = "\n".join(messages)

    try:
        result = client.chat_postMessage(channel=SLACK_CHANNEL, text=all_messages)
        logger.info(f"Slack message sent successfully: {result}")
    except SlackApiError as e:
        logger.error(f"Error posting to Slack: {e}")


def lambda_handler(event, context):
    """AWS Lambda handler function."""
    papers = fetch_arxiv_papers()
    post_to_slack(papers)
    return {
        'statusCode': 200,
        'body': "Successfully processed arXiv papers and posted to Slack."
    }
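
The post_to_slack function above joins every summary into a single Slack message. With up to 30 papers that message can grow long, and Slack's documentation recommends keeping the text field to about 4,000 characters. The following is a minimal sketch of one way to split the list into several messages. The helper name chunk_messages and the 3,500-character default are assumptions for illustration, not part of the original script.

```python
def chunk_messages(messages: list[str], limit: int = 3500, sep: str = "\n") -> list[str]:
    """Group messages into chunks whose joined length stays within `limit`.

    `limit` is an assumed safety margin below Slack's recommended text size.
    """
    chunks: list[str] = []
    current = ""
    for message in messages:
        candidate = message if not current else current + sep + message
        if len(candidate) <= limit:
            current = candidate
            continue
        # The next message does not fit: close the current chunk first.
        if current:
            chunks.append(current)
        # A single oversized message is hard-split so no chunk exceeds the limit.
        while len(message) > limit:
            chunks.append(message[:limit])
            message = message[limit:]
        current = message
    if current:
        chunks.append(current)
    return chunks
```

Inside post_to_slack, the single chat_postMessage call could then become a loop over chunk_messages(messages), sending one message per chunk.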

II. Local Setup and Deployment to AWS Lambda:

Environment Setup: Use pyenv to manage Python versions. Install Python 3.12.
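Install Libraries: Create a folder named python, then install the required libraries into it. Lambda extracts a layer to /opt and puts /opt/python on the import path, so the packages must sit under a top-level python directory. If you build on macOS or Windows, packages with compiled extensions need Linux x86_64 wheels, so building in a Linux environment avoids import errors.
```
pip install arxiv google-generativeai slack_sdk -t python
```
Create Zip File: Zip the python folder so that it sits at the root of the archive:
```
zip -r lambda_layer.zip python
```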
Create AWS Lambda Layer: Upload lambda_layer.zip as a new layer in AWS Lambda. Set architecture to x86_64 and runtime to Python 3.12.
Create AWS Lambda Function: Upload the Python code above to a new Lambda function and attach the layer you created. Set the environment variables GEMINI_API_KEY, SLACK_BOT_TOKEN, and SLACK_CHANNEL, and raise the timeout above the 3-second default, since the function makes one Gemini call per paper.
Schedule with AWS EventBridge: Create an EventBridge rule with a cron expression (e.g., cron(30 6 * * ? *) for 6:30 AM UTC daily) and set the Lambda function as the target.
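
The schedule can also be created from code instead of the console. The sketch below builds the cron expression from an hour and minute and registers the rule through an EventBridge client such as boto3.client("events"). The function names eventbridge_cron and schedule_daily, the target Id "1", and the rule name passed in are assumptions for illustration. The client is passed in as a parameter so the example does not need AWS credentials to be read or tested.

```python
def eventbridge_cron(hour: int, minute: int) -> str:
    """Build a daily EventBridge cron expression (times are in UTC).

    Field order: minutes hours day-of-month month day-of-week year.
    """
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    return f"cron({minute} {hour} * * ? *)"


def schedule_daily(events_client, rule_name: str, function_arn: str,
                   hour: int, minute: int) -> str:
    """Create or update a daily rule and point it at the Lambda function."""
    expression = eventbridge_cron(hour, minute)
    events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=expression,
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": function_arn}],
    )
    return expression
```

Calling eventbridge_cron(6, 30) yields the expression used in the step above. When the rule is created this way rather than in the console, the function also needs a resource-based permission that lets events.amazonaws.com invoke it.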

III. Important Considerations:

Error Handling: The code wraps the Gemini and Slack calls in try...except blocks and logs any failure, so one failed summary does not abort the whole run.
Rate Limiting: Be mindful of API rate limits for both arXiv and Gemini. A fixed one-second pause (time.sleep(1)) between Gemini calls is a simple starting point, but heavy use may call for retries with exponential backoff.
Security: Never hardcode API keys directly in your code. Always use environment variables.
Logging: Comprehensive logging is essential for debugging and monitoring the function's execution.
Testing: Thoroughly test your code locally before deploying it to AWS Lambda.
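
For the rate-limiting point, one common strategy is to retry a failed call with exponentially growing delays. The following is a minimal sketch under stated assumptions: the helper name call_with_backoff, the attempt count, and the base delay are all illustrative, and it retries on any exception, whereas a production version would likely retry only on rate-limit errors. The sleep function is injectable so the helper can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying on failure with delays of base_delay * 2**attempt."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            # Out of retries: let the last error propagate to the caller.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("max_attempts must be at least 1")
```

In summarize_paper, the Gemini call could be wrapped as call_with_backoff(lambda: model.generate_content(prompt)), leaving the existing try...except to handle the case where every attempt fails.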

Before the first run, remember to set the environment variables to your actual API keys and Slack channel ID.