hf-papers-reporter (Safety: 85)

Generate Word reports from Hugging Face Daily Papers. The skill downloads the top papers, extracts abstracts, introductions, and figures from the PDFs, and compiles everything into a formatted Word document with cover images. Use it when the user asks for 'HF daily papers', 'Hugging Face papers report', 'download papers and make a summary', or makes any request to fetch, analyze, and document papers from huggingface.co/papers.

1 star
1.2k downloads
Updated 2/10/2026

Package Files

SKILL.md

Hugging Face Daily Papers Reporter

Generate formatted Word reports from Hugging Face Daily Papers, including each paper's abstract, introduction, and figures extracted from the PDF.

What This Skill Does

  1. Scrapes huggingface.co/papers for the top papers
  2. Downloads PDFs from arXiv
  3. Extracts Abstract and Introduction sections
  4. Extracts figures/images from PDFs
  5. Generates a formatted Word document (.docx) with:
    • Paper titles and arXiv links
    • Cover images from HF
    • Full abstracts
    • Introduction sections
    • Extracted figures from papers

Quick Start

Run the main script to generate today's report:

cd /path/to/hf-papers-reporter
python3 scripts/process_papers.py

The report is saved to output/HF_Daily_Papers_Report.docx.

Dependencies

Install required packages:

pip3 install PyMuPDF python-docx Pillow beautifulsoup4 requests
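
The pip package names above do not all match the module names used in import statements (PyMuPDF imports as fitz, python-docx as docx, Pillow as PIL, beautifulsoup4 as bs4). A minimal sketch of a pre-flight check follows; the helper name missing_packages is hypothetical and is not part of the skill.

```python
import importlib.util

# pip package name -> module name used in import statements.
REQUIRED = {
    "PyMuPDF": "fitz",
    "python-docx": "docx",
    "Pillow": "PIL",
    "beautifulsoup4": "bs4",
    "requests": "requests",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
        print("Install with: pip3 install " + " ".join(missing))
    else:
        print("All dependencies are importable.")
```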

How It Works

Step 1: Fetch Paper List

  • Scrapes huggingface.co/papers
  • Extracts arXiv IDs, titles, and cover image URLs
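
The fetch step can be sketched as follows. This is an illustration, not the skill's actual code: it assumes the listing page links each paper as /papers/<arXiv ID>, and it uses the standard-library HTML parser instead of beautifulsoup4 so that it runs without extra packages. The names PaperLinkParser and parse_paper_list are hypothetical, and cover image URLs are left out because their markup is not described here.

```python
import re
from html.parser import HTMLParser

# Assumption: paper links look like /papers/2502.01234
ARXIV_HREF = re.compile(r"^/papers/(\d{4}\.\d{4,5})$")


class PaperLinkParser(HTMLParser):
    """Collect arXiv IDs and the first non-empty link text seen for each."""

    def __init__(self):
        super().__init__()
        self.papers = {}      # arXiv ID -> title, in page order
        self._current = None  # ID of the paper link currently open

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            match = ARXIV_HREF.match(dict(attrs).get("href") or "")
            if match:
                self._current = match.group(1)
                self.papers.setdefault(self._current, "")

    def handle_endtag(self, tag):
        if tag == "a":
            self._current = None

    def handle_data(self, data):
        # Keep only the first non-empty text chunk as the title.
        if self._current and not self.papers[self._current]:
            self.papers[self._current] = data.strip()


def parse_paper_list(html, limit=10):
    """Return up to `limit` papers as dicts with id, title, and pdf_url."""
    parser = PaperLinkParser()
    parser.feed(html)
    return [
        {"id": pid, "title": title, "pdf_url": f"https://arxiv.org/pdf/{pid}.pdf"}
        for pid, title in list(parser.papers.items())[:limit]
    ]
```

In practice the html argument would be the body of a request to huggingface.co/papers. If the page markup differs from the assumption above, the href pattern needs adjusting.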

Step 2: Download & Process (per paper)

Download PDF from arxiv.org/pdf/{id}.pdf
    ↓
Extract text (first 5 pages)
    - Abstract (regex match)
    - Introduction (regex match)
    ↓
Extract images (first 5 pages, max 3 per page)
    - Compress to 600x400
    ↓
Download cover image from HF CDN
    - Compress to 800x600
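
The "regex match" steps in the flow above could look roughly like this. The patterns are illustrative assumptions, not the ones used by scripts/process_papers.py. The input is assumed to be the plain text of the first five pages, for example as returned by PyMuPDF's page.get_text().

```python
import re

# Tried in order; the first pattern that matches wins.
ABSTRACT_PATTERNS = [
    # "Abstract" heading up to an (optionally numbered) Introduction heading.
    re.compile(
        r"\babstract\b\s*[:.\-]?\s*(.+?)(?=\n\s*(?:1\.?|I\.)?\s*introduction\b)",
        re.I | re.S,
    ),
    # Fallback: "Abstract" up to the first blank line.
    re.compile(r"\babstract\b\s*[:.\-]?\s*(.+?)(?=\n\s*\n)", re.I | re.S),
]

# Introduction heading on its own line, up to a line starting with "2" or end of text.
INTRO_PATTERN = re.compile(
    r"\n\s*(?:1\.?|I\.)?\s*introduction\s*\n(.+?)(?=\n\s*2\.?\s+\S|\Z)",
    re.I | re.S,
)


def _clean(text):
    """Collapse line breaks and repeated whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def extract_abstract(text):
    for pattern in ABSTRACT_PATTERNS:
        match = pattern.search(text)
        if match:
            return _clean(match.group(1))
    return None


def extract_introduction(text):
    match = INTRO_PATTERN.search(text)
    return _clean(match.group(1)) if match else None
```

Both functions return None when nothing matches, so the caller can decide how to report a paper whose layout defeats the patterns.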

Step 3: Generate Word Document

  • Title page with report name and date
  • Each paper as a section with:
    • Cover image (centered)
    • Abstract section
    • Introduction section
    • Extracted figures (up to 4)
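
A minimal sketch of this layout with python-docx is shown below. The function names and the dictionary keys (title, url, cover, abstract, introduction, figures) are hypothetical, and the image width of 5 inches is an arbitrary choice; the skill's own script may structure this differently.

```python
from datetime import date

MAX_FIGURES = 4  # "up to 4" figures per paper, as in the outline above


def pick_figures(figure_paths, limit=MAX_FIGURES):
    """Keep at most `limit` figures, preserving their order."""
    return list(figure_paths)[:limit]


def build_report(papers, out_path, report_date=None):
    """Write a .docx report from a list of paper dicts (hypothetical keys)."""
    # Imported here so pick_figures() stays usable without python-docx.
    from docx import Document
    from docx.enum.text import WD_ALIGN_PARAGRAPH
    from docx.shared import Inches

    doc = Document()
    # Title page: report name and date.
    doc.add_heading("Hugging Face Daily Papers Report", level=0)
    doc.add_paragraph((report_date or date.today()).isoformat())

    for paper in papers:
        doc.add_page_break()
        doc.add_heading(paper["title"], level=1)
        doc.add_paragraph(paper["url"])
        if paper.get("cover"):
            # add_picture appends a new paragraph; center that paragraph.
            doc.add_picture(paper["cover"], width=Inches(5))
            doc.paragraphs[-1].alignment = WD_ALIGN_PARAGRAPH.CENTER
        doc.add_heading("Abstract", level=2)
        doc.add_paragraph(paper.get("abstract") or "No abstract found.")
        doc.add_heading("Introduction", level=2)
        doc.add_paragraph(paper.get("introduction") or "No introduction found.")
        for figure in pick_figures(paper.get("figures", [])):
            doc.add_picture(figure, width=Inches(5))

    doc.save(out_path)
```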

Output Structure

hf_papers/
├── pdfs/           # Downloaded PDFs
├── images/         # Cover images + extracted figures
└── output/
    ├── HF_Daily_Papers_Report.docx
    └── papers_data.json
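
One way to create this layout and write papers_data.json is sketched below. The helper names are hypothetical, and the contents of papers_data.json are not specified here, so the function simply serializes whatever list it is given.

```python
import json
from pathlib import Path


def prepare_output_dirs(root="hf_papers"):
    """Create pdfs/, images/ and output/ under root and return their paths."""
    root = Path(root)
    dirs = {name: root / name for name in ("pdfs", "images", "output")}
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    return dirs


def save_papers_data(papers, output_dir):
    """Write the paper metadata as UTF-8 JSON and return the file path."""
    target = Path(output_dir) / "papers_data.json"
    target.write_text(
        json.dumps(papers, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return target
```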

Known Issues & Solutions

| Issue | Cause | Fix |
|---|---|---|
| XML encoding error | PDF text contains control characters | Script auto-cleans 0x00-0x1F chars |
| No abstract found | PDF structure varies | Multiple regex patterns tried |
| Large PDFs | Some papers are 20MB+ | Only first 5 pages processed |
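
The control-character fix can be sketched as below. Note one deliberate difference from the table: this version keeps tab, newline, and carriage return, which XML 1.0 permits, and strips only the remaining characters in the 0x00-0x1F range. The function name clean_for_docx is hypothetical.

```python
import re

# C0 control characters that XML 1.0 forbids (tab, LF and CR are allowed).
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean_for_docx(text):
    """Remove characters that cannot be stored in a .docx (XML) text run."""
    return _XML_ILLEGAL.sub("", text)
```

Applying this to every string before it is passed to python-docx avoids the XML encoding error without losing line breaks in the extracted text.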

Customization

To modify the number of papers (default: 10), edit the PAPERS list in scripts/process_papers.py.

To change image sizes, modify the thumbnail() calls in the script.
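
Assuming the thumbnail() calls are Pillow's Image.thumbnail, the size passed in is a bounding box: the image is shrunk to fit inside it with its aspect ratio preserved, and it is never enlarged. The pure-Python sketch below approximates the resulting dimensions so the effect of a new size can be checked before editing the script. The function name fit_within is hypothetical, and Pillow's own rounding may differ by a pixel.

```python
def fit_within(size, box):
    """Return (width, height) scaled down to fit inside box.

    The aspect ratio is preserved and images smaller than the box
    are left unchanged.
    """
    width, height = size
    max_w, max_h = box
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    # A 1200x600 figure placed in the 600x400 box used for extracted images.
    print(fit_within((1200, 600), (600, 400)))
```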

Install

Requires askill CLI v1.0+

AI Quality Score

78/100 (analyzed 3/2/2026)

A well-structured technical skill with clear, actionable steps and comprehensive documentation, including flow diagrams and a known-issues table. The description provides clear when-to-use triggers. It is somewhat narrow in reusability (specific to HF papers) and has a misapplied tag, but it is high quality overall, with strong actionability and clarity.


Metadata

License: unknown
Version: -
Updated: 2/10/2026
Publisher: xdrshjr

Tags

ci-cd