Hacker News


Seeking Advice on Improving OCR for Watermarked PDFs in My RAG Pipeline

Sat, 02/28/2026 - 9:28am

I’ve been developing a small RAG pipeline and ran into a specific technical issue involving OCR. I’m using PyMuPDF for extraction, and whenever a PDF contains a centered watermark on each page, the OCR becomes noisy—text breaks, artifacts show up, and the output degrades enough that it affects chunking and retrieval accuracy downstream.

The document is otherwise clean, so I’m trying to understand whether this is a known limitation of PyMuPDF or if there are better approaches for handling watermarked PDFs before OCR. I’m working with an RTX 4000 (8GB VRAM), so I’m also trying to stay within reasonable GPU constraints.

I’d really appreciate any ideas on:

more robust OCR libraries or models that handle watermarks well

preprocessing strategies to suppress watermark text

better extraction pipelines for RAG use cases

or any general advice on improving this part of the system
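One cheap preprocessing idea for the watermark question above, as a minimal sketch rather than a tested fix: a watermark that sits on every page tends to show up as the same line of text on every page, so lines that recur on most pages can be dropped before chunking. The function name, the 0.8 threshold, and the sample pages are all assumptions for illustration. The per-page line lists could come from something like PyMuPDF's `page.get_text()` split on newlines. This only helps when the watermark comes out as its own line; if it is interleaved with body text, an image-level approach would be needed instead.

```python
from collections import Counter


def drop_repeated_lines(pages, min_fraction=0.8):
    """Remove lines that recur on at least `min_fraction` of the pages.

    `pages` is a list of pages, each a list of text lines (strings).
    Hypothetical helper: the name and threshold are illustrative only.
    """
    # With fewer than two pages there is nothing to compare against.
    if len(pages) < 2:
        return [list(lines) for lines in pages]

    # Count each normalized line at most once per page.
    counts = Counter()
    for lines in pages:
        counts.update({line.strip().lower() for line in lines if line.strip()})

    threshold = min_fraction * len(pages)
    repeated = {text for text, n in counts.items() if n >= threshold}

    # Keep only lines that are not in the repeated set.
    return [
        [line for line in lines if line.strip().lower() not in repeated]
        for lines in pages
    ]


# Made-up sample: "CONFIDENTIAL" stands in for the watermark text.
pages = [
    ["Quarterly report", "CONFIDENTIAL", "Revenue grew 4%."],
    ["Costs were flat.", "CONFIDENTIAL"],
    ["CONFIDENTIAL", "Outlook is stable."],
]
cleaned = drop_repeated_lines(pages)
```

Legitimate running headers and footers would be removed by the same rule, which may or may not be desirable for retrieval.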

The project is open-source, and if anyone is interested in digging deeper, finding issues, or contributing improvements, here’s the repository:

GitHub: https://github.com/Hundred-Trillion/L88-Full

If you find it useful, starring the repo helps increase visibility so more people with domain expertise might notice it.

Thanks in advance for any insights.

Comments URL: https://news.ycombinator.com/item?id=47195785

Points: 1

# Comments: 0

Categories: Hacker News

Show HN: Code-snippet flashcards for 600 programming cheat sheets

Sat, 02/28/2026 - 9:28am

Today, I'm launching Flashcards across all 600+ topics on CheatSheet++ (cheatsheet-plus-plus.com).

Active recall is crucial for really learning and remembering concepts (especially for interview prep).

To make these actually useful for developers, I focused on a few key features:

Code Snippets Included: Standard flashcards are often too text-heavy. These flashcards feature syntax-highlighted code examples on the back alongside the conceptual explanations.

Progressive Difficulties: The decks scale from Beginner to Advanced, adjusting the depth of the questions and the complexity of the concepts accordingly.

This feature is designed to work alongside our existing Interview Q&A section for comprehensive prep.

Search the topic you are currently learning and try it: https://cheatsheet-plus-plus.com

Feedback and critique are incredibly welcome!

Comments URL: https://news.ycombinator.com/item?id=47195784

Points: 1

# Comments: 0

Categories: Hacker News

Show HN: OpenPencil - Open-source vector design tool controlled by AI Agents

Sat, 02/28/2026 - 9:21am

Hey! I'm the creator of OpenPencil, and I'm super excited to share it with you today. We are entering the era of AI Agents, but our design tools are still stuck in the GUI era. We are constantly downloading "final_v9.fig" and manually clicking to tweak UI elements. I wanted to change that.

OpenPencil isn't just another design tool with a magic AI button. It is structurally built for AI.

Here is why it's different:

Agentic Design (MCP Server): You can connect Claude, Cursor, or any MCP-compatible agent directly to your design. Tell your AI IDE to "update the login screen to match the new dark mode theme," and it modifies the design file without you ever touching a mouse.

Design-as-Code: The .op format is pure JSON. Finally, you can Git commit, diff, and PR your design files just like your codebase.

100% Open Source (MIT): No subscriptions, no vendor lock-in. Build on top of it, fork it, make it yours.
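To illustrate what a JSON design format makes possible beyond a plain line diff, here is a rough sketch of a structural diff between two versions of a design file. The actual `.op` schema is not described in the post, so the field names (`screen`, `layers`, `fill`) and both helper functions are purely hypothetical.

```python
import json


def flatten(node, prefix=""):
    """Map every leaf value in a JSON tree to a path such as 'layers/0/fill'."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        return {prefix: node}
    flat = {}
    for key, value in items:
        path = f"{prefix}/{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


def diff_designs(old_text, new_text):
    """Return (kind, path, old_value, new_value) tuples, sorted by path."""
    old = flatten(json.loads(old_text))
    new = flatten(json.loads(new_text))
    changes = []
    for path in sorted(old.keys() | new.keys()):
        if path not in new:
            changes.append(("removed", path, old[path], None))
        elif path not in old:
            changes.append(("added", path, None, new[path]))
        elif old[path] != new[path]:
            changes.append(("changed", path, old[path], new[path]))
    return changes


# Hypothetical design documents; the real .op schema may look nothing like this.
before = '{"screen": "login", "layers": [{"name": "bg", "fill": "#ffffff"}]}'
after = '{"screen": "login", "layers": [{"name": "bg", "fill": "#111111"}]}'
changes = diff_designs(before, after)
```

A path-based diff like this ignores key order and whitespace, which a line-based `git diff` would not. Empty objects and arrays are dropped by `flatten` in this sketch.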

I built this because I believe the future of design is Human creativity + Agent execution.

I'd love your feedback! Drop your questions below, and let me know what features you want to see next!

Comments URL: https://news.ycombinator.com/item?id=47195713

Points: 1

# Comments: 1

Categories: Hacker News

Show HN: I built a dashboard to track AI's impact on jobs

Sat, 02/28/2026 - 9:19am

Article URL: https://www.clocktick.ai/

Comments URL: https://news.ycombinator.com/item?id=47195695

Points: 2

# Comments: 0

Categories: Hacker News

Caret: Tab to Complete on Mac

Sat, 02/28/2026 - 9:18am

Article URL: https://www.trycaret.com/

Comments URL: https://news.ycombinator.com/item?id=47195674

Points: 1

# Comments: 0

Categories: Hacker News

Show HN: LazyGravity – I control my local AI coding setup from Discord via CDP

Sat, 02/28/2026 - 9:18am

I built LazyGravity to solve a personal itch: wanting to trigger real coding tasks when I am away from my desk, using an interface I already have (Discord on my phone).

It's a local-first bridge that forwards messages to my desktop AI coding setup, then reports results back. Instead of exposing my dev machine through public ports or relying on cloud relays, it uses Chrome DevTools Protocol (WebSocket) to drive the editor UI locally and securely.
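For readers unfamiliar with CDP, a minimal sketch of what travels over that WebSocket: each command is a JSON text frame with an `id`, a `method`, and `params`. The helpers below are illustrative and not taken from the LazyGravity repo; the port and page path in the example URL are placeholder values. The loopback check is one simple way to make sure a bridge only talks to a debugging endpoint on the same machine.

```python
import itertools
import json
from urllib.parse import urlparse

# CDP responses echo the command id, so each command needs a unique one.
_ids = itertools.count(1)


def cdp_command(method, params=None):
    """Serialize one CDP command as the JSON text frame sent over the socket."""
    return json.dumps({"id": next(_ids), "method": method, "params": params or {}})


def is_loopback(ws_url):
    """Return True only if the WebSocket URL points at the local machine."""
    host = urlparse(ws_url).hostname
    return host in {"127.0.0.1", "::1", "localhost"}


# Example frame: ask the page to evaluate a JavaScript expression.
frame = cdp_command("Runtime.evaluate", {"expression": "document.title"})

# Placeholder endpoint; a real URL is reported by the browser or editor itself.
endpoint = "ws://127.0.0.1:9222/devtools/page/EXAMPLE"
safe = is_loopback(endpoint)
```

A string check on the URL is only a first line of defense: anything else running on the same machine can still reach a loopback debugging port.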

I posted a short video demonstrating how it handles UI generation and hot-reloading here: https://x.com/m_web3/status/2027743280923086968

Repo: https://github.com/tokyoweb3/LazyGravity

I would love to hear your thoughts on this architecture, especially regarding local CDP security boundaries.

Comments URL: https://news.ycombinator.com/item?id=47195667

Points: 1

# Comments: 0

Categories: Hacker News
