Hacker News

CISPE Code of Conduct

Hacker News - Mon, 03/24/2025 - 6:26am

Article URL: https://www.codeofconduct.cloud/

Comments URL: https://news.ycombinator.com/item?id=43459273

Points: 1

# Comments: 0

Categories: Hacker News

Show HN: Kreuzberg v3.0 – Modern Python Document Extraction

Hacker News - Mon, 03/24/2025 - 6:24am

I'm excited to announce Kreuzberg v3.0, which was released yesterday.

Kreuzberg is an MIT-licensed Python library that extracts text from a wide range of documents (PDFs, images, office files, etc.) without depending on external APIs.

It differs from other libraries and commercial offerings in this space in that it is designed to (1) be lightweight, (2) be CPU-oriented, (3) be simple to use, and (4) treat async support as a first-class citizen.

The v3.0 release completely reworks the architecture for extensibility. Kreuzberg now supports:

- Multiple OCR backends (Tesseract, PaddleOCR, EasyOCR), with OCR itself being completely optional.
- Custom extractors and overriding of built-in extractors.
- Post-processing and validation hooks.
- Extensive PDF metadata extraction.
- Optional semantic chunking.

There is also a brand new documentation site at https://goldziher.github.io/kreuzberg.

I also published a roadmap for the project, which you can see here: https://github.com/Goldziher/kreuzberg/discussions/24

You can see the repo at https://github.com/Goldziher/kreuzberg - please star it if you find it valuable, since this motivates me!

Comments URL: https://news.ycombinator.com/item?id=43459261

Points: 2

# Comments: 0

Categories: Hacker News

Betavoltaic Device

Hacker News - Mon, 03/24/2025 - 6:10am

Categories: Hacker News
