Hacker News

Show HN: AIBenchy – Independent AI Leaderboard

Hacker News - Tue, 02/17/2026 - 9:31pm

Hey HN, like many of you, I'm tired of public AI leaderboards that mostly recycle the same saturated or overfitted benchmarks (MMLU, HumanEval, etc.) and often miss fast/cheap variants or real daily pain points.

A couple of days ago I launched AIBenchy, a small, opinionated leaderboard that runs my own custom tests focused on end-user/dev scenarios that actually trip up models today.

Current tests cover categories like:

- Anti-AI tricks (classic gotchas like "count the Rs in strawberry", logic traps)
- Instruction following & consistency
- Data parsing/extraction
- Domain-specific tasks
- Puzzle solving / edge-case reasoning

Recent additions (just pushed today):

- Reasoning score (new!): A separate judge LLM evaluates the chain of thought for efficiency. Does it repeat itself, loop, think forever, brute-force enumerate every possibility (looking at you, some Qwen-3.5 runs), or get to the point cleanly? This penalizes "cheaty" high-token reasoning even when the final answer is correct. Goal: reward smart, concise thinking over exhaustive trial and error.
- Stability metric: Measures consistency across runs (some models flake on the same prompt).

Right now the leaderboard has ~20 models (Qwen3.5 Plus currently tops it, followed by GLM 5, various GPT/Claude variants, etc.), but it's super early/WIP:

- Manual runs + small test set
- No public submission of tests yet (open to ideas!)
- Focused on transparency & practical usefulness over massive scale
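The post doesn't say how the stability metric is computed. One simple way to score consistency, sketched here purely as an assumption and not as AIBenchy's actual formula, is the fraction of repeated runs that agree with the most common answer:

```python
from collections import Counter


def stability(answers: list[str]) -> float:
    """Fraction of runs that agree with the modal (most common) answer.

    Hypothetical metric: 1.0 means every run of the same prompt produced
    the same normalized answer.
    """
    if not answers:
        raise ValueError("need at least one run")
    # Light normalization so whitespace/case noise doesn't count as a flake.
    normalized = [a.strip().lower() for a in answers]
    _, modal_count = Counter(normalized).most_common(1)[0]
    return modal_count / len(normalized)


print(stability(["3", "3", " 3", "2"]))  # 0.75
```

Exact-match agreement is the crudest option; free-form answers would need a looser notion of "same answer" before counting.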

I'd love feedback from HN:

- What custom tests / gotchas / use cases should I add next?
- Thoughts on the reasoning score: a fair way to judge efficiency, or too subjective?
- Models/variants I'm missing (especially fast/cheap ones ignored elsewhere)?
- Should I let people submit their own prompts/tests eventually?

Thanks for checking it out: https://aibenchy.com

Appreciate any roast/ideas — building this to scratch my own itch.
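On whether a judge-LLM reasoning score is too subjective: a deterministic signal could sit alongside it. The sketch below is not AIBenchy's method; it is a hypothetical repeated-n-gram ratio that rises when a chain of thought loops over the same phrasing.

```python
def repeated_ngram_ratio(text: str, n: int = 4) -> float:
    """Share of word n-grams in `text` that duplicate an earlier n-gram.

    0.0 means no n-gram repeats; values near 1.0 indicate looping output.
    Hypothetical heuristic, not the leaderboard's actual scoring.
    """
    tokens = text.lower().split()
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]
    return 1 - len(set(ngrams)) / len(ngrams)


looping = "let me check again " * 3
concise = "strawberry has three r characters in total"
print(round(repeated_ngram_ratio(looping), 3))  # 0.556
print(repeated_ngram_ratio(concise))            # 0.0
```

A metric like this only catches literal repetition; exhaustive enumeration with varied wording would still need the judge.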

Comments URL: https://news.ycombinator.com/item?id=47056436

Points: 1

# Comments: 1

Categories: Hacker News

Taste for Makers

Hacker News - Tue, 02/17/2026 - 9:30pm

Article URL: https://paulgraham.com/taste.html

Comments URL: https://news.ycombinator.com/item?id=47056427

Points: 2

# Comments: 1

Categories: Hacker News

Thin Is In

Hacker News - Tue, 02/17/2026 - 9:29pm
Categories: Hacker News

Other money making uses for the DGX Spark?

Hacker News - Tue, 02/17/2026 - 9:27pm

I just got this today and have it set up with VS Code. I was thinking it would be nice to use it to generate some income when I'm not using it for dev. My first thought was crypto, but I've been out of that business for a few years, and searches on the subject were no help.

I'm looking for ideas. Thanks! jj

Comments URL: https://news.ycombinator.com/item?id=47056401

Points: 1

# Comments: 0

Categories: Hacker News

Tidal Heating of Io

Hacker News - Tue, 02/17/2026 - 9:23pm
Categories: Hacker News

Show HN: Conduit: One Swift interface for every AI provider, on-device and cloud

Hacker News - Tue, 02/17/2026 - 9:21pm

I built Conduit because I was tired of writing the same streaming boilerplate five times for five different AI providers, then rewriting it every time a new one became interesting. So I stopped. The core idea: one protocol hierarchy, every provider. Switch from Claude to a local Llama model running on Apple Silicon with a one-line change. No vendor lock-in at the call site.
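Conduit itself is Swift, and the post doesn't name its protocols. As a language-neutral illustration of the "one protocol, every provider" idea, here is a minimal Python sketch using `typing.Protocol`; `ChatProvider`, `EchoCloudProvider`, and `EchoLocalProvider` are hypothetical stand-ins, and no real model is called.

```python
import asyncio
from collections.abc import AsyncIterator
from typing import Protocol


class ChatProvider(Protocol):
    """Hypothetical common interface: every provider streams text chunks."""

    def stream(self, prompt: str) -> AsyncIterator[str]: ...


class EchoCloudProvider:
    """Stand-in for a cloud provider (no network calls)."""

    async def stream(self, prompt: str) -> AsyncIterator[str]:
        for word in ["cloud:", *prompt.split()]:
            yield word


class EchoLocalProvider:
    """Stand-in for an on-device provider (no model loaded)."""

    async def stream(self, prompt: str) -> AsyncIterator[str]:
        for word in ["local:", *prompt.split()]:
            yield word


async def complete(provider: ChatProvider, prompt: str) -> str:
    # The call site depends only on the protocol, not on a vendor SDK.
    return " ".join([chunk async for chunk in provider.stream(prompt)])


# Switching providers is a one-line change here.
provider: ChatProvider = EchoLocalProvider()
print(asyncio.run(complete(provider, "hello world")))  # local: hello world
```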

The interesting decision was going actor-first from day one. Every provider is a Swift actor. You get data-race freedom enforced at compile time, not by convention. Swift 6.2's strict concurrency makes this a hard guarantee, not a README promise. LangChain can't say that.

The part I'm most proud of: @Generable

```swift
@Generable
struct FlightSearch {
    @Guide(description: "Origin airport code")
    let origin: String

    @Guide(description: "Departure date", .format(.date))
    let date: Date

    @Guide(.range(1...9))
    let passengers: Int
}

let result = try await provider.generate(
    "Book me a flight to Tokyo next Friday",
    model: .claude3_5Sonnet,
    returning: FlightSearch.self
)
```

The macro expands at compile time (via swift-syntax) to generate JSON Schema, streaming partial types, and all conversion boilerplate. The API is deliberately aligned with Apple's new Foundation Models framework — so the same struct works against on-device Apple models on iOS 26 and against Claude or GPT-4 with zero changes.
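To make "generate JSON Schema from a struct" concrete, here is a rough Python analogue that derives a schema from a dataclass at runtime. It is not what Conduit's `@Generable` macro emits (that happens at compile time via swift-syntax); the `guide` metadata key and the type mapping below are assumptions for illustration.

```python
import datetime
import json
from dataclasses import dataclass, field, fields

# Hypothetical mapping from Python types to JSON Schema fragments.
TYPE_MAP = {
    str: {"type": "string"},
    int: {"type": "integer"},
    datetime.date: {"type": "string", "format": "date"},
}


@dataclass
class FlightSearch:
    origin: str = field(metadata={"guide": {"description": "Origin airport code"}})
    date: datetime.date = field(metadata={"guide": {"description": "Departure date"}})
    passengers: int = field(metadata={"guide": {"minimum": 1, "maximum": 9}})


def json_schema(cls) -> dict:
    """Build an object schema from a dataclass's fields and their 'guide' metadata."""
    props = {}
    for f in fields(cls):
        prop = dict(TYPE_MAP[f.type])             # copy so TYPE_MAP isn't mutated
        prop.update(f.metadata.get("guide", {}))  # description / range constraints
        props[f.name] = prop
    return {
        "type": "object",
        "properties": props,
        "required": [f.name for f in fields(cls)],
    }


print(json.dumps(json_schema(FlightSearch), indent=2))
```

The `1...9` range from the Swift example maps naturally onto JSON Schema's `minimum`/`maximum` keywords.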

On-device is a first-class citizen, not an afterthought

Most Swift AI SDKs treat cloud as the primary path and shim local models in awkwardly. Conduit treats MLX, llama.cpp, Core ML, and Apple's Foundation Models as fully equal providers. A ChatSession configured with an MLX Llama model and one configured with GPT-4o are indistinguishable at the call site.

Trait-based compilation keeps binary size sane

AsyncThrowingStream all the way down

Cancellation works via standard Swift task cancellation, with no special teardown protocol. Back-pressure is handled naturally by the async iterator.
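The same idea, cancellation riding on ordinary task cancellation and pull-based iteration providing back-pressure, can be illustrated with Python's asyncio. This is an analogue, not Conduit code; `token_stream` is a hypothetical stand-in for a provider stream.

```python
import asyncio
from collections.abc import AsyncIterator

closed: list[str] = []  # records cleanup so the example can show it ran


async def token_stream() -> AsyncIterator[str]:
    """Hypothetical stream; it only produces a token when the consumer asks."""
    try:
        i = 0
        while True:
            await asyncio.sleep(0.01)  # stand-in for waiting on the network
            yield f"tok{i}"
            i += 1
    finally:
        closed.append("stream closed")  # runs on cancellation too


async def consume(received: list[str]) -> None:
    async for tok in token_stream():
        received.append(tok)


async def main() -> list[str]:
    received: list[str] = []
    task = asyncio.create_task(consume(received))
    await asyncio.sleep(0.05)
    task.cancel()  # ordinary task cancellation, no custom teardown API
    try:
        await task
    except asyncio.CancelledError:
        pass
    return received


tokens = asyncio.run(main())
print(len(tokens) > 0, closed)  # True ['stream closed']
```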

12 providers, one interface

Anthropic, OpenAI, Azure OpenAI, Ollama, OpenRouter, Kimi, MiniMax, HuggingFace Hub, MLX, llama.cpp, Core ML, Foundation Models. The OpenAI-compatible ones share a single OpenAIProvider actor; the named variants are thin configuration wrappers, not code forks.

https://github.com/christopherkarani/Conduit

Happy to dig into the actor model approach, the macro expansion strategy, or why wrapping LangChain was never an option.

Comments URL: https://news.ycombinator.com/item?id=47056343

Points: 1

# Comments: 0

Categories: Hacker News

Show HN: How I built Timeframe, our family e-paper dashboard

Hacker News - Tue, 02/17/2026 - 9:20pm

I'm proud to share the e-paper family dashboard I've been building over the past decade. I think you might find it interesting. It's open source: https://github.com/joelhawksley/timeframe.

Comments URL: https://news.ycombinator.com/item?id=47056330

Points: 2

# Comments: 0

Categories: Hacker News

Show HN: Fullbleed – Rust HTML/CSS-to-PDF with Deterministic Output+Python CLI

Hacker News - Tue, 02/17/2026 - 8:35pm

Hi HN,

I've been building fullbleed for a while and just shipped v0.2.5. It's a PDF generation engine written in Rust, distributed as a Python wheel.

The short version: HTML/CSS in, PDF out. No headless browser. No cloud. No Chromium. Works fully offline.

*Why fullbleed:*

"Full bleed" is a printing term for content that runs to the edge of the page. That's what I wanted: a complete, end-to-end solution that didn't require system dependencies and, unlike browsers, could actually do a full-bleed print layout. Some other reasons:

- Deterministic: fixed point is the base measurement (~0.000035 mm), so things land exactly where I want them.
- Composition and authoring: templates and variable data live in the same place, without 200 lines of glue code.
- Vendored assets: I was tired of handling system fonts, etc.
- HTML/CSS layout engine: it builds on the most commonly used document description convention while still abstracting away from it.
- Agent-first design: I was honestly annoyed by how bad agents were at composing PDFs and wanted an agent-friendly loop.

*What makes it different from WeasyPrint / wkhtmltopdf / Prince:*

- *Deterministic output*: SHA-256 hashing on every render. `--repro-record` / `--repro-check` for CI pipelines. Same inputs always produce the exact same PDF bytes. I don't know of any other open-source PDF engine that does this.
- *Structured page data*: The engine returns structured JSON alongside the PDF bytes: running totals, per-page sums, grand totals. Useful for financial statements where you want to reconcile programmatically before the PDF even lands.
- *Rayon-backed parallel batch*: `render_pdf_batch_parallel()` with Python bindings that release the GIL. You can generate 10,000 statements while your Python process does other work.
- *VDP / transactional compose*: Overlay variable data onto source PDF templates with feature-driven page binding. Built-in, not bolted on.
- *Agent-safe JSON CLI*: Every command emits a versioned schema. `--json-only` mode for CI and LLM agent workflows. `--schema` for introspection.
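The record/check idea boils down to hashing the output bytes and comparing against a stored digest. A minimal sketch of that pattern, with a hypothetical `render` stand-in rather than fullbleed's real API and a hypothetical JSON manifest format; in a real PDF pipeline, anything like embedded timestamps or random document IDs would also have to be pinned for the bytes to match.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def render(data: dict) -> bytes:
    """Hypothetical deterministic renderer stand-in (not fullbleed's API)."""
    # Sorted keys and fixed separators keep the byte output stable.
    return json.dumps(data, sort_keys=True, separators=(",", ":")).encode()


def record(data: dict, manifest: Path) -> str:
    """Render once and store the SHA-256 of the output bytes."""
    digest = hashlib.sha256(render(data)).hexdigest()
    manifest.write_text(json.dumps({"sha256": digest}))
    return digest


def check(data: dict, manifest: Path) -> bool:
    """Re-render and confirm the bytes hash to the recorded digest."""
    expected = json.loads(manifest.read_text())["sha256"]
    return hashlib.sha256(render(data)).hexdigest() == expected


with tempfile.TemporaryDirectory() as tmp:
    m = Path(tmp) / "repro.json"
    record({"total": 1200, "pages": 3}, m)
    print(check({"pages": 3, "total": 1200}, m))  # True: same inputs, same bytes
    print(check({"pages": 3, "total": 1201}, m))  # False: output changed
```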

*The pricing angle:* Prince costs $3,800+/year per server. DocRaptor starts at $15/month but quickly hits $600/month for real volume — and it's cloud-only. fullbleed is AGPL-3.0 free for OSS, and commercial licenses start at $20/month per org with no usage caps.

*Quick start:*

```
pip install fullbleed
fullbleed init .
python report.py
```

That scaffolds a full project with Bootstrap 5, Inter font, and component-first Python helpers — all vendored and hash-pinned offline.

GitHub: https://github.com/fullbleed-engine/fullbleed-official

Happy to answer questions about the rendering pipeline, the determinism model, or the Python/Rust binding design.

Comments URL: https://news.ycombinator.com/item?id=47055927

Points: 1

# Comments: 0

Categories: Hacker News

Show HN: Scanward – Free domain security scanner (SSL, DNS, headers, email auth)

Hacker News - Tue, 02/17/2026 - 8:31pm

I built Scanward to solve a problem I kept running into as a DevOps engineer: checking whether a domain has its security basics covered (SSL config, DNS hygiene, HTTP security headers, SPF/DKIM/DMARC) meant juggling 4-5 different tools.

Scanward runs all 5 checks in one scan and gives you an A-F grade with specific findings. The free public scanner requires no signup.
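The post invites feedback on the scoring methodology without describing it. One common shape for an A-F grade is a weighted average mapped through letter cutoffs; the four categories (taken from the title), weights, and cutoffs below are all hypothetical, not Scanward's.

```python
# Hypothetical weights and cutoffs; Scanward's real methodology isn't described.
WEIGHTS = {"ssl": 0.30, "dns": 0.15, "headers": 0.25, "email_auth": 0.30}
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def grade(scores: dict[str, float]) -> tuple[float, str]:
    """Combine per-category scores (0-100) into a weighted total and a letter."""
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    for cutoff, letter in CUTOFFS:
        if total >= cutoff:
            return total, letter
    return total, "F"


print(grade({"ssl": 100, "dns": 90, "headers": 60, "email_auth": 80}))  # letter "B"
```

A weighted average can hide one critical failure (say, an expired certificate); capping the grade when any single check fails outright is a common refinement.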

If you create an account (free tier: 1 domain), it does continuous monitoring and emails you when something changes — cert expiring, grade drops, missing headers, etc.
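Change monitoring like this amounts to diffing consecutive scan snapshots. A minimal sketch, assuming a hypothetical snapshot shape (`grade`, `headers`, `not_after`) and a hypothetical 14-day expiry threshold; `ssl.cert_time_to_seconds` is the standard-library parser for the `notAfter` format that `ssl.getpeercert()` returns.

```python
import ssl

EXPIRY_WARNING_DAYS = 14  # hypothetical alert threshold


def days_until_expiry(not_after: str, now: float) -> float:
    """Days left before a cert's notAfter time (format used by ssl.getpeercert())."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def diff_scans(previous: dict, current: dict, now: float) -> list[str]:
    """Compare two hypothetical scan snapshots and list alert-worthy changes."""
    alerts = []
    # Single-letter grades sort alphabetically, so a "larger" letter is worse.
    if current["grade"] > previous["grade"]:
        alerts.append(f"grade dropped: {previous['grade']} -> {current['grade']}")
    for header in sorted(set(previous["headers"]) - set(current["headers"])):
        alerts.append(f"header removed: {header}")
    days = days_until_expiry(current["not_after"], now)
    if days < EXPIRY_WARNING_DAYS:
        alerts.append(f"certificate expires in {days:.0f} days")
    return alerts


now = ssl.cert_time_to_seconds("Feb 20 00:00:00 2026 GMT")
before = {"grade": "A", "headers": {"Strict-Transport-Security", "Content-Security-Policy"},
          "not_after": "Mar  1 00:00:00 2026 GMT"}
after = {"grade": "B", "headers": {"Strict-Transport-Security"},
         "not_after": "Mar  1 00:00:00 2026 GMT"}
for alert in diff_scans(before, after, now):
    print(alert)
```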

Tech stack: FastAPI + Celery + PostgreSQL + Redis on Railway. Next.js dashboard on Cloudflare Pages. All scanning uses public data (DNS queries, HTTP headers, SSL handshakes) — no agents to install.
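The HTTP-header part of such a scan needs nothing more than the response headers from an ordinary request. A minimal offline sketch: the expected-header list is an assumption (commonly recommended headers, not Scanward's actual check list), and names are compared case-insensitively because HTTP header names are case-insensitive.

```python
# Commonly recommended response headers; this list is an assumption, not Scanward's.
EXPECTED_HEADERS = (
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Content-Type-Options",
    "X-Frame-Options",
    "Referrer-Policy",
)


def missing_security_headers(headers: dict[str, str]) -> list[str]:
    """Return expected headers absent from a response (names compared case-insensitively)."""
    present = {name.lower() for name in headers}
    return [h for h in EXPECTED_HEADERS if h.lower() not in present]


# Headers as they might come back from a plain HTTP GET.
response_headers = {
    "strict-transport-security": "max-age=31536000; includeSubDomains",
    "x-content-type-options": "nosniff",
    "content-type": "text/html",
}
print(missing_security_headers(response_headers))
# ['Content-Security-Policy', 'X-Frame-Options', 'Referrer-Policy']
```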

Would love feedback on the scoring methodology and what checks you'd want to see added.

Comments URL: https://news.ycombinator.com/item?id=47055885

Points: 1

# Comments: 0

Categories: Hacker News

Diyclaw.dev

Hacker News - Tue, 02/17/2026 - 8:28pm

Article URL: https://diyclaw.dev

Comments URL: https://news.ycombinator.com/item?id=47055864

Points: 2

# Comments: 0

Categories: Hacker News

Heroku Seems to Be Down

Hacker News - Tue, 02/17/2026 - 8:23pm

Article URL: https://x.com/search

Comments URL: https://news.ycombinator.com/item?id=47055825

Points: 2

# Comments: 4

Categories: Hacker News
