Feed aggregator

Show HN: Lucid – Catch hallucinations in AI-generated code before they ship

Hacker News - Fri, 02/13/2026 - 11:55pm

Hi HN, I'm Ty. I built LUCID because I kept shipping bugs that my AI coding assistant hallucinated into existence.

Three independent papers have proven that LLM hallucination is mathematically inevitable (Xu et al. 2024, Banerjee et al. 2024, Karpowicz 2025). You can't train it away. You can't prompt it away. So I built a verification layer instead.

How it works: LUCID extracts implicit claims from AI-generated code (e.g., "this function handles null input," "this query is injection-safe," "this handles concurrent access"), then uses a second, adversarial AI pass to verify each claim against the actual implementation. You get a report showing exactly what would have shipped to production without verification.
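The extract-then-verify loop described above can be sketched roughly as follows. This is not LUCID's code: `Verdict`, `verify`, `toy_extract`, and `toy_check` are hypothetical names, and the two toy functions stand in for the model passes with simple string checks so the example runs without any API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Verdict:
    claim: str      # implicit claim, e.g. "handles null input"
    holds: bool     # outcome of the second, adversarial pass
    evidence: str   # short reason for the outcome


def verify(code: str,
           extract: Callable[[str], List[str]],
           check: Callable[[str, str], Verdict]) -> List[Verdict]:
    """Pass 1 extracts claims; pass 2 checks each claim against the code."""
    return [check(code, claim) for claim in extract(code)]


# Toy stand-ins for the two model passes (hypothetical, illustration only).
def toy_extract(code: str) -> List[str]:
    return ["handles null input"] if "def " in code else []


def toy_check(code: str, claim: str) -> Verdict:
    guarded = "is None" in code
    return Verdict(claim, guarded, "None guard found" if guarded else "no None guard")


snippet = "def first(xs):\n    return xs[0]\n"

# The report keeps only the claims that failed verification.
report = [v for v in verify(snippet, toy_extract, toy_check) if not v.holds]
```

Passing the checker in as a callable is one way to let a different verifier be used per claim type; whether LUCID is structured this way is an assumption.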

"But can't the verifier hallucinate too?" Yes -- and that's the right question. The benchmarks below were validated by running real test suites, not by trusting LUCID's judgment. The value is that structured claim extraction + adversarial verification catches bugs that a single generation pass misses. The architecture also supports swapping LLM verification for formal methods (SMT solvers, property-based testing) per claim type as those integrations mature.

Benchmarks:

- HumanEval: 86.6% baseline -> 100% pass@5 with LUCID (164/164 problems)

- SWE-bench: 18.3% baseline -> 30.3% with LUCID (+65.5%)

- Both benchmarks were validated by running actual test suites, not by LLM judgment

- LLM-as-judge actually performs worse at higher k values -- it hallucinates false positives

Three ways to use it:

1. MCP Server (Claude Code, Cursor, Windsurf) -- one config line, verification as a native tool

2. GitHub Action -- automated verification on every PR with inline comments

3. CLI -- npx lucid verify --repo /path/to/code

Free tier: 100 verifications/month. Get a key at https://trylucid.dev

Code: https://github.com/gtsbahamas/hallucination-reversing-system

Paper: https://doi.org/10.5281/zenodo.18522644

Dashboard: https://trylucid.dev

Comments URL: https://news.ycombinator.com/item?id=47011695

Points: 2

# Comments: 0

Categories: Hacker News

The Challenger Map

Hacker News - Fri, 02/13/2026 - 11:46pm

Article URL: https://challengermap.ca/

Comments URL: https://news.ycombinator.com/item?id=47011651

Points: 1

# Comments: 0

Categories: Hacker News

Show HN: Why Playwright-CLI Beats MCP for AI‑Driven Browser Automation

Hacker News - Fri, 02/13/2026 - 11:45pm

Most “AI + browser” setups still bolt MCP tools onto Playwright and hope for the best, so every click dumps full DOMs, accessibility trees, and logs into the model.

That burns tokens, collapses context, and makes long sessions unreliable.

Meanwhile, default Playwright reports start to struggle once you have more than a few dozen e2e tests, so teams drown in HTML reports and flaky failures instead of clear patterns.

The article at https://testdino.com/blog/playwright-cli/ explores how Microsoft’s playwright-cli keeps browser state external, returns only compact element references and YAML flows, and works with normal npx playwright test plus smarter reporting, so both agents and humans stay fast, cost-aware, and predictable.

Comments URL: https://news.ycombinator.com/item?id=47011649

Points: 1

# Comments: 0

Categories: Hacker News

Show HN: ReviewStack – API that aggregates reviews from YouTube and Reddit

Hacker News - Fri, 02/13/2026 - 11:43pm

I built an API that takes a product name, scrapes reviews from YouTube and Reddit, and returns structured sentiment analysis in a single JSON response. Live demo (no signup): https://reviewstack.vercel.app/demo

The response includes a normalized score (1-10), a plain-text summary, pros/cons lists, recurring themes with sentiment, and source attribution linking back to the original content.
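The post does not give field names, so the following is only a guess at how a client might model such a response. Every class and field name here (`ReviewSummary`, `Theme`, `Source`, and so on) is hypothetical, and the sample values are placeholders rather than real API output.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Theme:
    name: str
    sentiment: str            # e.g. "positive", "negative", "mixed"


@dataclass
class Source:
    platform: str             # "youtube" or "reddit"
    url: str                  # link back to the original content


@dataclass
class ReviewSummary:
    product: str
    score: float              # normalized score, 1-10
    summary: str              # plain-text summary
    pros: List[str] = field(default_factory=list)
    cons: List[str] = field(default_factory=list)
    themes: List[Theme] = field(default_factory=list)
    sources: List[Source] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject scores outside the documented 1-10 range.
        if not 1 <= self.score <= 10:
            raise ValueError("score must be within 1-10")


# Placeholder data, not a real lookup result.
example = ReviewSummary(
    product="Example Widget",
    score=7.5,
    summary="Placeholder summary text.",
    pros=["placeholder pro"],
    cons=["placeholder con"],
    themes=[Theme("battery life", "mixed")],
    sources=[Source("reddit", "https://example.com/thread")],
)
```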

The AI layer uses Claude by Anthropic. It reads the collected reviews and extracts structured data. The value is in not having to maintain scraping infrastructure, handle rate limits across platforms, or write your own extraction prompts.

Stack: Next.js API routes, Vercel for hosting, Stripe for billing, YouTube Data API + Reddit JSON endpoints for sourcing, Claude for analysis.

Pricing: free tier at 50 lookups/month, paid plans at $29/mo (500 lookups) and $79/mo (2,000 lookups). Solo/bootstrapped project.

Happy to answer questions about the scraping approach, accuracy, or anything else.

Comments URL: https://news.ycombinator.com/item?id=47011640

Points: 1

# Comments: 0

Categories: Hacker News

Show HN: How to get rid of vagina dependency in 7 days

Hacker News - Fri, 02/13/2026 - 11:42pm

Listen to it for 7 days and you will be completely FREE

Comments URL: https://news.ycombinator.com/item?id=47011634

Points: 2

# Comments: 0

Categories: Hacker News

Op.gg but for Chess

Hacker News - Fri, 02/13/2026 - 11:42pm

Article URL: https://chess-pulse-neon.vercel.app/

Comments URL: https://news.ycombinator.com/item?id=47011630

Points: 1

# Comments: 1

Categories: Hacker News

Interop 2026

Hacker News - Fri, 02/13/2026 - 11:31pm

Categories: Hacker News

Show HN: SQL-tap – Real-time SQL traffic viewer for PostgreSQL and MySQL

Hacker News - Fri, 02/13/2026 - 11:27pm

sql-tap is a transparent proxy that captures SQL queries by parsing the PostgreSQL/MySQL wire protocol and displays them in a terminal UI. You can run EXPLAIN on any captured query. No application code changes needed — just change the port.
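To illustrate what parsing the wire protocol can involve, here is a minimal sketch that pulls the SQL text out of a PostgreSQL simple-query message: a `Q` type byte, a 4-byte big-endian length that counts itself but not the type byte, then a null-terminated string. It is not sql-tap's code; the function names are hypothetical, and a real proxy would also need to handle startup messages, the extended query protocol, and MySQL's separate framing.

```python
import struct
from typing import Optional


def parse_simple_query(buf: bytes) -> Optional[str]:
    """Return the SQL text of a PostgreSQL simple-query ('Q') message, else None."""
    if len(buf) < 5 or buf[0:1] != b"Q":
        return None
    (length,) = struct.unpack("!I", buf[1:5])   # length includes itself, not the type byte
    body = buf[5:1 + length]
    if len(body) != length - 4 or not body.endswith(b"\x00"):
        return None                              # truncated or malformed frame
    return body[:-1].decode("utf-8", errors="replace")


def build_simple_query(sql: str) -> bytes:
    """Build a 'Q' message, used here only to exercise the parser."""
    payload = sql.encode("utf-8") + b"\x00"
    return b"Q" + struct.pack("!I", len(payload) + 4) + payload


captured = parse_simple_query(build_simple_query("SELECT 1"))
```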

Comments URL: https://news.ycombinator.com/item?id=47011567

Points: 2

# Comments: 0

Categories: Hacker News

Ghidra by NSA

Hacker News - Fri, 02/13/2026 - 11:24pm

Categories: Hacker News

Show HN: MicroVibe – minimal JSX web starter

Hacker News - Fri, 02/13/2026 - 11:23pm

Hi HN, I built MicroVibe, a small web starter for people who want JSX + file-based routing + API routes, without pulling in a heavy framework.

What it does today:

- File-based routing (including dynamic and catch-all segments)

- API routes with consistent JSON error shape

- SSR by default, and `mode = "client"` per route when interactivity is needed

- Runtime module cache with file-change invalidation in local dev
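As a rough illustration of file-based routing with dynamic and catch-all segments, the sketch below matches a URL path against a route pattern. The `[id]` and `[...slug]` bracket syntax and the `match_route` name are assumptions borrowed from common conventions; the post does not say what syntax MicroVibe uses. The sketch also assumes a catch-all segment is always last in the pattern.

```python
from typing import Dict, Optional


def match_route(pattern: str, path: str) -> Optional[Dict[str, str]]:
    """Match a URL path against a route pattern; return captured params or None."""
    pat = [s for s in pattern.split("/") if s]
    segs = [s for s in path.split("/") if s]
    params: Dict[str, str] = {}
    for i, p in enumerate(pat):
        if p.startswith("[...") and p.endswith("]"):
            if i >= len(segs):
                return None            # catch-all needs at least one segment
            params[p[4:-1]] = "/".join(segs[i:])
            return params              # catch-all consumes the rest of the path
        if i >= len(segs):
            return None                # path is shorter than the pattern
        if p.startswith("[") and p.endswith("]"):
            params[p[1:-1]] = segs[i]  # dynamic segment captures one value
        elif p != segs[i]:
            return None                # static segment mismatch
    return params if len(pat) == len(segs) else None
```

For example, `match_route("/posts/[id]", "/posts/42")` yields `{"id": "42"}`, and `match_route("/docs/[...slug]", "/docs/a/b")` yields `{"slug": "a/b"}`.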

Project goal: keep the runtime small and understandable so teams can iterate quickly and still reason about behavior.

I would really value feedback on:

1. Routing/API ergonomics

2. What was confusing or slow in your first 30 minutes using MicroVibe?

3. Where this should clearly differ from Next/Astro/Vite workflows

Comments URL: https://news.ycombinator.com/item?id=47011545

Points: 3

# Comments: 1

Categories: Hacker News

Gauntlet AI (Legit or Scam?)

Hacker News - Fri, 02/13/2026 - 11:22pm

Article URL: https://qualify.gauntletai.com/

Comments URL: https://news.ycombinator.com/item?id=47011541

Points: 1

# Comments: 0

Categories: Hacker News
