Hacker News


Convert URLs and Files to Markdown

Sat, 02/14/2026 - 12:16am

Article URL: https://markdown.new

Comments URL: https://news.ycombinator.com/item?id=47011815

Points: 2

# Comments: 0

Categories: Hacker News

AI Fails at 96% of Jobs (New Study)

Sat, 02/14/2026 - 12:00am
Categories: Hacker News

Show HN: Lucid – Catch hallucinations in AI-generated code before they ship

Fri, 02/13/2026 - 11:55pm

Hi HN, I'm Ty. I built LUCID because I kept shipping bugs that my AI coding assistant hallucinated into existence.

Three independent papers have proven that LLM hallucination is mathematically inevitable (Xu et al. 2024, Banerjee et al. 2024, Karpowicz 2025). You can't train it away. You can't prompt it away. So I built a verification layer instead.

How it works: LUCID extracts implicit claims from AI-generated code (e.g., "this function handles null input," "this query is injection-safe," "this handles concurrent access"), then uses a second, adversarial AI pass to verify each claim against the actual implementation. You get a report showing exactly what would have shipped to production without verification.

"But can't the verifier hallucinate too?" Yes -- and that's the right question. The benchmarks below were validated by running real test suites, not by trusting LUCID's judgment. The value is that structured claim extraction + adversarial verification catches bugs that a single generation pass misses. The architecture also supports swapping LLM verification for formal methods (SMT solvers, property-based testing) per claim type as those integrations mature.
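The two-pass flow described above can be sketched in miniature. This is an illustrative stand-in, not LUCID's actual implementation: the claim types come from the examples in the post, but `extract_claims` and `verify_claim` here are trivial string-matching stubs where a real system would invoke an LLM or a formal checker.

```python
# Minimal sketch of claim extraction + adversarial verification.
# The heuristics below are illustrative stubs, not LUCID's real logic.
from dataclasses import dataclass

@dataclass
class Claim:
    kind: str   # e.g. "injection-safety", "null-safety"
    text: str   # the implicit claim extracted from the generated code

def extract_claims(code: str) -> list[Claim]:
    """First pass: surface implicit assumptions in the code as explicit claims.
    A real system would use an LLM here; this stub keys off simple markers."""
    claims = []
    if "WHERE" in code:
        claims.append(Claim("injection-safety", "this query is injection-safe"))
    if "def " in code:
        claims.append(Claim("null-safety", "this function handles null input"))
    return claims

def verify_claim(claim: Claim, code: str) -> bool:
    """Second, adversarial pass: try to refute each claim against the code.
    Stub rule: string interpolation into SQL refutes injection-safety."""
    if claim.kind == "injection-safety":
        return 'f"' not in code
    return True

def report(code: str) -> list[tuple[Claim, bool]]:
    return [(c, verify_claim(c, code)) for c in extract_claims(code)]

generated = 'def find(user):\n    return db.run(f"SELECT * FROM t WHERE u={user}")'
for claim, ok in report(generated):
    print(claim.kind, "PASS" if ok else "FAIL")
```

Here the interpolated SQL string fails the injection-safety claim while the null-safety claim (which the stub cannot refute) passes, which is the kind of per-claim report the post describes.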

Benchmarks:

- HumanEval: 86.6% baseline -> 100% pass@5 with LUCID (164/164 problems)
- SWE-bench: 18.3% baseline -> 30.3% with LUCID (+65.5%)
- Both benchmarks were validated by running actual test suites, not by LLM judgment
- LLM-as-judge actually performs worse at higher k values -- it hallucinates false positives

Three ways to use it:

1. MCP Server (Claude Code, Cursor, Windsurf) -- one config line, verification as a native tool
2. GitHub Action -- automated verification on every PR with inline comments
3. CLI -- npx lucid verify --repo /path/to/code

Free tier: 100 verifications/month. Get a key at https://trylucid.dev

Code: https://github.com/gtsbahamas/hallucination-reversing-system

Paper: https://doi.org/10.5281/zenodo.18522644

Dashboard: https://trylucid.dev

Comments URL: https://news.ycombinator.com/item?id=47011695

Points: 2

# Comments: 0

Categories: Hacker News

The Challenger Map

Fri, 02/13/2026 - 11:46pm

Article URL: https://challengermap.ca/

Comments URL: https://news.ycombinator.com/item?id=47011651

Points: 1

# Comments: 0

Categories: Hacker News

Show HN: Why Playwright-CLI Beats MCP for AI‑Driven Browser Automation

Fri, 02/13/2026 - 11:45pm

Most “AI + browser” setups still bolt MCP tools onto Playwright and hope for the best, so every click dumps full DOMs, accessibility trees, and logs into the model.

That burns tokens, collapses context, and makes long sessions unreliable.

Meanwhile, default Playwright reports start to struggle once you have more than a few dozen e2e tests, so teams drown in HTML reports and flaky failures instead of clear patterns.

The post at https://testdino.com/blog/playwright-cli/ explores how Microsoft’s playwright-cli keeps browser state external, returns only compact element references and YAML flows, and works with the normal npx playwright test workflow plus smarter reporting, so both agents and humans stay fast, cost-aware, and predictable.

Comments URL: https://news.ycombinator.com/item?id=47011649

Points: 1

# Comments: 0

Categories: Hacker News

Show HN: ReviewStack – API that aggregates reviews from YouTube and Reddit

Fri, 02/13/2026 - 11:43pm

I built an API that takes a product name, scrapes reviews from YouTube and Reddit, and returns structured sentiment analysis in a single JSON response. Live demo (no signup): https://reviewstack.vercel.app/demo

The response includes a normalized score (1-10), a plain-text summary, pros/cons lists, recurring themes with sentiment, and source attribution linking back to the original content.
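A response of that shape can be sketched as follows. The field names and nesting here are assumptions inferred from the description (normalized score, summary, pros/cons, themes with sentiment, source attribution), not the API's real schema.

```python
# Hypothetical shape of a ReviewStack JSON response, inferred from the post;
# the exact key names are assumptions, not the documented schema.
import json

sample = {
    "score": 7.8,  # normalized 1-10
    "summary": "Well reviewed overall; battery life is the main complaint.",
    "pros": ["build quality", "price"],
    "cons": ["battery life"],
    "themes": [
        {"theme": "battery", "sentiment": "negative"},
        {"theme": "value", "sentiment": "positive"},
    ],
    "sources": [
        {"platform": "youtube", "url": "https://youtube.com/watch?v=..."},
        {"platform": "reddit", "url": "https://reddit.com/r/..."},
    ],
}

def is_valid(resp: dict) -> bool:
    """Sanity-check the fields the post promises: a 1-10 score, a summary,
    pros/cons lists, themes with sentiment, and source attribution."""
    return (
        1 <= resp["score"] <= 10
        and isinstance(resp["summary"], str)
        and isinstance(resp["pros"], list)
        and isinstance(resp["cons"], list)
        and all({"theme", "sentiment"} <= t.keys() for t in resp["themes"])
        and all("url" in s for s in resp["sources"])
    )

# Round-trip through JSON, as a client consuming the API would.
print(is_valid(json.loads(json.dumps(sample))))
```

A client-side check like this is useful mainly as documentation of what the single JSON response is expected to carry.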

The AI layer uses Claude by Anthropic. It reads the collected reviews and extracts structured data. The value is in not having to maintain scraping infrastructure, handle rate limits across platforms, or write your own extraction prompts.

Stack: Next.js API routes, Vercel for hosting, Stripe for billing, YouTube Data API + Reddit JSON endpoints for sourcing, Claude for analysis.

Pricing: free tier at 50 lookups/month, paid plans at $29/mo (500 lookups) and $79/mo (2,000 lookups). Solo/bootstrapped project.

Happy to answer questions about the scraping approach, accuracy, or anything else.

Comments URL: https://news.ycombinator.com/item?id=47011640

Points: 1

# Comments: 0

Categories: Hacker News
