Hacker News

Show HN: I challenged an LLM to find a hidden problem in my telemetry data [video]

Hacker News - Wed, 02/25/2026 - 5:56pm

I hid a performance bug in a Rails cart’s telemetry (no errors, global latency looked fine) and challenged an LLM to find it just by querying the data. It did, then built a dashboard + alert through an MCP server we built at Honeybadger.

I instrumented a tiny Rails shopping cart (5 lines). Insights auto-captures request/controller + ActiveRecord events, and I added two bits of business context: a session_id injected into every event so checkout activity correlates end-to-end, plus a single intent event that records region, cart_total, payment_gateway, and card_type.

The hidden issue: checkout is slow only for Braintree when card_type=MX and region=EU. No errors, uptime green, overall latency looks fine.
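A minimal sketch of why a global number can hide this kind of outlier while a per-segment breakdown exposes it. The `duration_ms` field, the sample values, and the `Stripe` and `VISA` comparison segments are assumptions made for illustration; they are not taken from the demo data.

```python
from collections import defaultdict
from statistics import median

# Hypothetical checkout events; all values are invented for illustration.
events = [
    {"payment_gateway": "Braintree", "card_type": "MX", "region": "EU", "duration_ms": 2400},
    {"payment_gateway": "Braintree", "card_type": "MX", "region": "EU", "duration_ms": 2600},
    {"payment_gateway": "Braintree", "card_type": "MX", "region": "US", "duration_ms": 200},
    {"payment_gateway": "Braintree", "card_type": "MX", "region": "US", "duration_ms": 220},
    {"payment_gateway": "Stripe", "card_type": "MX", "region": "EU", "duration_ms": 210},
    {"payment_gateway": "Stripe", "card_type": "MX", "region": "EU", "duration_ms": 190},
    {"payment_gateway": "Braintree", "card_type": "VISA", "region": "EU", "duration_ms": 180},
    {"payment_gateway": "Braintree", "card_type": "VISA", "region": "EU", "duration_ms": 200},
]

def median_by_segment(events, keys=("payment_gateway", "card_type", "region")):
    """Group events by the given keys and return the median duration per group."""
    groups = defaultdict(list)
    for event in events:
        segment = tuple(event[k] for k in keys)
        groups[segment].append(event["duration_ms"])
    return {segment: median(values) for segment, values in groups.items()}

# The global median is dominated by the fast majority of requests.
overall = median(e["duration_ms"] for e in events)

# The per-segment view isolates the one slow combination.
medians = median_by_segment(events)
slowest = max(medians, key=medians.get)

print(overall, slowest, medians[slowest])
```

In this toy data the overall median stays near 200 ms, while the single slow combination only shows up once all three dimensions are grouped together.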

In the 6-min video I give the model a vague prompt (“EU customers report slow checkout”). It segments until it finds the outlier, infers abandonment as GET without POST per session_id, estimates impact (~$69 in the current slice, ~$1.2k over a week, rough), then creates a dashboard + alert via the MCP server.
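The abandonment heuristic described above (a checkout GET with no matching POST for the same session_id) could be sketched roughly as follows. The event shape, the `/checkout` path, and the sample totals are assumptions for illustration, not the actual query the model ran.

```python
# Hypothetical request events keyed by session_id; values are invented.
events = [
    {"session_id": "a", "method": "GET", "path": "/checkout", "cart_total": 40.0},
    {"session_id": "a", "method": "POST", "path": "/checkout", "cart_total": 40.0},
    {"session_id": "b", "method": "GET", "path": "/checkout", "cart_total": 25.0},
    {"session_id": "c", "method": "GET", "path": "/checkout", "cart_total": 30.0},
]

def abandoned_sessions(events, path="/checkout"):
    """Return {session_id: cart_total} for sessions that loaded the
    checkout page (GET) but never submitted it (POST)."""
    viewed = {}
    submitted = set()
    for event in events:
        if event["path"] != path:
            continue
        if event["method"] == "GET":
            viewed[event["session_id"]] = event["cart_total"]
        elif event["method"] == "POST":
            submitted.add(event["session_id"])
    return {sid: total for sid, total in viewed.items() if sid not in submitted}

abandoned = abandoned_sessions(events)
# Rough impact estimate: sum of cart totals that never reached a POST.
estimated_loss = sum(abandoned.values())
print(sorted(abandoned), estimated_loss)
```

This treats every unsubmitted checkout as lost revenue, which overstates the impact if some of those sessions would have been abandoned anyway, so the result is best read as an upper bound.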

Happy to discuss the MCP architecture, the query language, and what surprised me vs. fell flat.

Comments URL: https://news.ycombinator.com/item?id=47159240

Points: 1

# Comments: 0

Categories: Hacker News

Show HN: Dropbox for Your Agents' Memories

Hacker News - Wed, 02/25/2026 - 5:52pm

Article URL: https://memory.store/start

Comments URL: https://news.ycombinator.com/item?id=47159199

Points: 1

# Comments: 0

Categories: Hacker News

Ask HN: Is meaningful privacy possible with hosted AI models?

Hacker News - Wed, 02/25/2026 - 5:49pm

I've been thinking about the privacy tradeoffs when using frontier AI models like Claude or ChatGPT.

Even with a VPN, providers still see prompts, and usage is tied to accounts and payment methods linked to identity in some way.

I've been trying to find a way to access these models without creating provider-specific accounts tied to my identity. Ideally, through some kind of intermediary that abstracts identity and doesn't retain prompts.

From a technical and economic perspective, would that kind of setup meaningfully improve privacy, or does it just shift trust? Is meaningful privacy with hosted AI fundamentally unrealistic regardless of architecture?

Comments URL: https://news.ycombinator.com/item?id=47159175

Points: 1

# Comments: 1

Categories: Hacker News

Show HN: AI-runtime-guard – Policy enforcement layer for MCP AI agents

Hacker News - Wed, 02/25/2026 - 5:47pm

I built this after realizing that AI agents with filesystem and shell access can delete files, leak credentials, or execute destructive commands — and there's no enforcement layer stopping them at the execution level.

ai-runtime-guard is an MCP server that sits between your AI agent and your system. It enforces a policy layer before any file or shell action takes effect. No retraining, no prompt engineering, no changes to your agent or workflow.

Your agent can say anything. It can only do what policy allows.

What it does:
- Blocks dangerous commands (rm -rf, dd, shutdown, privilege escalation) before execution
- Gates risky commands behind human approval via a web GUI
- Simulates blast radius for wildcard operations before they run
- Creates automatic backups before destructive actions
- Full audit trail of everything the agent does
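To make the block / approve / allow idea concrete, here is a minimal sketch of a pre-execution policy check. It is not ai-runtime-guard's implementation: the `evaluate` function, the rule sets, and the choice to treat `sudo` as the privilege-escalation case are all assumptions for illustration.

```python
import shlex

# Hypothetical policy tables; the real tool's rules and format are not shown in the post.
BLOCKED = {"dd", "shutdown", "sudo"}
NEEDS_APPROVAL = {"mv", "chmod"}

def evaluate(command: str) -> str:
    """Classify a shell command as 'block', 'approve' (human approval), or 'allow'."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return "block"  # unparseable input (e.g. unbalanced quotes) is treated as unsafe
    if not tokens:
        return "allow"
    program, args = tokens[0], tokens[1:]
    if program in BLOCKED:
        return "block"
    if program == "rm":
        # Collect short flags such as -rf or -r -f into one string.
        short = "".join(a[1:] for a in args if a.startswith("-") and not a.startswith("--"))
        recursive = "r" in short or "R" in short or "--recursive" in args
        force = "f" in short or "--force" in args
        if recursive and force:
            return "block"
        return "approve"  # any other deletion waits for a human
    if program in NEEDS_APPROVAL:
        return "approve"
    return "allow"

for cmd in ["ls -la", "rm notes.txt", "rm -rf /tmp/build", "sudo reboot"]:
    print(cmd, "->", evaluate(cmd))
```

A check like this only inspects the first program in the command line; pipes, subshells, and wrappers such as `sh -c` would need additional handling in any real enforcement layer.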

Works with Claude Desktop, Cursor, Codex, and any stdio MCP-compatible client. Default profile is basic protection out of the box — advanced tiers are opt-in.

Validated on macOS Apple Silicon. Linux expected to work, formal validation coming in v1.1.

Would love feedback from anyone running AI agents with filesystem access.

Comments URL: https://news.ycombinator.com/item?id=47159151

Points: 1

# Comments: 0

Categories: Hacker News

Show HN: LazyViewer – TUI code viewer with Git diff previews

Hacker News - Wed, 02/25/2026 - 5:42pm

Hi HN,

While coding with Codex, I missed a tool that, like Cursor, shows diffs inline while also offering easy keyboard-driven tree navigation and source code navigation without changing mode (like in vim).

My favourite tool in TUI was lazygit, but I wanted a way to see the diffs inside the whole source code.

As this looked like a nice little project to vibe code in a few days, I did it and tried to perfect the integration of the few features that I wanted together:
- Full easy keyboard + mouse navigation
- Multiple tree support (unlike VS Code / Cursor)
- Inline diff
- Ripgrep fast search
- Sticky function/class headers

I tried tools like ranger and other navigators, and vim NeoTree preview, but none of these tools were really good for my use case.

What I ended up with is an ugly code base inside a super functional tool.

Comments URL: https://news.ycombinator.com/item?id=47159111

Points: 2

# Comments: 0

Categories: Hacker News

Show HN: My-data.download – Guides to export your data

Hacker News - Wed, 02/25/2026 - 5:40pm

I built an open-source directory of personal data export instructions. Pick a service (Spotify, GitHub, Netflix, etc.), get the exact steps, what data you'll get, and what you can do with it.

It's a static site backed by a single JSON file. Adding a source is just a PR. Contributions are welcome and wanted.

https://my-data.download

https://github.com/janschill/my-data.download

Comments URL: https://news.ycombinator.com/item?id=47159090

Points: 1

# Comments: 0

Categories: Hacker News

Show HN: Running hallucination detection on a $200 GPU (RTX 3050, 4GB)

Hacker News - Wed, 02/25/2026 - 5:36pm

I built SIB-ENGINE, a real-time hallucination detection system that monitors LLM internal structure rather than output content.

KEY RESULTS (Gemma-2B, N=1000):

• 54% hallucination detection rate at a 7% false positive rate

• <1% computational overhead (runs on RTX 3050 with 4GB VRAM)

• ROC-AUC: 0.8995
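For readers less familiar with these metrics, the sketch below shows how a detection rate, a false positive rate, and ROC-AUC are typically computed from per-sample detector scores. The scores, labels, and threshold are invented for illustration; they are not drawn from raw_logs.csv.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (detection_rate, false_positive_rate) when flagging scores >= threshold.
    labels: 1 = hallucination, 0 = correct output."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives

def roc_auc(scores, labels):
    """ROC-AUC as the probability that a random positive outscores a random
    negative (ties count as half), computed by pairwise comparison."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical instability scores and ground-truth labels.
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2, 0.1, 0.35]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

detection_rate, false_positive_rate = rates_at_threshold(scores, labels, threshold=0.5)
auc = roc_auc(scores, labels)
print(detection_rate, false_positive_rate, auc)
```

The detection rate and false positive rate depend on the chosen threshold, while ROC-AUC summarizes ranking quality across all thresholds, which is why the two kinds of numbers are reported separately.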

WHY IT'S DIFFERENT:

Traditional methods analyze the output text semantically.

SIB-ENGINE monitors "geometric drift" in hidden states during generation - identifying the structural collapse of the latent space before the first incorrect token is sampled.

This approach offers unique advantages:

• Real-time intervention: Stop generation mid-stream

• Language-agnostic: No semantic analysis needed

• Privacy-preserving: Never reads the actual content

• Extremely lightweight: Works on consumer hardware

HOW IT WORKS: SIB-ENGINE monitors the internal stability of the model's computation. While the system utilizes multiple structural signals to detect instability, two primary indicators include:

Representation Stability: Tracking how the initial intent is preserved or distorted as it moves through the model's transformation space.

Cross-Layer Alignment: Monitoring the consensus of information processing across different neural depths to identify early-stage divergence.

When these (and other proprietary structural signals) deviate from the expected stable manifold, the system flags a potential hallucination before it manifests in the output.
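The post does not disclose how these signals are computed, so the following is only a generic illustration of what a cross-layer alignment check could look like, not SIB-ENGINE's method. It assumes one hidden-state vector per layer for the current token and uses cosine similarity between adjacent layers; the vectors and the 0.5 threshold are made up.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def min_adjacent_alignment(layer_states):
    """Lowest cosine similarity between consecutive layers' hidden states."""
    return min(cosine(a, b) for a, b in zip(layer_states, layer_states[1:]))

def is_unstable(layer_states, threshold=0.5):
    """Flag a token when any adjacent pair of layers falls below the threshold."""
    return min_adjacent_alignment(layer_states) < threshold

# Toy 2-dimensional hidden states for three layers (assumed values).
stable = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]
drifting = [[1.0, 0.0], [0.9, 0.1], [-0.2, 1.0]]

print(is_unstable(stable), is_unstable(drifting))
```

In a real model the per-layer vectors would come from the transformer's hidden states, and any threshold would have to be calibrated on labeled data rather than picked by hand.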

DEMO & CODE:

• Demo video: https://www.youtube.com/watch?v=H1_zDC0SXQ8

• GitHub: https://github.com/yubainu/sibainu-engine

• Raw data: raw_logs.csv (full transparency)

LIMITATIONS:

• Tested on Gemma-2B only (2.5B parameters)

• Designed to scale, but needs validation on larger models

• Catches "structurally unstable" hallucinations (about half)

• Best used as first-line defense in ensemble systems

TECHNICAL NOTES:

• No external models needed (unlike self-consistency methods)

• No knowledge bases required (unlike RAG approaches)

• Adds ~1% inference time vs. 300-500% for semantic methods

• Works by monitoring the process, not the product

I'd love feedback on:

• Validation on larger models (Seeking strategic partnerships and compute resources for large-scale validation.)

• Integration patterns for production systems

• Comparison with other structural approaches

• Edge cases where geometric signals fail

This represents a fundamentally different paradigm: instead of asking "is this text correct?", we ask "was the generation process unstable?" The answer is surprisingly informative.

Happy to discuss technical details in the comments!

Comments URL: https://news.ycombinator.com/item?id=47159047

Points: 1

# Comments: 1

Categories: Hacker News
