Hacker News

Show HN: PaperPod – Fast, no-setup sandboxes for AI agents

Hacker News - Mon, 02/09/2026 - 3:22pm

Hi HN, we built PaperPod: agent-native sandboxes for code execution, live URLs, browser automation, LLM inference, and persistent memory, with 50+ built-in tools (ffmpeg, sqlite, etc.), all available via our CLI tool or HTTP endpoints.

The problem: existing infra tools require SDK setup, API keys, or clicking through UIs, all things agents can't do well.

What we built: On-demand sandboxes accessible via our CLI or HTTP endpoints. Try it right now with any coding agent:

npm install -g @paperpod/cli
ppod login your@email.com
ppod help

What's included:

- Code execution (Python, JS, shell)
- Live preview URLs
- Persistent memory (10MB, survives restarts)
- LLM inference
- Browser automation (screenshots, PDFs, scraping via Playwright)
- 50+ tools (ffmpeg, sqlite, git, pandoc, imagemagick)

$5 free credits on signup (~14 hours), no credit card required. Built on Cloudflare Containers.

Quick Demo with OpenClaw: https://youtube.com/shorts/gTbyz26mPxk?feature=share

What would make this more useful for your agent workflows?

Comments URL: https://news.ycombinator.com/item?id=46950654

Points: 1

# Comments: 1

Categories: Hacker News

Show HN: Pluto – open-source Experiment Tracker for Neptune users

Hacker News - Mon, 02/09/2026 - 2:41pm

Hey HN! We're Roanak and Andrew from Trainy (YC S23). We build GPU infrastructure for ML teams (scheduling multi-node training jobs on Kubernetes). When Neptune announced it was shutting down, our customers didn't have a clear path forward. Alternatives exist, but none of them matched the UI experience Neptune offered, especially at scale. So we decided to build Pluto.

Pluto is an open-source experiment tracker based on our fork of MLOp. The main idea is that you can add one import alongside your existing Neptune code and it logs to both platforms simultaneously. You validate that everything matches on real training runs, then when you're ready, set an env var and all Neptune API calls redirect to Pluto. We also built a Neptune exporter for historical runs.
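The dual-logging idea can be sketched generically: a thin wrapper mirrors every logging call to two backends, so you can compare the stores field by field before cutting over. This is an illustrative sketch only, not Pluto's actual compatibility layer or Neptune's API; the class and backend names here are made up.

```python
class DualTracker:
    """Mirror every metric append to two backends (illustrative sketch,
    not Pluto's real import hook)."""

    def __init__(self, primary, mirror):
        self.backends = (primary, mirror)

    def append(self, key, value):
        # Every write goes to both stores, so the runs stay comparable.
        for backend in self.backends:
            backend.setdefault(key, []).append(value)


# In-memory stand-ins for the Neptune and Pluto backends.
neptune_store, pluto_store = {}, {}
run = DualTracker(neptune_store, pluto_store)

for loss in [0.9, 0.5, 0.3]:
    run.append("train/loss", loss)

# Validation step: both backends received identical series,
# so redirecting reads to the mirror is safe.
assert neptune_store == pluto_store
```

The point of the pattern is that validation is a plain equality check over real training runs before any env-var cutover.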

We're focusing heavily on having a UI that stays responsive at scale, since ML teams can have thousands of runs per project and the tracker is open all day. If you find anything slow or buggy in the playground, we'd love to hear about it.

Beyond the compatibility layer, we're working on: tensor logging with on-the-fly visualization (log raw tensors instead of pre-rendered plots), code diffing between runs, and Linear/Jira integration. We just shipped Pluto MCP in alpha, which lets you query experiment data with an LLM.

Live playground (no signup): https://demo.pluto.trainy.ai/o/dev-org/projects/my-ml-projec...

Quickstart: https://docs.trainy.ai/pluto/quickstart

Listed on Neptune's official transition hub: https://docs.neptune.ai/transition_hub/migration/to_pluto

Let us know what you think! We'd especially love feedback from anyone managing experiment tracking across a team.

Comments URL: https://news.ycombinator.com/item?id=46950009

Points: 1

# Comments: 0


Show HN: LLM-use – orchestrate LLMs for AI agents like OpenClaw, cut costs

Hacker News - Mon, 02/09/2026 - 2:34pm

I built llm-use, an open-source tool to run AI agent workflows across multiple LLMs with routing and cost optimization.

Repo: https://github.com/llm-use/llm-use

OpenClaw-style agents are powerful but get expensive if every step runs on a single high-end model. llm-use helps by:

• using a strong model only for planning and final synthesis
• running most steps on cheaper or local models
• mixing local and cloud models in the same workflow

Example:

python3 cli.py exec \
  --orchestrator anthropic:claude-4-5-sonnet \
  --worker ollama:llama3.1:8b \
  --task "Monitor sources and produce a daily summary"

This setup keeps long-running agents predictable in cost while preserving quality where it matters.
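The orchestrator/worker split can be sketched as a simple router. This is a generic illustration of the pattern, not llm-use's internals; `call_strong` and `call_cheap` are placeholders for real model clients, and the plan parsing is stubbed.

```python
# Generic planner/worker routing sketch (not llm-use's actual code).
# call_strong / call_cheap stand in for real LLM clients.

def call_strong(prompt: str) -> str:
    """Placeholder for an expensive frontier model (planning/synthesis)."""
    return f"[plan] {prompt}"

def call_cheap(prompt: str) -> str:
    """Placeholder for a cheap or local model (bulk work)."""
    return f"[work] {prompt}"

def run_task(task: str) -> str:
    # Strong model is called exactly twice: once to plan, once to synthesize.
    plan = call_strong(f"Break into steps: {task}")
    steps = [f"step {i}" for i in range(3)]      # plan parsing stubbed out
    results = [call_cheap(s) for s in steps]     # cheap model does the bulk
    return call_strong(f"Synthesize: {results}")

summary = run_task("daily source monitoring")
```

With this shape, cost scales with the number of cheap worker calls, while the strong model's per-task usage stays constant.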

Feedback welcome.

Comments URL: https://news.ycombinator.com/item?id=46949890

Points: 1

# Comments: 0


Ask HN: How do you test payment webhook edge cases?

Hacker News - Mon, 02/09/2026 - 2:30pm

I work at a PSP. We recently had a production bug where a payment succeeded but our system didn't update — the webhook was delayed 45 seconds. Stripe's test mode doesn't simulate delays, timeouts, or webhook failures. We ended up writing internal mocks, but it's tedious. Considering building a simple tool: configurable delays, webhook failure simulation, request timeline. Curious how others handle this. Do you just write custom mocks? Accept the risk? Use something I don't know about?
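One common in-house answer is a small simulator that injects delay and failure into webhook delivery, paired with an idempotent handler so duplicate deliveries after retries are harmless. The sketch below is a generic illustration (not Stripe tooling or any specific PSP's mock); all names are made up.

```python
import random
import time

class FlakyWebhookSimulator:
    """Deliver webhook payloads with configurable delay and failure rate,
    recording a timeline of events (generic sketch, not real PSP tooling)."""

    def __init__(self, delay_s=0.0, fail_rate=0.0, seed=None):
        self.delay_s = delay_s
        self.fail_rate = fail_rate
        self.rng = random.Random(seed)
        self.timeline = []  # (monotonic timestamp, event, payload)

    def deliver(self, handler, payload):
        self.timeline.append((time.monotonic(), "sent", payload))
        time.sleep(self.delay_s)                 # simulate network/queue delay
        if self.rng.random() < self.fail_rate:   # simulate a dropped delivery
            self.timeline.append((time.monotonic(), "failed", payload))
            return False                         # caller is expected to retry
        handler(payload)
        self.timeline.append((time.monotonic(), "delivered", payload))
        return True


# The handler under test must be idempotent: retries cause duplicates.
seen = set()

def handler(payload):
    if payload["id"] in seen:
        return                                   # duplicate delivery, ignore
    seen.add(payload["id"])

# fail_rate=1.0 forces the failure path, so the test exercises retry handling.
sim = FlakyWebhookSimulator(delay_s=0.01, fail_rate=1.0, seed=1)
ok = sim.deliver(handler, {"id": "evt_1"})
```

Cranking `delay_s` up past your system's reconciliation window is what surfaces bugs like the 45-second one described above.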

Comments URL: https://news.ycombinator.com/item?id=46949828

Points: 1

# Comments: 3
