Feed aggregator
Apple Launches MacBook Pros with New M5 Pro, M5 Max Chips
Hacked Tehran Traffic Cameras Fed Israeli Intelligence Before Strike On Khamenei
False positives in cybersecurity detection tools drain resources and distract from real threats. Once CISOs understand the root causes of false positives, they can implement strategies to reduce them.
We're about to turn night into day. Is that a good idea?
Article URL: https://www.washingtonpost.com/climate-environment/2026/02/27/satellites-light-pollution-spacex/
Comments URL: https://news.ycombinator.com/item?id=47232936
Points: 1
# Comments: 0
LinkedIn Ragebait
Article URL: https://balanarayan.com/2026/03/03/linkedin-ragebait/
Comments URL: https://news.ycombinator.com/item?id=47232908
Points: 1
# Comments: 0
Launch HN: Cekura (YC F24) – Testing and monitoring for voice and chat AI agents
Hey HN - we're Tarush, Sidhant, and Shashij from Cekura (https://www.cekura.ai). We've been running voice agent simulation for 1.5 years, and recently extended the same infrastructure to chat. Teams use Cekura to simulate real user conversations, stress-test prompts and LLM behavior, and catch regressions before they hit production.
The core problem: you can't manually QA an AI agent. When you ship a new prompt, swap a model, or add a tool, how do you know the agent still behaves correctly across the thousands of ways users might interact with it? Most teams resort to manual spot-checking (doesn't scale), waiting for users to complain (too late), or brittle scripted tests.
Our answer is simulation: synthetic users interact with your agent the way real users do, and LLM-based judges evaluate whether it responded correctly - across the full conversational arc, not just single turns. Three things make this actually work:

Scenario generation + real conversation import - Our scenario generation agent bootstraps your test suite from a description of your agent. But real users find paths no generator anticipates, so we also ingest your production conversations and automatically extract test cases from them. Your coverage evolves as your users do.
Mock tool platform - Agents call tools. Running simulations against real APIs is slow and flaky. Our mock tool platform lets you define tool schemas, behavior, and return values so simulations exercise tool selection and decision-making without touching production systems.
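A mock tool table of this shape can be sketched in a few lines. The tool names, schema format, and `resolve_call` helper below are illustrative assumptions, not Cekura's actual API:

```python
# Hypothetical sketch of a mock tool definition: the simulator resolves
# tool calls from this table instead of hitting a live API, so the test
# exercises tool selection without touching production systems.
mock_tools = {
    "lookup_order": {
        "schema": {"order_id": "string"},
        "behavior": lambda args: {"status": "shipped", "eta_days": 2},
    },
    "cancel_order": {
        "schema": {"order_id": "string"},
        "behavior": lambda args: {"cancelled": True},
    },
}

def resolve_call(tool_name, args):
    """Return the mocked result for a tool call, or raise if the agent
    picked a tool that doesn't exist (a tool-selection failure)."""
    if tool_name not in mock_tools:
        raise ValueError(f"agent called unknown tool: {tool_name}")
    return mock_tools[tool_name]["behavior"](args)
```

Fixed return values also make runs reproducible: a flaky upstream API can no longer turn a passing test into a failing one.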
Deterministic, structured test cases - LLMs are stochastic. A CI test that passes "most of the time" is useless. Rather than free-form prompts, our evaluators are defined as structured conditional action trees: explicit conditions that trigger specific responses, with support for fixed messages when word-for-word precision matters. This means the synthetic user behaves consistently across runs - same branching logic, same inputs - so a failure is a real regression, not noise.
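A minimal sketch of such a conditional action tree, assuming an invented condition/reply structure (the names and branching format here are hypothetical, not Cekura's):

```python
# Illustrative conditional action tree for a synthetic user: explicit
# conditions map to fixed replies, so the simulated user branches the
# same way on every run - a failure is a regression, not sampling noise.
action_tree = [
    # (condition on the agent's last message, synthetic user's fixed reply)
    (lambda msg: "date of birth" in msg.lower(), "It's 1990-04-12."),
    (lambda msg: "phone" in msg.lower(), "555-0142."),
    (lambda msg: "name" in msg.lower(), "Dana Smith."),
]

def synthetic_user_reply(agent_message, default="Sorry, can you repeat that?"):
    """First matching condition wins; fixed messages give word-for-word
    precision where the test needs it."""
    for condition, reply in action_tree:
        if condition(agent_message):
            return reply
    return default
```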
Cekura also monitors your live agent traffic. The obvious alternative here is a tracing platform like Langfuse or LangSmith - and they're great tools for debugging individual LLM calls. But conversational agents have a different failure mode: the bug isn't in any single turn, it's in how turns relate to each other. Take a verification flow that requires name, date of birth, and phone number before proceeding - if the agent skips asking for DOB and moves on anyway, every individual turn looks fine in isolation. The failure only becomes visible when you evaluate the full session as a unit.

Cekura is built around this from the ground up. Where tracing platforms evaluate turn by turn, Cekura evaluates the full session. Imagine a banking agent where the user fails verification in step 1, but the agent hallucinates and proceeds anyway. A turn-based evaluator sees step 3 (address confirmation) and marks it green - the right question was asked. Cekura's judge sees the full transcript and flags the session as failed because verification never succeeded.
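The session-versus-turn distinction can be sketched with a rule-based judge over a whole transcript. Cekura's real judges are LLM-based; the scenario, stage labels, and field names below are invented for illustration:

```python
# Toy session-level judge: fail the session if the agent proceeded past
# verification before all three identity fields were collected, even
# though every individual turn looks fine in isolation.
def judge_session(transcript):
    required = {"name", "dob", "phone"}
    collected = set()
    for turn in transcript:
        collected |= set(turn.get("fields_collected", []))
        if turn.get("stage") == "post_verification" and not required <= collected:
            missing = required - collected
            return ("fail", f"proceeded without: {sorted(missing)}")
    return ("pass", "")

# Each turn is individually unremarkable, but the agent skipped DOB:
transcript = [
    {"stage": "verification", "fields_collected": ["name"]},
    {"stage": "verification", "fields_collected": ["phone"]},
    {"stage": "post_verification", "fields_collected": []},
]
```

A turn-by-turn evaluator never sees the `required` set spanning turns, which is exactly why the skipped-DOB bug is invisible to it.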
Try us out at https://www.cekura.ai - 7-day free trial, no credit card required. Paid plans from $30/month.
We also put together a product video if you'd like to see it in action: https://www.youtube.com/watch?v=n8FFKv1-nMw. The first minute dives into quick onboarding - and if you want to jump straight to the results, skip to 8:40.
Curious what the HN community is doing - how are you testing behavioral regressions in your agents? What failure modes have hurt you most? Happy to dig in below!
Comments URL: https://news.ycombinator.com/item?id=47232903
Points: 2
# Comments: 0
Vaultara – daily news intelligence reports
Article URL: https://vaultara.co/
Comments URL: https://news.ycombinator.com/item?id=47232895
Points: 1
# Comments: 0
Show HN: Nbdantic – a Humble Pydantic for Notebooks
Article URL: https://github.com/ivanbelenky/nbdantic
Comments URL: https://news.ycombinator.com/item?id=47232885
Points: 1
# Comments: 0
VoooAI can now generate a complete set of comics with just one sentence
Article URL: https://voooai.com/
Comments URL: https://news.ycombinator.com/item?id=47232881
Points: 1
# Comments: 0
The View from RSS
Article URL: https://www.carolinecrampton.com/the-view-from-rss/
Comments URL: https://news.ycombinator.com/item?id=47232849
Points: 1
# Comments: 0
Helsing's AI-Powered HX-2 drones hunting targets deep behind the frontline
Show HN: Auth intermediary that cannot authorize. By design
What if your auth system had no admin override?
I built an authorization intermediary that cannot authorize.
How it works:
• Hashes requests (opaque, can't see payload)
• Forwards to provider (stateless, can't make decisions)
• Provider validates everything (exclusive authority)
The intermediary holds no signing keys. The provider holds the only pen. Compromise the intermediary = you get nowhere.
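Under those stated constraints, the hash-and-forward split might look roughly like this toy Python sketch. The function and key names are assumptions for illustration, not taken from the linked repo:

```python
import hashlib
import hmac

# The signing key lives only on the provider side - the intermediary
# never holds it, so it cannot mint a valid authorization.
PROVIDER_KEY = b"provider-only-secret"

def intermediary_forward(payload: bytes) -> bytes:
    """Stateless pass-through: produces an opaque digest of the request
    and signs nothing."""
    return hashlib.sha256(payload).digest()

def provider_authorize(digest: bytes) -> bytes:
    """Exclusive authority: only the provider can produce a valid token."""
    return hmac.new(PROVIDER_KEY, digest, hashlib.sha256).digest()

def provider_verify(digest: bytes, token: bytes) -> bool:
    return hmac.compare_digest(provider_authorize(digest), token)
```

Compromising the intermediary in this model yields only a hash function; without `PROVIDER_KEY`, no forged token verifies.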
200 lines. Zero dependencies. Apache 2.0.
Core and basic extension demos: Replit.com/@sbw70
Bundled demos: Replit.com/@holiwood4420
Repo https://github.com/sbw70/verification-constraints/
Run it. Fork it. Fuck it up. If you can get it to authorize without provider consent, I'd love to see how.
Comments URL: https://news.ycombinator.com/item?id=47232843
Points: 1
# Comments: 0
Draft Barron Trump
Article URL: https://www.draftbarrontrump.com
Comments URL: https://news.ycombinator.com/item?id=47232832
Points: 1
# Comments: 0
I used an AI tool on App Store Connect. Apple terminated my account for fraud
Article URL: https://www.reazy.pro/blog/apple-terminated-developer-account-no-explanation
Comments URL: https://news.ycombinator.com/item?id=47232830
Points: 1
# Comments: 1
These Awesome Concept Gadgets Make MWC an Exciting Place to Be
Show HN: Free tool that scores your interview answers on 4 dimensions
Article URL: https://prepto.tech/score
Comments URL: https://news.ycombinator.com/item?id=47232800
Points: 1
# Comments: 1
Claude Code /voice is not the 'real' thing, it's just 'transcription'
Article URL: https://github.com/virtengine/bosun/releases/tag/0.37.0
Comments URL: https://news.ycombinator.com/item?id=47232794
Points: 2
# Comments: 1
Codeown – A dedicated place to log your daily coding progress
Article URL: https://codeown.space
Comments URL: https://news.ycombinator.com/item?id=47232787
Points: 1
# Comments: 1
Integral: A Federated, Post-Monetary, Cybernetic Cooperative Economic System [pdf]
Article URL: https://integralcollective.io/wp-content/uploads/2026/01/INTEGRAL-Paper-V0.1-Hi-Res.pdf
Comments URL: https://news.ycombinator.com/item?id=47232782
Points: 1
# Comments: 0
Show HN: Blindfold – PII protection for LLM apps (local regex and cloud NLP)
Article URL: https://blindfold.dev
Comments URL: https://news.ycombinator.com/item?id=47232777
Points: 1
# Comments: 1
