Feed aggregator

Show HN: Crustdata (YC F24) – Web Search API for Token-Efficient AI Agents

Hacker News - Tue, 02/24/2026 - 10:11pm

Hi HN! We’re Abhilash Chowdhary, Chris Pisarski and Manmohit Grewal. We built Crustdata (YC F24). Today we’re launching our web search API for AI agents, which not only returns the most relevant documents from the web but also maps them to the correct entity (person, company or event). Demo video here https://youtu.be/IouWW97hBN8

If you run agents at scale, tokens become a line item. Web data is the worst input: long pages, repeated content, mixed entities, stale claims. The usual web search -> scrape -> summarize + structure workflow forces the agent to spend tokens on janitorial work before it can take action.

We’re trying to move that work upstream. We keep a canonical graph (ontology) of people and companies: stable internal IDs, aliases, and relationships. Then we continuously index the web and attach each document to the right entity ID. Example: raw web search for "Stripe pricing changes 2026" returns ~10 results across ~4,000 tokens, mostly redundant. We return 6 deduplicated results in ~1,200 tokens.
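To make the idea of stable IDs plus aliases concrete, here is a minimal sketch of such a graph. It is not Crustdata's implementation: the `Entity` and `EntityGraph` classes, the `co_001`-style IDs, and the "Acme" companies are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: str                      # stable internal ID (hypothetical format)
    kind: str                           # e.g. "person" or "company"
    aliases: set[str] = field(default_factory=set)


class EntityGraph:
    """Toy alias index: many names map to stable entity IDs."""

    def __init__(self) -> None:
        self._entities: dict[str, Entity] = {}
        self._alias_index: dict[str, set[str]] = {}
        self.documents: dict[str, list[str]] = {}   # entity_id -> document URLs

    def add(self, entity: Entity) -> None:
        self._entities[entity.entity_id] = entity
        for alias in entity.aliases:
            key = alias.casefold()
            self._alias_index.setdefault(key, set()).add(entity.entity_id)

    def candidates(self, mention: str) -> set[str]:
        """Return every entity ID that a surface string could refer to."""
        return set(self._alias_index.get(mention.casefold(), ()))

    def attach(self, doc_url: str, entity_id: str) -> None:
        """Attach a document to an ID, never to a name string."""
        if entity_id not in self._entities:
            raise KeyError(f"unknown entity: {entity_id}")
        self.documents.setdefault(entity_id, []).append(doc_url)


graph = EntityGraph()
graph.add(Entity("co_001", "company", {"Acme", "Acme Corp"}))
graph.add(Entity("co_002", "company", {"Acme", "Acme Robotics"}))

# The bare string "Acme" is ambiguous; only the ID is unambiguous.
ambiguous = graph.candidates("ACME")
graph.attach("https://example.com/acme-robotics-launch", "co_002")
```

Because `candidates` can return more than one ID, a caller has to disambiguate before attaching anything, which is the point of keeping identity separate from the name string.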

This is not just about saving tokens. It also matters because the common failure isn’t “search missed something.” It’s “search found something about the wrong entity.” Names collide. Companies rebrand. Domains move. Press releases get syndicated and look like independent sources. If you treat strings as IDs, you eventually attach evidence to the wrong person/company and the agent takes a confident action based on that mistake.

Under the hood, we run a continuous pipeline that updates the entity-linked index: discover -> fetch -> extract -> dedupe -> entity resolution -> attach -> index. We then serve this index through our search API.
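The stage order above can be sketched in a few lines. This is a toy under stated assumptions, not the pipeline described in the post: discovery and fetching are assumed to have already produced `(url, html)` pairs, extraction is a crude tag strip, dedupe only catches copies whose text matches after case folding and whitespace collapsing, and `resolve` is a hypothetical callable standing in for real entity resolution.

```python
import hashlib
import re
from typing import Callable, Iterable, Optional


def extract(raw_html: str) -> str:
    """Crude extraction: drop tags, collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw_html)
    return re.sub(r"\s+", " ", text).strip()


def fingerprint(text: str) -> str:
    """Hash of case-folded text, so verbatim syndicated copies collide."""
    return hashlib.sha256(text.casefold().encode("utf-8")).hexdigest()


def run_pipeline(
    pages: Iterable[tuple[str, str]],
    resolve: Callable[[str], Optional[str]],
) -> dict[str, list[str]]:
    """Return an index mapping entity ID -> URLs of unique documents."""
    seen: set[str] = set()
    index: dict[str, list[str]] = {}
    for url, raw_html in pages:              # discover + fetch happen upstream
        text = extract(raw_html)             # extract
        fp = fingerprint(text)
        if fp in seen:                       # dedupe
            continue
        seen.add(fp)
        entity_id = resolve(text)            # entity resolution
        if entity_id is None:
            continue                         # nothing to attach to
        index.setdefault(entity_id, []).append(url)   # attach + index
    return index


pages = [
    ("https://a.example/pr", "<p>Acme Robotics raises funding</p>"),
    ("https://b.example/pr", "<div>ACME Robotics   raises funding</div>"),
    ("https://c.example/x", "<p>Unrelated text</p>"),
]


def resolve(text: str) -> Optional[str]:
    # Hypothetical resolver: a single hard-coded rule for the demo.
    return "co_002" if "acme robotics" in text.casefold() else None


index = run_pipeline(pages, resolve)
```

Here the second page is a syndicated copy of the first, so only one URL ends up attached to the entity. Near-duplicate detection (for example shingling or MinHash) would be needed for copies that are reworded rather than verbatim.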

We didn’t start with web search. We spent ~2 years building verified people + company data from higher-trust sources. That forced us to build identity as a system, not a string. When we tried to bolt on web search and started building our integrated index of documents + people + companies, we ended up with a pile of local fixes: parser tweaks, domain rules, prompt hacks. Each fix helped one case and broke another because identity isn’t local. That’s when we committed to an entity-first index: pay the entity resolution cost once, then reuse it everywhere.

If you’re building AI agents for sales, recruiting, or investing that do a lot of web searches for people and companies, we’d love for you to try our web search APIs. https://crustdata.com/demo

Comments URL: https://news.ycombinator.com/item?id=47146819

Points: 4

# Comments: 0

Categories: Hacker News

Tests Are the New Moat

Hacker News - Tue, 02/24/2026 - 9:41pm
Categories: Hacker News

Show HN: Measuring brand share in AI answers – a Y Combinator case study

Hacker News - Tue, 02/24/2026 - 9:36pm

After working in data science at Google, I built GeoVector to systematically measure how brands appear in AI-generated answers. Our approach is research-based, using position-adjusted scoring grounded in published GEO literature. The Y Combinator report is one example of the analysis we run.

We ran 150 prompts across ChatGPT and Gemini, tracking 21 brands. Three things that surprised us:

1. Techstars outranks YC on ChatGPT despite YC's far stronger Google presence
2. YC's own site accounts for just 8 of 940 AI source references
3. The single most-cited source driving competitor visibility is a blog post on pitchwise.se, not any accelerator's own website
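The post does not give GeoVector's scoring formula, so the following is only one plausible shape for a position-adjusted score: each brand mention is weighted by the reciprocal of its position in the answer, and the totals are normalized into shares. The 1/rank weighting, the function name, and the placeholder brands are all assumptions.

```python
from collections import defaultdict


def position_weighted_share(answers: list[list[str]]) -> dict[str, float]:
    """answers[i] lists the brands in answer i, in order of first mention.

    A brand mentioned first gets weight 1, second 1/2, third 1/3, and so on.
    Returns each brand's share of the total weight.
    """
    scores: dict[str, float] = defaultdict(float)
    for brands in answers:
        for rank, brand in enumerate(brands, start=1):
            scores[brand] += 1.0 / rank
    total = sum(scores.values())
    if total == 0:
        return {}
    return {brand: score / total for brand, score in scores.items()}


# Three hypothetical answers mentioning two placeholder brands.
shares = position_weighted_share([
    ["BrandA", "BrandB"],
    ["BrandB", "BrandA"],
    ["BrandA"],
])
```

With these inputs BrandA scores 1 + 0.5 + 1 = 2.5 and BrandB scores 0.5 + 1 = 1.5, giving shares of 0.625 and 0.375. The source-reference figure in the list is a plain unweighted share by comparison: 8 / 940 is roughly 0.85%.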

Full report at the link, no signup. GeoVector runs this analysis for any brand or vertical.

Comments URL: https://news.ycombinator.com/item?id=47146588

Points: 1

# Comments: 0

Categories: Hacker News

Show HN: Courtyard – Open-source macOS app for local MLX fine-tuning

Hacker News - Tue, 02/24/2026 - 9:34pm

I've been building Courtyard, a macOS desktop app designed to make local LLM workflows on Apple Silicon less tedious.

The motivation: I was tired of juggling multiple Python CLI scripts, JSONL formatting, and environment issues just to run a simple LoRA fine-tune on my Mac.

Courtyard is essentially a UI wrapper around mlx-lm combined with data preparation tools. It handles:

Dataset formatting and cleaning (privacy filtering, deduplication).
Local LoRA fine-tuning via MLX on Apple Silicon.
An integrated chat UI for A/B testing the base model vs. the fine-tuned adapter.
Exporting to GGUF or directly to an Ollama runtime.

The stack is Tauri 2.x + React + Rust + Python (mlx-lm). It's fully open-source (AGPL).
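For readers unfamiliar with the dataset deduplication step mentioned above, here is a minimal sketch of exact-duplicate removal over JSONL records. It is not taken from the Courtyard codebase; the function name and the `prompt`/`completion` field names are assumptions for illustration.

```python
import hashlib
import json
from typing import Iterable


def dedupe_jsonl_lines(lines: Iterable[str]) -> list[str]:
    """Drop blank lines and exact duplicate records from JSONL input.

    Records are compared after parsing and re-serializing with sorted keys,
    so key order and incidental whitespace do not defeat the comparison.
    """
    seen: set[str] = set()
    kept: list[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(canonical)
    return kept


raw_lines = [
    '{"prompt": "hi", "completion": "hello"}',
    '{"completion": "hello", "prompt": "hi"}',   # same record, different key order
    "",
    '{"prompt": "bye", "completion": "ciao"}',
]
cleaned = dedupe_jsonl_lines(raw_lines)
```

Only two records survive here, because the first two lines describe the same example. Near-duplicate examples (paraphrases, trailing punctuation) would need a fuzzier comparison than an exact hash.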

Repo: https://github.com/Mcourtyard/m-courtyard

I'd love to hear your thoughts on the architecture, MLX implementation, or any edge cases you run into. Happy to answer technical questions.

Comments URL: https://news.ycombinator.com/item?id=47146570

Points: 1

# Comments: 0

Categories: Hacker News

RFC Explorer – Explore over 9000 RFCs

Hacker News - Tue, 02/24/2026 - 9:11pm

Article URL: https://rfcexplorer.net/

Comments URL: https://news.ycombinator.com/item?id=47146426

Points: 1

# Comments: 0

Categories: Hacker News
