Feed aggregator

Amazon's Echo Pop Just Returned to Its All-Time Lowest Price Before Black Friday

CNET Feed - Mon, 11/04/2024 - 2:43pm
This petite smart speaker is currently 55% off the regular price at both Best Buy and Amazon.
Categories: CNET

For Windows 10 Holdouts, One More Year of Support Will Cost $30

CNET Feed - Mon, 11/04/2024 - 2:30pm
Tech support won't be offered for Windows 10, but security updates will roll out past 2025 with the Extended Security Updates option.
Categories: CNET

Down in the Mantle

Hacker News - Mon, 11/04/2024 - 2:16pm
Categories: Hacker News

Black Friday Gaming Deal: Backbone's Nifty Cloud Gaming Controllers Are 40% Off

CNET Feed - Mon, 11/04/2024 - 2:14pm
You can snag the PlayStation Edition of this collapsible mobile controller for just $60 right now and take your games on the go.
Categories: CNET

Show HN: Fuzzy deduplicate any CSV using vector embeddings

Hacker News - Mon, 11/04/2024 - 2:07pm

I made an app to fuzzy-deduplicate my Google Sheets and CRM records

- No manual configuration required

- Works out of the box on most data types (e.g. people, companies, product catalogs)

Implementation details:

- Embeds records using an E5-family model

- Performs similarity search using DuckDB with a vector similarity extension

- Does last-mile comparison and merges duplicates using Claude

Demo video: https://youtu.be/7mZ0kdwXBwM

GitHub repo (Apache 2.0 licensed): https://github.com/SnowPilotOrg/dedupe_it

Background story: My company has a table for tracking leads, which includes website visitors, demo form submissions, app signups, and manual entries. It’s full of duplicates. And writing formulas to merge those dupes has been a massive PITA.

I figured that an LLM could handle any data shape and give me a way to deal with tricky custom rules like “treat international subsidiaries as distinct from their parent company”.

The challenging thing was avoiding an NxN comparison matrix. The solution I came up with was first narrowing down our search space using vector embeddings + semantic similarity search, and then using a generative LLM only to compare a few nearest neighbors and merge.
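The narrowing step described above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: the function names (`cosine`, `candidate_pairs`), the toy 3-dimensional vectors, and the `k` and `min_sim` values are all assumptions. The brute-force neighbor scan stands in for the DuckDB similarity search, so it is still quadratic in cheap vector operations; the point is that only the few returned pairs would go on to the expensive LLM comparison.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_pairs(embeddings, k=2, min_sim=0.9):
    """Return sorted index pairs (i, j), i < j, worth a closer comparison.

    A pair qualifies when one record is among the other's k nearest
    neighbors and their similarity is at least min_sim (hypothetical cutoff).
    """
    pairs = set()
    for i, vec in enumerate(embeddings):
        # Brute-force scan; a vector index would replace this in practice.
        scored = sorted(
            ((cosine(vec, other), j)
             for j, other in enumerate(embeddings) if j != i),
            reverse=True,
        )
        for sim, j in scored[:k]:
            if sim >= min_sim:
                pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


# Toy vectors standing in for real model embeddings (made up for this sketch).
embeddings = [
    [1.0, 0.0, 0.0],    # "Acme Inc"
    [0.98, 0.05, 0.0],  # "ACME, Inc."
    [0.0, 1.0, 0.0],    # "Globex"
    [0.0, 0.97, 0.1],   # "Globex Corp"
    [0.0, 0.0, 1.0],    # "Initech"
]

pairs = candidate_pairs(embeddings)
print(pairs)  # [(0, 1), (2, 3)]
```

With five records there are ten possible pairs, but only two survive to the comparison step.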

Some cool attributes of this approach:

- Can work incrementally (no reprocessing the entire dataset)

- Allows processing all records in parallel

- Composes with deterministic dedupe rules
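As a rough illustration of the first and last attributes, the sketch below adds records one at a time and checks a deterministic rule before falling back to embedding similarity. Everything here is hypothetical: the `IncrementalDeduper` class, the normalized-email rule, the 2-dimensional toy vectors, and the 0.9 threshold are assumptions made for the example and are not taken from the repo.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IncrementalDeduper:
    """Hypothetical helper: each new record is compared only against
    records already stored, so earlier records are never reprocessed."""

    def __init__(self, min_sim=0.9):
        self.min_sim = min_sim
        self.records = []  # list of (normalized_email, embedding)

    def add(self, email, embedding):
        """Store one record; return indices of existing likely duplicates."""
        key = email.strip().lower()
        # Deterministic rule first: identical non-empty normalized email.
        exact = [i for i, (k, _) in enumerate(self.records) if k and k == key]
        if exact:
            candidates = exact
        else:
            # Fallback: embedding similarity against stored records.
            candidates = [
                i for i, (_, emb) in enumerate(self.records)
                if cosine(embedding, emb) >= self.min_sim
            ]
        self.records.append((key, embedding))
        return candidates


deduper = IncrementalDeduper()
first = deduper.add("ann@example.com", [1.0, 0.0])     # nothing stored yet
second = deduper.add(" Ann@Example.com ", [0.0, 1.0])  # caught by the email rule
third = deduper.add("", [0.99, 0.1])                   # caught by similarity
fourth = deduper.add("bob@example.com", [-1.0, 0.0])   # no match
print(first, second, third, fourth)  # [] [0] [0] []
```

The returned indices are only candidates; in the approach described in the post, a final comparison and merge would still be done on them.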

Lmk any feedback on how to make this better!

Comments URL: https://news.ycombinator.com/item?id=42044962

Points: 2

# Comments: 0

Categories: Hacker News

Walmart’s $15 Roku Smart Bulb Deal Will Light Up the Room, Not Your Wallet

CNET Feed - Mon, 11/04/2024 - 2:07pm
Add more ambience to your home without spending big bucks with these discounted Roku smart bulbs.
Categories: CNET

My Favorite Bluetooth Speaker Is $30 Off and Makes for a Great Gift

CNET Feed - Mon, 11/04/2024 - 2:05pm
I awarded the Anker Soundcore Boom 2 a CNET Editors' Choice in 2024. You can get it for $100, or $30 off its list price of $130.
Categories: CNET
