rss-bridge 2024-11-04T16:00:00+00:00

177: Vector Databases



---

Programming Throwdown

Patrick Wheeler and Jason Gauci


Intro topic: Buying a Car

News/Links:

- Cognitive Load is what Matters

- https://github.com/zakirullin/cognitive-load

- Diffusion models are Real-Time Game Engines

- https://gamengen.github.io/

- Your Company Needs Junior Devs

- https://softwaredoug.com/blog/2024/09/07/your-team-needs-juniors

- Seamless Streaming / Fish Speech / LLaMA Omni

- Seamless: https://huggingface.co/facebook/seamless-streaming
- Fish: https://github.com/fishaudio/fish-speech
- LLaMA Omni: https://github.com/ictnlp/LLaMA-Omni

Book of the Show

- Patrick:

- Thought Emporium YouTube

- https://youtu.be/8X1_HEJk2Hw?si=T8EaHul-QMahyUvQ

- Jason:

- Novel Minds

- https://www.novelminds.ai/

Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h

Tool of the Show

- Patrick:

- Escape Simulator

- https://pinestudio.com/games/escape-simulator/

- Jason:

- Cursor IDE

- https://www.cursor.com/

Topic: Vector Databases (~54 min)

- How computers represent data traditionally

- ASCII values
- RGB values
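
As a rough illustration of the two bullets above, the sketch below shows text stored as ASCII codes and one pixel stored as three 8-bit channels. The helper names `pack_rgb` and `unpack_rgb` are made up for this example. Note that these codes are just identifiers: numerically close values do not imply similar meaning, which is the gap embeddings try to fill.

```python
# Text: each character maps to a fixed integer code (ASCII here).
text = "Hi!"
ascii_codes = [ord(ch) for ch in text]   # one small integer per character
raw_bytes = text.encode("ascii")         # the same codes as a byte string

# Color: one pixel as three 8-bit channels packed into a 24-bit integer.
# pack_rgb / unpack_rgb are hypothetical helpers for this sketch.
def pack_rgb(r, g, b):
    return (r << 16) | (g << 8) | b

def unpack_rgb(value):
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF

orange = pack_rgb(255, 165, 0)
print(ascii_codes, list(raw_bytes), hex(orange), unpack_rgb(orange))
```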

- How traditional compression works

- Huffman encoding (tree structure)
- Lossy example: Fourier Transform & store coefficients
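
The Huffman bullet can be sketched in a few lines. This is a minimal version, assuming the textbook greedy construction (repeatedly merge the two lightest subtrees); the function names and the bit-string output are choices made for this example, not something taken from the episode.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a prefix-free code: frequent symbols get shorter bit strings."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:                      # edge case: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, tree).
    # A tree is either a symbol (str) or a (left, right) tuple.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (left, right)))
        next_id += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):         # internal node: recurse both ways
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                               # leaf: record the symbol's code
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

def encode(text, codes):
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:              # safe because the code is prefix-free
            out.append(inverse[current])
            current = ""
    return "".join(out)

message = "abracadabra"
codes = huffman_codes(message)
bits = encode(message, codes)
print(codes, len(bits), "bits vs", 8 * len(message), "bits as 8-bit ASCII")
```

This scheme is lossless: decoding returns the exact input. The Fourier-style approach in the second bullet differs in that it discards information on purpose.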

- How embeddings are computed

- Pairwise (contrastive) methods
- Forward models (self-supervised)

- Similarity metrics
- Approximate Nearest Neighbors (ANN)
- Sub-Linear ANN

- Clustering
- Space Partitioning (e.g. K-D Trees)
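
To make the similarity-metric and space-partitioning bullets concrete, here is a small sketch: two common metrics, plus a k-d tree nearest-neighbor search over Euclidean distance. The names `build_kdtree` and `nearest` are invented for this example. Note that this version returns the exact nearest neighbor; approximate variants typically trade accuracy for speed by limiting how many branches they visit, and k-d trees tend to lose their advantage as the number of dimensions grows.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # Assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def build_kdtree(points, depth=0):
    """Recursively split the points at the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    """Return (distance, point) for the closest stored point to the query."""
    if node is None:
        return best
    d = euclidean(node["point"], query)
    if best is None or d < best[0]:
        best = (d, node["point"])
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Only cross the splitting plane if a closer point could be on the other side.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
print(nearest(tree, (9, 2)))
print(cosine_similarity((1, 0), (1, 1)))
```

The pruning step is what makes the search sub-linear on average for low-dimensional data: whole subtrees are skipped when they cannot contain a closer point.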

- What a vector database does

- Perform nearest-neighbors with many different similarity metrics
- Store the vectors and the data structures to support sub-linear ANN
- Handle updates, deletes, rebalancing/reclustering, backups/restores
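
The responsibilities listed above can be sketched as a toy in-memory store. Everything here is hypothetical: `TinyVectorStore`, its method names, and the sample vectors are invented for illustration and do not reflect any real product's API. It scans every row on each query, so it shows the interface only; a real vector database adds the sub-linear index, persistence, and rebalancing.

```python
import math

def cosine_distance(a, b):
    # 0.0 means same direction; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))

def negative_dot(a, b):
    # Negated so that "smaller is closer" holds for every metric.
    return -sum(x * y for x, y in zip(a, b))

METRICS = {
    "euclidean": math.dist,
    "cosine": cosine_distance,
    "dot": negative_dot,
}

class TinyVectorStore:
    """Hypothetical store: brute-force search, no index, no persistence."""

    def __init__(self, metric="cosine"):
        self.distance = METRICS[metric]
        self.rows = {}                      # item_id -> (vector, payload)

    def upsert(self, item_id, vector, payload=None):
        self.rows[item_id] = (tuple(vector), payload)

    def delete(self, item_id):
        self.rows.pop(item_id, None)

    def query(self, vector, k=3):
        ranked = sorted(
            self.rows.items(),
            key=lambda row: self.distance(row[1][0], vector),
        )
        return [(item_id, payload) for item_id, (_, payload) in ranked[:k]]

# Made-up 2-D "embeddings" purely for demonstration.
store = TinyVectorStore("cosine")
store.upsert("cat", [0.9, 0.1], "a small feline")
store.upsert("dog", [0.8, 0.3], "a loyal canine")
store.upsert("car", [0.1, 0.9], "a road vehicle")
print(store.query([1.0, 0.0], k=2))
store.delete("cat")
print(store.query([1.0, 0.0], k=2))
```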

- Examples

- pgvector: a vector-search extension for PostgreSQL
- Weaviate, Pinecone
- Milvus

---

[Original source](https://www.programmingthrowdown.com/episodes/177-vector-databases/)
