
The Huberman Lab Podcast AI: Custom Knowledge Chatbots

The Huberman Lab podcast is easily my favorite. If you haven’t heard of it yet, it’s a podcast where Dr. Andrew Huberman, a Stanford professor, discusses science-based tools for everyday life. It’s a goldmine of knowledge for improving your health and quality of life.

But with over 150 episodes, it's a challenge to digest and recall all the insights. I found myself wishing for a quick-reference database of the podcast's wisdom, accessible anytime.

With the current state of AI, it's entirely possible. I’ve built a custom chatbot that has all the knowledge from the podcast.

How does it work?

Transcripts from all episodes of Dr. Huberman’s podcast have been split into chunks and saved into a JSON array:

 [
   {
      "chunk_id":2391,
      "video_id":"6ZrlsVx85ek",
      "title":"The Science of Healthy Hair, Hair Loss and How to Regrow Hair | Huberman Lab Podcast",
      "content":"of you, and that's caffeine. We all think of caffeine as a stimulant that we drink. [...] Keep in mind that topical caffeine ointments shouldn't necessarily be applied every single day, so this is the sort of",
      "embedding":[
         -0.016596613451838493,
         -0.012290825136005878,
         0.004662381950765848,
         -0.02624797634780407,
         -0.039831873029470444,
         -0.026354622095823288,
         0.011730939149856567
      ]
   },
  ...
]
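
As a rough sketch, records like the ones above could be produced with the OpenAI Embedding API roughly as follows. This assumes the pre-1.0 openai Python SDK, and the chunk list shown here is just toy example data:

import json
import openai

openai.api_key = "YOUR_OPENAI_API_KEY"

# Toy example data: chunks as (video_id, title, content) tuples
transcript_chunks = [
    ("6ZrlsVx85ek",
     "The Science of Healthy Hair, Hair Loss and How to Regrow Hair | Huberman Lab Podcast",
     "of you, and that's caffeine. We all think of caffeine as a stimulant that we drink. ..."),
]

def embed(text):
    # text-embedding-ada-002 returns a 1536-dimensional vector
    response = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return response["data"][0]["embedding"]

records = []
for chunk_id, (video_id, title, content) in enumerate(transcript_chunks):
    records.append({
        "chunk_id": chunk_id,
        "video_id": video_id,
        "title": title,
        "content": content,
        "embedding": embed(content),
    })

with open("chunks.json", "w") as f:
    json.dump(records, f)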

Embeddings

Embeddings are numerical representations of words, phrases, or sentences that capture their meaning and context. They are used to represent text in a way that can be mathematically processed.

We can upload these embeddings to a vector database, where they can be queried on demand.

For example, suppose we want to query for chunks related to the question:

“What impact does caffeine have on hair loss?”


Here's what we do:

  1. First, we embed the question into a vector.

  2. We query the vector database for the vectors with the highest cosine similarity to the embedded question.

  3. We receive a list of n chunks that share semantic context with the question.
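
Here is a minimal sketch of those three steps, assuming the pre-1.0 openai SDK and the classic pinecone-client interface (the index name and top_k value are illustrative, and the exact query signature depends on the client version):

import openai
import pinecone

openai.api_key = "YOUR_OPENAI_API_KEY"
pinecone.init(api_key="YOUR_PINECONE_API_KEY", environment="us-east1-gcp")
index = pinecone.Index("huberman-chunks")  # hypothetical index name

question = "What impact does caffeine have on hair loss?"

# 1. Embed the question into a vector
question_embedding = openai.Embedding.create(
    model="text-embedding-ada-002", input=question
)["data"][0]["embedding"]

# 2. Query for the vectors with the highest cosine similarity
results = index.query(vector=question_embedding, top_k=5, include_metadata=True)

# 3. Collect the n (here 5) chunks that share semantic context with the question
chunks = [match["metadata"]["content"] for match in results["matches"]]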

Cosine similarity

Cosine similarity measures the cosine of the angle between two vectors to determine how similar their directions are.

In the case of LLM embeddings, “orange” and “apple” will sit close to each other in the embedding space because they are both fruits and share many related contexts in language, resulting in a high cosine similarity value. On the other hand, "orange" and "bicycle" would likely have a lower cosine similarity, since they come from different contexts and are not closely related in meaning.
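
For intuition, here is cosine similarity computed directly with NumPy. The vectors below are made-up toy examples, not real embeddings:

import numpy as np

def cosine_similarity(a, b):
    # cos(angle) = (a · b) / (|a| * |b|)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([0.2, 0.9, 0.1])     # e.g. "orange"
b = np.array([0.25, 0.85, 0.05])  # e.g. "apple" – points in a similar direction
c = np.array([-0.7, 0.1, 0.6])    # e.g. "bicycle" – points elsewhere

print(cosine_similarity(a, b))  # ≈ 0.997 – high similarity
print(cosine_similarity(a, c))  # ≈ 0.01 – much lower similarity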

GPT Prompt


Now that we have a list of chunks that should be related to the question, we generate a prompt and send it to the OpenAI Chat Completion API.

You will receive fragments from a scientific podcast. Your task is to summarize them in no more than 6 sentences without twisting any facts. If the question can't
be answered based on the fragments, acknowledge it and refuse to answer:

Question: "What impact does caffeine have on hair loss?"
Fragments:
```
One of the things that caffeine does is it is a fairly potent PDE inhibitor.
By being a potent PDE inhibitor, it indirectly stimulates IGF-1. Why?
...
head to head, topical caffeine application can be as effective as
Minoxidil application without actually lowering things like blood pressure
```
Your answer:
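
Putting it together, the call to the Chat Completion API could look roughly like this (pre-1.0 openai SDK assumed; question and chunks come from the retrieval step above):

import openai

openai.api_key = "YOUR_OPENAI_API_KEY"

fragments = "\n".join(chunks)  # chunks retrieved from the vector database
prompt = f"""You will receive fragments from a scientific podcast. Your task is to summarize them in no more than 6 sentences without twisting any facts. If the question can't be answered based on the fragments, acknowledge it and refuse to answer:

Question: "{question}"
Fragments:
```
{fragments}
```
Your answer:"""

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(response["choices"][0]["message"]["content"])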

Useful tools

LangChain

LangChain is a framework designed to simplify the creation of applications using large language models. It provides a set of tools that make it possible to:

  • Load YouTube transcripts directly into a Python array

  • Split transcripts into chunks

  • Embed and store chunks in a vector database

LangChain makes it super easy to load data from different sources.
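
For example, loading and chunking one episode could look roughly like this (module paths and argument names may differ between LangChain versions; the chunk sizes are just illustrative):

from langchain.document_loaders import YoutubeLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load the transcript of a single episode straight from YouTube
loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=6ZrlsVx85ek", add_video_info=True
)
documents = loader.load()

# Split it into overlapping chunks that fit comfortably into a prompt
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(documents)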

Pinecone

Pinecone's vector database is fully managed, developer-friendly, and easily scalable.
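
A sketch of creating an index and upserting the embedded chunks, again using the classic pinecone-client interface (the index name and metadata fields are illustrative; "records" is the list built when embedding the chunks):

import pinecone

pinecone.init(api_key="YOUR_PINECONE_API_KEY", environment="us-east1-gcp")

# text-embedding-ada-002 vectors have 1536 dimensions; "cosine" matches the similarity metric used above
pinecone.create_index("huberman-chunks", dimension=1536, metric="cosine")
index = pinecone.Index("huberman-chunks")

# Upsert (id, vector, metadata) tuples built from the JSON records shown earlier
index.upsert(
    vectors=[
        (str(record["chunk_id"]), record["embedding"], {"content": record["content"]})
        for record in records
    ]
)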

OpenAI

OpenAI exposes many APIs that are crucial in custom chatbot development:

  • Embedding API - lets us embed sentences into vectors that can later be stored in a vector database

  • Chat Completion API - currently based on GPT-3.5, it allows us to turn the retrieved chunks into a user-friendly answer

Business Application

The applications of custom knowledge chatbots are nearly endless. From customer support to corporate onboarding and cold outreach, AI is taking over mundane tasks, allowing us to focus on more creative endeavors.

How will you leverage AI in your business?