Tate Pods/Interviews ~ AI

by @vdutts7

About

Semantic search over any of Tate's interviews (since release). Transcripts may not be perfect (blame the YouTube API's stringent ban on non-OAuth caption access lol). This project uses basic Python scripting, a vector database, and semantic kNN search.
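To make "semantic kNN search" concrete, here is a minimal sketch in plain Python. It is not this project's code: the real search runs inside the vector database, and the names `cosine_similarity`, `knn_search`, and `toy_index`, the choice of cosine similarity as the metric, and the 3-dimensional toy vectors are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_search(query_vec, index, k=3):
    """Brute-force kNN: score every stored vector, return the k most similar (id, score) pairs."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(chunk_id, score) for score, chunk_id in scored[:k]]


# Hypothetical toy index: chunk id -> embedding (real embeddings have far more dimensions).
toy_index = {
    "clip_a": [1.0, 0.0, 0.0],
    "clip_b": [0.9, 0.1, 0.0],
    "clip_c": [0.0, 1.0, 0.0],
}

top_hits = knn_search([1.0, 0.0, 0.0], toy_index, k=2)
for chunk_id, score in top_hits:
    print(chunk_id, round(score, 3))
```

A vector database does the same ranking, but with an index structure that avoids scoring every stored vector on each query.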

Videos are transcribed, combined with their associated metadata, and pre-processed. The transcripts are chunked by tokens, converted to text embeddings with ~16k dimensions, and stored in a vector database. There are limitations; for those who care more about this topic, read the Milvus documentation.
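The chunking step could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's actual pre-processing: `chunk_by_tokens` is a hypothetical helper, whitespace splitting stands in for a real tokenizer, and the overlap between neighbouring chunks is an added assumption that the description above does not mention.

```python
def chunk_by_tokens(text, max_tokens=200, overlap=20):
    """Split text into chunks of at most max_tokens tokens, sharing `overlap` tokens between neighbours."""
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("need max_tokens > 0 and 0 <= overlap < max_tokens")

    tokens = text.split()  # assumption: whitespace tokens instead of a model tokenizer
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # the last window already reached the end of the transcript
    return chunks


# Tiny made-up transcript of ten "tokens" to show the windowing.
sample = " ".join(f"w{i}" for i in range(10))
sample_chunks = chunk_by_tokens(sample, max_tokens=4, overlap=1)
for chunk in sample_chunks:
    print(chunk)
```

Each chunk would then be sent to an embedding model and stored alongside its video metadata, so a search hit can point back to the right video.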

Next Steps & Feedback

Some of my plans to improve this project:

  • Moving away from the YouTube Data API v3 towards a faster transcribing solution. Whisper is good but expensive, and pytube and other Python packages are probably going to be used once the amount of video content exceeds a certain storage capacity.
  • Adding visual elements to the search experience (e.g. thumbnail generation specific to the exact timestamp) using Puppeteer or some other solution.

Feel free to send me feedback on Twitter.

Notice & License

  • Follow me on Twitter @vdutts7 for more content like this.

  • Support my open source work by sponsoring me before my API costs explode.

  • Independently created. Not affiliated with Andrew Tate, YouTube, or any of the companies mentioned above.