I’ve been playing with Gemini Pro 1.5 for a few days, and I think the most exciting feature isn’t so much the token count... it’s the ability to use video as an input.

The ability to extract structured content from text is already one of the most exciting use cases for LLMs. GPT-4 Vision and LLaVA expanded that to images. And now Gemini Pro 1.5 expands that to video.

The ability to analyze video like this feels SO powerful. Being able to take a 20-second video of a bookshelf and get back a JSON array of those books is just the first thing I thought to try.
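
To make the "JSON array of books" idea concrete, here is a minimal sketch of the step that comes after the model replies: pulling the array out of the reply text. It makes no call to Gemini at all. The function name `parse_book_list`, the `title`/`author` keys, and the sample reply are all assumptions made up for illustration, not the model's actual output format.

```python
import json


def parse_book_list(reply: str) -> list[dict]:
    """Extract a JSON array of books from a model's text reply.

    Assumes (hypothetically) that the reply contains exactly one JSON
    array, possibly surrounded by extra prose, and that each book is an
    object with at least a "title" key.
    """
    start = reply.find("[")
    end = reply.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in reply")

    # Parse only the bracketed span, ignoring any prose around it.
    books = json.loads(reply[start:end + 1])

    # Keep only entries that look like books; drop anything malformed.
    return [b for b in books if isinstance(b, dict) and "title" in b]


# Made-up reply text, standing in for whatever the model returns.
sample_reply = (
    "Here are the books I could see:\n"
    '[{"title": "Dune", "author": "Frank Herbert"},'
    ' {"title": "Emma", "author": "Jane Austen"}]'
)

for book in parse_book_list(sample_reply):
    print(book["title"], "by", book.get("author", "unknown"))
```

Slicing from the first `[` to the last `]` is a deliberately crude way to cope with a reply that wraps the array in extra text. It would break on a reply containing more than one array, so treat it as a starting point only.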

...as always with modern AI, there are still plenty of challenges to overcome... But this really does feel like another one of those glimpses of a future that’s suddenly far closer than I expected it to be.

