GPT-4 Vision API | Web Scraping with GPT-4 Vision API and Puppeteer

GPT-4 Vision API is a multimodal AI model that combines natural language processing (NLP) with image understanding capabilities. It's designed to analyze images and provide textual responses to questions about them, bridging the gap between visual and linguistic information.

In today's video I do some experimentation with the new GPT-4 Vision API and try to scrape information from web pages using it.

00:00 Intro
01:04 Basic usage of GPT-4 Vision API
05:50 Test GPT-4 Vision with image from Unsplash
07:23 Taking a screenshot with Puppeteer
12:35 Test GPT-4 Vision with Wikipedia screenshot
18:14 Test GPT-4 Vision with Google weather info
19:29 Automating URL generation + screenshot taking
33:24 Handling timeouts and retries and making it conversational
44:30 Summarizing BBC news
45:33 Fixing slow loading pages
49:18 Asking for weather information
50:24 Tweaking system message
54:03 Asking for Tesla stock price
56:00 Outro

GitHub: https://github.com/unconv/gpt4v-browsing

Subscribe: https://www.youtube.com/@unconv/featured

#gpt #python #puppeteer