About This Project
Hi, I am Andrea. I am a software engineer from Italy with a background in Web Information and Data Engineering. I have always been passionate about data, the web, and building small projects that turn curiosity into something tangible.
Every Monday, I enjoy watching HumanSafari and his adventures around the world. If you have seen his videos, you know he has a habit of exploring local supermarkets and, at some point, he started casually mentioning Nutella prices in different countries.
That got me thinking: why not build a simple data pipeline to analyze all his videos and extract every Nutella price he has ever mentioned?
For the curious and the geeks
Here is a more technical breakdown of how this dataset came to life:
- It all started with scraping transcripts from Nicolò’s (HumanSafari’s) videos.
- I used a simple regex search for "Nutella" to identify relevant mentions.
- For each match, I extracted a context window and checked for nearby references to prices and weights.
- I normalized the data using regex patterns to extract weight, local price, and currency.
- When possible, I inferred the country from the video title.
- Then I used AI to label each entry as ready, missing data, ambiguous, or false positive.
- After that, I manually reviewed everything, often going back to the exact moment in the video referenced by the transcript.
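For the geeks, here is a minimal Python sketch of what the search, context window, and normalization steps above could look like. It is not the actual pipeline code: the regex patterns, the 80-character window, the short currency list, and names such as `extract` and `Observation` are all illustrative assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns: the real pipeline's regexes are not published here.
MENTION = re.compile(r"nutella", re.IGNORECASE)
WEIGHT = re.compile(r"(\d+(?:[.,]\d+)?)\s*(kg|grammi|grams?|g)\b", re.IGNORECASE)
PRICE = re.compile(r"(\d+(?:[.,]\d+)?)\s*(euro|dollars?|pesos?|yen)\b", re.IGNORECASE)


@dataclass
class Observation:
    context: str                # transcript text around the mention
    weight_g: Optional[float]   # jar weight in grams, if found
    price: Optional[float]      # price in local currency, if found
    currency: Optional[str]     # currency word as spoken, lowercased


def _to_float(text: str) -> float:
    # Accept both "4,50" (Italian style) and "4.50".
    return float(text.replace(",", "."))


def extract(transcript: str, window: int = 80) -> list[Observation]:
    """Return one Observation per 'Nutella' mention in the transcript."""
    results = []
    for mention in MENTION.finditer(transcript):
        # Context window: `window` characters on each side of the match.
        start = max(0, mention.start() - window)
        end = min(len(transcript), mention.end() + window)
        context = transcript[start:end]

        weight_g = None
        weight = WEIGHT.search(context)
        if weight:
            weight_g = _to_float(weight.group(1))
            if weight.group(2).lower() == "kg":
                weight_g *= 1000  # normalize kilograms to grams

        price = currency = None
        found = PRICE.search(context)
        if found:
            price = _to_float(found.group(1))
            currency = found.group(2).lower()

        results.append(Observation(context, weight_g, price, currency))
    return results
```

Calling `extract("Here a 350 g jar of Nutella costs 4,50 euro, not bad.")` returns a single observation with a weight of 350 grams, a price of 4.5, and the currency `"euro"`. A sketch like this only takes the first weight and the first price it finds in the window, which is one reason a labeling pass and a manual review are still needed afterwards.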
The result
I may have missed a few entries here and there, but I managed to build a clean and usable dataset without watching thousands of hours of footage.
More importantly, I turned a fun idea into something you can explore, compare, and maybe even contribute to.
If you enjoy this project even half as much as I enjoyed building it, that is already a win 🙂
If you want to explore the project from another angle, open the guide, read the data information, or submit a new observation.