Long in the making: I just put out a blog post on downloading social media data at scale from platforms like X, Facebook, YouTube, and TikTok!
Check it out here: https://lnkd.in/gnhsKDAy
The context: social media companies don't make it easy to study them. Even the research APIs they flaunt often have major limitations, obstructing desperately needed research & journalism.
So, I wrote up my notes on how I personally collected social media data over the last 2 years, gathering hundreds of millions of tweets, posts, and videos. The approach I describe is super generalizable.
I hope it's useful to students and other independent researchers embarking on this work themselves.
A few points I cover:
➡ Breaking down these big data projects into small pieces can be trickier than you think! When you're dealing with finicky APIs and weird scraping tools, this can take a *lot* of trial-and-error.
➡ Anticipating, identifying, and responding appropriately to errors is the crux of how you design everything else. The name of the game is designing subtasks that can succeed or fail quickly and clearly! (There's a minimal sketch of this idea after the list.)
➡ Some cloud compute tools can make this work easier, but they aren't necessary. I talk through situations where I've used tools with all the bells and whistles, and situations where I've done stuff pretty much by hand.
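To make the first two points concrete, here's a minimal Python sketch (not the code from the blog post; `fetch_batch` is a hypothetical stand-in for whatever platform-specific API call or scraper you use): each page of results is its own small subtask with retries, backoff, and a checkpoint file, so failures are cheap and clear, and reruns skip work that's already done.

```python
import json
import random
import time
from pathlib import Path

CHECKPOINT_DIR = Path("checkpoints")   # one marker file per finished piece
MAX_RETRIES = 3

def fetch_batch(query: str, page: int) -> list[dict]:
    """Hypothetical stand-in for one small API call or scrape.
    A real collector would call the platform's API here and return one page."""
    if random.random() < 0.3:          # simulate a flaky, rate-limited endpoint
        raise ConnectionError("rate limited")
    return [{"query": query, "page": page, "id": i} for i in range(10)]

def run_subtask(query: str, page: int) -> bool:
    """One subtask = one page. Succeeds or fails quickly and clearly,
    and never repeats work that already has a checkpoint marker."""
    marker = CHECKPOINT_DIR / f"{query}-{page}.done"
    if marker.exists():
        return True                    # already collected; skip
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            rows = fetch_batch(query, page)
            out = CHECKPOINT_DIR / f"{query}-{page}.jsonl"
            out.write_text("\n".join(json.dumps(r) for r in rows))
            marker.touch()             # record success durably
            return True
        except Exception as exc:
            print(f"[{query} p{page}] attempt {attempt} failed: {exc}")
            time.sleep(2 ** attempt)   # back off before retrying
    return False                       # clear failure; caller logs and moves on

if __name__ == "__main__":
    CHECKPOINT_DIR.mkdir(exist_ok=True)
    failed = [p for p in range(20) if not run_subtask("demo", p)]
    print(f"finished; pages to retry later: {failed}")
```

The design choice worth copying isn't the specifics, it's the shape: tiny units of work, a durable record of what succeeded, and failures that surface immediately instead of silently corrupting a week-long job.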
There's tons more, with code snippets and fun little doodles 😁
Shoutout to the Coalition for Independent Tech Research for sustaining incredible community and support for people doing this kind of work!