Dr. Milagros Miceli’s Post


Research Lead at Weizenbaum-Institut. Researcher at DAIR.

Dylan Baker

Lead Research Engineer, DAIR Institute

Long in the making, I just put out a blog post on downloading social media data at scale from platforms like X, Facebook, YouTube, and TikTok! Check it out here: https://lnkd.in/gnhsKDAy

The context: social media companies don't make it easy to study them. Even the research APIs they flaunt often have major limitations that obstruct desperately needed research & journalism.

So, I wrote up my notes on how I personally did social media data collection over the last 2 years, gathering hundreds of millions of tweets, posts, and videos. The approach I describe is super generalizable. I hope it's useful to students and other independent researchers embarking on this work themselves.

A few points I cover:
➡ Breaking these big data projects down into small pieces can be trickier than you think! When you're dealing with finicky APIs and weird scraping tools, this can take a *lot* of trial and error.
➡ Anticipating, identifying, and responding appropriately to errors is the crux of how you design everything else. The name of the game is designing subtasks that can succeed or fail quickly and clearly!
➡ Some cloud compute tools can make this work easier, but they aren't necessary. I talk through situations where I've used tools with all the bells and whistles, and situations where I've done stuff pretty much by hand.

There's tons more, with code snippets and fun little doodles 😁

Shoutout to the Coalition for Independent Tech Research for sustaining an incredible community and support for people doing this kind of work!
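The idea of splitting a big collection job into small subtasks that succeed or fail quickly and clearly can be illustrated with a short Python sketch. This is not code from the blog post, just one possible shape under stated assumptions: `fetch` is a hypothetical stand-in for an API call or scraper run, and it is assumed to raise a hypothetical `RetryableError` for transient problems (rate limits, timeouts) and a hypothetical `FatalError` for permanent ones (e.g. a suspended account). Every chunk ends with an explicit recorded outcome instead of failing silently.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class RetryableError(Exception):
    """Hypothetical: a transient failure (rate limit, timeout) worth retrying."""


class FatalError(Exception):
    """Hypothetical: a permanent failure (e.g. deleted content) not worth retrying."""


class Status(Enum):
    OK = "ok"
    GAVE_UP = "gave_up"  # kept hitting retryable errors until attempts ran out
    FATAL = "fatal"      # failed permanently, so it was never retried


@dataclass
class Outcome:
    task_id: int
    status: Status
    attempts: int
    detail: str = ""


def chunk(items: List[str], size: int) -> Dict[int, List[str]]:
    """Split one big job (e.g. a list of account IDs) into small numbered subtasks."""
    return {i // size: items[i:i + size] for i in range(0, len(items), size)}


def run_subtasks(
    tasks: Dict[int, List[str]],
    fetch: Callable[[List[str]], None],
    max_attempts: int = 3,
) -> Dict[int, Outcome]:
    """Run every subtask and record a clear outcome for each one."""
    outcomes: Dict[int, Outcome] = {}
    attempts = {tid: 0 for tid in tasks}
    pending = list(tasks)
    while pending:
        retry_later = []
        for tid in pending:
            attempts[tid] += 1
            try:
                fetch(tasks[tid])
                outcomes[tid] = Outcome(tid, Status.OK, attempts[tid])
            except RetryableError as err:
                if attempts[tid] < max_attempts:
                    retry_later.append(tid)  # try again in the next round
                else:
                    outcomes[tid] = Outcome(tid, Status.GAVE_UP, attempts[tid], str(err))
            except FatalError as err:
                outcomes[tid] = Outcome(tid, Status.FATAL, attempts[tid], str(err))
        pending = retry_later
    return outcomes


if __name__ == "__main__":
    calls: Dict[str, int] = {}

    def fake_fetch(batch: List[str]) -> None:
        # Hypothetical stand-in for a real API request or scraping run.
        key = batch[0]
        calls[key] = calls.get(key, 0) + 1
        if key == "acct_2" and calls[key] == 1:
            raise RetryableError("rate limited")
        if key == "acct_4":
            raise FatalError("account suspended")

    ids = [f"acct_{n}" for n in range(6)]
    for tid, out in sorted(run_subtasks(chunk(ids, 2), fake_fetch).items()):
        print(tid, out.status.value, out.attempts, out.detail)
```

Separating "try again later" from "never going to work" is what lets a long-running job keep moving: transient failures get another round, permanent ones are logged once, and nothing disappears without a recorded status.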

Notes on Scaling Social Media Data Collection


dair-institute.org

