I've used RCurl a fair bit for simple text retrieval and simple scraping, but I'm stumped by Google Trends. Let's use Obama and Romney as an example. If you append "&export=1" to the URL, Google Trends returns a page displaying the data underlying the graph.

http://www.google.com/trends/explore?q=obama%2C+romney#q=obama%2C%20romney&export=1

On that page, the data lives in the reportContent div, which you can examine by inspecting the element for:

<div id="reportContent" class="report-content"> </div>

More specifically, it is tucked away in the innerHTML and innerText properties of that div. I've never seen this before and am wondering how to access that data with RCurl. I'm also curious, if anyone happens to know, why Google does not just present the data in simple HTML. I'll admit I'm not very knowledgeable; I'm reading as much as I can about it, but what little I have found about the innerText property is not particularly illuminating or helpful in modifying my RCurl script.
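One likely reason a plain request comes back without the data, sketched in base R: in the URL above, "&export=1" sits after the "#", so it belongs to the URL fragment, and HTTP clients do not send the fragment to the server. If the page fills reportContent with JavaScript that reads the fragment in the browser (an assumption, not something confirmed here), a client that does not execute scripts would only ever receive the empty div. The variable names below are illustrative.

```r
# URL from the question; everything after the first '#' is the fragment.
url <- "http://www.google.com/trends/explore?q=obama%2C+romney#q=obama%2C%20romney&export=1"

# The part before '#' is what an HTTP client actually requests.
sent_to_server <- sub("#.*$", "", url)

# The fragment stays on the client side; only browser scripts can read it.
fragment <- sub("^[^#]*#", "", url)
fragment_decoded <- URLdecode(fragment)

print(sent_to_server)
print(fragment_decoded)

# The export flag never appears in the requested part of the URL.
print(grepl("export=1", sent_to_server, fixed = TRUE))
```

If that assumption holds, the data would have to come from whatever request the page's own script makes, or from a direct download link, rather than from the HTML of this page.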

  • Why don't you download that in CSV format, e.g. (for the above URL): google.com/trends/…
    – daroczig
    Commented Jun 12, 2013 at 23:10
  • Didn't know how -- thanks! You've answered my question.
    – Don
    Commented Jun 13, 2013 at 0:19
  • A package named GTrends was published just recently. It is based on the RCurl library and is supposed to do what you are trying to accomplish. Have a look at Just Another R Blog
    – hvollmeier
    Commented Jun 13, 2013 at 7:14
  • daroczig: The problem is that if you try to use this programmatically, you get an error for not being signed in to Google. And when you do sign in via RCurl, it still throws a login-related error.
    – Don
    Commented Jun 14, 2013 at 1:23
  • hvollmeier: this package solves my problem -- thanks!
    – Don
    Commented Jun 14, 2013 at 1:23

1 Answer

You have to log in to Google to retrieve data for multiple trends; otherwise, Google can easily block you. Google may weigh several factors when deciding whether to block a client, e.g. IP address, Google account, device type, and whether the client looks like a machine or a human.

I provide an online Google Trends scraping service at http://www.datadriver.info/scrapdata/?case_task_id=b333f048be31cad3922f1c8c919700f860f5adbe. Using this service, you won't run into the annoying error "You have reached your quota limit. Please try again later."
