Dataset of Historical Tweets – FAQ

Q: Is filtering by keyword, date, and language available when querying the data? I saw in the examples posted on the webpage that a date can be specified. Do keyword and language filtering work similarly?

A: Queries are written in SQL, and you will have access to the whole database. Date, language, and location are simple to query. Searching the tweet text for more than one or two words can get complicated.
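
As a rough illustration of the kind of query described above, here is a sketch that runs against a tiny in-memory SQLite table from Python. The table name `tweets`, the columns `created_at`, `lang`, and `text`, and the helper `search_tweets` are all hypothetical, made up for this example. The real schema and SQL dialect may differ, so check the docs for the actual column names.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (created_at TEXT, lang TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?)",
    [
        ("2020-03-01 10:00:00", "en", "Flu season is here, get your vaccine"),
        ("2020-03-01 11:30:00", "es", "La vacuna contra la gripe"),
        ("2020-03-02 09:15:00", "en", "New vaccine trial announced"),
        ("2020-04-10 08:00:00", "en", "Vaccine rollout and flu updates"),
    ],
)


def search_tweets(conn, start, end, lang, keywords):
    """Return tweets in [start, end) in one language whose text contains every keyword."""
    sql = (
        "SELECT created_at, text FROM tweets "
        "WHERE created_at >= ? AND created_at < ? AND lang = ?"
    )
    # One LIKE clause per keyword: this is the part that gets verbose
    # once you search for several words at the same time.
    for _ in keywords:
        sql += " AND LOWER(text) LIKE ?"
    sql += " ORDER BY created_at"
    params = [start, end, lang] + [f"%{k.lower()}%" for k in keywords]
    return conn.execute(sql, params).fetchall()


# English tweets from March 2020 that mention "vaccine".
march_english_vaccine = search_tweets(
    conn, "2020-03-01", "2020-04-01", "en", ["vaccine"]
)
```

The date and language filters are plain comparisons, while each extra keyword adds another `LIKE` clause, which is why multi-word searches are the harder case.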

Q: I saw on the webpage that data is usually collected in seconds, which is really fast. Are there any rate limits or extraction constraints, such as a maximum of xxx per day?

A: The data is updated frequently, and you can certainly search the most recent tweets. However, since it is only a 1% sample, I wouldn’t rely on it for real-time analysis over periods shorter than an hour. It is better suited to questions like how a trend evolved over the past day or more.

We set up guardrails on all new accounts to limit overuse and costs. When we set up an account, you will be informed of these limits.
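
To make the day-level trend idea concrete, here is a minimal sketch that counts matching tweets per day, again using an in-memory SQLite table from Python. The `tweets` table, its columns, and the `daily_counts` helper are hypothetical and exist only for this example. Because the data is a 1% sample, the counts describe the sample, not all tweets.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (created_at TEXT, lang TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?)",
    [
        ("2020-03-01 10:00:00", "en", "election results tonight"),
        ("2020-03-01 18:45:00", "en", "Election coverage starts"),
        ("2020-03-02 09:00:00", "en", "post-election analysis"),
        ("2020-03-02 12:00:00", "en", "lunch time"),
        ("2020-03-03 07:30:00", "en", "election recount"),
    ],
)


def daily_counts(conn, keyword):
    """Count tweets per calendar day whose text contains the keyword."""
    sql = (
        # The first 10 characters of the timestamp are the YYYY-MM-DD date.
        "SELECT substr(created_at, 1, 10) AS day, COUNT(*) AS n "
        "FROM tweets "
        "WHERE LOWER(text) LIKE ? "
        "GROUP BY substr(created_at, 1, 10) "
        "ORDER BY day"
    )
    return conn.execute(sql, (f"%{keyword.lower()}%",)).fetchall()


election_by_day = daily_counts(conn, "election")
```

Grouping by day rather than by minute or hour keeps each bucket large enough to be meaningful in a small sample.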

Q: I noticed that the queries are made by SQL. I’ve never used SQL before… Is there any possible help that I might get in learning to query?

A: SQL is just really fancy Excel formulas. SQL can be complicated, but the complicated part is making JOINs across tables. Since this is only one table, I believe you will be able to figure out all you need to know within two hours at most.

  1. Visit this tutorial: https://www.w3schools.com/sql/
  2. Don’t pay for any SQL training; you only need some very basic skills that you can pick up on YouTube and Google.
  3. Consult the docs

Q: I am having trouble making a query using the console?

A: Make sure your region is set to `us-east-1` and avoid using Firefox (I know, Amazon, get with the times).

Q: I am still having trouble?

A: Email us at research-programming@wharton.upenn.edu