Twitter Offers Free Premium Access to Researchers

This service has been discontinued by Twitter.


Twitter recently announced their Academic Research product track and now offers full-archive search for users conducting academic research. Individual researchers can apply for an account, a process that takes a few days. Once the account is approved, you can take advantage of premium access that Twitter describes as follows:

This specialized track on the new Twitter API grants free access to full-archive search and other v2 endpoints, a higher Tweet volume cap, as well as enhanced features and functionality to get more precise, complete, and unbiased data for analyzing the public conversation. To use this product track, you need to submit your use case through the Academic Research application.

Examples

In the following examples I will be using the TwitterAPI Python library (docs). CONSUMER_KEY and CONSUMER_SECRET are keys Twitter will provide to you once your account is created. They are unique to your project and should not be shared, so they are not shown in the examples.
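
The keys can live anywhere you like; one common pattern (the variable names below are hypothetical) is to read them from environment variables so they never appear in the source file:

import os

# Hypothetical environment variable names; any secret store works.
CONSUMER_KEY = os.environ["TWITTER_CONSUMER_KEY"]
CONSUMER_SECRET = os.environ["TWITTER_CONSUMER_SECRET"]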

You can learn about Twitter’s query syntax and formatting here.
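
To give a feel for the syntax, here are a few query strings built from documented v2 search operators (the operators are Twitter's; the combinations are just examples):

QUERIES = [
    "Wharton",                    # simple keyword match
    "Wharton -is:retweet",        # exclude retweets
    "Wharton lang:en has:media",  # English tweets that include media
    "from:Wharton OR @Wharton",   # tweets by, or mentioning, the @Wharton account
]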

Let’s find a very old mention of “Wharton” on Twitter.

import json
from TwitterAPI import (
    TwitterAPI, HydrateType, TwitterRequestError, TwitterConnectionError, TwitterPager,
)

api = TwitterAPI(
    CONSUMER_KEY, CONSUMER_SECRET, api_version="2", auth_type="oAuth2", proxy_url=None
)

query = 'Wharton'
try:
    pager = TwitterPager(
        api,
        "tweets/search/all",
        {
            "query": query,
            "max_results": 10,
            "start_time": "2006-04-21T00:00:00Z",  # ISO 8601 timestamps (UTC)
            "end_time": "2007-01-01T00:00:00Z",
        },
    )
    for item in pager.get_iterator(new_tweets=False):
        print(json.dumps(item, indent=2))
        break  # stop after the first tweet returned

except TwitterRequestError as e:
    print(e.status_code)
    for msg in iter(e):
        print(msg)

except TwitterConnectionError as e:
    print(e)

except Exception as e:
    print(f'Error: {e}')

Returns:

{
  "id": "510693",
  "text": "Seen on BART: woman with laptop, Wharton backpack, Business Week, and Long Tail in hardback. New in town?"
}


We authenticate using our CONSUMER_KEY and CONSUMER_SECRET, making sure to set the api_version="2", auth_type="oAuth2", and proxy_url=None options; these ensure you can take advantage of the full-archive search. The example also uses the TwitterPager object, which handles the rate limiting and pagination rules enforced by Twitter. From the docs:

Whether one is searching for tweets with search/tweets or downloading a user’s timeline with statuses/user_timeline Twitter limits the number of tweets. So, in order to get more tweets, one must make successive requests and with each request skip the previously acquired tweets. This is done by specifying the tweet id from where to start. Twitter has a description here. If you don’t want to implement paging yourself, you can use the TwitterPager helper class with any REST API endpoint that returns multiples of something.
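
To make the cursoring concrete, here is a rough sketch of the kind of paging TwitterPager automates. It reuses the api object from the example above and assumes, per Twitter's v2 documentation, that each search response carries its cursor in meta.next_token and that the library's response object exposes the parsed body via .json():

import time

params = {
    "query": "Wharton",
    "max_results": 100,
    "start_time": "2019-09-01T00:00:00Z",
    "end_time": "2019-09-02T00:00:00Z",
}
while True:
    response = api.request("tweets/search/all", params)
    body = response.json()
    for tweet in body.get("data", []):
        print(tweet["id"])
    next_token = body.get("meta", {}).get("next_token")
    if not next_token:
        break                          # no more pages
    params["next_token"] = next_token  # resume where the last page left off
    time.sleep(1)                      # full-archive search is rate limited; TwitterPager waits for you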

But in this example we stop searching once we find a single tweet, and the only fields returned are the id and text. What if we want more tweets with additional fields? We need to expand and hydrate our tweets (a short illustration follows the list below):

  • Expand: request related objects, such as the author or a referenced tweet, alongside each tweet.
  • Hydrate: fill the details of those related objects into the tweet itself.
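
To illustrate the difference (the values below are invented, not real API output): the raw v2 response returns expanded objects in a separate top-level includes object, and the library's HydrateType.APPEND option merges each one into its tweet under a *_hydrate key, which is the shape you will see in the output further down.

# Illustrative only; the values are invented.
raw = {
    "data": [{"id": "1", "text": "...", "author_id": "42"}],
    "includes": {"users": [{"id": "42", "username": "someone"}]},
}

# After TwitterPager(..., hydrate_type=HydrateType.APPEND), each tweet carries
# the matching expansion inline under an "<expansion>_hydrate" key:
hydrated_tweet = {
    "id": "1",
    "text": "...",
    "author_id": "42",
    "author_id_hydrate": {"id": "42", "username": "someone"},
}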

Let’s take a look at the same example as above, but now we want all tweets that mention Wharton from the beginning of the Fall term back in 2019.

import json
from TwitterAPI import (
    TwitterAPI, HydrateType, TwitterRequestError, TwitterConnectionError, TwitterPager,
)

api = TwitterAPI(
    CONSUMER_KEY, CONSUMER_SECRET, api_version="2", auth_type="oAuth2", proxy_url=None
)
# Related objects to request alongside each tweet (author, referenced tweets, media).
EXPANSIONS = "author_id,referenced_tweets.id,referenced_tweets.id.author_id,in_reply_to_user_id,attachments.media_keys"
# Additional fields to return for each media, tweet, and user object.
MEDIA_FIELDS = (
    "duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics"
)
TWEET_FIELDS = "created_at,author_id,public_metrics,source"
USER_FIELDS = (
    "description,name,username,created_at,location,url,verified,public_metrics"
)
query = 'Wharton'
try:
    pager = TwitterPager(
        api,
        "tweets/search/all",
        {
            "query": query,
            "expansions": EXPANSIONS,
            "media.fields": MEDIA_FIELDS,
            "tweet.fields": TWEET_FIELDS,
            "user.fields": USER_FIELDS,
            "max_results": 500,  # maximum page size for full-archive search
            "start_time": "2019-09-01T00:00:00Z",
            "end_time": "2019-09-02T00:00:00Z",
        },
        hydrate_type=HydrateType.APPEND,  # merge expanded objects into each tweet
    )
    count = 0
    last_tweet = None
    for item in pager.get_iterator(new_tweets=False):
        count += 1
        last_tweet = item
    print(f'found {count} tweets')
    print(json.dumps(last_tweet, indent=2))
    
except TwitterRequestError as e:
    print(e.status_code)
    for msg in iter(e):
        print(msg)

except TwitterConnectionError as e:
    print(e)

except Exception as e:
    print(f'Error: {e}')

This example prints the number of tweets found (464) and the last tweet returned:

found 464 tweets
{
  "referenced_tweets": [
    {
      "type": "quoted",
      "id": "1167948496667676672",
      "id_hydrate": {
        "text": "Donald Trump accidentally called Sean Hannity a \"shoe\" today, and it's not even among the most embarrassing things he's done today.",
        "id": "1167948496667676672",
        "public_metrics": {
          "retweet_count": 191,
          "reply_count": 42,
          "like_count": 942,
          "quote_count": 7
        },
        "source": "Twitter Web App",
        "created_at": "2019-08-31T23:53:15.000Z",
        "author_id": "15115280"
      }
    }
  ],
  "text": "Still waiting to see evidence of that \"Wharton School of Business\" education, of which #Trump was so proud & made endless references to in his campaign-stops, to show-up!\n\nJust show-up ONCE, please! https://t.co/3cGVUusaQL",
  "id": "1167950541395124229",
  "public_metrics": {
    "retweet_count": 0,
    "reply_count": 0,
    "like_count": 0,
    "quote_count": 0
  },
  "source": "Twitter Web App",
  "created_at": "2019-09-01T00:01:23.000Z",
  "author_id": "896400632",
  "author_id_hydrate": {
    "username": "PaulaDuvall2",
    "public_metrics": {
      "followers_count": 7751,
      "following_count": 7226,
      "tweet_count": 212135,
      "listed_count": 92
    },
    "created_at": "2012-10-22T00:07:59.000Z",
    "verified": false,
    "name": "Paula Duvall",
    "description": "",
    "id": "896400632",
    "url": ""
  }
}

We plan to maintain our historical database, but this will be the preferred way to obtain large amounts of tweets moving forward.
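
For collecting tweets in bulk rather than printing them, one simple option (a sketch, assuming the pager configured in the example above) is to write each hydrated tweet to a JSON Lines file as the pager walks the archive:

import json

# Assumes `pager` is the TwitterPager configured in the example above.
with open("wharton_tweets.jsonl", "w", encoding="utf-8") as fh:
    for tweet in pager.get_iterator(new_tweets=False):
        fh.write(json.dumps(tweet) + "\n")  # one JSON object per line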