Gaurav Singhal

Building a Twitter Bot with Python

  • May 5, 2020
  • 10 Min read
  • 1,222 Views

Introduction

Twitter needs no introduction. The popular social media platform uses hashtags to represent topics, and anyone can create their own hashtags. A significant amount of discussion happens every second using these hashtags. Industries such as marketing regularly use Twitter to gauge the general sentiment towards specific hashtags, and they build services for their clients on that basis.

Tweets and hashtags need to be collected in order to analyze them. Copying each tweet that mentions a particular hashtag is not an option because hundreds of tweets are published each minute. A smart computer program, or bot, is required to fetch the trending hashtags and tweets and save them for further analysis.

In this guide, you will learn how to make a custom Twitter bot that can fetch hashtags and tweets for you.

Getting the Twitter API

To enable your bot to interact with Twitter, you first have to sign up as a developer on Twitter. Sign up on the Twitter developer site to get the API keys.

In this guide, you will make a GET request to fetch the tweets and hashtags, and for that, you need "CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_KEY", and "ACCESS_SECRET" keys.

Before jumping into the code, it's important to understand the rate limits on Twitter requests. Twitter has free and premium developer accounts that vary in how many requests you can make in a 15-minute window. Rate limiting is subject to change, so to get the current specifications, read the official doc.
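
One way to cope with a rate limit is to wrap each request in a small helper that pauses and retries once when the call fails. The sketch below is a hypothetical helper, not part of Tweepy: the names call_with_retry, wait_seconds, and the injectable sleep argument are assumptions made for illustration, and the 15-minute default simply mirrors the window length mentioned above.

```python
import time


def call_with_retry(func, *args, wait_seconds=15 * 60, sleep=time.sleep, **kwargs):
    """Call func once; if it raises, pause for wait_seconds and try once more.

    Hypothetical helper for illustration. A real bot would catch the
    library's specific rate-limit exception instead of Exception.
    """
    try:
        return func(*args, **kwargs)
    except Exception as err:
        print(f"Request failed ({err}); retrying in {wait_seconds} seconds")
        sleep(wait_seconds)
        # A second failure is not caught and propagates to the caller
        return func(*args, **kwargs)
```

Passing sleep as an argument keeps the helper easy to test: a test can hand in a function that records the delay instead of actually waiting.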

Required Configurations and Python Libraries

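You will implement the Twitter bot with the Tweepy library, whose API documentation is well organized.

Install Tweepy together with the schedule package, which the bot uses later to run at a fixed time. The code in this guide uses the Tweepy 3.x method names (api.search, api.trends_available, api.trends_place); Tweepy 4.0 renamed them to api.search_tweets, api.available_trends, and api.get_place_trends.

pip install tweepy schedule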
shell

Create the config.json file and put in all the aforementioned API keys.

{
    "CONSUMER_KEY" : "<KEY>", 
    "CONSUMER_SECRET" : "<KEY>", 
    "ACCESS_KEY" : "<KEY>", 
    "ACCESS_SECRET" : "<KEY>"
}
json
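
Because a missing or empty key only shows up later as an authentication error, it can help to validate the file as soon as it is read. The following is a minimal sketch under that assumption; load_config and REQUIRED_KEYS are hypothetical names that do not appear elsewhere in this guide.

```python
import json

# The four keys this guide expects in config.json
REQUIRED_KEYS = ("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_KEY", "ACCESS_SECRET")


def load_config(path="config.json"):
    """Read the JSON config and fail early if a key is missing or empty."""
    with open(path, "r", encoding="utf-8") as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise KeyError("config.json is missing values for: " + ", ".join(missing))
    return config
```

Keeping the keys in a separate file also makes it easy to exclude them from version control.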

Coding the Bot

Twitter groups trending hashtags and tweets by location. Each location is identified by a WOEID (Where On Earth IDentifier). You can find the WOEID for available countries here.

Importing Required Libraries

You'll work with these libraries throughout the guide. Import them into your file.

import tweepy
import json
import schedule
import time
import datetime
import os
import csv

Initiating API

Read config.json and initialize the API with the access keys. Setting up the connection can fail for several reasons, such as a missing or malformed config file, internet connectivity, or server response time, so it is good practice to handle errors explicitly and report what went wrong.

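def initiate_api():
    """Build an authenticated Tweepy API object from the keys in config.json."""
    try:
        with open('config.json', 'r') as f:
            config = json.load(f)
        auth = tweepy.OAuthHandler(config["CONSUMER_KEY"], config["CONSUMER_SECRET"])
        auth.set_access_token(config["ACCESS_KEY"], config["ACCESS_SECRET"])
        # wait_on_rate_limit makes Tweepy pause until the rate-limit window resets
        api = tweepy.API(auth, wait_on_rate_limit=True)
        return api
    except (OSError, json.JSONDecodeError, KeyError) as err:
        # Missing file, malformed JSON, or a missing key
        print("Problems with config.json:", err)
        return None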
python

Accepting Only English Tweets

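For this guide, you'll focus only on English tweets. Tweets in other languages, such as Chinese, Arabic, or Hindi, will not be considered. The check below is a simple heuristic: it accepts text only if every character is ASCII, so it also rejects English tweets that contain emoji or accented letters. You can change this function to include other languages without affecting other parts of the application.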

def isEnglish(text):
    # Heuristic: treat text as English only if every character is ASCII.
    # Accented letters, emoji, and non-Latin scripts all fail this check.
    # str.isascii() requires Python 3.7 or later.
    return text.isascii()
python
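
To see what an ASCII-only check accepts and rejects, the sketch below runs the same idea over a few sample strings. The function name is_ascii_only and the sample strings are made up for illustration.

```python
def is_ascii_only(text):
    """Return True when every character in text is plain ASCII."""
    return text.isascii()


samples = ["Good morning", "Buenos dias", "naïve", "こんにちは", "Launch day 🚀"]
results = {text: is_ascii_only(text) for text in samples}

for text, accepted in results.items():
    print(repr(text), "->", accepted)

# "Buenos dias" is accepted even though it is Spanish, and "naïve" and the
# emoji string are rejected even though they could appear in English tweets.
```

This is why the search request later in the guide also passes lang="en": the ASCII check on its own filters by character set, not by language.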

Getting the WOEID of Countries

This function looks up the WOEIDs for specific locations. It takes the api object and a list of location names as arguments and returns the matching WOEIDs.

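def get_woeid(api, locations):
    # All locations for which Twitter has trending topics
    twitter_world = api.trends_available()
    # Map lowercase place names to their WOEIDs
    places = {loc['name'].lower(): loc['woeid'] for loc in twitter_world}
    woeids = []
    for location in locations:
        # Lowercase the input so that 'New York' and 'new york' both match
        key = location.lower()
        if key in places:
            woeids.append(places[key])
        else:
            print("err: ", location, " woeid does not exist in trending topics")
    return woeids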
python
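
The lookup can be tried without network access by substituting a stand-in for the API object. In this sketch, FakeApi and lookup_woeids are hypothetical names, the response shape (a list of dictionaries with name and woeid keys) is taken from the function above, and the numeric IDs are placeholders rather than real WOEIDs.

```python
class FakeApi:
    """Stand-in for the Tweepy API object; returns made-up sample data."""

    def trends_available(self):
        return [
            {"name": "New York", "woeid": 1001},  # placeholder ID, not a real WOEID
            {"name": "India", "woeid": 1002},     # placeholder ID, not a real WOEID
        ]


def lookup_woeids(api, locations):
    """Return (found_woeids, unknown_locations) using a case-insensitive match."""
    places = {loc["name"].lower(): loc["woeid"] for loc in api.trends_available()}
    found, unknown = [], []
    for location in locations:
        key = location.strip().lower()
        if key in places:
            found.append(places[key])
        else:
            unknown.append(location)
    return found, unknown


found, unknown = lookup_woeids(FakeApi(), ["New York", "india", "Atlantis"])
print(found)    # [1001, 1002]
print(unknown)  # ['Atlantis']
```

Returning the unmatched names alongside the IDs lets the caller decide whether a misspelled location should stop the run or just be logged.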

Fetching the Tweets

This function gets the tweets for a given hashtag. The api object fetches the tweets that match the query. The code fetches only English tweets, but you can modify it to get other languages as well.

def get_tweets(api, query):
    '''
    Get up to 1000 popular English tweets for the given hashtag.
    '''
    tweets = []
    # count is the page size (the standard search API returns at most 100
    # tweets per request); items(1000) caps the total number fetched.
    # Rate-limit waiting is an option of tweepy.API(...), not of the search call.
    for status in tweepy.Cursor(api.search,
                       q=query,
                       count=100,
                       result_type='popular',
                       include_entities=True,
                       lang="en").items(1000):

        # Keep only tweets whose text passes the ASCII check
        if isEnglish(status.text):
            tweets.append([status.id_str, query, status.created_at.strftime('%d-%m-%Y %H:%M'), status.user.screen_name, status.text])
    return tweets
python

The function returns a list of tweets for the given query. Each entry holds the tweet ID, hashtag, creation time, user handle, and tweet body.
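
The shape of one entry can be illustrated without calling Twitter. The sketch below builds a stand-in status object with only the attributes the function reads (id_str, created_at, user.screen_name, text); status_to_row and the sample values are assumptions for illustration.

```python
from datetime import datetime
from types import SimpleNamespace


def status_to_row(status, query):
    """Turn a status-like object into the five-column row described above."""
    return [
        status.id_str,
        query,
        status.created_at.strftime("%d-%m-%Y %H:%M"),
        status.user.screen_name,
        status.text,
    ]


# Made-up stand-in for a Tweepy status object
fake_status = SimpleNamespace(
    id_str="1",
    created_at=datetime(2020, 5, 5, 9, 30),
    user=SimpleNamespace(screen_name="example_user"),
    text="Hello #python",
)

row = status_to_row(fake_status, "#python")
print(row)  # ['1', '#python', '05-05-2020 09:30', 'example_user', 'Hello #python']
```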

Getting Trending Hashtags

In this function, you'll fetch the trending hashtags for a given location.

Note: If you are working with a free developer account, you have a very limited number of requests. This code is written so that if your requests are exhausted, the bot waits for one hour and then resumes.

def get_trending_hashtags(api, locations):
    woeids = get_woeid(api, locations)
    trending = set()
    for woeid in woeids:
        try:
            trends = api.trends_place(woeid)
        except Exception as err:
            print("Request failed:", err, "- waiting for next hour")
            time.sleep(3605) # change to 5 for testing
            trends = api.trends_place(woeid)
        # Keep ASCII-only hashtags and store the text without the leading '#'
        topics = [trend['name'][1:] for trend in trends[0]['trends']
                  if trend['name'].startswith('#') and isEnglish(trend['name'])]
        trending.update(topics)

    return trending
python
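
The filtering step can be checked in isolation on a hand-written response. The sketch below assumes the response shape used by the function above (a list whose first item has a trends list of dictionaries with a name key); extract_hashtags and the sample data are hypothetical.

```python
def extract_hashtags(trends_response):
    """Return the ASCII-only hashtags from a trends response, without the '#'."""
    names = (trend["name"] for trend in trends_response[0]["trends"])
    return {name[1:] for name in names if name.startswith("#") and name.isascii()}


# Made-up sample in the assumed response shape
sample_response = [
    {
        "trends": [
            {"name": "#Python"},
            {"name": "World Cup"},  # not a hashtag, so it is skipped
            {"name": "#日本"},       # non-ASCII, so it is skipped
            {"name": "#Python"},    # duplicate, collapsed by the set
        ]
    }
]

print(extract_hashtags(sample_response))  # {'Python'}
```

Using a set means a hashtag that trends in several locations is fetched only once later on.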

Getting Everything Together

This function pulls all the functions together. All the fetched tweets are saved in the trending_tweets directory. Every time the bot runs, it saves the trending hashtags and the tweets in separate CSV files whose names include a timestamp.

def twitter_bot(api, locations):
    # %s is not a portable strftime code, so use hours, minutes, and seconds
    today = datetime.datetime.today().strftime("%d-%m-%Y-%H%M%S")
    os.makedirs("trending_tweets", exist_ok=True)

    hashtags = get_trending_hashtags(api, locations)
    with open("trending_tweets/"+today+"-hashtags.csv", "w", encoding="utf-8") as file_hashtags:
        file_hashtags.write("\n".join(hashtags))
    print("Hashtags written to file.")

    # newline="" lets the csv module control line endings
    with open("trending_tweets/"+today+"-tweets.csv", "a", newline="", encoding="utf-8") as file_tweets:
        writer = csv.writer(file_tweets)
        for hashtag in hashtags:
            try:
                print("Getting Tweets for the hashtag: ", hashtag)
                tweets = get_tweets(api, "#"+hashtag)
            except Exception as err:
                print("Request failed:", err, "- waiting for next hour")
                time.sleep(3605) # change to 0.2 sec for testing
                tweets = get_tweets(api, "#"+hashtag)
            writer.writerows(tweets)
python

The time.sleep() line in each except branch sets the pause before the retry. Reduce it to a fraction of a second when you only want to test the file-saving process.
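
The file-saving step can be tried on its own, without any Twitter requests. The sketch below writes one made-up row to a temporary CSV file and reads it back; save_rows, load_rows, and the column names in HEADER are assumptions for illustration, since the bot above writes rows without a header.

```python
import csv
import os
import tempfile

# Assumed column names; the bot in this guide writes rows without a header
HEADER = ["id", "hashtag", "created_at", "user", "text"]


def save_rows(path, rows):
    """Write a header plus the given rows; newline='' avoids blank lines on Windows."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(rows)


def load_rows(path):
    """Read the file back as a list of dictionaries keyed by the header."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Tweet text often contains commas and line breaks; csv quotes them for us
rows = [["1", "#python", "05-05-2020 09:30", "example_user", "Hello, world\nsecond line"]]
path = os.path.join(tempfile.mkdtemp(), "sample-tweets.csv")
save_rows(path, rows)
loaded = load_rows(path)
print(loaded[0]["user"], repr(loaded[0]["text"]))
```

Reading the file back with the csv module, rather than splitting lines by hand, keeps multi-line tweets intact.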

Main Function

Finally, the main function calls the bot. Because my account allows only a limited number of requests, I am using only one location in the locations list. You can add as many locations as your request quota allows, and the code will process each of them.

The schedule package keeps the program running and triggers the bot at set times. Currently, the bot fetches data at 00:00 every day; however, you can change the schedule according to your needs. There's a commented line that schedules the bot every 10 seconds. In practice, you would need a very large request quota and very good hardware infrastructure to run the bot every 10 seconds.

def main():
    '''
    Add more entries to the locations list to get trending tags from different places.
    Only one location is used here because of the limited number of requests.
    '''
    #locations = ['new york', 'los angeles', 'philadelphia', 'barcelona', 'canada', 'united kingdom', 'india']
    locations = ['new york']
    api = initiate_api()
    if api is None:
        # config.json could not be read, so there is nothing to schedule
        return

    schedule.every().day.at("00:00").do(twitter_bot, api, locations)
    #schedule.every(10).seconds.do(twitter_bot, api, locations)
    while True:
        # Run any job that is due, then check again in a second
        schedule.run_pending()
        time.sleep(1)

if __name__ == "__main__":
    main()
python
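
The idea behind a daily schedule can be sketched with the standard library alone: work out how long it is until the next occurrence of the target time, and wait that long. This is not how the schedule package is implemented; seconds_until is a hypothetical helper shown only to make the timing explicit.

```python
from datetime import datetime, timedelta


def seconds_until(hour, minute, now=None):
    """Seconds from now until the next occurrence of hour:minute."""
    now = now or datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        # The target time has already passed today, so use tomorrow's
        target += timedelta(days=1)
    return (target - now).total_seconds()


# One hour before midnight, the next 00:00 run is 3600 seconds away
print(seconds_until(0, 0, now=datetime(2020, 5, 5, 23, 0)))  # 3600.0
```

Passing now explicitly makes the calculation reproducible, which is handy when checking a schedule change before deploying it.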

Conclusion

In this guide, you have learned how to make a Twitter bot that can fetch trending hashtags and tweets for different geographic locations. With a little modification, you can adapt this code to your own business use case.

Twitter is one of the most popular ways to study human behavior and attitudes towards a topic. To study tweets, we must first collect a lot of them, and fetching them automatically is always the best solution. In the next guide, we'll build a machine learning model that will do the sentiment analysis for these tweets. Click here to read Twitter Sentiment Analysis in Python.
