Python Note 2: Search Method in Data Mining
To collect tweets and other information from the Twittersphere, you need to create a Twitter application through your account, whether you use the search method or the stream data method.
So what's the difference between the search method and the stream data method?
The search method lets us dig up past tweets from Twitter's database.
To get tweets as they are posted, now or in the future, you need the stream data method, which we discuss in the next chapter.
In this chapter, I introduce the search method first.
Search Method
Whichever method you use, the first thing you need to do is get your Twitter API key. Visit dev.twitter.com/apps, log in to your Twitter account, and create a Twitter application through the site.
Name – This field will be the display name of your application and will be used during user authentication. It will be checked against all other Twitter applications, so make sure to use a unique name.
Description – Write a short description of what you intend to do with the application.
Website – Fill in the full URL (including http://www) of the website where you intend to either use the application or make it available for public download.
Callback URL – If you intend to return your users to a specific URL after authentication, specify it here. In basic cases like WordPress widget integration, this field can be left blank.
Read and agree to the Developer Rules of the Road, enter the Captcha phrase, and create!
Application Information
Now that you’ve created your application, Twitter has assigned you a few pieces of identifying information, including an API key, to work with.
Keep the "Consumer Secret" a secret. This key should never be human-readable in your application.
Next, you need a personal Access Token and Secret key to allow you to unlock personal account functions with your application – just click ‘Create Access Token’ and Twitter will add these data points as well. The access token can be used to make API requests on your own account's behalf. Do not share your access token secret with anyone.
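One way to keep the Consumer Secret and the access token secret out of your source code is to read them from environment variables. The following is a minimal sketch of that idea; the environment variable names and the `load_credentials` helper are assumptions made for illustration, not something Twitter prescribes.

```python
import os

# Hypothetical environment variable names, mapped to short dictionary keys.
ENV_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [var for var in ENV_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in ENV_VARS.items()}
```

You would export the four variables in your shell before starting Python and then call `load_credentials()` once at startup, so the secrets never appear in a script you might share.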
Open your terminal and enter the code below:
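A minimal sketch of one search request, assuming the third-party `twitter` package (Python Twitter Tools) is installed and that the v1.1 search endpoint is available to your application. The helper names `build_search_params` and `search_once`, and the keys of the `creds` dictionary, are hypothetical and chosen only for this example.

```python
MAX_COUNT = 100  # per-request cap on the number of tweets, as described in this note


def build_search_params(query, count=MAX_COUNT):
    """Build the keyword arguments for one search request.

    The count is clamped to the range 1..MAX_COUNT.
    """
    return {"q": query, "count": max(1, min(int(count), MAX_COUNT))}


def search_once(query, creds, count=MAX_COUNT):
    """Run one search and return the list of statuses.

    Assumes `creds` is a dict with the hypothetical keys used below.
    """
    import twitter  # third-party; imported here so the helper above works without it

    auth = twitter.oauth.OAuth(
        creds["access_token"],
        creds["access_token_secret"],
        creds["consumer_key"],
        creds["consumer_secret"],
    )
    api = twitter.Twitter(auth=auth)
    results = api.search.tweets(**build_search_params(query, count))
    return results["statuses"]
```

For example, `statuses = search_once("#python", creds)` followed by `len(statuses)` lets you check how many tweets a single request returns.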
If everything goes smoothly, you will start downloading past tweets from Twitter's database, but you will find that each request returns at most 100 tweets: the length of statuses is always 100.
Can't believe it? Let's test more.
It appears that a single request is limited to a count of 100 tweets. This means that if we want the tweets from a continuous week, it is difficult to find the exact time (down to the second) at which each batch stops and to resume counting from there until the full week is covered, because the number of tweets in one week is huge.
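One common way around the 100-tweet cap is to page backwards by tweet ID rather than by timestamp: the v1.1 search endpoint accepts a `max_id` parameter, so each new request can ask only for tweets older than the oldest one already collected. The sketch below shows that loop under stated assumptions; `fetch_page` is a hypothetical callable standing in for whatever function performs one search request, and each status is assumed to carry an integer `"id"`.

```python
def collect_statuses(fetch_page, query, limit, page_size=100):
    """Collect up to `limit` statuses by walking backwards with max_id.

    `fetch_page(query, count, max_id)` is a hypothetical callable that returns
    a list of status dicts, each with an integer "id". A `max_id` of None
    means "start from the newest tweets".
    """
    collected = []
    max_id = None
    while len(collected) < limit:
        count = min(page_size, limit - len(collected))
        page = fetch_page(query, count, max_id)
        if not page:
            break  # no older tweets are available
        collected.extend(page)
        # The next page must be strictly older than the oldest tweet seen so far.
        max_id = min(status["id"] for status in page) - 1
    return collected
```

Because the loop is driven by IDs, there is no need to work out the exact second at which the previous batch stopped.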
However, you can see what messy data you get in a single status (tweet) after the token and count processing:
As shown above, you can see all the messy data in one status (tweet). This raises another question: do you need all of it? Which data do you really want? How can you pick out only the data you want? When we collect stream data, we face the same question. We will address it in a later chapter.
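As a small preview of one possible approach, the sketch below keeps only a handful of fields from a status and discards the rest. It assumes each status is a Python dict shaped like a v1.1 tweet object, with `id_str`, `created_at`, `text` and a nested `user` dict; the helper name `slim_status` and the choice of fields are illustrative only.

```python
def slim_status(status):
    """Return a small dict with a few selected fields from one status."""
    user = status.get("user") or {}
    return {
        "id": status.get("id_str"),
        "created_at": status.get("created_at"),
        "text": status.get("text"),
        "screen_name": user.get("screen_name"),
    }
```

Applied to a whole result, `[slim_status(s) for s in statuses]` gives a list that is much easier to read and store than the raw statuses. Missing fields come back as `None` instead of raising an error.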