
Mining Twitter


Presentation Transcript


  1. Mining Twitter • 1.9, 1.10 • 1131036001 김종명

  2. 1.9 Making Robust Twitter Requests • Problem • You want to write a long-running script that harvests large amounts of data, such as the friend and follower ids for a very popular Twitter user; however, the Twitter API is inherently unreliable and imposes rate limits, so you must always expect the unexpected. • Solution • Write an abstraction for making Twitter requests that accounts for rate limiting and other types of HTTP errors, so that you can focus on the problem at hand. A rate limit is just one specific kind of HTTP error.
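The abstraction described in the solution can be sketched as a small retry wrapper. This is a minimal illustration rather than the presenter's code: `HTTPStatusError`, `make_robust_request`, and the wait times are hypothetical names and values chosen here, and the 15-minute pause assumes the rate-limit window of Twitter API v1.1.

```python
import time


class HTTPStatusError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed request."""

    def __init__(self, status, message=""):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


def make_robust_request(request, *args, max_retries=5, wait=2.0,
                        rate_limit_wait=15 * 60, sleep=time.sleep, **kwargs):
    """Call request(*args, **kwargs), retrying errors that may be transient."""
    for attempt in range(max_retries + 1):
        try:
            return request(*args, **kwargs)
        except HTTPStatusError as err:
            if attempt == max_retries:
                raise  # out of retries: let the caller see the error
            if err.status == 429:
                # Rate limit reached: wait for the window to reset (assumed 15 min).
                sleep(rate_limit_wait)
            elif err.status >= 500:
                # Server-side trouble: back off exponentially and try again.
                sleep(wait * 2 ** attempt)
            else:
                # Other client errors (for example 404) will not fix themselves.
                raise
```

Passing `sleep` as a parameter keeps the wrapper testable without real delays. Any client call that raises an error exposing the HTTP status could be adapted to this shape.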

  3. Error Codes & Responses

  4. Error Messages • {"errors":[{"message":"Sorry, that page does not exist","code":34}]} • <?xml version="1.0" encoding="UTF-8"?><errors><error code="34">Sorry, that page does not exist</error></errors>
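Both error formats shown on this slide can be read with the Python standard library. The sketch below is illustrative; `parse_json_errors` and `parse_xml_errors` are hypothetical helper names, and the sample bodies are the two messages from the slide.

```python
import json
import xml.etree.ElementTree as ET


def parse_json_errors(body):
    """Return (code, message) pairs from a JSON error body."""
    payload = json.loads(body)
    return [(e.get("code"), e.get("message")) for e in payload.get("errors", [])]


def parse_xml_errors(body):
    """Return (code, message) pairs from an XML error body."""
    root = ET.fromstring(body.encode("utf-8"))  # root element is <errors>
    return [(int(e.get("code")), e.text) for e in root.findall("error")]


json_body = '{"errors":[{"message":"Sorry, that page does not exist","code":34}]}'
xml_body = ('<?xml version="1.0" encoding="UTF-8"?>'
            '<errors><error code="34">Sorry, that page does not exist</error></errors>')

print(parse_json_errors(json_body))
print(parse_xml_errors(xml_body))
```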

  5. Error Codes

  6. Normal execution

  7. Page does not exist • HTTP 404, error code 34

  8. Rate limit reached • HTTP 429, error code 88

  9. URL Error • Change the DNS server

  10. 1.10 • Problem • You want to harvest and store tweets from a collection of id values, or harvest entire timelines of tweets. • Solution • Use the /statuses/show resource to fetch a single tweet by its id value; the various /statuses/*_timeline methods can be used to fetch timeline data. CouchDB is a great option for persistent storage; it also provides a map/reduce processing paradigm and built-in ways to share your analysis with others.
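One way to prepare harvested tweets for CouchDB is to turn them into a body for its `_bulk_docs` endpoint (`POST /<database>/_bulk_docs`). The sketch below only builds that body and performs no network call; `tweets_to_bulk_docs` is a hypothetical helper, and the tweets are assumed to be plain dicts with an `id` key.

```python
import json


def tweets_to_bulk_docs(tweets):
    """Build a CouchDB _bulk_docs request body from a list of tweet dicts."""
    docs = []
    for tweet in tweets:
        doc = dict(tweet)              # copy, so the caller's data is untouched
        doc["_id"] = str(tweet["id"])  # CouchDB document ids are strings
        docs.append(doc)
    return {"docs": docs}


# Made-up sample data for illustration only.
sample = [{"id": 1, "text": "first tweet"}, {"id": 2, "text": "second tweet"}]
print(json.dumps(tweets_to_bulk_docs(sample), indent=2))
```

Using the tweet id as `_id` means that uploading the same tweet twice is reported as a conflict instead of creating a duplicate document.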

  11. Document-based distributed database • Cluster Of Unreliable Commodity Hardware

  12. Document-oriented

  13. Document-oriented

  14. Document-oriented DB • MongoDB (C++) • RavenDB (C#) • CouchDB (Erlang)

  15. Document

  16. Document { "_id": "tansac", "_rev": "1", "profile": { "nickname": "tansanc", "name": { "firstname": "종명", "lastname": "김" }, "birthdate": "1987-05-31" } }

  17. Schema Free { "_id": "tansac", "_rev": "2", "profile": { "nickname": "tansanc", "name": { "firstname": "종명", "lastname": "김" }, "birthdate": "1987-05-31", "hasBrother": true } }
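The two revisions on slides 16 and 17 can be compared as plain Python dicts to show what "schema free" means in practice: the second revision gains a field without any schema change. `added_keys` is a hypothetical helper written for this illustration.

```python
import copy

# Revision 1 of the document, as on slide 16.
rev1 = {
    "_id": "tansac",
    "_rev": "1",
    "profile": {
        "nickname": "tansanc",
        "name": {"firstname": "종명", "lastname": "김"},
        "birthdate": "1987-05-31",
    },
}

# Revision 2 adds a field that revision 1 never declared (slide 17).
rev2 = copy.deepcopy(rev1)
rev2["_rev"] = "2"
rev2["profile"]["hasBrother"] = True


def added_keys(old, new):
    """Return the keys present in new but missing from old, sorted."""
    return sorted(set(new) - set(old))


print(added_keys(rev1["profile"], rev2["profile"]))
```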

  18. Typical 3-Tier Architecture

  19. 2-Tier Architecture with CouchDB

  20. No Locking • Multi-Version Concurrency Control (MVCC)
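The idea behind lock-free updates with MVCC can be sketched with a toy in-memory store: a write succeeds only if it is based on the latest revision, and a stale write is rejected instead of blocking. `TinyDocStore` and `ConflictError` are hypothetical names, and the revisions are simple counters as on the earlier slides; a real CouchDB server uses revision strings of the form `1-<hash>` and answers a stale write with HTTP 409 Conflict.

```python
class ConflictError(Exception):
    """Raised when an update is based on a stale revision."""


class TinyDocStore:
    """Toy store that mimics revision checks; it is not a CouchDB client."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        return dict(self._docs[doc_id])  # readers work on their own copy

    def put(self, doc):
        current = self._docs.get(doc["_id"])
        if current is not None and doc.get("_rev") != current["_rev"]:
            # Someone else updated the document first: reject, do not block.
            raise ConflictError(doc["_id"])
        next_rev = 1 if current is None else int(current["_rev"]) + 1
        stored = dict(doc, _rev=str(next_rev))
        self._docs[doc["_id"]] = stored
        return stored["_rev"]
```

Two clients that read the same revision can both try to write; the first one wins, and the second has to re-read the document and retry.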

  21. /statuses/show • public_timeline() • user_timeline() • home_timeline()

  22. tweepy get timeline • API.public_timeline() • Returns the 20 most recent statuses from non-protected users who have set a custom user icon. The public timeline is cached for 60 seconds, so requesting it more often than that is a waste of resources. • Parameters: None • Returns: list of Status objects • API.home_timeline() • Returns the 20 most recent statuses, including retweets, posted by the authenticating user and that user's friends. This is the equivalent of /timeline/home on the Web. • Parameters: since_id, max_id, count, page • Returns: list of Status objects • API.friends_timeline() • Returns the 20 most recent statuses posted by the authenticating user and that user's friends. • Parameters: since_id, max_id, count, page • Returns: list of Status objects • API.user_timeline() • Returns the 20 most recent statuses posted by the authenticating user. It is also possible to request another user's timeline via the id parameter. • Parameters: (id or user_id or screen_name), since_id, max_id, count, page • Returns: list of Status objects • http://pythonhosted.org/tweepy/html/api.html#timeline-methods
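Because each call returns only one page of statuses, harvesting a longer timeline means paging backwards with `max_id`. The sketch below assumes a callable that accepts `count` and `max_id` and returns objects with an `id` attribute, which is the shape of the timeline methods listed above; `harvest_timeline` itself is a hypothetical helper, not part of tweepy.

```python
def harvest_timeline(fetch_page, count=20, max_pages=5, **params):
    """Collect statuses page by page, walking backwards through a timeline."""
    statuses = []
    max_id = None
    for _ in range(max_pages):
        page_params = dict(params, count=count)
        if max_id is not None:
            page_params["max_id"] = max_id
        page = fetch_page(**page_params)
        if not page:
            break  # nothing older is left
        statuses.extend(page)
        # max_id is inclusive, so step just below the oldest status seen so far.
        max_id = min(status.id for status in page) - 1
    return statuses
```

With an authenticated tweepy `API` object named `api` (an assumption here), a call might look like `harvest_timeline(api.user_timeline, screen_name="some_user", count=20)`, where `some_user` is a placeholder.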

  23. home_timeline()

  24. API.friends_timeline() • API.public_timeline()

  25. API.user_timeline() • API.mentions_timeline()
