Delve into the advanced software architecture and technologies powering social network systems with a focus on high performance and scalability. Learn about modern web applications, cloud solutions, and key components like Hadoop, HBase, and Memcached. Explore the functional and technology requirements necessary for creating a professional user network akin to Facebook, Twitter, and LinkedIn. Gain insights into relational databases, data-oriented applications, and the challenges of scaling up dataset sizes. Discover how architecture and design decisions impact the usability and maintenance of social networking platforms.
Inner Architecture of a Social Networking System Petr Kunc, Jaroslav Škrabálek, Tomáš Pitner
Who am I? • Master's student at FI MU • Member of LaSArIS • Webtops • Modern web applications • Cloud (and distributed) solutions • First-time speaker at a conference
Social network systems • Hundreds of millions of users => advanced software architecture and technologies • High performance • Scalability • Billions of rows
Table of contents • What and why? • Takeplace • Which way? • Hadoop • HBase • Memcached • How? • Architecture and design • Was it worth it? • Testing
Takeplace and Social Networking • Web-based service facilitating the organization of events based on meeting, sharing and communication • Emphasis on social and interpersonal interaction • Easy tool to comment on conferences (feedback) • Professional user network: to create relations among the academic and professional worlds with common interests • Analysis and statistics • "To behave like Facebook, with relations like Twitter, and to be used as LinkedIn."
Functional requirements • Entities can create asymmetric relations • Posts • Walls and news feed • Comments and "like"
Technology requirements • Linux and Cloud • Data-oriented application • High throughput • Heavy loads • Concurrent requests • Caching tool
Relational databases • Fixed schema, ACID, indexes, joins • Problems • Scaling up dataset size • Read/write concurrency • Typical use of MySQL: Production => Memcached (losing ACID) => Costly server => Denormalizing => "materialize" most common queries => drop triggers, indexes • (compromises or expensive)
HBase • Inspired by Google BigTable • Regions • 4 dimensions • "Multidimensional sorted persistent distributed key-value map" • Keys & values = arrays of bytes • Row, CF (column family), Column & Version
Example
{
  "aa" : {
    "cf" : {
      "c1" : data,
      "c2" : data
    },
    "cf2" : {
      "anyByteArray" : true
    }
  },
  "ab" : { … }
}
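The nested structure in the example can be sketched as sorted maps inside sorted maps. The following is a toy in-memory model only, not the HBase client API: the class name `SortedCellMap` and its methods are hypothetical, and strings are converted to byte arrays purely for readability. It assumes Java 10 or newer.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy in-memory model of the layout above; this is NOT the HBase client API.
final class SortedCellMap {
    // Byte arrays are ordered lexicographically, comparing bytes as unsigned values.
    private static final Comparator<byte[]> BYTES = Arrays::compareUnsigned;

    // row -> column family -> column -> version (newest first) -> value
    private final NavigableMap<byte[], NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>>>
            rows = new TreeMap<>(BYTES);

    void put(String row, String family, String column, long version, String value) {
        rows.computeIfAbsent(bytes(row), k -> new TreeMap<>(BYTES))
            .computeIfAbsent(bytes(family), k -> new TreeMap<>(BYTES))
            .computeIfAbsent(bytes(column), k -> new TreeMap<>(Comparator.<Long>reverseOrder()))
            .put(version, bytes(value));
    }

    // Returns the newest version of a cell, or null if the cell does not exist.
    String getLatest(String row, String family, String column) {
        var families = rows.get(bytes(row));
        if (families == null) return null;
        var columns = families.get(bytes(family));
        if (columns == null) return null;
        var versions = columns.get(bytes(column));
        if (versions == null || versions.isEmpty()) return null;
        return new String(versions.firstEntry().getValue(), StandardCharsets.UTF_8);
    }

    // Row keys in the order the map stores them (sorted by bytes, not by insertion).
    List<String> rowsInOrder() {
        List<String> result = new ArrayList<>();
        for (byte[] key : rows.keySet()) {
            result.add(new String(key, StandardCharsets.UTF_8));
        }
        return result;
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        SortedCellMap table = new SortedCellMap();
        table.put("ab", "cf", "c1", 1L, "first");
        table.put("aa", "cf", "c1", 1L, "old");
        table.put("aa", "cf", "c1", 2L, "new");
        System.out.println(table.rowsInOrder());
        System.out.println(table.getLatest("aa", "cf", "c1"));
    }
}
```

The four levels of nesting correspond to the four dimensions (row, column family, column, version); keeping every level sorted is what makes range scans over neighbouring row keys cheap.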
Hadoop • SW framework: backbone of the distributed environment • MapReduce • HDFS
HBase • No real indexes • Automatic partitioning • Scales linearly and automatically • Parallel • Cheap • Not for everyone • Write once, read many • Built on top of Hadoop
Memcached • Distributed cache • Typical usage
public Data getData(String query) {
    // Try the cache first.
    Data data = memcached.get(query);
    if (data == null) {
        // Cache miss: load from the database and populate the cache.
        data = database.get(query);
        memcached.set(query, data);
    }
    return data;
}
Architecture (2) • To be used in any system • Interface of services (REST, SOAP, …) • User tables • Services: Follow, Wall, Like and Discussion • Security
Architecture (3) User ID transformation
Data! • Three tables • Entities • Followers, Following, Blocked, Count, News • Walls • Info, text, likes • Discussions (similar to Walls)
Storing data • Row IDs! Performance! • Rows sorted lexicographically • Sequential scanner • UID (constant length) • Timestamp: yyyymmddhhmmssSSS • Inverted bytes -> newest to oldest
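A minimal sketch of how such a row ID could be built, assuming the key is the fixed-length UID followed by the timestamp with every byte inverted. The class name `RowKeys`, the UID length of 8, and the exact key layout are assumptions for illustration; the slides do not spell them out. The comparison uses `Arrays.compareUnsigned` (Java 9+), which orders byte arrays the same way HBase orders row keys.

```java
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;

// Hypothetical helper: builds row keys of the form UID + inverted timestamp.
final class RowKeys {
    // Assumed UID length; the slides only say the UID has constant length.
    static final int UID_LENGTH = 8;

    // Java pattern letters for the yyyymmddhhmmssSSS format from the slide.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    static byte[] rowKey(String uid, LocalDateTime postedAt) {
        if (uid.length() != UID_LENGTH) {
            throw new IllegalArgumentException("UID must have constant length " + UID_LENGTH);
        }
        byte[] id = uid.getBytes(StandardCharsets.US_ASCII);
        byte[] stamp = STAMP.format(postedAt).getBytes(StandardCharsets.US_ASCII);

        byte[] key = Arrays.copyOf(id, id.length + stamp.length);
        for (int i = 0; i < stamp.length; i++) {
            // Inverting each byte reverses the sort order of the timestamp part.
            key[id.length + i] = (byte) ~stamp[i];
        }
        return key;
    }

    public static void main(String[] args) {
        byte[] older = rowKey("user0001", LocalDateTime.of(2012, 5, 1, 10, 0, 0));
        byte[] newer = rowKey("user0001", LocalDateTime.of(2012, 5, 2, 10, 0, 0));
        // A negative result means the newer post sorts before the older one.
        System.out.println(Arrays.compareUnsigned(newer, older) < 0);
    }
}
```

Because the UID prefix has constant length, all rows of one entity are contiguous, so a sequential scan starting at the UID prefix returns that entity's posts from newest to oldest without any index.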
News feed • One by one (slow) • OR • Store news at each profile (great redundancy) • MEMCACHED! • Post put in DB => search followers => store minimized in Memcached => links to news feed => 1 normal query & 1 batch query to Memcached • TTL (LRU)
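One possible reading of the Memcached flow above, sketched with plain in-memory maps standing in for the database and for Memcached. Everything here is an assumption for illustration: the class `NewsFeedSketch`, the key scheme, and the meaning of "minimized" (taken as a truncated copy of the post). TTL and LRU eviction are not modelled; a post missing from the cache is simply skipped on read.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; the maps below stand in for the database and Memcached.
final class NewsFeedSketch {
    private final Map<String, String> database = new HashMap<>();          // postId -> full post
    private final Map<String, Set<String>> followers = new HashMap<>();    // author -> followers
    private final Map<String, String> postCache = new HashMap<>();         // postId -> minimized post
    private final Map<String, Deque<String>> feedCache = new HashMap<>();  // user -> post links

    void follow(String follower, String author) {
        followers.computeIfAbsent(author, k -> new LinkedHashSet<>()).add(follower);
    }

    void publish(String author, String postId, String text) {
        database.put(postId, text);                  // post put in DB
        postCache.put(postId, minimize(text));       // minimized copy stored in the cache
        for (String follower : followers.getOrDefault(author, Collections.emptySet())) {
            // Only a link (the post ID) goes into each follower's news feed.
            feedCache.computeIfAbsent(follower, k -> new ArrayDeque<>()).addFirst(postId);
        }
    }

    List<String> readFeed(String user) {
        Deque<String> links = feedCache.get(user);   // one normal query
        if (links == null) return new ArrayList<>();
        Map<String, String> posts = getMulti(links); // one batch query
        return new ArrayList<>(posts.values());
    }

    // Stand-in for a cache multi-get: returns only the keys that are present.
    private Map<String, String> getMulti(Collection<String> keys) {
        Map<String, String> found = new LinkedHashMap<>();
        for (String key : keys) {
            String value = postCache.get(key);
            if (value != null) found.put(key, value);
        }
        return found;
    }

    // Assumed meaning of "minimized": keep only the first 40 characters.
    private static String minimize(String text) {
        return text.length() <= 40 ? text : text.substring(0, 40);
    }

    public static void main(String[] args) {
        NewsFeedSketch feed = new NewsFeedSketch();
        feed.follow("bob", "alice");
        feed.publish("alice", "p1", "hello");
        feed.publish("alice", "p2", "world");
        System.out.println(feed.readFeed("bob"));
    }
}
```

The work is done at write time (one feed update per follower) so that a read costs two cache round trips regardless of how many entities the user follows.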
Conclusion • Pros • High volume data distribution • Scalability • High throughput • Heavy data load (write once, read many) • Cons • Losing relations, indexes, triggers, … • Responsibility for consistent data • Still not sure how it will behave when deployed in production