PROJECT
Topics
• Theoretical:
  • Error Performance Analysis for Partitioned Sketch Data Structures
• Survey:
  • Security and Privacy for Big Data: A Survey and Future Directions
• Experiments:
  • Citizen Behavior of 7-21 Storm in Beijing, 2012
  • Music Knowledge Mining
  • Hadoop for Video Streaming on the Web
  • MapReduce Jobs For Video Conversion
• Your proposed one…
1. Error Performance Analysis for Partitioned Sketch Data Structures
• We have already discussed the time complexity (in terms of update time)
• TASK:
  • What about the error performance?
  • How should the depth of each sketch be allocated optimally (Zipfian)?
• Start by studying how the Count-Min (CM) sketch analyzes its error performance (Theorem 1 and similar results)
  • http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf
• Learn about P(d)-CU
  • http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6574663
Result
• Analysis (e.g., mathematical derivations)
• Some initial simulation (correctness)
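The error guarantee in Theorem 1 of the CM sketch paper can be checked empirically before attempting the partitioned case. The following is a minimal simulation sketch, assuming a plain Count-Min sketch with width w = ceil(e/ε) and depth d = ceil(ln(1/δ)), for which an estimate never undercounts and exceeds the true count by more than ε times the total stream count with probability at most δ. The class and function names, the hash family, and the Zipf parameters are illustrative choices and do not come from the referenced papers; P(d)-CU itself is not implemented here.

```python
import math
import random


class CountMinSketch:
    """Plain Count-Min sketch: d rows of w counters (illustrative, not P(d)-CU)."""

    def __init__(self, epsilon, delta, seed=0):
        self.w = math.ceil(math.e / epsilon)       # width from the error target
        self.d = math.ceil(math.log(1.0 / delta))  # depth from the failure probability
        self.p = (1 << 31) - 1                     # prime modulus for the hash family
        rng = random.Random(seed)
        # One (a, b) pair per row: h(x) = ((a*x + b) mod p) mod w
        self.hashes = [(rng.randrange(1, self.p), rng.randrange(0, self.p))
                       for _ in range(self.d)]
        self.table = [[0] * self.w for _ in range(self.d)]
        self.total = 0                             # L1 norm of the stream so far

    def _index(self, row, item):
        a, b = self.hashes[row]
        return ((a * item + b) % self.p) % self.w

    def update(self, item, count=1):
        self.total += count
        for r in range(self.d):
            self.table[r][self._index(r, item)] += count

    def query(self, item):
        # The minimum over rows is the least-contaminated counter.
        return min(self.table[r][self._index(r, item)] for r in range(self.d))


def zipf_stream(n_items, length, s, rng):
    """Draw `length` integer item ids with probability proportional to rank^(-s)."""
    weights = [1.0 / (k ** s) for k in range(1, n_items + 1)]
    return rng.choices(range(n_items), weights=weights, k=length)


def simulate(epsilon=0.01, delta=0.05, n_items=2000, length=50000, s=1.1, seed=1):
    """Feed a Zipfian stream and compare estimates with exact counts."""
    rng = random.Random(seed)
    sketch = CountMinSketch(epsilon, delta, seed=seed)
    truth = {}
    for item in zipf_stream(n_items, length, s, rng):
        sketch.update(item)
        truth[item] = truth.get(item, 0) + 1
    bound = epsilon * sketch.total
    errors = [sketch.query(item) - count for item, count in truth.items()]
    violations = sum(1 for e in errors if e > bound)
    return {
        "width": sketch.w,
        "depth": sketch.d,
        "bound": bound,
        "min_error": min(errors),
        "max_error": max(errors),
        "violation_rate": violations / len(errors),
    }
```

Calling `simulate()` with different `epsilon`, `delta`, and `s` values gives a baseline against which an allocation of depths across partitions could later be compared.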
2. Survey
• Write a good survey in English on
  • Security and Privacy for Big Data: A Survey and Future Directions
• Cite at least 40 references (IEEE Xplore and ACM Digital Library)
• Paper organization
  • Classify the works into categories, from different angles
  • Extensive comparisons
  • Identify future directions (i.e., what are the missing pieces?)
Some Materials
• http://www-03.ibm.com/security/solution/intelligence-big-data/
• https://ssl.www8.hp.com/ww/en/secure/pdf/4aa4-4051enw.pdf
• http://www.emc.com/collateral/industry-overview/big-data-fuels-intelligence-driven-security-io.pdf
• http://www.isaca.org/Groups/Professional-English/big-data/GroupDocuments/Big_Data_Top_Ten_v1.pdf
• http://www.trendmicro.com/cloud-content/us/pdfs/business/white-papers/wp_addressing-big-data-security-challenges.pdf
• http://scholarlycommons.law.northwestern.edu/njtip/vol11/iss5/1/
• Think about:
  • Storage
  • Analysis
  • Applications
  • Cloud, Internet-of-Things
3. Analyze Citizen Behaviors of 7-21 Storm in Beijing, 2012
• The Power of Social Networks and the Public Crowd
  • http://v.youku.com/v_show/id_XNDM5NjY1Mzc2.html
• Use social network APIs such as Sina Weibo
  • open.weibo.com/wiki
• Use keyword search to retrieve all related data
  • Hashtags: #望京人赴机场免费救援# ("Wangjing residents drive to the airport for free rescue"), #双闪车队# ("hazard-light convoy") (100+)
  • Accounts: 菠菜X6 ("Spinach X6"), @望京网 ("Wangjing Net")
4. Music Knowledge Mining
• Million Song Dataset
  • http://labrosa.ee.columbia.edu/millionsong
• Example: calculating music density
  • http://musicmachinery.com/2011/09/04/how-to-process-a-million-songs-in-20-minutes/
• YOUR TASK: Predict which songs a user will listen to
  • http://www.kaggle.com/c/msdchallenge
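A simple starting point for the prediction task is a popularity baseline: recommend the songs with the most distinct listeners that the user has not heard yet. The sketch below assumes the listening history is available as (user, song, play count) triplets; that layout, the function name, and the toy data are assumptions made for illustration, and a real solution would be expected to go beyond this baseline.

```python
from collections import Counter, defaultdict


def recommend_popular(triplets, user, k=3):
    """Recommend up to k songs, ranked by number of distinct listeners.

    triplets: iterable of (user_id, song_id, play_count) tuples (assumed layout).
    Songs the given user has already listened to are skipped.
    """
    listeners = Counter()      # song_id -> number of distinct users
    heard = defaultdict(set)   # user_id -> set of song_ids
    for user_id, song_id, _play_count in triplets:
        if song_id not in heard[user_id]:
            heard[user_id].add(song_id)
            listeners[song_id] += 1
    seen = heard.get(user, set())
    # Sort by popularity, breaking ties by song id so results are deterministic.
    ranked = sorted(listeners.items(), key=lambda kv: (-kv[1], kv[0]))
    return [song_id for song_id, _ in ranked if song_id not in seen][:k]


# Toy data, invented for this example only.
toy = [
    ("u1", "A", 5), ("u1", "B", 1),
    ("u2", "A", 2), ("u2", "C", 3),
    ("u3", "C", 1), ("u3", "B", 1), ("u3", "D", 7),
]
print(recommend_popular(toy, "u1"))
```

Ranking by distinct listeners rather than raw play counts keeps a single heavy listener from dominating the list; either choice is defensible as a baseline.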
5. Video Streaming on the Web
• Store your video as chunks in HDFS
• Case: a user suddenly jumps to a specific part of the video
  • Seek in the file to position the cursor at the requested location
• HDFS can only be accessed through a Hadoop client, which the Apache server is not
• Apache/FUSE: mounting HDFS through the FUSE interface lets Apache perform all file system operations (directory browsing, file opening, and content access) on HDFS content
  • http://internetmemory.org/en/index.php/synapse/using_hadoop_for_video_streaming/
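The seek case above is typically expressed by the player as an HTTP Range request, which the server answers by seeking in the file and returning only the requested bytes. The following is a minimal sketch of that step, assuming the video is reachable as an ordinary file path (for example on a FUSE mount of HDFS). The function names are hypothetical, only single-range headers are handled, and in the setup described on the slide Apache itself would do this work.

```python
import re

# Matches single-range headers such as "bytes=0-99", "bytes=500-", "bytes=-100".
_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_range(header, file_size):
    """Return an inclusive (start, end) byte range, or None if unsatisfiable."""
    match = _RANGE_RE.match(header.strip())
    if not match or file_size <= 0:
        return None
    first, last = match.groups()
    if first == "" and last == "":
        return None
    if first == "":
        # Suffix form: the last N bytes of the file.
        length = int(last)
        if length == 0:
            return None
        return (max(file_size - length, 0), file_size - 1)
    start = int(first)
    end = int(last) if last else file_size - 1
    end = min(end, file_size - 1)   # clamp to the end of the file
    if start > end:
        return None
    return (start, end)


def read_range(path, start, end, chunk_size=64 * 1024):
    """Yield the bytes start..end (inclusive) of the file in chunks."""
    with open(path, "rb") as handle:
        handle.seek(start)          # position the cursor at the requested offset
        remaining = end - start + 1
        while remaining > 0:
            data = handle.read(min(chunk_size, remaining))
            if not data:
                break               # file ended earlier than expected
            remaining -= len(data)
            yield data
```

A handler built on these two functions would reply with status 206 and a `Content-Range` header when `parse_range` returns a range, and with status 416 when it returns `None`.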
Result
• A demo
  • Choose at least one video format (e.g., flv)
  • A client to play video
  • A web server (Apache with FUSE)
  • HDFS to store your videos
6. MapReduce For Video Conversion
• Convert a huge number of video files from one format to another
• Use the open source video converter FFmpeg (http://ffmpeg.org/download.html)
• Data stored on HDFS
• Create an app that does this (running on Google AppEngine)
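One way to structure the conversion is a map-only job in which each input record is the HDFS path of one video: the mapper copies the file to local disk, runs FFmpeg on it, and uploads the result. The following is a minimal sketch under that assumption, written as a mapper function that could be driven by Hadoop Streaming. The function names and the one-path-per-line input format are hypothetical, and it assumes the `hdfs` and `ffmpeg` executables are installed on every worker node and that the target format differs from the source format.

```python
import os
import subprocess
import tempfile


def target_name(src_path, new_ext):
    """Return the file name of src_path with its extension replaced by new_ext."""
    stem, _old_ext = os.path.splitext(os.path.basename(src_path))
    return stem + "." + new_ext.lstrip(".")


def build_commands(hdfs_src, hdfs_out_dir, new_ext, workdir):
    """Build the three shell commands that convert a single video."""
    local_in = os.path.join(workdir, os.path.basename(hdfs_src))
    local_out = os.path.join(workdir, target_name(hdfs_src, new_ext))
    hdfs_dst = hdfs_out_dir.rstrip("/") + "/" + os.path.basename(local_out)
    return [
        ["hdfs", "dfs", "-get", hdfs_src, local_in],        # download from HDFS
        ["ffmpeg", "-y", "-i", local_in, local_out],        # convert; format follows the extension
        ["hdfs", "dfs", "-put", "-f", local_out, hdfs_dst], # upload the result
    ]


def convert_one(hdfs_src, hdfs_out_dir, new_ext, run=subprocess.check_call):
    """Convert one video in a temporary directory; return the HDFS output path."""
    with tempfile.TemporaryDirectory() as workdir:
        commands = build_commands(hdfs_src, hdfs_out_dir, new_ext, workdir)
        for command in commands:
            run(command)
    return commands[-1][-1]


def mapper(lines, hdfs_out_dir, new_ext, run=subprocess.check_call):
    """Yield one tab-separated status record per input path."""
    for line in lines:
        src = line.strip()
        if not src:
            continue
        try:
            dst = convert_one(src, hdfs_out_dir, new_ext, run=run)
            yield f"{src}\tOK\t{dst}"
        except (subprocess.CalledProcessError, OSError) as exc:
            # Report the failure instead of failing the whole task.
            yield f"{src}\tFAILED\t{exc}"
```

In a streaming script, `mapper` would be fed `sys.stdin` and its records printed to standard output. The `run` parameter exists so the command sequence can be inspected or tested without invoking the real tools.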
Mechanism
• Work in groups of 3-5 students with clear roles
• Email me (ase_bit@yahoo.com) by this Friday (Nov 22)
  • Team leader, team members
  • Topic
• Deadline: 28 December 2013!
• Deliverable: project report in Chinese
  • Introduction (motivation, WHY?)
  • Related Work (what others have done)
  • Your proposal (HOW?)
  • Performance Evaluation
  • Conclusion
• Presentation
Suggested Arrangement
• Week 1: Define your roles and start literature research
• Weeks 2 and 3: Propose solutions
• Weeks 4 and 5: Implement and obtain results
• Week 6: Write the report