Towards Successful Ph.D. Research in Database Systems and Data Mining Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign www.cs.uiuc.edu/~hanj July 13, 2014
Outline • Database and data mining: highly promising themes • Long history of strong and successful research • Lots of new challenges • Lots of research themes • Selection of promising directions and promising topics • Giving your research a bigger impact • Discussing, debating, and active brainstorming • Capturing and harvesting the sparks of thought • Towards highly productive research • Learning from others: reviews and judgment • Collaborations and teamwork
DB and DM: Long History of Strong & Successful Research • Necessity is the mother of invention • Driven by real application demand • Constantly seeking new and extended applications • Developing core technologies for information systems • A long history of success • Real systems, numerous applications, and big industry • Relational database systems → application-oriented DBMS (spatiotemporal, CRM, banking, health info, …) → data warehouses → data mining → Web search: Google • Depth and thoroughness in research • Constant search for new, innovative methodologies and algorithms • In-depth study of implementation, optimization, and user needs • Scalability, uncertainty, approximation, streaming, ranking, aggregation, privacy, and security
Still Challenging and Promising • Huge amounts of data are accumulating rapidly • Gigabytes → terabytes → petabytes at a very fast pace • Data collection and dissemination: sensors, digital cameras, Web • Database and data mining: various new applications • Data streams, RFID, sensor networks, video/audio data, text and Web, computer/software systems, social networks, biological data, and science/engineering data • Searching, ranking, mining, uncertainty, noise, privacy, security • Database and data mining are still flourishing • Scalable statistical and machine learning methods • Pattern analysis methods • Integrated with database systems, data warehouses, and the Web as a natural, hidden process • Still many open research problems and multiple research frontiers
Research Frontiers in Data Mining • Information network analysis • Stream data warehousing & data mining • Pattern mining, pattern usage, and pattern understanding • Warehousing and mining of moving-object data, RFID data, and data from sensor networks • Spatiotemporal and multimedia data mining • Biological data mining • Text and Web mining • Data mining for software engineering and system analysis • Data cube-oriented multidimensional online analysis • Classification and ranking everywhere: databases, Web, documents, and knowledge
A Multidimensional View of Research Themes • Data view • relational data, transactional data, information network data, stream data, spatial, temporal, multimedia (video/audio), moving-object data, RFID data, sensor data, biological data, text and Web data, software engineering and system data • Issue view • modeling, management, indexing, retrieval (query), update, integration, warehousing, mining, data cube computation, multidimensional online analysis, security, privacy, … • Methodology view: incremental, parallel, distributed • For mining: statistical, machine learning, decision tree, MDL, HMM, naïve Bayes, … • Application view: different industries, governments, science & engineering • Adding dimensions: time, space, … • Relaxing assumptions: approximation, uncertainty, …
Selection of Promising Directions • Read survey papers, proceedings, etc., discuss with your friends and professors, and use your own reasoning • Is the direction likely to be much needed and to have a bright future? • Do I have sufficient background to work on it? • Am I truly interested in it? • Does the direction merit long-term investigation? • Is it OK to change or adjust it? • You may need to adjust your research direction constantly • Example: I moved from deductive databases (recursive query processing) to data mining
Giving Your Research a Bigger Impact • Necessity is the mother of invention • What will be most needed in the next several years? • Will it have long-term impact or fade out soon? • Innovative and thorough research • Is your approach fresh, innovative, somewhat ground-breaking? • Have you examined it systematically? Have you considered alternative or previously studied methods? • Can it be further improved? • Two kinds of research topics: creative vs. improvement • Find new themes (new patterns, new methodologies, new directions) • Improve the existing solutions • Never be tied to the existing solutions • First think about the problem independently, and work it out independently • Believe that you can always find new ways to improve it!
Discussions, Sparks, and Technical Meat • Look before you leap • Careful and thorough thinking should come before implementing and testing • Form small groups instead of working alone • Slides, emails, and weekly theme-based meetings or teleconferences • Question the slides, related work, new designs, and proposed algorithms, and try to find ways to improve them • Capture and harvest the sparks of thought • Many good ideas may come from a “weak” spark of thinking • Capture the sparks promptly and do not let them slip away
Case 1: ICDE’07 Best Student Paper Award • Feida Zhu, Xifeng Yan, Jiawei Han, Philip S. Yu, and Hong Cheng, “Mining Colossal Frequent Patterns by Core Pattern Fusion”, in Proc. 2007 Int. Conf. on Data Engineering (ICDE'07), Istanbul, Turkey, April 2007 (Best Student Paper Award) • Identifying a problem that the current technology cannot solve, and its applications • Colossal patterns, bio-applications • How was the paper generated? Progressive refinement: • slides → discussions → algorithms → discussions → experiments → new slides • Smart ideas and technical innovation
Case 2: ICDE’06 Best Student Paper Award • Hector Gonzalez, Jiawei Han, Xiaolei Li, and Diego Klabjan, “Warehousing and Analysis of Massive RFID Data Sets”, in Proc. 2006 Int. Conf. on Data Engineering (ICDE'06), Atlanta, Georgia, April 2006. • Necessity is the mother of invention • Working on a key problem: RFID data warehousing • The key solution: deep compression • How deep is deep? Maximal sharing of bulky movements • Multiple designs, refinements, testing, and refinement again • slides → discussions → algorithms → discussions → experiments → new slides • Constant brainstorming
Learning from Others: Reviews and Judgment • A very important part of Ph.D. training is developing judgment: judging others' work as well as your own • A good researcher should first be a good judge of research • Reading a good research paper: first read the problem and try to solve it yourself • Be active in serving as a reviewer: see how others evaluate the work and learn from the good judges • Read survey papers and write your own simple surveys on the problems you intend to work on
Putting All the Eggs in One Basket? • Working on several research problems or only on one? • Initially, working on more than one theme may help you test the water and settle on a promising theme that suits you • Even after you have focused on one theme, it is good to try slightly different problems • Productivity, alternative thoughts, adjustable solutions, and research collaborations • Working with your friends and colleagues • Complement each other in strengths and expertise
Seminar Course: Continuous Training/Education • Advanced seminars for the DAIS and DM groups • Running every semester • Presenting your own work and getting feedback from the group • Mostly recently accepted conference papers • Requiring only a one-page summary/abstract • Presenting good papers from recent, top conferences: selecting only SIGKDD, SIGMOD, VLDB, ICDE, ICDM, SDM, WWW, …, conference papers published in the last 12 months.
Conference and Journal Reviews • Volunteering on conference and journal coordination • For each conference on which we serve as PC members, one Ph.D. student volunteers as conference coordinator • S/he communicates with the group members to select papers and collect reviews, and I hold one or more rounds of thorough discussions with the coordinator to make sure the reviews are unbiased, comprehensive, and of high quality • Also, the reviews are ranked relative to one another and balanced • A good exercise for all the participants • Similar exercises for journal and proposal reviews
Semester Summary and Awards • Award summary as a way to promote excellence in research • Summary meeting each semester • Summary on each student’s Web page and presentation • Award voting with multiple grades: gold, silver, bronze, and honorable mention • Vote after the major conference evaluation results are out • Publish the award voting summary • Gifts and Web publicity • Award competition also promotes collaboration
Create a Productive Research Group • Selection of promising students • Training and selection of students from classes • Test run with research problems • Watch for sparks and working attitude • Written qualifications vs. oral ones • Team organization • CS591 vs. meetings (start + ending meetings) • Use students’ expertise, strengths, and interests • Division of group work: everyone is in charge • Theme-based dynamic small research groups • Encouraging students on their progress: papers, etc. • Semester summary, Web pages • Award competition
Group Administration/Public Relations Work (Sept. '06 to Aug. '07) • Group Webmaster (news, group Web page, pictures, etc.): Tianyi Wu • Web-based research reference collections: Hong Cheng • Hardware, equipment, and software master: Sang Kim • TKDD Information Director: Xiaoxin Yin • DAIS seminar coordinator: Deng Cai • DAISY System administrator: Hector Gonzalez • IlliMine project coordinator: Xiaolei Li • Industry/visitor coordinator: Chao Liu • Conference and journal review coordinator (3): Dong Xin, Jing Gao and Chen Chen • Research proposal coordinator (2): Feida Zhu and Jianlin Feng • Social activity coordinator: Jaegil Lee, Ok-ran Jeong
Work on Promising Research Topics • Selection of promising research topics • Select topics based on your strengths and interests • Putting all the eggs in one basket? You may work on 2-3 topics at the same time • Discussion, debate, and active brainstorming • Capture and harvest the sparks of thought • Two kinds of research topics: creative vs. improvement • Find completely new themes (new patterns, new methodologies, new directions) • Improve the existing solutions • Never be tied to the existing solutions • First think about the problem independently, and work it out independently • Believe that you can always find new ways to improve it!