BIFE Project Presentation Totem Team Hongbin Li, Chenhao Wu, Bin Zhou (in alphabetical order)
Outline • Project Brief Introduction • Dashboard Demonstration • Technical Description • Teamwork Experience
Brief Project Description • Goal: • Review and analyze Olympic history. • Evaluate Olympic promotion at the country level and at the sport level. • Identify and suggest opportunities for cooperation among countries that share popular sports.
Brief Project Description • Data source • Online database at http://www.databasesports.com/olympics/ • Incomplete • Erroneous • Inconsistent
Brief Project Description • Warehouse • A complete process of ETL and warehouse construction. • Schema Design
Brief Project Description • Report and Dashboard Design • 8 Attributes, 9 Facts, 30+ metrics. • 30+ reports, built either for testing or for actual use.
Technical Description
• Raw data extraction
  • Web crawler developed in C# -> html files
• Data preprocessing, LU (lookup) table generation
  • Done at the text level using C# -> txt files
• WH (warehouse) generation and data completion
  • Done at the db level using Java and SQL -> one mdb file
• Data flow: ~30K html files -> two txt files -> 1 fact + 7 LU txt files -> 1 mdb file
• Notes from the flow diagram: athlete level and event level; 70K lines of medal records; normalized IDs generated; ready for MSTR project; consistency check; redundancy check
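The "normalized ID generated" step can be illustrated with a minimal Python sketch. This is not the team's code (the slides say the preprocessing was done in C#). The record layout, the function name `build_lookup`, and the sample rows are all hypothetical, and only four lookup tables are shown rather than the seven mentioned on the slide.

```python
from typing import Dict, Iterable

def build_lookup(values: Iterable[str]) -> Dict[str, int]:
    """Assign an integer ID to each distinct value, in first-seen order."""
    lookup: Dict[str, int] = {}
    for value in values:
        key = value.strip()
        if key not in lookup:
            lookup[key] = len(lookup) + 1
    return lookup

# Hypothetical medal records: (athlete, country, event, medal).
raw_records = [
    ("Athlete A", "AAA", "100m", "Gold"),
    ("Athlete B", "BBB", "100m", "Silver"),
    ("Athlete A", "AAA", "200m", "Gold"),
]

# One lookup (LU) table per descriptive column.
athlete_lu = build_lookup(r[0] for r in raw_records)
country_lu = build_lookup(r[1] for r in raw_records)
event_lu = build_lookup(r[2] for r in raw_records)
medal_lu = build_lookup(r[3] for r in raw_records)

# The fact table keeps only the integer IDs.
fact_rows = [
    (athlete_lu[a.strip()], country_lu[c.strip()],
     event_lu[e.strip()], medal_lu[m.strip()])
    for a, c, e, m in raw_records
]
```

Replacing strings with IDs in the fact rows keeps the fact table compact and makes the later consistency and redundancy checks a matter of comparing integers.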
Technical Description
• Challenges Encountered
  • Html files with identical names contain different data -> both files must be processed.
  • Inconsistent country abbreviations (ICE & ISL, JPN & JAP, IRI & IRN) -> manual correction.
  • Special Games history: figure skating was held in both Summer and Winter Games -> counted as winter.
  • Erroneous data
    • Implausible ages (e.g. 108, -56) -> corrected, but still doubtful.
    • Athlete name ambiguity -> no feasible way to resolve completely.
    • Some results do not exist at all -> currently no patch applied.
  • Heterogeneous result formats (10.22s, 18m, 127kg, 33:02, 2-1) -> normalized to double type, keeping only distance, weight, time and point measures.
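Two of the corrections above, the country abbreviations and the heterogeneous result formats, can be sketched in Python as follows. This is an illustrative sketch, not the team's implementation. The names `COUNTRY_ALIASES`, `canonical_country` and `normalize_result` are hypothetical, and the choice of which abbreviation counts as canonical is arbitrary. Three further assumptions: a value such as 33:02 means minutes and seconds, a bare number is a point score, and a match score such as 2-1 is dropped because it is not one of the four kept measures.

```python
import re
from typing import Optional, Tuple

# Assumed alias table; the direction of each mapping is arbitrary here.
COUNTRY_ALIASES = {"ICE": "ISL", "JAP": "JPN", "IRI": "IRN"}

def canonical_country(code: str) -> str:
    """Map a known alias to one canonical abbreviation."""
    code = code.strip().upper()
    return COUNTRY_ALIASES.get(code, code)

# Suffix patterns, each tied to one of the four kept measures.
_UNIT_PATTERNS = [
    (re.compile(r"^(\d+(?:\.\d+)?)kg$"), "weight"),
    (re.compile(r"^(\d+(?:\.\d+)?)m$"), "distance"),
    (re.compile(r"^(\d+(?:\.\d+)?)s$"), "time"),
    (re.compile(r"^(\d+(?:\.\d+)?)$"), "point"),
]
# Clock format, assumed to be minutes:seconds.
_CLOCK = re.compile(r"^(\d+):(\d{2}(?:\.\d+)?)$")

def normalize_result(raw: str) -> Optional[Tuple[str, float]]:
    """Return (measure, value as float), or None if the format is not kept."""
    text = raw.strip().lower()
    clock = _CLOCK.match(text)
    if clock:
        seconds = int(clock.group(1)) * 60 + float(clock.group(2))
        return ("time", seconds)
    for pattern, measure in _UNIT_PATTERNS:
        match = pattern.match(text)
        if match:
            return (measure, float(match.group(1)))
    return None  # e.g. "2-1": not one of the four kept measures
```

Returning the measure name alongside the number keeps times, distances and weights from being compared with each other once they all share the double type.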
Technical Description
• Challenges Encountered (cont’)
  • Unable to render correct result with Selector and View Filter settings.
    • Aggregation for attributes from different Hierarchies is not supported.
    • Currently a prompt is employed.
  • Unable to combine two prompts in one dashboard for two slightly different datasets.
    • Currently one dataset is dropped for better user experience.
  • Unable to count medal numbers both at athlete and event levels.
    • Currently one table with no athlete info is duplicated for medal counting at event level.
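The difference between athlete-level and event-level medal counts, and why a table without athlete info helps, can be shown with a small Python sketch. It only mirrors the idea of the workaround outside the reporting tool. The record layout, country codes and sample rows are hypothetical.

```python
from collections import Counter

# Hypothetical medal records: (athlete, country, event, games, medal).
# A relay team produces one record per team member.
records = [
    ("Athlete A", "AAA", "4x100m relay", "Games 1", "Gold"),
    ("Athlete B", "AAA", "4x100m relay", "Games 1", "Gold"),
    ("Athlete C", "AAA", "4x100m relay", "Games 1", "Gold"),
    ("Athlete D", "AAA", "4x100m relay", "Games 1", "Gold"),
    ("Athlete A", "AAA", "100m", "Games 1", "Gold"),
    ("Athlete E", "BBB", "100m", "Games 1", "Silver"),
]

# Athlete level: every record counts once.
athlete_level = Counter(country for _, country, _, _, _ in records)

# Event level: drop the athlete column and deduplicate, so a team
# medal counts once per country, event and games.
event_rows = {
    (country, event, games, medal)
    for _, country, event, games, medal in records
}
event_level = Counter(country for country, _, _, _ in event_rows)
```

Here country AAA holds 5 medals at the athlete level but only 2 at the event level, which is why the two counts need separate tables.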
Teamwork Experience • Everyone was involved in each phase. • In particular: • Bin initiated the data collection with his expertise in web search. • Chenhao and Hongbin jointly conducted data processing and WH construction. • Chenhao deserves the main credit for schema design. • Pair work makes sense!