ArtifactWebs: Navigable Product Structures Susan Finger and Sharad Oberoi Carnegie Mellon University
Collaborative learning in design • Goal • Develop tools that encourage process competence, constructive skills, and reflective practice • Web–based collaboration tool • Meeting capture and summarization • Navigable artifact webs
Collaborative learning in design • Assertions • Most learning in design classes takes place in team meetings and in individual activities undertaken to help meet team goals • Argumentation, co-construction, and reflection are important elements of collaborative learning
Outline • Setting • Engineering design capstone course • Ongoing project to understand collaborative learning by student design teams
Engineering design capstone course • Required for all accredited engineering programs in US • Commonly stated goal: Students should synthesize all the engineering knowledge they have acquired as undergraduates
Engineering design course projects • The projects are usually: • Team-based • Company-sponsored (or client-driven) • Non-competing (each team has an independent project) • Often taught by academics with little project experience and even less design experience • The grade is usually based on • The quality of the final product • The self-reported quality of the team interactions
Engineering design course projects • Students • are novices in their domain knowledge • are novices in their knowledge of the design process • often judge their success by the grade they earn or by the artifacts they produce • Teacher • rarely plans to use the team’s design directly • usually does not attend group meetings • often does not know if a feasible solution exists to the design problem as stated
Engineering design course projects • Team membership can change over time, so it is difficult to keep track of the progress as well as the options explored • Inherent temptation to start the work over from scratch, wasting time and resources • These problems exist for both industry and student teams, but are usually more severe for student teams
Engineering design course projects • [Diagram relating learning goals, activities, and assessment]
Collaborative learning research group • Our focus is to develop tools that encourage process competence, constructive skills, and reflective practice • Need to capture process to understand student learning • Collaboration tools designed for industry rarely work well for student teams • Sequence of two National Science Foundation grants on collaborative learning in design
NSF Grant: Collaborative Learning across Time and Space • Goal: To take advantage of advances in mobile computing to create collaboration tools for student design teams • Means: Create an environment that • facilitates group collaboration for students • enables faculty to peer into the collaborative learning process • Hook: Students design the tools they need for their own collaboration
Kiva collaboration tool • Takes advantage of students' willingness to send email, use IM, post on newsgroups, and send text messages • Design goal: Create an interface that students perceive to be equivalent to their preferred communication modes; that is, make it feel like chat
Design education testbed • RPCS: Rapid prototyping of computer systems • Interdisciplinary, capstone design course • Ambitious projects, e.g. • GM companion car-driver interface • Context aware cell-phone • Wireless classroom on the Voyager science boat
Capturing in-process data • For 4 years, RPCS has used the Kiva for team collaboration • Lightweight collaboration tool • Combines functions of e-mail and bboards • Widely accepted and liked by student teams; it feels like chat and meets their needs • Each year's Kiva has hundreds of threads and thousands of posts and files • We have 4 years of data covering all the team conversations and files that would normally go through email or chat
Kiva usage • How do students use the Kiva? • Group coordination (18%) • Knowledge and work exchange (33%) • Preparation of deliverables (24%) • Other (25%)
ADEPT - Assessing Design Engineering Projects Classes with Multi-disciplinary Teams • Develop a physical infrastructure that enables the capture of synchronous and asynchronous interactions of student design teams • The (complete) up-to-date record of all of a team’s interactions will enable us to create ArtifactWebs that integrate and summarize team communications • The ArtifactWebs will provide traceability and accountability for individual contributions to shared knowledge • The ArtifactWebs will enable facilitated improvement of engineering design courses (i.e. the instructor will know when to intervene)
Capturing in-process data • This year, we collected audio files of meetings • Individual speaker • Automated speech to text transcripts • Observation and coding of all team meetings • We have 1 year of data of team conversations (with many gaps)
Objectives • To create ArtifactWebs that • represent the state of the project based on the artifacts described in the project documents • enable designers to search and navigate to find relevant information quickly and efficiently • evolve as the artifact, and the documents about it, evolve.
Design documentation • Design project documents are generated by different team members at different times during a project, so no one is aware of everything that is in all the documents • Locating the right information among evolving documents or reference documents can be time consuming • Even for teams with well-structured document management systems, finding the correct paragraph or document fragment for a given topic can be difficult
Visionary Scenario A student in the wearable computer class is working on developing a text to speech module for a mobile device. Someone tells her that last year’s class developed an OCR (optical character recognition) module for the Trinetra project. She accesses the Trinetra DesignWeb through the class web space.
Visionary Scenario She quickly searches (using standard search) to find the subweb for the OCR module. She then browses within the OCR module exploring various aspects of the OCR design from the previous team.
Visionary Scenario Finally she focuses on the modules on the mobile device. She reads the segment of the final report on the OCR mobile module as well as some of the supporting documents that led to the final decisions in the OCR design.
Challenges • Levels of abstraction • Alternate views for different users • Credibility of source (transcripts of meetings vs. final reports) • Identifying the structure of created knowledge, especially for different versions of the same document • Identifying the design intent
Strategy overview • Divide documents into topic segments • Cluster segments by semantic similarity (e.g. revisions of same paragraph or similar paragraphs from different sources) • Summarize each cluster • Create a diagram that connects the key words in the document summaries • Develop graphical display algorithms that enable users to search and navigate the graphs to access the underlying documents
Segmentation • Divide documents into topic segments • use the explicit structure of the documents (table of contents and internal headings) • use existing text segmentation algorithms such as TextTiling, which performs semantic clustering of terms and topic identification based on clustering • Issue: Size of segments (big or little chunks)
Clustering • Cluster segments by semantic similarity (e.g. revisions of same paragraph or similar paragraphs from different sources) • InfoMagnets, created by Rosé, uses Latent Semantic Analysis and document clustering to automatically generate a bubble diagram, which a user can then incrementally adjust through the interface. • Issue: Non-standard vocabulary across disciplines
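The clustering step above can be illustrated with a small sketch. This is not InfoMagnets and does not use Latent Semantic Analysis; it swaps in plain bag-of-words cosine similarity with a greedy single-pass grouping, purely to show how revisions of the same paragraph might end up in one cluster. The function names and the `threshold` value are assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count bags."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_segments(segments, threshold=0.5):
    """Greedy clustering: each segment joins the most similar existing
    cluster if the similarity reaches `threshold` (a hypothetical cutoff),
    otherwise it starts a new cluster. Returns lists of segment indices."""
    clusters = []
    for idx, text in enumerate(segments):
        bag = Counter(text.lower().split())
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(bag, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": bag, "members": [idx]})
        else:
            best["centroid"] = best["centroid"] + bag
            best["members"].append(idx)
    return [c["members"] for c in clusters]


segments = [
    "ocr module reads printed text",
    "the ocr module reads printed text aloud",  # a revision of segment 0
    "battery lasts eight hours",
]
groups = cluster_segments(segments)
```

Here the two OCR segments share most of their terms and are grouped together, while the battery segment starts its own cluster. Raw term overlap is sensitive to the non-standard vocabulary issue noted above, which is one reason a latent-space method may be preferable in practice.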
Summarizing • Summarize each cluster • Summarization is widely used in web searches • Many potential summarization algorithms exist • Issues: What types of summaries are useful for designers and what types are useful for creating the graphs
Graphing • Create a diagram that connects the key words in the document summaries • Use co-word analysis to find relationships among the key words in the document summaries • Issues: Level of granularity and strength of relationships
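As a rough illustration of co-word analysis, the sketch below counts how often pairs of key words appear in the same summary and treats each count as the strength of an edge in the graph. The summaries, the key word set, and the `min_count` cutoff are invented for the example; how key words are chosen is left open here, as it is on the slide.

```python
from collections import Counter
from itertools import combinations


def coword_edges(summaries, keywords, min_count=1):
    """Return {(kw_a, kw_b): count} for key word pairs that co-occur
    in at least `min_count` summaries. Pairs are in alphabetical order."""
    counts = Counter()
    for text in summaries:
        tokens = set(text.lower().split())
        present = sorted(k for k in keywords if k in tokens)
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical cluster summaries and key words
summaries = [
    "ocr module runs on the mobile device",
    "speech module reads ocr output",
    "mobile device battery limits ocr speed",
]
keywords = {"ocr", "mobile", "speech", "battery"}
edges = coword_edges(summaries, keywords)
```

Raising `min_count` prunes weak links, which is one way to approach the granularity and relationship-strength issues listed above.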
Visualizing • Develop graphical display algorithms that enable designers to search and navigate the graphs to access the underlying documents • Issues: Algorithm and interface design
[Figure: pipeline diagram. Labels: Design teams; Documents; Document fragments; Auto-summarization; Summarized fragments; Network of versioned fragments. Linking methods: collocation analysis, version matching, credibility mapping, document structure and associated metadata.]
Conclusions • Creating ArtifactWebs automatically from student design documents is useful for organizing the information into product structures. • These product structures can be used for developing computational environments that support systematic modeling and also for characterizing design problems. • ArtifactWebs can help us understand the content and nature of information related to various aspects of the artifact and how designers generate and refine it.
Prior work • Previous work on automatic topic segmentation has focused on segmentation of expository text written by professionals • technical articles, such as journal papers • non-technical articles (e.g. blogs) • multi-party dialogues in a synchronous (e.g. chat) or asynchronous (e.g. discussion boards) environment • Student project reports do not fall into any of these categories • No prior work has evaluated segmentation on student design reports, which are often characterized by their authors' lack of experience in technical writing
Proposed Solution • Navigable ArtifactWebs that will: • Aid instructors and students alike by giving them a bird’s eye view of the evolving design. • Enable team members to explore the ideas that have been generated during the design process, the connections between the ideas, and the evolution of the ideas. • Direct the users to the relevant fragment of a document that contains the detailed discussion of an idea, in addition to searching the relevant topics using a query-based approach.
Background • Two broad categories of previous work in topic segmentation: • Lexical Cohesion Models: based on the central idea that the segmentation of a text is guided primarily by the distribution of the terms used in it, rather than by cue words. Examples: TextTiling (Hearst, 1997) and Latent Semantic Analysis (Landauer and Dumais, 1997) • Content-oriented Models: based on evaluating the recurrence of topic patterns over multiple thematically similar discourses. Examples: Approaches based on Hidden Markov models (Barzilay et al., 2004).
TextTiling (Hearst, 1997) • Block comparison approach: Adjacent pairs of text blocks are compared for overall lexical similarity. The sentences are grouped into blocks of size N/2 each; the more terms the two blocks share, the higher the lexical score at the gap between them. • Vocabulary introduction approach: Adjacent pairs of text blocks are compared for overall lexical dissimilarity. The sentences are grouped into blocks of size N/2 each; the more thematically unrelated terms are introduced, the higher the lexical score at the gap between them.
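A minimal sketch of the block comparison idea follows. It is a simplification, not Hearst's full algorithm: it scores each sentence gap by the cosine similarity of the term counts in the blocks on either side, and it leaves out tokenization details, smoothing, and the depth scoring used to pick final boundaries. The function names, the `block_size` parameter, and the sample sentences are assumptions for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count bags."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_scores(sentences, block_size=2):
    """Score every gap between adjacent sentences. scores[g - 1] compares
    the block of up to `block_size` sentences before gap g with the block
    of up to `block_size` sentences after it."""
    bags = [Counter(s.lower().split()) for s in sentences]
    scores = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - block_size):gap], Counter())
        right = sum(bags[gap:gap + block_size], Counter())
        scores.append(cosine(left, right))
    return scores


sentences = [
    "the camera captures text",
    "the camera sends text images",
    "the battery powers the device",
    "the battery charges the device",
]
scores = gap_scores(sentences)
# The lowest score marks the most likely topic boundary.
boundary_gap = scores.index(min(scores)) + 1
```

In this toy input the lowest similarity falls at the gap between the second and third sentences, where the vocabulary shifts from the camera to the battery.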
TextTiling (Contd) • Lexical chain-based approach: Adjacent pairs of text blocks are compared to identify the number of active chains, that is, terms that repeat within a threshold number of sentences and span the sentence gap. This approach is based on the assumption that when a term is repeated within a fairly short distance (called a hiatus), a lexical chain is created between the two occurrences. Thematic boundaries are set in the text at places where the number of chains is minimal. This approach attempts to segment texts at places where the local cohesion is the lowest.
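The chain counting described above can be sketched as follows. This is an illustrative reading of the slide, not a reference implementation: a chain links two consecutive occurrences of the same term when they are at most `hiatus` sentences apart, and it counts as active at every gap between them. The function name, the default `hiatus`, and the whitespace tokenization are assumptions.

```python
from collections import defaultdict


def active_chains(sentences, hiatus=2):
    """Return counts where counts[g] is the number of lexical chains
    spanning the gap between sentence g and sentence g + 1."""
    positions = defaultdict(list)
    for i, sentence in enumerate(sentences):
        for term in set(sentence.lower().split()):
            positions[term].append(i)

    counts = [0] * max(0, len(sentences) - 1)
    for occurrences in positions.values():
        for i, j in zip(occurrences, occurrences[1:]):
            if j - i <= hiatus:  # repetition is close enough to form a chain
                for gap in range(i, j):
                    counts[gap] += 1
    return counts


sentences = [
    "camera captures text",
    "camera sends text images",
    "battery powers device",
    "battery charges device",
]
counts = active_chains(sentences)
# A candidate boundary sits where the fewest chains are active.
boundary_gap = counts.index(min(counts))
```

With these sentences no chain crosses the gap after the second sentence, so that gap is the candidate boundary.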
Museli (Arguello et al., 2006) • Used for evaluating dialogues. • It combined evidence of topic shifts from lexical cohesion with linguistic evidence such as syntactically distinct features. • It used unigrams, bigrams, POS tags, and lexical scores as features, casting segmentation as a binary classification problem in which each contribution is classified as NEW_TOPIC if it introduces a new topic and SAME_TOPIC otherwise.
Three degenerate approaches • Classifying all contributions as NEW_TOPIC (ALL) • Classifying no contributions as NEW_TOPIC (NONE) • Classifying contributions as NEW_TOPIC at uniform intervals (EVEN), corresponding to the average reference topic length
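The three baselines above are simple enough to state directly in code. In this sketch the function names are hypothetical, and the choice to label the first contribution as NEW_TOPIC in the EVEN baseline is an assumption; the slide does not say where the uniform intervals start.

```python
NEW, SAME = "NEW_TOPIC", "SAME_TOPIC"


def baseline_all(n):
    """ALL: every contribution starts a new topic."""
    return [NEW] * n


def baseline_none(n):
    """NONE: no contribution starts a new topic."""
    return [SAME] * n


def baseline_even(n, avg_topic_len):
    """EVEN: a new topic every `avg_topic_len` contributions, where
    `avg_topic_len` is the average reference topic length."""
    step = max(1, round(avg_topic_len))
    return [NEW if i % step == 0 else SAME for i in range(n)]


labels = baseline_even(6, 3)
```

These baselines give a floor against which a real segmenter's scores can be read.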
Experiments • Data Source: We use documents created by students in the Rapid Prototyping of Computer Systems classes at Carnegie Mellon as our data set.
Experiments • Evaluation Metrics: • The Pk measure estimates the probability that two contributions a distance of k contributions apart are misclassified with respect to whether they belong to the same topic segment. Lower Pk values are preferred over higher ones. • The F-measure is the weighted harmonic mean of precision and recall.
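Both metrics can be sketched in a few lines. The Pk function below follows the usual sliding-window formulation: each contribution carries a segment id, and a window of width k counts an error whenever the reference and the hypothesis disagree about whether its two ends share a segment. The F-measure is computed over sets of boundary positions. The function names, the input encodings, and the example values are assumptions; the slides do not specify how k or the F-measure weighting were set.

```python
def pk(ref, hyp, k):
    """ref, hyp: equal-length lists of per-contribution segment ids.
    Returns the fraction of width-k windows on which they disagree."""
    n = len(ref)
    if len(hyp) != n or not 0 < k < n:
        raise ValueError("need equal lengths and 0 < k < len(ref)")
    errors = 0
    for i in range(n - k):
        same_ref = ref[i] == ref[i + k]
        same_hyp = hyp[i] == hyp[i + k]
        errors += same_ref != same_hyp
    return errors / (n - k)


def f_measure(ref_bounds, hyp_bounds, beta=1.0):
    """Weighted harmonic mean of precision and recall over boundary sets."""
    ref_bounds, hyp_bounds = set(ref_bounds), set(hyp_bounds)
    hits = len(ref_bounds & hyp_bounds)
    if hits == 0:
        return 0.0
    precision = hits / len(hyp_bounds)
    recall = hits / len(ref_bounds)
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)


ref = [0, 0, 0, 1, 1, 1]  # reference: boundary before index 3
hyp = [0, 0, 0, 0, 1, 1]  # hypothesis: boundary before index 4
score = pk(ref, hyp, k=2)
f1 = f_measure({3, 6}, {3, 5})
```

A near miss like the one in `hyp` is penalized only partially by Pk, whereas exact-match precision and recall would count it as a complete miss; that difference is one reason to report both measures.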
Experiments (Contd) • Gold Standard: We use the section and sub-section headings for student documents as tags for different student document fragments and the boundaries between them as the correct segmentation locations.
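One way to derive such a gold standard is sketched below: heading lines are dropped, and the first content line after each heading is labeled NEW_TOPIC. How headings are detected in the real student documents is not stated, so the `is_heading` test here (a leading `#`) is purely a placeholder assumption, as are the function name and the sample lines.

```python
def gold_labels(lines, is_heading=lambda line: line.startswith("#")):
    """Label each non-heading line NEW_TOPIC if it is the first line
    after a heading (or the first line overall), else SAME_TOPIC."""
    labels = []
    starts_topic = True
    for line in lines:
        if is_heading(line):
            starts_topic = True
            continue
        labels.append("NEW_TOPIC" if starts_topic else "SAME_TOPIC")
        starts_topic = False
    return labels


lines = [
    "# Introduction",
    "The device reads printed text aloud.",
    "It targets visually impaired users.",
    "# Design",
    "The OCR module runs on the phone.",
]
labels = gold_labels(lines)
```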
Experiments (Contd) • Methodology: • TextTiling: Block comparison approach • Museli: Naïve Bayes classifier with an attribute selection wrapper and the Chi-square test for ranking the attributes, using 10-fold cross-validation. [Throughout, we were careful not to include instances from the same document in both the training and test sets on any fold, so that the results would not be biased.] We trained a model with the top 1000 features and applied that trained model to the test data. • Three degenerate approaches
Results • TextTiling performed best, while Museli performed worst.