
Yuen-Hsien Tseng and Chun-Yen Chang National Taiwan Normal University 2009/12/21

Automatic Content Analysis of Publications in Education-oriented ICT (Information and Communication Technologies)


Presentation Transcript


  1. Automatic Content Analysis of Publications in Education-oriented ICT (Information and Communication Technologies) Yuen-Hsien Tseng and Chun-Yen Chang National Taiwan Normal University 2009/12/21

  2. Purposes • Summarizing the background of a research field • When, who, where, whom, what • Providing overview of the research topics • Breakdown analysis of various actors • Tracking trends of the knowledge development • Offering evidence-based, data-driven, bottom-up information for panel discussion, strategic planning, or decision making for the field • Suggesting hypotheses for further exploration • Beneficial to both novices and experts

  3. Publications Downloaded from ISI Web of Knowledge Data were downloaded on 2008/05/09. After deleting 38 articles from 1989 and 2008, 4036 are used for analysis.

  4. ISI WoK Publication Record
     FN ISI Export Format
     VR 1.0
     PT J
     AU Tseng, SC
        Tsai, CC
     AF Tseng, Sheng-Chau
        Tsai, Chin-Chung
     TI On-line peer assessment and the role of the peer feedback: A study of high school computer course
     SO COMPUTERS & EDUCATION
     LA English
     DT Article
     DE interactive learning environments; secondary education; learning communities; improving classroom teaching; peer assessment
     ID WORLD-WIDE-WEB; ASSESSMENT SYSTEM; HIGHER-EDUCATION; STUDENTS; THINKING; SCIENCE; SELF
     AB The purposes of this study were to explore the effects and the validity of on-line peer assessment in high schools and …
     C1 Natl Chiao Tung Univ, Inst Educ, Hsinchu 300, Taiwan.
        Natl Chiao Tung Univ, Ctr Teacher Educ, Hsinchu 300, Taiwan.
     RP Tsai, CC, Natl Chiao Tung Univ, Inst Educ, 1001 Ta Hsueh Rd, Hsinchu 300, Taiwan.
     EM cctsai@mail.nctu.edu.tw
     CR ROTH WM, 1997, SCI EDUC, V6, P373
        DOCHY F, 1999, STUD HIGH EDUC, V24, P331
        …
     NR 23
     TC 2
     PU PERGAMON-ELSEVIER SCIENCE LTD
     PI OXFORD
     PA THE BOULEVARD, LANGFORD LANE, KIDLINGTON, OXFORD OX5 1GB, ENGLAND
     SN 0360-1315
     J9 COMPUT EDUC
     JI Comput. Educ.
     PD DEC
     PY 2007
     VL 49
     IS 4
     BP 1161
     EP 1174
     DI 10.1016/j.compedu.2006.01.007
     PG 14
     SC Computer Science, Interdisciplinary Applications; Education & Educational Research
     GA 218OF
     UT ISI:000250024100013
     ER
     Only the fields in red color are used. Cited References are used in the bibliographic coupling for topic clustering and citation tracking.
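A record in this tagged export format could be read along the following lines. This is a minimal sketch, not the authors' tool: it assumes every field starts with a two-letter tag followed by a space, that continuation lines are indented, and that ER closes a record. The function name parse_isi_records and the shortened SAMPLE text are illustrative only.

```python
def parse_isi_records(text):
    """Parse ISI/WoK tagged export text into a list of dicts (tag -> list of values)."""
    records, current, tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        head, value = line[:2], line[3:].strip()
        if head == "ER":                    # end of the current record
            records.append(current)
            current, tag = {}, None
        elif head in ("FN", "VR", "EF"):    # file-level header/footer tags
            continue
        elif head.strip():                  # a new two-letter field tag
            tag = head
            current.setdefault(tag, []).append(value)
        elif tag:                           # indented continuation of the previous tag
            current[tag].append(value)
    return records


# Shortened, illustrative version of the record on this slide.
SAMPLE = """FN ISI Export Format
VR 1.0
PT J
AU Tseng, SC
   Tsai, CC
TI On-line peer assessment and the role of the peer feedback
C1 Natl Chiao Tung Univ, Inst Educ, Hsinchu 300, Taiwan.
CR ROTH WM, 1997, SCI EDUC, V6, P373
   DOCHY F, 1999, STUD HIGH EDUC, V24, P331
PY 2007
UT ISI:000250024100013
ER
"""

records = parse_isi_records(SAMPLE)
```

Each field is kept as a list because AU, C1, and CR hold one value per line; a title that wraps across lines would also arrive as several list items and would need to be joined.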

  5. Overview Analysis • General trend of ICT research • Most productive authors, institutes, countries • Most cited (influential) references, authors, journals

  6. Overview: No. of Articles Per Year Data are from the PY field of each record.

  7. Year Production: Top 8 Countries

  8. Most Productive Authors: Top 10 • NC=Normal Count: each co-author is counted as a single author (for AU Tseng, SC; Tsai, CC: Tseng, SC : 1, Tsai, CC : 1) • FC=Fractional Count: all the co-authors are counted as a single author (for AU Tseng, SC; Tsai, CC: Tseng, SC : 0.5, Tsai, CC : 0.5)
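The two counting schemes can be sketched as follows. The function name author_counts and its input shape (one list of author names per paper) are assumptions made for illustration.

```python
from collections import Counter


def author_counts(author_lists):
    """Return (normal_count, fractional_count) Counters over all papers."""
    normal, fractional = Counter(), Counter()
    for authors in author_lists:
        for name in authors:
            normal[name] += 1                      # NC: one full credit per co-author
            fractional[name] += 1 / len(authors)   # FC: one credit shared by all co-authors
    return normal, fractional


nc, fc = author_counts([["Tseng, SC", "Tsai, CC"]])
```

For the two-author paper on this slide, each author gets 1 under NC and 0.5 under FC, matching the example.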

  9. Most Productive Institutes: Top 15 Data are from the C1 field of each record: C1 Natl Chiao Tung Univ, Inst Educ, Hsinchu 300, Taiwan

  10. Most Productive Countries: Top 10 There are 87 countries in the 4036 papers. 4036-3336=700 records have no country information. Data are from the C1 field of each record: C1 Natl Chiao Tung Univ, Inst Educ, Hsinchu 300, Taiwan
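One way the country could be pulled out of a C1 address is sketched below. It rests on assumptions that are not stated in the slides: the country is taken to be the last comma-separated element, US addresses are assumed to end in a state, ZIP code, and "USA" in one element, and a country is counted once per paper. The names country_of and count_countries are hypothetical.

```python
from collections import Counter


def country_of(address):
    """Guess the country from one C1 address line (last comma-separated element)."""
    last = address.rstrip(". ").rsplit(",", 1)[-1].strip()
    if last.endswith("USA"):        # assumed US form: "<state> <zip> USA"
        return "USA"
    return last


def count_countries(records_c1):
    """records_c1: one list of C1 address strings per paper.

    Returns (papers per country, number of papers without any address)."""
    counts, missing = Counter(), 0
    for addresses in records_c1:
        countries = {country_of(a) for a in addresses if a.strip()}
        if not countries:
            missing += 1
        counts.update(countries)    # each country counted once per paper (assumption)
    return counts, missing


counts, missing = count_countries([
    ["Natl Chiao Tung Univ, Inst Educ, Hsinchu 300, Taiwan.",
     "Natl Chiao Tung Univ, Ctr Teacher Educ, Hsinchu 300, Taiwan."],
    [],                             # a record with no C1 field
])
```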

  11. Most Cited References Data are from the CR field of each record: CR ROTH WM, 1997, SCI EDUC, V6, P373

  12. Most Cited Authors Data are from the CR field of each record: CR ROTH WM, 1997, SCI EDUC, V6, P373

  13. Most Cited Journals Data are from the CR field of each record: CR ROTH WM, 1997, SCI EDUC, V6, P373
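The three "most cited" rankings can all be tallied from the CR strings, roughly as sketched here. The sketch assumes the comma-separated layout of the example above (author, year, journal, volume, page) and counts a reference once per citing paper; cited_counts is an illustrative name, not something from the slides.

```python
from collections import Counter


def cited_counts(cr_lists):
    """cr_lists: one list of CR strings per citing paper.

    Returns Counters for cited references, cited authors, and cited journals."""
    refs, authors, journals = Counter(), Counter(), Counter()
    for cr in cr_lists:
        for ref in set(cr):                     # ignore duplicates within one paper
            refs[ref] += 1
            parts = [p.strip() for p in ref.split(",")]
            if parts[0]:
                authors[parts[0]] += 1          # first element: cited author
            if len(parts) >= 3:
                journals[parts[2]] += 1         # third element: abbreviated journal (assumption)
    return refs, authors, journals


refs, authors, journals = cited_counts([
    ["ROTH WM, 1997, SCI EDUC, V6, P373", "DOCHY F, 1999, STUD HIGH EDUC, V24, P331"],
    ["ROTH WM, 1997, SCI EDUC, V6, P373"],
])
top_reference = refs.most_common(1)[0][0]
```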

  14. Breakdown Analysis • What have we been doing for the past 18 years in this field? • What topics are studied in these 4036 publications? • We can analyze them by grouping similar articles into concepts, which in turn can be grouped into topics, ideally.

  15. Bibliographic Coupling • Similarity between two articles can be computed as the ratio of the references they have in common • It is believed that the more references two articles share, the more likely the two articles are about the same topic (concept) • Example: • If X cites 10 references and Y cites 15 references, and if there are 5 common references among them, • then the similarity between X and Y is 2*5/(10+15)=10/25=0.4, or X and Y are bibliographically coupled with a measure of 0.4.
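The worked example corresponds to the following small function, a sketch of the 2*common/(|X|+|Y|) measure shown on this slide. The function name and the placeholder reference labels r0, r1, ... are illustrative.

```python
def coupling_similarity(refs_x, refs_y):
    """Bibliographic coupling: 2 * |common references| / (|X refs| + |Y refs|)."""
    x, y = set(refs_x), set(refs_y)
    if not x and not y:
        return 0.0                  # two papers without references share nothing
    return 2 * len(x & y) / (len(x) + len(y))


# X cites 10 references, Y cites 15, and 5 of them (r5..r9) are common.
x_refs = [f"r{i}" for i in range(10)]
y_refs = [f"r{i}" for i in range(5, 20)]
similarity = coupling_similarity(x_refs, y_refs)
```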

  16. Similarity between Articles for Clustering [Figure labels: Time; documents D1, D2, ..., Dn; references Ref. 1, Ref. 2, ..., Ref. M; Doc. A and Doc. B as a bibliographic couple] M=72723 for the 4036 records. Once similarity between each pair of documents is computed, the resulting similarity matrix can be used for clustering (or factor analysis) to identify major knowledge structures underlying these documents.
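A pairwise similarity table of this kind might be built as sketched below. The inverted index from reference to citing documents is a design choice of this sketch, not something the slides describe; it only visits document pairs that share at least one reference, so all other pairs are implicitly 0. The name similarity_matrix and the toy input are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def similarity_matrix(doc_refs):
    """doc_refs: dict of doc id -> iterable of cited references.

    Returns {(doc_a, doc_b): similarity} for pairs with at least one common reference."""
    ref_sets = {doc: set(refs) for doc, refs in doc_refs.items()}
    citing = defaultdict(set)                   # reference -> documents citing it
    for doc, refs in ref_sets.items():
        for ref in refs:
            citing[ref].add(doc)
    shared = defaultdict(int)                   # (doc_a, doc_b) -> number of common references
    for docs in citing.values():
        for a, b in combinations(sorted(docs), 2):
            shared[(a, b)] += 1
    return {
        (a, b): 2 * n / (len(ref_sets[a]) + len(ref_sets[b]))
        for (a, b), n in shared.items()
    }


sim = similarity_matrix({
    "A": ["Ref. 1", "Ref. 2"],
    "B": ["Ref. 2", "Ref. 3"],
    "C": ["Ref. 4"],
})
```

The resulting dictionary can then be handed to whatever clustering routine is used in the next stage.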

  17. Multi-Stage Clustering [Figure levels: Docs., Concepts, Topics] A conceptual sketch of the multi-stage clustering approach, where dashed white circles denote outliers.

  18. Cluster Labeling Grouped documents need some concise descriptors/titles to help reveal their contents. By sorting the correlations Co(T,C) of all the terms in cluster C in decreasing order, the top few terms (e.g. top 5 terms) can be selected as the cluster's descriptors. TP: Number of documents in C that contain term T FP: Number of documents not in C that contain T FN: Number of documents in C that do not contain T TN: Number of documents not in C that do not contain T
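The formula for Co(T,C) did not survive in this transcript. The sketch below therefore assumes the standard correlation (phi) coefficient over the TP/FP/FN/TN counts defined above; the scores printed in the topic listings are larger than 1, so the authors' actual scoring likely includes a scaling or weighting that is not reproduced here. The names correlation and label_cluster are illustrative.

```python
import math


def correlation(tp, fp, fn, tn):
    """Phi coefficient of term T vs. cluster C (assumed form of Co(T,C))."""
    denom = math.sqrt((tp + fn) * (fp + tn) * (tp + fp) * (fn + tn))
    return (tp * tn - fn * fp) / denom if denom else 0.0


def label_cluster(cluster_docs, other_docs, top_k=5):
    """cluster_docs / other_docs: lists of term sets, one set per document."""
    terms = set().union(*cluster_docs) if cluster_docs else set()
    scores = {}
    for term in terms:
        tp = sum(term in doc for doc in cluster_docs)   # in C, contains T
        fn = len(cluster_docs) - tp                     # in C, lacks T
        fp = sum(term in doc for doc in other_docs)     # not in C, contains T
        tn = len(other_docs) - fp                       # not in C, lacks T
        scores[term] = correlation(tp, fp, fn, tn)
    # Highest correlation first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


cluster = [{"peer", "online"}, {"peer", "assessment"}]
others = [{"online", "gender"}, {"gender"}]
descriptors = label_cluster(cluster, others, top_k=2)
```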

  19. The Resulting 6 Topics: Topic 1 • 1(5): (contains 5 sub-topics) • 68 : 993 records : 0.015085 (effect: 35.9, instructional: 30.1, design: 27.6, computer-based: 25.0, multimedia: 24.0) • 7 : 894 records : 0.050087 (effect: 23.1, program: 22.6, instructional: 19.7, design: 19.1, problem-solving: 17.5) • 1 : 779 records : 0.085113 (logo: 17.9, effect: 17.2, learn environment: 17.1, program: 15.9, learner control: 14.0) • 10 : 248 : 577 records : 0.018030 (effect: 18.6, learn: 16.3, environment: 14.5, learn environment: 13.6, collaborative learn ... • 15 : 497 : 202 records : 0.012140 (multimedia: 8.8, hypermedia: 8.1, instructional: 5.2, cognitive: 4.8, learn: 4.8) • 17 : 173 : 115 records : 0.021531 (instructional-design: 5.8, instructional design: 3.3, model: 3.3, technology: 3.2, expert ... • 26 : 99 records : 0.024366 (instruction: 3.2, computer-assisted instruction: 2.8, college: 2.7, computer-based: 2.7, effect: 2.4) • 5 : 396 : 48 records : 0.014308 (instruction: 3.4, computer-assisted instruction: 2.8, computer science: 2.1, teletrain: 2.1 ... • 7 : 32 : 51 records : 0.037322 (acceptance: 2.4, continuance: 2.1, arc: 2.1, self-regulation: 1.6, effectivene: 1.5) ** The 6 topics can be further broken down into 23 sub-topics.

  20. The Resulting 6 Topics: Topic 2 • 2(5): (contains 5 sub-topics) • 104 : 464 records : 0.011259 (higher education: 18.3, online: 15.7, communication: 13.2, web-based: 12.1, design: 10.4) • 54 : 424 records : 0.018029 (peer: 13.7, online: 13.0, higher education: 12.7, communication: 11.0, assessment: 10.3) • 23 : 397 records : 0.025689 (peer: 17.0, higher education: 16.0, discussion: 11.1, online: 10.4, communication: 8.9) • 5 : 359 records : 0.050810 (peer: 22.2, assessment: 11.8, higher education: 11.5, peer assessment: 11.0, collaborative learn: 7.6) • 16 : 654 : 209 records : 0.010010 (communication: 7.6, cmc: 6.2, learn: 5.8, education: 5.6, community: 5.5) • 8 : 43 : 150 records : 0.035472 (peer: 11.4, peer assessment: 9.8, patchwork: 6.0, formative: 5.6, teach: 3.7) • 21 : 227 : 38 records : 0.019048 (online: 3.4, computer-mediated: 2.6, global: 1.6, esl: 1.6, discussion: 1.4) • 0 : 312 : 27 records : 0.016155 (web-based: 2.3, utilisation: 2.1, web-based learn portfolio: 1.4, hall: 1.4, s science: 1.4) • 6 : 134 : 40 records : 0.024096 (electronic performance support: 4.9, electronic performance support system: 4.2, learn style ...

  21. The Resulting 6 Topics: Topic 3,4 • 3(3): (contains 3 sub-topics) • 22 : 237 records : 0.025913 (computer attitude: 17.5, gender: 10.3, difference: 9.3, primary: 8.0, teacher: 7.8) • 4 : 169 records : 0.051091 (computer attitude: 22.0, gender: 18.4, scale: 13.1, computer anxiety: 13.1, attitude scale: 7.6) • 19 : 238 : 130 records : 0.018484 (attitude: 17.9, anxiety: 15.5, computer: 13.4, scale: 13.2, computer anxiety: 12.5) • 9 : 62 : 39 records : 0.032295 (gender: 5.3, attitude to computer: 4.2, collaborative argumentation: 2.1, computer usage: 2.1 ... • 18 : 623 : 68 records : 0.010435 (teacher: 5.5, primary: 4.9, technology: 4.2, ict: 3.6, trainee: 2.3) • 4(3): (contains 3 sub-topics) • 85 : 139 records : 0.012809 (conceptual: 6.3, support: 3.3, strategy: 3.0, information: 2.5, mobile: 2.5) • 29 : 104 records : 0.023063 (concept map: 4.4, science: 3.9, conceptual: 3.6, epistemological: 3.4, conceptual change: 2.7) • 11 : 99 : 55 records : 0.026828 (concept map: 5.3, epistemological: 4.0, science: 3.4, misconception: 3.1, conceptual change: 2 ... • 22 : 356 : 49 records : 0.015135 (language: 1.4, mathematic: 1.2, dimension: 1.2, international: 1.1, foreign: 1.1) • 4 : 289 : 35 records : 0.016644 (mobile: 2.9, cube: 2.8, discover: 2.1, data cube technology: 2.1, handheld: 1.7)

  22. The Resulting 6 Topics: Topic 5,6 • 5(2): (contains 2 sub-topics) • 97 : 55 records : 0.011788 (hypertext: 4.6, courseware: 2.9, evaluate instructional software: 2.8, computer in teach and learn: 2.1, hypertext as instructional-design: 2.1) • 1 : 383 : 28 records : 0.014572 (computer in teach and learn: 2.1, hypertext as instructional-design: 2.1, platform: 2.1, network ... • 2 : 285 : 27 records : 0.016771 (evaluate instructional software: 2.8, courseware: 1.7, educational software: 1.5, hyperbole ... • 6(2): (contains 2 sub-topics) • 51 : 83 records : 0.018331 (semantic: 7.3, ontology: 6.0, adaptive: 5.7, learn design: 5.5, author: 3.3) • 14 : 55 : 38 records : 0.033654 (learn design: 6.3, semantic: 5.6, ims: 4.1, ontology: 4.1, object: 2.2) • 20 : 554 : 45 records : 0.011325 (adaptive: 4.7, author: 3.1, model: 1.6, environment: 1.3, prototype: 1.1)

  23. Multi-Dimensional Scaling to Draw the Topic Map

  24. Breakdown Trends • Mainstream topic • Dying-out topics • Hot topics during that period • Topic with periodic attraction • Promising topics (not yet mature)

  25. Most Productive Authors: 1-3

  26. Most Productive Authors: 4-6

  27. Most Productive Countries: 1-3

  28. Most Productive Countries: 4-6 Taiwan dominates the promising topic (Topic 4). Taiwan does not waste resources on the dying out topic (Topic 5).

  29. Most Cited Authors: 1-3

  30. Most Cited Authors: 4-6

  31. Citation Tracking Analysis • A closer look at • the trends • the topic shift/drift • the knowledge diffusion • the quality of the cited actors (authors, institutes, countries) • author self-citation • co-author self-citation • non-self citation

  32. Procedure • Parse the CR field and match each cited reference against the downloaded records • Example problem to be solved: • In 2008, Chang, CK (ID=ISI:000250796900016) cites • TSAI CC, 2002, COMPUT EDUC, V38, P241 • How do we know it is the article of Tsai, CC in 2002 (ID=ISI:000174590300019)?

  33. Step 1: Citation Parsing • TSAI CC, 2002, COMPUT EDUC, V38, P241 • is parsed into • PY=2002 • J9=COMPUT EDUC • VL=38 • BP=241 • AU=Tsai, CC
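The parsing step could look roughly like this. It is a sketch under the assumption that a CR string is comma-separated with the author first, the year as four digits, the volume prefixed by V, and the first page prefixed by P. The function name parse_citation is hypothetical, and multi-word surnames are not handled.

```python
import re


def parse_citation(cr):
    """Split one CR string into AU, PY, J9, VL, and BP where present."""
    parts = [p.strip() for p in cr.split(",")]
    fields = {}
    if parts[0]:
        surname, _, initials = parts[0].partition(" ")
        surname = surname.capitalize()               # "TSAI" -> "Tsai"
        fields["AU"] = f"{surname}, {initials}" if initials else surname
    for part in parts[1:]:
        if re.fullmatch(r"\d{4}", part):
            fields.setdefault("PY", part)            # publication year
        elif re.fullmatch(r"V\d+", part):
            fields["VL"] = part[1:]                  # volume
        elif re.fullmatch(r"P\d+", part):
            fields["BP"] = part[1:]                  # beginning page
        elif "J9" not in fields:
            fields["J9"] = part                      # abbreviated journal title
    return fields


parsed = parse_citation("TSAI CC, 2002, COMPUT EDUC, V38, P241")
```

Fields missing from the citation are simply absent from the returned dictionary, which matters for the matching step.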

  34. Step 2: Citation Matching • Use an SQL (Structured Query Language) command to retrieve candidates: • SELECT UT, AU FROM Paper WHERE BP=241 AND VL=38 AND PY=2002 AND J9='COMPUT EDUC' • But there are cases where VL or BP is missing • Multiple AU values may also be retrieved • There are even misspellings/variations in AU
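A candidate lookup that tolerates a missing VL or BP might be written as below, here with Python's built-in sqlite3 module and an in-memory database. The Paper table layout (all columns as text) and the function name find_candidates are assumptions; the slides show only the SELECT statement. The inserted row uses the Tsai, CC record identified in the Procedure slide.

```python
import sqlite3


def find_candidates(conn, citation):
    """Return (UT, AU) rows matching the parsed citation; VL and BP are optional."""
    clauses = ["PY = ?", "J9 = ?"]
    params = [citation["PY"], citation["J9"]]
    for field in ("VL", "BP"):                   # add a condition only if the field was parsed
        if citation.get(field):
            clauses.append(f"{field} = ?")
            params.append(citation[field])
    sql = "SELECT UT, AU FROM Paper WHERE " + " AND ".join(clauses)
    return conn.execute(sql, params).fetchall()


# Hypothetical schema holding the downloaded records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Paper (UT TEXT, AU TEXT, PY TEXT, J9 TEXT, VL TEXT, BP TEXT)")
conn.execute(
    "INSERT INTO Paper VALUES (?, ?, ?, ?, ?, ?)",
    ("ISI:000174590300019", "Tsai, CC", "2002", "COMPUT EDUC", "38", "241"),
)

full = find_candidates(conn, {"PY": "2002", "J9": "COMPUT EDUC", "VL": "38", "BP": "241"})
partial = find_candidates(conn, {"PY": "2002", "J9": "COMPUT EDUC"})
```

Dropping conditions widens the candidate set, which is why the retrieved authors still have to be verified afterwards.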

  35. Step 3: Citation Filtering • Retrieved candidates are verified by matching the AU in Soundex coding • Soundex(DICK)=D200=Soundex(DYCK) • Soundex(MALSRI)=M426=Soundex(MALASRI) • But some cases still cannot be resolved: • "DICK J" vs. "DYCK, JL" • "HALLSELL DJ" vs. "HASSELL, DJ"
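For reference, a compact implementation of standard American Soundex reproduces the codes on this slide. Whether the authors used exactly this variant is not stated, so treat it as an illustrative sketch.

```python
# Letter -> digit table of standard American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the 4-character Soundex code of a surname."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    digits = []
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:    # skip vowels and repeats of the same code
            digits.append(code)
        if ch not in "HW":           # H and W do not separate equal codes
            prev = code
    return (letters[0] + "".join(digits) + "000")[:4]


same_dick = soundex("DICK") == soundex("DYCK")
same_hassell = soundex("HALLSELL") == soundex("HASSELL")
```

HALLSELL and HASSELL receive different codes (H424 vs. H240), so that pair stays unresolved, while DICK and DYCK agree on the surname but their initials (J vs. JL) still differ.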

  36. Step 4: Citation Linking • Self cross citation • Self citation • Co-author self citation • Non-self citation

  37. Step 5: Citation Network Tracking and Visualization • Trace the root of each citation • Call the citation network originated from the root a citation track • Example: • If A cites B, B cites C, C cites D, E cites C • Then D is the root, and D<-C<-{B,E}<-A forms the citation track • Visualize the selected citation tracks based on the publication year of each actor
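The root tracing in this example can be sketched as a breadth-first walk over reversed citation links. The names find_roots and citation_track, and the (citing, cited) pair representation, are assumptions made for illustration.

```python
from collections import defaultdict


def find_roots(cites):
    """Roots are papers that are cited but cite nothing within the data set."""
    citing = {a for a, _ in cites}
    cited = {b for _, b in cites}
    return sorted(cited - citing)


def citation_track(cites, root):
    """Return the track as levels: [root], papers citing it, papers citing those, ..."""
    cited_by = defaultdict(set)
    for citing, cited in cites:
        cited_by[cited].add(citing)
    levels, seen, frontier = [], {root}, [root]
    while frontier:
        levels.append(sorted(frontier))
        nxt = set()
        for node in frontier:
            nxt |= cited_by[node] - seen     # follow "is cited by" links, once per paper
        seen |= nxt
        frontier = sorted(nxt)
    return levels


# A cites B, B cites C, C cites D, E cites C.
cites = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "C")]
roots = find_roots(cites)
track = citation_track(cites, roots[0])
```

Here the track comes out as D, then C, then {B, E}, then A, which is the D<-C<-{B,E}<-A chain of the example.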

  38. Three Citation Tracks [Figure; article titles shown:] 1. Students' use of web-based concept map testing and strategies for learning 2. Collaboratively developing instructional activities of conceptual change through the Internet: Science teachers' perspectives 3. … • A Study of Learning Styles And Computer-Assisted-Instruction • Can computer-aided instruction accommodate all learners equally? • Working With The New-Generation Of Interactive Media Technologies In Schools - CD-I And CDTV • Advantages, disadvantages, facilitators, and inhibitors of computer-aided instruction in Singapore's secondary schools • Could a laptop computer plus the liquid crystal display projector amount to improved multimedia geoscience instruction?

  39. One of the Citation Tracks of the Most Cited Author: Jonassen, DH [Figure labels: Hypermedia and Instruction; ETR&D-EDUC TECHNOL RES DEV; HYPERVIDEO] 1. Hypertext as Instructional-Design 2. Objectivism versus Constructivism - Do We Need a New Philosophical Paradigm. Intellectual base of the highly cited paper of Jonassen, DH.

  40. Citation Tracks of the Most Cited Author in Topic 6: Koper, R • Inscript - a Courseware Specification Language • Profil - a Method for the Development of Multimedia Courseware • Representing the learning design of units of learning • Educational modelling language: modelling reusable, interoperable, rich and personalised units of learning • Latent semantic analysis as a tool for learner positioning in learning networks for lifelong learning • New directions for lifelong learning using network technologies • Cueing for transfer in multimedia programmes: process worksheets vs. worked-out examples

  41. Citation Track of the Most Cited Author in Topic 4: Tsai, CC • Developing an Internet Attitude Scale for high school students • A networked peer assessment system based on a Vee heuristic • Students' use of web-based concept map testing and strategies for learning • Collaboratively developing instructional activities of conceptual change through the Internet: Science teachers' perspectives

  42. Conclusions • WoK data are valuable for analysis • Well structured, normalized, and indexed • But they still contain a small number of errors and inconsistencies • The analysis provides background, trend, and citation track information (who, what, when, where) to help newcomers get to know a scientific field better • A quick overview for novices or young researchers • Helps decision making (which topics to explore further) • Expert finding (who can be consulted for domain problems and/or for project/paper review) • Inspires us to go further with a map and compass
