Automatic Content Analysis of Publications in Education-oriented ICT (Information and Communication Technologies). Yuen-Hsien Tseng and Chun-Yen Chang, National Taiwan Normal University, 2009/12/21.
Purposes • Summarizing the background of a research field • When, who, where, whom, what • Providing an overview of the research topics • Breakdown analysis of various actors • Tracking trends in knowledge development • Offering evidence-based, data-driven, bottom-up information for panel discussion, strategic planning, or decision making in the field • Suggesting hypotheses for further exploration • Beneficial to both novices and experts
Publications Downloaded from ISI Web of Knowledge Data were downloaded on 2008/05/09. After deleting 38 articles from 1989 and 2008, 4036 articles are used for analysis.
ISI WoK Publication Record
FN ISI Export Format
VR 1.0
PT J
AU Tseng, SC
   Tsai, CC
AF Tseng, Sheng-Chau
   Tsai, Chin-Chung
TI On-line peer assessment and the role of the peer feedback: A study of high school computer course
SO COMPUTERS & EDUCATION
LA English
DT Article
DE interactive learning environments; secondary education; learning communities; improving classroom teaching; peer assessment
ID WORLD-WIDE-WEB; ASSESSMENT SYSTEM; HIGHER-EDUCATION; STUDENTS; THINKING; SCIENCE; SELF
AB The purposes of this study were to explore the effects and the validity of on-line peer assessment in high schools and …
C1 Natl Chiao Tung Univ, Inst Educ, Hsinchu 300, Taiwan.
   Natl Chiao Tung Univ, Ctr Teacher Educ, Hsinchu 300, Taiwan.
RP Tsai, CC, Natl Chiao Tung Univ, Inst Educ, 1001 Ta Hsueh Rd, Hsinchu 300, Taiwan.
EM cctsai@mail.nctu.edu.tw
CR ROTH WM, 1997, SCI EDUC, V6, P373
   DOCHY F, 1999, STUD HIGH EDUC, V24, P331
   …
NR 23
TC 2
PU PERGAMON-ELSEVIER SCIENCE LTD
PI OXFORD
PA THE BOULEVARD, LANGFORD LANE, KIDLINGTON, OXFORD OX5 1GB, ENGLAND
SN 0360-1315
J9 COMPUT EDUC
JI Comput. Educ.
PD DEC
PY 2007
VL 49
IS 4
BP 1161
EP 1174
DI 10.1016/j.compedu.2006.01.007
PG 14
SC Computer Science, Interdisciplinary Applications; Education & Educational Research
GA 218OF
UT ISI:000250024100013
ER
Only the fields in red color are used. Cited References are used in the bibliographic coupling for topic clustering and citation tracking.
Overview Analysis • General trend of ICT research • Most productive authors, institutes, countries • Most cited (influential) references, authors, journals
Overview: No. of Articles Per Year Data are from the PY field of each record.
Most Productive Authors: Top 10
• NC = Normal Count: each co-author receives a full count of 1. Example: for AU Tseng, SC; Tsai, CC the counts are Tseng, SC: 1 and Tsai, CC: 1.
• FC = Fractional Count: all the co-authors of a paper together are counted as a single author, so each receives an equal share. Example: for AU Tseng, SC; Tsai, CC the counts are Tseng, SC: 0.5 and Tsai, CC: 0.5.
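The two counting schemes can be sketched in a few lines of Python. This is a minimal illustration, assuming each record's AU field has already been split into a list of author names; the function name `count_authors` is hypothetical and not part of the original analysis.

```python
from collections import defaultdict


def count_authors(records):
    """Return (normal_count, fractional_count) per author.

    `records` is a list of author-name lists, one list per paper
    (an assumed, already-parsed form of the AU field).
    """
    normal = defaultdict(int)
    fractional = defaultdict(float)
    for authors in records:
        if not authors:
            continue  # skip records with no author information
        share = 1.0 / len(authors)  # co-authors split one credit equally
        for name in authors:
            normal[name] += 1          # NC: full credit per co-author
            fractional[name] += share  # FC: equal share of one credit
    return dict(normal), dict(fractional)


# One two-author paper plus one single-author paper.
nc, fc = count_authors([["Tseng, SC", "Tsai, CC"], ["Tsai, CC"]])
print(nc)
print(fc)
```

Sorting either dictionary by value in decreasing order and taking the first ten entries would give a top-10 list under the corresponding scheme.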
Most Productive Institutes: Top 15 Data are from the C1 field of each record: C1 Natl Chiao Tung Univ, Inst Educ, Hsinchu 300, Taiwan
Most Productive Countries: Top 10 There are 87 countries in the 4036 papers. There are 4036-3336=700 records without country information. Data are from the C1 field of each record: C1 Natl Chiao Tung Univ, Inst Educ, Hsinchu 300, Taiwan
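A rough sketch of how institute and country could be pulled from C1 addresses follows. It assumes the simple layout of the sample address (institute first, country last, comma-separated) and counts a country once per record; both are illustrative assumptions, and real addresses (for example, ones ending in a state and postal code) would need extra handling. The function names are hypothetical.

```python
from collections import Counter


def institute_and_country(address):
    """Split one C1 address into (institute, country), or None if unparsable.

    Assumes 'Institute, ..., Country.' as in the sample record.
    """
    parts = [p.strip() for p in address.strip().rstrip(".").split(",")]
    if len(parts) < 2 or not parts[0] or not parts[-1]:
        return None
    return parts[0], parts[-1]


def count_countries(records):
    """Count records per country; also count records lacking any country.

    `records` is a list of address lists (one list of C1 lines per paper).
    """
    counts = Counter()
    missing = 0
    for addresses in records:
        countries = set()
        for address in addresses:
            parsed = institute_and_country(address)
            if parsed:
                countries.add(parsed[1])
        if not countries:
            missing += 1
        counts.update(countries)  # each country counted once per record
    return counts, missing


sample = [
    ["Natl Chiao Tung Univ, Inst Educ, Hsinchu 300, Taiwan.",
     "Natl Chiao Tung Univ, Ctr Teacher Educ, Hsinchu 300, Taiwan."],
    [],  # a record without a C1 field
]
country_counts, without_country = count_countries(sample)
print(country_counts, without_country)
```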
Most Cited References Data are from the CR field of each record: CR ROTH WM, 1997, SCI EDUC, V6, P373
Most Cited Authors Data are from the CR field of each record: CR ROTH WM, 1997, SCI EDUC, V6, P373
Most Cited Journals Data are from the CR field of each record: CR ROTH WM, 1997, SCI EDUC, V6, P373
Breakdown Analysis • What have we been doing for the past 18 years in this field? • What topics are studied in these 4036 publications? • We can analyze them by grouping similar articles into concepts, which in turn can be grouped into topics, ideally.
Bibliographic Coupling • Similarity between two articles can be computed from the proportion of references they both cite • It is believed that the more references two articles have in common, the more likely the two articles are about the same topic (concept) • Example: • If X cites 10 references and Y cites 15 references, and if there are 5 common references among them, • then the similarity between X and Y is 2*5/(10+15)=10/25=0.4, or X and Y are bibliographically coupled with a measure of 0.4.
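The worked example corresponds to a Dice-style coefficient over the two reference sets. A minimal sketch, assuming each article's cited references are available as a collection of strings (the function name and the placeholder reference labels are illustrative):

```python
def coupling_similarity(refs_x, refs_y):
    """Bibliographic coupling: 2 * |common| / (|X| + |Y|)."""
    x, y = set(refs_x), set(refs_y)
    if not x and not y:
        return 0.0  # two articles with no references share nothing
    return 2 * len(x & y) / (len(x) + len(y))


# X cites 10 references, Y cites 15, and 5 of them are shared.
refs_x = [f"ref{i}" for i in range(10)]      # ref0 .. ref9
refs_y = [f"ref{i}" for i in range(5, 20)]   # ref5 .. ref19
similarity = coupling_similarity(refs_x, refs_y)
print(similarity)
```

The value is 0 when no references are shared and 1 when both articles cite exactly the same references.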
Similarity between Articles for Clustering [Figure: documents D1, D2, ..., Dn and references Ref. 1, Ref. 2, ..., Ref. M arranged along a time axis; Doc. A and Doc. B citing a common reference form a bibliographic couple.] M=72723 for the 4036 records. Once similarity between each pair of documents is computed, the resulting similarity matrix can be used for clustering (or factor analysis) to identify major knowledge structures underlying these documents.
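One way to build the pairwise similarity data without comparing every pair of documents directly is to index documents by the references they cite. The sketch below is an assumption about implementation, not the authors' code: it stores only the non-zero entries of the similarity matrix and uses the same 2*common/(|X|+|Y|) measure as the coupling example.

```python
from collections import defaultdict
from itertools import combinations


def coupling_matrix(doc_refs):
    """Return {(doc_a, doc_b): similarity} for pairs sharing >= 1 reference.

    `doc_refs` maps a document ID to the references it cites
    (an assumed, already-parsed form of the CR field).
    """
    ref_sets = {doc: set(refs) for doc, refs in doc_refs.items()}

    # Inverted index: reference -> documents citing it (in sorted order).
    citing = defaultdict(list)
    for doc in sorted(ref_sets):
        for ref in ref_sets[doc]:
            citing[ref].append(doc)

    # Every reference shared by a pair adds one to that pair's overlap.
    shared = defaultdict(int)
    for docs in citing.values():
        for a, b in combinations(docs, 2):
            shared[(a, b)] += 1

    return {
        (a, b): 2 * n / (len(ref_sets[a]) + len(ref_sets[b]))
        for (a, b), n in shared.items()
    }


matrix = coupling_matrix({
    "A": ["r1", "r2"],
    "B": ["r2", "r3"],
    "C": ["r9"],
})
print(matrix)
```

Pairs that never appear in the result have similarity 0, which keeps the structure sparse when most documents share no references.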
Multi-Stage Clustering [Figure: three levels labeled Docs., Concepts, and Topics.] A conceptual sketch of the multi-stage clustering approach, where dashed white circles denote outliers.
Cluster Labeling Grouped documents need some concise descriptors/titles to help reveal their contents. By sorting the correlations Co(T,C) of all the terms in cluster C in decreasing order, the top few terms (e.g. top 5 terms) can be selected as the cluster’s descriptors. • TP: Number of documents in C that contain term T • FP: Number of documents not in C that contain T • FN: Number of documents in C that do not contain T • TN: Number of documents not in C that do not contain T
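The formula for Co(T,C) is not reproduced in this text. As an illustration only, the sketch below assumes the standard correlation (phi) coefficient computed from the four counts TP, FP, FN, and TN; the scores actually used in the analysis may be defined or scaled differently. Documents are modeled as sets of terms, and the function names are hypothetical.

```python
import math


def correlation(tp, fp, fn, tn):
    """Phi coefficient of term T with cluster C (assumed form of Co(T,C))."""
    denom = math.sqrt((tp + fn) * (fp + tn) * (tp + fp) * (fn + tn))
    return (tp * tn - fn * fp) / denom if denom else 0.0


def label_cluster(cluster_docs, other_docs, top_k=5):
    """Return the top_k terms of a cluster ranked by correlation.

    Each document is a set of terms; `other_docs` are documents outside C.
    """
    scores = {}
    for term in set().union(*cluster_docs):
        tp = sum(term in doc for doc in cluster_docs)  # in C, has T
        fn = len(cluster_docs) - tp                    # in C, lacks T
        fp = sum(term in doc for doc in other_docs)    # not in C, has T
        tn = len(other_docs) - fp                      # not in C, lacks T
        scores[term] = correlation(tp, fp, fn, tn)
    # Highest correlation first; ties broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


cluster = [{"peer", "assessment"}, {"peer", "online"}]
others = [{"online", "design"}, {"design"}]
labels = label_cluster(cluster, others, top_k=2)
print(labels)
```

A term that occurs in every document of the cluster and nowhere else scores highest, which is the behavior a descriptor should have.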
The Resulting 6 Topics: Topic 1
• 1(5): (contains 5 sub-topics)
• 68 : 993 records : 0.015085 (effect: 35.9, instructional: 30.1, design: 27.6, computer-based: 25.0, multimedia: 24.0)
• 7 : 894 records : 0.050087 (effect: 23.1, program: 22.6, instructional: 19.7, design: 19.1, problem-solving: 17.5)
• 1 : 779 records : 0.085113 (logo: 17.9, effect: 17.2, learn environment: 17.1, program: 15.9, learner control: 14.0)
• 10 : 248 : 577 records : 0.018030 (effect: 18.6, learn: 16.3, environment: 14.5, learn environment: 13.6, collaborative learn ...
• 15 : 497 : 202 records : 0.012140 (multimedia: 8.8, hypermedia: 8.1, instructional: 5.2, cognitive: 4.8, learn: 4.8)
• 17 : 173 : 115 records : 0.021531 (instructional-design: 5.8, instructional design: 3.3, model: 3.3, technology: 3.2, expert ...
• 26 : 99 records : 0.024366 (instruction: 3.2, computer-assisted instruction: 2.8, college: 2.7, computer-based: 2.7, effect: 2.4)
• 5 : 396 : 48 records : 0.014308 (instruction: 3.4, computer-assisted instruction: 2.8, computer science: 2.1, teletrain: 2.1 ...
• 7 : 32 : 51 records : 0.037322 (acceptance: 2.4, continuance: 2.1, arc: 2.1, self-regulation: 1.6, effectivene: 1.5)
** The 6 topics can be further broken down into 23 sub-topics.
The Resulting 6 Topics: Topic 2
• 2(5): (contains 5 sub-topics)
• 104 : 464 records : 0.011259 (higher education: 18.3, online: 15.7, communication: 13.2, web-based: 12.1, design: 10.4)
• 54 : 424 records : 0.018029 (peer: 13.7, online: 13.0, higher education: 12.7, communication: 11.0, assessment: 10.3)
• 23 : 397 records : 0.025689 (peer: 17.0, higher education: 16.0, discussion: 11.1, online: 10.4, communication: 8.9)
• 5 : 359 records : 0.050810 (peer: 22.2, assessment: 11.8, higher education: 11.5, peer assessment: 11.0, collaborative learn: 7.6)
• 16 : 654 : 209 records : 0.010010 (communication: 7.6, cmc: 6.2, learn: 5.8, education: 5.6, community: 5.5)
• 8 : 43 : 150 records : 0.035472 (peer: 11.4, peer assessment: 9.8, patchwork: 6.0, formative: 5.6, teach: 3.7)
• 21 : 227 : 38 records : 0.019048 (online: 3.4, computer-mediated: 2.6, global: 1.6, esl: 1.6, discussion: 1.4)
• 0 : 312 : 27 records : 0.016155 (web-based: 2.3, utilisation: 2.1, web-based learn portfolio: 1.4, hall: 1.4, s science: 1.4)
• 6 : 134 : 40 records : 0.024096 (electronic performance support: 4.9, electronic performance support system: 4.2, learn style ...
The Resulting 6 Topics: Topic 3,4
• 3(3): (contains 3 sub-topics)
• 22 : 237 records : 0.025913 (computer attitude: 17.5, gender: 10.3, difference: 9.3, primary: 8.0, teacher: 7.8)
• 4 : 169 records : 0.051091 (computer attitude: 22.0, gender: 18.4, scale: 13.1, computer anxiety: 13.1, attitude scale: 7.6)
• 19 : 238 : 130 records : 0.018484 (attitude: 17.9, anxiety: 15.5, computer: 13.4, scale: 13.2, computer anxiety: 12.5)
• 9 : 62 : 39 records : 0.032295 (gender: 5.3, attitude to computer: 4.2, collaborative argumentation: 2.1, computer usage: 2.1 ...
• 18 : 623 : 68 records : 0.010435 (teacher: 5.5, primary: 4.9, technology: 4.2, ict: 3.6, trainee: 2.3)
• 4(3): (contains 3 sub-topics)
• 85 : 139 records : 0.012809 (conceptual: 6.3, support: 3.3, strategy: 3.0, information: 2.5, mobile: 2.5)
• 29 : 104 records : 0.023063 (concept map: 4.4, science: 3.9, conceptual: 3.6, epistemological: 3.4, conceptual change: 2.7)
• 11 : 99 : 55 records : 0.026828 (concept map: 5.3, epistemological: 4.0, science: 3.4, misconception: 3.1, conceptual change: 2 ...
• 22 : 356 : 49 records : 0.015135 (language: 1.4, mathematic: 1.2, dimension: 1.2, international: 1.1, foreign: 1.1)
• 4 : 289 : 35 records : 0.016644 (mobile: 2.9, cube: 2.8, discover: 2.1, data cube technology: 2.1, handheld: 1.7)
The Resulting 6 Topics: Topic 5,6
• 5(2): (contains 2 sub-topics)
• 97 : 55 records : 0.011788 (hypertext: 4.6, courseware: 2.9, evaluate instructional software: 2.8, computer in teach and learn: 2.1, hypertext as instructional-design: 2.1)
• 1 : 383 : 28 records : 0.014572 (computer in teach and learn: 2.1, hypertext as instructional-design: 2.1, platform: 2.1, network ...
• 2 : 285 : 27 records : 0.016771 (evaluate instructional software: 2.8, courseware: 1.7, educational software: 1.5, hyperbole ...
• 6(2): (contains 2 sub-topics)
• 51 : 83 records : 0.018331 (semantic: 7.3, ontology: 6.0, adaptive: 5.7, learn design: 5.5, author: 3.3)
• 14 : 55 : 38 records : 0.033654 (learn design: 6.3, semantic: 5.6, ims: 4.1, ontology: 4.1, object: 2.2)
• 20 : 554 : 45 records : 0.011325 (adaptive: 4.7, author: 3.1, model: 1.6, environment: 1.3, prototype: 1.1)
Breakdown Trends • Mainstream topic • Dying-out topics • Hot topics during that period • Topic with periodic attraction • Promising topics (not yet mature)
Most Productive Countries: Topics 4-6 Taiwan dominates the promising topic (Topic 4). Taiwan does not waste resources on the dying-out topic (Topic 5).
Citation Tracking Analysis • A closer look at • the trends • the topic shift/drift • the knowledge diffusion • the quality of the cited actors (authors, institutes, countries) • author self-citation • co-author self-citation • non-self citation
Procedure • Parse the CR field and match each cited reference against the downloaded records • Example problem to be solved: • In 2008, Chang, CK (ID=ISI:000250796900016) cites • TSAI CC, 2002, COMPUT EDUC, V38, P241 • How do we know it is the article of Tsai, CC in 2002 (ID=ISI:000174590300019)?
Step 1: Citation Parsing • TSAI CC, 2002, COMPUT EDUC, V38, P241 • is parsed into • PY=2002 • J9=COMPUT EDUC • VL=38 • BP=241 • AU=Tsai, CC
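The parsing step could look roughly like the following. This is a sketch that assumes the comma-separated layout of the sample string (author, year, journal, V-prefixed volume, P-prefixed page); fields that are absent are left as None. The function name `parse_cited_reference` is hypothetical.

```python
import re


def parse_cited_reference(cr):
    """Parse one CR string into AU, PY, J9, VL, BP (None when missing)."""
    parsed = {"AU": None, "PY": None, "J9": None, "VL": None, "BP": None}
    parts = [p.strip() for p in cr.split(",") if p.strip()]
    if not parts:
        return parsed

    # "TSAI CC" -> "Tsai, CC": the last token is taken as the initials.
    surname, _, initials = parts[0].rpartition(" ")
    if surname:
        parsed["AU"] = f"{surname.capitalize()}, {initials}"
    else:
        parsed["AU"] = initials.capitalize()  # surname only, no initials

    for part in parts[1:]:
        if parsed["PY"] is None and re.fullmatch(r"\d{4}", part):
            parsed["PY"] = int(part)
        elif re.fullmatch(r"V\d+", part):
            parsed["VL"] = int(part[1:])
        elif re.fullmatch(r"P\d+", part):
            parsed["BP"] = int(part[1:])
        elif parsed["J9"] is None:
            parsed["J9"] = part  # first unrecognized field: journal name
    return parsed


citation = parse_cited_reference("TSAI CC, 2002, COMPUT EDUC, V38, P241")
print(citation)
```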
Step 2: Citation Matching • Use an SQL (Structured Query Language) command to retrieve candidates: • SELECT UT, AU FROM Paper WHERE BP=241 AND VL=38 AND PY=2002 AND J9='COMPUT EDUC' • But there are cases where VL or BP is missing • Multiple AU values may also be retrieved • There are even misspellings/variations in AU
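A small sketch of candidate retrieval using Python's built-in `sqlite3` module is shown below. The `Paper` table and its column names follow the query on the slide, but the schema, the choice of SQLite, and the helper `find_candidates` are assumptions made for illustration. Fields missing from the parsed citation (such as VL or BP) are simply left out of the WHERE clause.

```python
import sqlite3


def find_candidates(conn, cite):
    """Return (UT, AU) rows matching the fields present in `cite`."""
    clauses, params = [], []
    for field in ("PY", "J9", "VL", "BP"):
        value = cite.get(field)
        if value is not None:           # skip fields the citation lacks
            clauses.append(f"{field} = ?")
            params.append(value)
    if not clauses:
        return []
    sql = "SELECT UT, AU FROM Paper WHERE " + " AND ".join(clauses)
    return conn.execute(sql, params).fetchall()


# Assumed minimal schema holding one downloaded record.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Paper "
    "(UT TEXT, AU TEXT, PY INTEGER, J9 TEXT, VL INTEGER, BP INTEGER)"
)
conn.execute(
    "INSERT INTO Paper VALUES (?, ?, ?, ?, ?, ?)",
    ("ISI:000174590300019", "Tsai, CC", 2002, "COMPUT EDUC", 38, 241),
)

full = find_candidates(
    conn, {"PY": 2002, "J9": "COMPUT EDUC", "VL": 38, "BP": 241}
)
no_page = find_candidates(
    conn, {"PY": 2002, "J9": "COMPUT EDUC", "VL": 38, "BP": None}
)
print(full, no_page)
```

Using `?` placeholders rather than string concatenation avoids quoting problems with journal names that contain apostrophes.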
Step 3: Citation Filtering • Retrieved candidates are verified by matching the AU in Soundex coding • Soundex(DICK)=D200=Soundex(DYCK) • Soundex(MALSRI)=M426=Soundex(MALASRI) • But this still cannot resolve cases such as: • "DICK J" vs. "DYCK, JL" • "HALLSELL DJ" vs. "HASSELL, DJ"
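For reference, the Soundex codes quoted above can be reproduced with the following sketch of the standard American Soundex rules. Whether the original analysis used exactly this variant is an assumption.

```python
# Letter groups of standard American Soundex.
_CODES = {}
for _letters, _digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character Soundex code of a surname."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")  # vowels get no code and reset `prev`
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]  # pad with zeros, keep four characters


for surname in ("DICK", "DYCK", "MALSRI", "MALASRI", "HALLSELL", "HASSELL"):
    print(surname, soundex(surname))
```

Under these rules HALLSELL and HASSELL receive different codes, while DICK and DYCK share a code even though the initials (J vs. JL) differ, which appears to be why both pairs are listed as unresolved.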
Step 4: Citation Linking • Self cross citation • Self citation • Co-author self citation • Non-self citation
Step 5: Citation Network Tracking and Visualization • Trace the root of each citation • Call the citation network originated from the root a citation track • Example: • If A cites B, B cites C, C cites D, E cites C • Then D is the root, and D<-C<-{B,E}<-A forms the citation track • Visualize the selected citation tracks based on the publication year of each actor
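The tracing step in the example can be sketched as a breadth-first walk over reversed citation links. The code assumes citations are given as (citing, cited) pairs restricted to the downloaded records; the function names are hypothetical.

```python
from collections import defaultdict


def find_roots(cites):
    """Roots are cited documents that cite nothing within the data set."""
    citing = {a for a, _ in cites}
    cited = {b for _, b in cites}
    return sorted(cited - citing)


def citation_track(cites, root):
    """Return the track from `root` as a list of generations.

    Generation 0 is the root, generation 1 the documents citing it,
    and so on. Each document appears once, at its shortest distance.
    """
    cited_by = defaultdict(set)
    for citing, cited in cites:
        cited_by[cited].add(citing)

    levels, seen, frontier = [], {root}, [root]
    while frontier:
        levels.append(sorted(frontier))
        nxt = set()
        for node in frontier:
            nxt |= cited_by[node] - seen
        seen |= nxt
        frontier = sorted(nxt)
    return levels


# A cites B, B cites C, C cites D, E cites C.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "C")]
roots = find_roots(edges)
track = citation_track(edges, roots[0])
print(roots, track)
```

For visualization, each generation could then be placed along a time axis using the publication year (PY) of its documents.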
Three Citation Tracks
1. Students' use of web-based concept map testing and strategies for learning
2. Collaboratively developing instructional activities of conceptual change through the Internet: Science teachers' perspectives
3. …
• A Study of Learning Styles And Computer-Assisted-Instruction
• Can computer-aided instruction accommodate all learners equally?
• Working With The New-Generation Of Interactive Media Technologies In Schools - CD-I And CDTV
• Advantages, disadvantages, facilitators, and inhibitors of computer-aided instruction in Singapore's secondary schools
• Could a laptop computer plus the liquid crystal display projector amount to improved multimedia geoscience instruction?
One of the Citation Tracks of the Most Cited Author: Jonassen, DH
• Hypermedia and Instruction
• ETR&D-EDUC TECHNOL RES DEV
• HYPERVIDEO
1. Hypertext as Instructional-Design
2. Objectivism versus Constructivism - Do We Need a New Philosophical Paradigm
Intellectual base of the highly cited paper of Jonassen, DH
Citation Tracks of the Most Cited Author in Topic 6: Koper, R • Inscript - a Courseware Specification Language • Profil - a Method for the Development of Multimedia Courseware • Representing the learning design of units of learning • Educational modelling language: modelling reusable, interoperable, rich and personalised units of learning • Latent semantic analysis as a tool for learner positioning in learning networks for lifelong learning • New directions for lifelong learning using network technologies • Cueing for transfer in multimedia programmes: process worksheets vs. worked-out examples
Citation Track of the Most Cited Author in Topic 4: Tsai, CC • Developing an Internet Attitude Scale for high school students • A networked peer assessment system based on a Vee heuristic • Students' use of web-based concept map testing and strategies for learning • Collaboratively developing instructional activities of conceptual change through the Internet: Science teachers' perspectives
Conclusions • WoK data are valuable for analysis • Well structured, normalized, and indexed • But they still contain a small number of errors and inconsistencies • The analysis provides background, trend, and citation track information (who, what, when, where) to help newcomers get to know a scientific field better • A quick overview for novices or young researchers • Helps decision making (which topics to explore further) • Supports expert finding (who can be consulted for domain problems and/or for project/paper review) • Inspires us to go further with a map and compass