CENTRIA
Representing a Computer Science Research Organization on the ACM Computing Classification System
Boris Mirkin, School of Computer Science and Information Systems, Birkbeck College, University of London, United Kingdom
Susana Nascimento and Luís Moniz Pereira, Computer Science Department and Centre for Artificial Intelligence (CENTRIA), Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Portugal
Motivation: an Objective Portrayal of a Research Organisation as a Whole
• Overview the structure of scientific subjects being developed in the organisation.
• Position the organisation over the ACM-CCS ontology.
• Assess scientific subjects that do not fit well into ACM-CCS
  • these are potentially the growth points or other breakthrough developments.
• Plan research restructuring and investment.
• Overview a scientific field being developed in a country, with a quantitative assessment of controversial areas
  • e.g. the level of activity is not sufficient, or the level of activity exceeds the level of results.
ACM-CCS: Classification 1998 - level 1
[Figure: tree with root CS and first-level nodes A to K]
• A. General Literature
• B. Hardware
• C. Computer Systems Organization
• D. Software
• E. Data
• F. Theory of Computation
• G. Mathematics of Computing
• H. Information Systems
• I. Computing Methodologies
• J. Computer Applications
• K. Computing Milieux
Cluster-Lift Method
• Express the Research Activities of a CS Organization (RAO) as a set of CLUSTERS of ACM-CCS subjects
  • Captures the RAO in a straightforward way
  • Gives away no information about individual members or teams
  • Can be implemented on different levels of the taxonomy
  • Needs good clustering techniques
• MAP individual clusters to ACM-CCS and GENERALISE them
  • A new approach
  • Extendable to other ontologies and activities
Generic Survey Output: fuzzy memberships over all subjects in 3rd Layer of ACM-CCS
Fuzzy Similarity between ACM-CCS Subjects
• Contribution by a respondent
  • [f(i)]: membership vector over all subjects i in the 3rd layer of ACM-CCS, taken from the survey.
  • A(i,j) = f(i)f(j), the product, for all ACM-CCS 3rd layer subjects i and j.
• The matrices A(i,j) are summed over all individuals, weighted according to their span ranges.
• Fuzzy similarity measure between two ACM-CCS subjects
  • the measure is proportional to the number and importance of research activities in both subjects (details can be presented).
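The construction of the similarity matrix can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `fuzzy_similarity` and the `weights` argument are assumptions, and since the slides do not give the formula for the span-range weights, they are passed in by the caller (uniform weights by default).

```python
import numpy as np

def fuzzy_similarity(memberships, weights=None):
    """Sum of weighted outer products f f^T over respondents.

    memberships: array of shape (respondents, subjects), row r holding
                 the membership vector f of respondent r.
    weights:     one weight per respondent (hypothetical stand-in for the
                 span-range weighting, which the slides do not spell out).
    """
    F = np.asarray(memberships, dtype=float)
    if weights is None:
        weights = np.ones(F.shape[0])
    w = np.asarray(weights, dtype=float)
    # (F * w[:, None]).T @ F equals sum_r w_r * outer(f_r, f_r)
    return (F * w[:, None]).T @ F

# Two respondents, three subjects (illustrative numbers only).
F = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 1.0]]
A = fuzzy_similarity(F, weights=[1.0, 2.0])
```

The result is symmetric by construction, so A(i,j) and A(j,i) always agree.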
Building Overlapping Subject Clusters
• Additive Clustering with Iterative Extraction (ADDI-S)
  • Given the similarity matrix, the additive clustering problem is that of finding, one by one, K clusters and their intensity weights that minimize the sum of squared errors.
  • Interpretable parameters: the cluster intensity and its contribution to the explanation of the data scatter.
• Leads to tight clusters
  • A subject i belongs to a cluster S if its average similarity to S is higher than half of the average similarity within the cluster S;
  • The cluster is also well separated from the rest, because for each entity j ∉ S, its average similarity with S is less than that.
• Computationally feasible.
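The one-cluster step can be illustrated with a simplified greedy local search. This is a sketch under stated assumptions, not ADDI-S as published: it fits the model A ≈ λ·s·sᵀ over all pairs including the diagonal, assumes non-negative similarities, and omits the residual update between successive extractions. The name `extract_cluster` and the `seed` argument are hypothetical.

```python
import numpy as np

def extract_cluster(A, seed=None):
    """Greedy one-cluster search: flip one subject in or out at a time
    while the least-squares contribution lambda^2 * |S|^2 grows."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if seed is None:
        # Assumption: start from the subject with the largest total similarity.
        seed = int(np.argmax(A.sum(axis=1)))
    member = np.zeros(n, dtype=bool)
    member[seed] = True

    def contribution(mask):
        size = int(mask.sum())
        if size == 0:
            return -np.inf          # never accept an empty cluster
        total = A[np.ix_(mask, mask)].sum()
        return total ** 2 / size ** 2

    best = contribution(member)
    while True:
        scores = []
        for i in range(n):
            trial = member.copy()
            trial[i] = not trial[i]  # add i if absent, remove it if present
            scores.append(contribution(trial))
        i_best = int(np.argmax(scores))
        if scores[i_best] <= best + 1e-12:
            break                    # no single flip improves the fit
        member[i_best] = not member[i_best]
        best = scores[i_best]

    idx = np.flatnonzero(member)
    # Intensity = average within-cluster similarity (least-squares optimum).
    intensity = A[np.ix_(member, member)].sum() / len(idx) ** 2
    return idx.tolist(), intensity, best

# Toy matrix: a strong block {0,1,2} and a weak block {3,4}.
A = np.zeros((5, 5))
A[:3, :3] = 1.0
A[3:, 3:] = 0.3
members, intensity, contrib = extract_cluster(A)
share = contrib / (A ** 2).sum()     # share of the data scatter explained
```

Dividing the contribution by the sum of squared similarities gives a relative figure comparable in spirit to the per-cluster contribution percentages reported in the case study.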
Generalising Subject Clusters mapped onto ACM-CCS: good and bad cases
[Figure: ACM-CCS tree rooted at CS with two clusters highlighted]
• Blue cluster is tight: all topics are in one ACM-CCS subject.
• Red cluster is dispersed over many ACM-CCS subjects.
Lifting a Subject Cluster onto the Ontology
Elementary Structures
• The set of subject clusters, their ‘head subjects’, ‘gaps’ and ‘offshoots’ constitutes what can be called the profile of the organization under study.
• The total count of ‘head subjects’, ‘gaps’, and ‘offshoots’, each type weighted accordingly, can be used for scoring the extent of the fit between a research grouping and the ontology.
Parsimonious Lifting of Subject Cluster onto ACM-CCS
• Plural solutions: which one is better?
• Mapping (B) is better than (A) if ‘gaps’ are much cheaper than additional ‘head subjects’.
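The trade-off between mappings can be made concrete with a weighted count. The sketch below is illustrative only: the function `lifting_penalty`, its default weights, and the element counts for mappings (A) and (B) are hypothetical, since the slides give neither the weights nor the counts shown in the figure.

```python
def lifting_penalty(heads, gaps, offshoots,
                    w_head=1.0, w_gap=0.1, w_offshoot=0.5):
    """Weighted count of head subjects, gaps and offshoots.

    A lower penalty means a more parsimonious lifting.
    All weights are hypothetical placeholders.
    """
    return w_head * heads + w_gap * gaps + w_offshoot * offshoots

# Hypothetical counts: (A) keeps three separate head subjects with no gaps,
# (B) lifts to a single head subject at the price of four gaps.
mapping_a = dict(heads=3, gaps=0, offshoots=0)
mapping_b = dict(heads=1, gaps=4, offshoots=0)

# With cheap gaps, (B) wins.
cheap_a = lifting_penalty(**mapping_a)
cheap_b = lifting_penalty(**mapping_b)

# With gaps as costly as head subjects, the preference flips to (A).
costly_a = lifting_penalty(**mapping_a, w_gap=1.0)
costly_b = lifting_penalty(**mapping_b, w_gap=1.0)
```

Which mapping is preferred depends entirely on the chosen weights, which is why the slide states the condition in terms of the relative cost of gaps and head subjects.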
Real Case Study: 2006 Survey of CS of FCT-Universidade Nova de Lisboa
• Survey conducted in our Department in 2006
• Participation: 30 individuals
• Each one supplied three ACM-CCS 2nd level topics
• 26 of 59 topics at ACM-CCS 2nd level are covered
• Additive clustering algorithm ADDI-S
• Six subject clusters found
  • cl1 = {F1, F3, F4, D3} (contribution 27.08%)
  • cl2 = {C2, D1, D2, D3, D4, F3, F4, H2, H3, H5, I2, I6} (contribution 17.34%)
  • cl3 = {C2, C3, C4} (contribution 5.13%)
  • cl4 = {F4, G1, H2, I2, I3, I4, I5, I6, I7} (contribution 4.42%)
  • cl5 = {E1, F2, H2, H3, H4} (contribution 4.03%)
  • cl6 = {C4, D1, D2, D4, K6} (contribution 4.00%)
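A few summary figures can be derived directly from the cluster list above. The sketch below only re-encodes the six reported clusters and their contributions; the variable names are illustrative, and counting first-level categories per cluster is used here as a rough proxy for how dispersed a cluster is, not as the authors' measure.

```python
# The six subject clusters and their contributions (%), as reported.
clusters = {
    "cl1": (["F1", "F3", "F4", "D3"], 27.08),
    "cl2": (["C2", "D1", "D2", "D3", "D4", "F3", "F4",
             "H2", "H3", "H5", "I2", "I6"], 17.34),
    "cl3": (["C2", "C3", "C4"], 5.13),
    "cl4": (["F4", "G1", "H2", "I2", "I3", "I4", "I5", "I6", "I7"], 4.42),
    "cl5": (["E1", "F2", "H2", "H3", "H4"], 4.03),
    "cl6": (["C4", "D1", "D2", "D4", "K6"], 4.00),
}

# Total share of the data scatter explained by the six clusters.
total_contribution = sum(c for _, c in clusters.values())

# First-level ACM-CCS categories touched by each cluster
# (the first letter of a topic code is its level-1 category).
spans = {name: sorted({t[0] for t in topics})
         for name, (topics, _) in clusters.items()}

# Distinct 2nd-level topics that appear in at least one cluster.
distinct_topics = sorted({t for topics, _ in clusters.values() for t in topics})

for name, cats in spans.items():
    print(name, "spans", len(cats), "level-1 categories:", ", ".join(cats))
print("total contribution: %.2f%%" % total_contribution)
print("distinct topics in clusters:", len(distinct_topics))
```

The six contributions add up to about 62% of the data scatter, and the clusters overlap: 24 distinct topics appear across them, with topics such as F4 and H2 shared by several clusters.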
Profile of DI-FCT-UNL (2006 Survey)
[Figure: ACM-CCS tree rooted at CS, with categories E, G, I and K expanded to their 2nd-level topics (E1 to E5, G1 to G4, I1 to I7, K1 to K8). Legend: head subject, offshoot, gap. Cluster labels: F. Theory of Computation; D. Software and H. Information Systems; C. Computer Systems Organization; I. Computing Methodologies; D. Software; H. Information Systems.]
Analysis
• The most contributing cluster, with head subject F. Theory of Computation, comprises a very tight group.
• The next contributing cluster has two head subjects, D. Software and H. Information Systems, and several offshoots among the other head subjects, indicating that this cluster should be the structure underlying a certain unity of the department.
• There are only 3 offshoots outside the department’s head subjects:
  • E1. Data Structures, from H. Information Systems;
  • G1. Numerical Analysis, from I. Computing Methodologies;
  • K6. Management of Computing and Information Systems, from D. Software.
• As all of them seem natural, they could potentially be added to the list of collateral links of the ACM ontology.