Object-Oriented Reengineering Patterns and Techniques • Wahyu Andhyka Kusuma, S.Kom • kusuma.wahyu.a@gmail.com • 081233148591 • Module 5 • Problem Detection
Topics • Metrics • Object-Oriented Metrics in Practice • Duplicated Code
Topics • Metrics • Software Quality • Trend Analysis • Object-Oriented Metrics in Practice • Duplicated Code
Why use metrics in OO reengineering? • Assessing software quality • Which components have poor quality? (and so could be reengineered) • Which components have good quality? (and so should be reverse engineered) • Metrics as a reengineering tool! • Controlling the reengineering process • Trend analysis: • Which components can be changed? • Which refactorings can be applied? • Metrics as a reverse engineering tool!
ISO 9126 Quantitative Quality Model
Software Quality
ISO 9126 Factor: Functionality • Reliability • Efficiency • Usability • Maintainability • Portability
Characteristic: Error tolerance • Accuracy • Consistency • Simplicity • Modularity
Metric: defect density = #defects / size • correction time • correction impact = #components changed
Product & Process Attributes
Process Attribute • Definition: measures an aspect of the process that produces a product • Example: time to repair a defect, number of components changed per repair
Product Attribute • Definition: measures an aspect of the artifacts delivered to the customer • Example: number of defects in the system, time to learn the system
External & Internal Attributes
Internal Attribute • Definition: measured purely in terms of the product, separately from its behaviour in context • Example: class coupling and cohesion, method size
External Attribute • Definition: measures how the product/process behaves in its environment • Example: mean time between failures, #components changed
Metrics and Measurement • Weyuker [1988] defines nine properties that a software metric should satisfy • For OO, only 6 of these properties are really relevant [Chidamber94, Fenton & Pfleeger] • Noncoarseness: • Given a class P and a metric m, another class Q can always be found such that $m(P) \neq m(Q)$ • Not every class has the same value for the metric • Nonuniqueness: • There can exist distinct classes P and Q such that $m(P) = m(Q)$ • Two classes can have the same metric value • Monotonicity: • $m(P) \leq m(P+Q)$ and $m(Q) \leq m(P+Q)$, where P+Q is the “combination” of classes P and Q
Metrics and Measurement • Design Details are Important • The specifics of a class must influence the metric value. Even when classes perform the same actions, their details should have an impact on the metric value. • Nonequivalence of Interaction • $m(P) = m(Q) \not\Rightarrow m(P+R) = m(Q+R)$, where R interacts with the class • Interaction Increases Complexity • $m(P) + m(Q) < m(P+Q)$ • When two classes are combined, the interaction between them also increases the metric value • Conclusion: not every measurement is a metric
Selecting Metrics • Fast • Scalable: we cannot afford log(n^2) when n ≥ 1 million LOC (lines of code) • Precise • (e.g. #methods: do you count all methods, only public ones, also inherited ones?) • Code-based • Scalable: we want to collect the metrics several times • Simple • Complex metrics are hard to interpret
Assessing Maintainability • Size of the system and of the system entities • Class size, method size, inheritance • The size of an entity affects maintainability • Cohesion of the entities • Class internals • Changes should stay local to the class • Coupling between entities • Within inheritance: coupling between class and subclass • Outside of inheritance • Strong coupling undermines the locality of changes
Sample Size and Inheritance Metrics
[Figure: meta-model with the entities Class, Method and Attribute and the relations Inherit, BelongTo, Invoke and Access]
Class Size Metrics • # methods (NOM) • # instance and class attributes (NIA, NCA) • sum of method sizes (WMC)
Inheritance Metrics • hierarchy nesting level (HNL) • # immediate children (NOC) • # inherited methods, unmodified (NMI) • # overridden methods (NMO)
Method Size Metrics • # invocations (NOI) • # statements (NOS) • # lines of code (LOC)
Sample Class Size Metrics • (NIV) [Lore94] Number of Instance Variables • (NCV) [Lore94] Number of Class Variables (static) • (NOM) [Lore94] Number of Methods (public, private, protected) • (LOC) Lines of Code • (NSC) Number of Semicolons, [Li93] number of statements • (WMC) [Chid94] Weighted Method Count • $\mathrm{WMC} = \sum_i c_i$ • where $c_i$ is the complexity of method $i$ (number of exits or McCabe Cyclomatic Complexity Metric)
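The class size metrics above can be sketched in a few lines. The example below is only an illustration, not a tool from the slides: it measures Python source with the standard `ast` module, and it assumes that McCabe complexity can be approximated as 1 plus the number of decision points. The names `DECISION_NODES`, `cyclomatic_complexity`, `class_size_metrics` and the `Account` sample are all hypothetical.

```python
import ast

# Assumption of this sketch: these node types count as one decision point each.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp, ast.IfExp)


def cyclomatic_complexity(func: ast.FunctionDef) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    return 1 + sum(isinstance(node, DECISION_NODES) for node in ast.walk(func))


def class_size_metrics(source: str) -> dict:
    """Return {class name: (NOM, WMC)} for every class found in the source."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            methods = [n for n in node.body if isinstance(n, ast.FunctionDef)]
            wmc = sum(cyclomatic_complexity(m) for m in methods)
            result[node.name] = (len(methods), wmc)
    return result


SAMPLE = '''
class Account:
    def __init__(self):
        self.balance = 0

    def withdraw(self, amount):
        if amount <= 0 or amount > self.balance:
            raise ValueError("invalid amount")
        self.balance -= amount
'''

metrics = class_size_metrics(SAMPLE)
print(metrics)  # {'Account': (2, 4)}
```

Here `__init__` contributes 1 and `withdraw` contributes 3 (one `if` plus one boolean operator), so WMC is 4 for a class with NOM = 2. A different choice of decision points would give different numbers, which is exactly the precision question raised on the "Selecting Metrics" slide.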
Hierarchy Layout • (HNL) [Chid94] Hierarchy Nesting Level, (DIT) [Li93] Depth of Inheritance Tree • HNL, DIT = max hierarchy level • (NOC) [Chid94] Number of Children • (WNOC) Total Number of Children • (NMO, NMA, NMI, NME) [Lore94] Number of Methods Overridden, Added, Inherited, Extended (super call) • (SIX) [Lore94] Specialization Index • SIX(C) = NMO * HNL / NOM • Weighted percentage of overridden methods
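As a worked example of the SIX formula, the following minimal sketch plugs in made-up numbers. The function name `specialization_index` and the sample values are assumptions for illustration only; returning 0.0 for a class without methods is also a choice of this sketch, not part of the definition.

```python
def specialization_index(nmo: int, hnl: int, nom: int) -> float:
    """SIX(C) = NMO * HNL / NOM, with 0.0 for a class that has no methods."""
    if nom == 0:
        return 0.0
    return nmo * hnl / nom


# Hypothetical class: 12 methods, 3 of them overridden, nested 2 levels deep.
six = specialization_index(nmo=3, hnl=2, nom=12)
print(six)  # 0.5
```

A class that overrides the same share of its methods deeper in the hierarchy gets a higher value, which is why the slide calls SIX a weighted percentage.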
Method Size • (MSG) Number of Message Sends • (LOC) Lines of Code • (MCX) Method Complexity • Total complexity / total number of methods • Weights: API call = 5, assignment = 0.5, arithmetic operation = 2, message with parameters = 3, ...
Sample Metrics: Class Cohesion • (LCOM) Lack of Cohesion in Methods • [Chidamber 94] for definition • [Hitz 95] for critique
$I_i$ = set of instance variables used by method $M_i$
let $P = \{ (I_i, I_j) \mid I_i \cap I_j = \emptyset \}$
$Q = \{ (I_i, I_j) \mid I_i \cap I_j \neq \emptyset \}$
if all the sets $I_i$ are empty, $P$ is empty
$\mathrm{LCOM} = |P| - |Q|$ if $|P| > |Q|$, $0$ otherwise
• Tight Class Cohesion (TCC) • Loose Class Cohesion (LCC) • [Bieman 95] for definition • Measure method cohesion across invocations
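The LCOM definition translates almost directly into code. The sketch below assumes the per-method sets of used instance variables are already known; the function name `lcom` and the sample method and variable names are hypothetical.

```python
from itertools import combinations


def lcom(usage: dict) -> int:
    """LCOM from a mapping {method name: set of instance variables it uses}."""
    if not any(usage.values()):
        return 0  # all sets are empty, so P is empty by definition
    p = q = 0
    for vars_a, vars_b in combinations(usage.values(), 2):
        if vars_a & vars_b:
            q += 1  # the two methods share at least one instance variable
        else:
            p += 1  # the two methods share nothing
    return p - q if p > q else 0


usage = {
    "deposit": {"balance"},
    "withdraw": {"balance"},
    "rename": {"owner"},
    "print_label": {"owner", "address"},
}
print(lcom(usage))  # 2
```

Of the six method pairs, four share no variable and two do, so LCOM = 4 - 2 = 2. The example class looks like two responsibilities (balance handling and owner data) living in one class, which is the kind of situation LCOM is meant to flag.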
Sample Metrics: Class Coupling (i) • Coupling Between Objects (CBO) • [Chidamber 94a] for definition • [Hitz 95a] for a discussion • Number of other classes to which a class is coupled • Data Abstraction Coupling (DAC) • [Li 93] for definition • Number of ADTs defined in a class • Change Dependency Between Classes (CDBC) • [Hitz 96a] for definition • Impact of changes from a server class (SC) to a client class (CC)
Sample Metrics: Class Coupling (ii) • Locality of Data (LD) • [Hitz 96] for definition
$\mathrm{LD} = \sum_i |L_i| / \sum_i |T_i|$
$L_i$ = non-public instance variables + inherited protected variables of the superclass + static variables of the class
$T_i$ = all variables used in $M_i$, except non-static local variables
$M_i$ = methods, excluding accessors
The Trouble with Coupling and Cohesion • Coupling and Cohesion are intuitive notions • Cf. “computability” • E.g., is a library of mathematical functions “cohesive”? • E.g., is a package of classes that subclass framework classes cohesive? Is it strongly coupled to the framework package?
Conclusion: Metrics for Quality Assessment • Can internal product metrics reveal which components have good/poor quality? • Yes, but... • Not reliable • false positives: “bad” measurements, yet good quality • false negatives: “good” measurements, yet poor quality • Heavyweight Approach • Requires team to develop (customize?) a quantitative quality model • Requires definition of thresholds (trial and error) • Difficult to interpret • Requires complex combinations of simple metrics • However... • Cheap once you have the quality model and the thresholds • Good focus (± 20% of components are selected for further inspection) • Note: focus on the most complex components first!
Topics • Metrics • Object-Oriented Metrics in Practice • Detection strategies, filters and composition • Sample detection strategies: God Class … • Duplicated Code
Detection strategy • A detection strategy is a metrics-based predicate to identify candidate software artifacts that conform to (or violate) a particular design rule
Filters and composition • A data filter is a predicate used to focus attention on a subset of interest of a larger data set • Statistical filters • E.g., top and bottom 25% are considered outliers • Other relative thresholds • E.g., other percentages to identify outliers (such as the top 10%) • Absolute thresholds • I.e., fixed criteria, independent of the data set • A useful detection strategy can often be expressed as a composition of data filters
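A composition of data filters can be sketched with plain sets. The example below is an assumption-laden illustration, not a detection strategy taken from the slides: `top_fraction` plays the role of a relative filter, `above` of an absolute one, and the class names, metric values and the threshold of 20 are all made up.

```python
def top_fraction(values: dict, fraction: float) -> set:
    """Relative filter: names whose measurement lies in the top `fraction`."""
    ranked = sorted(values, key=values.get, reverse=True)
    count = max(1, round(len(ranked) * fraction))
    return set(ranked[:count])


def above(values: dict, threshold: float) -> set:
    """Absolute filter: names whose measurement exceeds a fixed threshold."""
    return {name for name, value in values.items() if value > threshold}


# Hypothetical measurements per class.
wmc = {"Parser": 120, "Token": 8, "Lexer": 45, "Util": 30}
lcom_values = {"Parser": 40, "Token": 0, "Lexer": 2, "Util": 25}

# Composition: very complex (top 25% by WMC) AND not cohesive (LCOM > 20).
candidates = top_fraction(wmc, 0.25) & above(lcom_values, 20)
print(candidates)  # {'Parser'}
```

Set intersection acts as a logical AND between filters and set union as OR, so larger strategies can be assembled from the same two building blocks. The result is a list of candidates for inspection, not a verdict.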
God Class • A God Class centralizes intelligence in the system • Impacts understandability • Increases system fragility
Feature Envy • Methods that are more interested in data of other classes than their own [Fowler et al. 99]
Data Class • A Data Class provides data to other classes but little or no functionality of its own
Shotgun Surgery • A change in an operation implies many (small) changes to a lot of different operations and classes
Topics • Metrics • Object-Oriented Metrics in Practice • Duplicated Code • Detection techniques • Visualizing duplicated code
Code is copied • Example from the Mozilla Distribution (Milestone 9) • Taken from /dom/src/base/nsLocation.cpp
How much code is duplicated? • Usual estimate: 8 to 12% of the code
What is duplicated code? • Duplicated code = a portion of source code that also occurs elsewhere in the same system • In different files • In the same file but in different methods • In the same method • The portions must have the same logic or structure, so that they can be abstracted
Problems with duplication • Usually has a negative effect • Code bloat • Negative effect on system or software maintenance • Copying spreads additional defects through the code • Software Aging, “hardening of the arteries” • “Software Entropy” increases: even small design changes become very difficult to effect
Detecting duplicated code • Nontrivial problem: • No a priori knowledge about which code has been copied • How to find all clone pairs among all possible pairs of segments?
Simple Detection Approach (i) • Assumption: • Code segments are just copied and changed at a few places • Noise elimination transformation • remove white space, comments • remove lines that contain uninteresting code elements • (e.g., just ‘else’ or ‘}’)
Before:
    …
    //assign same fastid as container
    fastid = NULL;
    const char* fidptr = get_fastid();
    if(fidptr != NULL) {
        int l = strlen(fidptr);
        fastid = new char[ l + 1 ];
    …
After:
    …
    fastid=NULL;
    constchar*fidptr=get_fastid();
    if(fidptr!=NULL)
    intl=strlen(fidptr);
    fastid=newchar[l+1];
    …
Simple Detection Approach (ii) • Code Comparison Step • Line based comparison (Assumption: Layout did not change during copying) • Compare each line with each other line. • Reduce search space by hashing: • Preprocessing: Compute the hash value for each line • Actual Comparison: Compare all lines in the same hash bucket • Evaluation of the Approach • Advantages: Simple, language independent • Disadvantages: Difficult interpretation
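The two steps of the simple approach (noise elimination, then line comparison through hash buckets) can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used in the slides: the `NOISE` set, the function names and the sample file contents are hypothetical, comment removal only handles `//` comments and ignores string literals, and a Python dictionary stands in for the hash table.

```python
from collections import defaultdict

# Assumption: normalized lines considered uninteresting on their own.
NOISE = {"", "{", "}", "};", "else"}


def normalize(line: str) -> str:
    """Drop a trailing // comment (naively) and remove all white space."""
    code = line.split("//", 1)[0]
    return "".join(code.split())


def duplicated_lines(files: dict) -> dict:
    """Map each normalized line seen more than once to its (file, line) locations."""
    buckets = defaultdict(list)  # the dict hashes each line into a bucket
    for name, text in files.items():
        for number, line in enumerate(text.splitlines(), start=1):
            key = normalize(line)
            if key not in NOISE:
                buckets[key].append((name, number))
    return {key: places for key, places in buckets.items() if len(places) > 1}


files = {
    "a.cpp": "fastid = NULL; // reset\nint x = 1;\n}\n",
    "b.cpp": "fastid=NULL;\nint y = 2;\n}\n",
}
duplicates = duplicated_lines(files)
print(duplicates)  # {'fastid=NULL;': [('a.cpp', 1), ('b.cpp', 1)]}
```

Only lines that land in the same bucket are ever compared, which avoids comparing each line with every other line. The output is a flat list of matching lines, which illustrates the stated disadvantage: without grouping into sequences, the result is hard to interpret.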
A Perl script for C++ (ii) • Handles multiple files • Removes comments and white space • Controls noise (if, {) • Granularity (number of lines) • Possible to remove keywords
Output Sample
Lines:
    create_property(pd,pnImplObjects,stReference,false,*iImplObjects);
    create_property(pd,pnElttype,stReference,true,*iEltType);
    create_property(pd,pnMinelt,stInteger,true,*iMinelt);
    create_property(pd,pnMaxelt,stInteger,true,*iMaxelt);
    create_property(pd,pnOwnership,stBool,true,*iOwnership);
Locations:
    </face/typesystem/SCTypesystem.C>6178/6179/6180/6181/6182
    </face/typesystem/SCTypesystem.C>6198/6199/6200/6201/6202
Lines:
    create_property(pd,pnSupertype,stReference,true,*iSupertype);
    create_property(pd,pnImplObjects,stReference,false,*iImplObjects);
    create_property(pd,pnElttype,stReference,true,*iEltType);
    create_property(pd,pMinelt,stInteger,true,*iMinelt);
    create_property(pd,pnMaxelt,stInteger,true,*iMaxelt);
Locations:
    </face/typesystem/SCTypesystem.C>6177/6178
    </face/typesystem/SCTypesystem.C>6229/6230
Lines = duplicated lines
Locations = file names and line numbers
Enhanced Simple Detection Approach • Code Comparison Step • As before, but now • Collect consecutive matching lines into match sequences • Allow holes in the match sequence • Evaluation of the Approach • Advantages • Identifies more real duplication, language independent • Disadvantages • Less simple • Misses copies with (small) changes on every line
Abstraction • Abstracting selected syntactic elements can increase recall, at the possible cost of precision
Metrics-based detection strategy • Duplication is significant if: • It is the largest possible duplication chain uniting all exact clones that are close enough to each other. • The duplication is large enough.
Automated detection in practice • Wettel [MSc thesis, 2004] uses three thresholds: • Minimum clone length: the minimum number of lines present in a clone (e.g., 7) • Maximum line bias: the maximum number of lines in between two exact chunks (e.g., 2) • Minimum chunk size: the minimum number of lines of an exact chunk (e.g., 3) • Mihai Balint, Tudor Gîrba and Radu Marinescu, “How Developers Copy,” ICPC 2006
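The three thresholds can be read as a small chaining procedure: find exact chunks, bridge short holes between them, and keep only chains that are long enough. The sketch below is one possible reading, not Wettel's implementation. It assumes both inputs are lists of already normalized lines, and it only bridges holes of equal size on both sides (changed lines, not inserted or deleted ones). All names are hypothetical.

```python
def exact_chunks(a, b, offset):
    """Maximal runs with a[i] == b[i + offset], as (start_in_a, length) pairs."""
    chunks, start = [], None
    for i in range(len(a) + 1):  # one extra step closes a run at the end
        j = i + offset
        same = i < len(a) and 0 <= j < len(b) and a[i] == b[j]
        if same and start is None:
            start = i
        elif not same and start is not None:
            chunks.append((start, i - start))
            start = None
    return chunks


def clone_chains(a, b, min_clone=7, max_bias=2, min_chunk=3):
    """Return (start_in_a, start_in_b, length) for each significant chain."""
    clones = []

    def flush(chain, offset):
        if chain and chain[1] - chain[0] >= min_clone:
            clones.append((chain[0], chain[0] + offset, chain[1] - chain[0]))

    for offset in range(-(len(a) - 1), len(b)):
        chain = None  # [start, end) of the current chain in a
        for start, length in exact_chunks(a, b, offset):
            if length < min_chunk:
                continue  # too small to count as an exact chunk
            if chain and start - chain[1] <= max_bias:
                chain[1] = start + length  # bridge the hole and extend
            else:
                flush(chain, offset)
                chain = [start, start + length]
        flush(chain, offset)
    return clones


a = ["l1", "l2", "l3", "l4", "X", "l6", "l7", "l8", "tail"]
b = ["head", "l1", "l2", "l3", "l4", "Y", "l6", "l7", "l8"]
print(clone_chains(a, b))  # [(0, 1, 8)]
```

In the example, two exact chunks of 4 and 3 lines are separated by a single changed line, so with a maximum line bias of 2 they form one chain of 8 lines, which passes the minimum clone length of 7. With `max_bias=0` neither chunk is long enough on its own and nothing is reported.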
Visualization of Duplicated Code • Visualization provides insights into the duplication situation • A simple version can be implemented in three days • Scalability issue • Dotplots — Technique from DNA Analysis • Code is put on vertical as well as horizontal axis • A match between two elements is a dot in the matrix
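A text-only dotplot is enough to see the idea. The sketch below assumes the inputs are lists of normalized lines and prints `*` for a match and `.` otherwise; the function name `dotplot` and the sample lines are hypothetical.

```python
def dotplot(rows, columns):
    """One string per row element: '*' where it equals the column element."""
    return ["".join("*" if r == c else "." for c in columns) for r in rows]


lines = ["open", "read", "close", "log", "open", "read", "close"]
plot = dotplot(lines, lines)
for row in plot:
    print(row)
```

Comparing a sequence with itself always yields the main diagonal. The additional short diagonals in the corners come from the repeated `open`, `read`, `close` lines, which is how copied sequences show up in a dotplot. The matrix has one cell per pair of lines, which is the scalability issue mentioned on the slide.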
Visualization of Copied Code Sequences • Detected Problem • File A contains two copies of a piece of code • File B contains another copy of this code • Possible Solution • Extract Method • All examples are made using Duploc from an industrial case study (1 Mio LOC C++ system)
Visualization of Repetitive Structures • Detected Problem • 4 object factory clones: a switch statement over a type variable is used to call individual construction code • Possible Solution • Strategy Method
Visualization of Cloned Classes • [Figure: dotplot of Class A against Class B] • Detected Problem • Class A is an edited copy of class B (editing and insertion) • Possible Solution • Subclassing …