100 likes | 114 Views
Applying machine learning techniques to identify financial reporting peer firms through clustering analysis of financial ratios. Enhances fraud detection models.
E N D
Redoing ratio analysis with clustering techniques Kexing Ding Xuan Peng Miklos Vasarhelyi Yunsen Wang
Peerselectionwithclustering • Weapplymachinelearningtechniquesonfinancialratiostoidentifyfinancialreportingpeerfirm. • Financialreportingpeerfirms:firmsthathavesimilarfinancialreportingqualities. • Observation:SICcodesarecommonlyused • Obsolete • Infrequentupdate • Focusonproductionprocess
Financialreportingqualitymodels • Beneish(1997,1999): • Financialreportingqualityisassociatedwitheightratios • Eitherthe accounting distortion or the preconditions that encourage managers to conduct misreporting. • Widelyusedinacademicandinpracticeeversince • Sales in receivables (Receivables/Sales), • Gross Margin (Sales-Costs of Goods Sold/Sales), • Asset Quality (1 - (Current Assets + PPE)/Total Assets), • Sales Growth (Salest/Salest-1), • Depreciation Rate (Depreciation/NetPPE), • SGA Rate (Sales, general, and administrative expenses/Sales), • Leverage (Total Debt/Total Assets), • Accruals (Total Accruals/ Total Assets).
Clustering method • Clustering analysis is one of the data mining methodologies that groups a set of objects in such a way that objects in a same cluster is more similar to each other than to those in the other clusters. • The K-medians algorithm operates on a set (X) of n points. It chooses k centers { from X and form k clusters { that the sum of the distances from each to the center of its clusters ci () is minimized. • An observation with eight financial ratios can be viewed as a point () in an eight-dimensional space. • Everyyear,theclusteringprocedureassignseachpointinoneofthe300clusters.
Results(1) • Clusteringsuccessfullyfindsfirmswithsimilarfinancialratios. • Lowwithin-groupdeviation • Highinter-groupdeviations
Results(2) • Enhanceaccountingfrauddetection model • Dechow(2011)F-scoremodel: • F-score = The predicted probability / the unconditional expectation of accounting fraud • The higher the F-score, the more suspicious • Revised model:
RankthefirmsbasedonF-score: Incorporating firms’ deviation from financial reporting peer firms enhances the ability of F-score to detect fraud firms. Model 1: Dechow (2011) Model Model 2a: Revised Model
1.WerankthefirmsbasedonF-score: The ability of F-score derived from the SIC 4-digit (NAIC 4-digit) classification to detect fraud is similar to the original Dechow et al. (2011) model. . Model 2a: Revised Model with clustering Model 2b and 2c are Revised models with SIC 3-digit and NAIC 4-digit
Type I error is calculated as the percentage of non-fraud observations that are incorrectly predicted as fraud. • Type II error is calculated as the percentage of fraud observations that are incorrectly predicted as non-fraud by the model. • Type1errorreducesfrom40.8%to34.7% • Type2 error reduces from32.4% to28.8%.