Control Methods Workshop 2010
13-15 April, Ispra, Italy
Risk based selection for On the Spot Control at Agricultural and Rural Development Agency
Miklós Lelkes, Central Physical Control Department
Agricultural and Rural Development Agency (ARDA), Budapest, Hungary
Selection for physical control
• Start point: 100% of claims
• Selection of control sample (cost effectiveness), e.g. min. 5% control rate
• Random selection: 20-25% (representative overview)
• Risk analysis: 75-80% (financial risk of EU)
• Direct selection
• To have effective control methods
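The split between random and risk based selection can be sketched as follows. This is a minimal illustration, not ARDA's actual procedure: the function name, the parameters and the convention that a higher score means higher risk are assumptions (a later slide states "High risk = low score", so the real scale may be inverted). The 5% control rate and the 25% random share are taken from the figures on the slide.

```python
import math
import random

def select_control_sample(claims, risk_score, control_rate=0.05,
                          random_share=0.25, seed=0):
    """Pick a control sample: part at random, the rest by descending risk.

    claims     -- list of claim identifiers (the 100% start point)
    risk_score -- dict mapping claim id to a score; higher = riskier
                  (an assumption made for this sketch)
    """
    # "min. 5%" is read as "at least", so round the sample size up.
    n_total = math.ceil(len(claims) * control_rate)
    n_random = round(n_total * random_share)

    # Random part: gives a representative overview of the population.
    rng = random.Random(seed)
    random_part = rng.sample(claims, n_random)

    # Risk part: the highest-scoring claims not already drawn at random.
    already_drawn = set(random_part)
    remaining = [c for c in claims if c not in already_drawn]
    remaining.sort(key=lambda c: risk_score[c], reverse=True)
    risk_part = remaining[:n_total - n_random]
    return random_part, risk_part
```

Drawing the random part first keeps it unbiased, so its anomaly rate can later serve as the baseline against which the risk based part is compared.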
Defining the risk factors and weighting
• First year(s)
  • Based on expert appraisal
• Evaluation of the results of the control
• Update of the selection method
  • Changes in the category limits
  • Changes in the scoring
  • Changes in the weighting
Definition of a risk factor (example)
Small and large cases carry a higher risk.
[Figure: risk score plotted against average parcel size]
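A risk factor of this kind can be written as a lookup over category limits. The sketch below is only an illustration: the limits (in hectares), the points and the "higher points = higher risk" convention are hypothetical and are not values from the presentation.

```python
from bisect import bisect_right

# Hypothetical category limits for average parcel size, in hectares.
LIMITS = [0.5, 2.0, 10.0, 50.0]
# Hypothetical points per category (one more entry than LIMITS).
# Very small and very large parcels get the most points.
POINTS = [5, 2, 0, 2, 5]

def parcel_size_risk(avg_parcel_size_ha):
    """Return the risk points of the category the value falls into."""
    return POINTS[bisect_right(LIMITS, avg_parcel_size_ha)]
```

Changing `LIMITS` corresponds to a change in the category limits, and changing `POINTS` to a change in the scoring, which are the two kinds of yearly update named on the previous slide.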
Evaluation of the effectiveness of the risk analysis (2007)
• ~241,000 claims
• For > 80 measures
• 4th year of Hungary/ARDA in the EU
• Huge amount of information in the IACS
• Need for a special solution for deriving information from the DB
• Data mining software/technique
The goal of data mining?
• Not to find the perfect model for a certain problem
• But to find the optimal model for a certain problem, that is:
  • Robust
  • Generalizes well
  • Easy to understand
  • Provides insight into drivers of the problem
  • Easy to implement
Risk Analysis #1
Create an abstract mathematical model that behaves coherently with regard to risky farmers. Generalize risk patterns for automatic detection.
• Identify typical patterns
• Score applications for probability of non-compliance
[Figure: Model 1 and Model 2, training and validation]
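The training and validation idea from the figure can be sketched as a simple hold-out split. The function name, the 30% default share and the fixed seed are assumptions for illustration; the presentation does not say how the data were partitioned.

```python
import random

def split_train_validation(records, validation_share=0.3, seed=42):
    """Shuffle historical control records and hold part of them back.

    Models are fitted on the training part only; the validation part
    is used to compare candidate models on cases they have not seen.
    """
    rng = random.Random(seed)
    shuffled = records[:]          # copy, so the input list is untouched
    rng.shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_share)
    return shuffled[n_validation:], shuffled[:n_validation]
```

A model that scores well on the training part but poorly on the validation part does not generalize, which is one of the goals named on the previous slide.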
Risk Analysis #2
• Modeling techniques:
  • Predictive modeling:
    • Decision trees
    • Neural networks
    • Regression
    • Scorecards
• Pre-requirements:
  • Confirmed historical cases as input data (non-compliance flag)
• Result: risk score
Data mining / estimation models
[Figure: % of anomaly as a function of sampling rate (with population ordered by decreasing rate of anomaly); reference level: average rate of anomalies]
Target variable: Errors (bad / good)
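A curve of this type can be computed from scored cases with a known outcome. The sketch below assumes each case is a pair of a model score (higher = riskier, an assumption) and a bad/good flag; the function name is hypothetical.

```python
def cumulative_anomaly_curve(scored_cases):
    """Return (sampling rate, anomaly rate among selected) points.

    scored_cases -- list of (risk_score, is_anomaly) pairs, where a
                    higher score is assumed to mean a riskier case.
    """
    # Order the population from most to least risky.
    ordered = sorted(scored_cases, key=lambda case: case[0], reverse=True)
    curve = []
    found = 0
    for selected, (_, is_anomaly) in enumerate(ordered, start=1):
        if is_anomaly:
            found += 1
        curve.append((selected / len(ordered), found / selected))
    return curve
```

At a sampling rate of 100% the curve reaches the average rate of anomalies in the population; a useful model keeps the curve above that level at low sampling rates.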
Data mining with SAS in Hungary
• Pilot in 2007
  • Selection of OTSC sample of SAPS by data mining
• Operational from 2008
  • All area and animal based subsidies
• 2nd year of operational work in 2009
  • Extend use to all measures
Interesting first results (2007)
• Fewer categories in the factors
• Some criteria were not relevant! (But in 2007 they were used because of the regulation.)
High risk = low score
Operational work from 2008
• The Integrated Risk Analysis System includes:
  • Interface to
    • IACS System (direct access)
    • Hungarian ovine and caprine I&R System
    • Hungarian bovine I&R System
    • Hungarian porcine I&R System
    • Farmers Registry
  • Risk datamart
  • Analytical models (approx. 15 analytical models)
    • Area based measures (e.g. SAPS)
    • Agricultural Environment Protection Program based measures
    • Rural Development (investment projects) based measures
Architecture of Integrated Risk Analysis System
[Diagram; labels: Place of operational work (system for transactions), Utilization, Datamart, Risk management, Datawarehouse, Datamining, Statistical analysis, Extraction, IACS, data quality, integration, transformation, Webreporting, Source, Information for management, External, Uniform handling of metadata]
Models
• Decision tree
  • Easy to understand (if not too complicated)
  • Big groups
• Regression
• Neural network
  • Normally the best prediction result
  • The result cannot be interpreted (“black box”)
• Scorecard
Scorecard
• Easy to work with, can be calculated “by hand”
• Easy to interpret
• Good for non-linear variables (e.g. age)
• Not a problem if the variable has a strange distribution
• Sometimes can result in big groups
• Not a group of factors, but a uniform model!
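Why a scorecard can be calculated by hand is easiest to see in a small example. Everything below is hypothetical: the variable names, bin limits, points and base value are invented for illustration, and higher totals are assumed to mean higher risk. In practice the points of all variables would be fitted together by the modeling software rather than set by experts one factor at a time.

```python
# Hypothetical scorecard: for each variable, a list of
# (upper limit of the bin, points) pairs in ascending order.
SCORECARD = {
    "farmer_age": [(30, 12), (60, 4), (float("inf"), 9)],
    "claimed_area_ha": [(5, 10), (100, 3), (float("inf"), 8)],
}
BASE_POINTS = 20  # hypothetical starting value

def score_application(application):
    """Add up the points of the bin each variable value falls into."""
    total = BASE_POINTS
    for variable, bins in SCORECARD.items():
        value = application[variable]
        for upper_limit, points in bins:
            if value < upper_limit:
                total += points
                break
    return total
```

The age bins show the point about non-linear variables: the youngest and oldest groups both receive more points than the middle group, which a single linear weight on age could not express.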
Data Mining Model Training and Scoring
• Converting a complex control result into a binary format (black or white)
• Should be defined by the PA
[Diagram; labels: Target Variable, Analysis, Score Rules (Score Code), Prediction, Scoring]
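One possible way to turn a control result into a binary target is shown below. The rule itself is an assumption made for this sketch: the field names and the 3% tolerance are hypothetical, and the slide states that the real definition is for the PA to decide.

```python
def non_compliance_flag(control_result, tolerance_pct=3.0):
    """Return 1 (bad) or 0 (good) for one on-the-spot control result.

    control_result -- dict with the hypothetical keys
                      'declared_area_ha' and 'determined_area_ha'
    tolerance_pct  -- hypothetical threshold; over-declarations above
                      it are treated as non-compliant
    """
    declared = control_result["declared_area_ha"]
    determined = control_result["determined_area_ha"]
    if declared <= 0:
        raise ValueError("declared area must be positive")
    over_declaration_pct = 100.0 * (declared - determined) / declared
    return 1 if over_declaration_pct > tolerance_pct else 0
```

The flag produced this way plays the role of the confirmed historical non-compliance flag that the predictive models need as input.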
Operational work from 2008
SAS® Enterprise Guide, SAS® Enterprise Miner 5
• The Integrated Risk Analysis System includes (cont.):
  • Scoring lists
  • Evaluation statistics (random and risk based)
  • OLAP reports
  • Ad-hoc reporting and analysis interface
Results
• Analysis of results (e.g. interpretation of scorecards)
• Model documentation
• Automatic report in Enterprise Miner
• Auditable documentation of data mining process
• Documentation of all selection procedures
• Review of model quality
• Statistics for EU (OLAP cubes)
Risk factors found relevant by data mining
Results in terms of efficiency
• Cumulative lift: 1.5-4 times higher compared to random sample
• Fine tuning of criteria and weighting (0.2-3 lift increase)
• Selection of variables (0.1-1 lift increase)
• Proposal of new variables to include (0.1-0.8 lift increase)
• Global optimization of cross-validation effects (0.1-0.5 lift increase)
Lift = ratio of anomalies in risk sample over random sample
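The lift definition on this slide translates directly into a few lines of code. The function names are hypothetical and the samples are represented as lists of 0/1 anomaly flags, which is an assumption about the data layout.

```python
def anomaly_rate(flags):
    """Share of controlled cases in which an anomaly was found."""
    if not flags:
        raise ValueError("sample is empty")
    return sum(flags) / len(flags)

def lift(risk_sample_flags, random_sample_flags):
    """Anomaly rate of the risk based sample over that of the random sample."""
    baseline = anomaly_rate(random_sample_flags)
    if baseline == 0:
        raise ValueError("no anomalies in the random sample; lift is undefined")
    return anomaly_rate(risk_sample_flags) / baseline
```

For example, if three of four risk based controls and one of four random controls find an anomaly, `lift([1, 1, 1, 0], [1, 0, 0, 0])` gives 3.0, meaning the risk based sample finds anomalies three times as often as the random one.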
Some objectives found by OTSC
• The quality of applications could be better: the level of irregularities in the random sample is still high (although it is improving from year to year)
• Higher control rate? Better risk assessment (in terms of % of anomaly)!
• Rate of irregularities in the classical field inspection sample is significantly higher than in the RS sample (research needed)
  • Difference in technique
  • Difference in selection / population
Sliding window
[Figure; labels: blocks, farmers, risk calculation per window, other info]
Various shapes
30 x 30, 30 x 42, 10 x 30, 30 x 10, 10 x 10…
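A sliding window search over candidate site shapes could look like the sketch below. It assumes the area is represented as a rectangular grid of cells, each holding an aggregated risk value, and it simply picks the window position with the highest total. The grid representation, the function name and the choice of "sum of risk" as the criterion are all assumptions; the slides do not describe the actual calculation or the units of the shapes.

```python
def best_window(risk_grid, width, height):
    """Slide a width x height window over a grid of risk values.

    risk_grid -- list of equally long rows of numbers (hypothetical
                 aggregated risk per grid cell)
    Returns (total risk, top row, left column) of the best position,
    or None if the window does not fit into the grid.
    """
    rows, cols = len(risk_grid), len(risk_grid[0])
    best = None
    for top in range(rows - height + 1):
        for left in range(cols - width + 1):
            total = sum(risk_grid[r][c]
                        for r in range(top, top + height)
                        for c in range(left, left + width))
            if best is None or total > best[0]:
                best = (total, top, left)
    return best
```

Calling the function once per candidate shape and comparing the totals relative to window area would be one way to compare the various shapes listed on the slide.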
Efficiency
Net over gross ratio constraint may be opposite to risk?
Conclusion for the effective risk analysis
• Worth using a data mining solution
• Yes, we need RS, but the level of anomalies in the RS sample is a question
  • Better not to use all farms in the RS zone
  • 20-25% use of VHR?
  • Site shape?
  • Quota for MS?
• Cost of data mining solution?
  • Expensive
  • But cheaper than a flat rate correction!
Thank you for your attention!
Agricultural and Rural Development Agency
H-1095 Soroksári út 22-24.
www.mvh.gov.hu
Central Physical Control Department
Tel.: +36 1 301-2409
Fax: +36 1 301-2444
E-mail: lelkes.miklos@mvh.gov.hu