CPUC Workshop on Best Practices & Lessons Learned in Time Variant Pricing: TVP Pilot Design and Load Impact M&V. Dr. Stephen George, Senior Vice President, Nexant. June 17, 2014.
The gold standard of experimental designs is a randomized controlled trial (RCT) or randomized encouragement design (RED)
• With RCTs, customers are randomly assigned to treatment and control conditions, so the only difference between the groups (except for random chance) is the treatment itself
• Impacts are calculated as the difference between usage for the treatment and control groups after the treatment goes into effect, minus the difference between the two groups before the treatment goes into effect (to eliminate any small pre-existing difference due to chance)

Group | Pretreatment | Posttreatment
Treatment Group (T) | kWh (T_pre) | kWh (T_post)
Control Group (C) | kWh (C_pre) | kWh (C_post)

Impact = Difference-in-differences = (T_post - C_post) - (T_pre - C_pre)
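As a rough illustration of the difference-in-differences calculation on this slide, the sketch below computes the impact from group means. The function name `diff_in_diff` and all kWh values are hypothetical and chosen only to make the arithmetic easy to follow; they are not data from the presentation.

```python
from statistics import mean

def diff_in_diff(t_pre, t_post, c_pre, c_post):
    """Return (T_post - C_post) - (T_pre - C_pre) using group mean usage."""
    post_gap = mean(t_post) - mean(c_post)  # treatment vs. control after treatment starts
    pre_gap = mean(t_pre) - mean(c_pre)     # chance difference before treatment starts
    return post_gap - pre_gap

# Hypothetical average peak-period kWh per customer (illustrative values only).
t_pre = [2.0, 2.5, 3.0]
t_post = [1.5, 2.0, 2.5]
c_pre = [2.0, 2.0, 3.0]
c_post = [2.0, 2.0, 3.0]

impact = diff_in_diff(t_pre, t_post, c_pre, c_post)
print(f"Estimated impact: {impact:.2f} kWh")
```

A negative value indicates that the treatment group reduced usage relative to the control group once the pretreatment gap is netted out.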
An RED has internal validity equal to that of an RCT
• With an RED, customers are randomly assigned to two groups: one group (the encouraged group) is offered the treatment and the other (the control group) is not. Not everyone offered the treatment takes it
• Impacts are calculated in a two-step process

Group | Pretreatment | Posttreatment
Group Offered Treatment (T) | kWh (T_pre) | kWh (T_post)
Control Group Not Offered Treatment (C) | kWh (C_pre) | kWh (C_post)

Step 1: Intention-to-Treat Impact = Difference-in-differences = (T_post - C_post) - (T_pre - C_pre)
Step 2: Local Average Treatment Effect (LATE) = Intention-to-Treat estimate divided by the fraction of customers offered the treatment who accepted it
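The second step of the RED calculation can be sketched as a simple division. The function name `late_from_itt`, the intention-to-treat value, and the 20% acceptance rate are hypothetical; the sketch assumes the intention-to-treat impact has already been estimated by difference-in-differences between the offered and not-offered groups.

```python
def late_from_itt(itt_impact, acceptance_rate):
    """Scale the intention-to-treat impact up to the customers who accepted.

    acceptance_rate is the fraction (0 to 1] of offered customers who
    took the treatment.
    """
    if not 0 < acceptance_rate <= 1:
        raise ValueError("acceptance_rate must be in (0, 1]")
    return itt_impact / acceptance_rate

# Hypothetical numbers: a -0.05 kWh intention-to-treat impact across everyone
# offered the treatment, with 20% of offered customers accepting.
itt = -0.05
late = late_from_itt(itt, 0.20)
print(f"LATE: {late:.2f} kWh per accepting customer")
```

Because only a fraction of the offered group is actually treated, the per-participant effect is larger in magnitude than the diluted intention-to-treat effect.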
RCTs and REDs Have Advantages & Disadvantages
• For opt-in programs, RCTs require a “recruit and delay” or “recruit and deny” strategy to eliminate selection bias
• Choosing two random groups of customers, offering the treatment to one, and comparing usage for those who took it to usage for those not offered it is NOT a valid experimental design (although it is often done)
• Utilities are often concerned about customer dissatisfaction with recruit and delay or deny, but these methods have been used very successfully
• SMUD’s SPO employed recruit and delay (for two years) for two treatments without any significant customer backlash, and surveys show no difference in customer satisfaction with SMUD between customers who were and were not delayed
• PG&E used recruit and deny for its recent Honeywell/Opower PCT pilot
• Marblehead Municipal Light used recruit and delay for its TVP pilot
• RCTs are much easier to implement for default pilots and programs (e.g., Home Energy Reports)
• If opt-out rates are large, RCTs morph into REDs and should be analyzed as REDs; all technology programs are de facto REDs because of installation issues
RCTs and REDs Have Advantages & Disadvantages (continued)
• For event-based options (e.g., CPP tariffs, load control, etc.), RCTs can effectively fly below the radar as long as it is possible to rotate who is called for each event
• PG&E uses a large RCT to estimate impacts with extremely high accuracy for the SmartAC program
[Figure note: the chart shows 10 lines, all extremely similar, each based on a sample of roughly 12,000 SmartAC customers]
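One way to picture "rotating who is called for each event" is sketched below. This is a hypothetical scheme, not a description of PG&E's actual SmartAC design: customers are randomly split into groups once, and for each event one group is withheld (not called) and serves as the control for that event. The function names, group count, and customer IDs are all assumptions made for illustration.

```python
import random

def assign_groups(customer_ids, n_groups, seed=0):
    """Randomly partition customers into n_groups roughly equal groups."""
    rng = random.Random(seed)      # fixed seed so the assignment is reproducible
    shuffled = list(customer_ids)
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]

def control_group_for_event(event_index, n_groups):
    """Index of the group withheld (used as control) for a given event."""
    return event_index % n_groups

# Hypothetical population of 100 customers split into 10 random groups.
groups = assign_groups(range(100), n_groups=10)

# For 12 events, the withheld group rotates so no group is always the control.
schedule = [control_group_for_event(e, 10) for e in range(12)]
print(schedule)
```

Under this assumed scheme every customer is called for most events, which is why the control condition is unlikely to be noticed by participants.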
RCTs and REDs Have Advantages & Disadvantages (continued)
• REDs are typically much simpler to implement than RCTs because you don’t have to delay or deny the treatment to anyone, but they can also be more expensive because of the need for larger sample sizes
• If enrollment rates and/or expected impacts are small, quite large sample sizes may be required to estimate the intention-to-treat effect
• 10% enrollment with 10% average impacts means you need to estimate a 1% effect (0.10 × 0.10 = 0.01), which requires many thousands of observations
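The arithmetic behind the 1% effect, and why it implies large samples, can be sketched with a textbook two-sample size formula. Everything beyond the 10% enrollment and 10% impact figures is an assumption: the coefficient of variation of customer usage (`cv=0.5`), the 5% significance level, and the 80% power are hypothetical, and the formula ignores the variance reduction that pretreatment data usually provides, so it is only an order-of-magnitude illustration.

```python
import math
from statistics import NormalDist

def itt_effect(enrollment_rate, avg_impact):
    """Diluted effect across everyone offered the treatment."""
    return enrollment_rate * avg_impact

def n_per_group(effect, cv, alpha=0.05, power=0.80):
    """Customers per group for a two-sided, two-sample comparison of means.

    effect and cv (standard deviation of usage divided by mean usage) are
    both expressed as fractions of mean usage.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * cv ** 2 / effect ** 2)

effect = itt_effect(0.10, 0.10)   # 10% enrollment x 10% impact = 1% effect
n = n_per_group(effect, cv=0.5)   # cv=0.5 is a hypothetical assumption
print(f"ITT effect: {effect:.2%}, customers needed per group: {n}")
```

Because the required sample size scales with the inverse square of the effect, halving the enrollment rate roughly quadruples the number of customers needed.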
What Research Questions Can TVP Pilots Help Answer?
• With enough time, money and willpower, most things of interest can be explored through a well-constructed pilot
• California (and the industry) need several more pilots testing default and opt-in enrollment side by side
• SMUD and ComEd came out with very different results
• If legal, it would be great to implement one or more additional default pilots in the next couple of years as input to developing long-term pricing strategy; it is NOT appropriate to extrapolate average impact estimates based on opt-in pilots to default deployments
• Most industry pilots have focused more on load impacts than customer acceptance
• Additional research on customer preferences for different rate options would be useful
• How best to market TVP options can be determined by conducting “mini pilots” within the context of a controlled launch
It is Possible to Conduct “Mini Pilots” or Tests of Marketing Options in the Early Stages of Program Launch
• It’s possible to conduct dozens of tests of marketing options within a few weeks or months of project launch by simultaneously running multiple tests on small random samples of customers
• This can support a multi-stage cycle of continuous improvement in marketing and cost effectiveness
What Are the Primary Methods to Measure End-Use Conservation and Peak Demand Reduction for TVP Rates?
• For existing tariffs and programs, RCTs and REDs typically are not possible
• As discussed before, however, it may be possible to use an RCT for event-based tariffs if notification can be controlled and alternated
• Quasi-experimental methods are typically used to estimate impacts
• “Within-subjects” methods estimate impacts based on reference loads for individual customers developed from periods when the tariff is not in effect (e.g., prior to the customer going on the tariff or, for dynamic rates, on “event-like” days when events are not called)
• Another approach is to develop a comparison (control) group based on statistical matching methods (e.g., propensity score matching). These methods match participants with non-participants that have similar usage patterns and other observable characteristics during the periods when the tariffs are not in effect, and then estimate impacts in the same way as for an RCT or RED
• Both methods rely on the availability of suitable pretreatment data; in the absence of suitable data, both approaches can produce biased results
• With the right data, statistical matching eliminates the need to model weather and other factors that cause usage to fluctuate from one day to the next
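The matching approach can be sketched with a deliberately simple stand-in: nearest-neighbor matching on pretreatment load profiles using Euclidean distance, with replacement. This is not propensity score matching, which the slide names as one example, and all profiles and post-period values below are hypothetical. After matching, the impact is estimated by difference-in-differences between each participant and its matched non-participant.

```python
import math
from statistics import mean

def nearest_match(participant_pre, candidates_pre):
    """Index of the candidate whose pretreatment profile is closest."""
    return min(
        range(len(candidates_pre)),
        key=lambda i: math.dist(participant_pre, candidates_pre[i]),
    )

# Hypothetical pretreatment profiles (kWh in two pretreatment periods).
participants_pre = [[1.0, 2.0], [3.0, 4.0]]
nonparticipants_pre = [[3.1, 3.9], [0.9, 2.1], [5.0, 5.0]]

# Hypothetical average kWh after the tariff takes effect.
participants_post = [1.3, 3.2]
nonparticipants_post = [3.5, 1.5, 5.0]

# Match each participant to its closest non-participant (with replacement).
matches = [nearest_match(p, nonparticipants_pre) for p in participants_pre]

# Difference-in-differences for each matched pair, then averaged.
pair_impacts = []
for p_idx, c_idx in enumerate(matches):
    post_gap = participants_post[p_idx] - nonparticipants_post[c_idx]
    pre_gap = mean(participants_pre[p_idx]) - mean(nonparticipants_pre[c_idx])
    pair_impacts.append(post_gap - pre_gap)

impact = mean(pair_impacts)
print(f"Matched controls: {matches}, estimated impact: {impact:.2f} kWh")
```

The quality of the estimate depends entirely on how well the matched non-participants track the participants during the pretreatment period, which is the slide's point about needing suitable pretreatment data.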
For comments or questions, contact:
Stephen George
Senior Vice President, Utility Services
sgeorge@nexant.com
Nexant, Inc.
101 Montgomery St., 15th Floor
San Francisco, CA 94104
415-777-0707