This update gives a detailed history of the June 2007 Lodestar replication lag, including its root cause, its impact on data extracts, and the mitigation measures taken. It also lists recent extract issues and their resolutions.
Enterprise Information Services
Data Extracts
DEWG 7/30/07
Agenda
• June, 2007
• History
• Root Cause
• Impact
• Mitigation
• Recent Extract Issues
History
• 6/12/07-6/16/07 Lodestar replication lag in 33-hour range
• 6/17/07 Lodestar replication lag jumped to 62 hours
• 6/17/07 Team conf calls on Sunday to address issue
• 6/17/07 Shareplex ports stopped and restarted
• 6/18/07 Server rebooted, DB restarted; lag improved
• 6/19/07 2:00 a.m. Extreme slowdown experienced with replication of Statement Interface table
• 6/19/07 12:30 p.m. Shareplex deadlock/rollback; cost about 10 hours
• 6/19/07 contacted Quest Shareplex support; changed queue size and max-time parameters
• 6/19/07 4:50 p.m. migrated Statement Interface table patch into iTest
• 6/19/07 8:15 p.m. migrated Statement Interface table patch into production
• 6/20/07 6:00 a.m. reached peak lag of 79 hours
• 6/21/07 4:00 a.m. Shareplex deadlock/rollback; cost about 15 min of lag
• 6/21/07 5:00 p.m. Shareplex coredump; contacted Quest Shareplex support, updated to single-thread from multi-thread
• Have not experienced any deadlock/rollback issues since 6/21/07
• 6/27/07 caught up on data extracts day-for-day
Root Cause
• Quest Shareplex deadlock/rollback issues; still working with Quest and analyzing why these occurred
• Log review indicates the first occurrence was on 5/21/07
• Contributors to lag were partition maintenance on 6/14/07, running stats on 6/16/07, and taking the database down on 6/18/07
Impact
• 1-3 day delay in data extracts that include data from Lodestar
• Refer to Market Notice M-A061507-09 for details
• Caught up on extract runs; day-for-day today
• As of 6/28/07, we are not experiencing any Lodestar lag
Mitigation
• Worked with Quest Shareplex support to make parameter changes
• Implemented Statement Interface table patch into production
• Looking at distributing replication load further among the Shareplex ports
• Analyzing replication trigger logic inside the ODS
Recent Extract Issues
• Issue: Some Load Extracts and Settlements and Billing Extracts posted on 6/20/07 are missing data because the extracts were created before parsing of the interval data records had completed.
    • Notice Date: 6/29/07
    • Market Notice: M-B062607-04
    • Root Cause: Utilized Shareplex “multithreading” replication option, which resulted in deadlock/rollback issues that caused out-of-sync conditions.
    • Resolution: Performed compare repair for affected date range.
• Issue: Resettlement Statements for Operating Days 6/6/06 and 6/7/06 were posted on 5/31/07 and 6/1/07 respectively. The Settlement and Billing extracts corresponding to these operating days do not contain the most current data.
    • Notice Date: 7/2/07
    • Market Notice: W-A062907-02
    • Root Cause: Applied a backout and change from a prior batch run after the follow-on batch run had completed.
    • Resolution: Created the Settlements and Billing Extracts for the two affected Operating Days and posted these files to TML on 6/29/07. Determine solution for backing out and changing one batch date after the next batch date has run.
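The first issue above came from extracts being created before interval data parsing had finished. A minimal sketch of a readiness check that would hold an extract back in that situation is shown here. It is illustrative only: `ParseStatus`, its fields, and both function names are hypothetical and do not describe the actual ERCOT or Lodestar job logic, and the sample dates and counts are made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ParseStatus:
    """Hypothetical per-day parsing counters (not a real Lodestar structure)."""
    operating_day: date
    records_received: int
    records_parsed: int


def ready_for_extract(status: ParseStatus) -> bool:
    """Return True only when every received interval record has been parsed."""
    return (
        status.records_received > 0
        and status.records_parsed >= status.records_received
    )


def days_ready_for_extract(statuses):
    """List the operating days whose extracts can safely be created."""
    return [s.operating_day for s in statuses if ready_for_extract(s)]


if __name__ == "__main__":
    # Made-up sample data: one fully parsed day, one still being parsed.
    sample = [
        ParseStatus(date(2000, 1, 1), records_received=100, records_parsed=100),
        ParseStatus(date(2000, 1, 2), records_received=100, records_parsed=60),
    ]
    for day in days_ready_for_extract(sample):
        print(day.isoformat())
```

Under these assumptions, a scheduler would skip any operating day that the check rejects and retry it later, rather than posting a partial extract.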
Recent Extract Issues
• Issue: Missing records in the ESIID Service History and Usage Extract for data records with ADDTIMES/TIMESTAMPS from 4/20/07 to 6/12/07.
    • Notice Date: 7/3/07
    • Market Notice: R-A060607-06
    • Root Cause: Coding issue.
    • Resolution: Once lag was reduced, processed supplemental files and posted on TML on 7/3/07. Coding correction to address initial issue was implemented on 6/12/07.
• Issue: Load Extracts posted Friday, 6/29/07-7/1/07 contained data previously posted due to job scheduling error.
    • Notice Date: 7/3/07
    • Market Notice: M-B070207-02
    • Root Cause: Related to W-A062907-02; neglected to manually switch the date back after the date had run.
    • Resolution: Established a procedure to account for manual changes made when manipulating the range data set. Determine solution for backing out and changing one batch date after the next batch date has run.
• Issue: Some Settlements and Billing Extracts posted on 6/10/07 and 7/4/07 were missing data.
    • Notice Date: 7/16/07
    • Market Notice: W-A071207-03
    • Root Cause: Caused by duplicate and unparsed interval data related to Shareplex “multi-threading” replication option.
    • Resolution: Both extracts were reposted. Only “single threading” replication option currently utilized.
Recent Extract Issues
• Issue: Bids and Schedules Extracts were missing data for 7/8/07-7/9/07.
    • Notice Date: 7/17/07
    • Market Notice: W-A071307-03
    • Root Cause: Server was down due to the Retail/Wholesale planned outage on Sunday, 7/8/07. Job did not run.
    • Resolution: Closer attention to detail in planned outage activities.
• Issue: Additional Load, Generation and Settlements and Billing Extracts missing data.
    • Notice Date: 7/23/07
    • Market Notice: M-A072307
    • Root Cause: Caused by duplicate and unparsed interval data related to Shareplex “multi-threading” replication option.
    • Resolution: Both extracts were reposted. Only “single threading” replication option currently utilized.
• Issue: Issue with delivery mechanism for extracts and reports.
    • Notice Date: 7/23/07
    • Market Notice: M-B072007-03
    • Root Cause: Files not delivered to MIR.
    • Resolution: ERCOT implemented work-around for the MID/MIR issue. Plan to perform additional analysis on current architecture.
Recent Extract Issues
• Issue: MIMO Exceptions and Mapping Status Reject Extracts for 7/18/07 and 7/19/07 have not posted.
    • Notice Date: 7/23/07
    • Market Notice: R-B072307-01
    • Root Cause: Related to 7/21/07 and 7/22/07 maintenance outage.
    • Resolution: Pending ETS Project and manual workarounds.
• Issue: Impact to the SCR 727 ESIID Service History and Usage Extract load order number caused by an internal work-around.
    • Notice Date: 7/27/07
    • Market Notice: M-B072007-04
    • Root Cause: Initial cause of rerun was due to MIR issue in M-B072007-03.
    • Resolution: Rerun process for SCR 727 is being modified.