
Data Archiving @ SAP

Axel Herbst, Performance, Data Management & Scalability, SAP AG


Presentation Transcript


  1. Data Archiving @ SAP
     Axel Herbst, Performance, Data Management & Scalability, SAP AG

  2. The Life Cycle of SAP Data
     [Figure: importance of data and data access over time t, with the milestones creation, business complete (non-changeable), residence time, audits, and deletion; storage tiers: database, file system, storage system.]

  3. Legal Compliance: Record Retention Periods
     Minimum retention period on compliant media (years). Source: Enterprise Storage Group, May 2003
     Pharmaceuticals / Life Sciences (21 CFR Part 11)
     • 2 years after commercial release: records relating to the manufacturing, processing, and packing of food
     • 3 years after distribution: records relating to the manufacturing, processing, and packing of drugs and pharmaceuticals
     • 5 years after end of manufacturing of the product: records relating to the manufacturing of biological products
     Healthcare (HIPAA)
     • 5 years: all hospitals must retain records in their original or legally reproduced form
     • 21 years or more (perhaps for life): medical records for minors, from birth to 21
     • 2 years after the patient's death: medical records
     Financial Services (SEC 17a-4)
     • 3 years: financial statements
     • End of life of the enterprise: member registration for broker/dealers
     • End of account + 6 years: trading account records
     OSHA
     • 30 years: employee and medical records of individuals exposed to toxic substances
     Sarbanes-Oxley
     • 4 years after completion of audit: original correspondence from financial audits of publicly traded corporations
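The retention periods above are measured from different trigger events (commercial release, distribution, end of account, and so on). The sketch below shows how an archiving tool might turn such a rule into the earliest date on which a record may be destroyed. It assumes a hypothetical rule table (`RetentionRule`, `RULES`) that stores a fixed number of years after a named trigger event. Open-ended rules such as "end of life of the enterprise" are deliberately not modeled, and none of this is legal advice or an SAP API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionRule:
    """Hypothetical rule: keep records `years` years after a trigger event."""
    description: str
    trigger: str  # name of the event the period is measured from
    years: int


def add_years(d: date, years: int) -> date:
    """Add whole years; 29 February falls back to 28 February in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def earliest_destruction_date(rule: RetentionRule, trigger_date: date) -> date:
    return add_years(trigger_date, rule.years)


# A few rows from the slide, encoded as fixed-year rules (illustrative only)
RULES = {
    "food": RetentionRule("Food manufacturing records", "commercial release", 2),
    "drugs": RetentionRule("Drug manufacturing records", "distribution", 3),
    "trading": RetentionRule("Trading account records", "end of account", 6),
}

if __name__ == "__main__":
    print(earliest_destruction_date(RULES["trading"], date(2004, 2, 29)))  # 2010-02-28
```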

  4. Architecture of an ABAP-Based SAP Component
     [Figure: SAP GUI clients, a web server, and front-end applications (e.g. Word) connect through gateways and dispatchers to work processes (WP) of types DIA, BTC, SPO, and ENQ; the work processes use shared memory with enqueue and table buffers; DBWP elements link them to the DBMS and the database.]

  5. Size of Database
     Survey results from the American SAP User Group (ASUG), archiving track, 1999 vs. 2002
     [Chart: distribution of database sizes in the classes < 300 GB, 300-500 GB, 500-700 GB, 700-999 GB, 1.0-1.5 TB, and > 1.5 TB.]

  6. Today's Requirements for RDBMSs in SAP Systems
     • Requirements concerning business objects
       • Several TB of net data in the RDBMS
       • Size of a business object between a few KB and several hundred MB
       • Typically at least 3 search criteria:
         • Business partner
         • Material/product/service/...
         • Date
       • Business objects must remain accessible for several (even 10+) years
     • Requirements concerning OLTP
       • 1000 database transactions per second
       • 50 MB/s write access to database files
       • Database transactions change hundreds of tables (dozens of business objects)
       • Recovery in less than 15 minutes
       • 7×24 hours (2 hours of planned downtime a month)

  7. System Availability: Downtime Costs per Hour
     [Chart; source: Storage Magazine]

  8. Use of Resources: Storage Savings
     [Figure: data records 10 GB; mirrors 20 GB; system copies 80 GB. Archiving with compression to 20 %: 2 GB in the archive and 2 GB backup to tape. Total savings in this scenario: 97.5 %.]

  9. Long-Term Issues
     • Costs
       • Efficient archive format: compact, indexable
       • Inexpensive storage system/media
       • Easy administration (low TCO)
     • Long-term reuse
       • Appropriate search/query capability to find archived data
       • Readability of the archive
       • Replaceable/maintainable/migratable hardware and software components
       • Interpretability (self-contained and self-describing format, schema)
       • Application connectivity (open interface/protocol)
     • Application integration
       • Enough semantics/context for future use
       • Archiving of referenced documents
     • Security
       • Write-once guarantee (for legal, tax, and audit reasons)
       • Authorization, authentication, non-repudiation, privacy, encryption
     • Operations (archiving in a production environment)
       • Mass data transfer from the DB to the archive
       • Automation, monitoring, ...

  10. Summary: Why Data Archiving?
     [Chart: DB space over time]
     • Controlled growth of the database
       • Cost savings
         • Lower hardware costs
         • Lower administration costs
       • Better performance
         • Faster backup and restore
         • Faster creation of CBO statistics
         • Faster release upgrades
         • Shorter response times for end users in business transactions involving the data that remains in the DB
     • Long-term storage of data in a "usable" form
       • ... because of new legal requirements (GDPdU, SOA) ...
       • ... despite more "movement" in the IT landscape (end of life of the system vs. end of life of the data)

  11. Application-Oriented DB Archiving
     Architectural classification with respect to the provider of the archiving service:
     • DBMS-based: the application provides the archiving functionality on top of the DBMS
     • DBMS-integrated: the DBMS itself provides the archiving functionality
     [Figure: in both variants, data moves from the DB to an archive.]

  12. SAP Data Archiving
     For SAP, only the DBMS-based approach is an option; SAP applications use the archiving functionality of SAP Basis.
     • DBMS independence (DB2/..., Informix, MS SQL Server, Oracle, MaxDB)
       • Availability: the DBMS-integrated approach exists only in DB2/390 RAM
       • Even "standard" SQL needs to be unified (SAP "Open SQL")
     • Tertiary storage integration
       • Must be vendor-independent as well: dedicated archiving/content management (CM) systems, connected via a certified interface, store the archive files
     • Application "awareness" of archiving
       • The DB schema contains hardly any application semantics (almost no integrity constraints at DB level, no business context for archive access, ...)
     [Figure: layers from top to bottom: SAP application with archive integration; SAP Basis archiving functionality ("ADK"); DBMS; with DB and archive.]

  13. Data Archiving from the SAP Customer's Perspective
     [Figure: application data (data objects) is written from the database to archive files and then deleted from the database; evaluations/access read the archived data from the archive files.]

  14. Application Reference through Archiving Objects
     • Archiving object
       • Structure: definition of the data that belongs together logically (from a business perspective) and is archived together, including its context = "schema" (but not an archive schema)
       • Behavior: programs, application-specific checks
       • Technical settings
     [Figure: an archiving object groups data, customizing, and programs; examples: archiving objects for FI documents and for CO documents.]

  15. Programs of an Archiving Object
     ... visible as actions for the data archiving (DA) administrator

  16. Core Functions in Archiving Programs I
     Write program
     • Authorization checks
     • Reading the application-specific customizing
     • Reading from the database: SELECTs from all tables/views involved
     • Archivability check
     • Creating a new archiving session
     • Building complex (data) objects from the records read
     • Saving the data objects in the archive file
     • Writing a log
     • Closing the archiving session
     • If applicable, triggering a follow-up action (delete or store)
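The write-program steps above can be condensed into a short sketch. It is not ADK and not ABAP. It assumes a hypothetical two-table document (`doc_header`, `doc_item`) in SQLite, a made-up role name `ARCHIVE_ADMIN`, a residence time as the only customizing value, and one JSON object per line as a stand-in for the archive file format.

```python
import io
import json
import sqlite3
from datetime import date, timedelta


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical header/item document tables, not an SAP schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE doc_header(doc_id INTEGER PRIMARY KEY, status TEXT, changed_on TEXT);"
        "CREATE TABLE doc_item(doc_id INTEGER, item_no INTEGER, amount REAL);")
    conn.executemany("INSERT INTO doc_header VALUES (?, ?, ?)",
                     [(1, "COMPLETE", "2003-01-15"), (2, "OPEN", "2003-01-20"),
                      (3, "COMPLETE", "2004-12-01")])
    conn.executemany("INSERT INTO doc_item VALUES (?, ?, ?)",
                     [(1, 10, 100.0), (1, 20, 50.0), (2, 10, 10.0), (3, 10, 5.0)])
    return conn


def write_program(conn, out, user_roles, residence_days, today):
    # Authorization check (the role name is an assumption of this sketch)
    if "ARCHIVE_ADMIN" not in user_roles:
        raise PermissionError("user is not allowed to archive")
    # Application-specific customizing, reduced here to one residence time
    cutoff = today - timedelta(days=residence_days)
    session = {"written": 0, "skipped": 0}  # stands in for a new archiving session
    headers = conn.execute(
        "SELECT doc_id, status, changed_on FROM doc_header ORDER BY doc_id").fetchall()
    for doc_id, status, changed_on in headers:
        # Archivability check: business-complete and residence time elapsed
        if status != "COMPLETE" or date.fromisoformat(changed_on) > cutoff:
            session["skipped"] += 1
            continue
        items = conn.execute(
            "SELECT item_no, amount FROM doc_item WHERE doc_id = ? ORDER BY item_no",
            (doc_id,)).fetchall()
        # Build one complex data object from the records read, then save it
        data_object = {"doc_id": doc_id, "items": [list(row) for row in items]}
        out.write(json.dumps(data_object) + "\n")
        session["written"] += 1
    # Write the log and close the session; a delete run could be triggered here
    print(f"archiving session: {session['written']} written, {session['skipped']} skipped")
    return session


if __name__ == "__main__":
    archive = io.StringIO()
    write_program(make_demo_db(), archive, {"ARCHIVE_ADMIN"}, 365, date(2005, 1, 1))
    print(archive.getvalue(), end="")
```

Note that the write program never deletes anything. Deletion happens in a separate phase that reads the archive file back, which is the two-phase design described on the following slides.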

  17. Example of Archivability Checks
     • Dependencies in the document flow
     • Taking the residence time into account
     [Figure: document flow sales document, delivery, billing document (archiving objects SD_VBAK, RV_LIKP, SD_VBRK); timeline of a sales document with entry date, change date, start 1, start 2, residence time, "reached", and archiving.]
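Both kinds of checks named above can be combined into one predicate. This is a sketch with two loudly labeled assumptions. First, the residence time starts at the later of the entry date and the change date (one reading of "start 1 / start 2"). Second, a document is archivable only once all its follow-on documents are already archived. This is a hypothetical dependency rule for illustration, not the documented behavior of SD_VBAK, RV_LIKP, or SD_VBRK. The document numbers are invented.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Document:
    doc_id: str
    archiving_object: str               # e.g. "SD_VBAK", "RV_LIKP", "SD_VBRK"
    created_on: date                    # entry date
    changed_on: Optional[date] = None   # last change date, if any
    complete: bool = True               # business-complete status
    follow_ons: list = field(default_factory=list)  # IDs of successor documents
    archived: bool = False


def residence_time_reached(doc: Document, residence_days: int, today: date) -> bool:
    # Assumption: the residence time starts at the later of entry and change date
    start = max(doc.created_on, doc.changed_on or doc.created_on)
    return (today - start).days >= residence_days


def is_archivable(doc: Document, docs: dict, residence_days: int, today: date):
    if not doc.complete:
        return False, "not business-complete"
    if not residence_time_reached(doc, residence_days, today):
        return False, "residence time not reached"
    # Hypothetical dependency rule: all follow-on documents must already be archived
    pending = [d for d in doc.follow_ons if not docs[d].archived]
    if pending:
        return False, "follow-on documents not archived: " + ", ".join(pending)
    return True, "archivable"


def sample_flow() -> dict:
    """Hypothetical flow: sales document -> delivery -> billing document."""
    return {
        "4711": Document("4711", "SD_VBAK", date(2003, 1, 10), date(2003, 2, 1),
                         follow_ons=["80001"]),
        "80001": Document("80001", "RV_LIKP", date(2003, 2, 5), follow_ons=["90001"]),
        "90001": Document("90001", "SD_VBRK", date(2003, 2, 15)),
    }


if __name__ == "__main__":
    docs = sample_flow()
    for doc in docs.values():
        print(doc.archiving_object, doc.doc_id, is_archivable(doc, docs, 365, date(2004, 6, 1)))
```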

  18. Core Functions in Archiving Programs II
     Delete program
     • Authorization checks
     • Reading the technical, object-specific customizing (e.g. transaction boundaries)
     • Reading the data objects from the archive file, adapted to the current platform and the current release (DB schema!)
     • Deleting the corresponding data (records) in the database (SAP Business Information Warehouse: DROP PARTITION)
     • Writing a log
     • Completion tasks
     • If applicable, triggering a follow-up action (store or postprocessing)
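The key property of the delete program is that it deletes only what it can read back from the archive file. It also commits in chunks, as set by the object-specific transaction boundaries. A minimal sketch follows, assuming the same hypothetical header/item tables and JSON-lines archive file as in a write-program sketch. Here `commit_every` stands in for the customizing value, and the platform/release conversions are omitted.

```python
import io
import json
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical header/item document tables, not an SAP schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE doc_header(doc_id INTEGER PRIMARY KEY, status TEXT, changed_on TEXT);"
        "CREATE TABLE doc_item(doc_id INTEGER, item_no INTEGER, amount REAL);")
    conn.executemany("INSERT INTO doc_header VALUES (?, ?, ?)",
                     [(1, "COMPLETE", "2003-01-15"), (2, "OPEN", "2003-01-20"),
                      (3, "COMPLETE", "2004-12-01")])
    conn.executemany("INSERT INTO doc_item VALUES (?, ?, ?)",
                     [(1, 10, 100.0), (1, 20, 50.0), (2, 10, 10.0), (3, 10, 5.0)])
    return conn


def delete_program(conn: sqlite3.Connection, archive_file, commit_every: int = 100) -> int:
    """Delete from the DB exactly the data objects found in the archive file."""
    deleted = 0
    for line in archive_file:
        if not line.strip():
            continue
        # Read the data object back from the archive file (conversions omitted)
        doc_id = json.loads(line)["doc_id"]
        conn.execute("DELETE FROM doc_item WHERE doc_id = ?", (doc_id,))
        conn.execute("DELETE FROM doc_header WHERE doc_id = ?", (doc_id,))
        deleted += 1
        # Transaction boundary: commit after every `commit_every` data objects
        if deleted % commit_every == 0:
            conn.commit()
    conn.commit()
    print(f"delete run: {deleted} data objects deleted")
    return deleted


if __name__ == "__main__":
    db = make_demo_db()
    delete_program(db, io.StringIO('{"doc_id": 1}\n{"doc_id": 3}\n'), commit_every=1)
```

Driving deletion from the archive file rather than from a new SELECT means a record can never be deleted unless it was successfully written and can be read back.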

  19. Archive Access
     Visibility of archived data: separate archive or integrated archive ... depending on the SAP application!
     • Deliberate distinction between DB and archive, or uniform access when displaying data from within the application transaction
     • Generic tools
       • Archive Information System
       • Document Relationship Browser

  20. The Archive Development Kit (ADK)
     • ADK = generic development and runtime environment for ABAP archiving programs
       • Used by SAP for all archiving objects
         • Writing, deleting, reading, reloading, converting
         • Used by the Archive Information System
       • Released to customers for developing their own archiving objects
     • ADK components
       • Runtime system
         • Encapsulation of application-independent basis functions, especially "toward the archive"
       • Repository
         • Administration data and metadata
       • Administration
         • User interface for the DA administrator

  21. ADK as a Runtime Environment
     Services of the ADK runtime system:
     • Data object services
     • Conversions
     • File management
     • Status management
     • Class calls
     • Statistics module
     • CMS connection
     • Storage system (AS) calls
     • Monitoring
     • Job control
     [Figure components: SAP system with archiving programs, archive administration, ADK runtime system, background processing, and DA monitor; database with application data and administration/metadata; file system with archive files; ArchiveLink / Content Management Service; HSM system; storage system (AS). Arrows distinguish data flow (when writing) from control flow.]

  22. Temporary Conversions during Read Access
     Platform adaptation
     • File path according to the current operating system
     • Character set (code page) and number format
       • for ADK format metadata (header, tags)
       • for character-type and numeric application data
     • Requires "bootstrapping" when reading the header
     Structure adaptation: handling schema evolution
     • The schema in the archive file is compared with the current schema (from the DDIC)
     • For structures (tables) with the same name:
       • Fields with the same name must be assignment-compatible
       • Fields that did not yet exist at archiving time get their initial value
       • Fields that no longer exist are hidden
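The structure-adaptation rules above can be illustrated with a small converter. The following assumptions are invented for illustration and are not the ABAP Dictionary's actual compatibility rules:
• the type names CHAR, INT, DEC, and DATE and their initial values
• the single widening rule that treats INT as assignable to DEC

```python
from typing import Any

# Hypothetical type system for this sketch (not ABAP Dictionary types)
INITIAL_VALUES = {"CHAR": "", "INT": 0, "DEC": 0.0, "DATE": "00000000"}
# (source type, target type) pairs accepted in addition to identical types
WIDENING = {("INT", "DEC")}


def compatible(src: str, dst: str) -> bool:
    return src == dst or (src, dst) in WIDENING


def convert_record(record: dict[str, Any], archived_schema: dict[str, str],
                   current_schema: dict[str, str]) -> dict[str, Any]:
    """Adapt a record written under archived_schema to current_schema."""
    converted: dict[str, Any] = {}
    for name, dst_type in current_schema.items():
        if name not in archived_schema:
            # Field did not exist at archiving time: use its initial value
            converted[name] = INITIAL_VALUES[dst_type]
            continue
        src_type = archived_schema[name]
        # Same-name fields must be assignment-compatible
        if not compatible(src_type, dst_type):
            raise TypeError(f"field {name}: {src_type} is not assignable to {dst_type}")
        value = record[name]
        converted[name] = float(value) if (src_type, dst_type) == ("INT", "DEC") else value
    # Fields that exist only in archived_schema are not copied, i.e. hidden
    return converted


if __name__ == "__main__":
    archived = {"doc_no": "CHAR", "net_value": "INT", "old_flag": "CHAR"}
    current = {"doc_no": "CHAR", "net_value": "DEC", "currency": "CHAR"}
    print(convert_record({"doc_no": "4711", "net_value": 100, "old_flag": "X"},
                         archived, current))
```

The conversion is temporary: the archive file keeps its original schema, and only the in-memory view is adapted on each read.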

  23. Transactional Aspects
     Consistency between database and file system
     • A physical file becomes logically visible only once "fclose" has succeeded; i.e. the data objects passed by the write program are not archived logically synchronously
     • "Orphan files" are uncritical
     Consistency between deleting from the database and writing to the archive
     • Pragmatic, effective solution through a two-phase approach
     • Changes by the application must be excluded between the write phase and the delete phase:
       • ensure read-only access and archivability via status, or
       • allow uncritical changes, or
       • manage application locks
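The rule "a file becomes logically visible only after a successful close" can be sketched as follows. The physical file is written and flushed first. Only then is it registered in an administration table, which is a hypothetical stand-in for ADK archive management. A file left behind by a crash is never registered, so no delete run ever uses it; that is why orphan files are uncritical.

```python
import os
import sqlite3
import tempfile


def write_archive_file(conn: sqlite3.Connection, directory: str, name: str,
                       payload: bytes) -> str:
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # force the data to disk before the file counts as written
    # Only reached if writing and closing succeeded: now make the file logically visible
    conn.execute("INSERT INTO archive_files(name, size) VALUES (?, ?)", (name, len(payload)))
    conn.commit()
    return path


def orphan_files(conn: sqlite3.Connection, directory: str) -> list:
    """Files on disk that were never registered; no delete run will use them."""
    registered = {row[0] for row in conn.execute("SELECT name FROM archive_files")}
    return sorted(set(os.listdir(directory)) - registered)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE archive_files(name TEXT PRIMARY KEY, size INTEGER)")
    with tempfile.TemporaryDirectory() as d:
        write_archive_file(conn, d, "000001.arch", b"data objects ...")
        # Simulate a write program that stopped before registering its file
        with open(os.path.join(d, "000002.arch"), "wb") as f:
            f.write(b"partial")
        visible = [row[0] for row in conn.execute("SELECT name FROM archive_files")]
        return visible, orphan_files(conn, d)


if __name__ == "__main__":
    print(demo())  # (['000001.arch'], ['000002.arch'])
```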

  24. Representation of the Phases in Archive Management
     [Screenshot: one write phase, followed by a delete phase for file 1 and a delete phase for file 2.]

  25. Selecting a Suitable Archiving Object

  26. Evaluating Data Archiving Statistics

  27. Database Page Concepts
     [Figure labels: overhead area; data rows; free space (minimum level); area reserved for update extension; data removed during archiving session; additional session extends.]

  28. Free Space Fragmentation
     [Figure labels: overhead area; area reserved for update extension.]

  29. Index Fragmentation
     [Figure: index and table after delete processes.]
     Effect:
     • Index range scans: more index leaf blocks are read
     • More tree levels
     Consequence: index reorganization/rebuild!

  30. Customer Example
     • Starting point: approx. 290 GB DB size and approx. 15 GB DB growth per month
     • Aim: reduce the DB growth rate in order to
       • reduce hardware costs
       • maintain stable system performance (response times and system administration)
       • implement support packages, upgrade projects, and local currency conversion faster
     • Archiving: 19 archiving objects from FI, CO, MM, SD, and HR
     • Result (after 15 months): > 200 GB archived

  31. Customer Example (continued)
     [Chart: DB size in GB (0 to 700) per month; series: expected size without archiving, allocated DB size, allocated DB content. Phases: without archiving (DB growth ~15 GB/month), initial archiving (reduction ~60 GB), regular archiving (DB growth ~7 GB/month).]

  32. ADK-based data archiving has proved to be a valued concept and implementation for ABAP-based SAP components, and will continue to do so. However, let's review the long-term issues and discuss the pros and cons of new ideas.

  33. Drivers for XML-Based Archiving
     • Standards for long-term use of archived objects
       • XML
       • HTTP(S)
       • WebDAV
       • Java
       • J2EE
     • Archive access
       • Independent of the "home" system
       • Potentially cross-system
       • For Java applications as well
     • Central archiving service
       • Reduced redundancy in distributed scenarios
       • Central administration

  34. XML-Based Archiving: A Service Approach
     [Figure: several mySAP application systems, each with its database and XML services, connect via XML DAS connectors to a central XML Data Archiving Service (XML DAS); XML DAS keeps the business data archive as XML on an open and inexpensive storage system; an archive browser is also shown.]

  35. ADK vs. XML Archiving
     [Figure: two stacks on SAP Web AS. ADK side: ADK archiving programs and local archive administration use ADK, which writes files to the file system and, via ArchiveLink, to a storage system. XML side: XML archiving programs and local archive administration use the XML Archive API and the XML DAS connector, which communicates over HTTP with the XML Data Archiving Service (XML DAS) and XML DAS administration on the SAP J2EE Engine; XML DAS stores resources via WebDAV, the file system, or ArchiveLink in storage systems.]

  36. WebDAV-Like Archive Hierarchy
     • Collection
       • Archive path
       • Name
     • Resource
       • Archive path
       • Name
       • URI = archive path + name
     Example hierarchy:
     • System ID: /b4t/
     • Client: /b4t/000/
     • XML archiving object: /b4t/000/bc_sbook_x/
     • Collection for 2003: /b4t/000/bc_sbook_x/2003/
     • Resources: /b4t/000/bc_sbook_x/2003/order_schema.xsd (XSD), /b4t/000/bc_sbook_x/2003/order_4711.xml and /b4t/000/bc_sbook_x/2003/order_4712.xml (XML)
     [Figure also shows the archive stores Arch. Store1, Arch. Store2, and Arch. Store3.]
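The naming rule "URI = archive path + name" can be sketched with a tiny node type that reproduces the example paths above. The names `Node`, `child`, and `split_uri` are hypothetical. Collections are assumed to get a trailing slash, as in the slide's paths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    archive_path: str  # URI of the parent collection, always ending in "/"
    name: str
    is_collection: bool

    @property
    def uri(self) -> str:
        # URI = archive path + name; collections get a trailing "/"
        return self.archive_path + self.name + ("/" if self.is_collection else "")


def child(parent: Node, name: str, is_collection: bool = False) -> Node:
    if not parent.is_collection:
        raise ValueError("only collections can contain other nodes")
    return Node(parent.uri, name, is_collection)


def split_uri(uri: str) -> tuple[str, str]:
    """Inverse of Node.uri: split a URI into (archive path, name)."""
    stripped = uri.rstrip("/")
    cut = stripped.rfind("/") + 1
    return stripped[:cut], stripped[cut:]


if __name__ == "__main__":
    system = Node("/", "b4t", True)                     # system ID
    client = child(system, "000", True)                 # client
    obj = child(client, "bc_sbook_x", True)             # XML archiving object
    year = child(obj, "2003", True)
    order = child(year, "order_4711.xml")
    print(order.uri, split_uri(order.uri))
```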

  37. Properties of Resources
     • WebDAV concept for attributing and finding resources
     • A set of properties can be defined independently of resources; such a property set is called a property index.
     • The property index is used to attach property values to a resource, either when the resource is archived or later on.
     • Properties are typed:
       • VARCHAR(n)
       • …
     • Properties are used in value-based queries.
     Example (XML resource): OrderNumber: 4711; OrderDate: 2005/01/10; ShipTo: Eppingen-Rohrbach
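A property index as described above consists of typed property definitions that exist independently of resources, plus values attached per resource URI. It can be sketched in a few lines. The sketch assumes Python types (`int`, `date`, `str`) in place of VARCHAR(n) and similar types. `PropertyIndex`, `attach`, and `query` are hypothetical names, not the XML DAS API.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PropertyIndex:
    """Hypothetical property index: typed definitions plus values per resource URI."""
    name: str
    types: dict  # property name -> Python type (stand-in for VARCHAR(n), ...)
    values: dict = field(default_factory=dict)  # URI -> {property: value}

    def attach(self, uri: str, **props: object) -> None:
        # Values can be attached when the resource is archived or later on
        for key, value in props.items():
            expected = self.types.get(key)
            if expected is None or not isinstance(value, expected):
                raise TypeError(f"invalid property {key}={value!r} for index {self.name}")
        self.values.setdefault(uri, {}).update(props)

    def query(self, **criteria: object) -> list:
        """Value-based query: URIs whose properties equal all given criteria."""
        return sorted(uri for uri, props in self.values.items()
                      if all(props.get(k) == v for k, v in criteria.items()))


if __name__ == "__main__":
    idx = PropertyIndex("order_index", {"OrderNumber": int, "OrderDate": date, "ShipTo": str})
    uri = "/b4t/000/bc_sbook_x/2003/order_4711.xml"
    idx.attach(uri, OrderNumber=4711, OrderDate=date(2005, 1, 10))
    idx.attach(uri, ShipTo="Eppingen-Rohrbach")  # attached later on
    print(idx.query(ShipTo="Eppingen-Rohrbach"))
```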

  38. Storage System Support
     [Figure: three layers.
     Application layer: resources (XML doc 1 ... XML doc n) are passed as XML documents/streams through a URI service interface with WebDAV-like HTTP methods (MKCOL, PUT).
     Global service layer: internal storage abstraction.
     Storage layer: WebDAV client, file system I/O, storage system protocols.
     Mapping: WebDAV URL to physical path; collection to directory; resource to file.]
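The storage-layer mapping above (collection to directory, resource to file) can be sketched for the file-system case. The sketch implements MKCOL and PUT on top of a directory tree. The MKCOL status codes 201, 405, and 409 follow WebDAV (RFC 4918). Returning 409 for a PUT to an existing resource is an assumption that models write-once behavior; standard WebDAV would overwrite. `FileSystemStore` is a hypothetical name.

```python
import os
import tempfile


class FileSystemStore:
    """Hypothetical storage layer: collection -> directory, resource -> file."""

    def __init__(self, root: str):
        self.root = root

    def phys_path(self, uri: str) -> str:
        # Map the logical WebDAV-like URI onto a physical path below the store root
        parts = [p for p in uri.split("/") if p]
        if any(p in (".", "..") for p in parts):
            raise ValueError("path traversal not allowed")
        return os.path.join(self.root, *parts)

    def mkcol(self, uri: str) -> int:
        path = self.phys_path(uri)
        if os.path.exists(path):
            return 405  # target already exists
        if not os.path.isdir(os.path.dirname(path)):
            return 409  # parent collection missing
        os.mkdir(path)
        return 201

    def put(self, uri: str, body: bytes) -> int:
        path = self.phys_path(uri)
        if not os.path.isdir(os.path.dirname(path)):
            return 409
        try:
            # Mode "xb" fails if the file exists, so each resource is written once
            with open(path, "xb") as f:
                f.write(body)
        except FileExistsError:
            return 409  # assumption: write-once conflict
        return 201


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = FileSystemStore(d)
        print(store.mkcol("/b4t/"), store.mkcol("/b4t/000/"),
              store.put("/b4t/000/order_4711.xml", b"<order/>"))
```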

  39. Open Archive File Format
     • A wrapper format ensuring...
       • efficiency (many data objects, byte-addressable for random access)
       • long-term interpretability (XML schema, XML, easy to construct)
     [Figure: file layout: a header (an XML resource), then for each resource 1 ... n a prefix followed by the compressed resource at offset 1 ... n; resources are read with standard decompression.]
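The layout above (an XML header with offsets, then prefixed compressed resources) can be sketched to show why it is both easy to construct and byte-addressable. This is a hypothetical layout, not SAP's actual file format:
• a 4-byte big-endian header length
• an XML header listing each resource's offset relative to the data section
• for each resource, a 4-byte length prefix followed by zlib-compressed data

```python
import struct
import xml.etree.ElementTree as ET
import zlib


def pack(resources: dict) -> bytes:
    """Build one packed archive from {name: bytes} (hypothetical layout)."""
    body = bytearray()
    header = ET.Element("archive")
    for name, data in resources.items():
        compressed = zlib.compress(data)
        # Offsets are relative to the start of the data section after the header
        ET.SubElement(header, "resource", name=name, offset=str(len(body)))
        body += struct.pack(">I", len(compressed)) + compressed  # prefix = length
    header_bytes = ET.tostring(header, encoding="utf-8")
    return struct.pack(">I", len(header_bytes)) + header_bytes + bytes(body)


def read_resource(blob: bytes, name: str) -> bytes:
    """Random access: parse the header, then jump straight to one resource."""
    (header_len,) = struct.unpack_from(">I", blob, 0)
    header = ET.fromstring(blob[4:4 + header_len])
    data_start = 4 + header_len
    for entry in header.iter("resource"):
        if entry.get("name") == name:
            pos = data_start + int(entry.get("offset"))
            (length,) = struct.unpack_from(">I", blob, pos)
            return zlib.decompress(blob[pos + 4:pos + 4 + length])
    raise KeyError(name)


if __name__ == "__main__":
    blob = pack({"order_4711.xml": b"<order id='4711'/>",
                 "order_4712.xml": b"<order id='4712'/>"})
    print(read_resource(blob, "order_4712.xml"))
```

A real file would be accessed with seek and read instead of slicing a byte string, but the principle is the same. Only the header and one resource are touched, regardless of how many resources the file contains.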

  40. XML DAS Functionality Beyond WebDAV
     Write once: resources cannot be modified
     • Insertion of resources into collections can be disabled
     • Deletions are possible (and are logged)
     Identification of resources
     • Via a unique URI
     • Long-term and stable
     Archive queries
     • Hierarchical search (using the path)
     • Value-based search (with property indexes)
     • Future: content search using other engines
     No WebDAV specification ambiguities
     • No case sensitivity anywhere
     • Within one collection, a resource and a collection cannot have the same name
     • More status codes
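Several of the rules listed above can be enforced by a simple in-memory collection:
• write once
• insertion can be disabled
• deletions are logged
• names are case-insensitive
• a resource and a collection never share a name

This is a sketch only. `Collection` and its methods are hypothetical names, and the exceptions stand in for whatever status codes XML DAS actually returns.

```python
import logging

log = logging.getLogger("xmldas.sketch")


class Collection:
    """Hypothetical collection enforcing the rules listed on the slide."""

    def __init__(self, uri: str):
        self.uri = uri
        self.insert_allowed = True
        self._entries = {}  # case-folded name -> (original name, content or sub-collection)

    def _key(self, name: str) -> str:
        return name.casefold()  # no case sensitivity anywhere

    def add_collection(self, name: str) -> "Collection":
        if self._key(name) in self._entries:
            raise FileExistsError(f"{name!r} already exists in {self.uri}")
        sub = Collection(self.uri + name + "/")
        self._entries[self._key(name)] = (name, sub)
        return sub

    def put(self, name: str, data: bytes) -> str:
        if not self.insert_allowed:
            raise PermissionError(f"insertion disabled for {self.uri}")
        # Write once: any existing resource or collection of that name blocks the PUT
        if self._key(name) in self._entries:
            raise FileExistsError(f"{name!r} already exists in {self.uri}")
        self._entries[self._key(name)] = (name, bytes(data))
        return self.uri + name

    def delete(self, name: str) -> None:
        original, _ = self._entries.pop(self._key(name))
        log.info("deleted %s%s", self.uri, original)  # deletions are logged


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    c = Collection("/b4t/000/bc_sbook_x/2003/")
    print(c.put("order_4711.xml", b"<order/>"))
    c.delete("ORDER_4711.XML")
```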

  41. XML DAS Functionality Beyond WebDAV
     XML awareness
     • TYPE = { XML | XSD | XSL | BIN | COL | RES | ALL | ... }
     • Checks XML for well-formedness and validity against its schema
     • Keeps metadata (XSD, XSL) "close" to the business data
     Automatic naming
     • Unique name for a resource within a collection
     Integration of different storage systems
     • Independent of the logical XML DAS hierarchy
     • Supports data life cycle management
     Archiving application integration
     • For example, support for safe deletion from the DB, even in one phase
     Security
     • Authorization, HTTPS, checksum
     Pack resources
     • Optional
     • Asynchronous
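Two of the features above can be sketched with the Python standard library: a well-formedness check on incoming XML, and automatic naming of resources within a collection. Validation against an XSD is not covered, because `xml.etree.ElementTree` does not support it. The naming scheme `prefix_N.suffix` is a hypothetical choice, and names are compared case-insensitively.

```python
import itertools
import xml.etree.ElementTree as ET


def check_well_formed(data: bytes) -> bool:
    """Well-formedness only; schema validation would need an extra library."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


def automatic_name(existing: set, prefix: str = "res", suffix: str = ".xml") -> str:
    """Hypothetical scheme: first free prefix_N.suffix, compared case-insensitively."""
    taken = {n.casefold() for n in existing}
    for i in itertools.count(1):
        candidate = f"{prefix}_{i}{suffix}"
        if candidate.casefold() not in taken:
            return candidate


if __name__ == "__main__":
    print(check_well_formed(b"<order><id>4711</id></order>"))
    print(automatic_name({"RES_1.xml", "res_2.xml"}))
```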

  42. ABAP XML Archiving Object BC_SBOOK_X
     Transaction SARA is also used in XML-based archiving.

  43. Java Archiving: A System Deployment Scenario
     [Figure: 1 ... n SAP J2EE Engine instances run the Java application, in which the archiving programs run, together with the XML DAS Connector for Java; they connect to a separate SAP J2EE Engine running the XML Data Archiving Service.]

  44. Java Archiving: GUI Prototype

  45. Positioning ADK and XML
     • ADK: primarily for reducing the size of the database; reliable, stable, secure; for ABAP only
     • XML: advantages when the data must outlive the system and in multi-system environments; for Java as well
     • Striking differences for users:
       • Storage in the form of resources in standardized XML format instead of ADK files
       • Archived resources can be read by your own tools
       • Easier interpretation in the long term
       • Compression is optional, through the pack function
       • Application-specific searches with the help of property indexes
       • Direct archiving, no separate store phase (WebDAV or file system)
       • Schedule as many delete jobs as reasonable

  46. About SAP Data Archiving
     • Focus
       • SAP R/3 Enterprise
       • Further components: SAP BW, SAP CRM
     • Detailed information about
       • Technology and administration (ADK)
       • Data storage and data access
       • Implementation of archiving projects
     • Authors
       • Archiving experts at SAP (so, for once, not Küspert, Schaarschmidt, Zeller, Langguth ;-)
     http://service.sap.com/data-archiving
