550 likes | 653 Views
Chapter Nine. Extracting and Transforming Data with SQL Server 2000. Objectives. Understand the option in SQL Server 2000 for inserting from and extracting to external data sources
E N D
Chapter Nine Extracting and Transforming Data with SQL Server 2000
Objectives • Understand the option in SQL Server 2000 for inserting from and extracting to external data sources • Use the BULK INSERT statement and the BCP utility in SQL Server 2000 to move records into and out of tables and views • Use the Data Transformation Services (DTS) interface to import and export data
Objectives • Create and save DTS packages to perform transformations and add workflow to packages • Identify and implement an appropriate strategy for replicating data among instances of SQL Server 2000 • Understand and configure SQL Server 2000 XML support
The BULK INSERT Statement • BULK INSERT statement • Used in SQL Server 2000 to insert data into a table or view from a data file • Architecture • The OLE DB provider in SQL Server 2000 interprets the BULK INSERT statement natively • As a result, it is possible to use the bulk copy API from ODBC, OLE DB, SQL-DMO and DB-Library-based applications
The BULK INSERT Statement • The BULK INSERT statement uses a simple syntax for inserting data from a file into a table • The destination table name, qualified by the database name, and the source data file are the only two parameters that must be specified for each BULK INSERT operation to succeed
BULK INSERT Statement Usage Scenarios • The BULK INSERT statement is a high-performance way to insert data from a file into SQL Server 2000 using a standard T-SQL statement • Best used when the data to be imported exist in a tabular, non-hierarchical format, such as a comma-delimited file • Can be used within a stored procedure as part of a data feed and can be initiated by scheduling a job that executes the stored procedure • Can be used if an event or program external to SQL Server 2000 initiates the data import
BULK INSERT Statement Performance • BULK INSERT statement and BCP are two high-performance ways to insert data into database tables in SQL Server 2000 • There is an overhead associated with logging the insert of records using these methods • SQL Server 2000 incurs this overhead to ensure that records inserted can be rolled back and rolled forward
BULK INSERT Statement Performance • To optimize the performance of a bulk insert, it is possible to cause SQL Server 2000 not to log the inserts of records • To do this, ensure the following are true: • Database option “select into/bulkcopy” is set to true • Target table has no indexes or if the table has indexes, it is empty when the bulk copy starts • Target table is not being replicated • TABLOCK hint is specified
Bulk Copy Program • Bulk Copy Program • Used to move data into and out of SQL Server 2000 • BCP syntax • BCP utility requires a slightly different syntax than the BULK INSERT statement but uses similar parameters • A key difference in syntax is that it is necessary to specify user name, password, database name and server name when using the BCP utility
BCP in Action Figure 9-1: The BCP command to export order data
BCP Usage Scenarios • The same requirements that lead to the selection of the BULK INSERT statement for inserting large sets of data into SQL Server 2000 can be used to select the BCP utility • In addition, you should select the BCP utility when you need to export data from a SQL Server table into a data file and extensive transformations are not necessary
BCP Performance • Like BULK INSERT statement, BCP utility is a high-performance, batch-oriented mechanism for transferring large sets of records • Because both implement the Bulk Copy API, the performance of BCP will not differ much from the performance of the BULK INSERT statement
Data Transformation Services • Data Transformation Services (DTS) • Set of features bundled together for use in Enterprise Manager • Interface • SQL Server 2000 provides an intuitive, easy-to-use interface to the Data Transformation Services in the format of the Data Transformation Services Import/Export Wizard
Interface Figure 9-2: Invoking the DTS Wizard from Enterprise Manager
Interface Figure 9-3: Invoking the DTS Wizard from the program group
Executing a Transformation Using DTS • The first two steps when executing a transformation using DTS are to correctly specify the source and target data repositories • Subsequently, it is necessary to specify several options about the data transfer and customer transformations • At a minimum, the user executing a transformation needs to select one or more tables or views to copy from the source to the target data repositories
Choosing a Data Source/Destination • It is necessary to identify source and target data repositories involved in the data transfer/transformation after invoking DTS Import/Export Wizard • To identify each repository, specify the following: • Type of data held in each repository • Physical location of the repository • Any information required to gain access to the repository
Choosing a Data Source/Destination Figure 9-4: The Choose a Data Source dialog box in the DTS Wizard
Specifying Table Copy or Query Figure 9-5: The Specify Table Copy or Query dialog box
Specifying Table Copy or Query Figure 9-6: The Select Source Tables and Views dialog box
Specifying Table Copy or Query Figure 9-7: The Type SQL Statement dialog box
Specifying Table Copy or Query Figure 9-8: Using Query Builder to create SQL statements
Specifying Table Copy or Query Figure 9-9: Column Mappings tab in the Column Mappings and Transformation dialog box, which allows customization of destination column mappings
Specifying Table Copy or Query Figure 9-10: Transformation tab in the Column Mappings and Transformations dialog box, which allows custom transformations to the source data
Copying Objects and Data between SQL Server Databases Figure 9-11: The Select Objects to Copy dialog box
Saving DTS Packages Figure 9-12: The Save, schedule and replicate package dialog box
Saving DTS Packages • SQL Server • With this default save option, you can store a package as a table in the SQL Server msdb database • SQL Server Meta Data Services • With this save option, you can maintain historical information about the data manipulated by the package • Structured Storage File • With this save option, you can copy, move and send a package across the network without storing the file a SQL Server database • Visual Basic File • With this save option, you can programmatically customize a package created in DTS Designer or DTS Import/Export Wizard
Editing DTS Packages Figure 9-13: Invoking the DTS Designer from Enterprise Manager
Editing DTS Packages Figure 9-14: The DTS Designer interface
Replication • Replication • Process of synchronizing copies of data stored in multiple sites • Uses of replication: • Allowing multiple locations to keep copies of the same data • Separating transaction processing applications from analytical processing applications • Allowing greater autonomy
Replication • Uses of replication (cont.): • Creating a scalable application that is based on strategically locating data that will be accessed • Bringing data closer to individuals or groups • Using replication as part of a customized standby server strategy
Replication Overview • Publisher • Server that makes data available for replication to other servers • Subscribers • Servers that receive replicated data • Distributor • Server that contains the distribution database and stores meta-data, history data and/or transactions
Snapshot Replication • Snapshot replication • Periodically copies entire publications as they exist at a moment in time Figure 9-15: The components involved in snapshot replication
Usage Scenarios for Snapshot Replication • Snapshot replication is the best choice when the following are true: • Most of the data are static and do not change often • It is acceptable to have copies of data that are out of data for some period of time • The total size of data publications to replicate is small • The sites are often disconnected, eliminating the possibility to send constant updates of data as in transactional replication
Transactional Replication • Transactional replication • Incremental changes made on the Publisher are sent to each Subscriber Figure 9-16: The components involved in transactional replication
Usage Scenarios for Transactional Replication • Transactional replication is an excellent choice when: • Data modification should be propagated to Subscribers within seconds of when they occur • Transactions should be atomic • Subscribers are connected to the Publisher most of the time • The application cannot tolerate out-of-date data on the subscribers that receive the changes
Merge Replication • Merge replication • Data modifications can be made at each site Figure 9-17: The components involved in merge replication
Usage Scenarios for Merge Replication • Merge replication is the best selection when: • Multiple Subscribers need to update data at various times and propagate those changes to the Publisher and to other Subscribers • Subscribers need to receive data, make changes offline and synchronize changes later with the Publisher and other Subscribers • The ability for each site to operate independently is critical
Replication Tools • SQL Server 2000 provides several easy-to-use tools for establishing and configuring replication • Each of these tools is available through Enterprise Manager • Most take the form of wizards that allow the definition of Publishers, Distributors, Subscribers, subscriptions and their properties
Replication Wizards • Configure Publishing and Distribution Wizard • Typically the first wizard that is invoked to establish replication • Create Publication Wizard • Allows creation of publication • Push Subscription Wizard • Allows the creation of a subscription to an existing publication • Pull Subscription Wizard • Allows the creation of a subscription to a publication requested by the Subscriber on the SQL Server specified in this wizard • Disable Publishing and Distribution Wizard • Allows removal of replication publications and distribution service from the SQL Server machine
Filtering Published Data • By filtering published data, it is possible to: • Minimize the amount of data sent over the network • Reduce the amount of storage space required at the Subscriber • Customize publications and applications based on individual Subscriber requirements • Avoid or reduce conflicts that could result if the same data were sent multiple Subscribers
Replication Performance Optimization • There are several application-specific and network-specific techniques for optimizing the performance of replication • Simple way is to set minimum amount of memory allocated to SQL Server 2000 by using the Min Server Memory option • Using a separate disk drive for the transaction log on any server involved in replication will help to avoid the disk I/O bottleneck that can result from logging many operations • Ensuring that the data that is replicated is then filtered as much as possible will also help to avoid unnecessary performance problems
Heterogeneous Queries • A heterogeneous query retrieves, inserts or updates data by joining together sets of data from multiple data source types • It is necessary to initiate a heterogeneous query from a SQL Server database that is linked to another data source • A linked server configuration allows remote server access and allows access of non-SQL Server data sources to be treated the same way as the access of SQL Server data sources
Configuring a Linked Server • To configure a linked server definition, a SQL Server administrator specifies an OLE DB provider and OLE DB data source Table 9-1: OLE DB Providers and Data Sources
Configuring a Linked Server Figure 9-18: Linked server configuration
Managing a Linked Server Definition • When setting up a linked server, register connection information and data source with SQL Server 2000 • After registration is accomplished, it is possible to refer to the data source with a single logical name • You can create or delete a linked server definition with stored procedures or through Enterprise Manager • It it possible to use stored procedures to manage a linked server definition
Arguments • Table 9-2 describes data returned by sp_linkedservers system-stored procedure Table 9-2: Returned Data
XML Support in SQL Server 2000 • Standard data format eXtensible Markup Language (XML) is a format for marking up data with extensible tags that can be used by applications to determine the purpose of the data • In SQL Server 2000, it is possible to make a request for relational data in the form of XML and have the SQL OLE DB provider in SQL Server construct an XML document from the resulting data • It is possible to insert, update or delete data in SQL Server using XML syntax to describe the data that should be inserted, updated or deleted • Much of this functionality is available through stored procedures and general T-SQL statements that are executed against SQL Server 2000
XML Support in SQL Server 2000 • There are features that can be used directly from a browser and do not require writing T-SQL code, including: • Executing a T-SQL Query in the URL window of a browser to retrieve the results as XML • Executing a query against a database object using the Xpath query syntax • Defining an XML template with a T-SQL statement that can be executed to return a set of data in the form of XML • Defining an XML View that uses XML Data Reduced (XDR) syntax instead of T-SQL syntax
Configuring Internet Information Services for XML Support Figure 9-19: The MMC snap-in to configure SQL SML support in IIS