Dealing with Large Content Scenarios in SharePoint Server 2007: Architecture, Challenges, and Strategies (Abrar Chisti, Microsoft Corporation)
Agenda • Overview • Manageability • Planning • Availability • Case Study • Takeaways
Content Database Growth • Use as a Document Repository • Multiple Versions of Documents • 70-95% of Database Size Is File Stream (BLOB) Data • Storage of Large Multimedia Files • Lack of Governance/Site Quotas • One Large Site Collection • Lack of Planning
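To see how retained versions and the file stream share drive database growth, here is a minimal back-of-the-envelope sketch. The function name, the 80% default (a point inside the 70-95% range above), and the assumption that every retained version is stored as a full copy of the file are illustrative choices, not figures from this deck.

```python
def estimate_content_db_gb(num_documents: int,
                           avg_document_mb: float,
                           versions_kept: int,
                           file_stream_share: float = 0.8) -> float:
    """Rough content database size in GB (planning sketch, not a sizing tool).

    Hypothetical assumptions:
    - every retained version is stored as a full copy of the file;
    - file stream (BLOB) data makes up `file_stream_share` of the database,
      the remainder being metadata and indexes.
    """
    if not 0 < file_stream_share <= 1:
        raise ValueError("file_stream_share must be in (0, 1]")
    # Total BLOB volume across all retained versions, converted MB -> GB.
    blob_gb = num_documents * avg_document_mb * versions_kept / 1024
    # Scale up so that the BLOBs are only `file_stream_share` of the total.
    return blob_gb / file_stream_share


# 100,000 documents of 0.5 MB with 4 versions kept: about 195 GB of BLOB
# data, or about 244 GB of database at an assumed 80% file stream share.
print(round(estimate_content_db_gb(100_000, 0.5, 4), 1))  # 244.1
```

Halving `versions_kept` halves the estimate, which is why version limits appear among the storage controls in this deck.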
Is SharePoint the Right Solution? • SharePoint sites evolve organically. • Database capacity planning is often overlooked. • Limited or no governance. • One or more large content databases. • Difficult for IT to maintain. • I/O throughput and latency are affected.
Plan for Manageability • Limit Content Database Size to 100 GB or Less • If a Content DB Exceeds 100 GB: Use Differential/Incremental Backups (SQL Server 2005/2008, DPM 2007) • Test & Baseline the I/O Subsystem • Set DB Autogrowth to a Fixed Value • Split Sites in a Content DB across Multiple Content DBs
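The 100 GB guideline translates directly into a database count when splitting a large content database. The sketch below is hypothetical: `content_dbs_needed` and its 20% growth-headroom default are assumptions for illustration, not a SharePoint setting.

```python
import math


def content_dbs_needed(total_content_gb: float,
                       max_db_gb: float = 100.0,
                       growth_headroom: float = 0.2) -> int:
    """Number of content databases needed to stay under `max_db_gb`.

    `growth_headroom` (hypothetical default of 20%) reserves part of each
    database for future growth, so a database does not cross the limit
    right after the split.
    """
    if not 0 <= growth_headroom < 1:
        raise ValueError("growth_headroom must be in [0, 1)")
    usable_gb = max_db_gb * (1 - growth_headroom)
    return max(1, math.ceil(total_content_gb / usable_gb))


# A single 450 GB content database split to respect the 100 GB guideline:
print(content_dbs_needed(450))                     # 450 / 80 = 5.625 -> 6
print(content_dbs_needed(450, growth_headroom=0))  # 450 / 100 = 4.5 -> 5
```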
How to Manage Content • Split Content Databases • Move Site Collections between Databases • Move Sites into Site Collections (Re-Parent) • May Need to Promote Subsites to Site Collections • May Need to Move Site Collections between Web Applications • Use Out-of-the-Box (OOB) or Third-Party Tools • stsadm -o export/import • stsadm -o backup/restore • stsadm -o mergecontentdbs • Content Deployment API (Selective)
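A minimal sketch of the export/import route for moving or re-parenting a site, written in Python so the steps can be scripted. It only assembles `stsadm -o export` and `stsadm -o import` argument lists using the `-url`, `-filename`, and `-includeusersecurity` parameters; the helper name, URLs, and package path are hypothetical, and the commands should be tried on a non-production farm first.

```python
from typing import List


def export_import_commands(source_url: str, target_url: str,
                           package_path: str,
                           include_user_security: bool = True) -> List[List[str]]:
    """Build the stsadm export/import pair for moving a site (hypothetical helper).

    The commands are only assembled here; run them on a farm server,
    e.g. with subprocess.run(cmd, check=True).
    """
    export_cmd = ["stsadm", "-o", "export", "-url", source_url,
                  "-filename", package_path]
    import_cmd = ["stsadm", "-o", "import", "-url", target_url,
                  "-filename", package_path]
    if include_user_security:
        # Keep user security information (such as permissions) across the move.
        export_cmd.append("-includeusersecurity")
        import_cmd.append("-includeusersecurity")
    return [export_cmd, import_cmd]


# Hypothetical example: promote a subsite to its own site collection.
for cmd in export_import_commands("http://portal/sites/teams/hr",
                                  "http://portal/sites/hr",
                                  r"D:\export\hr.cmp"):
    print(" ".join(cmd))
```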
How to Limit Storage • Document Libraries: Limit the Number of Versions • Archive or Delete Old Sites • Archive or Delete Unused Sites • Impose Site Quotas • Different Quota Tiers: Small/Medium/Large • Account for Recycle Bin Storage • Manage Lists for Performance
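Quota tiers and the recycle bin together set a worst-case storage bound. This sketch assumes the second-stage recycle bin adds a percentage of each site's quota on top of the quota itself (50% is the value assumed here); the tier sizes in `QUOTA_TIERS_MB` and the function name are hypothetical.

```python
# Hypothetical quota tiers in MB; real values are chosen per farm.
QUOTA_TIERS_MB = {"small": 500, "medium": 2_000, "large": 10_000}


def worst_case_storage_gb(sites_per_tier: dict,
                          recycle_bin_pct: int = 50) -> float:
    """Upper bound on storage if every site fills its quota.

    The second-stage recycle bin is modeled as an extra `recycle_bin_pct`
    percent of each site's quota, on top of the quota itself.
    """
    total_mb = 0.0
    for tier, count in sites_per_tier.items():
        quota_mb = QUOTA_TIERS_MB[tier]
        total_mb += count * quota_mb * (100 + recycle_bin_pct) / 100
    return total_mb / 1024


# 100 small and 10 medium sites: 105,000 MB in the worst case.
print(worst_case_storage_gb({"small": 100, "medium": 10}))  # 102.5390625
```

Comparing this bound with the 100 GB database guideline shows how many sites of each tier one content database can safely host.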
Upgrade Hardware/Software • Apply the Latest Service Pack/Patches • Use a Dedicated SQL Server • Use 64-Bit Hardware and a 64-Bit OS • Follow Microsoft Hardware Recommendations • Use a SQL Server Connection Alias When You Configure Your Farm • Increase Bus Bandwidth
Take Advantage of SQL Server 2008 Capabilities • Performance: Implement database backup compression. • Availability: Implement log stream compression. • Security: Implement Transparent Data Encryption (TDE). • Resource Management: Use SQL Server 2008 Resource Governor. • Be Aware of DB Migration Considerations
Content Archival/Reduction • Use Database Snapshots • Use a Records Repository Implementation • Externalize BLOB Storage
Database Snapshot • Provides a Read-Only “Snapshot” of the Content DB at a Given Instant • Must Reside on the Same SQL Server Instance as the Source DB • Depends on the Original Database • Uses a Copy-on-Write Mechanism • Requires a Separate Web Application
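A database snapshot is created with the T-SQL statement `CREATE DATABASE ... AS SNAPSHOT OF`. The Python helper below only builds that statement; the helper name, database names, logical file name, and sparse-file path are hypothetical, and every data file of the source database needs its own `(NAME, FILENAME)` entry.

```python
from typing import Dict


def create_snapshot_sql(source_db: str, snapshot_db: str,
                        data_files: Dict[str, str]) -> str:
    """Build a T-SQL CREATE DATABASE ... AS SNAPSHOT OF statement.

    `data_files` maps each logical data file name of the source database
    to the path of its sparse file; every data file needs an entry.
    """
    specs = []
    for logical_name, sparse_path in data_files.items():
        escaped = sparse_path.replace("'", "''")  # T-SQL string-literal escaping
        specs.append(f"    (NAME = {logical_name}, FILENAME = '{escaped}')")
    return (f"CREATE DATABASE [{snapshot_db}] ON\n"
            + ",\n".join(specs)
            + f"\nAS SNAPSHOT OF [{source_db}];")


print(create_snapshot_sql(
    "WSS_Content", "WSS_Content_Snapshot",
    {"WSS_Content": r"D:\Snapshots\WSS_Content.ss"}))
```

Because the snapshot shares unchanged pages with the source database (copy-on-write), it is cheap to create but disappears with the source, which is why it suits short-term archival rather than long-term retention.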
Remote/External BLOB Storage • Reduce Storage Costs • External BLOB Storage (EBS) API • Remote BLOB Storage (RBS) API • SQL Server 2008 Supports RBS • Can Write BLOBs Directly Using RBI • http://blogs.msdn.com/sqlrbs/
External BLOB-Based Solution • BLOB I/O Is Moved to the Web Front End • Supports Compression and Encryption
Plan for Software Boundaries • Bottom-Up Approach • Plan for SQL Storage • SharePoint Performance Recommendations: • Site Collections per Content DB: 50,000 • Site Collections per Web Application: 150,000 • Content DBs per Web Application: 100 • Use Multiple SQL Servers for Higher Scalability
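The three recommendations can be checked mechanically for a web application. In this minimal sketch the constants are the figures listed above; the function name and the input shape (one site collection count per content database) are assumptions for illustration.

```python
from typing import List

# Guidance figures from the Plan for Software Boundaries slide (SharePoint Server 2007).
MAX_SITE_COLLECTIONS_PER_DB = 50_000
MAX_SITE_COLLECTIONS_PER_WEB_APP = 150_000
MAX_CONTENT_DBS_PER_WEB_APP = 100


def boundary_warnings(site_collections_per_db: List[int]) -> List[str]:
    """Check one web application's content DBs against the guidance.

    `site_collections_per_db` holds the site collection count of each
    content database attached to the web application (hypothetical input).
    """
    warnings = []
    for index, count in enumerate(site_collections_per_db, start=1):
        if count > MAX_SITE_COLLECTIONS_PER_DB:
            warnings.append(f"content DB {index}: {count} site collections "
                            f"(> {MAX_SITE_COLLECTIONS_PER_DB})")
    total = sum(site_collections_per_db)
    if total > MAX_SITE_COLLECTIONS_PER_WEB_APP:
        warnings.append(f"web application: {total} site collections "
                        f"(> {MAX_SITE_COLLECTIONS_PER_WEB_APP})")
    if len(site_collections_per_db) > MAX_CONTENT_DBS_PER_WEB_APP:
        warnings.append(f"web application: {len(site_collections_per_db)} "
                        f"content DBs (> {MAX_CONTENT_DBS_PER_WEB_APP})")
    return warnings


# Two oversized databases and a web application over 150,000 site collections:
for warning in boundary_warnings([40_000, 60_000, 55_000]):
    print(warning)
```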
Storage Architecture • Use the Appropriate Disk and SAN Interface • SCSI vs. IDE vs. SATA vs. SAS • Considerations: Hot Swap, Multiple I/O, Speed, Capacity, Protocol • Use Appropriate Disks and RAID Arrays • Faster Disks/Arrays • Separate Disks for TempDB, Content DBs, and Transaction Logs • Multiple Data Files for Large Content and Search DBs • Distribute Files across Disks
Content Database Allocation • SharePoint Allocation of Content DBs • Pre-Allocate a Pool of DBs • Round-Robin Scheme between DBs • Based on the Delta between Maximum Sites and Current Sites • Example: One Site Collection per Database • Pre-Size Each Database to 100 GB (Using ALTER DATABASE) • Leverage Managed Paths
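The allocation rule can be sketched as "place each new site collection in the online database with the largest delta between maximum and current sites". This is a simplified model, not SharePoint's actual code: the class and function names are hypothetical, and breaking ties by list order is an assumption. With equally sized databases the rule behaves like the round-robin scheme described above; setting `max_sites=1` on every pooled database models the one-site-collection-per-database example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContentDb:
    name: str
    max_sites: int          # the database's "maximum number of sites" limit
    current_sites: int = 0
    online: bool = True     # offline databases receive no new site collections


def pick_content_db(pool: List[ContentDb]) -> Optional[ContentDb]:
    """Pick the database with the largest (max_sites - current_sites) delta."""
    candidates = [db for db in pool
                  if db.online and db.current_sites < db.max_sites]
    if not candidates:
        return None
    # max() returns the first maximal element, so ties go to list order
    # (an assumption of this sketch).
    return max(candidates, key=lambda db: db.max_sites - db.current_sites)


def create_site_collection(pool: List[ContentDb]) -> str:
    db = pick_content_db(pool)
    if db is None:
        raise RuntimeError("no content database has free capacity")
    db.current_sites += 1
    return db.name


# Hypothetical pre-allocated pool of three databases, two sites each.
pool = [ContentDb("WSS_Content_01", 2), ContentDb("WSS_Content_02", 2),
        ContentDb("WSS_Content_03", 2)]
print([create_site_collection(pool) for _ in range(6)])
# Equal databases are filled in turn: 01, 02, 03, 01, 02, 03
```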
Clustering • SAN or Shared Disks: Use Windows/SQL Server Clustering for HA • Dedicated Disks or DAS: Use SQL Server Mirroring
Redundancy across Data Centers • Log Shipping • Synchronous Mirroring • Asynchronous Mirroring • SQL Server 2008 Log Compression
Monitoring • Processor: % Processor Time: _Total. On the computer running SQL Server, keep this counter between 50 and 75 percent. • System: Processor Queue Length (N/A). Keep this counter below 2 × the number of CPU cores. • Memory: Available MBytes (N/A). Ensure that at least 20 percent of total physical RAM remains available. • Memory: Pages/sec (N/A). Ensure that this counter remains below 100.
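The four thresholds above can be applied to sampled counter values with a small check. This is a minimal sketch: the function name and parameters are hypothetical, and collecting the samples (for example from Perfmon logs) is left out.

```python
from typing import List


def check_sql_server_counters(processor_pct: float,
                              processor_queue_length: float,
                              cpu_cores: int,
                              available_mbytes: float,
                              total_ram_mb: float,
                              pages_per_sec: float) -> List[str]:
    """Compare sampled counter values with the thresholds on the Monitoring slide."""
    issues = []
    if not 50 <= processor_pct <= 75:
        issues.append(f"% Processor Time {processor_pct} outside 50-75%")
    if processor_queue_length > 2 * cpu_cores:
        issues.append(f"Processor Queue Length {processor_queue_length} "
                      f"> 2 x {cpu_cores} cores")
    if available_mbytes < 0.2 * total_ram_mb:
        issues.append(f"Available MBytes {available_mbytes} "
                      f"< 20% of {total_ram_mb} MB RAM")
    if pages_per_sec >= 100:
        issues.append(f"Pages/sec {pages_per_sec} not below 100")
    return issues


# Hypothetical 4-core server with 32 GB RAM and a healthy sample:
print(check_sql_server_counters(60, 4, 4, 8192, 32768, 20))  # []
```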
Disk Counters • Logical Disk: Disk Transfers/sec • Logical Disk: Disk Read Bytes/sec & Disk Write Bytes/sec • Logical Disk: Avg. Disk sec/Read (Read Latency) & Avg. Disk sec/Write (Write Latency) • Logical Disk: Avg. Disk Bytes/Read & Avg. Disk Bytes/Write • Physical Disk: % Disk Time • Logical Disk: Current Disk Queue Length • Logical Disk: Average Disk Reads/Sec and Logical Disk
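When baselining the I/O subsystem, these counters combine into a few derived figures: IOPS, average bytes per transfer, and latency in milliseconds. The sketch below relies only on the relationship "bytes per second divided by transfers per second gives bytes per transfer" and on the latency counters being reported in seconds; the function name and sample values are hypothetical.

```python
def disk_summary(transfers_per_sec: float,
                 read_bytes_per_sec: float,
                 write_bytes_per_sec: float,
                 avg_disk_sec_per_read: float,
                 avg_disk_sec_per_write: float) -> dict:
    """Derive IOPS, average I/O size, and latency in ms from counter samples."""
    if transfers_per_sec <= 0:
        raise ValueError("transfers_per_sec must be positive")
    total_bytes = read_bytes_per_sec + write_bytes_per_sec
    return {
        "iops": transfers_per_sec,
        # Average bytes moved per transfer, expressed in KB.
        "avg_io_kb": total_bytes / transfers_per_sec / 1024,
        # Avg. Disk sec/Read and sec/Write are in seconds; convert to ms.
        "read_latency_ms": avg_disk_sec_per_read * 1000,
        "write_latency_ms": avg_disk_sec_per_write * 1000,
    }


# Hypothetical sample: 500 transfers/sec moving 4,096,000 read bytes/sec.
print(disk_summary(500, 4_096_000, 0, 0.008, 0.004))
```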
Performance Monitoring • Perfmon • Analyze Logs Using CodePlex Tools • Third-Party Web Monitoring Solutions • System Center Operations Manager (SCOM) • SharePoint Monitoring Toolkit • http://blogs.msdn.com/sharepoint/archive/2007/12/10/announcing-new-system-center-operations-manager-2007-packs-for-wss-3-0-and-moss-2007.aspx
Case Study: Large Automotive Loan Origination Application
Large Storage Scenario (Phase I) • Ability to house 10.5 million content items (1+ TB). • System input at a "normal" load, defined as 27,000 documents per day (1 day = 10 hours). • Simulated user load representing 200 users simultaneously accessing the system to: search for elements of document metadata; view a document (scanned TIFF image); update elements of document metadata.
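The "normal" input load is easier to reason about as a sustained rate. This short worked calculation uses only the figures above (27,000 documents over a 10-hour day and a 10.5 million item target); the function name is illustrative.

```python
def ingest_rates(docs_per_day: int, hours_per_day: float):
    """Return documents per hour, per minute, and per second."""
    per_hour = docs_per_day / hours_per_day
    per_minute = per_hour / 60
    per_second = per_minute / 60
    return per_hour, per_minute, per_second


per_hour, per_minute, per_second = ingest_rates(27_000, 10)
print(per_hour, per_minute, per_second)  # 2700.0 45.0 0.75

# Working days of "normal" input needed to reach 10.5 million items
# (ignores any initial bulk load):
print(round(10_500_000 / 27_000, 1))     # 388.9
```

A sustained rate of under one document per second is modest; the larger challenge in this scenario is the accumulated volume, which takes well over a year of normal input to reach.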
Phase II • Ability to house 50 million content items (5+ TB): 35 million TIFF images and 15 million Microsoft Office documents. • Determine the maximum number of users the solution could support. • Users perform the following tasks: search for elements of document content (full-text) and metadata; view a document (scanned TIFF image or Microsoft Office document).
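Combining the Phase II volume with the 100 GB content database guideline gives a rough lower bound on the database count. This sketch assumes decimal units (1 TB = 1,000 GB) and treats "5+ TB" as exactly 5 TB, so the results are lower bounds; the function name is hypothetical.

```python
import math


def phase_sizing(total_items: int, total_tb: float, max_db_gb: float = 100.0):
    """Average item size (KB), content DB count, and items per DB.

    Assumes decimal units (1 TB = 1,000 GB = 10**9 KB) and databases
    filled to `max_db_gb`.
    """
    avg_item_kb = total_tb * 10**9 / total_items
    db_count = math.ceil(total_tb * 1000 / max_db_gb)
    items_per_db = total_items / db_count
    return avg_item_kb, db_count, items_per_db


print(phase_sizing(50_000_000, 5))  # (100.0, 50, 1000000.0)
print(phase_sizing(10_500_000, 1))  # Phase I figures for comparison
```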
Architectural Overview: Logical Architecture (Phase I)
Takeaways • Optimize Performance • Planning & Monitoring • Plan for Scale • Plan for Availability • Plan for Manageability
References • SQL Server Database Optimization • http://technet.microsoft.com/en-us/library/cc263261.aspx • Plan for Software Boundaries • http://technet.microsoft.com/en-us/library/cc262787.aspx • Move Site Collections to new Content Database • http://technet.microsoft.com/en-us/library/cc825328.aspx • Enable SharePoint 2010 to Use Remote BLOB Storage • http://technet.microsoft.com/en-us/library/ee748641(office.14).aspx/ • Content Deployment API (PRIME) • http://msdn.microsoft.com/en-us/library/cc264073.aspx • Integration of SQL Server 2008 and SharePoint • http://msdn.microsoft.com/en-us/library/cc264073.aspx • Use Database Snapshots for Archiving Sites • http://technet.microsoft.com/en-us/library/cc706872.aspx • Configure Availability in SharePoint Farm • http://technet.microsoft.com/en-us/library/dd207311.aspx • Case Study for Large Content Scenario • http://technet.microsoft.com/en-us/library/cc262067.aspx • Scaling Storage Architecture • http://www.knowledgelake.com/whitepaper/Scaling%20SharePoint%202007%20-%20Storage%20Architecture.pdf
Available Tools • SPUsedSpaceInfo • SPSiteInfo • Content Deployment Wizard • Migration from Other Source Systems • Other Tools on CodePlex • Third-Party: Metalogix, Quest, Tzunami, AvePoint, StoragePoint, KnowledgeLake