So you think you can crawl? Stretching the Boundaries of SharePoint 2013! Petter Skodvin-Hvammen AD-Gruppen, Norway
Who am I?
www.adgruppen.no
Petter Skodvin-Hvammen
• Solutions Architect
• SharePoint Consultant
• Search Enthusiast
• Community Lead
• @pettersh - psh@adgruppen.no
Oseberg ship - Discovered 1904 in Tønsberg, Norway. Buried by Vikings in 834 AD
Enterprise Search Challenges and Solutions
• Index thousands of sources
• Automate index management
• Infrastructure sizing
Not included: code/scripts, user experience, relevancy, governance
www.sharepointeurope.com
The Mission…
Enterprise Search using SharePoint Server 2013
• 30,000 users
• 85 locations in 30 countries
• 15,000 daily searches
• 100,000,000 documents(?)
• 60 core systems, 2,000 applications
What do we index?
• 100,000,000 documents
• 500 servers
• 3,000 file shares
Where is the data?
• Datacenters
• Time zones
• Bandwidth
How can we get it?
• Limit bandwidth usage for specific server locations
• Limit crawler impact within local business hours
• Grant read access to crawler per file share
• Avoid token bloat issues with more than 1,015* groups per account
* http://blogs.technet.com/b/shanecothran/archive/2010/07/16/maxtokensize-and-kerberos-token-bloat.aspx
How do we operate it?
• File shares are created, changed, and deleted every day using a custom self service solution
• File shares are moved between servers every day by automation rules
• Manage indexing and crawling of each file share with minimum manual effort
What can SharePoint do?
• Max 50 content sources per service application
  • Max 500 with October 2013 CU installed
• Max 100 start addresses per content source
  • Max 500 with October 2013 CU installed
• Max 20 concurrent crawls per service application
  • Limitation has been removed
http://technet.microsoft.com/en-us/library/cc262787(v=office.15).aspx#Search
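The limits above translate into simple arithmetic. The following Python sketch shows how many content sources a given number of file shares would need if every share were its own start address. The names `LIMITS`, `content_sources_needed` and `fits` are hypothetical and only illustrate the calculation; the numbers are the ones quoted on the slide.

```python
import math

# Limits quoted on the slide: original values and values with the October 2013 CU.
LIMITS = {
    "original": {"content_sources": 50, "start_addresses": 100},
    "oct2013cu": {"content_sources": 500, "start_addresses": 500},
}


def content_sources_needed(file_shares: int, start_addresses_per_source: int) -> int:
    """Minimum number of content sources if every share is its own start address."""
    return math.ceil(file_shares / start_addresses_per_source)


def fits(file_shares: int, level: str) -> bool:
    """True if the shares fit within the content source limit at the given level."""
    limits = LIMITS[level]
    needed = content_sources_needed(file_shares, limits["start_addresses"])
    return needed <= limits["content_sources"]
```

For the 3,000 file shares in this deck, `content_sources_needed(3000, 100)` gives 30 content sources at the original limits, each filled to its start address maximum.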
It’s complicated
• More data than we have space for
• It’s located all over the place
• Everything changes all of the time
• There are limitations in SharePoint
• Someone’s gotta maintain this
• It has to be secure and relevant
What did we do?
• Created logical groups of file shares
• Used symbolic linking
=> fewer content sources
Start address: \\file00\share
• \\file00\share\sym01 -> \\file01\share01
• \\file00\share\sym02 -> \\file02\share03
• \\file00\share\sym03 -> \\file03\share03
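A minimal Python sketch of the symbolic linking idea: one folder acts as the start address, and it is kept in sync with a wanted set of directory links. The function name `sync_symlinks` and the shape of the `wanted` mapping are assumptions for illustration, not the deck's actual tooling. On Windows, creating symbolic links typically requires elevated privileges.

```python
import os
from pathlib import Path


def sync_symlinks(link_root: Path, wanted: dict[str, str]) -> None:
    """Make link_root contain exactly the directory symlinks listed in `wanted`.

    `wanted` maps a link name to its target path (for example a UNC share path).
    """
    link_root.mkdir(parents=True, exist_ok=True)

    # Remove links that are no longer wanted or that point at a stale target.
    for entry in list(link_root.iterdir()):
        if entry.is_symlink() and wanted.get(entry.name) != os.readlink(entry):
            entry.unlink()

    # Create the links that are missing.
    for name, target in wanted.items():
        link = link_root / name
        if not link.is_symlink():
            os.symlink(target, link, target_is_directory=True)
```

Calling the function again with a changed mapping removes links for shares that were deleted or moved and adds the new ones, so the crawler's start address never has to change.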
What did we do?
• Grouped file shares based on region
• One content source per region
• Incremental crawls every night
=> crawling based on time zones
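Scheduling one nightly crawl per region means converting each region's local start time to the clock of the crawl servers. The sketch below does this for UTC, assuming a fixed UTC offset per region; the function name is hypothetical, and fixed offsets ignore daylight saving time.

```python
from datetime import datetime, time, timedelta, timezone


def nightly_start_utc(local_start: time, utc_offset_hours: float) -> time:
    """Convert a region's local start time to the equivalent UTC time of day."""
    region_tz = timezone(timedelta(hours=utc_offset_hours))
    # The date is arbitrary; only the time of day matters for a daily schedule.
    local = datetime(2013, 1, 1, local_start.hour, local_start.minute, tzinfo=region_tz)
    return local.astimezone(timezone.utc).time()
```

A 22:00 start in a region at UTC+1 corresponds to 21:00 UTC, while the same local start at UTC-5 corresponds to 03:00 UTC.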
What did we do?
• Created a DNS alias per impact rule in etc/hosts on the crawl servers
=> reduced crawler impact
What did we do?
• Granted file share access to the account included in the fewest groups
• Monitored group memberships
• Grouped file shares by crawl account
• Crawl rules matched folder structure
=> managed pool of crawl accounts
SP\spcrwl01: file://.*/spcrwl01/.* (Include)
SP\spcrwl02: file://.*/spcrwl02/.* (Include)
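The account pool logic can be sketched in Python as follows. It assumes that granting access to one more file share adds one group membership to the account (the `headroom` parameter) and that group counts are already known; both function names are hypothetical. The regular expression mirrors the include rules shown on the slide, evaluated here with Python's `re` module purely for illustration.

```python
import re
from typing import Optional

MAX_GROUPS = 1015  # group limit per account quoted earlier in the deck


def pick_crawl_account(group_counts: dict[str, int], headroom: int = 1) -> str:
    """Return the account with the fewest group memberships that still has room."""
    account = min(group_counts, key=group_counts.get)
    if group_counts[account] + headroom > MAX_GROUPS:
        raise RuntimeError("all crawl accounts are at the group limit")
    return account


def matching_account(url: str, accounts: list[str]) -> Optional[str]:
    """Return the account whose include rule file://.*/<account>/.* matches url."""
    for account in accounts:
        if re.fullmatch(rf"file://.*/{re.escape(account)}/.*", url):
            return account
    return None
```

Because the crawl account name is a folder in the path, a single include rule per account is enough to route every share under that folder to the right credentials.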
The bigger picture
• Folder structure: <content source>/<crawler impact>/<crawl account>/<symbolic link>
• Start addresses: file://<crawler impact>/<content source>/<crawler impact>
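A small Python sketch of how these templates compose into concrete paths. It assumes the first path segment is the DNS alias named after the crawler impact rule, which is how the example symlink later in the deck (\\default\europe\default\spcrwl01\e5dc12a41d) reads. The function names are hypothetical, and the link name is taken as input because the deck does not say how it is generated.

```python
def start_address(crawler_impact: str, content_source: str) -> str:
    """Build file://<crawler impact>/<content source>/<crawler impact>."""
    return f"file://{crawler_impact}/{content_source}/{crawler_impact}"


def symlink_unc(crawler_impact: str, content_source: str,
                crawl_account: str, link_name: str) -> str:
    """Build the UNC path of a symbolic link below the start address."""
    parts = [crawler_impact, content_source, crawler_impact, crawl_account, link_name]
    return "\\\\" + "\\".join(parts)
```

With `crawler_impact="default"` and `content_source="europe"`, the start address is `file://default/europe/default`, and every link for that region and impact rule lives below it.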
How did we manage this?
AUTOMATION
• Custom timer job to get list of file shares to crawl from self service portal
• Self service portal for enabling indexing of file shares
• Custom timer job for creating and removing symbolic links
• Custom solution for granting access to crawl accounts
• Custom lists for mapping server to content source, schedule and impact; shares to crawl accounts and metadata; UNC to symlink
• Custom web service integration in self service portal
• Content enrichment service for replacing symlinks in paths with actual file paths
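The path replacement done in the content enrichment step can be sketched as a lookup against the UNC-to-symlink mapping. This Python sketch covers only the mapping logic, not the enrichment service contract itself; the function name and the dictionary shape are assumptions.

```python
def resolve_symlink_path(path: str, symlink_to_unc: dict[str, str]) -> str:
    """Replace a leading symlink prefix with the real UNC path.

    Returns the path unchanged when no known symlink prefix matches.
    """
    # Try the longest prefixes first so the most specific link wins.
    for link in sorted(symlink_to_unc, key=len, reverse=True):
        head, tail = path[:len(link)], path[len(link):]
        # Windows paths are case-insensitive; match whole path segments only.
        if head.lower() == link.lower() and (tail == "" or tail.startswith("\\")):
            return symlink_to_unc[link] + tail
    return path
```

With this in place, users see the real share path in search results rather than the internal symlink path used for crawling.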
Example: Self Service Portal
Title: European SharePoint Conference
Owner: Petter Skodvin-Hvammen
Business Area: Consulting
Classification: Internal
Type: Project
UNC Path: Assigned automatically
Crawl Account: Assigned automatically
Save | Cancel
Example: Custom Lists
Title: European SharePoint Conference
Owner: Petter Skodvin-Hvammen
Business Area: Consulting
Classification: Internal
Type: Project
UNC Path: \\file01\share01
Crawl Account: SP\spcrawl01
Symlink: \\default\europe\default\spcrwl01\e5dc12a41d
Location: europe (server file01 is located in Oslo DC)
Bandwidth: 5 Mbps
[Diagram: search farm topology] 40 million documents, 10 queries/second
Server roles shown (most labels appear twice): WFE, Query, Admin, Caching, Crawling, Analytics, Central Admin, Doc Proc, Enrichment; index partitions Index-0 to Index-3
SQL Server (shown twice)
• Admin DB
• Analytics DB
• Crawl DB
• Link DB
• Other SP DBs
Capacity testing
Purpose
• Crawling of symbolic links
• Scaling of virtual machines
• Sizing of disk space
• Verify Microsoft’s advice
Approach
• 4 server farm with 2 partitions
• 8 vCPU, 16 GB RAM, 850 GB
• Crawl 10 file shares (3.7M files)
• Replay top 300 queries
• Apache JMeter
Capacity testing – findings
• Crawl rate declined 1% per million items indexed
• Query latency increased exponentially from 12 million items indexed per partition
• Database latency was insignificant during crawling
• Successfully crawled file shares via symbolic directory links
• Disk space usage was significantly… lower than expected
  • Reduced data volume from 850 GB to 450 GB
  • 40+ servers => huge cost savings
Infrastructure – VM sizing
Dedicated ESX Cluster
• 14 x VM for SharePoint 2013
• 4 physical machines
  • 4 x 32 = 128 CPUs
  • 4 x 256 = 1024 GB memory
• HA max utilization = ¾
  • 3 x 32 = 96 CPUs
  • 3 x 256 = 768 GB memory
CPU and memory can be over-committed
• CPU over-committed 1.34 (1.78 if one physical host fails)
• VMs must wait for physical CPU: wait time for 8 CPU = 2 x 4 CPU
• Mitigation:
  • Reduce allocated virtual CPU, or
  • Increase physical CPU
• Memory factor 0.44 (0.59)
• Reserved and locked memory prevents HA failover
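The over-commit figures follow from dividing what is allocated to the VMs by the physical capacity of the surviving hosts. In the Python sketch below, the function name is hypothetical, and the allocated totals in the usage note (171 vCPUs, 450 GB) are not stated in the deck: they are back-calculated assumptions that reproduce the quoted ratios. The 256 GB per host is what the slide's totals (1024 GB over four hosts, 768 GB over three) imply.

```python
def overcommit(allocated: float, per_host: float, hosts: int, failed_hosts: int = 0) -> float:
    """Ratio of resources allocated to VMs over capacity on the surviving hosts."""
    surviving = hosts - failed_hosts
    if surviving <= 0:
        raise ValueError("no surviving hosts")
    return allocated / (per_host * surviving)
```

Under those assumptions, `overcommit(171, 32, 4)` is about 1.34 and rises to about 1.78 with one failed host, while `overcommit(450, 256, 4)` is about 0.44 and rises to about 0.59.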
Infrastructure – VM tuning
Peak and average CPU usage is calculated over 30 days
Summary
• Indexing thousands of content sources
• Automation for rapidly changing index requirements
• Sizing the infrastructure for performance and HA
Questions? @pettersh petter.skodvin-hvammen@adgruppen.no http://linkedin.com/in/petterskodvin