CPCC CCT Program: Distributed Processing I
Block 1: Wrapping Your Nugget Around Distributed Processing
CPCC CCT • Introductions • Course paperwork - payment • Plan for this module • Credit by exam for CCT 242
What is Distributed Processing? • According to AccessData's Configuration Guide • You need to get this guide and read it • http://accessdata.com/downloads/media/Configuring%20Distributed%20Processing%20with%20FTK%203.pdf • Distributed processing is functionality within the FTK 3 application that allows users to create a distributed processing cluster with up to four total nodes (workers): 1 local and 3 distributed. These additional processing nodes (workers) function together in a cluster to increase productivity and decrease overall processing time
Why do we need it? • Drives are getting huge • 1-2 TB drives are becoming common • 3 TB drives are available • Media-driven computing society • Artifact counts are very high • Over 1 million artifacts is very common • Processing time is increasing rapidly
Some more reasons why • To counter the increase in processing time, we spend more on stand-alone systems • That still take too long • That still become unstable when resource limits are reached • Keeps gear tied up
How Distributed Helps Simple as Little Gun versus Big Gun
How Distributed Helps • Faster • More stable • More efficient • Allows a hardware migration cycle
Real World Results • We have a small case (Sluix) • FRED = 4 hours • Barney = 94 minutes • DP = 44 minutes
Real World Results • We have a medium-size case (Testforsafeboot.e01) • FRED = 20 hours • Barney = 7 hours 53 minutes • DP = 1 hour 13 minutes
Real World Results • We have a large case (HTI-001, Test.001) • FRED = CRASH at 75-100 hours, repeatable • Barney = 28 hours • DP = 3 hours
Terms and Definitions • There is a lot to know before you do this • Terminology • Requirements • Technical skills • We'll go over each
Terms • Header (Examiner machine) • Workers (Helper machines) • Oracle (where the database is stored) • Could be on the examiner machine or separate • Evidence share (also called Imaging Server at CPCC) • Case share (could be implemented numerous ways) • We'll discuss all in detail later
Terminology - How DP works • How does distributed processing work? Evidence processing tasks assigned to the engine by the user are called Jobs. The FTK application submits the Job to the processing engine. Each Job is divided into small packets called Work Units. Each Work Unit is handed to a service called "ADProcessor.exe" (and ADIndexer, if you've chosen to index), which actually does the work.
Terminology – How DP works • There are two components in distributed processing: • 1. Processing Manager: The Processing Manager embedded in FTK manages Jobs and Work distribution. It also handles status updates and Job progress. • 2. Processing Engine: The Processing Engine manages the processing resources of a particular computer/node. • Every machine that participates in a processing cluster runs a Processing Engine. • It decides how many jobs can be concurrently processed by that node. • The processing engine also manages the ADProcessor.exe/ADIndexer.exe that actually do the processing work.
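The Manager/Engine split is easier to picture with a toy example. Below is a minimal conceptual sketch in Python, not AccessData's code: every name in it (process_unit, split_into_work_units, the file lists) is invented for illustration only. It shows a "manager" dividing a Job into Work Units and farming them out to a small pool of workers, the same shape as the FTK workflow described above.

```python
# Conceptual sketch only -- NOT AccessData/FTK code.
# Illustrates the manager/engine pattern: a manager splits a Job into
# Work Units and hands them to worker processes that report results back.
from multiprocessing import Pool

def process_unit(unit):
    """Stand-in for what ADProcessor.exe does to one Work Unit."""
    files = unit["files"]
    # Pretend "processing" is just counting the items in the unit.
    return {"unit_id": unit["id"], "items_processed": len(files)}

def split_into_work_units(job_files, unit_size=3):
    """Manager-side step: divide a Job (a list of files) into Work Units."""
    return [
        {"id": i // unit_size, "files": job_files[i:i + unit_size]}
        for i in range(0, len(job_files), unit_size)
    ]

if __name__ == "__main__":
    job = [f"file_{n:03d}" for n in range(10)]   # the "evidence" in this Job
    work_units = split_into_work_units(job)

    # The Pool plays the role of the distributed Processing Engines.
    with Pool(processes=3) as pool:
        for result in pool.imap_unordered(process_unit, work_units):
            print(f"Work Unit {result['unit_id']} done: "
                  f"{result['items_processed']} items processed")
```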
Examiner Hardware Requirements • Based on the CPCC recommended ring (more later) • Examiner Machine (should be your most powerful machine) • See http://accessdata.com/downloads/media/FTK_3x_System_Specifications_Guide.pdf • CPCC recommends • 8-core or better processor • 12 GB RAM, 64-bit Win 7 • 80 GB or more SSD for OS • 1 Gbps NIC
Oracle Hardware Requirements • Based on the CPCC recommended ring (more later) • Oracle Machine (should be a powerful machine) • See http://accessdata.com/downloads/media/FTK_3x_System_Specifications_Guide.pdf • CPCC recommends • 8-core or better processor • 12 GB RAM, 64-bit Win 7 • 80 GB or more SSD for OS • 160 GB SSD or RAID 0 config with spinning drives (4 x 7200 RPM Vraptors minimum); NO RAID 5 • 1 Gbps NIC
Workers • Based on the CPCC recommended ring (more later) • Worker Machine (can be a little less powerful; if you don't have good ones, add what you have as long as it meets the minimums) • See http://accessdata.com/downloads/media/FTK_3x_System_Specifications_Guide.pdf • CPCC recommends • 8-core or better processor • 8 GB RAM, 64-bit Win 7 • Vraptor or better drive • 1 Gbps NIC • A quick spec-check sketch follows below
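To make those minimums easy to verify on a candidate box, here is a small helper sketch written for this course material; it is not an AccessData tool, and it assumes the third-party psutil package is installed (pip install psutil). It checks physical cores, RAM, and NIC link speed against the worker figures above.

```python
# Illustrative helper (not part of FTK): does this box meet the CPCC
# worker minimums listed above? Requires the third-party psutil package.
import psutil

MIN_CORES = 8        # 8-core or better processor
MIN_RAM_GB = 8       # 8 GB RAM for a worker
MIN_NIC_MBPS = 1000  # 1 Gbps NIC

def check_worker_specs():
    cores = psutil.cpu_count(logical=False) or 0
    ram_gb = psutil.virtual_memory().total / (1024 ** 3)
    nic_speeds = [s.speed for s in psutil.net_if_stats().values() if s.isup]
    fastest_nic = max(nic_speeds, default=0)   # reported in Mbps

    print(f"Physical cores : {cores} (need {MIN_CORES})")
    print(f"RAM            : {ram_gb:.1f} GB (need {MIN_RAM_GB})")
    print(f"Fastest NIC    : {fastest_nic} Mbps (need {MIN_NIC_MBPS})")

    return (cores >= MIN_CORES and ram_gb >= MIN_RAM_GB
            and fastest_nic >= MIN_NIC_MBPS)

if __name__ == "__main__":
    print("Meets CPCC worker minimums:", check_worker_specs())
```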
Other requirements • .NET 3.5 Service Pack 1 (included on the Application ISO; if the machine is connected to the Internet, the installer will attempt to download it) • Windows 2008 R2 requires that you manually install 3.5 SP1 using the "Roles and Features" tool • AccessData Processing Engine installation executable • The Evidence Processing Engine (regular FTK) IS NOT to be installed in distributed mode on the FTK examiner machine
Additional Considerations • AccessData says • The machines that store the evidence and case folder become a bottleneck. • Processing evidence is very disk I/O intensive. As a result, evidence should be stored on fast drives. With many large machines in a processing group, it is possible that the file sharing service in Windows will run out of kernel memory and fail to provide the evidence data across the network. • If you use the CPCC ring, you will be fine (more later)
Additional Considerations • AccessData says • The machine that runs the Processing Manager may become a bottleneck during the discovery phase. • Discovery is the process of enumerating all the actual files in a piece of evidence. • Information about these files is stored in the database, and the Distributed Processing Engines work on processing them. • This discovery phase always runs in the Processing Engine located on the same machine as the Processing Manager. • Since it produces much of the work that other Processing Engines work on, it needs to be one of the fastest machines (CPU speed) in the processing group. • If you use the CPCC ring, you will be fine (more later)
Additional Considerations • AccessData says • Distributed processing produces a lot of network traffic. There is control traffic between the engine components, but primarily the network is used to read evidence and write results to the case folder and database. It is very easy to saturate a gigabit network for extended periods of time while processing a large image. • Please use the fastest network technologies available to you, at a minimum 100 Mb switched. NO ... use 1 Gbps only! • We strongly recommend that the case folder and image location are on separate drives, AND on a separate machine from the examiner (more later) • If you use the CPCC ring, you will be fine (more later)
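Some quick arithmetic shows why 100 Mb is not enough. Assuming, as an example size not taken from the slides, a 1 TB image and ignoring protocol overhead, just reading the evidence once over the wire takes roughly 22 hours at 100 Mb/s versus about 2 hours at 1 Gb/s:

```python
# Back-of-the-envelope illustration of why a 100 Mb network is not enough:
# rough time just to move a 1 TB image across the wire (no protocol
# overhead, no disk limits, no result/database traffic).
IMAGE_SIZE_BYTES = 1 * 10**12          # assumed 1 TB evidence image

for label, mbps in [("100 Mb/s", 100), ("1 Gb/s", 1000)]:
    bytes_per_second = mbps * 10**6 / 8
    hours = IMAGE_SIZE_BYTES / bytes_per_second / 3600
    print(f"{label:>8}: ~{hours:.1f} hours just to read the image once")

# ~22.2 hours at 100 Mb/s vs ~2.2 hours at 1 Gb/s -- and processing both
# reads evidence and writes results, so real traffic is higher still.
```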
Lab Considerations • For maximum throughput we will disable a lot of security stuff • Firewalls, A/V • Permissions and shares will be VERY open • THE LAB must be isolated from the Internet and any corporate LANs • VLAN separation may be OK depending on details