Anatomy of Device Physics Big Data & Machine learning Janet George Fellow/Chief Data Scientist Western Digital Wolfram Data Summit 2016
Contents • Gartner 3Vs & IBM 4Vs • Anatomy of physics data – getting to the basics • Heteroskedastic data • Complexity of the data • Yield versus Endurance • Finding Nemo – Correlations and value (non-deterministic) • Getting caught up in machine learning • A simple performance predictor example • Game changing with Data • Data Intimacy • Embracing complexity, leading change
Anatomy of Device Physics data (Getting to the basics) • 4th Dimension of data • High variability (manufacturing processes and materials are constantly changing) • Experimentation @ scale • Production @ scale • Leading edge • Extremely complex
Going deeper into Device Physics Data • Heteroskedastic data • Heteroskedasticity: a statistical property describing how the variance of the errors behaves across a sample. • It is present when some subsets of the observations show different variability than other subsets, so the error variance is not constant.
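The definition above can be made concrete with a small sketch. It is not taken from the talk: the data is synthetic, the noise model (spread proportional to x) is an assumption chosen for illustration, and the function names are hypothetical. It fits a straight line and compares the residual variance in the upper half of x with the lower half; a ratio well above 1 suggests heteroskedasticity.

```python
import random
import statistics

def simulate(n=2000, seed=0):
    """Synthetic data (assumed model): y = 2x + noise, noise spread grows with x."""
    rng = random.Random(seed)
    xs = [rng.uniform(1.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 0.5 * x) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least squares for a single predictor; returns (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def variance_ratio(xs, ys):
    """Residual variance in the upper half of x divided by that in the lower half."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    pairs = sorted(zip(xs, residuals))  # order residuals by x
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]
    return statistics.pvariance(high) / statistics.pvariance(low)

xs, ys = simulate()
ratio = variance_ratio(xs, ys)
print(f"residual variance ratio (high x / low x): {ratio:.2f}")
```

With constant noise the ratio would sit near 1. Splitting the sample and comparing variances is a simplified check in the spirit of the Goldfeld-Quandt test, not a full hypothesis test.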
Challenges in manufacturing data (new technology node creation) • Material instability: deformation, does not bond • Material build-up • Substrate overheating • Heat sink • Resistive loss • Deposition issues • Thickness • Warpage • Wetting layers • Oxidation • Diffusion barrier • DPPM (defective parts per million): known and unknown causes, complex error recovery • Coupling effects, adjacent track interference
Dealing with Heteroskedastic data - Challenges • Machine learning model building requires constant optimization: training and re-training as the data changes • Ranking and weighting correlations for DPPM (PageRank model) • Linear regression versus random forest • What works for the data/value creation
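As a rough illustration of ranking correlations against DPPM, the sketch below orders a few parameters by the absolute value of their Pearson correlation with a DPPM series. This is a deliberately simple stand-in, not the PageRank-style model the slide mentions, and the measurements are made-up toy values, not data from the talk.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    denom = math.sqrt(va * vb)
    return cov / denom if denom else 0.0  # constant series: treat as uncorrelated

def rank_by_correlation(features, target):
    """Return (name, correlation) pairs, strongest absolute correlation first."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical toy measurements, one value per production lot.
features = {
    "thickness": [1.0, 1.2, 1.1, 1.5, 1.7, 1.9],
    "warpage": [0.3, 0.1, 0.4, 0.2, 0.5, 0.3],
    "oxidation": [5.0, 4.6, 4.4, 3.9, 3.1, 2.8],
}
dppm = [10, 14, 12, 20, 25, 29]

ranking = rank_by_correlation(features, dppm)
for name, r in ranking:
    print(f"{name:10s} r = {r:+.3f}")
```

Pearson correlation only captures linear relationships, which is one reason a non-linear learner such as a random forest may be compared against linear regression on this kind of data.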
Failure Classification & Clustering • Complex error recovery, tail failures
Finding Nemo! Correlation and value (non-deterministic) • Permutation and combination of every known and unknown correlation • [Diagram: manufacturing data labels (Prep, Measure, Clean, Height, Film, Tool Config, Oxide, ALO, Cover, Wet, Thickness) shown against screen tests (Screen 1 to Screen 5) and parameters (Par 1 to Par 5)]
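To give a feel for why searching every combination is hard, the sketch below counts candidate relationships using the Par 1 to Par 5 and Screen 1 to Screen 5 labels from the slide. How the talk actually enumerated candidates is not stated; treating every non-empty subset of parameters as a possible joint cause of each screen result is an assumption made here for illustration.

```python
from itertools import combinations, product

params = [f"Par {i}" for i in range(1, 6)]
screens = [f"Screen {i}" for i in range(1, 6)]

# Simplest case: one parameter against one screen test.
pairs = list(product(params, screens))

# Assumed richer case: any non-empty group of parameters may jointly matter.
subsets = [
    group
    for k in range(1, len(params) + 1)
    for group in combinations(params, k)
]
candidates = len(subsets) * len(screens)

def candidate_count(n_params, n_screens):
    """Closed form for the count above: (2^n - 1) parameter subsets per screen."""
    return (2 ** n_params - 1) * n_screens

print(len(pairs), len(subsets), candidates)
print(candidate_count(50, 5))  # growth with 50 parameters instead of 5
```

The count doubles with every added parameter, so exhaustive search stops being practical long before a real process, with far more than five parameters, is covered.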
Caught up in Machine learning – A simple example • Problem statement • Simple enough – Predicting employee performance • Where is the data? • What data do we have?
Key Findings • Match bias: bias in the data will yield biased results • Calibration is inherently biased • Line-of-business data is likely to provide more reliable results (common example: sales target vs. actual sales) • Bias in company/management policies will yield biased results
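The contrast between calibrated ratings and line-of-business data can be sketched with a tiny synthetic example. The records, team names, and rating scale below are entirely hypothetical: two teams have identical sales outcomes, but one team's calibrated ratings are consistently one point lower.

```python
import statistics

# Hypothetical records: (team, sales_target, actual_sales, calibrated_rating)
records = [
    ("A", 100, 110, 4), ("A", 100, 90, 3), ("A", 100, 120, 5),
    ("B", 100, 110, 3), ("B", 100, 90, 2), ("B", 100, 120, 4),
]

def mean_by_team(rows, value):
    """Average value(row) for each team."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[0], []).append(value(row))
    return {team: statistics.fmean(vals) for team, vals in grouped.items()}

# Subjective label: calibrated rating.
rating_means = mean_by_team(records, lambda r: r[3])
# Line-of-business measure: actual sales divided by sales target.
attainment_means = mean_by_team(records, lambda r: r[2] / r[1])

print("mean rating:    ", rating_means)
print("mean attainment:", attainment_means)
```

A model trained to predict the rating would learn the one-point gap between teams even though attainment is the same, which is the sense in which biased data yields biased results.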
Game Changing with DATA! – New collection of Data • Unbiased data collection: getting away from bias • Raw data: annotations, lineage • Unfitted for existing tools • ETL: traditional versus new approaches to data collection. Machine learning mirrors human biases with extracted data. • Data loss • Getting to the "Holy Grail" faster • No ETL • Evolving schemas: Avro
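The "evolving schemas" point can be illustrated with a simplified sketch of one Avro schema resolution rule: a field that exists only in the reader's schema is filled from its default, and a field that exists only in the writer's data is ignored. The code below is a hand-written imitation in plain Python, not a call into any Avro library, and the record and field names (`Measurement`, `wafer_id`, `thickness_nm`, `tool_config`) are hypothetical. Type promotion and aliases are left out.

```python
# Schemas in Avro's JSON form, written as Python dicts (None stands for JSON null).
SCHEMA_V1 = {
    "type": "record",
    "name": "Measurement",
    "fields": [
        {"name": "wafer_id", "type": "string"},
        {"name": "thickness_nm", "type": "double"},
    ],
}

# Version 2 adds an optional field with a default, so old data stays readable.
SCHEMA_V2 = {
    "type": "record",
    "name": "Measurement",
    "fields": [
        {"name": "wafer_id", "type": "string"},
        {"name": "thickness_nm", "type": "double"},
        {"name": "tool_config", "type": ["null", "string"], "default": None},
    ],
}

def resolve(record, reader_schema):
    """Project a decoded record onto the reader's schema (simplified rules)."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]  # field added after data was written
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return resolved  # writer-only fields are dropped

old_record = {"wafer_id": "W1", "thickness_nm": 12.5}
new_record = {"wafer_id": "W2", "thickness_nm": 11.9, "tool_config": "cfg-7"}

print(resolve(old_record, SCHEMA_V2))  # old data read with the new schema
print(resolve(new_record, SCHEMA_V1))  # new data read with the old schema
```

Because raw records stay readable as fields are added, the schema can change with the process without a separate transformation step for each change.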
Developing Intimacy with Data with Domain Experts • Asking the right questions • Critical thinking • Observing the signals • Systemic patterns
Leading the industry (Running faster than competitors) • Embracing Complexity and Change • Creating endless possibilities • Key takeaway: leading-edge machine data from the Industrial Internet is heteroskedastic (high variability)