Burton D. Morgan Entrepreneurial Competition • Are you the entrepreneurial type? Do you want to start your own business and be your own boss? Do you have an idea for a new business? Then join us. We will help you: • formulate your ideas • create a business plan • get feedback and possibly funding from nationally known entrepreneurs and venture capitalists • get seed funding for your business in the form of prizes totaling at least $50,000 (possibly more) • get space in the Purdue Technology Incubator • The competition is open to all Purdue students. • Callouts on September 5th and 6th, 7-9 pm, in Krannert Auditorium. Register with paf@purdue.edu or call 4-7324. More information at www.mgmt.purdue.edu/entrep
CS 590M Fall 2001: Security Issues in Data Mining Lecture 6: Time Series, Regression, Data Mining Process
Regression • Problem: Prediction of numerical values • Similar to classification, but the class is continuous • Strong statistical foundation • Data mining community primarily concerned with scale
Regression: Problem Definition • Data: Sequence of vectors (x_i, y_i), i = 1, …, n • Goal: Find a function f such that f(x) ≈ y for • Training data (x_i, y_i) • New data (x, y) where y is unknown • Note that f captures the relationship between x and y, but doesn’t imply causality
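As an illustration of the problem definition, here is a minimal sketch that learns f from training pairs and then applies it to an x whose y is unknown. It assumes the simplest possible case, a single numeric attribute and a straight-line f fitted by ordinary least squares; the slide does not prescribe any particular form for f. The names `fit_line` and `predict` and the training data are hypothetical.

```python
def fit_line(xs, ys):
    """Fit f(x) = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and co-variation of x with y, around their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted f to an x whose y is unknown."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data lying exactly on y = 2x + 1.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]
model = fit_line(train_x, train_y)
print(model, predict(model, 4.0))
```

The fitted line only summarizes how y varies with x in the training data; nothing in the fit says that x causes y.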
Regression: Issues • Curse of dimensionality: As the number of attributes/values grows, • Space of possible functions f grows exponentially • Number of training examples needed to learn best f grows exponentially • Solution: Constrain space of possible functions
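The growth described above can be made concrete with a small counting sketch. It assumes a simplified discrete setting that is not in the slide: every attribute takes the same number of values and the target is binary. The function names are hypothetical.

```python
def input_space_size(num_attributes, values_per_attribute):
    """Number of distinct input vectors x."""
    return values_per_attribute ** num_attributes


def binary_function_count(num_attributes, values_per_attribute=2):
    """Number of distinct functions f from inputs to a binary target.

    Each distinct input can be mapped to 0 or 1 independently.
    """
    return 2 ** input_space_size(num_attributes, values_per_attribute)


for d in range(1, 6):
    print(d, input_space_size(d, 2), binary_function_count(d))
```

Under these assumptions, each extra binary attribute doubles the number of distinct inputs, and the number of candidate functions grows faster still. Constraining f to a restricted family (for example, straight lines or shallow trees) cuts this space down to something a realistic training set can cover.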
Regression: Approaches • Decision Trees • Regression Trees (e.g., CART) • Decision tree with automatic selection of number of choices at each node • Regression Splines (e.g., MARS) • Handles discontinuity at choice points • Artificial Neural Networks • Capable of computing arbitrarily complex functions
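To give a feel for the tree-based approach, the sketch below builds a single-split regression tree (a "stump"): it picks the threshold on one attribute that minimizes the total squared error and predicts the mean target value on each side. This is a deliberately reduced illustration under stated assumptions, not the full CART procedure, which grows deeper trees over many attributes and prunes them. All names and data are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) with minimal squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold,
                    sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1:]


def stump_predict(stump, x):
    """Predict the mean of whichever side of the threshold x falls on."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


# Hypothetical data with an obvious jump between x = 3 and x = 10.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [5.0, 5.0, 5.0, 20.0, 20.0, 20.0]
stump = best_split(xs, ys)
print(stump, stump_predict(stump, 2.5), stump_predict(stump, 11.5))
```

The piecewise-constant prediction jumps at the chosen threshold, which is the discontinuity at choice points that spline-based methods such as MARS are designed to smooth out.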
Time Series • Time/value data • Not sequential associations – value@time, not event@time • Generally viewed as a function with a value at any given time • Goals: • Learn function • Identify repeated patterns of value change
Time Series: Finding Patterns • Given the values over a time fragment, find other time fragments with similar values, allowing for: • Shift of values • Scaling of values • Stretching of time • Find commonly occurring patterns of values (e.g., the time fragments that match the most other fragments under the above conditions)
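One common way to compare fragments regardless of value shift and value scaling is to z-normalize each fragment before measuring distance. The sketch below assumes equal-length fragments and a positive scale factor, and it does not handle stretching of time, which needs a different technique. The slide does not name a specific method, so treat this as one possible illustration; the function names and data are hypothetical.

```python
import math


def z_normalize(values):
    """Subtract the mean and divide by the standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0] * n  # a constant fragment has no shape to compare
    return [(v - mean) / std for v in values]


def normalized_distance(a, b):
    """Euclidean distance between two z-normalized, equal-length fragments."""
    if len(a) != len(b):
        raise ValueError("fragments must have the same length")
    za, zb = z_normalize(a), z_normalize(b)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(za, zb)))


base = [1.0, 2.0, 4.0, 3.0]
shifted_scaled = [3.0 * v + 10.0 for v in base]  # same shape, new offset and scale
different = [4.0, 1.0, 3.0, 2.0]                 # same values, different shape

print(normalized_distance(base, shifted_scaled))
print(normalized_distance(base, different))
```

After normalization the shifted and scaled copy is at essentially zero distance from the original, while the fragment with a different shape stays clearly apart.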
Time Series: Approaches • Transformation • Use the discrete Fourier transform (DFT) to transform to the frequency domain • Keep only the first few frequencies • Index in an R*-tree and search • Window-based • Slide a window across the sequence • Index key features in a special data structure • Count entries at each index point
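The transformation approach can be sketched as follows: compute the DFT of each sequence, keep only the first few coefficients as a compact feature vector, and compare sequences in that reduced space. The sketch assumes an orthonormal DFT (scaled by 1/sqrt(n)), under which the distance between truncated feature vectors never exceeds the true Euclidean distance, so a search on the features cannot miss a true match. The R*-tree indexing step is omitted; function names and data are hypothetical.

```python
import cmath
import math


def dft(values):
    """Orthonormal discrete Fourier transform (scaled by 1/sqrt(n))."""
    n = len(values)
    scale = 1 / math.sqrt(n)
    coefficients = []
    for k in range(n):
        total = sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, v in enumerate(values))
        coefficients.append(scale * total)
    return coefficients


def features(values, keep):
    """Keep only the first `keep` frequency coefficients."""
    return dft(values)[:keep]


def distance(a, b):
    """Euclidean distance; works for real and complex vectors."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))


series_a = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
series_b = [1.5, 2.5, 2.5, 4.5, 2.5, 2.0, 0.5, 0.5]

full = distance(series_a, series_b)
reduced = distance(features(series_a, 3), features(series_b, 3))
print(full, reduced)
```

Because the reduced distance is a lower bound, candidates found in the low-dimensional index still need a final check against the full sequences to discard false positives.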
Data Mining Process • Cross-Industry Standard Process for Data Mining (CRISP-DM) • European Community-funded effort to develop a framework for data mining tasks • Goals: • Encourage interoperable tools across the entire data mining process • Take the mystery/high-priced expertise out of simple data mining tasks
CRISP-DM: Phases • Business Understanding • Understanding project objectives and requirements • Data mining problem definition • Data Understanding • Initial data collection and familiarization • Identify data quality issues • Initial, obvious results • Data Preparation • Record and attribute selection • Data cleansing • Modeling • Run the data mining tools • Evaluation • Determine if results meet business objectives • Identify business issues that should have been addressed earlier • Deployment • Put the resulting models into practice • Set up for repeated/continuous mining of the data