STEGANOGRAPHY: Data Mining • SOUNDARARAJAN EZEKIEL • Department of Computer Science, Indiana University of Pennsylvania, Indiana, PA 15705
Steganography, Cryptography, Data Mining • Steganography: the art of hiding information in ways that prevent detection of the hidden message; the existence of the message is not known • Cryptography: the science of writing in secret code; it encodes a message so that it cannot be understood • Data Mining: discovering hidden value in your data warehouse, that is, the extraction of hidden predictive information from large databases; a knowledge discovery method that extracts implicit and interesting patterns from large data collections
Data Mining: Introduction • It started when we (businesses) began storing data on computers • Continued improvements: technology that navigates through data in real time • Examples: • Single case: • A web server collects data for every single click • Logs are too big and contain gibberish • Lots of data and statistics • What we collect is not really useful • Multiple case: • A collection of web servers with large bandwidth • Think about the size of the data we collect
Data Mining: Continued • It helps to design better and more intelligent businesses (and e-learning environments) because it is supported by • Massive data collection • Powerful multiprocessor computers • Good data mining algorithms • It has existed for at least 10 years, but has become popular only recently • Example: • Winter Corporation Report • Data warehouses with as much as 100 to 200 terabytes of raw data will be operational by next year, performing nearly 2,000 concurrent queries and occupying nearly 1 petabyte (1,000 terabytes) of disk space. In the same time period, transaction-processing databases will handle workloads of nearly 66,000 transactions per second
The Scope of Data Mining • It is similar to sifting gold from an immense amount of dirt: searching for valuable information in gigabytes of data • Automated prediction of trends and behaviors: data mining automates the process of finding predictive information in a large database • Example: questions related to targeted marketing • Data mining can use mailing-list data and other past data to answer them • Other examples: forecasting bankruptcy, and identifying segments of a population likely to respond similarly to given events
Automated discovery of previously unknown patterns: data mining tools sweep through the database and identify previously hidden patterns in one step • Example: seemingly unrelated items that are purchased together in a store • Detecting fraudulent credit card transactions, etc. • Databases can be larger in both depth and breadth • High-performance data mining needs to analyze the full depth of a database without pre-selecting subsets • Larger samples yield lower estimation errors and variances
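The claim that larger samples yield lower estimation errors can be made concrete with the standard error of a sample mean, sigma / sqrt(n). The sketch below assumes independent records and a hypothetical per-record spread of 10 units; the function name is illustrative only.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of a sample mean over n independent observations."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sigma / math.sqrt(n)


# Hypothetical spread of 10 units per record: quadrupling the sample
# size halves the estimation error.
for n in (100, 400, 1600):
    print(n, standard_error(10.0, n))
```

This is one reason to mine the full database rather than a pre-selected subset when the computing power allows it.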
Research Rank • 2001: according to MIT's Technology Review, data mining is a top-10 research area • Recently: according to a Gartner Group Advanced Technology Research Note, data mining and AI are among the top 5 key research areas
Multi-disciplinary field with broad applicability • My point of view of data mining • Borrowing ideas from • Machine Learning • Artificial Intelligence • Statistics • High-performance computing • Signal and Image Processing • Mathematical Optimization • Pattern Recognition • Natural Language Processing • Steganography • Cryptography • Has several applications • Market based analysis • Customer relationship management • Fraud detection • Network intrusion detection • Non-destructive evaluation • Astronomy (look-up data) • Remote sensing (look-down data) • Text and multimedia mining • Medical imaging • Automated target recognition • Combines ideas from several different fields • Steganography and Cryptography
General View of Data Mining: an iterative and interactive process • Data flow: Raw Data → Target Data → Preprocessed Data → Transformed Data → Patterns → Knowledge • Data processing: Data Fusion, Sampling, MRA (multiresolution analysis), De-noising, Object Identification, Feature Extraction, Normalization, Dimension Reduction • Pattern recognition: Classification, Clustering, Regression • Interpreting results: Visualization, Validation
Our Research Based On • Data Preprocessing • Multiresolution Analysis • De-noising ( wavelet based methods) • Object Classifications • Feature Extraction • Pattern Recognition • Classification • Clustering • Visualization and Validation • Steganography • Cryptography
Where we are going from here • More robust, accurate, and scalable algorithms • For pre-processing and pattern recognition • Wavelets and fractals • Newer data types • Video and multimedia • Multi-sensor data • More complex problems • Dynamic tracking in video • Mining text, audio, video, and images • Investigating steganography in images: analysis of data hiding methods, attacks against hidden information, and countermeasures to attacks against digital watermarking (detection and distortion)
How does data mining work? • How exactly is data mining able to tell you important things that you did not know, or what is going to happen next? • The technique used to perform these feats is called modeling • Modeling is simply the act of building a model in one situation where you know the answer and then applying it to another situation where you don't • Example 1: a sunken treasure ship off the Bermuda shore. Keep all the information about other ships and their paths, build the model, and if the model is good, you find the treasure in the ocean • Example 2: identifying telephone customers. Suppose the model says that 98% of customers who make $60K per year spend more than $80 per month on long distance • With this model, new customers can be selectively targeted
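The telephone-customer example can be sketched in a few lines: fit a simple rule on records where the answer (monthly spend) is known, then apply it to prospects where it is not. All records, names, and the 0.75 decision threshold below are hypothetical, and treating "$60K per year" as "at least $60K" is an assumption.

```python
# Hypothetical historical records: (annual income in dollars,
# monthly long-distance spend in dollars). The answer is known here.
history = [
    (72000, 95.0), (65000, 88.5), (61000, 102.0), (90000, 120.0),
    (60000, 60.0), (45000, 30.0), (38000, 85.0), (52000, 20.0),
]

INCOME_CUTOFF = 60000  # segment boundary, taken from the slide's example
SPEND_CUTOFF = 80.0    # "heavy user" boundary, taken from the slide's example


def fit_rule(records):
    """Estimate the share of high-income customers who are heavy users."""
    segment = [spend for income, spend in records if income >= INCOME_CUTOFF]
    if not segment:
        return 0.0
    heavy = sum(1 for spend in segment if spend > SPEND_CUTOFF)
    return heavy / len(segment)


def select_targets(new_customers, rate, min_rate=0.75):
    """Apply the model where spend is unknown: target the high-income
    segment only if the learned rule is strong enough."""
    if rate < min_rate:
        return []
    return [name for name, income in new_customers if income >= INCOME_CUTOFF]


rate = fit_rule(history)
prospects = [("A", 58000), ("B", 64000), ("C", 81000)]  # spend unknown
targets = select_targets(prospects, rate)
print(rate, targets)
```

A real model would use many more attributes and a held-out set to check that the rule generalizes before it is used for targeting.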
Most Commonly Used Techniques • Artificial Neural Networks: non-linear predictive models that learn through training and resemble biological neural networks in structure • Decision Trees: tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi-Square Automatic Interaction Detection (CHAID) • Genetic Algorithms: optimization techniques that use processes such as genetic combination, mutation, and selection in a design based on the concepts of evolution • Nearest Neighbor Method • Rule Induction • OUR METHODS WILL BE BASED ON WAVELETS, FRACTALS, STEG, AND CRYPT
Steganography Methods • Let us discuss a few methods and their advantages and disadvantages • 1. Least Significant Bit (LSB) Method • Idea: hide the message in the least significant bits (LSBs) of the pixels • Advantage: quick and easy; works well in grayscale images • Disadvantage: inserting into 8-bit images changes colors, which is noticeable; vulnerable to image processing such as cropping and compression
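The LSB idea can be sketched on a flat list of 8-bit grayscale values. The cover pixels and helper names below are hypothetical, and the sketch assumes the receiver knows the message length in bits.

```python
def text_to_bits(text: str) -> list[int]:
    """UTF-8 encode the text and return its bits, most significant first."""
    return [(byte >> shift) & 1
            for byte in text.encode("utf-8")
            for shift in range(7, -1, -1)]


def bits_to_text(bits: list[int]) -> str:
    """Pack groups of 8 bits back into bytes and decode as UTF-8."""
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return out.decode("utf-8")


def embed_lsb(pixels: list[int], bits: list[int]) -> list[int]:
    """Overwrite the least significant bit of the first len(bits) pixels."""
    if len(bits) > len(pixels):
        raise ValueError("message does not fit in the cover image")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit
    return stego


def extract_lsb(pixels: list[int], n_bits: int) -> list[int]:
    """Read back the least significant bit of the first n_bits pixels."""
    return [p & 1 for p in pixels[:n_bits]]


cover = [219, 215, 214, 216, 218, 218, 217, 216] * 2  # hypothetical gray values
message_bits = text_to_bits("Hi")
stego = embed_lsb(cover, message_bits)
print(bits_to_text(extract_lsb(stego, len(message_bits))))
```

Each pixel changes by at most 1, which is why the method is hard to see in grayscale images, and also why cropping or lossy compression destroys the message.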
Redundant Method • Store the message more than once so that it withstands cropping • Spread Spectrum • Store the hidden message everywhere • STEGANALYSIS: Seeing the Unseen • Detection: the analyst observes various relationships between the cover, message, stego-media, and steganography tool • Distortion: the analyst manipulates the stego-media to render the embedded information useless or remove it altogether
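The redundant method can be sketched as a repetition code with majority voting. This is only an illustration under assumptions: three copies, a fixed known message length, and cropping modeled as positions that come back as None. The function names are hypothetical.

```python
def repeat_bits(bits, copies=3):
    """Lay out `copies` full copies of the message one after another."""
    return list(bits) * copies


def majority_decode(received, n_bits, copies=3):
    """Recover each bit by majority vote over the surviving copies.
    None marks a lost (for example, cropped) position; ties and fully
    lost bits fall back to 0."""
    decoded = []
    for i in range(n_bits):
        votes = [received[c * n_bits + i] for c in range(copies)]
        votes = [v for v in votes if v is not None]
        decoded.append(1 if sum(votes) * 2 > len(votes) else 0)
    return decoded


message = [1, 0, 1, 1, 0, 0, 1, 0]
payload = repeat_bits(message)

# Simulate cropping away the region that held the first copy.
cropped = [None] * len(message) + payload[len(message):]
print(majority_decode(cropped, len(message)) == message)
```

The price of this robustness is capacity: three copies use three times as many carrier positions for the same message.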
DCT - Discrete Cosine Transformation
Pixel block (8x8):
219 215 214 216 218 218 217 216
219 216 216 216 215 215 215 215
217 217 218 216 212 212 213 215
215 215 215 215 211 212 214 216
217 216 214 216 215 215 217 218
216 216 215 214 215 215 215 216
215 214 210 210 211 215 215 216
218 215 211 211 213 214 216 216
• Encode • Take image • Divide into 8x8 blocks • Apply 2-D DCT to obtain the DCT coefficients • Apply threshold value T • Store the hidden message in those positions • Take the inverse DCT and store as image
• Decode • Start with modified image • Apply DCT • Find coefficients less than T • Extract bits • Combine bits and make message
DCT coefficients of the block (8x8):
1720     1.524   7.683   1.234   1.625    0.9234  -0.07047 -1.055
5.667    3.475  -4.181  -1.524   1.152    1.637    1.016    0.3802
0.3711  -1.442   1.067   5.944   0.3943  -0.4591   0.1313   0.7812
3.888   -3.356  -1.97    3.265   0.5632  -0.939   -0.2434   0.2354
1.625   -2.279   0.4735  1.392   1.375    0.6552  -1.143    0.03459
-4.049  -1.223   0.5466 -0.5425 -1.013   -0.2651   0.5696  -0.9296
1.876    1.924  -1.369  -1.132  -0.02802 -0.4646   0.1831   0.9729
0.8995  -0.7233  0.667   0.436   0.1325  -0.03665 -0.3141  -0.4749
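The encode and decode steps can be sketched on the 8x8 block from this slide. The slide does not say how a bit is written into a coefficient, so the sign convention below is an assumption: each coefficient with magnitude below T is set to +T/2 for a 1 and -T/2 for a 0. The sketch also assumes the stego block is kept as floating-point values; rounding back to integer pixels would need extra care and is not handled.

```python
import math

N = 8
BLOCK = [
    [219, 215, 214, 216, 218, 218, 217, 216],
    [219, 216, 216, 216, 215, 215, 215, 215],
    [217, 217, 218, 216, 212, 212, 213, 215],
    [215, 215, 215, 215, 211, 212, 214, 216],
    [217, 216, 214, 216, 215, 215, 217, 218],
    [216, 216, 215, 214, 215, 215, 215, 216],
    [215, 214, 210, 210, 211, 215, 215, 216],
    [218, 215, 211, 211, 213, 214, 216, 216],
]

# Orthonormal DCT-II basis: row u holds the samples of frequency u.
C = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
      for x in range(N)] for u in range(N)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct2(block):
    """2-D DCT of an 8x8 block: C * block * C^T."""
    return matmul(matmul(C, block), transpose(C))


def idct2(coeffs):
    """Inverse 2-D DCT: C^T * coeffs * C."""
    return matmul(matmul(transpose(C), coeffs), C)


def small_positions(coeffs, t):
    """Row-major positions of coefficients with magnitude below t."""
    return [(u, v) for u in range(N) for v in range(N) if abs(coeffs[u][v]) < t]


def embed(block, bits, t=1.0):
    """Hide bits in the small coefficients (assumed sign convention)."""
    coeffs = dct2(block)
    slots = small_positions(coeffs, t)
    if len(bits) > len(slots):
        raise ValueError("not enough coefficients below the threshold")
    for (u, v), bit in zip(slots, bits):
        coeffs[u][v] = t / 2 if bit else -t / 2
    return idct2(coeffs)


def extract(stego, n_bits, t=1.0):
    """Recompute the DCT and read the signs of the first n_bits small coefficients."""
    coeffs = dct2(stego)
    slots = small_positions(coeffs, t)[:n_bits]
    return [1 if coeffs[u][v] > 0 else 0 for u, v in slots]


secret = [1, 0, 1, 1, 0, 0, 1, 0]
stego_block = embed(BLOCK, secret, t=1.0)
print(extract(stego_block, len(secret), t=1.0) == secret)
```

Because written coefficients stay below T in magnitude, the decoder finds the same positions in the same order. The top-left coefficient of `dct2(BLOCK)` comes out close to the 1720 shown in the coefficient table.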
Wavelet Transformation • Wavelets are basis functions in continuous time • A basis is a set of linearly independent functions that can be used to produce all admissible functions f(t) • The special feature of a wavelet basis is that all of its functions are constructed from a single mother wavelet w(t) • This wavelet is a small wave (a pulse); normally it starts at time t = 0 and ends at time t = N • Shifted k times: w(t - k) • Compressed j times: w(2^j t) • Combining both: w_jk(t) = w(2^j t - k) • Haar wavelet: w(t) = 1 for 0 <= t < 1/2, -1 for 1/2 <= t < 1, and 0 otherwise • History: 1909 Haar; 1984 theory; 1988 Daubechies; 1989 Mallat (2-D, MRA); 1992 bi-orthogonal wavelets
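A minimal sketch of the Haar case: the mother wavelet, its compressed and shifted copies w(2^j t - k), and one level of the discrete Haar transform written as pairwise averages and differences. The unnormalized average/difference form is a choice made here for readability; orthonormal versions scale by 1/sqrt(2) instead.

```python
def haar(t: float) -> float:
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1:
        return -1.0
    return 0.0


def w_jk(j: int, k: int, t: float) -> float:
    """Wavelet compressed j times and shifted k times: w(2^j t - k)."""
    return haar(2 ** j * t - k)


def haar_step(signal):
    """One level of the unnormalized Haar transform."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = list(zip(signal[0::2], signal[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return averages, details


def haar_inverse(averages, details):
    """Rebuild the signal from averages and details."""
    out = []
    for a, d in zip(averages, details):
        out.extend([a + d, a - d])
    return out


averages, details = haar_step([4, 6, 10, 12])
print(averages, details)
print(haar_inverse(averages, details))
```

Small detail values are the ones that thresholding removes in wavelet-based compression and de-noising.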
Figure: wavelet-based data hiding. Blocks shown: Carrier, Message to be Hidden, Wavelet Transformation, Thresholding, Compression, Inverse Transformation, Stego image, Error Image, Extract the Hidden Message
Information Security and Data Mining • Goal of intrusion detection: discover intrusions into a computer or network • With the Internet and readily available tools for attacking networks, security becomes a critical component of the network • Misuse detection: finds intrusions by looking for activity corresponding to known intrusion techniques • Anomaly detection: the system defines the expected behavior of the network in advance and flags significant deviations from it
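The two detection styles can be contrasted in a short sketch. The signature strings, request lines, traffic counts, and the z-score threshold of 3 are all hypothetical; real intrusion detection systems use far richer features.

```python
import statistics

# Misuse detection: match activity against known attack patterns.
KNOWN_BAD_PATTERNS = ("../", "' OR '1'='1")  # hypothetical signatures


def misuse_alerts(requests):
    """Return the requests that contain a known attack signature."""
    return [r for r in requests if any(p in r for p in KNOWN_BAD_PATTERNS)]


# Anomaly detection: learn expected behavior first, then flag deviations.
def fit_baseline(counts):
    """Mean and sample standard deviation of normal traffic counts."""
    return statistics.mean(counts), statistics.stdev(counts)


def is_anomalous(count, baseline, z_threshold=3.0):
    """Flag a count that lies more than z_threshold deviations from the mean."""
    mean, stdev = baseline
    if stdev == 0:
        return count != mean
    return abs(count - mean) / stdev > z_threshold


requests = [
    "GET /index.html",
    "GET /../../etc/passwd",
    "GET /login?user=admin' OR '1'='1",
]
alerts = misuse_alerts(requests)

normal_counts = [100, 104, 98, 101, 97, 103, 99, 102]  # requests per minute
baseline = fit_baseline(normal_counts)
print(len(alerts), is_anomalous(105, baseline), is_anomalous(400, baseline))
```

Misuse detection cannot see attacks it has no signature for, while anomaly detection can, at the cost of false alarms whenever legitimate behavior changes.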
What we want • Tools to filter and classify information • Tools to find and retrieve the relevant information when you need it • Tools that adapt to your pace and needs • Tools to predict information needs • Tools to recommend tasks and information sources • Tools that can be personalized, manually or automatically
The tools should be… • Non- intrusive • Secure • Integrated • Adaptable • Controllable • Automatic or semi-automatic • Useful • For learners • For educators • Integrate operational data with customer, suppliers and market --
Profitable Applications • A wide range of companies have deployed successful applications of data mining • Some application areas include • A pharmaceutical company can analyze its recent sales force activity and results to improve targeting of high-value physicians and determine which marketing activities will have the greatest impact in the next few months • A credit card company can leverage its vast warehouse of customer transaction data to identify customers most likely to be interested in a new credit product • A diversified transportation company with a large direct sales force can apply data mining to identify the best prospects for its services • A large consumer packaged goods company can apply data mining to improve its sales process to retailers
Conclusion • In this talk, we have discussed data mining related topics • Our goals • Research • Software and algorithms • Application • Our main focus is Science Data, though applicable to other data sets as well • More information – check out website http://www.cosc.iup.eud/sezekiel Contact: sezekiel@iup.edu