How to classify different events in heavy-ion collisions Qinghui Zhang
Why do we need classification? • Events in heavy-ion collisions may differ from one another. • Some events may undergo a phase transition, while others may not.
What should we be careful about in event classification? • We need to know which features can be used to distinguish between events. • We need to know the feature values for the QGP phase and the non-QGP phase.
What will we do in event classification? • (1) Verification: for a given event and a claim, the system answers Yes or No. • (2) Identification: for a given event, the system tells which class the event belongs to, using the database it has.
The complexity of heavy-ion collisions • (1) There are too many particles in each event. • (2) It is difficult to choose a valuable measure. • (3) We do not know the detailed value of the measure for the QGP or non-QGP phase. We cannot observe the QGP directly!
How to classify different events? • Suppose we can select some features of each event. • Suppose we know the feature values for the different classes. • How do we classify them? We need a method for classifying different events.
Support Vector Machines • Three main ideas: • Define what an optimal hyperplane is (in a way that can be identified computationally efficiently): maximize the margin. • Extend this definition to non-linearly separable problems: add a penalty term for misclassifications. • Map the data to a high-dimensional space where it is easier to classify with linear decision surfaces: reformulate the problem so that the data are mapped implicitly to this space.
Which Separating Hyperplane to Use? [Figure: several candidate separating lines between two classes in the (Var1, Var2) plane]
Maximizing the Margin • IDEA 1: Select the separating hyperplane that maximizes the margin! [Figure: two margin hyperplanes with the margin width marked, in the (Var1, Var2) plane]
Support Vectors [Figure: the support vectors lying on the margin hyperplanes, with the margin width marked, in the (Var1, Var2) plane]
Setting Up the Optimization Problem • Write the two margin hyperplanes as $w \cdot x + b = k$ and $w \cdot x + b = -k$. The width of the margin is then $\frac{2k}{\lVert w \rVert}$. • So the problem is: $\max \frac{2k}{\lVert w \rVert}$ subject to $w \cdot x + b \ge k$ for all $x$ in class 1 and $w \cdot x + b \le -k$ for all $x$ in class 2.
Setting Up the Optimization Problem • There is a choice of scale and units for the data such that $k = 1$. The problem then becomes: $\max \frac{2}{\lVert w \rVert}$ subject to $w \cdot x + b \ge 1$ for all $x$ in class 1 and $w \cdot x + b \le -1$ for all $x$ in class 2.
Setting Up the Optimization Problem • If class 1 corresponds to $y_i = 1$ and class 2 corresponds to $y_i = -1$, we can rewrite the constraints as $y_i (w \cdot x_i + b) \ge 1$ for all $i$. • So the problem becomes: $\max \frac{2}{\lVert w \rVert}$ s.t. $y_i (w \cdot x_i + b) \ge 1 \ \forall i$, or equivalently $\min \frac{1}{2} \lVert w \rVert^2$ s.t. $y_i (w \cdot x_i + b) \ge 1 \ \forall i$.
Linear, Hard-Margin SVM Formulation • Find the $w, b$ that solve: $\min_{w,b} \frac{1}{2} \lVert w \rVert^2$ s.t. $y_i (w \cdot x_i + b) \ge 1 \ \forall i$. • This is a quadratic programming problem: there is a unique global minimum value, and a unique $w$ and $b$ that attain it. • It is not solvable if the data are not linearly separable. (A code sketch follows.)
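As a concrete illustration (not part of the original talk), here is a minimal sketch of the hard-margin case using scikit-learn's SVC. sklearn only implements the soft-margin formulation, so a very large C is used to approximate the hard margin; the toy data are invented for the example.

```python
# A (near-)hard-margin linear SVM: scikit-learn implements the
# soft-margin formulation, so a very large C approximates the hard
# margin on linearly separable data. Toy data, invented for illustration.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2.0, 0.5, size=(20, 2)),   # class +1
               rng.normal(-2.0, 0.5, size=(20, 2))])  # class -1
y = np.array([+1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # huge C ~ hard margin

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)
print("margin width =", 2.0 / np.linalg.norm(w))   # the 2/||w|| of the slides
print("number of support vectors:", len(clf.support_vectors_))
```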
Support Vector Machines • Three main ideas: • Define what an optimal hyperplane is (in a way that can be identified computationally efficiently): maximize the margin. • Extend this definition to non-linearly separable problems: add a penalty term for misclassifications. • Map the data to a high-dimensional space where it is easier to classify with linear decision surfaces: reformulate the problem so that the data are mapped implicitly to this space.
Non-Linearly Separable Data • Introduce slack variables $\xi_i$. • Allow some instances to fall within the margin, but penalize them. [Figure: two overlapping classes in the (Var1, Var2) plane; the slack variables measure how far violating points sit inside the margin]
Formulating the Optimization Problem • The constraints become: $y_i (w \cdot x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for all $i$. • The objective function $\min \frac{1}{2} \lVert w \rVert^2 + C \sum_i \xi_i$ penalizes misclassified instances and those within the margin. • $C$ trades off margin width against misclassifications. [Figure: the soft margin in the (Var1, Var2) plane]
Linear, Soft-Margin SVMs • Find $w, b, \xi$ that solve: $\min \frac{1}{2} \lVert w \rVert^2 + C \sum_i \xi_i$ s.t. $y_i (w \cdot x_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0 \ \forall i$. • The algorithm tries to keep the $\xi_i$ at zero while maximizing the margin. • Notice: the algorithm does not minimize the number of misclassifications, but the sum of distances from the margin hyperplanes. • As $C \to \infty$, we recover the hard-margin solution.
Robustness of Soft vs Hard Margin SVMs [Figure: the same data with one outlier; the hard-margin SVM is forced into a narrow margin by the outlier, while the soft-margin SVM absorbs it with a slack variable $\xi_i$]
Soft vs Hard Margin SVMs • A soft-margin SVM always has a solution (for any finite $C > 0$). • Soft-margin SVMs are more robust to outliers. • A hard-margin SVM does not require guessing the cost parameter (it needs no parameter at all); it corresponds to the limit $C \to \infty$. (A sketch of the $C$ trade-off follows.)
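A small sketch of this trade-off (our addition, with invented toy data containing one outlier): a small C tolerates the outlier and keeps a wide margin, while a large C approaches hard-margin behaviour.

```python
# The soft-margin trade-off: toy data plus one outlier, fit with a
# small and a large C. Small C tolerates the outlier (wider margin);
# large C approaches hard-margin behaviour.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(+2.0, 0.5, size=(20, 2)),
               rng.normal(-2.0, 0.5, size=(20, 2)),
               [[-1.5, -1.5]]])                 # outlier, labelled +1
y = np.array([+1] * 20 + [-1] * 20 + [+1])

for C in (0.1, 1000.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    width = 2.0 / np.linalg.norm(clf.coef_[0])
    print(f"C={C:g}: margin width = {width:.3f}, "
          f"support vectors = {len(clf.support_)}")
```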
Support Vector Machines • Three main ideas: • Define what an optimal hyperplane is (in a way that can be identified computationally efficiently): maximize the margin. • Extend this definition to non-linearly separable problems: add a penalty term for misclassifications. • Map the data to a high-dimensional space where it is easier to classify with linear decision surfaces: reformulate the problem so that the data are mapped implicitly to this space.
Disadvantages of Linear Decision Surfaces [Figure: data in the (Var1, Var2) plane that no straight line can separate]
Advantages of Non-Linear Surfaces [Figure: the same data separated cleanly by a curved decision boundary]
Linear Classifiers in High-Dimensional Spaces • Find a function $\Phi(x)$ that maps the data to a different space. [Figure: the non-separable (Var1, Var2) data become linearly separable in the plane of Constructed Feature 1 vs Constructed Feature 2]
Mapping Data to a High-Dimensional Space • Find a function $\Phi(x)$ to map to a different space; the SVM formulation becomes: $\min \frac{1}{2} \lVert w \rVert^2 + C \sum_i \xi_i$ s.t. $y_i (w \cdot \Phi(x_i) + b) \ge 1 - \xi_i$, $\xi_i \ge 0 \ \forall i$. • The data appear only as $\Phi(x)$, and the weights $w$ are now weights in the new space. • The explicit mapping is expensive if $\Phi(x)$ is very high dimensional. • Solving the problem without explicitly mapping the data is desirable.
The Dual of the SVM Formulation • Original SVM formulation: $n$ inequality constraints, $n$ positivity constraints, $n$ slack variables (plus $w$ and $b$). • The (Wolfe) dual of this problem: $\max_\alpha \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, \Phi(x_i) \cdot \Phi(x_j)$ s.t. $\sum_i \alpha_i y_i = 0$ and $0 \le \alpha_i \le C \ \forall i$: one equality constraint, $n$ positivity constraints, $n$ variables (the Lagrange multipliers $\alpha_i$), and a more complicated objective function. • NOTICE: the data only appear as the inner products $\Phi(x_i) \cdot \Phi(x_j)$. (A sketch of solving this dual directly follows.)
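To make the structure of the dual concrete, here is a hedged sketch (our addition) that solves it with a generic constrained optimizer, scipy's SLSQP. Real SVM solvers use specialized algorithms such as SMO; the helper name svm_dual and the toy data are ours.

```python
# Solving the Wolfe dual with a generic constrained optimizer (SLSQP),
# purely to make the structure above concrete; real SVM solvers use
# specialized algorithms such as SMO.
import numpy as np
from scipy.optimize import minimize

def svm_dual(X, y, C=1.0, kernel=lambda a, b: a @ b):
    n = len(y)
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    Q = (y[:, None] * y[None, :]) * K               # Q_ij = y_i y_j K(x_i, x_j)

    obj = lambda a: 0.5 * a @ Q @ a - a.sum()       # minimize the negated dual
    cons = {"type": "eq", "fun": lambda a: a @ y}   # sum_i alpha_i y_i = 0
    bnds = [(0.0, C)] * n                           # 0 <= alpha_i <= C

    res = minimize(obj, np.zeros(n), bounds=bnds, constraints=cons,
                   method="SLSQP")
    return res.x

# toy usage: with the linear kernel, w is recovered as sum_i alpha_i y_i x_i
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(+1.0, 0.4, size=(10, 2)),
               rng.normal(-1.0, 0.4, size=(10, 2))])
y = np.array([+1.0] * 10 + [-1.0] * 10)
alpha = svm_dual(X, y, C=10.0)
print("support vectors (nonzero alphas):", np.flatnonzero(alpha > 1e-6))
print("w =", (alpha * y) @ X)
```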
The Kernel Trick • $\Phi(x_i) \cdot \Phi(x_j)$ means: map the data into the new space, then take the inner product of the new vectors. • We can find a function such that $K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$, i.e., the kernel of two data points equals the inner product of their images. • Then we do not need to map the data explicitly into the high-dimensional space to solve the optimization problem (for training). • How do we classify new instances without explicitly mapping them? It turns out that $w \cdot \Phi(x) + b = \sum_i \alpha_i y_i \, \Phi(x_i) \cdot \Phi(x) + b = \sum_i \alpha_i y_i \, K(x_i, x) + b$, so prediction also needs only the kernel.
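In practice the trick can be exercised directly: scikit-learn's SVC accepts a precomputed Gram matrix, so $\Phi$ is never formed explicitly. A minimal sketch (our addition, toy data invented) with the polynomial kernel $(x \cdot y + 1)^2$:

```python
# The kernel trick in practice: pass a precomputed Gram matrix to SVC,
# so the feature map phi is never constructed.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(+1.0, 0.6, size=(15, 2)),
               rng.normal(-1.0, 0.6, size=(15, 2))])
y = np.array([+1] * 15 + [-1] * 15)

K = (X @ X.T + 1.0) ** 2                  # polynomial kernel (x_i . x_j + 1)^2
clf = SVC(kernel="precomputed").fit(K, y)

X_new = np.array([[0.8, 1.1]])
K_new = (X_new @ X.T + 1.0) ** 2          # kernel of new point vs training set
print(clf.predict(K_new))
```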
Examples of Kernels • Assume we measure two quantities, e.g., the expression levels of the genes TrkC and SonicHedgehog (SH), and we use the mapping: $\Phi(x) = \big(x_{TrkC}^2, \; x_{SH}^2, \; \sqrt{2}\,x_{TrkC} x_{SH}, \; \sqrt{2}\,x_{TrkC}, \; \sqrt{2}\,x_{SH}, \; 1\big)$. • Consider the function $K(x, y) = (x \cdot y + 1)^2$. • We can verify that $K(x, y) = \Phi(x) \cdot \Phi(y)$.
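A quick numeric check of this identity (our own sketch; the input numbers are arbitrary stand-ins for the two expression levels):

```python
# With this phi, phi(x) . phi(y) = (x . y + 1)^2 for any x, y.
import numpy as np

def phi(x):
    x1, x2 = x    # x1 = TrkC level, x2 = SH level
    return np.array([x1**2, x2**2, np.sqrt(2)*x1*x2,
                     np.sqrt(2)*x1, np.sqrt(2)*x2, 1.0])

def K(x, y):
    return (np.dot(x, y) + 1.0) ** 2

x, y = np.array([0.3, -1.2]), np.array([2.0, 0.7])
print(phi(x) @ phi(y), K(x, y))   # the two values agree
```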
Non-linear SVMs: Feature Spaces • General idea: the original feature space can always be mapped to some higher-dimensional feature space where the training set is separable: $\Phi: x \mapsto \Phi(x)$.
The Mercer Condition • The SVM dual formulation requires calculating $K(x_i, x_j)$ for each pair of training instances. The matrix $G_{ij} = K(x_i, x_j)$ is called the Gram matrix. • A feature space $\Phi(x)$ exists for the kernel exactly when $G$ is always positive semi-definite (the Mercer condition).
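An empirical spot-check of the condition (our addition): build the Gram matrix of a kernel on random points and confirm the eigenvalues are non-negative up to round-off. This tests one sample only; it does not prove Mercer's condition.

```python
# The Gram matrix of a valid kernel on any point set should be positive
# semi-definite, i.e. all eigenvalues >= 0 up to round-off.
import numpy as np

def rbf(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 2))
G = np.array([[rbf(xi, xj) for xj in X] for xi in X])
print("smallest eigenvalue of G:", np.linalg.eigvalsh(G).min())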
Support Vector Machines • Three main ideas: • Define what an optimal hyperplane is (in a way that can be identified computationally efficiently): maximize the margin. • Extend this definition to non-linearly separable problems: add a penalty term for misclassifications. • Map the data to a high-dimensional space where it is easier to classify with linear decision surfaces: reformulate the problem so that the data are mapped implicitly to this space.
Ising Model and Random Model • We will use the Ising model and the random model in our analysis. The details of the models can be found in Phys. Rev. C64, 054904 (2001). • In short, we simulate the Ising model on a two-dimensional lattice (288×288 sites) and associate each site with a "hadron" of the heavy-ion collision. • The random model follows the same idea, except that it has no cluster structure, unlike the Ising model. (A model sketch follows.)
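For orientation only, here is a minimal sketch of the two kinds of event. It is not the simulation code of Phys. Rev. C64, 054904; the lattice size, temperature, and sweep count are illustrative choices of ours, not the paper's.

```python
# NOT the code of Phys. Rev. C64, 054904 -- only a minimal sketch of the
# two models. Lattice size, temperature (beta) and sweep count are
# illustrative; the paper uses a 288x288 lattice.
import numpy as np

def ising_event(L=72, beta=0.5, sweeps=50, rng=None):
    """Metropolis-sampled 2-D Ising configuration: correlated sites (clusters)."""
    rng = rng or np.random.default_rng()
    s = rng.choice([-1, 1], size=(L, L))
    for _ in range(sweeps * L * L):
        i, j = rng.integers(L, size=2)
        # energy change from flipping spin (i, j), periodic boundaries
        nb = s[(i+1) % L, j] + s[(i-1) % L, j] + s[i, (j+1) % L] + s[i, (j-1) % L]
        dE = 2 * s[i, j] * nb
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            s[i, j] *= -1
    return s

def random_event(L=72, rng=None):
    """Random model: independent sites, i.e. no cluster structure."""
    rng = rng or np.random.default_rng()
    return rng.choice([-1, 1], size=(L, L))
```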
Typical pictures of the Ising model and the random model [Figure: one Ising-model event (clustered) next to one random-model event (uncorrelated noise)]
To classify the random and Ising models • Choose a feature: cast each event to a point in a 72×72-dimensional space. (A coarse-graining sketch follows.)
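A hedged sketch of this feature choice as we read it: block-average each 288×288 event down to 72×72 and flatten it, so every event becomes one point that a linear SVM can classify. The 4×4 block size is our assumption, not stated in the talk.

```python
# Our reading of the feature choice: coarse-grain each 288x288 event to
# 72x72 (the 4x4 block size is our assumption) and flatten it into a
# 72*72-component feature vector.
import numpy as np
from sklearn.svm import SVC

def coarse_grain(event, block=4):
    L = event.shape[0]                       # e.g. 288
    return (event.reshape(L // block, block, L // block, block)
                 .mean(axis=(1, 3)))         # e.g. 72 x 72

def features(event):
    return coarse_grain(event).ravel()       # one point per event

# usage: X = np.array([features(e) for e in events]); y = model labels
# SVC(kernel="linear").fit(X, y)
```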
Making it more difficult • Rescale the values of the random model so that the average value of each component of the random model matches that of the Ising model.
Need to choose new features! • Choose the average value of each event. • Choose the second moment of the hadron density. (A sketch of both features follows.)
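A minimal sketch of the two replacement features (the function name is ours):

```python
# The two features named above, computed per event.
import numpy as np

def simple_features(event):
    mean = event.mean()                   # average value of the event
    second_moment = (event ** 2).mean()   # second moment of the hadron density
    return np.array([mean, second_moment])
```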
Cluster structure in the event distribution in high-dimensional space [Figures: events from the two models form two distinct clusters in the high-dimensional feature space]
Cluster structure in the event distribution in two-dimensional space [Figure: the same events projected onto two dimensions, still showing two separated clusters]
Conclusions: • We can classify different events by choosing suitable features. • We classified the random model and the Ising model successfully. • We can, of course, generalize the above method to multi-class cases. (A multi-class sketch follows.)
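As a closing sketch (our addition), scikit-learn's SVC already handles more than two classes via a one-vs-one scheme, so the extension needs no new machinery; the three-class toy data are invented:

```python
# Multi-class extension: SVC trains one-vs-one classifiers automatically
# when given more than two labels. Three-class toy data for illustration.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(m, 0.5, size=(20, 2)) for m in (-2.0, 0.0, 2.0)])
y = np.repeat([0, 1, 2], 20)              # three event classes
clf = SVC(kernel="rbf").fit(X, y)         # one-vs-one under the hood
print(clf.predict([[1.8, 2.1], [-0.1, 0.2]]))
```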