
How to classify different events in heavy-ion collisions

How to classify different events in heavy-ion collisions. Qinghui Zhang. Why do we need classification? Events in heavy-ion collisions may be different: some events may undergo a phase transition, while others may not.


Presentation Transcript


  1. How to classify different events in heavy-ion collisions Qinghui Zhang

  2. Why do we need classification? • Events in heavy-ion collisions may be different. • Some events may undergo a phase transition, but some events may not.

  3. What should we be careful about in event classification? • We need to know which features can be used to distinguish between events. • We need to know the feature values for the QGP phase and the non-QGP phase.

  4. What will we do in event classification? • (1) Verification: for a given event and a claim, the system answers Yes or No. • (2) Identification: for a given event, the system tells which class the event belongs to, using the database the system has.
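
  A hypothetical sketch of these two modes (the helper names verify/identify and the per-class classifiers are illustrative assumptions, not the talk's own code):

    # "classifiers" maps each event class to a binary classifier (e.g. an SVM)
    # trained as that-class-vs-rest, following the scikit-learn interface.
    def verify(event, claimed_class, classifiers):
        # Verification: answer Yes/No to "does this event belong to claimed_class?"
        return bool(classifiers[claimed_class].predict([event])[0] == 1)

    def identify(event, classifiers):
        # Identification: return the class whose classifier scores the event highest.
        scores = {c: clf.decision_function([event])[0] for c, clf in classifiers.items()}
        return max(scores, key=scores.get)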

  5. The complexity of heavy-ion collisions • (1) Too many particles in each event. • (2) It is difficult to choose a valuable measure. • (3) We do not know the detailed values of the measure for QGP or non-QGP. We cannot observe QGP directly!

  6. The typical picture of an event

  7. Support Vector Machines • Three main ideas: • Define what an optimal hyperplane is (in a way that can be identified in a computationally efficient way): maximize the margin • Extend the above definition to non-linearly separable problems: add a penalty term for misclassifications • Map the data to a high-dimensional space where it is easier to classify with linear decision surfaces: reformulate the problem so that the data are mapped implicitly to this space

  8. How to classify different events? • Suppose we can select some features of each event. • Suppose we know the feature values for the different classes. • How do we classify them? We need a method to classify different events.

  9. Support Vector Machines • Three main ideas: • Define what an optimal hyperplane is (in a way that can be identified in a computationally efficient way): maximize the margin • Extend the above definition to non-linearly separable problems: add a penalty term for misclassifications • Map the data to a high-dimensional space where it is easier to classify with linear decision surfaces: reformulate the problem so that the data are mapped implicitly to this space

  10. Which Separating Hyperplane to Use? (figure: two-class scatter plot in the Var1–Var2 plane)

  11. Maximizing the Margin • IDEA 1: Select the separating hyperplane that maximizes the margin! (figure: two-class scatter plot in the Var1–Var2 plane with the margin width marked on both sides of the hyperplane)

  12. Support Vectors (figure: two-class scatter plot in the Var1–Var2 plane, with the support vectors and the margin width marked)

  13. Setting Up the Optimization Problem • Take the two margin hyperplanes to be w·x + b = k and w·x + b = -k; the width of the margin is then 2k / ||w||. • So, the problem is: maximize 2k / ||w|| subject to w·x_i + b ≥ k for every x_i in class 1 and w·x_i + b ≤ -k for every x_i in class 2. (figure: the two classes in the Var1–Var2 plane with the separating and margin hyperplanes)

  14. Setting Up the Optimization Problem • We can rescale the data (choose a scale and unit) so that k = 1. The problem then becomes: maximize 2 / ||w|| subject to w·x_i + b ≥ 1 for class 1 and w·x_i + b ≤ -1 for class 2.

  15. Setting Up the Optimization Problem • If class 1 corresponds to y_i = 1 and class 2 corresponds to y_i = -1, we can rewrite the two constraints as the single condition y_i (w·x_i + b) ≥ 1. • So the problem becomes: maximize 2 / ||w||, or equivalently minimize (1/2) ||w||², subject to y_i (w·x_i + b) ≥ 1 for all i.

  16. Linear, Hard-Margin SVM Formulation • Find w, b that solve: minimize (1/2) ||w||² subject to y_i (w·x_i + b) ≥ 1 for all i. • There is a unique global minimum value. • There is also a unique w and b that attain that minimum. • The problem is not solvable if the data are not linearly separable. • This is a quadratic programming problem.
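
  Purely as an illustration (not part of the talk), the sketch below approximates this hard-margin quadratic program on toy, linearly separable 2-D data by using a linear SVM with a very large C, assuming scikit-learn and NumPy are available:

    import numpy as np
    from sklearn.svm import SVC

    # Toy, linearly separable two-class data (hypothetical, for illustration only).
    rng = np.random.default_rng(0)
    class1 = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(50, 2))
    class2 = rng.normal(loc=[-2.0, -2.0], scale=0.3, size=(50, 2))
    X = np.vstack([class1, class2])
    y = np.array([1] * 50 + [-1] * 50)

    # A very large C approximates the hard-margin problem:
    #   minimize (1/2) ||w||^2  subject to  y_i (w . x_i + b) >= 1.
    clf = SVC(kernel="linear", C=1e10).fit(X, y)

    w, b = clf.coef_[0], clf.intercept_[0]
    print("w =", w, " b =", b)
    print("margin width =", 2.0 / np.linalg.norm(w))
    print("support vectors per class:", clf.n_support_)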

  17. Support Vector Machines • Three main ideas: • Define what an optimal hyperplane is (in a way that can be identified in a computationally efficient way): maximize the margin • Extend the above definition to non-linearly separable problems: add a penalty term for misclassifications • Map the data to a high-dimensional space where it is easier to classify with linear decision surfaces: reformulate the problem so that the data are mapped implicitly to this space

  18. Support Vector Machines • Three main ideas: • Define what an optimal hyperplane is (in a way that can be identified in a computationally efficient way): maximize the margin • Extend the above definition to non-linearly separable problems: add a penalty term for misclassifications • Map the data to a high-dimensional space where it is easier to classify with linear decision surfaces: reformulate the problem so that the data are mapped implicitly to this space

  19. Non-Linearly Separable Data • Introduce slack variables ξ_i ≥ 0. • Allow some instances to fall within the margin, but penalize them. (figure: scatter plot in the Var1–Var2 plane showing instances inside the margin with nonzero ξ_i)

  20. Formulating the Optimization Problem • The constraints become: y_i (w·x_i + b) ≥ 1 - ξ_i, with ξ_i ≥ 0. • The objective function, minimize (1/2) ||w||² + C Σ_i ξ_i, penalizes misclassified instances and those within the margin. • C trades off margin width against misclassifications. (figure: the two classes in the Var1–Var2 plane)

  21. Linear, Soft-Margin SVMs • The algorithm tries to keep the ξ_i at zero while maximizing the margin. • Notice: the algorithm does not minimize the number of misclassifications, but the sum of distances from the margin hyperplanes. • As C → ∞, we approach the hard-margin solution.
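
  To illustrate the role of C (again a hedged sketch with hypothetical data, assuming scikit-learn), fitting the same linear soft-margin SVM with different C values shows the trade-off between margin width and the total slack Σ ξ_i:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(1)
    # Overlapping classes, so some slack is unavoidable.
    X = np.vstack([rng.normal([1.0, 1.0], 1.0, size=(100, 2)),
                   rng.normal([-1.0, -1.0], 1.0, size=(100, 2))])
    y = np.array([1] * 100 + [-1] * 100)

    for C in (0.01, 1.0, 100.0):
        clf = SVC(kernel="linear", C=C).fit(X, y)
        w = clf.coef_[0]
        # Slack xi_i = max(0, 1 - y_i * f(x_i)), measured on the training set.
        slack = np.maximum(0.0, 1.0 - y * clf.decision_function(X))
        print(f"C={C:7.2f}  margin width={2.0 / np.linalg.norm(w):.3f}  "
              f"total slack={slack.sum():.1f}")

  Larger C penalizes slack more heavily (narrower margin, fewer violations); smaller C tolerates more violations in exchange for a wider margin.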

  22. Robustness of Soft vs Hard Margin SVMs (figure: two scatter plots in the Var1–Var2 plane, one labelled Hard Margin SVM and one labelled Soft Margin SVM; one instance in the soft-margin plot carries a slack variable ξ_i)

  23. Soft vs Hard Margin SVMs • Soft-margin always has a solution (for any finite C &gt; 0). • Soft-margin is more robust to outliers. • Hard-margin does not require guessing the cost parameter (it has no C parameter at all; it corresponds to the limit C → ∞).

  24. Support Vector Machines • Three main ideas: • Define what an optimal hyperplane is (in a way that can be identified in a computationally efficient way): maximize the margin • Extend the above definition to non-linearly separable problems: add a penalty term for misclassifications • Map the data to a high-dimensional space where it is easier to classify with linear decision surfaces: reformulate the problem so that the data are mapped implicitly to this space

  25. Support Vector Machines • Three main ideas: • Define what an optimal hyperplane is (in a way that can be identified in a computationally efficient way): maximize the margin • Extend the above definition to non-linearly separable problems: add a penalty term for misclassifications • Map the data to a high-dimensional space where it is easier to classify with linear decision surfaces: reformulate the problem so that the data are mapped implicitly to this space

  26. Disadvantages of Linear Decision Surfaces (figure: two-class scatter plot in the Var1–Var2 plane)

  27. Advantages of Non-Linear Surfaces (figure: two-class scatter plot in the Var1–Var2 plane)

  28. Linear Classifiers in High-Dimensional Spaces • Find a function Φ(x) to map the data to a different space. (figure: the original Var1–Var2 space mapped to a space with axes Constructed Feature 1 and Constructed Feature 2)

  29. Mapping Data to a High-Dimensional Space • Find a function Φ(x) to map the data to a different space; the SVM formulation then becomes: minimize (1/2) ||w||² + C Σ_i ξ_i subject to y_i (w·Φ(x_i) + b) ≥ 1 - ξ_i, ξ_i ≥ 0. • The data appear as Φ(x); the weights w are now weights in the new space. • Explicit mapping is expensive if Φ(x) is very high dimensional. • Solving the problem without explicitly mapping the data is desirable.

  30. The Dual of the SVM Formulation • Original SVM formulation: n inequality constraints, n positivity constraints, n slack variables ξ_i. • The (Wolfe) dual of this problem: one equality constraint, n positivity constraints, n variables α_i (Lagrange multipliers), and a more complicated objective function. • NOTICE: the data only appear as inner products Φ(x_i) · Φ(x_j).

  31. The Kernel Trick • Φ(x_i) · Φ(x_j) means: map the data into the new space, then take the inner product of the new vectors. • We can find a function such that K(x_i, x_j) = Φ(x_i) · Φ(x_j), i.e., the image of the inner product of the data is the inner product of the images of the data. • Then we do not need to explicitly map the data into the high-dimensional space to solve the optimization problem (for training). • How do we classify new instances without explicitly mapping them? It turns out the decision function can also be written purely in terms of kernel evaluations: f(x) = sign( Σ_i α_i y_i K(x_i, x) + b ).
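
  A minimal sketch of this point (hypothetical data, assuming scikit-learn): with a degree-2 polynomial kernel, both training and prediction go through kernel evaluations K(x_i, x_j) only, never through an explicit Φ(x):

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(2)
    # Radially separated toy problem that no linear surface in the original space can handle.
    X = rng.normal(size=(200, 2))
    y = np.where(np.sum(X**2, axis=1) < 1.0, 1, -1)

    # Degree-2 polynomial kernel, K(x, z) = (x . z)^2; the data enter only through K.
    clf = SVC(kernel="poly", degree=2, gamma=1.0, coef0=0.0).fit(X, y)

    # New points are classified as sign( sum_i alpha_i y_i K(x_i, x) + b ),
    # i.e. again only through kernel evaluations against the support vectors.
    x_new = np.array([[0.1, 0.2], [2.0, 2.0]])
    print(clf.predict(x_new))          # expected: inside the unit circle -> +1, outside -> -1
    print(clf.support_vectors_.shape)  # the x_i that actually enter the sum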

  32. Examples of Kernels • Assume we measure two quantities, e.g. the expression levels of the genes TrkC and Sonic Hedgehog (SH), so x = (x_1, x_2), and we use the mapping Φ(x) = (x_1², √2 x_1 x_2, x_2²). • Consider the function K(x, z) = (x · z)². • We can verify that K(x, z) = Φ(x) · Φ(z).
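
  Assuming the mapping and kernel above, a short numeric check (with made-up measurement values) that K(x, z) = (x·z)² and Φ(x)·Φ(z) agree:

    import numpy as np

    def phi(x):
        # Explicit feature map for the degree-2 polynomial kernel in 2-D.
        x1, x2 = x
        return np.array([x1**2, np.sqrt(2.0) * x1 * x2, x2**2])

    def K(x, z):
        # The same quantity computed directly in the original 2-D space.
        return np.dot(x, z) ** 2

    x = np.array([0.8, -1.3])   # hypothetical (TrkC, SH) expression levels
    z = np.array([2.1, 0.4])

    print(np.dot(phi(x), phi(z)))   # inner product in the mapped space
    print(K(x, z))                  # same value, without ever forming phi(x)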

  33. Non-linear SVMs: Feature spaces • General idea: the original feature space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ: x→φ(x)

  34. The Mercer Condition • The SVM dual formulation requires calculating K(x_i, x_j) for each pair of training instances. The matrix G_ij = K(x_i, x_j) is called the Gram matrix. • There is a feature space Φ(x) whenever the kernel is such that G is always positive semi-definite (Mercer condition).
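
  An empirical, sample-based check of this condition, offered only as a sketch: build the Gram matrix for a standard Mercer kernel and confirm its eigenvalues are non-negative up to round-off.

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.normal(size=(50, 4))   # 50 training instances with 4 features (toy data)

    def rbf_kernel(a, b, gamma=0.5):
        # Gaussian (RBF) kernel, a standard kernel satisfying the Mercer condition.
        return np.exp(-gamma * np.sum((a - b) ** 2))

    G = np.array([[rbf_kernel(xi, xj) for xj in X] for xi in X])   # Gram matrix G_ij

    eigvals = np.linalg.eigvalsh(G)                # G is symmetric, so use eigvalsh
    print("smallest eigenvalue:", eigvals.min())   # should be >= 0 (up to numerical error)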

  35. Support Vector Machines • Three main ideas: • Define what an optimal hyperplane is (in a way that can be identified in a computationally efficient way): maximize the margin • Extend the above definition to non-linearly separable problems: add a penalty term for misclassifications • Map the data to a high-dimensional space where it is easier to classify with linear decision surfaces: reformulate the problem so that the data are mapped implicitly to this space

  36. Ising Model and Random Model • We will use the Ising model and a random model in our analysis. The details of the models can be found in Phys. Rev. C 64, 054904 (2001). • In short, we simulate the Ising model on a two-dimensional 288 × 288 lattice and then associate each site with a "hadron" of the heavy-ion collision. • For the random model the idea is the same as for the Ising model, except that there are no clusters as in the Ising model.
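
  The exact event construction is the one in Phys. Rev. C 64, 054904 (2001); purely as an illustrative sketch under standard assumptions (nearest-neighbour coupling, Metropolis updates, periodic boundaries), Ising-like and random-model configurations on an L × L lattice could be generated as follows:

    import numpy as np

    def ising_event(L=288, beta=0.44, sweeps=50, seed=0):
        # Metropolis simulation of a 2-D Ising model with periodic boundaries;
        # each lattice site is later read as a "hadron" of the event.
        rng = np.random.default_rng(seed)
        s = rng.choice([-1, 1], size=(L, L))
        for _ in range(sweeps):
            for _ in range(L * L):
                i, j = rng.integers(0, L, size=2)
                nb = (s[(i + 1) % L, j] + s[(i - 1) % L, j]
                      + s[i, (j + 1) % L] + s[i, (j - 1) % L])
                dE = 2.0 * s[i, j] * nb          # energy cost of flipping spin (i, j)
                if dE <= 0 or rng.random() < np.exp(-beta * dE):
                    s[i, j] *= -1
        return s

    def random_event(L=288, seed=0):
        # Same lattice, but sites are filled independently, so no clusters form.
        rng = np.random.default_rng(seed)
        return rng.choice([-1, 1], size=(L, L))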

  37. Typical pictures of the Ising model and the random model (figure: one Ising-model event and one random-model event, side by side)

  38. To classify the random model and the Ising model • Choose a feature: cast each event to a point in a 72 × 72 (= 5184)-dimensional space.
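
  One hedged way to realise this (the slide does not spell out the procedure) is to coarse-grain each 288 × 288 configuration into 72 × 72 blocks and flatten it into a single 5184-component feature vector for the SVM:

    import numpy as np
    from sklearn.svm import SVC

    def coarse_grain(event, block=4):
        # Average 4 x 4 blocks: 288 x 288 -> 72 x 72, then flatten the result
        # into one point in a 72 * 72 = 5184-dimensional feature space.
        L = event.shape[0]
        e = event.reshape(L // block, block, L // block, block).mean(axis=(1, 3))
        return e.ravel()

    # Hypothetical usage, with ising_event / random_event from the earlier sketch:
    # X = np.array([coarse_grain(ising_event(seed=k)) for k in range(50)]
    #              + [coarse_grain(random_event(seed=k)) for k in range(50)])
    # y = np.array([1] * 50 + [-1] * 50)
    # clf = SVC(kernel="linear").fit(X, y)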

  39. Making it more difficult • Rescale the values of the random model such that the average value of each component of the random model is the same as for the Ising model.

  40. We need to choose new features! • Choose the average value for each event. • Choose the second moment of the hadron density.
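
  A minimal sketch of these two low-dimensional features (the precise normalisation is an assumption):

    import numpy as np

    def density_features(event):
        # event: 2-D array of per-site "hadron" values for one event.
        rho = event.astype(float).ravel()
        mean = rho.mean()                   # feature 1: average value of the event
        second_moment = np.mean(rho ** 2)   # feature 2: second moment of the density
        return np.array([mean, second_moment])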

  41. If we decrease the number of lattice sites

  42. Decreasing the size of the lattice

  43. Decreasing the size of the lattice

  44. Cluster structure in the event distribution in high-dimensional space

  45. Cluster structure in the event distribution in high-dimensional space

  46. Cluster structure in the event distribution in two-dimensional space
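
  The transcript does not say how the event distribution was projected down to two dimensions; one common choice, offered here only as an assumption, is a principal-component (PCA) projection of the event feature vectors:

    from sklearn.decomposition import PCA

    def project_to_2d(X):
        # X: (n_events, n_features) array of event vectors, e.g. from coarse_grain above.
        # Project onto the two directions of largest variance to inspect cluster structure.
        return PCA(n_components=2).fit_transform(X)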

  47. Conclusions: • We can classify different events by choosing suitable features. • We classified the random model and the Ising model successfully. • We can, of course, generalize the above method to multi-class cases.
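
  On the last point, the usual route to multi-class SVM classification is a one-vs-rest or one-vs-one scheme; a minimal sketch with hypothetical three-class toy data, assuming scikit-learn (whose SVC applies one-vs-one internally):

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(4)
    # Three hypothetical event classes in a toy 2-D feature space.
    X = np.vstack([rng.normal(c, 0.5, size=(40, 2)) for c in ([0, 0], [3, 0], [0, 3])])
    y = np.repeat([0, 1, 2], 40)

    clf = SVC(kernel="rbf", gamma=1.0).fit(X, y)
    print(clf.predict([[0.1, 0.2], [2.9, 0.1], [0.2, 2.8]]))   # expected: [0 1 2]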
