Automated Short-term Prediction of Solar Flares using Machine Learning • Rami Qahwaji (r.s.r.qahwaji@bradford.ac.uk) & Tufan Colak (t.colak@bradford.ac.uk) • EIMC, University of Bradford, BD7 1DP, U.K.
Organisation of this talk • Objectives & related work • Solar data (features and activities) • Data Association • Machine learning algorithms • Practical results • Conclusions and future work
Objective: • We aim to design an automated system that could provide short-term prediction of solar flares by establishing a correlation between sunspots and solar flares using machine learning.
Related Work • Despite recent advances in solar imaging, machine learning has not been widely applied to solar data, except for verification purposes. • Solar activity (i.e., the Wolf number) was first predicted by (Calvo et al. 1995). • (Borda et al. 2002) described a method for the automatic detection of solar flares using a back-propagation multi-layer perceptron (BP MLP). • MLP, SVM and RBF classifiers were used for flare detection in (Qu et al. 2003).
Data? • Our study uses the publicly available sunspot group and flare catalogues of the National Geophysical Data Centre (NGDC). • NGDC keeps records of data from several observatories around the world and holds one of the most comprehensive publicly available databases of solar features and activities.
The NGDC sunspots catalogue • The NGDC sunspot catalogue holds records of sunspot groups, supplying their date, time, location, physical properties, sunspot area and classification data. • Two classification systems exist for sunspots: McIntosh, which depends on the size, shape and spot density of sunspots, and Mt. Wilson, which is based on the distribution of magnetic polarities within spot groups.
The NGDC Flares catalogue • This catalogue provides the dates, start and end times of flare eruptions, their location, the NOAA number of the corresponding active region and the X-ray classification of the detected flares. • Not all flares have associated NOAA numbers. Flares without NOAA numbers are not included in our study.
Associating Flares and Sunspots • We investigated all the sunspot groups that were associated with flares from 01 Jan 1992 to 31 Dec 2005. • The degree of association was determined from the NOAA region number and the timing information. • A C++ platform that extracts flare and sunspot information online from the NGDC catalogues was created. • Our software analysed the data related to 29343 flares and 110241 sunspots and associated 1425 M and X flares with their corresponding sunspot groups.
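The association step described above can be sketched as follows. This is a minimal Python illustration rather than the C++ platform itself: the record fields (`noaa`, `time`, `start`), the 24-hour time window and the sample records are all assumptions made for the example.

```python
from datetime import datetime, timedelta

def associate(flares, sunspots, window=timedelta(hours=24)):
    """Pair each flare with the latest sunspot record of the same NOAA region
    observed no more than `window` before the flare started.
    The 24-hour default is an assumed value, not one stated in the talk."""
    by_region = {}
    for spot in sunspots:
        by_region.setdefault(spot["noaa"], []).append(spot)

    pairs = []
    for flare in flares:
        if flare["noaa"] is None:  # flares without a NOAA number are skipped
            continue
        candidates = [
            spot for spot in by_region.get(flare["noaa"], [])
            if timedelta(0) <= flare["start"] - spot["time"] <= window
        ]
        if candidates:
            pairs.append((flare, max(candidates, key=lambda s: s["time"])))
    return pairs

# Made-up records, for illustration only (not taken from the NGDC catalogues).
sunspots = [
    {"noaa": 1001, "time": datetime(2000, 1, 1, 8, 0), "mcintosh": "Fkc"},
    {"noaa": 1002, "time": datetime(2000, 1, 1, 8, 0), "mcintosh": "Cso"},
]
flares = [
    {"noaa": 1001, "start": datetime(2000, 1, 1, 14, 30), "xray": "M1.2"},
    {"noaa": None, "start": datetime(2000, 1, 1, 15, 0), "xray": "X1.0"},
    {"noaa": 1002, "start": datetime(2000, 1, 5, 9, 0), "xray": "M2.0"},
]
pairs = associate(flares, sunspots)
```

Only the first flare is paired here: the second has no NOAA number, and the third starts more than 24 hours after the only record for its region.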
Various neural network (NN) topologies, support vector machines (SVM) and Radial Basis Function Networks (RBFN) are optimised and compared. • In our previous work (Qahwaji & Colak, CITSA 2006 and Colak & Qahwaji, WSC11) the performance of several NN topologies (e.g., Elman BP, FFBP, cascade FFBP) was compared, and it was concluded that the cascade-correlation NN (CCNN) provides a better association between solar flares and sunspot classes. • CCNN and RBFN are used because of their efficient performance in classification and time-series prediction (Frank et al. 1997).
SVM vs NN? • Comparing the performance of SVMs and NNs is one of the recent trends in machine learning. • The work reported in (Acir & Guzelis 2004), (Pal & Mather 2004), (Huang et al. 2004), and (Distante et al. 2003) supports this. • Similar performance for SVMs was reported for flare detection in (Qu et al. 2003).
Cascade FFBP • In cascade FFBP, the first layer has weights connecting it to the input layer. Each subsequent layer has weights connecting it to the input layer and to all previous layers.
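The cascade connectivity can be illustrated with a small forward pass in plain Python. This is only a sketch of the wiring: the layer sizes, the tanh activation, the constant weights and the helper `constant_layer` are assumptions for illustration, not the trained network from the study.

```python
import math

def cascade_forward(inputs, layers):
    """Forward pass through a cascade feed-forward network.

    `layers` is a list of (weights, biases); each row of `weights` holds one
    weight per network input plus one per output of every earlier layer.
    """
    visible = list(inputs)  # everything the next layer is connected to
    outputs = []
    for weights, biases in layers:
        for row in weights:
            assert len(row) == len(visible), "row must cover inputs + earlier layers"
        outputs = [
            math.tanh(sum(w * v for w, v in zip(row, visible)) + b)
            for row, b in zip(weights, biases)
        ]
        visible = visible + outputs  # later layers also see this layer's outputs
    return outputs

def constant_layer(n_nodes, n_in, value=0.1):
    """Hypothetical helper: a layer whose weights all equal `value`."""
    return ([[value] * n_in for _ in range(n_nodes)], [0.0] * n_nodes)

# Assumed sizes for illustration: 3 inputs, hidden layers of 6 and 4 nodes, 2 outputs.
layers = [
    constant_layer(6, 3),          # sees the 3 inputs
    constant_layer(4, 3 + 6),      # sees the inputs and the first hidden layer
    constant_layer(2, 3 + 6 + 4),  # sees the inputs and both hidden layers
]
result = cascade_forward([0.5, 0.2, 0.9], layers)
```

The growing row lengths (3, 9, 13) are what distinguish the cascade layout from a plain feed-forward network, where each layer would see only the layer directly before it.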
A Support Vector Machine (SVM) maximises the margin, i.e. the distance between the separating hyperplane and the closest vectors of both classes.
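For reference, the margin maximisation can be written in the standard hard-margin form. The notation is the usual textbook one and is not taken from the slides: training vectors x_i with labels y_i in {-1, +1}, weight vector w and bias b.

```latex
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1, \qquad i = 1,\dots,n
```

Under these constraints the closest vectors of each class lie at distance 1/||w|| from the hyperplane, so minimising ||w|| maximises the total margin 2/||w||.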
Optimising the Learning Algorithms • A learning algorithm provides the best generalisation when it is optimised. • An NN is optimised when the optimum topology, learning algorithm and learning times are found. • After finding that CCNN provides the best performance, we compared 100 different CCNN topologies. • We found that a CCNN with 6 hidden nodes in the first layer and 4 hidden nodes in the second layer gives the best results for CFP and CFTP. • Similar approaches were followed for SVM and RBFN.
Both NGDC catalogues were used: our software analysed the data related to 29343 flares and 110241 sunspots and associated 1425 M and X flares with their corresponding sunspot groups. • The total number of samples used for our training set is 2882, of which 1425 represent sunspots that produced flares. • The remaining 1457 samples represent sunspots that existed on non-flaring days and are not related to any of the sunspot groups in the flaring samples.
The Training and Testing Sets • NN training and testing were carried out using the statistical Jack-knife technique (Fukunaga 1990). • For all the experiments, 80% of the samples are randomly selected and used for training, while the remaining 20% are used for testing. These experiments are repeated a number of times and the average is taken.
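The repeated 80/20 procedure described above can be sketched in Python as follows. The function names, the number of runs and the fixed seed are assumptions for illustration; `evaluate` stands in for whatever routine trains a model on the training part and scores it on the testing part.

```python
import random

def split_once(samples, rng, train_fraction=0.8):
    """Randomly split `samples` into a training part and a testing part."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def repeated_holdout(samples, evaluate, runs=10, seed=0):
    """Average `evaluate(train, test)` over several random 80/20 splits.
    `runs` and `seed` are assumed values; the talk does not state them."""
    rng = random.Random(seed)
    scores = [evaluate(*split_once(samples, rng)) for _ in range(runs)]
    return sum(scores) / len(scores)

# Size check using the 2882 samples mentioned in the talk (contents are dummies).
train, test = split_once(range(2882), random.Random(0))
average_test_size = repeated_holdout(list(range(2882)), lambda tr, te: len(te))
```

Drawing a fresh random split on every run is what makes the averaged score less dependent on any single lucky or unlucky partition.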
Initial Experiments • For each sample, the training vector consists of 5 elements (3 for inputs; 2 for outputs).
Initial Experiments • Several experiments based on the Jack-knife technique were carried out, and we found that the prediction rate for flares was 72.9% in the best case. • This indicated that a correlation exists between the input and output sets, but the value is not high enough to provide reliable prediction of solar activity. • To improve the learning performance, we tried to associate the classified sunspots with the sunspot cycle.
This seemed logical because the rise and fall of solar activity coincides with the sunspot cycle (Pap et al. 1990). • When the solar cycle is at a maximum, plenty of large active regions exist and many solar flares are detected. These decrease in number as the Sun approaches the minimum of its cycle (Pap et al. 1990).
Solar Cycle and Flares • Figure source: Science@NASA, "Solar Minimum Explodes", 9.15.2005.
Solar Cycle Modelling: Hathaway’s Model • a represents the amplitude and is related to the rise of the cycle from minimum; b is related to the time in months from minimum to maximum; c gives the asymmetry of the cycle; and t0 denotes the starting time.
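The four parameters match the cycle shape function published by Hathaway, Wilson & Reichmann (1994), f(t) = a (t - t0)^3 / (exp((t - t0)^2 / b^2) - c), which is assumed here to be the model the slide refers to. The sketch below evaluates that function in Python; the parameter values are illustrative only and are not the ones used in the study.

```python
import math

def hathaway(t, a, b, c, t0):
    """Cycle shape function of Hathaway et al. (1994); t and t0 in months."""
    dt = t - t0
    if dt <= 0:
        return 0.0  # the cycle has not started yet
    return a * dt**3 / (math.exp(dt**2 / b**2) - c)

# Illustrative parameter values (assumptions, not fitted to any real cycle).
params = dict(a=0.002, b=50.0, c=0.71, t0=0.0)
curve = [hathaway(month, **params) for month in range(0, 133)]
peak_month = max(range(len(curve)), key=curve.__getitem__)
```

The curve rises roughly as a cubic after t0 and then decays once the exponential term dominates, which gives the fast-rise, slow-decline shape of a typical sunspot cycle.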
For each sample, the training vector consists of 6 elements (4 for inputs; 2 for outputs).
Hence, for an Fkc sunspot at solar maximum that produced an M flare, the training vector looks like this:
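As a rough illustration of such a 6-element vector, the Python sketch below encodes the three McIntosh letters and a solar-cycle level as inputs, followed by two outputs. The numeric scales, the cycle level in [0, 1] and the output convention (flare or no flare, X class or not) are assumptions made for this example and are not the values used in the study.

```python
# McIntosh classification letters, in their conventional order.
ZURICH = "ABCDEFH"    # modified Zurich class
PENUMBRA = "xrsahk"   # penumbra of the largest spot
COMPACTNESS = "xoic"  # sunspot distribution within the group

def scale(symbol, alphabet):
    """Hypothetical encoding: position in the alphabet mapped onto [0, 1]."""
    return alphabet.index(symbol) / (len(alphabet) - 1)

def training_vector(mcintosh, cycle_level, flare_class):
    """Build [4 inputs] + [2 outputs] for one sample.

    cycle_level: assumed solar-cycle input in [0, 1] (1.0 = solar maximum).
    flare_class: None, "M" or "X".
    """
    z, p, c = mcintosh
    inputs = [scale(z, ZURICH), scale(p, PENUMBRA), scale(c, COMPACTNESS), cycle_level]
    outputs = [
        0.0 if flare_class is None else 1.0,  # did the group produce a flare?
        1.0 if flare_class == "X" else 0.0,   # was it an X-class flare?
    ]
    return inputs + outputs

vector = training_vector("Fkc", 1.0, "M")
```

Any monotonic numeric coding of the letters would serve the same illustrative purpose; the point is only that each sample becomes a fixed-length numeric vector that a learning algorithm can consume.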
Conclusions • A fully automated computer platform that uses machine learning to verify the correlation between sunspot classes and solar flares has been designed. • The association and learning software will be made public shortly at http://spaceweather.inf.bradford.ac.uk/ • Our findings show that there is a direct relation between the eruption of flares and certain McIntosh classes of sunspots, such as Ekc, Fki and Fkc. Our findings are in accordance with (McIntosh 1990), (Warwick 1966), and (Sakurai 1970).
A hybrid system, which combines both SVM and CCNN, will give better results for flare prediction.
Future Work • Apply image segmentation and classification algorithms to detect sunspots and classify them automatically, so that the platform is complete. • Track individual sunspot groups over their lifetime. The development of a sunspot group can add to the knowledge available to the machine learning systems. • Will better prediction be achieved if the magnetic configuration of sunspots (Mt. Wilson classification) is combined with the sunspot area to replace the McIntosh classification (Sammis, Tang & Zirin, 2000, ApJ)?
Compare our findings with those of other authors who tested the correlation between the various McIntosh classes and flare rates, and its application to solar flare prediction (e.g. McIntosh 1990; Bornmann & Shaw 1994, Sol. Phys. 150, p. 127; Gallagher et al. 2002, Sol. Phys. 209, p. 171; Wheatland 2004, ApJ 609, p. 1134).
Acknowledgment. This work is supported by an EPSRC Grant (GR/T17588/01), which is entitled “Image Processing and Machine Learning Techniques for Short-Term Prediction of Solar Activity”.