Proceedings of the International MultiConference of Engineers and Computer Scientists 2010 Vol I, IMECS 2010, March 17-19, 2010, Hong Kong

Data Stream Clustering: Challenges and Issues

Madjid Khalilian, Norwati Mustapha
Abstract: Very large databases are required to store massive amounts of data that are continuously inserted and queried. Analyzing huge data sets and extracting valuable patterns from them is of interest to researchers in many applications. We can identify two main groups of techniques for mining huge databases. One group treats the data as a stream and applies mining techniques to it, whereas the second group attempts to solve the problem directly with efficient algorithms. Recently, many researchers have focused on data streams as an efficient strategy for mining huge databases instead of mining the entire database. The main problem in data stream mining is that evolving data is more difficult to detect with these techniques; therefore, unsupervised methods should be applied. Clustering techniques, in particular, can lead us to discover hidden information. In this survey, we try to clarify: first, the different problem definitions related to data stream clustering in general; second, the specific difficulties encountered in this field of research; third, the varying assumptions, heuristics, and intuitions forming the basis of different approaches; and how several prominent solutions tackle different problems.

Index Terms: Data Stream, Clustering, K-Means, Concept drift

I. INTRODUCTION

Nowadays we have many applications with massive amounts of data, which cause limitations in data storage capacity and processing time. Traditional data mining is not suitable for this kind of application, so its techniques should be tuned and changed, or new algorithms should be designed. Besides speed and storage capacity, real-life concepts tend to change over time, as in the following applications:
• Telecommunication and network area: calling records, network monitoring and traffic engineering, sensor monitoring and surveillance, security monitoring, Web logs and Web page click streams.
• Business: credit card transaction flows, stock exchange, power supply and manufacturing.
• Discovering the evolution of the spread of illnesses. As new cases are reported, finding out how clusters evolve can prove crucial in identifying the sources responsible for the spread of illness.
• Discovering the evolution of the workload in an e-commerce server, which can help in dynamically fine-tuning the server to obtain better performance.
• Analyzing meteorological data, such as temperatures registered throughout a region, by observing how the clusters of spatial-meteorological points evolve in time.

The growth in the volume of existing data and the insufficiency of data storage capacity lead us to dynamic processing and knowledge extraction. In this setting, data are considered as a stream that comes in from one side and exits from the other, so we are not able to visit the data a second time. This main property of data streams causes some difficulties. Two main problems in this area are related to this property: 1) only one scan of the data is possible for processing, and 2) the data form an evolving stream whose concepts change over time, either gradually or abruptly. Many techniques are used in the data mining area, but they should be tuned and changed to work in data stream mining. We can categorize data stream mining into three main techniques: classification, clustering and association rule extraction. Many studies have been carried out to support data stream mining, especially for concept drift [1, 2]. Many researchers are interested in applying techniques for increasing the compactness of representation, fast and incremental processing of new data points, and clear and fast identification of outliers [3]. Scalability and robustness should also be studied for data stream mining. In general, two main problems can be enumerated in data stream clustering: concept change and visiting the data only once.

First of all, how do we detect a change in the concepts? A method has to:
• Detect changes as soon as they occur.
• Detect both types of change, abrupt and gradual, equally well.
• Distinguish between real drift and noise.
What should be done if changes are detected?
• "Forget" out-of-date examples and clusters (e.g., tune the window speed).
• "Remember" some of the old clusters and examples.

The second problem refers to efficiency. A data stream is similar to a river: data flow in and flow out. We are unable to visit the data twice, so we need to use efficient algorithms.

II. GAPS

The architectural aspects of processing data streams have received considerable attention, but most effort has been concentrated on the mining and clustering aspects of the problem. The following gaps remain:
• Algorithms are limited in their ability to handle difficult clustering tasks without supervision. For example, the number of clusters in a data stream is not known in advance, yet most methods require this parameter to be specified.
• The algorithms require expert assistance in the form of the expected number of partitions or the expected density of clusters.
• They are required to re-learn any recurrently occurring data patterns.
• Compactness and separateness of data are the most important problems in the quality of clustering.
• Accuracy in terms of detecting concept drift.
• Efficiency in terms of speed is a vital problem in clustering for data mining.
• Previous approaches lack precision in detecting outliers.

M. Khalilian is with the Islamic Azad University, Karaj Branch, Iran. He is currently a PhD candidate in the Faculty of Computer Science and Information Technology, University Putra Malaysia (UPM) (e-mail: khalilian@ieee.org).
Dr. Norwati Mustapha is with the Computer Science Department, Faculty of Computer Science and Information Technology, University Putra Malaysia (UPM) (e-mail: norwati@fsktm.upm.edu.my).

ISBN: 978-988-17012-8-2 IMECS 2010 ISSN: 2078-0958 (Print); ISSN: 2078-0966 (Online)
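The single-scan constraint and the "forget out-of-date examples" strategy discussed in the introduction can be illustrated together with sequential (online) k-means, a standard one-pass variant that the paper does not itself prescribe. The sketch assumes numeric points, a fixed k chosen in advance (exactly the limitation listed under Gaps), and a hypothetical `decay` factor: with `decay=1.0` each centroid is the exact running mean of its points, while `decay < 1.0` shrinks every cluster's weight at each arrival so that recent points pull centroids harder. The class and parameter names are illustrative only.

```python
import math
from typing import List, Sequence


class SequentialKMeans:
    """One-pass k-means: every point is processed once and then discarded."""

    def __init__(self, k: int, decay: float = 1.0) -> None:
        if k < 1 or not 0.0 < decay <= 1.0:
            raise ValueError("need k >= 1 and 0 < decay <= 1")
        self.k = k
        self.decay = decay  # hypothetical forgetting factor; 1.0 = no forgetting
        self.centroids: List[List[float]] = []
        self.weights: List[float] = []  # (decayed) number of points per cluster

    def update(self, x: Sequence[float]) -> int:
        """Absorb one point and return the index of the cluster it joined."""
        # Age every cluster so that old evidence counts for less.
        self.weights = [w * self.decay for w in self.weights]
        # Seed the first k clusters with the first k points.
        if len(self.centroids) < self.k:
            self.centroids.append(list(x))
            self.weights.append(1.0)
            return len(self.centroids) - 1
        # Assign the point to the nearest centroid (Euclidean distance).
        j = min(range(self.k), key=lambda i: math.dist(self.centroids[i], x))
        self.weights[j] += 1.0
        # Move the centroid toward x; with decay=1.0 this is the running mean.
        rate = 1.0 / self.weights[j]
        self.centroids[j] = [c + rate * (xi - c) for c, xi in zip(self.centroids[j], x)]
        return j


stream = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (1.0, 0.0)]
model = SequentialKMeans(k=2)
labels = [model.update(p) for p in stream]
print(labels, model.centroids)
```

With `decay=1.0` the five points above form two clusters whose centroids are the running means, (1/3, 1/3) and (10, 10.5); no point is ever revisited, which is what the one-scan requirement demands.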
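For the first question raised in the introduction, how to detect a change as soon as it occurs while not reacting to noise, one classical option (not described in this paper) is the Page-Hinkley test. The sketch monitors any numeric signal; a natural but hypothetical choice is the distance from each arriving point to its nearest centroid, which tends to rise when the concept drifts. The parameter names `delta` (the fluctuation tolerated as noise) and `threshold` (the alarm level) and their default values are assumptions to be tuned for each stream.

```python
class PageHinkley:
    """Page-Hinkley test for an increase in the mean of a numeric stream."""

    def __init__(self, delta: float = 0.005, threshold: float = 50.0) -> None:
        self.delta = delta          # tolerated fluctuation, treated as noise
        self.threshold = threshold  # larger = fewer false alarms, slower detection
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.mean = 0.0     # running mean of the signal
        self.cum = 0.0      # m_t: cumulative deviation from the mean, minus delta
        self.cum_min = 0.0  # M_t: smallest m_t seen so far

    def update(self, x: float) -> bool:
        """Feed one value; return True if a change is signalled (then restart)."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        if self.cum - self.cum_min > self.threshold:
            self.reset()
            return True
        return False


signal = [0.0] * 100 + [5.0] * 50  # abrupt jump in the monitored value at index 100
detector = PageHinkley()
alarms = [t for t, v in enumerate(signal) if detector.update(v)]
print(alarms)
```

On this synthetic signal the alarm fires once, at index 110, about ten values after the jump. This variant only detects increases; a decrease needs a mirrored test, and gradual drift generally calls for a different `delta`/`threshold` trade-off than abrupt drift.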
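Two of the gaps listed above, clustering quality in terms of compactness and separateness and the precision of outlier detection, can be made concrete with simple distance-based measures. The sketch below is one possible choice, not a measure defined in the paper: compactness as the mean distance of points to their nearest centroid (lower is better), separateness as the smallest distance between two centroids (higher is better), and a point flagged as an outlier when it lies farther than a hypothetical `radius` from every centroid. At least two centroids are assumed.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Sequence[float]


def nearest_distance(x: Point, centroids: Sequence[Point]) -> float:
    """Euclidean distance from x to its closest centroid."""
    return min(math.dist(x, c) for c in centroids)


def compactness_separateness(points: Sequence[Point],
                             centroids: Sequence[Point]) -> Tuple[float, float]:
    """Mean point-to-nearest-centroid distance and minimum centroid gap."""
    compact = sum(nearest_distance(p, centroids) for p in points) / len(points)
    separate = min(math.dist(a, b) for a, b in combinations(centroids, 2))
    return compact, separate


def is_outlier(x: Point, centroids: Sequence[Point], radius: float) -> bool:
    """Flag x if no centroid lies within the (hypothetical) radius."""
    return nearest_distance(x, centroids) > radius


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]
print(compactness_separateness(points, centroids))
print(is_outlier((5.0, 5.0), centroids, radius=3.0))
```

In a streaming setting these measures would more plausibly be computed over a recent window or over cluster summaries rather than over all points, since the stream cannot be revisited.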