Explore the aspects of information requirements analysis in big data for improving production system efficiency. Discover new methodologies and tools for data mining to uncover data-information opportunities.
Big data requirement engineering Jānis Zemnickis, janis.zemnickis@gmail.com Supervisor: Laila Niedrīte, Dr.sc.comp.
Use cases for big data analysis • Log Analytics • Recommendation Engines • Market Research • Precision Medicine • Customer Service • Fraud Detection
Problems • Currently, most manufacturing companies do not make good use of the data they generate and collect to improve production system efficiency and, in turn, to increase their competitiveness (Dean 2014). • Big data uncovers data-information opportunities • Implementing a business intelligence (BI) system is a costly, resource-intensive and complex undertaking (Yeoh, W., & Popovič, A., 2016)
Review of existing methods/tools • N. Kozmina, L. Niedrite, J. Zemnickis, Information Requirements for Big Data Projects: A Review of State-of-the-Art Approaches. In: Lupeikiene A., Vasilecas O., Dzemyda G. (eds) DB&IS 2018. Springer, Cham, CCIS, vol. 838, pp. 73-89, 2018. • N. Kozmina, L. Niedrite, J. Zemnickis, Perspectives of Information Requirements Analysis in Big Data Projects. In: Volume 315: Databases and Information Systems X, 10.3233/978-1-61499-941-6-109, 2018
Research conclusion • The review was conducted according to the guidelines given by Kitchenham and Charters (Guidelines for Performing Systematic Literature Reviews in Software Engineering, 2007) • The goal was to explore the aspects of information requirements analysis in the context of Big data • Found 242 papers; a full analysis was done for 26 of them • Big data RE is usually done by setting goals, creating scenarios or using some other solution-oriented approach • On average, there is a medium ability to generate the information requirements in a Big data project by processing the existing data in a (semi-)automatic way
Results: Is it feasible to generate the information requirements in a Big data project by processing the existing data in a (semi-)automatic way? [Chart; surviving category labels: Not specified, High]
New methodology • [Diagram] Sources S1, S2, S3: Database, Database • Calls • Customer on-site consultations • Free-text feedback • E-mails • Comments • Pictures: social media pictures, ads pictures • Audio • Text • Elements shown: Big data analysis - NLP; Named entities and relations; Entity consolidation; Attribute grouping algorithm; Attr list; Big data requirement recognition algorithm; New requirements
Big data analysis - NLP • Main technology for unstructured data analysis - NLP • Analyze which tool is best: • CoreNLP from the Stanford group • NLTK, the most widely-mentioned NLP library for Python • TextBlob, a user-friendly and intuitive NLTK interface • Gensim, a library for document similarity analysis • spaCy, an industrial-strength NLP library built for performance *https://towardsdatascience.com/5-heroic-tools-for-natural-language-processing-7f3c1f8fc9f0
Named entity source type examples • External databases • Public web pages (Twitter, Wikipedia, Ads) • Public databases (Open data – governance, statistics) • Internal sources • Operational databases • E-mails • Calls • Free-text customer feedback
Named entity file type examples • Structured • XML • JSON • DB files • CSV • Unstructured • Text • Sound to text • Picture • Video
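The structured file types listed above can be brought into one common record shape before any entity analysis. The following is a minimal sketch using only the Python standard library; the function names, the field names `plate` and `color`, and the sample values are hypothetical and chosen for illustration, not taken from the presentation.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def records_from_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return [dict(item) for item in json.loads(text)]


def records_from_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def records_from_xml(text, item_tag):
    """Parse XML where each <item_tag> element holds one record as child tags."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in item}
            for item in root.iter(item_tag)]


# Hypothetical sample data: the same car described in three formats.
json_text = '[{"plate": "LE-7398", "color": "red"}]'
csv_text = "plate,color\nLE-7398,red\n"
xml_text = "<cars><car><plate>LE-7398</plate><color>red</color></car></cars>"

json_records = records_from_json(json_text)
csv_records = records_from_csv(csv_text)
xml_records = records_from_xml(xml_text, "car")
```

All three loaders return the same list of plain dictionaries, so later steps do not need to know which file type a record came from. Unstructured inputs (text, sound, pictures, video) would need an NLP or recognition step first and are outside this sketch.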
Named entity relation type, examples • Parent • Entity – entity attribute (subtag) • Same entity (consolidation) • Two sources describe one entity • Related (business-specific relation) • Collateral contract – asset • Leasing agreement – car
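One possible way to represent the three relation types, and to consolidate records when two sources describe one entity, is sketched below. The class and function names, the normalization rule (trim and upper-case the key), the "first source wins" conflict rule, and the sample attribute values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class RelationType(Enum):
    PARENT = "parent"        # entity -> entity attribute (subtag)
    SAME_ENTITY = "same"     # two sources describe one entity
    RELATED = "related"      # business-specific relation


@dataclass(frozen=True)
class Relation:
    source: str
    target: str
    kind: RelationType


def consolidate(records, key):
    """Merge records that share the same normalized key into one entity."""
    merged = {}
    for record in records:
        entity_id = record[key].strip().upper()
        entity = merged.setdefault(entity_id, {key: entity_id})
        for name, value in record.items():
            # Assumed conflict rule: the first source to supply a value wins.
            entity.setdefault(name, value)
    return merged


# Business-specific relations named on the slide.
relations = [
    Relation("collateral contract", "asset", RelationType.RELATED),
    Relation("leasing agreement", "car", RelationType.RELATED),
]

# Hypothetical records from two sources describing the same car.
records = [
    {"plate": "LE-7398", "fuel": "diesel"},
    {"plate": "le-7398 ", "color": "red"},
]
cars = consolidate(records, "plate")
```

Here `cars` holds a single entity keyed by `LE-7398` that carries both the `fuel` and the `color` attribute. A real consolidation step would likely need fuzzier matching than exact key comparison.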
Named entity type, examples • Number • Date • Organization • Person • Identifier
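Some of the entity types above (Number, Date, Identifier) have a regular surface form and can be roughly tagged with patterns, as in the sketch below. The patterns, including the licence-plate-style identifier format and the ISO date format, are assumptions for illustration. Organization and Person are not covered here, since recognizing them generally needs one of the NLP libraries rather than regular expressions.

```python
import re

# Order matters: more specific patterns claim their text first.
PATTERNS = [
    ("Date", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),        # assumed ISO dates
    ("Identifier", re.compile(r"\b[A-Z]{2}-\d{1,4}\b")),   # assumed plate format
    ("Number", re.compile(r"\b\d+(?:\.\d+)?\b")),
]


def tag_entities(text):
    """Return (value, type) pairs in order of appearance, without overlaps."""
    found = []
    taken = []  # character spans already claimed by an earlier pattern
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            start, end = match.span()
            if any(start < e and s < end for s, e in taken):
                continue  # part of a more specific entity, skip it
            taken.append((start, end))
            found.append((start, match.group(), label))
    return [(value, label) for _, value, label in sorted(found)]


# Hypothetical sentence; only the plate number comes from the slides.
tags = tag_entities("Car LE-7398 was inspected on 2019-05-14 and has 2 owners.")
```

The overlap check keeps the digits inside `LE-7398` and `2019-05-14` from being reported again as separate numbers.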
Big data requirement recognition algorithm • Primary entities should be taken from the organization's core databases • Find additional attributes of existing entities • Find additional entity relationships • A newly found entity should be validated
Big data requirement recognition algorithm • As a result: • Additional attributes for existing entities • Additional relationships between entities • New entity discovery • New requirement example: • R1: Entity car, LE-7398, • Attr 1: Color, attr 2: manufacture year 2000, attr 3: fuel, source: ss.lv • Comment: «Car looks bad», source: facebook.com • Voice record: related entity «Karlis Berziņs» interested in insurance for entity car, LE-7398, source: internal organization database
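The recognition steps and their three kinds of result can be sketched as a comparison between entities already held in the core database and facts extracted from other sources. This is an illustrative reading of the slides, not the author's implementation: the function name, the dictionary layout, and the unknown entity `XX-0000` are hypothetical.

```python
def recognize_requirements(core_entities, extracted):
    """Compare extracted facts with core entities and propose requirements.

    core_entities: {entity_id: {attribute: value}} from the core database.
    extracted: dicts with "entity" and "source", plus an optional
               "attribute" and an optional "related_to".
    """
    new_attributes, new_relationships, to_validate = [], [], []
    for fact in extracted:
        entity_id = fact["entity"]
        if entity_id not in core_entities:
            # Unknown entity: queue it for validation, do not accept it yet.
            to_validate.append(fact)
            continue
        attribute = fact.get("attribute")
        if attribute is not None and attribute not in core_entities[entity_id]:
            new_attributes.append((entity_id, attribute, fact["source"]))
        related = fact.get("related_to")
        if related is not None:
            new_relationships.append((entity_id, related, fact["source"]))
    return {
        "new_attributes": new_attributes,
        "new_relationships": new_relationships,
        "to_validate": to_validate,
    }


# Core database knows the car; the facts mirror the R1 example.
core = {"LE-7398": {"type": "car"}}
facts = [
    {"entity": "LE-7398", "attribute": "color", "source": "ss.lv"},
    {"entity": "LE-7398", "attribute": "manufacture year", "source": "ss.lv"},
    {"entity": "LE-7398", "related_to": "Karlis Berziņs",
     "source": "internal organization database"},
    {"entity": "XX-0000", "attribute": "color", "source": "facebook.com"},
]
result = recognize_requirements(core, facts)
```

Each output list corresponds to one result type from the slide: additional attributes, additional relationships, and newly discovered entities that still await validation.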
Future work • Comparison of existing NLP algorithms for the current use case • Practical implementation • Big data ecosystem • Big data analysis – NLP • Entity consolidation • Attribute grouping algorithm • Big data requirement recognition algorithm