Utilising software to enhance your research

Eamonn Hynes, 5th November, 2012

Basic statistics and some parallel computing

Basic statistics • Probability • Mean • Standard deviation • Simple examples: • Probability of just one six from three throws of a die? • Probability of winning the Lotto • Tougher problems: • Transcribing speech into words • Poker robot that plays optimally
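
The two simple examples can be worked through with the binomial formula and a combination count. This is a minimal sketch: the die is assumed fair, and because the slide does not state the Lotto rules, a draw of 6 numbers from 45 is assumed purely for illustration.

```python
from fractions import Fraction
from math import comb

def exactly_k_successes(n, k, p):
    """Binomial probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly one six in three throws of a fair die: 3 * (1/6) * (5/6)^2
one_six = exactly_k_successes(3, 1, Fraction(1, 6))
print(one_six, float(one_six))  # 25/72, roughly 0.347

# ASSUMPTION: 6 numbers drawn from 45 (the slide does not give the format).
lotto_combinations = comb(45, 6)
jackpot_probability = Fraction(1, lotto_combinations)
print(lotto_combinations)  # 8145060 equally likely tickets
```

Using `Fraction` keeps the die result exact rather than a rounded decimal.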

Hands on Mean of column 1? Mean of row 4? Standard deviation of column 3?
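
The table used in the hands-on exercise is not part of this transcript, so the sketch below uses a small made-up table to show how the three questions could be answered with Python's standard library. The values and the `column` helper are hypothetical.

```python
from statistics import mean, pstdev

# HYPOTHETICAL data: the slide's own table is not in the transcript.
table = [
    [2, 4, 4],
    [4, 5, 5],
    [7, 9, 1],
    [6, 8, 6],
]

def column(rows, index):
    """Return one column, using a 1-based index to match the slide's wording."""
    return [row[index - 1] for row in rows]

col1_mean = mean(column(table, 1))   # mean of column 1
row4_mean = mean(table[4 - 1])       # mean of row 4
col3_std = pstdev(column(table, 3))  # population standard deviation of column 3

print(col1_mean, row4_mean, col3_std)
```

`pstdev` divides by the number of values; if the column is treated as a sample from a larger population, `statistics.stdev` (which divides by one less) would be the usual choice.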

Standard deviation 13.6%

A billion numbers? • Single-core • Multi-core [Figure: a single core and eight cores, each connected to memory]
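
One way to picture the multi-core case is to split the numbers into slices, let each core sum its own slice, and combine the partial sums at the end. The sketch below assumes the "numbers" are simply 0 to n-1 so that nothing has to be loaded from disk; the function names and the `executor_cls` parameter are illustrative choices, not something from the talk.

```python
from concurrent.futures import ProcessPoolExecutor

def partial_sum(bounds):
    """Sum the integers in [start, stop); stands in for one core's share of the data."""
    start, stop = bounds
    return sum(range(start, stop))

def split(n, parts):
    """Cut the range [0, n) into `parts` contiguous, nearly equal slices."""
    step, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        stop = start + step + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

def parallel_mean(n, workers=8, executor_cls=ProcessPoolExecutor):
    """Mean of 0..n-1, with each worker summing one slice independently."""
    with executor_cls(max_workers=workers) as pool:
        total = sum(pool.map(partial_sum, split(n, workers)))
    return total / n
```

When run as a script, the call (for example `parallel_mean(10**9)`) should sit under an `if __name__ == "__main__":` guard so that worker processes can start cleanly. Setting `workers=1` gives the single-core baseline for comparison.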

More interesting example • Again, a large sequence of numbers • Speech signal • ~56 Different sounds • Task is to calculate the most likely sequence of words • Over 50 years of research

Moore’s Law

Demise of Moore’s Law • Reality

Moore’s Law • The solution: • Parallel architectures • Hybrid architectures • New software – harder to write • New programming paradigms • Dedicated hardware • Beyond silicon

Amdahl’s Law • Limitations on parallel code • Thankfully a large number of problems are parallel in nature (rendering 3D graphics, weather prediction, image processing, DNA matching) • But many problems are sequential in nature! • e.g. card game, legal process, ordering a laptop, etc. • Nothing we can do except increase clock rate!
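
Amdahl's Law is usually written as speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of cores. A minimal sketch of that formula follows; the function and parameter names are chosen here for illustration.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of a program can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# A program that is 90% parallel gains well under 8x on eight cores,
# and a fully sequential one gains nothing.
print(amdahl_speedup(0.9, 8))
print(amdahl_speedup(0.0, 8))
```

As the core count grows, the speedup approaches 1 / (1 - p), which is why the sequential part dominates in the end.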

Clustering

Clustering • Categorise data into groups • Important in many fields – speech, medical statistics, data mining, etc. • Very loose algorithm (k-means clustering): • Let each point be a cluster centroid • Pick a random point • Get point closest to this chosen point • Calculate centroid • Repeat until just k centroids • Big limitation: k must be specified in advance… • Example
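
The slide describes the procedure only loosely, as merging points until k centroids remain. The sketch below instead uses the standard iteration usually meant by "k-means" (assign each point to its nearest centroid, recompute the centroids, repeat), so it should be read as an illustration of the general idea rather than the exact steps on the slide. All names are illustrative.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, initial=None, iterations=100, seed=0):
    """Return (centroids, clusters); k has to be chosen in advance."""
    rng = random.Random(seed)
    # The result depends on the starting centroids; by default pick k points at random.
    centroids = list(initial) if initial is not None else rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [centroid(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # assignments have stopped changing
        centroids = updated
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, clusters = k_means(points, 2, initial=[points[0], points[1]])
print(sorted(centroids))
```

The same code works for points of any dimension, since the distance and centroid helpers loop over coordinates.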

Clustering • Not just for points on a 2D surface • Pixels of an image • Example

Support Vector Machines • Support vector machines (SVMs) • Popular in the 1990s/2000s (Vapnik et al. 1992) • Non-linear classification • Beautiful maths • Find a nonlinear boundary between k sets of points • Example

Text analysis

Text analysis • Searching documents task • Naïve search: • SQL query: “SELECT * FROM articles WHERE body LIKE '%$keyword%';” • Works fine for small document collections • Large databases: Better to index all documents • tf-idf
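
The naive search can be tried out with an in-memory SQLite database. This is a sketch under assumptions: the `articles` table with `id` and `body` columns is a made-up schema, and the keyword is passed as a bound parameter rather than pasted into the SQL string as on the slide.

```python
import sqlite3

def naive_search(conn, keyword):
    """Return (id, body) rows whose body contains `keyword` as a substring."""
    pattern = f"%{keyword}%"
    query = "SELECT id, body FROM articles WHERE body LIKE ? ORDER BY id"
    return conn.execute(query, (pattern,)).fetchall()

# HYPOTHETICAL schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO articles (body) VALUES (?)",
    [("A rose is a rose",), ("Tulips in spring",), ("The Rose of Tralee",)],
)

hits = naive_search(conn, "rose")
print(hits)
```

A `LIKE '%...%'` pattern generally forces the database to scan every row, which is why this approach stops scaling for large collections. Any `%` or `_` inside the keyword would also act as a wildcard unless escaped.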

Text analysis • Process each document • Calculate the frequency of each word • Store the index, not the entire document • Much faster document retrieval • Intuitive to pick the document with the highest term count • But each term count must be weighted by the inverse document frequency
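
The indexing step can be sketched as a dictionary from each word to the documents that contain it, together with the count in each. The tokenizer and the three sample documents are assumptions made for the example.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lower-case the text and keep runs of letters as words."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(documents):
    """Map each word to {doc_id: count}; the document text itself is not stored."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][doc_id] = count
    return index

# HYPOTHETICAL documents, for illustration only.
documents = {
    "d1": "A rose is a rose is a rose",
    "d2": "The rose garden",
    "d3": "Tulips in spring",
}
index = build_index(documents)
print(index["rose"])  # {'d1': 3, 'd2': 1}
```

A lookup is then a single dictionary access instead of a scan over every document. The keys of `index["rose"]` give the simple Boolean answer (which documents mention the word), and the counts give the term frequencies.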

Text analysis • Example: Simple Boolean logic • Searching for “rose” • If word appears, then document is relevant

Text analysis • Taking term frequencies into account

Text analysis • TFIDF = TF × IDF • TF = C / T, where C = number of times a given word appears in a document and T = total number of words in that document • IDF = D / DF, where D = total number of documents in the corpus and DF = number of documents containing the given word
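
The definitions on this slide translate almost directly into code. The sketch below follows the slide's IDF = D / DF; many implementations take the logarithm of that ratio instead, which is a variation and not what the slide states. The corpus is made up for the example.

```python
def term_frequency(word, doc_tokens):
    """TF = C / T: occurrences of the word divided by the document length."""
    return doc_tokens.count(word) / len(doc_tokens)

def inverse_document_frequency(word, corpus):
    """IDF = D / DF, as defined on the slide (no logarithm applied)."""
    containing = sum(1 for doc_tokens in corpus if word in doc_tokens)
    if containing == 0:
        return 0.0  # word appears nowhere, so it cannot rank any document
    return len(corpus) / containing

def tf_idf(word, doc_tokens, corpus):
    return term_frequency(word, doc_tokens) * inverse_document_frequency(word, corpus)

# HYPOTHETICAL corpus: each document is already split into words.
corpus = [
    ["a", "rose", "is", "a", "rose"],
    ["the", "garden"],
    ["the", "rose", "garden"],
    ["tulips"],
]
score = tf_idf("rose", corpus[0], corpus)
print(score)  # TF = 2/5, IDF = 4/2, so about 0.8
```

A word that appears in every document gets an IDF of 1 under this definition, so it adds nothing beyond its raw frequency, while rare words are boosted.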

Text analysis • Natural language follows a Zipfian distribution
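
Zipf's law says that a word's frequency is roughly proportional to 1 / rank, so rank multiplied by count should stay roughly constant across the most common words. The sketch below only builds the rank and frequency table; the pattern itself shows up on a large corpus, not on the toy sentence used here.

```python
import re
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, count, rank * count) rows, most frequent word first."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return [
        (rank, word, count, rank * count)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]

# Toy input, far too small to show a Zipfian shape; use a long text in practice.
rows = rank_frequency("the cat and the dog and the bird")
for row in rows:
    print(row)
```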

Finally

Deep belief networks • Given a document, how to find similar documents? • Deep belief networks (DBNs) • State-of-the-art in machine learning • More advanced than Latent Semantic Analysis (LSA), Principal Component Analysis (PCA) and clustering

Deep belief networks • 2000 most common word stems fed into base layer • Gradual reduction in number of neurons • Left with a 30-bit binary representation of a document that started as a 2000-dimensional feature vector • Super fast document retrieval (“semantic hashing”) Images from G. Hinton, Science (2006)
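
Once every document has a short binary code, similar documents are those whose codes differ in only a few bits. The sketch below shows that retrieval step only: the 30-bit codes are invented for the example and do not come from a trained network, and a simple scan stands in for the address-style lookup that makes semantic hashing fast.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two integer codes differ."""
    return bin(a ^ b).count("1")

def similar_documents(query_code, codes, max_distance=2):
    """Ids of documents whose code is within max_distance bits of the query."""
    return sorted(
        doc_id
        for doc_id, code in codes.items()
        if hamming_distance(query_code, code) <= max_distance
    )

# HYPOTHETICAL 30-bit codes; a real system would get these from the network.
base = 0b101100111000101011001110001010
codes = {
    "doc_a": base,                # identical code
    "doc_b": base ^ 0b101,        # differs in 2 bits
    "doc_c": base ^ 0b111111000,  # differs in 6 bits
}
matches = similar_documents(base, codes, max_distance=2)
print(matches)  # ['doc_a', 'doc_b']
```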

Images from G. Hinton, Science (2006)
