Explore the data and insights behind popular TED Talks, leveraging text analytics and predictive modeling. Discover influential factors that drive talk popularity and learn how R libraries and Shiny facilitate data transformations and tool deployment.
Words that will inspire
Eduardo Contreras Cortes
www.speakthedata.com | @edco_one
The Motivation
"I have a dream that one day…"
"We choose to go to the moon in this decade…"
"We shall fight on the beaches…"
But we need more data!
• A sufficient number of talks
• Ideally the same format and style
• Transcripts that can be scraped
• A way to track popularity over time
Eureka! https://www.kaggle.com/rounakbanik/ted-talks
The approach • I. Data extraction and feature engineering • II. Data analysis and model ensemble • III. Model insights
I. Data Extraction and Feature Engineering
[Tidy text mining workflow diagram] Source: www.tidytextmining.com/
I. Data Extraction and Feature Engineering
• Dataset
  • More than 2,500 TED Talks from all TED events
  • Full transcript of each talk
  • Available data: number of views, comments and ratings
  • Enriched dataset: filmed date, published date and duration
• Building features (a sketch of these features in R follows below)
  • Word counts: number of sentences and words per minute, average words per sentence
  • Audience reaction: laughs, questions, applause
  • N-gram word analysis: frequency of words and phrases such as "I", "You", "Going to", "Want"
• To predict: binary classification of whether a talk is among the top most-viewed TED Talks
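As a concrete illustration of the feature engineering described above, here is a minimal sketch in R. It assumes the two files that ship with the Kaggle dataset (ted_main.csv and transcripts.csv, joined on url) and their column names; the top-25% cut-off for "most viewed" is an assumption for illustration, not necessarily the definition used in the talk.

```r
library(tidyverse)
library(tidytext)

# Kaggle dataset: talk metadata plus one transcript per talk, joined on the talk URL
ted <- read_csv("ted_main.csv") %>%
  inner_join(read_csv("transcripts.csv"), by = "url")

features <- ted %>%
  mutate(
    minutes        = duration / 60,                       # duration is in seconds
    n_words        = str_count(transcript, "\\S+"),
    n_sentences    = str_count(transcript, "[.!?]+"),
    words_per_min  = n_words / minutes,
    words_per_sent = n_words / pmax(n_sentences, 1),
    # Audience reactions are annotated inside the transcripts
    laughs         = str_count(transcript, fixed("(Laughter)")),
    applause       = str_count(transcript, fixed("(Applause)")),
    questions      = str_count(transcript, fixed("?")),
    # Assumed target definition: top 25% most-viewed talks
    top_talk       = as.integer(views >= quantile(views, 0.75))
  )

# Example n-gram feature: how often each talk says "going to"
going_to <- ted %>%
  unnest_tokens(bigram, transcript, token = "ngrams", n = 2) %>%
  count(url, bigram) %>%
  filter(bigram == "going to")
```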
II. Data Analysis and Model Ensemble
1. Correlation analysis: remove variables that were correlated (libraries: cor, vif)
2. Descriptive analysis: check that the duration of the talks was similar; analyse the frequency of n-grams; standardise views by how long each talk had been shown on the website (libraries: tidyverse, tidytext, smbinning, wordcloud)
3. Additional feature engineering: combine n-grams to reduce the number of features (libraries: cor, vif)
4. Model assessment: go from simple to complex models; understand the most relevant variables of each model; produce an explainable model! (libraries: ROCR, glm, randomForest, xgboost, h2o)
A sketch of steps 1 and 4 in R follows below.
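A minimal sketch of the correlation check and the model assessment in R, assuming the features data frame and top_talk target from the previous sketch; the variable list is illustrative rather than the final 7-variable set.

```r
library(ROCR)

# Correlation analysis: inspect pairwise correlations and drop redundant predictors
num_vars <- features %>%
  select(minutes, words_per_min, words_per_sent, laughs, applause, questions)
round(cor(num_vars, use = "pairwise.complete.obs"), 2)
# e.g. drop words_per_sent if it largely tracks words_per_min

# Model assessment: start simple with a logistic regression ...
model <- glm(top_talk ~ minutes + words_per_min + laughs + applause + questions,
             data = features, family = binomial())
summary(model)   # which variables matter, and in which direction

# ... and score it with ROCR to get the AUC
pred <- prediction(predict(model, type = "response"), features$top_talk)
performance(pred, "auc")@y.values[[1]]
```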
III. Model Insights
• The selected model: a logistic regression scorecard with 7 variables
• AUC: 76%, accuracy: 73%
• Shorter talks were more effective (2x more effective)
• Speak slowly: fewer words per minute is better (2x more effective)
• Ask questions, the more the better (1.5x more effective)
• Make your audience laugh (1.5x more effective)
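Multipliers like "2x more effective" are the kind of number a logistic regression scorecard yields as odds ratios. A hedged sketch, reusing the glm from the sketch above; the real scorecard binned variables with smbinning, so these raw coefficients only approximate that approach.

```r
# Exponentiated logistic-regression coefficients are odds ratios:
# a value around 2 means a one-unit change in that feature roughly
# doubles the odds of being a top talk
round(exp(coef(model)), 2)

# Wald-style 95% confidence intervals on the same odds-ratio scale
round(exp(confint.default(model)), 2)
```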
III. Model Insights
• Libraries and tools used
  • Developed in Shiny, deployed on shinyapps.io
  • shinydashboard for the layout
  • tidyverse for data transformations
  • plotly for graphics
www.speakthedata.com
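A minimal skeleton of such an app, assuming only the packages named above; the dashboard title, the input and the plot are placeholders rather than the real tool's contents.

```r
library(shiny)
library(shinydashboard)
library(plotly)

ui <- dashboardPage(
  dashboardHeader(title = "Speak the Data"),
  dashboardSidebar(
    sliderInput("wpm", "Words per minute", min = 60, max = 220, value = 140)
  ),
  dashboardBody(
    box(plotlyOutput("score_plot"), width = 12)
  )
)

server <- function(input, output, session) {
  output$score_plot <- renderPlotly({
    # Placeholder: the real tool would show the model's view of the chosen inputs
    plot_ly(x = input$wpm, y = 1, type = "scatter", mode = "markers")
  })
}

shinyApp(ui, server)

# Once rsconnect is configured with a shinyapps.io account, deployment is:
# rsconnect::deployApp("path/to/app")
```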
Final remarks
• Text analytics and predictive modelling revealed influential factors that predict the popularity of talks
• R libraries eased the work of data transformation and modelling
• Shiny and shinyapps.io facilitated the deployment of the tool
Thank you! www.speakthedata.com @edco_one