260 likes | 272 Views
Using the Download Tool. Dallas Alteryx Users Group Brian Purcell March 2019. Download Tool. Connectors Pallet Use Cases FTP/SFTP Transfers APPLICATION PROGRAMMING INTERFACES Web Scraping Programming Languages Not Required Then Leverage Other Alteryx Processes.
E N D
Using the Download Tool Dallas Alteryx Users Group Brian Purcell March 2019
Download Tool • Connectors Pallet • Use Cases • FTP/SFTP Transfers • APPLICATION PROGRAMMING INTERFACES • Web Scraping • Programming Languages Not Required • Then Leverage Other Alteryx Processes
Anatomy of a Download Workflow • Text Input (Or Formula) – URL String • Other Fields Could be Contained—Depending on Use Case • Download Tool • Check Configuration • Parsing / Shaping Tools • JSON Parse, RegEX, XML, Text to Columns, etc.
FtP Download Workflow • External Vendor Provides Program Information • The File is Uploaded on a Nightly Basis • Filename has An Appended Date • A Destination Path is Defined and Configured • The Job Runs Daily on Our Company’s Alteryx Server
Purpose – Webscraping / API • Enrich Existing Data • Easily Acquire External Data • Obtain Competitive Information • Automate Mundane Tasks
Differences API Webscraping Intended for Browser Traffic More Information Available Than API’s Most sites don’t offer an API Unstructured Text Output Gray Area – Legally Consult Your Legal Resource • Intended for Data Extraction • Often Requires An Application, Approval, and Token • May Require A Subscription or Per Use Fee • Default Outputs – JSON/XML
API—Application Programming interface • Websites Optimized For Data Transfer • The “under the hood” of Connectivity • Think of the Apps that Use Google Maps Realtor.com Likely Uses Google’s API within their Mobile App
Webscraping Browser View Alteryx View / PageSource Right Click in Browser, view Source
API’SApplication Programming Interface • Commonly Used for Mobile Applications • Methods – Get, Post, Put, Delete • URL EndPoint • Parameters • Key / Token
The process • Find the Registration Portal • Submit an Application, Store your Key • Review API Documentation • Build A Basic Query • Parse • Pagenate (if needed)
NOAA • https://www.ncdc.noaa.gov/cdo-web/webservices/v2 • 1527 Different Data Report Categories • Temperature, Wind, Precip • Soil Temp, Degree Days, Cloud Coverage • 9,563 Stations • 6,340 in the US • 361 in Texas
GHCN - Global Historical Climatology Network report Example of Data However, not usable for data analysis– embedded in a pdf We prefer a structured data table. Which will allow us to enrich our data by appending weather conditions to internal data (examples) -- Weather driven industries – HVAC, lawn care, utilities -- Logistics – snow/heavy rain on shipping
“Parameter Name”=“Value” • Join Multiple Parameters with “&” • Some are Required, Others are Optional • Authentication could be handled through a Parameter
Authentication • Nearly All API’s Require An Account • Commonly Seen Methods • Header – Documentation should identify the Header Name • Parameter Value – Inserted Along with Other Parameters • Basic – Use the User Name & Password Tab on Download Configuration • Very Hard – Two Step, Authentication API Call, Then 2ndApi Call • Twitter, Tableau
Pagenation • API’s Typically Limit Data Results To Maintain Bandwidth • Alter Your URL Request • Offset, RowCount– Common Parameters To Return The Next Page of Results • How to Handle • Most API’s will Return a Max Count • Reconfigure your API to send a API Request for Each Page (using offset) • Advanced--Create An Iterative Macro
Sometimes You’ll Get Lucky • API Explorer/Debugger • Azure / Qualtrics • Simulates HTML Request with Endpoints, Parameters, and Headers • https://dev.applicationinsights.io/apiexplorer/metrics?appId=DEMO_APP&apiKey=DEMO_KEY • Plug it into Alteryx
Tips • Check URLEncode • Limit URL Requests • Cache & Run • Save Output to YXDB • Use the Throttle Tool • Test Your RegEx First, Then Scape • Review HTML Response Codes • Often Will Indicate Error • Token • Exceeded Allotted Records • Improper Query
Possible Use Cases • Competitive Analysis • Location, Product Changes, Pricing • Automated Form Collection • Create a Survey, Alteryx Workflow to Transform The Data • Google API’s • Geocoding, Street View Images, Turn By Turn Directions
Resources • FTP Knowledge Bases Articles • https://community.alteryx.com/t5/Alteryx-Knowledge-Base/File-Transfer-Protocol-FTP-Download/ta-p/10599 • https://community.alteryx.com/t5/Alteryx-Knowledge-Base/Automating-File-Transfer-Protocol-FTP-Downloads/ta-p/19186 • https://community.alteryx.com/t5/Alteryx-Knowledge-Base/Upload-to-SFTP-FTP/ta-p/20845 (Upload) • https://pages.alteryx.com/Javelin-Webinar-Series-Master.html (Great Webscraping & API Examples) • List Of API’s • https://www.programmableweb.com/ • RegExTesters • https://regexr.com/ • The RegEx Coach – Free Program • Free Software • Postman – Test API Calls • Fiddler – • Useful for Webscraping • Track Your Computer’s HTML Requests, Incl. URL’s, Headers, HTML Request Type • Find Me on LinkedIn • https://www.linkedin.com/in/brian-purcell-2000/