BY : LISSY VERMA, SHRADDHA GUPTA • DATA COLLECTION AND IMPROVING DATA QUALITY
OUTLINE • Data Collection • ODK : Open Data Kit • Demo • Usher : Improving Data Quality • Purpose • Implementation • Results
DATA COLLECTION • Data collection in developing areas is difficult. • None of the existing tools is sufficient. • New features have to be added as needs arise.
OPEN DATA KIT • ODK is a tool suite for collection and management of data on mobile phones. • The main objective is to provide open source tools.
OPEN DATA KIT • ODK COLLECT • Collects data • ODK AGGREGATE • Stores, displays and exports data • ODK MANAGE • Remote device management
AMPATH • AMPATH deployed ODK to collect data for medical purposes. • The deployment was found to be successful, minimizing delays and improving the lives of healthcare workers and other people.
Data Collection is Challenging • Expertise in form design • Double Entry : Costly • Data Cleaning
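Double entry, listed above as costly, means keying the same paper form twice and reconciling the differences. A minimal sketch of the comparison step follows, assuming each pass is stored as a plain dictionary of field name to entered value; the function name and sample fields are hypothetical and not taken from ODK or Usher.

```python
def double_entry_mismatches(first_pass, second_pass):
    """Return the fields whose values differ between two independent keyings."""
    # A field missing from one pass counts as a mismatch as well.
    fields = first_pass.keys() | second_pass.keys()
    return sorted(f for f in fields if first_pass.get(f) != second_pass.get(f))


# Hypothetical example: the second clerk transposed the digits of the age.
pass_one = {"age": "34", "weight_kg": "61", "village": "Moshi"}
pass_two = {"age": "43", "weight_kg": "61", "village": "Moshi"}
print(double_entry_mismatches(pass_one, pass_two))
```

Every mismatched field has to be checked against the paper form by a person, which is where the cost comes from.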
Past Work Constraints • Combo-boxes. Reduce Time • Automatically filled Leave-forms.
USHER: Improving Data Quality • Usher, as in an escort: guides the person entering data towards correct entries. • Question ordering in the form. • Greedy information gain • Dynamically reorder questions • Predict errors to re-ask. • Contextualized error likelihood principle.
CURBSTONING • Concept : An unscrupulous door-to-door surveyor shirks work and asks only the important questions. • Greedy Information Gain • Uniform Prior : Equally likely inputs • Training Set • Context-specific Model Required • Bayesian Learning
DATASETS • Patient dataset: collected at a rural HIV/AIDS clinic in Tanzania. • Survey dataset: responses from a 1986 poll about race and politics.
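The greedy information gain idea on this slide can be sketched as follows: at each step, ask next the question whose answer is least predictable given the questions already chosen. This is only an illustration under stated assumptions. It estimates conditional entropy from raw counts in a small, made-up training set rather than from a learned Bayesian network, and the names `greedy_order` and `conditional_entropy` and the sample rows are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values() if c)


def conditional_entropy(rows, target, given):
    """H(target | given) = H(target, given) - H(given), estimated from counts."""
    joint = Counter(tuple(r[q] for q in given) + (r[target],) for r in rows)
    cond = Counter(tuple(r[q] for q in given) for r in rows)
    return entropy(joint) - entropy(cond)


def greedy_order(rows, questions):
    """Repeatedly pick the question with the highest entropy given those already picked."""
    ordered, remaining = [], list(questions)
    while remaining:
        best = max(remaining, key=lambda q: conditional_entropy(rows, q, ordered))
        ordered.append(best)
        remaining.remove(best)
    return ordered


# Made-up training rows, for illustration only.
rows = [
    {"sex": "F", "pregnant": "yes", "clinic": "A"},
    {"sex": "F", "pregnant": "no", "clinic": "A"},
    {"sex": "M", "pregnant": "no", "clinic": "A"},
    {"sex": "M", "pregnant": "no", "clinic": "A"},
]
print(greedy_order(rows, ["clinic", "pregnant", "sex"]))
```

In this toy data "clinic" never varies, so it carries no information and ends up last; "sex" is the most uncertain question and is asked first.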
Probabilistic Relations : Form Questions • Bayesian network for the patient dataset
Re-ask Questions Approximates Double Entry • Uncertainty : High Entropy • Outliers
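One way to picture the outlier case: after a form is filled in, flag the answers that look unlikely under past data and ask only those again. The sketch below is a simplification, not the method behind these slides. It scores each answer with a smoothed marginal frequency, whereas a contextualized error likelihood would condition on the other answers in the form. The function names, the threshold of 0.1 and the sample data are all assumptions.

```python
from collections import Counter


def value_probability(history, value, domain_size):
    """Smoothed frequency of value in history (add-one / Laplace smoothing)."""
    counts = Counter(history)
    return (counts[value] + 1) / (len(history) + domain_size)


def questions_to_reask(entry, training, domains, threshold=0.1):
    """Return the questions whose entered value has probability below threshold."""
    flagged = []
    for question, value in entry.items():
        history = [row[question] for row in training]
        p = value_probability(history, value, len(domains[question]))
        if p < threshold:
            flagged.append(question)
    return flagged


# Made-up history: 9 married and 1 single; age groups split evenly.
training = (
    [{"marital": "married", "age_group": "adult"}] * 5
    + [{"marital": "married", "age_group": "child"}] * 4
    + [{"marital": "single", "age_group": "child"}]
)
domains = {
    "marital": ["married", "single", "widowed"],
    "age_group": ["adult", "child"],
}
entry = {"marital": "widowed", "age_group": "adult"}
print(questions_to_reask(entry, training, domains))
```

Only the flagged questions are asked a second time, which is how re-asking approximates double entry at a fraction of the cost.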
DATA COLLECTION : PROBLEMS • Because of the digital divide between developing and developed areas, it is very difficult to collect and use data in developing regions. • The main problems are a lack of reliable infrastructure, poor connectivity and inadequate expertise. • Currently available data collection tools such as Pendragon Forms, Nokia Data Gathering, JavaRosa and RapidSMS are difficult to deploy, hard to use, complicated to scale and rarely customizable.
OPEN DATA KIT • The Open Data Kit, or simply ODK, is a suite of tools for data collection that uses Google’s Android platform. • The main objectives of the technology are: modular and customisable tools; use of open interfaces and standards; long-term survival of the tools. • The main components of ODK are: 1. ODK Collect : collects data using Forms. 2. ODK Aggregate : ready-to-deploy online repository to store, view and export collected data. 3. ODK Build : enables users to generate Forms. 4. ODK Voice : maps Forms to sound snippets. 5. ODK Clinic : mobile medical record system. 6. ODK Manage : maintains a database of all phones for remote device management. 7. ODK Validate : validates Forms. • Other tools include ODK Dropbox, ODK Rangefinder, ODK Tasks, ODK Listen and ODK Visualise.