460 likes | 493 Views
Infotools Harmoni. IHU: Data Sources. August 2017 Presented by: Janine Takle. Data Sources. A data source is the starting point for all you do in Infotools Harmoni A source is a collection of data that is either in an imported file, or in a connected data store
E N D
InfotoolsHarmoni IHU: Data Sources August 2017Presented by: Janine Takle
Data Sources • A data source is the starting point for all you do in Infotools Harmoni • A source is a collection of data that is either in an imported file, or in a connected data store • Each record in a source corresponds to a count in Infotools Harmoni
Sources vision • Our vision is for any appropriate data source to be analysed, have insights visualised, then distributed, all within Infotools Harmoni • Connections that facilitate real time updates are expected to be the way of the future • We have a continuous development stream dedicated to adding connections and supporting new sources • We are strongly influenced by the needs of our clients when identifying additional sources to support
Imported files • Imported files capture data for a moment in time • Files that can be imported into Infotools Harmoni include: • SAV files • X-files • MDD/DDF – Dimensions files • XLSX – Excel files • TXT files – tab delimited text • CSV files – comma delimited text
Direct connections • Direct connections allow updated data to flow into Infotools Harmoni in real time • Direct connections are achieved using APIs (Application Programming Interface). • If an API is available in a client’s data collection system we can potentially develop a direct connection • We currently support direct connections to: • Tables in SQL Server • Views in SQL Server • Projects in Decipher
Source content • Sources can be divided into 2 types – record level and summary level • Record level sources are sources where analysing the count of records is relevant, e.g. where a record represents a respondent, a log entry, a sales transaction, etc. • Summary level sources are sources where analysing the count of records is irrelevant, and analysis only has meaning when a measure is applied, e.g. scores, GRPs, Spend, etc. • Files in SAV and Dimensions format typically contain record level data with a record representing a respondent
Creating sources • SAV files can be generated from a variety of data collection applications, but not all make full use of available features • X-files is Infotools’ proprietary format read by our legacy Desktop applications • Dimensions files are typically created by Dimensions, although some other market research data collection applications can also create them • Xlsx files are primarily created from Excel in Office 2007+ • Text files, delimited with a tab or comma, can be generated from a variety of systems, and do not have market research specific structures
SAV Structure • Coded responses have labels • Looped questions have a separate variable for each loop combination • All respondents asked the question have a response • Respondents not asked the question don’t have any response • Multiple response sets capture yes responses • Cleaned data, with only valid respondents
X-File Structure • X-files are typically already designed • Vaxes must be in indexless format (achieved using SUP -Clean) • Each item within a common type needs a unique label • Headers need to be new style, i.e. version 4+ • Matrices, yngrids and vaxes with predefines are not supported
Dimensions MDD/DDF Structure • The Dimensions converter is currently undergoing an overhaul • Supported structures will be confirmed within the next few months
XLSX Structure • First row contains headers with unique, non-blank descriptions • Fields contain labels, not codes. If codes are used they will need to be renamed to labels after importing into Infotools Harmoni (IH) • Key identifiers can be added to the header labels to ensure required Infotools Harmoni types are applied • No blank rows between valid rows of data • Numeric values don’t contain text, e.g. $, commas, % • No fields contain |
TXT Structure • First row contains headers with unique, non-blank, descriptions that don’t include tabs • Fields are delimited with a tab • Records contain labels, not codes. If codes are used they will need to be renamed to labels after importing into IH • Key identifiers can be added to the header labels to ensure required IH types are applied • No blank rows between valid rows of data • Text containing tabs is enclosed in double-quotes • Time/Date fields contain descriptions, not numbers • Numeric values don’t contain text, e.g. $, tabs, %
CSV Structure • First row contains headers with unique, non-blank, descriptions that don’t include commas • Fields are delimited with a comma • Records contain labels, not codes (if codes are used they will need to be renamed to labels after importing into IH) • Key identifiers can be added to the header labels to ensure required IH types are applied • No blank rows between valid rows of data • Text containing commas is enclosed in double-quotes • Time/Date fields contain descriptions, not numbers • Numeric values don’t contain text, e.g. $, commas, %
SQL Structure • All relevant data exists in a single view or table • All data associated with a unique record is reported in a single table or view • Key identifiers can be added to field names to advise IH on the default type to import • Fields contain labels, not codes (if codes are used they will need to be renamed to labels after importing into IH) • Keep in mind that field names in SQL are limited to 128 characters, and cannot contain |
Direct Connection Structure • Each Direct Connection is developed to map the data types in the data collection system with the data types available in Infotools Harmoni • Direct Connections allow market research specific structures, such as grids and multiple responses, to retain their internal relationships
Source Dictionaries • A Dictionary contains meta-data to guide interpretation • When loading sources with a dictionary, Infotools Harmoni automatically maps items into Infotools Harmoni item types • When loading sources that d • on’t contain a dictionary, the user is given the option to override the automated mapping of source variables into Infotools Harmoni types • Sources with inherent dictionaries are SAV, X-files, MDD/DDF and Decipher
SAV Dictionary • Variables that are a Numeric Type, and have defined Value Labels become standard axes in IH, e.g. • Variables that are a Numeric or Date Type, and have no defined Value Labels become measures in IH, e.g. • Variables that are a String Type, and don’t have defined Value Labels become text items in IH, e.g.
Dictionary Tricks • If the file doesn’t have a dictionary, key identifiers can be used to pre-determine the data type in text file imports • The key identifiers are: • $ any field starting with $ becomes a measure in IH • $weight any field starting with $weight becomes a weight in IH • & any field starting with & becomes a text item in IH
Source Tips & Tricks • You can see the sources in a Project that’s not loaded by selecting the project and clicking Sources • You can see the sources that exist in a loaded Project by clicking • Once in the view/add sources menu, you can add more sources to a project by clicking
Source Tips & Tricks • Projects can contain multiple sources - harmonization of sources is, of course, one of our key selling points! • Infotools Harmoni is source agnostic, so sources in a project can be a mix of imported files and direct connections • Sources are specific to a Project
Update Tips & Tricks • If a source is updated, or a new source is added, dictionary mappings applied to the previous source are applied to the new one • When adding new sources, items that don’t already exist in the project are added to the end • If an item does not appear with the expected label or type, Infotools Harmoni provides functionality to change these things within your Project … although sometimes it’s easier to change the source!
Filetype Tips & Tricks • When using an x-files source you need to upload: • Xbf • Xdf • Xef • Xlk (optional) • Excel files must be .xlsx • Infotools Harmoni can use sources in any language, including languages requiring multi-byte storage, e.g. traditional Chinese
Tips & Tricks • When importing files that do not have an associated dictionary, the elements of standard axes are written into the axis in the order that they appear within the file • Multiple response sets in SAV files appear at the bottom of the project • Data imported into SQL from Excel, Access or Text, has field names truncated at 64 characters
Helpful resources • Infotools Harmoni Knowledge Base https://support.infotools.com/hc/en-us • Sample files https://support.infotools.com/hc/en-us/articles/360015447734
Try it yourself…? • Sample files can be found in https://support.infotools.com/hc/en-us/articles/360015447734 • Try importing each of the different sources and see if you can answer the following questions
Can you…? • See what’s new in the 201502 SAV file that wasn’t in 201501? • Work out why the sample size is so high in the x-files? • Import Experience feedback in the XLSX to be a verbatim in Infotools Harmoni • Add the TXT file into the same project you imported the SAV files and watch the magic happen • Take a copy of the CSV and change the source so NumBevs Per Week is recognised as a measure in the import screen?
What is this data…? • The following slides show some screenshots of sources. For each data source, consider the following questions: • Is this data record level or summary level? • Is this data suitably formatted to import into Infotools Harmoni? • If not, why not?
Answers/Hints • Try it yourself • Hint: use the NEW filter at the top of the project view • Hint: a population projection has been set in the xlk • Hint: Insert a & into the start of the Experience feedback header • Hint: the colouring indicates the common and non-common items • Hint: Insert a $ into the start of the NumBevs Per Week header • What is this data • Source 1 – record level, suitably formatted • Source 2 – summary level, not suitably formatted – there are blank lines, merged cells in the headers, headings among the entries in column A, etc. • Source 3 – summary level, suitably formatted • Source 4 – record level, suitably formatted • Source 5 – summary level, suitably formatted