460 likes | 494 Views
Understand the different data sources in Infotools Harmoni, including imported files and direct connections, for efficient data analysis. Learn about source structures and creating sources from various file formats like SAV, X-files, and more.
E N D
InfotoolsHarmoni IHU: Data Sources August 2017Presented by: Janine Takle
Data Sources • A data source is the starting point for all you do in Infotools Harmoni • A source is a collection of data that is either in an imported file, or in a connected data store • Each record in a source corresponds to a count in Infotools Harmoni
Sources vision • Our vision is for any appropriate data source to be analysed, have insights visualised, then distributed, all within Infotools Harmoni • Connections that facilitate real time updates are expected to be the way of the future • We have a continuous development stream dedicated to adding connections and supporting new sources • We are strongly influenced by the needs of our clients when identifying additional sources to support
Imported files • Imported files capture data for a moment in time • Files that can be imported into Infotools Harmoni include: • SAV files • X-files • MDD/DDF – Dimensions files • XLSX – Excel files • TXT files – tab delimited text • CSV files – comma delimited text
Direct connections • Direct connections allow updated data to flow into Infotools Harmoni in real time • Direct connections are achieved using APIs (Application Programming Interface). • If an API is available in a client’s data collection system we can potentially develop a direct connection • We currently support direct connections to: • Tables in SQL Server • Views in SQL Server • Projects in Decipher
Source content • Sources can be divided into 2 types – record level and summary level • Record level sources are sources where analysing the count of records is relevant, e.g. where a record represents a respondent, a log entry, a sales transaction, etc. • Summary level sources are sources where analysing the count of records is irrelevant, and analysis only has meaning when a measure is applied, e.g. scores, GRPs, Spend, etc. • Files in SAV and Dimensions format typically contain record level data with a record representing a respondent
Creating sources • SAV files can be generated from a variety of data collection applications, but not all make full use of available features • X-files is Infotools’ proprietary format read by our legacy Desktop applications • Dimensions files are typically created by Dimensions, although some other market research data collection applications can also create them • Xlsx files are primarily created from Excel in Office 2007+ • Text files, delimited with a tab or comma, can be generated from a variety of systems, and do not have market research specific structures
SAV Structure • Coded responses have labels • Looped questions have a separate variable for each loop combination • All respondents asked the question have a response • Respondents not asked the question don’t have any response • Multiple response sets capture yes responses • Cleaned data, with only valid respondents
X-File Structure • X-files are typically already designed • Vaxes must be in indexless format (achieved using SUP -Clean) • Each item within a common type needs a unique label • Headers need to be new style, i.e. version 4+ • Matrices, yngrids and vaxes with predefines are not supported
Dimensions MDD/DDF Structure • The Dimensions converter is currently undergoing an overhaul • Supported structures will be confirmed within the next few months
XLSX Structure • First row contains headers with unique, non-blank descriptions • Fields contain labels, not codes. If codes are used they will need to be renamed to labels after importing into Infotools Harmoni (IH) • Key identifiers can be added to the header labels to ensure required Infotools Harmoni types are applied • No blank rows between valid rows of data • Numeric values don’t contain text, e.g. $, commas, % • No fields contain |
TXT Structure • First row contains headers with unique, non-blank, descriptions that don’t include tabs • Fields are delimited with a tab • Records contain labels, not codes. If codes are used they will need to be renamed to labels after importing into IH • Key identifiers can be added to the header labels to ensure required IH types are applied • No blank rows between valid rows of data • Text containing tabs is enclosed in double-quotes • Time/Date fields contain descriptions, not numbers • Numeric values don’t contain text, e.g. $, tabs, %
CSV Structure • First row contains headers with unique, non-blank, descriptions that don’t include commas • Fields are delimited with a comma • Records contain labels, not codes (if codes are used they will need to be renamed to labels after importing into IH) • Key identifiers can be added to the header labels to ensure required IH types are applied • No blank rows between valid rows of data • Text containing commas is enclosed in double-quotes • Time/Date fields contain descriptions, not numbers • Numeric values don’t contain text, e.g. $, commas, %
SQL Structure • All relevant data exists in a single view or table • All data associated with a unique record is reported in a single table or view • Key identifiers can be added to field names to advise IH on the default type to import • Fields contain labels, not codes (if codes are used they will need to be renamed to labels after importing into IH) • Keep in mind that field names in SQL are limited to 128 characters, and cannot contain |
Direct Connection Structure • Each Direct Connection is developed to map the data types in the data collection system with the data types available in Infotools Harmoni • Direct Connections allow market research specific structures, such as grids and multiple responses, to retain their internal relationships
Source Dictionaries • A Dictionary contains meta-data to guide interpretation • When loading sources with a dictionary, Infotools Harmoni automatically maps items into Infotools Harmoni item types • When loading sources that d • on’t contain a dictionary, the user is given the option to override the automated mapping of source variables into Infotools Harmoni types • Sources with inherent dictionaries are SAV, X-files, MDD/DDF and Decipher
SAV Dictionary • Variables that are a Numeric Type, and have defined Value Labels become standard axes in IH, e.g. • Variables that are a Numeric or Date Type, and have no defined Value Labels become measures in IH, e.g. • Variables that are a String Type, and don’t have defined Value Labels become text items in IH, e.g.
Dictionary Tricks • If the file doesn’t have a dictionary, key identifiers can be used to pre-determine the data type in text file imports • The key identifiers are: • $ any field starting with $ becomes a measure in IH • $weight any field starting with $weight becomes a weight in IH • & any field starting with & becomes a text item in IH
Source Tips & Tricks • You can see the sources in a Project that’s not loaded by selecting the project and clicking Sources • You can see the sources that exist in a loaded Project by clicking • Once in the view/add sources menu, you can add more sources to a project by clicking
Source Tips & Tricks • Projects can contain multiple sources - harmonization of sources is, of course, one of our key selling points! • Infotools Harmoni is source agnostic, so sources in a project can be a mix of imported files and direct connections • Sources are specific to a Project
Update Tips & Tricks • If a source is updated, or a new source is added, dictionary mappings applied to the previous source are applied to the new one • When adding new sources, items that don’t already exist in the project are added to the end • If an item does not appear with the expected label or type, Infotools Harmoni provides functionality to change these things within your Project … although sometimes it’s easier to change the source!
Filetype Tips & Tricks • When using an x-files source you need to upload: • Xbf • Xdf • Xef • Xlk (optional) • Excel files must be .xlsx • Infotools Harmoni can use sources in any language, including languages requiring multi-byte storage, e.g. traditional Chinese
Tips & Tricks • When importing files that do not have an associated dictionary, the elements of standard axes are written into the axis in the order that they appear within the file • Multiple response sets in SAV files appear at the bottom of the project • Data imported into SQL from Excel, Access or Text, has field names truncated at 64 characters
Helpful resources • Infotools Harmoni Knowledge Base https://support.infotools.com/hc/en-us • Sample files https://support.infotools.com/hc/en-us/articles/360015447734
Try it yourself…? • Sample files can be found in https://support.infotools.com/hc/en-us/articles/360015447734 • Try importing each of the different sources and see if you can answer the following questions
Can you…? • See what’s new in the 201502 SAV file that wasn’t in 201501? • Work out why the sample size is so high in the x-files? • Import Experience feedback in the XLSX to be a verbatim in Infotools Harmoni • Add the TXT file into the same project you imported the SAV files and watch the magic happen • Take a copy of the CSV and change the source so NumBevs Per Week is recognised as a measure in the import screen?
What is this data…? • The following slides show some screenshots of sources. For each data source, consider the following questions: • Is this data record level or summary level? • Is this data suitably formatted to import into Infotools Harmoni? • If not, why not?
Answers/Hints • Try it yourself • Hint: use the NEW filter at the top of the project view • Hint: a population projection has been set in the xlk • Hint: Insert a & into the start of the Experience feedback header • Hint: the colouring indicates the common and non-common items • Hint: Insert a $ into the start of the NumBevs Per Week header • What is this data • Source 1 – record level, suitably formatted • Source 2 – summary level, not suitably formatted – there are blank lines, merged cells in the headers, headings among the entries in column A, etc. • Source 3 – summary level, suitably formatted • Source 4 – record level, suitably formatted • Source 5 – summary level, suitably formatted