The Detect Categories Tool
The Detect Categories table analysis tool finds natural groups in the data. It analyzes your data, finds the most common combinations of column values, and then defines groups based on these common patterns. It provides a detailed description of the groups it identifies, and it can label each row in the original data with the name of the group to which it belongs. This practice (often called clustering or segmentation) eases many data analysis tasks. The tool is a powerful instrument for exploring new datasets: the detailed descriptions generated for each category constitute an intuitive summary of the dataset.
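The grouping idea described above can be illustrated with a simplified, k-modes-style sketch in Python. This is not the tool's actual algorithm, which this text does not describe. The function name `detect_groups`, the sample rows, and the matching rule (each row joins the common pattern it agrees with on the most columns) are all hypothetical.

```python
from collections import Counter

# Hypothetical illustration of "find common combinations, then group by them".
# A simplified k-modes-style pass, NOT the Detect Categories algorithm.
def detect_groups(rows, columns, k):
    # Step 1: count each combination of column values; keep the k most common.
    combos = Counter(tuple(row[c] for c in columns) for row in rows)
    patterns = [combo for combo, _ in combos.most_common(k)]

    # Step 2: label each row with the pattern it agrees with on the most columns.
    def agreement(row, pattern):
        return sum(row[c] == v for c, v in zip(columns, pattern))

    labels = []
    for row in rows:
        # max() returns the first best index, so ties go to the pattern listed first.
        best = max(range(len(patterns)), key=lambda i: agreement(row, patterns[i]))
        labels.append(f"Category {best + 1}")
    return patterns, labels


# Invented sample data.
rows = [
    {"Income": "Low", "Cars": "0"},
    {"Income": "Low", "Cars": "0"},
    {"Income": "High", "Cars": "3+"},
    {"Income": "High", "Cars": "3+"},
    {"Income": "Low", "Cars": "1"},
]
patterns, labels = detect_groups(rows, ["Income", "Cars"], k=2)
print(patterns)  # [('Low', '0'), ('High', '3+')]
print(labels)    # ['Category 1', 'Category 1', 'Category 2', 'Category 2', 'Category 1']
```

The last row matches no common pattern exactly, yet it still lands in Category 1 because it shares the Income value; this is why a category describes typical rows rather than identical ones.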
Launching the Tool
To launch the tool:
• Select the dataset.
• Select Detect Categories on the Analyze ribbon.
The Detect Categories dialog box has three sections that may need your input.
• The first section is the list of columns to be analyzed. The ID column is an example of a column that should be ignored: it contains unique identifiers for each row of the table and likely does not contain any insights about the data.
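A minimal Python sketch of why an ID column adds nothing: a column whose value differs in every row cannot describe a group. The helper `likely_identifier_columns` and the sample data are hypothetical, and the uniqueness check is only a heuristic; in the tool itself you choose the columns in the dialog box.

```python
# Hypothetical heuristic: a column with a distinct value in every row behaves like an ID.
def likely_identifier_columns(rows, columns):
    n = len(rows)
    return [c for c in columns if len({row[c] for row in rows}) == n]


# Invented sample data.
rows = [
    {"ID": 101, "Income": "Low", "Region": "North"},
    {"ID": 102, "Income": "High", "Region": "North"},
    {"ID": 103, "Income": "Low", "Region": "South"},
]
all_columns = ["ID", "Income", "Region"]
ignored = likely_identifier_columns(rows, all_columns)
to_analyze = [c for c in all_columns if c not in ignored]
print(ignored, to_analyze)  # ['ID'] ['Income', 'Region']
```

On very small tables other columns can also happen to be unique, so a check like this can only suggest candidates; the decision stays with the user.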
• The Maximum Number of Categories drop-down box contains numbers from 2 to 10, plus an <Auto-detect> option. If you know beforehand how many categories you want your data partitioned into, select that number in the list. The <Auto-detect> option tries to identify the actual number of natural categories in the data.
The Categories Report
The report contains three sections:
• The first section presents the categories and the number of rows determined to belong to each category.
• The second section describes the characteristics of each category.
• The third section provides a visualization of the data in each category.
Categories and the Number of Rows in Each
This section consists of two columns:
• one containing the category name, and
• one showing the count of data rows included in each category.
The Category Name column is editable, and changes in this column propagate to the rest of the report.
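To illustrate how one editable name can propagate through a report, here is a hypothetical Python sketch. It assumes each row stores a stable category key and that every part of the report looks up display names in one shared mapping; the tool's internal design is not described here, and `category_names`, `category_counts`, and `row_labels` are invented names.

```python
from collections import Counter

# Hypothetical design: rows store a stable key; display names live in one mapping.
row_categories = [1, 1, 2, 3, 2, 1]  # invented per-row assignments
category_names = {1: "Category 1", 2: "Category 2", 3: "Category 3"}


def category_counts(row_categories, category_names):
    """(name, row count) pairs, like the first section of the report."""
    counts = Counter(row_categories)
    return [(category_names[k], counts[k]) for k in sorted(category_names)]


def row_labels(row_categories, category_names):
    """Label each original row with its category's current name."""
    return [category_names[k] for k in row_categories]


# Renaming in one place changes every view built from the mapping.
category_names[1] = "Very Low Income"
print(category_counts(row_categories, category_names))
print(row_labels(row_categories, category_names))
```

Because no view copies the name string, there is nothing to keep in sync after a rename.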
Characteristics of Each Category
• The first column is the category name.
• The next two columns (Column and Value) identify a characteristic attribute of the current category. For example, Category 1 is characterized by very low income, below $39,050.
• The last column tells how important the current characteristic is in describing the current category. For example, most of the rows in Category 1 contain an Income value lower than $39,050, but you should not interpret this importance as a hard rule for all rows: rows with an Income lower than $39,050 may appear in other categories, and rows in Category 1 may have an Income in a different range.
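The difference between a characteristic that describes most rows of a category and a hard rule can be checked with simple arithmetic. The Python sketch below uses invented rows and the $39,050 threshold from the example; the share it computes is only an illustration, not the importance score the report actually displays.

```python
# Hypothetical check: what share of a group has the characteristic Income < 39,050?
def share_with(rows, predicate):
    if not rows:
        return 0.0
    return sum(1 for r in rows if predicate(r)) / len(rows)


def low_income(row):
    return row["Income"] < 39_050


# Invented sample data.
rows = [
    {"Category": "Category 1", "Income": 25_000},
    {"Category": "Category 1", "Income": 31_000},
    {"Category": "Category 1", "Income": 38_000},
    {"Category": "Category 1", "Income": 52_000},  # Category 1 row outside the range
    {"Category": "Category 2", "Income": 36_000},  # low income, but in another category
    {"Category": "Category 2", "Income": 90_000},
    {"Category": "Category 2", "Income": 75_000},
]
in_cat1 = [r for r in rows if r["Category"] == "Category 1"]
others = [r for r in rows if r["Category"] != "Category 1"]

print(share_with(in_cat1, low_income))  # 0.75: most, but not all, Category 1 rows
print(share_with(others, low_income))   # about 0.33: the trait also occurs elsewhere
```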
Based on the characteristics of each category, you can derive meaningful category labels. For example, based on its characteristics in the figure, Category 1 can be labeled Very Low Income. Likewise, Category 5 can be given a more meaningful name, such as Professionals with more than 3 cars.
The Category Profiles Chart
The chart presents the distribution of data rows having a certain characteristic across all the detected categories.
Each vertical bar represents the distribution of a column inside one category or in the whole table. The length of a segment represents the proportion (not the absolute number) of data rows in the current group having a certain property. A legend that maps colors to distinct column states is visible on the right side of the chart.
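The proportions behind each bar of the chart can be sketched in Python as follows. The rows, the `Category` and `Cars` column names, and the `profile` helper are hypothetical. The point is that each group's shares sum to 1, so a bar shows proportions rather than absolute row counts.

```python
from collections import Counter


# Hypothetical helper: share of rows in each state of `column`,
# for the whole table and for each category.
def profile(rows, column, group_key="Category"):
    groups = {"Whole table": list(rows)}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row)

    shares = {}
    for name, members in groups.items():
        counts = Counter(row[column] for row in members)
        # Divide by the group size: segment length is a proportion, not a count.
        shares[name] = {state: n / len(members) for state, n in counts.items()}
    return shares


# Invented sample data.
rows = [
    {"Category": "Category 1", "Cars": "0"},
    {"Category": "Category 1", "Cars": "0"},
    {"Category": "Category 1", "Cars": "1"},
    {"Category": "Category 2", "Cars": "3+"},
]
for name, states in profile(rows, "Cars").items():
    print(name, states)
```

Category 2 has a single row here, yet its bar is as long as the others; the chart compares the makeup of each group, not its size.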