Implementing DFS Search Services Pierre-Yves “Pitch” Chevalier on behalf of Marc Brette
DFS 6.5 Search and Classification Services • DFS: Service-oriented and platform-agnostic • Search service in DFS since 6.0: • Federated search on Documentum repositories and external repositories • 6.5: New search and content intelligence features: • Nonblocking search • Clustering of search results • Saved searches • Classification service • A platform for building a wide range of search applications, from mobile search to advanced discovery interfaces • This presentation puts the services into practice by progressively building an example application.
Agenda • Search • Clustering • Saved Queries • Classification • Troubleshooting
DFS 6.5 Search and Classification Services Search service • Simple search • Federated search • Nonblocking search • Advanced queries
Search Service • SearchService: • execute • Executes a query and returns results • getRepositoryList • Returns the list of available sources (managed and unmanaged repositories) • A query can be structured or passthrough (straight DQL) • Results contain the query status and a DataPackage (list of DataObjects) • Stateless: relies on a caching mechanism
Services Architecture [diagram; components shown: Consumers, WSDL-based proxies, DFS Runtime, JAX-WS / JAXB, Search Service, Query Store Service, Analytics Service, DFC, Content Server, ECI Server, CI Server; arrows show control flow]
Example: A Simple Search Application • A simple example that performs a search on one repository and displays results • Architecture of the example: • User interface in AJAX • Java servlets call DFS and format results in JSON for the UI • Remote call to DFS but could also be local calls
Example: Execute Query
    // Set up the service context
    RepositoryIdentity identity = new RepositoryIdentity("MSSQL60ECI4", "userdev1", "userdev1", "");
    ContextFactory contextFactory = ContextFactory.getInstance();
    IServiceContext context = contextFactory.newContext();
    context.addIdentity(identity);
    ISearchService searchService = ServiceFactory.getInstance().getRemoteService(
            ISearchService.class, context, "search", "http://127.0.0.1:8080/services");

    // Build and execute the query (searchQuery holds the user's search terms)
    StructuredQuery q = new StructuredQuery();
    q.addRepository("MSSQL60ECI4");
    q.setObjectType("dm_document");
    ExpressionSet expressionSet = new ExpressionSet();
    expressionSet.addExpression(new FullTextExpression(searchQuery));
    q.setRootExpressionSet(expressionSet);
    QueryExecution queryExec = new QueryExecution(0, 100, 100);
    QueryResult queryResult = searchService.execute(q, queryExec, null);
Example: Wrap the Query in a Servlet
    public class SearchServlet extends HttpServlet {
        protected void doPost(HttpServletRequest httpServletRequest,
                              HttpServletResponse httpServletResponse)
                throws ServletException, IOException {
            // Get the search terms submitted by the HTML form
            String searchQuery = httpServletRequest.getParameter("queryTerms");
            //…
        }
    }
Example: Format Response as JSON • JSON: a JavaScript-friendly structure • Easy to represent lists and name/value pairs
Example: Format Response as JSON
    public void writeJSON(PrintWriter writer, QueryResult response) {
        writer.append("[");
        for (Iterator<DataObject> it = response.getDataObjects().iterator(); it.hasNext();) {
            DataObject dataObject = it.next();
            writer.append("{");
            Iterator<Property> iterator = dataObject.getProperties().iterator();
            while (iterator.hasNext()) {
                Property prop = iterator.next();
                String name = prop.getName();
                // Escape backslashes and double quotes so a value cannot break the JSON string
                String value = prop.getValueAsString().replace("\\", "\\\\").replace("\"", "\\\"");
                writer.append("\"").append(name).append("\":\"").append(value).append("\"");
                if (iterator.hasNext()) writer.append(",");
            }
            writer.append("}\n");
            if (it.hasNext()) writer.append(",");
        }
        writer.append("]");
    }
Example: HTML Form
    function updatepage(str) {
        var rsp = eval("(" + str + ")"); // use eval to parse JSON response
        var html = "<table>";
        for (var i = 0; i < rsp.length; i++) {
            var result = rsp[i];
            html += "\n<tr><td>" + result.object_name + "</td></tr>";
        }
        html += "</table>";
        document.getElementById("result").innerHTML = html;
    }
    <!-- … -->
    <form name="searchForm" onsubmit='xmlhttpPost("/EMCWorldDemo/search", updatepage, getQueryParams()); return false;'>
        <p>query: <input name="queryTerms" type="text"> <input value="Go" type="submit"></p>
        <div id="result"></div>
    </form>
Federated Search • DFS Search Service supports federated search across multiple Documentum repositories and external repositories • Requires ECI option for external repositories • ECI supports a large catalog of adapters to external sources: • CMS (FileNet, SharePoint, IBMCM…) • Websites (Google, Yahoo …) • Databases • Indexers (Verity, Fast, IndexServer…) • Specialized sources (legal, science, regulation, patents, health…) • EMC products (eRoom, EX, AX…) • Support for authentication using the same service as Docbase repositories
Federated Search: Configure ECI To search external repositories: • Install ECIS • Edit dfc.properties in DFS ear: • dfc.search.ecis.enable=true • dfc.search.ecis.host=ecishost
Example: Querying Several Sources
    // Listing available sources
    List<Repository> repositories = searchService.getRepositoryList(null);
    for (Repository repository : repositories) {
        String sourceName = repository.getName();
        String userLogin = repository.getProperties().getUserLoginCapability();
    }

    // Querying multiple sources: one identity per selected source
    String[] sources = httpServletRequest.getParameterValues("sources");
    ContextFactory contextFactory = ContextFactory.getInstance();
    IServiceContext context = contextFactory.newContext();
    for (String source : sources) {
        RepositoryIdentity identity = new RepositoryIdentity(source, "userdev1", "userdev1", "");
        context.addIdentity(identity);
    }
    StructuredQuery q = new StructuredQuery();
    for (String source : sources) {
        q.addRepository(source);
    }
Nonblocking Search • DFS is based on DFC, which supports asynchronous search execution • Allows dynamic display of results • DFS supports it through a nonblocking query call: • Allows multiple successive calls to get new results and the query status • Example exchange between DFS client and DFS service: execute(query,0,100) returns no results; wait 1 second; execute(query,0,100) returns 10 results; wait 1 second; execute(query,10,100) returns 90 results
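The client-side polling pattern described on this slide can be sketched without the DFS libraries. In the sketch below, `PagedSearch`, `Page`, and `FakeSearch` are hypothetical stand-ins, not DFS types: `PagedSearch.execute(start, max)` plays the role of a nonblocking `execute` call, and `FakeSearch` simply replays the timeline shown above (no results, then 10, then the remaining 90). A real client would call the Search Service instead and read the query status from the response.

```java
import java.util.ArrayList;
import java.util.List;

public class NonblockingPollSketch {

    /** Hypothetical stand-in for one nonblocking execute(query, start, max) call. */
    interface PagedSearch {
        Page execute(int startIndex, int maxResults);
    }

    /** Results available from startIndex onward, plus whether the query is still running. */
    static final class Page {
        final List<String> results;
        final boolean running;

        Page(List<String> results, boolean running) {
            this.results = results;
            this.running = running;
        }
    }

    /** Polls until the query reports completion or maxResults have been collected. */
    static List<String> pollAll(PagedSearch search, int maxResults, long waitMillis) {
        List<String> collected = new ArrayList<>();
        while (true) {
            // Ask only for results we have not seen yet: the start index moves forward.
            Page page = search.execute(collected.size(), maxResults);
            collected.addAll(page.results);
            if (!page.running || collected.size() >= maxResults) {
                return collected;
            }
            try {
                Thread.sleep(waitMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return collected; // stop polling if the caller interrupts us
            }
        }
    }

    /** Fake service replaying the slide's timeline: 0, then 10, then 100 results available. */
    static final class FakeSearch implements PagedSearch {
        private final int[] availablePerCall = {0, 10, 100};
        private int calls = 0;

        public Page execute(int startIndex, int maxResults) {
            int available = availablePerCall[Math.min(calls, availablePerCall.length - 1)];
            calls++;
            List<String> results = new ArrayList<>();
            for (int i = startIndex; i < available && results.size() < maxResults; i++) {
                results.add("result-" + i);
            }
            return new Page(results, available < 100);
        }
    }

    public static void main(String[] args) {
        List<String> all = pollAll(new FakeSearch(), 100, 10);
        System.out.println(all.size() + " results collected");
    }
}
```

Because each call restarts from `collected.size()`, the UI can append new rows as they arrive rather than redrawing the whole list.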
Nonblocking Search: Cache • DFS queries are cached • Each query has a definition and a query ID used as the key in the cache • Cache policy is size-based and time-based • Each Search Service call contains the initial query (definition) so that the query can be re-executed on a cache miss • Configurable in dfs-runtime.properties: • dfs.query_cache_house_keeper.period = 5
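To make the cache behaviour concrete, here is a minimal sketch of a cache keyed by query ID with a size limit, an age limit, and re-execution on a miss. It is an illustration only, not the DFS implementation: the class `QueryCacheSketch`, its methods, and the string-valued "results" are all hypothetical, and the current time is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class QueryCacheSketch {

    /** Cached results together with the time they were stored. */
    private static final class Entry {
        final String results;
        final long storedAtMillis;

        Entry(String results, long storedAtMillis) {
            this.results = results;
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final int maxEntries;                       // size-based policy
    private final long maxAgeMillis;                    // time-based policy
    private final Function<String, String> executor;    // runs a query definition
    private final Map<String, Entry> entries = new LinkedHashMap<>();
    private int executions = 0;

    QueryCacheSketch(int maxEntries, long maxAgeMillis, Function<String, String> executor) {
        this.maxEntries = maxEntries;
        this.maxAgeMillis = maxAgeMillis;
        this.executor = executor;
    }

    /**
     * Returns cached results for queryId, or re-executes the definition that
     * accompanies every call when the entry is missing or too old.
     */
    String getResults(String queryId, String queryDefinition, long nowMillis) {
        Entry cached = entries.get(queryId);
        if (cached != null && nowMillis - cached.storedAtMillis <= maxAgeMillis) {
            return cached.results;
        }
        String results = executor.apply(queryDefinition);
        executions++;
        entries.remove(queryId); // re-insert so the entry counts as the newest
        entries.put(queryId, new Entry(results, nowMillis));
        if (entries.size() > maxEntries) {
            // Evict the oldest stored entry (LinkedHashMap keeps insertion order).
            Iterator<String> oldest = entries.keySet().iterator();
            oldest.next();
            oldest.remove();
        }
        return results;
    }

    int executionCount() {
        return executions;
    }

    public static void main(String[] args) {
        QueryCacheSketch cache =
                new QueryCacheSketch(2, 1000L, definition -> "results for " + definition);
        cache.getResults("q1", "contract", 0L);     // miss: executes the query
        cache.getResults("q1", "contract", 500L);   // hit: served from the cache
        cache.getResults("q1", "contract", 2000L);  // expired: executes again
        System.out.println("executions: " + cache.executionCount());
    }
}
```

Sending the definition with every call is what keeps the service stateless from the client's point of view: losing a cache entry costs a re-execution, not an error.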
Nonblocking Search: QueryStatus • QueryStatus contains status of the query for each repository • Example: Two sources, one successful, one failed with network error
Example: Nonblocking Query Execution
    QueryExecution queryExec = new QueryExecution(start, len, 350);
    queryExec.setQueryId(queryId);

    // Set asynchronous call
    SearchProfile profile = new SearchProfile();
    profile.setAsyncCall(true);
    OperationOptions options = new OperationOptions();
    options.setSearchProfile(profile);

    QueryResult queryResult = searchService.execute(q, queryExec, options);
Advanced Queries • StructuredQuery: an abstract query • Allows the query to be refined • Allows the query to be bound to UI controls • Independent of the full-text indexer and Content Server version • Independent of the presence of an indexer
Advanced Queries • FullTextExpression • Supports a Boolean ‘mini-language’: phrases, AND, OR, NOT, and parentheses • Example: EMC contract AND (“end of life” OR termination) NOT ECIS • ExpressionSet • Boolean expression between FullTextExpression and PropertyExpression • PropertyExpression • Constraints on document attributes • Operators: EQUAL, NOT_EQUAL, GREATER_THAN, LESS_THAN, GREATER_EQUAL, LESS_EQUAL, BEGINS_WITH, CONTAINS, DOES_NOT_CONTAIN, ENDS_WITH, IN, NOT_IN, BETWEEN, IS_NULL, IS_NOT_NULL • Values: SimpleValue, ValueList, ValueRange, RelativeDateValue
Advanced Queries: Example • Example of a structured query: object_name contains “test”, modified in the last month, and owner_name is “marc” or “ghislain”
    ExpressionSet expr = new ExpressionSet();
    expr.addExpression(new PropertyExpression("object_name", Condition.CONTAINS, "test"));
    expr.addExpression(new PropertyExpression("r_modify_date", Condition.GREATER_EQUAL,
            new RelativeDateValue(-1, TimeUnit.MONTH)));

    // Nested OR set for the two accepted owners
    ExpressionSet orExpr = new ExpressionSet(ExpressionSetOperator.OR);
    orExpr.addExpression(new PropertyExpression("owner_name", Condition.EQUAL, "marc"));
    orExpr.addExpression(new PropertyExpression("owner_name", Condition.EQUAL, "ghislain"));
    expr.addExpression(orExpr);
Agenda • Search • Clustering • Saved Queries • Classification • Troubleshooting
DFS 6.5 Search and Classification Services Clustering • Simple clustering of search results • Multiple facets and strategies • Getting results • Go beyond search
Clustering • Dynamic grouping of results into ‘clusters’ • Based on results properties (not content) • Uses linguistic rules • Option of Search Service • Requires an SBO to be installed • An installer is provided (Webtop Extended Search) • Supports hierarchical clustering
Clustering • SearchService: • getClusters • Returns the clusters for a query • getSubClusters • Returns the clusters for a subset of a query • getResultsProperties • Returns the properties for a subset of a query • The services are stateless • They reuse the query cached by SearchService.execute and re-execute it if needed • All the methods take the query and query execution parameters in case of a cache miss
Example: Computing Clusters
    // Get clusters for a query
    QueryExecution queryExec = new QueryExecution(0, 100, 350);
    queryExec.setQueryId(queryId);
    ClusteringProfile profile = new ClusteringProfile();
    profile.addClusteringStrategy(new ClusteringStrategy("Topics",
            Arrays.asList("object_name", "title", "subject", "summary")));
    OperationOptions options = new OperationOptions();
    options.setClusteringProfile(profile);
    QueryCluster queryClusters = searchService.getClusters(query, queryExec, options);
Example: Clustering Response Objects • getClusters() response [class diagram; classes shown: QueryCluster, ClusterTree (0..*), ClusteringStrategy (1), Cluster (0..*), ObjectIdentitySet (0..1); attributes shown: isRefreshable: Boolean, strategyName: String, clusterSize: int, clusterValues: List<String>, isSubClusterTreeAvailable: Boolean]
Multiple Facets and Strategies • Several ways to group results together • Defined by a strategy: • Topic • Person names • Dates • Document sizes
Example: Multiple Strategies
    // Set clustering strategies for 'Topics' and 'Date'
    ClusteringProfile profile = new ClusteringProfile();
    profile.addClusteringStrategy(new ClusteringStrategy("Topics",
            Arrays.asList("object_name", "title", "subject", "summary")));

    // Cluster by modification date, tokenized by quarter
    ClusteringStrategy dateClusteringStrategy = new ClusteringStrategy("Date",
            Arrays.asList("r_modify_date"));
    PropertySet tokenizerPropSet = new PropertySet(
            new StringProperty("r_modify_date", "quarterdate"));
    dateClusteringStrategy.setTokenizers(tokenizerPropSet);
    profile.addClusteringStrategy(dateClusteringStrategy);
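The slides do not say how the "quarterdate" tokenizer groups values internally. As a rough illustration of what clustering dates by quarter means, the sketch below buckets documents by the calendar quarter of their modification date. This is an assumption about the general idea, not the DFS algorithm; the class `QuarterClusterSketch`, its methods, the label format, and the sample data are all hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class QuarterClusterSketch {

    /** Maps a date to a label such as "2008-Q2" (label format chosen for this sketch). */
    static String quarterLabel(LocalDate date) {
        int quarter = (date.getMonthValue() - 1) / 3 + 1;
        return date.getYear() + "-Q" + quarter;
    }

    /** Groups document names by the quarter of their modification date. */
    static Map<String, List<String>> clusterByQuarter(Map<String, LocalDate> modifyDates) {
        Map<String, List<String>> clusters = new TreeMap<>(); // sorted by label
        for (Map.Entry<String, LocalDate> entry : modifyDates.entrySet()) {
            clusters.computeIfAbsent(quarterLabel(entry.getValue()), k -> new ArrayList<>())
                    .add(entry.getKey());
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Hypothetical documents and modification dates
        Map<String, LocalDate> docs = new LinkedHashMap<>();
        docs.put("contract.doc", LocalDate.of(2008, 5, 19));
        docs.put("report.pdf", LocalDate.of(2008, 6, 2));
        docs.put("notes.txt", LocalDate.of(2007, 12, 31));

        for (Map.Entry<String, List<String>> cluster : clusterByQuarter(docs).entrySet()) {
            System.out.println(cluster.getKey() + " (" + cluster.getValue().size() + "): "
                    + cluster.getValue());
        }
    }
}
```

Each bucket corresponds to one cluster with a value (the label) and a size, which is the kind of information a facet panel needs to display.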
Go Beyond Search • Clustering can be used for nonsearch applications • Example: most active subjects in a repository (automatic tag clouds)
Agenda • Search • Clustering • Saved Queries • Classification • Troubleshooting
Saved Queries • QueryStoreService: • listSavedQueries • loadSavedQuery • saveQuery • Allows manipulation of dm_smart_list objects (exposed in Webtop since 5.3) • Allows control over which results are saved
Saved Queries
    // List the saved queries for the current user
    IQueryStoreService service = ServiceFactory.getInstance().getRemoteService(
            IQueryStoreService.class, context, "core", "http://127.0.0.1:8080/services");
    QueryExecution queryExec = new QueryExecution(0, 100, 100);
    SavedQueryFilter filter = new SavedQueryFilter(SavedQueryAccessibility.OWNED);
    DataPackage queryResult = service.listSavedQueries("MSSQL60ECI4", queryExec, filter, null);
Saved Queries
    // Load a saved query
    ObjectIdentity queryId = new ObjectIdentity(new ObjectId("0821f7588000132e"), "MSSQL60ECI3");
    QueryExecution queryExec = new QueryExecution(0, 100, 100);
    SavedQuery queryResult = queryStoreService.loadSavedQuery(queryId, queryExec, null);
[class diagram; classes shown: SavedQuery, RichQuery (1), Query (1), QueryResult (0..1); attributes shown: displayedAttributes: List<String>, propertySet: PropertySet]
Saved Queries
    // Save a query
    Query query = //…
    ObjectIdentity queryId = new ObjectIdentity("MSSQL60ECI3");
    DataObject metadata = new DataObject(queryId);
    metadata.getProperties().set("object_name", "My Saved Query");
    RichQuery richQuery = new RichQuery();
    richQuery.setQuery(query);
    QueryExecution queryExec = new QueryExecution(0, 100, 100);
    ObjectIdentity queryResult = queryStoreService.saveQuery(metadata, richQuery, queryExec, null, null);
Agenda • Search • Clustering • Saved Queries • Classification • Troubleshooting
Classification • Introduce a service to compute ‘tags’ for documents • Based on CIS classification engine and managed taxonomy • AnalyticsService: • analyze • Takes a list of object IDs and computes the list of categories for each document
Classification Configuration • Install CIS Server • The installer deploys an EAR with an embedded app server (JBoss) • Install a taxonomy • Available taxonomies: • Energy / Energy Industry • Energy / Oil Trading • General Finance • General Knowledge • Information Science and Technology • Law / Federal Legislation Terms • Life Sciences • Manufacturing / Chemical Hazards • Military / DTIC • Science and Engineering • …
Classification: Compute Categories
    // Analyze an object
    ObjectIdentitySet documentsSet = new ObjectIdentitySet(
            new ObjectIdentity(new ObjectId("0821f7588000132e"), MY_DOCBASE));
    OperationOptions operationOptions = new OperationOptions();
    PropertyProfile propProfile = new PropertyProfile();
    propProfile.setIncludeProperties(Arrays.asList("CATEGORIES"));
    operationOptions.setPropertyProfile(propProfile);
    IAnalyticsService analyticsService = ServiceFactory.getInstance().getRemoteService(
            IAnalyticsService.class, context, "analytics", "http://127.0.0.1:7001/services");
    List<AnalyticsResult> analyticsResults = analyticsService.analyze(documentsSet, operationOptions);
Classification: ‘Analyze’ Response
    // Display the categories for each object
    for (AnalyticsResult classResult : analyticsResults) {
        System.out.println("Document ID: " + classResult.getObjIdentity());
        List<CategoryAssign> catAssigns = classResult.getCategoryAssignList();
        for (CategoryAssign catAssign : catAssigns) {
            System.out.println("\t " + catAssign.getCategory().getName());
        }
    }
Agenda • Search • Clustering • Saved Queries • Classification • Troubleshooting
Troubleshooting • Diagnose query issues: print the QueryStatus object • Diagnose ECIS communication problems: log4j traces
    // Diagnose query issues after execute
    QueryResult queryResult = searchService.execute(query, exec, options);
    System.out.println(queryResult.getStatus());
Troubleshooting • Trace DFS requests/responses on a Sun JVM:
    System.setProperty("com.sun.xml.ws.transport.http.client.HttpTransportPipe.dump", "true");