A Simple Content Enrichment Service

Introduction

One of the many challenges that SharePoint developers face is returning meaningful search results that allow users to access information efficiently. Often, data retrieved from search would be more useful if we could modify it slightly. Other times, search results would be enhanced if we could include related information that does not reside within SharePoint. FAST for SharePoint 2010 provided pipeline extensibility, which allowed us to modify content on the “pipeline” using a PowerShell script or a compiled application. SharePoint 2013 introduced Content Enrichment, which allows us to enrich content during the content processing phase using a WCF web service, as seen in Figure 1. In this three-part series, we will examine Content Enrichment being used to enhance data in three different ways. In part one, we will develop a simple Content Enrichment Service that combines two existing SharePoint managed properties into a single managed property. In part two, we will enhance data by taking a single managed property and querying a database to obtain related details. Finally, in part three, we will enhance content by taking a single managed property and obtaining details from a web service.

Figure 1. Content enrichment within content processing.

Creating a Simple Content Enrichment Service

For the first example, we will take the SharePoint built-in Title and Author managed properties and combine them into a single managed property that holds the Title followed by the list of Authors, in the form “Title by: Author1 and Author2”; this new property will be created for all Microsoft Word documents. It would probably make more sense to do something like this at the UI level; however, we are doing it here to demonstrate the features of content enrichment and how to manipulate properties. Let’s begin:

  1. Create a new WCF Service Application project named ContentProcessingEnrichmentServiceEx1
  2. Delete the auto-generated Service1.svc and IService1.cs
  3. Add a reference to microsoft.office.server.search.contentprocessingenrichment.dll (located in Installation Path\Microsoft Office Servers\15.0\Search\Applications\External\)
  4. Add a new WCF Service to the project named ContentProcessingEnrichmentServiceEx1.svc
  5. Delete the auto-generated IContentProcessingEnrichmentServiceEx1.cs
  6. Use the following using directives:
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using Microsoft.Office.Server.Search.ContentProcessingEnrichment;
    using Microsoft.Office.Server.Search.ContentProcessingEnrichment.PropertyTypes;
    
  7. Modify the service to implement IContentProcessingEnrichmentService (instead of IContentProcessingEnrichmentServiceEx1)
  8. Delete the DoWork method
  9. Add the following code:
    
    // Defines the error code for managed properties with an unexpected type.
    private const int UnexpectedType = 1;
    // Defines the error code for encountering unexpected exceptions.
    private const int UnexpectedError = 2;
    
    private readonly ProcessedItem processedItemHolder = new ProcessedItem
    {
        ItemProperties = new List<AbstractProperty>()
    };
    
    public ProcessedItem ProcessItem(Item item)
    {
        processedItemHolder.ErrorCode = 0;
        processedItemHolder.ItemProperties.Clear();
        try
        {
            //Input Properties
            var authorProperty =
                item.ItemProperties.FirstOrDefault(i => i.Name == "Author") as Property<List<string>>;
            var titleProperty =
                item.ItemProperties.FirstOrDefault(i => i.Name == "Title") as Property<string>;
            //Output Properties
            var titleWithAuthorProperty = new Property<string>
            {
                Name = "TitleWithAuthor",
                Value = String.Empty
            };
    
            if (authorProperty != null && titleProperty != null)
            {
                var authors = authorProperty.Value;
                var title = titleProperty.Value;
                var authorsString = String.Join(",", authors);
    
                //if more than one author, replace last comma with word "and"
                var index = authorsString.LastIndexOf(",", System.StringComparison.Ordinal);
                if (index != -1)
                {
                    authorsString = authorsString.Remove(index, 1);
                    authorsString = authorsString.Insert(index, " and ");
                }
                titleWithAuthorProperty.Value = String.Format("{0} by: {1}", title, authorsString);
            }
            else
            {
                processedItemHolder.ErrorCode = UnexpectedType;
            }
            processedItemHolder.ItemProperties.Add(titleWithAuthorProperty);
        }
        catch (Exception)
        {
            // Any unexpected failure is reported to the crawl through the error code.
            processedItemHolder.ErrorCode = UnexpectedError;
        }
        return processedItemHolder;
    }
    
  10. Debug or Publish your service

Configuring SharePoint

  1. Create a PowerShell Script
  2. Paste the following code into the script:
    Add-PSSnapin "Microsoft.SharePoint.PowerShell" -ErrorAction SilentlyContinue
    $ssa = Get-SPEnterpriseSearchServiceApplication
    #Create Managed property for TitleWithAuthor
    $mp = Get-SPEnterpriseSearchMetadataManagedProperty -SearchApplication $ssa -Identity TitleWithAuthor -ErrorAction SilentlyContinue
    if(!$mp)
    {
        #Type 1 = Text
        New-SPEnterpriseSearchMetadataManagedProperty -SearchApplication $ssa -Name TitleWithAuthor -Type 1
    
        $mp = Get-SPEnterpriseSearchMetadataManagedProperty -SearchApplication $ssa -Identity TitleWithAuthor
        if($mp)
        {
            #set configuration for new property
            $mp.RespectPriority = $true
            $mp.Searchable = $true
            $mp.Queryable = $true
            $mp.Retrievable = $true
            $mp.HasMultipleValues = $false
            $mp.Refinable = $true
            $mp.Sortable = $true
            $mp.Update()
        }
    }
    #Configure Content Enrichment
    $config = New-SPEnterpriseSearchContentEnrichmentConfiguration
    $config.Endpoint = "http://localhost:50363/ContentProcessingEnrichmentServiceEx1.svc"
    $config.InputProperties = "Author", "Title"
    $config.OutputProperties = "TitleWithAuthor"
    $config.SendRawData = $false
    $config.MaxRawDataSize = 8192
    $config.FailureMode = "Error"
    $config.Timeout = 30000
    $config.Trigger = "Contains(FileExtension,""docx"")"
    $config
    Set-SPEnterpriseSearchContentEnrichmentConfiguration -SearchApplication $ssa -ContentEnrichmentConfiguration $config
    
    
  3. Modify $config.Endpoint to match your service location
  4. Perform a Full Crawl

Review

That was a lot of changes, so let us review. First, we created a WCF service that receives two properties: Title and Author. Note that Title has a type of string while Author is a List of string. We created a new property called TitleWithAuthor and initialized it as an empty string. Next, we joined the Author list into a comma-delimited string of authors and replaced the last comma, if there is one, with the word “and”. The Title property and the author string are then combined and assigned to the TitleWithAuthor property value. Finally, the TitleWithAuthor property is added to the processed item holder and handed back to the crawl. Your service is now ready to be called by the crawl.
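To try the string handling on its own, the formatting step from ProcessItem can be lifted into a small helper that needs no SharePoint assemblies. This is only a sketch: the class name TitleAuthorFormatter and the method name Combine are made up for illustration and are not part of the service above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the service): repeats the string
// handling from ProcessItem so it can be exercised in a console app
// or unit test without referencing any SharePoint assemblies.
public static class TitleAuthorFormatter
{
    public static string Combine(string title, IEnumerable<string> authors)
    {
        // Join the authors with commas, as ProcessItem does.
        var authorsString = String.Join(",", authors);

        // If there is more than one author, replace the last comma with " and ".
        var index = authorsString.LastIndexOf(",", StringComparison.Ordinal);
        if (index != -1)
        {
            authorsString = authorsString.Remove(index, 1).Insert(index, " and ");
        }

        // Same output format as the TitleWithAuthor property.
        return String.Format("{0} by: {1}", title, authorsString);
    }
}
```

With a sample title of "Report" and the authors John Doe and Jane Doe, Combine returns "Report by: John Doe and Jane Doe"; with a single author it returns "Report by: John Doe". Because the join separator is a bare comma, three or more authors come out as "A,B and C", with no space after the remaining commas; joining with ", " instead would be one way to change that.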

The PowerShell script does two things: it creates a managed property and configures content enrichment within SharePoint. The managed property, TitleWithAuthor, houses the string returned from the service. If you have created managed properties before, you might notice that it is not mapped to a crawled property; this is because its data is generated by the service instead of being discovered while crawling. The content enrichment configuration establishes two input properties, Title and Author, and one output property, TitleWithAuthor; this configuration must match what is defined in the service. You might also notice that a trigger has been defined. The trigger ensures that only items with the file extension “docx” are enhanced; this prevents every item in SharePoint from being sent to the service and improves performance.

The Results

For my example, I uploaded two different documents for two different users: John Doe and Jane Doe. You will want to upload some Microsoft Word documents (.docx) in order to test and then perform a full crawl. After the crawl is complete, check the crawl log and make sure there are no errors related to content enrichment. If you happen to be debugging and stepping through code, you might receive errors if you run past the defined timeout of 30 seconds; when you are done debugging, be sure to run a fresh, untouched crawl before viewing your results. Once items have been processed successfully, you should see the new managed property as seen in Figure 2. On a side note, if you are working with search and do not have the SharePoint 2013 Search Query Tool, I highly recommend it: http://sp2013searchtool.codeplex.com.
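If you would rather check the property without a tool, one option is the SharePoint 2013 search REST endpoint (/_api/search/query), asking it to return TitleWithAuthor through the selectproperties parameter. The sketch below only builds the request URL; the class name SearchQueryUrlBuilder, the site URL, and the query text are assumptions for illustration, and you would still need to issue the request with credentials that can query search.

```csharp
using System;

// Hypothetical helper: builds a search REST URL that asks for specific
// managed properties. It does not send the request.
public static class SearchQueryUrlBuilder
{
    public static string Build(string siteUrl, string queryText, params string[] selectProperties)
    {
        // The REST API expects both values wrapped in single quotes.
        var query = Uri.EscapeDataString("'" + queryText + "'");
        var select = Uri.EscapeDataString("'" + String.Join(",", selectProperties) + "'");

        return String.Format("{0}/_api/search/query?querytext={1}&selectproperties={2}",
            siteUrl.TrimEnd('/'), query, select);
    }
}
```

For example, SearchQueryUrlBuilder.Build("http://sharepoint.example.com", "FileExtension:docx", "Title", "TitleWithAuthor") produces a URL that can be opened in a browser signed in to that (placeholder) site; each result should then include the TitleWithAuthor value generated by the service.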

Figure 2. The content enriched property TitleWithAuthor has been successfully created.

References

Microsoft. (July 1, 2013). Custom content processing with the Content Enrichment web service callout. Retrieved August 4, 2014, from http://msdn.microsoft.com/en-us/library/office/jj163968(v=office.15).aspx.

Microsoft. (October 7, 2013). How to: Use the Content Enrichment web service callout for SharePoint Server. Retrieved August 4, 2014, from http://msdn.microsoft.com/en-us/library/office/jj163982(v=office.15).aspx.

About Chris Hettinger

Chris Hettinger is a technical professional with a broad background in many areas of computing, ranging from electronics and hardware to administration and software development. He has over 16 years of software development experience, including 8 years of solid experience with the Microsoft .NET Framework. Chris has spent the majority of his tenure at AIS working on SharePoint Enterprise Search solutions with both SharePoint 2010 FAST and SharePoint 2013 Enterprise Search.
