Tuesday, November 6, 2018

Stemming Search Terms in Sitecore Lucene Indexes

    This is a copy of my article, originally published here, reposted to keep everything in one place.

   Sitecore content search is a great technology that lets you add search to your Sitecore website with minimal effort. But one thing has always disappointed me: this search doesn’t understand word forms. The singular and plural forms of a noun are saved as two separate terms in the index (e.g. “tool” and “tools”). The base, third-person and past-tense forms of a verb result in three different terms in the index (e.g. “deny”, “denies” and “denied”). This makes search results worse: if you search for “deny”, documents that contain only “denies” or “denied” will not be found.

    There are a few options to “fix” this. The first is to use the similarity parameter in the query: x => x.YourFieldName.Like("tools", 0.8f). It is a quick and dirty solution: content search will now return results with similar words. But there is another side to the coin: you will get results with similar words where you don’t expect them. E.g. a search for “Ireland” will return “Iceland” and “Island” in the results.
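To see why similarity matching cannot tell a word form from a different word, it helps to count the single-character edits between terms. The following is a minimal sketch using the classic Levenshtein distance; it is not the scoring code that Sitecore or Lucene actually runs, and the class and method names are made up for illustration.

```csharp
using System;

public static class EditDistanceDemo
{
    // Classic dynamic-programming Levenshtein distance (case-insensitive).
    public static int Levenshtein(string a, string b)
    {
        a = a.ToLowerInvariant();
        b = b.ToLowerInvariant();

        // Only two rows of the distance matrix are kept in memory.
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++)
        {
            previous[j] = j;
        }

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + cost);
            }

            // The row just computed becomes the "previous" row of the next iteration.
            var swap = previous;
            previous = current;
            current = swap;
        }

        return previous[b.Length];
    }

    public static void Main()
    {
        Console.WriteLine(Levenshtein("tools", "tool"));      // 1
        Console.WriteLine(Levenshtein("Ireland", "Iceland")); // 1
        Console.WriteLine(Levenshtein("Ireland", "Island"));  // 2
    }
}
```

“tool” and “Iceland” are both exactly one edit away from the search term, so any threshold loose enough to accept the plural form also lets an unrelated country name through.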

    The other option is stemming. Stemming is the process of reducing inflected (or sometimes derived) words to their word stem. There are different implementations of stemming algorithms. Lucene.Net ships an implementation of the Porter stemming algorithm, which can be used to extend Sitecore content search. We need to implement our own analyzer:
using Lucene.Net.Analysis;
using System;
using System.IO;
 
namespace Feature.StemmedSearch.Search
{
    public class PorterStemLowerCaseKeywordAnalyzer : KeywordAnalyzer
    {
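        public override TokenStream TokenStream(string fieldName, TextReader reader)
        {
            // KeywordTokenizer emits the whole input as a single token;
            // the token is then lower-cased and passed through the Porter stemmer
            return new PorterStemFilter(new LowerCaseFilter(new KeywordTokenizer(reader)));
        }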
    }
}
    Then register the analyzer for a field in the content search configuration:
<fieldNames>
  <field fieldName="_content" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
    <analyzer type="Feature.StemmedSearch.Search.PorterStemLowerCaseKeywordAnalyzer, Feature.StemmedSearch" />
  </field>
</fieldNames>
    I used the _content field as an example, but it is better not to change the fields that Sitecore ships out of the box and to use your own custom fields instead. After rebuilding the indexes, we can see that all search terms are saved in stemmed form.
    And if you turn on verbose logging and check the search queries, you will see that the query terms for the _content field are stemmed as well.
Hurray! Now, our Sitecore website search is more similar to Google search. :-)
Stay tuned, in the second part we will do the same for Solr indexes.

Tuesday, August 7, 2018

CI/CD for Your Sitecore Pet Project Using AppVeyor

This is a copy of my article published on the Sagittarius blog.
There is a lot of information about how to build continuous integration, continuous delivery and continuous deployment for enterprise Sitecore projects. But none of it covers Sitecore projects on GitHub with CI/CD (at least I haven’t come across any). That is why I decided to write a step-by-step guide on how to get CI/CD for your Sitecore pet project for free.
Let’s start:
  1. Create accounts for GitHub and AppVeyor (using authorization via GitHub)
  2. Create your Sitecore project and push it to GitHub
  3. Create an appveyor.yml file and push it to your repository:
#Version of project could be based on build number
version: 1.0.{build}
#Path(s) to artifacts. We will create Sitecore update package and ship it as artifact
artifacts:
  - path: build\artifacts\*.update
    name: Sitecore.Akamai
before_build:
  #Configure Nuget to use public Sitecore packages feed
  - nuget sources add -Name SitecorePublicFeed -Source https://sitecore.myget.org/F/sc-packages/api/v3/index.json
  #Restore packages configured in solution
  - nuget restore Sitecore.Akamai.sln
build:
  #Solution file
  project: Sitecore.Akamai.sln
after_build:
  #Run PowerShell script that will build .update Sitecore package
  - ps: .\build\build.ps1
#Deploy artifacts to GitHub as a release. Artifacts can't be kept on AppVeyor for long because of its limited storage time (30 days)
deploy:
  #Release name
  release: Sitecore.Akamai-v$(appveyor_build_version)
  #Description of release
  description: 'Using Akamai features inside Sitecore'
  #Deploy to GitHub
  provider: GitHub
  #Secure token to upload files to GitHub https://www.appveyor.com/docs/deployment/github/
  auth_token:
    secure: rkLSxUbN2YMMG/r6lzLq1PN0n07dqJBtk/8ZR2c/InGy0SBOsmqGXfIQWMQOZUAs 
  draft: false
  prerelease: false
  on:
    branch: master                # release from master branch only
    appveyor_repo_tag: true       # deploy on tag push only
install:
  #Install Chocolatey, which is used to install Sitecore.Courier
  - ps: Set-ExecutionPolicy Bypass -Scope Process -Force; iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))
  #Install Sitecore.Courier to be able to build the Sitecore package
  - choco install sitecore-courier

  4. Commit and push a PowerShell script that prepares the Sitecore update package with files and items using Sitecore.Courier:
#Get package version from Appveyor
$version = $env:APPVEYOR_BUILD_VERSION
if ($version -eq $null) {
    $version = "1.0.0"
}
"Package version: " + $version
 
#Clear build directories
Remove-Item build\package -Recurse -ErrorAction Ignore
Remove-Item build\artifacts -Recurse -ErrorAction Ignore
#Create directories structure for package
New-Item -Name build\package -ItemType directory
New-Item -Name build\artifacts -ItemType directory
New-Item -Name build\package\Data -ItemType directory
New-Item -Name build\package\bin -ItemType directory
New-Item -Name build\package\App_Config\Include\Foundation -ItemType directory
 
#Copy Sitecore items and assemblies to proper location
Copy-Item .\src\Foundation\Akamai\code\bin\Foundation.Akamai* .\build\package\bin
Copy-Item .\src\Foundation\Akamai\code\App_Config\Include\Foundation\Foundation.Sitecore.Akamai.config .\build\package\App_Config\Include\Foundation
Copy-Item .\src\Foundation\Akamai\serialization\* .\build\package\Data -recurse
 
#Run Sitecore Courier to prepare .update package
$packageCmd = "Sitecore.Courier.Runner.exe -t build\package -o build\artifacts\sitecore.akamai." + $version + ".update -r"
iex $packageCmd

  5. Add a new project to AppVeyor from GitHub.
  6. Configure the publishing of artifacts: create a new GitHub token, encrypt it using AppVeyor and update appveyor.yml with the new encrypted value.
  7. Add a build status badge to the readme.md file.
That is all. After the next commit to your GitHub repo, a new AppVeyor build will be triggered and a Sitecore update package will be prepared. The package will be available on the artifacts tab of the build details page. A new GitHub release will be created after a new tag is created in the repository. The release will contain an archive of the source files and the Sitecore .update package. The update package contains files and items and can be installed on any Sitecore website using the update installation wizard.

As a bonus, you can configure code metrics on Sonar Cloud for free:
  1. Create a new Sonar Cloud account using GitHub authorization
  2. Generate new Sonar Cloud token and encrypt it using AppVeyor
  3. Change your appveyor.yml configuration:
before_build:
  #Start SonarQube runner
  - MSBuild.SonarQube.Runner.exe begin /k:"Sitecore.Akamai" /d:"sonar.host.url=https://sonarqube.com" /d:"sonar.login=%sonar_token%" /d:"sonar.organization=github-antonytm" /n:"Sitecore.Akamai"
after_build:
  #Stop SonarQube runner
  - MSBuild.SonarQube.Runner.exe end /d:"sonar.login=%sonar_token%"
  - ps: .\build\build.ps1  
environment:
  sonar_token:
    secure: xxx #Put your encrypted Sonar Cloud token here
install:
  #Install msbuild SonarQube runner
  - choco install "msbuild-sonarqube-runner" -y

  4. Update the readme.md file with Sonar Cloud code quality badges.

It took some time to set up this configuration the first time, but for the second project it took less than half an hour. This free, lightweight cloud CI/CD is a good investment: it saves you the time of preparing new builds manually. And you don't have to choose the services described in this article; they are only one possible setup, shown as an example. GitHub could be replaced with Bitbucket, AppVeyor with TravisCI, etc.

You can see all this in action inside my repository on GitHub.

Sunday, July 8, 2018

Sitecore and Akamai Hidden Gems, Part 2: Device Detection

     In the first part of this Akamai series I wrote about how to use GeoIp data in Sitecore. Another useful piece of information that Akamai can provide is device characteristics:
  1. Akamai maintains a database of user agent strings.
  2. When a visitor requests your website through Akamai, it parses the User-Agent header on the fly and adds the parsed device information to the request.
  3. Your server receives an X-Akamai-Device-Characteristics request header with information you can use. E.g.: X-Akamai-Device-Characteristics: brand_name=Google; is_tablet=false; device_os=Android
  4. You can use this information to display different content depending on the device.
    The Sitecore code responsible for getting device characteristics is not as easy to override as GeoIp detection. But it is possible to build Sitecore rules that parse the Akamai headers. Here is an example of a "Device is Mobile" rule (the real code looks different; this is just an example that keeps everything in one place for better understanding):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using Sitecore.Diagnostics;
using Sitecore.Rules;
using Sitecore.Rules.Conditions;

namespace Foundation.Akamai.DeviceDetection.Rules.Conditions
{
    public class DeviceIsMobile<T> : OperatorCondition<T> where T : RuleContext
    {
        private const string HeaderName = "X-Akamai-Device-Characteristics";

        protected override bool Execute(T ruleContext)
        {
            Assert.ArgumentNotNull(ruleContext, "ruleContext");

            // The header is absent when there is no HTTP context or the request did not come through Akamai
            var headerValue = HttpContext.Current?.Request.Headers[HeaderName];
            if (string.IsNullOrEmpty(headerValue))
            {
                return false;
            }

            // A missing or malformed is_mobile value is treated as "not mobile"
            var dictionary = headerValue.ParseAkamaiHeader(";");
            return dictionary.TryGetValue("is_mobile", out string rawValue)
                && bool.TryParse(rawValue, out bool value)
                && value;
        }
    }

    public static class Extensions
    {
        public static Dictionary<string, string> ParseAkamaiHeader(this string headerValue, string delimiter = ",")
        {
            var pairs = headerValue.Split(new string[] { delimiter }, StringSplitOptions.RemoveEmptyEntries).Select(x => x.Trim());
            var dictionary = new Dictionary<string, string>();
            foreach (var pair in pairs)
            {
                // Split on the first '=' only, so values that contain '=' stay intact
                var parts = pair.Split(new[] { '=' }, 2);
                if (parts.Length > 1)
                {
                    // The indexer overwrites duplicate keys instead of throwing like Add does
                    dictionary[parts[0].Trim()] = parts[1].Trim();
                }
            }

            return dictionary;
        }
    }
}
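The header parsing can also be tried outside Sitecore. Here is a minimal console sketch that runs against the sample header from the list above; the class and helper names are hypothetical and it deliberately has no Sitecore or System.Web dependencies.

```csharp
using System;
using System.Collections.Generic;

public static class DeviceHeaderDemo
{
    // Splits "key=value; key=value" into a case-insensitive dictionary.
    public static Dictionary<string, string> Parse(string headerValue)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var pair in headerValue.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            var parts = pair.Split(new[] { '=' }, 2);
            if (parts.Length == 2)
            {
                result[parts[0].Trim()] = parts[1].Trim();
            }
        }

        return result;
    }

    // Returns false when the flag is missing or is not a valid boolean.
    public static bool GetFlag(Dictionary<string, string> values, string key)
    {
        string raw;
        bool flag;
        return values.TryGetValue(key, out raw) && bool.TryParse(raw, out flag) && flag;
    }

    public static void Main()
    {
        var values = Parse("brand_name=Google; is_tablet=false; device_os=Android");

        Console.WriteLine(values["device_os"]);          // Android
        Console.WriteLine(GetFlag(values, "is_tablet")); // False
        Console.WriteLine(GetFlag(values, "is_mobile")); // False (the key is absent)
    }
}
```

Treating a missing flag as false keeps a rule from throwing when a request reaches the server without passing through Akamai, for example on a local development machine.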

    It is possible to create a lot of rules based on Akamai information about the visitor’s location or device without overriding anything in Sitecore.


   All code and serialized Sitecore items are available in the GitHub repository; you can also download the Sitecore update package and use it.

Sitecore and Akamai Hidden Gems, Part 1: GeoIp Detection

    Akamai is a well-known CDN (content delivery network) provider. Its content delivery features are widely used across popular websites. Besides CDN, Akamai provides additional features that not everyone is aware of. One of them is GeoIp data. In brief, here is how it works:
  1. Akamai maintains a database of IP addresses.
  2. When a user requests your website through Akamai, Akamai looks up the IP address of the request on the fly and adds information about the user’s geographic location, network, connection speed, etc. to a request header.
  3. Your server receives an X-Akamai-Edgescape request header that contains a lot of information you can use. E.g.: X-Akamai-Edgescape: georegion=263,country_code=US,region_code=MA,city=CAMBRIDGE,dma=506,pmsa=1120,areacode=617,county=MIDDLESEX,fips=25017,lat=42.3933,long=-71.1333,timezone=EST,zip=02138-02142+02238-02239,continent=NA,throughput=vhigh,asnum=21399
  4. You can parse this header and learn more about your visitor to show them relevant information on the website.
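The parsing step from the list above can be sketched in a few lines. This is only an illustration that runs as a console program on a shortened version of the sample header; the class name is hypothetical, and the production code lives in the repository mentioned at the end of the post.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class EdgescapeDemo
{
    // Splits "key=value,key=value" into a case-insensitive dictionary.
    public static Dictionary<string, string> Parse(string headerValue)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var pair in headerValue.Split(','))
        {
            var parts = pair.Split(new[] { '=' }, 2);
            if (parts.Length == 2)
            {
                result[parts[0].Trim()] = parts[1].Trim();
            }
        }

        return result;
    }

    public static void Main()
    {
        // Shortened sample header value
        var geo = Parse("georegion=263,country_code=US,region_code=MA,city=CAMBRIDGE,lat=42.3933,long=-71.1333,timezone=EST,continent=NA");

        // Coordinates use '.' as the decimal separator, so parse them culture-independently
        double latitude = double.Parse(geo["lat"], CultureInfo.InvariantCulture);
        double longitude = double.Parse(geo["long"], CultureInfo.InvariantCulture);

        Console.WriteLine(geo["country_code"] + " / " + geo["city"]); // US / CAMBRIDGE
        Console.WriteLine(latitude.ToString(CultureInfo.InvariantCulture) + ", " + longitude.ToString(CultureInfo.InvariantCulture)); // 42.3933, -71.1333
    }
}
```

Parsing the numbers with the invariant culture matters on servers whose regional settings use a comma as the decimal separator.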
      Sitecore also provides GeoIp personalization out of the box. Both Akamai and Sitecore GeoIp detection have their benefits. Akamai GeoIp is free (if you are already an Akamai customer) and can be quicker, because you don’t need to spend server time looking up information about the IP address. Sitecore GeoIp is ready to use out of the box and is better integrated with the Sitecore Experience Platform. Depending on the project, you may prefer one GeoIp provider or the other.
    If you choose Akamai, you can find an article that describes how to do it. That solution works properly for Sitecore 6.6 - 7.2, but doesn’t work on Sitecore 8+. At some point Sitecore moved GeoIp resolving to a separate thread:

if (orCreate.GeoIpResolveState == GeoIpResolveState.Unresolved)
{
    GeoIpManager.StartResolvingThread(orCreate);
}

That thread knows nothing about your request context, which is why a call to HttpContext.Current.Request.Headers["X-Akamai-Edgescape"] causes an “Object reference not set to an instance of an object” exception. It means that overriding only the LookupManager will not work. I managed to transfer data from the Akamai header to the Sitecore analytics context in the following way:
  1. Disable the Sitecore.Analytics.Pipelines.CommitSession.UpdateGeoIpData, Sitecore.Analytics.Pipelines.CreateVisits.UpdateGeoIpData and Sitecore.Analytics.Pipelines.EnsureClassification.UpdateGeoIpData processors
  2. Override the Sitecore.Analytics.Pipelines.StartTracking.UpdateGeoIpData processor

using Foundation.Akamai.GeoIp;

namespace Foundation.Akamai.Pipelines
{
    public class UpdateGeoIpData 
    {
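        public void Process(object args)
        {
            // No IP address is passed: the location comes from the Akamai request header, not from an IP lookup
            var whois = new LookupProvider().GetInformationByIp("");
            if (whois != null)
            {
                Sitecore.Analytics.Tracker.Current.Interaction.SetGeoData(whois);
            }
        }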
    }
}

    This approach works because parsing the header is a quick operation that doesn’t need a separate thread. As I understand it, the four UpdateGeoIpData processors in different pipelines were required to sync the thread that gets GeoIp information with the main HttpRequest thread.
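    The thread problem can be reproduced in miniature without Sitecore. The sketch below is only an analogy: a [ThreadStatic] field stands in for the per-request context, and all names are hypothetical. It shows that a value set on the request thread is simply not there on a separately started worker thread.

```csharp
using System;
using System.Threading;

public static class ThreadContextDemo
{
    // Stand-in for a per-request context such as HttpContext.Current (a simplification).
    [ThreadStatic]
    private static string currentRequestHeader;

    public static void Main()
    {
        // "Request thread": the context is populated here.
        currentRequestHeader = "country_code=US";

        // "Resolving thread": it starts with its own, empty copy of the field.
        string seenOnWorker = "unset";
        var worker = new Thread(() => seenOnWorker = currentRequestHeader ?? "<null>");
        worker.Start();
        worker.Join();

        Console.WriteLine("Request thread: " + currentRequestHeader); // Request thread: country_code=US
        Console.WriteLine("Worker thread:  " + seenOnWorker);         // Worker thread:  <null>
    }
}
```

Reading the header while still on the request thread, as the overridden processor does, avoids the problem altogether.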
    For more details, you can review my repository on GitHub or download the package and try it yourself.

Wednesday, June 13, 2018

Passing Sitecore Certified Platform Associate Developer

I passed the Sitecore Certified Platform Associate Developer exam. There are a lot of posts on the internet that describe the topics and questions on the exam. This post is not about that; furthermore, sharing them is not allowed. It is about the technical issues that you can face (and I faced) while taking the online exam:
  • Pay close attention to the time that you select for the exam. I selected 3 PM, but my invitation said 2 PM. I was confused and was ready to take it at 2 PM, but the exam started at 3 PM. I think this is related to a daylight-saving time issue in the http://webassessor.com system.
  • You can't check the Sentinel Secure software before starting the exam. You should have a backup plan in case something goes wrong with your software or hardware.
  • Sentinel Secure software throws an error on Windows Server 2012.
  • Sentinel Secure software doesn't work with all webcams. (I assume the issue affects 4K cams.)

You may ask how I ran into all of these issues during the exam:
  1. I tried to start the exam at 2 PM, but it was unavailable until 2:50 PM.
  2. I started the exam on my laptop, but the exam software could not see video from my camera. The video was a blank black screen in the application, although the camera worked as expected in other software.
  3. I moved to a machine with Windows Server 2012 R2 installed. I could not even start the exam application: after checking the system, it showed "Error" with only one option, exit.
  4. I moved back to the laptop and tried a different camera, and fortunately it started to work.


Wednesday, May 30, 2018

Adding HtmlCache Viewing Feature to Sitecore.Rocks Visual Studio Extension

This is a copy of my article published on the Sagittarius blog.

After writing an article about viewing the Sitecore HTML cache, I realized that it is not convenient to move custom code from one project to another. When it is possible instead to extend Sitecore.Rocks, an existing and widely used Sitecore development tool, that should be done. So I decided to contribute to it.

First of all, you need to fork the Sitecore.Rocks project on GitHub.

    Sitecore.Rocks works like a client-server application. There is a server side (Sitecore.Rocks.Server, in other words the connector) that is copied to every Sitecore website, and a client side: the interface in Visual Studio that displays different things from Sitecore. Fortunately, the cache viewer already has a “Data” column for cache details, so we don’t need to modify the interface to add something new. But this column shows only the path to an item, and only when the cache key is a Sitecore identifier. In all other cases the Data column is empty. Let’s extend the amount of data that can be shown in this column.

     The logic that returns the data displayed in this table is located in the /src/Sitecore.Rocks.Server/Requests/Caches/GetCacheKeys.cs file. We can modify it to return the cache value when the cache type is string or ID:


    After building the solution, we need to copy Sitecore.Rocks.Server.dll to the bin folder of our website. Now you can see what is inside the cache. If you are interested in viewing other types of cache values (not only string and ID), you can extend the code above to fit your needs.




If my pull request is accepted, this feature will be available in one of the next Sitecore.Rocks versions.

Friday, May 4, 2018

How to Check What is Inside Sitecore HTML Cache?

It is a copy of this blog post.

Sitecore has a lot of caches: AccessResultCache, DataCache, DeviceItemsCache, HTML cache, ItemCache, ItemPathsCache, PathCache, RegistryCache, RuleCache, StandardValuesCache, ViewStateCache, XslCache, FieldReaderCache and others. When you are facing a caching issue on your website (something is either not cached, or cached when it should not be), it is important to be able to see what is in the cache.

All Sitecore caches consist of cache keys and cache values. A cache key is always a string. A cache value can be any type of object, depending on the cache. Take HtmlCache as an example. When you can see which cache keys you have and what data is stored there, you can better tune the caching of controls, especially when you have custom (not out-of-the-box) cache criteria (vary by IP, vary by country, vary by role, etc.) and a lot of controls on a page. Here is a code snippet to get everything inside HtmlCache:
It requires reflection because of private methods in the HTML cache. The result of executing the GetCaches method is a List<Tuple<string, string>> that you can filter, search or display on a page. It can save you a lot of time when debugging Sitecore caching issues. Here is an example of a service page that displays cache keys and cache values:



If you can’t add this code to some secure page in your solution (you don’t want to do it, or you need an out-of-the-box solution), there are also other useful tools for troubleshooting issues with the Sitecore cache. But they will not show you the cached value of an HtmlCache entry: