Search Extension

The LIDS Search extension provides Elasticsearch functionality to standard LIDS and SAMO clients.

Elasticsearch is a distributed, RESTful search and analytics engine. It provides scale and high availability. Elasticsearch has the ability to be schema-less, which means that documents can be indexed without explicitly providing a schema.

Elasticsearch stores the data to be evaluated in an embedded database. This is a secondary data store that is needed to provide the search functionality, and it must be kept synchronized with the primary data storage in LIDS.

The Search extension is responsible for communication between LIDS and the Elasticsearch engine. Its functionality includes:

updating the data in the Elasticsearch database on demand
updating the data in the Elasticsearch database on the fly during standard data manipulation in LIDS
evaluating queries from clients and returning the query results from Elasticsearch

Example: Structure of search.xml.

<?xml version="1.0" encoding="utf-8"?>
<ber:search xmlns:ber="http://www.berit.com/ber" xmlns:xlink="http://www.w3.org/1999/xlink">
  <ber:version>
    <ber:metadata>1</ber:metadata>
    <ber:minClient>5140</ber:minClient>
    <ber:minAS>12601</ber:minAS>
  </ber:version>
  <ber:defaultBulkSize>50</ber:defaultBulkSize>
  <ber:esConnection>
    <ber:esClusterName>${elasticClusterName}</ber:esClusterName>
    <ber:esTransportAddressesArray>
      <ber:esTransportAddress address="${elasticHost}" port="${elasticPort}"/>
    </ber:esTransportAddressesArray>
    <ber:indexPrefix>s__</ber:indexPrefix>
  </ber:esConnection>
  <ber:featureTypeIndexImageArray>
    <ber:featureTypeIndexImage id="fti_1" featureTypeId="*">
      <ber:indexedAttributeArray>
        <ber:indexedAttribute>
          <ber:includeAttribute>*</ber:includeAttribute>
        </ber:indexedAttribute>
      </ber:indexedAttributeArray>
      <ber:indexGeometry>true</ber:indexGeometry>
      <ber:indexGeometryForSearch>false</ber:indexGeometryForSearch>
      <ber:geometryPrecision>5m</ber:geometryPrecision>
      <ber:languageSpecificKeyword>true</ber:languageSpecificKeyword>
    </ber:featureTypeIndexImage>
  </ber:featureTypeIndexImageArray>
  <ber:codelistIndexImageArray>
    <ber:codelistIndexImage id="cdi_1" codelistId="*">
      <ber:indexedAttributeArray>
        <ber:includeAttribute>*</ber:includeAttribute>
      </ber:indexedAttributeArray>
    </ber:codelistIndexImage>
    <ber:codelistIndexImage id="presetsExample" extends="cdi_1" codelistId="cl_1">
      <ber:indexedAttributeArray>
        <ber:indexedAttribute>
          <ber:includeAttribute>ca_1</ber:includeAttribute>
          <ber:fieldPreset>autocomplete</ber:fieldPreset>
          <ber:fieldPreset>startswith</ber:fieldPreset>
          <ber:fieldPreset>withoutdiacritics</ber:fieldPreset>
        </ber:indexedAttribute>
        <ber:indexedAttribute>
          <ber:includeAttribute>*</ber:includeAttribute>
        </ber:indexedAttribute>
      </ber:indexedAttributeArray>
    </ber:codelistIndexImage>
  </ber:codelistIndexImageArray>
  <ber:indexedCodelistsArray>
    <ber:indexedGroup groupId="gr_1"/>
    <ber:indexedCodelist codelistId="cl_1"/>
    <ber:indexedCodelist codelistId="cl_2"/>
    <ber:indexedCodelist codelistId="cl_abc*"/>
  </ber:indexedCodelistsArray>
  <ber:indexedFeaturesArray>
    <ber:indexedGroup groupId="gr_1"/>
    <ber:indexedGroup groupId="gr_2"/>
    <ber:indexedFeatureType featureTypeId="ft_abc*"/>
    <ber:indexedFeatureType featureTypeId="ft_1"/>
  </ber:indexedFeaturesArray>
  <ber:cronTriggerArray>
    <ber:cronTrigger id="everyHour">
      <ber:cronExpression>0 0 * ? * *</ber:cronExpression>
      <ber:type>full</ber:type>
      <ber:codelistsArray>
        <ber:codelist codelistId="cl_1"/>
        <ber:codelist codelistId="cl_2"/>
      </ber:codelistsArray>
    </ber:cronTrigger>
  </ber:cronTriggerArray>
</ber:search>

Elastic Search Connection

Defines connection to the Elasticsearch engine, which needs to be installed and configured separately.

esClusterName – name of Elasticsearch cluster
address, port – address (URL) and port of the running Elasticsearch instance
indexPrefix – prefix for index names. Necessary when one Elasticsearch serves more projects (with different data) to ensure unique index names. Index prefix must be lowercase.

tip

These parameters can also be defined using variables, so that various instances of the same project can share a common definition. The values of these variables are then set by environment properties for each environment.

tip

Elasticsearch provides a REST API that includes a health check. This API can be used to verify that the connection parameters are set correctly.

Example: Connection definition

  <ber:esConnection>
    <ber:esClusterName>elasticsearch</ber:esClusterName>
    <ber:esTransportAddressesArray>
      <ber:esTransportAddress address="project.domain.com" port="10150"/>
    </ber:esTransportAddressesArray>
    <ber:indexPrefix>s__</ber:indexPrefix>
  </ber:esConnection>

Example: Connection defined by parameters

  <ber:esConnection>
    <ber:esClusterName>${elasticClusterName}</ber:esClusterName>
    <ber:esTransportAddressesArray>
      <ber:esTransportAddress address="${elasticHost}" port="${elasticPort}"/>
    </ber:esTransportAddressesArray>
    <ber:indexPrefix>s__</ber:indexPrefix>
  </ber:esConnection>
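
Example: Distinct index prefixes when one Elasticsearch serves several projects

The following is only an illustrative sketch of the indexPrefix rule described above. The prefix values proja__ and projb__ are hypothetical, and the xmlns:ber declaration is added here only so that the fragment is well-formed on its own. The cluster name, address and port are reused from the connection example.

```xml
<!-- search.xml of the first project; the prefix value is a hypothetical example -->
<ber:esConnection xmlns:ber="http://www.berit.com/ber">
  <ber:esClusterName>elasticsearch</ber:esClusterName>
  <ber:esTransportAddressesArray>
    <ber:esTransportAddress address="project.domain.com" port="10150"/>
  </ber:esTransportAddressesArray>
  <!-- lowercase and unique per project; a second project on the same
       Elasticsearch would use a different value here, e.g. projb__ -->
  <ber:indexPrefix>proja__</ber:indexPrefix>
</ber:esConnection>
```

Because only the prefix differs, both projects can point to the same cluster without their index names colliding.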

Feature Indexing Configuration

It is necessary to define which data is indexed in Elasticsearch and how.

indexedFeaturesArray

The list of feature types to be indexed can be specified by including individual feature types or groups. Definitions using categories should no longer be used.

tip

In both cases, i.e. when defining a feature type ID or a group ID, a mask with an asterisk “*” can be used as a wildcard.

warning

In case of shared semantics, the semantic parent should be included in the configuration (either in the explicit list of feature type IDs or as a group member). Graphic feature types from a shared semantics group are not processed when search queries are evaluated.

Example:

  <ber:indexedFeaturesArray>
    <ber:indexedGroup groupId="gr_1"/>
    <ber:indexedGroup groupId="gr_2*"/>
    <ber:indexedFeatureType featureTypeId="ft_abc*"/>
    <ber:indexedFeatureType featureTypeId="ft_1"/>
  </ber:indexedFeaturesArray>

tip

The list of feature types to be searched from a particular application (e.g. LIDS Explorer or LIDS Mobile) can be further narrowed down in the application-specific configuration.

featureTypeIndexImageArray

How individual features of a particular type are indexed is configured in featureTypeIndexImage.

id – identifier of index image
extends – identifier of the parent image definition. If defined, the parameters of the parent image definition are inherited. If not defined, nothing is inherited and all parameters have to be defined explicitly
featureTypeId – particular feature type id or mask using wildcard
includeAttribute – particular attribute id or mask for defining set of attributes included in the definition
excludeAttribute – particular attribute id or mask for defining set of attributes excluded from the definition. System attributes, with the exception of createdBy and updatedBy, can't be excluded from indexing; they are always indexed. It is still possible to keep system attributes out of the allField index used for searching by setting addToAllField to false
addToAllField – if false, the defined attributes are not added to the allField index, which is evaluated when no particular attribute is specified in the query. Default is true
allFieldDateFormat – optional format of a date attribute when it is added to the allField index. If several formats are defined for the same attribute, the value is added to the index once per format. When not defined, the default format (DateTimeFormatter.ISO_LOCAL_DATE, e.g. '2011-12-03') is used
fieldPreset – parameters to influence the Elasticsearch filter and tokenizer. For the available options, refer to the Elasticsearch documentation
indexGeometry – defines whether the feature geometry is stored in the index
indexGeometryForSearch – defines whether the feature geometry is stored in the index in a way that also enables searching for features by a spatial condition
geometryPrecision – parameter defining the precision used when indexing geometry for search. A higher value means lower precision, but also lower RAM consumption
repairGeometryForSearch – option that automatically repairs geometry after conversion to WGS84
languageSpecificKeyword – has to be set to true for language-specific letters to be ordered correctly. If set to false, these letters appear at the end of the ordered list. The language to be applied is set by the lids.search.locale property in environment properties
allFieldTemplate – advanced formatting definition used when adding values to the allField index
filterWithoutDiacritics – if set to true, the whole index is created using asciifolding so that the search filter ignores diacritics

tip

Individual featureTypeIndexImages can be organized in a hierarchy, which allows common properties to be inherited from parent images and specific properties to be defined in child images. In the end, every feature type is indexed just once according to one definition, even if it matches several index images.

Example:

<ber:featureTypeIndexImageArray>
  <ber:featureTypeIndexImage id="fti_1" featureTypeId="ft_abc*">
    <ber:indexedAttributeArray>
      <ber:indexedAttribute>
        <ber:includeAttribute>*</ber:includeAttribute>
        <ber:excludeAttribute>atTest*</ber:excludeAttribute>
        <ber:excludeAttribute>createdBy</ber:excludeAttribute>
        <ber:excludeAttribute>updatedBy</ber:excludeAttribute>
        <ber:allFieldDateFormatArray>
          <ber:allFieldDateFormat>dd-MM-yyyy</ber:allFieldDateFormat>
          <!-- for cs locale will produce e.g. 23.srpen 2017 -->
          <ber:allFieldDateFormat>dd.LLLL yyyy</ber:allFieldDateFormat>
        </ber:allFieldDateFormatArray>
      </ber:indexedAttribute>
    </ber:indexedAttributeArray>
    <ber:indexGeometry>false</ber:indexGeometry>
    <ber:indexGeometryForSearch>false</ber:indexGeometryForSearch>
    <ber:languageSpecificKeyword>true</ber:languageSpecificKeyword>
    <ber:allFieldTemplate>
      <![CDATA[<#if at_12??>${at_12?number_to_date?string["dd.MM.yyyy"]}</#if>]]>
    </ber:allFieldTemplate>
  </ber:featureTypeIndexImage>
  <ber:featureTypeIndexImage id="fti_2" extends="fti_1" featureTypeId="ft_abc*">
    <ber:indexedAttributeArray>
      <ber:indexedAttribute>
        <ber:includeAttribute>at_1</ber:includeAttribute>
        <ber:addToAllField>false</ber:addToAllField>
        <ber:fieldPreset>autocomplete</ber:fieldPreset>
        <ber:fieldPreset>startswith</ber:fieldPreset>
        <ber:fieldPreset>withoutdiacritics</ber:fieldPreset>
      </ber:indexedAttribute>
    </ber:indexedAttributeArray>
    <ber:indexGeometry>true</ber:indexGeometry>
    <ber:indexGeometryForSearch>true</ber:indexGeometryForSearch>
    <ber:geometryPrecision>5m</ber:geometryPrecision>
    <ber:repairGeometryForSearch>true</ber:repairGeometryForSearch>
  </ber:featureTypeIndexImage>
  <ber:featureTypeIndexImage id="fti_3" featureTypeId="ft_def">
    <ber:indexedAttributeArray>
      <ber:indexedAttribute>
        <ber:includeAttribute>atFoo*</ber:includeAttribute>
        <ber:includeAttribute>atBar*</ber:includeAttribute>
        <ber:excludeAttribute>atFoo123</ber:excludeAttribute>
        <ber:excludeAttribute>atBar123</ber:excludeAttribute>
      </ber:indexedAttribute>
    </ber:indexedAttributeArray>
    <ber:indexGeometry>false</ber:indexGeometry>
    <ber:indexGeometryForSearch>false</ber:indexGeometryForSearch>
    <ber:filterWithoutDiacritics>true</ber:filterWithoutDiacritics>
  </ber:featureTypeIndexImage>
</ber:featureTypeIndexImageArray>
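
Example: Keeping a system attribute out of the allField index

As described for excludeAttribute and addToAllField, most system attributes are always indexed but can be left out of the allField index. The fragment below is only a sketch of that idea under stated assumptions: the image id fti_sys and the attribute id sysAttr_1 are hypothetical placeholders (the real system attribute ids are not listed here), and the xmlns:ber declaration is added only so that the fragment is well-formed on its own. The ordering of the two indexedAttribute entries follows the pattern used in the presetsExample image.

```xml
<ber:featureTypeIndexImageArray xmlns:ber="http://www.berit.com/ber">
  <!-- fti_sys and sysAttr_1 are hypothetical ids used for illustration -->
  <ber:featureTypeIndexImage id="fti_sys" featureTypeId="ft_1">
    <ber:indexedAttributeArray>
      <ber:indexedAttribute>
        <!-- still indexed, but not evaluated by queries without an attribute -->
        <ber:includeAttribute>sysAttr_1</ber:includeAttribute>
        <ber:addToAllField>false</ber:addToAllField>
      </ber:indexedAttribute>
      <ber:indexedAttribute>
        <!-- all remaining attributes keep the default addToAllField = true -->
        <ber:includeAttribute>*</ber:includeAttribute>
      </ber:indexedAttribute>
    </ber:indexedAttributeArray>
    <ber:indexGeometry>false</ber:indexGeometry>
    <ber:indexGeometryForSearch>false</ber:indexGeometryForSearch>
  </ber:featureTypeIndexImage>
</ber:featureTypeIndexImageArray>
```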

Codelists Indexing Configuration

The definition of codelists indexing is very similar to the feature types indexing definition.

tip

For use in standard LIDS clients, it is not necessary to index codelists at all. They are currently used by the SAMO dynamic client only.

indexedCodelistsArray

The list of codelists to be indexed is specified either by including individual codelist IDs or groups, with the possibility to use asterisk “*” as a wildcard.

Example:

  <ber:indexedCodelistsArray>
    <ber:indexedGroup groupId="gr_1"/>
    <ber:indexedGroup groupId="gr_2"/>
    <ber:indexedCodelist codelistId="cl_1"/>
    <ber:indexedCodelist codelistId="cl_2"/>
    <ber:indexedCodelist codelistId="cl_abc*"/>
  </ber:indexedCodelistsArray>

codelistIndexImageArray

How individual codelists are indexed is configured in codelistIndexImage. Individual codelistIndexImages can be organized in a hierarchy, which allows common properties to be inherited from parent images and specific properties to be defined in child images.

id – identifier of index image
extends – identifier of parent image definition
codelistId – particular codelist id or mask using wildcard
includeAttribute – particular attribute id or mask for defining set of attributes included in the definition
excludeAttribute – particular attribute id or mask for defining set of attributes excluded from the definition
fieldPreset – parameters to influence the Elasticsearch filter and tokenizer. For the available options, refer to the Elasticsearch documentation

Example:

<ber:codelistIndexImageArray>
  <ber:codelistIndexImage id="cdi_1" codelistId="cl_abc*">
    <ber:indexedAttributeArray>
      <ber:includeAttribute>*</ber:includeAttribute>
      <ber:excludeAttribute>caTest*</ber:excludeAttribute>
    </ber:indexedAttributeArray>
    <ber:languageSpecificKeyword>true</ber:languageSpecificKeyword>
  </ber:codelistIndexImage>
  <ber:codelistIndexImage id="presetsExample" extends="cdi_1" codelistId="cl_1">
    <ber:indexedAttributeArray>
      <ber:indexedAttribute>
        <ber:includeAttribute>ca_1</ber:includeAttribute>
        <ber:fieldPreset>autocomplete</ber:fieldPreset>
        <ber:fieldPreset>startswith</ber:fieldPreset>
        <ber:fieldPreset>withoutdiacritics</ber:fieldPreset>
      </ber:indexedAttribute>
    </ber:indexedAttributeArray>
  </ber:codelistIndexImage>
  <ber:codelistIndexImage id="cdi_2" codelistId="cl_xyz">
    <ber:indexedAttributeArray>
      <ber:includeAttribute>ca2</ber:includeAttribute>
      <ber:includeAttribute>ca3</ber:includeAttribute>
      <ber:includeAttribute>ca4</ber:includeAttribute>
      <ber:fieldPreset>withoutdiacritics</ber:fieldPreset>
    </ber:indexedAttributeArray>
  </ber:codelistIndexImage>
</ber:codelistIndexImageArray>

Updating Elasticsearch Index

The consistency between LIDS and Elasticsearch is maintained automatically when LIDS data is manipulated in the standard way.

Situations when indexing / reindexing must be executed on demand include:

Initial indexing after installing Elasticsearch
Change of Search extension configuration
Some bulk operations in silent mode, such as import
Data modification directly in the database
Change of the attributes definition for indexed feature type
Change of geometry type for indexed feature type

The indexing / reindexing on demand can be activated:

in LIDS Application Server console
by calling the REST API, e.g. using a tool such as Postman
by cron scheduler

Reindexing scheduler configuration

cronTriggerArray enables the definition of individual triggers, which are processed automatically by the cron scheduler.

cronExpression – specifies the firing schedule. The pattern is a list of six single space-separated fields representing: second, minute, hour, day, month, weekday. Month and weekday names can be given as the first three letters of the English names. Example patterns:
- "0 0 * * * *" = the top of every hour of every day.
- "*/10 * * * * *" = every ten seconds.
- "0 0 8-10 * * *" = 8, 9 and 10 o'clock of every day.
- "0 0/30 8-10 * * *" = 8:00, 8:30, 9:00, 9:30 and 10 o'clock every day.
- "0 0 9-17 * * MON-FRI" = on the hour nine-to-five weekdays
- "0 0 0 25 12 ?" = every Christmas Day at midnight
type – type of reindexing
- full - does full re-index
- updatedSince - does incremental re-index from the moment when it is started
group – group to be indexed
authoredBy – user login; the re-indexing is running under the identity of defined user

Example:

<ber:cronTriggerArray>
  <ber:cronTrigger id="everyNightAt3">
    <ber:cronExpression>0 0 3 * * *</ber:cronExpression>
    <ber:type>updatedSince</ber:type>
    <ber:groupArray>
      <ber:group>group_1</ber:group>
      <ber:group>group_2</ber:group>
    </ber:groupArray>
    <ber:authoredBy>administrator</ber:authoredBy>
  </ber:cronTrigger>
  <ber:cronTrigger id="everySaturdayAt12">
    <ber:cronExpression>0 0 12 * * SAT</ber:cronExpression>
    <ber:type>full</ber:type>
    <ber:groupArray>
      <ber:group>group_3</ber:group>
      <ber:group>group_4</ber:group>
    </ber:groupArray>
    <ber:authoredBy>superuser</ber:authoredBy>
  </ber:cronTrigger>
</ber:cronTriggerArray>
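
Example: Hourly incremental reindexing during working hours

This sketch combines one of the example patterns listed above ("0 0 9-17 * * MON-FRI") with the updatedSince type. The trigger id workingHoursHourly is a hypothetical name, the group and user are reused from the previous example, and the xmlns:ber declaration is added only so that the fragment is well-formed on its own.

```xml
<ber:cronTriggerArray xmlns:ber="http://www.berit.com/ber">
  <!-- workingHoursHourly is a hypothetical trigger id -->
  <ber:cronTrigger id="workingHoursHourly">
    <!-- fields: second minute hour day month weekday -->
    <ber:cronExpression>0 0 9-17 * * MON-FRI</ber:cronExpression>
    <ber:type>updatedSince</ber:type>
    <ber:groupArray>
      <ber:group>group_1</ber:group>
    </ber:groupArray>
    <ber:authoredBy>administrator</ber:authoredBy>
  </ber:cronTrigger>
</ber:cronTriggerArray>
```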

warning

Scheduler is enabled only when Environment or System property masterNode is set to true. masterNode can be overridden by:

samo.lids.{context}.search.masterNode
samo.lids.{context}.masterNode
lids.{context}.search.masterNode
lids.{context}.masterNode
samo.lids.search.masterNode
samo.lids.masterNode
lids.search.masterNode
lids.masterNode
search.masterNode

Elasticsearch configuration

The main configuration of Elasticsearch is stored in elasticsearch.yml. The following parameters are suggested:

# cluster name has to correspond to the esClusterName parameter in search.xml
cluster.name: elasticsearch
# default ports after installation are 9200 and 9300; they can be changed as follows:
http.port: 6200
transport.tcp.port: 6300
action.auto_create_index: .security*,.monitoring*,.watches,.triggered_watches,.watcher-history*,.ml*
cluster.routing.allocation.disk.threshold_enabled: false

warning

It’s a YAML file with specific syntax as described here: https://docs.ansible.com/ansible/2.4/YAMLSyntax.html

Windows service memory configuration

Elasticsearch memory usage when running as a Windows service is configured by specifying the -Xms and -Xmx parameters under the registry key:

HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Apache Software Foundation\Procrun 2.0\elasticsearch-service-x64\Parameters\Java\Options

For more information, please refer to the Elasticsearch online documentation.

Limitations

The Search extension fully respects the following security settings of LIDS:

feature type access rights
attribute access rights
security codelists
ownership

The only exception is the FILTER access right on an attribute, which is not considered. If an attribute should not be available for filtering, it has to be excluded from the Search extension indexing.
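
Example: Excluding an attribute that must not be available for filtering

A minimal sketch of the workaround described above, using the excludeAttribute element from the feature indexing configuration. The image id fti_restricted and the attribute id at_restricted are hypothetical, and the xmlns:ber declaration is added only so that the fragment is well-formed on its own.

```xml
<ber:featureTypeIndexImageArray xmlns:ber="http://www.berit.com/ber">
  <!-- fti_restricted and at_restricted are hypothetical ids -->
  <ber:featureTypeIndexImage id="fti_restricted" featureTypeId="ft_1">
    <ber:indexedAttributeArray>
      <ber:indexedAttribute>
        <ber:includeAttribute>*</ber:includeAttribute>
        <!-- not indexed at all, so it cannot be used in search filters -->
        <ber:excludeAttribute>at_restricted</ber:excludeAttribute>
      </ber:indexedAttribute>
    </ber:indexedAttributeArray>
    <ber:indexGeometry>false</ber:indexGeometry>
    <ber:indexGeometryForSearch>false</ber:indexGeometryForSearch>
  </ber:featureTypeIndexImage>
</ber:featureTypeIndexImageArray>
```

Since the configuration changes, an on-demand reindex is needed afterwards, as listed under Updating Elasticsearch Index.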
