Search Extension
The LIDS Search extension provides Elasticsearch functionality to standard LIDS and SAMO clients.
Elasticsearch is a distributed, RESTful search and analytics engine. It provides scalability and high availability. Elasticsearch can work schema-less, which means that documents can be indexed without explicitly providing a schema.
Elasticsearch stores the data to be searched in its own embedded database. This is secondary data storage, required to provide the search functionality, and it must be kept synchronized with the primary data storage in LIDS.
The Search extension is responsible for communication between LIDS and the Elasticsearch engine. Its functionality includes:
- updating the data in the Elasticsearch database on demand
- updating the data in the Elasticsearch database on the fly during standard data manipulation in LIDS
- evaluating queries from clients and returning the query results from Elasticsearch
Example: Structure of search.xml.
<?xml version="1.0" encoding="utf-8"?>
<ber:search xmlns:ber="http://www.berit.com/ber" xmlns:xlink="http://www.w3.org/1999/xlink">
<ber:version>
<ber:metadata>1</ber:metadata>
<ber:minClient>5140</ber:minClient>
<ber:minAS>12601</ber:minAS>
</ber:version>
<ber:defaultBulkSize>50</ber:defaultBulkSize>
<ber:esConnection>
<ber:esClusterName>${elasticClusterName}</ber:esClusterName>
<ber:esTransportAddressesArray>
<ber:esTransportAddress address="${elasticHost}" port="${elasticPort}"/>
</ber:esTransportAddressesArray>
<ber:indexPrefix>s__</ber:indexPrefix>
</ber:esConnection>
<ber:featureTypeIndexImageArray>
<ber:featureTypeIndexImage id="fti_1" featureTypeId="*">
<ber:indexedAttributeArray>
<ber:indexedAttribute>
<ber:includeAttribute>*</ber:includeAttribute>
</ber:indexedAttribute>
</ber:indexedAttributeArray>
<ber:indexGeometry>true</ber:indexGeometry>
<ber:indexGeometryForSearch>false</ber:indexGeometryForSearch>
<ber:geometryPrecision>5m</ber:geometryPrecision>
<ber:languageSpecificKeyword>true</ber:languageSpecificKeyword>
</ber:featureTypeIndexImage>
</ber:featureTypeIndexImageArray>
<ber:codelistIndexImageArray>
<ber:codelistIndexImage id="cdi_1" codelistId="*">
<ber:indexedAttributeArray>
<ber:indexedAttribute>
<ber:includeAttribute>*</ber:includeAttribute>
</ber:indexedAttribute>
</ber:indexedAttributeArray>
</ber:codelistIndexImage>
<ber:codelistIndexImage id="presetsExample" extends="cdi_1" codelistId="cl_1">
<ber:indexedAttributeArray>
<ber:indexedAttribute>
<ber:includeAttribute>ca_1</ber:includeAttribute>
<ber:fieldPreset>autocomplete</ber:fieldPreset>
<ber:fieldPreset>startswith</ber:fieldPreset>
<ber:fieldPreset>withoutdiacritics</ber:fieldPreset>
</ber:indexedAttribute>
<ber:indexedAttribute>
<ber:includeAttribute>*</ber:includeAttribute>
</ber:indexedAttribute>
</ber:indexedAttributeArray>
</ber:codelistIndexImage>
</ber:codelistIndexImageArray>
<ber:indexedCodelistsArray>
<ber:indexedGroup groupId="gr_1"/>
<ber:indexedCodelist codelistId="cl_1"/>
<ber:indexedCodelist codelistId="cl_2"/>
<ber:indexedCodelist codelistId="cl_abc*"/>
</ber:indexedCodelistsArray>
<ber:indexedFeaturesArray>
<ber:indexedGroup groupId="gr_1"/>
<ber:indexedGroup groupId="gr_2"/>
<ber:indexedFeatureType featureTypeId="ft_abc*"/>
<ber:indexedFeatureType featureTypeId="ft_1"/>
</ber:indexedFeaturesArray>
<ber:cronTriggerArray>
<ber:cronTrigger id="everyHour">
<ber:cronExpression>0 0 * * * *</ber:cronExpression>
<ber:type>full</ber:type>
<ber:codelistsArray>
<ber:codelist codelistId="cl_1"/>
<ber:codelist codelistId="cl_2"/>
</ber:codelistsArray>
</ber:cronTrigger>
</ber:cronTriggerArray>
</ber:search>
Elasticsearch Connection
Defines the connection to the Elasticsearch engine, which needs to be installed and configured separately.
- esClusterName – name of the Elasticsearch cluster
- address, port – address and port of the running Elasticsearch instance (attributes of esTransportAddress)
- indexPrefix – prefix for index names. It is necessary when one Elasticsearch instance serves multiple projects (with different data), to ensure unique index names. The index prefix must be lowercase.
These parameters can also be defined using variables, so that various instances of the same project can share a common definition. The values of these variables are then set by environment properties for each environment.
Elasticsearch provides a REST API that enables a health check. This API can be used to verify whether the connection parameters are set correctly.
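A minimal Java sketch of such a check, assuming Java 11+ (java.net.http.HttpClient) and a cluster reachable over plain HTTP without security. It calls the Elasticsearch cluster health endpoint (GET /_cluster/health) and compares the returned cluster_name with esClusterName. The health check goes to the HTTP port (9200 by default), not the transport port. The class name and the choice to accept status yellow (typical for a single-node installation) are assumptions; a cluster with security enabled would additionally need HTTPS and credentials.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EsHealthCheck {
    // Cluster health endpoint on the HTTP (REST) port, which differs from the transport port.
    static URI healthUri(String host, int httpPort) {
        return URI.create("http://" + host + ":" + httpPort + "/_cluster/health");
    }

    // Reads a top-level string field such as "status" from the flat JSON response.
    // A regex suffices for this sketch; real code should use a JSON library.
    static String jsonField(String json, String name) {
        Matcher m = Pattern.compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*\"([^\"]*)\"").matcher(json);
        return m.find() ? m.group(1) : null;
    }

    // ASSUMPTION: the connection is usable when the cluster name matches and status is green or yellow.
    static boolean isHealthy(String json, String expectedClusterName) {
        String status = jsonField(json, "status");
        return expectedClusterName.equals(jsonField(json, "cluster_name"))
            && ("green".equals(status) || "yellow".equals(status));
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        int httpPort = args.length > 1 ? Integer.parseInt(args[1]) : 9200;
        String clusterName = args.length > 2 ? args[2] : "elasticsearch";

        HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(5)).build();
        HttpRequest request = HttpRequest.newBuilder(healthUri(host, httpPort)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode() + ": " + response.body());
        System.out.println(isHealthy(response.body(), clusterName) ? "connection OK" : "check the connection parameters");
    }
}
```

For example, java EsHealthCheck.java localhost 6200 elasticsearch (Java 11 single-file launch) would check a node whose http.port is set to 6200.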
Example: Connection definition
<ber:esConnection>
<ber:esClusterName>elasticsearch</ber:esClusterName>
<ber:esTransportAddressesArray>
<ber:esTransportAddress address="project.domain.com" port="10150"/>
</ber:esTransportAddressesArray>
<ber:indexPrefix>s__</ber:indexPrefix>
</ber:esConnection>
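Elasticsearch index names must be lowercase, must not start with -, _ or +, and must not contain \, /, *, ?, ", <, >, |, space, comma or #. Because the prefix (s__ in the example above) begins every index name, a check like the following Java sketch can catch an invalid indexPrefix early. It covers only the rules just listed (not, for example, the 255-byte length limit), and the class and method names are illustrative, not part of LIDS.

```java
import java.util.Locale;
import java.util.Optional;

public class IndexPrefixCheck {
    // Characters that Elasticsearch does not allow in index names.
    private static final String FORBIDDEN = "\\/*?\"<>| ,#";

    // Returns a description of the problem, or an empty Optional if the prefix is usable.
    static Optional<String> problem(String prefix) {
        if (!prefix.equals(prefix.toLowerCase(Locale.ROOT))) {
            return Optional.of("index prefix must be lowercase");
        }
        if (!prefix.isEmpty() && "-_+".indexOf(prefix.charAt(0)) >= 0) {
            return Optional.of("index names cannot start with '-', '_' or '+'");
        }
        for (char c : prefix.toCharArray()) {
            if (FORBIDDEN.indexOf(c) >= 0) {
                return Optional.of("forbidden character '" + c + "'");
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (String prefix : new String[] {"s__", "S__", "_lids", "proj a"}) {
            System.out.println(prefix + " -> " + problem(prefix).orElse("OK"));
        }
    }
}
```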
Example: Connection defined by parameters
<ber:esConnection>
<ber:esClusterName>${elasticClusterName}</ber:esClusterName>
<ber:esTransportAddressesArray>
<ber:esTransportAddress address="${elasticHost}" port="${elasticPort}"/>
</ber:esTransportAddressesArray>
<ber:indexPrefix>s__</ber:indexPrefix>
</ber:esConnection>
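search.xml itself only contains the ${...} references; how LIDS resolves them is not described here. As a hypothetical illustration of the idea, the Java sketch below substitutes ${name} placeholders from a map of environment properties and fails when a variable is missing, so a misconfigured environment is noticed early. The class name, the map-based property source and the fail-fast behaviour are assumptions.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderResolver {
    // Matches ${name}; the group captures the variable name.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces every ${name} with its value and fails fast on an undefined variable.
    static String resolve(String text, Map<String, String> properties) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String value = properties.get(name);
            if (value == null) {
                throw new IllegalArgumentException("Undefined variable: " + name);
            }
            // quoteReplacement keeps "$" or "\" in values from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical per-environment values; in LIDS they come from environment properties.
        Map<String, String> env = Map.of(
            "elasticClusterName", "elasticsearch",
            "elasticHost", "project.domain.com",
            "elasticPort", "10150");
        System.out.println(resolve("<ber:esClusterName>${elasticClusterName}</ber:esClusterName>", env));
        System.out.println(resolve("<ber:esTransportAddress address=\"${elasticHost}\" port=\"${elasticPort}\"/>", env));
    }
}
```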
Feature Indexing Configuration
It is necessary to define which data is indexed in Elasticsearch and how.
indexedFeaturesArray
The feature types to be indexed can be specified either as individual feature types or as groups. Definitions using categories should no longer be used.
In both cases, the feature type ID or group ID can be given as a mask, with the asterisk “*” as a wildcard.
For shared semantics, the semantic parent should be included in the configuration (either in the explicit list of feature type IDs or as a group member). Graphic feature types from a shared semantics group are not processed when evaluating search queries.
Example:
<ber:indexedFeaturesArray>
<ber:indexedGroup groupId="gr_1"/>
<ber:indexedGroup groupId="gr_2*"/>
<ber:indexedFeatureType featureTypeId="ft_abc*"/>
<ber:indexedFeatureType featureTypeId="ft_1"/>
</ber:indexedFeaturesArray>
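The exact matching rules for masks are not spelled out here. Assuming that “*” matches any sequence of characters (including none), that no other character has a special meaning, and that the whole ID must match case-sensitively, mask selection can be sketched in Java as follows. The class IdMask and the sample IDs such as ft_abc_pipe are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class IdMask {
    // Turns a mask such as "ft_abc*" into a regex: literal parts are quoted,
    // every "*" becomes ".*" (any sequence of characters, including none).
    static Pattern toPattern(String mask) {
        String[] parts = mask.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            if (!parts[i].isEmpty()) {
                regex.append(Pattern.quote(parts[i]));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // The whole ID has to match; matching is case-sensitive in this sketch.
    static boolean matches(String mask, String id) {
        return toPattern(mask).matcher(id).matches();
    }

    // IDs selected by at least one mask, in their original order.
    static List<String> select(List<String> masks, List<String> ids) {
        return ids.stream()
            .filter(id -> masks.stream().anyMatch(mask -> matches(mask, id)))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> masks = List.of("ft_abc*", "ft_1");
        List<String> ids = List.of("ft_abc", "ft_abc_pipe", "ft_1", "ft_10", "ft_def");
        System.out.println(select(masks, ids)); // [ft_abc, ft_abc_pipe, ft_1]
    }
}
```

Note that ft_1 without a wildcard selects only ft_1 itself, not ft_10.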
The list of feature types searched from a particular application (e.g. LIDS Explorer or LIDS Mobile) can be narrowed down further in the application’s own configuration.
featureTypeIndexImageArray
How individual features of a particular type are indexed is configured in featureTypeIndexImage:
- id – identifier of the index image
- extends – identifier of the parent image definition. If defined, the parameters of the parent image definition are inherited. If not defined, nothing is inherited and all parameters have to be defined in the image itself
- featureTypeId – a particular feature type ID or a mask using a wildcard
- includeAttribute – a particular attribute ID or a mask defining the set of attributes included in the definition
- excludeAttribute – a particular attribute ID or a mask defining the set of attributes excluded from the definition. System attributes, with the exception of createdBy and updatedBy, can't be excluded from indexing; they are always indexed. It's still possible to exclude system attributes from the allField index used for searching by setting addToAllField to false
- addToAllField – if false, the defined attributes are not added to the allField index, which is evaluated when no particular attribute is specified in the query. Default is true
- allFieldDateFormat – optional format of a date attribute when it is added to the allField index. If more formats are defined for the same attribute, the value is added to the index once for each format. When not defined, the default format (DateTimeFormatter.ISO_LOCAL_DATE, e.g. '2011-12-03') is used
- fieldPreset – presets influencing the Elasticsearch filters and tokenizer (e.g. autocomplete, startswith, withoutdiacritics). For the available options, refer to the Elasticsearch documentation
- indexGeometry – defines whether the feature geometry is stored in the index
- indexGeometryForSearch – defines whether the feature geometry is stored in the index in a way that also enables searching for features by a spatial condition
- geometryPrecision – precision used when indexing geometry for search (e.g. 5m). A higher value means lower precision, but also lower RAM consumption
- repairGeometryForSearch – if true, geometry is automatically repaired after conversion to WGS84
- languageSpecificKeyword – has to be set to true for language-specific letters to be ordered correctly. If set to false, these letters appear at the end of the ordered list. The language to be applied is set by the lids.search.locale property in environment properties
- allFieldTemplate – advanced formatting template for the value added to the allField index
- filterWithoutDiacritics – if set to true, the whole index is created using asciifolding, so that the search filter ignores diacritics
Individual featureTypeIndexImages can be organized in a hierarchy, so that child images inherit common properties from parent images and define their own specific properties. In the end, every feature type is indexed just once according to one definition, even if it matches more than one index image.
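The inheritance rule can be pictured with a small, hypothetical Java model (Java 16+ for the record type): the effective settings of an image are those of its extends chain applied from the root down, with each child's own values overriding inherited ones. The Image record, the flat string settings and the cycle check are assumptions for illustration; how LIDS merges attribute lists and picks the single definition for a feature type is not described here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class IndexImageInheritance {
    // Hypothetical, simplified index image: id, optional parent id and the image's own settings.
    record Image(String id, String extendsId, Map<String, String> settings) {}

    // Effective settings: inherited values first, then overridden by the image's own values.
    static Map<String, String> effectiveSettings(String id, Map<String, Image> images) {
        return resolve(id, images, new HashSet<>());
    }

    private static Map<String, String> resolve(String id, Map<String, Image> images, Set<String> seen) {
        Image image = images.get(id);
        if (image == null) {
            throw new IllegalArgumentException("Unknown index image: " + id);
        }
        if (!seen.add(id)) {
            throw new IllegalStateException("Cyclic extends chain at: " + id);
        }
        Map<String, String> result = new HashMap<>();
        if (image.extendsId() != null) {
            result.putAll(resolve(image.extendsId(), images, seen));
        }
        result.putAll(image.settings());
        return result;
    }

    public static void main(String[] args) {
        // Values mirror fti_1 and fti_2 in the example that follows; attribute lists are omitted.
        Map<String, Image> images = Map.of(
            "fti_1", new Image("fti_1", null, Map.of("indexGeometry", "false", "languageSpecificKeyword", "true")),
            "fti_2", new Image("fti_2", "fti_1", Map.of("indexGeometry", "true", "geometryPrecision", "5m")));
        System.out.println(effectiveSettings("fti_2", images));
    }
}
```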
Example:
<ber:featureTypeIndexImageArray>
<ber:featureTypeIndexImage id="fti_1" featureTypeId="ft_abc*">
<ber:indexedAttributeArray>
<ber:indexedAttribute>
<ber:includeAttribute>*</ber:includeAttribute>
<ber:excludeAttribute>atTest*</ber:excludeAttribute>
<ber:excludeAttribute>createdBy</ber:excludeAttribute>
<ber:excludeAttribute>updatedBy</ber:excludeAttribute>
<ber:allFieldDateFormatArray>
<ber:allFieldDateFormat>dd-MM-yyyy</ber:allFieldDateFormat>
<!-- for cs locale will produce e.g. 23.srpen 2017 -->
<ber:allFieldDateFormat>dd.LLLL yyyy</ber:allFieldDateFormat>
</ber:allFieldDateFormatArray>
</ber:indexedAttribute>
</ber:indexedAttributeArray>
<ber:indexGeometry>false</ber:indexGeometry>
<ber:indexGeometryForSearch>false</ber:indexGeometryForSearch>
<ber:languageSpecificKeyword>true</ber:languageSpecificKeyword>
<ber:allFieldTemplate>
<![CDATA[<#if at_12??>${at_12?number_to_date?string["dd.MM.yyyy"]}</#if>]]>
</ber:allFieldTemplate>
</ber:featureTypeIndexImage>
<ber:featureTypeIndexImage id="fti_2" extends="fti_1" featureTypeId="ft_abc*">
<ber:indexedAttributeArray>
<ber:indexedAttribute>
<ber:includeAttribute>at_1</ber:includeAttribute>
<ber:addToAllField>false</ber:addToAllField>
<ber:fieldPreset>autocomplete</ber:fieldPreset>
<ber:fieldPreset>startswith</ber:fieldPreset>
<ber:fieldPreset>withoutdiacritics</ber:fieldPreset>
</ber:indexedAttribute>
</ber:indexedAttributeArray>
<ber:indexGeometry>true</ber:indexGeometry>
<ber:indexGeometryForSearch>true</ber:indexGeometryForSearch>
<ber:geometryPrecision>5m</ber:geometryPrecision>
<ber:repairGeometryForSearch>true</ber:repairGeometryForSearch>
</ber:featureTypeIndexImage>
<ber:featureTypeIndexImage id="fti_3" featureTypeId="ft_def">
<ber:indexedAttributeArray>
<ber:indexedAttribute>
<ber:includeAttribute>atFoo*</ber:includeAttribute>
<ber:includeAttribute>atBar*</ber:includeAttribute>
<ber:excludeAttribute>atFoo123</ber:excludeAttribute>
<ber:excludeAttribute>atBar123</ber:excludeAttribute>
</ber:indexedAttribute>
</ber:indexedAttributeArray>
<ber:indexGeometry>false</ber:indexGeometry>
<ber:indexGeometryForSearch>false</ber:indexGeometryForSearch>
<ber:filterWithoutDiacritics>true</ber:filterWithoutDiacritics>
</ber:featureTypeIndexImage>
</ber:featureTypeIndexImageArray>
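The allFieldDateFormat patterns in the example above look like java.time.DateTimeFormatter patterns, which fits the DateTimeFormatter.ISO_LOCAL_DATE default mentioned earlier. Assuming that, and assuming the locale comes from lids.search.locale, this hypothetical Java sketch shows which values would be added to the allField index for one date: one value per configured pattern, or the ISO date when no pattern is configured.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class AllFieldDates {
    // One allField value per configured pattern; without patterns,
    // fall back to ISO_LOCAL_DATE as the allFieldDateFormat description states.
    static List<String> allFieldValues(LocalDate date, List<String> patterns, Locale locale) {
        if (patterns.isEmpty()) {
            return List.of(date.format(DateTimeFormatter.ISO_LOCAL_DATE));
        }
        return patterns.stream()
            .map(pattern -> date.format(DateTimeFormatter.ofPattern(pattern, locale)))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2017, 8, 23);
        Locale czech = Locale.forLanguageTag("cs");
        System.out.println(allFieldValues(date, List.of(), czech)); // [2017-08-23]
        // LLLL is the stand-alone full month name; its text depends on the JDK's locale data.
        System.out.println(allFieldValues(date, List.of("dd-MM-yyyy", "dd.LLLL yyyy"), czech));
    }
}
```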
Codelists Indexing Configuration
The definition of codelists indexing is very similar to the feature types indexing definition.
Standard LIDS clients do not need codelists to be indexed at all; indexed codelists are currently used only by the SAMO dynamic client.
indexedCodelistsArray
The codelists to be indexed are specified either as individual codelist IDs or as groups; the asterisk “*” can be used as a wildcard.
Example:
<ber:indexedCodelistsArray>
<ber:indexedGroup groupId="gr_1"/>
<ber:indexedGroup groupId="gr_2"/>
<ber:indexedCodelist codelistId="cl_1"/>
<ber:indexedCodelist codelistId="cl_2"/>
<ber:indexedCodelist codelistId="cl_abc*"/>
</ber:indexedCodelistsArray>
codelistIndexImageArray
How individual codelists are indexed is configured in codelistIndexImage. Individual codelistIndexImages can be organized in a hierarchy, so that child images inherit common properties from parent images and define their own specific properties.
- id – identifier of index image
- extends – identifier of parent image definition
- codelistId – particular codelist id or mask using wildcard
- includeAttribute – particular attribute id or mask for defining set of attributes included in the definition
- excludeAttribute – particular attribute id or mask for defining set of attributes excluded from the definition
- fieldPreset – presets influencing the Elasticsearch filters and tokenizer. For the available options, refer to the Elasticsearch documentation
Example:
<ber:codelistIndexImageArray>
<ber:codelistIndexImage id="cdi_1" codelistId="cl_abc*">
<ber:indexedAttributeArray>
<ber:indexedAttribute>
<ber:includeAttribute>*</ber:includeAttribute>
<ber:excludeAttribute>caTest*</ber:excludeAttribute>
</ber:indexedAttribute>
</ber:indexedAttributeArray>
<ber:languageSpecificKeyword>true</ber:languageSpecificKeyword>
</ber:codelistIndexImage>
<ber:codelistIndexImage id="presetsExample" extends="cdi_1" codelistId="cl_1">
<ber:indexedAttributeArray>
<ber:indexedAttribute>
<ber:includeAttribute>ca_1</ber:includeAttribute>
<ber:fieldPreset>autocomplete</ber:fieldPreset>
<ber:fieldPreset>startswith</ber:fieldPreset>
<ber:fieldPreset>withoutdiacritics</ber:fieldPreset>
</ber:indexedAttribute>
</ber:indexedAttributeArray>
</ber:codelistIndexImage>
<ber:codelistIndexImage id="cdi_2" codelistId="cl_xyz">
<ber:indexedAttributeArray>
<ber:indexedAttribute>
<ber:includeAttribute>ca2</ber:includeAttribute>
<ber:includeAttribute>ca3</ber:includeAttribute>
<ber:includeAttribute>ca4</ber:includeAttribute>
<ber:fieldPreset>withoutdiacritics</ber:fieldPreset>
</ber:indexedAttribute>
</ber:indexedAttributeArray>
</ber:codelistIndexImage>
</ber:codelistIndexImageArray>
Updating Elasticsearch Index
Consistency between LIDS and Elasticsearch is maintained automatically when LIDS data is manipulated in the standard way.
Indexing or reindexing must be run on demand in situations such as:
- Initial indexing after installing Elasticsearch
- Change of the Search extension configuration
- Some bulk operations in silent mode, such as import
- Data modification directly in the database
- Change of the attribute definitions of an indexed feature type
- Change of the geometry type of an indexed feature type
On-demand indexing or reindexing can be started:
- in the LIDS Application Server console
- by calling the REST API, e.g. with a tool such as Postman
- by the cron scheduler
The reindexing scheduler configuration cronTriggerArray enables the definition of individual triggers, which are processed automatically by the cron scheduler:
- cronExpression – specifies the firing schedule. The pattern is a list of six fields separated by single spaces, representing second, minute, hour, day of month, month and weekday. Month and weekday names can be given as the first three letters of their English names. Example patterns:
- "0 0 * * * *" = the top of every hour of every day.
- "*/10 * * * * *" = every ten seconds.
- "0 0 8-10 * * *" = 8, 9 and 10 o'clock every day.
- "0 0/30 8-10 * * *" = 8:00, 8:30, 9:00, 9:30, 10:00 and 10:30 every day.
- "0 0 9-17 * * MON-FRI" = on the hour from 9:00 to 17:00 on weekdays.
- "0 0 0 25 12 ?" = every Christmas Day at midnight.
- type – type of reindexing:
- full – performs a full reindex
- updatedSince – performs an incremental reindex from the moment it is started
- group – group to be indexed
- authoredBy – user login; the reindexing runs under the identity of this user
Example:
<ber:cronTriggerArray>
<ber:cronTrigger id="everyNightAt3">
<ber:cronExpression>0 0 3 * * *</ber:cronExpression>
<ber:type>updatedSince</ber:type>
<ber:groupArray>
<ber:group>group_1</ber:group>
<ber:group>group_2</ber:group>
</ber:groupArray>
<ber:authoredBy>administrator</ber:authoredBy>
</ber:cronTrigger>
<ber:cronTrigger id="everySaturdayAt12">
<ber:cronExpression>0 0 12 * * SAT</ber:cronExpression>
<ber:type>full</ber:type>
<ber:groupArray>
<ber:group>group_3</ber:group>
<ber:group>group_4</ber:group>
</ber:groupArray>
<ber:authoredBy>superuser</ber:authoredBy>
</ber:cronTrigger>
</ber:cronTriggerArray>
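The scheduler implementation is not shown in this document; the expression format and the example patterns appear to follow Spring's six-field cron syntax. To make the field semantics concrete, the following simplified Java matcher (a hypothetical CronSketch class, not the LIDS scheduler) checks whether a timestamp satisfies an expression. It supports *, ?, single values, ranges, steps (*/n, a/n, a-b/n), comma lists and three-letter names, numbers weekdays from 1 (MON) to 7 (SUN) and requires all six fields to match; special characters such as L, W or #, and weekday 0 for Sunday, are not supported.

```java
import java.time.LocalDateTime;
import java.util.List;
import java.util.Locale;

public class CronSketch {
    private static final List<String> MONTHS = List.of(
        "JAN", "FEB", "MAR", "APR", "MAY", "JUN", "JUL", "AUG", "SEP", "OCT", "NOV", "DEC");
    private static final List<String> DAYS = List.of("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN");

    // Field positions: 0 second, 1 minute, 2 hour, 3 day of month, 4 month, 5 weekday.
    private static final int[] MIN = {0, 0, 0, 1, 1, 1};
    private static final int[] MAX = {59, 59, 23, 31, 12, 7};

    // Converts a number or a three-letter month/weekday name to its numeric value.
    private static int value(String token, int field) {
        String t = token.toUpperCase(Locale.ROOT);
        if (field == 4 && MONTHS.contains(t)) return MONTHS.indexOf(t) + 1;
        if (field == 5 && DAYS.contains(t)) return DAYS.indexOf(t) + 1; // MON = 1 ... SUN = 7
        return Integer.parseInt(t);
    }

    // True if any comma-separated part ("*", "?", "a", "a-b", optionally with "/step") matches.
    private static boolean fieldMatches(String expr, int actual, int field) {
        for (String part : expr.split(",")) {
            String range = part;
            int step = 1;
            int slash = part.indexOf('/');
            if (slash >= 0) {
                range = part.substring(0, slash);
                step = Integer.parseInt(part.substring(slash + 1));
            }
            int lo;
            int hi;
            if (range.equals("*") || range.equals("?")) {
                lo = MIN[field];
                hi = MAX[field];
            } else if (range.contains("-")) {
                String[] bounds = range.split("-");
                lo = value(bounds[0], field);
                hi = value(bounds[1], field);
            } else {
                lo = value(range, field);
                hi = slash >= 0 ? MAX[field] : lo; // "a/step" runs from a to the field maximum
            }
            if (actual >= lo && actual <= hi && (actual - lo) % step == 0) {
                return true;
            }
        }
        return false;
    }

    /** Returns true if the expression fires at the given second. */
    public static boolean matches(String expression, LocalDateTime t) {
        String[] f = expression.trim().split("\\s+");
        if (f.length != 6) {
            throw new IllegalArgumentException("Expected 6 fields, got " + f.length + ": " + expression);
        }
        int[] actual = {t.getSecond(), t.getMinute(), t.getHour(),
            t.getDayOfMonth(), t.getMonthValue(), t.getDayOfWeek().getValue()};
        for (int i = 0; i < 6; i++) {
            if (!fieldMatches(f[i], actual[i], i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        LocalDateTime saturdayNoon = LocalDateTime.of(2024, 6, 15, 12, 0, 0); // 15 June 2024 is a Saturday
        System.out.println(matches("0 0 12 * * SAT", saturdayNoon)); // true
        System.out.println(matches("0 0 3 * * *", saturdayNoon));    // false
        System.out.println(matches("0 0/30 8-10 * * *", LocalDateTime.of(2024, 6, 15, 10, 30))); // true
    }
}
```

The last call shows why "0 0/30 8-10 * * *" also fires at 10:30: minute 30 satisfies 0/30 and hour 10 lies in 8-10.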
The scheduler is enabled only when the environment or system property masterNode is set to true. masterNode can be overridden by the following properties:
- samo.lids.{context}.search.masterNode
- samo.lids.{context}.masterNode
- lids.{context}.search.masterNode
- lids.{context}.masterNode
- samo.lids.search.masterNode
- samo.lids.masterNode
- lids.search.masterNode
- lids.masterNode
- search.masterNode
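The list above does not state in which order the keys are evaluated. Purely as an illustration, the hypothetical Java sketch below assumes that the keys are checked in the listed order, that the first defined key wins and that plain masterNode is the fallback; {context} is replaced by a context name, here the made-up myapp.

```java
import java.util.List;
import java.util.Map;

public class MasterNodeLookup {
    // Keys in the listed order, followed by the plain masterNode fallback.
    private static final List<String> KEYS = List.of(
        "samo.lids.{context}.search.masterNode",
        "samo.lids.{context}.masterNode",
        "lids.{context}.search.masterNode",
        "lids.{context}.masterNode",
        "samo.lids.search.masterNode",
        "samo.lids.masterNode",
        "lids.search.masterNode",
        "lids.masterNode",
        "search.masterNode",
        "masterNode");

    // ASSUMPTION: the first defined key wins; the scheduler stays off when no key is set.
    static boolean isMasterNode(String context, Map<String, String> properties) {
        for (String key : KEYS) {
            String value = properties.get(key.replace("{context}", context));
            if (value != null) {
                return Boolean.parseBoolean(value.trim());
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> props = Map.of(
            "masterNode", "false",
            "lids.myapp.search.masterNode", "true"); // "myapp" is a hypothetical context
        System.out.println(isMasterNode("myapp", props)); // true
        System.out.println(isMasterNode("other", props)); // false
    }
}
```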
Elasticsearch configuration
The main configuration of Elasticsearch is in elasticsearch.yml. Suggested parameters to add:
# cluster name has to correspond to the esClusterName parameter in search.xml
cluster.name: elasticsearch
# default ports after installation are 9200 (HTTP) and 9300 (transport); they can be changed as follows:
http.port: 6200
transport.tcp.port: 6300
action.auto_create_index: .security*,.monitoring*,.watches,.triggered_watches,.watcher-history*,.ml*
cluster.routing.allocation.disk.threshold_enabled: false
elasticsearch.yml is a YAML file; its syntax is described here: https://docs.ansible.com/ansible/2.4/YAMLSyntax.html
The memory used by Elasticsearch running as a Windows service is configured by specifying the -Xms and -Xmx parameters under the registry key:
HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Apache Software Foundation\Procrun 2.0\elasticsearch-service-x64\Parameters\Java\Options
For more information, refer to the Elasticsearch online documentation.
Limitations
The Search extension fully respects the following LIDS security settings:
- feature type access rights
- attribute access rights
- security codelists
- ownership
The only exception is the FILTER access right on attributes, which is not taken into account. If an attribute shouldn't be available for filtering, it has to be excluded from Search extension indexing.