
Using MoreLikeThis Component in Solr

May 09, 2013 by Jasgeet Singh

The MoreLikeThis (MLT) request handler returns MLT results based on a specific indexed document. It is commonly used when a user clicks a "more like this" link on existing search results. The key input is a reference to the indexed document for which you want similar results.

MLT Search Component

MLT Parameters

• mlt: Set to true to enable MoreLikeThis results.

• mlt.count: The number of similar documents to return for each result.

• mlt.fl: A comma- or space-separated list of fields to consider in MLT. The interesting terms are searched within these fields only, so each field should ideally have either termVectors or stored set to true in the schema.

• mlt.qf: Different field boosts can optionally be specified with this parameter. The fields referenced should also be listed in mlt.fl. If there is a title or similar identifying field, it should probably be boosted higher.

• mlt.mintf: The minimum number of times a term must occur within the source document (across the fields in mlt.fl) for it to be an interesting term. It defaults to 2.

• mlt.mindf: The minimum number of documents a term must occur in for it to be an interesting term. It defaults to 5.

• mlt.minwl: The minimum number of characters in an interesting term. It defaults to 0, meaning no minimum.

• mlt.maxwl: The maximum number of characters in an interesting term. It defaults to 0, meaning no maximum.

• mlt.maxqt: The maximum number of interesting terms used in an MLT query. It defaults to 25, which is usually plenty.

• mlt.maxntp: The maximum number of tokens to parse in each example document field that is not stored with term vector support. It defaults to 5000.

• mlt.boost: Toggles whether each interesting term used in the MLT query is boosted by its relevance. It defaults to false.
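To make the mintf, mindf, minwl, maxwl and maxqt thresholds more concrete, here is a simplified Java sketch of how a MoreLikeThis-style selector might filter and rank candidate terms from the source document. It is not Solr's code: the class and method names are hypothetical, term and document frequencies are passed in as plain maps, and the tf*idf score with idf = 1 + ln(numDocs / (df + 1)) is an assumption modeled on Lucene's classic similarity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Simplified model of MLT "interesting term" selection; not Solr's implementation.
final class InterestingTerms {

    // tf * idf, with idf = 1 + ln(numDocs / (df + 1)) (assumed classic Lucene-style idf)
    static double score(int tf, int df, int numDocs) {
        return tf * (1.0 + Math.log((double) numDocs / (df + 1)));
    }

    // termFreqs: term -> occurrences in the source document (across the mlt.fl fields)
    // docFreqs:  term -> number of indexed documents containing the term
    static List<String> select(Map<String, Integer> termFreqs, Map<String, Integer> docFreqs,
                               int numDocs, int mintf, int mindf, int minwl, int maxwl, int maxqt) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            String term = e.getKey();
            int df = docFreqs.getOrDefault(term, 0);
            if (e.getValue() < mintf) continue;                  // mlt.mintf
            if (df < mindf) continue;                            // mlt.mindf
            if (minwl > 0 && term.length() < minwl) continue;    // mlt.minwl, 0 = no minimum
            if (maxwl > 0 && term.length() > maxwl) continue;    // mlt.maxwl, 0 = no maximum
            candidates.add(term);
        }
        // highest tf*idf first
        candidates.sort(Comparator.comparingDouble(
                (String t) -> score(termFreqs.get(t), docFreqs.getOrDefault(t, 0), numDocs)).reversed());
        // mlt.maxqt caps the number of terms in the generated query
        return new ArrayList<>(candidates.subList(0, Math.min(maxqt, candidates.size())));
    }

    public static void main(String[] args) {
        Map<String, Integer> tf = Map.of("solr", 5, "search", 3, "the", 4, "lucene", 1);
        Map<String, Integer> df = Map.of("solr", 20, "search", 100, "the", 990, "lucene", 8);
        System.out.println(select(tf, df, 1000, 2, 5, 0, 0, 2));  // prints [solr, search]
    }
}
```

With these sample numbers, lucene is dropped because it occurs only once (below mintf=2), and the scores are roughly solr 24.3, search 9.9 and the 4.0: although "the" occurs often in the document, it appears in almost every document, so its idf is close to 1 and maxqt=2 leaves it out.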

 

The MLT request handler needs to be configured in solrconfig.xml.

	<requestHandler name="mlt_tracks" class="solr.MoreLikeThisHandler">
	  <lst name="defaults">
	    <str name="mlt.fl">category, specialities</str>
	    <str name="mlt.mintf">1</str>
	    <str name="mlt.mindf">2</str>
	    <str name="mlt.boost">true</str>
	  </lst>
	</requestHandler>

Configure the fields in schema.xml:

	<field name="category" type="textProcessed" indexed="true" stored="false" multiValued="true" termVectors="true"/>
	<field name="specialities" type="textProcessed" indexed="true" stored="false" multiValued="true" termVectors="true"/>

 

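Solr Query:

This query sends q=id:11 to the mlt_tracks handler configured above. The handler uses the first document matching q as the source document and returns up to rows documents similar to it (here, only their id field). By contrast, the MLT search component (mlt=true on an ordinary search) returns similar documents for each document in the response set, so it might better be called "MoreLikeThese".

http://localhost:8080/solr/select/?qt=mlt_tracks&q=id:11&mlt.fl=category,specialities&fl=id&rows=5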
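The same request can also be assembled programmatically. This minimal sketch uses only java.net.URLEncoder (Java 10+) rather than SolrJ; the class name MltRequest is hypothetical, and the host, port, path and handler name (mlt_tracks) are copied from the example configuration above, so adjust them for your installation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Builds an MLT handler request URL with URL-encoded parameters (sketch, not SolrJ).
final class MltRequest {

    static String url(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return base + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();  // keeps parameter order stable
        params.put("qt", "mlt_tracks");                      // handler name from solrconfig.xml
        params.put("q", "id:11");                            // selects the source document
        params.put("mlt.fl", "category,specialities");
        params.put("fl", "id");
        params.put("rows", "5");
        System.out.println(url("http://localhost:8080/solr/select/", params));
        // prints http://localhost:8080/solr/select/?qt=mlt_tracks&q=id%3A11&mlt.fl=category%2Cspecialities&fl=id&rows=5
    }
}
```

Encoding matters once q contains spaces or quotes, for example when the source document is chosen with a phrase query rather than an id.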

 

 

Using Query Term Completion component in Solr

May 01, 2013 by Jasgeet Singh

Query Term Completion

One of the most effective features of a search user interface is automatic completion of query input in the search box, sometimes called instant search. It is typically displayed as a drop-down menu that appears automatically while the user types.

Via Suggester:

Solr 3 introduced a high-speed approach to term completion called the Suggester. Its principal feature is speed: a handful of milliseconds per search at most.

The following request handler definition, dedicated to the Suggester, goes in solrconfig.xml:

	<requestHandler name="/suggest" class="solr.SearchHandler">
	  <lst name="defaults">
	    <str name="spellcheck">true</str>
	    <str name="spellcheck.dictionary">a_suggest</str>
	    <str name="spellcheck.onlyMorePopular">true</str>
	    <str name="spellcheck.count">5</str>
	    <str name="spellcheck.collate">true</str>
	  </lst>
	  <arr name="components">
	    <str>suggest</str>
	  </arr>
	</requestHandler>

	<searchComponent name="suggest" class="solr.SpellCheckComponent">
	  <lst name="spellchecker">
	    <str name="name">a_suggest</str>
	    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
	    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
	    <str name="field">a_spell</str>
	    <str name="buildOnOptimize">true</str>
	    <int name="weightBuckets">100</int><!-- an FST option -->
	  </lst>
	</searchComponent>

spellcheck.onlyMorePopular: this parameter sorts the suggestions by frequency rather than alphabetically, which is almost certainly what is desired.

weightBuckets: this option is specific to FSTLookup, which approximates the frequency of the terms it stores by placing them into a discrete set of buckets (10 by default). It can range from 1 to 255. Increasing weightBuckets to 100 or more improves the fidelity of the weights, resulting in more relevant suggestions.
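To see why more buckets help, here is an illustrative Java sketch that maps term weights linearly onto a fixed number of buckets and then orders suggestions by bucket, as a popularity sort would. This is a simplified model, not FSTLookup's actual algorithm: the linear mapping, the alphabetical tie-break and the sample weights are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative model only, not FSTLookup's code: weights are mapped linearly onto
// `buckets` discrete levels, and suggestions are ordered by level.
final class WeightBuckets {

    // Maps a weight in 0..maxWeight onto a bucket index in 0..buckets-1.
    static int bucket(long weight, long maxWeight, int buckets) {
        return (int) Math.min(buckets - 1, weight * buckets / (maxWeight + 1));
    }

    // Most popular bucket first; within a bucket this sketch falls back to alphabetical order.
    static List<String> rank(Map<String, Long> weights, int buckets) {
        long max = weights.values().stream().mapToLong(Long::longValue).max().orElse(0L);
        List<String> terms = new ArrayList<>(weights.keySet());
        terms.sort(Comparator.comparingInt((String t) -> -bucket(weights.get(t), max, buckets))
                .thenComparing(Comparator.<String>naturalOrder()));
        return terms;
    }

    public static void main(String[] args) {
        Map<String, Long> weights = Map.of("solr", 990L, "solar", 950L, "soldier", 100L);
        System.out.println(rank(weights, 10));   // prints [solar, solr, soldier]
        System.out.println(rank(weights, 100));  // prints [solr, solar, soldier]
    }
}
```

With 10 buckets, solr (990) and solar (950) both land in bucket 9, so the less frequent solar can be listed first; with 100 buckets they land in buckets 99 and 95 and the frequency order is preserved.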

Using Highlighting Component in Solr

May 01, 2013 by Jasgeet Singh

Search components must be registered with Solr to be activated so that they can then be referred to in a components list. All of the standard components are pre-registered. Here's an example of how a search component named “elevator” is registered in solrconfig.xml:


<searchComponent class="solr.QueryElevationComponent" name="elevator">
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>

The Highlighting Component

The highlighting component returns highlighted snippets of matching text from the original field data.

The following are the parameters observed by the highlighter search component:

• hl: Set to true to enable search highlighting. Without this, the other parameters are ignored, and highlighting is effectively disabled.

	SolrQuery solrQuery = new SolrQuery("solr");  // example query: highlighting marks the terms it matches
	solrQuery.setHighlight(true);

• hl.fl: A comma- or space-separated list of fields to highlight. A field must be marked as stored in the schema to be highlighted. You can use an asterisk wildcard to conveniently highlight on all text fields, such as * or r_*. If you use a wildcard, consider enabling the hl.requireFieldMatch option.

	SolrQuery solrQuery = new SolrQuery("solr");
	solrQuery.setHighlight(true);
	solrQuery.addHighlightField("r_*");
	solrQuery.setHighlightRequireFieldMatch(true);

• hl.snippets: This is the maximum number of highlighted snippets that will be generated per field. It defaults to 1, which you will probably not change. By setting this to 0 for a particular field, you can effectively disable highlighting for that field. You might do that if you used a wildcard for hl.fl and want to make an exception.

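	SolrQuery solrQuery = new SolrQuery("solr");
	solrQuery.setHighlight(true);
	solrQuery.setHighlightSnippets(1);
	solrQuery.set("f.title.hl.snippets", 0);  // hypothetical field: disables highlighting for "title" only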

• hl.fragsize: The maximum size of each snippet, measured in characters. The default is 100. If 0 is specified, the field is not fragmented and whole field values are returned. Obviously, be wary of doing this for large text fields.

	SolrQuery solrQuery = new SolrQuery("solr");
	solrQuery.setHighlight(true);
	solrQuery.setHighlightSnippets(1);
	solrQuery.setHighlightFragsize(100);

• hl.formatter: Chooses how to format the highlighting. This parameter refers to a named <formatter/> element in the <highlighting/> section of solrconfig.xml. The default implementation, named html, takes two more parameters:

° hl.simple.pre and hl.simple.post: The text inserted immediately before and after each matched term in the snippet to set it apart from the surrounding text (by default <em> and </em>). Note that if the chosen values happen to occur in the original text, such as HTML with pre-existing emphasis tags, they are not escaped, and in rare circumstances may lead to a false highlight.

	SolrQuery solrQuery = new SolrQuery("solr");
	solrQuery.setHighlight(true);
	solrQuery.setHighlightSimplePre("<b>");
	solrQuery.setHighlightSimplePost("</b>");
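To illustrate how hl.snippets, hl.fragsize and hl.simple.pre/hl.simple.post interact, here is a toy Java highlighter. It is not Solr's highlighter: it cuts the text into fixed-size character fragments (Solr's fragmenters are smarter), matches query terms as whole words case-insensitively, and keeps up to the requested number of fragments that contain a match. The class name and signature are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy highlighter modeling hl.snippets, hl.fragsize and hl.simple.pre/post; not Solr's code.
final class ToyHighlighter {

    static List<String> highlight(String text, List<String> terms, int snippets, int fragsize,
                                  String pre, String post) {
        List<String> out = new ArrayList<>();
        if (snippets <= 0 || terms.isEmpty()) return out;      // hl.snippets=0: nothing for this field
        StringBuilder alt = new StringBuilder();
        for (String t : terms) {
            if (alt.length() > 0) alt.append('|');
            alt.append(Pattern.quote(t));
        }
        Pattern p = Pattern.compile("\\b(?:" + alt + ")\\b", Pattern.CASE_INSENSITIVE);
        int size = fragsize <= 0 ? text.length() : fragsize;  // hl.fragsize=0: whole field value
        for (int start = 0; start < text.length() && out.size() < snippets; start += size) {
            String frag = text.substring(start, Math.min(text.length(), start + size));
            Matcher m = p.matcher(frag);
            StringBuilder sb = new StringBuilder();
            int last = 0;
            boolean matched = false;
            while (m.find()) {
                // pre/post are inserted as-is; the original text is not escaped
                sb.append(frag, last, m.start()).append(pre).append(m.group()).append(post);
                last = m.end();
                matched = true;
            }
            if (matched) out.add(sb.append(frag.substring(last)).toString());  // keep matching fragments only
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Solr is fast. Lucene powers Solr.";
        System.out.println(highlight(text, List.of("solr"), 1, 0, "<em>", "</em>"));
        // prints [<em>Solr</em> is fast. Lucene powers <em>Solr</em>.]
    }
}
```

With a fragment size of 14 and 2 snippets, the same text yields two fragments, "<b>Solr</b> is fast. " and "<b>Solr</b>.", while the middle fragment "Lucene powers " is skipped because it contains no match.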

Hope it helps!

Using Solr queries in SolrJ

April 29, 2013 by Jasgeet Singh

I learned a few things about Solr queries in a project where we used SolrJ and Solr full-text search, and I am sharing some of them here. If you are not familiar with the Solr search engine, you can read about it here:

I will start with some examples.

Example 1: Suppose we have a Company Type facet with values such as Retail, Accommodation, Travels and Construction, plus a Specialities facet. We want results where every company type except Accommodation has at least one speciality.

	def solrQuery = new SolrQuery("*:*")
	solrQuery.addField("company_type")
	solrQuery.addField("specialities")
	// Accommodation matches on its own; every other company type also needs a speciality
	solrQuery.addFilterQuery("(category:Retail AND specialities:[* TO *]) OR category:Accommodation OR (category:Travels AND specialities:[* TO *]) OR (category:Construction AND specialities:[* TO *])")
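Because Retail, Travels and Construction share the same condition, the Example 1 filter can be written more compactly as (category:(Retail OR Travels OR Construction) AND specialities:[* TO *]) OR category:Accommodation, which is logically equivalent. A small hypothetical Java helper that builds that string from a list of values might look like this; it assumes single-word values, since values containing spaces would need quoting.

```java
import java.util.List;

// Hypothetical helper: every value in `needRequired` must also have some value in
// `requiredField`; the `exempt` value matches on its own.
final class CompanyTypeFilter {

    static String build(String field, List<String> needRequired, String exempt, String requiredField) {
        return "(" + field + ":(" + String.join(" OR ", needRequired) + ")"
                + " AND " + requiredField + ":[* TO *])"
                + " OR " + field + ":" + exempt;
    }

    public static void main(String[] args) {
        System.out.println(build("category", List.of("Retail", "Travels", "Construction"),
                "Accommodation", "specialities"));
        // prints (category:(Retail OR Travels OR Construction) AND specialities:[* TO *]) OR category:Accommodation
    }
}
```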

Example 2: Suppose we want to match the search term only in certain facet fields rather than in the whole document.

	def searchTerm = "lon"  // hypothetical user input
	def solrQuery = new SolrQuery("*:*")
	solrQuery.addField("location")
	solrQuery.addField("city")
	solrQuery.addField("country")
	String filterQuery = "location:${searchTerm}* OR city:${searchTerm}* OR country:${searchTerm}*"
	solrQuery.addFilterQuery(filterQuery)

In the filter query, I append * to the search term. This turns each clause into a prefix query, so it matches field values containing words that start with the search term. Without the *, only exact term matches are returned.
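User input such as "new york" or "c++" would break the prefix clauses in Example 2, because whitespace and characters like + and : are query syntax. Below is a minimal Java sketch of a hypothetical helper that escapes those characters before appending *; the escaped set is modeled on SolrJ's ClientUtils.escapeQueryChars, which you would normally call directly. Whether an escaped multi-word prefix actually matches depends on how the field is analyzed; for tokenized text fields you would usually split the input instead.

```java
import java.util.StringJoiner;

// Hypothetical helper for Example 2: escape query syntax in user input, then build
// one prefix clause per field, joined with OR.
final class PrefixFilter {

    // Characters treated as query syntax (modeled on SolrJ's ClientUtils.escapeQueryChars).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    static String escape(String input) {
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) sb.append('\\');
            sb.append(c);
        }
        return sb.toString();
    }

    static String build(String term, String... fields) {
        StringJoiner clauses = new StringJoiner(" OR ");
        for (String field : fields) {
            clauses.add(field + ":" + escape(term) + "*");  // trailing * makes it a prefix query
        }
        return clauses.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("lon", "location", "city", "country"));
        // prints location:lon* OR city:lon* OR country:lon*
    }
}
```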

Example 3: How to get all the values of a particular facet.

	SolrQuery solrQuery = new SolrQuery("*:*")
	solrQuery.add("wt", "json")
	solrQuery.add("json.nl", "map")
	solrQuery.setRows(0)          // no documents, only facet counts
	solrQuery.setFacet(true)
	solrQuery.setFacetLimit(-1)   // facet.limit defaults to 100; -1 returns every value
	solrQuery.addFacetField("company_type")

From this query you get a JSON object with the company_type facet as the key and its facet values (with their counts) as the value.

Existence (and non-existence) queries:

This is not a new syntax case but an application of range queries. Suppose you want to match all documents that have an indexed value in a field. This query finds all documents that have something in the Employee_Name field:

Employee_Name:[* TO *]

This can be negated to find documents that do not have a value for Employee_Name:

-Employee_Name:[* TO *]

 

Hope it helps!

Using facets in Solr with SolrJ in Java or Grails applications

October 08, 2012 by Jasgeet Singh

I am using faceting, one of the features of Solr (a Java search server), in my current project and would like to share my experience and learnings.

Faceting is Solr's killer feature. It's a must-have for most search implementations, especially those with structured data. Faceting enhances search results with aggregated information over all the documents found in the search, not the entire index. Faceted search is also called faceted browsing, faceted navigation, guided navigation and sometimes parametric search.

If we revisit the comparison of search technology to databases, faceting is more or less analogous to SQL's GROUP BY on a column with count(*). In Solr, however, facet processing is performed after an existing search, as part of a single request-response, with both the primary search results and the faceting results coming back together. For example, after the user selects some facet values (shown as a breadcrumb trail), we apply them as filters using the following snippet:

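def solrQuery = new SolrQuery("*:*")
solrQuery.setFilterQueries('subcat:"Accommodation Provider" AND tags:tours')

Note: *:* is the query that matches all documents. The phrase "Accommodation Provider" is quoted so that both words are matched in the subcat field.

fq - (filter query) parameter, which restricts the results to documents that also match the given query.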


Faceting Field Values

Field value faceting is the most common type of faceting. The following are the request parameters for using it.

• facet.field: You must set this parameter to a field's name in order to facet on that field. Repeat this parameter for each field to be faceted on.

solrQuery.addFacetField("subcat")
solrQuery.addFacetField("tags")
solrQuery.addFacetField("languages")

Note: the parameter is a field name from the schema (IndexSchema).

• facet.sort: Set it to count to sort the facet values by descending totals, or to index to sort them lexicographically, as if you sorted on the field. If facet.limit is greater than zero (it's 100 by default), Solr picks count as the default; otherwise index is chosen.

import org.apache.solr.common.params.FacetParams

solrQuery.setFacetSort(FacetParams.FACET_SORT_INDEX)  // lexicographic order
solrQuery.setFacetSort(FacetParams.FACET_SORT_COUNT)  // descending counts

• facet.limit: It defaults to 100. It limits the number of facet values returned in the search results for a field. As these are usually displayed to the user, it doesn't make sense to have a large number of them in the response. If you need all of them, disable the limit with a value of -1.

solrQuery.setFacetLimit(10)

• facet.mincount: This defaults to 0. It filters out facet values that have facet counts less than this. This is applied before limit and offset so that paging works as expected. It is common to set this to 1 since 0 is almost useless.

solrQuery.setFacetMinCount(1)

• facet.missing: It defaults to false. Set it to true (or on) for the facet value listing to include an unnamed count at the end: the number of matched documents that have no indexed terms in this field.

solrQuery.setFacetMissing(true)
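Following the GROUP BY analogy, here is a toy in-memory Java model of field-value faceting over the documents a search matched. It applies facet.mincount before facet.limit, supports the two facet.sort orders, and appends the facet.missing count. It is only a sketch of the semantics described above, not how Solr computes facets; the class name and the representation of each document as a list of field values are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory model of field-value faceting; not how Solr computes facets.
// Each matched document is represented by the list of values it has in the faceted field.
final class FacetCounter {

    static LinkedHashMap<String, Integer> facet(List<List<String>> docs, String sort,
                                                int limit, int mincount, boolean missing) {
        Map<String, Integer> counts = new TreeMap<>();       // TreeMap keeps index (lexicographic) order
        int missingCount = 0;
        for (List<String> values : docs) {
            if (values.isEmpty()) missingCount++;
            for (String v : new HashSet<>(values)) {         // count each document once per value
                counts.merge(v, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= mincount) entries.add(e);     // facet.mincount, applied before the limit
        }
        if (sort.equals("count")) {                           // facet.sort=count: descending totals
            // stable sort: ties stay in index order
            entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        }
        LinkedHashMap<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries) {
            if (limit >= 0 && result.size() >= limit) break;  // facet.limit; -1 means unlimited
            result.put(e.getKey(), e.getValue());
        }
        if (missing) result.put(null, missingCount);          // facet.missing: unnamed count at the end
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> tags = List.of(
                List.of("tours", "hotels"), List.of("tours"), List.of(), List.of("tours", "hotels", "spa"));
        System.out.println(facet(tags, "count", 2, 1, false));  // prints {tours=3, hotels=2}
        System.out.println(facet(tags, "index", -1, 1, true));  // prints {hotels=2, spa=1, tours=3, null=1}
    }
}
```

The null key plays the role of the unnamed facet.missing entry: here it counts the one matched document that has no tags at all.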

Resources

Read more about Solr on the Solr Home Page.

Read Solr’s Documentation

Read the Solr tutorial

Solr Integration with Grails using SolrJ API

September 27, 2012 by Varun Sharma

Solr - Apache Solr is a fast, open-source Java search server. Solr lets you easily create search engines that search websites, databases and files.

SolrJ is a Java client for accessing Solr. It offers a Java interface to add, update and query the Solr index.

 

Configuring Solr - The 2 most important files in the Solr configuration are:

1. schema.xml

2. solrconfig.xml

 

schema.xml

schema.xml is usually the first file you configure when setting up a new Solr installation. The schema declares:

1. what kinds of fields there are

2. which field should be used as the unique/primary key

3. which fields are required

4. how to index and search each field

For example, if you create a facet field named "product", you declare it in schema.xml with a <field> element, using the attributes listed below.

Valid attributes for fields:

1. name: mandatory - the name for the field

2. type: mandatory - the name of a previously defined type from the <types> section

3. indexed: true if this field should be indexed (searchable or sortable)

4. stored: true if this field should be retrievable

5. multiValued: true if this field may contain multiple values per document

6. omitNorms: (expert) set to true to omit the norms associated with this field (this disables length normalization and index-time boosting for the field, and saves some memory). Only full-text fields or fields that need an index-time boost need norms.

7. termVectors: (default false) set to true to store the term vector for a given field. When using MoreLikeThis, fields used for similarity should have their term vectors stored for best performance.

8. termPositions: Store position information with the term vector. This will increase storage costs.

9. termOffsets: Store offset information with the term vector. This will increase storage costs.

10. default: a value that should be used if no value is specified when adding a document.

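11. copyField: strictly speaking not an attribute of <field> but a separate <copyField> element with source and dest attributes. copyField commands copy one field to another at the time a document is added to the index. They are used either to index the same field differently or to add multiple fields to the same field for easier/faster searching.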

solrconfig.xml

solrconfig.xml is usually the second file you configure when setting up a new Solr installation, after schema.xml. The more commonly-used elements in solrconfig.xml are:

1. data directory location

2. cache parameters

3. request handlers

4. search components

 

SolrJ API

1. CommonsHttpSolrServer – Return a Solr server

CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

CommonsHttpSolrServer allows setting connection properties.

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

String url = "http://localhost:8983/solr";
CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);
server.setSoTimeout(1000);                    // socket read timeout (ms)
server.setConnectionTimeout(100);             // connection timeout (ms)
server.setDefaultMaxConnectionsPerHost(100);
server.setMaxTotalConnections(100);
server.setFollowRedirects(false);             // defaults to false
server.setAllowCompression(true);
server.setMaxRetries(1);                      // defaults to 0. > 1 not recommended.
server.setParser(new XMLResponseParser());    // binary parser is used by default

 

2. SolrInputDocument – Adding data to Solr (add documents)

import org.apache.solr.common.SolrInputDocument;

SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("id", "id1");
doc1.addField("name", "doc1");
doc1.addField("price", "10");
server.add(doc1);

server.commit();  // performs an explicit commit, causing pending documents to be committed for indexing

 

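3. SolrQuery - Solr search

import org.apache.solr.client.solrj.SolrQuery
import org.apache.solr.client.solrj.SolrRequest
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer

String url = "http://localhost:8983/solr"
CommonsHttpSolrServer server = new CommonsHttpSolrServer(url)
def solrQuery = new SolrQuery(params.searchTerms)   // params: the Grails request parameters
solrQuery.setRows(10)
solrQuery.setStart(params.start as int)
solrQuery.setFacet(true)
solrQuery.setFilterQueries(filterQuery)   // filterQuery: a plain filter string built earlier, e.g. 'tags:tours'
solrQuery.setSortField("order_group", SolrQuery.ORDER.asc)
def results = server.query(solrQuery, SolrRequest.METHOD.POST).getResults()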

 

4. Delete every document from the Solr index

server.deleteByQuery("*:*")
server.commit()   // the deletions become visible after a commit

Hope it helps!

Varun Sharma varun.sharma@oodlestechnologies.com

http://oodlestechnologies.com/
