Apache Solr vs ElasticSearch

The Feature Smackdown


API

Feature Solr 4.7.0 ElasticSearch 1.0
Format XML,CSV,JSON JSON
HTTP REST API
Binary API SolrJ TransportClient, Thrift (through a plugin)
JMX support ES specific stats are exposed through the REST API
Client libraries PHP, Ruby, Perl, Scala, Python, .NET, Javascript PHP, Ruby, Perl, Scala, Python, .NET, Javascript, Erlang, Clojure
3rd-party product integration (open-source) Drupal, Magento, Django, ColdFusion, Wordpress, OpenCMS, Plone, Typo3, ez Publish, Symfony2, Riak (via Yokozuna) Drupal, Django, Symfony2, Wordpress, CouchBase
3rd-party product integration (commercial) DataStax Enterprise Search, Cloudera Search, Hortonworks Data Platform, MapR SearchBlox, Hortonworks Data Platform, MapR
Output JSON, XML, PHP, Python, Ruby, CSV, Velocity, XSLT, native Java JSON, XML/HTML (via plugin)

Indexing

Feature Solr 4.7.0 ElasticSearch 1.0
Data Import DataImportHandler - JDBC, CSV, XML, Tika, URL, Flat File Rivers modules - ActiveMQ, Amazon SQS, CouchDB, Dropbox, DynamoDB, FileSystem, Git, GitHub, Hazelcast, JDBC, JMS, Kafka, LDAP, MongoDB, neo4j, OAI, RabbitMQ, Redis, RSS, Sofa, Solr, St9, Subversion, Twitter, Wikipedia
ID field for updates and deduplication
DocValues
Partial Doc Updates with stored fields with _source field
Custom Analyzers and Tokenizers
Per-field analyzer chain
Per-doc/query analyzer chain
Synonyms Supports Solr and Wordnet synonym format
Multiple indexes
Near-Realtime Search/Indexing
Complex documents Flat document structure. No native support for nesting documents
Schemaless 4.4+
Multiple document types per schema One set of fields per schema, one schema per core
Online schema changes Schema change requires restart. Workaround possible using MultiCore. Only backward-compatible changes.
Apache Tika integration
Dynamic fields
Field copying via multi-fields
Hash-based deduplication

Searching

Feature Solr 4.7.0 ElasticSearch 1.0
Lucene Query parsing
Structured Query DSL Need to programmatically create queries if going beyond Lucene query syntax.
Span queries via SOLR-2703
Spatial search
Multi-point spatial search
Faceting The way top N facets work now is by getting the top N from each shard, and merging the results. This can give incorrect counts when num shards > 1.
Advanced Faceting blog post
Pivot Facets
More Like This
Boosting by functions
Boosting using scripting languages
Push Queries JIRA issue Percolation. Distributed percolation supported in 1.0
Field collapsing/Results grouping possibly 1.0+ link
Spellcheck Suggest API
Autocomplete Added in 0.90.3 here
Query elevation workaround
Joins It's not supported in distributed search. See LUCENE-3759. via has_children and top_children queries
Resultset Scrolling New to 4.7.0 via scan search type
Filter queries also supports filtering by native scripts
Filter execution order local params and cache property _cache and _cache_key property
Alternative QueryParsers DisMax, eDisMax query_string, dis_max, match, multi_match etc
Negative boosting but awkward. Involves positively boosting the inverse set of negatively-boosted documents.
Search across multiple indexes it can search across multiple compatible collections
Result highlighting
Custom Similarity
Searcher warming on index reload Warmers API

Customizability

Feature Solr 4.7.0 ElasticSearch 1.0
Pluggable API endpoints
Pluggable search workflow via SearchComponents
Pluggable update workflow
Pluggable Analyzers/Tokenizers
Pluggable Field Types
Pluggable Function queries
Pluggable scoring scripts
Pluggable hashing
Pluggable webapps site plugin
Automated plugin installation Installable from GitHub, maven, sonatype or elasticsearch.org

Distributed

Feature Solr 4.7.0 ElasticSearch 1.0
Self-contained cluster Depends on separate ZooKeeper server Only ElasticSearch nodes
Automatic node discovery ZooKeeper internal Zen Discovery or ZooKeeper
Partition tolerance The partition without a ZooKeeper quorum will stop accepting indexing requests or cluster state changes, while the partition with a quorum continues to function. Partitioned clusters can diverge unless discovery.zen.minimum_master_nodes set to at least N/2+1, where N is the size of the cluster. If configured correctly, the partition without a quorum will stop operating, while the other continues to work. See this
Automatic failover If all nodes storing a shard and its replicas fail, client requests will fail, unless requests are made with the shards.tolerant=true parameter, in which case partial results are retuned from the available shards.
Automatic leader election
Shard replication
Sharding
Automatic shard rebalancing it can be machine, rack, availability zone, and/or data center aware. Arbitrary tags can be assigned to nodes and it can be configured to not assign the same shard and its replicates on a node with the same tags.
Change # of shards Shards can be added (when using implicit routing) or split (when using compositeId). Cannot be lowered. Replicas can be increased anytime. each index has 5 shards by default. Number of primary shards cannot be changed once the index is created. Replicas can be increased anytime.
Relocate shards and replicas can be done by creating a shard replicate on the desired node and then removing the shard from the source node can move shards and replicas to any node in the cluster on demand
Control shard routing shards or _route_ parameter routing parameter
Consistency Indexing requests are synchronous with replication. A indexing request won't return until all replicas respond. No check for downed replicas. They will catch up when they recover. When new replicas are added, they won't start accepting and responding to requests until they are finished replicating the index. Replication between nodes is synchronous by default, thus ES is consistent by default, but it can be set to asynchronous on a per document indexing basis. Index writes can be configured to fail is there are not sufficient active shard replicas. The default is quorum, but all or one are also available.

Misc

Feature Solr 4.7.0 ElasticSearch 1.0
Web Admin interface bundled with Solr via site plugins: elasticsearch-head, bigdesk, kopf, elasticsearch-HQ, Hammer
Hosting providers WebSolr, Searchify, Hosted-Solr, IndexDepot, OpenSolr, gotosolr bonsai.io, Indexisto, qbox.io, IndexDepot


Thoughts...

As a number of folks point out in the discussion below, feature comparisons are inherently shallow and only go so far. I think they serve a purpose, but shouldn't be taken to be the last word on these 2 fantastic search products.

If you're running a smallish site and need search features without fancy bells-and-whistles, I think you'll be very happy with either Solr or ElasticSearch.

I've found ElasticSearch to be friendlier to teams which are used to REST APIs, JSON etc and don't have a Java background. If you're planning a large installation that requires running distributed search instances, I suspect you're also going to be happier with ElasticSearch.

As Matt Weber points out below, ElasticSearch was built to be distributed from the ground up, not tacked on as an 'afterthought' like it was with Solr. This is totally evident when examining the design and architecture of the 2 products, and also when browsing the source code.



Resources



Contribute

If you see any mistakes, or would like to append to the information on this webpage, you can clone the GitHub repo for this site with:

git clone https://github.com/superkelvint/solr-vs-elasticsearch

and submit a pull request.



Popular books related to Search

 
 
 
 
 
 
 
 


Discussion

blog comments powered by Disqus