ELK Stack (Part 1): Elasticsearch

The ELK stack is a very powerful open-source analytics platform that combines Elasticsearch, Logstash and Kibana. All three are developed by a company named Elastic. Elastic started off as an enterprise search platform vendor, but the success and wide adoption of Elasticsearch has helped it become a full-service analytics software company. Although Elasticsearch, Logstash and Kibana complement each other, they are separate projects.

Analytics at its core is search coupled with excellent visualization. Lucene, with all its search goodness, was brought together with the distributed-computing goodness that is Elasticsearch. Logstash came onto the scene to normalize all kinds of time-series data. Throw in Kibana’s ultra-simple visualization tool, and voila, you have ELK, a complete analytics tool.

To top it all, ELK is very versatile. It can be used as a standalone application or integrated with other existing applications to get the most current data. For example, Kibana often goes together with Solr/Lucene.
Let us now look at each of these, one by one.


Elasticsearch provides scalable, near real-time search and supports multitenancy. Elasticsearch is distributed, which means that indices can be divided into shards and each shard can have zero or more replicas. Each node hosts one or more shards and acts as a coordinator to delegate operations to the correct shard(s). Rebalancing and routing are done automatically.
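To make the shard and routing idea concrete, here is a minimal sketch in Python. It assumes the commonly documented scheme in which a document lands on a primary shard chosen roughly as hash(routing key) modulo the number of primary shards. The function names pick_shard and index_settings are hypothetical, and CRC32 is only a stand-in for the hash Elasticsearch actually uses.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (by default the document ID) to a primary shard number."""
    if num_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # Assumption: CRC32 stands in for Elasticsearch's own hash function.
    # Only the "hash, then modulo" shape is the point of this sketch.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def index_settings(primary_shards: int, replicas_per_shard: int) -> dict:
    """Build a settings body describing how an index is split and copied."""
    if replicas_per_shard < 0:
        raise ValueError("a shard can have zero or more replicas, not fewer")
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas_per_shard,
        }
    }


if __name__ == "__main__":
    # The same key always maps to the same shard, so any coordinating node
    # can work out where a document lives without a lookup table.
    for doc_id in ("order-1", "order-2", "order-3"):
        print(doc_id, "-> shard", pick_shard(doc_id, 5))
    print(index_settings(primary_shards=5, replicas_per_shard=1))
```

Because the shard is derived from the key alone, the number of primary shards is normally fixed when an index is created, while the number of replicas can be changed later.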

Elasticsearch uses Lucene and tries to make all of its features available through its JSON and Java APIs. It supports faceting and percolating; percolation is useful for sending notifications when new documents match registered queries.
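Percolation turns the usual search around: queries are stored first, and each incoming document is checked against them. The toy below illustrates only that idea in plain Python. ToyPercolator and term_in are hypothetical names, and none of this is the Elasticsearch percolator API.

```python
from typing import Callable, Dict, List

Document = Dict[str, str]
Query = Callable[[Document], bool]


class ToyPercolator:
    """Stores queries up front, then matches each new document against them."""

    def __init__(self) -> None:
        self._queries: Dict[str, Query] = {}

    def register(self, query_id: str, query: Query) -> None:
        # Registering a query is the "index time" step of percolation.
        self._queries[query_id] = query

    def percolate(self, doc: Document) -> List[str]:
        # Return the IDs of every stored query that the document satisfies.
        return sorted(qid for qid, query in self._queries.items() if query(doc))


def term_in(field: str, term: str) -> Query:
    """Build a query that matches when `term` is a whole word in `field`."""

    def matches(doc: Document) -> bool:
        return term.lower() in doc.get(field, "").lower().split()

    return matches


if __name__ == "__main__":
    percolator = ToyPercolator()
    percolator.register("disk-alerts", term_in("message", "disk"))
    percolator.register("login-alerts", term_in("message", "login"))
    # Each matching query ID could drive a notification to whoever registered it.
    print(percolator.percolate({"message": "Disk usage above threshold"}))
```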

Another feature is called “gateway” and handles the long-term persistence of the index; for example, an index can be recovered from the gateway in the event of a server crash. Elasticsearch supports real-time GET requests, which makes it suitable as a NoSQL datastore.

So basically Elasticsearch is a juggernaut solution for your data extraction problems. Going a bit deeper into the benefits explained above:

Real-time data and real-time analytics: The ELK stack gives you the power of real-time data insights, with the ability to perform super-fast data extractions from virtually all structured or unstructured data sources.

Scalable, high-availability, multi-tenant: With Elasticsearch, you can start small and expand along with your business growth, when you are ready. It is built to scale horizontally out of the box. As you need more capacity, simply add another node and let the cluster reorganize itself to accommodate and exploit the extra hardware. Elasticsearch clusters are resilient, since they automatically detect failed nodes and remove them from the cluster. You can set up multiple indices and query each of them independently or in combination.
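Querying indices independently or in combination comes down to which index names appear in the request path: the search endpoint accepts a comma-separated list of indices. The sketch below only builds that path and sends nothing. The helper name search_path and the index names are made up for illustration.

```python
from typing import Iterable


def search_path(indices: Iterable[str]) -> str:
    """Build the URL path of a search request over the given indices."""
    names = [name.strip() for name in indices if name.strip()]
    if not names:
        # With no index named, the search runs across all indices.
        return "/_search"
    # Several indices are queried together by joining their names with commas.
    return "/" + ",".join(names) + "/_search"


if __name__ == "__main__":
    print(search_path(["tenant-a-logs"]))                   # one index on its own
    print(search_path(["tenant-a-logs", "tenant-b-logs"]))  # two indices combined
```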

Full-text search: Elasticsearch uses Lucene to provide the most powerful full-text search capabilities available in any open-source product. The search features come with multi-language support, an extensive query language, geolocation support, context-sensitive suggestions, and autocompletion.
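As a small taste of the query language, a full-text search is expressed as a JSON body. The sketch below builds a basic match query in Python and stops short of sending it. The helper name match_query and the field and text values are placeholders chosen for illustration.

```python
import json


def match_query(field: str, text: str, size: int = 10) -> str:
    """Serialize a basic full-text `match` query as a JSON request body."""
    body = {
        "query": {"match": {field: text}},  # the text is analyzed, then matched
        "size": size,                       # maximum number of hits to return
    }
    return json.dumps(body)


if __name__ == "__main__":
    # This string would be sent as the body of a search request.
    print(match_query("message", "connection timeout", size=5))
```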

Notable users of Elasticsearch include Wikimedia, Facebook, StumbleUpon, Mozilla, Amadeus IT Group, Quora and Foursquare.