Topics Discussed:
- Introduction
- Fluent family
- Advantage
- Logstash configuration
- Q&A
- Splunk and Kibana use case
Introduction
What is observability?
It is the ability to observe a system.
Example: Dashboard of a Car
If we take an IT system, we can observe it using dashboards and monitoring tools. The three pillars of observability are logs, metrics and traces. The ELK stack is one such tool for observability. Others are Datadog, Splunk, Sumo Logic, etc.
With the emergence of public clouds, we need to collect logs in a centralized system and observe the system from there.
The ELK stack is used for this purpose.
E = Elasticsearch
L = Logstash
K = Kibana
Logstash = ingest, extract, transform, enrich, and output logs to storage.
The Beats family (Filebeat, Winlogbeat) is used to send different types of data to the collection platform.
Elastic Agent is a single agent for shipping logs and comes from Elastic.
Fluent family:
Fluent Bit: a simple log shipping agent.
Fluentd: the other shipper from the Fluent family, with more features.
Elasticsearch: Can be used as a search and storage tool by shipping logs to it. It is one of the most widely used distributed search engines.
Elastic SIEM: Security information and event management. Detect, investigate, and respond to evolving threats. Harness any data source at cloud scale.
Elasticsearch is built on Apache Lucene. Apache Solr is another search engine based on Apache Lucene. Elasticsearch was formerly released under the Apache 2.0 license.
Advantage:
Commodity hardware can be used to build large clusters.
Kibana: The data visualization component in the stack. It started as a tool to view logs; the latest versions have better capabilities. It can provide caching, and good visualizations can be created with it.
Anomaly detection: Automated time series anomaly detection. For a fixed situation, threshold-based alerting works smoothly, but for a system with dynamic traffic, manually set thresholds are difficult to maintain. In an Elasticsearch-based system we can train the system to learn lower and upper bounds; deviations beyond them are flagged as anomalies and can raise alerts.
Downloads: elastic.co/downloads
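As a rough illustration of the difference between a fixed threshold and learned bounds, the sketch below flags points that fall outside a band computed from a trailing window (mean plus or minus k standard deviations). This is not Elastic's anomaly detection algorithm, only a simple stand-in for the idea; the function name `rolling_anomalies`, the parameters `window`, `k` and `min_band`, and the sample traffic series are all assumptions made for the example.

```python
from statistics import mean, pstdev

def rolling_anomalies(values, window=5, k=3.0, min_band=1.0):
    """Return indices of points outside mean +/- k*stdev of the previous `window` points."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        # Never use a band narrower than min_band, so a flat history
        # does not cause tiny changes to be flagged
        band = max(k * sigma, min_band)
        if abs(values[i] - mu) > band:
            anomalies.append(i)
    return anomalies

# Hypothetical requests-per-minute series: traffic grows slowly, with one spike
traffic = [100, 102, 101, 103, 104, 106, 105, 300, 108, 109]

# A fixed threshold of 105 starts firing as normal traffic grows
fixed = [i for i, v in enumerate(traffic) if v > 105]
print(fixed)                       # [5, 7, 8, 9]

# The rolling band adapts to the trend and flags only the spike
print(rolling_anomalies(traffic))  # [7]
```

The fixed threshold keeps alerting once normal traffic drifts above it, while the band follows the recent history and reports only the spike.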
Logstash configuration:
Three parts: input, filter and output. Input and output are mandatory. Filter handles transformation and enrichment.
Try to perform type conversion in Logstash itself to avoid complexities in Kibana visualizations.
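The three stages, and the advice to convert types before the data reaches Kibana, can be modelled in a few lines of Python. This is only a sketch of the idea and not Logstash configuration syntax; the function names (`read_input`, `apply_filter`, `write_output`) and the sample log format are hypothetical.

```python
def read_input(lines):
    """Input stage: turn each raw line into an event (a dict)."""
    for line in lines:
        yield {"message": line.rstrip("\n")}

def apply_filter(events):
    """Filter stage: extract fields, convert types, and enrich the event."""
    for event in events:
        # Hypothetical format: "<method> <path> <status> <duration_ms>"
        method, path, status, duration = event["message"].split()
        event.update(method=method, path=path)
        # Type conversion happens here, so storage receives numbers, not strings
        event["status"] = int(status)
        event["duration_ms"] = float(duration)
        # Enrichment: add a derived field
        event["is_error"] = event["status"] >= 500
        yield event

def write_output(events, sink):
    """Output stage: deliver events to a destination (a list stands in for storage)."""
    for event in events:
        sink.append(event)

raw = ["GET /health 200 3.5", "POST /orders 503 120.0"]
stored = []
write_output(apply_filter(read_input(raw)), stored)
print(stored[1]["status"] + 1)   # 504: arithmetic works because status is an int
```

If `status` were left as a string, numeric aggregations and range queries on it would need workarounds later, which is the complexity the note above warns about.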
Test syntax: logstash -f 'filename' -t
To run Logstash: logstash -f 'conf file'
Why is port 5601 used for Kibana? If we write it in reverse as 1065, it reads as "logs".
Elasticsearch has an inverted index, and that is the reason for its fast searching capability.
Indices whose names start with . are system indices.
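To make the inverted index idea concrete, here is a minimal sketch: instead of scanning every document for a word, we build a map from each term to the documents that contain it, so a search becomes a lookup plus a set intersection. Real engines built on Lucene add text analysis, scoring and compact on-disk structures that are not shown here; the function names and sample documents are made up for the example.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Look up each term's posting set and intersect them
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)

docs = {
    1: "connection timeout on node a",
    2: "disk full on node b",
    3: "connection reset by peer",
}
index = build_inverted_index(docs)
print(sorted(search(index, "connection")))   # [1, 3]
print(sorted(search(index, "on node")))      # [1, 2]
```

The cost of a query depends on the number of query terms and the size of their posting sets, not on the total number of documents scanned.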
Q&A
Why are Beats, Logstash and Kafka used together as a stack? Beats are lighter shippers than Logstash. If the logs are only in JSON, then Beats are the best option. If a system has a spike, it will create ingestion backlogs. To avoid this we can use a queuing system: send logs from the shipper to a Kafka stream and then consume from there into Elasticsearch.
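The effect of putting a queue between the shipper and Elasticsearch can be sketched with a small simulation. A `deque` stands in for the Kafka stream here and no Kafka client API is used; the function name `simulate`, the tick model, and the load numbers are assumptions for illustration only.

```python
from collections import deque

def simulate(arrivals_per_tick, consume_rate):
    """Simulate a buffer between a shipper and an indexer.

    arrivals_per_tick: events produced by the shipper in each tick
    consume_rate: maximum events the indexer can take per tick
    Returns the backlog size after each tick.
    """
    buffer = deque()
    backlog = []
    next_id = 0
    for arrivals in arrivals_per_tick:
        # The shipper writes to the buffer and never waits on the indexer
        for _ in range(arrivals):
            buffer.append(next_id)
            next_id += 1
        # The indexer drains at its own pace
        for _ in range(min(consume_rate, len(buffer))):
            buffer.popleft()
        backlog.append(len(buffer))
    return backlog

# Hypothetical load: steady 2 events per tick with one spike of 10
print(simulate([2, 2, 10, 2, 2, 0, 0], consume_rate=4))
# [0, 0, 6, 4, 2, 0, 0]
```

The spike shows up as a temporary backlog in the buffer that drains over the following ticks, rather than as pressure on the shipper or dropped events.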
For CSV files with a dynamic set of headers, we can update the configuration, but Logstash then needs a restart.
A two-way sync between Apache Solr and Cassandra is in place; how can we replace this stack with a two-way sync between Elasticsearch and Cassandra? We need connectors or default plugins; if they are not available, we need a streaming solution.
How do we observe an Elasticsearch-based observability platform?
For shippers it is challenging to achieve HA. Persistent queues in Logstash help retain logs in case it fails to send them to the output.
The Fleet UI shows agent status from within Kibana itself.
Beats configuration lets us resume reading logs from the point where it stopped, using Beats' own db. This does not work for metric agents, since they collect live data.
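The resume-from-breaking-point behaviour can be sketched as a reader that stores its byte offset in a small state file and seeks to it on the next run. This is not how Beats stores its state internally; the function name `read_new_lines`, the JSON state format, and the file names are hypothetical and only illustrate the checkpointing idea.

```python
import json
import os
import tempfile

def read_new_lines(log_path, state_path):
    """Return lines appended since the last call, tracking the offset in a state file."""
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            offset = json.load(f)["offset"]
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Only consume complete lines; a partial trailing line is left for the next call
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    with open(state_path, "w") as f:
        json.dump({"offset": offset + end}, f)
    return lines

with tempfile.TemporaryDirectory() as d:
    log = os.path.join(d, "app.log")
    state = os.path.join(d, "state.json")
    with open(log, "w") as f:
        f.write("line 1\nline 2\n")
    print(read_new_lines(log, state))   # ['line 1', 'line 2']
    # Simulate a restart: new data arrives, and the reader resumes from the saved offset
    with open(log, "a") as f:
        f.write("line 3\n")
    print(read_new_lines(log, state))   # ['line 3']
```

A metrics collector has no file to seek back into, which is why the same approach does not recover samples missed while the agent was down.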
Kibana Watcher plugins: ElastAlert is one plugin for this purpose.
Alternatively, Grafana is an option. We can use Elasticsearch as a data source in Grafana and configure alerts based on metrics. Kibana alerts can also be used for the same purpose.
Splunk and Kibana use case:
Splunk is an alternative to the ELK stack. Its query language needs training since it is proprietary.
RBAC is a free feature of Elasticsearch. Always enable and use authentication from the initial setup to avoid exposing data to the public.
ELK and EFK brief comparison: EFK is mainly used for Kubernetes clusters. F = Fluentd