An ELK Stack Alternative - Building a Log Analysis System with OpenSearch

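In modern microservice architectures and cloud-native environments, log analysis and monitoring have become essential.

After Elastic changed the license of the widely used ELK Stack (Elasticsearch, Logstash, Kibana), many companies started looking for alternatives.

OpenSearch, which is also offered as a managed service through Amazon OpenSearch Service (formerly Amazon Elasticsearch Service), has drawn attention as a capable and cost-effective alternative.

This post walks through how to build a log analysis system on OpenSearch and shares lessons from running it in production.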

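What Is OpenSearch?

OpenSearch is an open-source search and analytics engine distributed under the Apache 2.0 license.

It was forked from Elasticsearch 7.10.2, and its development was started and has been led primarily by Amazon Web Services.

OpenSearch can be used for real-time application monitoring, log analysis, website search, and similar workloads.

It stays highly compatible with the Elasticsearch 7.10 APIs while remaining fully open source, which has made it a trusted choice for many developers.

Its main features include distributed processing, a RESTful API, support for many data types, and near-real-time search and analytics.

OpenSearch Dashboards adds data visualization and dashboard building on top of the engine.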

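Comparing the ELK Stack and OpenSearch

License and Cost Structure

Since version 7.11, Elasticsearch and Kibana have been distributed under the Elastic License 2.0 and the Server Side Public License (SSPL), with AGPLv3 added as a further option in 2024. These licenses can restrict some commercial uses, in particular offering the software as a hosted service.

OpenSearch, by contrast, uses the Apache 2.0 license, which permits commercial use, modification, and redistribution.

This matters most to cloud service providers and SaaS companies.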

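Performance and Features

OpenSearch is based on Elasticsearch 7.10.2 but continues to receive its own performance improvements.

Indexing performance is generally comparable to Elasticsearch and can be better in some workloads; results depend on version and workload, so benchmark with your own data.

For large-volume log data in particular, the improvements in memory usage and query response time are noticeable.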

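Ecosystem and Community

OpenSearch is a relatively young project, but it is growing quickly thanks to strong backing from AWS and an active open-source community.

Many plugins that existed for Elasticsearch have compatible OpenSearch versions or ports, and new plugins continue to be developed.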

Building an OpenSearch Cluster

Hardware Requirements

An OpenSearch cluster needs adequately sized hardware.

At minimum, 4GB of RAM, 2 CPU cores, and 20GB of disk space are recommended.

For production, 32GB of RAM, 8 CPU cores, and SSD storage are a better baseline.

Installing OpenSearch with Docker

The simplest way to get started is Docker.

The following docker-compose.yml is an example of a basic OpenSearch cluster:

version: '3'
services:
  opensearch-node1:
    image: opensearchproject/opensearch:latest
    container_name: opensearch-node1
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node1
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_cluster_manager_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
      # Required by the demo security setup on 2.12+ images; export it in your shell or .env file
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=${OPENSEARCH_INITIAL_ADMIN_PASSWORD}
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - opensearch-data1:/usr/share/opensearch/data
    ports:
      - 9200:9200
      - 9300:9300
    networks:
      - opensearch-net

  opensearch-node2:
    image: opensearchproject/opensearch:latest
    container_name: opensearch-node2
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node2
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_cluster_manager_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
      # Must match the value used on opensearch-node1
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=${OPENSEARCH_INITIAL_ADMIN_PASSWORD}
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - opensearch-data2:/usr/share/opensearch/data
    networks:
      - opensearch-net

  opensearch-dashboards:
    image: opensearchproject/opensearch-dashboards:latest
    container_name: opensearch-dashboards
    ports:
      - 5601:5601
    environment:
      OPENSEARCH_HOSTS: '["https://opensearch-node1:9200","https://opensearch-node2:9200"]'
    networks:
      - opensearch-net

volumes:
  opensearch-data1:
  opensearch-data2:

networks:
  opensearch-net:

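This configuration runs a two-node OpenSearch cluster together with OpenSearch Dashboards. Images from version 2.12 onward do not start the demo security configuration unless OPENSEARCH_INITIAL_ADMIN_PASSWORD is set to a strong password, so the admin:admin credentials used in the examples in this post apply only to older images; substitute your own password where needed. On Linux hosts, vm.max_map_count must also be at least 262144.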

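Checking Cluster Status

To verify that the cluster started correctly, run the following command (the demo security configuration serves HTTPS, so the scheme must be given explicitly):

curl -X GET "https://localhost:9200/_cluster/health?pretty" -u 'admin:admin' -k

A healthy cluster responds with something like this: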

{
  "cluster_name" : "opensearch-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 1,
  "active_shards" : 2,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}
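
When the health check runs from a script rather than by hand, the response can be reduced to a one-line verdict. The following is a minimal sketch: the function name `summarize_health` and the verdict wording are illustrative assumptions, and the input is simply the parsed JSON body of `_cluster/health`.

```python
def summarize_health(health: dict) -> str:
    """Reduce a parsed _cluster/health response to a one-line verdict."""
    status = health.get("status", "unknown")
    nodes = health.get("number_of_nodes", 0)
    unassigned = health.get("unassigned_shards", 0)

    if status == "green":
        verdict = "OK"
    elif status == "yellow":
        # Yellow means all primaries are assigned but some replicas are not.
        verdict = "DEGRADED"
    else:
        verdict = "CRITICAL"
    return f"{verdict}: status={status}, nodes={nodes}, unassigned_shards={unassigned}"


# Sample taken from the response shown above.
sample = {
    "cluster_name": "opensearch-cluster",
    "status": "green",
    "number_of_nodes": 2,
    "unassigned_shards": 0,
}
print(summarize_health(sample))
```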

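Configuring the Log Collection Pipeline

Collecting Logs with Fluent Bit

Several log collectors work with OpenSearch.

Fluent Bit is a lightweight log collector that ships with a native opensearch output plugin.

The following is an example Fluent Bit configuration file: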

[SERVICE]
    Flush         1
    Log_Level     info
    Daemon        off
    Parsers_File  parsers.conf

[INPUT]
    Name              tail
    Path              /var/log/app/*.log
    Parser            json
    Tag               app.logs
    Refresh_Interval  5

# This filter only enriches records tagged kube.*; the app.logs input above is not affected
[FILTER]
    Name                kubernetes
    Match               kube.*
    Kube_URL            https://kubernetes.default.svc:443
    Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token

[OUTPUT]
    Name            opensearch
    Match           *
    Host            opensearch-node1
    Port            9200
    Index           fluent-bit-logs
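    # OpenSearch 2.x no longer accepts mapping types, so suppress the type name instead of sending _doc
    Suppress_Type_Name On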
    HTTP_User       admin
    HTTP_Passwd     admin
    tls             On
    tls.verify      Off
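
The tail input with `Parser json` expects the application to write one JSON object per line. The sketch below shows one way an application could produce such lines with Python's standard `logging` module. The class name `JsonLineFormatter` and the chosen field names are assumptions for illustration; align them with whatever mapping your indices use.

```python
import json
import logging
from datetime import datetime, timezone


class JsonLineFormatter(logging.Formatter):
    """Format each log record as a single-line JSON document."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        # Use the record's creation time, expressed in UTC with millisecond precision.
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
        doc = {
            "@timestamp": ts.isoformat(timespec="milliseconds"),
            "level": record.levelname,
            "message": record.getMessage(),
            "service": self.service,
            "logger": record.name,
            "thread": record.threadName,
        }
        return json.dumps(doc, ensure_ascii=False)


# Usage: attach the formatter to a handler (a FileHandler in a real deployment).
logger = logging.getLogger("demo")
handler = logging.StreamHandler()
handler.setFormatter(JsonLineFormatter(service="user-service"))
logger.addHandler(handler)
logger.error("payment failed for order %s", 42)
```

In a real deployment the handler would write to a file under the path that the tail input watches.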

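Logstash Compatibility

If you already run Logstash, check its compatibility with OpenSearch before migrating.

With the opensearch output plugin for Logstash (installed separately as logstash-output-opensearch), existing pipelines can be pointed at OpenSearch with few changes.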

output {
  opensearch {
    hosts => ["https://opensearch-node1:9200", "https://opensearch-node2:9200"]
    user => "admin"
    password => "admin"
    index => "logstash-logs-%{+YYYY.MM.dd}"
    ssl => true
    ssl_certificate_verification => false
  }
}
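
The `%{+YYYY.MM.dd}` suffix makes Logstash write to one index per day, named from each event's timestamp. The following sketch reproduces that naming in Python and checks the result against wildcard patterns. The helper name `daily_index_name` is hypothetical, and `fnmatchcase` is used here only as an approximation of `*` wildcard matching.

```python
from datetime import date
from fnmatch import fnmatchcase


def daily_index_name(prefix: str, day: date) -> str:
    """Mimic the Logstash pattern <prefix>-%{+YYYY.MM.dd} for a given day."""
    return f"{prefix}-{day:%Y.%m.%d}"


name = daily_index_name("logstash-logs", date(2024, 1, 5))
print(name)                                  # logstash-logs-2024.01.05
print(fnmatchcase(name, "logstash-logs-*"))  # True
print(fnmatchcase(name, "app-logs-*"))       # False
```

The last line is a reminder that an index template or role written for `app-logs-*` does not apply to indices created under a different prefix.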

Index Templates and Mappings

Creating an Index Template

A well-designed index template is important for efficient log analysis.

The following is an example index template for application logs:

{
  "index_patterns": ["app-logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 2,
      "number_of_replicas": 1,
      "index.refresh_interval": "5s",
      "index.max_result_window": 50000
    },
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date",
          "format": "strict_date_optional_time||epoch_millis"
        },
        "level": {
          "type": "keyword"
        },
        "message": {
          "type": "text",
          "analyzer": "standard"
        },
        "service": {
          "type": "keyword"
        },
        "host": {
          "type": "keyword"
        },
        "thread": {
          "type": "keyword"
        },
        "logger": {
          "type": "keyword"
        }
      }
    }
  },
  "priority": 200,
  "composed_of": []
}

To register this template in OpenSearch, save it as app-logs-template.json and run the following command:

curl -X PUT "https://localhost:9200/_index_template/app-logs-template" \
  -H "Content-Type: application/json" \
  -u 'admin:admin' -k \
  -d @app-logs-template.json

Controlling Dynamic Mapping

OpenSearch supports dynamic mapping by default, but explicit mapping definitions are recommended in production.

{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "@timestamp": {"type": "date"},
      "level": {"type": "keyword"},
      "message": {"type": "text"}
    }
  }
}

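Setting dynamic to "strict" makes OpenSearch reject any document that contains a field not defined in the mapping, which protects data quality at the cost of failed writes when the log schema changes.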
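
Because a strict mapping rejects the whole document, it can help to detect unmapped fields on the producer side before indexing. The sketch below is a simplified client-side check under stated assumptions: the helper name `unmapped_fields` is hypothetical, and it compares top-level field names only, not nested objects or types.

```python
from typing import List

# Top-level properties from the strict mapping shown above.
MAPPED_PROPERTIES = {
    "@timestamp": {"type": "date"},
    "level": {"type": "keyword"},
    "message": {"type": "text"},
}


def unmapped_fields(doc: dict, properties: dict) -> List[str]:
    """Return top-level fields of doc that the mapping does not define."""
    return sorted(key for key in doc if key not in properties)


good = {"@timestamp": "2024-01-05T10:00:00Z", "level": "INFO", "message": "started"}
bad = {**good, "user_id": 17, "trace": "abc"}

print(unmapped_fields(good, MAPPED_PROPERTIES))  # []
print(unmapped_fields(bad, MAPPED_PROPERTIES))   # ['trace', 'user_id']
```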

Visualizing with OpenSearch Dashboards

Basic Dashboard Setup

OpenSearch Dashboards provides an interface similar to Kibana.

The steps for building a basic log analysis dashboard are:

Create an index pattern: define one with the app-logs-* pattern.
Explore the log data in the Discover tab.
Create charts in the Visualize tab.
Combine several visualizations in the Dashboard tab.

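Real-Time Log Monitoring

The following visualizations are useful for real-time log monitoring:

Distribution by log level: a pie chart showing the share of ERROR, WARN, and INFO logs
Log volume over time: a histogram for analyzing log patterns by time of day
Distribution by service: a bar chart comparing log volume across microservices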
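
Each of the three visualizations above corresponds to an aggregation. As a rough sketch, the request body below computes all three in one search. The function name `dashboard_aggregations` and the bucket names are assumptions, and the field names follow the `app-logs-*` template from earlier in this post.

```python
import json


def dashboard_aggregations(interval: str = "1h") -> dict:
    """Build a search body with one aggregation per visualization."""
    return {
        "size": 0,  # only aggregation results are needed, not the hits
        "aggs": {
            # Pie chart: share of each log level
            "by_level": {"terms": {"field": "level", "size": 10}},
            # Histogram: log volume per time bucket
            "over_time": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": interval}
            },
            # Bar chart: log volume per microservice
            "by_service": {"terms": {"field": "service", "size": 20}},
        },
    }


print(json.dumps(dashboard_aggregations("30m"), indent=2))
```

The printed body can be sent to the `_search` endpoint of the `app-logs-*` indices to inspect the same numbers the charts display.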

Configuring Alerts

The Alerting feature in OpenSearch Dashboards can send notifications when specific conditions are met.

For example, you can be notified through Slack or email when more than 100 ERROR logs occur within 5 minutes.

{
  "name": "High Error Rate Alert",
  "type": "monitor",
  "monitor_type": "query_level_monitor",
  "enabled": true,
  "schedule": {
    "period": {
      "interval": 5,
      "unit": "MINUTES"
    }
  },
  "inputs": [{
    "search": {
      "indices": ["app-logs-*"],
      "query": {
        "size": 0,
        "query": {
          "bool": {
            "filter": [{
              "range": {
                "@timestamp": {
                  "from": "{{period_end}}||-5m",
                  "to": "{{period_end}}",
                  "include_lower": true,
                  "include_upper": true
                }
              }
            }, {
              "term": {
                "level": "ERROR"
              }
            }]
          }
        }
      }
    }
  }],
  "triggers": [{
    "name": "High error rate trigger",
    "severity": "1",
    "condition": {
      "script": {
        "source": "ctx.results[0].hits.total.value > 100"
      }
    },
    "actions": [{
      "name": "Send Slack notification",
      "destination_id": "slack-destination",
      "message_template": {
        "source": "High error rate detected: {{ctx.results.0.hits.total.value}} errors in the last 5 minutes"
      }
    }]
  }]
}
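
The trigger condition in this monitor is a strict comparison on the hit count of the first query result. The sketch below restates that condition in Python so the boundary behavior is easy to see; the function name `should_alert` is hypothetical and the dictionaries mimic only the part of the search response that the condition reads.

```python
def should_alert(results: list, threshold: int = 100) -> bool:
    """Mirror the trigger script: ctx.results[0].hits.total.value > 100."""
    return results[0]["hits"]["total"]["value"] > threshold


at_threshold = [{"hits": {"total": {"value": 100}}}]
above_threshold = [{"hits": {"total": {"value": 101}}}]

print(should_alert(at_threshold))     # False: exactly 100 errors does not fire
print(should_alert(above_threshold))  # True
```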

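Performance Optimization Strategies

Index Lifecycle Management

Managing large volumes of log data efficiently calls for Index State Management (ISM) policies.

The following ISM policy is an example of managing the lifecycle of log indices: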

{
  "policy": {
    "policy_id": "log_policy",
    "description": "Log index lifecycle policy",
    "last_updated_time": 1640995200000,
    "schema_version": 1,
    "error_notification": null,
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          {
            "rollover": {
              "min_size": "10gb",
              "min_doc_count": 10000000,
              "min_index_age": "1d"
            }
          }
        ],
        "transitions": [
          {
            "state_name": "warm",
            "conditions": {
              "min_index_age": "7d"
            }
          }
        ]
      },
      {
        "name": "warm",
        "actions": [
          {
            "replica_count": {
              "number_of_replicas": 0
            }
          }
        ],
        "transitions": [
          {
            "state_name": "cold",
            "conditions": {
              "min_index_age": "30d"
            }
          }
        ]
      },
      {
        "name": "cold",
        "actions": [],
        "transitions": [
          {
            "state_name": "delete",
            "conditions": {
              "min_index_age": "90d"
            }
          }
        ]
      },
      {
        "name": "delete",
        "actions": [
          {
            "delete": {}
          }
        ],
        "transitions": []
      }
    ]
  }
}

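This policy moves indices through hot → warm → cold → delete to reduce storage cost: the hot state rolls the index over when any of its conditions is met, the warm state drops replicas, and indices are deleted after 90 days. The cold state defines no actions in this example and only holds the index until deletion. Note that the rollover action needs a rollover alias on the managed index, and that running with zero replicas trades redundancy for storage savings.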
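
To make the age thresholds concrete, the sketch below walks the transitions of this policy for an index of a given age. It is a simplified model under stated assumptions: the names `TRANSITIONS` and `state_for_age` are hypothetical, only `min_index_age` conditions are considered, and real ISM evaluates policies periodically, so actual transitions lag slightly behind these thresholds.

```python
# (from_state, to_state, min_index_age in days), copied from the policy above.
TRANSITIONS = [
    ("hot", "warm", 7),
    ("warm", "cold", 30),
    ("cold", "delete", 90),
]


def state_for_age(age_days: float) -> str:
    """Return the state an index of the given age would eventually reach."""
    state = "hot"  # default_state of the policy
    for src, dst, min_age in TRANSITIONS:
        if state == src and age_days >= min_age:
            state = dst
    return state


for age in (0, 7, 29, 30, 89, 90):
    print(age, state_for_age(age))
```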

Shard and Replica Settings

The number of shards has a large effect on performance.

As a general guideline, keep each shard between 10GB and 50GB.

# Check shard sizes per index
curl -X GET "https://localhost:9200/_cat/shards?v&h=index,shard,prirep,store&s=store" -u 'admin:admin' -k
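
The 10-50GB guideline translates directly into a primary shard count once the expected index size is known. The following sketch shows the arithmetic; the function name `primary_shards_for` and the 30GB default target are assumptions chosen from the middle of that range.

```python
import math


def primary_shards_for(index_size_gb: float, target_shard_gb: float = 30.0) -> int:
    """Estimate the primary shard count that keeps shards near the target size."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))


print(primary_shards_for(100))      # 4 shards of about 25GB each
print(primary_shards_for(100, 50))  # 2 shards of 50GB each
print(primary_shards_for(5))        # 1: small indices still need one shard
```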

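Query Optimization

Consider the following points to improve the performance of log analysis queries:

Time range filtering: give every log query an appropriate time range.
Use keyword fields: use the keyword type for fields that need exact matching.
Optimize aggregations: avoid aggregating on high-cardinality fields.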

{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "@timestamp": {
              "gte": "now-1h",
              "lte": "now"
            }
          }
        },
        {
          "term": {
            "service": "user-service"
          }
        }
      ]
    }
  },
  "aggs": {
    "log_levels": {
      "terms": {
        "field": "level",
        "size": 10
      }
    }
  }
}

Security and Monitoring

User Authentication and Access Control

OpenSearch ships with the Security plugin by default.

Role-based access control (RBAC) allows fine-grained permission management.

# roles.yml
log_reader:
  cluster_permissions:
    - "cluster_composite_ops_ro"
  index_permissions:
    - index_patterns:
        - "app-logs-*"
      allowed_actions:
        - "indices_monitor"
        - "read"

log_admin:
  cluster_permissions:
    - "cluster_all"
  index_permissions:
    - index_patterns:
        - "app-logs-*"
      allowed_actions:
        - "indices_all"

SSL/TLS Settings

Always enable SSL/TLS in production.

OpenSearch includes self-signed demo certificates, but for real deployments use certificates issued by a trusted CA.

# opensearch.yml
plugins.security.ssl.transport.pemcert_filepath: node.pem
plugins.security.ssl.transport.pemkey_filepath: node-key.pem
plugins.security.ssl.transport.pemtrustedcas_filepath: root-ca.pem
plugins.security.ssl.http.enabled: true
plugins.security.ssl.http.pemcert_filepath: node.pem
plugins.security.ssl.http.pemkey_filepath: node-key.pem
plugins.security.ssl.http.pemtrustedcas_filepath: root-ca.pem

Audit Logging

Audit logging can be enabled to track security events.

plugins.security.audit.type: internal_opensearch
plugins.security.audit.config.disabled_rest_categories: NONE
plugins.security.audit.config.disabled_transport_categories: NONE

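Production Experience and Best Practices

Handling High Log Volume

This section shares best practices for a production environment that processes more than 100GB of log data per day.

Cluster layout:

Cluster manager (master) nodes: 3 (4GB RAM, 2 CPU)
Data nodes: 6 (32GB RAM, 8 CPU, 1TB SSD)
Ingest nodes: 2 (16GB RAM, 4 CPU)

Index settings:

Shards: 6 (equal to the number of data nodes)
Replicas: 1
Refresh interval: 30 seconds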
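
A quick capacity estimate helps check whether a layout like this one can hold the intended retention window. The sketch below is a back-of-the-envelope model under stated assumptions: both function names are hypothetical, `index_ratio` (on-disk index size relative to raw log size) defaults to 1.0 and varies in practice, and the 14-day retention in the usage example is an illustrative value, not a figure from this cluster.

```python
def required_storage_gb(daily_gb: float, retention_days: int,
                        replicas: int = 1, index_ratio: float = 1.0) -> float:
    """Estimate total disk needed for primaries plus replicas."""
    primary = daily_gb * retention_days * index_ratio
    return primary * (1 + replicas)


def usable_capacity_gb(data_nodes: int, disk_gb_per_node: float,
                       max_disk_usage: float = 0.85) -> float:
    """Estimate disk available before the chosen usage ceiling is reached."""
    return data_nodes * disk_gb_per_node * max_disk_usage


need = required_storage_gb(daily_gb=100, retention_days=14, replicas=1)
have = usable_capacity_gb(data_nodes=6, disk_gb_per_node=1000)
print(f"needed: {need:.0f} GB, usable: {have:.0f} GB, fits: {need <= have}")
```

Plugging in your own retention period and a measured index ratio shows how much of the data has to lose replicas or be deleted by the lifecycle policy to stay within capacity.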

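Failure Response and Recovery Strategy

To keep an OpenSearch cluster stable, put the following monitoring and recovery measures in place:

Cluster health monitoring: check cluster status every minute and alert immediately on yellow or red
Disk usage monitoring: warn when disk usage exceeds 85%, and automatically delete indices when it exceeds 90%
Snapshot backups: create a snapshot automatically every day and back it up to S3

# Register the snapshot repository (requires the repository-s3 plugin and AWS credentials on every node)
curl -X PUT "https://localhost:9200/_snapshot/s3_repository" \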
  -H "Content-Type: application/json" \
  -u 'admin:admin' -k \
  -d '{
    "type": "s3",
    "settings": {
      "bucket": "opensearch-backups",
      "region": "us-west-2",
      "base_path": "snapshots"
    }
  }'

# Create an automated snapshot policy (the 01:00 UTC schedule is an example value)
curl -X POST "https://localhost:9200/_plugins/_sm/policies/daily-snapshot" \
  -H "Content-Type: application/json" \
  -u 'admin:admin' -k \
  -d '{
    "description": "Daily snapshot policy",
    "creation": {
      "schedule": {
        "cron": {
          "expression": "0 1 * * *",
          "timezone": "UTC"
        }
      }
    },
    "snapshot_config": {
      "repository": "s3_repository",
      "indices": "app-logs-*",
      "date_format": "yyyy.MM.dd",
      "timezone": "UTC"
    }
  }'
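
The disk usage thresholds listed above can be expressed as a small decision function for a monitoring job. This is a minimal sketch: the function name `disk_action` and the returned action labels are hypothetical, and the actual deletion of old indices would be carried out by a separate job or by the lifecycle policy.

```python
def disk_action(used_percent: float) -> str:
    """Map disk usage to the action described in the monitoring strategy."""
    if used_percent > 90:
        return "delete_oldest_indices"
    if used_percent > 85:
        return "warn"
    return "ok"


for usage in (70.0, 85.0, 87.5, 90.0, 93.0):
    print(usage, disk_action(usage))
```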

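Cost Optimization Strategies

The following practices helped reduce OpenSearch operating costs:

Index lifecycle management: move old logs to cold storage with a higher compression ratio
Compression settings: LZ4 is already the default codec, so storage savings on the order of the 30% cited here generally require setting index.codec to best_compression, at some CPU cost
Remove unnecessary fields: drop fields that are not needed for analysis during log parsing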

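Migration Guide

Migrating from Elasticsearch to OpenSearch

Migrating from an existing Elasticsearch cluster to OpenSearch involves the following steps:

Check compatibility: Elasticsearch 7.10.2 and earlier can be migrated directly; check the OpenSearch upgrade documentation for the exact version paths
Create a snapshot: take a snapshot of the existing data
Build the OpenSearch cluster: set up the new OpenSearch cluster
Restore the data: restore the snapshot into the OpenSearch cluster
Switch application connections: change the endpoint used by client applications

# Create the Elasticsearch snapshot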
curl -X PUT "elasticsearch:9200/_snapshot/migration_repo/migration_snapshot" \
  -H "Content-Type: application/json" \
  -d '{
    "indices": "*",
    "include_global_state": false
  }'

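# Restore the snapshot on OpenSearch (security and Dashboards system indices are excluded to avoid conflicts)
curl -X POST "https://opensearch:9200/_snapshot/migration_repo/migration_snapshot/_restore" \
  -H "Content-Type: application/json" \
  -u 'admin:admin' -k \
  -d '{
    "indices": "*,-.opendistro*,-.kibana*",
    "include_global_state": false
  }'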
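
The compatibility rule from the first migration step can be checked programmatically before taking the snapshot. The sketch below encodes only the rule stated in this post (7.10.2 or earlier); the function name `can_restore_directly` is hypothetical, and the supported paths for a specific OpenSearch release should still be confirmed in its documentation.

```python
def can_restore_directly(es_version: str) -> bool:
    """Return True if the Elasticsearch version is 7.10.2 or earlier."""
    parts = tuple(int(p) for p in es_version.split("."))
    return parts <= (7, 10, 2)


for version in ("6.8.23", "7.9.3", "7.10.2", "7.11.0", "8.4.1"):
    print(version, can_restore_directly(version))
```

The version string itself is reported in the `version.number` field of the response from the cluster's root endpoint.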

Updating Client Libraries

Existing Elasticsearch clients need to be replaced with OpenSearch clients.

Java example:

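// Elasticsearch client (before)
// RestHighLevelClient client = new RestHighLevelClient(
//     RestClient.builder(new HttpHost("localhost", 9200, "http")));

// OpenSearch client (after), using the opensearch-java library
import org.apache.http.HttpHost;
import org.opensearch.client.RestClient;
import org.opensearch.client.json.jackson.JacksonJsonpMapper;
import org.opensearch.client.opensearch.OpenSearchClient;
import org.opensearch.client.transport.OpenSearchTransport;
import org.opensearch.client.transport.rest_client.RestClientTransport;

// Build the low-level REST client, wrap it in a transport, then create the typed client
RestClient restClient = RestClient.builder(new HttpHost("localhost", 9200, "https")).build();
OpenSearchTransport transport = new RestClientTransport(restClient, new JacksonJsonpMapper());
OpenSearchClient client = new OpenSearchClient(transport);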

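Python example:

# elasticsearch-py (before)
# from elasticsearch import Elasticsearch
# es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

# opensearch-py (after)
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{'host': 'localhost', 'port': 9200}],
    http_auth=('admin', 'admin'),  # replace with your own credentials
    use_ssl=True,                  # the Security plugin serves HTTPS
    verify_certs=False,            # acceptable only with the demo certificates
)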

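Conclusion

OpenSearch is a strong alternative to the ELK Stack that combines the freedom of an open-source license with the ability to build an enterprise-grade log analysis system.

Continued investment from Amazon and active community support keep improving its stability and performance.

For log analysis and APM (Application Performance Monitoring) in cloud-native environments in particular, OpenSearch offers a cost-effective and scalable solution.

The setup steps and optimization strategies in this post should help you build a stable and efficient log analysis system.

The OpenSearch ecosystem is expected to keep evolving, with continued work on areas such as machine-learning-based anomaly detection, advanced time-series analysis, and security features.

If you are considering modernizing your log analysis system, OpenSearch is well worth evaluating.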
