MongoDB Aggregation Pipeline로 N+1 문제 해결하기: $lookup과 $facet 활용

실제 경험과 인사이트를 AI와 함께 정리한 글 2025. 11. 3. 15:38

작성일: 2025-11-02
카테고리: MongoDB, Database, Performance, Aggregation
난이도: 중급

TL;DR

문제: 기존 코드가 3번 쿼리 + 메모리 조인으로 페이지네이션마다 반복 조회
해결: Aggregation Pipeline의 $facet + $lookup으로 1번 쿼리에 모든 것 처리
핵심: $facet + $lookup의 주요 장점은 네트워크 왕복 횟수 감소 (count + data + join을 1번에)
결과: 약 58% 성능 개선 예상 (이론적 계산), 쿼리 횟수 66% 감소 (3번 → 1번)
주의: $lookup은 서버 측 조인, $in + Map은 애플리케이션 조인 - 환경에 따라 선택

들어가며

imprun.dev는 "API 개발부터 AI 통합까지, 모든 것을 하나로 제공"하는 Kubernetes 기반 API 플랫폼입니다. CloudFunction Directory API를 개발하면서 N+1 문제와 맞닥뜨렸습니다.

우리가 마주한 질문:

❓ 페이지네이션할 때마다 gateway 정보를 다시 조회하는 게 맞나?
❓ Map으로 메모리 조인하는 게 정말 최선일까?
❓ MongoDB Aggregation Pipeline을 제대로 활용하고 있나?

검증 과정:

$in + Map 패턴 (기존)
- ✅ PostgreSQL에서 익숙한 패턴
- ❌ 페이지마다 3번 쿼리 (count + find + $in)
- ❌ 메모리 조인 코드 복잡
Aggregation Pipeline (개선) ← 최종 선택
- ✅ 1번 쿼리로 모든 것 처리
- ✅ $facet으로 count와 data 병렬 처리
- ✅ $lookup으로 서버 측 조인

결론:

✅ 쿼리 횟수 66% 감소 (3번 → 1번)
✅ 코드 30줄 단축 (105줄 → 75줄)
✅ 메모리 조인 제거로 코드 간결화

이 글은 imprun.dev 플랫폼 구축 경험을 바탕으로, MongoDB Aggregation Pipeline을 활용한 실전 N+1 해결 패턴을 공유합니다.

N+1 쿼리 문제란?

N+1 쿼리 문제는 관계형 데이터를 조회할 때 발생하는 전형적인 성능 이슈입니다.

전형적인 N+1 패턴

// ❌ N+1 문제 발생 코드
const functions = await db.find('CloudFunction').limit(20) // 1번 쿼리

for (const func of functions) {
  const gateway = await db.findOne('ApiGateway', {
    gatewayId: func.gatewayId
  }) // N번 쿼리!
  console.log(gateway.name)
}

결과: 20개 조회 시 21번 쿼리 (1 + 20)

graph TB
    App[Application]
    DB[(MongoDB)]

    App -->|"1. find CloudFunction (limit 20)"| DB
    App -->|"2. findOne ApiGateway (gatewayId=1)"| DB
    App -->|"3. findOne ApiGateway (gatewayId=2)"| DB
    App -->|"... 20번 반복 ..."| DB

    style App stroke:#2563eb,stroke-width:2px
    style DB stroke:#dc2626,stroke-width:3px

기존 코드의 문제점

imprun.dev의 Function Directory API 초기 구현을 분석해보겠습니다.

Before: $in + Map 패턴 (3번 쿼리)

파일: server/src/function/function.service.ts (리팩토링 전)

async getFunctionsDirectory(
  gatewayId?: string,
  search?: string,
  page: number = 1,
  pageSize: number = 20,
) {
  const db = SystemDatabase.db

  // Build query
  const query: any = {}
  if (gatewayId) {
    query.gatewayId = gatewayId
  }
  if (search) {
    query.$or = [
      { name: { $regex: search, $options: 'i' } },
      { desc: { $regex: search, $options: 'i' } },
    ]
  }

  // 🔴 1번 쿼리: Count
  const total = await db
    .collection('CloudFunction')
    .countDocuments(query)

  // Calculate pagination
  const skip = (page - 1) * pageSize

  // 🔴 2번 쿼리: Find functions
  const functions = await db
    .collection('CloudFunction')
    .find(query)
    .sort({ updatedAt: -1 })
    .skip(skip)
    .limit(pageSize)
    .toArray()

  // 🔴 3번 쿼리: Find gateways with $in
  const gatewayIds = [...new Set(functions.map((fn) => fn.gatewayId))]
  const gateways = await db
    .collection('ApiGateway')
    .find({ gatewayId: { $in: gatewayIds } })
    .toArray()

  // 🔴 메모리 조인: Map 생성
  const gatewayNameMap = new Map<string, string>()
  gateways.forEach((gw) => {
    gatewayNameMap.set(gw.gatewayId, gw.name)
  })

  // 🔴 메모리 조인: Map.get()
  const items = functions.map((fn) => ({
    _id: fn._id.toHexString(),
    gatewayId: fn.gatewayId,
    gatewayName: gatewayNameMap.get(fn.gatewayId) || 'Unknown Gateway',
    name: fn.name,
    // ... 기타 필드
  }))

  return { items, pagination: { page, pageSize, total } }
}

문제점 분석

graph LR
    subgraph "Page 1 요청"
        Q1["1️⃣ count"]
        Q2["2️⃣ find"]
        Q3["3️⃣ $in"]
        Q1 --> Q2 --> Q3
    end

    subgraph "Page 2 요청"
        Q4["1️⃣ count"]
        Q5["2️⃣ find"]
        Q6["3️⃣ $in (재조회!)"]
        Q4 --> Q5 --> Q6
    end

    style Q3 stroke:#dc2626,stroke-width:3px
    style Q6 stroke:#dc2626,stroke-width:3px

핵심 문제:

페이지마다 3번 쿼리 반복: count + find + $in을 매번 실행
같은 gateway 재조회: 페이지를 넘길 때마다 동일한 gateway 정보를 다시 조회
메모리 조인 복잡성: Map 생성 및 관리 코드

해결책: Aggregation Pipeline

MongoDB의 Aggregation Pipeline을 활용하면 1번의 쿼리로 모든 것을 처리할 수 있습니다.

After: $facet + $lookup (1번 쿼리)

파일: server/src/function/function.service.ts:897-1034 (리팩토링 후)

async getFunctionsDirectory(
  gatewayId?: string,
  search?: string,
  page: number = 1,
  pageSize: number = 20,
) {
  const db = SystemDatabase.db

  // Build match stage
  const matchStage: any = {}
  if (gatewayId) {
    matchStage.gatewayId = gatewayId
  }
  if (search) {
    matchStage.$or = [
      { name: { $regex: search, $options: 'i' } },
      { desc: { $regex: search, $options: 'i' } },
      { tags: { $regex: search, $options: 'i' } },
    ]
  }

  const skip = (page - 1) * pageSize

  // ✅ 1번 쿼리: Aggregation Pipeline
  const result = await db
    .collection('CloudFunction')
    .aggregate([
      // 1️⃣ 필터링
      { $match: matchStage },

      // 2️⃣ $facet으로 count와 data를 병렬 처리
      {
        $facet: {
          // Count pipeline
          metadata: [{ $count: 'total' }],

          // Data pipeline
          data: [
            { $sort: { updatedAt: -1 } },
            { $skip: skip },
            { $limit: pageSize },

            // 3️⃣ $lookup으로 ApiGateway 조인
            {
              $lookup: {
                from: 'ApiGateway',
                localField: 'gatewayId',
                foreignField: 'gatewayId',
                as: 'gateway',
              },
            },

            // 4️⃣ gateway name 추출
            {
              $addFields: {
                gatewayName: {
                  $ifNull: [
                    { $arrayElemAt: ['$gateway.name', 0] },
                    'Unknown Gateway'
                  ],
                },
              },
            },

            // 5️⃣ gateway 배열 제거 (불필요)
            { $project: { gateway: 0 } },
          ],
        },
      },
    ])
    .toArray()

  // Extract results from $facet
  const total = result[0]?.metadata[0]?.total || 0
  const functions = result[0]?.data || []

  // Transform to DTO
  const list = functions.map((fn: any) => ({
    _id: fn._id.toHexString(),
    gatewayId: fn.gatewayId,
    gatewayName: fn.gatewayName, // ✅ 이미 조인됨!
    name: fn.name,
    // ... 기타 필드
  }))

  return { list, total, page, pageSize }
}

개선 결과

graph TB
    AGG["1️⃣ aggregate (match + facet + lookup)"]
    R["결과: count + data (gateway name 포함)"]

    AGG --> R

    style AGG stroke:#16a34a,stroke-width:3px
    style R stroke:#16a34a,stroke-width:2px

개선 효과:

메트릭	Before	After	개선율
쿼리 횟수	3번	1번	66% 감소
코드 라인	105줄	75줄	30줄 단축
메모리 조인	Map 필요	불필요	제거
재조회 문제	있음	없음	해결

Aggregation Pipeline 핵심 개념

1. $facet: 병렬 파이프라인

$facet은 여러 파이프라인을 병렬로 실행합니다.

{
  $facet: {
    // Pipeline 1: Count
    metadata: [
      { $count: 'total' }
    ],

    // Pipeline 2: Data with pagination
    data: [
      { $sort: { updatedAt: -1 } },
      { $skip: 0 },
      { $limit: 20 },
      // ... 기타 stage
    ]
  }
}

장점:

✅ Count와 data 조회를 1번의 쿼리로 처리
✅ 각 파이프라인은 독립적으로 동작
✅ 네트워크 왕복 횟수 감소

2. $lookup: 서버 측 조인

$lookup은 다른 컬렉션과 조인합니다.

{
  $lookup: {
    from: 'ApiGateway',           // 조인할 컬렉션
    localField: 'gatewayId',      // 현재 문서의 필드
    foreignField: 'gatewayId',    // 조인 대상 필드
    as: 'gateway'                 // 결과 배열 이름
  }
}

결과:

{
  _id: ObjectId("..."),
  gatewayId: "gw-123",
  name: "myFunction",
  gateway: [                      // ← $lookup 결과
    {
      gatewayId: "gw-123",
      name: "My Gateway"
    }
  ]
}

3. $arrayElemAt + $addFields: 배열에서 값 추출

{
  $addFields: {
    gatewayName: {
      $ifNull: [
        { $arrayElemAt: ['$gateway.name', 0] },  // 배열 첫 번째 요소
        'Unknown Gateway'                         // 기본값
      ]
    }
  }
}

결과:

{
  _id: ObjectId("..."),
  gatewayId: "gw-123",
  name: "myFunction",
  gatewayName: "My Gateway",      // ← 추출된 값
  gateway: [...]                   // 원본 배열 유지
}

4. $project: 불필요한 필드 제거

{
  $project: {
    gateway: 0  // gateway 배열 제거
  }
}

Before vs After 상세 비교

코드 복잡도 비교

Before (105줄):

// 1. Count 쿼리
const total = await db.collection('CloudFunction').countDocuments(query)

// 2. Find 쿼리
const functions = await db
  .collection('CloudFunction')
  .find(query)
  .sort({ updatedAt: -1 })
  .skip(skip)
  .limit(pageSize)
  .toArray()

// 3. $in 쿼리
const gatewayIds = [...new Set(functions.map((fn) => fn.gatewayId))]
const gateways = await db
  .collection('ApiGateway')
  .find({ gatewayId: { $in: gatewayIds } })
  .toArray()

// 4. Map 생성
const gatewayNameMap = new Map<string, string>()
gateways.forEach((gw) => {
  gatewayNameMap.set(gw.gatewayId, gw.name)
})

// 5. 메모리 조인
const items = functions.map((fn) => ({
  gatewayName: gatewayNameMap.get(fn.gatewayId) || 'Unknown Gateway',
  // ...
}))

After (75줄):

// 1. Aggregation Pipeline (1번 쿼리)
const result = await db.collection('CloudFunction').aggregate([
  { $match: matchStage },
  {
    $facet: {
      metadata: [{ $count: 'total' }],
      data: [
        { $sort: { updatedAt: -1 } },
        { $skip: skip },
        { $limit: pageSize },
        { $lookup: { from: 'ApiGateway', ... } },
        { $addFields: { gatewayName: ... } },
        { $project: { gateway: 0 } },
      ],
    },
  },
]).toArray()

// 2. 결과 추출 (이미 조인됨!)
const total = result[0]?.metadata[0]?.total || 0
const functions = result[0]?.data || []

// 3. DTO 변환
const list = functions.map((fn: any) => ({
  gatewayName: fn.gatewayName, // ✅ 이미 있음!
  // ...
}))

실행 흐름 비교

Before (3번 왕복):

Client → MongoDB: countDocuments()
MongoDB → Client: { total: 142 }

Client → MongoDB: find().skip().limit()
MongoDB → Client: [{ gatewayId: "gw-1", ... }, ...]

Client → MongoDB: find({ gatewayId: { $in: [...] } })
MongoDB → Client: [{ gatewayId: "gw-1", name: "Gateway A" }, ...]

Client: Map 생성 + 메모리 조인

After (1번 왕복):

Client → MongoDB: aggregate([...])
MongoDB: $match → $facet [
  metadata: $count
  data: $sort → $skip → $limit → $lookup → $addFields → $project
]
MongoDB → Client: {
  metadata: [{ total: 142 }],
  data: [{ gatewayName: "Gateway A", ... }, ...]
}

Client: 결과 추출만

심화: $lookup vs $in + Map 성능 비교

MongoDB 서버에서 조인하는 것과 애플리케이션에서 Map으로 조인하는 것, 실제로 어느 것이 더 빠를까요?

핵심 차이점

$lookup (MongoDB 서버 조인):

✅ 1번 네트워크 왕복
⚠️ 서버 측에서 조인 처리 (내부 최적화 방식은 MongoDB에 위임)
📝 참고: 공식 문서에 따르면 $lookup + $unwind + $match는 optimizer가 최적화 가능

$in + Map (애플리케이션 조인):

❌ 2번 네트워크 왕복
✅ 고유한 값만 조회 (중복 제거 후 조회)
✅ 메모리 조인 O(1)

시나리오별 선택

$lookup + $facet 권장:

✅ 네트워크 latency가 큰 환경 (VPN, 멀티 리전)
✅ 페이지네이션 + count 동시 조회
✅ 관계가 균등하게 분산된 경우

$in + Map 권장:

✅ 중복이 매우 많은 관계 (N:1)
✅ 캐싱 가능한 데이터
✅ 네트워크가 빠른 환경 (같은 데이터센터)

imprun.dev의 경우

환경: Tailscale VPN (latency

~~50ms), 평균 3~~

5개 gateway에 20개 function 분산

이론적 계산 (네트워크 latency 기반):

기존 (3번 쿼리): ~168ms

count (55ms) + find (60ms) + $in (52ms) + 메모리 조인 (1ms) ≈ 168ms

개선 ($facet 1번 쿼리): ~71ms

aggregate [count + find + $lookup + addFields] ≈ 71ms

결과: 약 58% 개선 예상 🎉

⚠️ 참고: 위 수치는 네트워크 latency(~50ms)를 기반으로 한 이론적 계산입니다. 실제 성능은 데이터 분포, 인덱스, MongoDB 내부 최적화에 따라 달라질 수 있습니다.

★ Insight ─────────────────────────────────────
핵심: $facet + $lookup의 주요 장점은 네트워크 왕복 횟수 감소!

3번 왕복 → 1번 왕복으로 감소
VPN/멀티 리전 환경에서 네트워크 latency가 큰 경우 효과적
MongoDB 서버 내부에서 조인 처리 (추가 네트워크 비용 없음)
─────────────────────────────────────────────────

실전 적용 경험

imprun.dev에서 Function Directory API를 리팩토링한 경험을 공유합니다.

운영 환경

인프라:

MongoDB 7.0 (ReplicaSet 3대)
Kubernetes 3노드 클러스터 (ARM64, 4 cores, 24GB)
네트워크: Tailscale VPN

리팩토링 과정

기존 코드 문제 발견
- 사용자가 "N+1 같다"고 지적
- 페이지네이션마다 3번 쿼리 확인
Aggregation Pipeline 학습
- MongoDB 공식 문서 참고
- $facet + $lookup 패턴 연구
단계별 리팩토링
- $match 단계 작성
- $facet으로 count/data 분리
- $lookup으로 조인 추가
- $addFields로 필드 추출
테스트 및 검증
- 기존 API 응답과 동일한지 확인
- 쿼리 횟수 모니터링

운영 결과

개발 경험:

✅ 쿼리 횟수 66% 감소 (3번 → 1번)
✅ 코드 30줄 단축 (가독성 향상)
✅ Map 관리 코드 제거 (유지보수 편의성)

주의사항:

⚠️ gatewayId 필드에 인덱스 필수 ($lookup 성능)
⚠️ Aggregation Pipeline은 16MB 메모리 제한
⚠️ 복잡한 파이프라인은 explain()으로 성능 확인 필요

코드 작성 가이드

체크리스트

Aggregation Pipeline 최적화 체크리스트:

인덱스 확인: $match와 $lookup 필드에 인덱스 있는지 확인
$match 최상단 배치: 필터링을 최대한 앞에서 처리
$facet 활용: count와 data를 병렬 처리
$project로 필드 제거: 불필요한 필드는 제거하여 메모리 절약
explain() 실행: 쿼리 실행 계획 확인

템플릿 코드

/**
 * Aggregation Pipeline 템플릿: Pagination + Join
 */
async function getPaginatedWithJoin<T>(
  collection: string,
  joinCollection: string,
  joinField: string,
  query: any = {},
  page: number = 1,
  pageSize: number = 20,
) {
  const skip = (page - 1) * pageSize

  const result = await db.collection(collection).aggregate([
    // 1. Filter
    { $match: query },

    // 2. $facet: Count + Data
    {
      $facet: {
        metadata: [{ $count: 'total' }],
        data: [
          { $sort: { createdAt: -1 } },
          { $skip: skip },
          { $limit: pageSize },

          // 3. Join
          {
            $lookup: {
              from: joinCollection,
              localField: joinField,
              foreignField: joinField,
              as: 'joined',
            },
          },

          // 4. Extract joined field
          {
            $addFields: {
              joinedName: {
                $ifNull: [
                  { $arrayElemAt: ['$joined.name', 0] },
                  'Unknown'
                ],
              },
            },
          },

          // 5. Remove joined array
          { $project: { joined: 0 } },
        ],
      },
    },
  ]).toArray()

  const total = result[0]?.metadata[0]?.total || 0
  const data = result[0]?.data || []

  return {
    list: data,
    total,
    page,
    pageSize,
  }
}

// 사용 예시
const result = await getPaginatedWithJoin(
  'CloudFunction',
  'ApiGateway',
  'gatewayId',
  { name: /^dev\// },
  1,
  20,
)

인덱스 설정

// MongoDB 인덱스 생성
db.CloudFunction.createIndex({ gatewayId: 1 })
db.CloudFunction.createIndex({ updatedAt: -1 })
db.ApiGateway.createIndex({ gatewayId: 1 })

// 복합 인덱스 (필요 시)
db.CloudFunction.createIndex({
  gatewayId: 1,
  updatedAt: -1
})

성능 측정

// explain()으로 실행 계획 확인
const explain = await db.collection('CloudFunction')
  .aggregate([...])
  .explain('executionStats')

console.log('Execution time:', explain.executionStats.executionTimeMillis)
console.log('Documents examined:', explain.executionStats.totalDocsExamined)

// IXSCAN이 나오면 인덱스 사용 중 (✅ 좋음)
// COLLSCAN이 나오면 전체 스캔 (❌ 인덱스 추가 필요)

마무리

핵심 요약

MongoDB에서 N+1 문제를 해결하는 최적 패턴은 Aggregation Pipeline의 $facet + $lookup입니다.

$match: 필터링 (최상단 배치)
$facet: count와 data 병렬 처리
$lookup: 다른 컬렉션과 조인
$addFields: 조인 결과에서 필드 추출
$project: 불필요한 필드 제거

핵심 선택 기준

$facet + $lookup이 최적인 경우:

✅ 네트워크 latency가 큰 환경 (VPN, 멀티 리전)
✅ 페이지네이션 + count 동시 조회 필요
✅ 코드 간결성과 유지보수성 중요

$in + Map이 최적인 경우:

✅ 중복이 매우 많은 N:1 관계
✅ 캐싱 가능한 데이터 (자주 변경되지 않음)
✅ 네트워크가 빠른 환경 (같은 데이터센터)

실제로는: 네트워크 왕복 횟수가 성능에 가장 큰 영향!

실제 적용 결과

imprun.dev 환경 (Tailscale VPN):

✅ 약 58% 성능 개선 예상 (이론적 계산: 168ms → 71ms)
✅ 쿼리 횟수 66% 감소 (3번 → 1번)
✅ 코드 30줄 단축 (105줄 → 75줄)
✅ 메모리 조인 제거 (Map 관리 불필요)