NestJS · Kafka · ClickHouse · Data Analytics

Kafka + ClickHouse High-Performance Analytics — Building a Modular Data Platform with NestJS

RaytonX
3 min read

Overview

In this article, we’ll explore how to use the NestJS framework to build a well-structured data analytics platform. The system is composed of four main modules:

  1. Database Module (database) – Encapsulates connections to ClickHouse, Kafka, and RabbitMQ
  2. API Module (api) – Exposes endpoints for data ingestion and querying
  3. Scheduler Module (scheduler) – Handles scheduled data aggregation and analysis
  4. Consumer Module (consume) – Processes and writes asynchronous data from message queues

Module Breakdown

1. API Module

Responsible for exposing HTTP endpoints that allow:

  • Receiving user data ingestion requests
  • Submitting asynchronous analysis tasks
  • Querying analysis results
  • Tracking task statuses

Typically deployed as a gateway or main API service.
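To make the endpoint list above concrete, here is a minimal controller sketch. The route names, DTO shape, and responses are illustrative assumptions, not the article's actual code; in a real service each handler would delegate to injected services that talk to Kafka, RabbitMQ, and the task store.

```typescript
import { Body, Controller, Get, Param, Post } from '@nestjs/common';

// Hypothetical DTO for an ingestion request (illustrative fields).
interface IngestEventDto {
  userId: string;
  event: string;
  timestamp: number;
}

@Controller('analytics')
export class AnalyticsController {
  @Post('ingest')
  ingest(@Body() dto: IngestEventDto) {
    // In practice: publish the raw event to Kafka for async processing.
    return { accepted: true };
  }

  @Post('tasks')
  submitTask(@Body() payload: Record<string, unknown>) {
    // In practice: dispatch an analysis task via RabbitMQ and return its id.
    return { taskId: 'pending' };
  }

  @Get('tasks/:id')
  taskStatus(@Param('id') id: string) {
    // In practice: look up the task's status in the task store.
    return { id, status: 'queued' };
  }
}
```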

2. Database Module

Centralized wrapper for all middleware and database connections, including:

  • ClickHouse – Columnar storage for analytical workloads
  • Kafka – High-throughput message queue for data pipelines
  • RabbitMQ – Lightweight message broker for task dispatch
  • MongoDB (optional) – Used for storing configs or raw data

This module is designed as a global service so other modules can reuse it efficiently with shared connection pools, logging, and retry mechanisms.
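A global module of this kind might be wired roughly as follows. The provider tokens and factory bodies are placeholders; real clients would come from libraries such as @clickhouse/client, kafkajs, and amqplib, configured with shared pooling, logging, and retry settings.

```typescript
import { Global, Module } from '@nestjs/common';

// Placeholder factories — substitute real client construction here.
const clickhouseProvider = {
  provide: 'CLICKHOUSE',
  useFactory: () => ({ /* create and return a ClickHouse client */ }),
};

const kafkaProvider = {
  provide: 'KAFKA',
  useFactory: () => ({ /* create and return a Kafka client */ }),
};

@Global() // Exported providers become injectable app-wide without re-importing.
@Module({
  providers: [clickhouseProvider, kafkaProvider],
  exports: ['CLICKHOUSE', 'KAFKA'],
})
export class DatabaseModule {}
```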

3. Scheduler Module

Handles recurring data tasks such as:

  • Aggregating hourly user activity
  • Cleaning up logs and generating backups daily

This module typically runs in the background and is scheduled using cron-like intervals. To avoid conflicts in multi-instance deployments, distributed locking (e.g., via Redis) is recommended.

4. Consumer Module

Dedicated to processing messages from Kafka or RabbitMQ:

  • Parses Kafka ingestion messages and writes formatted data into ClickHouse
  • Handles RabbitMQ tasks such as triggering computational jobs

This service is also designed to run persistently in the background and supports horizontal scaling for high-throughput scenarios.

Service Configuration & Execution

To ensure clear separation of concerns and independent scalability, each module is given its own entry point:

  • API Service: src/main.ts
  • Scheduler Service: src/main.scheduler.ts
  • Consumer Service: src/main.consumer.ts

These services can be configured with separate startup scripts to support both development and production environments.
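A background entry point like `main.consumer.ts` can bootstrap Nest without an HTTP server by using an application context. The module name below is an assumption; the key point is `createApplicationContext`, which starts the DI container (and any microservice/consumer providers) without binding a port:

```typescript
// src/main.consumer.ts — sketch of a headless worker entry point.
import { NestFactory } from '@nestjs/core';
import { ConsumerModule } from './consumer.module'; // assumed module name

async function bootstrap() {
  // No HTTP listener: ideal for long-running consumers and schedulers.
  const app = await NestFactory.createApplicationContext(ConsumerModule);
  app.enableShutdownHooks(); // graceful shutdown on SIGTERM/SIGINT
}
bootstrap();
```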

package.json Example

// package.json
{
  "scripts": {
    "build": "nest build",
    "start:dev": "nest start --watch",
    "start:debug": "nest start --debug --watch",
    "start:prod": "cross-env NODE_ENV=production node dist/src/main",
    "start:scheduler:dev": "nest start --entryFile main.scheduler --watch",
    "start:scheduler:prod": "cross-env NODE_ENV=production node dist/src/main.scheduler",
    "start:consumer:dev": "nest start --entryFile main.consumer --watch",
    "start:consumer:prod": "cross-env NODE_ENV=production node dist/src/main.consumer",
    "lint": "eslint \"{src,apps,libs,test}/**/*.ts\" --fix"
  }
}

Running Services

# Start API service
pnpm start:dev

# Start scheduler service
pnpm start:scheduler:dev

# Start consumer service
pnpm start:consumer:dev

Best Practices & Deployment Notes

  • Background services (scheduler & consumers) should not expose HTTP interfaces and should be run as long-lived processes.
  • Use distributed locks (e.g., Redis) to ensure only one scheduler instance runs tasks when horizontally scaled.
  • API and consumer modules can scale horizontally, making them resilient under high load.

Conclusion

By leveraging NestJS’s modular architecture, combined with Kafka, ClickHouse, and RabbitMQ, you can build a robust, scalable, and maintainable data analytics platform.