NestJS · Kafka · ClickHouse · Data Analytics

Kafka + ClickHouse High-Performance Analytics — Building a Modular Data Platform with NestJS

RaytonX
3 min read

Overview

In this article, we’ll explore how to use the NestJS framework to build a well-structured data analytics platform. The system is composed of four main modules:

  1. Database Module (database) – Encapsulates connections to ClickHouse, Kafka, and RabbitMQ
  2. API Module (api) – Exposes endpoints for data ingestion and querying
  3. Scheduler Module (scheduler) – Handles scheduled data aggregation and analysis
  4. Consumer Module (consume) – Processes and writes asynchronous data from message queues

Module Breakdown

1. API Module

Responsible for exposing HTTP endpoints that allow:

  • Receiving user data ingestion requests
  • Submitting asynchronous analysis tasks
  • Querying analysis results
  • Tracking task statuses

Typically deployed as a gateway or main API service.
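To make the endpoint list above concrete, here is a minimal controller sketch. The route names, DTO shape, and responses are illustrative assumptions, not the article's actual code; in a real service each handler would delegate to injected services that talk to Kafka, RabbitMQ, and the task store.

```typescript
import { Body, Controller, Get, Param, Post } from '@nestjs/common';

// Hypothetical DTO for an ingestion request (illustrative fields).
interface IngestEventDto {
  userId: string;
  event: string;
  timestamp: number;
}

@Controller('analytics')
export class AnalyticsController {
  @Post('ingest')
  ingest(@Body() dto: IngestEventDto) {
    // In practice: publish the raw event to Kafka for async processing.
    return { accepted: true };
  }

  @Post('tasks')
  submitTask(@Body() payload: Record<string, unknown>) {
    // In practice: dispatch an analysis task via RabbitMQ and return its id.
    return { taskId: 'pending' };
  }

  @Get('tasks/:id')
  taskStatus(@Param('id') id: string) {
    // In practice: look up the task's status in the task store.
    return { id, status: 'queued' };
  }
}
```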

2. Database Module

Centralized wrapper for all middleware and database connections, including:

  • ClickHouse – Columnar storage for analytical workloads
  • Kafka – High-throughput message queue for data pipelines
  • RabbitMQ – Lightweight message broker for task dispatch
  • MongoDB (optional) – Used for storing configs or raw data

This module is designed as a global service so other modules can reuse it efficiently with shared connection pools, logging, and retry mechanisms.
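A global module of this kind might be wired roughly as follows. The provider tokens and factory bodies are placeholders; real clients would come from libraries such as @clickhouse/client, kafkajs, and amqplib, configured with shared pooling, logging, and retry settings.

```typescript
import { Global, Module } from '@nestjs/common';

// Placeholder factories — substitute real client construction here.
const clickhouseProvider = {
  provide: 'CLICKHOUSE',
  useFactory: () => ({ /* create and return a ClickHouse client */ }),
};

const kafkaProvider = {
  provide: 'KAFKA',
  useFactory: () => ({ /* create and return a Kafka client */ }),
};

@Global() // Exported providers become injectable app-wide without re-importing.
@Module({
  providers: [clickhouseProvider, kafkaProvider],
  exports: ['CLICKHOUSE', 'KAFKA'],
})
export class DatabaseModule {}
```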

3. Scheduler Module

Handles recurring data tasks such as:

  • Aggregating hourly user activity
  • Cleaning up logs and generating backups daily

This module typically runs in the background and is scheduled using cron-like intervals. To avoid conflicts in multi-instance deployments, distributed locking (e.g., via Redis) is recommended.

4. Consumer Module

Dedicated to processing messages from Kafka or RabbitMQ:

  • Parses Kafka ingestion messages and writes formatted data into ClickHouse
  • Handles RabbitMQ tasks such as triggering computational jobs

This service is also designed to run persistently in the background and supports horizontal scaling for high-throughput scenarios.

Service Configuration & Execution

To ensure clear separation of concerns and independent scalability, each module is given its own entry point:

  • API Service: src/main.ts
  • Scheduler Service: src/main.scheduler.ts
  • Consumer Service: src/main.consumer.ts

These services can be configured with separate startup scripts to support both development and production environments.
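A background entry point like `main.consumer.ts` can bootstrap Nest without an HTTP server by using an application context. The module name below is an assumption; the key point is `createApplicationContext`, which starts the DI container (and any microservice/consumer providers) without binding a port:

```typescript
// src/main.consumer.ts — sketch of a headless worker entry point.
import { NestFactory } from '@nestjs/core';
import { ConsumerModule } from './consumer.module'; // assumed module name

async function bootstrap() {
  // No HTTP listener: ideal for long-running consumers and schedulers.
  const app = await NestFactory.createApplicationContext(ConsumerModule);
  app.enableShutdownHooks(); // graceful shutdown on SIGTERM/SIGINT
}
bootstrap();
```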

package.json Example

// package.json
{
  "scripts": {
    "build": "nest build",
    "start:dev": "nest start --watch",
    "start:debug": "nest start --debug --watch",
    "start:prod": "cross-env NODE_ENV=production node dist/src/main",
    "start:scheduler:dev": "nest start --entryFile main.scheduler --watch",
    "start:scheduler:prod": "cross-env NODE_ENV=production node dist/src/main.scheduler",
    "start:consumer:dev": "nest start --entryFile main.consumer --watch",
    "start:consumer:prod": "cross-env NODE_ENV=production node dist/src/main.consumer",
    "lint": "eslint \"{src,apps,libs,test}/**/*.ts\" --fix"
  }
}

Running Services

# Start API service
pnpm start:dev

# Start scheduler service
pnpm start:scheduler:dev

# Start consumer service
pnpm start:consumer:dev

Best Practices & Deployment Notes

  • Background services (scheduler & consumers) should not expose HTTP interfaces and should be run as long-lived processes.
  • Use distributed locks (e.g., Redis) to ensure only one scheduler instance runs tasks when horizontally scaled.
  • API and consumer modules can scale horizontally, making them resilient under high load.

Conclusion

By leveraging NestJS’s modular architecture, combined with Kafka, ClickHouse, and RabbitMQ, you can build a robust, scalable, and maintainable data analytics platform.