Overview
In this article, we’ll explore how to use the NestJS framework to build a well-structured data analytics platform. The system is composed of four main modules:
- Database Module (`database`) – Encapsulates connections to ClickHouse, Kafka, and RabbitMQ
- API Module (`api`) – Exposes endpoints for data ingestion and querying
- Scheduler Module (`scheduler`) – Handles scheduled data aggregation and analysis
- Consumer Module (`consume`) – Consumes asynchronous messages from the queues and writes the processed data
Module Breakdown
1. API Module
Responsible for exposing HTTP endpoints that allow:
- Receiving user data ingestion requests
- Submitting asynchronous analysis tasks
- Querying analysis results
- Tracking task statuses
Typically deployed as a gateway or main API service.
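As a rough sketch of those endpoints (controller name, route paths, and DTO shapes below are illustrative assumptions, not prescribed by the architecture):

```typescript
// Hypothetical controller sketch -- paths and payload shapes are assumptions.
import { Body, Controller, Get, Param, Post } from '@nestjs/common';

@Controller('analytics')
export class AnalyticsController {
  // Receive a user data ingestion request
  @Post('ingest')
  ingest(@Body() payload: Record<string, unknown>) {
    // ...push onto Kafka via the shared database module
    return { accepted: true };
  }

  // Submit an asynchronous analysis task
  @Post('tasks')
  submitTask(@Body() task: { type: string; params?: unknown }) {
    // ...publish to RabbitMQ and return a task id for later polling
    return { taskId: 'generated-id' };
  }

  // Query analysis results / track task status
  @Get('tasks/:id')
  getTask(@Param('id') id: string) {
    return { id, status: 'pending' };
  }
}
```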
2. Database Module
Centralized wrapper for all middleware and database connections, including:
- ClickHouse – Columnar storage for analytical workloads
- Kafka – High-throughput message queue for data pipelines
- RabbitMQ – Lightweight message broker for task dispatch
- MongoDB (optional) – Used for storing configs or raw data
This module is designed as a global service so other modules can reuse it efficiently with shared connection pools, logging, and retry mechanisms.
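In NestJS terms, "global service" maps naturally to a `@Global()` module that exports its connection providers once. A minimal sketch (the provider class names are assumptions):

```typescript
// Hypothetical global module sketch -- provider names are assumptions.
import { Global, Module } from '@nestjs/common';
import { ClickHouseService } from './clickhouse.service';
import { KafkaService } from './kafka.service';
import { RabbitMQService } from './rabbitmq.service';

@Global() // imported once in the root module; any module can then inject these
@Module({
  providers: [ClickHouseService, KafkaService, RabbitMQService],
  exports: [ClickHouseService, KafkaService, RabbitMQService],
})
export class DatabaseModule {}
```

Because the module is global, the connection pools behind each provider are created once per process and shared, which is where the pooling, logging, and retry logic can live.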
3. Scheduler Module
Handles recurring data tasks such as:
- Aggregating hourly user activity
- Daily cleanup of logs or generating backups
This module typically runs in the background and is scheduled using cron-like intervals. To avoid conflicts in multi-instance deployments, distributed locking (e.g., via Redis) is recommended.
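With `@nestjs/schedule`, the recurring tasks above could be sketched as follows; the service name is illustrative, and the distributed-lock step is only indicated in a comment since its implementation (e.g., a Redis `SET ... NX PX` key) is deployment-specific:

```typescript
// Hypothetical scheduler sketch using @nestjs/schedule.
import { Injectable, Logger } from '@nestjs/common';
import { Cron, CronExpression } from '@nestjs/schedule';

@Injectable()
export class AggregationService {
  private readonly logger = new Logger(AggregationService.name);

  @Cron(CronExpression.EVERY_HOUR)
  async aggregateHourlyActivity() {
    // In multi-instance deployments, acquire a distributed lock first
    // (e.g., SET lock:hourly-agg <instance-id> NX PX 60000 in Redis)
    // and skip the run if another instance holds it.
    this.logger.log('Aggregating hourly user activity...');
    // ...run the aggregation query against ClickHouse
  }

  @Cron('0 0 3 * * *') // 03:00 daily (six-field cron: seconds first)
  async dailyCleanup() {
    this.logger.log('Cleaning up logs / generating backups...');
  }
}
```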
4. Consumer Module
Dedicated to processing messages from Kafka or RabbitMQ:
- Parses Kafka ingestion messages and writes formatted data into ClickHouse
- Handles RabbitMQ tasks such as triggering computational jobs
This service is also designed to run persistently in the background and supports horizontal scaling for high-throughput scenarios.
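A consumer handler could be sketched with Nest's microservice transport layer; the topic name and message shape below are assumptions:

```typescript
// Hypothetical consumer sketch -- topic name and payload shape are assumptions.
import { Controller } from '@nestjs/common';
import { EventPattern, Payload } from '@nestjs/microservices';

@Controller()
export class IngestConsumer {
  @EventPattern('user-events') // Kafka topic (assumed name)
  async handleUserEvent(
    @Payload() message: { userId: string; event: string; ts: number },
  ) {
    // Normalize the message and write it into ClickHouse,
    // typically in batches, via the shared database module.
  }
}
```

Because each consumer instance joins the same Kafka consumer group, adding instances spreads partitions across them, which is what makes the horizontal scaling mentioned above work.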
Service Configuration & Execution
To ensure clear separation of concerns and independent scalability, each module is given its own entry point:
- API Service: `src/main.ts`
- Scheduler Service: `src/main.scheduler.ts`
- Consumer Service: `src/main.consumer.ts`
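A background entry point such as `main.scheduler.ts` would typically bootstrap an application context rather than an HTTP server. A minimal sketch (`SchedulerModule` is assumed to wire up the database module plus the cron services):

```typescript
// src/main.scheduler.ts -- sketch; SchedulerModule is an assumed module name.
import { NestFactory } from '@nestjs/core';
import { SchedulerModule } from './scheduler/scheduler.module';

async function bootstrap() {
  // createApplicationContext starts the providers (and their cron jobs)
  // without binding an HTTP listener -- appropriate for a headless worker.
  const app = await NestFactory.createApplicationContext(SchedulerModule);
  app.enableShutdownHooks(); // graceful shutdown on SIGTERM/SIGINT
}
bootstrap();
```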
These services can be configured with separate startup scripts to support both development and production environments.
package.json Example
```json
// package.json
{
  "scripts": {
    "build": "nest build",
    "start:dev": "nest start --watch",
    "start:debug": "nest start --debug --watch",
    "start:prod": "cross-env NODE_ENV=production node dist/src/main",
    "start:scheduler:dev": "nest start --entryFile main.scheduler --watch",
    "start:scheduler:prod": "cross-env NODE_ENV=production node dist/src/main.scheduler",
    "start:consumer:dev": "nest start --entryFile main.consumer --watch",
    "start:consumer:prod": "cross-env NODE_ENV=production node dist/src/main.consumer",
    "lint": "eslint \"{src,apps,libs,test}/**/*.ts\" --fix"
  }
}
```
Running Services
```bash
# Start API service
pnpm start:dev

# Start scheduler service
pnpm start:scheduler:dev

# Start consumer service
pnpm start:consumer:dev
```
Best Practices & Deployment Notes
- Background services (scheduler and consumers) should not expose HTTP interfaces; run them as long-lived processes under a supervisor (e.g., PM2, systemd, or a Kubernetes Deployment).
- Use distributed locks (e.g., Redis) to ensure only one scheduler instance runs tasks when horizontally scaled.
- API and consumer modules can scale horizontally, making them resilient under high load.
Conclusion
By leveraging NestJS’s modular architecture, combined with Kafka, ClickHouse, and RabbitMQ, you can build a robust, scalable, and maintainable data analytics platform.