Building an ML Threat Detection Platform: Isolation Forest, One-Class SVM, and Autoencoder Ensemble

One of the central challenges in network security is detecting threats that have never been seen before. Signature-based systems are effective against known attacks but by definition fail on novel ones. This is the problem that unsupervised anomaly detection is designed to solve, and it is the foundation on which ThreatSentinel is built.

ThreatSentinel is an end-to-end ML threat detection platform I developed as both a research portfolio project and an architectural foundation for a commercial Zero Trust Network Access (ZTNA) security analytics product. It processes network telemetry from two public datasets, CICIDS-2017 and UNSW-NB15, through a three-model anomaly detection ensemble, feeding into a dynamic risk aggregation engine that scores users, devices, and sessions in near real time.

Why Three Models?

No single anomaly detection model dominates across all threat types and network environments. Each approach has a different inductive bias, and those differences are precisely what makes an ensemble valuable.

Isolation Forest builds an ensemble of random trees, each of which recursively splits the data on a randomly chosen feature at a randomly chosen value. Anomalies, which tend to occupy sparse regions of the feature space, require fewer splits to isolate than normal samples, so a short average path length signals an anomaly. This makes Isolation Forest a good fit for high-dimensional network telemetry like CICIDS-2017's 78 flow-based features, where it operates quickly and without assuming any particular distribution of normal traffic.
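
A minimal sketch of this idea with scikit-learn, assuming synthetic Gaussian data in place of real flow features. The array names, sizes, and hyperparameters are illustrative and are not taken from the ThreatSentinel code.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic stand-in for scaled flow features (not real CICIDS-2017 data).
X_normal = rng.normal(loc=0.0, scale=1.0, size=(2000, 10))
X_attack = rng.normal(loc=6.0, scale=1.0, size=(50, 10))

# Fit on benign traffic only.
iso = IsolationForest(n_estimators=100, random_state=0)
iso.fit(X_normal)

# score_samples returns lower values for more abnormal points, so negate it
# to get an anomaly score where larger means more suspicious.
iso_normal_scores = -iso.score_samples(X_normal)
iso_attack_scores = -iso.score_samples(X_attack)

print(f"mean anomaly score, normal: {iso_normal_scores.mean():.3f}")
print(f"mean anomaly score, attack: {iso_attack_scores.mean():.3f}")
```

The synthetic attack points sit far from the benign cluster, so they are isolated in fewer splits and receive higher scores. Real attack traffic is rarely this cleanly separated.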

One-Class SVM takes a fundamentally different approach. It learns a kernel-based decision boundary tightly enclosing the normal traffic manifold in a high-dimensional space, using the RBF kernel to capture non-linear structure. The nu parameter controls how tightly that boundary is drawn: it is an upper bound on the fraction of training samples allowed to fall outside the boundary and a lower bound on the fraction of support vectors. A higher nu draws a tighter boundary, which catches more anomalies but risks more false positives. The tradeoff is explicit and tunable, which is valuable in production where false positive rate directly affects analyst workload.
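
The effect of nu can be sketched as follows, assuming synthetic benign data and scikit-learn's OneClassSVM. The two nu values and the helper name flagged_fraction are illustrative choices, not the platform's settings.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic benign-only training data (stand-in for scaled flow features).
X_normal = rng.normal(size=(1000, 5))


def flagged_fraction(nu: float) -> float:
    """Fit an RBF One-Class SVM and return the share of training points it flags."""
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(X_normal)
    # predict returns -1 for outliers and +1 for inliers.
    return float(np.mean(model.predict(X_normal) == -1))


frac_loose = flagged_fraction(0.01)  # loose boundary, few training points outside
frac_tight = flagged_fraction(0.10)  # tighter boundary, more training points outside

print(f"nu=0.01 flags {frac_loose:.1%} of benign training data")
print(f"nu=0.10 flags {frac_tight:.1%} of benign training data")
```

Because every training point here is benign, the flagged fraction is a rough proxy for the false positive rate the boundary would produce on similar traffic.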

The Autoencoder works on reconstruction error. Trained to compress and reconstruct normal traffic through a low-dimensional bottleneck, it learns what normal network behavior looks like at a representational level. When it encounters an attack pattern it has never seen, reconstruction tends to degrade, producing a high error score. I implemented this using PCA-based linear reconstruction, which behaves like a linear autoencoder: the retained principal components act as the bottleneck, and the error is measured after projecting back to the original feature space. This is a common choice in security analytics and is more stable than deep neural network autoencoders in environments with limited labelled data for tuning and validation.
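
A minimal sketch of PCA-based reconstruction scoring, assuming synthetic data in which benign samples lie near a 3-dimensional subspace of a 10-dimensional feature space. The component count and the helper name reconstruction_error are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic benign data: 3 latent factors mixed into 10 features, plus small noise.
latent = rng.normal(size=(2000, 3))
mixing = rng.normal(size=(3, 10))
X_normal = latent @ mixing + 0.05 * rng.normal(size=(2000, 10))

# Synthetic "attack" data that does not follow the benign correlation structure.
X_attack = rng.normal(scale=2.0, size=(50, 10))

# Fit the linear bottleneck on benign traffic only.
pca = PCA(n_components=3).fit(X_normal)


def reconstruction_error(model: PCA, X: np.ndarray) -> np.ndarray:
    """Mean squared error per sample after compressing and reconstructing."""
    X_hat = model.inverse_transform(model.transform(X))
    return np.mean((X - X_hat) ** 2, axis=1)


err_normal = reconstruction_error(pca, X_normal)
err_attack = reconstruction_error(pca, X_attack)

print(f"mean reconstruction error, normal: {err_normal.mean():.4f}")
print(f"mean reconstruction error, attack: {err_attack.mean():.4f}")
```

Samples that break the correlations learned from benign traffic cannot be represented in the retained components, so most of their variance shows up as reconstruction error.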

Training on Normal Traffic Only

A critical design decision in ThreatSentinel is that all three models are trained exclusively on benign (normal) traffic. No attack samples are used during training. This is the correct approach for one-class anomaly detection and reflects the production reality that labelled attack data is almost never available in advance. The models learn what normal looks like and score deviations from that baseline, rather than learning to classify known attack signatures.

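The preprocessing pipeline is also fitted on normal traffic only. RobustScaler is used rather than StandardScaler because flow features are heavy-tailed: a few extreme values, which occur even in benign traffic (a large bulk transfer, for example), would otherwise dominate the mean and standard deviation and compress the scaled range in which typical flows and moderate deviations sit. RobustScaler centers on the median and scales by the interquartile range, so it is far less sensitive to those extremes.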
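
The difference between the two scalers can be sketched on a single heavy-tailed feature. The data below is synthetic (log-normal values plus a handful of injected extremes) and the specific numbers are assumptions chosen to make the effect visible.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

rng = np.random.default_rng(0)

# Synthetic heavy-tailed feature, e.g. a stand-in for bytes per flow.
X_fit = rng.lognormal(mean=0.0, sigma=1.0, size=(5000, 1))
X_fit[:5] = 1e6  # a handful of extreme flows in the fitting data

# Both scalers are fitted once on the training split, then reused unchanged.
robust = RobustScaler().fit(X_fit)      # median and interquartile range
standard = StandardScaler().fit(X_fit)  # mean and standard deviation

typical = np.array([[1.0]])
shifted = np.array([[5.0]])

# How far apart do a typical value and a moderately unusual one land after scaling?
robust_gap = (robust.transform(shifted) - robust.transform(typical)).item()
standard_gap = (standard.transform(shifted) - standard.transform(typical)).item()

print(f"RobustScaler gap:   {robust_gap:.4f}")
print(f"StandardScaler gap: {standard_gap:.6f}")
```

With StandardScaler, the few extreme values inflate the standard deviation so much that the two test points become nearly indistinguishable. RobustScaler keeps them clearly separated.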

Risk Aggregation and Entity Scoring

Raw anomaly scores from three separate models need to be combined into something operationally meaningful. ThreatSentinel's risk aggregation engine does this through a weighted ensemble that produces a composite score on a 0 to 100 scale, mapped to four risk tiers: LOW, MEDIUM, HIGH, and CRITICAL.

The weights reflect each model's relative precision in the specific deployment environment. One-Class SVM receives the highest weight by default because it is the most conservative, raising fewer false alarms at the cost of slightly lower recall. Isolation Forest receives a moderate weight because of its breadth and speed. The Autoencoder receives the lowest weight because reconstruction-based methods can be sensitive to feature distribution shifts between training and deployment.
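
A sketch of the weighted aggregation, assuming each model's raw output has already been normalized to the range 0 to 1. The weight values and tier boundaries below are hypothetical: only the ordering of the weights and the four tier names come from the description above.

```python
from typing import Mapping

# Hypothetical weights. Only the ordering (One-Class SVM > Isolation Forest >
# Autoencoder) follows the text; the numbers themselves are assumptions.
WEIGHTS = {"ocsvm": 0.45, "iforest": 0.35, "autoencoder": 0.20}

# Hypothetical lower bounds for each tier on the 0 to 100 scale.
TIERS = [(75.0, "CRITICAL"), (50.0, "HIGH"), (25.0, "MEDIUM"), (0.0, "LOW")]


def composite_risk(scores: Mapping[str, float]) -> float:
    """Combine per-model scores in [0, 1] into a composite score in [0, 100]."""
    total_weight = sum(WEIGHTS.values())
    weighted = sum(
        WEIGHTS[name] * min(max(scores[name], 0.0), 1.0)  # clamp to [0, 1]
        for name in WEIGHTS
    )
    return 100.0 * weighted / total_weight


def risk_tier(score: float) -> str:
    """Map a composite score to the first tier whose lower bound it meets."""
    for lower_bound, label in TIERS:
        if score >= lower_bound:
            return label
    return "LOW"


example_score = composite_risk({"ocsvm": 0.9, "iforest": 0.8, "autoencoder": 0.4})
example_tier = risk_tier(example_score)
print(f"composite score {example_score:.1f} -> {example_tier}")
```

Dividing by the total weight keeps the composite on the 0 to 100 scale even if the weights are later retuned and no longer sum to one.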

Risk scores are tracked per entity, meaning per user, per device, and per session, using an exponential moving average with a decay factor of 0.85. This prevents score inflation from isolated historical events while still responding quickly to sustained anomalous behavior. The decay also means that an entity whose behavior returns to normal will see its risk score drop over time without manual intervention.
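
One way to sketch the per-entity update, assuming the decay factor is applied as new = 0.85 * previous + 0.15 * current event score. That update rule, the class name EntityRiskTracker, and the starting score of zero are assumptions; only the 0.85 value comes from the description above.

```python
from collections import defaultdict

DECAY = 0.85  # decay factor from the text; how it enters the update is assumed


class EntityRiskTracker:
    """Tracks an exponentially weighted risk score per user, device, or session."""

    def __init__(self, decay: float = DECAY) -> None:
        self.decay = decay
        self.scores = defaultdict(float)  # unseen entities start at 0.0

    def update(self, entity_id: str, event_score: float) -> float:
        """Blend the new event score (0 to 100) into the entity's running score."""
        previous = self.scores[entity_id]
        current = self.decay * previous + (1.0 - self.decay) * event_score
        self.scores[entity_id] = current
        return current


tracker = EntityRiskTracker()

# A single isolated spike followed by quiet behavior fades on its own.
spike_trace = [tracker.update("device-a", s) for s in [100.0, 0.0, 0.0, 0.0]]

# Sustained anomalous behavior keeps pushing the score upward.
sustained_trace = [tracker.update("user-b", 100.0) for _ in range(10)]

print("isolated spike:", [round(v, 2) for v in spike_trace])
print("sustained:     ", [round(v, 2) for v in sustained_trace])
```

Under this update rule a single maximal event moves the score only part of the way, while ten consecutive maximal events push it past 80, which matches the intent of damping isolated events and surfacing persistent ones.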

Detection Performance

On the CICIDS-2017 dataset, the three models achieve ROC-AUC scores between 0.74 and 0.93, with zero labelled attack data used in training. Recall is consistently high at around 0.87, meaning most attacks are surfaced. Precision is lower, which is the expected tradeoff for unsupervised detection: a model that only knows what normal looks like will also flag unusual but benign traffic. In a production deployment, precision should improve once the alert threshold is tuned against historical incident data and the entity risk scoring layer is allowed to accumulate behavioral context before triggering enforcement actions.
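
The evaluation and threshold tradeoff can be sketched on synthetic data. The numbers this produces are unrelated to the CICIDS-2017 results above; the data, the choice of Isolation Forest as the example model, and the percentile thresholds are all assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import precision_score, recall_score, roc_auc_score

rng = np.random.default_rng(0)

# Benign-only training data, plus a labelled test set used only for evaluation.
X_train = rng.normal(size=(2000, 8))
X_test = np.vstack([
    rng.normal(size=(900, 8)),           # benign test flows
    rng.normal(loc=2.0, size=(100, 8)),  # synthetic attack flows
])
y_test = np.concatenate([np.zeros(900, dtype=int), np.ones(100, dtype=int)])

model = IsolationForest(random_state=0).fit(X_train)
train_scores = -model.score_samples(X_train)  # larger means more anomalous
test_scores = -model.score_samples(X_test)

# ROC-AUC is threshold-free: it only looks at how well the scores rank attacks.
auc = roc_auc_score(y_test, test_scores)


def precision_recall_at(percentile: float):
    """Alert on test scores above the given percentile of benign training scores."""
    threshold = np.percentile(train_scores, percentile)
    y_pred = (test_scores > threshold).astype(int)
    return (
        precision_score(y_test, y_pred, zero_division=0),
        recall_score(y_test, y_pred),
    )


p_low, r_low = precision_recall_at(90.0)    # permissive threshold
p_high, r_high = precision_recall_at(99.0)  # strict threshold

print(f"ROC-AUC: {auc:.3f}")
print(f"90th percentile threshold: precision={p_low:.2f}, recall={r_low:.2f}")
print(f"99th percentile threshold: precision={p_high:.2f}, recall={r_high:.2f}")
```

Raising the threshold can only remove alerts, so recall never increases, while precision typically rises because fewer benign flows are flagged.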

What Comes Next

ThreatSentinel is designed to grow. The streaming pipeline that currently simulates near real-time event processing would connect to a Kafka or Kinesis consumer in production. The detection models would be augmented with LSTM and Transformer architectures for temporal behavioral sequence modeling. SHAP-based feature attribution would provide explainable alerts for analyst triage. And an autonomous remediation layer using agentic AI would handle investigation and containment for high-confidence incidents without requiring manual intervention for every alert.

The project is open source and fully documented. The live dashboard is at threat-sentinel.streamlit.app and the code is at github.com/babalolaseyip/threat-sentinel.