Network Anomaly Detection with MIDAS, MIDAS-R, and MIDAS-F

Posted on 2021-03-27 Edited on 2023-08-15

Background

Anomaly detection has always been a challenging research field. An anomaly is a sudden, short-lived change in pattern, and a detection algorithm aims to identify such changes promptly. Detecting anomalies on networks makes the problem harder: network monitoring devices usually collect data at high rates, so a network anomaly detection algorithm must handle high-dimensional, noisy, and massive data under power and communication constraints. We should also acknowledge that different anomalies manifest in network statistics in different ways, so a general anomaly detection model often does not exist. A model that detects surprising edges in a network probably cannot detect microcluster anomalies.

Pytorch Implementation of GEE: A Gradient-based Explainable Variational Autoencoder for Network Anomaly Detection

Posted on 2020-07-12 Edited on 2023-08-15

Motivation

Recently I have been studying various methods of anomaly detection, ranging from traditional methods such as Isolation Forest to the latest deep-neural-network-based methods. All of these methods have their strengths and shortcomings. I selected and implemented this paper, GEE: A Gradient-based Explainable Variational Autoencoder for Network Anomaly Detection, because it uses an autoencoder trained with incomplete and noisy data for an anomaly detection task.

Pytorch Implementation of Deep Packet: A Novel Approach For Encrypted Traffic Classification Using Deep Learning

Posted on 2020-04-06 Edited on 2023-08-15

Why Traffic Classification

The authors explain that network traffic classification attracts much interest in both academia and industry because it is one of the prerequisites for advanced network management tasks. Network architecture today is designed to be asymmetric, based on the assumption that clients demand more download than upload bandwidth. However, this assumption no longer holds due to the rise of voice over IP (VoIP), P2P, and other symmetric-demand applications. Network providers need to know which applications their clients use in order to allocate adequate resources.

ML Prediction with XGBoost and PySpark

Posted on 2020-03-01 Edited on 2023-08-15

Once an XGBoost model is trained, we would like to use PySpark for batch predictions.

We do this with a Pandas UDF.
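As a rough illustration of the Pandas UDF approach, here is a minimal sketch, not the post's actual code. It assumes the trained model exposes a scikit-learn-style `predict()` on a NumPy array (as `xgboost.XGBRegressor` does) and that the feature columns are passed to the UDF as a single struct column. `ThresholdModel`, `predict_batch`, and `make_predict_udf` are hypothetical names introduced for this example.

```python
import numpy as np
import pandas as pd


class ThresholdModel:
    """Hypothetical stand-in for a trained model; only predict() matters here."""

    def predict(self, features: np.ndarray) -> np.ndarray:
        # Flag a row as 1.0 when its feature sum exceeds 1.0.
        return (features.sum(axis=1) > 1.0).astype(float)


def predict_batch(model, features: pd.DataFrame) -> pd.Series:
    """Score one batch of rows; Spark would call this once per Arrow batch."""
    return pd.Series(model.predict(features.to_numpy()), dtype="float64")


def make_predict_udf(model):
    """Wrap predict_batch as a Series-to-Series Pandas UDF (needs pyspark and pyarrow)."""
    # Imported lazily so the batch logic above can be tried without Spark.
    from pyspark.sql.functions import pandas_udf

    @pandas_udf("double")
    def predict_udf(features: pd.DataFrame) -> pd.Series:
        # A struct column arrives as a pandas DataFrame, one column per field.
        return predict_batch(model, features)

    return predict_udf
```

On a Spark DataFrame `df` with a hypothetical list `feature_cols`, this could be applied as `df.withColumn("prediction", make_predict_udf(model)(F.struct(*feature_cols)))`, where `F` is `pyspark.sql.functions`. The model is captured in the UDF's closure, so it has to be picklable to reach the executors.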

Data Pipeline: From PySpark to PyTorch

Posted on 2020-01-17 Edited on 2023-08-15

TL;DR

Uber’s Petastorm library provides a data reader for PyTorch that reads files generated by PySpark. Clone the project from GitHub for more information.
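For a sense of what the reading side might look like, here is a minimal sketch under stated assumptions, not the project's actual code. It assumes the PySpark job wrote plain Parquet files to a local directory and uses Petastorm's `make_batch_reader` together with `petastorm.pytorch.DataLoader`. The helper names `to_dataset_url` and `iterate_batches` are hypothetical.

```python
from pathlib import Path


def to_dataset_url(local_dir: str) -> str:
    """Turn a local directory into the file:// URL form that Petastorm readers expect."""
    return Path(local_dir).resolve().as_uri()


def iterate_batches(dataset_url: str, batch_size: int = 64):
    """Yield batches from a Parquet store, typically dicts of column name -> tensor."""
    # Imported lazily: petastorm and torch are only needed when actually reading.
    from petastorm import make_batch_reader
    from petastorm.pytorch import DataLoader

    # make_batch_reader reads plain Parquet, such as files written by df.write.parquet(...).
    with DataLoader(make_batch_reader(dataset_url), batch_size=batch_size) as loader:
        for batch in loader:
            yield batch
```

A training loop could then iterate over `iterate_batches(to_dataset_url("/path/to/parquet"))` and pick feature and label tensors out of each batch by column name. The path here is a placeholder.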

Summary of Malware Detection By Eating a Whole EXE in One Picture

Posted on 2020-01-16 Edited on 2023-08-15