
Data Cleaning for training dataset on Deep Learning


Title of Invention

Data Cleaning for training dataset on Deep Learning


In recent years, deep learning has made great progress in many fields, such as anomaly detection, classification, clustering and forecasting. Despite its strong performance, deep learning is exposed to attack and manipulation by adversaries. The training dataset, one of the most important factors behind the success of machine learning, is not always guaranteed to be safe. Users may download datasets directly from questionable sources, and malicious users may upload poisoned data to crowdsourcing systems, making the dataset unreliable. An attack that manipulates the training dataset is called a poisoning attack. We propose a data cleaning method against poisoning attacks on deep learning: it removes malicious manipulations from the training dataset and can mitigate, or even neutralize, the effect of the attack.


The data cleaning mechanism processes ("washes") every sample in the dataset, eliminating the malicious signals embedded in it. After cleaning, abnormal (poisoned) samples are purified, while normal samples change only slightly. The method is not designed for a specific attack algorithm or deep learning model, so it can be applied against multiple attack algorithms, and it remains effective even against unknown attacks.
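The listing does not disclose the actual cleaning algorithm. As a rough illustration of the general idea (apply the same transform to every sample so that small injected patterns are suppressed while clean samples change little), the sketch below swaps in two well-known generic input-purification transforms, median filtering and bit-depth reduction. It is not the NCKU method. The function names, the list-of-lists grayscale image format, and the `radius` and `bits` parameters are hypothetical choices made for this example.

```python
from statistics import median
from typing import List, Sequence

# Hypothetical data format: a grayscale image as rows of pixel values in [0, 1].
Image = List[List[float]]


def median_filter(img: Image, radius: int = 1) -> Image:
    """Replace each pixel by the median of its (2*radius+1) x (2*radius+1) window.

    Coordinates outside the image are clamped to the nearest border pixel.
    """
    h, w = len(img), len(img[0])
    out: Image = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ]
            row.append(median(window))
        out.append(row)
    return out


def reduce_bit_depth(img: Image, bits: int = 4) -> Image:
    """Quantize each pixel to 2**bits levels; small-amplitude perturbations are often rounded away."""
    levels = 2 ** bits - 1
    return [[round(p * levels) / levels for p in row] for row in img]


def clean_dataset(images: Sequence[Image], radius: int = 1, bits: int = 4) -> List[Image]:
    """'Wash' every sample: the same transform is applied to clean and poisoned data alike."""
    return [reduce_bit_depth(median_filter(img, radius), bits) for img in images]


def mean_abs_change(before: Image, after: Image) -> float:
    """Average per-pixel change, used to check how much a sample was altered."""
    diffs = [abs(a - b) for ra, rb in zip(before, after) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    clean = [[0.2] * 8 for _ in range(8)]
    poisoned = [row[:] for row in clean]
    for y, x in [(3, 3), (3, 4), (4, 3), (4, 4)]:
        poisoned[y][x] = 1.0  # hypothetical 2x2 bright trigger patch
    cleaned_clean, cleaned_poisoned = clean_dataset([clean, poisoned])
    print("clean sample change:", mean_abs_change(clean, cleaned_clean))
    print("poisoned sample vs. clean after cleaning:", mean_abs_change(clean, cleaned_poisoned))
```

With a 3x3 window, the 2x2 patch is outvoted by the background in every window it touches, so the cleaned poisoned image matches the clean one, while the uniform clean image barely changes. A trigger as large as the window, or a perturbation blended over the whole image, would survive these filters, which is why such generic transforms are only a stand-in for a dedicated cleaning method.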

Possible Applications / Industry Categories

  1. License Plate Recognition
  2. Face Recognition

Contact Information

Innovation Headquarters, NCKU

Contact person: Claire Huang

