Staff View: Practical training dataset generation and retraining mechanism for on-line peer-to-peer traffic classification

Practical training dataset generation and retraining mechanism for on-line peer-to-peer traffic classification

Peer-to-Peer (P2P) detection by Machine Learning (ML) classification is affected by the quality and recency of the training dataset. Hence, classifying P2P traffic on-line requires the removal of these limitations. In this research work, a novel practical training dataset generation and automatic retrai...

Bibliographic Details
Main Author:	Zarei, Roozbeh
Format:	Thesis
Language:	English
Published:	2012
Subjects:	TK Electrical engineering. Electronics Nuclear engineering
Online Access:	http://eprints.utm.my/id/eprint/33398/5/RoozehZareiMFKE2012.pdf http://eprints.utm.my/id/eprint/33398/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:72709?site_name=Restricted Repository

id	my.utm.33398
record_format	eprints
spelling	my.utm.33398 2018-05-27T08:07:40Z http://eprints.utm.my/id/eprint/33398/ Practical training dataset generation and retraining mechanism for on-line peer-to-peer traffic classification Zarei, Roozbeh TK Electrical engineering. Electronics Nuclear engineering Peer-to-Peer (P2P) detection by Machine Learning (ML) classification is affected by the quality and recency of the training dataset. Hence, classifying P2P traffic on-line requires the removal of these limitations. In this research work, a novel practical training dataset generation method and an automatic retraining mechanism for on-line P2P traffic classification are proposed. These two proposals are integrated into a system that removes the limitations of ML classification and makes it suitable for on-line P2P traffic classification. For the first part, a novel two-stage training dataset generation method is proposed that combines a 3-class heuristic classification and a 3-class statistical classification to accurately generate the training dataset. In the heuristic stage, traffic is classified as P2P, non-P2P or unknown. In the statistical stage, a dual Decision Tree (DT) is built from the dataset generated in the heuristic stage to classify unknown traffic into three classes in order to reduce the amount of traffic classified as unknown. The final training dataset is generated from all flows classified in these two stages. In the second part of the system, an automatic retraining mechanism is proposed to meet the need to retrain the ML classifier by detecting changes in traffic behavior and updating the on-line ML classifier with a recent, accurate training dataset. This mechanism evaluates the accuracy of the on-line ML classifier against flows labeled by the two-stage training dataset generation. The on-line ML classifier is retrained if its accuracy falls below a predefined threshold. The proposed system has been evaluated on traces captured from the Universiti Teknologi Malaysia (UTM) campus network between October and November 2011. The overall results show that the two-stage training dataset generation can generate an accurate training dataset by classifying more than 95% of total flows with high accuracy (98.59%) and a low false positive rate (0.91%). The on-line ML classifier, which is built with the J48 algorithm and the training dataset generated by the two-stage training dataset generation, classifies traffic with high accuracy (99%) using 25 features extracted from the first 5 packets of each flow. The results also show that the automatic retraining mechanism allows the on-line ML classifier to maintain its accuracy above a set threshold over time. 2012-01 Thesis NonPeerReviewed application/pdf en http://eprints.utm.my/id/eprint/33398/5/RoozehZareiMFKE2012.pdf Zarei, Roozbeh (2012) Practical training dataset generation and retraining mechanism for on-line peer-to-peer traffic classification. Masters thesis, Universiti Teknologi Malaysia, Faculty of Electrical Engineering. http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:72709?site_name=Restricted Repository
institution	Universiti Teknologi Malaysia
building	UTM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknologi Malaysia
content_source	UTM Institutional Repository
url_provider	http://eprints.utm.my/
language	English
topic	TK Electrical engineering. Electronics Nuclear engineering
spellingShingle	TK Electrical engineering. Electronics Nuclear engineering Zarei, Roozbeh Practical training dataset generation and retraining mechanism for on-line peer-to-peer traffic classification
description	Peer-to-Peer (P2P) detection by Machine Learning (ML) classification is affected by the quality and recency of the training dataset. Hence, classifying P2P traffic on-line requires the removal of these limitations. In this research work, a novel practical training dataset generation method and an automatic retraining mechanism for on-line P2P traffic classification are proposed. These two proposals are integrated into a system that removes the limitations of ML classification and makes it suitable for on-line P2P traffic classification. For the first part, a novel two-stage training dataset generation method is proposed that combines a 3-class heuristic classification and a 3-class statistical classification to accurately generate the training dataset. In the heuristic stage, traffic is classified as P2P, non-P2P or unknown. In the statistical stage, a dual Decision Tree (DT) is built from the dataset generated in the heuristic stage to classify unknown traffic into three classes in order to reduce the amount of traffic classified as unknown. The final training dataset is generated from all flows classified in these two stages. In the second part of the system, an automatic retraining mechanism is proposed to meet the need to retrain the ML classifier by detecting changes in traffic behavior and updating the on-line ML classifier with a recent, accurate training dataset. This mechanism evaluates the accuracy of the on-line ML classifier against flows labeled by the two-stage training dataset generation. The on-line ML classifier is retrained if its accuracy falls below a predefined threshold. The proposed system has been evaluated on traces captured from the Universiti Teknologi Malaysia (UTM) campus network between October and November 2011. The overall results show that the two-stage training dataset generation can generate an accurate training dataset by classifying more than 95% of total flows with high accuracy (98.59%) and a low false positive rate (0.91%). The on-line ML classifier, which is built with the J48 algorithm and the training dataset generated by the two-stage training dataset generation, classifies traffic with high accuracy (99%) using 25 features extracted from the first 5 packets of each flow. The results also show that the automatic retraining mechanism allows the on-line ML classifier to maintain its accuracy above a set threshold over time.
format	Thesis
author	Zarei, Roozbeh
author_facet	Zarei, Roozbeh
author_sort	Zarei, Roozbeh
title	Practical training dataset generation and retraining mechanism for on-line peer-to-peer traffic classification
title_short	Practical training dataset generation and retraining mechanism for on-line peer-to-peer traffic classification
title_full	Practical training dataset generation and retraining mechanism for on-line peer-to-peer traffic classification
title_fullStr	Practical training dataset generation and retraining mechanism for on-line peer-to-peer traffic classification
title_full_unstemmed	Practical training dataset generation and retraining mechanism for on-line peer-to-peer traffic classification
title_sort	practical training dataset generation and retraining mechanism for on-line peer-to-peer traffic classification
publishDate	2012
url	http://eprints.utm.my/id/eprint/33398/5/RoozehZareiMFKE2012.pdf http://eprints.utm.my/id/eprint/33398/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:72709?site_name=Restricted Repository
_version_	1643649318985400320
score	13.252575
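
The flow of the system summarized in the description field can be sketched roughly as follows. This is only an illustration written for this record, not code from the thesis: the function names, the `port` and `pkt_size` features, the port numbers, and the 100-byte margin are all hypothetical, and the toy nearest-mean rule merely stands in for the dual Decision Tree, whose construction the abstract does not detail.

```python
from typing import Callable, Dict, List, Tuple

Flow = Dict[str, float]                 # hypothetical per-flow feature record
Labeled = List[Tuple[Flow, str]]        # (flow, label) pairs
Classifier = Callable[[Flow], str]      # returns "P2P", "nonP2P" or "unknown"


def two_stage_label(flows: List[Flow], heuristic: Classifier,
                    fit_statistical: Callable[[Labeled], Classifier]) -> Labeled:
    """Stage 1 labels what it can; stage 2 is fitted on those labels and
    applied only to the flows stage 1 left as 'unknown'."""
    stage1 = [(f, heuristic(f)) for f in flows]
    known = [(f, y) for f, y in stage1 if y != "unknown"]
    statistical = fit_statistical(known)
    for f, y in stage1:
        if y == "unknown":
            y2 = statistical(f)
            if y2 != "unknown":         # flows still unknown are left out
                known.append((f, y2))
    return known


def accuracy(classifier: Classifier, reference: Labeled) -> float:
    """Fraction of reference flows on which the classifier matches the label."""
    if not reference:
        raise ValueError("reference set is empty")
    hits = sum(1 for f, y in reference if classifier(f) == y)
    return hits / len(reference)


def retrain_if_needed(classifier: Classifier, reference: Labeled,
                      train: Callable[[Labeled], Classifier],
                      threshold: float) -> Tuple[Classifier, bool]:
    """Return (classifier, retrained_flag); retrain only below the threshold."""
    if accuracy(classifier, reference) < threshold:
        return train(reference), True
    return classifier, False


# --- Toy stand-ins, purely illustrative (assumptions, not the thesis rules) ---
def toy_heuristic(flow: Flow) -> str:
    if flow["port"] == 6881:            # assumed P2P port for the example
        return "P2P"
    if flow["port"] in (80, 443):       # assumed non-P2P ports for the example
        return "nonP2P"
    return "unknown"


def toy_fit(labeled: Labeled) -> Classifier:
    """Nearest-class-mean on packet size; NOT the dual Decision Tree."""
    means: Dict[str, float] = {}
    for label in ("P2P", "nonP2P"):
        sizes = [f["pkt_size"] for f, y in labeled if y == label]
        if sizes:
            means[label] = sum(sizes) / len(sizes)

    def classify(flow: Flow) -> str:
        if len(means) < 2:
            return "unknown"
        dist = {y: abs(flow["pkt_size"] - m) for y, m in means.items()}
        best = min(dist, key=dist.get)
        other = max(dist, key=dist.get)
        # stay 'unknown' when the flow is not clearly closer to one class
        return best if dist[other] - dist[best] > 100 else "unknown"

    return classify
```

In use, the labeled flows returned by `two_stage_label` would serve both as the training set and as the reference against which the running classifier is scored; `retrain_if_needed` replaces the classifier only when that score drops below the chosen threshold.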
