A spam classification method based on NB and SVM
25 May 2023
Liyun Li
Proceedings Volume 12636, Third International Conference on Machine Learning and Computer Application (ICMLCA 2022); 126362O (2023) https://doi.org/10.1117/12.2675375
Event: Third International Conference on Machine Learning and Computer Application (ICMLCA 2022), 2022, Shenyang, China
Abstract
The dataset used in this project is derived from the SMS spam classification dataset in the UCI Machine Learning Repository, and it is explored before the data are pre-processed. Pre-processing has two steps: text clean-up, followed by text feature extraction. This paper investigates and compares the performance of classifiers combined with different dimensionality reduction methods on a spam dataset, to provide a reference for related classification studies. The classifiers are trained with the scikit-learn machine learning library: the dataset is split into a 75% training set and a 25% test set, and classifiers including naive Bayes (NB), IR, and support vector machine (SVM) are trained. After training, each model is evaluated on the test set, and the trained models are then used to predict the category of a message (regular message or spam). The results show that SVM performs best among the classifiers.
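The pipeline summarized in the abstract (clean-up, feature extraction, 75/25 split, training, test-set evaluation, prediction) can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the clean-up rule, TF-IDF features, the linear-kernel SVM, the `random_state`, and the helper names `clean_text` and `predict` are all hypothetical choices, the IR classifier and the dimensionality reduction step are omitted, and the toy messages only stand in for the UCI SMS data, which a real run would load instead.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC


def clean_text(text: str) -> str:
    """Hypothetical clean-up: lower-case, replace non-alphanumerics with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical toy messages standing in for the UCI SMS spam data (8 ham, 8 spam).
messages = [
    "Are we still meeting for lunch today?", "Call me when you get home",
    "Can you pick up milk on the way back", "Happy birthday, see you tonight",
    "I will be late, stuck in traffic", "Did you finish the report yet?",
    "Thanks for dinner yesterday", "See you at the station at six",
    "WINNER! You have won a free prize, call now", "Free entry to win cash, text WIN now",
    "Claim your free ringtone, reply YES", "URGENT: your account has won a bonus prize",
    "Congratulations, you won a free holiday, call now", "Win cash prizes every week, text STOP to opt out",
    "You have been selected for a free gift card", "Cheap loans approved today, call free now",
]
labels = ["ham"] * 8 + ["spam"] * 8

cleaned = [clean_text(m) for m in messages]

# 75% training / 25% test split, stratified so both classes appear in the test set.
X_train_txt, X_test_txt, y_train, y_test = train_test_split(
    cleaned, labels, test_size=0.25, random_state=0, stratify=labels
)

# Feature extraction: TF-IDF weights fitted on the training texts only.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_txt)
X_test = vectorizer.transform(X_test_txt)

# Train each classifier and record its accuracy on the held-out test set.
models = {"NB": MultinomialNB(), "SVM": SVC(kernel="linear")}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))


def predict(model, text: str) -> str:
    """Classify one new message as 'ham' or 'spam' with a trained model."""
    return model.predict(vectorizer.transform([clean_text(text)]))[0]


print(scores)
print(predict(models["SVM"], "WIN a FREE prize, call now!!!"))
```

Fitting the vectorizer on the training split alone keeps test-set vocabulary statistics out of training; with only four test messages the printed accuracies are not meaningful and serve only to show where the comparison between NB and SVM would be made.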
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Liyun Li "A spam classification method based on NB and SVM", Proc. SPIE 12636, Third International Conference on Machine Learning and Computer Application (ICMLCA 2022), 126362O (25 May 2023); https://doi.org/10.1117/12.2675375
KEYWORDS
Education and training
Deep learning
Library classification systems
Feature extraction
Data modeling
Classification systems
Machine learning
