COMPARATIVE SENTIMENT ANALYSIS OF MUTUAL INFORMATION, N-GRAM AND CHI-SQUARE FEATURE SELECTION USING THE SVM METHOD ON IMDB REVIEW

NANDARESTA, SEVIRA CHAIRUNNISA (2024) ANALISIS SENTIMEN KOMPARASI SELEKSI FITUR MUTUAL INFORMATION, N-GRAM DAN CHI-SQUARE MENGGUNAKAN METODE SVM PADA IMDB REVIEW. Other thesis, Nusa Putra University.

Abstract

Sentiment analysis is the computational process of extracting, identifying, or assessing opinions or sentiments in text such as reviews, social media posts, and news articles. In sentiment analysis, feature selection is a technique for selecting and filtering the most relevant features from text data in order to improve model performance. It aims to improve computational efficiency, make results easier to interpret, reduce data dimensionality, reduce overfitting, and improve predictive performance. This study uses a dataset of 10,000 reviews taken from the IMDb Review website, split 80% for training and 20% for testing. Sentiment analysis was conducted using the Support Vector Machine (SVM) algorithm with three feature selection methods: Mutual Information, N-Gram, and Chi-Square. The study aimed to compare the accuracy of SVM sentiment analysis under each feature selection method and to determine which method is most effective for SVM sentiment analysis. Each feature selection method was applied to the same data source and the same amount of data. The test results showed that applying feature selection before Support Vector Machine classification improved text classification performance. The Support Vector Machine algorithm without feature selection achieved an accuracy of 85.35%. With feature selection, accuracy was 90.00% for Mutual Information, 88.50% for N-Gram, and 88.65% for Chi-Square. It can therefore be concluded that feature selection improves the classification performance of the Support Vector Machine model, and that Mutual Information gave the highest accuracy in this study, at 90.00%.

Keywords: Feature Selection, Sentiment Analysis, Support Vector Machine, IMDb Review, Mutual Information, N-Gram, Chi-Square
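
The abstract names three feature selection techniques but does not show how they score features. The following is a minimal sketch, not the thesis's implementation: it builds unigram and bigram features for a four-review toy corpus (invented here for illustration) and scores each feature against the sentiment label with the standard 2x2 chi-square statistic and mutual information. All function names and the sample reviews are assumptions. In a full pipeline, the top-scoring features would then be passed to an SVM classifier, which is omitted here.

```python
import math


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def featurize(text, max_n=2):
    """Set of all 1..max_n-grams in a whitespace-tokenized, lowercased text."""
    tokens = text.lower().split()
    feats = set()
    for n in range(1, max_n + 1):
        feats.update(ngrams(tokens, n))
    return feats


def contingency(docs, labels, term):
    """Count documents by (term present?, label positive?).

    Returns (n11, n10, n01, n00):
    present & positive, present & negative, absent & positive, absent & negative.
    """
    n11 = n10 = n01 = n00 = 0
    for feats, label in zip(docs, labels):
        if term in feats:
            if label == 1:
                n11 += 1
            else:
                n10 += 1
        else:
            if label == 1:
                n01 += 1
            else:
                n00 += 1
    return n11, n10, n01, n00


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table."""
    n = n11 + n10 + n01 + n00
    den = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / den if den else 0.0


def mutual_information(n11, n10, n01, n00):
    """Mutual information (in bits) between term presence and class label."""
    n = n11 + n10 + n01 + n00
    cells = (
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    )
    total = 0.0
    for n_tc, n_t, n_c in cells:
        if n_tc > 0:  # empty cells contribute nothing
            total += (n_tc / n) * math.log2(n * n_tc / (n_t * n_c))
    return total


# Hypothetical toy corpus: 1 = positive review, 0 = negative review.
reviews = [
    ("great movie loved it", 1),
    ("great acting great story", 1),
    ("boring movie hated it", 0),
    ("boring plot bad acting", 0),
]
docs = [featurize(text) for text, _ in reviews]
labels = [label for _, label in reviews]
vocabulary = sorted(set().union(*docs))

chi2_scores = {t: chi_square(*contingency(docs, labels, t)) for t in vocabulary}
mi_scores = {t: mutual_information(*contingency(docs, labels, t)) for t in vocabulary}

for term in ("great", "boring", "movie"):
    print(term, chi2_scores[term], mi_scores[term])
```

On this toy corpus, "great" and "boring" each occur only in one class, so both receive a chi-square score of 4.0 and a mutual information of 1.0 bit, while "movie" occurs once in each class and scores 0 on both measures. Keeping only the highest-scoring features is what reduces dimensionality before classification.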

Item Type: Thesis (Other)
Subjects: Computer > Information System
Divisions: Faculty of Engineering, Computer and Design > Information System
Depositing User: Mr Perpus
Date Deposited: 11 Jan 2025 07:22
Last Modified: 11 Jan 2025 07:22
URI: http://repository.nusaputra.ac.id/id/eprint/1267
