
Enhancing the Performance of Hate Speech Classification Using Dimensionality Reduction Approach

Kaushar Ansari, Anshul Sarawagi


Most people now use online platforms to share their emotions, and the resulting comments can be classified as positive or negative. Such comments matter most when they are posted as reviews for a particular purpose. Today e-commerce websites, political parties, and many other online business forums analyze these reviews to evaluate the performance of their products or work. People also frequently post hate speech on social media, so it is necessary to detect hate speech in order to address it. Traditional machine learning algorithms are not able to predict hate speech accurately. In this work we apply a dimensionality reduction approach to hate speech classification, which improves the performance of the classifiers. Feature selection is performed using Information Gain, Term Frequency–Inverse Document Frequency, and Logistic Regression Cross Validation, and we achieve F1 scores of 0.81, 0.90, and 0.87 for the gradient boosting, random forest, and extreme gradient boosting classifiers, respectively.
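As a rough illustration of the kind of pipeline the abstract describes, the following minimal Python sketch ranks terms by information gain, keeps the top-ranked terms as a reduced TF-IDF feature space, and computes an F1 score. It is not the authors' implementation: the toy corpus, the labels, the cut-off of two features, and all function names are assumptions made for this example, and the classifiers named in the abstract are not reproduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """Entropy reduction obtained by splitting documents on presence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = ((len(with_term) / n) * entropy(with_term)
                   + (len(without_term) / n) * entropy(without_term))
    return entropy(labels) - conditional


def tfidf(docs):
    """One {term: weight} dict per document, with weight = tf * log(N / df)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [{t: (c / len(d)) * math.log(n / df[t])
             for t, c in Counter(d).items()} for d in docs]


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy corpus (1 = hateful, 0 = not hateful); not the paper's data.
corpus = [
    "you people are awful",
    "awful people everywhere",
    "what a lovely day",
    "lovely people here today",
]
labels = [1, 1, 0, 0]
docs = [text.split() for text in corpus]

# Rank the vocabulary by information gain and keep the top two terms.
vocab = sorted({t for d in docs for t in d})
ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t),
                reverse=True)
selected = ranked[:2]

# Project each document's TF-IDF weights onto the selected terms only.
weights = tfidf(docs)
reduced = [[w.get(t, 0.0) for t in selected] for w in weights]
```

In a full experiment the reduced vectors would be passed to a classifier such as a random forest, and `f1_score` would be applied to its predictions on held-out data.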





