Gender identification in Russian written texts
Tatiana A. Litvinova – Pavel V. Seredin – Olga A. Litvinova – Olga V. Zagorovskaya
This article examines the identification of the gender of authors of Russian written texts using an approach based on the analysis of quantitative text parameters. Identifying the gender of a text's author is viewed as part of the authorship profiling task.
The material used for the study was “RusPersonality”, a specially designed corpus of Russian texts; Russian, like other Slavic languages, has received little attention in authorship profiling studies. We made use of high-frequency text parameters that occur in texts of different topics and genres. The correlation analysis results obtained for Russian texts were compared with those for other languages. Regression analysis was also employed. The suggested approach identifies gender with an accuracy of 64% using only 5 parameters.
Key words: corpus, corpus linguistics, gender attribution, authorship profiling, stylometry, Russian language