Comparative Analysis of Automatic Gender Detection from Names: Evaluating the Stability and Performance of ChatGPT, Namsor, and Gender-API

Overview

Journal PeerJ Comput Sci

Date 2024 Dec 9

PMID 39650401

Authors

Adrian Dominguez-Diaz

Manuel Goyanes

Luis de-Marcos

Victor Pablo Prado-Sanchez


Abstract

Classifying gender from names is crucial for addressing a wide range of gender-related research questions. Traditionally, this has been done automatically by gender detection tools (GDTs), which now face new industry players in the form of conversational bots such as ChatGPT. This paper statistically tests the stability and performance of ChatGPT 3.5 Turbo and ChatGPT 4o for gender detection. It also compares two of the most widely used GDTs (Namsor and Gender-API) with ChatGPT, using a dataset of 5,779 records compiled from previous studies, on the most challenging variant of the task: inferring gender from the full name without any additional information. The results show that ChatGPT is very stable, with low standard deviation and tight confidence intervals for the same input, while its performance differs only slightly when the prompt changes. ChatGPT slightly outperforms the other tools, with an overall accuracy over 96% and a margin of around 3% over both GDTs. When the probability returned by the GDTs is factored in, the differences narrow and the tools become comparable in terms of inter-coder reliability and error coded. ChatGPT stands out for its low number of non-classifications (0% in most tests), which, in combination with the other metrics analyzed, makes it a solid alternative for gender inference. This paper contributes to the literature on gender detection from names by testing the stability and performance of the most used state-of-the-art AI tool. It suggests that ChatGPT's generative language model provides a robust alternative to traditional gender application programming interfaces (APIs), yet GDTs (especially Namsor) should be considered for research-oriented purposes.
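The metrics named in the abstract (accuracy, non-classification rate, inter-coder reliability, and run-to-run stability) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the "unknown" label for non-classified names, the toy data, and the normal-approximation confidence interval are all assumptions made for illustration, and the paper may define or compute these quantities differently.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev


def accuracy(truth, predicted):
    """Share of all records whose predicted label matches the true label.

    Assumption: a non-classified name counts as a miss.
    """
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def non_classification_rate(predicted, unknown="unknown"):
    """Share of records the tool declined to classify (hypothetical label)."""
    return sum(p == unknown for p in predicted) / len(predicted)


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa: agreement between two coders corrected for chance."""
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    labels = set(coder_a) | set(coder_b)
    expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if expected == 1:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


def stability(run_accuracies, z=1.96):
    """Mean, sample standard deviation, and an approximate 95% interval.

    Assumption: normal approximation over repeated runs of the same input.
    """
    m = mean(run_accuracies)
    s = stdev(run_accuracies)
    half_width = z * s / sqrt(len(run_accuracies))
    return m, s, (m - half_width, m + half_width)


# Toy data, invented for illustration only (not taken from the paper).
truth = ["f", "m", "f", "m", "f", "m"]
predicted = ["f", "m", "f", "f", "unknown", "m"]

print(accuracy(truth, predicted))
print(non_classification_rate(predicted))
print(cohen_kappa(truth, predicted))
print(stability([0.5, 0.6, 0.7]))
```

Treating non-classifications as misses is one possible convention; excluding them from the denominator gives a different accuracy figure, so the choice should be stated whenever tools with different non-classification rates are compared.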
