
Using Genderize.io to Infer the Gender of First Names: How to Improve the Accuracy of the Inference

Overview
Date 2021 Dec 3
PMID 34858090
Citations 10
Abstract

Objective: We recently showed that genderize.io is not a sufficiently powerful gender detection tool due to a large number of nonclassifications. In the present study, we aimed to assess whether the accuracy of inference by genderize.io can be improved by manipulating the first names in the database.

Methods: We used a database containing the first names, surnames, and gender of 6,131 physicians practicing in a multicultural country (Switzerland). We uploaded the original CSV file (file #1), the file obtained after removing all diacritic marks, such as accents and cedillas (file #2), and the file obtained after removing all diacritic marks and retaining only the first term of compound first names (file #3). For each file, we computed three performance metrics: proportion of misclassifications (errorCodedWithoutNA), proportion of nonclassifications (naCoded), and proportion of misclassifications and nonclassifications (errorCoded).
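The two data manipulations described above can be sketched in Python. This is a minimal illustration, not the authors' procedure: the function names `strip_diacritics` and `first_term` are hypothetical, and the abstract does not say how compound first names were delimited, so splitting on hyphens and whitespace is an assumption.

```python
import re
import unicodedata


def strip_diacritics(name: str) -> str:
    """Remove accents, cedillas, and other combining marks (file #2 step).

    NFKD decomposition separates base letters from combining marks.
    Letters with no decomposition (for example "ø") are left unchanged.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def first_term(name: str) -> str:
    """Keep only the first term of a compound first name (file #3 step).

    Assumption: terms are separated by hyphens or whitespace.
    """
    return re.split(r"[\s\-]+", name.strip())[0]


def prepare_name(name: str) -> str:
    """Apply both steps in the order described for file #3."""
    return first_term(strip_diacritics(name))


# Illustrative inputs (not taken from the study database).
for raw in ["Jean-François", "José María", "Zoë"]:
    print(raw, "->", strip_diacritics(raw), "->", prepare_name(raw))
```

Applying `strip_diacritics` alone corresponds to file #2, and `prepare_name` corresponds to file #3, under the stated assumptions.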

Results: naCoded, which was high for file #1 (16.4%), was reduced after data manipulation (file #2: 11.7%, file #3: 0.4%). As the increase in the number of misclassifications was small, the overall performance of genderize.io (i.e., errorCoded) improved, especially for file #3 (file #1: 17.7%, file #2: 13.0%, and file #3: 2.3%).
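To make the relationship between the three metrics concrete, the sketch below computes them from paired true and inferred genders. The abstract names the metrics but does not give their denominators, so the choices here are assumptions: naCoded and errorCoded are taken over all names, and errorCodedWithoutNA over classified names only. The function name `performance_metrics` and the use of `None` for a nonclassification are also hypothetical.

```python
from typing import Optional, Sequence


def performance_metrics(
    true: Sequence[str], inferred: Sequence[Optional[str]]
) -> dict:
    """Compute the three metrics; None marks a nonclassified name."""
    total = len(true)
    na = sum(1 for p in inferred if p is None)
    wrong = sum(1 for t, p in zip(true, inferred) if p is not None and p != t)
    classified = total - na
    return {
        # Assumed denominator: all names.
        "errorCoded": (wrong + na) / total,
        "naCoded": na / total,
        # Assumed denominator: names that received a classification.
        "errorCodedWithoutNA": wrong / classified if classified else float("nan"),
    }


# Under the shared-denominator assumption, errorCoded - naCoded is the share
# of all names that were misclassified. Values are the reported percentages.
reported = {
    "file #1": (17.7, 16.4),
    "file #2": (13.0, 11.7),
    "file #3": (2.3, 0.4),
}
misclassified_share = {
    label: round(error_coded - na_coded, 1)
    for label, (error_coded, na_coded) in reported.items()
}
print(misclassified_share)
```

If the denominators are as assumed, the misclassified share rises only from 1.3 to 1.9 percentage points between file #1 and file #3, while nonclassifications fall from 16.4% to 0.4%, which is consistent with the overall improvement reported above.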

Conclusions: A relatively simple manipulation of the data improved the accuracy of gender inference by genderize.io. We recommend using genderize.io only with files that were modified in this way.

Citing Articles

Gender and Authorship in Annals of Surgery: A nineteen-year review including the pandemic.

Liang J, Chang M, Stein S, Salles A. Ann Surg Open. 2024; 5(4):e491.

PMID: 39711681 PMC: 11661747. DOI: 10.1097/AS9.0000000000000491.


Comparative analysis of automatic gender detection from names: evaluating the stability and performance of ChatGPT, Namsor, and Gender-API.

Dominguez-Diaz A, Goyanes M, de-Marcos L, Prado-Sanchez V. PeerJ Comput Sci. 2024; 10:e2378.

PMID: 39650401 PMC: 11623165. DOI: 10.7717/peerj-cs.2378.


Inferring gender from first names: Comparing the accuracy of Genderize, Gender API, and the gender R package on authors of diverse nationality.

VanHelene A, Khatri I, Hilton C, Mishra S, Gamsiz Uzun E, Warner J. PLOS Digit Health. 2024; 3(10):e0000456.

PMID: 39471154 PMC: 11521266. DOI: 10.1371/journal.pdig.0000456.


Analysis of science journalism reveals gender and regional disparities in coverage.

Davidson N, Greene C. Elife. 2024; 12.

PMID: 38804191 PMC: 11132680. DOI: 10.7554/eLife.84855.


Systematic Review of Women Leading and Participating in Nephrology Randomized Clinical Trials.

Lodhi S, Kibret T, Mangalgi S, Reid L, Noel A, Syed S. Kidney Int Rep. 2024; 9(4):898-906.

PMID: 38765601 PMC: 11101787. DOI: 10.1016/j.ekir.2024.01.031.

