Gender of Frasier Characters

“I am not a man…”

For work, I need to take a list of names and infer each person's likely gender. Here I try the gender R package on the names of characters from the TV show Frasier.

The gender package

#install.packages("gender") #works fine

## the name data live in a separate package, genderdata, which also has to be installed
#install_genderdata_package() #did not work ("error reading from connection")

## as suggested by the bug report at https://github.com/ropensci/drat/issues/6
#install.packages("devtools")
#library(devtools)
#devtools::install_github("ropensci/genderdata")
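Because the genderdata install can fail in different ways, a quick check before running anything else can save some confusion. This is a minimal sketch that only tests whether genderdata can be loaded, using base R's requireNamespace(); it does not install anything, and the message text is my own suggestion based on the GitHub workaround above.

```r
# Check whether the genderdata package (which holds the name tables that
# method = "ssa" relies on) is installed, without trying to install it.
has_genderdata <- requireNamespace("genderdata", quietly = TRUE)

if (!has_genderdata) {
  # Point to the GitHub workaround from the bug report mentioned above.
  message("genderdata is not installed; try ",
          "devtools::install_github(\"ropensci/genderdata\")")
}
```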

Trial Run

library(gender)
library(ggpubr)
library(tidyverse)

gender("frasier", method = "ssa", years = c(1940, 1990))
## # A tibble: 1 x 6
##   name    proportion_male proportion_female gender year_min year_max
##   <chr>             <dbl>             <dbl> <chr>     <dbl>    <dbl>
## 1 frasier               1                 0 male       1940     1990
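The proportion columns are shares of recorded births with that name across the chosen years, and the gender column is a label derived from them. The sketch below shows that idea on hypothetical birth counts (made-up numbers, not SSA data). The labelling rule (majority wins, an exact 50/50 split becomes "either") is my assumption about how the package decides, not something confirmed above.

```r
# Turn male/female birth counts for a name into proportions and a label.
# `male` and `female` are hypothetical counts summed over the year range.
classify_counts <- function(male, female) {
  total <- male + female
  proportion_male <- round(male / total, 4)
  proportion_female <- round(female / total, 4)
  # Assumed rule: majority share decides; an exact tie is "either".
  gender <- ifelse(proportion_female > 0.5, "female",
                   ifelse(proportion_female < 0.5, "male", "either"))
  data.frame(proportion_male, proportion_female, gender,
             stringsAsFactors = FALSE)
}

# Illustrative counts only: 950 male and 50 female births.
classify_counts(male = 950, female = 50)
```

With these made-up counts, proportion_male comes out as 0.95 and the label is "male".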

Cast of Characters

Next, I run gender_df(), the data-frame version of gender(), over a list of character names (criterion: characters who appeared in at least six episodes).

# build a data frame of character names from the show, each with the same year range
name <- c("frasier", "daphne", "niles", "roz", "martin", 
          "eddie", "bulldog", "kenny", "gil", "noel", 
          "gertrude", "donny", "lilith", "bebe", "mel",
          "ronee", "alice", "julia", "frederick", "simon",
          "lana", "sherry", "kirby", "charlotte")
start_year <- 1940
end_year <- 1990
df <- data.frame(name, start_year, end_year, stringsAsFactors=FALSE)
main_cast <- df %>% slice(1:5)  # first five rows: the main cast

gender_df(df, method = "ssa", name_col = "name", year_col = c("start_year", "end_year"))
## # A tibble: 23 x 6
##    name      proportion_male proportion_female gender year_min year_max
##  * <chr>               <dbl>             <dbl> <chr>     <dbl>    <dbl>
##  1 alice              0.0033            0.997  female     1940     1990
##  2 bebe               0                 1      female     1940     1990
##  3 charlotte          0.0026            0.997  female     1940     1990
##  4 daphne             0.0004            1.00   female     1940     1990
##  5 donny              0.996             0.0039 male       1940     1990
##  6 eddie              0.959             0.0414 male       1940     1990
##  7 frasier            1                 0      male       1940     1990
##  8 frederick          0.995             0.0052 male       1940     1990
##  9 gertrude           0.0008            0.999  female     1940     1990
## 10 gil                1                 0      male       1940     1990
## # ... with 13 more rows
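The output has 23 rows for 24 input names, which suggests that one name was not found in the SSA data and was left out rather than returned as missing. A minimal sketch for finding which names were dropped is a set difference between the input and output name columns; dropped_names() is a hypothetical helper, and in this post it would be called as dropped_names(df$name, results$name) once the results are stored.

```r
# Return the input names that have no matching row in the output.
# Names are lower-cased first so that case differences do not matter.
dropped_names <- function(input_names, result_names) {
  setdiff(tolower(input_names), tolower(result_names))
}

# Toy example with made-up vectors (not the real gender_df() output):
dropped_names(c("frasier", "daphne", "xyzzy"), c("daphne", "frasier"))
# [1] "xyzzy"
```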

Sorting the Data

results <- gender_df(df, method = "ssa", name_col = "name", year_col = c("start_year", "end_year"))

results_classified <- results %>%
  select(name, proportion_male, gender) %>%
  arrange(desc(proportion_male)) 

ggtexttable(results_classified, rows = NULL)
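Sorting by proportion_male puts the least certain names in the middle of the table. One way to make them easier to spot is to flag any name whose majority share falls below a cut-off. This is a sketch in base R; flag_uncertain() and the 0.9 threshold are hypothetical choices for illustration, and the toy rows use placeholder names and proportions rather than SSA results.

```r
# Mark rows whose majority share (male or female) is below `threshold`.
# Expects a proportion_male column, like results_classified above.
flag_uncertain <- function(results, threshold = 0.9) {
  majority_share <- pmax(results$proportion_male, 1 - results$proportion_male)
  results$uncertain <- majority_share < threshold
  results
}

# Placeholder rows mirroring the columns of results_classified.
toy <- data.frame(name = c("alpha", "beta"),
                  proportion_male = c(0.98, 0.6),
                  gender = c("male", "male"),
                  stringsAsFactors = FALSE)
flag_uncertain(toy)
```

Here "beta" is flagged because its majority share (0.6) is below 0.9, while "alpha" is not.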

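Note that “Mel” is a female character in the TV show, but the package only sees the name and its historical frequencies. Inference is harder when the input is a nickname, since a short form like Mel can stand for names given mostly to boys (Melvin) or mostly to girls (Melanie).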
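When the true gender of some names is already known, as with Mel here, a small table of manual corrections can be applied after classification. This is a minimal base R sketch; apply_overrides() and the overrides vector are hypothetical, and the gender values in the toy data frame are placeholders standing in for whatever the package returned.

```r
# Known genders for specific characters, replacing the name-based guess.
overrides <- c(mel = "female")

apply_overrides <- function(results, overrides) {
  # Rows whose name has a manual correction.
  hit <- results$name %in% names(overrides)
  # Look up each matched name in the named vector and overwrite its gender.
  results$gender[hit] <- unname(overrides[results$name[hit]])
  results
}

# Placeholder classifications (not real package output).
toy <- data.frame(name = c("mel", "frasier"),
                  gender = c("male", "male"),
                  stringsAsFactors = FALSE)
apply_overrides(toy, overrides)
```

Keeping the corrections in one named vector makes it easy to add more characters later without touching the classification code.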

# repeat the classification for the main cast only
main_cast_classified <- gender_df(main_cast, 
          method = "ssa", name_col = "name", 
          year_col = c("start_year", "end_year")) %>%
  select(name, proportion_male, gender) %>%
  arrange(desc(proportion_male))

ggtexttable(main_cast_classified, rows = NULL)