Data mining

Fall Plans for American Universities

Introduction

Today’s coding practice is based on the following article and its data source (there is literally a “Get the Data” link): “Here’s a List of Colleges’ Plans for Reopening in the Fall.”

```r
library("geofacet")
library("rvest")
library("tidyverse")

# load data
df_raw <- read_csv("data-w8lLG.csv")
colnames(df_raw)
## [1] "Institution" "Control" "State" "Category"
```

Data Wrangling

```r
# filter out Excel artifacts (trivial, empty rows)
df <- df_raw %>% filter(Institution != "#REF!")

# states as factors
states_alphabetically <- sort(unique(df$State))
df$State_factor <- factor(df$State, levels = states_alphabetically)
```

Extracting text from URLs (rvest!
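The excerpt breaks off just as the rvest step begins, so here is a minimal sketch of the kind of text extraction rvest does. The CSS selector `p.plan` and the inline HTML snippet are assumptions standing in for a fetched institution page; the post itself presumably calls `read_html()` on real URLs.

```r
library(rvest)

# stand-in for a page fetched from an institution's URL
# (the selector "p.plan" is hypothetical, for illustration only)
page <- read_html('<html><body><p class="plan">Online only in the fall.</p></body></html>')

plan_text <- page %>%
  html_element("p.plan") %>%   # select the node of interest
  html_text2()                 # extract its trimmed text

plan_text
## [1] "Online only in the fall."
```

The same pattern scales to a column of URLs: map `read_html()` over them, then select and extract the text of interest for each page.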

Analysis of NYT Sample of Covid-19 Obituaries

Obtaining the Data

On May 24, 2020, the New York Times published an article called “An Incalculable Loss”: “America is fast approaching a grim milestone in the coronavirus outbreak — each figure here represents one of the nearly 100,000 lives lost so far. But a count reveals only so much. Memories, gathered from obituaries across the country, help us to reckon with what was lost.” What I am trying to do today is summarize those 1001 obituaries.