Recent Posts


Data Source: USA Facts (downloaded July 6, 2020)

    library("tidyverse")
    library("zoo")

    start_date <- "5/28/20"
    end_date <- "7/5/20"
    county_list <- c("Santa Clara", "Stanislaus", "Calaveras",
                     "San Benito", "Merced", "Tuolumne",
                     "Fresno", "Madera", "Mariposa")
    lag <- 7 # number of days for rolling average

    # loads files
    cases_raw <- read_csv("covid_confirmed_usafacts.csv")
    populations <- read_csv("covid_county_population_usafacts.csv")

Data Wrangling

    raw_data_merged <- cases_raw %>%
      full_join(populations, by = c("County Name", "State"))

    # find column positions by date
    column_names <- colnames(raw_data_merged)
    start_loc <- match(start_date, column_names)
    end_loc <- match(end_date, column_names)

    cases_filtered <- cases_raw %>%
      filter(State == "CA") %>%
      select("County Name", all_of(start_loc:end_loc))

    populations_filtered <- populations %>%
      filter(State == "CA") %>%
      select("County Name", "population")

    df_merged <- cases_filtered %>%
      full_join(populations_filtered, by = "County Name")

    df_clean <- df_merged %>% # avoids unallocated cases and the cruise ship!


Introduction

Today, for practice with ggplot2, I wish to replicate @JoshuaFeldman’s wonderful #TidyTuesday submission about the dataset of Roman emperors.

    library("tidyverse")

TidyTuesday’s Roman Emperor dataset (posted on August 13, 2019)

    # TidyTuesday's given line of code to load the data
    emperors <- readr::read_csv("")

Exploring the Data

    dim(emperors)
    ## [1] 68 16

    colnames(emperors)
    ##  [1] "index"       "name"        "name_full"   "birth"       "death"
    ##  [6] "birth_cty"   "birth_prv"   "rise"        "reign_start" "reign_end"
    ## [11] "cause"       "killer"      "dynasty"     "era"         "notes"
    ## [16] "verif_who"

    emperors %>% filter(birth_prv !


Introduction

    library("tidyverse")

Today, I am going to create an overly simplified view of the past 10 Supreme Court decisions for the sake of coding practice with the ggplot2 package.

  • data source: SCOTUS Blog
  • useful tool: Convert Town’s “Column to Comma Separated Values” function

Data

Just in case anyone actually uses my blog post, I will type out the data manually instead of loading a separate CSV file, so that anyone can copy and paste the code for replicability.


Introduction

Today’s coding practice is based on the following article and data source (there is literally a “Get the Data” link): Here’s a List of Colleges’ Plans for Reopening in the Fall

    library("geofacet")
    library("rvest")
    library("tidyverse")

    # load data
    df_raw <- read_csv("data-w8lLG.csv")

    colnames(df_raw)
    ## [1] "Institution" "Control" "State" "Category"

Data Wrangling

    # filter out Excel artifacts (trivial, empty rows)
    df <- df_raw %>%
      filter(Institution != "#REF!")

    # States as factors
    states_alphabetically <- sort(unique(df$State))
    df$State_factor <- factor(df$State, levels = states_alphabetically)

    #extracting text from urls (rvest!


If you are planning to do the R assignments on your own computer (recommended), then here is a quick outline for obtaining the software. There are two separate software programs: R and RStudio. Most people find it easier to use RStudio than just R, but you need to install R first before installing RStudio (by analogy: you need a cell phone before you can use a cell phone case). If you have R and RStudio from a previous course, you still need to update to the current versions!
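To see which version of R is currently installed (for example, to decide whether an update is needed), a minimal sketch using only base R is shown here. The variable names `r_version_text` and `r_version_number` are just illustrative, and the check for RStudio's own version is left out because it only works from inside RStudio.

```r
# Human-readable description of the installed R version
r_version_text <- R.version.string
print(r_version_text)

# The same information as a version object (major.minor.patch),
# which is convenient for comparing against a newer release
r_version_number <- getRversion()
print(r_version_number)
```

Running these two lines in the R console (or the RStudio console) before and after updating is a quick way to confirm that the new installation is the one being used.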



I am a teaching instructor for the following courses at the University of California at Merced:

  • Math 15: first semester data science for life science students
  • Bio 18: second semester data science for life science students
  • Bio 184 (TBD): Python for DNA analysis


  • Derek Sollberger
    School of Natural Sciences
    5200 North Lake Road
    Merced, CA, 95343
  • email for appointment