Recent Posts

Unsupervised Learning

Supervised learning aims to predict a response variable from data with known labels. In unsupervised learning, we instead look for structure in the data without predetermined labels.

Goal: organize the items available in the Animal Crossing video game
Data set: Animal Crossing
Source: VillagerDB, MetaCritic, and TidyTuesday
Animal Crossing Tidy Tuesday

library("ggrepel")
library("tidyverse")
# critic <- readr::read_tsv('https://raw.

Supervised Learning

Supervised learning aims to predict a response variable from data with known labels. In unsupervised learning, we instead look for structure in the data without predetermined labels.

Goal: predict the personality type of each character in Animal Crossing
Data set: Animal Crossing
Source: VillagerDB, MetaCritic, and TidyTuesday
Animal Crossing Tidy Tuesday

library("caret")
library("randomForest")
library("tidymodels")
library("tidyverse")
# critic <- readr::read_tsv('https://raw.

Goals for today

  • introduce machine learning (ideas and terminology)
  • introduce the tidymodels package
  • practice with a TidyTuesday data set

library("tidymodels")
library("tidyverse")

Data: Tour de France
Source: TidyTuesday data set from April 7, 2020

tdf_winners <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-04-07/tdf_winners.csv')
str(tdf_winners)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 106 obs. of 19 variables:
## $ edition : num 1 2 3 4 5 6 7 8 9 10 ...
## $ start_date : Date, format: "1903-07-01" "1904-07-02" .

“Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data. Bioconductor uses the R statistical programming language, and is open source and open development.” https://www.bioconductor.org/

library("dplyr")
library("ggplot2")

Installation

To install core packages, type the following in an R command window. This may take around 5 minutes. When the option for updating packages appears, type “a” for “all”.

#leave as eval = FALSE when knitting
if (!

Teaching

I am a teaching instructor for the following courses at the University of California at Merced:

  • Math 15: first-semester data science for life science students
  • Bio 18: second-semester data science for life science students
  • Bio 184 (TBD): Python for DNA analysis

Contact

  • dsollberger@ucmerced.edu
  • Derek Sollberger
    School of Natural Sciences
    5200 North Lake Road
    Merced, CA, 95343
  • email for appointment