Special Summer Workshop - (Intro to) Deep Learning for Life Scientists (in R!)

This pilot workshop will introduce deep learning methods with a focus on biological data analysis. Tentative topics include data import and manipulation, image analysis with convolutional neural networks, sequence processing and recurrent networks, feature interpretation and visualization, and a discussion of where deep learning fits in the larger machine-learning landscape.

Materials will be taught in R using the TensorFlow/Keras libraries. Familiarity with R vector and matrix operations, lists, and functions will be useful; those familiar with Python and NumPy are welcome to translate the examples themselves. Participants are welcome to bring their own datasets, though example datasets will be available. Modern GPU-enabled computing resources will also be available.

Times and Dates: July 1 – August 9, 2019, Mon/Wed/Fri 2:00pm – 2:50pm

Cost: $250 (half the usual cost of a CGRB 6-week workshop, for this pilot run!)

Signup: https://oregonstate.qualtrics.com/jfe/form/SV_dgTN5hRiRPTv59H

Questions: shawn.oneil@cgrb.oregonstate.edu


Registration

We are no longer accepting registrations for Spring term 2019 (see descriptions below). Information for Fall 2019 will be posted when dates and locations are known. Tentative offerings for Fall 2019 include Introduction to Unix/Linux and Command-Line Data Analysis, Data Programming in R, and RNA-Seq.

Introduction to Unix/Linux and Command-Line Data Analysis (Instructor: Matthew Peterson)

  • These two modules are offered as a back-to-back pair:
    • Introduction to Unix/Linux, Apr 1 – Apr 29, Mon/Wed, 2:00pm – 2:50pm
    • Command-Line Data Analysis, May 6 – June 5, Mon/Wed, 2:00pm – 2:50pm
  • They are available for student credit:
    • Introduction to Unix/Linux, MCB599 CRN 56764
    • Command-Line Data Analysis, MCB599 CRN 56773
  • Non-students can attend as a workshop:
    • Introduction to Unix/Linux, $250
    • Command-Line Data Analysis, $250

To sign up for a workshop or get more information, email Matthew Peterson. (Workshop payments can be made via OSU index or check; indicate in your email which payment method you wish to use.)

Metabarcoding (focusing on 16S rRNA amplicon sequencing) (Instructor: Andrew Black)

  • Apr 2 – May 2, Tue/Thur, 1:00pm – 1:50pm
  • Available for student credit: MCB599 CRN 60037
  • Non-students can attend as a workshop: $250

To sign up for a workshop or get more information, email Andrew Black. (Workshop payments can be made via OSU index or check; indicate in your email which payment method you wish to use.)

Note: Metabarcoding and Metagenomics are taught as independent topics; those taking both will encounter a small amount of repeated background material.

Metagenomics (shotgun sequencing) (Instructor: Andrew Black)

  • May 7 – June 6, Tue/Thur, 1:00pm – 1:50pm
  • Available for student credit: MCB599 CRN 60038
  • Non-students can attend as a workshop: $250

To sign up for a workshop or get more information, email Andrew Black. (Workshop payments can be made via OSU index or check; indicate in your email which payment method you wish to use.)

Note: Metabarcoding and Metagenomics are taught as independent topics; those taking both will encounter a small amount of repeated background material.

Introduction to R and RStudio (Online) (Instructor: Shawn O'Neil)

  • For this 6-week online workshop, participants work through readings, videos, and exercises at their own pace, with guidance from an instructor as needed.
  • Dates: Apr 1 – May 10
  • Registration: https://pace.oregonstate.edu/catalog/introduction-r-and-rstudio (pricing and registration deadlines will be posted there soon)
  • For information or questions, email the instructor, Shawn O'Neil. (Online workshop payments must be made via the PACE registration form.)


Overview

In collaboration with the Departments of Statistics and Molecular and Cellular Biology, the CGRB offers a number of workshops and classes open to both internal and external faculty, staff, postdocs, and students. Most are 1- or 2-credit, 5-week classes offered during academic terms according to the schedule illustrated below. Note that some recommend prior familiarity with the Unix/Linux command line. For questions about course content, see the descriptions below or contact the trainers directly.

Most of these use our Advanced Cyberinfrastructure Teaching Facility.

(Note: this schedule is in development and represents our best estimate of upcoming offerings.)


WORKSHOP DESCRIPTIONS

Infrastructure

Command Line

Introduction to Unix/Linux (5 weeks @ 2 hrs per week)

This module introduces the natural environment of bioinformatics: the Linux command line. Material will cover logging into remote machines, filesystem organization and file manipulation, and installing and using software (with examples such as HMMER, BLAST, and MUSCLE). Finally, we introduce the CGRB research infrastructure (including submitting batch jobs) and concepts for command-line data analysis with tools such as grep and wc.

Command-Line Data Analysis (5 weeks @ 2 hrs per week)

The Linux command-line environment has long been used for analyzing text-based and scientific data, and a large number of data-analysis tools come pre-installed. These can be chained together to form powerful pipelines. Material will cover these and related tools (including grep, sort, awk, and sed), driven by examples of biological data in a problem-solving context that introduces programmatic thinking. This module also covers regular expressions, a useful syntax for matching and substituting patterns in string and sequence data.


Programming

Python

Python I (5 weeks @ 2 hrs per week)

This module introduces programming concepts in the Python programming language, driven by examples of biological data analysis. Topics include variables and data types (including strings, integers and floats, lists, and dictionaries), control flow (loops, conditionals, and some Boolean logic), variable scope and its proper use, basic use of regular expressions, functions, file input and output, and interacting with the larger Unix/Linux environment. Prior experience with the Unix/Linux command line is recommended (having previously taken, or concurrently taking, Introduction to Unix/Linux satisfies this).

Python II (5 weeks @ 2 hrs per week)

Part II of the Python series expands on basic programming topics and explores object-oriented design, a common concept in modern software development, again driven by examples of biological data analysis. Although we will not cover inheritance or public/private variables, we will discuss how objects (and their blueprints, classes) encapsulate functionality into easy-to-use blocks of code that more closely match the biological concepts at hand. Other topics in this area include APIs and syntactic sugar. Finally, we’ll use these ideas to explore creating and using packages such as Biopython. Prior experience with the Unix/Linux command line is recommended (having previously taken, or concurrently taking, Introduction to Unix/Linux satisfies this).

R

Data Programming in R (6 weeks @ 3 hrs per week)

The R programming language is widely used for statistical data analysis. This course introduces the language from a computer science perspective, covering basic data types (e.g., integers, numerics, characters, vectors, lists, matrices, and data frames), importing and manipulating data (in particular, vector and data-frame indexing), control flow (loops, conditionals, and functions), and good practices for writing readable, reusable, and efficient R code. We'll also explore functional programming concepts and the powerful data manipulation and visualization packages dplyr, tidyr, and ggplot2.

Introduction to R and RStudio (6 weeks, Online and Instructor-Led)

This is an online version of Data Programming in R. Participants work through readings, videos, and exercises at their own pace, with guidance from an instructor as needed.


Analysis

Genotyping by Sequencing

GBS I (5 weeks @ 2 hrs per week)

This module covers the analysis of data generated by genotyping-by-sequencing (GBS), a restriction-enzyme-based approach that allows deep sequencing of many individuals or samples at selected regions of the genome. GBS I covers applications of GBS for non-model organisms, focusing on reference-free analysis with the Stacks pipeline. There are no prerequisites, as necessary concepts on the command line and in R will also be covered.

GBS II (5 weeks @ 2 hrs per week)

This follow-up to GBS I covers applications of GBS for model organisms, focusing on reference-guided alignment and analysis with the Stacks pipeline. Other topics may include case studies and examples of GBS applications. There are no prerequisites, as necessary concepts on the command line and in R will also be covered.


RNA-Seq

RNA-Seq I (5 weeks @ 2 hrs per week)

The first module in a pair on analyzing RNA-seq data covers de novo transcriptome assembly. This includes data cleaning and preparation, comparison of assembly methods, contig filtering, and assessment of output quality. There are no prerequisites, as necessary concepts on the command line and in R will also be covered.

RNA-Seq II (5 weeks @ 2 hrs per week)

The second module in the pair covers differential expression analysis. Topics include data preparation, read mapping, region identification, and statistical analysis with R and Bioconductor. There are no prerequisites, as necessary concepts on the command line and in R will also be covered.


Meta(BARCODING|genomics)

Metabarcoding (5 weeks @ 2 hrs per week)

This short (5-week) course will provide computational experience with the analysis of 16S rRNA amplicon data. Starting with raw sequence data, attendees will work through a series of exercises using two different pipelines (mothur and DADA2) for classifying 16S rRNA data. There are no prerequisites, as necessary concepts on the command line and in R will also be covered.

Metagenomics (5 weeks @ 2 hrs per week)

This short (5-week) course will provide computational experience with the analysis of shotgun metagenomic data. Starting with raw sequence data, attendees will work through a series of exercises for profiling the taxonomy and function of metagenomic samples with MetaPhlAn2 and HUMAnN2. There are no prerequisites, as necessary concepts on the command line and in R will also be covered.