Registration

We are accepting registrations for Fall term, 2019 (see descriptions below):

Introduction to Unix/Linux and Command-Line Data Analysis (Instructor: Matthew Peterson)

  • These two modules are offered as a back-to-back pair:
    • Introduction to Unix/Linux, Sep 25 – Oct 23, Mon/Wed, 2:00pm – 2:50pm
    • Command-Line Data Analysis, Nov 4 – Dec 4, Mon/Wed, 2:00pm – 2:50pm
  • They are available for student credit:
    • Introduction to Unix/Linux, BDS 599, CRN 20579
    • Command-Line Data Analysis, BDS 599, CRN 20580
  • Non-students can attend as a workshop:
    • Introduction to Unix/Linux, $250
    • Command-Line Data Analysis, $250

To sign up for a workshop or to get other information, email Matthew Peterson. Workshop payments can be made via OSU index or check; indicate in your email which payment method you wish to use.

 

RNA-Sequencing (Instructor: Andrew Black)

  • Dates: Sep 25 – Dec 5, Tue/Thu, 11:00am – 11:50am
  • Course available for student credit:
    • BDS 599, CRN 20581
  • Non-students can attend as a workshop:
    • RNA-seq, $500

To sign up for a workshop or to get other information, email Andrew Black. Workshop payments can be made via OSU index or check; indicate in your email which payment method you wish to use.

 

Data Programming in R (Instructor: Shawn O'Neil)

  • Dates: Sep 25 – Nov 6, Mon/Wed/Fri, 9:00am – 9:50am
  • Course available for student credit (2 credits, letter grade):
    • ST 599, CRN 17196
    • (Note that the registrar lists the end date as Dec 6; we're getting that fixed.)
  • Non-students can attend as a workshop:
    • Data Programming in R, $500

To sign up for a workshop or to get other information, email Shawn O'Neil. Workshop payments can be made via OSU index or check; indicate in your email which payment method you wish to use.

 

Overview

In collaboration with the Departments of Statistics and Biological Data Science, the CGRB offers a number of workshops and classes open to both internal and external faculty, staff, postdocs, and students. Generally these are 1- or 2-credit, 5- or 10-week classes offered in academic terms according to the schedule illustrated below. For questions on course content, please see the descriptions below and/or contact the trainers.

Most of these utilize our Advanced Cyberinfrastructure Teaching Facility.

(Note: this schedule is in development, and represents our best estimate for upcoming offerings.)

 

WORKSHOP DESCRIPTIONS

Infrastructure

Command Line

Introduction to Unix/Linux (5 weeks @ 2 hrs per week)

This module introduces the natural environment of bioinformatics: the Linux command line. Material will cover logging into remote machines, filesystem organization and file manipulation, and installing and using software (including examples such as HMMER, BLAST, and MUSCLE). Finally, we introduce the CGRB research infrastructure (including submitting batch jobs) and concepts for data analysis on the command line with tools such as grep and wc.

Command-Line Data Analysis (5 weeks @ 2 hrs per week)

The Linux command-line environment has long been used for analyzing text-based and scientific data, and a large number of data-analysis tools come pre-installed. These can be chained together to form powerful pipelines. Material will cover these and related tools (such as grep, sort, awk, and sed), driven by examples of biological data in a problem-solving context that introduces programmatic thinking. This module also covers regular expressions, a useful syntax for matching and substituting string and sequence data.
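As a rough illustration of what "matching and substituting" with regular expressions means, here is a minimal sketch. It is written in Python only to keep the examples on this page in one language; the module itself works with command-line tools such as grep, sed, and awk. The header format, the sample data, and the function names are all made up for this example and are not course material.

```python
import re

# Hypothetical FASTA-style header lines, invented for this illustration.
headers = [
    ">gene1|len=300",
    ">gene2|len=1250",
    ">gene3|len=87",
]


def parse_header(line):
    """Match a header like '>gene1|len=300' and return (gene_id, length)."""
    match = re.match(r">(\w+)\|len=(\d+)", line)
    if match is None:
        raise ValueError(f"unrecognized header: {line!r}")
    return match.group(1), int(match.group(2))


def collapse_a_runs(sequence):
    """Substitute every run of two or more A's with a single 'N'."""
    return re.sub(r"A{2,}", "N", sequence)


# Matching: pull the ID and length out of each header.
parsed = [parse_header(h) for h in headers]

# Filtering on the captured value, much like a grep/awk step in a pipeline.
long_genes = sorted(gene for gene, length in parsed if length >= 100)

# Substituting: rewrite part of a sequence based on a pattern.
masked = collapse_a_runs("GAAATCAAG")
```

The same patterns carry over with small syntax differences to the command-line tools named above, which is why the idea is worth learning once.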

 

Programming

Python

Python I (5 weeks @ 2 hrs per week)

This module introduces programming concepts, driven by examples of biological data analysis, in the Python programming language. Topics covered will include variables and data types (including strings, integers and floats, dictionaries and lists), control flow (loops, conditionals, and some boolean logic), variable scope and its proper use, basic usage of regular expressions, functions, file input and output, and interacting with the larger Unix/Linux environment. Prior experience with the Unix/Linux command line is recommended (previously or simultaneously taking Intro to Unix/Linux satisfies this).
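To give a flavor of how several of the listed topics fit together, the following is a small sketch combining a dictionary, a loop, a conditional, and two functions. It is an assumed example, not taken from the course; the function names and the sample sequence are hypothetical.

```python
def count_bases(sequence):
    """Return a dictionary mapping each base to how often it occurs."""
    counts = {}
    for base in sequence.upper():          # loop over each character
        counts[base] = counts.get(base, 0) + 1
    return counts


def gc_content(sequence):
    """Return the fraction of bases that are G or C (0.0 for an empty string)."""
    if not sequence:                       # conditional guards against dividing by zero
        return 0.0
    counts = count_bases(sequence)
    gc = counts.get("G", 0) + counts.get("C", 0)
    return gc / len(sequence)


# Hypothetical input, chosen only for illustration.
example = "ATGCGC"
base_counts = count_bases(example)
fraction_gc = gc_content(example)
```

Note that `counts` exists only inside `count_bases`; code outside the function sees just the returned dictionary, which is a simple instance of the variable-scope topic mentioned above.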

Python II (5 weeks @ 2 hrs per week)

Part II of the Python series expands on basic programming topics and explores a common concept in modern software development called object-oriented design, driven again by examples of biological data analysis. Although we will not cover the subtopics of inheritance or public/private variables, we will discuss the use of objects (and their blueprints: classes) in encapsulating functionality into easily used blocks of code that more closely match the biological concepts at hand. Other topics in this area include APIs and syntactic sugar. Finally, we’ll use these ideas to explore creating and using packages such as the Biopython package. Prior experience with the Unix/Linux command line is recommended (previously or simultaneously taking Intro to Unix/Linux satisfies this).
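As a minimal sketch of the class-as-blueprint idea, the example below wraps a DNA sequence in an object whose methods mirror the biological concept. The class name, its attributes, and its methods are hypothetical and chosen for illustration; this is not Biopython's API and not necessarily how the course presents the topic.

```python
class DNASequence:
    """A named DNA sequence; assumes the bases are only A, C, G, and T."""

    # Lookup table shared by all instances.
    _COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

    def __init__(self, name, bases):
        self.name = name
        self.bases = bases.upper()

    def __len__(self):
        # Syntactic sugar: lets callers write len(seq) instead of len(seq.bases).
        return len(self.bases)

    def reverse_complement(self):
        """Return a new DNASequence holding the reverse complement."""
        rc = "".join(self._COMPLEMENT[b] for b in reversed(self.bases))
        return DNASequence(self.name + "_rc", rc)


# Hypothetical usage.
seq = DNASequence("example", "atgc")
rc = seq.reverse_complement()
```

Callers only need the small interface (`len(seq)`, `seq.reverse_complement()`), not the details of how complements are looked up, which is the encapsulation benefit the description refers to.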

R

Data Programming in R (6 weeks @ 3 hrs per week)

The R programming language is widely used for the analysis of statistical data sets. This course introduces the language from a computer science perspective, covering topics such as basic data types (e.g. integers, numerics, characters, vectors, lists, matrices, and data frames), importing and manipulating data (in particular, vector and data-frame indexing), control flow (loops, conditionals, and functions), and good practices for producing readable, reusable, and efficient R code. We'll also explore functional programming concepts and the powerful data manipulation and visualization packages dplyr, tidyr, and ggplot2.

Introduction to R and RStudio (6 weeks, Online and Instructor-Led)

This is an online version of Data Programming in R. Participants work through readings, videos, and exercises at their own pace, with guidance from an instructor as needed.

 

Analysis

Genotyping By Sequencing

GBS (10 weeks @ 2 hrs per week)

This course provides a general introduction to, and practical experience with, Genotyping By Sequencing (GBS). After a general overview, participants will gain hands-on experience with the basics of the command line, RStudio, and accessing and utilizing a computing infrastructure before exploring the methodology associated with GBS and other types of restriction-based sequencing techniques (e.g. RAD-seq). Starting with raw sequence data, students will then work through a series of exercises to generate and analyze test GBS data.

 

RNA-Sequencing

RNA-Seq (10 weeks @ 2 hrs per week)

This course provides an introduction to, and practical experience with, the computational component of bulk RNA-sequencing. After a general overview, participants will obtain a working introduction to the command line, RStudio, and accessing and utilizing a computing infrastructure. Students will then work through a series of exercises: cleaning raw FASTQ files, aligning reads to a reference genome, and quasi-mapping reads to a transcriptome / de novo assembly, followed by data visualization and Differential Gene Expression analysis.

 

Environmental Sequence Analyses

Environmental Sequence Analyses (10 weeks @ 2 hrs per week)

This course provides practical experience with 16S rRNA amplicon sequencing and shotgun metagenomics. After a general overview, participants will be given a working introduction to the command line, RStudio, and accessing and utilizing a computing infrastructure. Beginning with raw sequence data, students will then work through a series of hands-on exercises for profiling 16S rRNA sequences (using mothur and DADA2) and determining the taxonomy and functional composition of metagenomic samples (using MetaPhlAn2 and HUMAnN2).