Illumina index switching ("index hopping") mitigation strategies

A recent manuscript [1] documented the "index switching" of multiplexed samples on the Illumina platforms. This issue is distinct from the "cross-talk"/"sample bleeding" issue we previously identified [2]. Illumina has released a white paper [3] ("Effects of Index Misassignment on Multiplexing and Downstream Analysis") that addresses "index switching"/"index hopping."

We are in the process of evaluating Illumina's recommendations and will follow up with the community regarding the implementation of mitigation strategies. 

In the meantime, until we can provide further information, we recommend against multiplexing samples together when index switching from one sample to another at a level of up to 2% could cause a biologically misleading false-positive result. If you are uncertain about your multiplexing design, please consult with the CGRB staff.
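
As a rough illustration of the 2% figure above, the following Python sketch estimates how many reads of a given sequence could, in the worst case, be explained by index switching alone. The function names, the example read counts, and the simplification that every switched read lands in a single recipient sample are assumptions made for illustration; they are not taken from the manuscript or from Illumina's white paper.

```python
def max_switched_reads(source_reads, switch_rate=0.02):
    """Upper bound on reads of a sequence that may leave the source sample.

    switch_rate defaults to the "up to 2%" level mentioned above.
    """
    return round(source_reads * switch_rate)


def call_is_at_risk(observed_reads, source_reads, switch_rate=0.02):
    """True if the reads seen in a recipient sample could be explained
    entirely by switching from a co-multiplexed source sample."""
    return observed_reads <= max_switched_reads(source_reads, switch_rate)


# Hypothetical counts: a transcript with 5,000 reads in sample A is
# detected with 40 reads in sample B, sequenced in the same lane.
print(max_switched_reads(5000))      # worst-case switched reads
print(call_is_at_risk(40, 5000))     # could the 40 reads be an artifact?
```

A result of `True` does not prove that the reads are artifacts. It only indicates that the observation falls within the range where index switching cannot be ruled out under these assumptions.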

References

[1] http://biorxiv.org/content/early/2017/04/09/125724 
[2] http://cgrb.oregonstate.edu/core/illumina-hiseq-3000/illumina-barcodes 
[3] https://www.illumina.com/content/dam/illumina-marketing/documents/products/whitepapers/index-hopping-white-paper-770-2017-004.pdf

Illumina index cross-talk (“sample bleeding”) mitigation strategies

Please be advised that we have observed cases in which the Illumina HiSeq and MiSeq platforms have incorrectly assigned an index (“barcode”) to a sequence from an adjacent cluster on the flow cell. This “sample bleeding” [1] causes sequences (including PhiX clusters, which have no index read) to be mis-binned into the incorrect FASTQ file during the demultiplexing step. Sample bleeding occurs due to characteristics of the Illumina hardware/software and is not related to biological cross-contamination. Anecdotally, sample bleeding has been observed in less than 0.1% of sequences using single indices and in less than 0.001% of sequences using dual indices.

Experiments that rely on rare sequences to indicate the presence of a novel sequence may be affected, e.g., identifying a transcript in an RNA-seq experiment from a few reads, or identifying novel sequence in a genome from contigs with very low coverage.

The following may help mitigate the sample bleeding issue:

  • Use dual indexing kits instead of single index kits in library preparation [2]
  • Increase the percentage of PhiX spike-in, which will reduce the number of adjacent clusters with indices [3]
  • Lower the amount of library loaded on the CGRB’s MiSeq to decrease the cluster density
    • This technique will not work for the CGRB’s HiSeq 3000 “patterned” flow cell [4]
    • OHSU’s HiSeq 2500 can be utilized with lowered amounts of library loaded; contact Mark for details [5]
  • Run only one sample in a lane or carefully consider which samples share the same lane


[1] “Strategies for Achieving High Sequencing Accuracy for Low Diversity Samples and Avoiding Sample Bleeding Using Illumina Platform” (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393298/)
[2] http://cgrb.oregonstate.edu/core/illumina-hiseq-3000/illumina-barcodes
[3] http://cgrb.oregonstate.edu/core/illumina-hiseq-3000/phix-spike
[4] http://cgrb.oregonstate.edu/core/illumina-hiseq-3000
[5] http://www.ohsu.edu/xd/research/research-cores/massively-parallel-sequencing.cfm 

Illumina unintended binning (from unsupported single lane mixing of indexing kits)

Please be advised that we have observed cases in which the (unsupported) mixing of indexing kits in a single lane on the HiSeq and MiSeq platforms (e.g., dual and single indexed samples in the same lane) has resulted in unintended binning, i.e., sequences may be mis-binned (mis-assigned) into the incorrect FASTQ file during the demultiplexing step.

This mis-binning can occur because the demultiplexing step on both the HiSeq 3000 and the MiSeq allows a 1-base mismatch in the index read by default. Indices that are similar in sequence may be mis-binned due to sequencing error when this 1-base mismatch is allowed.

This unintended binning issue can be mitigated by re-demultiplexing a lane with 0 mismatches allowed in the index reads, i.e., indices must match perfectly to be “binned.”
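
To illustrate why the mismatch setting matters, the Python sketch below lists pairs of indices in a lane that are too similar to separate reliably at a given mismatch tolerance. It is a minimal sketch and not the logic of the demultiplexing software itself: the index sequences and sample names are hypothetical, and comparing indices of different lengths over only their shared leading bases is a simplifying assumption. The distance rule used is the general one for mismatch-tolerant matching: when m mismatches are tolerated, two indices must differ at 2m + 1 or more positions so that no single read can match both.

```python
from itertools import combinations


def hamming(a, b):
    """Count differing positions over the shorter of the two sequences
    (assumption: indices of unequal length are compared on shared bases)."""
    return sum(x != y for x, y in zip(a, b))


def risky_pairs(indices, allowed_mismatches=1):
    """Return (sample_a, sample_b, distance) for index pairs that a single
    read could match equally well at the given mismatch tolerance."""
    min_distance = 2 * allowed_mismatches + 1
    risky = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(indices.items(), 2):
        distance = hamming(seq_a, seq_b)
        if distance < min_distance:
            risky.append((name_a, name_b, distance))
    return risky


# Hypothetical lane mixing one 6-base index with two 8-base indices.
lane = {
    "sample_1": "ACGTAC",
    "sample_2": "ACGTTGGA",
    "sample_3": "TTGCAGTC",
}

print(risky_pairs(lane, allowed_mismatches=1))  # default-style tolerance
print(risky_pairs(lane, allowed_mismatches=0))  # perfect matches only
```

In this made-up lane, sample_1 and sample_2 differ at only two of their shared positions, so they are flagged when one mismatch is tolerated and are no longer flagged when perfect matches are required.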

Please contact Matthew if you have questions or require re-demultiplexing of a lane that may be affected by this issue.

Illumina indexing

Illumina samples can be barcoded or indexed using two different techniques:

Commercial Kits

  • Several commercial "barcode" kits are available:
    • Illumina TruSeq LT kits for DNA and RNA support 24 indices per lane.
    • Illumina TruSeq HT kits for DNA and RNA support 96 libraries per lane using dual indexing.
    • Illumina TruSeq Small RNA kits support 48 indices per lane.
    • Illumina Nextera XT kits for DNA support 384 libraries per lane using dual indexing.
    • Third-party index kits are also available (e.g., Bioo Scientific NEXTflex).
  • Be aware of pooling guidelines for TruSeq, Nextera, and other commercial kits. "Illumina uses a green laser to sequence G/T and a red laser to sequence A/C. At each cycle, at least one of two nucleotides for each color channel needs to be read to ensure proper image registration. It is important to maintain color balance for each base of the index read being sequenced, otherwise index read sequencing could fail due to registration failure."
  • Flow cells support the use of several different "barcode" kits simultaneously. 
    • Mixing of kits in a single lane is not supported. 
    • If you do choose to mix kits in a single lane with different length indices, e.g., on the Illumina MiSeq, it may incur an additional custom charge to demultiplex the samples.
  • Please contact Mark Dasenko if you have questions regarding the new Illumina kits or the submission of samples.
  • Please contact Matthew Peterson to help set up the SampleSheet.csv file when performing a multiplexed run.
  • Multiplexed samples are saved in the following directory structure:
    • {RUN FOLDER}/L{LANE NUMBERS}/
      • e.g., /hts2/illumina/150604_J00107_0013_AH2W2VBBXX_1303/L13457/

"Homegrown" (prefixed) Adapters

  • Illumina does not have an official Sample Prep Protocol for "homegrown" (first read or prefix indexing) barcodes; "homegrown" barcodes are not supported by Illumina. Illumina will not reimburse reagent costs for problematic lanes using "homegrown" barcodes.
  • If users choose to use "homegrown" barcodes, they do so at their own risk. Insufficient sequence diversity in the barcodes has a strong potential to yield no data.
  • If you choose to use "homegrown" barcodes, the following techniques have been suggested by other labs:
    • "Illumina uses a green laser to sequence G/T bases and a red laser to sequence A/C bases. At each cycle at least one of two nucleotides for each color channel needs to be read to ensure proper registration." - Illumina Sequencing Questions & Answers
    • Have an even base distribution per cycle (25% of each nucleotide), e.g.
      • GATCT
      • ATCGT
      • TCGAT
      • CGATT
    • An A overhang is added using the polymerase activity of Klenow fragment. Adapters are then ligated via a T-overhang. As a consequence, the first base after the "homegrown" barcode will be a T, as seen in the above example.
    • Avoid repeating nucleotides (e.g., AA) in your barcodes.
    • Use a minimum of four barcodes per lane. Additional barcodes (8 or more) allow for greater sequence diversity, which yields better results.
  • It is the responsibility of end-users to demultiplex their sequence files that contain homegrown barcoded samples.
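
The barcode design suggestions above (both color channels represented at every cycle, an even base distribution per cycle, and no repeated nucleotides) can be checked before ordering adapters. The Python sketch below is one possible way to do this. The function names are our own, the green/red base assignments follow the Illumina quote above, and the barcodes are the four examples listed above with the trailing ligation T left off.

```python
from collections import Counter

GREEN_BASES = {"G", "T"}  # green laser, per the Illumina quote above
RED_BASES = {"A", "C"}    # red laser, per the Illumina quote above


def unbalanced_cycles(barcodes):
    """Return 1-based cycles at which one color channel has no base."""
    length = min(len(b) for b in barcodes)
    flagged = []
    for cycle in range(length):
        bases = {b[cycle] for b in barcodes}
        if not (bases & GREEN_BASES and bases & RED_BASES):
            flagged.append(cycle + 1)
    return flagged


def base_fractions(barcodes):
    """Per-cycle fraction of A, C, G and T across the barcode set."""
    length = min(len(b) for b in barcodes)
    fractions = []
    for cycle in range(length):
        counts = Counter(b[cycle] for b in barcodes)
        fractions.append({base: counts[base] / len(barcodes) for base in "ACGT"})
    return fractions


def has_repeat(barcode):
    """True if any nucleotide is immediately repeated (e.g., AA)."""
    return any(a == b for a, b in zip(barcode, barcode[1:]))


# Example barcodes from the list above, ligation T omitted.
barcodes = ["GATC", "ATCG", "TCGA", "CGAT"]

print(unbalanced_cycles(barcodes))             # cycles missing a channel
print(base_fractions(barcodes)[0])             # composition at cycle 1
print([b for b in barcodes if has_repeat(b)])  # barcodes with repeats
```

A barcode set that passes these checks still carries the risks described above; the sketch only automates the listed rules of thumb.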