Workflow

The following is a full DNAscent workflow, where we’ll start off after Dorado has finished running. The recommended Dorado basecalling model for v4.0.3 is dna_r10.4.1_e8.2_400bps_fast@v5.0.0. In particular, we assume the following:

  • You have a directory of R10.4.1 Oxford Nanopore POD5 files (which may be in subdirectories) that you want to use for detection.

  • These POD5 files and a reference/genome file have been passed to Dorado (available from Oxford Nanopore) to produce a bam file.

Example Workflow

Pull the Singularity image:

singularity pull DNAscent.sif library://mboemo/dnascent/dnascent:4.1.1

Alternatively, you can download and compile DNAscent:

git clone --recursive https://github.com/MBoemo/DNAscent.git
cd DNAscent
git checkout 4.1.1
make
cd ..

Let’s index the run:

DNAscent index -f /full/path/to/pod5

This should only take a few seconds to run and will put a file called index.dnascent in the current directory.
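As a sanity check on the directory passed to DNAscent index, we can count the POD5 files it contains, including any in subdirectories. This is a minimal sketch; the helper name count_pod5 is our own and is not part of DNAscent.

```shell
# Hypothetical helper (not part of DNAscent): count the .pod5 files under a
# directory, descending into subdirectories.
count_pod5() {
    find "$1" -type f -name '*.pod5' | wc -l | tr -d ' '
}

# Example usage:
#   count_pod5 /full/path/to/pod5
```

If the count is zero, the path is probably wrong and it is worth fixing before moving on to DNAscent detect.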

Suppose we have an output from Dorado called alignment.bam (which doesn’t need to be sorted or indexed). You can run DNAscent detect (on 10 threads, for example) by running:

DNAscent detect -b alignment.bam -r /full/path/to/reference.fasta -i index.dnascent -o detect_output.bam -t 10

If the system has a CUDA-compatible GPU in it, we can run nvidia-smi to get an output that looks like the following:

Thu Aug 20 21:06:57 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:05:00.0 Off |                    0 |
| N/A   41C    P0    52W / 250W |   2571MiB / 16280MiB |     43%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    178943      C   ...DNAscent_dev/bin/DNAscent     2569MiB |
+-----------------------------------------------------------------------------+

From this, we can see that the GPU’s device ID is 0 (just to the left of Tesla) so we can run:

DNAscent detect -b alignment.bam -r /full/path/to/reference.fasta -i index.dnascent -o detect_output.bam -t 10 --GPU 0

Note that we’re assuming the CUDA libraries for the GPU have been set up properly (see Getting Started). If these libraries can’t be accessed, DNAscent will splash a warning saying so and default back to using CPUs.
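On a machine with several GPUs, the device ID can also be read programmatically rather than from the table. The sketch below assumes nvidia-smi's CSV query mode (--query-gpu with --format=csv,noheader,nounits). The helper least_busy_gpu is a hypothetical name, and picking the GPU with the least memory in use is just one reasonable policy, not something DNAscent requires.

```shell
# Hypothetical helper: reads lines of "index, name, memory.used" (CSV, no
# header, no units) on stdin and prints the index of the GPU with the least
# memory currently in use.
least_busy_gpu() {
    awk -F', *' '
        NR == 1 || $3 + 0 < min { min = $3 + 0; id = $1 }
        END { if (NR > 0) print id }
    '
}

# Only query the driver if nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    gpu=$(nvidia-smi --query-gpu=index,name,memory.used \
                     --format=csv,noheader,nounits | least_busy_gpu)
    # Print the detect command rather than running it, so it can be checked first.
    echo "DNAscent detect -b alignment.bam -r /full/path/to/reference.fasta -i index.dnascent -o detect_output.bam -t 10 --GPU $gpu"
fi
```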

When DNAscent detect is finished, it should put a file in modbam format called detect_output.bam in the current directory.

We can run DNAscent forkSense on the output of DNAscent detect to measure replication fork movement. Suppose that in our experimental protocol, we pulsed BrdU first followed by EdU. Let’s run it on four threads and specify that we want it to keep track of replication origins, forks, termination sites, and analogue tracks:

DNAscent forkSense -d detect_output.bam -o output.forkSense -t 4 --markOrigins --markTerminations --markAnalogues --markForks --order BrdU,EdU

Note that we need, at a minimum, to specify --markForks and --markAnalogues if we want to use DNAscent seeBreaks below.

We now have the following files from DNAscent forkSense:

  • origins_DNAscent_forkSense.bed (with our origin calls),

  • terminations_DNAscent_forkSense.bed (with our termination calls),

  • leftForks_DNAscent_forkSense.bed (with our leftward-moving fork calls),

  • rightForks_DNAscent_forkSense.bed (with our rightward-moving fork calls),

  • BrdU_DNAscent_forkSense.bed (with our BrdU analogue tracks),

  • EdU_DNAscent_forkSense.bed (with our EdU analogue tracks),

  • output.forkSense.

We can load detect_output.bam as well as the above bed files directly into IGV to see where origins, forks, analogue tracks, and terminations were called in the genome.
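Before opening IGV, a quick count of the calls in each bed file tells us whether forkSense found anything at all. This is a minimal sketch: count_bed_records is a hypothetical helper, and it skips blank lines and any lines starting with #, track, or browser on the assumption that the files may carry header lines.

```shell
# Hypothetical helper: count the records in a bed file, ignoring blank lines
# and anything that looks like a header (#, track, browser).
count_bed_records() {
    grep -vcE '^(#|track|browser|$)' "$1"
}

# Report a count for each forkSense bed file present in the current directory.
for prefix in origins terminations leftForks rightForks BrdU EdU; do
    bed="${prefix}_DNAscent_forkSense.bed"
    if [ -f "$bed" ]; then
        printf '%s\t%s\n' "$bed" "$(count_bed_records "$bed")"
    fi
done
```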

If we’ve used an agent that targets the DNA damage response, or if we’re working in a cell line that’s prone to replication stress, we might want to see whether there are elevated levels of DNA breaks at replication forks. We can do this by passing the results of DNAscent detect and DNAscent forkSense to DNAscent seeBreaks:

DNAscent seeBreaks -d detect_output.bam -o output.seeBreaks --left leftForks_DNAscent_forkSense.bed --right rightForks_DNAscent_forkSense.bed --analogue EdU_DNAscent_forkSense.bed

The resulting file, output.seeBreaks, will contain statistics on the number of analogue tracks that terminate at read ends compared to the number that would be expected by chance. In particular, it includes a 95% confidence interval on the difference between observed and expected values. We would generally say breaking is elevated if zero lies outside this interval. You can see an example in the Python Cookbook of how to parse and plot the distributions of expected and observed values.
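The interval check can be written down in a few lines. In this sketch the two bounds are supplied by hand, since we make no assumption here about the layout of output.seeBreaks. We also assume the difference is reported as observed minus expected, so that an interval lying entirely above zero indicates more breaks than expected by chance. The helper name classify_interval is hypothetical.

```shell
# Hypothetical helper: given the lower and upper bounds of a confidence
# interval, report where the interval sits relative to zero.
classify_interval() {
    awk -v lo="$1" -v hi="$2" 'BEGIN {
        if (lo + 0 > 0)      print "above zero"
        else if (hi + 0 < 0) print "below zero"
        else                 print "contains zero"
    }'
}

# Example usage with made-up bounds:
#   classify_interval 0.02 0.11
```

Distinguishing "above zero" from "below zero" matters because zero lies outside the interval in both cases, but under the stated assumption only the first corresponds to elevated breaking.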

We might also be interested in inter-origin distance (IOD), the spacing between fired replication origins, which is an important marker of replication stress. Suppose the duration of our first analogue pulse was 5 minutes and the second analogue pulse was 10 minutes. We can estimate the IOD by passing the pulse durations (in minutes) along with the results of DNAscent detect and DNAscent forkSense to DNAscent meIODy:

DNAscent meIODy -l leftForks_DNAscent_forkSense.bed -r rightForks_DNAscent_forkSense.bed --origin origins_DNAscent_forkSense.bed --termination terminations_DNAscent_forkSense.bed -d detect_output.bam --tPulse1 5. --tPulse2 10. -o output.IOD

The resulting file, output.IOD, will contain statistics on the inter-origin distance, including the median IOD and a 95% confidence interval. You can see an example in the meIODy documentation of how to visualise your results.

Barcoding

The workflow for a barcoded run is very similar to the workflow above. Rather than passing the bam file from the Dorado basecaller executable directly to DNAscent detect, we first pass it to the Dorado demux executable, then sort the resulting bam files and pass each of them to DNAscent detect.
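The per-barcode steps can be sketched as a loop. This example assumes the demultiplexed bam files from Dorado demux have been written to a single directory (the directory name demuxed and the output file names below are hypothetical) and that samtools is used for sorting. It only prints the commands so they can be reviewed before anything is run.

```shell
# Hypothetical helper: for every bam file in a directory of demultiplexed
# reads, print the sort command and the matching DNAscent detect command.
print_barcode_cmds() {
    demux_dir=$1
    for bam in "$demux_dir"/*.bam; do
        # If nothing matches, the glob is left as a literal string; skip it.
        [ -e "$bam" ] || continue
        name=$(basename "$bam" .bam)
        echo "samtools sort -o ${name}.sorted.bam $bam"
        echo "DNAscent detect -b ${name}.sorted.bam -r /full/path/to/reference.fasta -i index.dnascent -o ${name}.detect.bam -t 10"
    done
}

# Example usage:
#   print_barcode_cmds demuxed
```

Each resulting detect output can then be taken through DNAscent forkSense and the later steps exactly as in the workflow above.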