.. _workflows: Workflow =============================== The following is a full DNAscent workflow, where we'll start off after Dorado has finished running. The recommended Dorado basecalling model for v4.0.3 is ``dna_r10.4.1_e8.2_400bps_fast@v5.0.0``. In particular, we assume the following: * You have a directory of R10.4.1 Oxford Nanopore POD5 files (which may be in subdirectories) that you want to use for detection. * These POD5 files and a reference/genome file have been passed to Dorado (available from Oxford Nanopore) to produce a bam file. Example Workflow ---------------- Pull the Singularity image: .. code-block:: console singularity pull DNAscent.sif library://mboemo/dnascent/dnascent:4.1.1 Alternatively, you can download and compile DNAscent: .. code-block:: console git clone --recursive https://github.com/MBoemo/DNAscent.git cd DNAscent git checkout 4.1.1 make cd .. Let's index the run: .. code-block:: console DNAscent index -f /full/path/to/pod5 This should only take a few seconds to run and will put a file called ``index.dnascent`` in the current directory. Suppose we have an output from Dorado called ``alignment.bam`` (which doesn't need to be sorted or indexed). You can run DNAscent detect (on 10 threads, for example) by running: .. code-block:: console DNAscent detect -b alignment.bam -r /full/path/to/reference.fasta -i index.dnascent -o detect_output.bam -t 10 If the system has a CUDA-compatible GPU in it, we can run ``nvidia-smi`` to get an output that looks like the following: .. code-block:: console Thu Aug 20 21:06:57 2020 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.0 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |===============================+======================+======================| | 0 Tesla P100-PCIE... On | 00000000:05:00.0 Off | 0 | | N/A 41C P0 52W / 250W | 2571MiB / 16280MiB | 43% Default | | | | N/A | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=============================================================================| | 0 N/A N/A 178943 C ...DNAscent_dev/bin/DNAscent 2569MiB | +-----------------------------------------------------------------------------+ From this, we can see that the GPU's device ID is 0 (just to the left of Tesla) so we can run: .. code-block:: console DNAscent detect -b alignment.bam -r /full/path/to/reference.fasta -i index.dnascent -o detect_output.bam -t 10 --GPU 0 Note that we're assuming the CUDA libraries for the GPU have been set up properly (see :ref:`installation`). If these libraries can't be accessed, DNAscent will splash a warning saying so and default back to using CPUs. When ``DNAscent detect`` is finished, it will should put a file in modbam format called ``detect_output.bam`` in the current directory. We can run ``DNAscent forkSense`` on the output of ``DNAscent detect`` to measure replication fork movement. Suppose that in our experimental protocol, we pulsed BrdU first followed by EdU. Let's run it on four threads and specify that we want it to keep track of replication origins, forks, termination sites, and analogue tracks: .. code-block:: console DNAscent forkSense -d detect_output.bam -o output.forkSense -t 4 --markOrigins --markTerminations --markAnalogues --markForks --order BrdU,EdU Note that we need, at a minimum, to specify ``--markForks`` and ``--markAnalogues`` if we want to use ``DNAscent seeBreaks`` below. We now have the following files from ``DNAscent forkSense``: * origins_DNAscent_forkSense.bed (with our origin calls), * terminations_DNAscent_forkSense.bed (with our termination calls), * leftForks_DNAscent_forkSense.bed (with our leftward-moving fork calls), * rightForks_DNAscent_forkSense.bed (with our rightward-moving fork calls), * BrdU_DNAscent_forkSense.bed (with our BrdU analogue tracks), * EdU_DNAscent_forkSense.bed (with our EdU analogue tracks), * output.forkSense. We can load ``detect_output.bam`` as well as the above bed files files directly into IGV to see where origins, forks, analogue tracks, and terminiations were called in the genome. If we've used an agent that targets the DNA damage response, or if we're working in a cell line that's prone to replication stress, we might want to see there are elevated levels of DNA breaks at replication forks. We can do this by passing the results of ``DNAscent detect`` and ``DNAscent forkSense`` to ``DNAscent seeBreaks``: .. code-block:: console DNAscent seeBreaks -d detect_output.bam -o output.seeBreaks --left leftForks_DNAscent_forkSense.bed --right rightForks_DNAscent_forkSense.bed --analogue EdU_DNAscent_forkSense.bed The resulting file, ``output.seeBreaks``, will contain statistics on the number of analogue tracks that terminate at read ends compared to the number that would be expected by chance. In particular, it includes a 95% confidence interval on the difference between observed and expected values. We would generally say breaking is elevated if zero lies outside this interval. You can see an example in the :ref:`cookbook` of how to parse and plot the distributions of expected and observed values. We might also be interested in inter-origin distance, the spacing between fired replication origins, which is an important marker of replication stress. Suppose the duration of our first analogue pulse was 5 minutes and the second analogue pulse was 10 minutes. We can do this by passing the pulse durations (in minutes) along with theresults of ``DNAscent detect`` and ``DNAscent forkSense`` to ``DNAscent meIODy``: .. code-block:: console DNAscent meIODy -l leftForks_DNAscent_forkSense.bed -r rightForks_DNAscent_forkSense.bed --origin origins_DNAscent_forkSense.bed --termination terminations_DNAscent_forkSense.bed -d detect_output.bam --tPulse1 5. --tPulse2 10. -o output.IOD The resulting file, ``output.IOD``, will contain statistics on the inter-origin distance, including the median IOD and a 95% confidence interval. You can see an example in the :ref:`meIODy` of how to visualise your results. Barcoding --------- The workflow for a barcoded run is very similar to the workflow above. Rather than using the bam file directly from the ``Dorado basecaller`` executable, this bam file is first passed to the ``Dorado demux`` executable and the resulting bam files are sorted and passed to ``DNAscent detect``.