forkSense

DNAscent forkSense is a DNAscent subprogram that interprets the pattern of BrdU and EdU incorporation on each molecule, segmenting each read to show where leftward- and rightward-moving forks travelled during the BrdU and EdU pulses.

Usage

To run DNAscent forkSense, do:
   DNAscent forkSense -d /path/to/BrdUCalls.detect -o /path/to/output.forkSense --order EdU,BrdU
Required arguments are:
  -d,--detect               path to output file from DNAscent detect,
  -o,--output               path to output file for forkSense,
     --order                order in which the analogues were pulsed (EdU,BrdU or BrdU,EdU).
Optional arguments are:
  -t,--threads              number of threads (default: 1 thread),
     --markAnalogues        writes analogue incorporation locations to a bed file (default: off),
     --markOrigins          writes replication origin locations to a bed file (default: off),
     --markTerminations     writes replication termination locations to a bed file (default: off),
     --markForks            writes replication fork locations to a bed file (default: off).

The required inputs of DNAscent forkSense are the output file produced by DNAscent detect, the name of a new output file for DNAscent forkSense to write to, and the order in which the analogues were pulsed. In the example command above, the --order flag indicates that EdU was pulsed first and BrdU was pulsed second. The order of the pulses is needed to determine fork direction and to distinguish origins from termination sites, but no information about the pulse length is needed. Note that the detect file must have been produced using the ResNet algorithm of DNAscent v3.0.0 or later; DNAscent forkSense is not compatible with legacy HMM-based detection. Note further that DNAscent forkSense v3.0.0 or later is not backward compatible with the previous BrdU-only protocol, as it relies on the incorporation of both BrdU and EdU to determine fork direction. Users with data from a BrdU-only pulse-chase protocol should use DNAscent v2.0.2.
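For scripted pipelines, the command line described above can be assembled programmatically. The following is a minimal Python sketch, not part of DNAscent itself: the function name build_forksense_command and its parameter names are hypothetical, and it only uses the flags listed in this section. It builds the argument list without running anything, and it assumes the DNAscent executable is on the PATH if the list is later passed to subprocess.run.

```python
from typing import List, Optional

# The two pulse orders accepted by --order, as listed in the usage text.
VALID_ORDERS = ("EdU,BrdU", "BrdU,EdU")


def build_forksense_command(
    detect_path: str,
    output_path: str,
    order: str,
    threads: Optional[int] = None,
    mark_analogues: bool = False,
    mark_origins: bool = False,
    mark_terminations: bool = False,
    mark_forks: bool = False,
) -> List[str]:
    """Return the argument list for a DNAscent forkSense call (hypothetical helper)."""
    if order not in VALID_ORDERS:
        raise ValueError(f"order must be one of {VALID_ORDERS}, got {order!r}")

    # Required arguments.
    cmd = ["DNAscent", "forkSense", "-d", detect_path, "-o", output_path, "--order", order]

    # Optional arguments; each is only added when requested.
    if threads is not None:
        cmd += ["-t", str(threads)]
    if mark_analogues:
        cmd.append("--markAnalogues")
    if mark_origins:
        cmd.append("--markOrigins")
    if mark_terminations:
        cmd.append("--markTerminations")
    if mark_forks:
        cmd.append("--markForks")
    return cmd


# Reproduces the example command from the usage text as a list of arguments.
example = build_forksense_command(
    "/path/to/BrdUCalls.detect", "/path/to/output.forkSense", "EdU,BrdU"
)
print(" ".join(example))
```

Checking the --order value before launching the job is a small convenience here, since a swapped pulse order would change which forks are called leftward- or rightward-moving.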

Output

Main Output File

DNAscent forkSense will produce a human-readable output file with the name and location that you specified using the -o flag. Like the output of DNAscent detect, this file starts with a short header:

#DetectFile /path/to/DNAscent.detect
#Threads 1
#Compute CPU
#SystemStartTime 10/06/2022 13:04:33
#Software /path/to/DNAscent
#Version 3.0.0
#Commit b9598a9e5bfa5f8314f92ba0f4fed39be1aee0be
#EstimatedRegionBrdU 0.559506
#EstimatedRegionEdU 0.202767

The fields in this header are analogous to those in the header from DNAscent detect, but there are two additional lines: an estimate of the thymidine-to-BrdU substitution rate in BrdU-positive regions and an estimate of the thymidine-to-EdU substitution rate in EdU-positive regions. In the example above, approximately 56% of thymidines are replaced by BrdU in BrdU-positive regions, and approximately 20% of thymidines are replaced by EdU in EdU-positive regions.
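The header can be read programmatically. The sketch below is an illustration only: the function name parse_forksense_header is hypothetical, and it assumes each header line has the form "#Key value" with a single space after the key, as in the example above.

```python
from typing import Dict, Iterable

# Header copied from the example in this section.
HEADER_EXAMPLE = """\
#DetectFile /path/to/DNAscent.detect
#Threads 1
#Compute CPU
#SystemStartTime 10/06/2022 13:04:33
#Software /path/to/DNAscent
#Version 3.0.0
#Commit b9598a9e5bfa5f8314f92ba0f4fed39be1aee0be
#EstimatedRegionBrdU 0.559506
#EstimatedRegionEdU 0.202767
"""


def parse_forksense_header(lines: Iterable[str]) -> Dict[str, str]:
    """Collect the leading '#Key value' lines into a dictionary of strings."""
    header: Dict[str, str] = {}
    for line in lines:
        if not line.startswith("#"):
            break  # the header ends at the first line that is not a '#' line
        # Split on the first space only, so values may contain spaces
        # (for example the date and time in SystemStartTime).
        key, _, value = line[1:].strip().partition(" ")
        header[key] = value
    return header


header = parse_forksense_header(HEADER_EXAMPLE.splitlines())
brdu_rate = float(header["EstimatedRegionBrdU"])
edu_rate = float(header["EstimatedRegionEdU"])
print(f"BrdU substitution: {brdu_rate:.0%}, EdU substitution: {edu_rate:.0%}")
```

Values are kept as strings and converted only where needed, because the header mixes paths, dates, and numbers.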

The rest of this file has similar formatting to that of DNAscent detect. The format for the read headers is the same. From left to right, the tab-delimited columns indicate:

  • the coordinate on the reference,
  • a Boolean (0 or 1) indicating whether that position is in an EdU-positive region,
  • a Boolean (0 or 1) indicating whether that position is in a BrdU-positive region.

The following is an example of this output:

>806d1f69-1054-4b74-8356-d935a282a22e 11 1089865 1130164 fwd
1089873 0       0
1089874 0       0
1089877 0       0
1089878 0       0
1089879 0       0
1089880 0       0
1089882 0       0
1089895 0       0
1089899 0       0

Only reads that have at least one BrdU-positive or EdU-positive segment are written to this file. Reads with no base analogue segments called on them are omitted, as zeros at every position on these reads are implied. Note that the format of this file has changed substantially from DNAscent v2.*. This design decision stems from a shift in the algorithm used, as well as the desire to avoid using excess disk space on redundant information.
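Because each row carries one flag per analogue, contiguous analogue-positive segments can be recovered by grouping consecutive rows. The following Python sketch shows one way to do this. The function names parse_reads and positive_segments are hypothetical, the rows in EXAMPLE_LINES are invented for illustration (they are not real DNAscent output), and the parser assumes whitespace-separated columns in the order described above.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, int, int]  # (reference coordinate, EdU flag, BrdU flag)

# Invented rows for illustration only; not real DNAscent output.
EXAMPLE_LINES = [
    "#Version 3.0.0",
    ">read1 11 100 200 fwd",
    "101\t0\t0",
    "105\t1\t0",
    "108\t1\t0",
    "110\t0\t1",
    "115\t0\t1",
    "120\t0\t0",
]


def parse_reads(lines: Iterable[str]) -> Iterator[Tuple[str, List[Row]]]:
    """Yield (read header, rows) for each read, skipping the '#' file header."""
    header, rows = None, []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, rows
            header, rows = line[1:], []
        else:
            pos, edu, brdu = line.split()[:3]
            rows.append((int(pos), int(edu), int(brdu)))
    if header is not None:
        yield header, rows


def positive_segments(rows: List[Row], column: int) -> List[Tuple[int, int]]:
    """Return (first, last) coordinates of each run of 1s.

    column is 1 for the EdU flag and 2 for the BrdU flag.
    """
    segments = []
    for flag, run in groupby(rows, key=lambda row: row[column]):
        if flag == 1:
            run = list(run)
            segments.append((run[0][0], run[-1][0]))
    return segments


for read_header, read_rows in parse_reads(EXAMPLE_LINES):
    print(read_header)
    print("EdU segments:", positive_segments(read_rows, 1))
    print("BrdU segments:", positive_segments(read_rows, 2))
```

Segment boundaries in this sketch are the first and last listed coordinates of each run; the bed files written by --markAnalogues may define boundaries differently.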

Bed Files

If the --markOrigins flag is passed, DNAscent forkSense will write the genomic region between matched leftward- and rightward-moving forks to a bed file called origins_DNAscent_forkSense.bed in the working directory. Likewise, if the --markTerminations flag is passed, the genomic region between leftward- and rightward-moving forks moving towards each other will be recorded in a bed file called terminations_DNAscent_forkSense.bed. The flag --markAnalogues will create two separate bed files: one containing the genomic location of BrdU-positive segments, and another containing the genomic location of EdU-positive segments.

If the --markForks flag is passed, two bed files will be created in the working directory. The genomic location of leftward- and rightward-moving forks will be written to separate bed files called leftForks_DNAscent_forkSense.bed and rightForks_DNAscent_forkSense.bed.

All output bed files have the following space-separated columns:

  • chromosome name,
  • 5’ boundary of the origin (or termination site, or fork),
  • 3’ boundary of the origin (or termination site, or fork),
  • read header of the read that the call came from (similar to those in the output file of DNAscent detect).

For origins and termination sites, the “resolution” of the calls (i.e., the third column minus the second column) will depend on your experimental setup. In synchronised early S-phase cells, the genomic distance between the 5’ and 3’ boundaries is likely to be small for origins and large for termination sites, as the leftward- and rightward-moving forks should be together near the origin. In asynchronous or mid/late S-phase cells, the origin calls may appear to have a “lower” resolution (i.e., larger differences between the 5’ and 3’ boundaries), as the forks from a single origin will have travelled some distance before the pulses. When both forks are together at an origin, the origin bed file will record the midpoint of the analogue segment for the analogue that was pulsed first.
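The resolution of a call can be computed directly from a bed line as the third column minus the second. The Python sketch below illustrates this; the function names are hypothetical, the coordinates in EXAMPLE_BED_LINE are invented (only the read ID is taken from the example above), and it assumes that everything after the third field is the read header, which may itself contain spaces.

```python
from typing import Tuple

# Invented call for illustration; the coordinates are not real output.
EXAMPLE_BED_LINE = (
    "11 1100000 1101500 "
    "806d1f69-1054-4b74-8356-d935a282a22e 11 1089865 1130164 fwd"
)


def parse_bed_line(line: str) -> Tuple[str, int, int, str]:
    """Split a bed line into (chromosome, 5' boundary, 3' boundary, read header)."""
    # Split on whitespace at most three times so that a read header
    # containing spaces stays together in the last field.
    chrom, start, end, read_header = line.strip().split(None, 3)
    return chrom, int(start), int(end), read_header


def call_resolution(line: str) -> int:
    """Return the third column minus the second column for one bed line."""
    _, start, end, _ = parse_bed_line(line)
    return end - start


chrom, start, end, read_header = parse_bed_line(EXAMPLE_BED_LINE)
print(chrom, start, end, call_resolution(EXAMPLE_BED_LINE))
```

Applied to every line of an origin bed file, this gives a distribution of call widths that can be compared between, for example, synchronised and asynchronous samples.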

The bed files created by DNAscent forkSense can be opened directly with a genome browser.