OFF-SPOTTER
-----------
This is the source code of Off-Spotter as described in Pliatsika, V, and Rigoutsos, I (2015) "Off-Spotter: very fast and exhaustive enumeration of genomic lookalikes for designing CRISPR/Cas guide RNAs" Biol. Direct 10(1):4. You can also use Off-Spotter online at cm.jefferson.edu/Off-Spotter. 


VERSION INFORMATION
-------------------
VERSION 0.1


GENERAL
-------
Off-Spotter consists of 4 programs, all of which must be run in order to get results. Splitting the work across separate programs is what allows results to be returned fast. The programs are the following:

*Table creation: this needs to be run once per genome. It creates tables for the genome.

*Load Memory: this needs to be run once per result session. It loads the tables into memory.

*Results: this needs to be run once per query. It returns the hits of all gRNAs in the input.

*Detach memory: this needs to be run once per result session. It unloads the tables from memory.

For more details look below.

The PAM sequences currently supported by this program are NGG, NAG, NNNNACA, and NNGRRT, where R=A or G.


TERMS AND CONDITIONS
--------------------
/***********************************************************************************************************/
/*   This code (© 2015 Thomas Jefferson University, All Rights Reserved) was created by Venetia Pliatsika  */
/*   and Isidore Rigoutsos and is an implementation of the Off-Spotter algorithm that appears in           */
/*   Pliatsika, V, and Rigoutsos, I (2015) "Off-Spotter: very fast and exhaustive enumeration of genomic   */
/*   lookalikes for designing CRISPR/Cas guide RNAs" Biol. Direct 10(1):4.                                 */
/*                                                                                                         */
/* Use of these codes is bound by the following terms and conditions:                                      */
/*                                                                                                         */
/* Terms of Use: This code can be freely used for research, academic and other non-profit activities.      */
/* Only one instance of the code may be used at a time, and then for only one concurrent user. You may not */
/* use the code to conduct any type of application service, service bureau or time-sharing operation or to */
/* provide any remote processing, network processing, network telecommunications or similar services to    */
/* any person, entity or organization, whether on a fee basis or otherwise. The code can be copied and     */
/* compiled on any platform for the use authorized by these terms and conditions. All copies of the code   */
/* must be accompanied by this note. The code cannot be modified without the written permission of the     */
/* Computational Medicine Center of Thomas Jefferson University https://cm.jefferson.edu                   */
/*                                                                                                         */
/* Commercial use is strictly prohibited.  If you wish to use these codes commercially please contact the  */
/* Computational Medicine Center of Thomas Jefferson University: https://cm.jefferson.edu/contact-us/      */
/*                                                                                                         */
/*                                                                                                         */
/*     THE CODE IS PROVIDED “AS IS” WITH NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, EITHER EXPRESSED    */
/*     OR IMPLIED. TO THE FULLEST EXTENT PERMISSIBLE PURSUANT TO APPLICABLE LAW. THOMAS JEFFERSON          */
/*     UNIVERSITY, AND ITS AFFILIATES, DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING, BUT NOT     */
/*     LIMITED TO, THE IMPLIED WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND  */
/*     NON-INFRINGEMENT.                                                                                   */
/*                                                                                                         */
/*     NEITHER THOMAS JEFFERSON UNIVERSITY NOR ITS AFFILIATES MAKE ANY REPRESENTATION AS TO THE RESULTS    */
/*     TO BE OBTAINED FROM USE OF THE CODE.                                                                */
/***********************************************************************************************************/

RUNNING
-------
First time using this genome:
0. Go to the folder where all the Off-Spotter files were downloaded.
1. Run "make" to get the executables.
2. Run "Table_Creation <full path of genome_file>". See the Table Creation section below for details on <genome_file>. This creates 2 files called data.bin and index.bin. Those files must be in the same folder, and the full path to that folder must be given as input to the next step.
3. Run "Load_Memory <full path of .bin files>".
4. Run "Results <input full path> <PAM sequence> <max mismatch number> <Annotation file full paths separated by -->". After all gRNAs found in the input file are processed, the results will appear in a file named "Off-Spotter_output_<input filename>.txt". You can run multiple queries at the same time as long as the input files have different names.
5. When you don't want to run more queries, run "Detach_Memory".

If you already used this genome before, skip steps 1 and 2.
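
The steps above can be sketched as a small Python helper that only assembles the command lines in the order they are run; it does not execute anything. The function name build_commands, the example paths, and the assumption that the executables are invoked from the current folder as ./<name> are illustrative and not part of Off-Spotter.

```python
def build_commands(genome_file, bin_dir, query_file, pam, max_mismatches,
                   annotation_files=()):
    """Return the Off-Spotter command lines, in the order they are run."""
    results = ["./Results", query_file, pam, str(max_mismatches)]
    if annotation_files:
        # Multiple annotation files are joined with "--" and no spaces.
        results.append("--".join(annotation_files))
    return [
        ["./Table_Creation", genome_file],  # once per genome
        ["./Load_Memory", bin_dir],         # once per result session
        results,                            # once per query
        ["./Detach_Memory"],                # once per result session
    ]


if __name__ == "__main__":
    # All paths below are made-up placeholders.
    for cmd in build_commands("/data/genome.txt", "/data/tables",
                              "/data/guides.txt", "NGG", 3,
                              ["/data/genes.txt", "/data/repeats.txt"]):
        print(" ".join(cmd))
```

When the tables for a genome already exist, the first command in the list can be dropped.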


T A B L E   C R E A T I O N
--------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------
This is the first program that needs to be run. It takes an input file that contains all the genomic data and creates the two table files that Off-Spotter uses. The first table file holds all precompiled hits and the second holds a hash table over them. This will take some time to run, but it doesn't need to be run every time: you can keep the files that it creates for future use.

INPUT
-----
The input is a single file that holds the sequences of all chromosomes. Each chromosome must be preceded by a header that starts with ">" followed by the chromosome number. The chromosome number can be followed by anything but another number, e.g. ">13_hkhk". Each chromosome's bases must be on one line.
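
Genome files are often distributed with each sequence wrapped over many lines, so a conversion step may be needed before Table_Creation. The following is a minimal Python sketch of such a step, not part of Off-Spotter: flatten_genome is a hypothetical helper that joins the sequence lines of each chromosome into one line and checks only that each header starts with ">" followed by a digit.

```python
def flatten_genome(lines):
    """Join each chromosome's sequence lines into one line under its header."""
    out, chunks = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if chunks:
                # Close the previous chromosome before starting a new one.
                out.append("".join(chunks))
                chunks = []
            if len(line) < 2 or line[1] not in "0123456789":
                raise ValueError("header must be '>' plus a number: " + line)
            out.append(line)
        else:
            if not out:
                raise ValueError("sequence found before the first header")
            chunks.append(line)
    if chunks:
        out.append("".join(chunks))
    return out


if __name__ == "__main__":
    wrapped = [">1_example", "ACGTAC", "GTNNAC", ">13_hkhk", "GGTT", "AACC"]
    print("\n".join(flatten_genome(wrapped)))
```

The result is a list of alternating header and sequence lines that can be written out, one per line, as the single genome file.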

OUTPUT
------
2 files: data.bin and index.bin.
Keep those files in the same folder and retain them for future use.

NOTES
-----
1. All gRNA hits that include an N are ignored by Off-Spotter and not reported.
2. If you want to use more than 25 chromosomes you need to edit the max chromosome number.


L O A D   M E M O R Y
--------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------
This program loads the .bin files into memory, so at least 23 GB of RAM must be available. You need to load the tables before you run the Results program. However, you don't have to run this every time: once the tables are loaded, you can run Results as many times as you want without re-running this program. To remove the tables from memory, run Detach_Memory.

INPUT
-----
The input of this program is the full path of the data.bin and index.bin files that were created by Table_Creation.


OUTPUT
------
No output


NOTES
-----
1. You must have at least 23 GB of RAM available.


R E S U L T S
--------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------
INPUT
-----

*INPUT FILE
The input file can have 2 formats.
A) The input file is in FASTA format. The headers are ignored. At least one line must be longer than 20 characters; otherwise the file cannot contain a gRNA+PAM and is treated as format B.
B) The input file contains only 20-mers, one per line, so every line has exactly 20 characters.
If the file is in format A, it is searched for gRNAs and each gRNA found is run in turn. If the file is in format B, each line is treated as a gRNA and run in turn.
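
The rule for telling the two formats apart can be sketched in Python as below. This is an illustration of the rule as described here, not the code Results uses. Two points are assumptions: header lines (starting with ">") are left out of the length check, and a file that fits neither format raises an error. The name detect_input_format is hypothetical.

```python
def detect_input_format(lines):
    """Return "A" (FASTA with gRNA+PAM) or "B" (one 20-mer per line)."""
    seqs = []
    for raw in lines:
        line = raw.strip()
        if line and not line.startswith(">"):  # assumption: skip headers
            seqs.append(line)
    if any(len(s) > 20 for s in seqs):
        return "A"
    if seqs and all(len(s) == 20 for s in seqs):
        return "B"
    raise ValueError("input is neither format A nor format B")


if __name__ == "__main__":
    print(detect_input_format([">query", "ACGTACGTACGTACGTACGTAGG"]))
    print(detect_input_format(["ACGTACGTACGTACGTACGT", "TTTTACGTACGTACGTACGT"]))
```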

*PAM
The available PAMs are
-NGG
-NAG
-NNNNACA
-NNGRRT
You can use any one of them.
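
To illustrate how the wildcard letters in these PAMs are read, here is a minimal Python sketch. R=A or G is stated in this README; N matching any of A, C, G, T is the standard IUPAC meaning. The names pam_matches, WILDCARDS and SUPPORTED_PAMS are hypothetical, and this is not the matching code that Off-Spotter runs.

```python
WILDCARDS = {"N": "ACGT", "R": "AG"}
SUPPORTED_PAMS = ("NGG", "NAG", "NNNNACA", "NNGRRT")


def pam_matches(pam, bases):
    """Return True if the genomic bases are compatible with the PAM pattern."""
    if pam not in SUPPORTED_PAMS:
        raise ValueError("unsupported PAM: " + pam)
    bases = bases.upper()
    if len(bases) != len(pam):
        return False
    # A plain letter must match itself; a wildcard matches any listed base.
    return all(b in WILDCARDS.get(p, p) for p, b in zip(pam, bases))


if __name__ == "__main__":
    for pam, bases in [("NGG", "AGG"), ("NGG", "AGA"), ("NNGRRT", "ACGAGT")]:
        print(pam, bases, pam_matches(pam, bases))
```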

*MAX MISMATCH NUMBER
An integer between 0 and 5 inclusive. If you use x, you'll get all results that differ from the input in at most x bases. Hence, 0 returns exact hits only, 1 returns exact hits and hits that differ in 1 position out of the 20, etc. The N's in the PAMs are not counted as mismatches, so there is no need to take them into account.
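
The meaning of the mismatch number can be shown with a direct position-by-position comparison of two 20-mers, sketched in Python below. This only illustrates the definition; Off-Spotter itself looks hits up in its precompiled tables rather than comparing strings this way. The function names are hypothetical.

```python
def count_mismatches(grna, hit):
    """Count the positions at which two 20-mers differ (PAM excluded)."""
    if len(grna) != 20 or len(hit) != 20:
        raise ValueError("both sequences must be 20 bases long")
    return sum(1 for a, b in zip(grna.upper(), hit.upper()) if a != b)


def within_limit(grna, hit, max_mismatches):
    """Return True if the hit differs from the gRNA in at most max_mismatches bases."""
    if not 0 <= max_mismatches <= 5:
        raise ValueError("max mismatch number must be between 0 and 5")
    return count_mismatches(grna, hit) <= max_mismatches


if __name__ == "__main__":
    grna = "ACGTACGTACGTACGTACGT"
    hit = "TCGTACGTACGTACGTACGA"  # differs at the first and last position
    print(count_mismatches(grna, hit), within_limit(grna, hit, 1))
```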

*ANNOTATION FILE (optional)
The annotation file should be ordered and should have the following structure:
chromosome name \t strand \t start position \t end position \t info
where chromosome name is a number (with X=23, Y=24, MT=25) and strand is + or -.
For example:
1       -       739121  739137  ENSG00000269831_ENST00000599533_AL669831.1_Uncharacterized protein
You can give multiple files, but you must give the full path of each and separate them with "--" and no spaces.
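
When preparing an annotation file, a small checker for this structure can help. The Python sketch below parses one tab-separated line and converts chromosome names with the X=23, Y=24, MT=25 mapping given above. The names Annotation, parse_annotation_line and to_chromosome_number are hypothetical and not part of Off-Spotter.

```python
from collections import namedtuple

Annotation = namedtuple("Annotation", "chromosome strand start end info")

# Mapping stated in this README for non-numeric chromosome names.
CHROMOSOME_NUMBERS = {"X": 23, "Y": 24, "MT": 25}


def to_chromosome_number(name):
    """Convert a chromosome name such as "7", "X" or "MT" to its number."""
    return CHROMOSOME_NUMBERS[name] if name in CHROMOSOME_NUMBERS else int(name)


def parse_annotation_line(line):
    """Parse: chromosome \\t strand \\t start \\t end \\t info."""
    fields = line.rstrip("\r\n").split("\t", 4)
    if len(fields) != 5:
        raise ValueError("expected 5 tab-separated fields")
    chrom, strand, start, end, info = fields
    if strand not in ("+", "-"):
        raise ValueError("strand must be + or -")
    return Annotation(int(chrom), strand, int(start), int(end), info)


if __name__ == "__main__":
    line = ("1\t-\t739121\t739137\t"
            "ENSG00000269831_ENST00000599533_AL669831.1_Uncharacterized protein")
    print(parse_annotation_line(line))
    print(to_chromosome_number("MT"))
```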

OUTPUT
------
The output has one line per hit. Each line has the following format:
*Chromosome name
*Strand
*Coordinate start
*Coordinate end
*gRNA used for search
*genomic hit with PAM resolved
*number of mismatches between gRNA used and genomic hit
*Annotation info, or - if no entry exists in the annotation files for those coordinates. This column appears once per annotation file given.
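
For downstream processing, a line of the output file could be read with a Python sketch like the one below. Two things are assumptions, since this README does not state them: that the columns are separated by tabs, and that no annotation column is present when no annotation file was given. The name parse_hit_line is hypothetical.

```python
def parse_hit_line(line, n_annotation_files=0):
    """Split one output line into named fields (assumes tab-separated columns)."""
    fields = line.rstrip("\r\n").split("\t")
    expected = 7 + n_annotation_files  # 7 fixed columns plus one per file
    if len(fields) != expected:
        raise ValueError("expected %d columns, got %d" % (expected, len(fields)))
    return {
        "chromosome": fields[0],
        "strand": fields[1],
        "start": int(fields[2]),
        "end": int(fields[3]),
        "grna": fields[4],
        "hit": fields[5],
        "mismatches": int(fields[6]),
        "annotations": fields[7:],  # one entry per annotation file, "-" if none
    }
```

If the actual separator differs, only the split call needs to change.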

NOTES
-----
1. Maximum full path length is 256. You can edit that at the top of Off-Spotter_results.cpp.


D E T A C H   M E M O R Y
--------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------
Removes the tables loaded by Load_Memory from shared memory. Use when you don't need to make more queries.

INPUT
-----
None.

OUTPUT
------
None.

NOTES
-----
None.
