
PatScan Form

Enter a pattern


Select a target database


Scan also the complementary strand (default false)



PatScan was developed by Ross Overbeek and Mark D'Souza and is maintained by the Bioinformatics group at Argonne National Laboratory.

If you would like to cite PatScan please use the following reference:
Dsouza M, Larsen N, Overbeek R. Searching for patterns in genomic data. Trends Genet. 1997 Dec;13(12):497-8.

Protein Sequence patterns

There are many options that can be employed in constructing a pattern to scan for. We suggest you consider the following simple examples: There are a number of other useful types of pattern units, but this should be enough to get you started. For more information see Rules for Forming Patterns.

Note that a complex pattern can often match several overlapping regions of a sequence. Only the first of these is reported: after a successful match, the matching algorithm resumes at the first character past the matched substring.
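The effect of resuming past each match can be illustrated with a minimal Python sketch. It relies on Python's re.finditer, which also continues scanning just past each match; this is an analogy only, not PatScan's own code, and the function name, pattern and sequence are made up for illustration.

```python
import re

def non_overlapping_hits(regex, sequence):
    """Return (start_offset, matched_text) for each non-overlapping hit.

    Offsets are 0-based Python string offsets, not PatScan positions.
    """
    return [(m.start(), m.group()) for m in re.finditer(regex, sequence)]

# "GPGPGP" contains "GPGP" at offsets 0 and 2, but the two candidates
# overlap, so only the first one is reported.
print(non_overlapping_hits("GPGP", "GPGPGP"))

# With a longer sequence, the second hit starts after the first one ends.
print(non_overlapping_hits("GPGP", "GPGPGPGP"))
```

The overlapping candidate at offset 2 is never reported, because the scan has already moved to offset 4.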

Results should look something like:
all
any(TS) 1...1 GP 1...1 G 4...4 any(LFIVM) G
sp|P02461|CA13_HUMAN:[1120,1131]:  S P GP A G QQGA I G
sp|P02461|CA13_HUMAN:[1132,1143]:  S P GP A G PRGP V G
sp|P02463|CA14_MOUSE:[1095,1106]:  S P GP R G SPGN I G
sp|P02465|CA21_BOVIN:[158,169]  :  S V GP V G PAGP I G
sp|P04258|CA13_BOVIN:[412,423]  :  S P GP R G QPGV M G
sp|P04258|CA13_BOVIN:[964,975]  :  S P GP A G HQGA V G
sp|P04258|CA13_BOVIN:[976,987]  :  S P GP A G PRGP V G
sp|P08125|CA1A_CHICK:[80,91]    :  S P GP Q G PPGP L G
sp|P13941|CA13_RAT:[304,315]    :  S P GP A G PRGP V G
.
.
.
sp|Q02388|CA17_HUMAN:[2720,2731]:  S A GP P G PPGS V G
sp|P05997|CA25_HUMAN:[769,780]  :  T P GP K G DRGG I G
sp|P22138|RPA2_YEAST:[53,64]    :  T E GP D G GLLN L G
sp|P42382|CH60_EHRCH:[29,40]    :  T A GP K G LTVA I G
sp|Q01149|CA21_MOUSE:[749,760]  :  T K GP K G ENGI V G
COMPLETED REQUEST
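
For readers more familiar with regular expressions, the pattern in the sample output above can be approximated by hand in Python. This is a rough sketch, not how PatScan itself works: the regular expression is a hand translation, and the names PATTERN and split_hit are hypothetical.

```python
import re

# Hand translation (an assumption, not PatScan output) of
#   any(TS) 1...1 GP 1...1 G 4...4 any(LFIVM) G
# with one capture group per pattern unit.
PATTERN = re.compile(r"([TS])(.)(GP)(.)(G)(.{4})([LFIVM])(G)")

def split_hit(sequence):
    """Return the eight matched segments of the first hit, or None."""
    m = PATTERN.search(sequence)
    return list(m.groups()) if m else None

# Matched sequence taken from the first hit in the sample output.
print(split_hit("SPGPAGQQGAIG"))
```

Each capture group corresponds to one pattern unit, which is why the matched sequence in the sample output appears as eight space-separated segments.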

So, try out some patterns and see what you get. Note that we limit the maximum number of reported hits; you can override the maximum, but we suggest that you only do so once you know that you really want to see a truly large number of matches.

Nucleotide Sequence patterns

There are many options that can be employed in constructing a pattern to scan for. We suggest you consider the following simple examples:
Thus, this pattern would match
cgtaaccaa ggttaacc ttggttacg 
Now for a short aside: PatScan searches only one strand unless you also ask it to search the complementary strand. With a pattern of the sort we just used, there is no need to search the opposite strand. Normally, however, you will want to search both the sequence and the opposite strand (i.e., the reverse complement of the sequence), so you should usually select this option when you scan nucleotide sequences.
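
To see why the opposite strand adds nothing for the match shown above, it helps to compute the reverse complement directly. The following Python sketch is only an illustration; the helper name reverse_complement is hypothetical and it handles the four bases a, c, g and t only.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")

def reverse_complement(seq):
    """Complement every base, then reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]

# The three segments of the match shown above.
left, loop, right = "cgtaaccaa", "ggttaacc", "ttggttacg"

# The right arm is the reverse complement of the left arm.
print(reverse_complement(left) == right)

# The whole match reads the same on the opposite strand, so scanning
# that strand would find the same site again.
whole = left + loop + right
print(reverse_complement(whole) == whole)
```

Most patterns do not have this symmetry, which is why searching both strands is usually the right choice.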

Let us stop now and ask "What additional features would one need to really find the kinds of loop structures that characterize tRNAs, rRNAs, and so forth?" Two immediately come to mind: There are a number of other useful types of pattern units, but this should be enough to get you started. For more information see Rules for Forming Patterns.

Note that a complex pattern can often match several overlapping regions of a sequence. Only the first of these is reported: after a successful match, the matching algorithm resumes at the first character past the matched substring.

Note that searches are queued, so results may take some time: often just a few minutes, but sometimes a few hours.
Results should look something like:
embl|M27249:[142,162]      :  aaaaaaga  aatca    tctttttt 
embl|M35517:[343,363]      :  aaaaaaga  aatca    tctttttt 
embl|V00101:[241,261]      :  aaaaaaga  aatca    tctttttt 
embl|X07796:[343,363]      :  aaaaaaga  aatca    tctttttt 
embl|X56679:[1562,1587]    :  aaaaaagac ccttaggg gtctttttt
embl|M24537:[3334,3359]    :  aaaaaagcc cactagag ggctttttt
embl|M98822:[15,40]        :  aaaaaagcc cactagag ggctttttt
embl|D29985:[5641,5666]    :  aaaaaagcg cccttggg cgctttttt
embl|X73124:[83254,83277]  :  aaaaaccc  tttttaaa gggttttt 
embl|M12501:[611,636]      :  aaaaagact tggaaaca agtcttttt
.
.
.
embl|M77837:[623,644]      :  tttttaaa  ggtaca   tttaaaaa 
embl|M16192:[1522,1542]    :  tttttata  ataat    tataaaaa 
embl|L08822:[930,953]      :  tttttctg  tgctgaaa cagaaaaa 
embl|L25604:[3323,3347]    :  ttttttgaa ataaaac  ttcaaaaaa
embl|M97391:[3519,3543]    :  ttttttgaa gttttgt  ttcaaaaaa
COMPLETED REQUEST

Each line gives an EMBL accession number, followed by the positions in the EMBL entry that were matched by the pattern. The returned hits are sorted on the first field matched by the pattern.
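
If you want to post-process such output, a short Python sketch along the following lines could parse the hit lines and reproduce the ordering. The line layout is inferred from the sample output above, and the names HIT_LINE, parse_hit and sort_hits, as well as the dictionary keys, are hypothetical.

```python
import re

# Layout inferred from the sample output: db|accession:[start,end] : segments
HIT_LINE = re.compile(r"^(\w+)\|(\w+):\[(\d+),(\d+)\]\s*:\s*(.*?)\s*$")

def parse_hit(line):
    """Parse one nucleotide hit line; return None for any other line."""
    m = HIT_LINE.match(line)
    if m is None:
        return None
    database, accession, start, end, matched = m.groups()
    return {
        "database": database,
        "accession": accession,
        "start": int(start),
        "end": int(end),
        "segments": matched.split(),
    }

def sort_hits(lines):
    """Parse all hit lines and order them by the first matched segment."""
    hits = [h for h in map(parse_hit, lines) if h is not None]
    return sorted(hits, key=lambda h: h["segments"][0])

sample = [
    "embl|M77837:[623,644]      :  tttttaaa  ggtaca   tttaaaaa ",
    "embl|X73124:[83254,83277]  :  aaaaaccc  tttttaaa gggttttt ",
    "embl|X07796:[343,363]      :  aaaaaaga  aatca    tctttttt ",
    "COMPLETED REQUEST",
]
for hit in sort_hits(sample):
    print(hit["accession"], hit["start"], hit["end"], hit["segments"])
```

Lines that are not hits, such as the closing COMPLETED REQUEST line, are skipped rather than treated as errors.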

So, try out some patterns and see what you get. Note that we limit the maximum number of reported hits; you can override the maximum, but we suggest that you only do so once you know that you really want to see a truly large number of matches.

PatScan Pattern - Basic Rules

Rules for Patterns

A pattern unit can be given a name by writing it in the form name=X,
where "name" is one of {p1,p2,p3,...} and X is a basic pattern
unit.  When a named simple pattern unit successfully matches a
section of a sequence, that section can be later referred to in
constructs such as
p1
p2[0,1,0]
~p3
and so forth (see below). The "name" saves the value of the matched substring.
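
The idea of saving a matched substring under a name and referring to it later can be sketched in Python for one simple case: a stem of fixed length named p1, a loop of variable length, and then ~p1, the reverse complement of whatever p1 matched. This is an illustrative simplification, not PatScan's algorithm; the function find_hairpins and its parameters are hypothetical.

```python
_COMPLEMENT = str.maketrans("acgt", "tgca")

def find_hairpins(seq, stem_len, loop_min, loop_max):
    """Scan seq for: p1 (stem_len bases), a loop of loop_min to loop_max
    bases, then ~p1. Returns (p1, loop, ~p1) for each non-overlapping hit."""
    hits = []
    i = 0
    while i + 2 * stem_len + loop_min <= len(seq):
        p1 = seq[i:i + stem_len]                  # value saved under the name p1
        wanted = p1.translate(_COMPLEMENT)[::-1]  # what ~p1 has to match
        hit_end = None
        for loop_len in range(loop_min, loop_max + 1):
            j = i + stem_len + loop_len
            if seq[j:j + stem_len] == wanted:
                hits.append((p1, seq[i + stem_len:j], wanted))
                hit_end = j + stem_len
                break
        # After a hit, resume just past it; otherwise move on one base.
        i = hit_end if hit_end is not None else i + 1
    return hits

# Segments taken from a hit in the sample nucleotide output in this document.
print(find_hairpins("cagcagggctg", 4, 3, 3))
```

The saved value of p1 is needed because ~p1 cannot be written as a fixed string in advance; it depends on what p1 matched at each position.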

Additional Rules

Interpreting PatScan Results

The results of the search are emailed to the address you provide.

Protein Search Results

Results should look something like:
all
any(TS) 1...1 GP 1...1 G 4...4 any(LFIVM) G
sp|P02461|CA13_HUMAN:[1120,1131]:  S P GP A G QQGA I G
sp|P02461|CA13_HUMAN:[1132,1143]:  S P GP A G PRGP V G
.
.
.
sp|P42382|CH60_EHRCH:[29,40]    :  T A GP K G LTVA I G
sp|Q01149|CA21_MOUSE:[749,760]  :  T K GP K G ENGI V G
COMPLETED REQUEST
The first line indicates that PatScan searched the entire database, and the second line returns the pattern you input. Subsequent lines list the matches to this pattern found in the database. For example:
sp|P02461|CA13_HUMAN:[1120,1131]:  S P GP A G QQGA I G
can be broken up into 5 fields.
sp  |  P02461  |  CA13_HUMAN  :  [1120,1131]  :  S P GP A G QQGA I G

sp                     Swiss-Prot database searched
P02461                 Swiss-Prot accession number
CA13_HUMAN             Swiss-Prot ID
[1120,1131]            position of matched sequence
S P GP A G QQGA I G    matched sequence
The output is in alphabetical order based on the first character of the first pattern unit (in this case S first, then T). The matched sequence is broken into 8 segments to match the 8 pattern units of the input pattern. In the above example this breaks up as:
any(TS)  1...1  GP  1...1  G  4...4  any(LFIVM)  G
S        P      GP  A      G  QQGA   I           G
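
The field breakdown and the pairing of pattern units with segments can also be done programmatically. The Python sketch below assumes the line layout shown in the sample output and assumes that pattern units are separated by whitespace, which holds for this pattern but is not guaranteed for every PatScan pattern. The function names are hypothetical.

```python
def parse_protein_hit(line):
    """Split a Swiss-Prot hit line into its five fields."""
    identifier, position, matched = line.split(":", 2)
    database, accession, entry_id = identifier.split("|")
    start, end = (int(n) for n in position.strip().strip("[]").split(","))
    return database, accession, entry_id, (start, end), matched.split()

def pair_units(pattern, segments):
    """Pair each pattern unit with the segment it matched."""
    units = pattern.split()
    if len(units) != len(segments):
        raise ValueError("pattern units and segments differ in number")
    return list(zip(units, segments))

line = "sp|P02461|CA13_HUMAN:[1120,1131]:  S P GP A G QQGA I G"
database, accession, entry_id, (start, end), segments = parse_protein_hit(line)
print(database, accession, entry_id, start, end)

for unit, segment in pair_units("any(TS) 1...1 GP 1...1 G 4...4 any(LFIVM) G", segments):
    print(unit, "->", segment)
```

In the sample line the segments add up to 12 residues, the same as the number of positions from 1120 to 1131 inclusive.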

Nucleotide Search Results

Results should look something like:
fungi
p1=2...2 3...4  p2=4...5  3...3 ~p2 2...3 ~p1
embl|A02534:[1009,1028]  :  aa acc  cagc  agg gctg  gg  tt
embl|A06260:[1501,1521]  :  aa ata  taag  gaa ctta  tga tt
.
.
.
embl|Z50840:[2998,3019]  :  tt agct ctct  aga agag  tgt aa
embl|Z67741:[458,478]    :  tt gcg  gcgc  ctt gcgc  cag aa
COMPLETED REQUEST
The first line indicates that PatScan searched the fungi database, and the second line returns the pattern you input. Subsequent lines list the matches to this pattern found in the database. For example:
embl|A02534:[1009,1028]  :  aa acc  cagc  agg gctg  gg  tt
can be broken up into 4 fields.
embl  |  A02534  :  [1009,1028]  :  aa acc  cagc  agg gctg  gg  tt

embl                              EMBL database searched
A02534                            EMBL accession number
[1009,1028]                       position of matched sequence
aa acc  cagc  agg gctg  gg  tt    matched sequence
The output is in alphabetical order based on the first character of the first pattern unit (in this case aa first, with tt last). The matched sequence is broken into 7 segments to match the 7 pattern units of the input pattern. In the above example this breaks up as:
p1=2...2  3...4  p2=4...5  3...3  ~p2   2...3  ~p1
aa        acc    cagc      agg    gctg  gg     tt
You can access the EMBL database to get complete information about the nucleotide sequence in which the matched sequence occurs, using: You may want to bookmark this link if you use it often.