CAMBIA's Sequence Project
Now you can search the protein and DNA sequences listed in patents and patent applications, including specifically the sequences recited in claims, using bioinformatics tools such as NCBI's BLAST.
Because not all of the sequences in these datasets have been submitted to Genbank with full annotation data, we are unable to offer NCBI's client-server support for every type of BLAST query (such as filtering by organism name). We welcome collaboration to improve what we can offer here through the Patent Lens.
Sequence listings have been available from the USPTO patent search output page for any given patent application, but you first had to have a reason to look at that particular application among thousands of others. (Also, for any patent or patent application whose biological sequence listing was longer than 300 pages, the listing could only be downloaded from a separate webpage!) CAMBIA's work has brought these listings together in one readily searchable database and identified which sequences are actually mentioned in the claims of US patent applications.
For those who would like to analyze larger data sets rather than search single sequences, this project provides FASTA-format files of biological sequences extracted from USPTO patent grants and applications. Any derived data product produced using the original data set should properly cite the data, in any publication or in its metadata, in the following form:
We have supplied Genbank with information on the sequences in patent applications, including those in claims; Genbank is also working with the USPTO. Genbank sequence records can now link to the relevant patent application documents in both the USPTO database and the Patent Lens, as already happens for granted US patents.
Linking to the patent documents on the Patent Lens has further advantages. Besides offering downloadable PDFs, which the USPTO does not supply, Patent Lens entries are also linked to status information on related patents and applications in other countries that report it. Since many of these countries do not provide searchable databases to the general public, this may be the only notice the public has of pending gene sequence applications there. We welcome collaboration to develop this dataset further, for example by extending it to sequence listings submitted with patent applications in other jurisdictions.