Cambia's Sequence Project
Introduction
This project provides FASTA format files of biological sequences extracted from USPTO patent grants and applications.
To allow searching over only the claimed sequences, the subset of sequences referenced in the claims is also provided.
You can search these data files using bioinformatics tools such as NCBI's BLAST.
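Before loading the files into BLAST, it can help to inspect them directly. The sketch below is a minimal FASTA reader in plain Python. It assumes only the standard FASTA layout: a `>` header line followed by one or more sequence lines. The function name `read_fasta` is hypothetical, and no particular header format is assumed for the project's files.

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA stream.

    Hypothetical helper: header text after '>' is returned verbatim,
    and multi-line sequences are joined into a single string.
    """
    header = None
    chunks: list[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # emit the final record


if __name__ == "__main__":
    # Illustrative in-memory data, not taken from the project's files.
    sample = ">seq1 example\nACGT\nACGT\n>seq2\nMKV\n"
    for name, seq in read_fasta(io.StringIO(sample)):
        print(name, len(seq))
```

To read a downloaded file instead, pass an open text handle, e.g. `read_fasta(open(path))`, where `path` is whatever local filename you saved the data under.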
The software used to produce this data is made available in the hope that:
- it discloses the methods used, along with their strengths and weaknesses, and may stimulate suggestions for improvements;
- it may be extended to handle data formats used by other patent offices;
- it provides a parser for WIPO's ST.25 format which could be used to improve patent data quality by validation prior to publication.
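Pre-publication validation could include a simple consistency check on ST.25 sequence listings. In ST.25, each entry begins with `<210>` (SEQ ID NO), followed by `<211>` (length), `<212>` (type, e.g. DNA, RNA, PRT), `<213>` (organism) and `<400>` (the sequence). The sketch below is a hypothetical illustration, not the project's parser. It handles nucleotide entries only, because protein residues in ST.25 use three-letter codes laid out differently. It flags entries whose declared `<211>` length does not match the number of residues actually listed.

```python
import re
from dataclasses import dataclass

# Numeric identifier lines look like "<211>  24".
TAG_RE = re.compile(r"^<(\d{3})>\s*(.*)$")


@dataclass
class St25Entry:
    seq_id: int
    length: int      # declared length from <211>
    mol_type: str    # <212>, e.g. "DNA"
    organism: str    # <213>
    residues: str    # residues collected after <400>


def parse_st25_nucleotides(text: str) -> list[St25Entry]:
    """Hypothetical minimal ST.25 reader for DNA/RNA entries."""
    entries: list[St25Entry] = []
    current: dict[str, str] = {}
    residues: list[str] = []
    in_sequence = False

    def flush() -> None:
        if "210" in current:  # skip header fields such as <110>, <160>
            entries.append(St25Entry(
                seq_id=int(current["210"]),
                length=int(current.get("211", "0")),
                mol_type=current.get("212", ""),
                organism=current.get("213", ""),
                residues="".join(residues),
            ))

    for line in text.splitlines():
        match = TAG_RE.match(line.strip())
        if match:
            tag, value = match.groups()
            if tag == "210":  # a new SEQ ID NO starts a new entry
                flush()
                current, residues = {}, []
            in_sequence = tag == "400"
            if not in_sequence:
                current[tag] = value.strip()
            continue
        if in_sequence:
            # Keep only lowercase residue letters; this drops the spaces
            # between 10-residue blocks and the trailing position numbers.
            residues.append(re.sub(r"[^a-z]", "", line.lower()))
    flush()
    return entries


def length_mismatches(entries: list[St25Entry]) -> list[int]:
    """Return SEQ ID NOs whose declared length differs from the residues."""
    return [e.seq_id for e in entries
            if e.mol_type in ("DNA", "RNA") and len(e.residues) != e.length]
```

A real validator would need to cover far more of the standard, such as protein entries, feature keys and the allowed residue symbols. This check shows only the general shape of a rule that could run before publication.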
Products
The project deliverables are: