The exports module

CIViCpy supports exporting CIViC records to Variant Call Format (VCF) files. This enables downstream analyses such as integration with IGV, VEP, and other common bioinformatics tools. VCF exports are provided by the exports module:

>>> from civicpy import exports

Other file formats are planned for future releases. Suggestions are welcome on our GitHub issues page.

VCF

VCFs are written using the VCFWriter class, to which you add civic.Assertion, civic.Variant, civic.Gene, or civic.Evidence records using the addrecord or addrecords methods. The VCF output has one line per variant. Passing civic.Assertion, civic.Gene, or civic.Evidence objects will expand the record to all variants linked to those objects.

The addrecord function supports various variant types, depending on the curated coordinates available for a variant. All variants require the chromosome name and start position. For SNVs and complex variants the reference sequence and variant sequence information also need to be available. By contrast, insertions require only variant sequence information and deletions require only reference sequence information. Variants that do not meet these minimum requirements will not be added and a warning message is emitted instead. Fusions and other variants with a second set of coordinates are currently not supported.
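The minimum requirements above can be summarized as a small decision function. This is only an illustrative sketch of the rules as described in this section, not the library's actual validation code; the function name `classify_for_vcf`, its parameters, and the returned labels are all hypothetical.

```python
from typing import Optional


def classify_for_vcf(chromosome: Optional[str],
                     start: Optional[int],
                     reference_bases: Optional[str],
                     variant_bases: Optional[str],
                     has_second_coordinates: bool = False) -> Optional[str]:
    """Return a label for an exportable variant, or None if it cannot be exported.

    Hypothetical helper that mirrors the rules in the text; it is not part of CIViCpy.
    """
    # Fusions and other variants with a second set of coordinates are not supported.
    if has_second_coordinates:
        return None
    # Every variant needs a chromosome name and a start position.
    if not chromosome or start is None:
        return None
    # Both alleles known: SNV or complex variant.
    if reference_bases and variant_bases:
        return "snv_or_complex"
    # Only the variant sequence known: insertion.
    if variant_bases:
        return "insertion"
    # Only the reference sequence known: deletion.
    if reference_bases:
        return "deletion"
    # Neither allele is known, so the minimum requirements are not met.
    return None
```

In this sketch a return value of None corresponds to the case where the record would be skipped and a warning emitted.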

In order to verify whether a variant can be added to a VCFWriter object, the convenience method is_valid_for_vcf can be called on a civic.Variant object before calling addrecord. Those variants that are unable to be exported into the VCF format are still retrievable as CIViCpy records. Once all desired variants are added to the VCFWriter object, writerecords needs to be called to write the VCF file.
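The check-then-add pattern described above might look like the following sketch. The helper name `add_exportable` is hypothetical, and the code assumes that is_valid_for_vcf can be called without arguments and returns a truthy value for exportable variants.

```python
def add_exportable(writer, variants):
    """Add only VCF-compatible variants to the writer and return the rest.

    Hypothetical helper: `writer` is expected to behave like a VCFWriter
    (it needs an addrecord method) and each variant like a civic.Variant
    (it needs an is_valid_for_vcf method).
    """
    skipped = []
    for variant in variants:
        if variant.is_valid_for_vcf():
            writer.addrecord(variant)
        else:
            # The variant stays available as a CIViCpy record; it is just not exported.
            skipped.append(variant)
    return skipped
```

With a VCFWriter `w` and a list of civic.Variant objects, calling `skipped = add_exportable(w, variants)` followed by `w.writerecords()` would write the exportable variants and leave the others in `skipped` for separate handling.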

The variants added to the VCFWriter object are written to the VCF file, one VCF record for each civic.Variant object. If two variants share the same chromosome, start position, and reference allele(s), they will not be combined into one VCF record but will instead be written as separate VCF records. Additional CIViC data are added to the VCF as annotations in the CSQ (consequence) INFO field. CIViC evidence items and assertions linked to the variant are added to the CSQ field, with one CSQ entry for each evidence item or assertion. Whether a specific CSQ entry reflects an evidence item or an assertion is indicated by the CIViC Entity Type CSQ field. To distinguish special characters in the field values from field delimiters, spaces are replaced with underscores and other special characters are hex-encoded. Because the annotations use the CSQ field, the resulting VCF can be imported into Google BigQuery (git.io/bigquery-variant-annotation).
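To illustrate the escaping idea, here is a minimal sketch of how a field value could be made safe for a CSQ entry. The function name `encode_csq_value`, the exact set of characters treated as special, and the `%XX` form of the hex encoding are assumptions made for this example; the exporter's actual character set and encoding format may differ.

```python
import re

# Assumed set of characters that could be confused with VCF or CSQ delimiters.
# The real exporter may escape a different set.
_SPECIAL_CHARACTERS = re.compile(r"[|,;=&%\t\n]")


def encode_csq_value(value: str) -> str:
    """Replace spaces with underscores and hex-encode assumed special characters."""
    value = value.replace(" ", "_")
    # Each special character becomes '%' followed by its two-digit uppercase hex code.
    return _SPECIAL_CHARACTERS.sub(
        lambda match: "%{:02X}".format(ord(match.group(0))), value
    )
```

Under these assumptions, "Rare Germline" becomes "Rare_Germline" and a literal pipe inside a value becomes "%7C", so it can no longer be mistaken for a field separator.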

VCF CSQ Field Attributes

| CSQ Field | Description | Compound Field [*] |
|---|---|---|
| Allele | Alternate allele | No |
| Consequence | CIViC sequence ontology variant types for this variant | Yes |
| SYMBOL | HGNC gene symbol for the gene associated with this variant | No |
| Entrez Gene ID | Entrez gene identifier for the gene associated with this variant | No |
| Feature_type | “transcript” | No |
| Feature | The Ensembl identifier for the CIViC representative transcripts of this variant | No |
| HGVSc | Variant representation using HGVS notation (DNA level), corresponding to the Feature | No |
| HGVSp | Variant representation using HGVS notation (Protein level), corresponding to the Feature | No |
| CIViC Variant Name | The CIViC variant name of this variant | No |
| CIViC Variant ID | The CIViC internal identifier for this variant | No |
| CIViC Variant Aliases | CIViC aliases for this variant | Yes |
| CIViC HGVS | CIViC HGVS strings for this variant | Yes |
| Allele Registry ID | The allele registry identifier for this variant | No |
| ClinVar IDs | ClinVar IDs associated with this variant | Yes |
| CIViC Variant Evidence Score | The CIViC evidence score for this variant | No |
| CIViC Entity Type | The type of entity being annotated, either “evidence” or “assertion” | No |
| CIViC Entity ID | The CIViC internal identifier for the entity being annotated | No |
| CIViC Entity URL | The CIViC direct URL to the entity being annotated | No |
| CIViC Entity Source | For evidence entities, the identifier of the publication used to create the evidence including the source type in the format “sourceId_(sourceType)” | No |
| CIViC Entity Variant Origin | The variant origin of the entity being annotated, either “Somatic”, “Rare Germline”, “Common Germline”, “Unknown”, or “N/A” | No |
| CIViC Entity Status | The status of the CIViC entity being annotated, either “submitted”, “accepted”, or “rejected” | No |
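When reading an exported VCF back in, the CSQ value has to be split into its entries and fields. The sketch below assumes the common VEP-style layout, with entries separated by commas and fields separated by pipes, and it takes the field order from the caller (for example, from the CSQ description in the VCF header) rather than hard-coding it. The function name `parse_csq` is hypothetical, and the sample value in the usage note is made up for illustration.

```python
from typing import Dict, List


def parse_csq(csq_value: str, field_names: List[str]) -> List[Dict[str, str]]:
    """Split a CSQ INFO value into one dict per entry.

    Assumes comma-separated entries and pipe-separated fields; the caller
    supplies the field names in the order used by the file being read.
    """
    entries = []
    for entry in csq_value.split(","):
        values = entry.split("|")
        # Pair each value with its field name; extra values or names are dropped by zip.
        entries.append(dict(zip(field_names, values)))
    return entries
```

For example, with the shortened, made-up field list `["Allele", "SYMBOL", "CIViC Entity Type", "CIViC Entity ID"]` and the value `"T|BRAF|evidence|1,T|BRAF|assertion|2"`, the assertion entries could be selected with `[e for e in parse_csq(value, fields) if e["CIViC Entity Type"] == "assertion"]`.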

VCFWriter

class exports.VCFWriter(f, version=4.2)
Parameters:

f (filehandle) – A filehandle for the VCF output file

addrecord(civic_record)

Takes either a civic.Evidence, civic.Assertion, civic.Variant, or civic.Gene object and adds all civic.Variant objects associated with it to the VCFWriter object for processing and writing to the VCF.

Parameters:

civic_record (civic.CivicRecord) – Either a civic.Evidence, civic.Assertion, civic.Variant, or civic.Gene object

addrecords(civic_records)

Takes multiple civic.Evidence, civic.Assertion, civic.Variant, and/or civic.Gene objects and adds all civic.Variant objects associated with them to the VCFWriter object for processing and writing to the VCF. civic_records can contain a mix of these object types.

Parameters:

civic_records (list) – A list of civic.Evidence, civic.Assertion, civic.Variant, and/or civic.Gene objects

writeheader()

Writes the header lines to the VCF file.

writerecords(with_header=True)

Takes all variant objects saved to the VCFWriter object, processes them, and writes them to the VCF file.

Parameters:

with_header (bool) – Indicates whether or not the VCF header lines should be written as part of this function call.

Example

Here’s an example of how to export all variants from CIViC to VCF:

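from civicpy import civic, exports

with open('civic_variants.vcf', 'w', newline='') as file:
    w = exports.VCFWriter(file)
    # Fetch every variant in CIViC and queue it for export
    all_variants = civic.get_all_variants()
    w.addrecords(all_variants)
    # Write the header and all queued records to the file
    w.writerecords()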