Python Client Library

This is the GA4GH client API library, a convenient wrapper for the low-level HTTP GA4GH API that abstracts away network-centric details such as paging. The methods and types used by the client library are defined by the GA4GH schema.

Warning

This client API should be considered early alpha quality, and may change in arbitrary ways.

Todo

Add a full description of this API, links to a tutorial on how to use it, and a quickstart showing basic usage.

Types

Todo

Add links to the upstream documentation for the GA4GH types.

Client API

Todo

Add overview documentation for the client API.

Client classes for the GA4GH reference implementation.

class ga4gh.client.client.AbstractClient(log_level=0)

The abstract superclass of GA4GH Client objects.

get_biosample(biosample_id)

Perform a get request for the given Biosample.

Parameters:biosample_id (str) – The ID of the Biosample
Returns:The requested Biosample.
Return type:ga4gh.protocol.Biosample
get_call_set(call_set_id)

Returns the CallSet with the specified ID from the server.

Parameters:call_set_id (str) – The ID of the CallSet of interest.
Returns:The CallSet of interest.
Return type:ga4gh.protocol.CallSet
get_dataset(dataset_id)

Returns the Dataset with the specified ID from the server.

Parameters:dataset_id (str) – The ID of the Dataset of interest.
Returns:The Dataset of interest.
Return type:ga4gh.protocol.Dataset
get_expression_level(expression_level_id)

Returns the ExpressionLevel with the specified ID from the server.

Parameters:expression_level_id (str) – The ID of the ExpressionLevel of interest.
Returns:The ExpressionLevel of interest.
Return type:ga4gh.protocol.ExpressionLevel
get_feature(feature_id)

Returns the feature with the specified ID from the server.

Parameters:feature_id (str) – The ID of the requested feature
Returns:The requested ga4gh.protocol.Feature object.
get_feature_set(feature_set_id)

Returns the FeatureSet with the specified ID from the server.

Parameters:feature_set_id (str) – The ID of the FeatureSet of interest.
Returns:The FeatureSet of interest.
Return type:ga4gh.protocol.FeatureSet
get_individual(individual_id)

Perform a get request for the given Individual.

Parameters:individual_id (str) – The ID of the Individual
Returns:The requested Individual.
Return type:ga4gh.protocol.Individual
get_page_size()

Returns the suggested maximum size of pages of results returned by the server.

get_protocol_bytes_received()

Returns the total number of protocol bytes received from the server by this client.

Returns:The number of bytes consumed by protocol traffic read from the server during the lifetime of this client.
Return type:int
get_read_group(read_group_id)

Returns the ReadGroup with the specified ID from the server.

Parameters:read_group_id (str) – The ID of the ReadGroup of interest.
Returns:The ReadGroup of interest.
Return type:ga4gh.protocol.ReadGroup
get_read_group_set(read_group_set_id)

Returns the ReadGroupSet with the specified ID from the server.

Parameters:read_group_set_id (str) – The ID of the ReadGroupSet of interest.
Returns:The ReadGroupSet of interest.
Return type:ga4gh.protocol.ReadGroupSet
get_reference(reference_id)

Returns the Reference with the specified ID from the server.

Parameters:reference_id (str) – The ID of the Reference of interest.
Returns:The Reference of interest.
Return type:ga4gh.protocol.Reference
get_reference_set(reference_set_id)

Returns the ReferenceSet with the specified ID from the server.

Parameters:reference_set_id (str) – The ID of the ReferenceSet of interest.
Returns:The ReferenceSet of interest.
Return type:ga4gh.protocol.ReferenceSet
get_rna_quantification(rna_quantification_id)

Returns the RnaQuantification with the specified ID from the server.

Parameters:rna_quantification_id (str) – The ID of the RnaQuantification of interest.
Returns:The RnaQuantification of interest.
Return type:ga4gh.protocol.RnaQuantification
get_rna_quantification_set(rna_quantification_set_id)

Returns the RnaQuantificationSet with the specified ID from the server.

Parameters:rna_quantification_set_id (str) – The ID of the RnaQuantificationSet of interest.
Returns:The RnaQuantificationSet of interest.
Return type:ga4gh.protocol.RnaQuantificationSet
get_variant(variant_id)

Returns the Variant with the specified ID from the server.

Parameters:variant_id (str) – The ID of the Variant of interest.
Returns:The Variant of interest.
Return type:ga4gh.protocol.Variant
get_variant_annotation_set(variant_annotation_set_id)

Returns the VariantAnnotationSet with the specified ID from the server.

Parameters:variant_annotation_set_id (str) – The ID of the VariantAnnotationSet of interest.
Returns:The VariantAnnotationSet of interest.
Return type:ga4gh.protocol.VariantAnnotationSet
get_variant_set(variant_set_id)

Returns the VariantSet with the specified ID from the server.

Parameters:variant_set_id (str) – The ID of the VariantSet of interest.
Returns:The VariantSet of interest.
Return type:ga4gh.protocol.VariantSet
list_reference_bases(id_, start=0, end=None)

Returns an iterator over the bases from the server in the form of consecutive strings. This command does not conform to the patterns of the other search and get requests, and is implemented differently.
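Because the bases arrive as consecutive strings, a caller who wants the whole sequence has to concatenate the chunks. The following is a minimal sketch of that step. It does not call the library: fake_list_reference_bases and its chunk_size parameter are hypothetical stand-ins for illustration, and the sketch assumes that end is exclusive.

```python
def join_bases(chunks):
    """Concatenate an iterator of base strings into a single sequence."""
    return "".join(chunks)


def fake_list_reference_bases(sequence, start=0, end=None, chunk_size=4):
    """Illustrative stand-in that yields consecutive chunks of a sequence.

    This is not the library's implementation; chunk_size is a made-up
    parameter and end is assumed to be exclusive.
    """
    if end is None:
        end = len(sequence)
    for offset in range(start, end, chunk_size):
        yield sequence[offset:min(offset + chunk_size, end)]


chunks = fake_list_reference_bases("ACGTACGTAC", start=2, end=9)
bases = join_bases(chunks)
print(bases)
```

With a real client, the iterator returned by list_reference_bases would presumably be passed to join_bases in the same way.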

search_biosamples(dataset_id, name=None, individual_id=None)

Returns an iterator over the Biosamples fulfilling the specified conditions.

Parameters:
  • dataset_id (str) – The dataset to search within.
  • name (str) – Only Biosamples matching the specified name will be returned.
  • individual_id (str) – Only Biosamples matching this individual ID will be returned.
Returns:

An iterator over the ga4gh.protocol.Biosample objects defined by the query parameters.

search_call_sets(variant_set_id, name=None, biosample_id=None)

Returns an iterator over the CallSets fulfilling the specified conditions from the specified VariantSet.

Parameters:
  • variant_set_id (str) – Find CallSets belonging to the provided VariantSet.
  • name (str) – Only CallSets matching the specified name will be returned.
  • biosample_id (str) – Only CallSets matching this id will be returned.
Returns:

An iterator over the ga4gh.protocol.CallSet objects defined by the query parameters.

search_datasets()

Returns an iterator over the Datasets on the server.

Returns:An iterator over the ga4gh.protocol.Dataset objects on the server.
search_expression_levels(rna_quantification_id=u'', feature_ids=[], threshold=0.0)

Returns an iterator over the ExpressionLevel objects from the server.

Parameters:
  • feature_ids (list) – The IDs of the ga4gh.protocol.Feature objects of interest.
  • rna_quantification_id (str) – The ID of the ga4gh.protocol.RnaQuantification of interest.
  • threshold (float) – Minimum expression of responses to return.
search_feature_sets(dataset_id)

Returns an iterator over the FeatureSets fulfilling the specified conditions from the specified Dataset.

Parameters:dataset_id (str) – The ID of the ga4gh.protocol.Dataset of interest.
Returns:An iterator over the ga4gh.protocol.FeatureSet objects defined by the query parameters.
search_features(feature_set_id=None, parent_id=u'', reference_name=u'', start=0, end=0, feature_types=[], name=u'', gene_symbol=u'')

Returns the result of running a search_features method on a request with the passed-in parameters.

Parameters:
  • feature_set_id (str) – ID of the FeatureSet being searched
  • parent_id (str) – ID (optional) of the parent feature
  • reference_name (str) – name of the reference to search (ex: “chr1”)
  • start (int) – search start position on reference
  • end (int) – end position on reference
  • feature_types – array of terms to limit search by (ex: “gene”)
  • name (str) – only return features with this name
  • gene_symbol (str) – only return features on this gene
Returns:

An iterator over Features as returned in the SearchFeaturesResponse object.

search_genotype_phenotype(phenotype_association_set_id=None, feature_ids=None, phenotype_ids=None, evidence=None)

Returns an iterator over the GenotypePhenotype associations from the server.

search_individuals(dataset_id, name=None)

Returns an iterator over the Individuals fulfilling the specified conditions.

Parameters:
  • dataset_id (str) – The dataset to search within.
  • name (str) – Only Individuals matching the specified name will be returned.
Returns:

An iterator over the ga4gh.protocol.Individual objects defined by the query parameters.

search_phenotype(phenotype_association_set_id=None, phenotype_id=None, description=None, type_=None, age_of_onset=None)

Returns an iterator over the Phenotypes from the server.

search_phenotype_association_sets(dataset_id)

Returns an iterator over the PhenotypeAssociationSets on the server.

search_read_group_sets(dataset_id, name=None, biosample_id=None)

Returns an iterator over the ReadGroupSets fulfilling the specified conditions from the specified Dataset.

Parameters:
  • name (str) – Only ReadGroupSets matching the specified name will be returned.
  • biosample_id (str) – Only ReadGroups matching the specified biosample will be included in the response.
Returns:

An iterator over the ga4gh.protocol.ReadGroupSet objects defined by the query parameters.

Return type:

iter

search_reads(read_group_ids, reference_id=None, start=None, end=None)

Returns an iterator over the Reads fulfilling the specified conditions from the specified read_group_ids.

Parameters:
  • read_group_ids (str) – The IDs of the ga4gh.protocol.ReadGroup of interest.
  • reference_id (str) – The ID of the ga4gh.protocol.Reference to which the returned reads are mapped.
  • start (int) – The start position (0-based) of this query. If a reference is specified, this defaults to 0. Genomic positions are non-negative integers less than reference length. Requests spanning the join of circular genomes are represented as two requests, one on each side of the join (position 0).
  • end (int) – The end position (0-based, exclusive) of this query. If a reference is specified, this defaults to the reference’s length.
Returns:

An iterator over the ga4gh.protocol.ReadAlignment objects defined by the query parameters.

Return type:

iter
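The rule for circular genomes stated above can be sketched as a small helper that turns one wrapping query into two ordinary ones. This is only an illustration under stated assumptions: split_circular_query is a hypothetical function, not part of the client, and it treats start > end as the signal that a query wraps around position 0.

```python
def split_circular_query(start, end, reference_length):
    """Split a 0-based, half-open query so that no piece crosses position 0.

    Assumption of this sketch: a query with start > end wraps around the
    join of a circular genome. Returns a list of (start, end) tuples.
    """
    if not (0 <= start < reference_length and 0 <= end <= reference_length):
        raise ValueError("positions must lie within the reference")
    if start <= end:
        # Ordinary query: nothing to split.
        return [(start, end)]
    # Wrapping query: one piece up to the end of the reference,
    # one piece starting again at position 0.
    pieces = [(start, reference_length), (0, end)]
    # Drop an empty second piece (for example when end == 0).
    return [(s, e) for s, e in pieces if s < e]


pieces = split_circular_query(90, 5, reference_length=100)
print(pieces)
```

Each resulting (start, end) pair could then be passed as the start and end arguments of a separate search_reads call.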

search_reference_sets(accession=None, md5checksum=None, assembly_id=None)

Returns an iterator over the ReferenceSets fulfilling the specified conditions.

Parameters:
  • accession (str) – If not None, return the reference sets for which the accession matches this string (case-sensitive, exact match).
  • md5checksum (str) – If not None, return the reference sets for which the md5checksum matches this string (case-sensitive, exact match). See ga4gh.protocol.ReferenceSet::md5checksum for details.
  • assembly_id (str) – If not None, return the reference sets for which the assembly_id matches this string (case-sensitive, exact match).
Returns:

An iterator over the ga4gh.protocol.ReferenceSet objects defined by the query parameters.

search_references(reference_set_id, accession=None, md5checksum=None)

Returns an iterator over the References fulfilling the specified conditions from the specified ReferenceSet.

Parameters:
  • reference_set_id (str) – The ReferenceSet to search.
  • accession (str) – If not None, return the references for which the accession matches this string (case-sensitive, exact match).
  • md5checksum (str) – If not None, return the references for which the md5checksum matches this string (case-sensitive, exact match).
Returns:

An iterator over the ga4gh.protocol.Reference objects defined by the query parameters.

search_rna_quantification_sets(dataset_id)

Returns an iterator over the RnaQuantificationSet objects from the server.

search_rna_quantifications(rna_quantification_set_id=u'')

Returns an iterator over the RnaQuantification objects from the server.

Parameters:rna_quantification_set_id (str) – The ID of the ga4gh.protocol.RnaQuantificationSet of interest.
search_variant_annotation_sets(variant_set_id)

Returns an iterator over the VariantAnnotationSets fulfilling the specified conditions from the specified VariantSet.

Parameters:variant_set_id (str) – The ID of the ga4gh.protocol.VariantSet of interest.
Returns:An iterator over the ga4gh.protocol.VariantAnnotationSet objects defined by the query parameters.
search_variant_annotations(variant_annotation_set_id, reference_name=u'', reference_id=u'', start=0, end=0, effects=[])

Returns an iterator over the VariantAnnotations fulfilling the specified conditions from the specified VariantAnnotationSet.

Parameters:
  • variant_annotation_set_id (str) – The ID of the ga4gh.protocol.VariantAnnotationSet of interest.
  • start (int) – Required. The beginning of the window (0-based, inclusive) for which overlapping variants should be returned. Genomic positions are non-negative integers less than reference length. Requests spanning the join of circular genomes are represented as two requests, one on each side of the join (position 0).
  • end (int) – Required. The end of the window (0-based, exclusive) for which overlapping variants should be returned.
  • reference_name (str) – The name of the ga4gh.protocol.Reference we wish to return variants from.
Returns:

An iterator over the ga4gh.protocol.VariantAnnotation objects defined by the query parameters.

Return type:

iter

search_variant_sets(dataset_id)

Returns an iterator over the VariantSets fulfilling the specified conditions from the specified Dataset.

Parameters:dataset_id (str) – The ID of the ga4gh.protocol.Dataset of interest.
Returns:An iterator over the ga4gh.protocol.VariantSet objects defined by the query parameters.
search_variants(variant_set_id, start=None, end=None, reference_name=None, call_set_ids=None)

Returns an iterator over the Variants fulfilling the specified conditions from the specified VariantSet.

Parameters:
  • variant_set_id (str) – The ID of the ga4gh.protocol.VariantSet of interest.
  • start (int) – Required. The beginning of the window (0-based, inclusive) for which overlapping variants should be returned. Genomic positions are non-negative integers less than reference length. Requests spanning the join of circular genomes are represented as two requests, one on each side of the join (position 0).
  • end (int) – Required. The end of the window (0-based, exclusive) for which overlapping variants should be returned.
  • reference_name (str) – The name of the ga4gh.protocol.Reference we wish to return variants from.
  • call_set_ids (list) – Only return variant calls which belong to call sets with these IDs. If an empty list, returns variants without any call objects. If None, returns all variant calls.
Returns:

An iterator over the ga4gh.protocol.Variant objects defined by the query parameters.

Return type:

iter
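The window described by start (0-based, inclusive) and end (0-based, exclusive) is a half-open interval, so the overlap rule can be written as a two-sided comparison. The sketch below only illustrates that convention. overlaps_window is a hypothetical helper, and whether the server applies exactly this test is an assumption.

```python
def overlaps_window(item_start, item_end, window_start, window_end):
    """Return True if two 0-based, half-open intervals share a position.

    Both intervals include their start and exclude their end, so an item
    that begins exactly at window_end does not overlap the window.
    """
    return item_start < window_end and window_start < item_end


# A single-base variant at position 100 lies inside the window [100, 200).
inside = overlaps_window(100, 101, 100, 200)
# A single-base variant at position 200 is outside, because end is exclusive.
at_end = overlaps_window(200, 201, 100, 200)
# A variant spanning [95, 105) straddles the window start and still overlaps.
straddling = overlaps_window(95, 105, 100, 200)
print(inside, at_end, straddling)
```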

set_page_size(page_size)

Sets the requested maximum size of pages of results returned by the server to the specified value.
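To show how a page size relates to the iterators returned by the search methods, here is a minimal sketch of the general pattern of hiding paging behind a generator. It is not the library's implementation: iterate_paged, make_fake_server, and the token format are all hypothetical, and the in-memory "server" exists only so the example runs on its own.

```python
def iterate_paged(fetch_page, page_size):
    """Yield every item from a paged source, one item at a time.

    fetch_page is a hypothetical callable taking (page_token, page_size)
    and returning (items, next_page_token). It is not a GA4GH client API.
    """
    page_token = None
    while True:
        items, page_token = fetch_page(page_token, page_size)
        for item in items:
            yield item
        # A missing or empty token signals that this was the last page.
        if not page_token:
            break


def make_fake_server(records):
    """Build an in-memory stand-in for a paged server (illustration only)."""
    def fetch_page(page_token, page_size):
        start = int(page_token) if page_token else 0
        end = start + page_size
        next_token = str(end) if end < len(records) else None
        return records[start:end], next_token
    return fetch_page


fetch = make_fake_server(["v1", "v2", "v3", "v4", "v5"])
results = list(iterate_paged(fetch, page_size=2))
print(results)
```

The caller sees a single stream of results regardless of the page size. In this sketch the page size only changes how many round trips are made.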

class ga4gh.client.client.HttpClient(url_prefix, logLevel=30, authentication_key=None, id_token=None)

The GA4GH HTTP client. This class provides methods corresponding to the GA4GH search and object GET methods.

Todo

Add a better description of the role of this class and include links to the high-level API documentation.

Parameters:
  • url_prefix (str) – The base URL of the GA4GH server we wish to communicate with. This should include the ‘http’ or ‘https’ prefix.
  • logLevel (int) – The amount of debugging information to log using the logging module. This is logging.WARNING by default.
  • authentication_key (str) – The authentication key provided by the server after logging in.
  • id_token (str) – The Auth0 id_token key provided by the server after logging in.
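As a rough illustration of how the search methods documented above compose, the sketch below walks every Dataset and collects the IDs of its VariantSets. It uses only search_datasets and search_variant_sets. The id attribute on returned objects is an assumption, and StubClient is a hypothetical in-memory stand-in so that the example runs without a server or the library installed.

```python
from types import SimpleNamespace


def list_variant_set_ids(client):
    """Map each dataset ID to the IDs of the variant sets it contains.

    Assumes the objects yielded by the client expose an `id` attribute.
    """
    result = {}
    for dataset in client.search_datasets():
        result[dataset.id] = [
            variant_set.id
            for variant_set in client.search_variant_sets(dataset.id)
        ]
    return result


class StubClient:
    """Hypothetical in-memory stand-in for a real client (illustration only)."""

    def search_datasets(self):
        return iter([SimpleNamespace(id="ds1"), SimpleNamespace(id="ds2")])

    def search_variant_sets(self, dataset_id):
        ids = {"ds1": ["vs1", "vs2"], "ds2": []}
        return iter([SimpleNamespace(id=i) for i in ids[dataset_id]])


summary = list_variant_set_ids(StubClient())
print(summary)

# With the library installed, a real client could presumably be substituted.
# The URL below is a placeholder, not a real server:
#   from ga4gh.client import client
#   http_client = client.HttpClient("http://localhost:8000")
#   print(list_variant_set_ids(http_client))
```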