Python Client Library
This is the GA4GH client API library, a convenient wrapper for the low-level HTTP GA4GH API that abstracts away network-centric details such as paging. The methods and types used by the client library are defined by the GA4GH schema.
Warning
This client API should be considered early alpha quality, and may change in arbitrary ways.
Todo
Add a full description of this API, links to a tutorial on how to use it, and a quickstart showing basic usage.
Types
Todo
Add links to the upstream documentation for the GA4GH types.
Client API
Todo
Add overview documentation for the client API.
Client classes for the GA4GH reference implementation.
-
class ga4gh.client.client.AbstractClient
(log_level=0)¶ The abstract superclass of GA4GH Client objects.
-
get_biosample
(biosample_id)¶ Perform a get request for the given Biosample.
Parameters: biosample_id (str) – The ID of the Biosample Returns: The requested Biosample. Return type: ga4gh.protocol.Biosample
-
get_call_set
(call_set_id)¶ Returns the CallSet with the specified ID from the server.
Parameters: call_set_id (str) – The ID of the CallSet of interest. Returns: The CallSet of interest. Return type: ga4gh.protocol.CallSet
-
get_dataset
(dataset_id)¶ Returns the Dataset with the specified ID from the server.
Parameters: dataset_id (str) – The ID of the Dataset of interest. Returns: The Dataset of interest. Return type: ga4gh.protocol.Dataset
-
get_expression_level
(expression_level_id)¶ Returns the ExpressionLevel with the specified ID from the server.
Parameters: expression_level_id (str) – The ID of the ExpressionLevel of interest. Returns: The ExpressionLevel of interest. Return type: ga4gh.protocol.ExpressionLevel
-
get_feature
(feature_id)¶ Returns the feature with the specified ID from the server.
Parameters: feature_id (str) – The ID of the requested feature Returns: The requested ga4gh.protocol.Feature object.
-
get_feature_set
(feature_set_id)¶ Returns the FeatureSet with the specified ID from the server.
Parameters: feature_set_id (str) – The ID of the FeatureSet of interest. Returns: The FeatureSet of interest. Return type: ga4gh.protocol.FeatureSet
-
get_individual
(individual_id)¶ Perform a get request for the given Individual.
Parameters: individual_id (str) – The ID of the Individual Returns: The requested Individual. Return type: ga4gh.protocol.Individual
-
get_page_size
()¶ Returns the suggested maximum size of pages of results returned by the server.
-
get_protocol_bytes_received
()¶ Returns the total number of protocol bytes received from the server by this client.
Returns: The number of bytes consumed by protocol traffic read from the server during the lifetime of this client. Return type: int
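One way to use this counter is to read it before and after a query and take the difference. The sketch below relies only on the get_protocol_bytes_received method described above; measure_bytes and _StubClient are hypothetical helpers written for this illustration so that it runs without a server.

```python
def measure_bytes(client, action):
    """Run action(client) and return (result, protocol bytes it consumed)."""
    before = client.get_protocol_bytes_received()
    result = action(client)
    after = client.get_protocol_bytes_received()
    return result, after - before


class _StubClient(object):
    """Hypothetical stand-in for a real client; it only counts fake traffic."""

    def __init__(self):
        self._bytes = 0

    def get_protocol_bytes_received(self):
        return self._bytes

    def get_dataset(self, dataset_id):
        self._bytes += 128  # pretend the response was 128 bytes long
        return {"id": dataset_id}


stub = _StubClient()
dataset, used = measure_bytes(stub, lambda c: c.get_dataset("ds1"))
```

Because the search methods return iterators, the action should presumably consume the iterator (for example with list(...)) before the second reading is taken.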
-
get_read_group
(read_group_id)¶ Returns the ReadGroup with the specified ID from the server.
Parameters: read_group_id (str) – The ID of the ReadGroup of interest. Returns: The ReadGroup of interest. Return type: ga4gh.protocol.ReadGroup
-
get_read_group_set
(read_group_set_id)¶ Returns the ReadGroupSet with the specified ID from the server.
Parameters: read_group_set_id (str) – The ID of the ReadGroupSet of interest. Returns: The ReadGroupSet of interest. Return type: ga4gh.protocol.ReadGroupSet
-
get_reference
(reference_id)¶ Returns the Reference with the specified ID from the server.
Parameters: reference_id (str) – The ID of the Reference of interest. Returns: The Reference of interest. Return type: ga4gh.protocol.Reference
-
get_reference_set
(reference_set_id)¶ Returns the ReferenceSet with the specified ID from the server.
Parameters: reference_set_id (str) – The ID of the ReferenceSet of interest. Returns: The ReferenceSet of interest. Return type: ga4gh.protocol.ReferenceSet
-
get_rna_quantification
(rna_quantification_id)¶ Returns the RnaQuantification with the specified ID from the server.
Parameters: rna_quantification_id (str) – The ID of the RnaQuantification of interest. Returns: The RnaQuantification of interest. Return type: ga4gh.protocol.RnaQuantification
-
get_rna_quantification_set
(rna_quantification_set_id)¶ Returns the RnaQuantificationSet with the specified ID from the server.
Parameters: rna_quantification_set_id (str) – The ID of the RnaQuantificationSet of interest. Returns: The RnaQuantificationSet of interest. Return type: ga4gh.protocol.RnaQuantificationSet
-
get_variant
(variant_id)¶ Returns the Variant with the specified ID from the server.
Parameters: variant_id (str) – The ID of the Variant of interest. Returns: The Variant of interest. Return type: ga4gh.protocol.Variant
-
get_variant_annotation_set
(variant_annotation_set_id)¶ Returns the VariantAnnotationSet with the specified ID from the server.
Parameters: variant_annotation_set_id (str) – The ID of the VariantAnnotationSet of interest. Returns: The VariantAnnotationSet of interest. Return type: ga4gh.protocol.VariantAnnotationSet
-
get_variant_set
(variant_set_id)¶ Returns the VariantSet with the specified ID from the server.
Parameters: variant_set_id (str) – The ID of the VariantSet of interest. Returns: The VariantSet of interest. Return type: ga4gh.protocol.VariantSet
-
list_reference_bases
(id_, start=0, end=None)¶ Returns an iterator over the bases from the server in the form of consecutive strings. This command does not conform to the patterns of the other search and get requests, and is implemented differently.
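Since the bases arrive as consecutive strings, a caller that wants one sequence can join them. This is a minimal sketch: fetch_bases and _StubClient are hypothetical names, and the stub assumes that end is exclusive, which the description above does not state.

```python
def fetch_bases(client, reference_id, start=0, end=None):
    """Join the consecutive strings yielded by list_reference_bases."""
    chunks = client.list_reference_bases(reference_id, start=start, end=end)
    return "".join(chunks)


class _StubClient(object):
    """Hypothetical stand-in that serves bases from a local string in chunks."""

    def __init__(self, sequence, chunk_size=4):
        self._sequence = sequence
        self._chunk_size = chunk_size

    def list_reference_bases(self, id_, start=0, end=None):
        # Assumption: end is exclusive and defaults to the sequence length.
        end = len(self._sequence) if end is None else end
        for offset in range(start, end, self._chunk_size):
            yield self._sequence[offset:min(offset + self._chunk_size, end)]


stub = _StubClient("ACGTACGTAC")
bases = fetch_bases(stub, "ref1", start=2, end=9)
```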
-
search_biosamples
(dataset_id, name=None, individual_id=None)¶ Returns an iterator over the Biosamples fulfilling the specified conditions.
Returns: An iterator over the ga4gh.protocol.Biosample objects defined by the query parameters.
-
search_call_sets
(variant_set_id, name=None, biosample_id=None)¶ Returns an iterator over the CallSets fulfilling the specified conditions from the specified VariantSet.
Returns: An iterator over the ga4gh.protocol.CallSet objects defined by the query parameters.
-
search_datasets
()¶ Returns an iterator over the Datasets on the server.
Returns: An iterator over the ga4gh.protocol.Dataset objects on the server.
-
search_expression_levels
(rna_quantification_id=u'', feature_ids=[], threshold=0.0)¶ Returns an iterator over the ExpressionLevel objects from the server.
-
search_feature_sets
(dataset_id)¶ Returns an iterator over the FeatureSets fulfilling the specified conditions from the specified Dataset.
Parameters: dataset_id (str) – The ID of the ga4gh.protocol.Dataset of interest. Returns: An iterator over the ga4gh.protocol.FeatureSet objects defined by the query parameters.
-
search_features
(feature_set_id=None, parent_id=u'', reference_name=u'', start=0, end=0, feature_types=[], name=u'', gene_symbol=u'')¶ Returns the result of running a search_features method on a request with the passed-in parameters.
Parameters: - feature_set_id (str) – ID of the FeatureSet being searched
- parent_id (str) – ID (optional) of the parent feature
- reference_name (str) – name of the reference to search (ex: “chr1”)
- start (int) – search start position on reference
- end (int) – end position on reference
- feature_types – array of terms to limit search by (ex: “gene”)
- name (str) – only return features with this name
- gene_symbol (str) – only return features on this gene
Returns: an iterator over Features as returned in the SearchFeaturesResponse object.
-
search_genotype_phenotype
(phenotype_association_set_id=None, feature_ids=None, phenotype_ids=None, evidence=None)¶ Returns an iterator over the GenotypePhenotype associations from the server.
-
search_individuals
(dataset_id, name=None)¶ Returns an iterator over the Individuals fulfilling the specified conditions.
Returns: An iterator over the ga4gh.protocol.Individual objects defined by the query parameters.
-
search_phenotype
(phenotype_association_set_id=None, phenotype_id=None, description=None, type_=None, age_of_onset=None)¶ Returns an iterator over the Phenotypes from the server.
-
search_phenotype_association_sets
(dataset_id)¶ Returns an iterator over the PhenotypeAssociationSets on the server.
-
search_read_group_sets
(dataset_id, name=None, biosample_id=None)¶ Returns an iterator over the ReadGroupSets fulfilling the specified conditions from the specified Dataset.
Returns: An iterator over the ga4gh.protocol.ReadGroupSet objects defined by the query parameters.
-
search_reads
(read_group_ids, reference_id=None, start=None, end=None)¶ Returns an iterator over the Reads fulfilling the specified conditions from the specified read_group_ids.
Parameters: - read_group_ids (str) – The IDs of the ga4gh.protocol.ReadGroup of interest.
- reference_id (str) – The name of the ga4gh.protocol.Reference we wish to return reads mapped to.
- start (int) – The start position (0-based) of this query. If a reference is specified, this defaults to 0. Genomic positions are non-negative integers less than reference length. Requests spanning the join of circular genomes are represented as two requests one on each side of the join (position 0).
- end (int) – The end position (0-based, exclusive) of this query. If a reference is specified, this defaults to the reference’s length.
Returns: An iterator over the ga4gh.protocol.ReadAlignment objects defined by the query parameters.
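The rule for circular genomes can be sketched as a small helper that turns one logical window into the one or two requests that would be issued. This is an illustration only: split_circular_window is a hypothetical function, not part of the client, and the convention that a wrapping window is written with start greater than end is an assumption made for the example.

```python
def split_circular_window(start, end, reference_length):
    """Split a 0-based, half-open window on a circular reference.

    Assumption: a window that crosses the join (position 0) is written
    with start > end. Returns a list of (start, end) pairs, one per request.
    """
    if not 0 <= start < reference_length:
        raise ValueError("start must be in [0, reference_length)")
    if not 0 <= end <= reference_length:
        raise ValueError("end must be in [0, reference_length]")
    if start == end:
        raise ValueError("empty window")
    if start < end:
        # Ordinary window: a single request is enough.
        return [(start, end)]
    # Wrapping window: one request up to the join, one from position 0.
    windows = [(start, reference_length)]
    if end > 0:
        windows.append((0, end))
    return windows


plain = split_circular_window(10, 20, 100)
wrapped = split_circular_window(90, 10, 100)
```

Each resulting pair could then be passed as the start and end arguments of a separate search call.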
-
search_reference_sets
(accession=None, md5checksum=None, assembly_id=None)¶ Returns an iterator over the ReferenceSets fulfilling the specified conditions.
Parameters: - accession (str) – If not null, return the reference sets for which the accession matches this string (case-sensitive, exact match).
- md5checksum (str) – If not null, return the reference sets for which the md5checksum matches this string (case-sensitive, exact match). See ga4gh.protocol.ReferenceSet::md5checksum for details.
- assembly_id (str) – If not null, return the reference sets for which the assembly_id matches this string (case-sensitive, exact match).
Returns: An iterator over the ga4gh.protocol.ReferenceSet objects defined by the query parameters.
-
search_references
(reference_set_id, accession=None, md5checksum=None)¶ Returns an iterator over the References fulfilling the specified conditions from the specified ReferenceSet.
Parameters: - reference_set_id (str) – The ReferenceSet to search.
- accession (str) – If not None, return the references for which the accession matches this string (case-sensitive, exact match).
- md5checksum (str) – If not None, return the references for which the md5checksum matches this string (case-sensitive, exact match).
Returns: An iterator over the ga4gh.protocol.Reference objects defined by the query parameters.
-
search_rna_quantification_sets
(dataset_id)¶ Returns an iterator over the RnaQuantificationSet objects from the server.
-
search_rna_quantifications
(rna_quantification_set_id=u'')¶ Returns an iterator over the RnaQuantification objects from the server.
Parameters: rna_quantification_set_id (str) – The ID of the ga4gh.protocol.RnaQuantificationSet of interest.
-
search_variant_annotation_sets
(variant_set_id)¶ Returns an iterator over the VariantAnnotationSets fulfilling the specified conditions from the specified VariantSet.
Parameters: variant_set_id (str) – The ID of the ga4gh.protocol.VariantSet of interest. Returns: An iterator over the ga4gh.protocol.VariantAnnotationSet objects defined by the query parameters.
-
search_variant_annotations
(variant_annotation_set_id, reference_name=u'', reference_id=u'', start=0, end=0, effects=[])¶ Returns an iterator over the Variant Annotations fulfilling the specified conditions from the specified VariantAnnotationSet.
Parameters: - variant_annotation_set_id (str) – The ID of the ga4gh.protocol.VariantAnnotationSet of interest.
- start (int) – Required. The beginning of the window (0-based, inclusive) for which overlapping variants should be returned. Genomic positions are non-negative integers less than reference length. Requests spanning the join of circular genomes are represented as two requests one on each side of the join (position 0).
- end (int) – Required. The end of the window (0-based, exclusive) for which overlapping variants should be returned.
- reference_name (str) – The name of the ga4gh.protocol.Reference we wish to return variants from.
Returns: An iterator over the ga4gh.protocol.VariantAnnotation objects defined by the query parameters.
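The window described by start and end is half-open: start is included and end is not. A minimal sketch of the usual overlap test for such intervals follows. It assumes that the variant's own coordinates are also 0-based and half-open, and it is not taken from the server implementation; overlaps is a hypothetical helper.

```python
def overlaps(feature_start, feature_end, window_start, window_end):
    """Return True if [feature_start, feature_end) intersects
    [window_start, window_end); all coordinates are 0-based."""
    return feature_start < window_end and feature_end > window_start


# A feature ending exactly where the window starts does not overlap,
# because the feature's end position is exclusive.
touching_left = overlaps(5, 10, 10, 20)
crossing_left = overlaps(5, 11, 10, 20)
```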
-
search_variant_sets
(dataset_id)¶ Returns an iterator over the VariantSets fulfilling the specified conditions from the specified Dataset.
Parameters: dataset_id (str) – The ID of the ga4gh.protocol.Dataset of interest. Returns: An iterator over the ga4gh.protocol.VariantSet objects defined by the query parameters.
-
search_variants
(variant_set_id, start=None, end=None, reference_name=None, call_set_ids=None)¶ Returns an iterator over the Variants fulfilling the specified conditions from the specified VariantSet.
Parameters: - variant_set_id (str) – The ID of the ga4gh.protocol.VariantSet of interest.
- start (int) – Required. The beginning of the window (0-based, inclusive) for which overlapping variants should be returned. Genomic positions are non-negative integers less than reference length. Requests spanning the join of circular genomes are represented as two requests one on each side of the join (position 0).
- end (int) – Required. The end of the window (0-based, exclusive) for which overlapping variants should be returned.
- reference_name (str) – The name of the ga4gh.protocol.Reference we wish to return variants from.
- call_set_ids (list) – Only return variant calls which belong to call sets with these IDs. If an empty array, returns variants without any call objects. If null, returns all variant calls.
Returns: An iterator over the ga4gh.protocol.Variant objects defined by the query parameters.
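The three cases for call_set_ids (null, empty, and a list of IDs) are easy to confuse, so the sketch below models them locally. It only illustrates the documented behaviour and is not the server's code; select_calls is a hypothetical function, and calls are represented as plain dictionaries with an assumed call_set_id key.

```python
def select_calls(calls, call_set_ids=None):
    """Mimic the documented call_set_ids behaviour on a list of calls."""
    if call_set_ids is None:
        # Null: every call is returned.
        return list(calls)
    # A list keeps only matching calls; an empty list therefore keeps none.
    wanted = set(call_set_ids)
    return [call for call in calls if call["call_set_id"] in wanted]


calls = [{"call_set_id": "a"}, {"call_set_id": "b"}, {"call_set_id": "c"}]
all_calls = select_calls(calls)
no_calls = select_calls(calls, [])
some_calls = select_calls(calls, ["a", "c"])
```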
-
set_page_size
(page_size)¶ Sets the requested maximum size of pages of results returned by the server to the specified value.
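The page size matters because search results are fetched one page at a time while the caller sees a single iterator. The following sketch shows that general pattern under stated assumptions. It is not the library's implementation: iterate_pages, make_fake_server, and the use of an offset string as the page token are all hypothetical.

```python
def iterate_pages(fetch_page, page_size):
    """Yield every result, following page tokens until none is returned.

    fetch_page(page_token, page_size) must return (results, next_token).
    """
    page_token = None
    while True:
        results, page_token = fetch_page(page_token, page_size)
        for result in results:
            yield result
        if not page_token:
            break


def make_fake_server(items):
    """Hypothetical in-memory server whose token is the next offset."""
    def fetch_page(page_token, page_size):
        offset = int(page_token) if page_token else 0
        page = items[offset:offset + page_size]
        next_offset = offset + page_size
        next_token = str(next_offset) if next_offset < len(items) else None
        return page, next_token
    return fetch_page


requested_tokens = []
fake_fetch = make_fake_server(list(range(7)))


def counting_fetch(page_token, page_size):
    # Record each request so the number of round trips is visible.
    requested_tokens.append(page_token)
    return fake_fetch(page_token, page_size)


collected = list(iterate_pages(counting_fetch, page_size=3))
```

With seven items and a page size of three, the sketch makes three requests; a larger page size would mean fewer, larger responses.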
-
-
class ga4gh.client.client.HttpClient
(url_prefix, logLevel=30, authentication_key=None, id_token=None)¶ The GA4GH HTTP client. This class provides methods corresponding to the GA4GH search and object GET methods.
Todo
Add a better description of the role of this class and include links to the high-level API documentation.
Parameters: - url_prefix (str) – The base URL of the GA4GH server we wish to communicate with. This should include the ‘http’ or ‘https’ prefix.
- logLevel (int) – The amount of debugging information to log using the logging module. This is logging.WARNING by default.
- authentication_key (str) – The authentication key provided by the server after logging in.
- id_token (str) – The Auth0 id_token key provided by the server after logging in.