ΔTissue Demonstrator (v0.4)


© 2022 Andra Waagmeester and Josh Moore

This is the first release of the ΔTissue Demonstrator, which provides starting points for exploring public linked data. Linked data is a key component of FAIR data sharing. In this release, we use Wikidata as a central hub for linked data related to the ΔTissue disease areas: tuberculosis (TB), triple-negative breast cancer (TNBC) and glioblastoma (GBM).
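
Each disease area can be anchored in Wikidata through its Disease Ontology identifier. The sketch below is a minimal illustration, not one of the demonstrator's own queries: the helper `doid_translations_query` is hypothetical, and it assumes the Wikidata property P699 (Disease Ontology ID) and the prefixes that are predefined on the Wikidata Query Service. It builds a query listing every label, and therefore every translation, of the item that carries a given DOID such as DOID:399 (tuberculosis).

```python
import re


def doid_translations_query(doid: str) -> str:
    """Build a SPARQL query listing all labels of the item with a given Disease Ontology ID.

    Assumption: Wikidata stores the ID under P699 as a string like "DOID:399".
    """
    # Accept only well-formed IDs, because the value is pasted into the query text.
    if not re.fullmatch(r"DOID:\d+", doid):
        raise ValueError(f"not a Disease Ontology ID: {doid!r}")
    return (
        "SELECT ?disease ?label ?lang WHERE {\n"
        f'  ?disease wdt:P699 "{doid}" ;\n'
        "           rdfs:label ?label .\n"
        "  BIND(LANG(?label) AS ?lang)\n"
        "}\n"
        "ORDER BY ?lang"
    )


print(doid_translations_query("DOID:399"))
```

Run in the Wikidata Query Service, a query of this shape should return one row per language label; the ID check exists only because the value is interpolated directly into the query string.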

Resources that are reachable via data links include WikiPathways, Reactome, UniProt, OpenCitations, Cellosaurus, NCBI Gene, Ensembl, PubChem and cBioPortal. Other resources, such as the Genomic Data Commons (GDC) Data Portal or the sfaira portal, provide access to large quantities of data using the same (or compatible) identifiers, but are not directly linkable. Finally, sites such as Scholia provide enhanced visualizations of the existing data links, like the image below, which shows the topics of all ΔTissue authors who could be found in Wikidata.
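
Following a data link usually means hopping from one identifier system to another through a shared Wikidata item. As a hedged sketch (the function names are ours and hypothetical), the query below assumes that gene items carry P351 (NCBI Gene/Entrez Gene ID) and P594 (Ensembl gene ID) and maps a list of NCBI Gene IDs to Ensembl IDs. The second function reads the standard SPARQL 1.1 JSON results format that the query service returns.

```python
import re


def id_mapping_query(ncbi_gene_ids: list[str]) -> str:
    """Build a query mapping NCBI Gene IDs (P351) to Ensembl gene IDs (P594)."""
    # NCBI Gene IDs are purely numeric; reject anything else before quoting it.
    for gid in ncbi_gene_ids:
        if not re.fullmatch(r"\d+", gid):
            raise ValueError(f"not an NCBI Gene ID: {gid!r}")
    values = " ".join(f'"{gid}"' for gid in ncbi_gene_ids)
    return (
        "SELECT ?gene ?ncbi ?ensembl WHERE {\n"
        f"  VALUES ?ncbi {{ {values} }}\n"
        "  ?gene wdt:P351 ?ncbi .\n"
        "  OPTIONAL { ?gene wdt:P594 ?ensembl . }\n"
        "}"
    )


def bindings_to_mapping(result: dict) -> dict[str, list[str]]:
    """Collect the Ensembl IDs found for each NCBI Gene ID in a SPARQL JSON result."""
    mapping: dict[str, list[str]] = {}
    for row in result["results"]["bindings"]:
        ids = mapping.setdefault(row["ncbi"]["value"], [])
        # Rows without an Ensembl binding come from the OPTIONAL clause.
        if "ensembl" in row:
            ids.append(row["ensembl"]["value"])
    return mapping
```

The OPTIONAL clause keeps genes that have no Ensembl ID in Wikidata yet, so gaps in coverage show up as empty lists instead of silently disappearing.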

For the purposes of this demonstrator, workflows have been developed:

  1. to complete publication records for a list of authors and publications
  2. to make biological pathways published in the scientific literature machine readable
  3. to query GDC entities via GraphQL and download related tabular data
  4. to download datasets listed on the sfaira portal and load them with the sfaira Python library
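
The first workflow starts by finding out which publications in a list are already present in Wikidata. A minimal sketch, assuming that works are matched through P356 (DOI) and that, following Wikidata's convention, DOIs are stored in upper case. The helper names are hypothetical, and creating items for the missing publications is out of scope here.

```python
def normalize_doi(doi: str) -> str:
    """Upper-case a DOI and reject values that are not safe to quote in SPARQL."""
    doi = doi.strip().upper()
    if not doi.startswith("10.") or '"' in doi or "\\" in doi:
        raise ValueError(f"unexpected DOI: {doi!r}")
    return doi


def doi_lookup_query(dois: list[str]) -> str:
    """Build a query returning the Wikidata items whose DOI (P356) is in the list."""
    values = " ".join(f'"{d}"' for d in sorted({normalize_doi(d) for d in dois}))
    return (
        "SELECT ?work ?doi WHERE {\n"
        f"  VALUES ?doi {{ {values} }}\n"
        "  ?work wdt:P356 ?doi .\n"
        "}"
    )


def missing_dois(dois: list[str], result: dict) -> list[str]:
    """Return the input DOIs that did not match any item in the SPARQL JSON result."""
    found = {row["doi"]["value"] for row in result["results"]["bindings"]}
    return sorted({normalize_doi(d) for d in dois} - found)
```

Normalizing before both building the query and comparing results keeps the two steps consistent, so a lower-case DOI in the input list is not wrongly reported as missing.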

In some cases, entries have been created in Wikidata and elsewhere to establish initial links. This, however, has not been done systematically, since that would require input from domain experts.

The demonstrator is a proof of concept (POC) for exploring existing linked data on the ΔTissue disease areas in Wikidata. Currently, Wikidata's coverage of these disease areas appears to be incomplete. Relevant data either needs to be added systematically or existing data needs to be updated, keeping in mind that Wikidata applies a CC0 license, whereas many other resources do not. To render a full picture of the linked data cloud related to the disease areas, either more public data must be added or a linked-data resource that can host non-CC0 data will be needed.

Please note that the queries in this repository were written by authors who are not domain experts in the disease areas. Suggestions for different queries on the disease areas are welcome via this form.

Contents

  1. An Introduction (NEW)
    1. What is Linked Data?
    2. Toy example
    3. Initial project
    4. Outlook
    5. Terminology
  2. Publication Record
    1. Leap group leaders
    2. Group leaders and their past and present affiliations
    3. Group leaders and their publication records
    4. Publications and their topics
    5. Author name strings and their publications
  3. Triple-negative breast cancer
    1. Genes associated with TNBC through gene variant annotations (negative result)
    2. Therapies associated with positive and negative predictors (partial result)
    3. Gene list linked through pathways on triple-negative breast cancer
    4. Concatenated list of TNBC genes
    5. Chemical compounds part of a pathway on triple-negative breast cancer
    6. Genes in pathways on TNBC and the cellular components where their encoded gene products are found
    7. Cell types related to breast cancer via markers
  4. Glioblastoma
    1. Genes associated with Glioblastoma through gene variant annotations
    2. Gene list linked through pathways on glioblastoma
    3. Concatenated list of GBM genes
    4. Chemical compounds part of a pathway on glioblastoma
    5. All cell lines associated with glioblastoma
  5. Tuberculosis
    1. Translations of the Disease Ontology term DOID:399 (Tuberculosis)
    2. Top 100 authors of publications covering tuberculosis (according to Wikidata)
    3. Genes involved in the immune response to tuberculosis
    4. Concatenated list of tuberculosis genes
  6. Data Resources
    1. sfaira
    2. TCGA

Future work

Future editions of this demonstrator will include:

  • Direct linked-data validation and enrichment using Shape Expressions (ShEx).
  • Federated querying of linked data hosted across multiple sources.
  • Direct download of linked data from the sources.
  • Example queries on a revived linked-tcga SPARQL endpoint.

Impressum

This demonstrator is written in Markdown, supplemented by SPARQL queries that are loaded dynamically from https://www.wikidata.org/. While the website itself is licensed under CC BY-SA, all SPARQL queries in this resource can be used under the CC0 license/waiver. Feedback can be sent via this GitHub repository.
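
Queries like these can be sent to the Wikidata Query Service over plain HTTP. The sketch below uses only the Python standard library; it assumes the public endpoint https://query.wikidata.org/sparql and asks for JSON results through the Accept header. Wikimedia's User-Agent policy asks clients to identify themselves, so the caller passes a descriptive User-Agent string. `build_request` and `run_query` are illustrative names, not part of any library.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_request(query: str, user_agent: str) -> urllib.request.Request:
    """Prepare a GET request for a SPARQL query, asking for JSON results."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": user_agent,  # identifies the client, per Wikimedia policy
    })


def run_query(query: str, user_agent: str, timeout: float = 60.0) -> dict:
    """Send the query and return the parsed SPARQL JSON result (needs network access)."""
    with urllib.request.urlopen(build_request(query, user_agent), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `run_query(query, "my-tool/0.1 (me@example.org)")` would return the parsed result dictionary. Keeping request construction separate makes it possible to inspect the URL and headers without touching the network.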
