From Raw Data to Predictive Biomarkers with Genedata Profiler
Dr. Sebastien Ribrioux, Data Scientist at Genedata, shares his experience of streamlining biomarker discovery by centralizing experimental results, standardizing data processing and exploring a range of statistical analyses with Genedata Profiler.
Q: Can you explain how data integration through automated workflows contributes to scientific insight?
A: Genedata Profiler’s mission is to support the development of precision therapeutics by integrating and analyzing highly diverse data. In the biomarker discovery process, data come from various sources, such as preclinical studies, patient samples, physiological tests, molecular assays or microscopy-based methods. Several different instruments are typically involved, so the data tend to be stored in different locations and in different file formats, and some of them, like raw sequencing data, still need processing. These datasets vary not only in format but also in structure and complexity.
Genedata Profiler addresses this by offering automated, configurable workflows that can incorporate data from diverse sources and in any format. These workflows not only import data but also perform quality checks, harmonization and integration, ensuring that the resulting datasets are clean, consistent and ready for analysis. This is crucial when different types of data require different preprocessing steps. Once datasets are integrated, Genedata Profiler provides various options for analysis. Researchers can use built-in visualization tools, connect to their preferred business intelligence platforms or apply statistical and machine learning methods directly within the system.
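As a rough illustration of the import, quality check, harmonization and integration steps described above, here is a minimal plain-Python sketch. It is not Genedata Profiler code and does not use its API; the field names (PatientID, Response, patient, expr), the shared schema and the rule of rejecting records with missing values are all assumptions made for the example.

```python
# Hypothetical rows from two sources that use different field names and conventions.
clinical_rows = [
    {"PatientID": "p001", "Response": "Yes"},
    {"PatientID": "P002", "Response": "unknown"},  # unrecognized value
    {"PatientID": "P003", "Response": "no"},
]
assay_rows = [
    {"patient": "P001 ", "expr": "8.4"},
    {"patient": "p003", "expr": ""},               # missing measurement
    {"patient": "P004", "expr": "7.9"},
]


def harmonize_clinical(row):
    """Map a clinical row onto the shared (assumed) schema."""
    response = {"yes": True, "no": False}.get(row["Response"].strip().lower())
    return {"patient_id": row["PatientID"].strip().upper(), "responder": response}


def harmonize_assay(row):
    """Map an assay row onto the shared (assumed) schema."""
    value = row["expr"].strip()
    return {
        "patient_id": row["patient"].strip().upper(),
        "tnf_expression": float(value) if value else None,
    }


def quality_check(records):
    """Split records into those that pass and those with missing values."""
    passed, rejected = [], []
    for record in records:
        (rejected if None in record.values() else passed).append(record)
    return passed, rejected


def integrate(clinical, assay):
    """Join the two harmonized sources on patient_id (inner join)."""
    assay_by_id = {record["patient_id"]: record for record in assay}
    return [
        {**record, **assay_by_id[record["patient_id"]]}
        for record in clinical
        if record["patient_id"] in assay_by_id
    ]


clinical_ok, clinical_rejected = quality_check(
    [harmonize_clinical(row) for row in clinical_rows]
)
assay_ok, assay_rejected = quality_check(
    [harmonize_assay(row) for row in assay_rows]
)
integrated = integrate(clinical_ok, assay_ok)
print(integrated)
```

Rejected records are kept in separate lists rather than silently dropped, so that a reviewer could see which rows failed the check and why.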
Q: How are data compliance and secure access managed for biomarker research?
A: From a compliance standpoint, Genedata Profiler is GxP-ready, supporting biomarker validation and the development of companion diagnostics. It maintains data lineage by tracking the source of the data within the system, which is critical for audits and validation. When datasets grow large, the Data Portal helps you search for your relevant biomarker data, no matter how diverse they are, using intuitive tagging for filtering.
Data access is restricted based on study membership, ensuring security and regulatory compliance. Only users assigned to a study can access its data, and each user has specific study roles that determine whether they can upload data, delete data, run statistical analyses or run a processing workflow for genomic data.
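The study-membership model described above can be sketched as a simple permission check. This is an illustration only, not how Genedata Profiler implements access control; the role names, the role-to-permission mapping and the example users and studies are hypothetical.

```python
from enum import Enum


class Permission(Enum):
    UPLOAD = "upload"
    DELETE = "delete"
    RUN_ANALYSIS = "run_statistical_analysis"
    RUN_WORKFLOW = "run_processing_workflow"


# Hypothetical roles and the actions each one grants.
ROLE_PERMISSIONS = {
    "data_manager": {Permission.UPLOAD, Permission.DELETE},
    "analyst": {Permission.RUN_ANALYSIS},
    "bioinformatician": {Permission.RUN_ANALYSIS, Permission.RUN_WORKFLOW},
}

# Hypothetical membership table: study -> user -> roles held in that study.
STUDY_MEMBERS = {
    "study_a": {"alice": {"data_manager"}, "bob": {"analyst"}},
    "study_b": {"bob": {"bioinformatician"}},
}


def is_allowed(user, study, permission):
    """Return True only if the user is a member of the study and
    holds at least one role in it that grants the permission."""
    roles = STUDY_MEMBERS.get(study, {}).get(user, set())
    return any(permission in ROLE_PERMISSIONS[role] for role in roles)


print(is_allowed("alice", "study_a", Permission.UPLOAD))
print(is_allowed("alice", "study_b", Permission.UPLOAD))
print(is_allowed("bob", "study_b", Permission.RUN_WORKFLOW))
```

Because roles are looked up per study, the same user can hold different rights in different studies, and a user who is not a member of a study is denied by default.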
Q: How does Genedata Profiler software support biomarker research in cancer studies?
A: Genedata Profiler helps to manage and analyze biomarker research data by organizing it according to studies, which you can also think of as projects. This project-centric organization makes it easier to find data and trace it across its life cycle. Data are also easier to find because each study is tagged with attributes such as experimental modality, technology used, indication, stage of development, tissue type and treatment type. Genedata Profiler also supports the import and processing of clinical data from patients, omics data and public datasets like those from The Cancer Genome Atlas (TCGA). This includes clinical annotations, gene expression profiles and mutation data.
These datasets are processed using workflows to create “Views,” which are analysis-ready datasets. They have been quality-checked and harmonized, so both scientific experts and non-expert users can dive in and analyze them.
Q: Can you describe how predictive biomarkers were identified and what impact that could have on improving treatment strategies?
A: A customer project shows how leveraging machine learning in Genedata Profiler can aid biomarker discovery for Crohn’s disease. There's a lot of variability in the response to the primary drug, an anti-TNF alpha antibody. The current approach that doctors use is really based on trial and error, and biomarkers could help guide treatment here. The Crohn’s and Colitis Foundation gathered data from around 1000 pediatric patients over more than a decade with the goal of identifying biomarkers that would help identify the patients most likely to benefit from the anti-TNF alpha antibody. This could facilitate earlier interventions and prevent further exacerbation and irreversible damage.
The clinical data for this project were imported into Genedata Profiler, along with transcriptomics data from ileal biopsies collected during colonoscopies and proteomics data from blood samples. A type of machine learning algorithm called a Support Vector Machine was used for biomarker identification, in addition to gene expression analysis. These biomarkers were cross-validated and tested against a subset of the data the algorithm hadn't yet seen.
This approach was successful, and a set of 14 prognostic biomarkers was identified, which can predict future complications. A further three proteins were identified, which predict responses to the anti-TNF alpha antibody. As a next step, this biomarker panel would be developed and validated for clinical diagnosis.
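To make the validation scheme concrete, here is a minimal sketch of training a Support Vector Machine with cross-validation and then testing it on a held-out subset. It uses scikit-learn on synthetic data and is not the project's actual pipeline: the data, the linear kernel, the 75/25 split, the five folds and the weight-based feature ranking are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a patients x features matrix (NOT real study data).
X, y = make_classification(
    n_samples=200, n_features=30, n_informative=3, n_redundant=0, random_state=0
)

# Set aside a test subset that the model never sees during training
# or cross-validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale the features, then fit a linear-kernel SVM (assumed settings).
model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))

# 5-fold cross-validation on the training portion only.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Final fit on all training data, then a single evaluation on the held-out set.
model.fit(X_train, y_train)
held_out_accuracy = model.score(X_test, y_test)

# With a linear kernel, absolute weights give a rough ranking of features;
# the indices refer to columns of the synthetic matrix.
weights = np.abs(model.named_steps["svc"].coef_).ravel()
top_features = np.argsort(weights)[::-1][:3]

print("Cross-validation accuracy per fold:", cv_scores)
print("Held-out accuracy:", held_out_accuracy)
print("Highest-weight feature indices:", top_features)
```

Keeping the held-out subset out of both training and cross-validation is what makes the final accuracy an estimate of performance on unseen patients rather than a measure of fit.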