This job has expired

Bioinformatics Data Engineer

Employer: Dana-Farber Cancer Institute
Location: Boston, MA
Closing date: Nov 27, 2021

Sector: Science, Physical Sciences and Engineering, Pharmaceutical, Data Management/Statistics
Organization Type: Corporate

Overview

Located in Boston and the surrounding communities, Dana-Farber Cancer Institute (DFCI) brings together world-renowned clinicians, innovative researchers and dedicated professionals, allies in the common mission of conquering cancer, HIV/AIDS and related diseases. Combining extremely talented people with the best technologies in a genuinely positive environment, we provide compassionate and comprehensive care to patients of all ages; we conduct research that advances treatment; we educate tomorrow's physician/researchers; we reach out to underserved members of our community; and we work with amazing partners, including other Harvard Medical School-affiliated hospitals.

The Department of Informatics and Analytics (I&A) at Dana-Farber has multiple openings for highly motivated bioinformatics software or data engineers who are passionate about the potential for molecular data to inform clinical care of cancer patients and the groundbreaking discoveries that are a product of genomic data integration and analysis. The qualified candidates will have opportunities to collaborate with other functional teams such as Bioinformatics, Enterprise Data Warehouse (EDW), Data Engineering, and the Data Services team to build a robust and comprehensive molecular data ecosystem to leverage high-dimensional molecular data for cancer research and patient care. You will make a real impact on patients and cancer research at Dana-Farber.

Among the multiple projects this role will collaborate on, one exciting initiative is to build a cloud-based bioinformatics analysis platform that will enable researchers at DFCI to manage their genomic data in the cloud environment, select and kick off containerized NGS analysis workflows, monitor the execution status and eventually visualize the results. Another project is to leverage and expand our fully automated data pipelines to harmonize and integrate additional genomic profiling results from internal and external vendors.

The successful candidates will focus on ensuring the highest quality of our bioinformatics data. This includes understanding and interrogating the data themselves, as well as shepherding data to other systems for research and for enabling precision medicine through automation and data pipelines.

Responsibilities

Develop new genomic data pipelines for internal and external genomic data resources
Define data requirement specifications and collaborate with the EDW team to build an enterprise data warehouse solution for molecular data
Create and review data QC metrics to monitor and improve the pipeline execution
Develop and maintain automated tests to ensure successful pipeline execution and deployment of new features
Collaborate with other teams across departments and institutes to build a data ecosystem to leverage high-dimensional genomic data and reveal new scientific insights in cancer research and patient care
Evaluate different strategies and solutions for genomic data indexing, search, and retrieval
Assist product management with documentation of the ETL processes, QC metrics, data validation rules, and unified genomic data dictionaries
Promote FAIR data principles and adopt the genomic data standards commonly used for research and clinical care, including but not limited to NCI Genomic Data Commons, gnomAD, ClinVar, COSMIC, VICC, and consortiums in genomic data curation and annotation
Manage relationships with other groups that share interdependencies on Dana-Farber molecular data
Mentor new team members
Write technical documentation
Bring creative and innovative thinking to your work

Qualifications

Qualifications:

Bachelor's degree required, MS preferred in Bioinformatics, Computational Biology, Data Science, Computer Science, Software Engineering, or related discipline
2 years of professional experience required; combination of applicable work experience and/or Master's degree may substitute for degree
Strong programming skills in at least one language, preferably in Python
Prior experience with genomics or Next Generation Sequencing data preferred
Experience in working with data transformation pipelines and understanding of nuances of such pipelines
Experience with common cancer genetics databases preferred (NCI GDC, ClinVar, dbGaP, COSMIC, etc.)

SKA's:

Familiarity with relational databases such as MySQL, Oracle, or similar
Detail-oriented, with the ability and drive to gain a deep understanding of technical systems
Ability to prioritize and manage various tasks and projects reliably and in a timely manner
Requires minimal direction from leadership and possesses the ability to adapt to new challenges as they arise
Excellent interpersonal skills, passionate about innovative solutions

At Dana-Farber Cancer Institute, we work every day to create an innovative, caring, and inclusive environment where every patient, family, and staff member feels they belong. As relentless as we are in our mission to reduce the burden of cancer for all, we are equally committed to diversifying our faculty and staff. Cancer knows no boundaries and when it comes to hiring the most dedicated and diverse professionals, neither do we. If working in this kind of organization inspires you, we encourage you to apply.

Dana-Farber Cancer Institute is an equal opportunity employer and affirms the right of every qualified applicant to receive consideration for employment without regard to race, color, religion, sex, gender identity or expression, national origin, sexual orientation, genetic information, disability, age, ancestry, military service, protected veteran status, or other groups as protected by law.
