This job has expired

Bioinformatics Data Engineer

Dana-Farber Cancer Institute
Boston, MA
Closing date
Nov 27, 2021

Science, Physical Sciences and Engineering, Pharmaceutical, Data Management/Statistics

Located in Boston and the surrounding communities, Dana-Farber Cancer Institute (DFCI) brings together world-renowned clinicians, innovative researchers, and dedicated professionals, allies in the common mission of conquering cancer, HIV/AIDS, and related diseases. Combining extremely talented people with the best technologies in a genuinely positive environment, we provide compassionate and comprehensive care to patients of all ages; we conduct research that advances treatment; we educate tomorrow's physician/researchers; we reach out to underserved members of our community; and we work with amazing partners, including other Harvard Medical School-affiliated hospitals.

The Department of Informatics and Analytics (I&A) at Dana-Farber has multiple openings for highly motivated bioinformatics software or data engineers who are passionate about the potential for molecular data to inform the clinical care of cancer patients and about the groundbreaking discoveries that come from genomic data integration and analysis. Qualified candidates will have opportunities to collaborate with other functional teams, such as Bioinformatics, Enterprise Data Warehouse (EDW), Data Engineering, and Data Services, to build a robust and comprehensive molecular data ecosystem that leverages high-dimensional molecular data for cancer research and patient care. You will make a real impact on patients and cancer research at Dana-Farber.

Among the multiple projects this role will collaborate on, one exciting initiative is to build a cloud-based bioinformatics analysis platform that will enable researchers at DFCI to manage their genomic data in the cloud, select and kick off containerized NGS analysis workflows, monitor execution status, and eventually visualize the results. Another project is to leverage and expand our fully automated data pipelines to harmonize and integrate additional genomic profiling results from internal and external vendors.

The successful candidates will focus on ensuring the highest quality of our bioinformatics data. This includes understanding and interrogating the data themselves, as well as shepherding data to other systems through automation and data pipelines, in support of research and the enablement of precision medicine.

  • Develop new genomic data pipelines for internal and external genomic data resources
  • Define data requirements and collaborate with the EDW team to build an enterprise data warehouse solution for molecular data
  • Create and review data QC metrics to monitor and improve pipeline execution
  • Develop and maintain automated tests to ensure successful pipeline execution and deployment of new features
  • Collaborate with other teams across departments and institutes to build a data ecosystem to leverage high-dimensional genomic data and reveal new scientific insights in cancer research and patient care
  • Evaluate different strategies and solutions for genomic data indexing, search, and retrieval
  • Assist product management with documentation of the ETL processes, QC metrics, data validation rules, and unified genomic data dictionaries
  • Promote FAIR data principles and adopt the genomic data standards commonly used for research and clinical care, including but not limited to those of NCI Genomic Data Commons, gnomAD, ClinVar, COSMIC, VICC, and consortiums in genomic data curation and annotation
  • Manage relationships with other groups that share interdependencies on Dana-Farber molecular data
  • Mentor new team members
  • Write technical documentation
  • Bring creative and innovative thinking to your work


  • Bachelor's degree required, MS preferred in Bioinformatics, Computational Biology, Data Science, Computer Science, Software Engineering, or related discipline
  • 2 years of professional experience required; a combination of applicable work experience and/or a Master's degree may substitute for the degree requirement
  • Strong programming skills in at least one language, preferably Python
  • Prior experience with genomics or Next Generation Sequencing data preferred
  • Experience working with data transformation pipelines and an understanding of their nuances
  • Experience with common cancer genetics databases preferred (NCI GDC, ClinVar, dbGaP, COSMIC, etc.)

  • Familiarity with relational databases such as MySQL, Oracle, or similar
  • Detail-oriented, with the ability and drive to gain a deep understanding of technical systems
  • Ability to prioritize and manage various tasks and projects reliably and in a timely manner
  • Ability to work with minimal direction from leadership and to adapt to new challenges as they arise
  • Excellent interpersonal skills and a passion for innovative solutions

At Dana-Farber Cancer Institute, we work every day to create an innovative, caring, and inclusive environment where every patient, family, and staff member feels they belong. As relentless as we are in our mission to reduce the burden of cancer for all, we are equally committed to diversifying our faculty and staff. Cancer knows no boundaries and when it comes to hiring the most dedicated and diverse professionals, neither do we. If working in this kind of organization inspires you, we encourage you to apply.

Dana-Farber Cancer Institute is an equal opportunity employer and affirms the right of every qualified applicant to receive consideration for employment without regard to race, color, religion, sex, gender identity or expression, national origin, sexual orientation, genetic information, disability, age, ancestry, military service, protected veteran status, or other groups as protected by law.