Curriculum Vitae


Personal Profile

Research Software Engineer and Data Infrastructure Specialist with expertise in scientific computing, high-dimensional biological data, and standards-based data integration.

As Lead Developer on the AGENT Project at the Royal Botanic Gardens, Kew, I designed and delivered FAIR-compliant, analytics-ready digital infrastructure supporting the integration and interoperability of global plant genetic resource data.

My work focuses on relational data modelling, ETL pipeline architecture, RESTful API development (BrAPI), and quantitative software systems that prioritise data quality, reproducibility, and cross-institutional interoperability.


Technical Skills

  • Core Languages: Python, SQL, R, C++, Java
  • Data Infrastructure & Engineering: Relational data modelling, ETL pipeline architecture, data cleaning, deduplication, data validation, RESTful API design (BrAPI), FAIR data standards, reproducible workflows, Docker-based deployment
  • Scientific & Quantitative Computing: High-dimensional data analysis, statistical modelling, NumPy, SciPy, MATLAB, VTK-based 3D computational visualisation
  • Databases: PostgreSQL, MariaDB/MySQL, Oracle, SQLite
  • Data Visualisation: Matplotlib, Plotly, ggplot
  • Web & Application Frameworks: Django, Flask, Spring, JavaScript (jQuery)

Employment History

Lead Developer - AGENT Project

Royal Botanic Gardens, Kew Mar 2022 - Oct 2025

  • Engineered FAIR-compliant, analytics-ready digital infrastructure to integrate and harmonise global plant genetic resource datasets across multiple genebanks, ensuring interoperability, semantic consistency, and reproducibility.
  • Designed and implemented relational data models, ETL pipelines, and BrAPI-based RESTful APIs, enabling programmatic access to harmonised datasets for advanced analysis, modelling, and cross-institutional research.
  • Orchestrated automated and manual data validation workflows, improving data quality and enabling reliable quantitative analysis of high-dimensional biological datasets.
  • Collaborated with multidisciplinary teams spanning biology, bioinformatics, and data architecture to translate complex scientific requirements into robust, production-grade software systems.

Developer - Useful Plants and Fungi of Colombia

Royal Botanic Gardens, Kew Dec 2020 - Feb 2022

  • Engineered a robust data management platform for Colombian plant and fungal datasets, enabling standardised recording, integration, and analysis across multiple scientific domains.
  • Collaborated with experts in taxonomy, genomics, bioinformatics, spatial analysis, and data architecture to structure and harmonise complex, high-dimensional datasets for downstream analytics.
  • Designed and implemented relational data models and data ingestion pipelines, incorporating cleaning, deduplication, and validation workflows to ensure data quality, integrity, and reproducibility.
  • Iteratively refined platform functionality through close collaboration with users, supporting diverse research and analytical workflows.

Volunteer Software Developer

Medical Physics and Bio-engineering, UCLH Nov 2020

  • Translated medical imaging algorithms from C++ to Python, improving accessibility and enabling further quantitative analysis.
  • Developed 3D imaging visualisations using VTK, including vertex-to-vertex scan comparisons to support quantitative assessment and visual interpretation of anatomical differences.

Website Designer / Developer

3DAM LTD Aug 2020 - Oct 2020

  • Designed and delivered a client-facing business website, translating requirements into a functional, user-friendly solution.

Volunteer Back-end / Server Engineer

Physics, Royal Marsden Hospital Jun 2020 - Aug 2020

  • Configured servers, monitoring systems, and backup pipelines using Docker and ZFS/RAIDZ2, ensuring reliable infrastructure under COVID-19 constraints.

Junior Clinical Scientist: Bioinformatics (Physical Sciences)

Medical Physics and Bio-engineering, UCLH Sep 2018 - Oct 2019

  • Delivered computational and quantitative solutions across multiple clinical and scientific teams, supporting medical physics and healthcare informatics projects.
  • Designed and implemented statistical and mathematical software models to assist radiation protection planning and clinical decision-making.
  • Built, maintained, and enhanced ETL data pipelines for clinical datasets, including cleaning, deduplication, validation, and reporting workflows.
  • Conducted exploratory data analysis and produced actionable reports for clinical and technical stakeholders, adhering to data governance and international best practices.

Publications

Contributor to publications involving large-scale biological datasets, data standardisation, and international research collaboration.


Education

Loughborough University

Computer Science BSc; Second Class Honours, Upper Division 2014 - 2017


References

Available upon request.


Links & Contact Info

GitHub:

github.com/JosephRuff

LinkedIn:

linkedin.com/in/joseph-ruff-a99062393

Contact Information:

For full contact information, please download the PDF version of my CV.