Data Scientist - R&D (Behavioral & Predictive Modeling)
Gratiago
Gratiago exists to become a patient’s everyday companion between consultations, precisely because doctors have less and less time and patients struggle to find reliable information and the right kind of daily support. Our product goal is to help patients see value in their treatment and to make the moment of treatment intake an important, meaningful part of their day, not a mechanical reminder that gets ignored.
We are intentionally building a small, high-trust, high-standards team. Early hires are the DNA of the company.
We aim to create an environment where exceptional builders can do deep work with minimal noise, make decisions with ownership, and ship what matters.
What you’ll work on (core focus)
You will contribute to the intelligence layer of Gratiago, with responsibilities evolving as the product and data mature.
- Designing predictive models to estimate adherence risk and potential churn, including early-stage / cold-start scenarios where data is limited.
- Exploring patient behavior through clustering and segmentation, with the goal of identifying meaningful behavioral patterns rather than academic perfection.
- Prioritizing interpretable and explainable models, so that decisions can be understood and discussed with clinicians and partners.
- Translating raw interaction data (timestamps, reminders, confirmations, skipped actions) into insights about routines, motivation, and friction.
- Working with noisy, sparse, and self-reported data, and helping define what “good enough” means in a real-world healthcare context.
Technical stack & environment
You don’t need to know everything from day one, but you should be comfortable growing into this stack.
- Strong working knowledge of Python for data analysis and modeling (Pandas, NumPy, Scikit-learn).
- Ability to explore and extract data yourself, including writing SQL queries.
- Experience working with or around data lakes / warehouses (Google BigQuery is a plus).
- Comfort experimenting in Jupyter notebooks or similar environments, with the ability to progressively clean up and document your work.
- Willingness to collaborate with a developer who will handle production deployment of models.
- Environment: Experience with (or strong desire to master) Google Cloud Platform (GCP), specifically Vertex AI for model training, orchestration (Vertex Pipelines), and deployment.
- Deep Learning & Sequential Modeling: Foundational understanding of architectures suited for temporal data (e.g., LSTMs, GRUs, or Transformers) to model patient "trajectories" and "state transitions". Familiarity with LLM orchestration (e.g., prompt engineering, RAG, or fine-tuning via Vertex AI) is a significant advantage for our Adaptive Coaching Agent.
Domain & constraints (healthcare-aware)
- Comfort working exclusively with pseudonymized data, without direct access to personal identifiers.
- Sensitivity to privacy, ethics, and trust in a healthcare setting.
- Curiosity for behavioral science and how technical signals map to human behavior.
- Awareness of concepts such as reinforcement learning or adaptive systems is a plus, even if not applied immediately.
Collaboration & role evolution
- You will work closely with the Lead / Founding Engineer to define model logic, inputs, outputs, and limitations.
- The role is expected to evolve over time: early on, it may be more exploratory; later, more structured and product-integrated.
- The position can be part-time or fractional, as long as ownership, continuity, and clear documentation are ensured.
- We are open to shaping the role based on your strengths and interests, as long as they align with Gratiago’s objectives.
What we’re looking for (profile)
- A data scientist who enjoys applied problems and imperfect data.
- Someone comfortable with ambiguity and early-stage constraints.
- A profile that values learning, iteration, and explainability over theoretical complexity.
- Ability to communicate clearly with non-technical stakeholders.
Nice-to-have (pluses, not requirements)
- Experience in healthcare, life sciences, or regulated environments.
- Familiarity with synthetic data generation.
- Prior exposure to reinforcement learning or adaptive systems.
- Experience contributing to R&D or innovation-funded projects.
In short
We are not looking for a finished product, but for someone with solid foundations who wants to grow into a key R&D role and help shape how Gratiago learns from patients over time.