Predicting High Risk Breast Cancer - Phase 1 (2022)

Daily submission limit: 1

Predicting High Risk Breast Cancer: a Nightingale OS & AHLI data challenge



Every year, 40 million women get a mammogram; some go on to have an invasive biopsy to better examine a concerning area. Underneath these routine tests lies a deep—and disturbing—mystery. Since the 1990s, we have found far more ‘cancers’, which has in turn prompted vastly more surgical procedures and chemotherapy. But death rates from metastatic breast cancer have hardly changed.

When a pathologist looks at a biopsy slide, she is looking for known signs of cancer: tubules, cells with atypical looking nuclei, evidence of rapid cell division. These features, first identified in 1928, still underlie critical decisions today: which women must receive urgent treatment with surgery and chemotherapy? And which can be prescribed “watchful waiting”, sparing them invasive procedures for cancers that would not harm them?

There is already evidence that algorithms can predict which cancers will metastasize and harm patients on the basis of the biopsy image. Fascinatingly, these algorithms also home in on features that humans neglect, for example, the nature of the non-cancerous tissue surrounding the tumor. But to date, the datasets linking biopsy images to patient outcomes—metastasis, death—have been far smaller than what is needed to apply modern approaches.

This dataset contains images and outcomes for 69,606 biopsy slides corresponding to 4,335 biopsies performed between 2014 and 2020. Please refer to the full version of the dataset documentation as you get started to learn more about the cohort and key variables for this challenge, including mortality and cancer stage.


Providence St. Joseph, Nightingale OS, and The Association for Health Learning and Inference (AHLI) developed this challenge in order to catalyze the development of algorithms that find new signal in digital pathology images, ultimately providing new insights into which patients may be at risk and need preventive treatment.

The goal of this challenge is to predict the stage of a patient’s cancer, using only the slide images generated by a breast biopsy.

Cancer staging is a complex, multidisciplinary task: while it does take into account some features of the biopsy, it also integrates a wide variety of external information: the size of the lesion biopsied, its appearance and location on imaging, and a variety of other tests (imaging and more) to determine whether the cancer has spread to other locations in the body. This important contextual information, most of which is not present in the whole slide image, serves as our ground-truth label for the challenge. By linking features of the whole slide image to this label, algorithmic approaches have the potential to find new sources of signal—beyond the tubules, atypical nuclei, and cell division markers pathologists consider today—that can identify patients with benign or deadly cancers.

Building on successful work in this challenge, a particularly interesting next step is to identify predictable “outliers”: patients whose cancer is far more—or less—benign than it appears to the pathologists. Researchers at Providence, who have access to rich and granular data on pathologists’ judgments, are eager to collaborate on this exciting follow-on work.

Prizes and Compute Credits

Nightingale OS and AHLI have announced prizes up to $20,000 and free compute credits worth up to $50,000 for the two phases.

For phase 1, the winning team will be awarded a cash prize of $5,000 and the runner-up team a cash prize of $3,000; both awards are subject to the rules governing the prizes and the contest.

Compute credits will be allocated fairly among teams based on availability and are valid for use until the end of the contest.


This dataset contains whole slide images from 4,335 breast biopsies, in 3,425 patients, over the years 2014 to 2020. For our purposes, an observation in this dataset corresponds to a biopsy (i.e., performance in the hold-out set will be evaluated at the biopsy level).

Images: Each biopsy generates between one and one hundred physical slides (processed with hematoxylin and eosin stain). The slides have been digitized at 40x magnification with a Hamamatsu slide scanner, yielding a whole slide image. These images have a resolution of around 100,000 x 150,000 pixels, and each is stored as a single NDPI file (average size ~2GB). An NDPI file is a TIFF-like file, and libraries like openslide can be used to interact with them. The 4,335 biopsies in this dataset generate 69,606 whole slide images, with a median of 13 WSI per biopsy.
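For example, openslide can open an NDPI file and read small regions from a lower-resolution pyramid level rather than loading the full ~2GB image. A minimal sketch (the file path, patch size, and level-selection helper are illustrative, not part of the challenge tooling):

```python
def best_level(downsamples, target):
    """Pick the pyramid level whose downsample factor is closest to
    `target` without exceeding it (falls back to level 0)."""
    best = 0
    for i, ds in enumerate(downsamples):
        if ds <= target:
            best = i
    return best

def read_patch(ndpi_path, x, y, size=512, target_downsample=32):
    """Read a `size` x `size` RGB patch from a whole slide image.
    `x` and `y` are level-0 (40x) pixel coordinates."""
    import openslide  # imported lazily; requires the openslide library

    slide = openslide.OpenSlide(ndpi_path)  # NDPI is TIFF-like; openslide reads it
    level = best_level(slide.level_downsamples, target_downsample)
    # read_region takes level-0 coordinates but returns pixels at `level`
    patch = slide.read_region((x, y), level, (size, size)).convert("RGB")
    slide.close()
    return patch
```

Working at a downsampled level first is a common way to locate tissue before reading high-magnification patches.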

Labels: The primary label is the cancer stage associated with a biopsy. Table 1 shows that not all cancers are staged: only those with an initial diagnosis and first round of treatment at Providence will have a stage assigned (by convention, cases are staged and reported at the time of diagnosis, by the institution at which the diagnosis was made; 94% of staging judgments in this dataset are made within one month of biopsy).

Dataset splits: The dataset has been split randomly at the patient level, with 75% of the data made available for training. The remaining 25% holdout will be used for validation. Refer to Table 1 for a breakdown of what is made available.

Table 1

                         Train     Holdout
N biopsies               3258      1077
N patients               2569      856
N images                 52707     16899
N biopsies with stage    2722      886
  Stage 0                15%       redacted
  Stage I                52%       redacted
  Stage II               24%       redacted
  Stage III              6.4%      redacted
  Stage IV               2.2%      redacted
N unstaged biopsies      536       191

Model performance measurement: Model performance will be evaluated primarily on prediction of cancer stage among staged cases in the holdout set. More detailed information on the exact scoring methodology is below. Like the training dataset, the holdout set contains staged and unstaged biopsies (see Table 1), but only staged biopsies will be used to calculate the primary Challenge metric. We will award additional prizes for other aspects of performance, reflecting both the clinical utility of models and Nightingale Open Science’s commitment to equity.

You are free to use images from unstaged cases in any way you’d like in the training process. You are also free to use any other available information in the training process: detailed demographic information including age, sex, and self-reported race; and other information about the cancer and its progression beyond stage, including mortality and ICD codes for metastatic disease (though keep in mind the many caveats of this information, as noted for example here). However, note that none of this contextual information will be provided in the holdout set—only the slide images. Please refer to the full version of the dataset documentation as you get started to learn more about the cohort and key variables.

Important Challenge Dates

ATTENTION: Phase 1 ended on Jan 12, 2023. Phase 2 of the contest is here.


  1. Registration required. Only registered Nightingale OS users may participate in the contest.
    1. Registration is open to anyone worldwide who has an active affiliation with an accredited academic institution.
    2. Your registration must be approved by Nightingale before you can access contest data or any other Nightingale OS resources.
    3. You can only sign up for one account.
    4. You must abide by all Nightingale Terms of Service. In particular, no portion of a Nightingale dataset may be downloaded.
  2. Teams and collaboration
    1. Size limit. Teams can be any size. As a general rule, contests often limit team sizes to about 5. Because this is a multi-disciplinary research area, Nightingale recognizes that teams may be larger from time to time.
    2. No merges. You can only be a member of one team during a given phase.
      1. If you entered the competition as an individual, you may join someone else’s project only if you have not submitted any entries for scoring during the contest’s current phase.
    3. No sharing. Sharing code between teams is prohibited unless the information shared is free and publicly available.
    4. Public discussion. You may not publicly describe the methodology you are using for the competition until after the submission deadline for a given phase. You may describe your methodology in the context of other datasets as long as you don’t also indicate this is the methodology you used in competition.
    5. Publication. Please use the recommended citation found on the dataset documentation page. Although the dataset is available to all Nightingale users, please do not submit contest-related work to other conferences or journals until the current contest has ended.
  3. Scoring
    1. Predictions CSV. Teams submit entries for scoring in the form of a predictions file. Nightingale will score the entry according to the methodology described in the Scoring methodology section.
    2. Entry limit. Each team can submit 1 entry per day.
    3. Public leaderboard. During the competition period, each team’s current ranking will be visible on the competition public leaderboard. After the end date, the leaderboard will reveal each team’s score and model description.
    4. High score. The best score from all your team’s submissions will be used in the leaderboard, along with your description of that entry, if any. See Scoring for details.
    5. Tiebreaker. In the event of a tie, the first submission will outrank subsequent submissions.
    6. Submission requirements. Nightingale may make efforts to help you validate your predictions file and fix any issues before scoring. However, invalid submissions may result in no score and may count against your team’s entry limit.
  4. Resources
    1. Use of non-public data or software. Use of free and publicly available external data is allowed, including pre-trained models. If you use any proprietary software or data that is not free and publicly available, such as a model pre-trained on private data, then you should declare this by including “[NON-PUBLIC]” in your Nightingale project description field. Teams using non-public resources will still have their rankings and scores published on the leaderboard and are encouraged to contribute to the conferences, but will not be eligible for any prizes.
    2. Billing. Nightingale OS provides both free and non-free compute resources to contest participants. To support participants with the cost of non-free compute servers, Nightingale allocates a limited amount of free compute credits to the teams on sign up. Additional free compute credits will be allocated periodically based on availability of credits and each team’s activity. Teams are responsible for all additional costs incurred for non-free resources after they consume the free compute credits.
  5. Prizes
    1. Review of entries and Deadlines. All top scoring entries are subject to Nightingale’s review for compliance with guidelines and quality of results. Because the goal of this contest is to promote research and to stimulate interest in these kinds of applications, we could extend the deadline in order to get broader participation and high quality submissions. We will let everyone know in advance if we do change any key dates.
    2. Licensing of work. To be eligible for prizes, the teams must submit the winning solution as open source under the Apache License v2 or a compatible OSI-approved open source license. Third-party resources, if used, must be freely available to the public, and must not form a significant part of the solution as determined by the judges. This clause is intended to promote the repeatability and reproducibility of science. If a team is unable to comply with this clause, they may still compete and will be placed on the leaderboard, but they will not be eligible for prizes.
    3. Export control. Prize money is subject to US export control regulations. Teams in countries prohibited by US export control regulations will not be eligible for prizes.


Scoring methodology

Cancer stage takes on discrete values (0, I, II, III, IV), and clinically, some errors are more costly than others: in broad strokes, metastatic disease (stage IV) is managed very differently from locoregional disease, and carries a much higher mortality rate. Because squared error penalizes larger mistakes more heavily, we felt that mean squared error was a good approximation of this clinical loss function.

$$ MSE = \frac{1}{n}\sum_{i=1}^{n}\left(\text{predicted stage}_i - \text{actual stage}_i\right)^2 $$

We will accept continuous predictions from 0 to 4: we want to reward getting as close to the recorded stage as possible, rather than asking participants to convert predictions to whole numbers.
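Concretely, with stages 0 through IV mapped to the integers 0 through 4, the metric can be computed as follows (a minimal sketch of the formula above, not the official scoring code):

```python
def challenge_mse(predicted, actual):
    """Mean squared error over staged biopsies.
    `predicted` holds continuous values in [0, 4]; `actual` holds the
    recorded stages mapped to integers (0, I, II, III, IV -> 0..4)."""
    assert len(predicted) == len(actual) and len(actual) > 0
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
```

For instance, predicting 1.5 for a stage II (2) case contributes 0.25 to the sum, whereas rounding that prediction down to 1 would contribute 1.0, which is why continuous predictions are worth submitting.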

Nightingale randomly samples a set of patients to create a hold-out set, on which entries will be scored. You can see the images in the dataset directory in the holdout subdirectory, but of course, stage and mortality fields are excluded. We have included both the staged and unstaged biopsies for the holdout. A table is included in the holdout that indicates which biopsies have been staged. When submitting the prediction file, submit predictions for all staged cases and any unstaged cases.


To submit an entry, predict the stage for each biopsy and write results to a CSV file.

Predictions CSV file format

Although teams are asked to use only slides to train their algorithms, predictions for this contest are made at the biopsy level. Teams will need to incorporate multiple slides into their algorithms or aggregate slide-level predictions. Also be aware that biopsy_id, slide_id, and patient_ngsci_id are all hexadecimal strings of the same length, so take care not to confuse them.
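One simple way to bridge the two levels is to pool per-slide scores into a single biopsy-level prediction. Mean and max pooling below are illustrative baselines, not the contest's prescribed method:

```python
from statistics import fmean

def biopsy_prediction(slide_scores, how="mean"):
    """Aggregate per-slide stage scores into one biopsy-level prediction.
    Mean pooling treats all slides as equally informative; max pooling
    assumes the most suspicious slide should drive the predicted stage."""
    if how == "max":
        return max(slide_scores)
    return fmean(slide_scores)
```

More sophisticated approaches (e.g. attention-based multiple-instance learning) learn the aggregation jointly with the slide-level model.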

Example CSV
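The example file itself is not reproduced here. A plausible predictions file, assuming a `biopsy_id` column and a continuous `stage` column (check the sample CSV in the holdout directory for the authoritative schema), could be written like this:

```python
import csv

# Illustrative rows only: the biopsy_id values below are made-up placeholders
# (real ids are longer hexadecimal strings), and the column names are
# assumptions to be checked against the provided sample CSV.
rows = [
    ("0a1b2c3d", 1.7),
    ("4e5f6a7b", 0.4),
]
with open("predictions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["biopsy_id", "stage"])
    writer.writerows(rows)
```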


Submission description (recommended)

How to submit an entry

Submit your predictions file inside any instance in your Nightingale project:

>>> import ngsci
>>> ngsci.submit_contest_entry("path/to/your.csv", description="our model")
 (<Result.SUCCESS: 1>, 'success')

After you submit a file, Nightingale will attempt to validate the schema of your CSV. If it fails validation, you will see one or more errors in the Submissions tab in your Nightingale OS project (not in your JupyterLab editor). Any validation errors detected automatically will not count against your submission limit.

After a predictions file passes validation, it will be submitted for scoring. The result will be posted in your project Submissions view, and if the result is a new high score for your team, the public leaderboard will also be updated.

Sample submission

A sample CSV and notebook are located in ~/datasets/brca-psj-path/holdout/sample-submission. The sample CSV is in the proper format and contains all the biopsy_ids needed for a submission. The notebook describes the holdout tables and the process of submitting an entry.

How to form a team

After you have registered for Nightingale and a Nightingale admin has approved your account, you can enter the currently active contest. For example: Predicting High Risk Breast Cancer 2022.

Collaboration in Nightingale OS happens inside projects. In the case of a contest, your project is your team. After you create your team, you can add teammates by adding them as members of your project.

Your team name and description will appear on the public leaderboard.

For any issues creating teams, please contact

Host organizations

This is a challenge jointly hosted by Nightingale Open Science; The Association for Health Learning and Inference; and Providence St. Joseph Health.

Nightingale Open Science is a platform connecting researchers with deidentified, cutting edge medical datasets. The Nightingale OS team works closely with health systems around the world to create and curate datasets of medical images linked to ground-truth labels, and make them freely available to academic researchers. Nightingale OS launched at the 2021 NeurIPS conference with five anchor datasets spanning different disease areas.

The Association for Health Learning and Inference (AHLI) is a not-for-profit organization dedicated to building a transdisciplinary machine learning and health community. AHLI works with its partners to advance health data quality and access, knowledge discovery, and meaningful use of complex health data. AHLI was founded in September 2021 with generous support from Schmidt Futures.

Providence St. Joseph Health is a not-for-profit health care system that operates in seven states and serves as the parent organization for 100,000 caregivers. The combined system includes 51 hospitals, 829 clinics, and other health, education and social services across Washington, Oregon, California, Alaska, Montana, New Mexico, and Texas.

Together, our three teams are thrilled to collaborate on this contest to spur collaboration and competition in the field of computational medicine.


We thank our generous funders: Schmidt Futures, The Gordon and Betty Moore Foundation, and Ken Griffin. We would also like to acknowledge the team that created and conceived of this dataset, and worked with us to make this challenge possible: Carlo Bifulco, MD, Director of Molecular Pathology and Pathology Informatics; Brian Piening, PhD, Technical Director of Clinical Genomics; Tucker Bower, Bioinformatics Scientist. Many thanks also to the leadership of Ari Robicsek, Chief Medical Analytics Officer at Providence, Bill Wright, VP of Health Innovation at Providence, and Raina Tamakawna, Enterprise and GME Research Program Manager at Providence.

We also thank Hamamatsu, developers of the NanoZoomer 360 platform. Hamamatsu supported this work with a grant from their Product Marketing Division and partnered with Providence to ensure a seamless start to the dataset creation process.
