Exporting Flow Cytometry Statistics for Papers: MFI, Percent Positive, and Gate Labels
Reviewer 2 asks for the median fluorescence intensity values behind Figure 3, and you realize the CSV you exported six months ago labeled the column “MFI” without specifying whether it was the arithmetic mean, geometric mean, or median. Re-running the analysis is possible, but the gating template lives on a colleague’s laptop that left with them when they moved labs. This guide covers how to export flow cytometry statistics for publication in a way that survives peer review, software migrations, and the inevitable methods-section follow-up — with specific attention to the labeling ambiguities that cause the most reviewer back-and-forth.
What reviewers actually look for in a flow cytometry statistics export
A reproducible statistics export carries five things, in this order of importance:
- An unambiguous statistic name — “median FI”, “geometric mean FI”, or “arithmetic mean FI”, never the bare term “MFI”
- The full gate path for each population (e.g., Singlets/Live/CD3+/CD4+), not just the leaf name
- The parent gate for every percentage, so a reviewer can reconstruct what “% positive” is a percentage of
- The event count in each gate, both raw and as % parent — rare populations under 100 events are unreliable and reviewers will flag them
- The transform used for axis scaling (biexponential parameters, arcsinh cofactor, or log) for any fluorescence statistic, because median and geometric mean both depend on it
The fifth item is the one practitioners forget most often. A median reported in log-scaled channel units is not the same number as a median reported in biexponential channel units: the underlying events are identical, but the value the software displays changes with the scaling. The geometric mean is affected even more directly, because negative compensated values cannot be logged and each transform handles them differently. If you compare medians across samples analyzed under different transforms, the difference may be in the transform rather than the biology.
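A quick way to see the effect is to compute the same summaries on a small set of made-up compensated intensities. The Python sketch below is illustrative only. It uses an arcsinh transform as a stand-in for biexponential scaling, because the two behave similarly near zero but are not the same function. It also drops negative events before taking logs, which is one common convention among several.

```python
import math
import statistics

# Made-up compensated intensities for one gated population, in linear units.
# Compensation routinely leaves dim or negative events below zero.
EVENTS = [-120.0, -35.0, 10.0, 80.0, 150.0, 400.0, 900.0, 2500.0, 6000.0]


def geo_mean_log(values):
    """Geometric mean via natural log; non-positive events are dropped first."""
    positive = [v for v in values if v > 0]
    return math.exp(statistics.fmean(math.log(v) for v in positive))


def geo_mean_arcsinh(values, cofactor):
    """Average in arcsinh(x / cofactor) space, then map back to linear units."""
    mean_t = statistics.fmean(math.asinh(v / cofactor) for v in values)
    return cofactor * math.sinh(mean_t)


def median_arcsinh_units(values, cofactor):
    """Median reported in transformed (display) units instead of linear units."""
    return statistics.median(math.asinh(v / cofactor) for v in values)


print("median, linear units:", statistics.median(EVENTS))
print("median, arcsinh units (cofactor 150):", median_arcsinh_units(EVENTS, 150))
print("geometric mean, log, negatives dropped:", geo_mean_log(EVENTS))
print("geometric mean, arcsinh cofactor 150:", geo_mean_arcsinh(EVENTS, 150))
print("geometric mean, arcsinh cofactor 5:", geo_mean_arcsinh(EVENTS, 5))
```

With these numbers the linear median is 150 and maps back to exactly 150 from arcsinh units. The median moves only because the unit it is reported in changes. The three geometric means, by contrast, differ substantially, because that statistic depends on the transform itself and on how negative events are handled. This is why both the transform and its parameters belong in methods.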
Step 1 — Pick the right statistic for what you are claiming
The choice of central-tendency statistic should match the claim, not the default the software happens to offer.
- Median fluorescence intensity: use when comparing populations across samples. It is the field default for comparing marker expression because it tolerates skewed distributions and outliers from doublets or aggregates. The International Society for Advancement of Cytometry (ISAC) recommends it as the primary reporting statistic for fluorescence intensity.
- Geometric mean fluorescence intensity: use when the data are approximately log-normally distributed and you need a single summary that respects the log scale. Common in clinical flow cytometry (CD4 monitoring) and in cell-cycle DNA content where the distribution is log-skewed.
- Arithmetic mean: rarely the right choice for compensated fluorescence data. Compensation leaves negative values in the data, and a long bright tail pulls the mean away from the bulk of the population. Acceptable for FSC/SSC scatter on linear scales.
- CV (coefficient of variation): use to report population spread, not expression level. The G0/G1 peak CV in cell cycle analysis is reported with the median peak position as a quality metric, where CV >6% suggests biological heterogeneity or technique issues.
State the statistic name in the column header, in the figure legend, and in the methods section. Three places. The CSV column header MFI is not enough — write Median_FI_CD3-FITC instead.
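One way to make the header convention hard to get wrong is to generate the names alongside the values. This is a minimal Python sketch with a hypothetical labeled_stats helper. It takes the geometric mean over positive events only and reports CV as sample standard deviation over the arithmetic mean. Your analysis software may use a robust CV or a different rule for negative values, so match whatever it actually does.

```python
import math
import statistics


def labeled_stats(values, marker):
    """Hypothetical helper: summary statistics keyed by explicit statistic names.

    `marker` is the parameter label used in headers, e.g. "CD3-FITC".
    """
    positive = [v for v in values if v > 0]  # log is undefined for v <= 0
    mean = statistics.fmean(values)
    return {
        f"Median_FI_{marker}": statistics.median(values),
        f"GeoMean_FI_{marker}": math.exp(statistics.fmean(math.log(v) for v in positive)),
        f"Mean_FI_{marker}": mean,
        # CV as sample standard deviation over the arithmetic mean, in percent.
        f"CV_{marker}": 100 * statistics.stdev(values) / mean,
    }


stats = labeled_stats([120.0, 150.0, 180.0, 900.0], "CD3-FITC")
for name, value in stats.items():
    print(f"{name}: {value:.1f}")
```

Here the key Median_FI_CD3-FITC holds 165.0, so the statistic name travels with the number into the CSV rather than living only in someone's memory.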
Step 2 — Export the full gate path, not just the leaf name
A statistics row labeled “CD4+” tells the reader nothing about what was upstream. A row labeled Singlets/Live/CD3+/CD4+ tells them the full gating story: singlet discrimination first, viability gating second, lineage gating third, sub-population fourth. The full path lets a reviewer (or a future analyst, including yourself) confirm that doublets were excluded, dead cells were removed, and the CD4+ population is conditional on CD3+, not on total events.
Most modern analysis software exports gate paths by default, but confirm this is on before exporting — a “leaf name only” toggle is the failure mode to watch for. If you see ambiguous column headers like just “CD8+” in your export, your gate path is being truncated. The fix is usually a single checkbox in the export dialog labeled “Include parent gates” or “Full hierarchy path”.
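If an export arrives with leaf names only, the full path can be rebuilt from the gating hierarchy, provided you still have it. Below is a minimal Python sketch. It uses a hypothetical child-to-parent mapping and assumes that no gate name itself contains a "/".

```python
# Hypothetical gating hierarchy: each gate mapped to its parent (None = root).
PARENTS = {
    "Singlets": None,
    "Live": "Singlets",
    "CD3+": "Live",
    "CD4+": "CD3+",
    "CD8+": "CD3+",
}


def full_path(gate, parents=PARENTS):
    """Walk up the hierarchy and join gate names root-first with '/'."""
    path, seen = [], set()
    while gate is not None:
        if gate in seen:
            raise ValueError(f"cycle in gating hierarchy at {gate!r}")
        seen.add(gate)
        path.append(gate)
        gate = parents[gate]
    return "/".join(reversed(path))


def truncated_headers(headers):
    """Return headers that look like bare leaf names (no '/' separator)."""
    return [h for h in headers if "/" not in h]


print(full_path("CD4+"))
print(truncated_headers(["Singlets/Live/CD3+/CD8+", "CD8+"]))
```

full_path("CD4+") returns Singlets/Live/CD3+/CD4+. truncated_headers flags any column that is still a bare leaf name.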
Step 3 — Include both % parent and % total, plus absolute counts
For every gated population, export three numbers:
- % of parent — the primary biological statistic. “12% of live CD3+ cells are CD8+” is a population-aware claim.
- % of total events — secondary, but useful for cross-experiment normalization where parent gates differ slightly.
- Event count (raw number) — required to assess statistical confidence. A population reported at 0.3% means very different things at 10 events vs 10,000 events. Reviewers in immunology and rare-cell work will ask.
The Poisson rule of thumb: the relative standard error on an event count n is approximately 1/√n. At 100 events the relative error is 10%; at 1000 events it is about 3%. If your paper compares rare populations across conditions, aim for at least 100 events per population in every sample, and call out which samples do not meet that threshold. We covered the event-count math for rare cell detection in detail in a separate post.
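The three numbers and the Poisson estimate can be computed together for each row. The sketch below uses hypothetical column names (Pct_Parent, Pct_Total, Event_Count, Rel_SE_Pct, Low_Confidence). It applies the 1/√n approximation from the text, inverted to give the event count needed for a target relative error.

```python
import math


def population_row(path, count, parent_count, total_events, min_events=100):
    """One export row: % parent, % total, raw count and a Poisson error estimate."""
    return {
        "Gate_Path": path,
        "Event_Count": count,
        "Pct_Parent": 100 * count / parent_count,
        "Pct_Total": 100 * count / total_events,
        # Relative standard error of a Poisson count n is about 1/sqrt(n).
        "Rel_SE_Pct": 100 / math.sqrt(count) if count > 0 else math.inf,
        "Low_Confidence": count < min_events,
    }


def events_needed(target_rel_se):
    """Smallest n with 1/sqrt(n) <= target_rel_se (0.10 -> 100 events)."""
    # The small tolerance keeps floating-point error from pushing 100.0000001 up to 101.
    return math.ceil(1 / target_rel_se**2 - 1e-9)


row = population_row("Singlets/Live/CD3+/CD8+", count=64, parent_count=2000, total_events=50000)
print(row)
print(events_needed(0.10), events_needed(0.05))
```

For example, 64 CD8+ events out of 2,000 parent events gives 3.2% of parent with a relative standard error of 12.5%, and the row is flagged as low confidence. events_needed(0.10) returns 100, matching the rule of thumb above.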
Step 4 — Export the gating template alongside the statistics
The CSV is the result. The gating template is the method. Without the template, the statistics cannot be reproduced — not by reviewers, not by collaborators, not by you in two years when a follow-up paper asks for the same analysis on new data.
Three formats are worth knowing about:
- FlowJo workspace files (.wsp): portable across FlowJo installations, but tied to FlowJo’s proprietary biexponential transform and not reliably readable by other software. The right choice if your lab and collaborators all use FlowJo.
- Gating-ML 2.0 (.xml): the ISAC-standardized XML format for gate definitions. Open-format, portable across compliant software, and the right choice for supplementary materials and long-term reproducibility.
- R/Bioconductor flowWorkspace: for computational labs, exporting the gating tree as a flowWorkspace GatingSet allows the entire pipeline to be re-run from scripts.
For peer review, Gating-ML 2.0 attached as supplementary material is the best practice. It is plain-text XML that lives alongside the paper for the journal’s archival lifetime, and it does not lock the reader into specific software.
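To show what "plain-text XML" means in practice, here is a minimal Python sketch that writes two one-dimensional rectangle gates using only the standard library. It relies on several names taken from the Gating-ML 2.0 specification: the namespace URIs and the attributes gating:id, gating:parent_id, gating:min, gating:max, and data-type:fcs-dimension. Check these against the official Gating-ML 2.0 schema before depositing anything like this. The channel names are hypothetical. The sketch also omits transforms and the 2-D polygon gate a real singlet gate would need.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as given in the Gating-ML 2.0 specification.
GATING = "http://www.isac-net.org/std/Gating-ML/v2.0/gating"
DATATYPE = "http://www.isac-net.org/std/Gating-ML/v2.0/datatypes"
ET.register_namespace("gating", GATING)
ET.register_namespace("data-type", DATATYPE)


def rectangle_gate(root, gate_id, channel, minimum=None, maximum=None, parent=None):
    """Append a one-dimensional RectangleGate; an open bound is simply omitted."""
    gate = ET.SubElement(root, f"{{{GATING}}}RectangleGate", {f"{{{GATING}}}id": gate_id})
    if parent is not None:
        # parent_id makes the gate conditional on its parent population.
        gate.set(f"{{{GATING}}}parent_id", parent)
    dim = ET.SubElement(gate, f"{{{GATING}}}dimension")
    if minimum is not None:
        dim.set(f"{{{GATING}}}min", str(minimum))
    if maximum is not None:
        dim.set(f"{{{GATING}}}max", str(maximum))
    ET.SubElement(dim, f"{{{DATATYPE}}}fcs-dimension", {f"{{{DATATYPE}}}name": channel})
    return gate


root = ET.Element(f"{{{GATING}}}Gating-ML")
rectangle_gate(root, "Live", "Viability-A", maximum=1000)
rectangle_gate(root, "CD3pos", "CD3-A", minimum=2000, parent="Live")
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The parent_id attribute on the CD3pos gate carries the same conditional-population information as the gate path in the statistics export. Gate IDs avoid "+" here because XML identifiers are restricted to name characters.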
Step 5 — Export to GraphPad Prism the right way
GraphPad Prism (.pzfx) is one of the most widely used graphing and statistics tools in biology, and most flow cytometry data ends up there for final figure generation. Three Prism table types matter here:
- Column tables: one column per condition, one row per replicate. Use for paired comparisons (treated vs untreated within donor), with each donor on its own row.
- Grouped tables: rows = levels of one factor (e.g., conditions), column groups = levels of a second factor (e.g., treatments), with replicates in subcolumns. Use for two-way ANOVA designs.
- XY tables: the X column holds time points or doses, and the Y columns hold samples or conditions. Use for time-courses and dose-response curves.
The common mistake is exporting Prism-ready data with biological replicates already averaged. Prism expects raw per-replicate values, not pre-computed means. If you average triplicates in the export step, Prism can no longer compute SEM, run multiple-comparison tests, or overlay individual data points on bar charts. Export the raw per-sample values and let Prism do the math.
If your software offers direct Prism export (.pzfx file), prefer that over CSV-then-paste — it preserves data types and avoids the silent comma-as-decimal issue that hits when European-locale Excel touches the file in between.
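Keeping replicates unaveraged is a matter of pivoting, not summarizing. The Python sketch below turns a hypothetical long-format export, with one row per donor and condition, into a column-table layout. The result has one column per condition and one row per donor, so a paired test can match donors. The Donor column is meant to become Prism's row titles. Python's csv module writes floats with "." as the decimal separator regardless of system locale.

```python
import csv
import io

# Hypothetical long-format export: one row per donor x condition, nothing averaged.
ROWS = [
    {"donor": "D1", "condition": "Untreated", "Median_FI_CD69-PE": 410.0},
    {"donor": "D1", "condition": "Treated", "Median_FI_CD69-PE": 1525.0},
    {"donor": "D2", "condition": "Untreated", "Median_FI_CD69-PE": 388.0},
    {"donor": "D2", "condition": "Treated", "Median_FI_CD69-PE": 1290.0},
    {"donor": "D3", "condition": "Untreated", "Median_FI_CD69-PE": 452.0},
    {"donor": "D3", "condition": "Treated", "Median_FI_CD69-PE": 1710.0},
]


def column_table(rows, value_key, conditions):
    """Pivot to a column-table layout: one column per condition, one row per donor."""
    by_donor = {}
    for row in rows:
        by_donor.setdefault(row["donor"], {})[row["condition"]] = row[value_key]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Donor", *conditions])
    for donor, values in by_donor.items():
        # Missing values stay blank rather than being filled in or averaged.
        writer.writerow([donor, *(values.get(c, "") for c in conditions)])
    return out.getvalue()


table = column_table(ROWS, "Median_FI_CD69-PE", ["Untreated", "Treated"])
print(table)
```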
Step 6 — Document AI provenance and unmixing controls in methods
If your gating used AI-suggested boundaries, AI-detected populations, or automated clustering (FlowSOM, PhenoGraph, or flowAI for QC), the methods section needs to say so. Specifically:
- The algorithm name and version — “FlowSOM v2.4 with 10x10 SOM grid” not “automated clustering”
- What the AI input was — “applied to all events in the CD45+ gate” not “applied to the data”
- What the practitioner did with the output — “cluster phenotypes were verified by back-gating onto FSC/SSC and CD-marker plots before inclusion in analysis”
- For per-user trained AI gating, the number of training gates and the panel type
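A structured record makes it harder to drop one of these items. This hypothetical Python dataclass covers the first three items and refuses to render a methods sentence until every field is filled. The example values are the illustrative ones from the list above, not a recommended configuration.

```python
from dataclasses import dataclass


@dataclass
class AlgorithmProvenance:
    """Hypothetical record of what the methods section states for one automated step."""
    name: str              # algorithm name, e.g. "FlowSOM"
    version: str           # exact version used
    parameters: str        # key settings, e.g. grid size
    input_population: str  # full gate path the algorithm was applied to
    verification: str      # what the analyst did with the output

    def missing_fields(self):
        """Names of fields that are still empty."""
        return [k for k, v in vars(self).items() if not v.strip()]

    def methods_sentence(self):
        missing = self.missing_fields()
        if missing:
            raise ValueError(f"incomplete provenance: {', '.join(missing)}")
        return (f"{self.name} v{self.version} ({self.parameters}) was applied to "
                f"all events in the {self.input_population} gate; {self.verification}.")


step = AlgorithmProvenance(
    "FlowSOM", "2.4", "10x10 SOM grid", "Singlets/Live/CD45+",
    "cluster phenotypes were verified by back-gating onto FSC/SSC and "
    "CD-marker plots before inclusion in analysis",
)
print(step.methods_sentence())
```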
For spectral data, methods must also include the autofluorescence reference source, the residuals interpretation step, and any unmixing-QC flags that were overridden. The spectral vs conventional comparison covers why unmixing residuals matter and what an acceptable threshold looks like.
Step 7 — Export MIFlowCyt metadata if the journal or funder requires it
MIFlowCyt is the ISAC-standardized minimum information standard for flow cytometry experiments. The NIH Data Management and Sharing Policy (effective January 2023) asks researchers to describe the data and metadata standards they will apply when sharing data, and MIFlowCyt is the community standard for flow cytometry. Several journals (Cytometry Part A in particular) require MIFlowCyt compliance for accepted manuscripts.
The standard covers four domains: experiment overview, sample, instrumentation, and data analysis. Most modern analysis software pre-fills these from FCS file headers (instrument, fluorochrome panel) and the gating hierarchy (analysis). The fields that practitioners must complete manually are usually the sample-prep section — donor demographics, treatment conditions, fixation method. Plan for 15–30 minutes of metadata authoring per experiment.
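Pre-filling from FCS headers and then listing what is still blank is easy to script. The sketch below reads four standard FCS keywords from a dictionary, in the form an FCS parser would return them: $CYT (cytometer), $PAR (parameter count), $PnN (parameter name), and $PnS (stain name). The grouping into four domains follows the standard as described above. Every field name is a hypothetical placeholder, not one of MIFlowCyt's official element names.

```python
# Hypothetical TEXT-segment keywords as an FCS parser might return them.
FCS_KEYWORDS = {
    "$CYT": "ExampleCytometer",
    "$PAR": "3",
    "$P1N": "FSC-A", "$P1S": "",
    "$P2N": "FL1-A", "$P2S": "CD3 FITC",
    "$P3N": "FL2-A", "$P3S": "CD4 PE",
}


def prefill_metadata(keywords, gating_template=""):
    """Group metadata into four domains; field names are placeholders."""
    panel = {}
    for i in range(1, int(keywords["$PAR"]) + 1):
        stain = keywords.get(f"$P{i}S", "").strip()
        if stain:  # scatter channels usually have an empty or missing $PnS
            panel[keywords[f"$P{i}N"]] = stain
    return {
        "experiment_overview": {"purpose": "", "design": ""},
        "sample": {"source": "", "treatment": "", "fixation": ""},
        "instrumentation": {"cytometer": keywords.get("$CYT", ""), "panel": panel},
        "data_analysis": {"gating_template": gating_template, "transform": ""},
    }


def missing_fields(metadata):
    """List 'domain.field' entries that still need manual authoring."""
    return [
        f"{domain}.{field}"
        for domain, fields in metadata.items()
        for field, value in fields.items()
        if value in ("", {}, None)
    ]


metadata = prefill_metadata(FCS_KEYWORDS, gating_template="panel_gates.xml")
print(metadata["instrumentation"])
print("still to fill in:", missing_fields(metadata))
```

As expected, the remaining gaps are mostly in the sample-prep section.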
Export MIFlowCyt as XML for repository deposit, or as JSON for human-readable supplementary materials. Both formats are interchangeable; the XML form is the ISAC archival format.
Common mistakes that post-submission revisions catch
- “MFI” used without specifying which statistic. Reviewer 2 will ask. State it as Median_FI or GeoMean_FI in column headers, figure legends, and methods.
- Mixing biexponential and log scales across samples. If one panel of the same paper uses biexponential and another uses log for the same fluorochrome, the medians are not comparable. Pick one transform, apply it across all samples in a comparison, and state it in methods.
- Percentages without parent gates. “8% of cells are CD8+” is meaningless without “8% of which cells”. Always export the parent gate name with every percentage.
- Single-replicate “n=1” figures. A bar chart with no error bars is a screenshot, not data. Aim for biological triplicates (separate donors or animals), not technical triplicates (the same sample run three times).
- Missing compensation status. State in methods whether files were compensated, who built the matrix, what controls were used, and whether the matrix was validated by the bivariate-plot check. A reviewer who knows flow will look for this.
A reproducibility checklist before you submit
Before exporting your final statistics for the manuscript:
- Every column header names the statistic explicitly (Median_FI, GeoMean_FI, Pct_Parent, Event_Count, CV)
- Every population row carries the full gate path
- The transform used for any fluorescence statistic is documented (biexponential parameters, arcsinh cofactor, or log) in methods
- Event counts under 100 are flagged as low-confidence in figure legends
- Compensation status is stated, and the spillover matrix is included as supplementary data
- The gating template is exported as Gating-ML 2.0 (or .wsp with a note about FlowJo dependency) and attached as supplementary material
- If the journal requires MIFlowCyt, the metadata document is complete and attached
- Any AI-assisted gating, clustering, or QC step is named with algorithm and version in methods
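Several of these items can be checked mechanically before submission. Below is a minimal Python sketch. It expects the export to be loaded already as a list of row dictionaries with hypothetical Gate_Path and Event_Count columns, plus one recorded transform description per sample.

```python
import re

# Matches "MFI" as a whole header token, e.g. "MFI" or "MFI_CD8", in any case.
BARE_MFI = re.compile(r"(^|_)MFI($|_)", re.IGNORECASE)


def lint_export(rows, transforms, min_events=100):
    """Return human-readable problems found in a statistics export.

    rows: list of dicts with at least "Gate_Path" and "Event_Count".
    transforms: mapping of sample name -> transform description.
    """
    problems = []
    headers = rows[0].keys() if rows else []
    for h in headers:
        if BARE_MFI.search(h):
            problems.append(f"ambiguous statistic name in header {h!r}")
    for row in rows:
        path = row["Gate_Path"]
        if "/" not in path:
            problems.append(f"gate {path!r} has no parent path")
        if int(row["Event_Count"]) < min_events:
            problems.append(f"{path}: only {row['Event_Count']} events")
    distinct = sorted(set(transforms.values()))
    if len(distinct) > 1:
        problems.append(f"mixed transforms across samples: {distinct}")
    return problems


rows = [
    {"Gate_Path": "Singlets/Live/CD3+/CD8+", "MFI_CD8": 1200.0, "Event_Count": 5400},
    {"Gate_Path": "CD8+", "MFI_CD8": 950.0, "Event_Count": 64},
]
for problem in lint_export(rows, {"S1": "arcsinh cofactor 150", "S2": "log"}):
    print(problem)
```

It does not replace reading the methods section. It does catch the bare-MFI header, truncated paths, low counts, and mixed transforms that most often come back from review.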
If your analysis software is the bottleneck on any of these — particularly the AI provenance trail, Gating-ML export, or MIFlowCyt metadata pre-fill — consider whether your tool stack can produce them. We compared FlowJo, Kaluza, Cytobank, and modern alternatives on these and other reproducibility features in a separate software comparison post.
For panel design before the experiment runs — the upstream side of the reproducibility problem — the fluorophore spectrum viewer visualizes overlap between candidate fluorochromes so spillover and unmixing complexity is known before acquisition, not discovered at review.
Try Cytomaton
AI-assisted flow cytometry analysis that learns your gating style. Free during beta.