Artificial Intelligence in Flow Cytometry Diagnosis

Introduction/Objective: Flow cytometry plays a crucial role in identifying and characterizing cells in a wide range of diseases. However, interpreting its complex data is challenging and depends heavily on the expertise of the individual analyst. We therefore propose applying machine learning models, including deep learning, to improve diagnostic efficiency and accuracy in identifying cell populations and subtypes of onco-hematological diseases.

Methodology: To build our pilot model, we used data from 36 cases of acute leukemia; we are currently expanding the sample size, as data-driven modeling requires. The data were collected with 10-color screening panels that follow international guidelines and acquired on a 13-color cytometer. During pre-processing, we characterized the variation in marker fluorescence by cluster, analogous to the gating strategy used to delimit populations, and generated graphical representations of the mean fluorescence intensity, standard deviation, and coefficient of variation of each marker per cluster across our cases. The approach therefore has two stages: one that groups the cells, and one that correlates the immunophenotypic characterization of these groups with a diagnosis. We then implemented a random forest to learn complex patterns in the cellular data, accompanied by didactic visual resources that guide the cytometrist's reasoning, so that the model suggests probabilities for the B-ALL and AML diagnoses, the two diagnoses represented among our 36 cases. This prototype reflects the current composition of our dataset and is intended for continued development toward greater optimization and applicability. It is important to highlight that this is a pilot still in development, aimed at reaching agreement on data grouping and at further refining the machine learning model. In addition, we built a web app with the Shiny interactive framework in R to illustrate the approach: a .fcs file acquired according to our methodology can be loaded to demonstrate an example of possible developments.
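The per-cluster summary described above (mean fluorescence intensity, standard deviation, and coefficient of variation of each marker within each cluster) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes cells have already been assigned to clusters and that compensated, transformed fluorescence values are available per marker. The function names `cluster_marker_summary` and `feature_vector`, the cluster labels, and the toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def cluster_marker_summary(events, labels, markers):
    """Return {cluster: {marker: (mean, sd, cv)}} for each cluster of cells.

    events  -- one dict per cell mapping marker name to fluorescence intensity
    labels  -- the cluster assigned to each cell, in the same order as events
    markers -- marker names to summarize
    """
    # Group cells by cluster label, analogous to placing them in a gate.
    grouped = defaultdict(list)
    for event, label in zip(events, labels):
        grouped[label].append(event)

    summary = {}
    for label, cells in grouped.items():
        per_marker = {}
        for marker in markers:
            values = [cell[marker] for cell in cells]
            mfi = mean(values)
            # The sample standard deviation needs at least two cells.
            sd = stdev(values) if len(values) > 1 else 0.0
            # CV = SD / mean; set to 0.0 for a zero mean to keep features numeric
            # (an assumption of this sketch).
            cv = sd / mfi if mfi != 0 else 0.0
            per_marker[marker] = (mfi, sd, cv)
        summary[label] = per_marker
    return summary


def feature_vector(summary, clusters, markers):
    """Flatten a summary into a fixed-order list; absent clusters become zeros."""
    features = []
    for cluster in clusters:
        for marker in markers:
            features.extend(summary.get(cluster, {}).get(marker, (0.0, 0.0, 0.0)))
    return features


if __name__ == "__main__":
    # Toy data: two hypothetical clusters and two markers (invented values).
    events = [
        {"CD19": 100.0, "CD33": 10.0},
        {"CD19": 120.0, "CD33": 14.0},
        {"CD19": 5.0, "CD33": 300.0},
        {"CD19": 7.0, "CD33": 340.0},
    ]
    labels = ["cluster_A", "cluster_A", "cluster_B", "cluster_B"]
    summary = cluster_marker_summary(events, labels, ["CD19", "CD33"])
    print(feature_vector(summary, ["cluster_A", "cluster_B"], ["CD19", "CD33"]))
```

Concatenating the (mean, SD, CV) triples in a fixed cluster and marker order yields one feature vector per case, which is the kind of tabular input a random forest classifier can take. How the authors actually encoded their features is not specified in the abstract.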

Results/Discussion: For grouping cells into 10 populations, the model's initial metrics, on average, were accuracy 0.957, sensitivity (recall) 0.948, precision 0.948, and F1 score 0.948. For the diagnostic correlation, they were accuracy 0.591, sensitivity (recall) 0.938, precision 0.640, and F1 score 0.761. Together with a review of the related literature, these results indicate that cytometry data, being single-cell data comparable to those produced by many omics technologies and following approximately Gaussian distributions, justify the application of advanced statistical techniques. In this context, the approach shows potential to meaningfully improve diagnostic efficacy as a complement to conventional analysis. Furthermore, the visual representations generated during pre-processing facilitated interpretation of the results, making them more accessible to professionals with different levels of experience in cytometry. Applying this approach in laboratories and hospitals can offer notable practical advantages: (semi)automating the interpretation process, combined with the visual guides provided by the model, can reduce the time required for data analysis and increase laboratory operational efficiency, although the inherent complexity of the task still requires professional expertise. Improved precision also supports more confident decision-making, enabling faster and more personalized clinical intervention.
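The reported figures can be cross-checked against the standard definitions of these metrics. The sketch below assumes a binary (one-vs-rest) confusion matrix per class and, for the 10-population task, an unweighted (macro) average across classes; the abstract does not state which averaging was used, so that choice and the function names are assumptions. With the diagnostic precision P = 0.640 and recall R = 0.938, F1 = 2PR/(P + R) ≈ 0.761, matching the reported value.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, recall (sensitivity), precision and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1_score(precision, recall),
    }


def macro_average(per_class):
    """Unweighted mean of each metric over classes (assumed averaging scheme)."""
    keys = per_class[0].keys()
    return {k: sum(m[k] for m in per_class) / len(per_class) for k in keys}


if __name__ == "__main__":
    # Consistency check against the reported diagnostic precision and recall.
    print(round(f1_score(0.640, 0.938), 3))  # 0.761
```

Because F1 is the harmonic mean, it is pulled toward the lower of the two values, which is why the diagnostic F1 (0.761) sits much closer to precision (0.640) than to recall (0.938).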

Conclusion: Integrating machine learning models, such as random forests, and deep learning models, such as neural networks, into cytometry analysis offers an innovative approach to the diagnosis of acute leukemia. The combination of international guidelines, advanced pre-processing, and visual interpretability highlights the robustness and practicality of the proposed model. Implementing this approach on a multicenter scale could contribute significantly to the efficiency and accuracy of flow cytometry diagnoses in laboratories and hospitals, optimizing clinical-laboratory practice.

Topic: Data Analysis 

Contributing Authors: Ian Antunes, Bianca Falasco, Raisa Maia, Gustavo Oliveira, Rodrigo Barroso. Flowmentor – Integrated Diagnostics and Flow Cytometry