Overview
At MetronMind, we use deep learning to create neural networks that compute geometric measurements on images. We train the system with many images from diverse sources, then evaluate its performance on a set of images that were not used during training and that have been measured by human experts. The result is a system that can automatically measure images in various ways and can be a powerful assistant to the veterinarian. This page reports the details of the validation of our Canine VHS and VLAS measures.
History
The first public version of our VHS algorithm was released in August 2020.
Our algorithms have been in practical use since then: on the VIN website, and in the Metron-DVM and Metron-IQ software
Periodic algorithm updates (as techniques and/or training data improve)
Images produced by our system show the algorithm version number in the lower right corner (an “E” indicates human editing of the AI results)
The performance of our November 2020 algorithm was presented at ECVIM 2020:
Accuracy of deep learning enabled software to measure vertebral heart size in dogs with myxomatous mitral valve disease
K.T. Sykes, S. Gordon, J. Craig, et al.
Our algorithm of July 2022 (version 4.4.2) added the VLAS measurement.
Work is in progress on a journal article that reports more fully on our latest algorithm (August 2022, version 4.4.3).
Training Set
10,000+ images from over 50 clinics around the world
Rich mix of images from various imaging systems
Mix of image quality from excellent to poor
These are from working veterinary clinics and so are a sample of the population "all dogs radiographed in veterinary clinics"
Working with Sonya Gordon, DVM, DVSc, Diplomate ACVIM, to validate our algorithms
Professor, Texas A&M University
Member of the Cardiac Education Group (CEG) — see https://cardiaceducationgroup.org
Her group supplied our gold-standard "ground truth" database of more than 1,000 images, which were hand-measured by her team of experts for both VHS and VLAS.
She is now working on a publication for a refereed journal which will report on the performance of our August 2022 algorithm (version 4.4.3).
Performance of our current (August 2022) algorithm
Define:
AH = VHS value measured by the AI; HH = VHS value measured by a human expert.
AL = VLAS value measured by the AI; HL = VLAS value measured by a human expert.
For a given image we can compute:
Mismatch (for VHS) computed as: M = Abs(AH-HH)
Mismatch Percentage (for VHS) computed as: MP = (100*M)/HH
(In our discussion here, we’ll use these simple measures. More complete statistical analysis is done in our publications.)
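As a concrete illustration, here is a minimal sketch of these two per-image measures (in Python; the function name and example values are ours, purely for illustration):

    def vhs_mismatch(ai_vhs, human_vhs):
        """Per-image Mismatch (M) and Mismatch Percentage (MP)."""
        m = abs(ai_vhs - human_vhs)     # M = Abs(AH - HH)
        mp = 100.0 * m / human_vhs      # MP = (100 * M) / HH
        return m, mp

    # Example: the AI reads a VHS of 10.6 where the human expert measured 10.3
    m, mp = vhs_mismatch(10.6, 10.3)
    print(f"M = {m:.2f}, MP = {mp:.2f}%")   # M = 0.30, MP = 2.91%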
For the set of all images in the validation database:
Average Mismatch for VHS is 0.27
Average Mismatch Percentage (MP) for VHS is 2.45%
Percentage of images with MP > 10% for VHS is 0.85%
Likewise, for VLAS:
Average Mismatch for VLAS is 0.20
Average Mismatch Percentage (MP) for VLAS is 9.32%
Percentage of images with MP > 25% for VLAS is 4.9%
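The aggregate figures above can be reproduced from the per-image values along these lines (a sketch assuming NumPy arrays of paired AI and human measurements; the array contents here are hypothetical):

    import numpy as np

    def summarize(ai, human, mp_threshold):
        """Aggregate Mismatch statistics over a validation set."""
        m = np.abs(ai - human)             # per-image Mismatch
        mp = 100.0 * m / human             # per-image Mismatch Percentage
        return {
            "average_M": m.mean(),
            "average_MP": mp.mean(),
            "percent_MP_over_threshold": 100.0 * np.mean(mp > mp_threshold),
        }

    # Hypothetical paired VHS measurements for three images;
    # VHS is reported with a 10% threshold above, VLAS with a 25% threshold.
    ai_vhs = np.array([10.6, 9.8, 11.2])
    human_vhs = np.array([10.3, 10.0, 11.0])
    print(summarize(ai_vhs, human_vhs, mp_threshold=10.0))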
A publication by Prof. Gordon's group (presented at ECVIM 2021) shows that for a group of 14 human experts, the inter-observer variability for VHS was 5.3% and for VLAS was 14.6%. This tells us that VLAS is a 'more difficult' measure for any system, human or AI. Compared with the results stated above, these percentages suggest that our AI system is working "as well as a human expert".
Reproducibility and repeatability of radiographic measurements of cardiac size in dogs
E. Malcolm, S. Gordon, J. Häggström, S. Wesselowski, R. Fries, S. Kadotani, C. Pouliot, et al.
Validation
"Explainable AI" - our system places points accurately on the image in order to generate the VHS and VLAS measurements just as a human expert would. These results are shown, and so what the AI has done is very apparent - and, the human user, is allowed to edit the results if for some reason they don't agree with the AI. This is completely different from other approaches in which an AI system generates an estimate of VHS (or VLAS) and gives only the numerical output.
Our system is inherently robust to variations in image quality, both because our training set contains a variety of image qualities and because our scheme places points based on anatomical morphology rather than regressing a single number.
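To make the point-placement idea concrete, here is a minimal sketch of how a VHS value can be derived once the points are on the image. It assumes the standard VHS definition (long axis plus short axis of the heart, expressed in vertebral-body units counted caudally from T4) and approximates the vertebral count by dividing by the mean vertebral-body length; the names and structure are illustrative, not our internal code:

    import math

    def dist(p, q):
        """Euclidean distance between two (x, y) points."""
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def vhs_from_points(long_axis, short_axis, vertebra_starts):
        """
        long_axis, short_axis: ((x, y), (x, y)) endpoint pairs placed on the heart.
        vertebra_starts: [(x, y), ...] boundaries of vertebral bodies,
                         beginning at the cranial edge of T4.
        Returns VHS in vertebral-body units.
        """
        segments = [dist(a, b) for a, b in zip(vertebra_starts, vertebra_starts[1:])]
        mean_vertebra = sum(segments) / len(segments)
        return (dist(*long_axis) + dist(*short_axis)) / mean_vertebra

    # Hypothetical placed points (pixel coordinates); yields roughly 9.8,
    # a plausible canine VHS value.
    vhs = vhs_from_points(
        long_axis=((420, 310), (590, 520)),
        short_axis=((455, 470), (600, 360)),
        vertebra_starts=[(300, 200), (345, 205), (390, 212), (436, 220), (482, 230)],
    )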
Comparison with Alternate Approaches
In an internal study, we implemented the simpler style of AI, which generates a single numerical value for VHS. We trained it using our same training database, and then computed its performance with our same validation database:
Average Mismatch for VHS is 0.52
Average Mismatch Percentage for VHS is 4.62%
Hence, it appears that our point-placing approach outperforms these simpler schemes.
Guidelines for Input Images
The user must supply a "standard practice" canine Right Lateral Thorax radiograph
Our algorithms are not yet validated for feline images (feline support is coming soon)
A Left Lateral is not currently measured (most published data are for right recumbency, so we support only that)
A Left Lateral that has been mirrored to appear as a Right is now detected by our algorithm, and the user is warned. Our algorithm for detecting a “mirrored right” is working at 97% accuracy.
Spinous processes of the thoracic vertebrae should be shown
Scapula should normally be shown
The pelvis should normally not be shown. If too much of the abdomen is shown, our system may classify the image as “abdomen” rather than “thorax”, in which case the VHS/VLAS measurement will not be performed
The dog should be flat on the table, so that the image does not show 'rotation'
Image quality should be as high as possible, and image resolution must be at least 800 pixels in each dimension (a minimal check is sketched after this list)
These guidelines are easily met by most practitioners, and by most existing images
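For practices that want to screen images before submission, the resolution requirement is easy to check programmatically. Here is a minimal sketch using Pillow (the function name and structure are ours, not part of the Metron software):

    from PIL import Image

    MIN_PIXELS = 800  # minimum resolution required in each dimension

    def meets_resolution_guideline(path):
        """True if the radiograph is at least 800 pixels in each dimension."""
        with Image.open(path) as img:
            width, height = img.size
        return width >= MIN_PIXELS and height >= MIN_PIXELS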