Overview
At MetronMind, deep learning techniques are used to create neural networks which can compute geometric measures on images. To do this we train the system with many images from diverse sources. We then evaluate the performance on a set of images that were not used during training, and which have been measured by human experts. This leads to a system which can automatically measure images in various ways and can be a powerful assistant to the veterinarian. This page reports on the details of the validation of our Canine VHS and VLAS measures.
History
The first public version of our VHS algorithm was released in August 2020.
Our algorithms have been in practical use since then, on the VIN website and in the Metron-DVM software.
Periodic algorithm updates are released (as techniques and/or training data improve).
Images produced by our system show the algorithm version number in the lower right corner.
The performance of our November 2020 algorithm was presented at ECVIM 2020:
Accuracy of deep learning enabled software to measure vertebral heart size in dogs with myxomatous mitral valve disease
K.T. Sykes, S. Gordon, J. Craig, et al.
Our most recent algorithm (July 2022, version 4.4.2) is reported on below.
Work is in progress on a journal article to more fully report on the algorithm.
Training Set
Images from over 50 clinics around the world
Rich mix of images from various imaging systems
Mix of image quality from excellent to poor
These images come from working veterinary clinics, and so represent a sample of the population "all dogs radiographed in veterinary clinics"
Working with Sonya Gordon, DVM, DVSc, Diplomate ACVIM, to validate our algorithms
Professor Texas A&M University
Member of the Cardiac Education Group (CEG) — see https://cardiaceducationgroup.org
Her group supplied our gold-standard "ground truth" database of more than 1,000 images, which were hand-measured for VHS by her team of experts.
She is now working on a publication for a refereed journal that will report on the performance of our "July 2022 algorithm"
Performance of our current (July 2022) algorithm
Define:
AH = VHS value measured by the AI; HH = VHS value measured by a human expert.
For a given image we can compute:
Mismatch (for VHS) computed as: M = Abs(AH-HH)
Mismatch Percentage (for VHS) computed as: MP = (100*M)/HH
(In our discussion here, we’ll use these simple measures. More complete statistical analysis is done in our publications.)
For the set of all images in the validation database:
Average Mismatch for VHS is 0.30
Average Mismatch Percentage for VHS is 2.71%
Percentage of images with MP > 10% for VHS is 2.8%
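The mismatch measures and aggregate statistics above can be sketched in a few lines of Python. This is an illustration of the definitions only; the function and variable names are hypothetical, not MetronMind's actual code.

```python
# Illustrative sketch of the mismatch measures defined above.
# AH = VHS value from the AI, HH = VHS value from a human expert.

def mismatch(ah: float, hh: float) -> float:
    """M = Abs(AH - HH)"""
    return abs(ah - hh)

def mismatch_percentage(ah: float, hh: float) -> float:
    """MP = (100 * M) / HH"""
    return 100.0 * mismatch(ah, hh) / hh

def summarize(pairs):
    """Aggregate statistics over (AH, HH) pairs from a validation set."""
    ms = [mismatch(ah, hh) for ah, hh in pairs]
    mps = [mismatch_percentage(ah, hh) for ah, hh in pairs]
    return {
        "avg_mismatch": sum(ms) / len(ms),
        "avg_mismatch_pct": sum(mps) / len(mps),
        "pct_over_10": 100.0 * sum(mp > 10.0 for mp in mps) / len(mps),
    }
```

For example, if the AI reads 10.8 where the expert read 10.5, the mismatch is 0.3 and the mismatch percentage is about 2.9%.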
A publication by Prof. Gordon's group (presented at ECVIM 2021) shows that for a group of 14 human experts, the inter-observer variability for VHS was 5.3%. Compared to the results stated above for our algorithm, this suggests that the AI system is working "as well as a human expert":
Reproducibility and repeatability of radiographic measurements of cardiac size in dogs
E. Malcolm, S. Gordon, J. Häggström, S. Wesselowski, R. Fries, S. Kadotani, C. Pouliot, et al.
Validation
"Explainable AI" - our system places points accurately on the image in order to generate the VHS measurement just as a human expert would. These results are shown, and so what the AI has done is very apparent - and, the human user, is allowed to edit the results if for some reason they don't agree with the AI. This is completely different from other approaches in which an AI system generates an estimate of VHS and gives only the numerical output.
Our system is inherently robust to image quality, both because our training set contains a variety of image qualities, and also due to the nature of our scheme of placing point based on anatomical morphology, rather than regressing a single number.
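The contrast with single-number regression can be made concrete with a sketch of how a VHS value is derived from placed points, following the standard Buchanan method: the cardiac long and short axes are measured and re-expressed in vertebral-body units counted caudally from T4, then summed. This is a minimal sketch under those assumptions, not MetronMind's actual implementation; the function names, landmark format, and vertebral-unit estimate are all hypothetical.

```python
import math

def dist(p, q):
    """Euclidean distance between two (x, y) points in pixels."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def vhs_from_landmarks(long_axis, short_axis, vertebral_unit_px):
    """Hypothetical VHS computation from placed landmarks (Buchanan method):
    each axis is a pair of (x, y) endpoints; its pixel length is converted
    to vertebral-body units using `vertebral_unit_px`, the mean thoracic
    vertebral body length in pixels starting at T4. VHS is the sum."""
    long_v = dist(*long_axis) / vertebral_unit_px
    short_v = dist(*short_axis) / vertebral_unit_px
    return long_v + short_v
```

Because the output is built from visible point placements, any error is attributable to a specific misplaced point, which is what makes the result inspectable and editable by the user.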
In an internal study, we implemented the simpler style of AI, which regresses a single numerical value for VHS. We trained it on our same training database and then computed performance on our same validation database:
Average Mismatch for VHS is: 0.52
Average Mismatch Percentage for VHS is 4.62%
Hence, our point-placing approach outperforms this simpler scheme.
Image Requirements
The user must supply a "standard practice" canine right lateral thorax radiograph
Our algorithms are not validated for feline patients
A Left Lateral is measured, but a warning is issued
A Left Lateral that has been mirrored to appear as a right is highly discouraged (and soon, will be detected)
Spinous processes of the thoracic vertebrae should be shown
Scapula should normally be shown
Pelvis should normally not be shown
Dog should be flat on the table, so that the image does not show 'rotation'
Image quality should be as high as possible, and image resolution must be at least 800 pixels in each dimension
These guidelines are easily met by most practitioners, and by most existing images
See sample 'good' and 'poor' images below (coming soon!)
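The one strictly numeric guideline above, the 800-pixel minimum resolution, can be checked before an image is submitted. A minimal sketch (the function name and constant are illustrative, not part of MetronMind's API):

```python
MIN_DIM = 800  # minimum pixels in each dimension, per the guideline above

def meets_resolution_requirement(width_px: int, height_px: int) -> bool:
    """True if the radiograph satisfies the stated minimum resolution."""
    return width_px >= MIN_DIM and height_px >= MIN_DIM
```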