ai_ct_scans.sectioning module

class ai_ct_scans.sectioning.CTEllipsoidFitter(min_area_ratio=0.75, min_eccentricity=0.5, min_ellipse_long_axis=0.0, max_ellipse_long_axis=inf, max_ellipse_contour_centre_dist=10, min_area=25, max_area=25000)

Bases: object

Class for fitting ellipses within 3D CT scans

draw_ellipses_2d(image)

Fit ellipses within an image and then draw them onto a blank image of the same dimensions

Parameters

image (np.ndarray) – 2D grayscale image. Internally converted to uint8 values, so pixel values should round to distinct integers between 1-255

Returns

2D image with ellipses drawn on

Return type

(np.ndarray)
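
The uint8 conversion matters when the input is a label image, because values that collide after rounding become indistinguishable. A minimal sketch of one way to prepare such an image, assuming only NumPy; the helper name to_distinct_uint8 is hypothetical and not part of ai_ct_scans:

```python
import numpy as np

def to_distinct_uint8(labels):
    """Map each distinct value in `labels` to its own integer in 1-255.

    Hypothetical helper, not part of ai_ct_scans.
    """
    values, inverse = np.unique(labels, return_inverse=True)
    if len(values) > 255:
        raise ValueError("more than 255 distinct values cannot fit in 1-255")
    # inverse indexes into `values`; shift by one so the smallest maps to 1
    return (inverse.reshape(labels.shape) + 1).astype(np.uint8)

image = np.array([[0.2, 0.4], [0.4, 300.7]])
prepared = to_distinct_uint8(image)
```

The result keeps equal inputs equal and unequal inputs unequal, which is the property the parameter description asks for.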

draw_ellipsoid_walls(arr, sectioner=None, sectioner_kwargs=None, filterer=None, filter_kernel=None, return_sectioned=False)

Runs through a 3D array of scan data along each axis, sectioning each slice and then drawing ellipses around any 2D elliptical structures found. Doing this along each axis builds up ellipsoidal shells, which can guide towards the location of ellipsoidal lesions

Parameters

arr (np.ndarray) – 3D CT scan data
sectioner (TextonSectioner) – A sectioning object with a .label_im method, to pixel-wise label an image
sectioner_kwargs (dict) – Kwargs for the instantiation of a sectioner object
filterer (method) – A filtering method to apply after sectioning, typically scipy.signal.medfilt2d, which can round the edges of sectioned tissue boundaries.
filter_kernel (tuple of ints) – Shape of the kernel to use with filterer
return_sectioned (bool) – Whether to return the sectioned 3D scan as well as the ellipsoidal view

Returns

if return_sectioned=False, only return a 3D ndarray with 1s, 2s or 3s wherever ellipse edges were detected in each 2D slice. A value of 1 means an ellipse edge was only detected at that pixel in a single axis, 2 in 2 axes, 3 in 3 axes. If return_sectioned=True, also return a 3D ndarray of the sectioned CT scan

Return type

(tuple of ndarrays or ndarray)
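
The 1, 2 or 3 values in the returned array can be read as a per-voxel count of how many axes agreed on an ellipse edge. A minimal sketch of that counting, assuming the per-axis edge masks are already available as binary arrays (the masks and their names are hypothetical stand-ins, not the internals of draw_ellipsoid_walls):

```python
import numpy as np

shape = (4, 4, 4)
# Hypothetical per-axis edge masks: 1 where an ellipse edge was drawn
# while slicing along that axis, 0 elsewhere.
axial = np.zeros(shape, dtype=np.uint8)
coronal = np.zeros(shape, dtype=np.uint8)
sagittal = np.zeros(shape, dtype=np.uint8)

axial[1, 1, 1] = 1                          # seen in one axis only
axial[2, 2, 2] = coronal[2, 2, 2] = 1       # seen in two axes
axial[3, 3, 3] = coronal[3, 3, 3] = sagittal[3, 3, 3] = 1  # seen in all three

# Summing the masks gives the number of axes that detected an edge per voxel
walls = axial + coronal + sagittal
strong = np.argwhere(walls == 3)  # voxels supported by all three axes
```

Thresholding at 3 keeps only voxels where all three views agree, which is a conservative way to look for closed shells.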

find_ellipsoids(rich_ellipses)

Find ellipsoids within a list of 2D ellipses, where the centres of the 2D ellipses in 3D space must be close together and of the same underlying tissue class

Parameters

rich_ellipses (list of tuples) – A list of rich ellipses, similar to the output of self._ellipses_to_rich_information.

Returns

A list of all ellipsoids found, each one a dict with ‘centre’: 3D ndarray, ‘max_position’: 1D ndarray length 3 (axial, coronal, sagittal max positions of a bounding box), ‘min_position’: 1D ndarray length 3, minimum bounding box positions, ‘volume’: ellipsoid volume in pixels, ‘axis_ellipses_count’: list of ints length 3, the number of ellipses that contributed to the detection of an ellipsoid in each axis, ‘class’: int, the numerical tissue class assigned to the ellipse. The class is expected to be offset by one compared to the output of TextonSectioner.label_im, as -1 is treated as background by the sectioner and 0 is treated as background in the ellipse fitter, so an offset will have been applied to aid ellipse detection

Return type

(list of dicts)
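
The class offset described in the return value can be illustrated with plain arrays. This is a sketch under assumptions: the sectioned array and the ellipsoid dict below are made-up values that follow the key names listed in the return description, not real output from find_ellipsoids.

```python
import numpy as np

# Hypothetical output of a sectioner: -1 marks background
sectioned = np.array([[-1, 0, 0], [-1, 1, 2]])
# Shift by one so that background becomes 0 for the ellipse fitter
fitter_input = sectioned + 1

# A hypothetical ellipsoid record using some of the documented keys
ellipsoid = {
    "centre": np.array([10.0, 20.0, 30.0]),
    "min_position": np.array([5, 12, 24]),
    "max_position": np.array([15, 28, 36]),
    "class": 2,
}

# Undo the offset to recover the label the sectioner would have reported
sectioner_class = ellipsoid["class"] - 1
# Extent of the bounding box along the axial, coronal and sagittal axes
bounding_box_size = ellipsoid["max_position"] - ellipsoid["min_position"]
```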

class ai_ct_scans.sectioning.DinoSectioner(max_thresh=4628.0, total_samples=5000, samples_per_image=500)

Bases: TextonSectioner

Class for using DINO-trained models to section images. Much of the code is refactored from the DINO repository, itself stored in ai_ct_scans.dino.

load(load_path)

Reload a TextonSectioner using pickle

Parameters: load_path (pathlib Path) – Path to a pickled TextonSectioner

load_dino_model(arch='vit_tiny', patch_size=16, pretrained_weights='', checkpoint_key='teacher')

save(out_path)

Save the TextonSectioner using pickle. Only the minimal set of clusterers, clusterer_titles and filters that are required to load and produce new predictions on images are saved.

Parameters: out_path (pathlib Path to a .pkl file) – Where to save the TextonSectioner

single_image_texton_descriptors(image, threshold=None)

Get the texton descriptors for a single image. This is largely a refactoring of code from the DINO repository, which can be seen in ai_ct_scans/dino/visualize_attention

Parameters

image (ndarray) –
threshold –

Returns:

class ai_ct_scans.sectioning.EllipseFitter(min_area_ratio=0.75, min_eccentricity=0.5, min_ellipse_long_axis=0.0, max_ellipse_long_axis=inf, max_ellipse_contour_centre_dist=10, min_area=25, max_area=10000)

Bases: object

fit_ellipses(image, background_val=0)

Find valid ellipses in a pre-sectioned 2D image. Pre-sectioned here means that pixels in an original image have been replaced by class labels, such that nearby pixels are likely to share properties and have therefore been set to the same class

Parameters

image (np.ndarray) – The 2D image within which to find ellipses
background_val (int) – A valid uint8 number to be considered as background and skipped, default 0

Returns

A list of found ellipses, each as returned by cv2.fitEllipse

Return type

(list of tuples of floats)
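
Each returned ellipse follows the cv2.fitEllipse layout of centre, full axis lengths and rotation angle in degrees. The sketch below unpacks one such tuple and derives its area and eccentricity using the standard geometric definitions; the ellipse values are made up, and how EllipseFitter itself applies min_eccentricity and the area bounds is not stated here, so this should not be read as its internal logic.

```python
import math

# Hypothetical ellipse in the layout cv2.fitEllipse returns:
# ((centre_x, centre_y), (axis_length_1, axis_length_2), angle_in_degrees)
ellipse = ((50.0, 40.0), (20.0, 10.0), 30.0)

(centre_x, centre_y), axes, angle = ellipse
long_axis, short_axis = max(axes), min(axes)

# Axis lengths are full lengths, so halve them to get the semi-axes
area = math.pi * (long_axis / 2) * (short_axis / 2)
# 0 for a circle, approaching 1 as the ellipse becomes more elongated
eccentricity = math.sqrt(1 - (short_axis / long_axis) ** 2)
```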

class ai_ct_scans.sectioning.HierarchicalMeanShift

Bases: object

A clustering algorithm that performs MeanShiftWithProbs, then trains a second MeanShiftWithProbs for each class discovered by the first. The data points are separated by found class, the probability of each point in a class belonging to that class is computed, and a new MeanShiftWithProbs clusterer is trained on those probabilities for each class. The second order MeanShiftWithProbs clusterers do not see training points outside the first order class to which they are fitting.

fit(samples)

Trains the base_clusterer and second_level_clusterers

Parameters: samples (np.ndarray) – N samples by M dimensions dataset

predict(samples)

Predicts the first order class of an array of samples

Parameters: samples (np.ndarray) – N samples by M dimensions
Returns: N predictions
Return type: (np.ndarray of ints)

predict_full(samples)

Predicts the class according to second order clusterers. First use the first order clusterer to section the samples down to those that should feed into each sub-clusterer, then label with the sub-clusterer

Parameters: samples (np.ndarray) – N samples by M dimensions
Returns: N predictions
Return type: (np.ndarray of ints)

predict_proba(samples, cluster_label=None)

Predict probability of samples’ membership to particular clusters, using first order clusterer only. Follows the naming convention of sklearn’s other clusterers in having predict_proba, to enable integration with other methods

Parameters

samples (np.ndarray) – N by M data points
cluster_label (int or None) – A cluster for which to predict the probability of each data point in samples’ membership. If None, return the probabilities for each class

Returns

The probabilities of membership for each data point. If cluster_label was None, this will be shape (N, [number of clusters known by clusterer]), if cluster_label was an index of a known class, it will be shape (N,)

Return type

(np.ndarray)

predict_proba_secondary(samples, primary_label, sub_cluster_label=None)

Get the probability predictions from a second order clusterer on samples

Parameters

samples (np.ndarray) – N data points by M dimensions set of samples to predict probability on
primary_label (int) – The index of the secondary clusterer associated with the label predicted by the primary clusterer
sub_cluster_label (int) – The index of the sub-class from the secondary clusterer for which you want to predict the probabilities of membership for samples

Returns

1D set of probabilities, same length as samples

Return type

(np.ndarray)

predict_secondary(samples, primary_label)

Predicts the second order class of an array of samples, within a class primary_label predicted by the first order clusterer. These predictions will start at class 0 and run to the number of clusters found when the relevant sub-clusterer was first trained, and hence will have a value offset when compared to predictions made by predict_full, which starts each new sub-clusterer’s labels at the running total of clusters found by previous sub-clusterers

Parameters

samples (np.ndarray) – N samples by M dimensions
primary_label (int) – The numerical class from the first order clusterer within which you wish to return second order predictions

Returns

N predictions

Return type

(np.ndarray of ints)
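
The offset between predict_secondary and predict_full labels follows from the running total described above. A minimal sketch of the conversion, assuming the number of sub-clusters per primary class is known; the counts and the helper to_full_label are hypothetical and not part of HierarchicalMeanShift:

```python
import numpy as np

# Hypothetical: the primary clusterer found 3 classes, and their
# sub-clusterers found 2, 3 and 1 sub-clusters respectively.
sub_cluster_counts = np.array([2, 3, 1])
# Running total of sub-clusters found by the earlier sub-clusterers
offsets = np.concatenate(([0], np.cumsum(sub_cluster_counts)[:-1]))

def to_full_label(primary_label, secondary_label):
    """Convert a predict_secondary-style label to a predict_full-style one."""
    return int(offsets[primary_label] + secondary_label)
```

With these counts, sub-cluster 0 of primary class 1 becomes full label 2, and sub-cluster 0 of primary class 2 becomes full label 5.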

class ai_ct_scans.sectioning.MeanShiftWithProbs

Bases: MeanShift

A class for getting probability predictions of class membership using the sklearn MeanShift algorithm

fit(samples)

Trains the clusterer

Parameters: samples (np.ndarray) – N samples by M dimensions dataset

predict_proba(samples, cluster_label=None)

Predict probability of samples’ membership to particular clusters. Follows the naming convention of sklearn’s other clusterers in having predict_proba, to enable integration with other methods

Parameters

samples (np.ndarray) – N by M data points
cluster_label (int or None) – A cluster for which to predict the probability of each data point in samples’ membership. If None, return the probabilities for each class

Returns

The probabilities of membership for each data point. If cluster_label was None, this will be shape (N, [number of clusters known by clusterer]), if cluster_label was an index of a known class, it will be shape (N,)

Return type

(np.ndarray)
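
The two return shapes can be seen with a small stand-in. How MeanShiftWithProbs actually derives its probabilities is not described here, so the sketch below assumes a softmax over negative distances to fixed cluster centres purely for illustration; proba_sketch and its arguments are hypothetical.

```python
import numpy as np

def proba_sketch(samples, centres, cluster_label=None):
    """Illustrative only: softmax over negative distances to cluster centres."""
    # distances has shape (N, K) for N samples and K cluster centres
    distances = np.linalg.norm(samples[:, None, :] - centres[None, :, :], axis=2)
    scores = np.exp(-distances)
    probs = scores / scores.sum(axis=1, keepdims=True)
    if cluster_label is None:
        return probs                 # shape (N, K)
    return probs[:, cluster_label]   # shape (N,)

samples = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
centres = np.array([[0.0, 0.0], [4.0, 4.0]])
all_probs = proba_sketch(samples, centres)
one_class = proba_sketch(samples, centres, cluster_label=1)
```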

class ai_ct_scans.sectioning.TextonSectioner(filter_type='intensity', total_samples=100000, samples_per_image=50, kernels=None, blur_kernel=None, clusterers=None, clusterer_titles=None, medfilt_kernel=None)

Bases: object

Section images using textons - generate per-pixel descriptors using convolution-based filters or simple intensity values, use clustering algorithms to separate these descriptors into classes, and enable sectioning of new images using the trained clusterers. TextonSectioner runs through axial views of patients from the dataset, by default randomly, to generate descriptors.

build_sample_texton_set(threshold=None, random=True)

Cycle through images from the dataset and build up a texton sample set in self.texton_sample_set

Parameters

threshold (int) – Value below which not to accept texton descriptors into the training set. Typically 500, to rule out air, which dominates the training set otherwise
random (bool) – Whether to select images randomly from the MultiPatientAxialStreamer or simply step through the first patient followed by the second patient, etc.

label_im(im, threshold=None, clusterer_ind=0, sub_structure_class_label=None, full_sub_structure=False)

Sections a new image with class labels assigned by a trained clusterer

Parameters

im (np.ndarray) – 2D image to be labelled pixel-wise
threshold (int) – A value below which to assign all image pixel classes to -1, useful for sectioning out air with a threshold of ~500 when clusterers have not been trained with air in the texton sample set
clusterer_ind (int) – The clusterer index in self.clusterers you wish to label the image with. Defaults to 0
sub_structure_class_label (int or None) – The class predicted by the clusterer at clusterer_ind in self.clusterers you wish to predict sub-class labels for. This must be used with a self.clusterers[clusterer_ind] that labels in a hierarchical style, i.e. has a predict_secondary method, as in HierarchicalMeanShift. If None, only use a first order clusterer
full_sub_structure (bool, optional) – If a hierarchical clusterer has been selected with clusterer_ind, whether to return the full sub-class predictions rather than the first order predictions

Returns

The class predictions, same shape as im

Return type

(np.ndarray of ints)
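
The effect of threshold on the returned labels can be sketched with plain arrays. The predicted array below is a made-up stand-in for what a trained clusterer would produce; only the masking step is illustrated.

```python
import numpy as np

im = np.array([[100, 800], [1200, 300]])
# Hypothetical class predictions from a trained clusterer, same shape as im
predicted = np.array([[0, 1], [2, 0]])

threshold = 500
labels = predicted.copy()
# Pixels below the threshold (e.g. air) are assigned the background class -1
labels[im < threshold] = -1
```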

load(load_path)

Reload a TextonSectioner using pickle

Parameters: load_path (pathlib Path) – Path to a pickled TextonSectioner

probabilities_im(im, threshold=None, clusterer_ind=0, cluster_label=None, return_sub_structure=False, sub_structure_class_label=None)

Get the probabilities that each pixel in an image belongs to a particular class predicted by a clusterer, as well as getting the class predictions image itself

Parameters

im (np.ndarray) – 2D image to be labelled pixel-wise
threshold (int) – A value below which to assign all image pixel classes to -1, useful for sectioning out air with a threshold of ~500 when clusterers have not been trained with air in the texton sample set
clusterer_ind (int) – The clusterer index in self.clusterers you wish to label the image with. Defaults to 0
cluster_label (int) – The class for which to predict probabilities
return_sub_structure (bool) – Whether to use a secondary clusterer to predict the probabilities, e.g. from HierarchicalMeanShift
sub_structure_class_label (int) – If return_sub_structure is True, the sub-class label to predict probabilities for

Returns

First element: The class predictions for each pixel, second element: the probabilities image

Return type

(tuple of np.ndarrays)

save(out_path)

Save the TextonSectioner using pickle. Only the minimal set of clusterers, clusterer_titles and filters that are required to load and produce new predictions on images are saved.

Parameters: out_path (pathlib Path to a .pkl file) – Where to save the TextonSectioner
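
Saving only a minimal state keeps the pickle small and avoids storing the training samples. The on-disk layout used by TextonSectioner is not documented here, so the sketch below assumes a plain dict with the three named pieces and uses placeholder values in place of real clusterers and filters:

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for the state worth keeping; the real objects
# would be trained clusterers and filter kernels.
state = {
    "clusterers": ["trained-clusterer-placeholder"],
    "clusterer_titles": ["mean_shift"],
    "filters": [[0.0, 1.0, 0.0]],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "sectioner.pkl"
    with open(out_path, "wb") as f:
        pickle.dump(state, f)
    with open(out_path, "rb") as f:
        restored = pickle.load(f)
```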

single_image_texton_descriptors(im)

Get the texton descriptors for each pixel in an image

Parameters: im (np.ndarray) – 2D image
Returns: The texton descriptors for the image, shaped into (number of descriptors, *im.shape)
Return type: (3D ndarray)
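
The (number of descriptors, *im.shape) layout has to be flattened into N samples by M dimensions before it can be passed to a clusterer, and the predictions reshaped back afterwards. A minimal sketch with NumPy, where the two descriptors and the thresholding stand-in for clusterer.predict are assumptions made for illustration:

```python
import numpy as np

im = np.arange(12, dtype=float).reshape(3, 4)
# Hypothetical descriptors: raw intensity and squared intensity per pixel
descriptors = np.stack([im, im ** 2])  # shape (2, 3, 4)

# Flatten to one row per pixel: (N samples, M dimensions) = (12, 2)
samples = descriptors.reshape(descriptors.shape[0], -1).T

# Stand-in for clusterer.predict(samples): one integer label per row
labels = (samples[:, 0] > 5).astype(int)
# Restore the image layout so each label sits at its pixel position
label_image = labels.reshape(im.shape)
```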

train_clusterers(clusterer_inds=None)

Train each clusterer in self.clusterers against the texton dataset, or a subset of clusterers. If any clusterer fails to train, remove it from self.clusterers

Parameters: clusterer_inds (list of ints, optional) – The indices of clusterers to train, defaults to train all of them
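
Dropping clusterers that fail to train can be sketched as a try/except around each fit call. The classes and the train_all function below are hypothetical, and removing the matching title alongside the clusterer is an assumption rather than documented behaviour:

```python
class AlwaysFails:
    """Hypothetical clusterer whose fit always raises."""

    def fit(self, samples):
        raise ValueError("could not converge")


class AlwaysFits:
    """Hypothetical clusterer whose fit always succeeds."""

    def fit(self, samples):
        self.fitted = True


def train_all(clusterers, titles, samples):
    """Fit every clusterer, keeping only those that trained successfully."""
    kept_clusterers, kept_titles = [], []
    for clusterer, title in zip(clusterers, titles):
        try:
            clusterer.fit(samples)
        except Exception:
            continue  # drop clusterers that fail to train
        kept_clusterers.append(clusterer)
        kept_titles.append(title)
    return kept_clusterers, kept_titles


clusterers, titles = train_all(
    [AlwaysFits(), AlwaysFails()], ["good", "bad"], samples=[[0.0], [1.0]]
)
```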