ai_ct_scans.sectioning module
- class ai_ct_scans.sectioning.CTEllipsoidFitter(min_area_ratio=0.75, min_eccentricity=0.5, min_ellipse_long_axis=0.0, max_ellipse_long_axis=inf, max_ellipse_contour_centre_dist=10, min_area=25, max_area=25000)
Bases:
object
Class for fitting ellipses within 3D CT scans
- draw_ellipses_2d(image)
Fit ellipses within an image and then draw them onto a blank image of the same dimensions
- Parameters
image (np.ndarray) – 2D grayscale image. Internally converted to uint8 values, so pixel values should round to distinct integers between 1 and 255
- Returns
2D image with ellipses drawn on
- Return type
(np.ndarray)
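The uint8 requirement matters because distinct float labels can collapse onto the same integer. For example, labels 0.2 and 0.4 both round to 0 and would be merged into background. Below is a minimal sketch of a pre-check, assuming NumPy. to_uint8_labels is a hypothetical helper and is not part of ai_ct_scans.

```python
import numpy as np


def to_uint8_labels(image):
    """Round a label image to integers and convert it to uint8.

    Hypothetical helper: raises ValueError if any rounded value falls outside
    0-255. Value 0 is treated as background by the ellipse fitter.
    """
    rounded = np.rint(np.asarray(image, dtype=float))
    if rounded.min() < 0 or rounded.max() > 255:
        raise ValueError("label values must round to integers in 0-255")
    return rounded.astype(np.uint8)


# A sectioned image with background 0 and two tissue classes
labels = np.array([[0.0, 1.0, 1.0], [2.0, 2.0, 0.0]])
uint8_labels = to_uint8_labels(labels)
```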
- draw_ellipsoid_walls(arr, sectioner=None, sectioner_kwargs=None, filterer=None, filter_kernel=None, return_sectioned=False)
Runs through a 3D array of scan data in each dimension, sectioning each 2D slice and then drawing ellipses around any elliptical structures found. Doing this along each axis builds up ellipsoidal shells, which can help locate ellipsoidal lesions
- Parameters
arr (np.ndarray) – 3D CT scan data
sectioner (TextonSectioner) – A sectioning object with a .label_im method, to pixel-wise label an image
sectioner_kwargs (dict) – Kwargs for the instantiation of a sectioner object
filterer (method) – A filtering method to apply after sectioning, typically scipy.signal.medfilt2d, which can round the edges of sectioned tissue boundaries.
filter_kernel (tuple of ints) – Shape of the kernel to use with filterer
return_sectioned (bool) – Whether to return the sectioned 3D scan as well as the ellipsoidal view
- Returns
If return_sectioned=False, return only a 3D ndarray with 1s, 2s or 3s wherever ellipse edges were detected in the 2D slices. A value of 1 means an ellipse edge was detected at that pixel along a single axis, 2 along two axes, and 3 along all three axes. If return_sectioned=True, also return a 3D ndarray of the sectioned CT scan
- Return type
(tuple of ndarrays or ndarray)
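The 1/2/3 encoding can be pictured as summing one edge mask per slicing axis. Below is a NumPy sketch, assuming each axis produces a boolean 3D mask of edge voxels. edges_along_axis and combine_axis_edges are hypothetical helpers, and draw_2d stands in for whatever 2D sectioning and ellipse drawing is applied to each slice.

```python
import numpy as np


def edges_along_axis(volume, axis, draw_2d):
    """Apply a 2D edge-drawing function to every slice taken along one axis."""
    moved = np.moveaxis(volume, axis, 0)  # bring the slicing axis to the front
    out = np.stack([draw_2d(s) for s in moved])
    return np.moveaxis(out, 0, axis)  # restore the original axis order


def combine_axis_edges(edges_axial, edges_coronal, edges_sagittal):
    """Count, per voxel, how many slicing axes detected an ellipse edge (0 to 3)."""
    stacked = np.stack([edges_axial, edges_coronal, edges_sagittal])
    return stacked.astype(np.uint8).sum(axis=0, dtype=np.uint8)
```

A voxel lying on an ellipsoidal shell tends to be found from several viewing directions, which is why higher counts are more informative.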
- find_ellipsoids(rich_ellipses)
Find ellipsoids within a list of 2D ellipses, where the centres of the 2D ellipses in 3D space must be close together and of the same underlying tissue class
- Parameters
rich_ellipses (list of tuples) – A list of rich ellipses, similar to the output of self._ellipses_to_rich_information.
- Returns
A list of all ellipsoids found, each one a dict with the following keys:

- ‘centre’: 3D ndarray.
- ‘max_position’: 1D ndarray of length 3, the (axial, coronal, sagittal) maximum positions of a bounding box.
- ‘min_position’: 1D ndarray of length 3, the minimum bounding box positions.
- ‘volume’: ellipsoid volume in pixels.
- ‘axis_ellipses_count’: list of 3 ints, the number of ellipses that contributed to the detection of the ellipsoid in each axis.
- ‘class’: int, the numerical tissue class assigned to the ellipsoid. This is expected to be offset by one compared with the output of TextonSectioner.label_im. The sectioner treats -1 as background while the ellipse fitter treats 0 as background, so an offset will have been applied to aid ellipse detection.
- Return type
(list of dicts)
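The grouping idea behind find_ellipsoids can be illustrated with a greedy sketch. This is not the ai_ct_scans implementation. The function name, the (centre, class, axis) input format and the greedy running-mean rule are all assumptions. Volume is omitted because it needs the ellipse axes, and the bounding box here spans only the ellipse centres.

```python
import numpy as np


def group_ellipse_centres(ellipses, max_dist=10.0):
    """Greedily group 2D ellipse centres into ellipsoid candidates (hypothetical).

    ellipses: iterable of (centre, tissue_class, axis), where centre is a length-3
    (axial, coronal, sagittal) position and axis (0, 1 or 2) is the slicing axis.
    """
    groups = []
    for centre, tissue_class, axis in ellipses:
        centre = np.asarray(centre, dtype=float)
        for group in groups:
            mean_centre = np.mean(group["centres"], axis=0)
            # Join a group only if it has the same class and a nearby running-mean centre
            if group["class"] == tissue_class and np.linalg.norm(centre - mean_centre) <= max_dist:
                group["centres"].append(centre)
                group["axis_ellipses_count"][axis] += 1
                break
        else:
            counts = [0, 0, 0]
            counts[axis] += 1
            groups.append({"class": tissue_class, "centres": [centre], "axis_ellipses_count": counts})

    results = []
    for group in groups:
        pts = np.stack(group["centres"])
        results.append({
            "centre": pts.mean(axis=0),
            "min_centre": pts.min(axis=0),  # bounds of the centres only, a simplification
            "max_centre": pts.max(axis=0),
            "axis_ellipses_count": group["axis_ellipses_count"],
            "class": group["class"],
        })
    return results
```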
- class ai_ct_scans.sectioning.DinoSectioner(max_thresh=4628.0, total_samples=5000, samples_per_image=500)
Bases:
TextonSectioner
Class for using DINO-trained models to section images. Much of the code is refactored from the DINO repository, itself stored in ai_ct_scans.dino.
- load(load_path)
Reload a TextonSectioner using pickle
- Parameters
load_path (pathlib Path) – Path to a pickled TextonSectioner
- load_dino_model(arch='vit_tiny', patch_size=16, pretrained_weights='', checkpoint_key='teacher')
- save(out_path)
Save the TextonSectioner using pickle. Only the minimal set of clusterers, clusterer_titles and filters that are required to load and produce new predictions on images are saved.
- Parameters
out_path (pathlib Path to a .pkl file) – Where to save the TextonSectioner
- single_image_texton_descriptors(image, threshold=None)
Get the texton descriptors for a single image. This is largely a refactoring of code from the DINO repository, which can be seen in ai_ct_scans/dino/visualize_attention
- Parameters
image (ndarray) –
threshold –
Returns:
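DINO's visualize_attention script upsamples per-patch attention maps to pixel resolution with nearest-neighbour interpolation, scaling by patch_size. Below is a NumPy sketch of that reshaping step. It assumes the descriptors are per-head [CLS] attention values over patches in row-major order. The function name is hypothetical, and single_image_texton_descriptors may differ in detail.

```python
import numpy as np


def attention_to_pixel_descriptors(attentions, image_shape, patch_size=16):
    """Turn per-patch attention values into per-pixel descriptors (hypothetical).

    attentions: array of shape (num_heads, num_patches).
    image_shape: (height, width), assumed already cropped to multiples of patch_size.
    Returns an array of shape (num_heads, height, width).
    """
    height, width = image_shape
    h_feat, w_feat = height // patch_size, width // patch_size
    maps = attentions.reshape(attentions.shape[0], h_feat, w_feat)
    # Nearest-neighbour upsampling: each patch value fills a patch_size x patch_size block
    return maps.repeat(patch_size, axis=1).repeat(patch_size, axis=2)
```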
- class ai_ct_scans.sectioning.EllipseFitter(min_area_ratio=0.75, min_eccentricity=0.5, min_ellipse_long_axis=0.0, max_ellipse_long_axis=inf, max_ellipse_contour_centre_dist=10, min_area=25, max_area=10000)
Bases:
object
- fit_ellipses(image, background_val=0)
Find valid ellipses in a pre-sectioned 2D image. Here, pre-sectioned means that the pixels of an original image have been replaced by class labels, so that nearby pixels likely to share properties have been set to the same class
- Parameters
image (np.ndarray) – The 2D image within which to find ellipses
background_val (int) – A valid uint8 number to be considered as background and skipped, default 0
- Returns
A list of found ellipses, each as returned by cv2.fitEllipse
- Return type
(list of tuples of floats)
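The constructor thresholds suggest how candidate ellipses are screened. Below is a minimal sketch of an area-based screen using only the standard library. It assumes the ellipse tuple follows the cv2.fitEllipse layout ((cx, cy), (width, height), angle), where width and height are full axis lengths. How min_area_ratio, min_area and max_area are actually applied is an assumption: here they compare contour area with fitted-ellipse area.

```python
import math


def ellipse_area(ellipse):
    """Area of an ellipse in cv2.fitEllipse format; semi-axes are width/2 and height/2."""
    (_, _), (w, h), _ = ellipse
    return math.pi * (w / 2) * (h / 2)


def passes_area_checks(contour_area, ellipse, min_area_ratio=0.75, min_area=25, max_area=10000):
    """Hypothetical filter: keep plausibly sized ellipses that closely match their contour."""
    fitted_area = ellipse_area(ellipse)
    if fitted_area <= 0 or contour_area <= 0:
        return False
    if not (min_area <= fitted_area <= max_area):
        return False
    # Symmetric ratio in (0, 1]; 1 means contour and ellipse have equal area
    ratio = min(contour_area, fitted_area) / max(contour_area, fitted_area)
    return ratio >= min_area_ratio
```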
- class ai_ct_scans.sectioning.HierarchicalMeanShift
Bases:
object
A clustering algorithm that first fits a MeanShiftWithProbs clusterer to the data. For each class discovered by this first order clusterer, it then separates out the data points assigned to that class and finds the probability of each of those points belonging to that class. It then trains a new second order MeanShiftWithProbs clusterer on those probabilities. Each second order clusterer only sees training points from the first order class to which it is fitting.
- fit(samples)
Trains the base_clusterer and second_level_clusterers
- Parameters
samples (np.ndarray) – N samples by M dimensions dataset
- predict(samples)
Predicts the first order class of an array of samples
- Parameters
samples (np.ndarray) – N samples by M dimensions
- Returns
N predictions
- Return type
(np.ndarray of ints)
- predict_full(samples)
Predicts the class according to the second order clusterers. The first order clusterer first splits the samples into the subsets that feed each sub-clusterer, and each subset is then labelled by its sub-clusterer
- Parameters
samples (np.ndarray) – N samples by M dimensions
- Returns
N predictions
- Return type
(np.ndarray of ints)
- predict_proba(samples, cluster_label=None)
Predict the probability of samples’ membership of particular clusters, using the first order clusterer only. Named predict_proba to follow the convention of other sklearn clusterers, enabling integration with other methods
- Parameters
samples (np.ndarray) – N by M data points
cluster_label (int or None) – A cluster for which to predict the probability of membership of each data point in samples. If None, return the probabilities for each class
- Returns
The probabilities of membership for each data point. If cluster_label was None, this will be shape (N, [number of clusters known by clusterer]), if cluster_label was an index of a known class, it will be shape (N,)
- Return type
(np.ndarray)
- predict_proba_secondary(samples, primary_label, sub_cluster_label=None)
Get the probability predictions from a second order clusterer on samples
- Parameters
samples (np.ndarray) – N data points by M dimensions set of samples to predict probability on
primary_label (int) – The index of the secondary clusterer associated with the label predicted by the primary clusterer
sub_cluster_label (int) – The index of the sub-class from the secondary clusterer for which you want to predict the probabilities of membership for samples
- Returns
1D set of probabilities, same length as samples
- Return type
(np.ndarray)
- predict_secondary(samples, primary_label)
Predicts the second order class of an array of samples, within a class primary_label predicted by the first order clusterer. These predictions start at class 0 and run up to one less than the number of clusters the relevant sub-clusterer found when it was trained. They are therefore offset compared with predictions made by predict_full, which starts each sub-clusterer’s labels at the running total of clusters found by the previous sub-clusterers
- Parameters
samples (np.ndarray) – N samples by M dimensions
primary_label (int) – The numerical class from the first order clusterer within which you wish to return second order predictions
- Returns
N predictions
- Return type
(np.ndarray of ints)
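The label offset between predict_secondary and predict_full can be made concrete. Below is a NumPy sketch, assuming the sub-clusterers are ordered by primary label. secondary_to_full_labels is a hypothetical helper and is not part of HierarchicalMeanShift.

```python
import numpy as np


def secondary_to_full_labels(primary, secondary, clusters_per_sub):
    """Map per-sub-clusterer labels into one label space (hypothetical).

    primary: first order labels, one per sample.
    secondary: labels from the matching sub-clusterer, each starting at 0.
    clusters_per_sub: number of clusters found by each sub-clusterer, in primary label order.
    """
    # Offset of each sub-clusterer = running total of clusters in earlier sub-clusterers
    offsets = np.concatenate([[0], np.cumsum(clusters_per_sub)[:-1]])
    return offsets[np.asarray(primary)] + np.asarray(secondary)
```

With sub-clusterers that found 2, 3 and 1 clusters, the offsets are 0, 2 and 5. Sub-class 2 of primary class 1 therefore becomes full label 4.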
- class ai_ct_scans.sectioning.MeanShiftWithProbs
Bases:
MeanShift
A class for getting probability predictions of class membership using the sklearn MeanShift algorithm
- fit(samples)
Trains the clusterer
- Parameters
samples (np.ndarray) – N samples by M dimensions dataset
- predict_proba(samples, cluster_label=None)
Predict the probability of samples’ membership of particular clusters. Named predict_proba to follow the convention of other sklearn clusterers, enabling integration with other methods
- Parameters
samples (np.ndarray) – N by M data points
cluster_label (int or None) – A cluster for which to predict the probability of membership of each data point in samples. If None, return the probabilities for each class
- Returns
The probabilities of membership for each data point. If cluster_label was None, this will be shape (N, [number of clusters known by clusterer]), if cluster_label was an index of a known class, it will be shape (N,)
- Return type
(np.ndarray)
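The shape contract of predict_proba can be illustrated with a stand-in. How MeanShiftWithProbs actually computes its probabilities is not documented here. This sketch assumes a softmax over negative squared distances to cluster centres, such as those a fitted sklearn MeanShift exposes in cluster_centers_, which is only one plausible choice.

```python
import numpy as np


def soft_membership(samples, centres, cluster_label=None, temperature=1.0):
    """Soft cluster membership from distances to centres (hypothetical stand-in).

    Returns shape (N, K) if cluster_label is None, else shape (N,) for that cluster.
    """
    samples = np.asarray(samples, dtype=float)
    centres = np.asarray(centres, dtype=float)
    # Squared Euclidean distance from every sample to every centre, shape (N, K)
    d2 = ((samples[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs if cluster_label is None else probs[:, cluster_label]
```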
- class ai_ct_scans.sectioning.TextonSectioner(filter_type='intensity', total_samples=100000, samples_per_image=50, kernels=None, blur_kernel=None, clusterers=None, clusterer_titles=None, medfilt_kernel=None)
Bases:
object
Section images using textons - generate per-pixel descriptors using convolution-based filters or simple intensity values, use clustering algorithms to separate these descriptors into classes, and enable sectioning of new images using the trained clusterers. TextonSectioner runs through axial views of patients from the dataset, by default randomly, to generate descriptors.
- build_sample_texton_set(threshold=None, random=True)
Cycle through images from the dataset and build up a texton sample set in self.texton_sample_set
- Parameters
threshold (int) – Value below which texton descriptors are not accepted into the training set. Typically 500, to rule out air, which otherwise dominates the training set
random (bool) – Whether to select images randomly from the MultiPatientAxialStreamer or simply step through the first patient followed by the second patient, etc.
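The threshold and samples_per_image parameters can be pictured as a per-image sampling step. Below is a minimal NumPy sketch, assuming descriptors are stored as (D, H, W), like the output of single_image_texton_descriptors, and that the threshold applies to raw pixel intensity. sample_descriptors is hypothetical.

```python
import numpy as np


def sample_descriptors(image, descriptors, samples_per_image=50, threshold=None, rng=None):
    """Randomly sample per-pixel descriptors from one image (hypothetical).

    image: 2D intensity array of shape (H, W).
    descriptors: array of shape (D, H, W), one D-length descriptor per pixel.
    Pixels with intensity below threshold (e.g. air at ~500) are excluded.
    Returns an array of shape (n, D) with n <= samples_per_image.
    """
    rng = np.random.default_rng() if rng is None else rng
    flat = descriptors.reshape(descriptors.shape[0], -1).T  # (H*W, D)
    if threshold is None:
        valid = np.arange(image.size)
    else:
        valid = np.flatnonzero(image.ravel() >= threshold)
    n = min(samples_per_image, valid.size)
    chosen = rng.choice(valid, size=n, replace=False)  # no pixel sampled twice
    return flat[chosen]
```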
- label_im(im, threshold=None, clusterer_ind=0, sub_structure_class_label=None, full_sub_structure=False)
Sections a new image with class labels assigned by a trained clusterer
- Parameters
im (np.ndarray) – 2D image to be labelled pixel-wise
threshold (int) – A value below which to assign all image pixel classes to -1. Useful for sectioning out air with a threshold of ~500 when clusterers have not been trained with air in the texton sample set
clusterer_ind (int) – The clusterer index in self.clusterers you wish to label the image with. Defaults to 0
sub_structure_class_label (int or None) – The class, predicted by the clusterer at clusterer_ind in self.clusterers, for which you wish to predict sub-class labels. This must be used with a self.clusterers[clusterer_ind] that has hierarchical-style labelling, i.e. has a predict_secondary method as in HierarchicalMeanShift. If None, only use a first order clusterer
full_sub_structure (bool, optional) – If a hierarchical clusterer has been selected with clusterer_ind, whether to return the full sub-class predictions rather than the first order predictions
- Returns
The class predictions, same shape as im
- Return type
(np.ndarray of ints)
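The threshold behaviour of label_im can be sketched with any predictor that maps an (N, D) descriptor array to N labels. This assumes (D, H, W) descriptors and that thresholding is applied to raw intensity after prediction. label_image is a hypothetical stand-in for the method.

```python
import numpy as np


def label_image(im, descriptors, predict, threshold=None):
    """Pixel-wise labelling with an optional air threshold (hypothetical).

    descriptors: (D, H, W) per-pixel descriptors for im.
    predict: callable mapping an (N, D) array to N integer labels,
    e.g. a fitted clusterer's predict method.
    """
    flat = descriptors.reshape(descriptors.shape[0], -1).T  # (H*W, D)
    labels = np.asarray(predict(flat)).reshape(im.shape)
    if threshold is not None:
        labels = np.where(im < threshold, -1, labels)  # -1 marks below-threshold pixels
    return labels
```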
- load(load_path)
Reload a TextonSectioner using pickle
- Parameters
load_path (pathlib Path) – Path to a pickled TextonSectioner
- probabilities_im(im, threshold=None, clusterer_ind=0, cluster_label=None, return_sub_structure=False, sub_structure_class_label=None)
Get the probability that each pixel in an image belongs to a particular class predicted by a clusterer, as well as the class predictions image itself
- Parameters
im (np.ndarray) – 2D image to be labelled pixel-wise
threshold (int) – A value below which to assign all image pixel classes to -1. Useful for sectioning out air with a threshold of ~500 when clusterers have not been trained with air in the texton sample set
clusterer_ind (int) – The clusterer index in self.clusterers you wish to label the image with. Defaults to 0
cluster_label (int) – The class for which to predict probabilities
return_sub_structure (bool) – Whether to use a secondary clusterer to predict the probabilities, e.g. from HierarchicalMeanShift
sub_structure_class_label (int) – If return_sub_structure is True, the sub-class label for which to predict probabilities
- Returns
First element: The class predictions for each pixel, second element: the probabilities image
- Return type
(tuple of np.ndarrays)
- save(out_path)
Save the TextonSectioner using pickle. Only the minimal set of clusterers, clusterer_titles and filters that are required to load and produce new predictions on images are saved.
- Parameters
out_path (pathlib Path to a .pkl file) – Where to save the TextonSectioner
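A common way to pickle only a minimal set of attributes is to define __getstate__. Below is a sketch under that assumption. MinimalSectioner and its texton_sample_set handling are illustrative only, and the real save may instead pickle a plain dict.

```python
import pickle
import tempfile
from pathlib import Path


class MinimalSectioner:
    """Hypothetical stand-in that pickles only what prediction needs."""

    _keep = ("clusterers", "clusterer_titles", "filters")

    def __init__(self, clusterers, clusterer_titles, filters, texton_sample_set):
        self.clusterers = clusterers
        self.clusterer_titles = clusterer_titles
        self.filters = filters
        self.texton_sample_set = texton_sample_set  # large training data, not saved

    def __getstate__(self):
        # Only these attributes go into the pickle
        return {name: getattr(self, name) for name in self._keep}

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.texton_sample_set = None  # must be rebuilt if further training is needed

    def save(self, out_path):
        with open(out_path, "wb") as f:
            pickle.dump(self, f)

    @staticmethod
    def load(load_path):
        with open(load_path, "rb") as f:
            return pickle.load(f)
```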
- single_image_texton_descriptors(im)
Get the texton descriptors for each pixel in an image
- Parameters
im (np.ndarray) – 2D image
- Returns
The texton descriptors for the image, shaped into (number of descriptors, *im.shape)
- Return type
(3D ndarray)
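The (number of descriptors, *im.shape) layout stacks one 2D response per descriptor. Below is a NumPy sketch. It assumes the 'intensity' filter type contributes the raw image and that each additional filter adds one channel. The 3x3 box mean is a stand-in for whatever kernels are configured.

```python
import numpy as np


def box_mean_3x3(im):
    """3x3 mean filter with edge padding, written with array slicing."""
    padded = np.pad(im, 1, mode="edge")
    h, w = im.shape
    return sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


def texton_descriptors(im, filters=(box_mean_3x3,)):
    """Stack raw intensity and filter responses into shape (D, *im.shape)."""
    im = np.asarray(im, dtype=float)
    return np.stack([im] + [f(im) for f in filters])
```

To feed a clusterer, the result can be flattened to one row per pixel with descriptors.reshape(D, -1).T.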
- train_clusterers(clusterer_inds=None)
Train each clusterer in self.clusterers against the texton dataset, or a subset of clusterers. If any clusterer fails to train, remove it from self.clusterers
- Parameters
clusterer_inds (list of ints, optional) – The indices of clusterers to train, defaults to train all of them
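The "remove failed clusterers" behaviour can be sketched as below, assuming each clusterer exposes an sklearn-style fit method and that titles are kept in step with clusterers. train_and_prune is hypothetical. Collecting failures first and filtering afterwards avoids shifting indices while iterating.

```python
def train_and_prune(clusterers, titles, samples, clusterer_inds=None):
    """Fit selected clusterers and drop any that raise, along with their titles.

    Returns the surviving (clusterers, titles) lists.
    """
    inds = range(len(clusterers)) if clusterer_inds is None else clusterer_inds
    failed = set()
    for i in inds:
        try:
            clusterers[i].fit(samples)
        except Exception:
            failed.add(i)  # remember the failure; remove after the loop
    kept = [i for i in range(len(clusterers)) if i not in failed]
    return [clusterers[i] for i in kept], [titles[i] for i in kept]
```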