Automated identification of hip replacement implants using artificial intelligence


Study design and X-ray acquisition

Following institutional review board approval, we retrospectively collected all radiographs taken between June 1, 2011 and December 1, 2020 at a teaching hospital. Images were collected with Neusoft PACS/RIS Version 5.5 on a personal computer running Windows 10. We confirm that all methods were performed in accordance with applicable guidelines and regulations. Images were collected from surgeries performed by three fellowship-trained joint replacement surgeons to ensure a variety of implant manufacturers and implant designs. All identifying information was removed from the images at the time of collection, anonymizing them. The implant type was identified from the operative note of the primary surgery and cross-checked against the implant sheets. Implant designs were included in our analysis only if more than 30 images per design were identified14.

From the medical records of 313 patients, a total of 357 images were included in this analysis.

Although Zimmer and Biomet merged (Zimmer Biomet), they were treated as two separate manufacturers. The following four implant systems (each a stem and cup pairing) from four industry-leading manufacturers were included: Biomet Echo Bi-Metric and Universal RingLoc (Zimmer Biomet), Depuy Corail and Pinnacle (Depuy Synthes), LINK Lubinus SP II and Vario cup, and Zimmer Versys FMT and Trilogy (Zimmer Biomet). Implant designs that did not meet the 30-image threshold were excluded. Figure 1 shows sample anteroposterior (AP) x-rays of the cup and stem of each implant model included. The four implant types are referred to as Type A, Type B, Type C, and Type D, respectively, in this article.

Figure 1

Sample x-rays of the cup and stem of each included implant model.

Presentation of the framework

We used convolutional neural network (CNN) based algorithms for hip implant classification. Our training data consist of anteroposterior (AP) view images of the hips. For each image, we manually cut the image into two parts: the stem and the cup. We trained four CNN models: the first using the stem images (stem network), the second using the cup images (cup network), the third using the original uncut images (combined network), and the fourth an integration of the trained stem and cup networks (joint network).

Since the models involve millions of parameters, while our dataset contained fewer than a thousand images, it was impossible to train a CNN model from scratch using our data. Therefore, we adopted the transfer learning framework to train our networks17. Transfer learning is a paradigm in the machine learning literature that is widely applied in scenarios where training data are sparse relative to the scale of the model18. In transfer learning, the model is first initialized from a model pre-trained on other datasets containing enough data for a different but related task. Next, we fit the model to our dataset by performing gradient descent (backpropagation) only on the last two layers of the network. As the number of parameters in the last two layers is comparable to the size of our dataset (for the target task), and the parameters of the earlier layers were fitted from the pre-trained model, the resulting network can achieve satisfactory performance on the target task.

In our case, the CNN models we used are based on the established ResNet50 network pre-trained on the ImageNet dataset19. The target task and training datasets correspond to the AP-view images of the hips (stem, cup, and combined).

Figure 2 shows the overview of the framework of our method based on deep learning.

Figure 2

Overview of the framework of our method based on deep learning.


Our dataset contained 714 images of 4 different implant types.

Image preprocessing

We followed standard procedures to pre-process our training data so that it could work with a network pre-trained on ImageNet. We resized each image to 224 × 224 pixels and normalized it using the standard ImageNet channel statistics. We also performed data augmentation (random rotations, horizontal flips, etc.) to increase the effective amount of training data and make our algorithm robust to image orientation.

Dataset partition

We first divided all patients into three groups of ~60% (group 1), ~30% (group 2), and ~10% (group 3). This split was performed on a per-design basis to ensure that the proportion of each implant remained constant across groups. We then used the cup and stem images of group 1 patients for training, those of group 2 for validation, and those of group 3 for testing. The validation set was used to compute validation loss for hyperparameter tuning and to determine early stopping.
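The per-design (stratified) patient split can be sketched in plain Python. The function name, dictionary layout, and seed are illustrative assumptions; the key point is that shuffling and splitting happen within each design so the 60/30/10 ratio holds per implant.

```python
import random
from collections import defaultdict

def split_patients(patient_designs, seed=0):
    """Split patients ~60/30/10 per implant design so each design's
    proportion is preserved in training/validation/test sets.
    `patient_designs` maps patient ID -> implant design label.
    (Illustrative helper; names and seed are assumptions.)"""
    by_design = defaultdict(list)
    for pid, design in patient_designs.items():
        by_design[design].append(pid)

    train, val, test = [], [], []
    rng = random.Random(seed)
    for pids in by_design.values():
        rng.shuffle(pids)
        n = len(pids)
        n_train = round(0.6 * n)
        n_val = round(0.3 * n)
        train += pids[:n_train]
        val += pids[n_train:n_train + n_val]
        test += pids[n_train + n_val:]   # remaining ~10%
    return train, val, test
```

Because the split is by patient, all images (stem and cup) from one patient land in the same set, which the joint-network construction below relies on.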

Model training

We adopted the ADAM adaptive gradient method20 for our models. Based on the validation loss, we chose the following hyperparameters for ADAM: learning rate α = 0.001, β1 = 0.9, β2 = 0.99, ε = 10^-8, and weight decay = 0. The maximum number of epochs was 1000 and the batch size was 16. The early-stopping threshold was set to 8. During the training of each network, the early-stopping threshold was reached after about 50 epochs. As mentioned above, we trained four networks in total.
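The early-stopping rule implied here (stop once the validation loss has failed to improve for 8 consecutive epochs) can be written as a small helper; this is a minimal sketch, with the class name and structure as assumptions. The reported ADAM configuration is shown as a comment (PyTorch syntax assumed).

```python
# Reported optimizer configuration (PyTorch syntax assumed):
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
#                              betas=(0.9, 0.99), eps=1e-8, weight_decay=0)

class EarlyStopping:
    """Stop training when validation loss has not improved for
    `patience` consecutive epochs (patience=8 in the text)."""
    def __init__(self, patience=8):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```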

The first network was trained on the stem images and the second on the cup images. The third network was trained on the original uncut images, which is one way we propose to combine the information in the stem and cup images. We further integrated the first and second networks as an alternative way to use the stem and cup images together. The integration was performed via the following method, based on logistic regression. We collected the outputs of the stem network and the cup network (each a 4-dimensional vector whose elements correspond to the classification weight the network assigns to each implant category), fed them into a two-layer neural network, and trained that network on data from the validation set. The integration is similar to a weighted voting procedure between the outputs of the stem network and the cup network, with the voting weights being learned from the validation dataset. Note that this construction relied on our dataset-splitting procedure, in which the training, validation, and test sets each contained the stem and cup images from the same set of patients. We call the resulting network, constructed from the outputs of the stem network and the cup network, the "joint network".
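The joint network's forward pass can be sketched in NumPy: the two 4-dimensional outputs are concatenated into an 8-dimensional vector and passed through a small two-layer network. The hidden width, activation, and random weights below are assumptions for illustration; in the study these weights would be fitted on the validation set.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Illustrative two-layer joint network over the concatenated outputs.
# Hidden width 16 and random (untrained) weights are assumptions.
W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)
W2, b2 = rng.normal(size=(4, 16)), np.zeros(4)

def joint_predict(stem_scores, cup_scores):
    """Combine the stem and cup networks' 4-dim outputs into a
    4-way confidence vector (a learned weighted-voting scheme)."""
    x = np.concatenate([stem_scores, cup_scores])  # 8-dim input
    h = np.maximum(W1 @ x + b1, 0.0)               # hidden layer (ReLU assumed)
    return softmax(W2 @ h + b2)                    # confidence per implant type
```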

Model testing

We tested our models (stem, cup, joint) using the test set. The prediction result on each test image was a 4-dimensional vector, each coordinate representing the classification confidence of the corresponding implant category.

Statistical analyses

Since we are studying a multi-class classification problem, we directly present the confusion matrices of our methods on the test data and compute generalized receiver operating characteristics for multi-class classification.
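For concreteness, a 4-class confusion matrix can be built as below; this minimal sketch is not the authors' analysis code, and in practice a library such as scikit-learn provides the same matrix together with one-vs-rest ROC curves for the multi-class case.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=4):
    """Rows = true implant type (0..3 for Types A-D),
    columns = predicted type. Minimal illustrative helper."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm
```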

Ethics Review Committee

The institutional review board approved the study with a waiver of informed consent as all images were anonymized prior to the time of the study.

