🧠 How I Accidentally Became a Medical AI Wizard (CheXNet, Kaggle GPUs & Pure Fear)
🎬 Prologue: An Intern, A Dream, and 112,120 Chest X-Rays
2019. Third year of my B.Tech.
Living the dream, which mostly meant eating Maggi noodles at 2 AM while debugging Java.
Then came an email:
“Congratulations! You’ve been selected to intern at Endimension Technology (IIT Bombay incubated startup).”
I was thrilled. Then I read the next line:
“You will implement the paper: CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays.”
That’s when I realized:
They were serious.
Welcome to Deep Learning Land — where the GPUs are free (sometimes) and the X-rays are many.
🛠️ Step 1: Imports — Because No Code Starts Without Worshipping the Gods of Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score
from PIL import Image
from tqdm.auto import tqdm
from typing import Dict
from pathlib import Path
import logging
import time
from prettytable import PrettyTable
from copy import deepcopy
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, random_split
from torchvision.models import densenet121
from torch.optim import Adam, lr_scheduler
import torchvision.transforms as tfms
import torchvision.transforms.functional as T
import albumentations as A
from albumentations.pytorch import ToTensorV2
- If Python libraries were Pokémon, this would be my deck.
- albumentations is cooler than torch.transforms! Fight me 😎
- Also, tqdm, because staring at a blank screen while training is bad for mental health.
We also create a config class. Why? Because OCD!
class CFG:
CLASS_NAMES = [
"Atelectasis",
"Cardiomegaly",
"Effusion",
"Infiltration",
"Mass",
"Nodule",
"Pneumonia",
"Pneumothorax",
"Consolidation",
"Edema",
"Emphysema",
"Fibrosis",
"Pleural_Thickening",
"Hernia",
]
BASE_PATH = Path("/kaggle/input/nih-chest-x-ray-14-224x224-resized")
BEST_MODEL_PATH = "models/best_model.pt"
EPOCHS = 20
DEVICE = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
INTERVAL = 10
📂 Step 2: Load and Process the Data — 112,000 X-Rays and Me, Alone
df = pd.read_csv(CFG.BASE_PATH / "Data_Entry_2017.csv")
df = df[["Image Index", "Finding Labels"]]
# Make paths absolute
df["Image Index"] = [
CFG.BASE_PATH / f"images-224/images-224/{path}" for path in df["Image Index"].values
]
# Remove "No Finding" rows
df = df[df["Finding Labels"] != "No Finding"]
labels = np.zeros(shape=(len(df), 14))
for idx, lab in tqdm(enumerate(df["Finding Labels"].values), total=len(df)):
lbls = lab.split("|")
lbl_arr = np.zeros(len(CFG.CLASS_NAMES))
for l in lbls:
lbl_arr[CFG.CLASS_NAMES.index(l)] = 1
labels[idx] = lbl_arr
d = {k: v for k, v in zip(CFG.CLASS_NAMES, labels.transpose())}
# Attach the label columns to the dataframe; without this,
# final.csv would have no label columns for the Dataset to read later
for k, v in d.items():
    df[k] = v
df.to_csv("final.csv", index=False)
Before our model could even think about becoming a medical genius, we had to wrestle a CSV file the size of a small city.
We started by cracking open the sacred scroll (Data_Entry_2017.csv), hoping to find neatly labeled chest X-rays.
Instead, we found:
- Duplicated file paths.
- Diseases crammed together, separated by a random pipe (|) symbol.
- "No Finding" labels: basically, people who walked into a hospital just to chill.
Step 2.1: Slicing Away the Noise
We chopped the dataset down to only two columns:
- The X-ray’s filename
- The doctor’s list of bad news
Because life is too short to scroll through 45 columns of metadata nobody cares about.
Step 2.2: Becoming Path Magicians
We then upgraded all image paths from sketchy relative references to full-blown absolute paths.
No more “file not found” errors at 2 AM while crying into your coffee.
We respect ourselves too much for that.
Step 2.3: Kicking Out the Party Crashers
Next, we yeeted all the images labeled “No Finding.”
If we wanted to teach a model to find nothing, we would’ve just shown it our GPA during exams.
This project is about finding something scary, or it’s not happening at all.
Step 2.4: Translating Human Disasters into Machine Language
Now came the real heavy lifting.
We took disease labels like "Pneumonia|Effusion|Mass" and smashed them into hot, spicy, multi-hot encoded vectors:
- 1 if the disease exists
- 0 if not
This process turned confusing English into zeroes and ones — the native language of sad GPUs everywhere.
Because deep learning models don’t read English,
they only read existential dread in tensor format.
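To make the translation concrete, here is what the loop from earlier does to a single label string (a minimal sketch reusing CFG.CLASS_NAMES):
lab = "Pneumonia|Effusion|Mass"
vec = np.zeros(len(CFG.CLASS_NAMES))
for l in lab.split("|"):
    vec[CFG.CLASS_NAMES.index(l)] = 1
print(vec)  # 1.0 at the Effusion, Mass and Pneumonia slots, 0.0 everywhere else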
Step 2.5: Zipping Up Like a Pro
Feeling powerful, we zipped disease names and vectors into a tidy little Python dictionary.
At this point, if you’re not carrying a dictionary everywhere you go,
are you even doing data science?
Step 2.6: Saving Our Masterpiece
Finally, like a proud artist signing their work, we saved the shiny, cleaned dataframe into a file named final.csv.
Because if you don't .to_csv(), was your suffering even real?
And just like that, from messy hospital logs to neat tensor-ready files, we took our first baby step toward making CheXNet a thing of beauty.
We can use a handy plotting function to see the X-rays:
def plot_images(df, rows, columns, figsize=(20, 20)):
"""
Function to plot images
"""
fig, axs = plt.subplots(rows, columns, figsize=figsize)
idx = 0
for i in range(rows):
for j in range(columns):
image = np.asarray(Image.open(df["Image Index"].values[idx]).convert("RGB"))
labels = df["Finding Labels"].values[idx].split("|")
axs[i, j].imshow(image)
axs[i, j].yaxis.set_visible(False)
axs[i, j].set_xticklabels([])
axs[i, j].set_xlabel(labels)
idx += 1
plt.show()
df = pd.read_csv("./final.csv")
plot_images(df, 5, 5)
Step 3: Data Transformation — Where Data Gets a Glow-Up 💪✨
class CheXNetData(Dataset):
def __init__(self, df, transform=None):
self.df = df
self.transform = transform
def __getitem__(self, idx):
image = Image.open(self.df["Image Index"].values[idx]).convert("RGB")
label = self.df.iloc[:, 2:].values[idx]
if self.transform:
image = self.transform(image=np.asarray(image))["image"]
label = torch.tensor(label, dtype=torch.float)
return image, label
    def __len__(self):
        # use self.df, not the global df, so splits and subsets behave correctly
        return len(self.df)
transforms = A.Compose(transforms=[A.Normalize(), ToTensorV2()])
ds = CheXNetData(df=df, transform=transforms)
# Split into Train & Test
total = len(ds)
train_len = int(0.8 * total)
val_len = total - train_len
train_ds, test_ds = random_split(dataset=ds, lengths=[train_len, val_len])
# Split into Train and Validation
total = len(train_ds)
train_len = int(0.8 * total)
val_len = total - train_len
train_ds, val_ds = random_split(dataset=train_ds, lengths=[train_len, val_len])
3.1: Wrapping Data in a Dataset — The First Step Towards Data Enlightenment
Meet CheXNetData — the elegant wrapper for all your chest X-ray dreams.
It takes in the DataFrame (df) and your magic transformations (transform), and turns them into a ready-to-go PyTorch Dataset.
Transform is like the hot sauce you add to your data:
It’s optional, but if you don’t use it, you’ll probably regret it.
3.2: Serving Data Like a Five-Star Restaurant — Image & Label Retrieval
__getitem__ is like a room service menu for your neural network.
You give it an index, and it serves you hot, fresh X-ray images with the correct diagnosis (labels) on the side.
Image is loaded in RGB because no one has time for grayscale anymore.
It's like giving your model a 1080p image instead of a VHS tape.
3.3: Transformations — When Data Decides to Hit the Gym
Transformation time: This is where the image hits the gym. 🏋️♂️
If you’ve got transformations, the image normalizes and becomes a tensor (no resizing needed here; the dataset comes pre-resized to 224x224). Basically, it goes from couch potato to elite athlete.
3.4: Normalize — The Secret to Staying Balanced
Normalize: Because even data needs to stay balanced! This shifts and scales the pixel values so no channel throws its weight around in the data pool. Think of it like making sure every player on the basketball team is the same height. It just works better.
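For the curious: A.Normalize() with no arguments falls back to the ImageNet statistics, which is exactly what a pretrained DenseNet expects. Spelled out, it is equivalent to:
# A.Normalize() defaults to the ImageNet channel statistics
# (and divides by max_pixel_value=255 first)
normalize = A.Normalize(
    mean=(0.485, 0.456, 0.406),  # per-channel means
    std=(0.229, 0.224, 0.225),   # per-channel stds
)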
3.5: ToTensorV2 — The Transformation to Deep Learning Nirvana
Finally, we convert to Tensor because your model won’t understand anything that’s not a tensor.
Your images are now officially transformed from raw data into glorious PyTorch tensors, ready to train on the holy land of GPUs. 🌟
3.6: Torchify Labels — The Golden Ticket to Deep Learning
Here we torchify the label.
Why? Because PyTorch won’t speak to anything that isn’t a tensor.
We take labels like [0, 1, 0, 0, ...] and wrap them in the torch tensor format.
Now they’re ready for the deep learning buffet, no questions asked.
3.7: Return Image and Label — Like a Perfect Date
And just like that, you return the image and label — it’s the perfect date:
No awkward silences, no bad food, just pure harmony.
The neural network receives what it craves: one healthy image, one perfect label, and zero confusion.
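Before letting the DataLoaders loose, a quick sanity check never hurts. A small sketch, assuming the 224x224 resized images from the Kaggle dataset:
image, label = ds[0]
print(image.shape)  # torch.Size([3, 224, 224]) after ToTensorV2
print(label.shape)  # torch.Size([14]), one slot per disease
print(label)        # the multi-hot vector for the first X-ray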
Step 4: Setting Up DataLoaders and Model — The Real Deal with Training 💻🔥
ds_size = {
"train": len(train_ds), "val": len(val_ds), "test": len(test_ds)
} # Size dictionary
loaders = {
"train": DataLoader(train_ds, batch_size=64, shuffle=True),
"val": DataLoader(val_ds, batch_size=32),
"test": DataLoader(test_ds, batch_size=32)
}
class DenseNet121(nn.Module):
def __init__(self, n_classes):
super(DenseNet121, self).__init__()
self.densenet121 = densenet121(pretrained=True)
n_features = self.densenet121.classifier.in_features
self.densenet121.classifier = nn.Sequential(
nn.Linear(n_features, n_classes), nn.Sigmoid()
)
def forward(self, x):
x = self.densenet121(x)
return x
# create the models folder if it doesn't already exist
Path("models").mkdir(exist_ok=True)
# Set up logger
logging.basicConfig(
filename="train.log",
format="%(asctime)s - %(levelname)s - %(message)s",
level=logging.INFO,
filemode="w",
)
4.1: Creating the Size Dictionary — Because We Love Data Stats 📊
- ds_size — The ultimate cheat sheet for your data.
- Here, we create a dictionary that holds the sizes of each dataset: training, validation, and testing.
- Because knowing how much data you have is basically the first step to world domination (or model training).
- It’s like a map of your data empire. Don’t start training without knowing what you’re dealing with!
4.2: DataLoaders — The Delivery Guys for Your Data 🍕
- DataLoaders: These are the delivery drivers for your data, making sure it’s delivered in neat, manageable batches.
- Train gets 64 items per batch because training is hungry and needs all the data it can consume.
- Validation and test get 32 per batch because, let’s face it, they’re not as demanding.
- They also shuffle the data for training because your neural network hates predictability. It loves a little mystery!
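If you want to see the delivery drivers in action, grab one batch and peek (a quick sketch):
images, labels = next(iter(loaders["train"]))
print(images.shape)  # torch.Size([64, 3, 224, 224]): a batch of 64 X-rays
print(labels.shape)  # torch.Size([64, 14]): one multi-hot vector per image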
4.3: DenseNet121 — The Model That’s Ready to Flex 💪
- DenseNet121 is like the Arnold Schwarzenegger of deep learning models: big, strong, and ready to handle complex data.
- It’s a pretrained model, so it already has muscles from training on a massive dataset.
- We tweak it to make it specialized for our task by adjusting the final layer for n_classes (the number of possible labels).
- The Sigmoid is the cherry on top, making sure it spits out probabilities for each class. 🍒
4.4: The Forward Pass — Where the Model Struts Its Stuff 🏆
- The forward pass is where all the magic happens.
- You give the model an image, and it goes through DenseNet121, like a model strutting down a catwalk, processing the image and spitting out predictions.
- No more talking, just action. The model does what it’s trained to do: predict the class labels with some swagger.
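A tiny smoke test makes the catwalk visible. A sketch only; the real instantiation happens in Step 5:
model = DenseNet121(n_classes=14).to(CFG.DEVICE)
dummy = torch.randn(2, 3, 224, 224).to(CFG.DEVICE)  # a fake batch of two "X-rays"
with torch.no_grad():
    out = model(dummy)
print(out.shape)  # torch.Size([2, 14]); every entry is a probability, thanks to the Sigmoid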
4.5: Model Folder Creation — Keeping It Organized 📁
- Here, we create a folder for models because even AI needs a home!
- If the folder already exists, we just give it a thumbs-up and move on.
- Organization is key — because who wants to dig through a mess of models and weights later? Not you. You’ve got this.
- Plus, this way you won’t be caught storing models in the wrong drawer. 🎯
4.6: Set Up Logger — Because You Want to Keep Track of Everything 📝
- Logging: This is where the journalist in you comes out.
- You set up a log file where every important event during training gets recorded, from successes to failures (and let’s be honest, there will be a lot of those).
- You’ll know exactly what went down during training because nothing escapes the log.
- It’s like setting up a diary for your model’s emotional journey. 🖋️
4.7: Model Ready for Action — Time to Train! 🚀
At this point, everything is set:
- Data is organized and shuffled.
- DenseNet121 is ready to rock.
- The log file is ready to catch all the drama.
- Your model folder is all set for weight storage.
Step 5: Training the Model — Let the Deep Learning Adventures Begin 🎢💥
def calc_mean_auc(labels: torch.Tensor, preds: torch.Tensor):
labels = labels.cpu().detach().numpy()
preds = preds.cpu().detach().numpy()
per_class_AUROC = []
for i, name in enumerate(CFG.CLASS_NAMES):
try:
per_class_AUROC.append(roc_auc_score(labels[:, i], preds[:, i]))
except ValueError:
pass
mean_roc_auc = np.array(per_class_AUROC).mean()
return mean_roc_auc
best_AUROC = 0.0 # Global AUROC
def run_one_epoch(
epoch: int,
ds_sizes: Dict[str, int],
dataloaders: Dict[str, DataLoader],
model: nn.Module,
optimizer: torch.optim.Optimizer,
    criterion: nn.Module,
    scheduler: lr_scheduler.ReduceLROnPlateau
):
"""
Run one complete train-val loop
    Parameters
    ----------
    epoch: Current epoch number
    ds_sizes: Dictionary containing dataset sizes
    dataloaders: Dictionary containing dataloaders
    model: The model
    optimizer: The optimizer
    criterion: The loss function
    scheduler: The LR scheduler
Returns
-------
metrics: Dictionary containing metrics
"""
global best_AUROC
metrics = {}
AUROCs = []
for phase in ["train", "val"]:
logging.info(f"{phase.upper()} phase")
if phase == "train":
model.train()
else:
model.eval()
        avg_loss = 0.0
for batch_idx, (images, labels) in enumerate(
tqdm(dataloaders[phase], total=len(dataloaders[phase]))
):
images = images.to(CFG.DEVICE)
labels = labels.to(CFG.DEVICE)
# Zero the gradients
optimizer.zero_grad()
            # Track gradients only in the train phase
with torch.set_grad_enabled(phase == "train"):
outputs = model(images)
loss = criterion(outputs, labels)
if phase == "train":
loss.backward()
optimizer.step()
# Calculate AUROC
auroc = calc_mean_auc(labels, outputs)
AUROCs.append(auroc)
avg_loss += loss.item() * images.size(0)
if batch_idx % CFG.INTERVAL == 0:
logging.info(
f"Epoch {epoch} - {phase.upper()} - Batch {batch_idx} - Loss = {round(loss.item(), 3)} | AUROC = {round(auroc, 3)}"
)
epoch_loss = avg_loss / ds_sizes[phase]
epoch_val_mean = np.array(AUROCs).mean()
# step the scheduler
if phase == "train":
scheduler.step(epoch_loss)
# save best model wts
if phase == "val" and epoch_val_mean > best_AUROC:
best_AUROC = epoch_val_mean
best_model_wts = deepcopy(model.state_dict())
timestampTime = time.strftime("%H%M%S")
timestampDate = time.strftime("%d%m%Y")
timestampEND = timestampDate + '-' + timestampTime
best_model_path = f"models/CheXNet-{timestampEND}.pt"
torch.save({
"epoch" : epoch, "val_loss": epoch_loss, "val_AUROC": epoch_val_mean, "model": best_model_wts
}, best_model_path)
# Metrics tracking
if phase == "train":
metrics["train_loss"] = round(epoch_loss, 3)
else:
metrics["val_loss"] = round(epoch_loss, 3)
metrics["val_mean_AUROC"] = round(epoch_val_mean, 3)
return metrics
def train(dataloaders, ds_sizes, model, optimizer, criterion, scheduler):
table = PrettyTable(
field_names=["Epoch", "Train Loss", "Val Loss", "Val Mean AUROC"]
)
for epoch in range(CFG.EPOCHS):
start = time.time()
metrics = run_one_epoch(
epoch=epoch,
ds_sizes=ds_sizes,
dataloaders=dataloaders,
model=model,
optimizer=optimizer,
            criterion=criterion,
scheduler=scheduler
)
end = time.time() - start
print(f"Epoch completed in: {round(end/60, 3)} mins")
table.add_row(
row=[
epoch + 1,
metrics["train_loss"],
metrics["val_loss"],
metrics["val_mean_AUROC"]
]
)
print(table)
# Write results to file
with open("results.txt", "w") as f:
results = table.get_string()
f.write(results)
model = DenseNet121(n_classes=14).to(CFG.DEVICE)
criterion = nn.BCELoss()
optimizer = Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999), weight_decay=1e-4)
scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, mode='min', patience=1)
train(
dataloaders=loaders,
ds_sizes=ds_size,
model=model,
optimizer=optimizer,
criterion=criterion,
scheduler=scheduler
)
5.1: Calculating AUROC — Because Metrics Are Everything 📈
- First, we introduce the magic of AUROC (Area Under the ROC Curve).
- This function takes in the true labels and the predictions, then calculates the mean AUROC across all classes.
- It goes through each class, checks how well our model is performing for that class, and then averages it out. The better the AUROC, the better your model.
- If it hits a ValueError (a class with only positives or only negatives in a batch makes AUROC undefined), we just skip it like that one friend who can’t handle their drinks. (Toy example below.)
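Here is that toy example for a single class (numbers borrowed from the scikit-learn docs):
y_true = [0, 0, 1, 1]             # ground truth for one class
y_scores = [0.1, 0.4, 0.35, 0.8]  # model probabilities
print(roc_auc_score(y_true, y_scores))  # 0.75: it mostly ranks positives above negatives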
5.2: Run One Epoch — The Training Loop Begins! 🚀
- This is where the real grind happens: running through the train and validation phases.
- Training mode: The model is learning, and we keep updating the weights with every batch.
- Validation mode: The model takes a break from learning and just evaluates its progress. It’s like a student going from studying to giving the final exam.
- Here, optimizer.zero_grad() is like cleaning your board before starting fresh. No past mistakes allowed!
- Loss calculation and backpropagation are like saying: “Alright, model, here’s how bad you did, now learn from it!”
- We also calculate AUROC for every batch. Because performance matters.
5.3: Epoch Metrics — Let’s Keep Track of Our Achievements 📊
- After each epoch (training round), we log the loss and AUROC for both training and validation phases.
- The tqdm progress bar is like that friend who says, “Almost there, just 5 more minutes!”
- If you’re in the training phase, expect the model to be in its zone and pushing weights like a personal trainer. If it’s in validation, it’s critiquing its own form while watching others.
- We track the mean AUROC to make sure the model isn’t just “coasting” and actually getting better.
5.4: Saving the Best Model — Because We Love to Save the Best for Last 🏆
- After every validation phase, we check if the model has improved. If so, we save it because best model = best results.
- You’ll also notice we name the file with a timestamp, because, let’s face it, saving a model is a momentous event.
- No time to waste, the model is saved right away with all the important details: epoch, loss, and AUROC. This way, even if you have to quit unexpectedly, your model is immortalized in time. ⏳
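And when you want that immortalized model back, loading the checkpoint looks like this (a sketch; the real filename carries whatever timestamp it was saved with, so the name below is a placeholder):
ckpt = torch.load("models/CheXNet-<timestamp>.pt", map_location=CFG.DEVICE)  # placeholder filename
model.load_state_dict(ckpt["model"])
print(ckpt["epoch"], ckpt["val_loss"], ckpt["val_AUROC"])  # the details saved alongside the weights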
5.5: Metrics Tracking — What’s the Score? 🏅
- As the training goes on, we keep track of the train loss and validation loss.
- We also track the mean AUROC in the validation phase. The model is getting feedback on how well it’s performing.
- If you’re into keeping score, this is your scoreboard.
5.6: The Training Table — Results at a Glance 📅
- We use PrettyTable to display a clean table with all your results.
- Epoch, Train Loss, Val Loss, and Val Mean AUROC get neatly presented for you to admire the model’s progress.
- The PrettyTable is like your personal assistant who organizes all your work in a pretty, colorful table for easy digestion. You can even print it out later for posterity. 😎
5.7: The Grand Finale — Time to Train! 🏁
- The final part is where training kicks off, with the model, optimizer, and scheduler all working in tandem.
- The optimizer updates the weights, and the scheduler dials down the learning rate when the loss plateaus. Together, they are like coaches and referees, making sure the model trains hard, without getting tired.
- And then… we just let it run! Sit back, relax, and watch as your model trains, validates, and iterates. It’s a long journey, but with every epoch, you’re one step closer to that best model.
This is what the final table looks like after all the epochs are done:
+-------+------------+----------+----------------+
| Epoch | Train Loss | Val Loss | Val Mean AUROC |
+-------+------------+----------+----------------+
| 1 | 0.284 | 0.302 | 0.691 |
| 2 | 0.273 | 0.317 | 0.709 |
| 3 | 0.269 | 0.273 | 0.735 |
| 4 | 0.265 | 0.27 | 0.748 |
| 5 | 0.264 | 0.284 | 0.752 |
| 6 | 0.261 | 0.267 | 0.759 |
| 7 | 0.259 | 0.262 | 0.765 |
| 8 | 0.258 | 0.266 | 0.767 |
| 9 | 0.257 | 0.276 | 0.764 |
| 10 | 0.256 | 0.264 | 0.771 |
| 11 | 0.255 | 0.261 | 0.776 |
| 12 | 0.253 | 0.284 | 0.772 |
| 13 | 0.253 | 0.269 | 0.776 |
| 14 | 0.252 | 0.263 | 0.781 |
| 15 | 0.251 | 0.272 | 0.781 |
| 16 | 0.25 | 0.263 | 0.782 |
| 17 | 0.249 | 0.259 | 0.786 |
| 18 | 0.249 | 0.261 | 0.788 |
| 19 | 0.248 | 0.262 | 0.787 |
| 20 | 0.247 | 0.261 | 0.79 |
+-------+------------+----------+----------------+
Step 6: Testing the Model — The Final Exam 🚨📚
with torch.no_grad():
model.eval()
out_gt = torch.FloatTensor()
out_pred = torch.FloatTensor()
for images, labels in tqdm(loaders["test"]):
images = images.to(CFG.DEVICE)
labels = labels.to(CFG.DEVICE)
outputs = model(images)
outputs = outputs.cpu().detach()
out_gt = torch.cat((out_gt, labels.cpu().detach()), 0)
out_pred = torch.cat((out_pred, outputs.data), 0)
labels = out_gt.numpy()
preds = out_pred.numpy()
per_class_AUROC = []
print("-----PER - CLASS AUROC------")
for i, name in enumerate(CFG.CLASS_NAMES):
    try:
        score = roc_auc_score(labels[:, i], preds[:, i])
        per_class_AUROC.append(score)  # keep it so we can average later
        print(f"{name} - {round(score, 3)}")
    except ValueError:
        # a class with no positives (or no negatives) in the test split has no AUROC
        pass
6.1: Testing Mode — The Moment of Truth 🧐
- It’s test time, my friend! The model is in evaluation mode, and there’s no going back now.
- torch.no_grad() is like telling the model, “Hey, no more learning today, just make predictions.” It’s on vacation — no backpropagation, no gradients, just a chill, predict-only mode.
- The model is still doing its thing, making predictions, but we’re not updating anything. Think of it like a student who’s already finished their assignment, they’re just waiting for the results.
6.2: Gathering Predictions — Like Collecting Your Report Card 📝
- As the test data flows in, we’re collecting the predictions in out_pred and the true labels in out_gt.
- It’s like collecting test results in two neat piles: one for the actual answers (labels), and one for the predicted answers (outputs).
- With each batch, we stack them up with torch.cat, creating a tall tower of results. It’s like building a stack of papers and hoping your grades are good enough for a celebration. 📚
6.3: Calculating AUROC — The Big Reveal 🎤
- Now it’s time to calculate the AUROC for each class, which is like checking how many of your predictions were correct.
- The roc_auc_score will tell us how well the model predicted each class. A score closer to 1 means the model is pretty much a genius, and closer to 0 means it’s still learning (but we’re not mad, we all start somewhere).
- We print the AUROC for each class and give a round of applause to the classes that did well, and maybe a bit of a pep talk for the ones that didn’t. 🏅
6.4: Final Report — Time to Celebrate or Cry 😭
- After calculating the AUROC for each class, we can see how well the model did on each of the 14 thoracic diseases.
- If you get a high AUROC, you can pat yourself on the back — your model is officially a radiologist in training!
- If you get a low AUROC, don’t worry — it’s like the model’s first test. It’s just the beginning. Give it more data, more training, and it’ll get better. Rome wasn’t built in a day!
- This is the final test before you send the model into the real world to detect pneumonia in chest X-rays.
6.5: The Conclusion — Mission Complete ✅
- After all the hard work, you have a trained model that can evaluate test data and output meaningful results.
- You’ve just taken a machine learning model from scratch to testing greatness. Congratulations! 🎉
- Now, you know how to test, evaluate, and calculate metrics to see how well your model is performing.
- The job’s done, and it’s time to celebrate with some well-deserved rest… or maybe another project. But for now, enjoy the victory!
The Grand Reveal: Test Results — The Moment of Glory (Or Despair) 🎉
After battling code errors, optimizer struggles, and trying to understand AUROC scores (which sounds like a mystical spell), the moment of truth arrived — the final test results were in! 🧙♂️
Here’s how my model did across the 14 disease categories. Drumroll, please… 🥁
-----PER - CLASS AUROC------
Atelectasis - 0.735
Cardiomegaly - 0.882
Effusion - 0.82
Infiltration - 0.673
Mass - 0.788
Nodule - 0.728
Pneumonia - 0.647
Pneumothorax - 0.799
Consolidation - 0.689
Edema - 0.832
Emphysema - 0.858
Fibrosis - 0.77
Pleural_Thickening - 0.719
Hernia - 0.846
- Atelectasis — 0.735: Almost there! We’re working on it. Can we get a little more precision on that lung collapse?
- Cardiomegaly — 0.882: BOOM! Nailed it! The heart’s got it. Looks like we might just become cardiologists too. 💖
- Effusion — 0.82: Not bad, not bad at all! Water in the lungs? This model knows. 💦
- Infiltration — 0.673: Uh-oh, seems like the model might be a little confused here. Time for some fine-tuning, maybe? 🤔
- Mass — 0.788: That’s a decent score. Could probably tell a tumor from a shadow, but let’s not get too cocky yet. 🧠
- Nodule — 0.728: Nodules… no big deal. But hey, we’re on the right track! 👏
- Pneumonia — 0.647: Oof. Pneumonia, you gave me some trouble, huh? Looks like we might need a few more training epochs here. 🦠
- Pneumothorax — 0.799: Almost as cool as a pneumothorax is for a diagnosis. Let’s give this model an applause! 👏
- Consolidation — 0.689: A little bit of lung consolidation here and there. Definitely room for improvement! 🫁
- Edema — 0.832: Like water in the lungs, but this time the model’s got it! 💧
- Emphysema — 0.858: Woohoo! High marks for emphysema. Maybe it’s because we all know how to blow out air. 🌬️
- Fibrosis — 0.77: Fibrosis is tricky, but this model is showing promise. Almost there! 🌱
- Pleural_Thickening — 0.719: The model’s on its way. Just needs to learn to recognize the pleural party better. 🫀
- Hernia — 0.846: What a champ! If the model can handle a hernia, nothing’s impossible. 💪
So, what do we have here?
The model’s been through the wringer, and while some categories (I’m looking at you, Pneumonia 😡) could use a little more work, others are knocking it out of the park. 🎯
The average AUROC is pretty solid, but like any true wizard, the work’s never done. Fine-tuning is the name of the game — and I’ve got my wand ready for more epochs. 🧙♂️💫
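For the record, "pretty solid" is plain arithmetic over the fourteen scores above (using the per_class_AUROC list we filled in Step 6):
mean_auroc = np.mean(per_class_AUROC)
print(round(mean_auroc, 3))  # ≈ 0.77 here, versus roughly 0.84 on average in the CheXNet paper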
But, Here’s the Real Struggle — Replicating Those Paper Results 😅
Here’s the thing: replicating a paper’s results is harder than it looks. It’s like trying to get your spellcasting abilities to match a master wizard’s. Sure, the paper authors made it look easy, but getting CheXNet to match their results? Let’s just say, not as simple as waving a wand. 🪄
While my model has been putting in the work, I still haven’t matched the paper’s results exactly — and honestly, that’s okay. I know exactly what you’re thinking: “But the paper said it was perfect! Why am I getting different numbers?” 😱
Well, getting paper results is tricky. It could be because of:
- Data preprocessing differences 🧐
- Hardware setups that vary (those Kaggle GPUs are great, but maybe not quite the same as the research lab machines)
- Random seeds being set differently (because, in deep learning, reproducibility is like trying to get the same pizza topping combination every single time 🍕).
- Hyperparameter tuning that may still be off by a tiny margin 🔧
So while I still have a few tweaks left to do (especially with pneumonia detection 😡), that doesn’t mean I haven’t made real progress. Sometimes, it’s all about iteration, testing, and fine-tuning — the true wizardry of deep learning.
But don’t worry! I’m not giving up just yet!
If there’s one thing I’ve learned from this journey, it’s that AI doesn’t hand you magic on a silver platter. You have to work for it. I’m still in the lab, tinkering away with the model, trying to get everything just right. But hey, I’ll keep at it, because that’s the fun part of Deep Learning — the process!
So, there you have it — the accidental journey of becoming a Medical AI Wizard!
From Kaggle GPUs to fine-tuning my CheXNet model, and everything in between, I’ve learned a lot. Sure, the paper results are still a distant dream, but I’m confident that with more epochs, data wrangling, and some tweaks, we’ll get there! 💪✨
And who knows? Maybe one day I’ll open up that test result file and see those perfect numbers staring back at me. Until then, I’m just one step closer to becoming the next AI medical wizard. 🧙♂️⚡