'C'linical photographs 'A'nnotated by 'N'eural networks ― CAN dataset

img

Specifying the lesion in this manner is necessary to enhance the performance of CNNs, but it demands a significant amount of time and effort from dermatologists. As part of this project, we constructed a dataset of 5,619 images of melanoma and melanocytic nevus by crawling pictures from the internet and annotating them with the assistance of ModelDerm Build2021 (https://modelderm.com; 'C'linical photographs 'A'nnotated by 'N'eural networks = CAN dataset). Like the ImageNet dataset, this dataset is composed of labeled images from the internet. We also compiled LESION130k, which includes 132,673 lesion images obtained from 18,482 websites across roughly 80 countries, for unsupervised training.

Examples of the synthetic melanoma, nevus, and morphed images

img

A total of 5,000 synthetic images (GAN5000 dataset) were generated using the generative network (StyleGAN2-ADA; Training = CAN2000, Pre-training = LESION130k).

All synthetic images (jpg): https://doi.org/10.6084/m9.figshare.21507189

Web-demo: https://modelderm.com/thismoledoesnotexist

Turing test: https://modelderm.com/turing/?q=1 (q = 1 to 19, one value per test set)

CNN Model Performance

img

The EfficientNet-Lite0 trained on the annotated (CAN5600) or synthetic (GAN5000) images achieved a mean AUC higher than or equivalent to that of the EfficientNet-Lite0 trained on the pathologically confirmed public dataset.

Data repository

Path                        Description
Main                        Main directory
  ├  DATASET                Dataset
    ├  CAN2000.csv          CAN2000 dataset (URL, crop location)
    ├  CAN5600.csv          CAN5600 dataset (URL, crop location)
    ├  LESION130k.csv       LESION130k dataset (URL, crop location)
    ├  download.py          Download script
    ├  CAN2000_...torrent   CAN2000, CAN5600, LESION130k images (.torrent; 10.57GB)
  ├  SCRIPTS                Scripts
    ├  DATA                 Training and test datasets
      ├  asantest           Asan test dataset
      ├  snu                Subset of SNU dataset
      ├  gan2000            2,006 GAN images
      ├  gan5000            5,619 GAN images
      ├  gan5600            GAN5600 dataset
    ├  out_source_0513      An example of projecting images to latent space (nevus; seed0513)
    ├  out_source_1119      An example of projecting images to latent space (melanoma; seed1119)
    ├  morph.py             Morphing script
    ├  project.py           Projection script (from STYLEGAN2-ADA-PYTORCH)
    ├  run_all.py           Run all configurations using train.py
    ├  train.py             Training EfficientNet and testing with various datasets
  ├  Result                 Results
    ├  raw_AUC_ACC...       Raw results of the paper (AUC, ACC, SE, SP, PPV, NPV)

1. Download CAN5600 / CAN2000 / LESION130k datasets

This repository contains only the download URLs of the datasets. The images themselves are available through the following torrent (10.57GB): https://github.com/whria78/can/raw/main/DATASET/CAN2000_CAN5600_GAN5000_DATASET.zip.torrent

Please use qBittorrent or another torrent client for downloading: https://www.qbittorrent.org/

Alternatively, the images can be downloaded from the URLs with download.py. Install the dependencies first.

pip install wget opencv-python numpy

The script downloads the raw images and generates the dataset. Some images may be unavailable because their links have been deleted.

# Use python3 on Linux and python on Windows.
# CAN5600
python3 download.py CAN5600.csv
# or, on Windows: python download.py CAN5600.csv
# CAN2000
python3 download.py CAN2000.csv
# or, on Windows: python download.py CAN2000.csv
# LESION130k
python3 download.py LESION130k.csv
# or, on Windows: python download.py LESION130k.csv
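
The CSV files pair each image URL with a crop location. The sketch below illustrates how such a record could be turned into a crop box that stays inside the downloaded image. It is not the code of download.py: the column names (url, x, y, w, h) and the helper names are hypothetical, chosen only for illustration, and the actual CSV layout may differ.

```python
import csv
import io
from typing import NamedTuple


class CropRecord(NamedTuple):
    url: str
    left: int
    top: int
    right: int
    bottom: int


def read_crop_records(csv_text: str) -> list:
    """Parse rows with the hypothetical columns url,x,y,w,h."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        x, y, w, h = (int(row[key]) for key in ("x", "y", "w", "h"))
        records.append(CropRecord(row["url"], x, y, x + w, y + h))
    return records


def clamp_box(rec: CropRecord, img_w: int, img_h: int) -> tuple:
    """Clip a crop box to the image bounds so that slicing never leaves the image."""
    left = max(0, min(rec.left, img_w))
    top = max(0, min(rec.top, img_h))
    right = max(left, min(rec.right, img_w))
    bottom = max(top, min(rec.bottom, img_h))
    return left, top, right, bottom


# Made-up example row; the URL is a placeholder, not part of the dataset.
sample = "url,x,y,w,h\nhttps://example.com/a.jpg,10,20,300,300\n"
rec = read_crop_records(sample)[0]
print(clamp_box(rec, img_w=256, img_h=400))  # (10, 20, 256, 320)
```

With an image loaded by OpenCV as a NumPy array, the clamped box would be applied as image[top:bottom, left:right], since rows index the vertical axis.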

2. Download Public Datasets

3. Training GAN Models

For GAN training, we used the StyleGAN2-ADA configuration provided in the StyleGAN3 repository (https://github.com/NVlabs/stylegan3).

# Data Preparation
# Scale the images down to 512x512 resolution using dataset_tool.py
# https://github.com/NVlabs/stylegan3/blob/main/dataset_tool.py
# python3 [LINUX] / python [WINDOWS] 
python3 dataset_tool.py --source=[DATA/LESION130k] --dest=[DATA/LESION130k_512] --resolution=512x512
python3 dataset_tool.py --source=[DATA/CAN2000/malignantmelanoma] --dest=[DATA/CAN2000_mel512] --resolution=512x512
python3 dataset_tool.py --source=[DATA/CAN2000/melanocyticnevus] --dest=[DATA/CAN2000_n512] --resolution=512x512
python3 dataset_tool.py --source=[DATA/CAN2000] --dest=[DATA/CAN2000_512] --resolution=512x512


# Pretrain
# Train the pretrained GAN model of general skin disorders (LESION130k)
# https://github.com/NVlabs/stylegan3/blob/main/train.py
python3 train.py --outdir=training-runs --data=[DATA/LESION130k_512] --mirror=1 --gpus=2 --gamma=8.2 --cfg=stylegan2 --batch=16 --batch-gpu=8 --map-depth=2 --glr=0.003 --dlr=0.003 --resume=ffhq512.pkl --kimg=10000 --snap=10 

# https://github.com/NVlabs/stylegan3/blob/main/gen_images.py
python3 gen_images.py --network=c10000.pkl --seeds=0-9999 --outdir=c10000


# Training and Generating GAN5000 dataset
python3 train.py --outdir=training-runs --data=[DATA/CAN2000_mel512] --mirror=1 --gpus=1 --gamma=32 --cfg=stylegan2 --kimg=500 --snap=1 --map-depth=2 --batch=16 --batch-gpu=8 --glr=0.003 --dlr=0.003 --resume=c10000.pkl --freezed=13

python3 gen_images.py --network=m500.pkl --seeds=0-2500 --outdir=[GAN5000/malignantmelanoma]

python3 train.py --outdir=training-runs --data=[DATA/CAN2000_n512] --mirror=1 --gpus=1 --gamma=32 --cfg=stylegan2 --kimg=500 --snap=1 --map-depth=2 --batch=16 --batch-gpu=8 --glr=0.003 --dlr=0.003 --resume=c10000.pkl --freezed=13

python3 gen_images.py --network=n500.pkl --seeds=0-2500 --outdir=[GAN5000/melanocyticnevus]


# Training for Morphing
python3 train.py --outdir=training-runs --data=[DATA/CAN2000_512] --mirror=1 --gpus=2 --gamma=32 --cfg=stylegan2 --kimg=500 --snap=1 --map-depth=2 --batch=32 --batch-gpu=8 --glr=0.003 --dlr=0.003 --resume=c10000.pkl --freezed=13

# Projecting images to latent space.
# To get the morphed images, we used projector.py from StyleGAN2-ADA (PyTorch).
# https://github.com/NVlabs/stylegan2-ada-pytorch/blob/main/projector.py
python3 projector.py --save-video 0 --num-steps 1000 --outdir=out_source_0513 --target=seed0513.jpg --network=mn500.pkl
python3 projector.py --save-video 0 --num-steps 1000 --outdir=out_source_1119 --target=seed1119.jpg --network=mn500.pkl

# Generating the morphed images.
# https://github.com/whria78/can/blob/main/SCRIPTS/morph.py
python3 morph.py --network=mn500.pkl --source=out_source_0513/projected_w.npz --target=out_source_1119/projected_w.npz
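
The training commands above combine --batch, --gpus, --batch-gpu, and --kimg. The following sketch works out what those combinations imply, assuming the usual StyleGAN3 behavior: --batch is the total batch size across all GPUs, --batch-gpu limits how many samples one GPU processes at a time (the remainder is handled by gradient accumulation), and --kimg counts thousands of real images starting from zero. The function names are illustrative and not part of StyleGAN3.

```python
import math


def accumulation_rounds(batch: int, gpus: int, batch_gpu: int) -> int:
    """Gradient accumulation rounds per GPU for one optimizer step."""
    if batch % (gpus * batch_gpu) != 0:
        raise ValueError("--batch must be a multiple of --gpus times --batch-gpu")
    return batch // (gpus * batch_gpu)


def training_iterations(kimg: int, batch: int) -> int:
    """Approximate number of iterations needed to show kimg * 1000 real images."""
    return math.ceil(kimg * 1000 / batch)


# (label, batch, gpus, batch_gpu, kimg) taken from the commands above.
runs = [
    ("pretrain", 16, 2, 8, 10000),
    ("melanoma / nevus", 16, 1, 8, 500),
    ("morphing", 32, 2, 8, 500),
]
for label, batch, gpus, batch_gpu, kimg in runs:
    rounds = accumulation_rounds(batch, gpus, batch_gpu)
    iters = training_iterations(kimg, batch)
    print(f"{label}: {rounds} accumulation round(s), about {iters} iterations")
```

Under these assumptions, the single-GPU fine-tuning runs accumulate gradients over two rounds of 8 images, while the two-GPU pretraining run fits its batch of 16 in one round.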

Trained GAN Models: https://doi.org/10.6084/m9.figshare.21507189

For details on morphing, please see this tutorial: https://www.youtube.com/watch?v=J2nTo0cYVBk
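
As a rough illustration of the idea behind morphing, the sketch below linearly interpolates between two latent vectors, such as those obtained by projecting a nevus and a melanoma image. This is an assumption about the general technique, not a copy of morph.py, which may interpolate differently. The latents are plain Python lists here so that the example runs without NumPy or a trained network; in practice each interpolated latent would be passed to the generator to synthesize one frame.

```python
def lerp(w_source, w_target, t):
    """Linear interpolation: t = 0 returns the source, t = 1 the target."""
    return [(1.0 - t) * a + t * b for a, b in zip(w_source, w_target)]


def morph_path(w_source, w_target, steps):
    """Return `steps` latents evenly spaced from source to target (inclusive)."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    if len(w_source) != len(w_target):
        raise ValueError("latent vectors must have the same length")
    return [lerp(w_source, w_target, i / (steps - 1)) for i in range(steps)]


# Toy 2-dimensional latents standing in for the projected vectors.
w_nevus = [0.0, 2.0]
w_melanoma = [1.0, 0.0]
for w in morph_path(w_nevus, w_melanoma, steps=5):
    print(w)
```

The first and last entries reproduce the two projected images, and the entries in between correspond to the intermediate, morphed lesions.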

4. Training CNN Models

⚠️ We strongly recommend running this script on Linux. On Windows, intermittent errors were observed.

Please check the dependencies

pip install scikit-learn scipy matplotlib torch_optimizer openpyxl 

Please make sure the test folders are in place: [DATA/asan], [DATA/snu], [DATA/pad], [DATA/seven], [DATA/water], [DATA/edin], [DATA/mednode]
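
A quick check of the test folders can be scripted. The sketch below reports which of the folders listed above are missing or empty. The folder names come from the list above; the DATA root location and the function name are assumptions for illustration.

```python
from pathlib import Path

# Folder names taken from the list above.
TEST_FOLDERS = ["asan", "snu", "pad", "seven", "water", "edin", "mednode"]


def missing_or_empty(data_root):
    """Return the names of test folders that do not exist or contain nothing."""
    root = Path(data_root)
    problems = []
    for name in TEST_FOLDERS:
        folder = root / name
        if not folder.is_dir() or not any(folder.iterdir()):
            problems.append(name)
    return problems


if __name__ == "__main__":
    # "DATA" is an assumed location relative to the current directory.
    print(missing_or_empty("DATA"))
```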

Note: the Edinburgh dataset [DATA/edin] is a commercial dataset.

⚠️ All images should be cropped to a square, with the edges trimmed. Please refer to the composition of the Asan and SNU images.
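
One simple way to obtain square images is a center crop. The sketch below computes the crop box for a given image size. Center cropping is only an assumption here: it is appropriate when the lesion sits near the middle of the photograph, and the Asan and SNU images remain the reference for composition.

```python
def center_square_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


print(center_square_box(640, 480))  # (80, 0, 560, 480)
print(center_square_box(480, 640))  # (0, 80, 480, 560)
```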

# An example of training an EfficientNet-Lite0 with GAN5000 and testing with the SNU dataset
python3 train.py --model efficientnet --opt radam --batch 64 --epoch 30 --lr 0.001 --train [DATA/GAN5000] --test [DATA/snu] --result log.txt

# An example of running all test configurations
# Please add the Edinburgh dataset [DATA/edin] manually. The Edinburgh dataset is a commercial dataset.
# https://github.com/whria78/can/blob/main/SCRIPTS/run_all.py
python3 run_all.py --result_file log.txt
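
The raw results report AUC, ACC, SE, SP, PPV, and NPV. As a reminder of how these metrics relate to each other, the sketch below computes them for a binary task from labels and scores. It is a minimal illustration and not the evaluation code of train.py; treating melanoma as the positive class (label 1) and using a 0.5 threshold are assumptions.

```python
def ratio(numerator, denominator):
    """Divide safely; return NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity, specificity, PPV and NPV at one threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    return {
        "ACC": ratio(tp + tn, tp + tn + fp + fn),
        "SE": ratio(tp, tp + fn),   # sensitivity
        "SP": ratio(tn, tn + fp),   # specificity
        "PPV": ratio(tp, tp + fp),  # positive predictive value
        "NPV": ratio(tn, tn + fn),  # negative predictive value
    }


def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return ratio(wins, len(pos) * len(neg))


# Made-up labels and scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4]
print(confusion_metrics(y_true, scores))
print(auc(y_true, scores))
```

Unlike the other five metrics, AUC does not depend on the threshold, which is why it is a convenient summary when comparing models trained on different datasets.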

Deployment Tool for the Custom Algorithm (Experimental)

We have released simple, experimental code for training and web deployment, intended for testing in real-world settings: https://github.com/whria78/data-in-paper-out

List of Dermatology Datasets - Clinical Photographs

List of Dermatology Atlas Sites

License

MIT license

Citation

Generation of a Melanoma and Nevus Dataset from Unstandardized Clinical Photographs on the Internet 
JAMA Dermatology, October 4, 2023

Soo Ick Cho ― Lunit Inc
Cristian Navarrete-Dechent ― Pontificia Universidad Católica de Chile
Roxana Daneshjou ― Stanford University
Sung Eun Chang, Hye Soo Cho ― Asan Medical Center
Seong Hwan Kim ― Hallym University
Jung-Im Na ― Seoul National University
Seung Seog Han ― I Dermatology Clinic; IDerma Inc

https://jamanetwork.com/journals/jamadermatology/article-abstract/2810087
