Description
Does anyone remember how exactly we arrived at the channel `mean`s and `std`s we use for the preprocessing?
```python
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
```
I think the first mention of the preprocessing in this repo is in #39. In that issue, @soumith points to https://github.com/pytorch/examples/tree/master/imagenet for reference. Looking at the history of main.py, the values were first introduced in commit pytorch/examples@27e2a46. Unfortunately, that commit contains no explanation, hence my question.
Specifically, I'm seeking answers to the following questions:
- Are these values `round`ed, `floor`ed, or even `ceil`ed?
- Did we use only the images in the training set of ImageNet, or additionally the images of the validation set?
- Did we perform any kind of resizing or cropping on each image before the calculations were performed?
I've tested some combinations and will post my results here.
| Parameters | mean | std |
|---|---|---|
| train set only, no resizing / cropping | [0.4803, 0.4569, 0.4083] | [0.2806, 0.2736, 0.2877] |
| train set only, resize to 256 and center crop to 224 | [0.4845, 0.4541, 0.4025] | [0.2724, 0.2637, 0.2761] |
| train set only, center crop to 224 | [0.4701, 0.4340, 0.3832] | [0.2845, 0.2733, 0.2805] |
While the `mean`s match fairly well, the `std`s differ significantly.
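For anyone who wants to reproduce these combinations: per-channel statistics over an arbitrary number of images can be accumulated exactly from running sums, without holding all pixels in memory. A sketch (the helper name `channel_stats` is mine; it computes the population std, pooled over all pixels, and also handles images of varying sizes):

```python
import torch

def channel_stats(images):
    """Per-channel mean and (population) std over all pixels of an
    iterable of CxHxW float tensors."""
    n = 0
    s = torch.zeros(3)
    s2 = torch.zeros(3)
    for img in images:
        n += img[0].numel()              # pixels per channel in this image
        s += img.sum(dim=(1, 2))         # running per-channel sum
        s2 += (img ** 2).sum(dim=(1, 2)) # running per-channel sum of squares
    mean = s / n
    std = (s2 / n - mean ** 2).sqrt()
    return mean, std
```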
Update:
The process for obtaining the values of `mean` and `std` was roughly equivalent to the following, but the concrete subset that was used is lost:
```python
import torch
from torchvision import datasets, transforms as T

transform = T.Compose([T.Resize(256), T.CenterCrop(224), T.PILToTensor(), T.ConvertImageDtype(torch.float)])
dataset = datasets.ImageNet(".", split="train", transform=transform)

means = []
stds = []
for img, _ in subset(dataset):  # subset() stands in for the lost selection of images
    # per-channel statistics over the spatial dimensions
    means.append(torch.mean(img, dim=(1, 2)))
    stds.append(torch.std(img, dim=(1, 2)))

mean = torch.stack(means).mean(dim=0)
std = torch.stack(stds).mean(dim=0)
```
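One subtlety of this procedure: averaging per-image `std`s is not the same as computing a single `std` over all pixels pooled together. Whenever the per-image means vary, the average of per-image `std`s comes out smaller than the pooled value, which could account for part of the `std` mismatch in the table above. A toy illustration:

```python
import torch

torch.manual_seed(0)

# two toy single-channel "images": tight spread within each image,
# but very different per-image means
img_a = 0.0 + 0.01 * torch.randn(1000)
img_b = 1.0 + 0.01 * torch.randn(1000)

avg_of_stds = torch.stack([img_a.std(), img_b.std()]).mean()
pooled_std = torch.cat([img_a, img_b]).std()

print(avg_of_stds.item())  # close to 0.01
print(pooled_std.item())   # close to 0.5
```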
See #1965 for the reproduction experiments.