Skip to content

ubuntu-dev: reduce bloat#162

Merged
ishitatsuyuki merged 9 commits into
masterfrom
ubuntu-bloat
Mar 28, 2018
Merged

ubuntu-dev: reduce bloat#162
ishitatsuyuki merged 9 commits into
masterfrom
ubuntu-bloat

Conversation

@ishitatsuyuki

Copy link
Copy Markdown
Collaborator

asciidoc drags in the whole texlive suite, which weighs at about 500MB. apt repositories should be also cleaned, while the impact is not large.

@ishitatsuyuki

Copy link
Copy Markdown
Collaborator Author

Don't merge yet. We seems to have depended on recommended packages.

@jankeromnes jankeromnes left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment in passing (will wait for us to be sure about this before approving or merging).

Comment thread ubuntu-dev/ubuntu-dev.dockerfile Outdated
&& apt-get update -q \
&& apt-get upgrade -qy \
&& apt-get install -qy \
--no-install-recommends \

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'd prefer apt-get install options that aren't packages to be on the same line as apt-get install (just like -qy).

@ishitatsuyuki ishitatsuyuki force-pushed the ubuntu-bloat branch 2 times, most recently from b602ed3 to 1a7550b Compare March 13, 2018 10:18
@ishitatsuyuki

ishitatsuyuki commented Mar 13, 2018

Copy link
Copy Markdown
Collaborator Author

Note: dropped wget. We can re-add if it's widely used (personally I don't because it's not popular in Arch Linux).

Edit: seems it broke some child dockerfiles. I will re-add and revert the curl changes.

@ishitatsuyuki

Copy link
Copy Markdown
Collaborator Author

Pushed some other changes. If you don't like them I will swap them out.

@ishitatsuyuki ishitatsuyuki force-pushed the ubuntu-bloat branch 3 times, most recently from 1e47c1c to 6fd5bbe Compare March 13, 2018 23:49

@jankeromnes jankeromnes left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for fighting bloat! It's a thankless job that benefits everyone without much celebration.

I'm pretty excited about some of these improvements. However, in general, do you have data on how much bloat you're saving? (Is it just about image size? How many GB does this save?)

Also, philosophically ubuntu-dev is supposed to be the biggest image possible, because every other image and container will be based on it, so if everything possible is already included once in ubuntu-dev, it won't be included N_images + N_containers times in various images and containers. This is why I'm not too keen on aggressively deleting stuff from ubuntu-dev (I'm afraid they'll get re-added down-the-line in a more disk-space-heavy way).

&& gclient sync --delete --jobs=18 \
&& ninja -C out/Default chrome -j18
&& gclient sync --delete --jobs=`nproc` \
&& ninja -C out/Default chrome -j`nproc`

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool trick! I didn't know about nproc. (Also, I read somewhere that the ideal -j would be nproc + 2 or similar, but I'm not sure and it shouldn't make much of a difference, so go for nproc! 👍)

# One-line setup command from:
# https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Build_Instructions/Linux_Prerequisites#Most_Distros_-_One_Line_Bootstrap_Command
RUN sudo apt-get update -q \
RUN sudo apt-get update \

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Why remove the -q or -qq on every apt-get update?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not needed and in the Docker library images it's never used. Often when an error occurs it would be more useful if we had the context (which is suppressed with -q).

Comment thread servo/servo.dockerfile
build-essential \
cmake \
python-virtualenv \
python-pip \

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for removing these packages! However, I guess our comment # Packages are from https://github.com/servo/servo/blob/master/README.md#on-debian-based-linuxes is no longer true (i.e. when we'll want to update this package list, it will look like Servo introduced the new dependencies g++ build-essential cmake python-virtualenv python-pip, which is unfortunate).

What is the effect of doing a apt-get install of a package that's already installed? If it's essentially a no-op, please keep the list in sync with Servo's README.md.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. Yes, apt-get install on installed packages is a noop. I will revert it back.

x11vnc \
xmlto \
xvfb \
&& rm -rf /var/lib/apt/lists/* \

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the exact effect of systematically adding rm -rf /var/lib/apt/lists/* to our Dockerfiles? Won't this cause user-triggered sudo apt-get updates to be slower, and to consume disk space outside of any Copy-on-Write?

Speaking of disk space, how much space does rm -rf /var/lib/apt/lists/* actually save? (If it's say > 8 GB then I find it useful, but if it's below that, I consider the bloat to be negligible, whereas forcing every image to do rm -rf /var/lib/apt/lists/* is fragile and cumbersome.)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each Dockerfile RUN record creates a layer; in this case, as long as you always run the rm line every time you do an apt-get, the apt lists will never end up in an image layer.

fluxbox \
gdb \
gettext \
git \

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for converting our Git build to a PPA! But are you 100% certain that:

  1. The PPA git will always be the latest stable (e.g. 2.16.2 today)?
  2. The PPA will also set up git-completion.bash and git-prompt.sh for every user session? (We used to source them in ~/.bashrc)
  3. The resulting Git prompt will forcibly include colors?

If not, please address these nits.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the description, it's backed by Ubuntu Git maintainers and claims to have the latest version. Thus, it should have the latest stable packaged rapidly.

The completion files will be installed globally, but the prompt doesn't get modified. I will take a look into this.

Comment thread ubuntu-dev/ubuntu-dev.dockerfile Outdated
valgrind \
wget \
x11vnc \
xmlto \

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

# Install basic development packages.
RUN __LLVM_VERSION__="6.0" \
&& apt-get update -q \
&& apt-get upgrade -qy \

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not upgrade packages from ubuntu-dev while we're at it?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upgrading packages will result in space overhead due to CoW. The ubuntu base image is normally kept almost updated, so there's no reason to hurry.

ENV PATH="${PATH}:/home/user/.cargo/bin"
RUN rustup install nightly \
&& rustup completions bash | sudo tee /etc/bash_completion.d/rustup.bash-completion
&& rustup completions bash | sudo tee /etc/bash_completion.d/rustup.bash-completion > /dev/null

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why silence tee's output?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this doesn't contribute to diagnostics and makes log less readable.

@ishitatsuyuki ishitatsuyuki left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reduces the main package set from 3.1GB -> 1.7GB, which is primarily due to we have eliminated TeX Live to be included (the full suite takes over 1 gigabyte).

Disabling the recommended packages is a common practice in Docker world, and we have ran CI to ensure that the basic build passes (I've added those missing ones explicitly). This may accidentally reduce the set of available packages, but I believe the impact is small as the primary thing we removed is TeX Live.

fluxbox \
gdb \
gettext \
git \

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the description, it's backed by Ubuntu Git maintainers and claims to have the latest version. Thus, it should have the latest stable packaged rapidly.

The completion files will be installed globally, but the prompt doesn't get modified. I will take a look into this.

ENV PATH="${PATH}:/home/user/.cargo/bin"
RUN rustup install nightly \
&& rustup completions bash | sudo tee /etc/bash_completion.d/rustup.bash-completion
&& rustup completions bash | sudo tee /etc/bash_completion.d/rustup.bash-completion > /dev/null

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this doesn't contribute to diagnostics and makes log less readable.

# Install basic development packages.
RUN __LLVM_VERSION__="6.0" \
&& apt-get update -q \
&& apt-get upgrade -qy \

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upgrading packages will result in space overhead due to CoW. The ubuntu base image is normally kept almost updated, so there's no reason to hurry.

Comment thread servo/servo.dockerfile
build-essential \
cmake \
python-virtualenv \
python-pip \

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. Yes, apt-get install on installed packages is a noop. I will revert it back.

# One-line setup command from:
# https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Build_Instructions/Linux_Prerequisites#Most_Distros_-_One_Line_Bootstrap_Command
RUN sudo apt-get update -q \
RUN sudo apt-get update \

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not needed and in the Docker library images it's never used. Often when an error occurs it would be more useful if we had the context (which is suppressed with -q).

@jankeromnes jankeromnes left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm starting to really like this change! I'm mostly OK with the current state, but I just have a few minor questions.

This reduces the main package set from 3.1GB -> 1.7GB, which is primarily due to we have eliminated TeX Live to be included (the full suite takes over 1 gigabyte).

That's excellent, thanks! I suppose no developer will ever need TeX Live on Janitor, so it probably makes sense to remove it from ubuntu-dev, and generally shrink it from 6.6GB down to 5.2GB (a 21% save).

One question though, if we completely remove /var/lib/apt/lists/* from all our image layers, won't user-triggered apt-get update commands on Janitor run somewhat slower and use more disk space than if our images already had some copy-on-write cache? (It's not a super-frequent or crucial use case, but I'm curious.)

Disabling the recommended packages is a common practice in Docker world, and we have ran CI to ensure that the basic build passes (I've added those missing ones explicitly).

Good thinking! A few other dependent images to be aware of:

(Note that they might still be based on janx/ubuntu-dev, which we don't update anymore, so your change won't break them until they upgrade.)

xmlto
Why is this needed?

Just curious.

Comment thread servo/servo.dockerfile
libegl1-mesa-dev \
libdbus-1-dev \
libpulse-dev \
libharfbuzz-dev \

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool drive-by update, thanks!

# Add source for the latest Mercurial packages.
RUN echo "deb http://ppa.launchpad.net/mercurial-ppa/releases/ubuntu xenial main" > /etc/apt/sources.list.d/mercurial.list \
&& apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 41BD8711B1F0EC2B0D85B91CF59CE3A8323293EE
RUN add-apt-repository ppa:mercurial-ppa/releases

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this makes us a little more vulnerable to malicious key changes (e.g. if the PPA gets hacked).

@jankeromnes

jankeromnes commented Mar 27, 2018

Copy link
Copy Markdown
Member

Please also rebase your commits. After that I'll approve this pull request and we can merge it and start saving disk space and network transfer!

asciidoc drags in the whole texlive suite, which weighs at about 500MB. apt repositories should be also cleaned, while the impact is not large.
And refactor apt sources handling
Excessive parallelism can be slow due to more context switch, memory requirements
and IO.
@ishitatsuyuki

Copy link
Copy Markdown
Collaborator Author

Removed xmlto as it's no longer needed with the Git PPA (it was originally installed as recommended package but now we disable them so we needed to list it manually). Ready for review (I hope the CI will be green).

@jankeromnes jankeromnes left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks!

Except for dspace.dockerfile, where it looks like you reverted recent changes. Please fix that and then we can merge.

Comment thread dspace/dspace.dockerfile Outdated
@@ -1,29 +1,45 @@
FROM janitortechnology/ubuntu-dev
FROM janx/ubuntu-dev

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, this doesn't look right.

@jankeromnes jankeromnes left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thanks for the great work! 🥇 💯 👍 🎉 🎆

@ishitatsuyuki ishitatsuyuki merged commit eaa0cc3 into master Mar 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

3 participants