2.3. Building

In this section we explain the various options for building your project. These options can be grouped into four categories:

  1. Sanity check

  • d2lbook build linkcheck checks whether all internal and external links are accessible.

  • d2lbook build outputcheck checks that no notebook contains code outputs.

  2. Building results

  • d2lbook build html: build the HTML version into _build/html

  • d2lbook build pdf: build the PDF version into _build/pdf

  • d2lbook build pkg: build a zip file containing all .ipynb notebooks

  3. Additional features

  • d2lbook build colab: convert all notebooks so that they can be run on Google Colab, and save them into _build/colab. See more in Section 2.9.

  • d2lbook build lib: build a Python package so that code can be reused in other notebooks. See more in XXX.

  4. Internal stages, which are often triggered automatically

  • d2lbook build eval: evaluate all notebooks and save them as .ipynb notebooks into _build/eval

  • d2lbook build rst: convert all notebooks into rst files and create a Sphinx project in _build/rst
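
As a rough illustration of how these subcommands might be chained in a script, the sketch below assembles the command lines for the sanity checks followed by one or more build targets. The helper names `plan_build` and `run_build` and the `dry_run` default are hypothetical and not part of d2lbook; only the subcommand names come from the list above.

```python
import subprocess

# Subcommands taken from the list above; the grouping is only for illustration.
SANITY_CHECKS = ["linkcheck", "outputcheck"]
BUILD_TARGETS = ["html", "pdf", "pkg"]


def plan_build(targets, with_checks=True):
    """Return the d2lbook command lines to run, in order (hypothetical helper)."""
    unknown = [t for t in targets if t not in BUILD_TARGETS]
    if unknown:
        raise ValueError(f"unknown build target(s): {unknown}")
    steps = SANITY_CHECKS if with_checks else []
    return [["d2lbook", "build", step] for step in [*steps, *targets]]


def run_build(targets, cwd=".", dry_run=True):
    """Print each command, and execute it unless dry_run is set."""
    for cmd in plan_build(targets):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, cwd=cwd, check=True)


run_build(["html"])  # dry run: only prints the planned commands
```

Passing `dry_run=False` would actually invoke the commands, which assumes `d2lbook` is installed and that `cwd` points at a project directory.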

2.3.1. Building Cache

We encourage you to evaluate your notebooks to obtain code cell results, instead of keeping these results in the source files, for two reasons:

  1. Saved results make code review difficult, especially when they vary between runs because of numerical precision or random number generators.

  2. A notebook that hasn’t been evaluated for a while may be broken by package upgrades.
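
For sources kept as .ipynb files, a check for saved outputs might look like the following sketch. It is not d2lbook's implementation of `outputcheck`; the function name `cells_with_outputs` is hypothetical, and the only assumption is the standard Jupyter notebook JSON layout (a `cells` list whose code cells carry an `outputs` list).

```python
import json


def cells_with_outputs(notebook):
    """Return indices of code cells that still carry saved outputs.

    `notebook` is a dict in the Jupyter nbformat 4 layout.
    """
    return [
        i
        for i, cell in enumerate(notebook.get("cells", []))
        if cell.get("cell_type") == "code" and cell.get("outputs")
    ]


# A tiny in-memory notebook: one clean code cell, one with a saved result.
example = json.loads("""
{
  "nbformat": 4, "nbformat_minor": 5, "metadata": {},
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": "# Title"},
    {"cell_type": "code", "metadata": {}, "execution_count": null,
     "outputs": [], "source": "x = 1"},
    {"cell_type": "code", "metadata": {}, "execution_count": 1,
     "outputs": [{"output_type": "stream", "name": "stdout", "text": "0.4967\\n"}],
     "source": "print(random())"}
  ]
}
""")

print(cells_with_outputs(example))  # prints [2]
```

A saved value such as the random number in the last cell changes on every run, which is the kind of noise in a diff that the first reason above refers to.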

Evaluation, however, adds overhead to each build. We recommend keeping the runtime of each notebook within a few minutes. In addition, d2lbook reuses the previous build and evaluates only the modified notebooks.

For example, the average runtime of a notebook (section) in Dive into Deep Learning is about 2 minutes on a GPU machine, due to training neural networks. The book contains more than 100 notebooks, which makes the total runtime 2-3 hours. In practice, each code change modifies only a few notebooks, so the build time is often less than 10 minutes.
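
The idea of evaluating only modified notebooks can be sketched with a modification-time comparison between each source file and its cached output. This is a minimal illustration under that assumption, not d2lbook's actual logic; the function name `outdated_sources` and the `src`/`eval` directory layout are made up for the example.

```python
import os
import tempfile
from pathlib import Path


def outdated_sources(src_dir, out_dir, src_ext=".md", out_ext=".ipynb"):
    """List sources whose cached output is missing or older than the source."""
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    stale = []
    for src in sorted(src_dir.rglob(f"*{src_ext}")):
        out = (out_dir / src.relative_to(src_dir)).with_suffix(out_ext)
        if not out.exists() or out.stat().st_mtime < src.stat().st_mtime:
            stale.append(src.relative_to(src_dir).as_posix())
    return stale


with tempfile.TemporaryDirectory() as tmp:
    src, out = Path(tmp, "src"), Path(tmp, "eval")
    src.mkdir()
    out.mkdir()
    for name in ("index", "get_started"):
        (src / f"{name}.md").write_text("# page\n")
    first = outdated_sources(src, out)   # nothing has been evaluated yet

    # Pretend both notebooks were evaluated after the sources were written.
    t = 1_600_000_000
    for name in ("index", "get_started"):
        nb = out / f"{name}.ipynb"
        nb.write_text("{}")
        os.utime(src / f"{name}.md", (t, t))
        os.utime(nb, (t + 10, t + 10))
    second = outdated_sources(src, out)  # every cached output is fresh

    # Make one source newer than its cached notebook.
    os.utime(src / "get_started.md", (t + 20, t + 20))
    third = outdated_sources(src, out)

print(first)   # both sources
print(second)  # empty list
print(third)   # only get_started.md
```

With a check of this kind, the cost of a rebuild scales with the number of changed notebooks rather than with the size of the book.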

Let’s see how it works. First create a project as we did in Section 2.1.

!mkdir -p cache
%%writefile cache/index.md
# My Book

The starting page of my book with `d2lbook`.

````toc
get_started
````
Writing cache/index.md
%%writefile cache/get_started.md
# Getting Started

Please first install my favorite package `numpy`.
Writing cache/get_started.md
!cd cache; d2lbook build html
[d2lbook:build.py:L147] INFO   2 notebooks are outdated
[d2lbook:build.py:L149] INFO   [1] ./get_started.md
[d2lbook:build.py:L149] INFO   [2] ./index.md
[d2lbook:build.py:L153] INFO   Evaluating notebooks in parallel with 8 CPU workers and 8 GPU workers
[d2lbook:resource.py:L196] INFO   Starting task "Evaluating ./get_started.md" on CPU [0]
[d2lbook:resource.py:L159] INFO     Status: 1 running tasks, 0 done, 1 not started
[d2lbook:resource.py:L164] INFO       - Task "Evaluating ./get_started.md" on CPU [0] is running for 00:00:00
[d2lbook:resource.py:L196] INFO   Starting task "Evaluating ./index.md" on CPU [3]
[d2lbook:resource.py:L159] INFO     Status: 2 running tasks, 0 done, 0 not started
[d2lbook:resource.py:L164] INFO       - Task "Evaluating ./get_started.md" on CPU [0] is running for 00:00:02
[d2lbook:resource.py:L164] INFO       - Task "Evaluating ./index.md" on CPU [3] is running for 00:00:00
[d2lbook:resource.py:L223] INFO   Task "Evaluating ./get_started.md" on CPU [0] is finished in 00:00:03
[d2lbook:resource.py:L223] INFO   Task "Evaluating ./index.md" on CPU [3] is finished in 00:00:02
[d2lbook:resource.py:L142] INFO   All 2 tasks are done, sorting by runtime:
[d2lbook:resource.py:L148] INFO     - 00:00:02 on CPU [3] for Evaluating ./index.md
[d2lbook:resource.py:L148] INFO     - 00:00:03 on CPU [0] for Evaluating ./get_started.md
[d2lbook:build.py:L56] INFO   === Finished "d2lbook build eval" in 00:00:13
[d2lbook:build.py:L322] INFO   2 rst files are outdated
[d2lbook:build.py:L324] INFO   Convert _build/eval/index.ipynb to _build/rst/index.rst
[d2lbook:build.py:L324] INFO   Convert _build/eval/get_started.ipynb to _build/rst/get_started.rst
[d2lbook:build.py:L56] INFO   === Finished "d2lbook build rst" in 00:00:14
[d2lbook:build.py:L56] INFO   === Finished "d2lbook build ipynb" in 00:00:00
[d2lbook:build.py:L56] INFO   === Finished "d2lbook build colab" in 00:00:00
[d2lbook:build.py:L56] INFO   === Finished "d2lbook build sagemaker" in 00:00:00
Running Sphinx v5.3.0
making output directory... done
checking bibtex cache... out of date
parsing bibtex file /home/d2l-worker/workspace/d2l-book/docs/_build/eval/user/cache/_build/rst... WARNING: could not open bibtex file /home/d2l-worker/workspace/d2l-book/docs/_build/eval/user/cache/_build/rst.
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 2 source files that are out of date
updating environment: [new config] 2 added, 0 changed, 0 removed

looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done

generating indices... genindex done
writing additional pages... search done
copying static files... done
copying extra files... done
dumping search index in English (code: en)... done
dumping object inventory... done
build succeeded, 1 warning.

The HTML pages are in _build/html.
[d2lbook:build.py:L56] INFO   === Finished "d2lbook build html" in 00:00:15

You can see that index.md is evaluated. (Though it doesn’t contain code, it’s fine to evaluate it as a Jupyter notebook.)

If we build again, no notebook will be evaluated.

!cd cache; d2lbook build html
[d2lbook:build.py:L147] INFO   0 notebooks are outdated
[d2lbook:build.py:L153] INFO   Evaluating notebooks in parallel with 8 CPU workers and 8 GPU workers
[d2lbook:build.py:L56] INFO   === Finished "d2lbook build eval" in 00:00:00
[d2lbook:build.py:L322] INFO   0 rst files are outdated
[d2lbook:build.py:L56] INFO   === Finished "d2lbook build rst" in 00:00:00
[d2lbook:build.py:L56] INFO   === Finished "d2lbook build ipynb" in 00:00:00
[d2lbook:build.py:L56] INFO   === Finished "d2lbook build colab" in 00:00:00
[d2lbook:build.py:L56] INFO   === Finished "d2lbook build sagemaker" in 00:00:00
Running Sphinx v5.3.0
loading pickled environment... checking bibtex cache... up to date
done
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 0 source files that are out of date
updating environment: 0 added, 0 changed, 0 removed
looking for now-outdated files... none found
no targets are out of date.
build succeeded.

The HTML pages are in _build/html.
[d2lbook:build.py:L56] INFO   === Finished "d2lbook build html" in 00:00:00

Now let’s modify get_started.md. You will see that it is re-evaluated, but index.md is not.

%%writefile cache/get_started.md
# Getting Started

Please first install my favorite package `numpy>=1.18`.
Overwriting cache/get_started.md
!cd cache; d2lbook build html
[d2lbook:build.py:L147] INFO   1 notebooks are outdated
[d2lbook:build.py:L149] INFO   [1] ./get_started.md
[d2lbook:build.py:L153] INFO   Evaluating notebooks in parallel with 8 CPU workers and 8 GPU workers
[d2lbook:resource.py:L196] INFO   Starting task "Evaluating ./get_started.md" on CPU [7]
[d2lbook:resource.py:L159] INFO     Status: 1 running tasks, 0 done, 0 not started
[d2lbook:resource.py:L164] INFO       - Task "Evaluating ./get_started.md" on CPU [7] is running for 00:00:00
[d2lbook:resource.py:L223] INFO   Task "Evaluating ./get_started.md" on CPU [7] is finished in 00:00:02
[d2lbook:resource.py:L142] INFO   All 1 tasks are done, sorting by runtime:
[d2lbook:resource.py:L148] INFO     - 00:00:02 on CPU [7] for Evaluating ./get_started.md
[d2lbook:build.py:L56] INFO   === Finished "d2lbook build eval" in 00:00:03
[d2lbook:build.py:L322] INFO   1 rst files are outdated
[d2lbook:build.py:L324] INFO   Convert _build/eval/get_started.ipynb to _build/rst/get_started.rst
[d2lbook:build.py:L56] INFO   === Finished "d2lbook build rst" in 00:00:03
[d2lbook:build.py:L56] INFO   === Finished "d2lbook build ipynb" in 00:00:00
[d2lbook:build.py:L56] INFO   === Finished "d2lbook build colab" in 00:00:00
[d2lbook:build.py:L56] INFO   === Finished "d2lbook build sagemaker" in 00:00:00
Running Sphinx v5.3.0
loading pickled environment... checking bibtex cache... up to date
done
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 1 source files that are out of date
updating environment: 0 added, 1 changed, 0 removed

looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done

generating indices... genindex done
writing additional pages... search done
copying static files... done
copying extra files... done
dumping search index in English (code: en)... done
dumping object inventory... done
build succeeded.

The HTML pages are in _build/html.
[d2lbook:build.py:L56] INFO   === Finished "d2lbook build html" in 00:00:04

One way to trigger a full rebuild is to remove the saved notebooks in _build/eval, or simply to delete _build. Another way is to specify dependencies. For example, in the following cell we add config.ini to the dependencies. Every time config.ini is modified, the cache of all notebooks is invalidated and a build from scratch is triggered.

%%writefile cache/config.ini

[build]
dependencies = config.ini
Writing cache/config.ini
!cd cache; d2lbook build html
[d2lbook:config.py:L12] INFO   Load configure from config.ini
[d2lbook:build.py:L147] INFO   2 notebooks are outdated
[d2lbook:build.py:L149] INFO   [1] ./get_started.md
[d2lbook:build.py:L149] INFO   [2] ./index.md
[d2lbook:build.py:L153] INFO   Evaluating notebooks in parallel with 8 CPU workers and 8 GPU workers
[d2lbook:resource.py:L196] INFO   Starting task "Evaluating ./get_started.md" on CPU [5]
[d2lbook:resource.py:L159] INFO     Status: 1 running tasks, 0 done, 1 not started
[d2lbook:resource.py:L164] INFO       - Task "Evaluating ./get_started.md" on CPU [5] is running for 00:00:00
[d2lbook:resource.py:L196] INFO   Starting task "Evaluating ./index.md" on CPU [2]
[d2lbook:resource.py:L159] INFO     Status: 2 running tasks, 0 done, 0 not started
[d2lbook:resource.py:L164] INFO       - Task "Evaluating ./get_started.md" on CPU [5] is running for 00:00:02
[d2lbook:resource.py:L164] INFO       - Task "Evaluating ./index.md" on CPU [2] is running for 00:00:00
[d2lbook:resource.py:L223] INFO   Task "Evaluating ./get_started.md" on CPU [5] is finished in 00:00:03
[d2lbook:resource.py:L223] INFO   Task "Evaluating ./index.md" on CPU [2] is finished in 00:00:02
[d2lbook:resource.py:L142] INFO   All 2 tasks are done, sorting by runtime:
[d2lbook:resource.py:L148] INFO     - 00:00:02 on CPU [2] for Evaluating ./index.md
[d2lbook:resource.py:L148] INFO     - 00:00:03 on CPU [5] for Evaluating ./get_started.md
[d2lbook:build.py:L56] INFO   === Finished "d2lbook build eval" in 00:00:05
[d2lbook:build.py:L322] INFO   2 rst files are outdated
[d2lbook:build.py:L324] INFO   Convert _build/eval/get_started.ipynb to _build/rst/get_started.rst
[d2lbook:build.py:L324] INFO   Convert _build/eval/index.ipynb to _build/rst/index.rst
[d2lbook:build.py:L56] INFO   === Finished "d2lbook build rst" in 00:00:05
[d2lbook:build.py:L56] INFO   === Finished "d2lbook build ipynb" in 00:00:00
[d2lbook:build.py:L56] INFO   === Finished "d2lbook build colab" in 00:00:00
[d2lbook:build.py:L56] INFO   === Finished "d2lbook build sagemaker" in 00:00:00
Running Sphinx v5.3.0
loading pickled environment... checking bibtex cache... up to date
done
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 2 source files that are out of date
updating environment: 0 added, 2 changed, 0 removed

looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done

generating indices... genindex done
writing additional pages... search done
copying static files... done
copying extra files... done
dumping search index in English (code: en)... done
dumping object inventory... done
build succeeded.

The HTML pages are in _build/html.
[d2lbook:build.py:L56] INFO   === Finished "d2lbook build html" in 00:00:06
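
The dependency rule described above can be sketched as follows: a cached notebook is reused only if it is newer than both its own source and every listed dependency. This is an illustration, not d2lbook's code. The helper names are hypothetical, and splitting the `dependencies` value on whitespace is an assumption about the config format.

```python
import configparser
import os
import tempfile
from pathlib import Path

# Same content as the config.ini written in the cell above.
CONFIG_TEXT = """
[build]
dependencies = config.ini
"""


def read_dependencies(config_text):
    """Parse the `dependencies` entry of the [build] section into a list.

    Splitting on whitespace is an assumption made for this sketch.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.get("build", "dependencies", fallback="").split()


def cache_is_valid(output, source, dependencies):
    """True if `output` exists and is at least as new as every input."""
    newest_input = max(os.path.getmtime(p) for p in [source, *dependencies])
    return os.path.exists(output) and os.path.getmtime(output) >= newest_input


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "config.ini").write_text(CONFIG_TEXT)
    (root / "index.md").write_text("# My Book\n")
    (root / "index.ipynb").write_text("{}")
    deps = [root / name for name in read_dependencies(CONFIG_TEXT)]

    # The cached notebook is newer than its source and the dependency.
    t = 1_600_000_000
    os.utime(root / "index.md", (t, t))
    os.utime(root / "config.ini", (t, t))
    os.utime(root / "index.ipynb", (t + 10, t + 10))
    before = cache_is_valid(root / "index.ipynb", root / "index.md", deps)

    # Editing config.ini makes it newer than the cached notebook.
    os.utime(root / "config.ini", (t + 20, t + 20))
    after = cache_is_valid(root / "index.ipynb", root / "index.md", deps)

print(read_dependencies(CONFIG_TEXT), before, after)
```

Because the dependency is compared against every cached notebook, a single edit to it invalidates all of them, which matches the two outdated notebooks reported in the log above.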

Finally, let’s clean up our workspace.

!rm -rf cache