re-typing LLVM

a flatbed scan of a grimy laptop keyboard missing three keys
That's not writing. That's typing.
Truman Capote, about On the Road

a professor in one of my first CS courses began a lecture with something like a parable. playing the hapless software engineer, the professor worked through repetitive text-shuffling edits in a text editor. moving to the chalkboard, they drew the conclusion as a Fermi estimate: it is a net saving of your lifetime to invest in learning, as soon as possible, to type quickly, and to use vim. it was 2019; GPT-2 had recently been released but did not feature in the calculation.

meaningless work

By meaningless work I simply mean work which does not make you money or accomplish a conventional purpose. For instance putting wooden blocks from one box to another, then putting the blocks back to the original box, back and forth, back and forth, etc., is a fine example of meaningless work. Or digging a hole, then covering it is another example.
Walter De Maria

programming has never been about typing, though typing remains a minor but appreciable part of the craft. what has changed is the relationship between typist and code: the proliferation of LLM-assisted tooling has, in many situations, removed the need for direct mechanical handling of program text.

to choose the less efficient option is often reasonable; to avoid efficiency altogether is something else. re-typing, rote copying, for no reason other than its own sake, is my extended exercise in meaningless work.

procedure

file count
header file 15230
C program 16551
C++ program 34797
shared library 24
DLL library 6
dynamic library 94
debug info file 51
patch 4
static library 125
assembler program 12599
fortran program 3138
java program 52
objective-C program 572
objective-C++ program 1910
objective-C++ program 572
perl program 14
python program 2805
shell program 148
awk program 2
lua program 6
ruby program 3
symbols file 75
automake file 21
cmake file 2492
configuration file 851
executable file 241
GNU LD script 5
initialization file 8
javascript file 56
JSON file 314
lisp program 12
makefile 970
man page 8
manifest file 11
module-definition file 239
pkg-config file 2
rust program 1
SGML document 54
symbolic link 19
XML document 57
PE binary file 32
ABI list 9
argument file 2
backup file 1
BibTeX document 1
command file 59
compiled object 1469
configure script 1
CSS style sheet 27
CSV file 58
data list 3
DOS batch file 12
doxygen file 8
ELF binary file 90
git file 37
HTML page 103
include file 499
MATLAB file 1
memory dump file 9
ML program 27
MS Office document 5
POD document 1
precompiled header 6
property list 67
protocol buffer file 5
Qt translation file 21
resource file 127
RNG file 1
RPC file 4
RST file 2065
TeX document 11
vim settings file 16
xcode files 1
YAML file 1322
icon 9
image file 197
PDF file 12
vector image file 28
data file 422
directory 15103
archive 81
change log 6
information file 39
license 34
readme file 195
text file 57673
other 1737

the contents of the repository llvm-project (github.com) are progressively re-typed.

snapshot 21.1.8 is used to coincide with the release of Anthropic's LLM-generated C compiler (anthropic.com).

the text is linearized by the following procedure: directories are ordered by modification date, oldest first. contents of each directory are completed together, in alphabetical order. the first to be completed then are SECURITY.md, LICENSE.txt, and .clang-format. copying is done single-pass: typos are left uncorrected. exceptional cases (binary files?) will be handled as they arise.
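the ordering described above can be sketched in a few lines of Python. this is only an illustration, not the procedure actually used: the function name `linearize` is hypothetical, "modification date" is assumed to mean filesystem mtime (which, in a fresh clone, reflects checkout time rather than history), and "alphabetical" is assumed to mean a case-insensitive sort on file names.

```python
import os
from pathlib import Path


def linearize(root):
    """Yield files under root in a re-typing order.

    Assumptions (not confirmed by the text): directories are ranked by
    filesystem mtime, oldest first, and files inside one directory are
    sorted case-insensitively by name.
    """
    root = Path(root)
    # Collect every directory, including the root itself.
    dirs = [Path(dirpath) for dirpath, _, _ in os.walk(root)]
    # Oldest modification time first; the path breaks ties deterministically.
    dirs.sort(key=lambda d: (d.stat().st_mtime, str(d)))
    for d in dirs:
        # The files of one directory are completed together.
        files = sorted(
            (p for p in d.iterdir() if p.is_file()),
            key=lambda p: p.name.lower(),
        )
        for f in files:
            yield f
```

calling `list(linearize("llvm-project"))` on a local checkout would give one possible reading order; binary files and other exceptional cases are not filtered here.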

the repository includes 175,226 files of greatly varying length. by rough estimation this requires 3 years of full-time work to re-type, which I do not very seriously intend to complete. it violates the spirit of the work to benefit from serious study, though I accept there are arguably productive side effects.
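the shape of such a rough estimate can be written down as a short Python sketch. every number in it is an assumption chosen for illustration (typing speed, characters per word, working hours, working days), and `retype_years` is a hypothetical name; the actual character count of the snapshot is not stated here.

```python
def retype_years(total_chars, words_per_minute=60, chars_per_word=5,
                 hours_per_day=8, days_per_year=250):
    """Fermi-style estimate of full-time years needed to type total_chars.

    All default values are illustrative assumptions, not measurements.
    """
    # Characters typed in one working year at a constant pace.
    chars_per_year = (words_per_minute * chars_per_word
                      * 60 * hours_per_day * days_per_year)
    return total_chars / chars_per_year
```

under these assumed defaults one working year covers 36 million characters, so a 3-year figure would correspond to roughly 108 million characters; a different typing speed or schedule shifts the result proportionally.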

progress

2026.02.06
complete: SECURITY.md, LICENSE.txt, .clang-format, and llvm-libgcc/