This project is developing code for the automated analysis of the text of Requests for Comment (RFCs) published by the Internet Engineering Task Force, as part of a larger research project studying privacy in technical standard-setting.
For more information, or if you want to use these tools or collaborate on their development, please contact Nick Doty.
Some basic graphs produced with this code are available online.
Scripts are not fully parameterized or user friendly. Current usage pattern:
- clone the repository
- download all RFCs (see "Getting the documents" below) as .txt into an RFC-all directory within the main directory of the repository
- configure by copying config.ini.example to config.ini and pointing it to your downloaded RFCs
- running python search.py --rfc creates a file rfc-search.json with section titles and lengths and word search counts for every available RFC
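The kind of per-document measurements described in the last step (section titles, section lengths, word search counts) could be computed roughly as in the sketch below. This is not the code in search.py: the function name `section_stats`, the heading heuristic, and the shape of the returned dictionary are all assumptions made for illustration, and the real rfc-search.json may be structured differently.

```python
import re

# Assumed heuristic: RFC section headings start in column 0 with a dotted
# number, e.g. "1.  Introduction" or "3.2.  Threat Model". Indented
# table-of-contents entries are skipped because they do not start in column 0.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(\S.*)$")


def section_stats(text, terms):
    """Return section titles with non-blank line counts, plus term counts.

    Hypothetical helper for illustration; not part of this repository.
    """
    sections = []
    current = None
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            # Start a new section record at each heading line.
            current = {
                "number": match.group(1),
                "title": match.group(2).strip(),
                "lines": 0,
            }
            sections.append(current)
        elif current is not None and line.strip():
            # Count only non-blank lines toward the section length.
            current["lines"] += 1

    # Case-insensitive substring counts over the whole document.
    lowered = text.lower()
    counts = {term: lowered.count(term.lower()) for term in terms}
    return {"sections": sections, "counts": counts}
```

Passing the result for each RFC to `json.dump` would give one JSON file covering every document. Measuring section length in non-blank lines is one arbitrary choice; characters or words would work equally well.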
Other functionality:
- search.py can do basic string matching against all RFCs (or similar code for all W3C TRs)
- search.py --id does the same parsing for Internet-Drafts if you've rsynced them (and added that directory to your config.ini)
- the graphs/ directory contains d3.js visualizations of some of the measurements
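Basic string matching over a directory of downloaded documents can be sketched in a few lines. The helper below is a hypothetical illustration rather than the logic in search.py: the name `count_matches`, the case-insensitive comparison, and the `*.txt` file pattern are assumptions.

```python
from pathlib import Path


def count_matches(directory, needle):
    """Return {filename: hit count} for .txt files that contain `needle`.

    Hypothetical helper for illustration; matching is case-insensitive.
    """
    needle = needle.lower()
    results = {}
    for path in sorted(Path(directory).glob("*.txt")):
        # Older RFCs are not always valid UTF-8, so replace undecodable bytes.
        text = path.read_text(encoding="utf-8", errors="replace").lower()
        hits = text.count(needle)
        if hits:
            results[path.name] = hits
    return results
```

For example, `count_matches("RFC-all", "privacy considerations")` would list each RFC file mentioning that phrase along with how often it appears.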
There are several thousand RFCs and many more drafts and other IETF docs. You can download some or all of those documents for easier local analysis.
Clone the ietf-cli, add the config file to an appropriate location (and specify where you want all the documents synced) and run ./ietf mirror to download all RFCs, drafts and some minutes and other documents. It's more than 2 GB of data and takes at least a few minutes to download.
The RFC Editor maintains zip and tar files of all the RFCs, in TXT and PDF formats, for download with your browser. The compressed RFC-all.zip file is a couple hundred megabytes.
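Once the archive has been downloaded, the .txt files can be unpacked into the directory the scripts expect. The sketch below assumes the zip file is already on disk and uses a hypothetical helper name, `extract_txt`. It keeps only .txt members and writes them flat into the destination, on the assumption that any directory structure inside the archive is not needed.

```python
import zipfile
from pathlib import Path


def extract_txt(zip_path, dest_dir):
    """Extract only the .txt members of a zip archive into `dest_dir`.

    Hypothetical helper for illustration. Returns the sorted list of
    extracted file names.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # Use only the base name so files land directly in dest_dir.
            name = Path(info.filename).name
            if info.is_dir() or not name.lower().endswith(".txt"):
                continue
            (dest / name).write_bytes(archive.read(info))
            extracted.append(name)
    return sorted(extracted)
```

A call such as `extract_txt("RFC-all.zip", "RFC-all")` would then populate the RFC-all directory while skipping PDFs and any other non-text members.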