Bug Report
`zola check` currently reports errors for links whose server returns an error status (e.g., 403 or 400) when the request carries no User-Agent header. This is expected, since the current link_checker doesn't set one. Can we allow the link checker to set a user agent, and/or provide a default Zola user agent?
Environment
OS: Ubuntu 18.04.4
Zola version: v0.10.0
Expected Behavior
Some servers return errors when the User-Agent header is missing. For example, running the link_checker on a URL such as https://arxiv.org/abs/1906.01113 yields a 403, so the link is reported as dead. This can be reproduced with the following test case:
In `components/link_checker/src/lib.rs`:

```rust
#[test]
fn user_agent_test() {
    // arxiv.org responds with 403 when the request has no User-Agent header.
    let url = "https://arxiv.org/abs/1906.01113";
    let res = check_url(url, &LinkChecker::default());
    assert!(res.is_valid());
    assert!(res.code.is_some());
    assert!(res.error.is_none());
}
```
The same test passes once a user agent header is included, e.g.:
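```rust
use reqwest::header::{HeaderMap, ACCEPT, USER_AGENT};

pub fn check_url(url: &str, config: &LinkChecker) -> LinkResult {
    {
        // Return the cached result if this URL was already checked.
        let guard = LINKS.read().unwrap();
        if let Some(res) = guard.get(url) {
            return res.clone();
        }
    }

    let mut headers = HeaderMap::new();
    headers.insert(ACCEPT, "text/html".parse().unwrap());
    headers.append(ACCEPT, "*/*".parse().unwrap());
    // Added: send a User-Agent so servers such as arxiv.org don't reject the request.
    headers.append(USER_AGENT, "zola/0.10.0 link_checker".parse().unwrap());
    ...
```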
Without the USER_AGENT header, the test fails.
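For reference, here is a minimal, std-only sketch of the raw HTTP/1.1 request text a client would send with and without the header. `build_get_request` is a hypothetical helper for illustration only, not part of Zola or reqwest; reqwest assembles the real request internally.

```rust
/// Hypothetical helper: renders a minimal HTTP/1.1 GET request as text.
/// With `user_agent == None` no User-Agent line is emitted, which is the
/// case some servers (like the arxiv.org URL above) answer with a 403.
fn build_get_request(host: &str, path: &str, user_agent: Option<&str>) -> String {
    // Request line plus the Host header, which HTTP/1.1 requires.
    let mut req = format!("GET {} HTTP/1.1\r\nHost: {}\r\n", path, host);
    if let Some(ua) = user_agent {
        req.push_str(&format!("User-Agent: {}\r\n", ua));
    }
    // An empty line terminates the header section.
    req.push_str("\r\n");
    req
}

fn main() {
    let without = build_get_request("arxiv.org", "/abs/1906.01113", None);
    let with = build_get_request(
        "arxiv.org",
        "/abs/1906.01113",
        Some("zola/0.10.0 link_checker"),
    );
    print!("{}", without);
    print!("{}", with);
}
```

The only difference between the two requests is the single `User-Agent` line, which is exactly what the servers in question key on.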
We could mitigate this issue by:
- Setting a default user agent (such as the Zola-specific user agent shown above)
- Allowing users to specify a user-agent via some configuration
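The two options can be combined: prefer a user-configured value and fall back to a default built from crate metadata. The std-only sketch below assumes a hypothetical `LinkCheckerSettings` struct with an optional `user_agent` field and a hypothetical `effective_user_agent` helper; neither is Zola's actual config. It uses `option_env!` instead of `env!` so it still compiles when not built by Cargo.

```rust
/// Hypothetical link-checker settings; `user_agent` would be read from config.
struct LinkCheckerSettings {
    user_agent: Option<String>,
}

/// Default "name/version" string from Cargo metadata at compile time.
/// `option_env!` yields `None` outside Cargo, so fall back to fixed values.
fn default_user_agent() -> String {
    let name = option_env!("CARGO_PKG_NAME").unwrap_or("zola");
    let version = option_env!("CARGO_PKG_VERSION").unwrap_or("unknown");
    format!("{}/{}", name, version)
}

/// Prefer a non-blank configured value; otherwise use the default.
fn effective_user_agent(settings: &LinkCheckerSettings) -> String {
    match settings.user_agent.as_deref() {
        Some(ua) if !ua.trim().is_empty() => ua.to_string(),
        _ => default_user_agent(),
    }
}

fn main() {
    let custom = LinkCheckerSettings {
        user_agent: Some("my-site-checker/1.0".to_string()),
    };
    let unset = LinkCheckerSettings { user_agent: None };
    println!("{}", effective_user_agent(&custom)); // prints "my-site-checker/1.0"
    println!("{}", effective_user_agent(&unset));
}
```

The resulting string could then be handed to reqwest's `ClientBuilder::user_agent`, as in the reqwest example referenced in this issue. Treating a blank value as unset avoids sending an empty header.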
For a default user agent, we probably don't want a hard-coded string; instead, we could follow the reqwest example:
https://docs.rs/reqwest/0.10.1/reqwest/struct.ClientBuilder.html#method.user_agent
```rust
// Name your user agent after your app?
static APP_USER_AGENT: &str = concat!(
    env!("CARGO_PKG_NAME"),
    "/",
    env!("CARGO_PKG_VERSION"),
);

let client = reqwest::Client::builder()
    .user_agent(APP_USER_AGENT)
    .build()?;
```
Some other example URLs which return 400/403s without a user agent: