Write a "key auth" authenticator plugin
@sofia I have completed some sample plugins and configurations for the project, and running:
python3 zcmd.py --template=text_extraction.yml
Should create a database, example.db , which lets you see the output of a crawl + text extraction run.
One common use case with crawling a website is authentication. In a later project we will need to authenticate with an API, and some sites require logging in to see content. In V2 of this project each plugin had to do its own authentication, which was very messy. In this project, we will want to instead create a class of plugins, Authenticators, which perform authentication for plugins.
Authenticators will operate by 1) performing whatever authentication is necessary to "log in" when an authenticate() command is given to Zeomine (part of the run script), and 2) providing information to other plugins on how to use the authentication (since each crawler operates differently and may use the authentication in different ways). With this in mind, we will write our first and simplest authenticator, KeyAuth, by doing the following:
- Create a file: src/auth/key_auth.py (use the template in templates/plugin/sample_plugin.py)
- Define the function authenticate(self) , which does nothing (pass)
- In the settings dict, there are two keys: 'label' for the label of the key (e.g. api-key), and 'key' for the actual API key
- Define the authenticator functions: modifyURL, modifyHeaders, modifyCookies, and basicAuthParameters
- For basicAuthParameters, print an error message (self.z_obj.pprint()), and return an empty dict (the plugin doesn't use HTTP Basic Auth)
- For modifyHeaders and modify cookies you may be given a dict, 'headers' or 'cookies' for either function. You will want to set headers_or_cookies[label] = key , per our earlier settings.
- For modifyURL, you will be given a URL string, and it will need the query parameter label=key added to it. If a '?' is already present, you will add &label=key . Otherwise, ?label=key . Return the modified string.
Once these are done, I will test with an endpoint on 2020electionstory.com