Conversation
| from typing import Optional | ||
| from functools import partial | ||
| import logging | ||
| logger = logging.getLogger('ds') |
There was a problem hiding this comment.
Non-blocker, for next time gardening: python world seems to be moving to tighter forms like (a) full-module imports all on one line (alphabetical) and (b) explicit imports per-line
| self.url_drugbank: str = os.environ["URL_DRUGBANK"] if isinstance( | ||
| os.environ["URL_DRUGBANK"], str) else None | ||
| self.query_keywords: [] = os.environ["QUERY_KEYWORDS"].split( | ||
| ",") if isinstance(os.environ["QUERY_KEYWORDS"], str) else None |
There was a problem hiding this comment.
Was a little surprised to see configfile / envvar stuff here vs. regular parameters (ex: testability), but ok!
| return response.json() | ||
|
|
||
| def api_wrapper(self, query, from_study): | ||
| return self.api(query, from_study, from_study+99, self.url_USA) |
There was a problem hiding this comment.
+99 some sort of interval len - make an optional param ..., span=99)?
| return self.api(query, from_study, from_study+99, self.url_USA) | ||
|
|
||
| def getAllStudiesByQuery(self, query: str) -> list: | ||
| logger.info("> STARTING scraping with '{}' keyword".format(query)) |
There was a problem hiding this comment.
When using logger, instead of eagerly interpolating, better to use %s, so interpolation only executes when in that log level:
logger.info("zzzz %s", query)
| def api(query, from_study, to_study, url): | ||
| url = url.format(query, from_study, to_study) | ||
| response = requests.request("GET", url) | ||
| return response.json() |
| "> {} studies found by '{}' keyword".format(nstudies, query)) | ||
| if nstudies > 0: | ||
| studies = temp['FullStudiesResponse']['FullStudies'] | ||
| for study_index in range(from_study+100, nstudies, 100): |
There was a problem hiding this comment.
Again, good to have stride as a parameter w/ default value, ..., span=100)
| return studies | ||
|
|
||
| @staticmethod | ||
| def xls_handler(r): |
There was a problem hiding this comment.
AFAICT, this downloads an xls, rewrites into csv, and rereads the csv w/ pandas, and returns the result
Pandas has an xls reader, and may be able to work directly on the bytes buffer. A bit surprising to see like this, and I recall discussion of some formatting issues encountered along the way.
There was a problem hiding this comment.
(not urgent if works, to be clear)
| session.run(traversal, users=user_data) | ||
| cls.ids = pd.DataFrame({'id': [1, 2, 3, 4, 5]}) | ||
| except Exception as err: | ||
| print(err) |
There was a problem hiding this comment.
Double check the error handling... e.g., should we detect & rerun if error?
lmeyerov
left a comment
There was a problem hiding this comment.
Seems fine-ish to land:
-- Now: I'd take a think on exception handling, esp. as this gets plugging into automation
-- Next time: Some stylistic stuff to keep in mind, see comments
No description provided.