Creating a Python library dependency, license and vulnerability checker
One Python problem I keep running into at work is that I'm stuck behind a corporate firewall and can't use pip to install packages directly. When I want to use a Python library, I have to download it manually from PyPI.
This causes a few problems, the main one being dependency hell. Let's say I want to install JupyterLab. The process goes like this:
- Download the .whl for Jupyterlab and try a Pip offline install.
- Oh dear, Jupyterlab requires the library Notebook.
- Download Notebook.
- Oh dear, Notebook requires Tornado
- And so on and so on ...
By the time you're done you've wasted half an hour.
I have another issue: I'm developing Python code that I ship to customers, so I need to be careful that I meet the license conditions of everything I include, and that I'm not using libraries with known vulnerabilities.
There are commercial solutions like Black Duck and WhiteSource Bolt that try to solve this problem. But before getting out my credit card, I tried to knock together a quick utility to do this myself.
I created a basic Python library checker. The source is available here.
Interesting things I learned doing this were:
PyPI API
PyPI provides a simple JSON API for viewing package metadata. This means that I can get information about dependencies without having to download anything.
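import requests


def getPypiData(package, version=None):
    """Fetch package metadata from the PyPI JSON API, or None on failure."""
    if version:
        # Metadata for one specific release
        url = "https://pypi.org/pypi/{name}/{version}/json".format(
            name=package, version=version
        )
    else:
        # Metadata for the latest release
        url = "https://pypi.org/pypi/{name}/json".format(name=package)
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # treat 404s and other HTTP errors as failures
        return response.json()
    except (requests.RequestException, ValueError):
        # Network problem, HTTP error, or a body that was not valid JSON
        print("Error accessing {}".format(url))
        return None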
PyPI also contains license information for packages. However, having carried out a few different reviews, I've found it's often wrong. I include a list of licenses in my utility, but I've been having to do manual checks for now.
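As a rough illustration of where that license information lives, here is a minimal sketch that pulls the candidates out of a response from the JSON API. It reads the `license` field and any `License ::` trove classifiers under `info`. The function name `license_candidates` and the sample payload are made up for this example, and this is not necessarily how my utility does it.

```python
def license_candidates(pypi_json):
    """Collect the license strings declared in a PyPI JSON payload."""
    info = pypi_json.get("info") or {}
    found = []

    # Free-text field filled in by the package author (may be empty or missing)
    declared = (info.get("license") or "").strip()
    if declared:
        found.append(declared)

    # Trove classifiers such as "License :: OSI Approved :: BSD License"
    for classifier in info.get("classifiers") or []:
        if classifier.startswith("License ::"):
            found.append(classifier.split(" :: ")[-1])

    return found


# Hypothetical payload, trimmed down to the fields used above
sample = {
    "info": {
        "license": "BSD",
        "classifiers": [
            "License :: OSI Approved :: BSD License",
            "Programming Language :: Python :: 3",
        ],
    }
}
candidates = license_candidates(sample)
```

When the two sources disagree, or both are empty, that is usually the cue for a manual check.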
Packaging
The Python library packaging provides objects and methods for handling the requirements, versions and tags that pip uses.
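For example, the `requires_dist` entries that PyPI returns are requirement strings, and packaging can parse them. The sketch below is only an illustration: the helper name `runtime_dependencies` and the sample requirement strings are invented, and it assumes you want to skip dependencies that are only needed for optional extras.

```python
from packaging.requirements import Requirement


def runtime_dependencies(requires_dist):
    """Return (name, specifier) pairs, skipping requirements gated behind an extra."""
    deps = []
    for entry in requires_dist or []:
        req = Requirement(entry)
        # Evaluate markers with no extra selected, so optional dependencies drop out.
        # Other markers (e.g. python_version) are checked against the running interpreter.
        if req.marker is not None and not req.marker.evaluate({"extra": ""}):
            continue
        deps.append((req.name, str(req.specifier)))
    return deps


# Made-up requirement strings in the style of PyPI's requires_dist field
sample = ["tornado>=6.1", "notebook<7", "pytest; extra == 'test'"]
deps = runtime_dependencies(sample)

# A specifier can also be tested against a concrete version
tornado_ok = Requirement("tornado>=6.1").specifier.contains("6.2")
```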
Vulnerability Information
I've been learning a lot about CVEs recently, and unless a package is one of the major Python packages, no one is going to record CVEs for it.
Various other companies have attempted to do their own analysis and I settled on using the values in https://github.com/pyupio/safety-db.
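To give a feel for how a check against that data might look, here is a hedged sketch. It assumes the database has been loaded into a dict that maps package names to lists of advisories, each with a `specs` list of version specifiers. That layout is my reading of the safety-db JSON, so verify it against the repository before relying on it. The function name `matching_advisories` and the sample entries are invented.

```python
from packaging.specifiers import SpecifierSet


def matching_advisories(package, version, insecure_db):
    """Return the ids of advisories whose version specs cover the given version."""
    hits = []
    for entry in insecure_db.get(package.lower(), []):
        for spec in entry.get("specs", []):
            # A spec may combine clauses, e.g. ">=2.0,<2.0.3"
            if SpecifierSet(spec).contains(version, prereleases=True):
                hits.append(entry.get("id"))
                break  # one matching spec is enough for this advisory
    return hits


# Invented entries in the assumed layout, not real advisories
sample_db = {
    "examplepkg": [
        {"id": "example-1", "advisory": "made-up advisory", "specs": ["<1.2.0"]},
        {"id": "example-2", "advisory": "made-up advisory", "specs": [">=2.0,<2.0.3"]},
    ]
}
old_hits = matching_advisories("examplepkg", "1.1.0", sample_db)
```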
Putting it all together
I created a simple front-end using Bulma and jQuery to make it easy to access. I used a neat library called jqTree to create the dependency view.
I hosted this using an Azure Function (Microsoft's serverless offering). You can have a play with it here.
There are a few bugs I already know about:
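A tree view like this needs nested data, so here is a small sketch of turning a flat dependency mapping into nodes with `name` and `children` keys. That shape is an assumption about what the front-end consumes, so check it against the jqTree documentation. The function name `build_tree` and the cycle handling are illustrative, not taken from my utility.

```python
def build_tree(package, dependencies, _seen=frozenset()):
    """Build a nested {"name", "children"} node for a package and its dependencies."""
    if package in _seen:
        # Guard against circular dependencies so the recursion terminates
        return {"name": package + " (circular)"}

    node = {"name": package}
    children = [
        build_tree(child, dependencies, _seen | {package})
        for child in dependencies.get(package, [])
    ]
    if children:
        node["children"] = children
    return node


# Flat mapping of package -> direct dependencies, using the chain from the intro
deps = {"jupyterlab": ["notebook"], "notebook": ["tornado"], "tornado": []}
tree = build_tree("jupyterlab", deps)
```

Serialising `tree` with `json.dumps` gives a payload the front-end can load.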
- A