API Reference¶
impscan¶
Command line tool to identify minimal imports list and repository sources by parsing package dependency trees
Scan imports in a directory, determine which are non-standard library, and then (tentatively) determine the package dependency tree and prune the requirements accordingly, as well as determining which can be obtained from Conda (and on which channels) and which from PyPI.
Unlike some other refactoring tools, impscan does not need to operate on a package (e.g. it can just be scripts).
Currently, requirements (AKA “root packages”), the imported module name (the “site packages” name) and other features are computed from a single build of each package on conda’s anaconda and conda-forge channels (over 20,000 packages).
Workflow¶
Identify imports
Identify total dependency tree
Prune dependency tree
Identify sources (obeying source preferences if specified)
Save artifacts: CONDA_SETUP.md and requirements.txt
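The "prune dependency tree" step can be pictured as dropping every requirement that another requirement already pulls in. The following is a minimal sketch of that idea only: the function name `prune_requirements` and the dependency data are hypothetical, it assumes an acyclic dependency graph, and it is not impscan's own implementation.

```python
def prune_requirements(imports, dependencies):
    """Keep only the packages that no other imported package depends on.

    `dependencies` maps a package name to the set of packages it requires
    directly (hypothetical data; impscan derives this from conda metadata).
    """

    def transitive(pkg, seen):
        # Depth-first walk collecting every direct and indirect dependency.
        for dep in dependencies.get(pkg, ()):
            if dep not in seen:
                seen.add(dep)
                transitive(dep, seen)
        return seen

    covered = set()
    for pkg in imports:
        covered |= transitive(pkg, set()) - {pkg}
    return sorted(set(imports) - covered)


# Hypothetical dependency data for illustration
deps = {"pandas": {"numpy", "python-dateutil"}, "python-dateutil": {"six"}}
pruned = prune_requirements({"pandas", "numpy", "six", "requests"}, deps)
print(pruned)  # ['pandas', 'requests']
```

Here numpy and six are dropped because installing pandas already brings them in, so only the "root packages" remain.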
Conda metadata¶
This class represents a file being streamed as a sequence of non-overlapping ranges.
- async impscan.conda_meta.async_utils.fetch(session: httpx.AsyncClient, url: str, can_raise: bool = False) httpx.Response [source]¶
- async impscan.conda_meta.async_utils.process_archive(resp: httpx.Response, lst: list, pbar=None)[source]¶
- class impscan.conda_meta.formats.CondaArchive(source_url: str, defer_pull: bool = False)[source]¶
Bases:
object
- about_info = 'info/about.json'¶
- about_json = None¶
- property archive¶
- check_bz2_info_dir() None [source]¶
Validate the members for assignment to instance attributes. Note: ‘members’ means the filenames within the compressed .tar.bz2 archive.
- determine_site_package_name() str | None [source]¶
Identify the package(s) which can be imported after the conda package is installed, by inspecting the /site-packages/ paths it creates. Multiple names are comma-separated in alphabetical order. Returns None if no such names are found.
- property filename: str¶
- index_info = 'info/index.json'¶
- index_json = None¶
- property info_fields: list¶
- info_is_read = False¶
- property is_bz2¶
- property is_zstd¶
- property members¶
- path_info = 'info/paths.json'¶
- path_json = None¶
- read_info()[source]¶
Load the JSON files from the info archive (otherwise all attempts to access the JSON-parsed dict attributes’ keys will fail) and set the info_is_read flag to show this.
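The behaviour documented for determine_site_package_name() above (names taken from /site-packages/ paths, comma-separated in alphabetical order, None when nothing is found) can be sketched as follows. This is an illustration under assumptions: the helper name `site_package_names`, the filtering of `.dist-info`/`.egg-info` directories, and the sample paths are all hypothetical, and the real method may apply different rules.

```python
from pathlib import PurePosixPath


def site_package_names(paths):
    """Return importable top-level names found under site-packages, or None."""
    names = set()
    for p in paths:
        parts = PurePosixPath(p).parts
        if "site-packages" not in parts:
            continue
        rest = parts[parts.index("site-packages") + 1:]
        if not rest:
            continue
        top = rest[0]
        if len(rest) == 1:
            # A file directly inside site-packages: count single-file modules.
            if top.endswith(".py"):
                names.add(top[: -len(".py")])
        elif top != "__pycache__" and not top.endswith((".dist-info", ".egg-info")):
            # A directory inside site-packages: treat it as a package name.
            names.add(top)
    return ",".join(sorted(names)) or None


# Hypothetical entries of the kind listed in info/paths.json
example_paths = [
    "lib/python3.9/site-packages/yaml/__init__.py",
    "lib/python3.9/site-packages/_yaml/__init__.py",
    "lib/python3.9/site-packages/PyYAML-6.0.dist-info/METADATA",
    "bin/some-tool",
]
print(site_package_names(example_paths))  # _yaml,yaml
print(site_package_names(["bin/some-tool"]))  # None
```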
- class impscan.conda_meta.streaming_formats.CondaArchiveStream(source_url: str, defer_pull: bool = True)[source]¶
Bases:
object
- about_info = 'info/about.json'¶
- about_json = None¶
- property archive¶
- check_bz2_info_dir() None [source]¶
Validate the members for assignment to instance attributes. Note: ‘members’ means the filenames within the compressed .tar.bz2 archive.
- determine_site_package_name() str | None [source]¶
Identify the package(s) which can be imported after the conda package is installed, by inspecting the /site-packages/ paths it creates. Multiple names are comma-separated in alphabetical order. Returns None if no such names are found.
- property filename: str¶
- index_info = 'info/index.json'¶
- index_json = None¶
- inflate_archive(db: impscan.db.db_utils.CondaPackageDB)[source]¶
Pull and parse the archive to a database entry, and insert it.
- Parameters
db – The database to insert the entry into.
- property info_fields: list¶
- info_is_read = False¶
- property is_bz2¶
- property is_zstd¶
- property members¶
- path_info = 'info/paths.json'¶
- path_json = None¶
- read_info()[source]¶
Load the JSON files from the info archive (otherwise all attempts to access the JSON-parsed dict attributes’ keys will fail) and set the info_is_read flag to show this.
- read_zst(filename: str, paths: list) list [source]¶
Extract the bytes from a CondaStream’s internal tar.zst archive. Requires downloading the entire tarball range (but not the entire CondaStream).
- Parameters
filename – Name of the tar.zst file within the CondaStream
paths – Paths within the tar.zst archive to return bytes from
- impscan.conda_meta.so_utils.verify_exported_module_name(conda_archive, so_path: str) set[str] | None [source]¶
- class impscan.conda_meta.url_utils.ArchiveType(value)[source]¶
Bases:
enum.Enum
An enumeration.
- Bz2 = '.tar.bz2'¶
- Zstd = '.conda'¶
- impscan.conda_meta.url_utils.detect_archive_type_from_url(url: str) impscan.conda_meta.url_utils.ArchiveType [source]¶
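Since no docstring is given for detect_archive_type_from_url, here is one plausible reading of it: match the URL's suffix against the two ArchiveType values listed above. The enum is redefined locally so the snippet runs on its own. The function name `detect_archive_type`, the ValueError on an unknown suffix, and the example URLs are assumptions, not documented behaviour.

```python
from enum import Enum


class ArchiveType(Enum):
    # Same member values as listed for impscan.conda_meta.url_utils.ArchiveType
    Bz2 = ".tar.bz2"
    Zstd = ".conda"


def detect_archive_type(url):
    """Pick the archive type whose suffix the URL ends with."""
    for archive_type in ArchiveType:
        if url.endswith(archive_type.value):
            return archive_type
    # Assumption: an unrecognised suffix is treated as an error.
    raise ValueError(f"Unrecognised conda archive suffix: {url!r}")


# Hypothetical URLs for illustration
old_style = detect_archive_type("https://example.org/noarch/example-1.0-py_0.tar.bz2")
new_style = detect_archive_type("https://example.org/noarch/example-1.0-py_0.conda")
print(old_style, new_style)  # ArchiveType.Bz2 ArchiveType.Zstd
```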
- impscan.conda_meta.zip_utils.open_zipfile_from_url(url: str) zipfile.ZipFile [source]¶
- impscan.conda_meta.zip_utils.read_zipped_zst(zf: zipfile.ZipFile, zst_tar_fn: str, zst_paths: list) list [source]¶
Given the ZipFile zf, tarball filename zst_tar_fn, and path(s) within the zst tarball zst_paths, return a list of one or more bytestrings from decompressing those paths.
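For context, a `.conda` file is a ZIP container holding `metadata.json` plus two zstd-compressed tarballs named `info-….tar.zst` and `pkg-….tar.zst`. The sketch below covers only the first half of what read_zipped_zst describes: locating the info tarball inside the ZIP and reading its raw bytes. It uses a stand-in archive built in memory with placeholder contents. The helper name `find_info_tarball` is hypothetical, and the zstd decompression step is deliberately left out because it needs a third-party library.

```python
import io
import zipfile


def find_info_tarball(zf):
    """Return the name of the info-*.tar.zst member of a .conda ZIP, or None."""
    for name in zf.namelist():
        if name.startswith("info-") and name.endswith(".tar.zst"):
            return name
    return None


# Build a stand-in .conda container in memory (placeholder bytes, not real zstd data).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("metadata.json", '{"conda_pkg_format_version": 2}')
    zf.writestr("info-example-1.0-0.tar.zst", b"placeholder")
    zf.writestr("pkg-example-1.0-0.tar.zst", b"placeholder")

with zipfile.ZipFile(buf) as zf:
    info_name = find_info_tarball(zf)
    # In a real archive these bytes would still need zstd decompression
    # before the tar members (e.g. info/index.json) could be read.
    raw_bytes = zf.read(info_name)

print(info_name)  # info-example-1.0-0.tar.zst
```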
- class impscan.conda_meta.zstd_utils.ZstdTarFile(name, mode='r', *, level_or_option=None, zstd_dict=None, **kwargs)[source]¶
Bases:
tarfile.TarFile
Database handling¶
Set up a database to store the package archive listings in.
- class impscan.db.CondaArchiveListings(start_from_pkg: str | None = None)[source]¶
Bases:
object
Synchronous listings, using CondaStream to efficiently look at conda archives.
- make_archive(source_url: str, defer_pull: bool = True) impscan.conda_meta.formats.CondaArchive [source]¶
Create CondaArchive object; includes channel and format detection
- make_archives(defer_pull: bool = True)[source]¶
Make and return a list of CondaArchive objects and pull their URLs collectively in an efficient async procedure (not serially).
- property urlset: Generator[str, None, None]¶
Generator of URLs for async fetching
- class impscan.db.PackageDB(dir=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/impscan/envs/latest/lib/python3.9/site-packages/impscan/assets'), filename='package_catalogue.db', create=True, no_touch=False)[source]¶
Bases:
object
- directory = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/impscan/envs/latest/lib/python3.9/site-packages/impscan/assets')¶
- filename = 'package_catalogue.db'¶
- property path¶
- class impscan.db.db_utils.CondaPackageDB(dir=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/impscan/envs/latest/lib/python3.9/site-packages/impscan/assets'), filename='package_catalogue.db', create=True, no_touch=False)[source]¶
- class impscan.db.db_utils.PackageDB(dir=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/impscan/envs/latest/lib/python3.9/site-packages/impscan/assets'), filename='package_catalogue.db', create=True, no_touch=False)[source]¶
Bases:
object
- directory = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/impscan/envs/latest/lib/python3.9/site-packages/impscan/assets')¶
- filename = 'package_catalogue.db'¶
- property path¶
Lookup¶
?
Scanning¶
The scanner subpackage handles import module name identification.
- class impscan.scanner.ast_parsing.ParsedPy(py_file_path: pathlib.Path, env_config: impscan.config.EnvConfig)[source]¶
Bases:
object
- property allowed_imports¶
- property banned_imports¶
- impscan.scanner.ast_utils.retrieve_imported_modules(py_file_path: pathlib.Path) set [source]¶
Return a set of imported names (excluding stdlib modules) by parsing the AST for import statements (ignoring relative imports).
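The approach described for retrieve_imported_modules (walk the AST, collect import statements, skip relative imports, drop stdlib names) can be sketched with the standard `ast` module. This is an illustration rather than impscan's code: the name `imported_top_level_names` is hypothetical, it takes source text instead of a file path, it reduces dotted imports to their top-level name (an assumption), and the stdlib set is passed in explicitly to keep the example deterministic.

```python
import ast


def imported_top_level_names(source, stdlib=frozenset()):
    """Collect top-level names from absolute imports, minus stdlib names."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a.b.c` contributes the top-level name `a`
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            # level > 0 marks a relative import (`from . import x`), which is skipped
            names.add(node.module.split(".")[0])
    return names - set(stdlib)


sample = (
    "import os\n"
    "import numpy.linalg as la\n"
    "from requests import get\n"
    "from . import sibling\n"
    "from .pkg import thing\n"
)
found = imported_top_level_names(sample, stdlib={"os"})
print(sorted(found))  # ['numpy', 'requests']
```

On Python 3.10 and later, `sys.stdlib_module_names` could be passed as the `stdlib` argument.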
- impscan.scanner.import_utils.get_sibling_module_names(target_module_path: pathlib.Path) set [source]¶
Given a source module at target_module_path, determine the names of any modules it may import in the local directory: either those files ending in .py or directories (which do not need to contain an __init__.py due to implicit namespaces).
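A minimal sketch of this sibling lookup, using a temporary directory so that it runs anywhere. The helper name `sibling_module_names` is hypothetical. Excluding the target module itself and skipping entries that are not valid identifiers (such as `.git`) are assumptions about reasonable behaviour, not something the docstring states.

```python
import tempfile
from pathlib import Path


def sibling_module_names(target_module_path):
    """Names importable from the directory that holds the target module."""
    target = Path(target_module_path)
    names = set()
    for entry in target.parent.iterdir():
        if entry == target:
            continue  # assumption: a module is not its own sibling
        if entry.is_dir():
            candidate = entry.name  # no __init__.py needed (implicit namespace)
        elif entry.suffix == ".py":
            candidate = entry.stem
        else:
            continue
        if candidate.isidentifier():  # assumption: skip e.g. ".git"
            names.add(candidate)
    return names


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "main.py").write_text("")
    (root / "helpers.py").write_text("")
    (root / "notes.txt").write_text("")
    (root / "plugins").mkdir()
    (root / ".git").mkdir()
    siblings = sibling_module_names(root / "main.py")

print(sorted(siblings))  # ['helpers', 'plugins']
```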
- impscan.scanner.module_utils.stdlib_dynload_module_names(stdlib_path: pathlib.Path) set [source]¶
Given the path to the standard library, extend it to the lib-dynload/ directory and collect the module names of all dynamic libraries within it.
Return a set of all the modules loaded dynamically in the standard library.
- impscan.scanner.module_utils.stdlib_module_names() set [source]¶
Get the path to the standard library from sys.modules, specifically the file path stored for a non-builtin module (pathlib), and use this path to detect all standard library module names rather than hard-coding them.
Return a set of all the modules in the standard library.
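The idea of discovering stdlib module names from disk, rather than hard-coding them, can be sketched as below. Note the swapped-in technique: this sketch locates the standard library with `sysconfig.get_path("stdlib")` instead of the pathlib file path described above, since that also works where pathlib is a package rather than a single file. The function name `stdlib_names_from_disk` is hypothetical, and modules compiled into the interpreter (such as `sys`) are not found this way.

```python
import sysconfig
from pathlib import Path


def stdlib_names_from_disk():
    """Collect stdlib module names by listing the stdlib directory."""
    stdlib_dir = Path(sysconfig.get_path("stdlib"))
    names = set()
    for entry in stdlib_dir.iterdir():
        if entry.suffix == ".py":
            names.add(entry.stem)  # single-file modules, e.g. os.py
        elif entry.is_dir() and (entry / "__init__.py").exists():
            names.add(entry.name)  # packages, e.g. json/
    # Extension modules live in lib-dynload/ on POSIX builds (absent on Windows).
    dynload = stdlib_dir / "lib-dynload"
    if dynload.is_dir():
        for entry in dynload.iterdir():
            # e.g. "_ssl.cpython-39-x86_64-linux-gnu.so" -> "_ssl"
            names.add(entry.name.split(".")[0])
    return names


stdlib_names = stdlib_names_from_disk()
print(len(stdlib_names), "json" in stdlib_names)
```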
- class impscan.scanner.requirement.EnvReqs(env_config: impscan.config.EnvConfig)[source]¶
Bases:
object
- register(python_file: pathlib.Path)[source]¶
- impscan.scanner.sanitiser.is_ignored_path(path: pathlib.Path)[source]¶
Check each part of a path for matches against the list of filters given in ignore_part_names.
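A check of this kind can be sketched in a few lines. The contents of `IGNORE_PART_NAMES` below are hypothetical examples, and exact matching of path parts is an assumption (the real filter list and matching rule are not shown in this reference).

```python
from pathlib import Path

# Hypothetical filter list for illustration
IGNORE_PART_NAMES = {".git", "__pycache__", ".venv", "build"}


def is_ignored(path, ignore_part_names=IGNORE_PART_NAMES):
    """True if any component of the path exactly matches an ignored name."""
    return any(part in ignore_part_names for part in Path(path).parts)


print(is_ignored(Path("src/__pycache__/mod.cpython-39.pyc")))  # True
print(is_ignored(Path("src/pkg/mod.py")))  # False
```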
- impscan.scanner.scan.scan_imports(source_path: pathlib.Path, env_config) impscan.scanner.requirement.EnvReqs [source]¶
Execute the scan of import statements below source_path (either a Python file or a directory to be walked recursively to find them), identifying the dependency graphs within the repositories given in env_config and returning the list(s) of requirements for each.
CLI¶
The command-line tool impscan is made available as an entrypoint to impscan.__main__.main(), in turn a thin interface to impscan.cli.
Config¶
Configuration handling.
- class impscan.config.EnvConfig(**kwargs)[source]¶
Bases:
object
- __dict__ = mappingproxy({'__module__': 'impscan.config', '__init__': <function EnvConfig.__init__>, 'set_config': <function EnvConfig.set_config>, '__dict__': <attribute '__dict__' of 'EnvConfig' objects>, '__weakref__': <attribute '__weakref__' of 'EnvConfig' objects>, '__doc__': None, '__annotations__': {}})¶
- __module__ = 'impscan.config'¶
- __weakref__¶
list of weak references to the object (if defined)