Megfile
Overview
Guide to using megfile’s unified APIs, SmartPath, configuration, and CLI for file operations across backends such as local FS, S3/OSS-compatible object storage, SFTP, WebDAV, HTTP, HDFS, and stdio.
Quick Start
- Install base: pip install megfile; add extras per backend (megfile[cli], megfile[hdfs], megfile[webdav]).
- Configure credentials/endpoints (env vars > config files). S3 examples: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, optional AWS_ENDPOINT_URL / OSS_ENDPOINT / AWS_ENDPOINT_URL_S3, AWS_S3_ADDRESSING_STYLE.
- Path format: protocol://bucket/key or alias (e.g., tos://); bare POSIX paths are treated as file://.
- Import functional APIs (from megfile import smart_open, smart_sync, ...) or SmartPath (from megfile.smart_path import SmartPath).
Supported Protocols & Extras
- Local FS (file:// or bare paths): base install.
- S3/OSS-compatible (s3://, plus aliases): base install.
- SFTP (sftp://): install megfile[cli] or megfile with SFTP deps.
- HTTP/HTTPS (http://, https://): base install.
- Stdio (stdio://): base install.
- HDFS (hdfs://): install megfile[hdfs].
- WebDAV (webdav://): install megfile[webdav].
- Full Protocol path format reference: references/path_format.md.
Core Tasks
File IO
- smart_open(path, mode='r', encoding=None, **options): open binary/text handles.
- Convenience loaders/savers: smart_load_content/smart_save_content (bytes), smart_load_text/smart_save_text (str), smart_save_as(file_obj, path), smart_load_from(path) returns BinaryIO.
- smart_combine_open(glob, mode='rb', open_func=smart_open): sequentially reads multiple files as one stream.
Existence & Metadata
- smart_exists, smart_isfile, smart_isdir, smart_islink, smart_isabs.
- smart_access(path, mode=Access.READ/WRITE).
- smart_stat/smart_lstat (via SmartPath), smart_getsize, smart_getmtime, smart_getmd5(recalculate=False, followlinks=False).
Listing & Globbing
- Directory traversal: smart_listdir, smart_scandir, smart_walk, smart_scan, smart_scan_stat (returns FileEntry/StatResult).
- Pattern matching: smart_glob, smart_iglob, smart_glob_stat for stat-rich globbing.
Data Transfer & Lifecycle
- Copy/sync: smart_copy, smart_sync, smart_sync_with_progress (progress-friendly wrapper), smart_concat to merge multiple sources.
- Moves/deletes: smart_move, smart_rename, smart_remove, smart_unlink, smart_touch, smart_makedirs.
- Links: smart_symlink, smart_readlink.
Path Utilities
- smart_path_join, smart_abspath, smart_realpath, smart_relpath, smart_isabs.
- SmartPath mirrors pathlib semantics but routes to the right backend: path = SmartPath("s3://bucket/key"); path.exists(); path.open(mode="rb").
Caching
smart_cache(path, cacher=SmartCacher, **options): cache remote resources locally for tools that only support local files.
Configuration
- Use CLI helpers to persist credentials/endpoints; environment variables take precedence.
- Profiles enable multiple endpoints (e.g., s3+prod://...). See references/configuration/ for protocol-specific flags and env vars.
- Full config reference: references/configuration/.
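For scripts that cannot rely on a persisted config, one option is to set the S3 environment variables in-process before any S3 path is used. The sketch below uses only the variable names listed in Quick Start; every value is a placeholder, and the endpoint URL is hypothetical.

```python
import os

# Placeholder values only; real credentials should come from a secret store.
s3_settings = {
    "AWS_ACCESS_KEY_ID": "<access-key-id>",
    "AWS_SECRET_ACCESS_KEY": "<secret-access-key>",
    "AWS_ENDPOINT_URL": "https://s3.example.com",  # hypothetical endpoint
    "AWS_S3_ADDRESSING_STYLE": "path",
}

for name, value in s3_settings.items():
    # setdefault keeps any value already exported in the shell.
    os.environ.setdefault(name, value)

configured = [name for name in s3_settings if name in os.environ]
```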
CLI Essentials
- Install CLI extras: pip install 'megfile[cli]'.
- Common commands (ls/cp/sync/stat/md5sum/mkdir/rm/touch) mirror POSIX semantics across backends.
- Completion scripts: megfile completion zsh.
- Full command list and flags: references/cli.md.
Usage Notes
- Prefer smart_* for protocol-agnostic code paths; avoid branching per backend.
- Ensure required extras are installed for target protocols before invoking APIs.
- For high-volume sync/copy, supply map_func (e.g., ThreadPoolExecutor.map) and callback to report progress.
- Use aliases via megfile config alias <alias> <protocol> to shorten paths (e.g., tos://).
References
- API surface: references/megfile.smart.md and references/megfile.smart_path.md.
- Configuration flags, env vars, and profiles: references/configuration/
- CLI commands and flags: references/cli.md
- Full Protocol path format reference: references/path_format.md.
- Glob patterns reference: references/advanced/glob.md.
