ytfetcher package¶
Subpackages¶
- ytfetcher.config package
- ytfetcher.models package
- ytfetcher.services package
- ytfetcher.utils package
Submodules¶
ytfetcher.exceptions module¶
- exception ytfetcher.exceptions.ExporterError¶
Bases: Exception
Base exception for all Exporter errors.
- exception ytfetcher.exceptions.InvalidHeaders¶
Bases: YTFetcherError
Raised when headers are invalid.
- exception ytfetcher.exceptions.NoDataToExport¶
Bases: ExporterError
Raised when channel snippets and transcripts are empty.
- exception ytfetcher.exceptions.OutputDirectoryNotFoundError¶
Bases: ExporterError
Raised when the specified output directory does not exist.
- exception ytfetcher.exceptions.YTFetcherError¶
Bases: Exception
Base exception for all YTFetcher errors.
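Because the hierarchy above has two separate roots (ExporterError and YTFetcherError both derive directly from Exception), a caller may want to handle each family at its base class. The following is a minimal sketch of that idea, not the library's own error handling: `describe_failure` is a hypothetical helper, and the fallback class definitions are stand-ins that only mirror the documented hierarchy so the snippet runs when ytfetcher is not installed.

```python
try:
    from ytfetcher.exceptions import (
        ExporterError,
        InvalidHeaders,
        NoDataToExport,
        OutputDirectoryNotFoundError,
        YTFetcherError,
    )
except ImportError:
    # Stand-ins mirroring the documented hierarchy (assumption: used only
    # when the real package is unavailable).
    class YTFetcherError(Exception):
        """Base exception for all YTFetcher errors."""

    class InvalidHeaders(YTFetcherError):
        """Raised when headers are invalid."""

    class ExporterError(Exception):
        """Base exception for all Exporter errors."""

    class NoDataToExport(ExporterError):
        """Raised when channel snippets and transcripts are empty."""

    class OutputDirectoryNotFoundError(ExporterError):
        """Raised when the specified output directory does not exist."""


def describe_failure(exc: Exception) -> str:
    """Hypothetical helper: map an exception to a coarse category."""
    if isinstance(exc, ExporterError):
        return "export failed"
    if isinstance(exc, YTFetcherError):
        return "fetch failed"
    return "unexpected error"
```

Note that, per the listing above, catching YTFetcherError alone would not catch exporter errors, since ExporterError does not derive from it.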
Module contents¶
- class ytfetcher.ChannelData(*, video_id: str, transcripts: list[Transcript] | None = None, metadata: DLSnippet | None = None, comments: list[Comment] | None = None)¶
Bases: BaseModel
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- to_dict() dict¶
- transcripts: list[Transcript] | None¶
- video_id: str¶
- class ytfetcher.DLSnippet(*, id: str, title: str, description: str | None = None, url: str | None = None, duration: float | None = None, view_count: int | None = None, thumbnails: list[dict] | None = None)¶
Bases: BaseModel
- description: str | None¶
- duration: float | None¶
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- thumbnails: list[dict] | None¶
- title: str¶
- url: str | None¶
- video_id: str¶
- view_count: int | None¶
- class ytfetcher.VideoTranscript(*, video_id: str, transcripts: list[Transcript])¶
Bases: BaseModel
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- to_dict() dict¶
- transcripts: list[Transcript]¶
- video_id: str¶
- class ytfetcher.YTFetcher(max_results: int, video_ids: list[str] | None, playlist_id: str | None = None, channel_handle: str | None = None, proxy_config: ~youtube_transcript_api.proxies.ProxyConfig | None = None, http_config: ~ytfetcher.config.http_config.HTTPConfig = <ytfetcher.config.http_config.HTTPConfig object>, languages: ~typing.Iterable[str] = ('en', ), manually_created: bool = False)¶
Bases: object
YTFetcher is a high-level interface for fetching YouTube video metadata and transcripts.
It supports three modes of initialization:
- From a channel handle (via from_channel)
- From a playlist ID (via from_playlist_id)
- From a list of specific video IDs (via from_video_ids)
Internally, it uses yt-dlp to retrieve video snippets and metadata, and youtube_transcript_api (with optional proxy support) to fetch transcripts.
- Parameters:
max_results (int) – Maximum number of videos to fetch.
video_ids (list[str] | None) – List of specific video IDs to fetch.
playlist_id (str | None) – Optional YouTube playlist ID (used when fetching from a playlist).
channel_handle (str | None) – Optional YouTube channel handle (used when fetching from channel).
http_config (HTTPConfig) – Configuration for HTTP client behavior.
proxy_config (ProxyConfig | None) – Optional proxy settings for transcript fetching.
languages (Iterable[str]) – Preferred languages to fetch first. Defaults to ('en',).
manually_created (bool) – Whether to fetch only manually created transcripts. Defaults to False.
- async fetch_comments(max_comments: int = 20, max_workers: int = 30) list[ChannelData]¶
Fetches comments for all videos.
- Parameters:
max_comments – Max number of comments to fetch.
max_workers – Max number of worker threads.
- Returns:
A list of objects containing only comments.
- Return type:
list[ChannelData]
- async fetch_snippets() list[ChannelData] | None¶
Returns the raw snippet data (metadata and video IDs) retrieved with yt-dlp.
- Returns:
A list of objects containing video metadata and IDs, or None.
- Return type:
list[ChannelData] | None
- async fetch_transcripts() list[ChannelData]¶
Returns only the transcripts from cached or freshly fetched YouTube data.
- Returns:
A list of objects containing only transcripts and video_id (metadata excluded).
- Return type:
list[ChannelData]
- async fetch_with_comments(max_comments: int = 20, max_workers: int = 30) list[ChannelData]¶
Fetches comments in addition to transcripts and metadata.
- Parameters:
max_comments – Max number of comments to fetch.
max_workers – Max number of workers for threads.
- Returns:
A list of objects containing transcript text, metadata, and comments.
- Return type:
list[ChannelData]
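Since the ChannelData signature declares comments as `list[Comment] | None`, code that consumes the result of fetch_with_comments may need to treat None as an empty list. A minimal sketch under that assumption follows; `count_comments` is a hypothetical helper, and `fetcher` is assumed to be a YTFetcher instance (or any object exposing the same coroutine).

```python
import asyncio


async def count_comments(fetcher, max_comments: int = 20) -> dict[str, int]:
    """Hypothetical helper: number of comments fetched per video ID."""
    # fetch_with_comments is a coroutine, so it has to be awaited.
    results = await fetcher.fetch_with_comments(max_comments=max_comments)
    # comments is optional in the ChannelData signature; treat None as empty.
    return {item.video_id: len(item.comments or []) for item in results}


# Usage (performs network requests; the video ID is a placeholder):
# from ytfetcher import YTFetcher
# fetcher = YTFetcher.from_video_ids(video_ids=["<video id>"])
# print(asyncio.run(count_comments(fetcher, max_comments=5)))
```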
- async fetch_youtube_data() list[ChannelData]¶
Asynchronously fetches transcripts and metadata for all videos retrieved from the channel or video IDs.
- Returns:
A list of objects containing transcript text and associated metadata.
- Return type:
list[ChannelData]
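All of the fetch_* methods are declared async, so they need an event loop to run. The sketch below shows one way to call fetch_youtube_data from synchronous code and convert each ChannelData to a plain dictionary via its documented to_dict() method. `fetch_as_dicts` and `fetch_as_dicts_sync` are hypothetical helpers, and `fetcher` is assumed to be a YTFetcher instance.

```python
import asyncio


async def fetch_as_dicts(fetcher) -> list[dict]:
    """Hypothetical helper: fetch everything and return plain dictionaries."""
    results = await fetcher.fetch_youtube_data()
    # Each result is expected to expose to_dict(), as ChannelData does.
    return [item.to_dict() for item in results]


def fetch_as_dicts_sync(fetcher) -> list[dict]:
    """Run the coroutine to completion from synchronous code."""
    return asyncio.run(fetch_as_dicts(fetcher))
```

asyncio.run starts a fresh event loop, so `fetch_as_dicts_sync` is only suitable where no loop is already running; inside async code, awaiting `fetch_as_dicts` directly is the simpler choice.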
- classmethod from_channel(channel_handle: str, max_results: int = 50, http_config: ~ytfetcher.config.http_config.HTTPConfig = <ytfetcher.config.http_config.HTTPConfig object>, proxy_config: ~youtube_transcript_api.proxies.ProxyConfig | None = None, languages: ~typing.Iterable[str] = ('en', ), manually_created: bool = False) YTFetcher¶
Create a fetcher that pulls up to max_results videos from the channel.
- classmethod from_playlist_id(playlist_id: str, max_results: int = 50, http_config: ~ytfetcher.config.http_config.HTTPConfig = <ytfetcher.config.http_config.HTTPConfig object>, proxy_config: ~youtube_transcript_api.proxies.ProxyConfig | None = None, languages: ~typing.Iterable[str] = ('en', ), manually_created: bool = False) YTFetcher¶
Create a fetcher that fetches from the given playlist ID.
- classmethod from_video_ids(video_ids: list[str] = [], http_config: ~ytfetcher.config.http_config.HTTPConfig = <ytfetcher.config.http_config.HTTPConfig object>, proxy_config: ~youtube_transcript_api.proxies.ProxyConfig | None = None, languages: ~typing.Iterable[str] = ('en', ), manually_created: bool = False) YTFetcher¶
Create a fetcher that fetches only the given video IDs.
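The three constructors above each correspond to one kind of source. As an illustration only, they could be wrapped in a small dispatcher that insists on exactly one source being given. `build_fetcher` is a hypothetical helper and not part of ytfetcher; the class is passed in as `fetcher_cls` so the sketch does not depend on the package being installed.

```python
def build_fetcher(fetcher_cls, *, channel_handle=None, playlist_id=None,
                  video_ids=None, **options):
    """Hypothetical helper: call the constructor matching the single source given.

    Extra keyword options are forwarded unchanged. Per the signatures above,
    from_video_ids does not accept max_results, so it should not be passed
    together with video_ids.
    """
    sources = {
        "channel_handle": channel_handle,
        "playlist_id": playlist_id,
        "video_ids": video_ids,
    }
    given = [name for name, value in sources.items() if value]
    if len(given) != 1:
        raise ValueError(f"expected exactly one source, got: {given or 'none'}")

    if channel_handle:
        return fetcher_cls.from_channel(channel_handle=channel_handle, **options)
    if playlist_id:
        return fetcher_cls.from_playlist_id(playlist_id=playlist_id, **options)
    # Pass a fresh list rather than relying on the default argument.
    return fetcher_cls.from_video_ids(video_ids=list(video_ids), **options)


# Usage (the handle is a placeholder):
# from ytfetcher import YTFetcher
# fetcher = build_fetcher(YTFetcher, channel_handle="<handle>", max_results=10)
```

Passing video_ids explicitly also sidesteps the mutable `[]` default shown in the from_video_ids signature, which in Python is shared between calls if a function mutates it.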
- property metadata: list[DLSnippet] | None¶
Metadata for each video, such as title, duration, and description.
- Returns:
List of DLSnippet objects containing video metadata.
- Return type:
list[DLSnippet] | None
- property video_ids: list[str]¶
List of video IDs fetched from the YouTube channel or provided directly.
- Returns:
Video ID strings.
- Return type:
list[str]