ytfetcher package

Subpackages

Submodules

ytfetcher.exceptions module

exception ytfetcher.exceptions.ExporterError

Bases: Exception

Base exception for all Exporter errors.

exception ytfetcher.exceptions.InvalidHeaders

Bases: YTFetcherError

Raised when headers are invalid.

exception ytfetcher.exceptions.NoDataToExport

Bases: ExporterError

Raised when channel snippets and transcripts are empty.

exception ytfetcher.exceptions.OutputDirectoryNotFoundError

Bases: ExporterError

Raised when the specified output directory does not exist.

exception ytfetcher.exceptions.YTFetcherError

Bases: Exception

Base exception for all YTFetcher errors.
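The exceptions above form two independent roots: YTFetcherError (parent of InvalidHeaders) and ExporterError (parent of NoDataToExport and OutputDirectoryNotFoundError). The sketch below re-declares the classes locally with the documented bases so it runs on its own; in real code they would be imported from ytfetcher.exceptions. The run_and_report and raise_ helpers are hypothetical, added only to show which handler catches what.

```python
# Local stand-ins that mirror the documented bases so this runs on its own.
# In real code, import these from ytfetcher.exceptions instead.
class YTFetcherError(Exception):
    """Base exception for all YTFetcher errors."""


class InvalidHeaders(YTFetcherError):
    """Raised when headers are invalid."""


class ExporterError(Exception):
    """Base exception for all Exporter errors."""


class NoDataToExport(ExporterError):
    """Raised when channel snippets and transcripts are empty."""


class OutputDirectoryNotFoundError(ExporterError):
    """Raised when the specified output directory does not exist."""


def run_and_report(action) -> str:
    """Run a zero-argument callable and describe how it failed, if at all.

    ExporterError and YTFetcherError both derive directly from Exception,
    so catching one root does not catch the other.
    """
    try:
        action()
    except ExporterError as exc:  # NoDataToExport, OutputDirectoryNotFoundError
        return f"export failed: {type(exc).__name__}"
    except YTFetcherError as exc:  # InvalidHeaders
        return f"fetch failed: {type(exc).__name__}"
    return "ok"


def raise_(exc: Exception) -> None:
    """Helper so a lambda can raise an exception."""
    raise exc


print(run_and_report(lambda: raise_(OutputDirectoryNotFoundError("out/"))))
# export failed: OutputDirectoryNotFoundError
```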

Module contents

class ytfetcher.ChannelData(*, video_id: str, transcripts: list[Transcript] | None = None, metadata: DLSnippet | None = None, comments: list[Comment] | None = None)

Bases: BaseModel

comments: list[Comment] | None

metadata: DLSnippet | None

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

to_dict() → dict

transcripts: list[Transcript] | None

video_id: str

class ytfetcher.DLSnippet

Bases: BaseModel

description: str | None

duration: float | None

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

thumbnails: list[dict] | None

title: str

url: str | None

validate_url() → DLSnippet

If the URL is missing, builds it from video_id.

video_id: str

view_count: int | None
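validate_url() fills in url from video_id when it is missing. The exact URL format ytfetcher produces is not documented here, so the sketch assumes the standard watch URL https://www.youtube.com/watch?v=<id>. build_video_url is a hypothetical, standard-library-only stand-in for the validator's logic, not the library's code.

```python
from __future__ import annotations

from urllib.parse import urlencode

WATCH_BASE = "https://www.youtube.com/watch"  # assumed URL format


def build_video_url(video_id: str, url: str | None = None) -> str:
    """Return url unchanged if it is set; otherwise derive it from video_id.

    An empty string is treated the same as a missing URL.
    """
    if url:
        return url
    # urlencode percent-encodes unsafe characters; letters, digits, "-" and "_"
    # (the characters used in video IDs) pass through unchanged.
    return f"{WATCH_BASE}?{urlencode({'v': video_id})}"


print(build_video_url("abc123"))  # https://www.youtube.com/watch?v=abc123
```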

class ytfetcher.VideoTranscript(*, video_id: str, transcripts: list[Transcript])

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

to_dict() → dict

transcripts: list[Transcript]

video_id: str

class ytfetcher.YTFetcher(max_results: int, video_ids: list[str] | None, playlist_id: str | None = None, channel_handle: str | None = None, proxy_config: youtube_transcript_api.proxies.ProxyConfig | None = None, http_config: ytfetcher.config.http_config.HTTPConfig = <ytfetcher.config.http_config.HTTPConfig object>, languages: typing.Iterable[str] = ('en',), manually_created: bool = False)

Bases: object

YTFetcher is a high-level interface for fetching YouTube video metadata and transcripts.

It supports three modes of initialization:

- From a channel handle (via from_channel)
- From a playlist ID (via from_playlist_id)
- From a list of specific video IDs (via from_video_ids)

Internally, it uses yt-dlp to retrieve video snippets and metadata, and youtube_transcript_api (with optional proxy support) to fetch transcripts.

Parameters:

max_results (int) – Maximum number of videos to fetch.
video_ids (list[str] | None) – List of specific video IDs to fetch.
playlist_id (str | None) – Optional YouTube playlist ID (used when fetching from a playlist).
channel_handle (str | None) – Optional YouTube channel handle (used when fetching from a channel).
http_config (HTTPConfig) – Configuration for HTTP client behavior.
proxy_config (ProxyConfig | None) – Optional proxy settings for transcript fetching.
languages (Iterable[str]) – Preferred transcript languages, tried first. Defaults to ('en',).
manually_created (bool) – If True, fetch only manually created transcripts. Defaults to False.
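A typical run builds a fetcher with one of the classmethods and awaits one of the fetch coroutines. The sketch below uses only from_channel, fetch_youtube_data, and the ChannelData fields documented here; the channel handle is a caller-supplied placeholder (the expected handle format is not stated above), and the ytfetcher import is deferred into main() so the function can be defined without the package installed. Actually running it requires ytfetcher and network access.

```python
import asyncio


async def main(channel_handle: str) -> None:
    """Fetch metadata and transcripts for a few videos and print a summary."""
    # Deferred import: ytfetcher is only needed when main() actually runs.
    from ytfetcher import YTFetcher

    fetcher = YTFetcher.from_channel(
        channel_handle=channel_handle,
        max_results=5,
        languages=("en",),  # preferred transcript languages, tried first
    )
    results = await fetcher.fetch_youtube_data()  # list[ChannelData]
    for item in results:
        # metadata and transcripts are both optional on ChannelData.
        title = item.metadata.title if item.metadata else "<no metadata>"
        n_transcripts = len(item.transcripts) if item.transcripts else 0
        print(f"{item.video_id}: {title} ({n_transcripts} Transcript objects)")


# Usage: asyncio.run(main("<channel handle>"))
```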

async fetch_comments(max_comments: int = 20, max_workers: int = 30) → list[ChannelData]

Fetches comments for all videos.

Parameters:

max_comments – Maximum number of comments to fetch.
max_workers – Maximum number of worker threads.

Returns:

A list of objects containing only comments.

Return type:

list[ChannelData]
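fetch_comments() returns ChannelData objects in which only comments is populated, and comments is typed as optional. The hypothetical helper below counts comments per video while tolerating None; it reads only the documented video_id and comments fields, so SimpleNamespace stand-ins are enough to try it offline.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any, Iterable


def comment_counts(items: Iterable[Any]) -> dict[str, int]:
    """Map each video_id to its number of fetched comments (0 when None)."""
    return {item.video_id: len(item.comments or []) for item in items}


# Stand-in objects shaped like ChannelData; real ones come from fetch_comments().
sample = [
    SimpleNamespace(video_id="vid1", comments=["c1", "c2"]),
    SimpleNamespace(video_id="vid2", comments=None),
]
print(comment_counts(sample))  # {'vid1': 2, 'vid2': 0}
```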

async fetch_snippets() → list[ChannelData] | None

Returns the raw snippet data (metadata and video IDs) retrieved via yt-dlp.

Returns:

A list of objects containing video metadata and IDs, or None.

Return type:

list[ChannelData] | None
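Since fetch_snippets() may return None and each item's metadata is itself optional, callers need to guard both levels. snippet_titles is a hypothetical helper that uses only the documented video_id, metadata, and title fields; the SimpleNamespace objects stand in for ChannelData and DLSnippet.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any


def snippet_titles(snippets: list[Any] | None) -> list[tuple[str, str]]:
    """Return (video_id, title) pairs, skipping items without metadata."""
    if not snippets:  # covers both None and an empty list
        return []
    return [(s.video_id, s.metadata.title) for s in snippets if s.metadata is not None]


# Stand-ins shaped like ChannelData/DLSnippet for illustration.
demo = [
    SimpleNamespace(video_id="a1", metadata=SimpleNamespace(title="First")),
    SimpleNamespace(video_id="b2", metadata=None),
]
print(snippet_titles(demo))  # [('a1', 'First')]
```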

async fetch_transcripts() → list[ChannelData]

Returns only the transcripts, taken from cached or freshly fetched YouTube data.

Returns:

Objects containing only the transcripts and video_id (metadata excluded).

Return type:

list[ChannelData]

async fetch_with_comments(max_comments: int = 20, max_workers: int = 30) → list[ChannelData]

Fetches comments in addition to transcripts and metadata.

Parameters:

max_comments – Maximum number of comments to fetch.
max_workers – Maximum number of worker threads.

Returns:

A list of objects containing transcript text, metadata, and comments.

Return type:

list[ChannelData]

async fetch_youtube_data() → list[ChannelData]

Asynchronously fetches transcripts and metadata for all videos retrieved from the channel, the playlist, or the given video IDs.

Returns:

A list of objects containing transcript text and associated metadata.

Return type:

list[ChannelData]

classmethod from_channel(channel_handle: str, max_results: int = 50, http_config: ytfetcher.config.http_config.HTTPConfig = <ytfetcher.config.http_config.HTTPConfig object>, proxy_config: youtube_transcript_api.proxies.ProxyConfig | None = None, languages: typing.Iterable[str] = ('en',), manually_created: bool = False) → YTFetcher

Create a fetcher that pulls up to max_results videos from the given channel.

classmethod from_playlist_id(playlist_id: str, max_results: int = 50, http_config: ytfetcher.config.http_config.HTTPConfig = <ytfetcher.config.http_config.HTTPConfig object>, proxy_config: youtube_transcript_api.proxies.ProxyConfig | None = None, languages: typing.Iterable[str] = ('en',), manually_created: bool = False) → YTFetcher

Create a fetcher that pulls up to max_results videos from the given playlist ID.

classmethod from_video_ids(video_ids: list[str] = [], http_config: ytfetcher.config.http_config.HTTPConfig = <ytfetcher.config.http_config.HTTPConfig object>, proxy_config: youtube_transcript_api.proxies.ProxyConfig | None = None, languages: typing.Iterable[str] = ('en',), manually_created: bool = False) → YTFetcher

Create a fetcher that fetches only the given video IDs.
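The three classmethods differ only in their source argument; http_config, proxy_config, languages, and manually_created are shared. The sketch below dispatches to the matching constructor based on which source the caller supplies. pick_source and make_fetcher are hypothetical helpers, and the ytfetcher import is deferred so the source-selection logic can run without the package installed.

```python
from __future__ import annotations


def pick_source(channel_handle: str | None, playlist_id: str | None,
                video_ids: list[str] | None) -> str:
    """Return which single source was supplied, or raise ValueError."""
    given = [name for name, value in (("channel", channel_handle),
                                      ("playlist", playlist_id),
                                      ("video_ids", video_ids)) if value]
    if len(given) != 1:
        raise ValueError("pass exactly one of channel_handle, playlist_id, video_ids")
    return given[0]


def make_fetcher(channel_handle: str | None = None, playlist_id: str | None = None,
                 video_ids: list[str] | None = None, max_results: int = 50,
                 languages: tuple[str, ...] = ("en",)):
    """Build a YTFetcher via the matching classmethod (hypothetical helper)."""
    source = pick_source(channel_handle, playlist_id, video_ids)
    from ytfetcher import YTFetcher  # deferred so pick_source works offline

    if source == "channel":
        return YTFetcher.from_channel(channel_handle, max_results=max_results, languages=languages)
    if source == "playlist":
        return YTFetcher.from_playlist_id(playlist_id, max_results=max_results, languages=languages)
    # from_video_ids has no max_results parameter in its documented signature.
    return YTFetcher.from_video_ids(video_ids, languages=languages)
```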

property metadata: list[DLSnippet] | None

Metadata for each video, such as title, duration, and description.

Returns:

List of DLSnippet objects containing video metadata.

Return type:

list[DLSnippet] | None

property video_ids: list[str]

List of video IDs fetched from the YouTube channel or playlist, or provided directly.

Returns:

Video ID strings.

Return type:

list[str]
