ytfetcher package

Subpackages

Submodules

ytfetcher.exceptions module

exception ytfetcher.exceptions.ExporterError

Bases: Exception

Base exception for all Exporter errors.

exception ytfetcher.exceptions.InvalidHeaders

Bases: YTFetcherError

Raised when headers are invalid.

exception ytfetcher.exceptions.NoDataToExport

Bases: ExporterError

Raised when channel snippets and transcripts are empty.

exception ytfetcher.exceptions.OutputDirectoryNotFoundError

Bases: ExporterError

Raised when the specified output directory does not exist.

exception ytfetcher.exceptions.YTFetcherError

Bases: Exception

Base exception for all YTFetcher errors.
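The hierarchy above has two separate roots: ExporterError derives directly from Exception, not from YTFetcherError, so an except clause for YTFetcherError does not catch export failures. A minimal sketch of that distinction follows. The classes are redefined locally as stand-ins so the snippet runs without the package installed, and describe_failure is a hypothetical helper, not part of ytfetcher.

```python
# Stand-ins mirroring the documented hierarchy; in real code these would be
# imported from ytfetcher.exceptions instead of being redefined.
class YTFetcherError(Exception): ...
class InvalidHeaders(YTFetcherError): ...
class ExporterError(Exception): ...
class NoDataToExport(ExporterError): ...
class OutputDirectoryNotFoundError(ExporterError): ...


def describe_failure(exc: Exception) -> str:
    """Hypothetical helper: map an exception to a coarse category."""
    if isinstance(exc, ExporterError):
        return "export"
    if isinstance(exc, YTFetcherError):
        return "fetch"
    return "other"


print(describe_failure(NoDataToExport()))   # export
print(describe_failure(InvalidHeaders()))   # fetch
```

Code that needs to handle both families would list both base classes in the except clause.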

Module contents

class ytfetcher.ChannelData(*, video_id: str, transcripts: list[Transcript] | None = None, metadata: DLSnippet | None = None, comments: list[Comment] | None = None)

Bases: BaseModel

comments: list[Comment] | None
metadata: DLSnippet | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

to_dict() → dict
transcripts: list[Transcript] | None
video_id: str
class ytfetcher.DLSnippet(*, id: str, title: str, description: str | None = None, url: str | None = None, duration: float | None = None, view_count: int | None = None, thumbnails: list[dict] | None = None)

Bases: BaseModel

description: str | None
duration: float | None
model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

thumbnails: list[dict] | None
title: str
url: str | None
validate_url() → DLSnippet

If the URL is missing, build it from video_id.

video_id: str
view_count: int | None
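The URL fallback described for validate_url can be sketched as follows. This uses a standard-library dataclass as a stand-in rather than the actual pydantic model; the name SnippetSketch and the watch-URL pattern are assumptions made for illustration, and the real validator may build the URL differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnippetSketch:
    """Stand-in for DLSnippet carrying only the fields needed here."""

    video_id: str
    title: str
    url: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumed URL pattern; not confirmed by the ytfetcher source.
        if self.url is None:
            self.url = f"https://www.youtube.com/watch?v={self.video_id}"


snippet = SnippetSketch(video_id="abc123", title="Example")
print(snippet.url)
```

An explicitly supplied url is left untouched; only a missing one is filled in.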
class ytfetcher.VideoTranscript(*, video_id: str, transcripts: list[Transcript])

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

to_dict() → dict
transcripts: list[Transcript]
video_id: str
class ytfetcher.YTFetcher(max_results: int, video_ids: list[str] | None, playlist_id: str | None = None, channel_handle: str | None = None, proxy_config: ProxyConfig | None = None, http_config: HTTPConfig = <ytfetcher.config.http_config.HTTPConfig object>, languages: Iterable[str] = ('en',), manually_created: bool = False)

Bases: object

YTFetcher is a high-level interface for fetching YouTube video metadata and transcripts.

It supports three modes of initialization:

  • From a channel handle (via from_channel)

  • From a playlist ID (via from_playlist_id)

  • From a list of specific video IDs (via from_video_ids)

Internally, it uses yt-dlp to retrieve video snippets and metadata, and youtube_transcript_api (with optional proxy support) to fetch transcripts.

Parameters:
  • max_results (int) – Maximum number of videos to fetch.

  • video_ids (list[str] | None) – List of specific video IDs to fetch.

  • playlist_id (str | None) – Optional playlist ID (used when fetching from a playlist).

  • channel_handle (str | None) – Optional YouTube channel handle (used when fetching from channel).

  • http_config (HTTPConfig) – Configuration for HTTP client behavior.

  • proxy_config (ProxyConfig | None) – Optional proxy settings for transcript fetching.

  • languages (Iterable[str]) – Preferred transcript languages, tried in order. Defaults to ('en',).

  • manually_created (bool) – If True, fetch only manually created transcripts. Defaults to False.

async fetch_comments(max_comments: int = 20, max_workers: int = 30) → list[ChannelData]

Fetches comments for all videos.

Parameters:
  • max_comments – Maximum number of comments to fetch.

  • max_workers – Maximum number of worker threads.

Returns:

A list of objects containing only comments.

Return type:

list[ChannelData]

async fetch_snippets() → list[ChannelData] | None

Returns the raw snippet data (metadata and video IDs) retrieved via yt-dlp.

Returns:

A list of objects containing video metadata and IDs, or None.

Return type:

list[ChannelData] | None

async fetch_transcripts() → list[ChannelData]

Returns only the transcripts from cached or freshly fetched YouTube data.

Returns:

A list of objects containing only video_id and transcripts (metadata excluded).

Return type:

list[ChannelData]

async fetch_with_comments(max_comments: int = 20, max_workers: int = 30) → list[ChannelData]

Fetches comments in addition to transcripts and metadata.

Parameters:
  • max_comments – Maximum number of comments to fetch.

  • max_workers – Maximum number of worker threads.

Returns:

A list of objects containing transcript text, metadata, and comments.

Return type:

list[ChannelData]
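Because the comments field on ChannelData is typed list[Comment] | None, code that consumes these results should tolerate a missing list. The sketch below shows one way to do that; comment_counts is a hypothetical helper, and the sample objects are SimpleNamespace stand-ins that only share attribute names with ChannelData.

```python
from types import SimpleNamespace
from typing import Iterable


def comment_counts(items: Iterable) -> dict[str, int]:
    """Map each video_id to its number of comments (0 when comments is None)."""
    return {item.video_id: len(item.comments or []) for item in items}


# Stand-in objects; real input would be the list returned by
# fetch_with_comments() or fetch_comments().
sample = [
    SimpleNamespace(video_id="a1", comments=["first", "second"]),
    SimpleNamespace(video_id="b2", comments=None),
]
print(comment_counts(sample))  # {'a1': 2, 'b2': 0}
```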

async fetch_youtube_data() → list[ChannelData]

Asynchronously fetches transcripts and metadata for all videos retrieved from the channel, playlist, or video IDs.

Returns:

A list of objects containing transcript text and associated metadata.

Return type:

list[ChannelData]
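Since each ChannelData exposes to_dict(), a result list can be serialized with the standard json module. The following is a sketch under the assumption that to_dict() returns only JSON-serializable values, which this reference does not state. dump_channel_data and FakeChannelData are hypothetical names used so the snippet runs on its own.

```python
import json
from typing import Iterable


def dump_channel_data(items: Iterable) -> str:
    """Serialize objects exposing to_dict() into a JSON array string."""
    return json.dumps([item.to_dict() for item in items], ensure_ascii=False, indent=2)


class FakeChannelData:
    """Stand-in that shares only the to_dict() method name with ChannelData."""

    def __init__(self, video_id: str) -> None:
        self.video_id = video_id

    def to_dict(self) -> dict:
        # Illustrative shape only; the real to_dict() output is not specified here.
        return {"video_id": self.video_id}


print(dump_channel_data([FakeChannelData("abc123")]))
```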

classmethod from_channel(channel_handle: str, max_results: int = 50, http_config: HTTPConfig = <ytfetcher.config.http_config.HTTPConfig object>, proxy_config: ProxyConfig | None = None, languages: Iterable[str] = ('en',), manually_created: bool = False) → YTFetcher

Create a fetcher that pulls up to max_results videos from the channel.

classmethod from_playlist_id(playlist_id: str, max_results: int = 50, http_config: HTTPConfig = <ytfetcher.config.http_config.HTTPConfig object>, proxy_config: ProxyConfig | None = None, languages: Iterable[str] = ('en',), manually_created: bool = False) → YTFetcher

Create a fetcher that fetches from the given playlist ID.

classmethod from_video_ids(video_ids: list[str] = [], http_config: HTTPConfig = <ytfetcher.config.http_config.HTTPConfig object>, proxy_config: ProxyConfig | None = None, languages: Iterable[str] = ('en',), manually_created: bool = False) → YTFetcher

Create a fetcher that fetches only the given video IDs.
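The three constructors can be combined with fetch_youtube_data roughly as sketched below. build_fetcher and fetch_all are hypothetical helpers written for this example; only the YTFetcher class methods and their keyword arguments come from the signatures in this reference. The ytfetcher import is deferred so the module loads even where the package is not installed.

```python
SOURCES = ("channel", "playlist", "videos")


def build_fetcher(source: str, value, max_results: int = 10):
    """Hypothetical helper: pick the documented constructor for a source kind."""
    if source not in SOURCES:
        raise ValueError(f"unknown source {source!r}; expected one of {SOURCES}")
    # Imported lazily so this module loads without ytfetcher installed.
    from ytfetcher import YTFetcher

    if source == "channel":
        return YTFetcher.from_channel(channel_handle=value, max_results=max_results)
    if source == "playlist":
        return YTFetcher.from_playlist_id(playlist_id=value, max_results=max_results)
    # from_video_ids has no max_results parameter in its documented signature.
    return YTFetcher.from_video_ids(video_ids=list(value))


async def fetch_all(source: str, value):
    """Build a fetcher and return transcripts plus metadata for its videos."""
    fetcher = build_fetcher(source, value)
    return await fetcher.fetch_youtube_data()
```

With the package installed, a call would look like asyncio.run(fetch_all("channel", "SomeChannelHandle")), where the handle is a placeholder.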

property metadata: list[DLSnippet] | None

Metadata for each video, such as title, duration, and description.

Returns:

List of DLSnippet objects containing video metadata, or None.

Return type:

list[DLSnippet] | None

property video_ids: list[str]

List of video IDs fetched from the YouTube channel or provided directly.

Returns:

Video ID strings.

Return type:

list[str]
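The two properties can be read together, keeping in mind that metadata may be None while video_ids is always a list. A small sketch follows; titles_by_id is a hypothetical helper, and the SimpleNamespace objects stand in for DLSnippet, sharing only its video_id and title attribute names.

```python
from types import SimpleNamespace
from typing import Optional, Sequence


def titles_by_id(video_ids: Sequence[str], metadata: Optional[Sequence]) -> dict[str, Optional[str]]:
    """Map every video ID to its title, or None when no metadata is available."""
    known = {snippet.video_id: snippet.title for snippet in (metadata or [])}
    return {vid: known.get(vid) for vid in video_ids}


# Stand-in metadata; real input would be fetcher.video_ids and fetcher.metadata.
meta = [SimpleNamespace(video_id="a1", title="Intro")]
print(titles_by_id(["a1", "b2"], meta))  # {'a1': 'Intro', 'b2': None}
print(titles_by_id(["a1"], None))        # {'a1': None}
```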