Clips AI

clipsai.com
Pricing plans

No detailed pricing plan is available for this tool yet.

Detailed overview

Clips AI Documentation

Clips AI is an open-source Python library that automatically converts long-form video into clips. With just a few lines of code, you can segment a video into multiple clips and resize its aspect ratio from 16:9 to 9:16.

Quickstart

Clips AI is designed for audio-centric, narrative-based videos such as podcasts, interviews, speeches, and sermons. Our clipping algorithm analyzes a video's transcript to identify and create clips. Our resizing algorithm dynamically reframes videos to focus on the current speaker, converting the video into various aspect ratios.

Installation

1. Install the Python dependencies. We highly suggest using a virtual environment (such as venv) to avoid dependency conflicts.

       pip install clipsai
       pip install whisperx@git+https://github.com/m-bain/whisperx.git

2. Install libmagic.
3. Install ffmpeg.

Creating clips

Since clips are found using the video's transcript, the video must first be transcribed. Transcription is done with WhisperX, an open-source wrapper around Whisper that adds start and stop times for each word. To trim the original video into a chosen clip, refer to the clipping reference.

    from clipsai import ClipFinder, Transcriber

    transcriber = Transcriber()
    transcription = transcriber.transcribe(audio_file_path="/abs/path/to/video.mp4")

    clipfinder = ClipFinder()
    clips = clipfinder.find_clips(transcription=transcription)

    print("StartTime: ", clips[0].start_time)
    print("EndTime: ", clips[0].end_time)

Resizing a video

A Hugging Face access token is required to resize a video because Pyannote is used for speaker diarization. You won't be charged for using Pyannote, and instructions are on the Pyannote Hugging Face page. To resize the original video to the desired aspect ratio, refer to the resizing reference.

    from clipsai import resize

    crops = resize(
        video_file_path="/abs/path/to/video.mp4",
        pyannote_auth_token="pyannote_token",
        aspect_ratio=(9, 16),
    )

    print("Crops: ", crops.segments)

---

Clip

The clipping feature uses the TextTiling algorithm to segment long-form audio content into coherent clips based on the transcript. This approach, first proposed by Marti A. Hearst in the 1990s, detects topic shifts in a piece of content by analyzing word usage and distribution patterns. Thanks to recent advances in NLP, TextTiling with BERT embeddings provides significant improvements over TextTiling's original formulation and can be readily applied to state-of-the-art transcriptions produced with Whisper.

The algorithm segments the text at the granularity of sentences, and the entire process focuses on detecting topic shifts rather than the topics themselves. This is particularly effective at identifying distinct sections within a narrative and, consequently, at producing clips of varying lengths suited to both short and extended audio segments.

Usage

The following returns the start and end times of the clips.

    from clipsai import ClipFinder, Transcriber

    transcriber = Transcriber()
    transcription = transcriber.transcribe(audio_file_path="/abs/path/to/video.mp4")

    clipfinder = ClipFinder()
    clips = clipfinder.find_clips(transcription=transcription)

    print("StartTime: ", clips[0].start_time)
    print("EndTime: ", clips[0].end_time)

To trim the video using the returned clips, run the following code.

    import clipsai  # needed for the clipsai.* names used below

    media_editor = clipsai.MediaEditor()

    # use this if the file contains an audio stream only
    media_file = clipsai.AudioFile("/abs/path/to/audio_only_file.mp4")
    # use this if the file contains both audio and video streams
    media_file = clipsai.AudioVideoFile("/abs/path/to/video.mp4")

    clip = clips[0]  # select the clip you'd like to trim
    clip_media_file = media_editor.trim(
        media_file=media_file,
        start_time=clip.start_time,
        end_time=clip.end_time,
        trimmed_media_file_path="/abs/path/to/clip.mp4",  # doesn't exist yet
    )

ClipFinder Class

A class for finding engaging clips based on the input transcript.

Methods

- find_clips -> list[Clip]: Finds clips in an audio file's transcription using the TextTiling algorithm.
  Required parameters:
  - transcription (Transcription): The transcription of an audio or video file to find clips from.

Clip Class

Represents a clip of a video or audio file.

Properties

- start_time (string): The start time of the clip in seconds.
- end_time (string): The end time of the clip in seconds.
- start_char (string): The start character in the transcription of the clip.
- end_char (string): The end character in the transcription of the clip.

Methods

- copy -> Clip: Returns a copy of the Clip instance.
- to_dict -> dict: Returns a dictionary representation of the clip.

---

Resize

The resizing feature relies on three subcomponents:

- Speaker diarization with Pyannote
- Scene change detection with PySceneDetect
- Face detection with MTCNN and MediaPipe

These libraries are used to dynamically resize a video to focus on whoever is speaking at any given moment.

Usage

The following returns the information needed to resize the video.

    from clipsai import resize

    crops = resize(
        video_file_path="/abs/path/to/video.mp4",
        pyannote_auth_token="pyannote_token",
        aspect_ratio=(9, 16),
    )

    print("Crops: ", crops.segments)

To resize the video using the returned crops, run the following code.

    import clipsai  # needed for the clipsai.* names used below

    media_editor = clipsai.MediaEditor()

    # use this if the file contains a video stream only
    media_file = clipsai.VideoFile("/abs/path/to/video_only_file.mp4")
    # use this if the file contains both audio and video streams
    media_file = clipsai.AudioVideoFile("/abs/path/to/video.mp4")

    resized_video_file = media_editor.resize_video(
        original_video_file=media_file,
        resized_video_file_path="/abs/path/to/resized/video.mp4",  # doesn't exist yet
        width=crops.crop_width,
        height=crops.crop_height,
        segments=crops.to_dict()["segments"],
    )

Resize Function

- resize -> Crops: Dynamically resizes a video to a specified aspect ratio (default 9:16) to focus on the current speaker.

Required Parameters

- video_file_path (string): Absolute path to the video file to resize.
- pyannote_auth_token (string): Authentication token for Pyannote, obtained from Hugging Face.

Optional Parameters

- aspect_ratio (tuple[int, int] = (9, 16)): The target aspect ratio for resizing the video, as (width, height).
- min_segment_duration (float = 1.5): The minimum duration in seconds for a diarized speaker segment to be considered.
- samples_per_segment (int = 13): The number of samples to take per speaker segment for face detection. Reduce this for faster performance, at the cost of accuracy.
- face_detect_width (int = 960): The width in pixels to which the video is downscaled for face detection. Smaller widths detect faster but may be less accurate.
- face_detect_margin (int = 20): Margin around detected faces, used in the MTCNN face detector.
- face_detect_post_process (bool = False): If set to True, post-processing is applied to the face detection output to make it appear more natural.
- n_face_detect_batches (int = 8): Number of batches for processing face detection when using GPUs. This is vital for proper memory allocation.
- min_scene_duration (float = 0.25): Minimum duration in seconds for a scene to be considered during scene detection.
- scene_merge_threshold (float = 0.25): Threshold in seconds for merging scene changes with speaker segments.
- time_precision (int = 6): Precision (number of decimal places) for the start and end times of the segments. Fewer than 4 decimal places may cause rounding errors when cropping the video with ffmpeg.
- device (string: cuda | cpu = None): PyTorch device to perform computations on. The default, None, auto-detects the correct device.

Crops Class

Represents the resizing information for an entire video, including the video's original width and height, the video's resized width and height, and the segments of the video used to focus on the current speaker. Each segment is defined over an interval of time and provides the x-y coordinate of the top left corner of a rectangle, with pixel dimensions crop_width by crop_height, that focuses on the current speaker.

Properties

- crop_width (int): The width of the resized video in pixels.
- crop_height (int): The height of the resized video in pixels.
- original_width (int): The width of the original video in pixels.
- original_height (int): The height of the original video in pixels.
- segments (List[Segment]): The list of Segments providing the crop coordinates and times.
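
To make the geometry behind the Crops properties concrete, here is a minimal sketch of how a crop size for a target aspect ratio could be derived from the original frame, and how a top left corner could be kept inside the frame. The function names `crop_dimensions` and `clamp_top_left` are hypothetical, and the floor rounding is an assumption; this is not Clips AI's implementation, only an illustration of the relationship between original_width, original_height, crop_width, crop_height, and a segment's (x, y).

```python
def crop_dimensions(original_width, original_height, aspect_ratio=(9, 16)):
    """Largest crop with the given (width, height) ratio that fits in the frame.

    Hypothetical helper for illustration; rounding down is an assumption.
    """
    ratio_w, ratio_h = aspect_ratio
    # First try a crop that uses the full frame height.
    crop_height = original_height
    crop_width = crop_height * ratio_w // ratio_h
    if crop_width > original_width:
        # Too wide for the frame: use the full width instead.
        crop_width = original_width
        crop_height = crop_width * ratio_h // ratio_w
    return crop_width, crop_height


def clamp_top_left(x, y, crop_width, crop_height, original_width, original_height):
    """Shift a crop's top left corner so the whole rectangle stays in the frame."""
    x = max(0, min(x, original_width - crop_width))
    y = max(0, min(y, original_height - crop_height))
    return x, y


if __name__ == "__main__":
    width, height = crop_dimensions(1280, 720, aspect_ratio=(9, 16))
    print(width, height)
    print(clamp_top_left(1000, -5, width, height, 1280, 720))
```

In this sketch a 1280x720 landscape frame yields a tall crop that spans the full height, so only the x coordinate of each segment needs to move as the speaker changes.
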

Methods

- copy -> Crops: Returns a copy of the Crops instance.
- to_dict -> dict: Returns a dictionary representation of the Crops instance.

Segment Class

Segments are defined over an interval of time in the video. Each provides the x-y coordinate of the top left corner of a rectangle, with pixel dimensions crop_width by crop_height, that focuses on the current speaker.

Properties

- x (int): The x coordinate of the top left corner of the crop from the original video.
- y (int): The y coordinate of the top left corner of the crop from the original video.
- start_time (float): The start time of the segment in seconds.
- end_time (float): The end time of the segment in seconds.
- speakers (List[int]): A list of speaker identifiers in this segment. Each identifier uniquely represents a speaker in the entire video.

Methods

- copy -> Segment: Returns a copy of the Segment instance.
- to_dict -> dict: Returns a dictionary representation of the Segment properties.

---

Transcribe

The transcribing feature uses WhisperX (an open-source wrapper around Whisper that adds start and stop times for each word) to transcribe audio or video. Transcribing the content produces a Transcription object with comprehensive transcription information, including word-level, character-level, and sentence-level timestamps. Transcribing content is a prerequisite for clipping content.

Usage

    from clipsai import Transcriber

    transcriber = Transcriber()
    # transcribe() returns a Transcription object
    transcription = transcriber.transcribe(
        audio_file_path="/abs/path/to/video.mp4"
    )

Transcriber Class

A class for transcribing audio or video using WhisperX.

Methods

- transcribe -> Transcription: Transcribes an audio or video file.
  Required parameters:
  - audio_file_path (string): Absolute path to the audio or video file to transcribe.
  Optional parameters:
  - iso6391_lang_code (string = None): ISO 639-1 language code to transcribe the media in. The default, None, autodetects the media's language.
  - batch_size (int = 16): WhisperX batch size. Reduce if low on GPU memory.
- detect_language -> string: Detects the language of an audio or video file.
  Required parameters:
  - audio_file_path (string): Absolute path to the audio or video file to transcribe.
  Optional parameters:
  - iso6391_lang_code (string = None): ISO 639-1 language code to transcribe the media in. The default, None, autodetects the media's language.
  - batch_size (int = 16): WhisperX batch size. Reduce if low on GPU memory.

Transcription Class

The Transcription class offers a detailed breakdown of audio or video transcriptions. It provides structured access to the content at multiple levels, from individual characters and words to full sentences.

Properties

- characters (list[Character]): A list of characters from the text as Character objects, ordered by start time.
- words (list[Word]): A list of words from the text as Word objects, ordered by start time.
- sentences (list[Sentence]): A list of sentences from the text as Sentence objects, ordered by start time.
- text (string): The full textual content of the transcription.
- language (string): The ISO 639-1 language code of the transcription's language.
- created_time (datetime): The time when the transcription was created.
- start_time (float): The start time of the transcript in seconds.
- end_time (float): The end time of the transcript in seconds.
- source_software (string): The software used for transcribing.

Methods

- find_word_index -> int: Finds the index in the transcript's word info whose start or end time is closest to target_time (seconds).
  Required parameters:
  - target_time (float): The time in seconds to search for.
  - type_of_time (string: start | end): With start, returns the index of the word with the closest start time before target_time. With end, returns the index of the word with the closest end time after target_time.
- find_sentence_index -> int: Finds the index in the transcript's sentence info whose start or end time is closest to target_time (seconds).
  Required parameters:
  - target_time (float): The time in seconds to search for.
  - type_of_time (string: start | end): With start, returns the index of the sentence with the closest start time before target_time. With end, returns the index of the sentence with the closest end time after target_time.

Sentence Class

Represents a sentence in a transcription.

Properties

- start_time (float): The start time of the sentence in seconds.
- end_time (float): The end time of the sentence in seconds.
- start_char (int): The index of the sentence's start character in the full text.
- end_char (int): The index of the sentence's end character in the full text.
- text (string): The text of the sentence.

Methods

- to_dict -> dict: Returns the properties of the sentence as a dictionary.

Word Class

Represents a word in a transcription.

Properties

- start_time (float): The start time of the word in seconds.
- end_time (float): The end time of the word in seconds.
- start_char (int): The index of the word's start character in the full text.
- end_char (int): The index of the word's end character in the full text.
- text (string): The text of the word.

Methods

- to_dict -> dict: Returns the properties of the word as a dictionary.

Character Class

Represents a character in a transcription.

Properties

- start_time (float): The start time of the character in seconds.
- end_time (float): The end time of the character in seconds.
- word_index (int): The index of the word in the transcription that contains the character.
- sentence_index (int): The index of the sentence in the transcription that contains the character.
- text (string): The text of the character.

Methods

- to_dict -> dict: Returns the properties of the character as a dictionary.
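
The timestamp lookups described for find_word_index and find_sentence_index can be illustrated with a small standalone sketch. It assumes elements are plain (start_time, end_time) tuples ordered by start time and not overlapping, and it clamps out-of-range targets to the first or last element; the function name `find_index`, the tuple representation, and the clamping are all assumptions made for illustration, not the Transcription class's actual code.

```python
from bisect import bisect_left, bisect_right


def find_index(elements, target_time, type_of_time):
    """Return the index of the element nearest target_time.

    elements: non-empty list of (start_time, end_time) tuples in seconds,
    ordered by start time and assumed not to overlap (hypothetical format).
    """
    if not elements:
        raise ValueError("elements must not be empty")
    if type_of_time == "start":
        starts = [start for start, _ in elements]
        # Rightmost element whose start time is at or before target_time.
        index = bisect_right(starts, target_time) - 1
        return max(index, 0)
    if type_of_time == "end":
        ends = [end for _, end in elements]
        # Leftmost element whose end time is at or after target_time.
        index = bisect_left(ends, target_time)
        return min(index, len(elements) - 1)
    raise ValueError("type_of_time must be 'start' or 'end'")


if __name__ == "__main__":
    words = [(0.0, 0.4), (0.5, 0.9), (1.0, 1.6)]
    print(find_index(words, 0.95, "start"))
    print(find_index(words, 0.95, "end"))
```

A lookup like this is what lets a clip boundary expressed in seconds be snapped to a whole word or sentence, so a trimmed clip does not begin or end mid-word.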