SaaS

AI 유튜브 요약 영상 생성기

사용자가 제공한 여러 유튜브 영상의 핵심 정보들을 바탕으로 새로운 정보성 유튜브 영상을 자동으로 제작해주는 서비스

Pain Level

High

Painkiller

정보 수집, 대본 작성, 촬영, 편집, 자막 작업 등 영상 제작 과정은 매우 시간 소모적이고 노동 집약적입니다. 이 과정을 자동화하는 것은 크리에이터의 핵심적인 고통을 해결해줍니다.

Target Customer

Persona

정보성 콘텐츠 유튜버, 지식 크리에이터, 마케터

Context

영상 제작에 드는 시간과 노력을 줄이고 싶을 때, 다양한 소스에서 얻은 정보를 하나의 영상으로 빠르게 재구성하고 싶을 때

Motivation

콘텐츠 제작의 효율성을 극대화하여 더 많은 영상을, 더 빠르게 업로드하고 싶어함

💡 Current Solutions & Differentiation

주로 Vrew, CapCut, Adobe Premiere Pro 같은 영상 편집 툴을 사용하여 수동으로 편집 및 제작합니다. 정보 수집 및 대본 작성 과정도 별도로 직접 진행하며, 일부는 뤼튼(Wrtn)이나 다글로(Daglo) 같은 AI 요약 도구를 보조적으로 사용합니다.

Key Differentiation

단순 텍스트-영상 변환을 넘어, 여러 '유튜브 영상'을 소스로 하여 그 맥락과 내용을 AI가 이해하고, 이를 유기적으로 재구성한 새로운 '정보성 영상' 대본과 결과물을 만든다는 점이 핵심 차별점입니다.

⚔️ Competitors

HeyGen

https://www.heygen.com/

Strengths

텍스트 입력만으로 AI 아바타가 말하는 고품질 영상을 빠르게 생성
다양한 언어와 목소리 지원
복잡한 편집 기술 없이도 사용 가능

Weaknesses

한국어 AI 목소리나 아바타의 발음이 부자연스러울 수 있음
사용자가 직접 대본을 완벽하게 준비해야 함
아이디어의 핵심인 '다른 영상'에서 정보를 추출하고 재구성하는 기능은 없음

Vrew (브루)

https://vrew.voyagerx.com/

Strengths

음성인식을 통한 자동 자막 생성 기능이 매우 강력하며 한국 시장에서 인지도가 높음
다양한 AI 목소리 지원 및 직관적인 컷 편집 기능
저작권 문제 없는 방대한 스톡 이미지/비디오 소스 제공

Weaknesses

자동 영상 '생성'보다는 AI를 활용한 '편집 보조'에 가까움
사용자가 직접 영상 소스를 준비하고 편집의 큰 틀을 잡아야 함
여러 영상의 정보를 종합하여 새로운 콘텐츠를 만드는 기능은 없음

Daglo (다글로)

https://daglo.ai/

Strengths

유튜브 영상 링크만으로 텍스트 스크립트 추출 및 요약 기능 제공
회의록, 강의 등 음성 파일을 텍스트로 변환하는 데 특화됨
정보 수집 및 리서치 단계의 시간을 단축시켜 줌

Weaknesses

텍스트 요약 및 분석에만 초점이 맞춰져 있음
영상 자체를 생성하거나 편집하는 기능은 전혀 없음
아이디어의 최종 결과물인 '영상'을 만드는 경쟁자가 아닌, 과정의 일부를 해결하는 도구

Builder Tools

Take this idea to the next level with AI-generated documentation.

📚 Project Documentation

Turn your idea into actionable specs and code plans.

0 Free Credits Available

Goal

To automate the time-consuming and labor-intensive process of creating informational videos. We solve the core pain point for creators by automatically summarizing, scripting, and producing new video content from multiple existing YouTube videos.

Target Audience

Informational content YouTubers, knowledge creators, and marketers who aim to maximize their content production efficiency and upload more content, faster.

Value Prop

Unlike simple text-to-video converters, our service uses multiple YouTube videos as a source. The AI understands the context and content of these sources, then organically reconstructs the key information into a new, unique, and coherent informational video script and final product.

Functional Requirements (Features)

P0 (MVP)

Multi-URL YouTube Content Ingestion

A user interface allowing the input of two or more valid YouTube video URLs. The system must validate the links, and on the backend, extract the full transcript or audio for processing.

P0 (MVP)

AI Content Synthesis & Script Generation

Utilize a Large Language Model (e.g., via OpenAI API) to process the combined transcripts from the source videos. The model must identify the most critical points, remove redundancies, and structure the information into a new, coherent narrative script for a new video. The output must be editable text.

P0 (MVP)

AI Text-to-Speech Voiceover

Integrate with a voice generation API (e.g., ElevenLabs) to convert the final script into a high-quality, natural-sounding audio voiceover. For the MVP, provide a selection of 3-5 pre-selected English and Korean voices.

P0 (MVP)

Automated Visual Assembly

The system must programmatically search and select relevant stock videos and images from an API (e.g., Pexels, Pixabay) based on keywords extracted from the script's sentences or scenes. These visuals will be automatically timed and stitched together with the voiceover track to create a complete video.

P0 (MVP)

Video Preview and Export

A player to preview the final generated video before downloading. An 'Export' function that renders the project into a single 1080p MP4 file and makes it available for the user to download.

P1 (Nice to have)

Simple Visual Editor

After the initial video is generated, allow users to replace individual stock footage clips. The UI should show the script sentence-by-sentence and the corresponding visual, with a button to search for and select an alternative clip from the stock footage library.

User Flow

Step 1: User signs up and logs into the dashboard.

Step 2: User creates a new project and inputs two or more YouTube video URLs to be used as sources.

Step 3: The system fetches and analyzes the transcripts from the source URLs.

Step 4: The AI synthesizes the key information from all sources and generates a single, consolidated video script.

Step 5: The user reviews the generated script and can make minor text edits.

Step 6: The system generates an AI voiceover from the script and automatically assembles a video by matching the voiceover with relevant stock footage and images.

Step 7: The user previews the final generated video.

Step 8: The user exports the final video in a standard format (e.g., MP4) for upload to YouTube.

Data Entities

User

Stores user authentication info (email, hashed password, OAuth tokens), subscription plan details, and workspace information.

Project

Represents a single video creation job. Attributes include project_id, user_id, title, status (e.g., 'processing', 'completed', 'failed'), and timestamps.

SourceContent

Stores data related to the input URLs for a project. Attributes include source_id, project_id, original_youtube_url, and the fetched_transcript.

GeneratedAsset

Stores the artifacts created by the AI for a project. Attributes include asset_id, project_id, asset_type ('script', 'audio', 'video'), generated_text_script, and storage_path for media files (e.g., S3 URL for audio and final video).

Non-Functional Req

Performance: Video rendering must be handled by asynchronous background workers to avoid blocking the user interface. Users should be notified (e.g., by email) when their video is ready. Target generation time for a 5-minute video should be under 20 minutes.
Security: All user data must be encrypted at rest and in transit. Implement standard security practices for authentication and session management.
Copyright & Ethics: The UI must display a clear disclaimer regarding copyright and Fair Use policies. The system should be positioned as a tool to assist in creating transformative new works, and the responsibility for final publication lies with the user.
Scalability: The architecture should be designed to handle a growing number of concurrent video rendering jobs, likely using a queue-based system and scalable cloud compute instances.
SEO: The main landing page and any public-facing content should be server-side rendered (per the Next.js tech stack) to ensure they are crawlable and optimized for search engines.