
Solving Long-Form Content Fatigue With Next.js and Cloudinary

We’ve all been there: You need one, specific clip from a 60-minute webinar, but scrubbing through the timeline feels like searching for a needle in a haystack. For developers and content teams, video is often a “black box” rich with information but opaque to search engines and users unless you manually tag every minute.

This hassle leads to content fatigue. Users bounce because they can’t find what they need, and teams burn out trying to manually transcribe and chapter every recording.

In this guide, you’ll combine Next.js 16 with Cloudinary’s AI video intelligence to build an AI video knowledge hub that automatically:

  1. Transcribes speech to text.
  2. Chapters video based on topic shifts.
  3. Indexes content for deep-search capability.

The result? A Netflix-style portal where users can search for a keyword (e.g., “Next.js Server Actions”) and jump instantly to the exact second it was spoken.

Before writing any code, you’ll need to lay the foundation. You’ll build on Next.js 16 (App Router) to handle server-side rendering and Cloudinary to offload the heavy AI processing.

Start by scaffolding a modern Next.js 16 application with TypeScript and Tailwind CSS for rapid styling.

npx create-next-app@latest ai-video-hub --typescript --tailwind --eslint
cd ai-video-hub

Once inside, you’ll install the core dependencies for your media player and icons:

npm install next-cloudinary lucide-react

Next, configure the upload preset in your Cloudinary Console. This is the most critical part of the entire build: you aren’t just uploading a video; you’re triggering an AI pipeline.

  1. Log in to your Cloudinary Console.
  2. Navigate to Settings > Upload > Upload Presets.
  3. Click Add Upload Preset and name it ai_video_hub_preset.
  4. Enable AI transcription:
    • Go to the Add-ons tab.
    • Find Google AI Video Transcription and check the box.
    • This tells Cloudinary to automatically generate a .transcript (JSON) and .vtt (subtitle) file the moment a video finishes uploading.
  5. Set up auto-tagging:
    • Still in the preset settings, enable Auto-Tagging.
    • This ensures every uploaded video is automatically categorized (e.g., “technology”, “webinar”), making it easier to filter and search content later without manual data entry.
  6. Set the signing mode:
    • Set the Signing Mode to Unsigned.
    • This allows your frontend to upload directly to Cloudinary without exposing your API secret.
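To make the “unsigned” part concrete, here’s a sketch of what an unsigned upload does under the hood. It targets Cloudinary’s standard upload REST endpoint; the helper name buildUnsignedUpload is hypothetical, and in the app itself the next-cloudinary upload widget handles this for you.

```javascript
// Hypothetical helper showing the shape of an unsigned upload request.
// Note there is no API key or secret anywhere in this payload: the
// unsigned preset is the only credential the browser needs.
function buildUnsignedUpload(cloudName, uploadPreset, file) {
  const url = `https://api.cloudinary.com/v1_1/${cloudName}/video/upload`;
  const body = new FormData();
  body.append("file", file);                  // a File/Blob or a remote URL
  body.append("upload_preset", uploadPreset); // references your preset's add-ons
  return { url, body };
}

// Usage: const { url, body } = buildUnsignedUpload(...);
//        await fetch(url, { method: "POST", body });
```

Because the preset carries the transcription and auto-tagging configuration, the browser never needs to know those details either.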

To securely connect your Next.js application to Cloudinary, you’ll need to store your API credentials in an environment variable file.

In the root directory of your project (same level as package.json), create a new file named .env.local.

Paste the following keys into that file. You can find these values in your Cloudinary Console Dashboard under “Product Environment Credentials”.


# Exposed to the browser (Client-Side)
NEXT_PUBLIC_CLOUDINARY_CLOUD_NAME="your_cloud_name"
NEXT_PUBLIC_CLOUDINARY_UPLOAD_PRESET="ai_video_hub_preset"

# Server-Side Only (Admin API & Signing)
CLOUDINARY_API_KEY="your_api_key"
CLOUDINARY_API_SECRET="your_api_secret"

Now, every time a user uploads a video through our app, Cloudinary will silently process it in the background, generating the metadata we need for our “Knowledge Hub.”

Most tutorials use a simple <video> tag, but in a Next.js playlist, rapidly switching videos creates a “Zombie Player” race condition. React might unmount the old player after the new one initializes, causing the library to attach to a dead DOM node and resulting in a black screen.

The fix: unique session IDs. Force React to completely destroy and rebuild the player by using a playbackSession timestamp as a unique key.

app/page.tsx (The Controller)

const handleVideoSelect = (newId: string) => {
  setPlaybackSession(Date.now()); // Force fresh mount
  setPublicId(newId);
};

// ... inside render ...
<VideoStage
  key={`${publicId}-${playbackSession}`} // The magic key forces a hard reset
  publicId={publicId}
  // ...
/>;

components/hub/video-stage.tsx (The Engine)

export function VideoStage({
  publicId,
  onTimeUpdate,
  playerRef,
}: VideoStageProps) {
  return (
    <div className="relative aspect-video bg-slate-900 rounded-2xl overflow-hidden">
      <CldVideoPlayer
        id={`player-${publicId}`}
        width="1920"
        height="1080"
        src={publicId}
        autoplay={true}
        onDataLoad={({ player }) => {
          playerRef.current = player;
          player.on("timeupdate", () => {
            onTimeUpdate(player.currentTime());
          });
        }}
      />
    </div>
  );
}

View the full file on GitHub.

Now that you have a stable video player, you’ll need to feed it the AI-generated data. Cloudinary automatically creates a .transcript (JSON) and a .vtt (Subtitle) file for us, but retrieving them isn’t as simple as just fetching a URL.

Video processing and AI transcription happen at different speeds.

  • Video ready: Version v100 (Instant)
  • Transcript ready: Version v105 (5 seconds later)

If our app requests .../v100/my-video.transcript immediately, Cloudinary returns a 404 Not Found because the file was actually saved under v105.

To handle this, you’ll build a robust Server Action that refuses to give up:

  1. Fuzzy versioning: If the exact version match fails, check versions v+1 up to v+15. This “fuzziness” bridges the async gap perfectly.
  2. VTT fallback: If the rich JSON transcript is missing entirely, you’ll automatically fetch and parse the standard .vtt subtitle file as a backup.

Here’s the logic that powers your intelligence layer: app/actions/media-process.ts

export async function getTranscriptAction(
  publicId: string,
  videoVersion?: string
) {
  const baseUrl = `https://res.cloudinary.com/${process.env.NEXT_PUBLIC_CLOUDINARY_CLOUD_NAME}/raw/upload`;

  // 1. GENERATE CANDIDATES (Fuzzy Versioning)
  // We check the current version AND the next 15 versions to catch async delays.
  const versionsToCheck = [];
  if (videoVersion) {
    const v = parseInt(videoVersion, 10);
    for (let i = 0; i <= 15; i++) versionsToCheck.push(v + i);
  }

  // 2. CHECK CANDIDATES
  for (const v of versionsToCheck) {
    // Priority A: Rich JSON Transcript
    const jsonUrl = `${baseUrl}/v${v}/${publicId}.transcript`;
    const jsonRes = await fetch(jsonUrl);
    if (jsonRes.ok) return parseTranscriptJSON(await jsonRes.json());

    // Priority B: VTT Fallback (Standard Subtitles)
    const vttUrl = `${baseUrl}/v${v}/${publicId}.en-US.vtt`;
    const vttRes = await fetch(vttUrl);
    if (vttRes.ok) return parseVTT(await vttRes.text());
  }

  return []; 
}

This single function ensures our UI never breaks, even when the AI is still “thinking” or when one file format fails to generate.
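The snippet above calls two helpers, parseTranscriptJSON and parseVTT, that aren’t shown. As an illustration, here’s a minimal parseVTT sketch that converts standard WebVTT cues into the { startTime, endTime, text } segments the sidebar expects; this is a hypothetical implementation, and the version in the repo may differ.

```javascript
// Hypothetical parseVTT sketch: turns WebVTT cue blocks into segments
// with times in seconds. Ignores the WEBVTT header and any block
// without a "-->" timing line.
function parseVTT(vtt) {
  // "HH:MM:SS.mmm" or "MM:SS.mmm" -> seconds
  const toSeconds = (ts) =>
    ts.split(":").map(parseFloat).reduce((acc, p) => acc * 60 + p, 0);

  const segments = [];
  for (const block of vtt.split(/\n\s*\n/)) {
    const lines = block.trim().split("\n");
    const timing = lines.find((l) => l.includes("-->"));
    if (!timing) continue; // header, NOTE, or empty block
    const [start, end] = timing
      .split("-->")
      .map((s) => toSeconds(s.trim().split(" ")[0])); // drop cue settings
    const text = lines.slice(lines.indexOf(timing) + 1).join(" ").trim();
    if (text) segments.push({ startTime: start, endTime: end, text });
  }
  return segments;
}
```

Feeding it a two-cue VTT file yields two segments with numeric start and end times, ready for the same rendering path as the rich JSON transcript.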

View the full file on GitHub.

With the transcript data in hand, you’ll need to display it. A static list isn’t enough; we want a dynamic experience that highlights the current sentence, scrolls automatically, and filters in real-time.

You don’t need to hit the server every time the user types. By using useMemo, we filter the entire transcript array in real-time. This keeps the UI snappy even with hour-long webinars containing thousands of lines.

// 1. Filter Logic: Updates instantly as 'query' changes
const filtered = useMemo(() => {
  if (!query) return transcript;
  return transcript.filter((t) =>
    t.text.toLowerCase().includes(query.toLowerCase())
  );
}, [transcript, query]);

To keep the user oriented, we need the sidebar to follow the video.

  • Deep Linking: Clicking a segment calls onSeek(item.startTime), jumping the video player to that exact second.
  • Auto-Scroll: We use useEffect to watch the video’s currentTime. When a new segment becomes “active,” we automatically scroll it into the center of the view.

// 2. Auto-Scroll Logic: Keeps the active line in view
const activeSegmentRef = useRef<HTMLDivElement>(null);

useEffect(() => {
  // Only auto-scroll if the user isn't actively searching
  if (!query && activeSegmentRef.current) {
    activeSegmentRef.current.scrollIntoView({
      behavior: "smooth",
      block: "center",
    });
  }
}, [currentTime, query]);

View the full file on GitHub.

The player and the sidebar need to talk to each other. Plus, you’ll need a way to let users browse other videos without reloading the page.

Instead of keeping state inside the player or sidebar, lift it to your main page controller (app/page.tsx). This allows you to orchestrate the entire experience:

  • Player tells the Controller: “I’m at 00:45.”
  • Controller tells the Sidebar: “Highlight the segment at 00:45.”
  • Sidebar tells the Controller: “User clicked 02:30.”
  • Controller tells the Player: “Seek to 02:30.”
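Stripped of React specifics, the message flow above can be sketched as a plain controller object. The names createController and highlightAt are hypothetical; in the actual app this state lives in app/page.tsx via useState and a player ref.

```javascript
// Framework-agnostic sketch of the lifted-state controller. The player
// and sidebar never talk to each other directly; every message routes
// through the controller.
function createController(player, sidebar) {
  return {
    // Player -> Controller: "I'm at 00:45."
    onTimeUpdate(seconds) {
      sidebar.highlightAt(seconds); // Controller -> Sidebar
    },
    // Sidebar -> Controller: "User clicked 02:30."
    onSeek(seconds) {
      player.currentTime(seconds); // Controller -> Player
    },
  };
}
```

Keeping both arrows one-directional through the controller is what makes the deep-linking and auto-scroll features composable: neither child component needs to know the other exists.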

You shouldn’t have to manually update a JSON file every time you upload a video. Instead, you can use Cloudinary’s Admin API to fetch the latest uploads automatically.

app/actions/media-process.ts

// Requires the official "cloudinary" Node SDK (npm install cloudinary)
import { v2 as cloudinary } from "cloudinary";

cloudinary.config({
  cloud_name: process.env.NEXT_PUBLIC_CLOUDINARY_CLOUD_NAME,
  api_key: process.env.CLOUDINARY_API_KEY,
  api_secret: process.env.CLOUDINARY_API_SECRET,
});

export async function getPlaylistAction() {
  // Fetch the latest videos carrying the 'ai-knowledge-hub' tag
  const results = await cloudinary.search
    .expression("resource_type:video AND tags=ai-knowledge-hub")
    .sort_by("created_at", "desc")
    .max_results(10)
    .execute();

  return results.resources;
}

Now, the moment you upload a video and it gets auto-tagged, it instantly appears in the playlist for everyone to see.

View the full file on GitHub.

What happens if the user loads the page immediately after uploading? The video might play, but the AI transcript won’t exist yet.

If you just fetch the transcript once and fail, the user will see an empty sidebar. You need to poll the endpoint until the data arrives.

Spamming the server every 100ms is bad practice. With exponential backoff, you wait 1 second, then 2, then 4, up to a maximum limit.

components/hub/insights-sidebar.tsx

// Inside our data fetching effect
const pollForTranscript = async (attempt = 0) => {
  try {
    const data = await getTranscriptAction(publicId, version);
    if (data && data.length > 0) {
      setTranscript(data);
      return; // Success! Stop polling.
    }

    // If fail, wait longer each time (1s, 2s, 4s...)
    const delay = Math.min(1000 * Math.pow(2, attempt), 10000);

    if (attempt < 5) {
      // Give up after 5 tries
      setTimeout(() => pollForTranscript(attempt + 1), delay);
    }
  } catch (e) {
    console.error("Polling failed", e);
  }
};

This ensures that if the AI takes 10 seconds to finish processing, your UI patiently waits and then seamlessly pops in the data without the user ever needing to refresh.
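One detail worth handling in practice: if the component unmounts mid-poll, the pending setTimeout should be cleared so a dead component doesn’t keep fetching. Here’s a cancellable sketch of the same backoff logic; createPoller is a hypothetical helper, not part of the original file.

```javascript
// Cancellable exponential-backoff poller (hypothetical helper). The
// returned function flips a cancelled flag and clears the pending
// timeout, so an unmounting component can stop the retry loop.
function createPoller(fetchFn, onData, { maxAttempts = 5, baseDelay = 1000, maxDelay = 10000 } = {}) {
  let timeoutId = null;
  let cancelled = false;

  const attempt = async (n = 0) => {
    if (cancelled) return;
    const data = await fetchFn();
    if (data && data.length > 0) {
      onData(data); // success: stop polling
      return;
    }
    if (n + 1 >= maxAttempts) return; // give up
    const delay = Math.min(baseDelay * 2 ** n, maxDelay); // 1s, 2s, 4s...
    timeoutId = setTimeout(() => attempt(n + 1), delay);
  };

  attempt();
  return () => {
    cancelled = true;
    clearTimeout(timeoutId);
  };
}
```

In React, you would call the returned cancel function from the data-fetching effect’s cleanup, so navigating away from a video stops its poller immediately.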

View the full file on GitHub.

By integrating Cloudinary’s AI at the upload stage, you eliminated accessibility debt that usually piles up with video content.

  1. Upload. A content manager drops a raw .mp4 into the upload widget.
  2. Analyze. Cloudinary generates captions (.vtt) and a transcript (.json) automatically.
  3. Deliver. The frontend detects these new assets and attaches them to the player.

There’s no fourth step of manually typing out captions; the system heals itself. If you upload a video today, it’s accessible by default tomorrow.

Users get a professional interface where they can easily search within the video like it’s a document, read the captions synced with audio (great for non-native speakers), and navigate via chapters instead of scrubbing blindly.

Instead of opaque pixels, your video content is searchable, accessible, and interactive.

To recap your improvements:

  • You used f_auto and q_auto to ensure you’ll never serve a 100MB file when a 10MB AV1 version will suffice.
  • No more managing a complex backend! Next.js handles the UI, Cloudinary handles the AI.
  • You turned 60 minutes of tedious, manual work into five minutes of actionable insights.
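For reference, the f_auto/q_auto optimization in the recap maps to a standard Cloudinary delivery-URL transformation. The helper name optimizedVideoUrl is hypothetical; next-cloudinary’s components can apply equivalent transformations for you.

```javascript
// Hypothetical helper showing the optimized delivery URL shape.
// f_auto lets Cloudinary pick the best format/codec per browser
// (e.g., AV1); q_auto picks a quality level balancing size and fidelity.
function optimizedVideoUrl(cloudName, publicId) {
  return `https://res.cloudinary.com/${cloudName}/video/upload/f_auto,q_auto/${publicId}`;
}
```

The same source upload is then served as whatever format the requesting browser handles best, with no extra encoding work on your side.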

Make sure to sign up for a free Cloudinary account and try this project out for yourself today.

Start Using Cloudinary

Sign up for our free plan and start creating stunning visual experiences in minutes.

Sign Up for Free