Subtitles
Voctoweb currently has two ways in which published talks can get subtitles:
- the classic workflows via the Django app at c3subtitles.de (subtitleStatus)
- via Whisper/Transcribee through the Publishing API
Put simply, there are always two tasks involved:
- publishing the SRT/VTT file
- creating the metadata (aka "Subtitle Recording") in voctoweb, so that the subtitles are also shown in the player
Publishing Gateway/API
To avoid having to deal with the quirks of the c3voc infrastructure, a REST API was set up in 2023/2024 that allows publishing additional data such as slides, subtitles etc. purely via REST, without having to use SFTP or similar.
The required API keys are either static or Bearer tokens from https://sso.c3voc.de/; more details are in the Git repo.
Transcribee
For 38C3/39C3, https://github.com/bugbakery/transcribee-voctoweb-glue-2/wiki/How-to-create-subtitles was added:
At the moment, the process looks like this:
1. The glue code creates an entry for every talk uploaded to the congress event on media.ccc.de (state: `new`)
2. A [transcribee](https://github.com/bugbakery/transcribee) document is automatically created for every talk (state: `preparing`)
3. transcribee creates a draft transcription
4. You go to https://subtitles.bugbakery.org and claim a talk with the state `needs correction` or `partially corrected`
5. You correct the draft in transcribee using the `Open Editor` button
6. When you want to stop correcting, you click `Finish work` and
- enter the time up to which the transcript is corrected (MM:SS or HH:MM:SS) and click `Ok`
- or choose `Until the end` when the correction of the whole talk is finished (state: `done`)
7. When the whole talk is corrected, you can hit `Publish`, which uploads the transcript to the talk on https://media.ccc.de. Please note that it can take a few moments for the new version to become visible to you due to caching.
If you want to fix a mistake in already published subtitles, you start with step 4 and claim a talk with the state `done`.
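The "corrected up to" time in step 6 is entered as MM:SS or HH:MM:SS. As a minimal sketch of how such a value could be turned into seconds, the function below handles both forms. The function name `parse_corrected_until` and the validation rules (seconds below 60, minutes below 60 only in the HH:MM:SS form) are assumptions for illustration and are not taken from the transcribee or glue code.

```python
def parse_corrected_until(text: str) -> int:
    """Convert 'MM:SS' or 'HH:MM:SS' into a number of seconds.

    Hypothetical helper; the validation rules are assumptions.
    """
    parts = text.strip().split(":")
    if len(parts) not in (2, 3) or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected MM:SS or HH:MM:SS, got {text!r}")

    numbers = [int(p) for p in parts]
    has_hours = len(numbers) == 3
    if not has_hours:
        # Pad MM:SS to HH:MM:SS so both forms share one code path.
        numbers.insert(0, 0)
    hours, minutes, seconds = numbers

    if seconds >= 60:
        raise ValueError(f"seconds out of range in {text!r}")
    if has_hours and minutes >= 60:
        raise ValueError(f"minutes out of range in {text!r}")

    return hours * 3600 + minutes * 60 + seconds
```

For example, `parse_corrected_until("12:30")` yields 750 and `parse_corrected_until("01:02:03")` yields 3723.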
Next steps:
- run the software currently running at https://subtitles.bugbakery.org/ at https://subtitles.c3voc.de/ and
  - connect it to the SSO
  - by default, have all new talks on media.ccc.de arrive there as well
— Andi 2026/06/20
Classic workflow via c3subtitles.de/Amara/etc.
Components
- sync_media_recordings.py uses the CSV export from subtitleStatus to publish subtitles in voctoweb via REST API:
  - was running every 10 minutes via systemd timer releasing.c3voc.de:/etc/systemd/system/publish-subtitles.timer
- subtitleStatus (Django app @ c3subtitles.de)
  - dashboard with an overview of the transcription status per conference and talk
  - workflow manager
  - …
  - pushes finished subtitle files (SRT) to mirror.selfnet.de via rsync
CSV export from C3Subtitles:
https://c3subtitles.de/media_export/2020-12-30T0:00:00.99Z
Example:
To download the raw (draft) subtitles from Amara, use https://amara.org/api/videos/{amara_key}/languages/{amara_lanuage}/subtitles/?format=vtt (compare https://apidocs.amara.org/#fetch-raw-subtitles)
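A minimal sketch of filling in that URL template in Python. The function name `amara_raw_subtitles_url` is hypothetical; only the URL pattern itself comes from the text above. Both values are percent-encoded so that unexpected characters cannot break the path.

```python
from urllib.parse import quote

# URL pattern as given above; the placeholder names here are illustrative.
AMARA_SUBTITLES_URL = (
    "https://amara.org/api/videos/{key}/languages/{language}/subtitles/?format=vtt"
)


def amara_raw_subtitles_url(amara_key: str, language: str) -> str:
    """Build the download URL for raw (draft) subtitles of one video language."""
    return AMARA_SUBTITLES_URL.format(
        key=quote(amara_key, safe=""),
        language=quote(language, safe=""),
    )
```

The resulting string can then be fetched with any HTTP client; whether the request needs authentication is not covered here.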
States
For voctoweb (media.ccc.de) only states 7, 8 and 12 are relevant. Subtitle files in all other states should be ignored.
| ID | voctoweb | c3subtitles | additional information |
|---|---|---|---|
| 1 | | Nothing available yet | irrelevant, should not exist |
| 2 | todo | Transcribed until | should exist |
| 3 | | Transcript finished | might exist - still no timestamps |
| 4 | | Please do not touch, work in progress | Autotiming in process, no timestamps |
| 5 | | Synced until | rare case of syncing by hand |
| 6 | | Syncing finished | with timestamps, usable as draft |
| 7 | draft | Quality control done until | with timestamps, usable as draft |
| 8 | complete | Job completed | finished, obviously with timestamps and usable |
| 9 | | Unknown | should not exist |
| 11 | todo | Translated until | translation, not usable as draft |
| 12 | translated | Translation is finished | finished, obviously with timestamps and usable |
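The rule that only states 7, 8 and 12 matter for voctoweb could be applied to the CSV export roughly as follows. This is a sketch under assumptions: the column name `state`, the comma delimiter, and the function name `publishable_rows` are hypothetical and not taken from sync_media_recordings.py or the actual export format.

```python
import csv
import io

# State IDs relevant for voctoweb, mapped to the label in the voctoweb column.
VOCTOWEB_STATES = {7: "draft", 8: "complete", 12: "translated"}


def publishable_rows(csv_text: str, state_column: str = "state"):
    """Yield only the CSV rows whose state is relevant for voctoweb.

    `state_column` is an assumed column name; adjust it to the real export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        try:
            state_id = int(row[state_column])
        except (TypeError, ValueError):
            # Empty or non-numeric state: treat like any other ignored state.
            continue
        label = VOCTOWEB_STATES.get(state_id)
        if label is not None:
            # Attach the voctoweb label without mutating the original row.
            yield {**row, "voctoweb_state": label}
```

A missing state column raises `KeyError` on purpose, so that a wrong column name is noticed instead of silently producing an empty result.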
