Subtitles / Untertitel
Voctoweb currently has two ways for uploaded talks to get subtitles:
- the classic workflow via the Django app on c3subtitles.de (subtitleStatus)
- via Whisper/Transcribee through the Publishing API
Put simply, both always come down to two tasks:
- publishing the SRT/VTT file
- creating the metadata (aka “Subtitle Recording”) in voctoweb, so that the subtitles are also shown in the player
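As a rough illustration of the second task, the metadata for a published subtitle file could be assembled as sketched below. This is only a sketch under assumptions: the function name `build_subtitle_recording` and the dictionary keys (`filename`, `folder`, `language`, `mime_type`) are hypothetical and not confirmed by this page, so check the voctoweb API documentation for the fields it actually expects. The two MIME types are the ones commonly used for WebVTT and SubRip files.

```python
from pathlib import PurePosixPath

# MIME types commonly used for the two subtitle formats mentioned above.
SUBTITLE_MIME_TYPES = {
    ".vtt": "text/vtt",
    ".srt": "application/x-subrip",
}


def build_subtitle_recording(filename: str, language: str, folder: str = "") -> dict:
    """Describe an already published subtitle file as a metadata record.

    The keys are hypothetical placeholders, not the confirmed voctoweb schema.
    """
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix not in SUBTITLE_MIME_TYPES:
        raise ValueError(f"not a subtitle file: {filename}")
    return {
        "filename": filename,
        "folder": folder,
        "language": language,
        "mime_type": SUBTITLE_MIME_TYPES[suffix],
    }
```

The record is kept separate from the file upload on purpose: the file can be published first, and the metadata is created only once the file is in place.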
via Publishing API
TBD
Architecture overview in the context of media.ccc.de
- Sync script from subtitleStatus → voctoweb: https://github.com/voc/scripts/blob/master/subtitles/sync_media_recordings.py
- systemd timer on releasing.c3voc.de
-
- Dashboard with an overview of the transcription status per conference and talk
- Workflow manager
- …
- pushes finished subtitle files (SRT) via rsync to mirror.selfnet.de
CSV export from C3Subtitles:
https://c3subtitles.de/media_export/2020-12-30T0:00:00.99Z
Example:
To download the raw (draft) subtitles from Amara, use https://amara.org/api/videos/{amara_key}/languages/{amara_language}/subtitles/?format=vtt
(compare https://apidocs.amara.org/#fetch-raw-subtitles)
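The URL pattern above can be filled in programmatically. The following is a minimal sketch; the function name `amara_subtitles_url` and the sample key `abc123` are made up for illustration, and the sketch only builds the URL without sending a request.

```python
from urllib.parse import quote

# URL pattern taken from the text above; only the placeholders are filled in.
AMARA_SUBTITLES_URL = (
    "https://amara.org/api/videos/{amara_key}"
    "/languages/{amara_language}/subtitles/?format={fmt}"
)


def amara_subtitles_url(amara_key: str, amara_language: str, fmt: str = "vtt") -> str:
    """Build the download URL for the raw (draft) subtitles of one video."""
    return AMARA_SUBTITLES_URL.format(
        # Percent-encode each part so odd characters cannot break the path.
        amara_key=quote(amara_key, safe=""),
        amara_language=quote(amara_language, safe=""),
        fmt=quote(fmt, safe=""),
    )


# "abc123" is a placeholder, not a real Amara video key.
print(amara_subtitles_url("abc123", "de"))
```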
States
For voctoweb (media.ccc.de) only states 7, 8 and 12 are relevant. Subtitle files in all other states should be ignored.
ID | voctoweb | c3subtitles | additional information |
---|---|---|---|
1 | | Nothing available yet | irrelevant, should not exist |
2 | todo | Transcribed until | should exist |
3 | | Transcript finished | might exist, still no timestamps |
4 | | Please do not touch, work in progress | Autotiming in process, no timestamps |
5 | | Synced until | rare case of syncing by hand |
6 | | Syncing finished | with timestamps, usable as draft |
7 | draft | Quality control done until | with timestamps, usable as draft |
8 | complete | Job completed | finished, obviously with timestamps and usable |
9 | | Unknown | should not exist |
11 | todo | Translated until | translation, not usable as draft |
12 | translated | Translation is finished | finished, obviously with timestamps and usable |
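The rule that only states 7, 8 and 12 matter for voctoweb can be expressed as a small lookup. This is a sketch, not the logic of the actual sync script: the names `VOCTOWEB_STATE`, `voctoweb_state` and `relevant_subtitles` are hypothetical, as is the assumption that each subtitle is a dict with a `state` key holding the ID from the table above.

```python
from typing import Optional

# voctoweb state for each relevant c3subtitles state ID (from the table above).
VOCTOWEB_STATE = {
    7: "draft",
    8: "complete",
    12: "translated",
}


def voctoweb_state(state_id: int) -> Optional[str]:
    """Return the voctoweb state for a state ID, or None if it should be ignored."""
    return VOCTOWEB_STATE.get(state_id)


def relevant_subtitles(subtitles: list) -> list:
    """Keep only the subtitles whose state is relevant for voctoweb."""
    return [sub for sub in subtitles if sub["state"] in VOCTOWEB_STATE]
```

For example, `relevant_subtitles([{"id": 1, "state": 2}, {"id": 2, "state": 8}])` keeps only the second entry, because state 2 is still in transcription.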
Communication
- Twitter: twitter.com/c3subtitles
- Mailing list: subtitles-angels -at- lists.selfnet.de
- Mailing list: subtitles -at- lists.ccc.de
- IRC: #subtitles on hackint (requires SSL), but also the #voc channel
- Etherpad-Domain: https://subtitles.pads.ccc.de
- Jabber: c3subtitles -!-at-!- jabber.ccc.de
- Videos on amara.org: c3subtitles videos on amara.org
- E-Mail: subtitles -!-at-!- c3voc.de
What is our goal?
Better and more barrier-free access to the live talks and streams, and to the videos afterwards, via subtitles. This is especially important for non-native speakers of the spoken languages and for deaf and hard-of-hearing listeners.
Nice side effect: finished subtitles are pretty easy to translate into any other language, and amara.org provides a very easy-to-use interface for that purpose.
How can I help?
- If you visit the congress and use speech recognition software, please contact us! The same goes if you are a computer stenographer or a good touch typist.
- If you are interested in what we are working on behind the scenes, just contact us!
- Help us create the subtitles via amara.org. You do not even have to visit the congress to do that; anybody can do it from home!
What are our current projects behind the scenes?
- Developing software for a user interface that lets you choose which subtitle you want to work on, depending on your favorite task
- Developing software for subtitles via computer stenography or speech recognition, visible live during the talk via webstream and later used as a starting point for the precise version to work on in amara.org
- Developing a phonetic German steno keyboard layout
- Building a steno keyboard
- Using an old mechanical stenotype machine with a microcontroller to detect the pressed keys as steno input