Recently, one of our customers requested our assistance in migrating their content from Adobe Connect to Kaltura.
This post will cover the challenges we faced and the approaches we took to overcome them.
We’ve released the project as FOSS [licensed under AGPLv3] and contributions are most welcome.
So, what is Adobe Connect?
Adobe Connect is a proprietary platform for virtual presentations, conferencing sessions and screen recordings.
These virtual meeting rooms consist of widgets, or ‘pods’ in Adobe terminology, with each pod performing a designated role [presentation, camera feed, chat, attendee list, shared files, etc].
For this project, we used the Adobe Connect API in order to retrieve the metadata for the assets [name, creation date, creator, etc].
Several third-party FOSS clients are available; we chose adobe_connect, which is written in Ruby.
Sadly, no API is available for obtaining one cohesive media file of a given session, which, of course, is the most crucial component.
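For illustration, the metadata calls can be made against the Adobe Connect XML API directly as well. The sketch below builds a sco-info request URL and parses a trimmed, illustrative response; the host name, SCO ID and response body are placeholders, and real responses carry more fields (and require a session cookie obtained via the login action):

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

def sco_info_url(host, sco_id):
    """Build an Adobe Connect XML API request URL for one asset [SCO]."""
    query = urlencode({"action": "sco-info", "sco-id": sco_id})
    return f"https://{host}/api/xml?{query}"

def parse_sco_info(body):
    """Extract name and creation date from a sco-info XML response."""
    sco = ET.fromstring(body).find(".//sco")
    return {
        "name": sco.findtext("name"),
        "date_created": sco.findtext("date-created"),
    }

# Trimmed, illustrative response body [real ones carry more fields]:
SAMPLE = """<results><status code="ok"/>
<sco sco-id="12345"><name>Weekly demo</name>
<date-created>2019-03-01T10:00:00</date-created></sco></results>"""
```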
Obtaining the assets
To properly appreciate the challenge, one must first understand how Adobe Connect stores the assets.
A typical asset [recording] consists of the following:
- Audio FLV files
- Video FLV files
- Widget [pod] FLV files
- Metadata XML files
Using FFmpeg, it is easy to merge the audio and video FLVs into one “flat” [standalone] media file.
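A minimal sketch of that flattening step, assuming already-extracted file names [which are placeholders here]: the concat demuxer joins the audio parts, and a second invocation muxes the audio and video streams into one file without re-encoding.

```python
import subprocess
from pathlib import Path

def concat_audio(flvs, out_mp3, list_file="audio_list.txt"):
    """Join the audio FLVs with FFmpeg's concat demuxer, encoding to MP3."""
    Path(list_file).write_text("".join(f"file '{p}'\n" for p in flvs))
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_file, out_mp3]

def mux(video, audio, out_mkv):
    """Mux the flattened video and audio into one file, stream-copying."""
    return ["ffmpeg", "-y", "-i", video, "-i", audio,
            "-map", "0:v", "-map", "1:a", "-c", "copy", out_mkv]

# Run each returned command with subprocess.run(cmd, check=True)
```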
However, as mentioned before, a recording may also include various widgets and in many cases, these contain crucial content.
For example, consider a session in which the speaker has presented slides. Obviously, if these cannot be viewed, the recording becomes rather meaningless. By the same token, if the speaker is addressing a question asked by a remote attendee and the question is not shown, the context would be lost.
Because the widget FLVs are not proper media files, tools like FFmpeg cannot assist in handling them.
Even if we were willing to introduce Flash into the migration process [and we weren’t], the first step would be to reverse engineer the logic used to store and display the data. As is often the case with proprietary software, no documentation on that is available.
After some research, we decided to take the following approach:
- Download the asset ZIP archive from Adobe Connect, extract the contents and concatenate the audio FLVs into one MP3 using FFmpeg
- With Selenium and Mozilla’s Geckodriver, launch Firefox and navigate to the recording’s URL so that it plays the session using the Adobe SWF
- Use FFmpeg’s x11grab option to capture the screen display
- Once done, use FFmpeg’s scene detection feature to determine when the recording had actually started [this is needed because the AC app takes a long time to load and there’s no other way to ascertain how long it actually took]
- Merge the audio and video files and use the Kaltura API to ingest the resulting media file
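The capture and trim steps above can be sketched roughly as follows; the display number and capture size are placeholders, and error handling is omitted. The first scene change reported by FFmpeg's select filter marks the moment the AC app finished loading:

```python
import re

def x11grab_cmd(display, size, out_file):
    """Capture an X display with FFmpeg's x11grab input device,
    e.g. display=':99', size='1280x720'."""
    return ["ffmpeg", "-y", "-f", "x11grab", "-video_size", size,
            "-i", display, out_file]

def recording_start(ffmpeg_stderr):
    """Given the stderr of:
         ffmpeg -i cap.mkv -vf "select='gt(scene,0.4)',showinfo" -f null -
    return the timestamp of the first detected scene change, i.e. when
    playback actually began after the AC app's loading screen."""
    times = re.findall(r"pts_time:([0-9.]+)", ffmpeg_stderr)
    return float(times[0]) if times else 0.0
```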
Our customer had roughly 40,000 assets to migrate. Therefore, it was of paramount importance for the code to be able to handle multiple assets concurrently.
To that end, we’ve written a small wrapper around xvfb-run.
The number of concurrent jobs to run is determined by the value of the MAX_CONCUR_PROCS ENV var, and the only real limitation is HW resources [namely: CPU, RAM].
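A stripped-down sketch of such a wrapper; the capture script name and xvfb-run invocation are illustrative [the -a flag, which auto-picks a free display number, is what makes concurrent framebuffers painless]:

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_job(recording_url):
    """Each job gets its own virtual framebuffer via xvfb-run -a
    [capture_session.sh stands in for the actual capture logic]."""
    cmd = ["xvfb-run", "-a", "capture_session.sh", recording_url]
    return subprocess.run(cmd, check=True)

def migrate(urls, worker=run_job):
    """Process assets concurrently, bounded by MAX_CONCUR_PROCS."""
    max_procs = int(os.environ.get("MAX_CONCUR_PROCS", "4"))
    with ThreadPoolExecutor(max_workers=max_procs) as pool:
        return list(pool.map(worker, urls))
```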
A word about slides…
By following the above method, we were able to produce a standalone MKV file that most common media players can play and our platform can easily ingest and transcode into different flavours.
However, we felt that more could be done :)
The Kaltura player is capable of displaying slides alongside the video. A slide is represented by an object called a “thumb cue point”. This object has several important properties:
- An image representing the actual “slide”
- The title [string]
- The description [a longer string, typically the full textual contents of the slide]
- start_time, which denotes when the slide should be displayed during playback [i.e. starting from the Nth second of the video]
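A minimal local representation of the fields listed above may make the mapping concrete; the field names here are illustrative [Kaltura's own client libraries use their generated classes, and start times are expressed in milliseconds there]:

```python
from dataclasses import dataclass

@dataclass
class ThumbCuePoint:
    """One slide, attached to the media entry at ingestion time
    [illustrative local model, not the actual Kaltura client class]."""
    image_path: str   # the slide image shown alongside the video
    title: str
    description: str  # typically the slide's full text, used for search
    start_time: int   # offset from the start of playback, in milliseconds

slides = [
    ThumbCuePoint("slide_001.png", "Agenda", "Agenda: intro, demo, Q&A", 0),
    ThumbCuePoint("slide_002.png", "Pipeline", "Capture, detect, ingest", 95000),
]
```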
While the video we produced by recording the screen display shows the speaker, as well as the presentation widget [pod], there is a lot of metadata here that could be leveraged to provide a better user experience.
Alas, the original presentation files [typically PPT or PPTX] cannot be easily downloaded from Adobe Connect, certainly not by using the API. It was, therefore, necessary to find an alternative way of obtaining the slides.
OpenCV is an open source computer vision and machine learning library.
Using a combination of the FFmpeg scene detection feature and OpenCV we were able to accomplish the following:
- Get the timings for the slide changes
- Determine the dimensions of the slide widget/pod inside the frame and generate images per slide [since the customer has assets recorded with different AC versions and the presenter can configure how the widgets are arranged on screen as well as their width and height, the dimensions are not constant]
A careful review of the asset ZIP archive yielded an interesting find: a file called srchdata.xml, which contains the textual contents of the presentation, per slide. We used that data to set the description members on the cue point objects.
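Parsing that file is straightforward with the standard library. The schema below is an assumption for illustration only; inspect your own srchdata.xml to confirm the actual element and attribute names:

```python
import xml.etree.ElementTree as ET

# Assumed, illustrative schema -- verify against a real srchdata.xml.
SAMPLE = """<data>
  <slide index="1">Agenda Intro Demo Q&amp;A</slide>
  <slide index="2">Architecture overview</slide>
</data>"""

def slide_texts(xml_body):
    """Map slide number -> full slide text, later used as the cue point
    description so the slide contents become searchable."""
    root = ET.fromstring(xml_body)
    return {int(s.get("index")): (s.text or "").strip()
            for s in root.iter("slide")}
```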
That gave us all the data required in order to provide a user experience similar to what you can see here:
And of course, the metadata is also searchable.
The OpenCV code used to achieve this can be found here.
Credit where credit is due
This project would have taken far longer had it not been for the existence of FOSS. In particular, I’d like to thank the fine people responsible for FFmpeg, Geckodriver, Selenium, and OpenCV for all their amazing work.
I would also like to thank my friend and colleague Hila Karimov. Hila joined the project immediately after the POC phase and has been instrumental in implementing several important features as well as supporting the customer through the migration process.
And, last but not least, thanks to Jack Sharon for his initial discovery work with the customer, his encouragement and commitment to finding an apt solution. Cheers, Jack!