Lessons learned producing our first multi-camera live stream with OBS
An introduction to our studio, hardware and software setup, how we planned and executed the production, and what we would do differently next time.
Sometimes ideas go places. When I proposed to our local government that we produce a late-night-show-style live stream on YouTube instead of a more traditional video conference format, as a COVID-19 replacement for an in-person event, I would never have thought that in the end we, a team of software engineers, would be the ones doing the actual production. Especially since I asked the mayor of our 120k-inhabitant city of Pforzheim, Peter Boch, to be the host. To my surprise, neither the mayor nor any other official said no, so we ended up here:
Watch the full recording of the show on YouTube: https://www.youtube.com/watch?v=eZLgaGTYcaY
We started researching how a typical late-night show works by binge-watching Jimmy Fallon, Stephen Colbert and others. We analyzed camera shots, timing and content, and learned that we needed a band not only for entertainment but also for smooth transitions between certain segments. Soon it was clear we needed a studio. So we teamed up with our friends and experts in e-commerce photo and video production, Gieske Studios, https://studio-gieske.de/, and they built a pop-up studio just for this one event:
At this point it started to get serious and I felt a Hollywood vibe. We decided we would need a way for guests to walk in, a position for the opening monologue, a host's desk, a couch with enough distance for guests, a small stage for a band and a big screen to pull in remote guests via Microsoft Teams. A backdrop behind the host, whether standing or sitting, would also be nice to provide some depth.
This is the final studio setup:
A short walkthrough: there was a small office-style room behind the actual stage which acted as a backdrop as well as the walk-in path for the host and the guests. Left of the stairs was the desk with a Surface Studio that we used to display the schedule to the host, along with a Teams chat for interaction with the director. We also added a large microphone as decoration to complete the late-night look. Right of the stairs we placed the Surface Hub 84 from our meeting room as a large screen for interacting with remote guests via Microsoft Teams. Next came the sofa, about 2.5 m away from the desk, and behind that a small stage for our band.
Cameras, Lights and Microphones
Luckily our friends at Gieske Studios also helped us with 3 of their professional photo and video cameras, 2 on regular stands and 1 on a crane. Gieske Studios also set up the lighting. We used 2 wireless lavalier microphones to capture audio from the host and the guests. The band brought their own microphones.
Based on what we knew from the shows that inspired us we defined a set of camera shots we thought we needed:
Close-up, standing, for the opening monologue
Host and guest
Close-up of the guest
Remote guest via Microsoft Teams
Split screen with remote guest
So we had far more shots to cover than we had discrete cameras. We created a plan mapping each camera to its shots, and added an intercom system between the 3 camera operators and the director using simple walkie-talkies.
As you can see in the screenshots above, we added branding with a logo and animated lower thirds to the stream. To make the show more entertaining we shot some clips in advance, like an intro and other content items. While talking the whole thing through, we also realized we needed breaks to change batteries on the cameras and microphones, even though the stream was only 1 hour 30 minutes. So we planned commercial breaks showing ads from the participating companies and videos of the city's already realized digitization projects. Commercial breaks also needed short intro/outro clips to separate them from the main program. All this footage was shot and created by our friends at Campaigners Network, https://campaignersnetwork.de/.
To be clear: at this point, no one on our team had any experience with any of the things we would have to do to pull this off…
The big picture
To create a live stream we would need a system with all the required inputs: 3 cameras, remote guests via Microsoft Teams, multiple wireless microphones, prerecorded footage and animated overlays.
The system needed to be able to organize all inputs and footage into scenes in advance so we could switch fast, provide a preview of all sources and the final output for the director, and output directly to a YouTube live stream.
An excellent open source software that fulfills most of these requirements is OBS (Open Broadcaster Software, https://obsproject.com/). It's stable, flexible and free. But most of all it has a vibrant community of passionate people who share their knowledge.
This infographic gives a high level overview of the flow possible with OBS.
In our case we narrowed it down to this simplified architecture that fit the needs of this production.
So let's drill down a bit and have a look at the individual components:
Cameras over HDMI
We used cameras that all had an HDMI output, so we needed a capture device to feed the camera signals into the PC running OBS. For a single-camera setup, the usual way to go is a USB HDMI capture stick like the Elgato Cam Link (https://www.elgato.com/de/gaming/cam-link-4k).
In our case, however, we needed 3 inputs and I simply did not trust a setup with 3 USB dongles. So we went for a 4-input capture card, the Magewell Pro Capture Quad HDMI (http://www.magewell.com/products/pro-capture-quad-hdmi). This worked out perfectly: we had a clear Full HD signal from all 3 cameras at the same time. To give the cameras some mobility we used long 20 m HDMI cables, with no issues.
Microsoft Teams via NDI
Just recently, Microsoft added NDI output support to Teams. Never heard of NDI? Here is what Wikipedia says:
Network Device Interface (NDI) is a royalty-free software standard developed by NewTek to enable video-compatible products to communicate, deliver, and receive high-definition video over a computer network in a high-quality, low-latency manner that is frame-accurate and suitable for switching in a live production environment.
This means that in a regular Teams video call, the video (and audio) of every participant as well as their screen shares become available to OBS as discrete video streams. So we could mix those feeds into our production and even do side-by-side compositions:
To enable this flow, NDI needs to be activated at the tenant level by an administrator and also turned on explicitly in the Teams client running on the OBS machine. In our case this worked only for some users — luckily including the one we used with OBS. Read more on how to set up NDI with Teams: https://support.microsoft.com/en-us/office/broadcasting-audio-and-video-from-teams-with-ndi%C2%AE-technology-e91a0adb-96b9-4dca-a2cd-07181276afa3
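For reference, the tenant-level activation is a Teams meeting policy setting. A hedged sketch using the MicrosoftTeams PowerShell module (the `Global` policy scope is just an example — your tenant may use a custom policy assigned to the streaming account):

```powershell
# Requires the MicrosoftTeams PowerShell module and Teams admin rights.
# Connect to the tenant, then allow NDI streaming in the meeting policy
# that applies to the account used on the OBS machine.
Connect-MicrosoftTeams
Set-CsTeamsMeetingPolicy -Identity Global -AllowNDIStreaming $true
```

After the policy is set, the user still has to enable NDI in the Teams client settings on the OBS machine, and policy changes can take a while to propagate.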
To use NDI inputs in OBS you also need to install a plugin: https://obsproject.com/forum/resources/obs-ndi-newtek-ndi%E2%84%A2-integration-into-obs-studio.528/
BUT: we found the video signal coming from Teams over NDI to be somewhat tricky. The signal is stable enough for broadcast, but it changes its frame size from time to time, becoming bigger or smaller at runtime. This is no issue when you run the signal fullscreen, but it makes the feature virtually useless in a split-screen scenario like the one pictured above. On top of that, every Teams participant has a different video frame size depending on their hardware, e.g. smartphone portrait vs. PC landscape vs. different webcam manufacturers. To do a proper side-by-side composition, it's essential to know the video size in advance and prepare the right transformation in OBS. In our case the participants joined during the live stream, so we did not have enough time to properly transform the input into the side-by-side composition — and once we realized the size could still change at runtime even if we did, we gave up and went with simple fullscreen.
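To make the transform problem concrete, here is the aspect-fit math involved (a sketch, not OBS code — in OBS you would apply the result via the scene item's transform, and the function name is my own):

```python
def fit_into_region(src_w, src_h, region_w, region_h):
    """Scale a source to fit a region while preserving its aspect ratio,
    like a 'fit to region' transform in a video mixer. Returns the
    uniform scale factor and the x/y offsets that center the scaled
    source inside the region."""
    scale = min(region_w / src_w, region_h / src_h)
    out_w, out_h = src_w * scale, src_h * scale
    offset_x = (region_w - out_w) / 2
    offset_y = (region_h - out_h) / 2
    return scale, offset_x, offset_y

# Left half of a 1920x1080 canvas, Teams sending 1280x720 landscape:
print(fit_into_region(1280, 720, 960, 1080))   # -> (0.75, 0.0, 270.0)

# The same guest's feed switching to 720x1280 portrait (e.g. a phone)
# needs a completely different transform -- recomputing this live,
# mid-show, is exactly what we did not have time for:
print(fit_into_region(720, 1280, 960, 1080))   # -> (0.84375, 176.25, 0.0)
```

The math is trivial; the problem is that the input resolution is a moving target during the stream.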
To have animated overlays, e.g. for lower thirds, the video clips need an alpha channel, which in our workflow meant MOV files. We created these using Adobe After Effects.
Audio in general was a weak spot in our first production because we underestimated it. We used a very basic analog mixer we happened to have lying around to mix the microphones of the people in the studio as well as the music from the band.
We then connected the mixer to the PC using the onboard line-in of the PC's mainboard. The sound was OK, but just not great. The real challenge is getting the audio mix right independently of the video mix. A simple example: when we showed the Teams guest fullscreen, their audio was fine, but when we switched to the wide shot with the host and the monitor, the Teams guest was barely audible. That's because audio sources are bound to their video sources — something we only realized during the production, when it was too late. A way to prevent this is to stack the video input carrying the desired audio "invisibly" underneath the actual video source in the scene.
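The pitfall can be illustrated with a toy model (this is an illustration of the behavior, not OBS internals; scene and source names are invented). Global audio devices like a line-in play in every scene, but a source like the Teams NDI feed only contributes audio when it is part of the active scene — even if its video layer is hidden underneath another source:

```python
# Toy model: which audio ends up in the mix for a given active scene.
# "mixer_line_in" is a global audio device, audible in every scene.
# "teams_ndi" carries its own audio and is only heard when the source
# is present in the scene -- visible or not.
scenes = {
    "teams_fullscreen": ["teams_ndi"],
    "wide_shot":        ["camera_1"],               # Teams audio lost here!
    "wide_shot_fixed":  ["camera_1", "teams_ndi"],  # teams_ndi stacked underneath, invisible
}

def mix(scene):
    global_audio = ["mixer_line_in"]
    source_audio = [s for s in scenes[scene] if s.endswith("_ndi")]
    return sorted(global_audio + source_audio)

print(mix("wide_shot"))        # -> ['mixer_line_in']
print(mix("wide_shot_fixed"))  # -> ['mixer_line_in', 'teams_ndi']
```

Checking each scene against a list like this during rehearsal would have caught our problem before the show.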
We used 3 displays connected to the OBS PC: One for the editor view, one for the Multiview that previews all scenes and one to have a clear Full HD preview of the program.
Multiview is really helpful for keeping an overview during the production. We also put an HDMI splitter on the preview monitor so we could duplicate that signal to additional screens for the crew.
The PC itself is a pretty average machine, around 3 years old. We had no performance issues whatsoever.
This was our killer hardware for fast switching. It's next to impossible to control OBS in a live stream situation with mouse and keyboard: the mouse is way too slow for switching scenes, and while you could set up custom keyboard shortcuts, who remembers them in a stressful live situation? The Stream Deck (https://www.elgato.com/de/gaming/stream-deck) allowed us to assign scenes, overlays and other actions and assets to dedicated buttons with customizable icons:
This way we had our own custom mixing hardware console. In my opinion this greatly improves speed and reduces human error especially in a complex setup.
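The one-button-one-action idea boils down to a dispatch table like the following sketch (scene and overlay names are invented; the real Stream Deck software configures this graphically):

```python
# Sketch of a Stream Deck style dispatch table: each physical button
# maps to exactly one production action, so the operator never has to
# recall keyboard shortcuts under pressure. Names are illustrative only.
BUTTONS = {
    1: ("switch_scene", "Monologue close-up"),
    2: ("switch_scene", "Host + guest"),
    3: ("switch_scene", "Teams fullscreen"),
    4: ("toggle_overlay", "Lower third: guest name"),
    5: ("play_clip", "Commercial break intro"),
}

def press(button_id):
    """Resolve a button press to the single action bound to it."""
    action, target = BUTTONS[button_id]
    return f"{action}:{target}"

print(press(3))  # -> switch_scene:Teams fullscreen
```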
It’s way more work than anticipated
In the beginning I planned to do all the directing, video mixing and audio mixing myself. Forget it! We ended up with me directing, one co-worker doing the video mix and another doing the audio mix — and we were still super busy. Maybe because we are all software developers and were doing this for the first time…
NDI in Teams is just not there yet
This is a real killer feature, but it's just not mature enough for composition. Used fullscreen it works perfectly and allows for decentralized stream productions. There are also other ways to leverage NDI, e.g. via Skype, that might be more stable.
Do not underestimate Audio
In retrospect, this is the one area where we should have done a bit more testing in advance to have the right audio inputs in every scene — and to better understand how audio monitoring works in general.
Multitasking is hard
That's the biggest takeaway for me personally: directing live is hardcore multitasking. Giving instructions to the camera crew a few seconds ahead of time via intercom, communicating with the band, the host and the guests, establishing the remote connections to the guests and giving them status updates via chat, and finally cutting the actual shots in the editor is something you really need to train for. I have great respect for everyone who does this professionally…
It's not professional gear!
We had some crashes in the run-up. 15 minutes before the show the Stream Deck stopped working and we had to restart the machine, just hoping there were no pending Windows updates… #DontDoThisInProduction
I hope this post has given some insight into how we approached this challenge and what issues we ran into. Overall, I'm still impressed that the crew pulled this off, and by the quality of the result.
We learned a lot in a short amount of time and had a lot of fun. If you are planning a similar production and want more insights, do not hesitate to contact me. And if you have suggestions or ideas, please share them in the comments below.