TCP Audio/Video Chat

TCP Xcvr (or Transceiver) is a point-to-point audio/video chat program written in Python. It is designed to send audio and video signals between two hosts on a LAN, each identified by its IP address. It was originally conceived as a learning exercise for the Python socket library, but quickly grew to cover the OpenCV video library, PyAudio, and the Tkinter GUI library. A screenshot of the GUI is shown below.

1. Operation

As shown in the figure, the Xcvr GUI is a no-frills Tkinter interface with an even simpler remote host editor. When the program starts, it listens on the local host for remote connections on port 4096. When a remote connection is detected, the local transmit function is initialized and two-way transmission is established. To initiate a connection, select a remote host from the list and click the ‘Select’ button, then click the ‘Call’ button. The ‘Call’ button turns into a ‘Disconnect’ button, which is used to terminate the call.
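The listen-and-call behaviour described above can be sketched with the standard socket module. This is a minimal sketch, not the program's actual code: the function names `open_listener` and `place_call` are hypothetical, and only the port number 4096 comes from the text.

```python
import socket

PORT = 4096  # listening port named in the text


def open_listener(host="", port=PORT):
    """Create a TCP socket that waits for one incoming call (hypothetical helper)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow a quick restart of the program without waiting for the port to free up.
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv


def place_call(remote_ip, port=PORT, timeout=5.0):
    """Connect to a remote host that is already listening (hypothetical helper)."""
    return socket.create_connection((remote_ip, port), timeout=timeout)
```

On the listening side, `conn, addr = open_listener().accept()` blocks until a remote host calls; in a GUI program that wait would normally run in its own thread so the interface stays responsive.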

When the program starts, it checks for a ‘remote_hosts.dat’ file which contains the computer name and IP address of each host in the list. If it doesn’t find the file, it creates a temporary list (Host1, Host2, Host3 etc.) which can be edited. To edit the list, click on a name. The name and IP address will appear in the two entry boxes. After editing, clicking on the ‘Edit’ button will update the list and update or create the ‘remote_hosts.dat’ file.

Similarly, clicking on the ‘Add’ button will add the entry box info to the list, and update the file. The ‘Delete’ button will remove a list entry and update the file.
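The load-or-create behaviour of the host list might look roughly like the sketch below. The on-disk format of ‘remote_hosts.dat’ is not described in the text, so the one-entry-per-line `name,ip` layout, the empty placeholder IP, and the helper names `load_hosts` and `save_hosts` are all assumptions made for illustration.

```python
import os

HOSTS_FILE = "remote_hosts.dat"  # file name from the text


def load_hosts(path=HOSTS_FILE, placeholders=3):
    """Return a list of (name, ip) tuples, or a placeholder list if the file is missing."""
    if not os.path.exists(path):
        # Assumption: placeholder entries have no IP address until they are edited.
        return [("Host%d" % n, "") for n in range(1, placeholders + 1)]
    hosts = []
    with open(path, "r") as f:
        for line in f:
            line = line.strip()
            if line:
                # Assumed format: "name,ip" on each line.
                name, _, ip = line.partition(",")
                hosts.append((name.strip(), ip.strip()))
    return hosts


def save_hosts(hosts, path=HOSTS_FILE):
    """Write the whole list back; called after every Edit, Add, or Delete."""
    with open(path, "w") as f:
        for name, ip in hosts:
            f.write("%s,%s\n" % (name, ip))
```

Rewriting the whole file on every change keeps the file and the on-screen list in step, which is reasonable for a list of a handful of hosts.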

The ‘Exit’ button and the close button in the upper right-hand corner both terminate the program.

2. Audio Video Synchronization

One of the main stumbling blocks in the program's development was how to combine the audio and video and keep them synchronized. There are a few multimedia libraries around, but they're either obsolete or incompatible with Python 3 and Windows. The final method might not be the best, but it works and it's reasonably straightforward.

The audio/video frame consists of a 4-byte integer that describes the length of the frame, followed by a JPEG image, followed by an audio buffer of 2048 * 2 = 4096 bytes.
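The frame layout can be sketched with the struct module. The text does not say whether the length field is big- or little-endian, or whether it counts the 4 header bytes, so this sketch assumes a big-endian unsigned integer that counts only the image and audio bytes. The function names are hypothetical.

```python
import struct

AUDIO_BYTES = 2048 * 2  # fixed audio buffer size given in the text


def pack_frame(jpeg_bytes, audio_bytes):
    """Build one frame: 4-byte length, JPEG image, fixed-size audio buffer."""
    if len(audio_bytes) != AUDIO_BYTES:
        raise ValueError("audio buffer must be exactly %d bytes" % AUDIO_BYTES)
    payload = jpeg_bytes + audio_bytes
    # Assumption: big-endian unsigned int, counting the payload only.
    return struct.pack("!I", len(payload)) + payload


def unpack_frame(frame):
    """Split a complete frame back into (jpeg_bytes, audio_bytes)."""
    (length,) = struct.unpack("!I", frame[:4])
    payload = frame[4:4 + length]
    if len(payload) != length or length < AUDIO_BYTES:
        raise ValueError("truncated or malformed frame")
    # The audio buffer has a fixed size, so the image is everything before it.
    return payload[:-AUDIO_BYTES], payload[-AUDIO_BYTES:]
```

Because the audio buffer has a fixed size, a single length field is enough: the receiver finds the boundary between image and audio by counting back 4096 bytes from the end.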

The Send thread waits until the audio buffer is full, grabs an image from the WebCam thread, calculates the size, and sends the frame to the remote host. It repeats this for as long as the call lasts. This establishes a fixed frame rate based on the audio buffer size and sample rate.
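Since one frame is sent per full audio buffer, the nominal frame rate is the sample rate divided by the number of samples per buffer. The sketch below assumes that "2048 * 2" means 2048 samples of 2 bytes each (16-bit mono); the sample rate is not stated in the text, so the rates used here are purely illustrative.

```python
SAMPLES_PER_BUFFER = 2048  # from the text: 2048 * 2 = 4096 bytes
BYTES_PER_SAMPLE = 2       # assumption: 16-bit mono samples


def frames_per_second(sample_rate_hz, samples_per_buffer=SAMPLES_PER_BUFFER):
    """Nominal frame rate when one frame is sent per full audio buffer."""
    return sample_rate_hz / samples_per_buffer


# Illustrative sample rates only; the program's actual rate is not given.
for rate in (8000, 16000, 44100):
    print(rate, "Hz ->", round(frames_per_second(rate), 2), "frames/s")
```

Under these assumptions, a smaller buffer or a higher sample rate raises the frame rate, at the cost of more frames to encode and send per second.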

On the receive side, the Receive thread gets the frame size from the first four bytes, receives the entire frame, and extracts the audio and video information. The image is displayed on the GUI and the audio is written to the PyAudio stream. Since the Send and Receive audio buffers are the same size, this synchronizes the Send and Receive functions.
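A sketch of the receive side follows. TCP is a byte stream, so a single `recv()` call may return only part of a frame; the helper loops until the requested number of bytes has arrived. As before, the big-endian length field that counts only the payload is an assumption, and `recv_exact` and `recv_frame` are hypothetical names rather than functions from the program.

```python
import socket
import struct

AUDIO_BYTES = 2048 * 2  # fixed audio buffer size given in the text


def recv_exact(sock, count):
    """Read exactly count bytes, looping because recv() may return fewer."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed with %d bytes outstanding" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock):
    """Receive one frame and return (jpeg_bytes, audio_bytes)."""
    # Assumption: 4-byte big-endian length that counts the payload only.
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    if length < AUDIO_BYTES:
        raise ValueError("frame too short to hold the audio buffer")
    payload = recv_exact(sock, length)
    return payload[:-AUDIO_BYTES], payload[-AUDIO_BYTES:]
```

The returned image bytes would then be decoded for display and the audio bytes written to the PyAudio stream, as described above.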

A more detailed explanation can be found in the (hopefully well commented) source code.

3. Running the Program

The program runs under Python 3.4.3 on Windows. The OpenCV 3 library is used for image capture, because it seems to be the only library compatible with both Windows and Python 3. The remaining libraries are either part of the Python 3 standard library (threading, socket, Tkinter) or, in the case of PyAudio, a third-party package that is available for Python 3.

4. Things To Do

For a learning exercise, this program was very worthwhile, since I had to learn everything from the ground up (video, networking, audio, etc.). As usual, though, there are areas for improvement. The actual frame rate, for example, seems to be about 8 FPS, which is sufficient for a chat program. I suspect that my audio/video synchronization may be slowing things down because of the blocking routines it uses. A more elaborate synchronization scheme might help. Another problem might be the overhead TCP adds for reliable, in-order delivery (acknowledgements and retransmissions). Reworking the networking portions to use UDP might speed things up.

Another, more serious problem is that the program works as expected on only two of my three computers (one Windows 8.1 laptop and two Windows XP desktops). Rolling back the webcam driver on the offending XP machine solved the problem, so it looks like an incompatibility with either the PyAudio or OpenCV library might be the culprit. Further investigation may be required.