The overall system

The project relies on a base structure to carry out communication. This structure, the overall system, implements a client-server model in which the server handles audio recording and encoding and provides the client with the encoded video, while the client handles video/audio decoding and playback. To achieve real-time streaming, the system creates two sockets: the first uses a modified version of the Real Time Streaming Protocol (RTSP) [1] for streaming control and key exchange; the second uses the Real-time Transport Protocol (RTP) [2] for video frame delivery. Because frame delivery requires low latency, the second socket is based on the User Datagram Protocol (UDP) [3], which is transaction-oriented and unreliable, while the first socket uses the Transmission Control Protocol (TCP) [4] to avoid losing control information to network problems.

Server and client are connected by two sockets: one using RTSP, the other using RTP
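The two-socket layout can be sketched with the standard library alone. This is a minimal loopback illustration, not the project's code: the function name `run_two_channel_demo`, the one-line `SETUP client_port=...` message, and the `frame-0` payload are all hypothetical stand-ins for the modified RTSP exchange and the RTP stream.

```python
import socket


def run_two_channel_demo():
    """Open a TCP control channel and a UDP media channel on loopback."""
    # Server side: TCP listener for the control channel (reliable, ordered).
    control_listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    control_listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    control_listener.listen(1)

    # Client side: UDP socket on which media datagrams will arrive.
    media_receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    media_receiver.bind(("127.0.0.1", 0))
    media_receiver.settimeout(2.0)  # UDP gives no delivery guarantee

    # The client opens the control connection and announces its UDP port.
    control_client = socket.create_connection(control_listener.getsockname())
    control_server, _ = control_listener.accept()
    udp_port = media_receiver.getsockname()[1]
    control_client.sendall(f"SETUP client_port={udp_port}\n".encode())

    # The server reads the request and sends one datagram to that port.
    with control_server.makefile("r") as reader:
        request = reader.readline().strip()
    target_port = int(request.split("=")[1])
    media_sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    media_sender.sendto(b"frame-0", ("127.0.0.1", target_port))
    datagram, _ = media_receiver.recvfrom(2048)

    for sock in (control_client, control_server, control_listener,
                 media_sender, media_receiver):
        sock.close()
    return request, datagram
```

The control message travels over TCP, so it arrives intact and in order; the media datagram travels over UDP, so the receiver sets a timeout instead of assuming delivery.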

Once the main server and client are launched, the client connects to the server, establishing the RTSP connection. The secret key required for video encoding is then exchanged using the Diffie-Hellman method [5], after which the second connection is set up.


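	import secrets
	from typing import Optional

	# In the constructor: draw a private exponent and derive the public partial key
	random_generator   = secrets.SystemRandom()
	self.private_key   = random_generator.randrange(1, self.GROUP_SIZE)
	self.partial_key   = pow(self.PUBLIC_GENERATOR, self.private_key, self.PUBLIC_PRIME)
	self.received_key  : Optional[int] = None
	self.secret        : Optional[int] = None

	def generate_secret(self, received_key: int) -> int:
		# Store the peer's partial key before using it; it is None until then
		self.received_key = received_key
		self.secret = pow(self.received_key, self.private_key, self.PUBLIC_PRIME)
		return self.secret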

Initialization and key computation with Diffie-Hellman
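To see why both ends arrive at the same secret, the exchange can be run between two in-process parties. The sketch below is only an illustration: the `Party` class is hypothetical, and the prime and generator are toy values chosen for readability, not the project's actual group parameters.

```python
import secrets

# Toy public parameters (assumptions for illustration only).
PUBLIC_PRIME = 2**127 - 1  # a Mersenne prime
PUBLIC_GENERATOR = 3


class Party:
    """One side of a Diffie-Hellman exchange."""

    def __init__(self) -> None:
        random_generator = secrets.SystemRandom()
        # Private exponent, never sent over the network.
        self.private_key = random_generator.randrange(2, PUBLIC_PRIME - 1)
        # Public value sent to the peer: g^a mod p.
        self.partial_key = pow(PUBLIC_GENERATOR, self.private_key, PUBLIC_PRIME)

    def generate_secret(self, received_key: int) -> int:
        # (g^b)^a mod p equals (g^a)^b mod p, so both sides agree.
        return pow(received_key, self.private_key, PUBLIC_PRIME)


server = Party()
client = Party()

# Each side combines its own private key with the other's partial key.
server_secret = server.generate_secret(client.partial_key)
client_secret = client.generate_secret(server.partial_key)
```

Only the partial keys cross the network; the private exponents and the resulting secret never do.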

Audio recording and encoding are designed to match the video frame rate, so the server sends each data packet as soon as new audio data is available. Because network conditions affect data delivery, the client waits for an initial delay before starting playback in order to guarantee continuous listening.

	frame = self._video_stream.next_frame()
	audio_frame = self._read_next_audio_frame()

	encoded_frame = self._encoder.encode(frame, audio_frame, seed)

	rtp_packet = RTPPacket(
		payload_type=RTPPacket.TYPE.MJPEG,
		sequence_number=frame_number,
		timestamp=frame_number * self._server.FRAME_PERIOD,
		payload=bytes(encoded_frame),
	).get_packet()


Server side packet creation and transmission
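For reference, the fixed 12-byte RTP header defined in [2] can be packed and parsed with `struct`. This is a sketch of the wire format under simplifying assumptions (no padding, no header extension, no CSRC entries, marker bit cleared), not the project's `RTPPacket` class; the helper names and the `ssrc` argument are hypothetical.

```python
import struct

# Fixed RTP header: two flag bytes, 16-bit sequence number,
# 32-bit timestamp, 32-bit SSRC, all in network byte order.
RTP_HEADER = struct.Struct("!BBHII")
RTP_VERSION = 2
PAYLOAD_TYPE_JPEG = 26  # static payload type assigned to JPEG video


def build_rtp_packet(payload_type: int, sequence_number: int,
                     timestamp: int, ssrc: int, payload: bytes) -> bytes:
    first_byte = RTP_VERSION << 6      # padding, extension, CSRC count all zero
    second_byte = payload_type & 0x7F  # marker bit left cleared
    header = RTP_HEADER.pack(
        first_byte,
        second_byte,
        sequence_number & 0xFFFF,      # sequence number wraps at 16 bits
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + payload


def parse_rtp_packet(packet: bytes) -> dict:
    # Assumes a CSRC count of zero, so the payload starts right after the header.
    first_byte, second_byte, sequence_number, timestamp, ssrc = (
        RTP_HEADER.unpack_from(packet)
    )
    return {
        "version": first_byte >> 6,
        "payload_type": second_byte & 0x7F,
        "sequence_number": sequence_number,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[RTP_HEADER.size:],
    }
```

The sequence number lets the receiver detect lost or reordered datagrams, which matters because UDP itself provides neither ordering nor delivery guarantees.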

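	# Read the next datagram from the RTP socket and parse it as an RTP packet
	packet = self._rtp_socket.receiveDatagram().data().data()
	packet = RTPPacket.process_packet(bytes(packet))

	# View the payload as an array of signed 8-bit values for the decoder
	frame = np.frombuffer(packet.get_payload(), dtype=np.int8)

	# Recover the video frame and the audio frame carried with it
	(decoded_frame, decoded_audio_frame) = self._decoder.decode(frame, seed)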

Client side packet receiving and decoding
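The initial playback delay on the client can be modeled as a small prebuffer: nothing is played until a fixed number of frames has arrived. The sketch below is one possible way to do this, not the project's implementation; the `PlayoutBuffer` class and its `prebuffer_frames` parameter are hypothetical.

```python
from collections import deque
from typing import Deque, Optional


class PlayoutBuffer:
    """Hold back playback until `prebuffer_frames` frames have arrived."""

    def __init__(self, prebuffer_frames: int) -> None:
        self._prebuffer_frames = prebuffer_frames
        self._frames: Deque[bytes] = deque()
        self._started = False

    def push(self, frame: bytes) -> None:
        """Called by the receiving side for every decoded audio frame."""
        self._frames.append(frame)
        if len(self._frames) >= self._prebuffer_frames:
            self._started = True  # enough data queued to absorb jitter

    def pop(self) -> Optional[bytes]:
        """Return the next frame to play, or None while prebuffering
        or when the buffer has run dry."""
        if not self._started or not self._frames:
            return None
        return self._frames.popleft()


buffer = PlayoutBuffer(prebuffer_frames=3)
buffer.push(b"a")
early = buffer.pop()   # still prebuffering, so nothing is played yet
buffer.push(b"b")
buffer.push(b"c")
first = buffer.pop()   # playback starts with the oldest frame
```

A larger prebuffer tolerates more network jitter at the cost of higher end-to-end latency, so the value is a trade-off rather than a fixed constant.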

The system relies heavily on multi-threading, which is needed not only to manage the two connections simultaneously but also to run its various functions and the graphical interfaces at the same time. Thread management and signaling were by far the most complex aspects of the overall system to implement.

	self.rtsp_connection_thread = QThread()
	self.rtp_connection_thread  = QThread()
	self.audio_player_thread    = QThread()




Example of thread creation and signal connection on the client side
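The hand-off between a receiving thread and a playing thread can be illustrated without Qt. The sketch below deliberately swaps `QThread` and signals for the standard library's `threading` and `queue` modules, so it shows the general pattern only; the function names and the uppercase "decoding" step are hypothetical placeholders.

```python
import queue
import threading
from typing import Iterable, List


def receiver_worker(packets: Iterable[bytes], frames: queue.Queue,
                    stop: threading.Event) -> None:
    """Background thread: 'decode' each packet and hand it to the consumer."""
    for packet in packets:
        if stop.is_set():
            break
        frames.put(packet.upper())  # placeholder for real decoding
    frames.put(None)                # sentinel: no more frames


def run_pipeline(packets: Iterable[bytes]) -> List[bytes]:
    frames: queue.Queue = queue.Queue()
    stop = threading.Event()
    worker = threading.Thread(
        target=receiver_worker, args=(packets, frames, stop), daemon=True
    )
    worker.start()

    played = []
    while True:
        frame = frames.get(timeout=2.0)  # blocks until the worker delivers
        if frame is None:
            break
        played.append(frame)             # placeholder for real playback
    worker.join()
    return played
```

The queue plays the role that a signal-slot connection plays in Qt: the producer never touches the consumer's state directly, it only posts data that the consumer picks up on its own thread.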

Currently, the system works in one direction only, from server to client, but it could be extended to support bidirectional communication, as in an actual phone conversation.


[1] Real Time Streaming Protocol (RTSP)
[2] RTP: A Transport Protocol for Real-Time Applications
[3] User Datagram Protocol
[4] Transmission Control Protocol
[5] Diffie-Hellman Key Agreement Method