{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"\n",
|
||
"\n",
|
||
"[Table of Contents](0_Table_of_Contents.ipynb)\n",
|
||
"\n",
|
||
"# Chapter 5: Module 3 - Locating KITT Using Audio Communication\n",
|
||
"\n",
|
||
"**Contents:**\n",
|
||
"* [Pre-recorded Data](#pre-recorded-data)\n",
|
||
"* [Background Knowledge](#background-knowledge)\n",
|
||
"* [Step 1: Implementing Channel Estimation ](#step-1-implementing-channel-estimation)\n",
|
||
"* [Step 2: Determining Time of Arrivals](#step-2-determining-time-of-arrivals)\n",
|
||
"* [Step 3: Pulse Segmentation](#step-3-pulse-segmentation)\n",
|
||
"* [Step 4: Calculating TDOA Between Microphone Pairs](#step-4-calculating-tdoa-between-microphone-pairs)\n",
|
||
"* [Sanity Check](#sanity-check)\n",
|
||
"* [Localization Using TDOA Information](#localization-using-tdoa-information)\n",
|
||
"* [Localization Class](#localization-class)\n",
|
||
"* [Optional Extension](#optional-extensions)\n",
|
||
"* [Assessment and Reporting](#assessment-and-reporting)\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Import necessary libraries\n",
|
||
"import matplotlib.pyplot as plt # For plotting purposes\n",
|
||
"import numpy as np # For convolution function\n",
|
||
"from scipy.io import wavfile\n",
|
||
"\n",
|
||
"# Uncomment one of the following lines depending on your setup\n",
|
||
"\n",
|
||
"# If you are using the real car, uncomment the next lines and comment the simulator lines\n",
|
||
"# from serial import Serial\n",
|
||
"# from sounddevice import sounddevice\n",
|
||
"\n",
|
||
"# If you are using the simulator, uncomment the next lines and comment the real car lines\n",
|
||
"\n",
|
||
"# Note: After changing the import statement, you need to restart the kernel for changes to take effect."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"KITT must be located in its field, and directions must be determined to navigate to the final destination. In previous modules, your colleagues have developed scripts to communicate with KITT and read sensor data. You will focus on locating KITT using audio signals received by microphones placed around the field. It is recommended that you read Modules 1 and 2 and talk to the other sub-group frequently. Otherwise the odds of your codes working together are small.\n",
|
||
"\n",
|
||
"For localization, we will use recordings of the beacon signal at various microphones, deconvolve these using a reference signal, and determine the relative time delays from the resulting channel estimates. These time differences, known as Time Difference of Arrival (TDOA), can be used to estimate KITT's position within the field. (You will be able to reuse your `ch2` or `ch3` code from the courselab Assignments.)\n",
|
||
"\n",
|
||
"At the end of this module, you will have developed a script to locate KITT within the field with reasonable accuracy, using the data recorded by the microphones. You will also have tested and verified the accuracy and robustness of your solution. \n",
|
||
"\n",
|
||
"The module is divided into the following 3 sections:\n",
|
||
"1. _TDOA Estimation_: In this section, you will estimate the TDOA between pairs of microphones using the provided recordings.\n",
|
||
"2. _Localization_: In this section, you will use the estimated TDOAs to locate KITT within the field.\n",
|
||
"3. _Deployment_: In this section, you will adapt the code to work on the real car."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Pre-recorded Data\n",
|
||
"\n",
|
||
"To get you started, we provide you with 7 recordings at known locations and a reference recording taken close to one of the microphones. These can be used to develop and test your algorithms. The recordings are at locations randomly distributed across the field, as follows:\n",
|
||
"\n",
|
||
"| Recording Index | x [cm] | y [cm] |\n",
|
||
"|-----------------|------------|------------|\n",
|
||
"| 0 | 64 | 40 |\n",
|
||
"| 1 | 82 | 399 |\n",
|
||
"| 2 | 109 | 76 |\n",
|
||
"| 3 | 143 | 296 |\n",
|
||
"| 4 | 150 | 185 |\n",
|
||
"| 5 | 178 | 439 |\n",
|
||
"| 6 | 232 | 275 |\n",
|
||
"\n",
|
||
"*Table 1: Locations of the given recordings (in cm)*\n",
|
||
"\n",
|
||
"The x and y axes of the field are defined as follows, where the numbers refer to the microphone index:\n",
|
||
"\n",
|
||
"<img src=\"pictures/axisdef.png\" alt=\"Field axis and microphone index definition\" width=\"250px\">\n",
|
||
"\n",
|
||
"*Figure: Field axis and microphone index definition*\n",
|
||
"\n",
|
||
"You can assume these positions for the microphones (note the different height of microphone 5, and note that these recordings from 2023 did not use the same locations of the microphones on your field this year, so ensure that your code can easily use a table with other locations):\n",
|
||
"\n",
|
||
"| Microphone | x [cm] | y [cm] | z [cm] |\n",
|
||
"|------------|------------|---------|---------|\n",
|
||
"| 1 | 0 | 0 | 50 |\n",
|
||
"| 2 | 0 | 460 | 50 |\n",
|
||
"| 3 | 460 | 460 | 50 |\n",
|
||
"| 4 | 460 | 0 | 50 |\n",
|
||
"| 5 | 0 | 230 | 80 |\n",
|
||
"\n",
|
||
"*Table 2: Locations of the microphones (in cm)*\n",
|
||
"\n",
|
||
"### Loading and Plotting the Provided Recordings\n",
|
||
"\n",
|
||
"The code below helps you load and plot the first of the 7 audio signals provided. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Coordinates of the recordings\n",
|
||
"record_x = [64, 82, 109, 143, 150, 178, 232]\n",
|
||
"record_y = [40, 399, 76, 296, 185, 439, 275]\n",
|
||
"\n",
|
||
"# List to store filenames\n",
|
||
"filenames = []\n",
|
||
"\n",
|
||
"# Generate filenames based on coordinates\n",
|
||
"for i in range(len(record_x)):\n",
|
||
" real_x = record_x[i]\n",
|
||
" real_y = record_y[i]\n",
|
||
" filenames.append(f\"Files/Student Recordings/record_x{real_x}_y{real_y}.wav\")\n",
|
||
"\n",
|
||
"# Load the first recording\n",
|
||
"Fs, recording = wavfile.read(filenames[0])\n",
|
||
"\n",
|
||
"# Plot the first channel of the first recording\n",
|
||
"plt.plot(recording[:, 0])\n",
|
||
"plt.title(f\"Recording at x={record_x[0]} cm, y={record_y[0]} cm, Microphone 1\")\n",
|
||
"plt.xlabel(\"Sample Index\")\n",
|
||
"plt.ylabel(\"Amplitude\")\n",
|
||
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Play around with the recordings and try to understand the data you are working with. Try the following:\n",
|
||
"- Load and plot another recording.\n",
|
||
"- Zoom in on one of the pulses.\n",
|
||
"- Of one pulse, overlay the different microphones. Can you spot the time differences? Can you determine in which corner of the field the recording was made?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Plot the second recording\n",
|
||
"Fs, recording = wavfile.read(filenames[1])\n",
|
||
"\n",
|
||
"plt.plot(recording[:, 0])\n",
|
||
"plt.title(f\"Recording at x={record_x[0]} cm, y={record_y[0]} cm, Microphone 2\")\n",
|
||
"plt.xlabel(\"Sample Index\")\n",
|
||
"plt.ylabel(\"Amplitude\")\n",
|
||
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Zoom in on one of the pulses\n",
|
||
"plt.plot(recording[202000:209000, 0])\n",
|
||
"plt.title(\"Zoomed in on a pulse\")\n",
|
||
"plt.xlabel(\"Sample Index\")\n",
|
||
"plt.ylabel(\"Amplitude\")\n",
|
||
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Of one pulse, overlay the different microphones.\n",
|
||
"plt.plot(recording[202000:209000, 0], alpha=0.5)\n",
|
||
"plt.plot(recording[202000:209000, 1], alpha=0.5)\n",
|
||
"plt.title(\"Zoomed in on a pulse, first 2 microphones\")\n",
|
||
"plt.xlabel(\"Sample Index\")\n",
|
||
"plt.ylabel(\"Amplitude\")\n",
|
||
"plt.legend([\"Microphone 1\", \"Microphone 2\"])\n",
|
||
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 74,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"### Student Version ###\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"*Hints:*\n",
|
||
"- You can select the recording by changing the index in: ```Fs, recording = wavfile.read(filenames[index])```.\n",
|
||
"- You can select the microphones by changing the index in: ```recording[:, mic]```.\n",
|
||
"- These recordings are sampled at 44.1 kHz."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Background Knowledge\n",
|
||
"\n",
|
||
"In the first week courselab Assignments, you developed algorithms for channel estimation using signals received at two microphones. In this module, we extend this to 5 microphones and use this to locate the car.\n",
|
||
"\n",
|
||
"### Channel Estimation\n",
|
||
"\n",
|
||
"The channel estimation problem is as follows:\n",
|
||
"\n",
|
||
"Suppose we transmit a known signal \\( x[n] \\) over a communication channel and measure the received signal \\( y[n] \\). The channel acts as a filter, which we will assume to be linear and time-invariant (LTI). Therefore, the received signal is a convolution of the transmitted signal by the channel impulse response \\( h[n] \\):\n",
|
||
"\n",
|
||
"$$\n",
|
||
"y[n] = x[n] * h[n]\n",
|
||
"$$\n",
|
||
"\n",
|
||
"Knowing the transmitted signal $x[n]$, can we recover the impulse response of the communication channel $h[n]$ from $y[n]$? This is essentially an inversion problem.\n",
|
||
"\n",
|
||
"#### Channel Estimation Algorithms\n",
|
||
"\n",
|
||
"There are several methods for channel estimation:\n",
|
||
"\n",
|
||
"1. **Deconvolution in Time Domain (ch1):** Involves matrix inversion. It is computationally complex and requires lots of memory.\n",
|
||
"\n",
|
||
"2. **Matched Filter (ch2):** Avoids matrix inversion by computing the cross-correlation of $y[n]$ with $x[n]$. The channel estimate is given by:\n",
|
||
"\n",
|
||
"$$\n",
|
||
"\\hat{h}[n] = \\frac{1}{\\alpha} \\, y[n] * x[-n]\n",
|
||
"$$\n",
|
||
"\n",
|
||
"This method depends heavily on having a reference signal with good autocorrelation properties: $r[n] = x[n] * x[-n]$ should resemble a delta-spike. The normalization constant $\\alpha$ can be set to $\\alpha = r[0] = \\|x\\|^2$.\n",
|
||
"\n",
|
||
"3. **Deconvolution in Frequency Domain (ch3):** Involves using the Fast Fourier Transform (FFT). Convolution in time becomes multiplication in frequency, so deconvolution becomes division in the frequency domain:\n",
|
||
"\n",
|
||
"$$\n",
|
||
"\\hat{H}[k] = \\frac{Y[k]}{X[k]}\n",
|
||
"$$\n",
|
||
"\n",
|
||
"Then, the channel impulse response $ \\hat{h}[n] $ is obtained by taking the inverse FFT of $ \\hat{H}[k] $.\n",
|
||
"\n",
|
||
"This method is efficient but requires handling divisions by zero or very small values, which can be achieved by thresholding. Apart from edge effects, the result should be identical to that of **ch1**.\n",
|
||
"\n",
|
||
"**Note:** We recommend using method **ch3** (Deconvolution in Frequency Domain) for this module.\n",
|
||
"\n",
|
||
"### Time Difference of Arrival (TDOA)\n",
|
||
"\n",
|
||
"After estimating the channel impulse response for each microphone, we can detect the first incoming path (the direct path from KITT to the microphone). This corresponds to the propagation delay of the car beacon to each microphone. However, since we do not know the exact transmission time, we can only obtain relative propagation delays. By taking the differences of these delays between pairs of microphones, we eliminate the unknown transmission time, resulting in Time Difference of Arrival (TDOA) measurements.\n",
|
||
"\n",
|
||
"These TDOA measurements can be used to estimate the position of KITT in the field.\n"
|
||
]
|
||
},
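{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an illustration of the matched-filter estimate (**ch2**) above, a minimal sketch could look as follows. It assumes `x` and `y` are 1D NumPy arrays containing one pulse of the reference signal and of the received signal, respectively (this is only a sketch of the formula, not the required implementation):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def channel2(x, y):\n",
"    # Matched filter: h[n] = (1/alpha) * (y[n] * x[-n]), with alpha = r[0] = ||x||^2\n",
"    alpha = np.sum(x.astype(float) ** 2)\n",
"    h_est = np.convolve(y, x[::-1]) / alpha  # cross-correlation via convolution with time-reversed x\n",
"    # Note: the output is shifted by len(x) - 1 samples relative to lag zero\n",
"    return h_est\n",
"```"
]
},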
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Step 1: Implementing Channel Estimation\n",
|
||
"\n",
|
||
"Using the provided reference signal and your channel estimation algorithm (ch3: deconvolution in the frequency domain), deconvolve the recordings to obtain the channel impulse response for each microphone.\n",
|
||
"\n",
|
||
"#### Steps:\n",
|
||
"\n",
|
||
"1. **Load and Normalize the Recorded and Reference Signal:**\n",
|
||
"\n",
|
||
" - The reference signal is provided as a WAV file recorded close to one of the microphones.\n",
|
||
" - Normalize the signals to have a maximum amplitude of 1. This is important because the amplitude of the received signals may vary depending on the microphone.\n",
|
||
"\n",
|
||
"2. **Segment the Signals:**\n",
|
||
"\n",
|
||
" - Manually extract a single pulse from the reference signal and the recordings. (Later, you will have to automate the segmentation of the recordings.)\n",
|
||
"\n",
|
||
"3. **Implement the Channel Estimation Algorithm:**\n",
|
||
"\n",
|
||
" - Use ch3 to estimate the channel impulse response of each of the microphone signals.\n",
|
||
" - Be sure to handle divisions by zero or very small values in the frequency domain by applying a threshold.\n",
|
||
"\n",
|
||
"**Notes and Hints:**\n",
|
||
"\n",
|
||
"- **Normalization:** You can normalize a signal by dividing it by its maximum absolute value:\n",
|
||
"\n",
|
||
" ```python\n",
|
||
" ref_signal = ref_signal / np.max(np.abs(ref_signal))\n",
|
||
" ```\n",
|
||
"\n",
|
||
"- **FFT Length:** When performing convolution or deconvolution via FFT, ensure that the FFT length is sufficient to avoid \"circular convolution\" effects (you will learn about this in EE3S1). Calculate `N = len(ref_pulse) + len(rec_pulse) - 1`, and zero-pad both ref_pulse and rec_pulse to length $N$ before doing the FFT. This ensures that circular convolution becomes the familiar linear convolution.\n",
|
||
"\n",
|
||
"- **Thresholding:** Adjust the threshold value based on the magnitude of the FFT of the reference signal. Experiment with different threshold values to find one that works well, starting with a small value (e.g., 1e-3)."
|
||
]
|
||
},
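{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the hints above concrete, here is one possible sketch of the **ch3** estimator; it is not the only valid implementation. It assumes `ref_pulse` and `rec_pulse` are 1D NumPy arrays each containing a single pulse, and `eps` is the (relative) threshold that you still need to tune:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def channel3(ref_pulse, rec_pulse, eps=1e-3):\n",
"    # Frequency-domain deconvolution with thresholding of small |X[k]|\n",
"    N = len(ref_pulse) + len(rec_pulse) - 1     # FFT length that avoids circular-convolution effects\n",
"    X = np.fft.fft(ref_pulse, n=N)              # zero-padded FFT of the reference pulse\n",
"    Y = np.fft.fft(rec_pulse, n=N)              # zero-padded FFT of the recorded pulse\n",
"    H = np.zeros(N, dtype=complex)\n",
"    mask = np.abs(X) > eps * np.max(np.abs(X))  # keep only bins where X[k] is not too small\n",
"    H[mask] = Y[mask] / X[mask]                 # H[k] = Y[k] / X[k]\n",
"    return np.real(np.fft.ifft(H))              # estimated channel impulse response\n",
"```"
]
},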
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"### Student Version ###\n",
|
||
"\n",
|
||
"# Load and normalize the reference signal\n",
|
||
"Fs_ref, ref_signal = wavfile.read(\"../files/Student Recordings/reference.wav\")\n",
|
||
"ref_signal = ref_signal[221000:222500, 0] # Use only one channel\n",
|
||
"# TODO: Normalize the reference signal\n",
|
||
"\n",
|
||
"# Plot the reference signal\n",
|
||
"plt.figure()\n",
|
||
"plt.plot(ref_signal)\n",
|
||
"plt.title(\"Reference Signal\")\n",
|
||
"plt.xlabel(\"Sample Index\")\n",
|
||
"plt.ylabel(\"Amplitude\")\n",
|
||
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"### Student Version ###\n",
|
||
"\n",
|
||
"# TODO: Load a recorded signal\n",
|
||
"# TODO: Select one channel\n",
|
||
"# TODO: Normalize the recording\n",
|
||
"\n",
|
||
"# Plot the recorded signal\n",
|
||
"plt.figure()\n",
|
||
"plt.plot(recording)\n",
|
||
"plt.title(\"Recorded Signal\")\n",
|
||
"plt.xlabel(\"Sample Index\")\n",
|
||
"plt.ylabel(\"Amplitude\")\n",
|
||
"plt.show()\n",
|
||
"\n",
|
||
"# TODO: Segment the recording to extract one pulse\n",
|
||
"\n",
|
||
"# Plot the segmented microphone signal\n",
|
||
"plt.figure()\n",
|
||
"plt.plot(recording_pulse)\n",
|
||
"plt.title(\"Segmented Signal\")\n",
|
||
"plt.xlabel(\"Sample Index\")\n",
|
||
"plt.ylabel(\"Amplitude\")\n",
|
||
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"### Student Version ###\n",
|
||
"\n",
|
||
"# TODO: Implement the channel estimation algorithm (deconvolution in frequency domain)\n",
|
||
"def channel3(ref_pulse, rec_pulse):\n",
|
||
" pass\n",
|
||
"\n",
|
||
"# Estimate the channel impulse response\n",
|
||
"\n",
|
||
"h_est = channel3(ref_signal, recording_pulse)\n",
|
||
"\n",
|
||
"# Plot the estimated channel impulse response\n",
|
||
"plt.figure()\n",
|
||
"plt.plot(h_est)\n",
|
||
"plt.title(\"Estimated Channel Impulse Response\")\n",
|
||
"plt.xlabel(\"Sample Index\")\n",
|
||
"plt.ylabel(\"Amplitude\")\n",
|
||
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Inspect your plots and determine if the impulse response looks realistic (properly zoom in). Do you see a clear peak, at the beginning of the response? "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Step 2: Determining Time of Arrivals\n",
|
||
"The goal is to get the time differences of arrival (TDOA) between each pair of microphones. As an intermediate step, we will determine the time of arrivals (TOAs) for each microphone, relative to the reference signal. From the peaks in the channel impulse responses, determine the TOAs and store these in a table.\n",
|
||
"\n",
|
||
"#### Steps:\n",
|
||
"\n",
|
||
"1. **Identify the Peaks in the Channel Impulse Responses:**\n",
|
||
"\n",
|
||
" - The first significant peak corresponds to the direct path from KITT to the microphone.\n",
|
||
"\n",
|
||
"2. **Calculate the TOAs:**\n",
|
||
"\n",
|
||
" - Convert the sample indices to time values using the sampling frequency.\n",
|
||
"\n",
|
||
"3. **Store the TOAs:**\n",
|
||
"\n",
|
||
" - Collect the TOAs for each microphone and store them in a structured format (e.g., a dictionary or table).\n",
|
||
"\n",
|
||
"**Notes and Hints:**\n",
|
||
"\n",
|
||
"- The index of the maximum value in the channel impulse response can be found using:\n",
|
||
"\n",
|
||
" ```python\n",
|
||
" peak_index = np.argmax(np.abs(h_est))\n",
|
||
" ```\n",
|
||
"\n",
|
||
"- Remember that the absolute time of transmission is unknown, so the TOAs are relative to the arbitrary start of your recording.\n",
|
||
"\n",
|
||
"- If you did not tightly crop the start of your reference signal, a TOA could be negative (the microphone signal appears to arrive earlier than the reference signal); in this case, due to the periodic nature of the IFFT, the peak of the impulse response could be at the far end of the estimated $\\hat{h}[n]$. If in your `ch3` code you chopped the estimate to a shorter length $L$, you might have chopped off this peak and see only noise."
|
||
]
|
||
},
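{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the conversion from peak index to a (relative) TOA is shown below. It assumes `h_est` is the channel estimate of one pulse, `Fs` the sampling frequency, and `pulse_start` the sample index at which you cut that pulse out of the full recording; `pulse_start`, `toas`, and `mic` are placeholder names for your own variables:\n",
"\n",
"```python\n",
"peak_index = np.argmax(np.abs(h_est))   # strongest peak of the impulse response\n",
"toa = (pulse_start + peak_index) / Fs   # sample index converted to seconds on the recording's time axis\n",
"toas[mic] = toa                         # store per microphone, e.g. in a dictionary keyed by mic index\n",
"```"
]
},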
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# TODO: Identify the peak in the channel impulse response\n",
|
||
"\n",
|
||
"# TODO: Calculate the TOA (in seconds)\n",
|
||
"\n",
|
||
"# TODO:Store the TOA for the microphone\n",
|
||
"\n",
|
||
"# TODO: Print the TOA"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Once you have working code for a single microphone, apply your processing to all 5 microphones. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Step 3: Pulse Segmentation\n",
|
||
"\n",
|
||
"Uptil now, you have manually selected a single pulse from the recordings. In practice, you will need to automate this process. Implement a pulse detection algorithm to automatically segment the pulses from the recordings. We leave this part open to you, but we provide some hints below. (You could postpone this step until after implementing step 4.)\n",
|
||
"\n",
|
||
"**Notes and Hints:**\n",
|
||
"\n",
|
||
"- **Pulse Detection:**\n",
|
||
"\n",
|
||
" - `find_peaks` from `scipy.signal` can be used to detect peaks in the envelope of the signal. It has many options to help you define what is a useful peak.\n",
|
||
" - Adjust the `height` and `distance` parameters based on the characteristics of your signal.\n",
|
||
"\n",
|
||
"- **Outlier Rejection:**\n",
|
||
"\n",
|
||
" - You may need to reject some detected pulses that give a very different TOA compared to the other segments.\n",
|
||
"\n",
|
||
"- **Single Pulse Importance:**\n",
|
||
"\n",
|
||
" - You may be tempted to compute the channel response for the whole recording. However, the computational complexity of this approach is high. Instead, focus on a single pulse. Also, you might be tempted to produce more accurate results by averaging over several pulses. But remember that this is not really useful in your final application, if you want to do localization on a driving car (in that case, you want to use only the last, most recent pulse)."
|
||
]
|
||
},
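{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible way to fill in `detect_pulses` is sketched below, assuming the pulses are clearly stronger than the background noise. The `height` and `distance` values are placeholders that you will have to tune to your own recordings, and the returned indices are peak locations rather than exact pulse starts, so you may want to step back by a fixed margin:\n",
"\n",
"```python\n",
"import numpy as np\n",
"from scipy.signal import find_peaks\n",
"\n",
"def detect_pulses(signal, num_pulses):\n",
"    envelope = np.abs(signal)                        # crude envelope of the recording\n",
"    peaks, _ = find_peaks(\n",
"        envelope,\n",
"        height=0.5 * np.max(envelope),               # placeholder: keep only strong peaks\n",
"        distance=len(signal) // (2 * num_pulses),    # placeholder: enforce spacing between pulses\n",
"    )\n",
"    return peaks[:num_pulses]\n",
"```"
]
},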
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"### Student Version ###\n",
|
||
"\n",
|
||
"# TODO: Develop a function to detect the start indices of pulses in the recording\n",
|
||
"# Takes signal and number of pulses in the recordings as input and returns a list of start indices\n",
|
||
"def detect_pulses(signal, num_pulses):\n",
|
||
" return peaks\n",
|
||
"\n",
|
||
"# TODO: Load the recording\n",
|
||
"\n",
|
||
"# Detect pulses in the recorded signal\n",
|
||
"pulse_starts = detect_pulses(recording_channel, num_pulses)\n",
|
||
"\n",
|
||
"plt.plot(recording_channel)\n",
|
||
"plt.plot(pulse_starts, recording_channel[pulse_starts], 'x')\n",
|
||
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"### Student Version ###\n",
|
||
"\n",
|
||
"# TODO: Develop a function that takes in a recording and returns the TOA for each microphone\n",
|
||
"def estimate_TOAs(recording, num_pulses):\n",
|
||
" TOAs = {}\n",
|
||
" return TOAs\n",
|
||
"\n",
|
||
"Fs, recording = wavfile.read(filenames[1])\n",
|
||
"print(estimate_TOAs(recording, 5))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Step 4: Calculating TDOA Between Microphone Pairs\n",
|
||
"\n",
|
||
"Complete the TDOA function to calculate the time difference of arrival between pairs of microphones.\n",
|
||
"\n",
|
||
"#### Steps:\n",
|
||
"\n",
|
||
"1. **Calculate TDOA Between Two Microphones:**\n",
|
||
"\n",
|
||
" - For each pair of microphones, calculate the difference in TOAs.\n",
|
||
"\n",
|
||
"2. **Convert TDOA to Distance Difference:**\n",
|
||
"\n",
|
||
" - Use the speed of sound (approximately 343 m/s) to convert TDOA to distance difference.\n",
|
||
"\n",
|
||
"3. **Store and Use TDOA Values:**\n",
|
||
"\n",
|
||
" - Collect TDOA values for all relevant microphone pairs."
|
||
]
|
||
},
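{
"cell_type": "markdown",
"metadata": {},
"source": [
"The TDOA itself is simply the difference between two (relative) TOAs; multiplying by the speed of sound gives the corresponding distance difference. A minimal sketch, assuming `TOAs` is the dictionary returned by your `estimate_TOAs` function with microphone indices as keys:\n",
"\n",
"```python\n",
"SPEED_OF_SOUND = 343.0  # m/s, approximately\n",
"\n",
"def calculate_tdoa(TOA1, TOA2):\n",
"    return TOA2 - TOA1                          # time difference of arrival in seconds\n",
"\n",
"tdoa_12 = calculate_tdoa(TOAs[1], TOAs[2])      # TDOA between microphones 1 and 2\n",
"distance_diff_12 = tdoa_12 * SPEED_OF_SOUND     # corresponding distance difference in meters\n",
"```"
]
},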
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"### Student Version ###\n",
|
||
"\n",
|
||
"# TODO: Implement a function to calculate TDOA between two microphones\n",
|
||
"def calculate_tdoa(TOA1, TOA2):\n",
|
||
" pass\n",
|
||
" return tdoa\n",
|
||
"\n",
|
||
"# Load the TOAs for each microphone\n",
|
||
"Fs, recording = wavfile.read(filenames[1])\n",
|
||
"TOAs = estimate_TOAs(recording, 5)\n",
|
||
"\n",
|
||
"# TODO: Calculate TDOA between Mic1 and other microphones\n",
|
||
"\n",
|
||
"# Print the results\n",
|
||
"print(f\"TDOA between Mic1 and Mic2: {tdoa_12} seconds\")\n",
|
||
"\n",
|
||
"print(f\"TDOA between Mic1 and Mic3: {tdoa_13} seconds\")\n",
|
||
"\n",
|
||
"print(f\"TDOA between Mic1 and Mic4: {tdoa_14} seconds\")\n",
|
||
"\n",
|
||
"print(f\"TDOA between Mic1 and Mic5: {tdoa_15} seconds\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Sanity Check\n",
|
||
"The accuracy of the TDOA measurements is crucial for the localization accuracy. You can test the accuracy of your TDOA estimation by calculating the TDOA based on the known recording positions and comparing it with the estimated TDOA.\n",
|
||
"\n",
|
||
"Develop a test function using Pythagoras’ theorem that takes an $(x, y)$ position as input and calculates the theoretical TDOAs you would observe from this position for all microphone pairs. This involves calculating the distance from the given point to each microphone.\n",
|
||
"\n",
|
||
"This will help you:\n",
|
||
"\n",
|
||
"- Debug your TDOA function.\n",
|
||
"- Verify your position estimation in the next steps."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Microphone positions (x, y, z) in meters\n",
|
||
"microphone_positions = {\n",
|
||
" 1: np.array([0.0, 0.0, 0.50]),\n",
|
||
" 2: np.array([0.0, 4.60, 0.50]),\n",
|
||
" 3: np.array([4.60, 4.60, 0.50]),\n",
|
||
" 4: np.array([4.60, 0.0, 0.50]),\n",
|
||
" 5: np.array([0.0, 2.30, 0.80])\n",
|
||
"}\n",
|
||
"\n",
|
||
"# Function to calculate theoretical TDOAs from a given (x, y) position\n",
|
||
"def calculate_theoretical_tdoas(car_position, mic_positions):\n",
|
||
" # Car's position (x, y, z)\n",
|
||
" x_car, y_car = car_position\n",
|
||
" z_car = 0.0 # Assuming car's beacon is at z = 0\n",
|
||
"\n",
|
||
" # Calculate distances and TOAs to each microphone\n",
|
||
" toas = {}\n",
|
||
" for mic_id, mic_pos in mic_positions.items():\n",
|
||
" distance = np.sqrt((mic_pos[0] - x_car)**2 + (mic_pos[1] - y_car)**2 + (mic_pos[2] - z_car)**2)\n",
|
||
" toas[mic_id] = distance\n",
|
||
"\n",
|
||
" # Calculate TDOAs between microphone pairs\n",
|
||
" tdoa_dict = {}\n",
|
||
" mic_ids = list(mic_positions.keys())\n",
|
||
" for i in range(len(mic_ids)):\n",
|
||
" for j in range(i+1, len(mic_ids)):\n",
|
||
" mic_i = mic_ids[i]\n",
|
||
" mic_j = mic_ids[j]\n",
|
||
" tdoa = toas[mic_j] - toas[mic_i]\n",
|
||
" tdoa_dict[(mic_i, mic_j)] = tdoa\n",
|
||
"\n",
|
||
" return tdoa_dict\n",
|
||
"\n",
|
||
"# Example usage:\n",
|
||
"# Car position in meters\n",
|
||
"car_position = np.array([1.43, 2.96]) # Replace with desired (x, y) position\n",
|
||
"\n",
|
||
"# Calculate theoretical TDOAs\n",
|
||
"theoretical_tdoas = calculate_theoretical_tdoas(car_position, microphone_positions)\n",
|
||
"\n",
|
||
"# Print the TDOA values\n",
|
||
"for mic_pair, tdoa in theoretical_tdoas.items():\n",
|
||
" print(f\"TDOA between Mic{mic_pair[0]} and Mic{mic_pair[1]}: {tdoa:.6f} seconds\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Localization Using TDOA Information\n",
|
||
"\n",
|
||
"Now we arrive at the main question studied in this module: **How can we locate the car using the TDOA estimates?** With 5 microphones, we can compute the TDOAs between all pairs of microphones and obtain 10 TDOA measurements. Next, you need an algorithm to convert these TDOA measurements into the $(x, y)$ location of the car.\n",
|
||
"\n",
|
||
"We offer two approaches for this:\n",
|
||
"\n",
|
||
"1. **Linear Algebra Approach:** Study [Appendix C](../appendix/Appendix_C.ipynb), which shows a basic algorithm to solve for $(x, y)$ using linear algebra. This algorithm is sub-optimal but should be relatively fast, and is a nice illustration of the use of linear algebra (for those who appreciate this).\n",
|
||
"\n",
|
||
"2. **Grid Search Method:** Perform a grid search over possible $(x, y)$ positions and evaluate which position best matches the estimated TDOAs. This method can be accurate but may be slower depending on implementation. Consider e.g. iterative grid refinement.\n",
|
||
"\n",
|
||
"Implement one of these methods inside the coordintate_2d function.\n",
|
||
"\n",
|
||
"Although we compute the $(x, y)$ position, in reality, the world is 3D, and the microphones have a certain height above the height of the audio beacon (we define the beacon to sit at $z=0$). The algorithm in Appendix C does not take this height into account. However, this extension is straightforward and you should add it."
|
||
]
|
||
},
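{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you choose the grid search method, a coarse sketch of the idea is shown below. It reuses the `calculate_theoretical_tdoas` helper from the sanity check above and assumes `measured_tdoas` is a dictionary with the same microphone-pair keys and values in seconds; it only illustrates the approach (no iterative refinement yet) and is not a finished implementation:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def grid_search_position(measured_tdoas, mic_positions, step=0.10, size=4.60):\n",
"    # Brute-force search over candidate (x, y) positions on the field (in meters)\n",
"    best_pos, best_err = None, np.inf\n",
"    for x in np.arange(0.0, size + step, step):\n",
"        for y in np.arange(0.0, size + step, step):\n",
"            theo = calculate_theoretical_tdoas((x, y), mic_positions)\n",
"            # Sum of squared differences between measured and theoretical TDOAs\n",
"            err = sum((measured_tdoas[pair] - theo[pair]) ** 2 for pair in theo)\n",
"            if err < best_err:\n",
"                best_pos, best_err = (x, y), err\n",
"    return best_pos, best_err  # best_err can also serve as a reliability indicator\n",
"```"
]
},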
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 17,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"### Student Version ###\n",
|
||
"# Implement a function to estimate the 2D coordinates of the car\n",
|
||
"def coordinate_2d(D12, D13, D14, D15):\n",
|
||
" xyMic = np.array([[0, 0, 50], [0, 460, 50], [460, 460, 50], [460, 0, 50], [0, 230, 80]])\n",
|
||
" return x, y"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Localization Class\n",
|
||
"\n",
|
||
"Finally, it is time to set up a processing pipeline by putting everything together. You will develop a `Localization` class that takes a 5-channel recording as input and returns the $(x, y)$ coordinates of the car.\n",
|
||
"\n",
|
||
"### Suggested Class Structure\n",
|
||
"\n",
|
||
"**Notes:**\n",
|
||
"\n",
|
||
"- **Modularity:** Separates different stages of processing for clarity and reusability.\n",
|
||
"- **Extensibility:** You can add more methods or attributes as needed.\n",
|
||
"- **Implementation:** You need to implement the `estimate_tdoas` method based on your earlier code."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"### Student Version ###\n",
|
||
"\n",
|
||
"import numpy as np\n",
|
||
"\n",
|
||
"\n",
|
||
"class Localization:\n",
|
||
" \"\"\"\n",
|
||
" The Localization class processes multi-microphone audio recordings to estimate the position of a sound source (e.g., KITT) using Time Difference of Arrival (TDOA) measurements.\n",
|
||
"\n",
|
||
" Attributes:\n",
|
||
" - recording (numpy.ndarray): A 2D array containing the recordings from multiple microphones. Shape is (num_samples, num_mics).\n",
|
||
" - refSignal (numpy.ndarray): The reference signal used for channel estimation.\n",
|
||
" - num_pulses (int): The number of pulses present in the recording.\n",
|
||
" - Fs (int): The sampling frequency of the recordings.\n",
|
||
" - mic_positions (numpy.ndarray): Array containing the positions of the microphones in centimeters.\n",
|
||
" - localizations (tuple): Estimated (x, y) position of the sound source.\n",
|
||
" \"\"\"\n",
|
||
" def __init__(self, recording, refSignal, num_pulses, Fs):\n",
|
||
" \"\"\"\n",
|
||
" Initialize the Localization object with the given recordings and parameters.\n",
|
||
"\n",
|
||
" Parameters:\n",
|
||
" - recording (numpy.ndarray): A 2D numpy array of shape (num_samples, num_mics) containing the recordings from multiple microphones. Each column corresponds to a microphone.\n",
|
||
" - refSignal (numpy.ndarray): A 1D numpy array containing the reference signal used for channel estimation.\n",
|
||
" - num_pulses (int): The number of pulses in the recording.\n",
|
||
" - Fs (int): The sampling frequency of the recordings in Hz.\n",
|
||
" \"\"\"\n",
|
||
"\n",
|
||
" def localization(self):\n",
|
||
" \"\"\"\n",
|
||
" Process the recordings to estimate the 2D position of the sound source.\n",
|
||
"\n",
|
||
" This method performs the following steps:\n",
|
||
" - Segments the recordings into individual pulses based on the number of pulses.\n",
|
||
" - For each pulse:\n",
|
||
" - Extracts the pulse segment from the recording.\n",
|
||
" - Calculates the Time Difference of Arrival (TDOA) between microphone pairs using the extracted pulse.\n",
|
||
" - Averages the TDOA measurements across all pulses.\n",
|
||
" - Converts the averaged TDOAs to distance differences.\n",
|
||
" - Estimates the 2D coordinates of the sound source using the TDOA measurements.\n",
|
||
"\n",
|
||
" Returns:\n",
|
||
" - x_car (float): Estimated x-coordinate of the sound source in centimeters.\n",
|
||
" - y_car (float): Estimated y-coordinate of the sound source in centimeters.\n",
|
||
" \"\"\"\n",
|
||
" \n",
|
||
" return x_car, y_car\n",
|
||
"\n",
|
||
" def coordinate_2d(self, D12, D13, D14):\n",
|
||
" \"\"\"\n",
|
||
" Estimate the 2D coordinates of the sound source based on TDOA measurements.\n",
|
||
"\n",
|
||
" This method constructs a system of linear equations using the TDOA measurements and microphone positions and solves for the (x, y) coordinates of the sound source.\n",
|
||
"\n",
|
||
" Parameters:\n",
|
||
" - D12 (float): Distance difference between Microphone 1 and Microphone 2 in centimeters.\n",
|
||
" - D13 (float): Distance difference between Microphone 1 and Microphone 3 in centimeters.\n",
|
||
" - D14 (float): Distance difference between Microphone 1 and Microphone 4 in centimeters.\n",
|
||
"\n",
|
||
" Returns:\n",
|
||
" - position (numpy.ndarray): A 1D array containing the estimated x and y coordinates [x, y] in centimeters.\n",
|
||
" \"\"\"\n",
|
||
"\n",
|
||
" return x, y # Return the estimated (x, y) coordinates\n",
|
||
" \n",
|
||
" def previously_implemented_methods(self):\n",
|
||
" # Copy paste the previously implemented methods\n",
|
||
" pass\n",
|
||
"\n",
|
||
"if __name__ == \"__main__\":\n",
|
||
" # TODO: Load the recording\n",
|
||
"\n",
|
||
" # TODO: Load the reference signal\n",
|
||
"\n",
|
||
" # TODO: Initialize the Localization object\n",
|
||
" \n",
|
||
" # TODO: Get the localized position\n",
|
||
"\n",
|
||
" print(f\"Estimated position: x={x_car}, y={y_car}\")\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Optional Extensions\n",
|
||
"\n",
|
||
"If you finish the basic assignment quickly and want to challenge yourself further, try adding additional functionality to your program. For example:\n",
|
||
"\n",
|
||
"### 1. Accounting for Height Differences\n",
|
||
"\n",
|
||
"The current set of linear equations does not consider the height difference between the microphones and the car. This leads to slight inaccuracies, especially when the car is close to a microphone.\n",
|
||
"\n",
|
||
"**Extension Task:**\n",
|
||
"\n",
|
||
"- Augment the equations to include the $z$ variable (height of the car).\n",
|
||
"- Use the known value of $z_{\\text{car}} = 0$ to eliminate the variable or adjust the equations accordingly.\n",
|
||
"\n",
|
||
"### 2. Implementing Advanced Localization Algorithms\n",
|
||
"\n",
|
||
"The provided method is simple but may be unreliable in certain situations (e.g., when the distances to two microphones are equal).\n",
|
||
"\n",
|
||
"**Extension Task:**\n",
|
||
"\n",
|
||
"- Research and implement more advanced algorithms from literature, such as:\n",
|
||
"\n",
|
||
" - Stephen Bancroft, “An algebraic solution of the GPS equations”, IEEE Transactions on Aerospace and Electronic Systems, vol.21, no.7, pp.56-59, January 1985.\n",
|
||
" - Amir Beck, Petre Stoica, and Jian Li, “Exact and Approximate Solutions of Source Localization Problems”, IEEE Transactions on Signal Processing, vol.56, no.5, pp. 1770-1778, May 2008.\n",
|
||
"\n",
|
||
"### 3. Grid Search Method\n",
|
||
"\n",
|
||
"Instead of the linear algebra approach, implement a grid search where the room is partitioned into a dense grid of possible positions, and each location is evaluated against the TDOA data to find the best fit.\n",
|
||
"\n",
|
||
"**Extension Task:**\n",
|
||
"\n",
|
||
"- Perform a coarse grid search with larger steps (e.g., 10 cm).\n",
|
||
"- Refine the search in regions with the best matches.\n",
|
||
"\n",
|
||
"**Hints:**\n",
|
||
"\n",
|
||
"- Calculate the theoretical TDOAs for each grid point.\n",
|
||
"- Compute an error metric (e.g., sum of squared differences) between theoretical and measured TDOAs.\n",
|
||
"- Select the grid point with the minimum error as the estimated position.\n",
|
||
"\n",
|
||
"### 4. Robustness\n",
|
||
"\n",
|
||
"Along with the estimated $(x,y)$ location, your code could also return a parameter that indicates the reliability of that position. For this you could use the equation error (of the estimated TDOA vs the theoretical TDOA expected for this $(x,y)$), which is basically the same as the value of the cost function in the grid search. Subsequent modules that use your $(x,y)$ estimate could then judge whether to use or reject this position."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Assessment and Reporting\n",
|
||
"\n",
|
||
"### Mid-term Assessment and Report\n",
|
||
"\n",
|
||
"Ultimately in Week 4, you will showcase the functionality of your localization script to your assigned TA. You should demonstrate proper localization of the car on the 3 recordings with unknown coordinates.\n",
|
||
"\n",
|
||
"After you pass this assessment, you are ready to document your results in your midterm report. A detailed report is required, covering:\n",
|
||
"\n",
|
||
"- **Approach:** Describe the methods and algorithms used.\n",
|
||
"- **Implementation:** Explain how you implemented the algorithms.\n",
|
||
"- **Testing:** Show results from tests on simulated and provided test data.\n",
|
||
"- **Results:** Present localization results, error analysis, and accuracy.\n",
|
||
"- **Conclusion:** Summarize the expected accuracy and reliability of your localization method.\n",
|
||
"\n",
|
||
"Please review the guidelines in Chapter 7 for more information.\n",
|
||
"\n",
|
||
"### After the Midterm: Reference Signal and Integration\n",
|
||
"\n",
|
||
"If you have completed this module successfully, you can start integrating and estimating the car’s location from your own microphone measurements.\n",
|
||
"\n",
|
||
"#### Reference Signal Selection\n",
|
||
"\n",
|
||
"Using a good beacon signal is important, because it determines the quality of your location estimates. Since the time in IP3 is limited, we will give a few general hints, but note that the topic could be explored in much more detail.\n",
|
||
"\n",
|
||
"In your ch3 algorithm (deconvolution in frequency domain), you have seen that we omit all frequencies where the beacon signal amplitude spectrum $|X(F)|$ is small, because we divide by $X(F)$ and this would blow up the noise. For best results, we want $X(F)$ to be large for all frequencies. Therefore, ideally, we have a flat power spectrum. For random signals, this corresponds to the use of \"white noise\". \n",
|
||
"\n",
|
||
"**Beacon code** Our beacon code consists of 32 bits, each 0 or 1. White noise corresponds to a beacon code for which the autocorrelation resembles a delta spike. You could generate a bit code using 'rand', and then round to 0 or 1. A further consideration is that the code should start and end with '1', or else you are using in fact a code that is shorter than 32 bits. \n",
|
||
"\n",
|
||
"After generating a code, check its autocorrelation function, $x[n]\\ast x[-n]$. \n",
|
||
"You want a strong peak for the 0-lag of the autocorrelation but as low as possible for any other lag. You can also check its spectral properties.\n",
|
||
"*Hint:* Randomly generated codes are suitable for our purposes. You could try some optimal codes (check communication theory literature for “gold codes”), but don't waste too much time in trying to find the best code.\n",
|
||
"\n",
|
||
"**Carrier frequency and bit frequency** The carrier frequency defines the \"pitch\" of the transmitted signal. Since we use audio equipment which is optimized for human listening, it is probably better to use carriers below 10 kHz. \n",
|
||
"\n",
|
||
"The bit frequency determines the bandwidth of the transmitted signal. Since we want to probe the channel on as many different frequencies as possible, we would select a large bit frequency (but it can't be larger than the carrier frequency). The hardware implementation seems to limit the total output power. If we use a wider bandwidth, the energy per hertz is reduced. Thus, there is a trade-off. You can try to find good values experimentally, i.e. try a range of bit frequencies in steps of 1000 Hz, and see which setting gives most accurate results.\n",
|
||
"\n",
|
||
"_Note:_ Last year it appeared the implementation of the firmware on the car has a bug which prevents modifying the beacon parameters? Let us know if you experience this.\n",
|
||
"\n",
|
||
"A perfect repetition count is not yet required; but you need to make sure that the full code can be transmitted and recorded on all microphones within the same recording window.\n",
|
||
"\n",
|
||
"**Field Testing:** A theoretical signal may not work well in practice. Use your field test time well. Consider recording a bunch of different audio beacon settings and analyzing them later.\n",
|
||
"\n",
|
||
"### Integration Assignment\n",
|
||
"\n",
|
||
"#### **Make Test Recording**\n",
|
||
"\n",
|
||
"- Record the signal transmitted over KITT’s beacon with your selected parameters.\n",
|
||
"- Place the microphone close to KITT’s beacon to get a clean recording.\n",
|
||
"- Avoid clipping by adjusting recording levels.\n",
|
||
"\n",
|
||
"#### **Create Reference Signal**\n",
|
||
"\n",
|
||
"- Load the recording into Python.\n",
|
||
"- Clean it by removing zero intervals and extracting a single pulse.\n",
|
||
"- Use this as your reference signal in deconvolution.\n",
|
||
"\n",
|
||
"#### **Test Performance**\n",
|
||
"\n",
|
||
"- Place KITT at a known location (e.g., the center of the field).\n",
|
||
"- Make recordings and run your localization algorithm.\n",
|
||
"- Evaluate the accuracy.\n",
|
||
"\n",
|
||
"#### **Integrate with Recording Code**\n",
|
||
"\n",
|
||
"- Combine your localization algorithm with the recording code.\n",
|
||
"- Address any issues such as blocking recording.\n",
|
||
"- Aim to locate KITT in real-time while it is moving.\n",
|
||
"\n",
|
||
"#### **Refinements Needed for a Driving Car**\n",
|
||
"\n",
|
||
"- Add timestamp information to your location estimates.\n",
|
||
"- Segment your recording and take only the last (most recent) pulse.\n",
|
||
"- Consider the delays between recording, processing, and KITT's movement. Figure out how long ago was the most recent pulse.\n",
|
||
"- Use a high repetition rate and short recording lengths.\n",
|
||
"- Check out the difference between \"blocking\" and \"non-blocking\" recordings. Perhaps you will need multi-threading such that the localization can run in parallel with the control.\n",
|
||
"\n",
|
||
"### Deliverable (Final Report)\n",
|
||
"\n",
|
||
"Show your selected beacon parameters and comment on the resulting performance.\n",
|
||
"\n",
|
||
"- **Accuracy:** How accurate is your localization algorithm?\n",
|
||
"- **Real-time Operation:** Are you able to drive and locate KITT simultaneously in real-time?\n",
|
||
"- **Discussion:** Address any challenges faced and how you overcame them.\n",
|
||
"\n",
|
||
"**Note to Students:**\n",
|
||
"\n",
|
||
"- Remember to document your code thoroughly.\n",
|
||
"- Include explanations for your choices and any assumptions made.\n",
|
||
"- Test your algorithms extensively with both simulated and real data.\n"
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "booktestenv",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.12.4"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 2
|
||
}
|