From 19b05138098032537f2ec1071b3126e2bc8f77fc Mon Sep 17 00:00:00 2001
From: canisio
Date: Wed, 29 Apr 2026 10:03:34 -0300
Subject: [PATCH] docs: update documentation for capture redesign and validation

---
 README.md               |  27 ++++++--
 docs/pl_rx_subsystem.md | 145 ++++++++++++++++------------------------
 docs/ps_subsystem.md    | 111 ++++++++++++++++++++++--------
 3 files changed, 160 insertions(+), 123 deletions(-)

diff --git a/README.md b/README.md
index f76822c..761f4c3 100644
--- a/README.md
+++ b/README.md
@@ -11,21 +11,34 @@ The system implements a high-throughput signal chain in the FPGA (PL) and perfor
 ## Current Status
 
 - Tx subsystem: LFM pulse generator (DDS-based, complex output)
-- Rx subsystem: fully functional channelizer pipeline (PFB-based)
+- Rx subsystem: fully functional channelizer pipeline (PFB-based) or bypass
 - PL → PS interface: AXI4-Stream + DMA operational
-- PS processing: frame-based algorithm (RMS + peak detection)
+- PS processing: frame-based algorithm on a Data Process Window (DPW)
 
 ---
 
 ## System Architecture
 
-ADC → Channelizer (PFB, 512 bins)
-→ FFT_Capture (frame control)
-→ FIFO Serializer (4 FIFOs → 1 stream)
-→ AXI4-Stream (uint64)
+Tx (PL)
+→ Waveform Generator (LFM / CW / Pulsed)
+→ DAC
+→ RF Loopback / Input
+
+Rx (PL)
+→ ADC
+→ Channelizer (PFB, 512 bins) / Bypass / Counter
+→ Capture (frame control)
+→ AXI4-Stream (128-bit, 4 samples/clock)
 → DMA (S2MM)
 → PS Memory
-→ Processor Algorithm
+→ Processor Algorithm
+
+Post Processing (PS)
+→ Triggered Capture
+→ Sample Unpacking (I/Q)
+→ Data Reshaping → [FrameSize x nFrames x nTriggers]
+→ Host Communication / Processing / Visualization
+→ One DPW is a window of FrameSize x nFrames samples
 
 ---
 
diff --git a/docs/pl_rx_subsystem.md b/docs/pl_rx_subsystem.md
index 789a9e0..dd2432d 100644
--- a/docs/pl_rx_subsystem.md
+++ b/docs/pl_rx_subsystem.md
@@ -6,11 +6,9 @@
 
 ## Overview
 
-The Rx subsystem implements a **polyphase filter bank (PFB) channelizer** followed by FFT processing.
+The Rx subsystem implements a **polyphase filter bank (PFB) channelizer** followed by FFT processing, plus a **bypass path** and a **multi-frame capture pipeline**.
 
-It converts wideband ADC input into frequency-domain channels and streams the result to the PS.
-
-A **bypass path** is also available for raw data inspection and debugging.
+It converts wideband ADC input into frequency-domain channels (or raw samples via bypass) and streams the result to the PS.
 
 ---
 
@@ -18,67 +16,46 @@ A **bypass path** is also available for raw data inspection and debugging.
 
 ### Channelizer Path (default)
 
-ADC
-    ↓
-PFB Channelizer (Decimation + Filtering)
-    ↓
-FFT (512 bins)
-    ↓
-FFT Capture
-    ↓
-FIFO Serializer (4 → 1)
-    ↓
-AXI4-Stream
-    ↓
+ADC
+    ↓
+PFB Channelizer (Decimation + Filtering)
+    ↓
+FFT (512 bins)
+    ↓
+Capture (frame control)
+    ↓
+AXI4-Stream (128-bit, 4 samples/clock)
+    ↓
 DMA
 
 ---
 
 ### Bypass Path (Debug / Raw Data)
 
-ADC
-    ↓
-Bypass Path
-    ↓
-FIFO / Serializer
-    ↓
-AXI4-Stream
-    ↓
+ADC
+    ↓
+Bypass Path
+    ↓
+Capture (frame control)
+    ↓
+AXI4-Stream (128-bit, 4 samples/clock)
+    ↓
 DMA
 
 ---
 
-## Bypass Functionality
+## Capture Pipeline
 
-The bypass allows direct observation of the input signal without channelization.
-
-### Purpose
-
-- Debugging and validation
-- Access to raw ADC-domain data
-- Comparison with channelized output
-- Verification of downstream processing
-
----
+- Multi-frame acquisition (configurable nFrames)
+- Frame size: 512 samples
+- Supports asynchronous capture start (not frame-aligned)
+- TLAST asserted at frame boundaries
 
 ### Behavior
 
-- Input data is routed directly to output
-- No filtering or FFT applied
-- Maintains same output interface (AXI4-Stream)
-
----
-
-### Selection Mechanism
-
-A selector signal chooses between:
-
-- Channelizer output (normal operation)
-- Bypass output (raw data)
-
-Implementation typically uses:
-- Parallel paths
-- Output switching logic
+- First frame may be partial
+- A captured frame may contain samples from up to 2 consecutive frame indices (expected with an unaligned start)
+- A DPW spans nFrames frames but may cover nFrames + 1 frame regions
 
 ---
 
@@ -86,22 +63,19 @@ Implementation typically uses:
 
 ### ADC Input
 - Sampling rate: 4096 MSPS
-- Data type: **fixdt(1,16,15)** (Q1.15 format)
+- Data type: **fixdt(1,16,15)** (Q1.15)
 
 ### PFB Channelizer
 - Decimation: 8
 - Effective bandwidth: 512 MHz
-- Input and internal scaling aligned to Q1.15 domain
 
 ### FFT
 - Size: 512
 - Produces frequency bins
 
-### FFT Capture
-- Controls frame boundaries
-
-### FIFO Serializer
-- Converts parallel streams into single stream
+### Capture
+- Defines frame boundaries (512 samples)
+- Generates TLAST
 
 ---
 
@@ -109,62 +83,57 @@ Implementation typically uses:
 
 ### System Standardization
 
-The signal chain was standardized to a **Q1.15 fixed-point format (fixdt(1,16,15))**:
-
-- DAC output uses Q1.15
-- ADC input is reinterpreted as Q1.15 (Same Stored Integer)
-- Channelizer input operates in this normalized domain
-
----
+- End-to-end Q1.15 (**fixdt(1,16,15)**)
 
 ### Channelizer Output Scaling
 
-- Native channelizer output: **sFix25_En23**
-- Rescaled and quantized to: **fixdt(1,16,15)**
-
-This conversion:
-
-- Preserves signal dynamic range
-- Maximizes fractional precision
-- Uses rounding and saturation
-- Aligns with system-wide numeric format
+- Native: **sFix25_En23**
+- Quantized to: **fixdt(1,16,15)** (round + saturate)
 
 ---
 
-### Data Width Reduction
+## Data Packing (Updated)
 
-- Previous format: **50 bits per complex sample** (25 bits real + 25 bits imag)
-- New format: **32 bits per complex sample** (16 bits real + 16 bits imag)
+- 4 samples per clock
+- Each sample: complex (16-bit real + 16-bit imag)
+- Packed into **128-bit AXI4-Stream word**
 
 Benefits:
-
-- Reduced AXI bandwidth
-- Reduced FIFO usage
-- More efficient DMA transfers
+- Matches datapath parallelism
+- Efficient DMA transfers
+- Eliminates need for serializer stage
 
 ---
 
 ## AXI4-Stream Output
 
-- Data type: uint32 (packed complex: 16-bit real + 16-bit imag)
+- Width: 128 bits
+- Contains 4 complex samples per cycle
 - TLAST = frame boundary
 
 ---
 
-## Data Format
+## Debug / Validation Features
 
-- Frame size: 512 samples
-- Complex samples packed into 32-bit words
+A counter-based debug mode is implemented:
+
+- Real part → sample counter (0..511)
+- Imag part → frame index
+
+Used to validate:
+- Sample continuity
+- Frame boundaries
+- DMA ordering and integrity
 
 ---
 
 ## Key Characteristics
 
 - Fully streaming pipeline
-- High throughput
 - Deterministic latency
-- Consistent fixed-point scaling (Q1.15 end-to-end)
-- Supports dual-mode operation (channelizer / bypass)
+- High throughput (4 samples/clock)
+- Dual-mode operation (channelizer / bypass)
+- Validated up to nFrames = 1024
 
 ---
 
diff --git a/docs/ps_subsystem.md b/docs/ps_subsystem.md
index df0aa50..c24877b 100644
--- a/docs/ps_subsystem.md
+++ b/docs/ps_subsystem.md
@@ -1,4 +1,4 @@
-# 🧠 PS Subsystem (Control + Processing)
+# 🧠 PS Subsystem (Control + Capture + Processing)
 
 [🏠 Project Home](../README.md)
 
@@ -8,73 +8,128 @@
 
 The PS subsystem is responsible for:
 
+- System initialization
 - Configuring PL subsystems
+- Triggering captures
 - Receiving data via DMA
-- Performing frame-based processing
+- Preparing data for processing and visualization
+
+The current implementation acts as a **placeholder for post-processing**, focusing on reliable data acquisition and host interaction.
 
 ---
 
 ## Responsibilities
 
-### Control
+### Control & Initialization
 
-- Writes parameters to PL registers:
-  - Tx generator configuration
-- Generates TxPulseStart trigger
+- Configure PL parameters:
+  - Tx waveform configuration
+  - Capture parameters (nFrames, etc.)
+- Initialize DMA and memory buffers
+- Manage system startup
+
+---
+
+### Trigger & Capture
+
+- Generates capture trigger (software-controlled)
+- Controls DPW acquisition timing
+- Each trigger initiates one DPW capture
 
 ---
 
 ### DMA Handling
 
 - AXI4-Stream → DMA (S2MM)
-- Data stored in PS DDR
+- Receives **128-bit stream** (4 samples per clock)
+- Stores data in PS DDR memory
 
 Configuration:
-- Frame size: 512
-- Buffers: 16
+- Frame size: 512 samples
+- nFrames: configurable (validated up to 1024)
 
 ---
 
-### Processing Pipeline
+## Data Format
 
-DMA → uint64[512]
-→ unpack real/imag
-→ convert to complex
-→ RMS + peak detection
+### Raw DMA Data
+
+- Packed complex samples
+- 16-bit real + 16-bit imag per sample
+- 4 samples per 128-bit word
+
+---
+
+### Processing Representation
+
+Data is unpacked and reshaped into:
+
+```
+[FrameSize x nFrames x nTriggers]
+```
+
+---
+
+## Processing Pipeline (Current)
+
+DMA
+→ Unpack samples (I/Q separation)
+→ Convert to complex representation
+→ Reshape into 3D structure
+→ Visualization / basic analysis
+
+---
+
+## Validation Support
+
+Uses counter-based validation:
+
+- Real part → sample counter
+- Imag part → frame index
+
+Enables verification of:
+
+- Data continuity
+- Frame alignment
+- Correct ordering from DMA
 
 ---
 
 ## Execution Model
 
-- Event-driven (DMA trigger)
-- No buffering queue
-- Frames may be dropped
+- Triggered (event-based)
+- Burst capture (DPW)
+- Not continuous real-time streaming
 
 ---
 
 ## Performance Notes
 
-- Bottleneck: unpacking + conversion
-- Cannot sustain full-rate input
+- Designed for correctness and validation (not optimized)
+- Bottleneck: unpacking + data movement
+- Full-rate continuous processing not supported
 
 ---
 
-## Interaction with PL
+## Role in System
 
-### Tx Control
-- Low-rate trigger (~Hz)
-- Starts burst generation
+The PS currently serves as:
 
-### Rx Data
-- Continuous high-rate stream
+- Control interface
+- Data acquisition manager
+- Pre-processing stage
+
+Future implementations will replace the current processing with advanced algorithms (e.g., FrFT).
 
 ---
 
 ## Future Work
 
-- Replace processing with FrFT
-- NEON optimization
-- Throughput improvements
+- FrFT-based processing
+- Timestamp integration
+- UDP streaming
+- Optimization (NEON / vectorization)
+- Metadata extraction (move complexity to PL)
 
 ---
 
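The PS-side "Sample Unpacking (I/Q)" and "Data Reshaping → [FrameSize x nFrames x nTriggers]" steps can be sketched in Python as below. This is a minimal illustration, not the project's implementation: the byte layout (little-endian memory, sample 0 in the least significant 32 bits of each 128-bit word, real part in the low 16 bits of each sample) is an assumption the docs do not state, and the function names `unpack_words` and `reshape_dpw` are hypothetical.

```python
import struct

FRAME_SIZE = 512      # samples per frame (from the docs)
Q15_SCALE = 1 << 15   # fixdt(1,16,15): real value = stored integer / 2^15


def unpack_words(buf: bytes) -> list:
    """Unpack a raw S2MM buffer of 128-bit words into complex Q1.15 samples.

    ASSUMED layout (not stated in the docs): little-endian memory, sample 0 in
    the least significant 32 bits of each 128-bit word, and within each 32-bit
    sample the real part (I) in bits 15..0 and the imag part (Q) in bits 31..16.
    Under that layout the buffer is simply a sequence of (I, Q) int16 pairs.
    """
    if len(buf) % 16:
        raise ValueError("buffer length must be a multiple of 16 bytes (128-bit words)")
    return [complex(i / Q15_SCALE, q / Q15_SCALE)
            for i, q in struct.iter_unpack("<hh", buf)]


def reshape_dpw(samples: list, n_frames: int, n_triggers: int,
                frame_size: int = FRAME_SIZE) -> list:
    """Split consecutive DPWs into nested lists indexed data[trigger][frame][sample].

    The docs write the shape as [FrameSize x nFrames x nTriggers]; in row-major
    Python the equivalent layout keeps the sample index varying fastest.
    """
    dpw_len = frame_size * n_frames   # one DPW = FrameSize x nFrames samples
    if len(samples) != dpw_len * n_triggers:
        raise ValueError("sample count != FrameSize * nFrames * nTriggers")
    return [
        [samples[t * dpw_len + f * frame_size: t * dpw_len + (f + 1) * frame_size]
         for f in range(n_frames)]
        for t in range(n_triggers)
    ]


if __name__ == "__main__":
    # Synthetic buffer: 2 triggers x 2 frames, sample k has I = k, Q = -k.
    n = 2 * 2 * FRAME_SIZE
    raw = b"".join(struct.pack("<hh", k, -k) for k in range(n))
    data = reshape_dpw(unpack_words(raw), n_frames=2, n_triggers=2)
    print(data[1][0][0])   # first sample of the second DPW: (0.03125-0.03125j)
```

If the real layout puts sample 0 in the most significant lane or swaps I and Q, only the `struct` format and the lane order in `unpack_words` would need to change; the reshape step is independent of the packing.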
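The counter-based debug mode (real part = sample counter 0..511, imag part = frame index) lends itself to an automated check on the PS. The sketch below tests one DPW for sample continuity, for the expected "up to 2 frame indices per captured frame" behavior, and for the "nFrames + 1 frame regions" bound. It assumes the frame index increments by exactly 1 when the counter wraps and does not itself wrap within a DPW; that rule and the function name `check_counter_dpw` are hypothetical, not taken from the project.

```python
FRAME_SIZE = 512  # samples per frame (from the docs)


def check_counter_dpw(pairs, n_frames, frame_size=FRAME_SIZE):
    """Check one DPW captured in the counter debug mode.

    pairs: (real, imag) stored integers per sample, where real = sample counter
    (0..frame_size-1) and imag = PL frame index, as described in the docs.
    Returns a list of error strings; an empty list means no problem was found.

    ASSUMPTION: the frame index increases by exactly 1 when the sample counter
    wraps from frame_size-1 to 0, and does not itself wrap within one DPW.
    """
    expected_len = n_frames * frame_size
    if len(pairs) != expected_len:
        return [f"expected {expected_len} samples, got {len(pairs)}"]

    errors = []
    # 1) Continuity: every sample must follow its predecessor exactly
    #    (catches dropped, duplicated, or reordered samples/words).
    for k in range(1, len(pairs)):
        (c0, f0), (c1, f1) = pairs[k - 1], pairs[k]
        wrapped = c0 == frame_size - 1
        expected = (0, f0 + 1) if wrapped else (c0 + 1, f0)
        if (c1, f1) != expected:
            errors.append(f"discontinuity at sample {k}: {(c0, f0)} -> {(c1, f1)}")

    # 2) An unaligned start means a captured frame may straddle at most
    #    two consecutive PL frame indices.
    for f in range(n_frames):
        indices = {fi for _, fi in pairs[f * frame_size:(f + 1) * frame_size]}
        if len(indices) > 2:
            errors.append(f"captured frame {f} spans {len(indices)} frame indices")

    # 3) The whole DPW may therefore cover at most nFrames + 1 frame regions.
    regions = len({fi for _, fi in pairs})
    if regions > n_frames + 1:
        errors.append(f"DPW covers {regions} frame regions (> nFrames + 1)")
    return errors


if __name__ == "__main__":
    # Unaligned capture: starts at counter value 100 of PL frame 7.
    pairs = [((100 + k) % FRAME_SIZE, 7 + (100 + k) // FRAME_SIZE)
             for k in range(4 * FRAME_SIZE)]
    print(check_counter_dpw(pairs, n_frames=4))  # []
```

Checks 2 and 3 follow from check 1 when continuity holds; they are kept separate so a report still flags an out-of-range frame count even if the continuity rule assumed here differs from the real counter behavior.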
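The channelizer output step "sFix25_En23 → fixdt(1,16,15) (round + saturate)" amounts to dropping 8 fractional bits (23 − 15) and clamping the [-2, 2) input range to [-1, 1). A bit-accurate sketch on stored integers follows. The docs only say "round", so the tie-breaking rule (round half away from zero, as in Simulink's "Round" mode) is an assumption, and `requantize` is a hypothetical name.

```python
IN_FRAC = 23                   # sFix25_En23: 25-bit signed, 23 fractional bits
OUT_FRAC = 15                  # fixdt(1,16,15): 16-bit signed, 15 fractional bits
IN_MIN, IN_MAX = -(1 << 24), (1 << 24) - 1
OUT_MIN, OUT_MAX = -(1 << 15), (1 << 15) - 1


def requantize(x: int) -> int:
    """Convert an sFix25_En23 stored integer to a fixdt(1,16,15) stored integer.

    ASSUMED rounding: round to nearest, ties away from zero (Simulink's 'Round'
    mode). The docs only say 'round + saturate'; 'Nearest' or 'Convergent'
    would differ on exact ties.
    """
    if not IN_MIN <= x <= IN_MAX:
        raise ValueError("input does not fit in 25 signed bits")
    shift = IN_FRAC - OUT_FRAC            # drop 8 fractional bits
    half = 1 << (shift - 1)               # half of one output LSB, in input LSBs
    mag = (abs(x) + half) >> shift        # round the magnitude, ties away from zero
    y = mag if x >= 0 else -mag
    return max(OUT_MIN, min(OUT_MAX, y))  # saturate to the 16-bit signed range


if __name__ == "__main__":
    print(requantize(1 << 21))  # 0.25 in -> 8192 (= 0.25 * 2^15)
    print(requantize(1 << 23))  # +1.0 is not representable in Q1.15 -> 32767
```

One output LSB equals 2^8 input LSBs, so any channelizer output with magnitude of 1.0 or more (possible in sFix25_En23) saturates, while values inside [-1, 1) lose only their lowest 8 fractional bits.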