docs: update documentation for capture redesign and validation

canisio
2026-04-29 10:03:34 -03:00
parent b3ba729f8b
commit 19b0513809
3 changed files with 160 additions and 123 deletions


@@ -11,21 +11,34 @@ The system implements a high-throughput signal chain in the FPGA (PL) and perfor
## Current Status
- Tx subsystem: LFM pulse generator (DDS-based, complex output)
- Rx subsystem: fully functional channelizer pipeline (PFB-based)
- Rx subsystem: fully functional channelizer pipeline (PFB-based) or bypass
- PL → PS interface: AXI4-Stream + DMA operational
- PS processing: frame-based algorithm (RMS + peak detection)
- PS processing: frame-based algorithm on a Data Process Window (DPW)
---
## System Architecture
ADC → Channelizer (PFB, 512 bins)
FFT_Capture (frame control)
FIFO Serializer (4 FIFOs → 1 stream)
AXI4-Stream (uint64)
Tx (PL)
Waveform Generator (LFM / CW / Pulsed)
DAC
RF Loopback / Input
Rx (PL)
→ ADC
→ Channelizer (PFB, 512 bins) / Bypass / Counter
→ Capture (frame control)
→ AXI4-Stream (128-bit, 4 samples/clock)
→ DMA (S2MM)
→ PS Memory
→ Processor Algorithm
Post Processing (PS)
→ Triggered Capture
→ Sample Unpacking (I/Q)
→ Data Reshaping → [FrameSize x nFrames x nTriggers]
→ Host Communication / Processing / Visualization
→ One DPW is a window of FrameSize x nFrames samples
---


@@ -6,11 +6,9 @@
## Overview
The Rx subsystem implements a **polyphase filter bank (PFB) channelizer** followed by FFT processing.
The Rx subsystem implements a **polyphase filter bank (PFB) channelizer** followed by FFT processing, a **bypass path**, and a **multi-frame capture pipeline**.
It converts wideband ADC input into frequency-domain channels and streams the result to the PS.
A **bypass path** is also available for raw data inspection and debugging.
It converts wideband ADC input into frequency-domain channels (or raw samples via bypass) and streams the result to the PS.
---
@@ -18,67 +16,46 @@ A **bypass path** is also available for raw data inspection and debugging.
### Channelizer Path (default)
ADC
PFB Channelizer (Decimation + Filtering)
FFT (512 bins)
FFT Capture
FIFO Serializer (4 → 1)
AXI4-Stream
ADC
PFB Channelizer (Decimation + Filtering)
FFT (512 bins)
Capture (frame control)
AXI4-Stream (128-bit, 4 samples/clock)
DMA
---
### Bypass Path (Debug / Raw Data)
ADC
Bypass Path
FIFO / Serializer
AXI4-Stream
ADC
Bypass Path
Capture (frame control)
AXI4-Stream (128-bit, 4 samples/clock)
DMA
---
## Bypass Functionality
## Capture Pipeline
The bypass allows direct observation of the input signal without channelization.
### Purpose
- Debugging and validation
- Access to raw ADC-domain data
- Comparison with channelized output
- Verification of downstream processing
---
- Multi-frame acquisition (configurable nFrames)
- Frame size: 512 samples
- Supports asynchronous capture start (not frame-aligned)
- TLAST asserted at frame boundaries
### Behavior
- Input data is routed directly to output
- No filtering or FFT applied
- Maintains same output interface (AXI4-Stream)
---
### Selection Mechanism
A selector signal chooses between:
- Channelizer output (normal operation)
- Bypass output (raw data)
Implementation typically uses:
- Parallel paths
- Output switching logic
- First frame may be partial (capture start is not frame-aligned)
- Each captured frame may contain up to 2 frame indices (expected)
- A DPW holds nFrames frames of data but, with an unaligned start, covers nFrames + 1 frame regions
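A small Python model can make the asynchronous-capture behavior concrete. It is an illustrative sketch, not the PL logic: it assumes the counter debug pattern (real = sample counter, imag = frame index) and that the DPW is cut into consecutive 512-sample frames counted from an arbitrary capture start. `capture_dpw` and `distinct_indices_per_frame` are hypothetical names.

```python
import numpy as np

FRAME_SIZE = 512  # samples per frame (capture spec)

def capture_dpw(start_offset: int, n_frames: int) -> np.ndarray:
    """Model one DPW captured in counter debug mode (illustrative only).

    Assumes a free-running source where real = sample counter (0..511) and
    imag = source frame index, and a capture that starts `start_offset`
    samples into source frame 0. Returns shape (n_frames, FRAME_SIZE, 2)
    holding (counter, frame_index) for each captured sample.
    """
    abs_idx = start_offset + np.arange(n_frames * FRAME_SIZE)  # absolute sample positions
    counter = abs_idx % FRAME_SIZE     # real part of the debug pattern
    frame_idx = abs_idx // FRAME_SIZE  # imag part of the debug pattern
    return np.stack([counter, frame_idx], axis=-1).reshape(n_frames, FRAME_SIZE, 2)

def distinct_indices_per_frame(dpw: np.ndarray) -> list:
    """Number of distinct source frame indices inside each captured frame."""
    return [int(np.unique(frame[:, 1]).size) for frame in dpw]

dpw = capture_dpw(start_offset=100, n_frames=4)
print(distinct_indices_per_frame(dpw))   # [2, 2, 2, 2]
print(np.unique(dpw[..., 1]).size)       # 5 source frames touched (nFrames + 1)
```

With a non-zero offset every captured frame straddles two source frames and the DPW touches nFrames + 1 of them; with `start_offset=0` each captured frame carries a single index.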
---
@@ -86,22 +63,19 @@ Implementation typically uses:
### ADC Input
- Sampling rate: 4096 MSPS
- Data type: **fixdt(1,16,15)** (Q1.15 format)
- Data type: **fixdt(1,16,15)** (Q1.15)
### PFB Channelizer
- Decimation: 8
- Effective bandwidth: 512 MHz
- Input and internal scaling aligned to Q1.15 domain
### FFT
- Size: 512
- Produces frequency bins
### FFT Capture
- Controls frame boundaries
### FIFO Serializer
- Converts parallel streams into single stream
### Capture
- Defines frame boundaries (512 samples)
- Generates TLAST
---
@@ -109,62 +83,57 @@ Implementation typically uses:
### System Standardization
The signal chain was standardized to a **Q1.15 fixed-point format (fixdt(1,16,15))**:
- DAC output uses Q1.15
- ADC input is reinterpreted as Q1.15 (Same Stored Integer)
- Channelizer input operates in this normalized domain
---
- End-to-end Q1.15 (**fixdt(1,16,15)**)
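The "same stored integer" reinterpretation can be sketched as follows. This assumes the ADC delivers 16-bit two's-complement words (the converter's bit alignment is not stated here); `q15_from_raw` is a hypothetical helper.

```python
import numpy as np

Q15_LSB = 2.0 ** -15  # fixdt(1,16,15): signed 16-bit word, 15 fraction bits

def q15_from_raw(raw_words) -> np.ndarray:
    """Reinterpret raw 16-bit ADC words (0..0xFFFF) as Q1.15 real-world values.

    "Same stored integer": the bit pattern is kept and read as two's complement,
    so the real-world value is stored_integer * 2^-15, in [-1, 1 - 2^-15].
    """
    stored = np.asarray(raw_words, dtype=np.uint16).view(np.int16)  # bit-pattern reinterpretation
    return stored.astype(np.float64) * Q15_LSB

print(q15_from_raw([0x0000, 0x7FFF, 0x8000, 0xFFFF]))
```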
### Channelizer Output Scaling
- Native channelizer output: **sFix25_En23**
- Rescaled and quantized to: **fixdt(1,16,15)**
This conversion:
- Preserves signal dynamic range
- Maximizes fractional precision
- Uses rounding and saturation
- Aligns with system-wide numeric format
- Native: **sFix25_En23**
- Quantized to: **fixdt(1,16,15)** (round + saturate)
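A bit-true sketch of the sFix25_En23 to fixdt(1,16,15) conversion: drop 23 - 15 = 8 fraction bits with rounding, then saturate to the int16 range. Because sFix25_En23 spans [-2, 2) while Q1.15 spans [-1, 1), any value at or above 1.0 in magnitude clips. The rounding mode here (round to nearest, ties toward +inf) is an assumption; the model may use a different mode such as Convergent. `requantize_to_q15` is a hypothetical name.

```python
import numpy as np

IN_FRAC = 23                 # sFix25_En23: 25-bit signed, 23 fraction bits
OUT_FRAC = 15                # fixdt(1,16,15): 16-bit signed, 15 fraction bits
SHIFT = IN_FRAC - OUT_FRAC   # 8 fraction bits are dropped

def requantize_to_q15(si25) -> np.ndarray:
    """Convert sFix25_En23 stored integers to Q1.15 stored integers.

    Rounding: add half an output LSB (128 input LSBs), then arithmetic shift
    right by 8 (round to nearest, ties toward +inf).
    Saturation: clamp to [-32768, 32767].
    """
    si = np.asarray(si25, dtype=np.int64)
    rounded = (si + (1 << (SHIFT - 1))) >> SHIFT   # >> on signed ints floors
    return np.clip(rounded, -(1 << 15), (1 << 15) - 1).astype(np.int16)

# 1.0 and the 25-bit maximum saturate to 32767; 0.5 maps to 16384
print(requantize_to_q15([1 << 23, (1 << 24) - 1, 1 << 22]))
```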
---
### Data Width Reduction
## Data Packing (Updated)
- Previous format: **50 bits per complex sample** (25 bits real + 25 bits imag)
- New format: **32 bits per complex sample** (16 bits real + 16 bits imag)
- 4 samples per clock
- Each sample: complex (16-bit real + 16-bit imag)
- Packed into **128-bit AXI4-Stream word**
Benefits:
- Reduced AXI bandwidth
- Reduced FIFO usage
- More efficient DMA transfers
- Matches datapath parallelism
- Efficient DMA transfers
- Eliminates need for serializer stage
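The bit layout inside the 128-bit word is not specified above. The sketch below assumes sample k occupies bits [32k+31:32k] and, within each 32-bit sample, real sits in bits [15:0] and imag in bits [31:16]. `pack_sample`, `pack_beat` and `unpack_beat` are hypothetical helpers for checking a layout, not the HDL.

```python
def pack_sample(re: int, im: int) -> int:
    """Pack one complex Q1.15 sample (int16 stored integers) into 32 bits.
    Assumed layout: real in bits [15:0], imag in bits [31:16]."""
    return ((im & 0xFFFF) << 16) | (re & 0xFFFF)

def pack_beat(samples) -> int:
    """Pack 4 complex samples into one 128-bit AXI4-Stream beat.
    Assumed layout: sample k in bits [32k+31:32k]."""
    if len(samples) != 4:
        raise ValueError("a 128-bit beat carries exactly 4 samples")
    word = 0
    for k, (re, im) in enumerate(samples):
        word |= pack_sample(re, im) << (32 * k)
    return word

def unpack_beat(word: int) -> list:
    """Inverse of pack_beat: recover 4 signed (real, imag) pairs."""
    def s16(v: int) -> int:
        return v - 0x10000 if v & 0x8000 else v   # two's-complement sign extension
    out = []
    for k in range(4):
        w32 = (word >> (32 * k)) & 0xFFFFFFFF
        out.append((s16(w32 & 0xFFFF), s16(w32 >> 16)))
    return out

beat = pack_beat([(0, 1), (-1, 2), (32767, -32768), (5, -5)])
print(hex(beat), unpack_beat(beat))
```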
---
## AXI4-Stream Output
- Data type: uint32 (packed complex: 16-bit real + 16-bit imag)
- Width: 128 bits
- Contains 4 complex samples per cycle
- TLAST = frame boundary
---
## Data Format
## Debug / Validation Features
- Frame size: 512 samples
- Complex samples packed into 32-bit words
A counter-based debug mode is implemented:
- Real part → sample counter (0..511)
- Imag part → frame index
Used to validate:
- Sample continuity
- Frame boundaries
- DMA ordering and integrity
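A host-side check of the counter pattern might look like the sketch below. It assumes the samples are already unpacked into real/imag integer arrays in arrival order and that the frame index does not wrap within one capture; `check_counter_stream` is a hypothetical name.

```python
import numpy as np

FRAME_SIZE = 512

def check_counter_stream(real, imag) -> list:
    """Validate counter debug data (real = sample counter, imag = frame index).

    Returns a list of error strings; an empty list means no discontinuity.
    """
    real = np.asarray(real, dtype=np.int64)
    imag = np.asarray(imag, dtype=np.int64)
    errors = []
    # Sample continuity: counter increments by 1 and wraps 511 -> 0
    expected_real = (real[:-1] + 1) % FRAME_SIZE
    for i in np.flatnonzero(real[1:] != expected_real):
        errors.append(f"counter jump at {i + 1}: {real[i]} -> {real[i + 1]}")
    # Frame boundaries: frame index advances by 1 exactly where the counter wraps
    expected_imag = imag[:-1] + (real[1:] == 0)
    for i in np.flatnonzero(imag[1:] != expected_imag):
        errors.append(f"frame index mismatch at {i + 1}: {imag[i]} -> {imag[i + 1]}")
    return errors

idx = 300 + np.arange(4 * FRAME_SIZE)                      # unaligned capture start
real, imag = idx % FRAME_SIZE, idx // FRAME_SIZE
print(check_counter_stream(real, imag))                    # []
print(check_counter_stream(np.delete(real, 1000), np.delete(imag, 1000)))
```

A dropped sample shows up as a single counter jump; a misplaced TLAST or reordered DMA block would additionally appear as frame index mismatches.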
---
## Key Characteristics
- Fully streaming pipeline
- High throughput
- Deterministic latency
- Consistent fixed-point scaling (Q1.15 end-to-end)
- Supports dual-mode operation (channelizer / bypass)
- High throughput (4 samples/clock)
- Dual-mode operation (channelizer / bypass)
- Validated up to nFrames = 1024
---


@@ -1,4 +1,4 @@
# 🧠 PS Subsystem (Control + Processing)
# 🧠 PS Subsystem (Control + Capture + Processing)
[🏠 Project Home](../README.md)
@@ -8,73 +8,128 @@
The PS subsystem is responsible for:
- System initialization
- Configuring PL subsystems
- Triggering captures
- Receiving data via DMA
- Performing frame-based processing
- Preparing data for processing and visualization
The current implementation acts as a **placeholder for post-processing**, focusing on reliable data acquisition and host interaction.
---
## Responsibilities
### Control
### Control & Initialization
- Writes parameters to PL registers:
- Tx generator configuration
- Generates TxPulseStart trigger
- Configure PL parameters:
- Tx waveform configuration
- Capture parameters (nFrames, etc.)
- Initialize DMA and memory buffers
- Manage system startup
---
### Trigger & Capture
- Generates capture trigger (software-controlled)
- Controls DPW acquisition timing
- Each trigger initiates one DPW capture
---
### DMA Handling
- AXI4-Stream → DMA (S2MM)
- Data stored in PS DDR
- Receives **128-bit stream** (4 samples per clock)
- Stores data in PS DDR memory
Configuration:
- Frame size: 512
- Buffers: 16
- Frame size: 512 samples
- nFrames: configurable (validated up to 1024)
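The S2MM destination buffer size follows directly from these parameters. The sketch assumes one contiguous DPW per trigger with no header or padding; `dpw_buffer_size` is a hypothetical helper.

```python
FRAME_SIZE = 512        # samples per frame
BYTES_PER_SAMPLE = 4    # 16-bit real + 16-bit imag
SAMPLES_PER_BEAT = 4    # one 128-bit AXI4-Stream word

def dpw_buffer_size(n_frames: int) -> dict:
    """Size of one DPW in samples, bytes and 128-bit beats (assumes no padding)."""
    samples = FRAME_SIZE * n_frames
    return {
        "samples": samples,
        "bytes": samples * BYTES_PER_SAMPLE,
        "beats_per_frame": FRAME_SIZE // SAMPLES_PER_BEAT,
        "beats": samples // SAMPLES_PER_BEAT,
    }

print(dpw_buffer_size(1024))
```

For nFrames = 1024 this gives 524,288 samples, 2,097,152 bytes (2 MiB) and 131,072 beats, i.e. 128 beats per frame.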
---
### Processing Pipeline
## Data Format
DMA → uint64[512]
→ unpack real/imag
→ convert to complex
→ RMS + peak detection
### Raw DMA Data
- Packed complex samples
- 16-bit real + 16-bit imag per sample
- 4 samples per 128-bit word
---
### Processing Representation
Data is unpacked and reshaped into:
```
[FrameSize x nFrames x nTriggers]
```
---
## Processing Pipeline (Current)
DMA
→ Unpack samples (I/Q separation)
→ Convert to complex representation
→ Reshape into 3D structure
→ Visualization / basic analysis
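The unpack and reshape steps can be sketched with NumPy. Assumptions not taken from the design: the DMA buffer holds little-endian 32-bit words, each with real in bits [15:0] and imag in bits [31:16], samples are stored in arrival order, and DPWs from successive triggers are stored back to back. `unpack_dpw` is a hypothetical name.

```python
import numpy as np

def unpack_dpw(raw: bytes, frame_size: int, n_frames: int, n_triggers: int) -> np.ndarray:
    """Unpack DMA bytes into complex Q1.15 values shaped [FrameSize x nFrames x nTriggers]."""
    words = np.frombuffer(raw, dtype="<u4")              # one packed sample per 32-bit word
    expected = frame_size * n_frames * n_triggers
    if words.size != expected:
        raise ValueError(f"expected {expected} samples, got {words.size}")
    re = (words & 0xFFFF).astype(np.uint16).view(np.int16)   # I: low half, signed
    im = (words >> 16).astype(np.uint16).view(np.int16)      # Q: high half, signed
    iq = (re.astype(np.float64) + 1j * im.astype(np.float64)) * 2.0 ** -15  # Q1.15 scaling
    # Memory order is sample-fastest, then frame, then trigger; Fortran (column-major)
    # order maps that onto axes (sample, frame, trigger), as in MATLAB.
    return iq.reshape((frame_size, n_frames, n_triggers), order="F")

# Synthetic buffer: sample n carries real = n, imag = -n
re_in = np.arange(24, dtype=np.int16)
words = re_in.view(np.uint16).astype(np.uint32) | ((-re_in).view(np.uint16).astype(np.uint32) << 16)
cube = unpack_dpw(words.astype("<u4").tobytes(), frame_size=4, n_frames=3, n_triggers=2)
print(cube.shape)   # (4, 3, 2)
```

Column-major order keeps `cube[:, f, t]` as one contiguous frame, matching the [FrameSize x nFrames x nTriggers] convention used above.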
---
## Validation Support
Uses counter-based validation:
- Real part → sample counter
- Imag part → frame index
Enables verification of:
- Data continuity
- Frame alignment
- Correct ordering from DMA
---
## Execution Model
- Event-driven (DMA trigger)
- No buffering queue
- Frames may be dropped
- Triggered (event-based)
- Burst capture (DPW)
- Not continuous real-time streaming
---
## Performance Notes
- Bottleneck: unpacking + conversion
- Cannot sustain full-rate input
- Designed for correctness and validation (not optimized)
- Bottleneck: unpacking + data movement
- Full-rate continuous processing not supported
---
## Interaction with PL
## Role in System
### Tx Control
- Low-rate trigger (~Hz)
- Starts burst generation
The PS currently serves as:
### Rx Data
- Continuous high-rate stream
- Control interface
- Data acquisition manager
- Pre-processing stage
Future implementations will replace the current processing with advanced algorithms (e.g., FrFT).
---
## Future Work
- Replace processing with FrFT
- NEON optimization
- Throughput improvements
- FrFT-based processing
- Timestamp integration
- UDP streaming
- Optimization (NEON / vectorization)
- Metadata extraction (move complexity to PL)
---