docs: update documentation for capture redesign and validation
README.md
@@ -11,22 +11,35 @@ The system implements a high-throughput signal chain in the FPGA (PL) and perfor
## Current Status

- Tx subsystem: LFM pulse generator (DDS-based, complex output)
- Rx subsystem: fully functional channelizer pipeline (PFB-based)
- Rx subsystem: fully functional channelizer pipeline (PFB-based) or bypass
- PL → PS interface: AXI4-Stream + DMA operational
- PS processing: frame-based algorithm (RMS + peak detection)
- PS processing: frame-based algorithm on a Data Process Window (DPW)

---

## System Architecture

ADC → Channelizer (PFB, 512 bins)
→ FFT_Capture (frame control)
→ FIFO Serializer (4 FIFOs → 1 stream)
→ AXI4-Stream (uint64)
Tx (PL)
→ Waveform Generator (LFM / CW / Pulsed)
→ DAC
→ RF Loopback / Input

Rx (PL)
→ ADC
→ Channelizer (PFB, 512 bins) / Bypass / Counter
→ Capture (frame control)
→ AXI4-Stream (128-bit, 4 samples/clock)
→ DMA (S2MM)
→ PS Memory
→ Processor Algorithm

Post Processing (PS)
→ Triggered Capture
→ Sample Unpacking (I/Q)
→ Data Reshaping → [FrameSize x nFrames x nTriggers]
→ Host Communication / Processing / Visualization
→ One DPW is a window of FrameSize x nFrames samples

---

## Key Parameters

@@ -6,11 +6,9 @@

## Overview

The Rx subsystem implements a **polyphase filter bank (PFB) channelizer** followed by FFT processing.
The Rx subsystem implements a **polyphase filter bank (PFB) channelizer** followed by FFT processing, a **bypass path**, and a **multi-frame capture pipeline**.

It converts wideband ADC input into frequency-domain channels and streams the result to the PS.

A **bypass path** is also available for raw data inspection and debugging.
It converts wideband ADC input into frequency-domain channels (or raw samples via bypass) and streams the result to the PS.

---

@@ -24,11 +22,9 @@ PFB Channelizer (Decimation + Filtering)
↓
FFT (512 bins)
↓
FFT Capture
Capture (frame control)
↓
FIFO Serializer (4 → 1)
↓
AXI4-Stream
AXI4-Stream (128-bit, 4 samples/clock)
↓
DMA

@@ -40,45 +36,26 @@ ADC
↓
Bypass Path
↓
FIFO / Serializer
Capture (frame control)
↓
AXI4-Stream
AXI4-Stream (128-bit, 4 samples/clock)
↓
DMA

---

## Bypass Functionality
## Capture Pipeline

The bypass allows direct observation of the input signal without channelization.

### Purpose

- Debugging and validation
- Access to raw ADC-domain data
- Comparison with channelized output
- Verification of downstream processing

---
- Multi-frame acquisition (configurable nFrames)
- Frame size: 512 samples
- Supports asynchronous capture start (not frame-aligned)
- TLAST asserted at frame boundaries

### Behavior

- Input data is routed directly to output
- No filtering or FFT applied
- Maintains same output interface (AXI4-Stream)

---

### Selection Mechanism

A selector signal chooses between:

- Channelizer output (normal operation)
- Bypass output (raw data)

Implementation typically uses:
- Parallel paths
- Output switching logic
- First frame may be partial
- A captured frame may contain samples from up to 2 frame indices (expected, because capture start is not frame-aligned)
- A DPW spans nFrames frames but can cover up to nFrames + 1 frame regions
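
The frame-region count can be checked with a little arithmetic. The sketch below assumes the source stream is divided into 512-sample frames and that the capture trigger arrives at an arbitrary offset inside a frame; `start_offset` and `frame_regions` are hypothetical names used only for illustration, not part of the PL design.

```python
FRAME_SIZE = 512  # samples per frame, from the capture parameters


def frame_regions(start_offset, n_frames):
    """Return the source frame indices touched by one DPW.

    start_offset is the (hypothetical) sample position, counted from the
    start of source frame 0, at which the capture begins.
    """
    first = start_offset
    last = start_offset + FRAME_SIZE * n_frames - 1  # last captured sample
    return list(range(first // FRAME_SIZE, last // FRAME_SIZE + 1))


aligned = frame_regions(start_offset=0, n_frames=4)
unaligned = frame_regions(start_offset=100, n_frames=4)
print(aligned)    # [0, 1, 2, 3]
print(unaligned)  # [0, 1, 2, 3, 4]
```

With an aligned start the DPW covers exactly nFrames regions; any non-zero offset adds one more, which is why each captured frame can show two frame indices.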

---

@@ -86,22 +63,19 @@ Implementation typically uses:

### ADC Input
- Sampling rate: 4096 MSPS
- Data type: **fixdt(1,16,15)** (Q1.15 format)
- Data type: **fixdt(1,16,15)** (Q1.15)

### PFB Channelizer
- Decimation: 8
- Effective bandwidth: 512 MHz
- Input and internal scaling aligned to Q1.15 domain

### FFT
- Size: 512
- Produces frequency bins

### FFT Capture
- Controls frame boundaries

### FIFO Serializer
- Converts parallel streams into single stream
### Capture
- Defines frame boundaries (512 samples)
- Generates TLAST

---

@@ -109,62 +83,57 @@ Implementation typically uses:

### System Standardization

The signal chain was standardized to a **Q1.15 fixed-point format (fixdt(1,16,15))**:

- DAC output uses Q1.15
- ADC input is reinterpreted as Q1.15 (Same Stored Integer)
- Channelizer input operates in this normalized domain

---
- End-to-end Q1.15 (**fixdt(1,16,15)**)

### Channelizer Output Scaling

- Native channelizer output: **sFix25_En23**
- Rescaled and quantized to: **fixdt(1,16,15)**

This conversion:

- Preserves signal dynamic range
- Maximizes fractional precision
- Uses rounding and saturation
- Aligns with system-wide numeric format
- Native: **sFix25_En23**
- Quantized to: **fixdt(1,16,15)** (round + saturate)
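
A minimal software model of this quantization step may help when checking captured values on the PS or host. It assumes round-to-nearest with ties rounded up; the source only says "round + saturate", so the exact rounding mode used in the PL is an assumption, and `quantize_q15` is a hypothetical helper name.

```python
FRAC_IN = 23   # sFix25_En23: 25-bit signed, 23 fractional bits
FRAC_OUT = 15  # fixdt(1,16,15): 16-bit signed, 15 fractional bits
SHIFT = FRAC_IN - FRAC_OUT
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def quantize_q15(stored):
    """Convert an sFix25_En23 stored integer to a Q1.15 stored integer."""
    # Assumed rounding mode: nearest, ties toward +infinity.
    rounded = (stored + (1 << (SHIFT - 1))) >> SHIFT
    # Saturate to the signed 16-bit range instead of wrapping.
    return max(INT16_MIN, min(INT16_MAX, rounded))


print(quantize_q15(1 << 22))     # 16384  (0.5 stays 0.5)
print(quantize_q15(1 << 23))     # 32767  (+1.0 saturates)
print(quantize_q15(-(1 << 23)))  # -32768 (-1.0 is representable)
```

Because sFix25_En23 spans [-2, 2) while Q1.15 spans [-1, 1), saturation only engages for magnitudes at or above 1.0.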

---

### Data Width Reduction
## Data Packing (Updated)

- Previous format: **50 bits per complex sample** (25 bits real + 25 bits imag)
- New format: **32 bits per complex sample** (16 bits real + 16 bits imag)
- 4 samples per clock
- Each sample: complex (16-bit real + 16-bit imag)
- Packed into **128-bit AXI4-Stream word**

Benefits:

- Reduced AXI bandwidth
- Reduced FIFO usage
- More efficient DMA transfers
- Matches datapath parallelism
- Efficient DMA transfers
- Eliminates need for serializer stage
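
The packing can be modelled in a few lines. The bit ordering below is an assumption, since the source does not state it: the real part sits in the low 16 bits of each 32-bit sample, and sample 0 occupies the least significant 32 bits of the word. `pack_word` is a hypothetical name.

```python
def pack_word(samples):
    """Pack four (real, imag) int16 pairs into one 128-bit integer."""
    if len(samples) != 4:
        raise ValueError("expected exactly 4 complex samples")
    word = 0
    for lane, (re, im) in enumerate(samples):
        # Assumed layout: imag in the high half, real in the low half.
        sample32 = ((im & 0xFFFF) << 16) | (re & 0xFFFF)
        # Assumed lane order: lane 0 in the least significant 32 bits.
        word |= sample32 << (32 * lane)
    return word


word = pack_word([(1, -1), (2, -2), (3, -3), (4, -4)])
print(f"{word:032x}")  # fffc0004fffd0003fffe0002ffff0001
```

If the hardware uses a different lane or half-word order, only the two shift expressions need to change.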

---

## AXI4-Stream Output

- Data type: uint32 (packed complex: 16-bit real + 16-bit imag)
- Width: 128 bits
- Contains 4 complex samples per cycle
- TLAST = frame boundary

---

## Data Format
## Debug / Validation Features

- Frame size: 512 samples
- Complex samples packed into 32-bit words
A counter-based debug mode is implemented:

- Real part → sample counter (0..511)
- Imag part → frame index

Used to validate:
- Sample continuity
- Frame boundaries
- DMA ordering and integrity
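
A host-side check for the counter pattern could look like the sketch below. It assumes the samples are already unpacked into `(real, imag)` integer pairs and that the frame index does not wrap within the checked span; `check_counter_stream` is a hypothetical helper, not part of the project code.

```python
FRAME_SIZE = 512


def check_counter_stream(samples):
    """Return the index of the first discontinuity, or None if the stream is clean.

    samples is a sequence of (real, imag) pairs where real is the sample
    counter (0..511) and imag is the frame index.
    """
    for i in range(1, len(samples)):
        prev_re, prev_im = samples[i - 1]
        if prev_re == FRAME_SIZE - 1:
            expected = (0, prev_im + 1)        # counter wraps, frame index advances
        else:
            expected = (prev_re + 1, prev_im)  # same frame, next sample
        if tuple(samples[i]) != expected:
            return i
    return None


# Synthetic capture that starts 100 samples into frame 0 (not frame-aligned).
stream = [((100 + n) % FRAME_SIZE, (100 + n) // FRAME_SIZE)
          for n in range(2 * FRAME_SIZE)]
print(check_counter_stream(stream))   # None

dropped = stream[:700] + stream[701:]  # simulate one lost sample
print(check_counter_stream(dropped))  # 700
```

The check only compares neighbouring samples, so it works regardless of where in a frame the capture started.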

---

## Key Characteristics

- Fully streaming pipeline
- High throughput
- Deterministic latency
- Consistent fixed-point scaling (Q1.15 end-to-end)
- Supports dual-mode operation (channelizer / bypass)
- High throughput (4 samples/clock)
- Dual-mode operation (channelizer / bypass)
- Validated up to nFrames = 1024

---


@@ -1,4 +1,4 @@
# 🧠 PS Subsystem (Control + Processing)
# 🧠 PS Subsystem (Control + Capture + Processing)

[🏠 Project Home](../README.md)

@@ -8,73 +8,128 @@

The PS subsystem is responsible for:

- System initialization
- Configuring PL subsystems
- Triggering captures
- Receiving data via DMA
- Performing frame-based processing
- Preparing data for processing and visualization

The current implementation acts as a **placeholder for post-processing**, focusing on reliable data acquisition and host interaction.

---

## Responsibilities

### Control
### Control & Initialization

- Writes parameters to PL registers:
- Tx generator configuration
- Generates TxPulseStart trigger
- Configure PL parameters:
- Tx waveform configuration
- Capture parameters (nFrames, etc.)
- Initialize DMA and memory buffers
- Manage system startup

---

### Trigger & Capture

- Generates capture trigger (software-controlled)
- Controls DPW acquisition timing
- Each trigger initiates one DPW capture

---

### DMA Handling

- AXI4-Stream → DMA (S2MM)
- Data stored in PS DDR
- Receives **128-bit stream** (4 samples per clock)
- Stores data in PS DDR memory

Configuration:
- Frame size: 512
- Buffers: 16
- Frame size: 512 samples
- nFrames: configurable (validated up to 1024)
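
These parameters fix the size of one DPW in memory. The sketch below works that out from the figures stated in this document (512 samples per frame, 32 bits per complex sample, 4 samples per 128-bit word); `dpw_size` is a hypothetical helper, and any alignment or descriptor overhead imposed by the DMA driver is not modelled.

```python
FRAME_SIZE = 512        # complex samples per frame
BYTES_PER_SAMPLE = 4    # 16-bit real + 16-bit imag
SAMPLES_PER_BEAT = 4    # one 128-bit AXI4-Stream word


def dpw_size(n_frames):
    """Return (bytes, 128-bit beats) needed for one DPW of n_frames frames."""
    samples = FRAME_SIZE * n_frames
    return samples * BYTES_PER_SAMPLE, samples // SAMPLES_PER_BEAT


size_bytes, beats = dpw_size(1024)
print(size_bytes, beats)  # 2097152 131072
```

At the validated maximum of nFrames = 1024, one DPW therefore occupies 2 MiB of DDR, and each frame corresponds to 128 stream beats between TLAST assertions.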

---

### Processing Pipeline
## Data Format

DMA → uint64[512]
→ unpack real/imag
→ convert to complex
→ RMS + peak detection
### Raw DMA Data

- Packed complex samples
- 16-bit real + 16-bit imag per sample
- 4 samples per 128-bit word
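
Unpacking a raw buffer into complex values can be sketched with the standard `struct` module. The byte layout is an assumption that the source does not confirm: little-endian memory, with the 16-bit real part stored before the 16-bit imaginary part in each 32-bit sample. `unpack_iq` is a hypothetical name.

```python
import struct


def unpack_iq(buf):
    """Unpack a raw DMA buffer into a list of complex samples in [-1, 1)."""
    if len(buf) % 4:
        raise ValueError("buffer length must be a multiple of 4 bytes")
    # "<hh": little-endian int16 real followed by int16 imag (assumed layout).
    # Dividing by 2**15 converts the Q1.15 stored integers to real values.
    return [complex(re, im) / 32768 for re, im in struct.iter_unpack("<hh", buf)]


# One 128-bit word holding four complex samples.
raw = struct.pack("<8h", 16384, -16384, 0, 32767, -32768, 0, 1, 1)
samples = unpack_iq(raw)
print(len(samples))  # 4
print(samples[0])    # (0.5-0.5j)
```

For large captures a vectorized approach would be preferable; this version favours readability over speed.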

---

### Processing Representation

Data is unpacked and reshaped into:

```
[FrameSize x nFrames x nTriggers]
```
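
As an illustration of this layout, the sketch below splits a flat sample list into nested lists indexed as `cube[trigger][frame][sample]`. It assumes samples arrive in capture order (sample index varying fastest, then frame, then trigger); `reshape_dpw` is a hypothetical helper, and the actual PS code may use a different container.

```python
def reshape_dpw(flat, frame_size, n_frames, n_triggers):
    """Split a flat sample list into cube[trigger][frame][sample]."""
    expected = frame_size * n_frames * n_triggers
    if len(flat) != expected:
        raise ValueError(f"expected {expected} samples, got {len(flat)}")
    dpw = frame_size * n_frames  # samples per Data Process Window
    return [
        [flat[t * dpw + f * frame_size:t * dpw + (f + 1) * frame_size]
         for f in range(n_frames)]
        for t in range(n_triggers)
    ]


# Tiny example: 4 samples per frame, 3 frames per DPW, 2 triggers.
cube = reshape_dpw(list(range(24)), frame_size=4, n_frames=3, n_triggers=2)
print(cube[1][2])  # [20, 21, 22, 23]
```

Each `cube[t]` is one DPW, so per-trigger processing can iterate over the outer list.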

---

## Processing Pipeline (Current)

DMA
→ Unpack samples (I/Q separation)
→ Convert to complex representation
→ Reshape into 3D structure
→ Visualization / basic analysis

---

## Validation Support

Uses counter-based validation:

- Real part → sample counter
- Imag part → frame index

Enables verification of:

- Data continuity
- Frame alignment
- Correct ordering from DMA

---

## Execution Model

- Event-driven (DMA trigger)
- No buffering queue
- Frames may be dropped
- Triggered (event-based)
- Burst capture (DPW)
- Not continuous real-time streaming

---

## Performance Notes

- Bottleneck: unpacking + conversion
- Cannot sustain full-rate input
- Designed for correctness and validation (not optimized)
- Bottleneck: unpacking + data movement
- Full-rate continuous processing not supported

---

## Interaction with PL
## Role in System

### Tx Control
- Low-rate trigger (~Hz)
- Starts burst generation
The PS currently serves as:

### Rx Data
- Continuous high-rate stream
- Control interface
- Data acquisition manager
- Pre-processing stage

Future implementations will replace the current processing with advanced algorithms (e.g., FrFT).

---

## Future Work

- Replace processing with FrFT
- NEON optimization
- Throughput improvements
- FrFT-based processing
- Timestamp integration
- UDP streaming
- Optimization (NEON / vectorization)
- Metadata extraction (move complexity to PL)

---
