docs: update documentation for capture redesign and validation
27
README.md
@@ -11,21 +11,34 @@ The system implements a high-throughput signal chain in the FPGA (PL) and perfor
|
|||||||
## Current Status
|
## Current Status
|
||||||
|
|
||||||
- Tx subsystem: LFM pulse generator (DDS-based, complex output)
|
- Tx subsystem: LFM pulse generator (DDS-based, complex output)
|
||||||
- Rx subsystem: fully functional channelizer pipeline (PFB-based)
|
- Rx subsystem: fully functional channelizer pipeline (PFB-based) or bypass
|
||||||
- PL → PS interface: AXI4-Stream + DMA operational
|
- PL → PS interface: AXI4-Stream + DMA operational
|
||||||
- PS processing: frame-based algorithm (RMS + peak detection)
|
- PS processing: frame-based algorithm on a Data Process Window (DPW)
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## System Architecture
|
## System Architecture
|
||||||
|
|
||||||
ADC → Channelizer (PFB, 512 bins)
|
Tx (PL)
|
||||||
→ FFT_Capture (frame control)
|
→ Waveform Generator (LFM / CW / Pulsed)
|
||||||
→ FIFO Serializer (4 FIFOs → 1 stream)
|
→ DAC
|
||||||
→ AXI4-Stream (uint64)
|
→ RF Loopback / Input
|
||||||
|
|
||||||
|
Rx (PL)
|
||||||
|
→ ADC
|
||||||
|
→ Channelizer (PFB, 512 bins) / Bypass / Counter
|
||||||
|
→ Capture (frame control)
|
||||||
|
→ AXI4-Stream (128-bit, 4 samples/clock)
|
||||||
→ DMA (S2MM)
|
→ DMA (S2MM)
|
||||||
→ PS Memory
|
→ PS Memory
|
||||||
→ Processor Algorithm
|
→ Processor Algorithm
|
||||||
|
|
||||||
|
Post Processing (PS)
|
||||||
|
→ Triggered Capture
|
||||||
|
→ Sample Unpacking (I/Q)
|
||||||
|
→ Data Reshaping → [FrameSize x nFrames x nTriggers]
|
||||||
|
→ Host Communication / Processing / Visualization
|
||||||
|
→ One DPW is a window of FrameSize x nFrames samples
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|||||||
@@ -6,11 +6,9 @@
|
|||||||
|
|
||||||
## Overview
|
## Overview
|
||||||
|
|
||||||
The Rx subsystem implements a **polyphase filter bank (PFB) channelizer** followed by FFT processing.
|
The Rx subsystem implements a **polyphase filter bank (PFB) channelizer** followed by FFT processing, a **bypass path**, and a **multi-frame capture pipeline**.
|
||||||
|
|
||||||
It converts wideband ADC input into frequency-domain channels and streams the result to the PS.
|
It converts wideband ADC input into frequency-domain channels (or raw samples via bypass) and streams the result to the PS.
|
||||||
|
|
||||||
A **bypass path** is also available for raw data inspection and debugging.
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -18,67 +16,46 @@ A **bypass path** is also available for raw data inspection and debugging.
|
|||||||
|
|
||||||
### Channelizer Path (default)
|
### Channelizer Path (default)
|
||||||
|
|
||||||
ADC
|
ADC
|
||||||
↓
|
↓
|
||||||
PFB Channelizer (Decimation + Filtering)
|
PFB Channelizer (Decimation + Filtering)
|
||||||
↓
|
↓
|
||||||
FFT (512 bins)
|
FFT (512 bins)
|
||||||
↓
|
↓
|
||||||
FFT Capture
|
Capture (frame control)
|
||||||
↓
|
↓
|
||||||
FIFO Serializer (4 → 1)
|
AXI4-Stream (128-bit, 4 samples/clock)
|
||||||
↓
|
↓
|
||||||
AXI4-Stream
|
|
||||||
↓
|
|
||||||
DMA
|
DMA
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
### Bypass Path (Debug / Raw Data)
|
### Bypass Path (Debug / Raw Data)
|
||||||
|
|
||||||
ADC
|
ADC
|
||||||
↓
|
↓
|
||||||
Bypass Path
|
Bypass Path
|
||||||
↓
|
↓
|
||||||
FIFO / Serializer
|
Capture (frame control)
|
||||||
↓
|
↓
|
||||||
AXI4-Stream
|
AXI4-Stream (128-bit, 4 samples/clock)
|
||||||
↓
|
↓
|
||||||
DMA
|
DMA
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Bypass Functionality
|
## Capture Pipeline
|
||||||
|
|
||||||
The bypass allows direct observation of the input signal without channelization.
|
- Multi-frame acquisition (configurable nFrames)
|
||||||
|
- Frame size: 512 samples
|
||||||
### Purpose
|
- Supports asynchronous capture start (not frame-aligned)
|
||||||
|
- TLAST asserted at frame boundaries
|
||||||
- Debugging and validation
|
|
||||||
- Access to raw ADC-domain data
|
|
||||||
- Comparison with channelized output
|
|
||||||
- Verification of downstream processing
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### Behavior
|
### Behavior
|
||||||
|
|
||||||
- Input data is routed directly to output
|
- First frame may be partial
|
||||||
- No filtering or FFT applied
|
- A captured frame may contain up to 2 frame indices (expected, because the capture start is not frame-aligned)
|
||||||
- Maintains same output interface (AXI4-Stream)
|
- A DPW holds nFrames frames of samples but, with a non-aligned start, covers nFrames + 1 frame regions
|
||||||
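The behavior listed above can be illustrated with a small simulation. This is a sketch, not the PL implementation: it assumes a DPW is exactly `nFrames * 512` consecutive samples starting at an arbitrary offset inside a source frame, and the names `counter_stream` and `frame_indices_per_row` are hypothetical helpers introduced only for this example.

```python
FRAME_SIZE = 512  # samples per frame, as stated in the capture pipeline list


def counter_stream(start_offset, n_samples, frame_size=FRAME_SIZE):
    """Yield (sample_counter, frame_index) pairs, starting mid-frame if offset != 0."""
    for n in range(start_offset, start_offset + n_samples):
        yield n % frame_size, n // frame_size


def frame_indices_per_row(start_offset, n_frames, frame_size=FRAME_SIZE):
    """Cut one DPW into rows of frame_size samples and list the frame indices in each row."""
    samples = list(counter_stream(start_offset, n_frames * frame_size, frame_size))
    rows = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    return [sorted({frame for _, frame in row}) for row in rows]


# Assumed example: capture starts 100 samples into source frame 0.
rows = frame_indices_per_row(start_offset=100, n_frames=4)
touched = sorted({f for row in rows for f in row})
print(rows)     # [[0, 1], [1, 2], [2, 3], [3, 4]]
print(touched)  # [0, 1, 2, 3, 4]
```

With a non-zero offset every captured row straddles two source frames, and the 4-frame DPW touches 5 frame regions. With `start_offset=0` each row contains a single frame index.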
|
|
||||||
---
|
|
||||||
|
|
||||||
### Selection Mechanism
|
|
||||||
|
|
||||||
A selector signal chooses between:
|
|
||||||
|
|
||||||
- Channelizer output (normal operation)
|
|
||||||
- Bypass output (raw data)
|
|
||||||
|
|
||||||
Implementation typically uses:
|
|
||||||
- Parallel paths
|
|
||||||
- Output switching logic
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -86,22 +63,19 @@ Implementation typically uses:
|
|||||||
|
|
||||||
### ADC Input
|
### ADC Input
|
||||||
- Sampling rate: 4096 MSPS
|
- Sampling rate: 4096 MSPS
|
||||||
- Data type: **fixdt(1,16,15)** (Q1.15 format)
|
- Data type: **fixdt(1,16,15)** (Q1.15)
|
||||||
|
|
||||||
### PFB Channelizer
|
### PFB Channelizer
|
||||||
- Decimation: 8
|
- Decimation: 8
|
||||||
- Effective bandwidth: 512 MHz
|
- Effective bandwidth: 512 MHz
|
||||||
- Input and internal scaling aligned to Q1.15 domain
|
|
||||||
|
|
||||||
### FFT
|
### FFT
|
||||||
- Size: 512
|
- Size: 512
|
||||||
- Produces frequency bins
|
- Produces frequency bins
|
||||||
|
|
||||||
### FFT Capture
|
### Capture
|
||||||
- Controls frame boundaries
|
- Defines frame boundaries (512 samples)
|
||||||
|
- Generates TLAST
|
||||||
### FIFO Serializer
|
|
||||||
- Converts parallel streams into single stream
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -109,62 +83,57 @@ Implementation typically uses:
|
|||||||
|
|
||||||
### System Standardization
|
### System Standardization
|
||||||
|
|
||||||
The signal chain was standardized to a **Q1.15 fixed-point format (fixdt(1,16,15))**:
|
- End-to-end Q1.15 (**fixdt(1,16,15)**)
|
||||||
|
|
||||||
- DAC output uses Q1.15
|
|
||||||
- ADC input is reinterpreted as Q1.15 (Same Stored Integer)
|
|
||||||
- Channelizer input operates in this normalized domain
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### Channelizer Output Scaling
|
### Channelizer Output Scaling
|
||||||
|
|
||||||
- Native channelizer output: **sFix25_En23**
|
- Native: **sFix25_En23**
|
||||||
- Rescaled and quantized to: **fixdt(1,16,15)**
|
- Quantized to: **fixdt(1,16,15)** (round + saturate)
|
||||||
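As a rough illustration of the round + saturate step, the sketch below converts a stored integer in sFix25_En23 (23 fractional bits) to fixdt(1,16,15) (15 fractional bits). The rounding mode is an assumption (round to nearest, ties toward positive infinity); the docs only say "round + saturate". The function name `sfix25_en23_to_q15` is hypothetical.

```python
def sfix25_en23_to_q15(x):
    """Convert a signed 25-bit stored integer (23 fractional bits)
    to a signed 16-bit stored integer (15 fractional bits)."""
    shift = 23 - 15                                # 8 fractional bits are dropped
    rounded = (x + (1 << (shift - 1))) >> shift    # assumed mode: nearest, ties toward +inf
    return max(-32768, min(32767, rounded))        # saturate to the 16-bit signed range


print(sfix25_en23_to_q15(1 << 22))     # 16384  (0.5 stays 0.5)
print(sfix25_en23_to_q15(1 << 23))     # 32767  (1.0 saturates just below 1.0)
print(sfix25_en23_to_q15(-(1 << 24)))  # -32768 (-2.0 saturates to -1.0)
```

The native format can represent values in [-2, 2), while Q1.15 covers [-1, 1), so saturation is what handles the top bit of headroom.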
|
|
||||||
This conversion:
|
|
||||||
|
|
||||||
- Preserves signal dynamic range
|
|
||||||
- Maximizes fractional precision
|
|
||||||
- Uses rounding and saturation
|
|
||||||
- Aligns with system-wide numeric format
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
### Data Width Reduction
|
## Data Packing (Updated)
|
||||||
|
|
||||||
- Previous format: **50 bits per complex sample** (25 bits real + 25 bits imag)
|
- 4 samples per clock
|
||||||
- New format: **32 bits per complex sample** (16 bits real + 16 bits imag)
|
- Each sample: complex (16-bit real + 16-bit imag)
|
||||||
|
- Packed into **128-bit AXI4-Stream word**
|
||||||
|
|
||||||
Benefits:
|
Benefits:
|
||||||
|
- Matches datapath parallelism
|
||||||
- Reduced AXI bandwidth
|
- Efficient DMA transfers
|
||||||
- Reduced FIFO usage
|
- Eliminates need for serializer stage
|
||||||
- More efficient DMA transfers
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## AXI4-Stream Output
|
## AXI4-Stream Output
|
||||||
|
|
||||||
- Data type: uint32 (packed complex: 16-bit real + 16-bit imag)
|
- Width: 128 bits
|
||||||
|
- Contains 4 complex samples per cycle
|
||||||
- TLAST = frame boundary
|
- TLAST = frame boundary
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Data Format
|
## Debug / Validation Features
|
||||||
|
|
||||||
- Frame size: 512 samples
|
A counter-based debug mode is implemented:
|
||||||
- Complex samples packed into 32-bit words
|
|
||||||
|
- Real part → sample counter (0..511)
|
||||||
|
- Imag part → frame index
|
||||||
|
|
||||||
|
Used to validate:
|
||||||
|
- Sample continuity
|
||||||
|
- Frame boundaries
|
||||||
|
- DMA ordering and integrity
|
||||||
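A host-side check of the counter pattern could look like the following sketch. It assumes the samples have already been unpacked into `(real, imag)` integer pairs and that the frame index does not wrap within the checked span; `check_counter_stream` is a hypothetical helper, not part of the project.

```python
FRAME_SIZE = 512  # real part counts 0..511 inside each frame


def check_counter_stream(samples, frame_size=FRAME_SIZE):
    """Return the positions where (sample counter, frame index) does not
    follow directly from the previous sample."""
    errors = []
    prev = None
    for pos, (real, imag) in enumerate(samples):
        if prev is not None:
            p_real, p_imag = prev
            if p_real == frame_size - 1:
                expected = (0, p_imag + 1)      # frame boundary: counter wraps, index increments
            else:
                expected = (p_real + 1, p_imag)
            if (real, imag) != expected:
                errors.append(pos)
        prev = (real, imag)
    return errors


# Synthetic data: a capture that starts mid-frame, then the same data with one sample dropped.
good = [(n % FRAME_SIZE, n // FRAME_SIZE) for n in range(500, 1600)]
bad = good[:700] + good[701:]
print(check_counter_stream(good))  # []
print(check_counter_stream(bad))   # [700]
```

Because the check only compares neighbors, it works for a capture that begins in the middle of a frame.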
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Key Characteristics
|
## Key Characteristics
|
||||||
|
|
||||||
- Fully streaming pipeline
|
- Fully streaming pipeline
|
||||||
- High throughput
|
|
||||||
- Deterministic latency
|
- Deterministic latency
|
||||||
- Consistent fixed-point scaling (Q1.15 end-to-end)
|
- High throughput (4 samples/clock)
|
||||||
- Supports dual-mode operation (channelizer / bypass)
|
- Dual-mode operation (channelizer / bypass)
|
||||||
|
- Validated up to nFrames = 1024
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|||||||
@@ -1,4 +1,4 @@
|
|||||||
# 🧠 PS Subsystem (Control + Processing)
|
# 🧠 PS Subsystem (Control + Capture + Processing)
|
||||||
|
|
||||||
[🏠 Project Home](../README.md)
|
[🏠 Project Home](../README.md)
|
||||||
|
|
||||||
@@ -8,73 +8,128 @@
|
|||||||
|
|
||||||
The PS subsystem is responsible for:
|
The PS subsystem is responsible for:
|
||||||
|
|
||||||
|
- System initialization
|
||||||
- Configuring PL subsystems
|
- Configuring PL subsystems
|
||||||
|
- Triggering captures
|
||||||
- Receiving data via DMA
|
- Receiving data via DMA
|
||||||
- Performing frame-based processing
|
- Preparing data for processing and visualization
|
||||||
|
|
||||||
|
The current implementation acts as a **placeholder for post-processing**, focusing on reliable data acquisition and host interaction.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Responsibilities
|
## Responsibilities
|
||||||
|
|
||||||
### Control
|
### Control & Initialization
|
||||||
|
|
||||||
- Writes parameters to PL registers:
|
- Configure PL parameters:
|
||||||
- Tx generator configuration
|
- Tx waveform configuration
|
||||||
- Generates TxPulseStart trigger
|
- Capture parameters (nFrames, etc.)
|
||||||
|
- Initialize DMA and memory buffers
|
||||||
|
- Manage system startup
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Trigger & Capture
|
||||||
|
|
||||||
|
- Generates capture trigger (software-controlled)
|
||||||
|
- Controls DPW acquisition timing
|
||||||
|
- Each trigger initiates one DPW capture
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
### DMA Handling
|
### DMA Handling
|
||||||
|
|
||||||
- AXI4-Stream → DMA (S2MM)
|
- AXI4-Stream → DMA (S2MM)
|
||||||
- Data stored in PS DDR
|
- Receives **128-bit stream** (4 samples per clock)
|
||||||
|
- Stores data in PS DDR memory
|
||||||
|
|
||||||
Configuration:
|
Configuration:
|
||||||
- Frame size: 512
|
- Frame size: 512 samples
|
||||||
- Buffers: 16
|
- nFrames: configurable (validated up to 1024)
|
||||||
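For buffer planning, the figures above translate into sizes as in this sketch. The constants come from the documented format (512-sample frames, 32-bit complex samples, 4 samples per 128-bit word); `dpw_sizes` is a hypothetical helper and says nothing about how the actual DMA driver allocates memory.

```python
FRAME_SIZE = 512       # samples per frame
BYTES_PER_SAMPLE = 4   # 16-bit real + 16-bit imag
SAMPLES_PER_BEAT = 4   # one 128-bit AXI4-Stream word carries 4 samples


def dpw_sizes(n_frames):
    """Sizes of one DPW capture for a given nFrames."""
    samples = FRAME_SIZE * n_frames
    return {
        "samples": samples,
        "bytes": samples * BYTES_PER_SAMPLE,
        "beats": samples // SAMPLES_PER_BEAT,
        "beats_per_frame": FRAME_SIZE // SAMPLES_PER_BEAT,
    }


print(dpw_sizes(1024))
# {'samples': 524288, 'bytes': 2097152, 'beats': 131072, 'beats_per_frame': 128}
```

At the validated maximum of nFrames = 1024, one DPW occupies 2 MiB of DDR per trigger.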
|
|
||||||
---
|
---
|
||||||
|
|
||||||
### Processing Pipeline
|
## Data Format
|
||||||
|
|
||||||
DMA → uint64[512]
|
### Raw DMA Data
|
||||||
→ unpack real/imag
|
|
||||||
→ convert to complex
|
- Packed complex samples
|
||||||
→ RMS + peak detection
|
- 16-bit real + 16-bit imag per sample
|
||||||
|
- 4 samples per 128-bit word
|
||||||
|
|
||||||
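The packing above could be unpacked on the host roughly as follows. The byte layout is an assumption, since the docs do not state it: little-endian, sample 0 in the lowest-addressed bytes, real part before imaginary part. `unpack_word` and `unpack_buffer` are hypothetical names.

```python
import struct


def unpack_word(word16):
    """Unpack one 16-byte (128-bit) word into 4 complex samples.
    Assumed layout: little-endian int16 fields, ordered re0, im0, re1, im1, ..."""
    fields = struct.unpack("<8h", word16)
    return [complex(fields[i], fields[i + 1]) for i in range(0, 8, 2)]


def unpack_buffer(buf):
    """Unpack a raw DMA buffer (bytes) into a flat list of complex samples."""
    if len(buf) % 16:
        raise ValueError("buffer length must be a multiple of 16 bytes")
    samples = []
    for offset in range(0, len(buf), 16):
        samples.extend(unpack_word(buf[offset:offset + 16]))
    return samples


word = struct.pack("<8h", 1, -1, 2, -2, 3, -3, 4, -4)
print(unpack_buffer(word))  # [(1-1j), (2-2j), (3-3j), (4-4j)]
```

The values are left as stored integers here; dividing by 2**15 would give the Q1.15 real-world value.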
|
---
|
||||||
|
|
||||||
|
### Processing Representation
|
||||||
|
|
||||||
|
Data is unpacked and reshaped into:
|
||||||
|
|
||||||
|
```
|
||||||
|
[FrameSize x nFrames x nTriggers]
|
||||||
|
```
|
||||||
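One way to picture the reshape is the sketch below, written with plain lists. It assumes the flat sample order is sample index fastest, then frame, then trigger, which matches a column-major reading of `[FrameSize x nFrames x nTriggers]`; the nested lists are therefore indexed as `data[trigger][frame][sample]`. `reshape_dpw` is a hypothetical helper.

```python
def reshape_dpw(flat, frame_size, n_frames, n_triggers):
    """Split a flat sample list into data[trigger][frame][sample].
    Assumed order in `flat`: sample fastest, then frame, then trigger."""
    expected = frame_size * n_frames * n_triggers
    if len(flat) != expected:
        raise ValueError(f"expected {expected} samples, got {len(flat)}")
    per_trigger = frame_size * n_frames
    return [
        [
            flat[t * per_trigger + f * frame_size:
                 t * per_trigger + (f + 1) * frame_size]
            for f in range(n_frames)
        ]
        for t in range(n_triggers)
    ]


# Tiny dimensions so the result is easy to read.
cube = reshape_dpw(list(range(24)), frame_size=4, n_frames=3, n_triggers=2)
print(cube[1][2])  # [20, 21, 22, 23]
```

Each `cube[t]` is then one DPW of `nFrames` rows of `FrameSize` samples.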
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Processing Pipeline (Current)
|
||||||
|
|
||||||
|
DMA
|
||||||
|
→ Unpack samples (I/Q separation)
|
||||||
|
→ Convert to complex representation
|
||||||
|
→ Reshape into 3D structure
|
||||||
|
→ Visualization / basic analysis
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Validation Support
|
||||||
|
|
||||||
|
Uses counter-based validation:
|
||||||
|
|
||||||
|
- Real part → sample counter
|
||||||
|
- Imag part → frame index
|
||||||
|
|
||||||
|
Enables verification of:
|
||||||
|
|
||||||
|
- Data continuity
|
||||||
|
- Frame alignment
|
||||||
|
- Correct ordering from DMA
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Execution Model
|
## Execution Model
|
||||||
|
|
||||||
- Event-driven (DMA trigger)
|
- Triggered (event-based)
|
||||||
- No buffering queue
|
- Burst capture (DPW)
|
||||||
- Frames may be dropped
|
- Not continuous real-time streaming
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Performance Notes
|
## Performance Notes
|
||||||
|
|
||||||
- Bottleneck: unpacking + conversion
|
- Designed for correctness and validation (not optimized)
|
||||||
- Cannot sustain full-rate input
|
- Bottleneck: unpacking + data movement
|
||||||
|
- Full-rate continuous processing not supported
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Interaction with PL
|
## Role in System
|
||||||
|
|
||||||
### Tx Control
|
The PS currently serves as:
|
||||||
- Low-rate trigger (~Hz)
|
|
||||||
- Starts burst generation
|
|
||||||
|
|
||||||
### Rx Data
|
- Control interface
|
||||||
- Continuous high-rate stream
|
- Data acquisition manager
|
||||||
|
- Pre-processing stage
|
||||||
|
|
||||||
|
Future implementations will replace the current processing with more advanced algorithms (e.g., the fractional Fourier transform, FrFT).
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Future Work
|
## Future Work
|
||||||
|
|
||||||
- Replace processing with FrFT
|
- FrFT-based processing
|
||||||
- NEON optimization
|
- Timestamp integration
|
||||||
- Throughput improvements
|
- UDP streaming
|
||||||
|
- Optimization (NEON / vectorization)
|
||||||
|
- Metadata extraction (move complexity to PL)
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|||||||