Class ComputeParallelReduction

Synopsis

#include <Source/Falcor/Utils/Algorithm/ComputeParallelReduction.h>

class dlldecl ComputeParallelReduction : public std::enable_shared_from_this<ComputeParallelReduction>

Description

Class that performs parallel reduction over all pixels in a texture.

The reduction is performed recursively on blocks of n = 1024 elements. Each iteration shrinks the element count by a factor of 1024 = 2^10, so the total number of iterations is ceil(log2(N)/10), where N is the total number of elements (pixels).
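
As a rough illustration of the iteration count, the sketch below counts how many passes are needed when every pass collapses blocks of 1024 elements into one value. The helper name reductionPassCount is hypothetical and not part of Falcor; it only mirrors the formula ceil(log2(N)/10) on the CPU.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Falcor): number of passes needed to reduce
// `elementCount` values to one when each pass collapses blocks of `blockSize`
// elements into a single value. For blockSize = 1024 this matches
// ceil(log2(N) / 10).
uint32_t reductionPassCount(uint64_t elementCount, uint64_t blockSize = 1024)
{
    assert(blockSize >= 2); // a block of one element would never shrink the input
    uint32_t passes = 0;
    while (elementCount > 1)
    {
        // Ceiling division: a partially filled last block still yields one value.
        elementCount = (elementCount + blockSize - 1) / blockSize;
        ++passes;
    }
    return passes;
}
```

Under these assumptions a 1920x1080 texture (2,073,600 pixels) takes three passes: 2,073,600 values become 2,025, then 2, then 1.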

The numerical error for the summation operation lies between pairwise summation (blocks of size n = 2) and naive running summation.
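
The effect of the block size on rounding error can be sketched on the CPU. The function below is a simplified, hypothetical model and not the Falcor compute shader: it sums each block sequentially, whereas the GPU implementation may combine the values inside a block in a different order. With a block size of 2 it behaves like pairwise summation, and with a block size at least as large as the input it degenerates to a naive running sum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CPU model of a blocked reduction: repeatedly collapse
// consecutive blocks of `blockSize` elements into one partial sum until a
// single value remains. All arithmetic is done in float on purpose, so the
// rounding behaviour of the different block sizes can be compared.
float blockedSum(std::vector<float> values, std::size_t blockSize)
{
    assert(blockSize >= 2); // smaller blocks would never shrink the input
    if (values.empty()) return 0.0f;

    while (values.size() > 1)
    {
        std::vector<float> partials;
        partials.reserve((values.size() + blockSize - 1) / blockSize);
        for (std::size_t begin = 0; begin < values.size(); begin += blockSize)
        {
            const std::size_t end = std::min(begin + blockSize, values.size());
            float sum = 0.0f;
            for (std::size_t i = begin; i < end; ++i) sum += values[i];
            partials.push_back(sum); // one partial result per block
        }
        values = std::move(partials); // next iteration reduces the partials
    }
    return values[0];
}
```

For example, summing 2^20 copies of 0.1f with blockSize = 1024 stays much closer to a double-precision reference than the naive running sum obtained with blockSize equal to the input size.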

Inheritance

Ancestors: std::enable_shared_from_this< ComputeParallelReduction >

Methods

~ComputeParallelReduction
create      Create parallel reduction helper.
execute     Perform parallel reduction.

Source

Lines 45-99 in Source/Falcor/Utils/Algorithm/ComputeParallelReduction.h.

class dlldecl ComputeParallelReduction : public std::enable_shared_from_this<ComputeParallelReduction>
{
public:
    using SharedPtr = std::shared_ptr<ComputeParallelReduction>;
    using SharedConstPtr = std::shared_ptr<const ComputeParallelReduction>;
    virtual ~ComputeParallelReduction() = default;
    enum class Type
    {
        Sum,
        MinMax,
    };
    /** Create parallel reduction helper.
        \return Created object, or an exception is thrown on failure.
    */
    static SharedPtr create();
    /** Perform parallel reduction.
        The computations are performed in type T, which must be compatible with the texture format:
        - float4 for floating-point texture formats (float, snorm, unorm).
        - uint4 for unsigned integer texture formats.
        - int4 for signed integer texture formats.
        For the Sum operation, unused components are set to zero if texture format has < 4 components.
        For performance reasons, it is advisable to store the result in a buffer on the GPU,
        and then issue an asynchronous readback in user code to avoid a full GPU flush.
        The size of the result buffer depends on the executed operation:
        - Sum needs 16B
        - MinMax needs 32B
        \param[in] pRenderContext The render context.
        \param[in] pInput Input texture.
        \param[in] operation Reduction operation.
        \param[out] pResult (Optional) The result of the reduction operation is stored here if non-nullptr. Note that this requires a GPU flush!
        \param[out] pResultBuffer (Optional) Buffer on the GPU to which the result is copied (16B or 32B).
        \param[out] resultOffset (Optional) Byte offset into pResultBuffer to where the result should be stored.
        \return True if successful, false if an error occurred.
    */
    template<typename T>
    bool execute(RenderContext* pRenderContext, const Texture::SharedPtr& pInput, Type operation, T* pResult = nullptr, Buffer::SharedPtr pResultBuffer = nullptr, uint64_t resultOffset = 0);
private:
    ComputeParallelReduction();
    void allocate(uint32_t elementCount, uint32_t elementSize);
    ComputeState::SharedPtr             mpState;
    ComputeProgram::SharedPtr           mpInitialProgram;
    ComputeProgram::SharedPtr           mpFinalProgram;
    ComputeVars::SharedPtr              mpVars;
    Buffer::SharedPtr                   mpBuffers[2];       ///< Intermediate buffers for reduction iterations.
};
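
The result sizes quoted in the execute documentation (16B for Sum, 32B for MinMax) follow from the four-component element type. The sketch below is a self-contained illustration with hypothetical stand-in types (Vec4f, MinMaxResult, ReductionType) rather than Falcor's own float4 and ComputeParallelReduction::Type. The ordering of minimum before maximum in the MinMax result is an assumption; the header excerpt above does not state it.

```cpp
#include <cstddef>

// Hypothetical mirror of ComputeParallelReduction::Type, for illustration only.
enum class ReductionType { Sum, MinMax };

// Stand-in for a 4-component vector such as float4 (uint4 and int4 have the same size).
struct Vec4f { float x, y, z, w; };

// Assumed layout of a MinMax result: one 4-component minimum followed by one
// 4-component maximum. The order is an assumption, not confirmed by the header.
struct MinMaxResult
{
    Vec4f minValue;
    Vec4f maxValue;
};

static_assert(sizeof(Vec4f) == 16, "Sum writes one 4-component value (16B)");
static_assert(sizeof(MinMaxResult) == 32, "MinMax writes two 4-component values (32B)");

// Bytes that a result buffer must provide at resultOffset for the given operation.
constexpr std::size_t resultSizeBytes(ReductionType operation)
{
    return operation == ReductionType::Sum ? sizeof(Vec4f) : sizeof(MinMaxResult);
}
```

A caller that packs several results into one buffer could use such a helper to check that resultOffset plus the result size does not exceed the buffer size before calling execute.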




