VideoFrame.to_ndarray() crashes on Windows DirectShow bottom-up bgr24 / rgb24 frames with negative line_size #2213

@haotruongnhat

Description

This seems to happen with bottom-up raw frames from dshow. The NumPy array returned by to_ndarray() appears to be backed by an invalid buffer description, and the process crashes later when the array is fully read, for example via .copy(). Any suggestions on how to resolve this?

Observed frame properties

From the failing frame:

from av import VideoFrame

def on_image(image: VideoFrame):
    # Dump the layout of plane 0 before converting the frame
    print(
        "format=", image.format.name,
        "size=", (image.width, image.height),
        "line_size=", image.planes[0].line_size,
        "buffer_ptr=", hex(image.planes[0].buffer_ptr),
        "buffer_size=", image.planes[0].buffer_size,
    )

    buffer = image.to_ndarray(format="bgr24")
    frame = buffer.copy()  # process crashes here

and I got

format= bgr24 size= (3840, 2160) line_size= -11520 buffer_ptr= 0x147a14d9380 buffer_size= 24883200

Environment

  • OS: Windows
  • Input: DirectShow (dshow)
  • Frame format: bgr24
  • Resolution: 3840x2160

For 3840x2160 in bgr24, the expected row size is:

3840 * 3 = 11520 bytes

So line_size = -11520 indicates a bottom-up packed frame with negative stride.
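
To make the stride arithmetic concrete, here is a minimal sketch in plain Python. It assumes FFmpeg's convention that a plane's data pointer addresses the first visual row and that row y lives at pointer + y * line_size. Whether buffer_ptr in PyAV is exactly that pointer is an assumption here, and memory_span is a hypothetical helper, not a PyAV function.

```python
def memory_span(height: int, line_size: int, row_bytes: int) -> tuple[int, int]:
    """Return (lowest, highest_exclusive) byte offsets touched by a plane,
    relative to its data pointer, for a signed line_size."""
    first_row = 0                          # offset of visual row 0
    last_row = (height - 1) * line_size    # offset of the last visual row
    lowest = min(first_row, last_row)
    highest = max(first_row, last_row) + row_bytes
    return lowest, highest


width, height = 3840, 2160
row_bytes = width * 3  # bgr24: 3 bytes per pixel

top_down = memory_span(height, row_bytes, row_bytes)
bottom_up = memory_span(height, -row_bytes, row_bytes)
print("top-down :", top_down)    # (0, 24883200)
print("bottom-up:", bottom_up)   # (-24871680, 11520)
```

Under these assumptions, almost all valid bytes of the bottom-up frame lie before the pointer. A consumer that treats the plane as the range [ptr, ptr + buffer_size) would read roughly 24.8 MB past the end of the allocation, which would be consistent with a crash on .copy().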

Important observation

If I insert an FFmpeg vflip filter before the callback, the crash disappears.

Behavior I observed:

  • DirectShow bgr24 frame without vflip: crashes
  • Same device and payload, calling to_ndarray(format="rgb24") and then reversing the channel order with bgr = rgb[:, :, ::-1]: works
  • Same device and payload with vflip: works

This suggests the problem is not the stream metadata itself, but the memory layout of the original bottom-up packed frame. vflip likely materializes a new frame with a normal top-down layout.

Expected behavior

VideoFrame.to_ndarray() should safely handle packed rgb24 / bgr24 frames with a negative line_size, either by:

  • allocating and copying rows manually in the correct order, or
  • avoiding the no-copy path when line_size < 0
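
The first option above could look roughly like the sketch below, written with ctypes only so it runs without PyAV. It assumes data_ptr is the address of visual row 0 and that line_size is signed, as in FFmpeg. copy_rows_top_down is a hypothetical helper, and the demo builds its own tiny bottom-up buffer rather than using a real DirectShow frame.

```python
import ctypes


def copy_rows_top_down(data_ptr: int, line_size: int, height: int, row_bytes: int) -> bytes:
    """Copy `height` rows into a tightly packed, top-down buffer.

    Assumption: data_ptr addresses visual row 0 and row y starts at
    data_ptr + y * line_size, where line_size may be negative.
    """
    out = bytearray(row_bytes * height)
    for y in range(height):
        src = data_ptr + y * line_size
        out[y * row_bytes:(y + 1) * row_bytes] = ctypes.string_at(src, row_bytes)
    return bytes(out)


# Demo: a 2x3 "bgr24" image stored bottom-up in memory.
width, height = 2, 3
row_bytes = width * 3
top_down = bytes(range(row_bytes * height))
rows = [top_down[i * row_bytes:(i + 1) * row_bytes] for i in range(height)]
bottom_up = b"".join(reversed(rows))

storage = ctypes.create_string_buffer(bottom_up, len(bottom_up))
# Visual row 0 is the last row in memory, so the pointer starts there.
data_ptr = ctypes.addressof(storage) + (height - 1) * row_bytes

result = copy_rows_top_down(data_ptr, -row_bytes, height, row_bytes)
print(result == top_down)
```

For a real frame, the inputs would presumably be image.planes[0].buffer_ptr and image.planes[0].line_size, provided buffer_ptr matches the assumption above; that has not been verified against PyAV here. The same function handles a positive line_size unchanged, so no separate branch is needed for top-down frames.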

Actual behavior

The process crashes when the ndarray is read, often at:

buffer.copy()

Additional note

Native FFmpeg does not support RGB/BGR buffers on Windows, so I used the following patch for my pyav-ffmpeg build:

diff --git a/libavdevice/dshow.c b/libavdevice/dshow.c
index 6e97304..6a55584 100644
--- a/libavdevice/dshow.c
+++ b/libavdevice/dshow.c
@@ -56,6 +56,10 @@
 #   define AMCONTROL_COLORINFO_PRESENT 0x00000080 // if set, indicates DXVA color info is present in the upper (24) bits of the dwControlFlags
 #endif
 
+#define DSHOW_MEDIASUBTYPE_RGB565 0xe436eb7b
+#define DSHOW_MEDIASUBTYPE_RGB555 0xe436eb7c
+#define DSHOW_MEDIASUBTYPE_RGB24  0xe436eb7d
+#define DSHOW_MEDIASUBTYPE_RGB32  0xe436eb7e
 
 static enum AVPixelFormat dshow_pixfmt(DWORD biCompression, WORD biBitCount)
 {
@@ -76,10 +80,33 @@ static enum AVPixelFormat dshow_pixfmt(DWORD biCompression, WORD biBitCount)
             case 32:
                 return AV_PIX_FMT_0RGB32;
         }
+    case DSHOW_MEDIASUBTYPE_RGB565:
+        return AV_PIX_FMT_RGB565;
+    case DSHOW_MEDIASUBTYPE_RGB555:
+        return AV_PIX_FMT_RGB555;
+    case DSHOW_MEDIASUBTYPE_RGB24:
+        return AV_PIX_FMT_BGR24;
+    case DSHOW_MEDIASUBTYPE_RGB32:
+        return AV_PIX_FMT_0RGB32;
     }
     return avpriv_pix_fmt_find(PIX_FMT_LIST_RAW, biCompression); // all others
 }
 
+static int dshow_is_bottomup_rgb(DWORD biCompression)
+{
+    switch (biCompression) {
+    case BI_RGB:
+    case BI_BITFIELDS:
+    case DSHOW_MEDIASUBTYPE_RGB565:
+    case DSHOW_MEDIASUBTYPE_RGB555:
+    case DSHOW_MEDIASUBTYPE_RGB24:
+    case DSHOW_MEDIASUBTYPE_RGB32:
+        return 1;
+    default:
+        return 0;
+    }
+}
+
 static enum AVColorRange dshow_color_range(DXVA2_ExtendedFormat *fmt_info)
 {
     switch (fmt_info->NominalRange)
@@ -1581,7 +1608,7 @@ dshow_add_device(AVFormatContext *avctx,
         par->codec_type = AVMEDIA_TYPE_VIDEO;
         par->width      = fmt_info->width;
         par->height     = fmt_info->height;
-        par->codec_tag  = bih->biCompression;
+        par->codec_tag  = fmt_info->pix_fmt == AV_PIX_FMT_NONE ? bih->biCompression : 0;
         par->format     = fmt_info->pix_fmt;
         if (bih->biCompression == MKTAG('H', 'D', 'Y', 'C')) {
             av_log(avctx, AV_LOG_DEBUG, "attempt to use full range for HDYC...\n");
@@ -1594,7 +1621,7 @@ dshow_add_device(AVFormatContext *avctx,
         par->chroma_location = fmt_info->chroma_loc;
         par->codec_id = fmt_info->codec_id;
         if (par->codec_id == AV_CODEC_ID_RAWVIDEO) {
-            if (bih->biCompression == BI_RGB || bih->biCompression == BI_BITFIELDS) {
+            if (dshow_is_bottomup_rgb(bih->biCompression)) {
                 par->bits_per_coded_sample = bih->biBitCount;
                 if (par->height < 0) {
                     par->height *= -1;
