[gs-bugs] [Bug 698759] - MuPDF - uncaught exception: non-page object in page tree

Discussion:

b***@artifex.com

2017-11-17 15:50:33 UTC

http://bugs.ghostscript.com/show_bug.cgi?id=698759

Bug ID: 698759
Summary: uncaught exception: non-page object in page tree
Product: MuPDF
Version: 1.11
Hardware: PC
OS: Windows NT
Status: UNCONFIRMED
Severity: normal
Priority: P4
Component: mupdf
Assignee: mupdf-***@artifex.com
Reporter: ***@outlook.de
QA Contact: gs-***@ghostscript.com
Word Size: ---

Function "pdf_load_page_tree" in file pdf_page.c does not catch any exceptions,
and function "pdf_load_page_tree_imp" does not catch every exception that can
be thrown, e.g. the fz_throw in line 58 is uncaught.

--
You are receiving this mail because:
You are the QA Contact for the bug.

b***@artifex.com

2017-11-17 23:21:28 UTC

Tor Andersson <***@artifex.com> changed:

What       |Removed      |Added
----------------------------------------------------------------------------
CC         |             |***@artifex.com

--- Comment #1 from Tor Andersson <***@artifex.com> ---
That sounds like a broken file. Please attach the problematic file and we can
take a look.

b***@artifex.com

2017-11-19 00:39:56 UTC

--- Comment #2 from Jorj <***@outlook.de> ---
Created attachment 14469
--> http://bugs.ghostscript.com/attachment.cgi?id=14469&action=edit
a damaged pdf (tail cut off)

This sure *is* a corrupt PDF. As a co-author of a Python binding to MuPDF, I
need every exception raised in fz_open_document to be caught and converted to
a Python exception, so that the Python interpreter does not crash.
This corrupted document has no valid / recoverable page tree and brings down
the Python environment with an uncaught exception.
When I insert fz_try(ctx) / fz_catch(ctx) in "pdf_load_page_tree" of
pdf_page.c, the PDF is "successfully" opened with zero pages ... thus creating
a recoverable situation.
Even more: images can be extracted successfully without any accessible page by
running through the XREF table ...

b***@artifex.com

2017-11-20 14:56:02 UTC

--- Comment #3 from Tor Andersson <***@artifex.com> ---
The only way it could bring down the Python environment is if you're
calling MuPDF functions without guarding them with fz_try/fz_catch. Every
function that takes an fz_context is allowed to throw an exception.

If you're writing Python bindings for MuPDF, I *strongly* suggest you bracket
every single MuPDF function call with fz_try, and recast the fitz exception to
a native Python exception in the fz_catch.

You should be doing something like this for every function you bind:

    fz_try(ctx)
        doc = fz_open_document(ctx, path);
    fz_catch(ctx) {
        /* Recast the fitz exception as a native Python exception. */
        PyErr_SetString(MuPDFError, fz_caught_message(ctx));
        return;
    }

See the bindings in source/tools/murun.c and platform/java/mupdf_native.c for
examples of doing that for JavaScript and Java.
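
To illustrate why an unguarded call can take the whole host process down, here is a minimal, self-contained sketch of a setjmp/longjmp based guard. It is not MuPDF code: the toy_context type and the toy_throw, toy_load_outline and wrapped_load_outline functions are hypothetical stand-ins invented for this example, and the assumption is only that the real fz_try/fz_catch macros behave similarly in spirit. The wrapper turns a thrown error into an ordinary error return, which a binding could then recast as a native exception.

```c
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for fz_context: one guard slot and a message. */
typedef struct {
    jmp_buf *top;       /* innermost active guard, or NULL if there is none */
    char message[128];  /* text of the most recent error */
} toy_context;

/* Stand-in for fz_throw: record the message, unwind to the nearest guard. */
static void toy_throw(toy_context *ctx, const char *msg)
{
    snprintf(ctx->message, sizeof ctx->message, "%s", msg);
    if (ctx->top == NULL) {
        /* No guard on the call stack: the process itself is terminated. */
        fprintf(stderr, "uncaught error: %s\n", ctx->message);
        exit(EXIT_FAILURE);
    }
    longjmp(*ctx->top, 1);
}

/* Stand-in for a library call that may throw, e.g. on a damaged file. */
static int toy_load_outline(toy_context *ctx, int damaged)
{
    if (damaged)
        toy_throw(ctx, "non-page object in page tree");
    return 3; /* pretend the outline has three entries */
}

/* Binding-style wrapper: 0 on success, -1 on error with the text in err. */
static int wrapped_load_outline(toy_context *ctx, int damaged,
                                int *count, char *err, size_t errlen)
{
    jmp_buf guard;
    jmp_buf *outer = ctx->top;  /* remember the enclosing guard, if any */

    ctx->top = &guard;
    if (setjmp(guard) == 0) {
        /* "try" part: runs once, left early if the call throws. */
        int n = toy_load_outline(ctx, damaged);
        ctx->top = outer;
        *count = n;
        return 0;
    }
    /* "catch" part: reached through longjmp from toy_throw. */
    ctx->top = outer;
    snprintf(err, errlen, "%s", ctx->message);
    return -1;
}
```

Calling toy_load_outline(ctx, 1) directly, with no guard installed, ends in exit(); going through wrapped_load_outline returns -1 and leaves the message in err, so the caller stays alive and can report the failure.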

b***@artifex.com

2017-11-21 14:04:26 UTC

Jorj <***@outlook.de> changed:

What       |Removed      |Added
----------------------------------------------------------------------------
Resolution |---          |FIXED
Status     |UNCONFIRMED  |RESOLVED

--- Comment #4 from Jorj <***@outlook.de> ---
(In reply to Tor Andersson from comment #3)

Thank you for your time!
Of course I'm doing what you suggest ... at least I thought so. However, your
advice prompted me to take a closer look:
This PDF is indeed opened, though with errors. My call to "fz_load_outline" was
not wrapped in an fz_try / fz_catch sequence.
Thanks again.
