Discussion:
[gs-bugs] [Bug 696765] - Ghostscript - Support SOURCE_DATE_EPOCH for reproducible builds
b***@artifex.com
2016-05-10 09:51:19 UTC
http://bugs.ghostscript.com/show_bug.cgi?id=696765

Bug ID: 696765
Summary: Support SOURCE_DATE_EPOCH for reproducible builds
Product: Ghostscript
Version: unspecified
Hardware: All
URL: https://wiki.debian.org/ReproducibleBuilds/TimestampsProposal
OS: All
Status: UNCONFIRMED
Severity: normal
Priority: P4
Component: General
Assignee: ghostpdl-***@artifex.com
Reporter: ***@pwned.gg
QA Contact: gs-***@ghostscript.com
Word Size: ---

Created attachment 12529
--> http://bugs.ghostscript.com/attachment.cgi?id=12529&action=edit
Allow the build timestamp to be externally set

Hi, we at the Reproducible Builds project have developed a standard for build
tools to follow if they wish to support exact bitwise reproducible output.
Bitwise reproducibility is essential for automatically verifying that multiple
builders reached the same result, since (for example) it is impossible to
develop a general algorithm to say that two different timestamps embedded in
*arbitrary* code or data actually "mean" the same thing.

Attached is a patch to make ghostscript support the SOURCE_DATE_EPOCH
environment variable. When set, all references to the "current" date/time in
the build output will instead refer to this date, which is the number of
seconds (excluding leap seconds) since the Unix epoch (1970-01-01 UTC in the
Gregorian calendar). We have already been using this in Debian with success at
making ghostscript generate bitwise reproducible output.
--
You are receiving this mail because:
You are the QA Contact for the bug.
b***@artifex.com
2016-05-10 10:30:01 UTC

--- Comment #1 from Ken Sharp <***@artifex.com> ---
I think there is, at the least, some confusion over terminology here. You refer
to the 'build timestamp', yet you are modifying the current time function.

We don't have a 'build timestamp'; the closest thing in a Ghostscript build
would be the release time stamp, which is compiled in as a string.

So I'm going to conclude that you are referring to the Creation and
Modification dates in PDF output exclusively, and have not considered the
rendering of PostScript files. PostScript has functions to access the real time
clock, and your patch would break that. Some standard PostScript test suites
use the clock functions, and print the result on the output, and use two calls
to the clock to determine elapsed time, which is also printed. At least one
well-known PostScript test file fails (undefined result, i.e. divide by zero) if
the two times are identical.

In addition, your patch only addresses Linux; we would need to concoct similar
code for all the platform-specific files, or risk causing confusion by having
different behaviour on different platforms.

So, assuming that your concern is solely the date and time stamps in the PDF
files produced by the pdfwrite device; we are not happy about producing files
which lie about the time. We are prepared to consider a command line option to
prevent the inclusion of the CreationDate and ModDate, and the XML Metadata, as
an enhancement if this is sufficient for your purposes.

Of course, it seems to us that this would be failing part of your objective,
since you won't be testing the time functions in different builds in this way,
but that is going to be true no matter what you do if you require that the
creation dates be the same.
b***@artifex.com
2016-05-10 11:47:52 UTC

--- Comment #2 from ***@pwned.gg ---
Whoops, sorry about the jargon. You are right, "build timestamp" refers to the
context where ghostscript is used as part of a build process - i.e. to generate
some output from some source code, which is meant to be consumed later by a
reader.

I'm not familiar with PostScript and sorry that the patch omits things. I was
taking over the work from someone else and didn't review it too closely. But
we'd be happy to improve it to the standards that you need.

Regarding "PostScript has functions to access the real time clock", there are
two things (from my side, being unfamiliar with the format) that this could
mean:

1. When the reader reads the end result X.ps, this dynamically adds the time of
reading into the displayed file.
2. When the builder builds X.ps from X.src, this dynamically embeds the time of
build into X.ps, and future readers of X.ps see this as a static piece of
information.

(1) is fine for Reproducible Builds as presumably this would be a call to a
function, and this call itself would be represented as a static string of bytes
in X.ps.

However, the whole point of SOURCE_DATE_EPOCH is that (2) is really not what
people *actually mean* in practice, and people only historically used it
because they didn't have the better alternative of SOURCE_DATE_EPOCH available.
From this perspective, you would not be "[lying] about the time" - the effect
would be roughly the same if the build machine had set its own system time
clock back to that date. It is not ghostscript's job to override the intentions
of the system administrator, and similarly it is not ghostscript's job to judge
that "someone who sets SOURCE_DATE_EPOCH is lying about the time" and ignore it
for that reason.

(The reason we have SOURCE_DATE_EPOCH is that in practice setting the system
time breaks some other behaviours, and doesn't work in the case where e.g. your
build process takes between 500 and 508 seconds and you generate X.ps in the
last 90% of the build process.)
Post by b***@artifex.com
We are prepared to consider a command line option to prevent the inclusion of the CreationDate and ModDate, and the XML Metadata, as an enhancement if this is sufficient for your purposes.
This would be a less-than-ideal alternative - the reason we came up with
SOURCE_DATE_EPOCH is so that builders wouldn't need to hard-code tool-specific
command line options everywhere. For example, GCC accepted our patches recently
and GCC 7+ will honour SOURCE_DATE_EPOCH for the __TIME__ and __DATE__ macros.
Other documentation generators like doxygen and sphinx have also accepted our
patches.
b***@artifex.com
2016-05-10 12:02:11 UTC

--- Comment #3 from Ken Sharp <***@artifex.com> ---
(In reply to infinity0 from comment #2)
Post by b***@artifex.com
Whoops, sorry about the jargon. You are right, "build timestamp" refers to
the context where ghostscript is used as part of a build process - i.e. to
generate some output from some source code, which is meant to be consumed
later by a reader.
So, not simply confined to PDF then, but any input to any output format?
Post by b***@artifex.com
Regarding "PostScript has functions to access the real time clock", there
are two things (from my side, being unfamiliar with the format) that this could mean:
1. When the reader reads the end result X.ps, this dynamically adds the time
of reading into the displayed file.
2. When the builder builds X.ps from X.src, this dynamically embeds the time
of build into X.ps, and future readers of X.ps see this as a static piece of
information.
Neither; the PostScript program requests the date/time and either manipulates
it, printing some result which is determined by the date/time, or simply prints
the date/time as part of the output.

This is common practice, for example, in the Quality Logic test suite.
Post by b***@artifex.com
1. When the reader reads the end result X.ps, this dynamically adds the time of reading into the displayed file.
2. When the builder builds X.ps from X.src, this dynamically embeds the time of build into X.ps, and future readers of X.ps see this as a static piece of information.
(1) is fine for Reproducible Builds as presumably this would be a call to a
function, and this call itself would be represented as a static string of
bytes in X.ps.
I'm not discussing creating a PostScript output file, I'm talking about
executing a PostScript program. So the content of X.ps isn't really relevant.
Post by b***@artifex.com
We are prepared to consider a command line option to prevent the inclusion of the CreationDate and ModDate, and the XML Metadata, as an enhancement if this is sufficient for your purposes.
This would be a less-than-ideal alternative - the reason we came up with
SOURCE_DATE_EPOCH is so that builders wouldn't need to hard-code
tool-specific command line options everywhere. For example, GCC accepted our
patches recently and GCC 7+ will honour SOURCE_DATE_EPOCH for the __TIME__
and __DATE__ macros. Other documentation generators like doxygen and sphinx
have also accepted our patches.
The 'builders' will need to hard code Ghostscript-specific command line options
already, you won't get anything usable if you don't, and you would want to
specify many options quite carefully or there is a significant likelihood that
identical builds on different machines will produce different output (for
example different values from libpaper).

Given that you need to specify options to Ghostscript already, it doesn't seem
onerous to require a specific request to disable the production of timestamps
in a PDF file.
b***@artifex.com
2016-05-10 15:53:29 UTC

--- Comment #4 from ***@pwned.gg ---
(In reply to Ken Sharp from comment #3)
Post by b***@artifex.com
(In reply to infinity0 from comment #2)
Post by b***@artifex.com
Whoops, sorry about the jargon. You are right, "build timestamp" refers to
the context where ghostscript is used as part of a build process - i.e. to
generate some output from some source code, which is meant to be consumed
later by a reader.
So, not simply confined to PDF then, but any input to any output format?
[..]
I'm not discussing creating a PostScript output file, I'm talking about
executing a PostScript program. So the content of x.ps isn't really relevant.
I've reviewed a bit more of our notes and hopefully understand the situation a
bit better. Yes, it looks like our patch was only about PDF output [1] but in
theory should apply to any output format that Ghostscript supports - let me
know if we should extend this to things other than PDF.

Regarding PS, if I understand you correctly, then by "PostScript program" you
mean {a .ps file which contains a static sequence of bytes that means "get the
current date/time"}. If that's the case, then yes indeed this won't affect R-B
and is outside of the scope of our discussion; sorry for the confusion.

But now I think I understand your point: the patch I attached will also affect
this PS behaviour, and I agree that this is not correct; we'll fix it.

[1] https://wiki.debian.org/ReproducibleBuilds/PdfGeneratedByGhostscript
Post by b***@artifex.com
Post by b***@artifex.com
We are prepared to consider a command line option to prevent the inclusion of the CreationDate and ModDate, and the XML Metadata, as an enhancement if this is sufficient for your purposes.
This would be a less-than-ideal alternative [..]
The 'builders' will need to hard code Ghostscript-specific command line
options already, you won't get anything usable if you don't [..] it doesn't
seem onerous to require a specific request to disable the production of
timestamps in a PDF file.
I understand where you're coming from, and yes your suggestion would indeed be
similar to mechanisms like CFLAGS etc. During the build process, an OS
distribution like Debian could supply a default set of GHOSTSCRIPTFLAGS to
disable timestamp creation, and the specific package would append their own
flags as necessary.

However, the real-world situation is that most buildsystems do not have support
for infrastructure like GHOSTSCRIPTFLAGS; we would have to add this everywhere
and in all buildsystems, and do this once for each tool like Ghostscript, for a
total cost of O(m*n) (# of buildsystems x # of build tools). But embedded
timestamps are the biggest single issue that blocks Reproducible Builds today
[2], and tools honouring SOURCE_DATE_EPOCH would greatly reduce the cost of
achieving this, to O(n) (# of build tools).

(One alternative, requiring every piece of software that uses Ghostscript to
add this flag specifically, would cost even more. Most developers should not
need to think about Reproducible Builds specifically; it should Just Work for
them.)

Yes, from your point of view I can understand you don't want to support every
random new environment variable coming along claiming to be special, but we do
have data to back up this claim.

Anyway, if you are not convinced by this then sure, we'll have to change our
patch to implement the command line option instead. Let me know what you prefer
in the end.

[2]
https://wiki.debian.org/ReproducibleBuilds/Howto#Files_in_data.tar_contain_timestamps
b***@artifex.com
2016-05-10 16:13:09 UTC

--- Comment #5 from Ken Sharp <***@artifex.com> ---
(In reply to infinity0 from comment #4)
Post by b***@artifex.com
I've reviewed a bit more of our notes and hopefully understand the situation
a bit better. Yes, it looks like our patch was only about PDF output [1] but
in theory should apply to any output format that Ghostscript supports - let
me know if we should extend this to things other than PDF.
Your patch would affect the execution of PostScript programs, which is one
reason we're against it.
Post by b***@artifex.com
Regarding PS, if I understand you correctly, then by "PostScript program"
you mean {a .ps file which contains a static sequence of bytes that means
"get the current date/time"}.
PostScript is a programming language, Ghostscript is an interpreter for that
programming language. The language includes means to interrogate the system
clock. The program can then use that information for any purpose it sees fit,
and it can easily be used to control the flow of execution in the program,
resulting in different output.

At heart Ghostscript is intended to take PostScript as an input and produce
raster as an output. PDF input is a recent extension, as is high level (vector)
output such as PDF or PostScript. GhostPCL will take PCL and GhostXPS will take
XPS as an input, and again these use the same graphics library as Ghostscript.
Obviously we have to consider the impact on all input languages and output
formats.
Post by b***@artifex.com
Post by b***@artifex.com
Post by b***@artifex.com
This would be a less-than-ideal alternative [..]
The 'builders' will need to hard code Ghostscript-specific command line
options already, you won't get anything usable if you don't [..] it doesn't
seem onerous to require a specific request to disable the production of
timestamps in a PDF file.
I understand where you're coming from, and yes your suggestion would indeed
be similar to mechanisms like CFLAGS etc.
I think we have crossed wires again I'm afraid. I'm not discussing any kind of
build-time change, such as an alteration to CFLAGS. I'm prepared to implement a
run-time flag, which would disable the part of the PDF output which is causing
you a problem.

Whoever runs the executable in order to test it must supply a bunch of flags to
Ghostscript in order to configure it, so it doesn't seem onerous to have the
user add a flag which omits date/time output from the pdfwrite device's
output, purely for the purpose of this testing.

The CreationDate and ModDate are optional in PDF, and we would prefer to omit
them, rather than produce something which doesn't match the system time.
Post by b***@artifex.com
Yes, from your point of view I can understand you don't want to support
every random new environment variable coming along claiming to be special,
but we do have data to back up this claim.
Anyway, if you are not convinced by this then sure, we'll have to change our
patch to implement the command line option instead. Let me know what you
prefer in the end.
As stated, our preference is to provide a command-line (run-time) option to
omit the CreationDate and ModDate from being written to the output PDF file.
I'm not asking you to write this, I'm offering it as a solution which we will
implement.

I'm not a Linux user myself, but I have discussed this with the other
developers, including our Linux build maintainer, and we are currently not
inclined to take on any patch which interferes with the time operators in
PostScript. For the purposes of producing PDF files which can be simplistically
compared we will implement a control as described, if this is sufficient for
you.
b***@artifex.com
2016-05-10 16:54:41 UTC

--- Comment #6 from ***@pwned.gg ---
(In reply to Ken Sharp from comment #5)
Post by b***@artifex.com
I think we have crossed wires again I'm afraid. I'm not discussing any kind
of build-time change, such as an alteration to CFLAGS. I'm prepared to
implement a run-time flag, which would disable the part of the PDF output
which is causing you a problem.
Whoever runs the executable in order to test it must supply a bunch of flags
to Ghostscript in order to configure it, so it doesn't seem onerous to have
the user add a flag which omits date/time output from the pdfwrite device's
output, purely for the purpose of this testing.
OK, thanks for the explanation of GhostScript; I'll try to explain reproducible
builds a bit better:

When I say "build time", I mean when GhostScript is invoked as part of the
build process of some other project, to build (e.g.) some documentation. So I'm
not talking about GhostScript's own build process, but that of a project that
uses GhostScript.

We at the Reproducible Builds project represent many OS distributions, whose
job it is to package up 10000s of these projects, and make sure that their
build processes produce bit-for-bit identical results. Our goal is to make this
"the default" of buildsystems, so that project developers don't have to
specifically "opt-in" to this security property. "Opt-in" security is not
really security, because people don't want to care about security, and won't
actually "opt-in".

In other words, we would prefer the cost to be zero, rather than merely for it
to be not "onerous". Minor non-onerous costs quickly add up, across all the
10000s of packages that we have to handle. (In fact we would probably just keep
patching ghostscript instead of using this flag, since it's easier than
patching the several dozen projects that use ghostscript.)
Post by b***@artifex.com
As stated, our preference is to provide a command-line (run-time) option to
omit the CreationDate and ModDate from being written to the output PDF file.
I'm not asking you to write this, I'm offering it as a solution which we
will implement.
I'm not a Linux user myself, but I have discussed this with the other
developers, including our Linux build maintainer, and we are currently not
inclined to take on any patch which interferes with the time operators in
PostScript. For the purposes of producing PDF files which can be
simplistically compared we will implement a control as described, if this is
sufficient for you.
Would it be possible to omit CreationDate/ModDate when SOURCE_DATE_EPOCH is
nonempty, *without* requiring an extra command-line flag?

Of course nothing should affect the time operators in PostScript (and it will
probably not affect R-B), but I'd like to point out that it's certainly
possible to decouple PDF CreationDate/ModDate from PS time operator
interpretation, so that honouring SOURCE_DATE_EPOCH doesn't affect PostScript
at all.
b***@artifex.com
2016-05-10 17:12:01 UTC

--- Comment #7 from ***@pwned.gg ---
(In reply to infinity0 from comment #6)
Post by b***@artifex.com
Of course nothing should affect the time operators in PostScript (and it
will probably not affect R-B)
Hmm, actually I just reviewed some of our packages and I think I am wrong here.
Some of them use ps2pdf to build pdfs, and (if I understand correctly) this
will translate a dynamic "get current time" PostScript command, execute it,
then embed it as a static date in the resulting PDF?

So for example, readers of the .ps will see different dates if they read it at
different times, but readers of the .pdf will see the date at which ps2pdf was
invoked? Is that correct?
b***@artifex.com
2016-05-10 18:37:20 UTC

--- Comment #8 from Ken Sharp <***@artifex.com> ---
(In reply to infinity0 from comment #6)
Post by b***@artifex.com
Would it be possible to omit CreationDate/ModDate when SOURCE_DATE_EPOCH is
nonempty, *without* requiring an extra command-line flag?
Not easily, because the pdfwrite device is (or should be!) abstracted from the
OS, so it doesn't use getenv, or have any way to access it. Which is why I
suggest a command line parameter.
Post by b***@artifex.com
Of course nothing should affect the time operators in PostScript (and it
will probably not affect R-B) - but I'd like to point out that, it's
certainly possible to decouple PDF CreationDate/ModDate from PS time
operator interpretation, so that honouring SOURCE_DATE_EPOCH doesn't affect
PostScript at all.
Yes it is possible, and in fact that *is* the way it's currently done, but
wouldn't be after your patch :-) Currently the pdfwrite code isn't using the OS
abstracted time function, which it absolutely should be (don't know how that
got missed). After that, the PostScript time operator and the pdfwrite
CreationDate code will use the same code, so if you affect one, you affect
both.

I certainly do intend to alter the way pdfwrite is currently getting the time,
it should be using the abstracted functions.


(In reply to infinity0 from comment #7)
Post by b***@artifex.com
Post by b***@artifex.com
Of course nothing should affect the time operators in PostScript (and it
will probably not affect R-B)
Hmm, actually I just reviewed some of our packages and I think I am wrong
here. Some of them use ps2pdf to build pdfs,
What else would they be using Ghostscript for?

Note that Ghostscript's PDF interpreter is actually *written* in PostScript. So
even if the input is PDF, you still are using the PostScript interpreter.
Post by b***@artifex.com
and (if I understand correctly)
this will translate a dynamic "get current time" PostScript command, execute
it, then embed it as a static date in the resulting PDF?
Potentially yes, but it can be significantly more complex than that. You could
(dumb example) choose to run a totally different set of routines in the
afternoon than in the morning. Of course that would still produce the same
output on two machines with the same date/time. The time doesn't *have* to be
written (or rendered) to the output; it can be used like any other input, to
alter the behaviour of the program.

However, as I said, I've seen a widely used test file which fails if two
consecutive calls to the PostScript time function return the same time.

Are you also aware of the PostScript rand operator? I've also seen a test file
which uses that too, so the output is comparatively non-deterministic (you would
need to ensure that the pseudo random number generator was seeded the same way
each time to get consistent results).

I'm aware that this isn't an issue for your purposes, but it is for us. The
PostScript interpreter would not be performing as per the specification when
your environment variable is set.
Post by b***@artifex.com
So for example, readers of the .ps will see different dates if they read it
at different times, but readers of the .pdf will see the date at which
ps2pdf was invoked? Is that correct?
The PDF could contain different text representing a date or time (or indeed
anything could change), but yes, it will depend on the time when Ghostscript was
executed. Each run of the PostScript program would potentially result in
different output.

This is a known problem for us with the Quality Logic test suite, where many of
the tests use the time operators to print the date/time or to give an elapsed
time, which is printed on the output.


Seems to me that your best bet is going to be to continue patching Ghostscript.
I will discuss this again with the other developers, but I don't think this is a
route we want to take.
b***@artifex.com
2016-05-10 19:35:44 UTC

--- Comment #9 from ***@pwned.gg ---
(In reply to Ken Sharp from comment #8)
Post by b***@artifex.com
(In reply to infinity0 from comment #6)
Post by b***@artifex.com
Would it be possible to omit CreationDate/ModDate when SOURCE_DATE_EPOCH is
nonempty, *without* requiring an extra command-line flag?
Not easily, because the pdfwrite device is (or should be!) abstracted from
the OS, so it doesn't use getenv, or have any way to access it. Which is why
I suggest a command line parameter.
The code that reads the command line parameter could read the environment
variable instead? At least, I've never seen abstractions that separate these
two things into separate layers.
Post by b***@artifex.com
Are you also aware of the PostScript rand operator? I've also seen a test
file which uses that too, so the output is comparatively non-deterministic
(you would need to ensure that the pseudo random number generator was seeded
the same way each time to get consistent results).
Yes, we're aware of other sources of non-determinism. However this timestamp
issue is by far the largest issue (as a whole, not just ghostscript), and
further typically when people use it they don't *really* mean "the build time".
So for cost efficiency reasons, we prefer SOURCE_DATE_EPOCH to get
reproducible timestamps, but we're ok with specific patches for other sources
of non-determinism.
Post by b***@artifex.com
Seems to me that your best bet is going to be to continue patching
Ghostscript. I will discuss this again with the other developers, but I don't
think this is a route we want to take.
I understand, no worries.

I've chatted with the rest of the team and have a few further suggestions
Post by b***@artifex.com
However, as I said, I've seen a widely used test file which fails if two consecutive
calls to the PostScript time function return the same time.
[..]
I'm aware that this isn't an issue for your purposes, but it is for us. The
PostScript interpreter would not be performing as per the specification when
your environment variable is set.
realtime
– realtime int
returns the value of a clock that counts in real time, independently of the
execution of the PostScript interpreter. The clock’s starting value is
arbitrary; it has no defined meaning in terms of calendar time. The unit of time
represented by the realtime value is one millisecond. However, the rate at which
it changes is implementation-dependent. As the time value becomes greater than
the largest integer allowed in a particular implementation, it “wraps” to the
smallest (most negative) integer.
So, this is quite generous, and could be made consistent with
SOURCE_DATE_EPOCH. This definition does not say it has to be consistent with
any external or "real" system clocks (and in fact many kernels offer multiple
clocks such as monotonic wrappers around other clocks). There are a few options
forward:

1. When S_D_E is set, then use this as the starting value of the clock. The
definition above specifically allows this. This doesn't solve R-B if a
particular invocation of ps2pdf has high variance in how long it runs, but see
#2.

2. As per (1), but also simply increment the value by 1 each time realtime is
called, as opposed to using the system clock to measure "milliseconds". This is
a more generous interpretation of "millisecond", but the spec also says "rate at
which it changes is implementation-dependent", so nobody should be relying on
this value to actually represent real milliseconds.

Both of these would be a little complex, but we'd be happy to write this if you
don't want to yourselves.
b***@artifex.com
2016-05-11 07:10:05 UTC

--- Comment #10 from Ken Sharp <***@artifex.com> ---
(In reply to infinity0 from comment #9)
Post by b***@artifex.com
The code that reads the command line parameter could read the environment
variable instead? At least, I've never seen abstractions that separate these
two things into separate layers.
The two are totally different: the command line parameters are parsed off into
PostScript. This is not OS-dependent, so it's cross-platform. Environment
variables are OS-specific, so this is all in the platform-specific code.
Post by b***@artifex.com
So, this is quite generous, and could be made consistent with
SOURCE_DATE_EPOCH. This definition does not say it has to be consistent with
any external or "real" system clocks (and in fact many kernels offer
multiple clocks such as monotonic wrappers around other clocks). There are a
few options forward:
1. When S_D_E is set, then use this as the starting value of the clock. The
definition above specifically allows this. This doesn't solve R-B if a
particular invocation of ps2pdf has high variance in how long it runs, but
see #2.
The pdfwrite code is very variable, even on the same machine, in its timings.
Of course loading on the machine also affects this.
Post by b***@artifex.com
2. As per (1), but also simply increment the value by 1 each time realtime
is called, as opposed to using the system clock to measure "milliseconds".
This is a more generous interpretation of "millisecond", but the spec also
says "rate at which it changes is implementation-dependent", so nobody should
be relying on this value to actually represent real milliseconds.
Both of these would be a little complex, but we'd be happy to write this if
you don't want to yourselves.
I'm pretty confident we won't adopt this approach, again it affects the
operation of the time functions, and is still more complex. As I said I'll put
it up again for discussion amongst the other developers.
b***@artifex.com
2016-05-11 07:58:33 UTC

Chris Liddell (chrisl) <***@artifex.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@artifex.com

--- Comment #11 from Chris Liddell (chrisl) <***@artifex.com> ---
Personally, I am wary of something that could easily be seen as enabling
fraudulent information (metadata) to be embedded in a PDF file. I've seen, on
more than one occasion, the CreationDate and ModDate cited as "evidence" (for
example, for timely completion of forms etc).

Whilst it is true that some PDF internal knowledge makes it feasible to change
the dates, thus not exactly reliable evidence, it still feels worrying to be
seen to be condoning the faking of such meta-data.

Hence my suggestion to Ken that we offer to disable the writing of those dates
instead - I would *much* rather see the information not being written than fake
(potentially fraudulent) information being written.

(NOTE: there is precedent for this type of thing: for example, eexec
encryption for Type1 fonts is almost trivial for most developers to implement,
but we avoid making it easily accessible, since we do not want to be seen to be
enabling theft of glyph outlines).

WRT specifying extra command line options when gs is used by another package
(either by execing the executable, or calling to the .so library), you can use
the environment variable "GS_OPTIONS" to pass options to any gs instance
executed in that environment - documented here:
http://www.ghostscript.com/doc/9.19/Use.htm#Environment_variables
b***@artifex.com
2016-05-11 09:31:37 UTC
http://bugs.ghostscript.com/show_bug.cgi?id=696765

--- Comment #12 from ***@pwned.gg ---
(In reply to Chris Liddell (chrisl) from comment #11)
Post by b***@artifex.com
Personally, I am wary of something that could easily be seen as enabling
fraudulent information (metadata) to be embedded in a PDF file. I've seen,
on more than one occasion, the CreationDate and ModDate cited as "evidence"
(for example, for timely completion of forms etc).
Whilst it is true that some PDF internal knowledge makes it feasible to
change the dates, thus not exactly reliable evidence, it still feels
worrying to be seen to be condoning the faking of such meta-data.
Hence my suggestion to Ken that we offer to disable the writing of those
dates instead - I would *much* rather see the information not being written
than fake (potentially fraudulent) information being written.
My text editor does not prevent me from writing "I wrote this on 1901-01-01";
your reasoning here is the same as this. And as I said before, anyone running
the build can set their clock arbitrarily for a similar effect.

Refusing to code software to write a certain pattern of bits *is not security*.
Even if *you* don't write this code, someone with a reason to write this
information - such as us, the R-B people - will write this code. It is not
"fraudulent" and I'm a little offended by this association.

It is these sorts of "false security" arguments propagating that make
non-technical people think software in general is more secure than it really
is. Securely stating the time would require some sort of cryptographic ledger
protocol to link events on a global scale. For example bitcoin can be thought
of as providing this security property.

Plain standalone timestamps inherently are not protectable by any mechanism,
and just because some court thought so in a particular scenario with extra
constraints that we don't know about, does not mean that software developers
can or should assume this is OK for all scenarios.
Post by b***@artifex.com
WRT to specifying extra command line options when gs is used by another
package (either by execing the executable, or calling to the .so library),
you can use the environment variable "GS_OPTIONS" to pass options to any gs
instance executed in that environment - documented here:
http://www.ghostscript.com/doc/9.19/Use.htm#Environment_variables
The issue here is that then we would have to add GS-specific settings to get
the same effect. The point of SOURCE_DATE_EPOCH is that people who want
reproducible builds don't need to have intimate knowledge of all the 3rd-party
tools that their software uses.

(In reply to Ken Sharp from comment #10)
Post by b***@artifex.com
(In reply to infinity0 from comment #9)
Post by b***@artifex.com
The code that reads the command line parameter could read the environment
variable instead? At least, I've never seen abstractions that separate these
two things into separate layers.
The two are totally different, the command line parameters are parsed off
into PostScript. This is not OS-dependent, so it's cross-platform.
Environment variables are OS-specific, so this is all in the
platform-specific code.
It looks like GS_OPTIONS is OS independent, so the code that reads GS_OPTIONS
could also read SOURCE_DATE_EPOCH and prepend --no-output-timestamps (or
whatever you decide) to GS_OPTIONS if S_D_E is non-empty?
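A minimal sketch of that GS_OPTIONS idea, written as a hypothetical Python wrapper around a gs invocation rather than as Ghostscript's own C code. The helper name `gs_environment` is invented for illustration, and `--no-output-timestamps` is only the placeholder switch discussed in this thread; Ghostscript does not define it.

```python
from typing import Mapping


def gs_environment(environ: Mapping[str, str],
                   switch: str = "--no-output-timestamps") -> dict:
    """Return a copy of environ with `switch` prepended to GS_OPTIONS
    whenever SOURCE_DATE_EPOCH is set and non-empty.

    NOTE: `switch` is a hypothetical placeholder, not a real gs option.
    """
    env = dict(environ)  # never mutate the caller's environment
    if env.get("SOURCE_DATE_EPOCH"):
        existing = env.get("GS_OPTIONS", "")
        # Prepend so any options the user already set still follow it.
        env["GS_OPTIONS"] = f"{switch} {existing}".strip()
    return env
```

A caller would pass the result as the `env` argument of `subprocess.run` when launching gs. Keeping the translation in one place means SOURCE_DATE_EPOCH stays the only thing a build system has to set.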
Post by b***@artifex.com
Post by b***@artifex.com
So, this is quite generous, and could be made consistent with
SOURCE_DATE_EPOCH. This definition does not say it has to be consistent with
any external or "real" system clocks (and in fact many kernels offer
multiple clocks such as monotonic wrappers around other clocks). There are a
1. When S_D_E is set, then use this as the starting value of the clock. The
definition above specifically allows this. This doesn't solve R-B if a
particular invocation of ps2pdf has high variance in how long it runs, but
see #2.
The pdfwrite code is very variable, even on the same machine, in its
timings. Of course loading on the machine also affects this.
Post by b***@artifex.com
2. As per (1), but also simply increment the value by 1 each time realtime
is called, as opposed to using the system clock to measure "milliseconds".
This is a more generous interpretation of "millisecond" but the spec also
says "rate at which it changes is implementation-dependent" so nobody should
be relying on this value to actually represent real milliseconds.
Both of these would be a little complex, but we'd be happy to write this if
you don't want to yourselves.
I'm pretty confident we won't adopt this approach, again it affects the
operation of the time functions, and is still more complex. As I said I'll
put it up again for discussion amongst the other developers.
OK, let me know how it goes. I was thinking you could just have a static
variable inside the function and increment that, so it wouldn't take up too
many lines. Yes it affects the operation of the function, but it is still
within what the spec states.
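To make proposals (1) and (2) concrete, here is a Python sketch of a hypothetical realtime() replacement; it is not Ghostscript's implementation, and the name `make_realtime` is invented. The closure counter stands in for the C static variable suggested above: with SOURCE_DATE_EPOCH set, the first call returns that value and each later call returns one more, so two runs over the same input see identical values.

```python
import itertools
import time
from typing import Callable, Mapping


def make_realtime(environ: Mapping[str, str]) -> Callable[[], int]:
    """Hypothetical realtime() following proposals (1) and (2).

    With SOURCE_DATE_EPOCH set, the clock starts at that value and
    advances by exactly 1 per call. Otherwise it reports a real
    monotonic millisecond count.
    """
    sde = environ.get("SOURCE_DATE_EPOCH", "")
    if sde:
        # Plays the role of the static variable inside the C function.
        ticks = itertools.count(int(sde))
        return lambda: next(ticks)
    return lambda: int(time.monotonic() * 1000)
```

The trade-off Ken objects to is visible here: in deterministic mode the values count calls rather than elapsed time.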
b***@artifex.com
2016-05-11 10:37:24 UTC
http://bugs.ghostscript.com/show_bug.cgi?id=696765

***@suse.de changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@suse.de

--- Comment #13 from ***@suse.de ---
I would like to share my personal opinion here:

Personally I am against the underlying idea behind
things like SOURCE_DATE_EPOCH (as far as I understand it).

In general I am against the idea that to achieve "whatever"
all software has to be changed.

From my point of view such an approach will never succeed
because there will always come up more new software that
does not care about "whatever" so that there is an endless
(and hopeless) fight to get "all software right".

Now you fix Ghostscript because that is currently used
by some other software at compile time to make documentation
(why the heck don't they provide their documentation also
in a final ready-to-read form in their sources?)
but some time later they no longer use Ghostscript
because they switched to the new great "FancyDOC" tool
which makes your reproducible builds fail until
you get "FancyDOC" fixed and so on ad nauseam.
Why not fix how that other software makes its documentation?

This kind of approach was tried several years ago at SUSE
(I think it was more than 10 years ago).
It never succeeded and eventually died out.

Bottom line:
From my point of view the idea to implement support
for SOURCE_DATE_EPOCH in all software is a dead concept.



In contrast I think the Ghostscript authors are right
that an appropriate Ghostscript command line option
to suppress time-related output or any random output
is the right way.

This way Ghostscript could be called with that option set
to achieve identical output from identical input
which is (as far as I understand it) what is
actually needed for reproducible builds.

But I think the right Ghostscript command line option for
reproducible builds should not be only "--no-output-timestamps"
but more generally it should be something
like "--no-runtime-dependant-output"
so that for the same input there is always the same output
regardless when (time, date, random number generator, ...)
or in what environment (operating system, architecture, ...)
Ghostscript was run.
b***@artifex.com
2016-05-11 12:54:34 UTC
http://bugs.ghostscript.com/show_bug.cgi?id=696765

--- Comment #14 from ***@pwned.gg ---
(In reply to jsmeix from comment #13)
Post by b***@artifex.com
In general I am against the idea that to achieve "whatever"
all software has to be changed.
From my point of view such an approach will never succeed
because there will always come up more new software that
does not care about "whatever" so that there is an endless
(and hopeless) fight to get "all software right".
There's a misunderstanding here - with SOURCE_DATE_EPOCH, we're specifically
*not* "changing all software" - we're only changing the software which is the
root of each particular instance of the issue, i.e. the code that is actually
generating timestamps.
Post by b***@artifex.com
Now you fix Ghostscript because that is currently used
by some other software at compile time to make documentation
(why the heck don't they provide their documentation also
in a final ready-to-read form in their sources?)
but some time later they no longer use Ghostscript
because they switched to the new great "FancyDOC" tool
which makes your reproducible builds fail until
you get "FancyDOC" fixed and so on ad nauseam.
Why not fix how that other software makes its documentation?
This kind of approach was tried several years ago at SUSE
(I think it was more than 10 years ago).
It never succeeded and eventually died out.
From my point of view the idea to implement support
for SOURCE_DATE_EPOCH in all software is a dead concept.
Your argument can be generalised to argue that any ecosystem-wide change is a
dead concept, anywhere. But we see ecosystem-wide changes all the time, so your
argument must be incorrect.

The more realistic view is that all ecosystem-wide changes are made in the
*hope* that others will follow that change. Indeed, the more likely scenario is
that newer people writing software see this discussion, understand that "get
current date" does not make sense during build processes, and support
SOURCE_DATE_EPOCH instead.

GCC, doxygen, sphinx and several other projects are already supporting
SOURCE_DATE_EPOCH, so we have some momentum.
Post by b***@artifex.com
In contrast I think the Ghostscript authors are right
that an appropriate Ghostscript command line option
to suppress time-related output or any random output
is the right way.
This way Ghostscript could be called with that option set
to achieve identical output from identical input
which is (as far as I understand it) what is
actually needed for reproducible builds.
But I think the right Ghostscript command line option for
reproducible builds should not be only "--no-output-timestamps"
but more generally it should be something
like "--no-runtime-dependant-output"
so that for the same input there is always the same output
regardless when (time, date, random number generator, ...)
or in what environment (operating system, architecture, ...)
Ghostscript was run.
If every tool chooses to implement its own specific method to implement *the
same behaviour*, then of course your original assertion of "changing all
software" becomes a self-fulfilling prophecy. *That is exactly why* we designed
SOURCE_DATE_EPOCH in the first place.

In terms of "lying about the time", it is perfectly reasonable to take the
position "if SOURCE_DATE_EPOCH is set then we will effectively treat this as
the current time". The system administrator has made a specific choice to use
SOURCE_DATE_EPOCH, they are giving you permission to do this. [1] It's not like
SOURCE_DATE_EPOCH can be accidentally set for no reason. They *could* have set
their own system clock instead, but SOURCE_DATE_EPOCH is technically more
effective and more predictable, for reasons I mentioned earlier.

I am sorry for replying so much, and I will accept any decision that the
Ghostscript developers make, but I just wanted to respond to arguments/points
that I believe to be inaccurate or missing our point or misunderstanding our
intentions.

[1] There are some corner cases, but I don't see that they apply to
Ghostscript. I'll go into them elsewhere; trying to keep this response short.
b***@artifex.com
2016-05-11 14:11:24 UTC
http://bugs.ghostscript.com/show_bug.cgi?id=696765

--- Comment #15 from ***@suse.de ---
I fully agree with you that the crucial factor
whether or not an ecosystem-wide change succeeds
is whether or not more will follow that change
than those who will not follow that change.

It is only my personal opinion that getting sufficient
SOURCE_DATE_EPOCH support into all the relevant software
that is needed for reproducible builds will not succeed.


Back to the actual problem:

I think you mean the following (on one of my machines):
-------------------------------------------------------------------------
# date ; echo -e '%!\n100 100 moveto 200 300 lineto stroke showpage' \
| ps2pdf - line1.pdf
Wed 11 May 15:13:58 CEST 2016

# date ; echo -e '%!\n100 100 moveto 200 300 lineto stroke showpage' \
| ps2pdf - line2.pdf
Wed 11 May 15:14:04 CEST 2016

# diff -q line1.pdf line2.pdf
Files line1.pdf and line2.pdf differ

# pdfinfo line1.pdf | head -n3
Producer: GPL Ghostscript RELEASE CANDIDATE 1 9.19
CreationDate: Wed May 11 15:13:58 2016
ModDate: Wed May 11 15:13:58 2016

# pdfinfo line2.pdf | head -n3
Producer: GPL Ghostscript RELEASE CANDIDATE 1 9.19
CreationDate: Wed May 11 15:14:04 2016
ModDate: Wed May 11 15:14:04 2016
-------------------------------------------------------------------------

I.e. for identical PostScript input
-------------------------------------------------------------------
%!
100 100 moveto 200 300 lineto stroke showpage
-------------------------------------------------------------------
Ghostscript (via its pdfwrite device) creates different output
because it creates PDF metadata with different timestamps
and with the currently used Ghostscript version.

Accordingly I think the resulting question is
how to let Ghostscript create PDF without metadata
that depends on usually unimportant runtime values.

Obviously, with only SOURCE_DATE_EPOCH support, Ghostscript
would still produce different PDF output whenever
a different Ghostscript version is used (i.e. even for any
minor version change), because the Producer entry contains the version.

I don't know: is it acceptable for reproducible builds
when a different Ghostscript version results in
PDF output that differs only in its metadata?

It is currently possible to specify hardcoded
PDF metadata in a file "pdfmeta" with content like:
-------------------------------------------------------------
[ /Title (none)
/Author (none)
/Subject (none)
/Keywords (none)
/ModDate (0)
/CreationDate (0)
/Creator (none)
/Producer (none)
/DOCINFO pdfmark
-------------------------------------------------------------

Then call Ghostscript with that as additional input like
------------------------------------------------------------------------------
# echo -e '%!\n100 100 moveto 200 300 lineto stroke showpage' \
| gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=line3.pdf - pdfmeta


# pdfinfo line3.pdf
Title: none
Subject: none
Keywords: none
Author: none
Creator: none
Producer: none
CreationDate: 0
ModDate: 0
...

# echo -e '%!\n100 100 moveto 200 300 lineto stroke showpage' \
| gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=line4.pdf - pdfmeta


# pdfinfo line4.pdf
Title: none
Subject: none
Keywords: none
Author: none
Creator: none
Producer: none
CreationDate: 0
ModDate: 0
...
------------------------------------------------------------------------------
cf. "Embedding PDFmarks" at
http://milan.kupcevic.net/ghostscript-ps-pdf/

But unfortunately that alone does not help:
------------------------------------------------------------------------------
# diff -q line3.pdf line4.pdf
Files line3.pdf and line4.pdf differ

# diff -aU0 line3.pdf line4.pdf | cut -b-76
--- line3.pdf 2016-05-11 15:49:20.990174454 +0200
+++ line4.pdf 2016-05-11 15:49:27.882294106 +0200
@@ -45 +45 @@
-<rdf:Description rdf:about='uuid:b5d92b71-4f9b-11f1-0000-9b0914259d8d' xmln
+<rdf:Description rdf:about='uuid:ba0548f1-4f9b-11f1-0000-9b0914259d8d' xmln
@@ -48 +48 @@
-<rdf:Description rdf:about='uuid:b5d92b71-4f9b-11f1-0000-9b0914259d8d' xmln
+<rdf:Description rdf:about='uuid:ba0548f1-4f9b-11f1-0000-9b0914259d8d' xmln
@@ -51,2 +51,2 @@
-<rdf:Description rdf:about='uuid:b5d92b71-4f9b-11f1-0000-9b0914259d8d' xmln
-<rdf:Description rdf:about='uuid:b5d92b71-4f9b-11f1-0000-9b0914259d8d' xmln
+<rdf:Description rdf:about='uuid:ba0548f1-4f9b-11f1-0000-9b0914259d8d' xmln
+<rdf:Description rdf:about='uuid:ba0548f1-4f9b-11f1-0000-9b0914259d8d' xmln
@@ -83 +83 @@
-/ID [<91D25DAA8329AF4695BA72BB2C411C1C><91D25DAA8329AF4695BA72BB2C411C1C>]
+/ID [<76DB5A97627D81D78B6B06E3229B9D37><76DB5A97627D81D78B6B06E3229B9D37>]
------------------------------------------------------------------------------
There are still some special UUIDs and IDs in the PDF :-(

I assume that SOURCE_DATE_EPOCH support in Ghostscript
will not be sufficient to get identical PDF as output
from identical PostScript input.

<sarcasm>
Welcome to the wonderful world of PDF!
</sarcasm>

Perhaps it is really easier to fix the other software packages
that create their static PDF documentation anew each time
they are compiled, so that they simply provide all their
static documentation ready-for-use in their sources
in addition to their original sources like LaTeX sources
or whatever they use for their documentation.

Furthermore this usually saves tons of build resources
because one no longer needs the full stack of
documentation processing tools in the build system
and one no longer needs to run all those various,
usually resource-hungry documentation generating tools
each time the software is compiled, only to
generate the same static documentation again and again.
b***@artifex.com
2016-05-11 14:30:42 UTC
http://bugs.ghostscript.com/show_bug.cgi?id=696765

Ken Sharp <***@artifex.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution|--- |WONTFIX

--- Comment #16 from Ken Sharp <***@artifex.com> ---
(In reply to jsmeix from comment #15)
Post by b***@artifex.com
It is currently possible to specify hardcoded
-------------------------------------------------------------
[ /Title (none)
/Author (none)
/Subject (none)
/Keywords (none)
/ModDate (0)
/CreationDate (0)
/Creator (none)
/Producer (none)
/DOCINFO pdfmark
-------------------------------------------------------------
Then call Ghostscript with that as additional input like
This approach has already been rejected.
Post by b***@artifex.com
There are still some special UUIDs and IDs in the PDF :-(
These are also generated from the time value, so if you hack the time, the
values remain constant. Of course this is another good reason for us not to
support hacking the system time, the UUIDs *should* be different.
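To illustrate this point, the UUIDs in the diff earlier in the thread have 1 as the first digit of their third group, which is where RFC 4122 places the version nibble of a time-based UUID (although their 0000 fourth group does not carry the RFC variant bits). The Python sketch below builds a version-1 UUID from an explicit timestamp with the standard uuid module. The helper name is invented, the fixed clock_seq and node defaults are assumptions for the demo (real generators randomize clock_seq), and this is not Ghostscript's code. Fixing the timestamp fixes the UUID.

```python
import uuid

# Number of 100-ns intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01), as defined by RFC 4122.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def time_based_uuid(unix_seconds: int, clock_seq: int = 0,
                    node: int = 0) -> uuid.UUID:
    """Build an RFC 4122 version-1 UUID from an explicit timestamp."""
    t = unix_seconds * 10_000_000 + UUID_EPOCH_OFFSET
    fields = (
        t & 0xFFFFFFFF,           # time_low
        (t >> 32) & 0xFFFF,       # time_mid
        (t >> 48) & 0x0FFF,       # time_hi (version bits set by version=1)
        (clock_seq >> 8) & 0x3F,  # clock_seq_hi (variant bits set by version=1)
        clock_seq & 0xFF,         # clock_seq_low
        node & 0xFFFFFFFFFFFF,    # 48-bit node
    )
    return uuid.UUID(fields=fields, version=1)
```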

I was investigating a compromise approach, but it would not address the UUIDs.
The sticking point for us is hacking the clock, we aren't prepared to break the
PostScript operators for this.

So my conclusion is that you should carry on patching Ghostscript; I don't see
any way forward which will satisfy both the Ghostscript team and the Reproducible
Builds team.
b***@artifex.com
2016-05-11 14:47:41 UTC
http://bugs.ghostscript.com/show_bug.cgi?id=696765

--- Comment #17 from ***@suse.de ---
FYI:

For openSUSE and even more for SUSE Linux Enterprise
I will continue to keep our ghostscript RPM packages
in full compliance with Ghostscript upstream
which means that
I will not accept patches for SUSE ghostscript RPM packages
that support hacking the system time.

As a consequence, for reproducible builds at SUSE
each piece of other software that calls SUSE's
upstream-compliant Ghostscript to make PDFs would have to be adapted.
Presumably the best adaptation is for each piece of software to provide
all its static documentation as source files.

Alternatively of course someone else could maintain
any kind of "hacked Ghostscript" for SUSE as he likes ;-)
b***@artifex.com
2016-05-11 14:49:02 UTC
http://bugs.ghostscript.com/show_bug.cgi?id=696765

--- Comment #18 from ***@pwned.gg ---
(In reply to jsmeix from comment #15)
Post by b***@artifex.com
I don't know: is it acceptable for reproducible builds
when a different Ghostscript version results in
PDF output that differs only in its metadata?
This is fine in general - e.g. different compiler versions will produce
different output - although it's always best to avoid minor differences where
possible. What we're building would add this metadata in a separate file so
that it's not lost, but it's not part of the "installed artifact" that users
directly consume, which can be compared between separate builders.
Post by b***@artifex.com
There are still some special UUIDs and IDs in the PDF :-(
I assume that SOURCE_DATE_EPOCH support in Ghostscript
will not be sufficient to get identical PDF as output
from identical PostScript input.
As Ken mentioned, one does get identical UUID output if one fixes the time
values.
Post by b***@artifex.com
Perhaps it is really easier to fix the other software packages
that create their static PDF documentation anew each time
they are compiled, so that they simply provide all their
static documentation ready-for-use in their sources
in addition to their original sources like LaTeX sources
or whatever they use for their documentation.
Furthermore this usually saves tons of build resources
because one no longer needs the full stack of
documentation processing tools in the build system
and one no longer needs to run all those various,
usually resource-hungry documentation generating tools
each time the software is compiled, only to
generate the same static documentation again and again.
This wouldn't be acceptable from a FOSS point of view - we generally want even
documentation in the "preferred form for modification" and PDFs are not that.

(In reply to Ken Sharp from comment #16)
Post by b***@artifex.com
These are also generated from the time value, so if you hack the time, the
values remain constant. Of course this is another good reason for us not to
support hacking the system time, the UUIDs *should* be different.
I was investigating a compromise approach, but it would not address the
UUIDs. The sticking point for us is hacking the clock, we aren't prepared to
break the PostScript operators for this.
Out of interest, what was the compromise approach? If the rest of the file is
the same, do the UUIDs really need to be different?
Post by b***@artifex.com
So my conclusion is that you should carry on patching Ghostscript; I don't
see any way forward which will satisfy both the Ghostscript team and the
Reproducible Builds team.
Alright, thanks for trying, and for the detailed discussion.
b***@artifex.com
2016-05-11 15:06:37 UTC
http://bugs.ghostscript.com/show_bug.cgi?id=696765

--- Comment #19 from Ken Sharp <***@artifex.com> ---
(In reply to infinity0 from comment #18)
Post by b***@artifex.com
Post by b***@artifex.com
I was investigating a compromise approach, but it would not address the
UUIDs. The sticking point for us is hacking the clock, we aren't prepared to
break the PostScript operators for this.
Out of interest, what was the compromise approach? If the rest of the file
is the same, do the UUIDs really need to be different?
Creating a command line switch to prevent emission of the dates, then using the
(Ghostscript extension to the PostScript language) getenv operator to
interrogate the system for the presence of the environment variable, and having
that set the command line parameter. As I mentioned previously these are, in
effect, translated into PostScript, so the command line parameters can be read
and set by the PostScript interpreter.

This would have, in effect, converted the environment variable into a command
line switch and prevented the pdfwrite device from emitting the dates when that
environment variable was present. It's all cross-platform, wouldn't have
resulted in incorrect creation times and wouldn't have affected the operation
of the time operators.

But having to also squash the UUIDs is just too much. As noted, these should
*not* be the same, it's really an error (though very minor I grant) to have them
be the same, unique is supposed to mean unique.
Post by b***@artifex.com
Alright, thanks for trying, and for the detailed discussion.
Not that it will affect you, but the patch as it stands will put the wrong
timestamp on PDF files when built on Windows, even in the absence of the
environment variable.
b***@artifex.com
2016-05-11 15:07:00 UTC
http://bugs.ghostscript.com/show_bug.cgi?id=696765

--- Comment #20 from ***@suse.de ---
Regarding "documentation in the preferred form for modification":

Intentionally I wrote "in addition to their original sources".

Would you also reject Makefile and Makefile.in to be
provided in the sources in addition to Makefile.am ?
b***@artifex.com
2016-05-11 15:22:18 UTC
http://bugs.ghostscript.com/show_bug.cgi?id=696765

--- Comment #21 from ***@pwned.gg ---
(In reply to jsmeix from comment #20)
Post by b***@artifex.com
Intentionally I wrote "in addition to their original sources".
Would you also reject Makefile and Makefile.in to be
provided in the sources in addition to Makefile.am ?
Sorry my bad, I skimmed over that. For FOSS purposes that is fine, yes. But for
our R-B verification purposes this wouldn't be sufficient. Your suggestion
might "tick the box" but it would basically be cheating, so we wouldn't want to
pursue that option.
b***@artifex.com
2016-05-11 21:57:56 UTC
http://bugs.ghostscript.com/show_bug.cgi?id=696765

James Cloos <***@jhcloos.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@jhcloos.com

--- Comment #22 from James Cloos <***@jhcloos.com> ---
One thought:

Other software generating pdfs have started (or already did) accepting options
specifying exactly what metadata to add to the resulting file.

There should not be any problem w/ gs doing that, too.

(In fact, can't a bit of extra ps code do that already, anyway? Ie a -c
snippet before the src files?)

That would allow static creation et al dates to be put in the output pdf files.
b***@artifex.com
2016-05-12 07:12:22 UTC
http://bugs.ghostscript.com/show_bug.cgi?id=696765

--- Comment #23 from Ken Sharp <***@artifex.com> ---
(In reply to James Cloos from comment #22)
Post by b***@artifex.com
(In fact, can't a bit of extra ps code do that already, anyway? Ie a -c
snippet before the src files?)
You mean a pdfmark which sets the DOCINFO metadata.
Post by b***@artifex.com
That would allow static creation et al dates to be put in the output pdf
files.
That idea was rejected as well. The requirement from Reproducible Builds is
that the environment variable is the *only* control.
b***@artifex.com
2016-05-12 07:18:37 UTC
http://bugs.ghostscript.com/show_bug.cgi?id=696765

--- Comment #24 from Ken Sharp <***@artifex.com> ---
(In reply to Ken Sharp from comment #23)
Post by b***@artifex.com
Post by b***@artifex.com
(In fact, can't a bit of extra ps code do that already, anyway? Ie a -c
snippet before the src files?)
You mean a pdfmark which sets the DOCINFO metadata.
Post by b***@artifex.com
That would allow static creation et al dates to be put in the output pdf
files.
That idea was rejected as well. The requirement from Reproducible Builds is
that the environment variable is the *only* control.
Pressed 'save changes' too quick....

I can eliminate the problem with the CreationDate and ModDate by various
means, but this still leaves the problem of UUIDs, which are also generated
from the time and which cannot be overridden with a pdfmark, and which would
also be required to be identical.

Which is where I gave up.
b***@artifex.com
2016-05-12 09:10:04 UTC
http://bugs.ghostscript.com/show_bug.cgi?id=696765

--- Comment #25 from ***@pwned.gg ---
(In reply to Ken Sharp from comment #24)
Post by b***@artifex.com
(In reply to Ken Sharp from comment #23)
Post by b***@artifex.com
Post by b***@artifex.com
(In fact, can't a bit of extra ps code do that already, anyway? Ie a -c
snippet before the src files?)
You mean a pdfmark which sets the DOCINFO metadata.
Post by b***@artifex.com
That would allow static creation et al dates to be put in the output pdf
files.
That idea was rejected as well. The requirement from Reproducible Builds is
that the environment variable is the *only* control.
A command-line option, although not saving R-B too much cost, would still be
useful for those other projects that use ghostscript directly. If they want to
think about reproducing their builds, this would become possible for them with
an unpatched ghostscript. (They would have to avoid realtime in their PS input;
also there remains the UUID issue.)

You could do that and ignore our preference for SOURCE_DATE_EPOCH. I was just
making points on why the latter is preferred, i.e. it would not save much
global cost if *everyone* chose to ignore it.
Post by b***@artifex.com
I can eliminate the problem with the CreationDate and ModDate by various
means, but this still leaves the problem of UUIDs, which are also generated
from the time and which cannot be overridden with a pdfmark, and which would
also be required to be identical.
Which is where I gave up.
I understand this direction. If you're interested though, we did think through
these topics ourselves a year or so ago, and our conclusion is like this:

Yes, perhaps on a surface level making these things constant (timestamps and
UUIDs) might seem like "lying" or breaking some intuitive semantics of how
unique they should be. But if we step back a bit and ask, what really is the
*purpose* of these pieces of information? For UUIDs it is meant to be an easy
way to distinguish two documents that are different. But if A.pdf and B.pdf are
otherwise identical *except* for the UUID, what is the point of them being
different?

More abstractly: uniqueness/constantness is relative, it is always *given*
something. If I'm running ghostscript in a VM and I clone the VM, I would get
the same UUID in both cases. What we're saying is that UUIDs should be unique,
*given* useful (less redundant) pieces of information. Instead of UUID = f (
ghostscript version, input.ps, timestamp ), we think it's better if UUID = f (
ghostscript version, input.ps ).
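The UUID = f(ghostscript version, input.ps) idea can be sketched with name-based (version 5) UUIDs from Python's standard uuid module. This is an illustration under stated assumptions, not anything Ghostscript implements: the namespace URL and both helper names are made-up placeholders, and the /ID helper simply truncates a SHA-256 digest to 16 bytes, the size of the IDs in the diff above.

```python
import hashlib
import uuid

# Hypothetical namespace for this illustration only; any fixed UUID works.
DOC_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL,
                           "https://example.invalid/pdf-documents")


def document_uuid(gs_version: str, ps_input: bytes) -> uuid.UUID:
    """UUID = f(version, input): a name-based (v5) UUID with no timestamp."""
    digest = hashlib.sha256(ps_input).hexdigest()
    return uuid.uuid5(DOC_NAMESPACE, f"{gs_version}\n{digest}")


def pdf_file_id(gs_version: str, ps_input: bytes) -> str:
    """A 16-byte /ID value (32 hex digits) derived from the same inputs."""
    h = hashlib.sha256(gs_version.encode() + b"\n" + ps_input)
    return h.hexdigest()[:32].upper()
```

Two runs over the same input with the same Ghostscript version yield the same values, while any change to the input or the version yields different ones, which is the "unique given useful information" property described above.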
b***@artifex.com
2017-11-16 21:09:57 UTC
http://bugs.ghostscript.com/show_bug.cgi?id=696765

Stefan Brüns <***@rwth-aachen.de> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@rwth-aachen.de

--- Comment #26 from Stefan Brüns <***@rwth-aachen.de> ---
(In reply to infinity0 from comment #25)
Post by b***@artifex.com
(In reply to Ken Sharp from comment #24)
Post by b***@artifex.com
(In reply to Ken Sharp from comment #23)
Post by b***@artifex.com
Post by b***@artifex.com
(In fact, can't a bit of extra PS code do that already, anyway? I.e. a -c
snippet before the source files?)
You mean a pdfmark which sets the DOCINFO metadata.
Post by b***@artifex.com
That would allow static creation dates (et al.) to be put in the output PDF
files.
That idea was rejected as well. The requirement from Reproducible Builds is
that the environment variable is the *only* control.
A command-line option, although it would not save R-B much effort, would still be
useful for other projects that use ghostscript directly. If they want
to make their builds reproducible, this would become possible for them
with an unpatched ghostscript. (They would have to avoid realtime in their
PS input; the UUID issue also remains.)
You could do that and ignore our preference for SOURCE_DATE_EPOCH. I was
just explaining why the latter is preferred, i.e. it would not save
much global cost if *everyone* chose to ignore it.
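Making the environment variable the single control amounts to a small change wherever code asks the clock for "now". The original patch did this in Ghostscript's C current-time function. Below is a Python sketch of the same idea. The function name is hypothetical, and the policy of rejecting malformed values is an assumption.

```python
import os
import time
from typing import Mapping, Optional


def current_or_pinned_time(environ: Optional[Mapping[str, str]] = None) -> int:
    """Return SOURCE_DATE_EPOCH if set, else the real current time (Unix seconds)."""
    env = os.environ if environ is None else environ
    value = env.get("SOURCE_DATE_EPOCH")
    if value is None:
        return int(time.time())
    # Assumed policy: fail loudly on a malformed value instead of silently
    # falling back to the clock, which would quietly break reproducibility.
    if not (value.isascii() and value.isdigit()):
        raise ValueError(f"invalid SOURCE_DATE_EPOCH: {value!r}")
    return int(value)


if __name__ == "__main__":
    print(current_or_pinned_time({"SOURCE_DATE_EPOCH": "1462838400"}))  # prints 1462838400
```

Passing the environment in as a mapping keeps the function testable without mutating os.environ.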
Post by b***@artifex.com
I can eliminate the problem with the CreationDate and ModDate by various
means, but this still leaves the problem of UUIDs, which are also generated
from the time, cannot be overridden with a pdfmark, and would
also be required to be identical.
Which is where I gave up.
I understand this direction. If you're interested, though, we did think
through these topics ourselves a year or so ago, and our conclusion is as
follows:
Yes, on a surface level, making these things constant (timestamps and
UUIDs) might seem like "lying", or like breaking some intuitive semantics of how
unique they should be. But step back a bit and ask: what really is the
*purpose* of these pieces of information? For UUIDs, it is to provide an
easy way to distinguish two documents that are different. But if A.pdf and
B.pdf are otherwise identical *except* for the UUID, what is the point of
them being different?
More abstractly: uniqueness/constancy is relative; it is always *given*
something. If I'm running ghostscript in a VM and I clone the VM, I would
get the same UUID in both cases. What we're saying is that UUIDs should be
unique *given* useful (less redundant) pieces of information. Instead of
UUID = f(ghostscript version, input.ps, timestamp), we think it's better
if UUID = f(ghostscript version, input.ps).
I don't know why it has not been mentioned yet:
https://www.ghostscript.com/doc/current/Ps2pdf.htm#Options

-sDocumentUUID=string
Defines a DocumentID to be included in the document Metadata. [...]
Note that the Adobe XMP specification requires the DocumentID to be the same
for all versions of a document. Since Ghostscript does not maintain document
versions, users are responsible for providing a correct UUID through this
parameter. [...]

-sInstanceUUID=string
Defines an instance ID to be included in the document Metadata. [...]
Note that the Adobe XMP specification requires the instance ID to be unique for
all versions of a document. This parameter may be used to disable unique ID
generation for debugging purposes.
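With these two options, the UUIDs can already be pinned from outside. Below is a Python sketch that only assembles the command line and does not run Ghostscript. The function name and placeholder values are hypothetical. The exact string format Ghostscript expects for the IDs should be checked against the Ps2pdf documentation linked above.

```python
import shlex


def pdfwrite_command(src: str, dst: str, document_uuid: str, instance_uuid: str) -> list[str]:
    """Ghostscript pdfwrite command line with caller-supplied XMP IDs."""
    return [
        "gs",
        "-dBATCH",
        "-dNOPAUSE",
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={dst}",
        # Both options are documented in Ps2pdf.htm; the values are the caller's choice.
        f"-sDocumentUUID={document_uuid}",
        f"-sInstanceUUID={instance_uuid}",
        src,
    ]


if __name__ == "__main__":
    cmd = pdfwrite_command("input.ps", "out.pdf", "document-id", "instance-id")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```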

So the current way of generating the DocumentUUID from the timestamp is
probably *ahem* suboptimal: it returns different UUIDs for the same
document recreated at a later time or for a later version, and it may return
the same UUID for multiple documents, since the UUID is generated from
gettimeofday (on UNIX), which has "just" microsecond resolution.

So for DocumentUUID, hash(input path + project name) would probably be a better
option, while InstanceUUID could be derived either from a version or from
hash(contents).
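Stefan's suggestion can be sketched the same way. The DocumentID stays fixed across versions of one document, and the InstanceID tracks the document's contents. The sketch is in Python with hypothetical names, and the project name, path and namespace are assumptions. The resulting strings would be passed through -sDocumentUUID and -sInstanceUUID.

```python
import hashlib
import uuid

# Hypothetical namespace; any fixed UUID would do.
PROJECT_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")


def document_uuid(project: str, input_path: str) -> uuid.UUID:
    """Same for every version of a given document: depends only on project + path."""
    return uuid.uuid5(PROJECT_NS, f"{project}\0{input_path}")


def instance_uuid(contents: bytes) -> uuid.UUID:
    """Changes whenever the bytes change; identical bytes give the same ID."""
    return uuid.uuid5(PROJECT_NS, hashlib.sha256(contents).hexdigest())


if __name__ == "__main__":
    draft, final = b"%!PS\n(draft) print\n", b"%!PS\n(final) print\n"
    print(document_uuid("manual", "doc/intro.ps"))
    print(instance_uuid(draft), instance_uuid(final))
```

Hashing the path rather than the contents keeps the DocumentID stable when the document is edited. The trade-off is that moving or renaming the input changes it.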