[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: [mizar] Reference distances?




Hi Freek,

On Fri, 3 Feb 2006, Freek Wiedijk wrote:

I wonder whether anyone here can tell me the answer to the
following:

I think Krzysztof Retel and Robert Milewski did some related data-mining some time ago.

I would like to know the statistics about the "distances" of
references to labels in justifications.  Specifically:

You can get the data easily from the recent MPTP version, e.g. for the lemma IRRAT_1:E2_3 - the first by-inference in the proof of theorem IRRAT_1:1 (http://lipa.ms.mff.cuni.cz/~urban/xmlmml/html.937/irrat_1.html#T1), which is:

then E3: c1 > 1 by INT_2:def 5;

the MPTP format is:

fof(e2_3,lemma-derived, ~ r1_xreal_0(c1_3,1),
    file(irrat_1,e2_3),
    [mptp_info(2,[3],proposition,position(49,16),[3]),
    inference(mizar_by,[],[e1_3,d5_int_2])]).

mptp_info(2,[3], ... ) (as well as e2_3) tells you that this is the second proposition in the third top-level block,

inference(..., [e1_3,d5_int_2]) tells you that it was proved from e1_2, i.e. the first (i.e. previous) proposition in the same block (and also from INT_2:def 5).

Generally, the block paths can be longer, e.g. e2_49_3_1 (with mptp_info(2,[49,3,1],...)) means that this is the second proposition in the first subblock of the third subblock of the 49-th top-level block.

The data are in the pl/*.xml2 files of the MPTP0.2 distro, something like

cat pl*.xml2 | grep mizar_by | sed -e 's/^fof(\([^,]*\),.*mizar_by,\[\],\(.*\)..../\1:\2/'

will print all these data in the following format:

e2_3:[e1_3,d5_int_2]

It should be fairly easy to do the data-mining you describe on it afterwards.

In the article that you're interested in, put _one_ label on
everything that you can put a label on.  Then, restrict
yourself to labels that are "inside" a proof/end (or now/end,
hereby/end, etc.) block.  Then, for a justification "count
back" to accessible labels.  So the label of the statement
accessible with "then" would be counted as "1", the one before
that as "2", etc.  For instance in

 theorem Th1: ...
 proof
   ...
 end;

 theorem Th2: ...
 proof
 A1: ...
 consider ... such that A2: ...
 assume A3: ...
  proof
 A4: ...
  end;
 A5: ...
 then A6: ... by Th1,A1;
 end;

the distances for this "by" at the justification of A6 would
be 1 (for the implicit A5 that's in the "then") and 4 (for the
A1).  The "Th1" label should be ignored as it is outside the
proof/end blocks.

But this will generally happen often, references will be lemmas proved/assumed in parent blocks, are you sure you do not want them?

Best,
Josef