[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: [mizar] Translating Mizar into some standard FOPL-like format?



Wow ... I had not seen

http://www.cs.miami.edu/~tptp/MizarTPTP/

before!

I see that there are hyperlinks pointing to the TPTP-ized proofs of each theorem.

Very cool stuff, thanks!

ben

On Wed, Oct 22, 2008 at 3:09 AM, Josef Urban <urban@ktilinux.ms.mff.cuni.cz> wrote:

Hi Ben,


On Tue, 21 Oct 2008, Ben Goertzel wrote:

I haven't kept up with this list lately, and I'm curious what is the current
state of efforts to translate the Mizar corpus of theorems and proofs into
some fairly standard first-order-predicate-logic-like formalism?

I see that as you wrote ten months ago, you really got back to this in 2008 :-). I enclose the e-mails we exchanged on this. The extended version of the paper describing export of Mizar proofs to TPTP and their cross-verification by automated theorem provers is at http://kti.mff.cuni.cz/~urban/mptp_gdv_full.pdf .

As I wrote before, I am really interested in AI systems working on this (in my opinion fairly unique) knowledge base, and will further work on making it more accessible to AI researchers if needed. I really believe that "strong AI" can benefit a lot from large computer-understandable repositories of "human thought", and that vice-versa, formal math can benefit a lot from new "strong AI" methods combining imprecise, intuitive, and heuristical thinking with precise deductive thinking.

This is no longer only about Mizar, the "Large Theory Batch" (LTB) division of the CADE ATP System Competition (CASC) really happened for the first time this year at IJCAR (http://www.cs.miami.edu/~tptp/CASC/J4/Design.html#CompetitionDivisions)
with additional categories for the problems translated from the CyC and SUMO knowledge bases, and $3000 prize money for the SUMO problems (http://www.ontologyportal.org/reasoning.html) and $200 for the CyC problems. The SUMO and CyC problems are however still quite "shallow" in comparison with the math problems, and SUMO and CyC also still mainly consist of axioms - they lack the nontrivial DAG structure of developed math theories. I hope that eventually some problems extracted e.g. from the Isabelle math library could be added to CASC-LTB, so that it does not get overfocused on shallow reasoning in large ontologies.

Best,
Josef


On Sat, 22 Dec 2007, Benjamin Goertzel wrote:

Thanks -- I have now looked through this material and it's very
exciting .. I'm focusing
on other stuff right now but do hope to get back to this area in 2008
sometime... what
you've done seems like a very important step...

On Dec 16, 2007 10:14 AM, Josef Urban <urban@ktilinux.ms.mff.cuni.cz> wrote:

Hi Ben,

I am now writing an extended version of a paper describing export of
Mizar proofs to TPTP (see
http://www.springerlink.com/content/t88848500815t188/ for the conference
version), and have just remembered your request for having Mizar proofs
exported to KIF or TPTP. So yes, I have translated it all to TPTP
derivations, which can be ATP-verified by tools like GDV (see the paper).
The page which presents it all is http://www.cs.miami.edu/~tptp/MizarTPTP/
, the TSTP icons there give you the TPTP derivations corresponding to each
Mizar proof (see
http://www.activemath.org/workshops/MathUI/07/proceedings/Urban-etal-MizarIDV-MathUI07.pdf
for explanation of the site). If you want tar.gz of all the proofs, some
version is at http://lipa.ms.mff.cuni.cz/~urban/nd_problems1.tar.gz (watch
out, it is huge). There are still some completness bugs, the biggest is
lack of explicit arithmetic evaluations done by mizar (inferences like
4+5=9 that dumb FOL ATP system will not prove), but otherwise it mostly
works (see the paper).

The MPTPChallenge is now over (see the results at
http://www.cs.miami.edu/~tptp/MPTPChallenge/Results/SVGResults.html), but
we would still be very interested in the results of your system. Also,
Geoff Sutcliffe is preparing a "batch division of CASC" for next year's
IJCAR, with similar setting like the MPTPChallenge (and possibly more
money from e.g. CyCorp, who will probably also provide some of their
problems). For my simple-but-pretty-useful inductive/deductive MaLARea
metasystem working in large theories, see
http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-257/05_Urban.pdf
(hope you'll get motivated :-).

Best,
Josef



On Mon, 19 Mar 2007, Josef Urban wrote:



Ben,

the MPTP export keeps at this moment quite a lot of the original Mizar proof
structure in the TPTP 'useful info' slot, but not all of it. Depending on
what kind of learning you want to do, it might or might not suffice. The
biggest ommission is probably the lack of further description of unproved
propositions (precisely in TPTP syntax: fofs with role 'unknown' and
mptp_info(_,_,proposition,_,_)). Most of them are probably natural deduction
(ND) assumptions, but some are not (e.g. propositions about constants
introduced by the 'consider' keyword - they can be rather thought of as
definitions).

It should not be difficult to add the missing info there - the TPTP format is
produced directly from the XML, which contains all the proof structure. There
have been several reasons for postponing this so far - my focus on getting
the reproving of the simple steps right (which is sort of a precondition for
ATP cross-verification of Mizar, which in turn is a precondition for
productive use of deductive tools like ATP as part of larger knowledge-based
systems tailored for Mizar), and also a focus on getting the reproving of
theorems from their external references right, which in a sense gives you a
large-scale proof structure (which is not only usable for learning, but also
- unlike the ND internal proofs - understandable to ATPs, and thus allowing
things like the MPTP challenge). Another reason was that I did not want to
decide about the details of MPTP ND annotations, until I decided about the
export of ND proof structure to TPTP. The latter accidentally happened last
week (not only as a next step for the cross-verification, but also as a
megalomanic plan to build the detailed MML DAG with milions of nodes :-). So
some missing annotations will appear in the next few weeks (maybe even days),
and more importantly, the Mizar ND proofs will become TPTP proofs (if needed
without any ND - though there is a good chance that assumptions will become
acknowledged and processed parts of TPTP proofs).

To sum up:
- you can have 'raw Mizar' loaded by using the XML - that gives you access to
the formulas and Mizar proof structure; the disadvantage is that for any
deductive tool which you might want to use, you'll have to define the
translation to its logic
- there is the 'raw MPTP', with formulas in extended TPTP syntax containing
the mptp_info annotations; this is sort of a middle way between the XML and
standard TPTP; the annotations are likely to get a bit better in near future,
what they annotate is still the Mizar ND structure
- there are pre-generated 'standard TPTP' problem sets in the MPTP distro and
the MPTPChallenge distro; these you can feed to ATP systems, and also
learning systems (the symbols and proposition names are stable there - always
the same semantics); for quite a lot of them an ATP proof can be found (and
used for learning)
- there will (hopefully soon) be a full TPTP (i.e. mostly/completely non-ND)
proof structure export, compatible with the proofs produced by ATP systems
like EP.

Josef

On Sun, 18 Mar 2007, Ben Goertzel wrote:


Josef,

Thanks for your reply!  However, I'm not sure we are fully understanding
each other.

I agree that the TPTP language has a level of simplicity similar to that of
KIF, so that converting TPTP into KIF wouldn't be a big deal.

And, from the point of view of my own Novamente AI system, writing a
specialized TPTP loader and bypassing KIF wouldn't be a big deal either.

However, I am not sure from your email if the "MPTP export" of Mizar, at
this point, actually exports the **proofs** in the existing Mizar corpus
(Journal of Formalized Mathematics), or just the theorem statements.  Can
you clarify? For my purposes, I need the proofs not just the theorems, as
my desire is to do inductive learning of proof-patterns based on automated
analysis of the proof-corpus.

Thanks
Ben Goertzel
Novamente LLC
ben@goertzel.org




Hi Ben,

On Sat, 17 Mar 2007, Ben Goertzel wrote:


Hi all,

I am the leader of an AI project (www.novamente.net) and am interested
to apply my AI system to automated theorem-proving based on a method
of having it ingest the existing corpus of Mizar proofs, inductively
infer
"patterns of proof" therefrom, and then use these patterns to guide its
work in further proofs.

Good, hope you'll submit your system to the MPTP Challenge
(http://www.cs.miami.edu/~tptp/MPTPChallenge/), and write a paper about it
to ESARLT (http://www.cs.miami.edu/~geoff/Conferences/ESARLT/).

However, the first issue confronted along this path is that my AI system
does not possess a Mizar loader.  This is because Mizar's syntax is
bloody complicated!  (Understandably so, because of its design goals.)

Thus, it seems necessary to convert Mizar to some simpler and more
standard
format.  A sensible choice would seem to be KIF

http://www.ksl.stanford.edu/knowledge-sharing/kif/

but of course other similarly simple formats would do as well.

Mizar is translated to TPTP by the MPTP export
(http://kti.ms.mff.cuni.cz/~urban/MPTP2/mptp0.2.tar.gz). TPTP2KIF can be
probably achieved through TPTP2X from the TPTP distro (not that there
would be many "really running systems" fully implementing KIF (AFAIK)).
Also, just for initial learning experiments, the MPTPChallenge problems
(http://www.cs.miami.edu/~tptp/MPTPChallenge/MPTPChallenge.tgz) might be
enough.

My question is whether anyone has created a converter of this sort, or
has
thought seriously about it.

If no one has created one, does anyone have an educated estimate
regarding
how long this task would take for an individual with suitable background
in mathematics and software engineering?

I hope that MPTP is enough for you, you are welcome to play with it and
improve it - there are number of ways how one can "translate" Mizar and
its proofs, e.g. with respect to optimizations available in various
reasoning systems, and I hardly did the "basic" approach so far. Also
Mizar is available in XML-ized format
(http://mmlquery.mizar.org/mizardoc/xml/Mizar.html) which solves the
low-level technical "loading" issue, and may be also usable e.g. for
machine learning (fo some hints read e.g. section 3.5 of
http://ktiml.mff.cuni.cz/~urban/mizxml.ps). However it is certainly still
"Mizar world", not a standardized well-known format like TPTP.

Best regards,
Josef Urban









--
Ben Goertzel, PhD
CEO, Novamente LLC and Biomind LLC
Director of Research, SIAI
ben@goertzel.org

"A human being should be able to change a diaper, plan an invasion, butcher a hog, conn a ship, design a building, write a sonnet, balance accounts, build a wall, set a bone, comfort the dying, take orders, give orders, cooperate, act alone, solve equations, analyze a new problem, pitch manure, program a computer, cook a tasty meal, fight efficiently, die gallantly. Specialization is for insects."  -- Robert Heinlein