[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: [mizar] an extension of the mizar language for dealing with article metadata




Hi,

I think the new directives should not use XML syntax, simply because it is ugly, verbose, and unnecessary. I don't think XML should be used for direct human authoring and reading, just as assembly language should not. The new directives should be normal Mizar keywords like "author", "title", etc.

Josef

On Tue, 25 Nov 2008, trybulec wrote:

Dear Jesse,

In Algol 60 a comment was a part of the language. The syntax was
   comment   TEXT ;

where TEXT was an arbitrary string of characters ( the semicolon itself was not allowed) We treat now comments as something that actually is not a part of the language. I would rather have "pure Mizar" with only these informations that are necessary for the verifier. We have now some features that should not be in pure Mizar and probably should be replaced by pragmas:
they may be classified in the following categories:
i. temporary pragmas:   '@proof'  - to suspend verification
ii. library pragmas:   'canceled' - to keep numeration correct
iii. editorial pragmas: repeated 'begin' - to divide article into section.
iv. maybe some others

We tried - on our seminar - to find common solution, but the categories are so different that probably
there is none.

Recently, after reading your letter, we discussed the problem again with Czeslaw and it seems that for the metadata the proper solution is to have an XML-envelope of the Mizar article. Maybe something like
    <DOCUMENT>
       "the place for metadata"
    <ARTICLE>
        "the Mizar article"
    </ARTICLE>
    </DOCUMENT>
(mind you, I know almost nothing on XML, so it might quite naive).

Then from such XML document we may extract the Mizar article and process it in the usual way. "the Mizar article" is probably not the exact citation: we have '<' as a symbol in Mizar. Still extracting the Mizar article should not be too complicated. Maybe we can be more precise and have the form

    <DOCUMENT>
       "the place for metadata"
    <ARTICLE>
    <ENVIRONMENT DECLARATION>
        "directives"
    </ENVIRONMENT DECLARATION>
     <TEXT PROPER>
        "the text proper"
    </TEXT PROPER>
    </ARTICLE>
    </DOCUMENT>

Then, 'environ' and 'begin' (if necessary) may be added while extracting the Mizar article. The directives and the text proper are written in two different languages, anyway.

    <DOCUMENT>
       "the place for metadata"
    <ARTICLE>
    <ENVIRONMENT DECLARATION>
        "directives"
    </ENVIRONMENT DECLARATION>
     <TEXT PROPER>
      <SECTION>
        "the text proper"
      </SECTION>
        ...
      <SECTION>
        "the text proper"
      </SECTION>
    </TEXT PROPER>
    </ARTICLE>
    </DOCUMENT>

Then we may put in the <SECTION> the title and maybe even the summary of the section. And the grammar of Mizar will be a little bit simpler: no repetition of 'begin'.

Regards,
Andrzej

Jesse Alama wrote:

As it stands now, the title of an article, its author(s), and the name
of sections aren't actually a formal part of a mizar text: we
customarily include this information as comments

 :: Introduction to XYZ Theory
 ::
 :: by Joe the MIZAR Writer

 environ

 ...

 begin :: Preliminaries

 ...

 begin :: Main Lemma

 ...

 begin :: Main Theorem

 ...

I propose to put this information into the body of the article, to move
it from comments to non-comments.  Perhaps we could introduce new
keywords like `title', `author', and `section'.  Then, articles could
look like

title{Introduction to XYZ Theory}
author{Joe the MIZAR Writer}

environ

  <stuff>

begin{Preliminaries}

definition ...

theorem ...

theorem ...

begin{Main Lemma}

definition ...

begin{Main Theorem}

theorem ...

This would make the sections and article title part of the text; as it
stands, these features are not represented in the mizar grammar (because
customarily they are tucked away within comments).

The primary motivation for this idea is to help improve Josef Urban's
semantic MML presentation.  Currently he has to do some hacking to
extract metadata like article titles and authors; extracting section
information (currently unsupported) would be similarly complicated.  The
problem is that the article metadata isn't represented in the XML
representation of an article; it has to be inferred from the original
mizar source.

Additionally, this new syntax might help to simplify processing of FM
articles: with this approach one wouldn't need the BibTeX fields
SECTION1, SECTION2, etc., because these could be inferred from the
article.

Thoughts?

Best,

Jesse