koonts.com

Crossword XML Schema

introduction

While working on the CrosswordPlayer SVG application, I was unable to find an openly specified file format for crossword puzzles. Thus began development of a crossword XML Schema. It is still rough and comments are welcome.

overview

A schema is a set of rules describing how information is packaged together.

The W3C, which publishes the openly specified XML 1.0 standard, says: "XML (Extensible Markup Language) is a simple, very flexible text format..." and "XML Schemas express shared vocabularies and allow machines to carry out rules made by people. They provide a means for defining the structure, content and semantics of XML documents."

This document discusses the information associated with a crossword puzzle, shows examples of puzzle formats, and explains the crossword XML schema.

information structure

puzzle types and styles

A puzzle is of some type and style. Within the context of the type and style of the puzzle the 'data' content and structure can be defined. There is also associated 'metadata' such as the title of the puzzle, and the name of the creator. The CrosswordPlayer currently handles puzzles of type Crossword, style American, and language English.

This information has been gathered from various sources and is in no way definitive or complete. Helpful web sources are the basic rules and specification links at CRUCIVERB.COM and the article Crossword Puzzles from around the World at dummies.com.

Puzzles can be of type crossword, acrostic, diagramless, etc. A type of puzzle may have different styles: the crossword puzzle has the styles American, French, Spanish, UK, etc. A puzzle is also in a certain language, so you can have a Dutch-language American-style crossword puzzle.

Puzzle: Crossword
Style: crossword, American-style
Grid: columns x rows
Clues: Across & Down
Numbered: corner of cell (1, 2, 3, ..); clue is Across 5, Down 5
Other: has complete letter interlock; usually square, 15x15

Puzzle: Mots croisés
Style: crossword, French-style
Grid: columns x rows
Clues: Horizontalement & Verticalement
Numbered: top of columns (1, 2, ..); left of rows (I, II, .., Roman numerals); clue is Verticalement 5, Horizontalement V
Other: more than one clue per line when a line holds more than one word; has unchecked letters; usually asymmetrical, horizontal 9x11

Puzzle: Crucigramas
Style: crossword, Spanish-style
Grid: columns x rows
Clues: Horizontales & Verticales
Numbered: top of columns (1, 2, ..); left of rows (I, II, .., Roman numerals); clue is IV-3
Other: (none listed)


The first puzzle put into XML is an American-style crossword puzzle in English. While the following focuses on this implementation, the same process can be applied to puzzles of other types, styles, and languages.

data

An American-style crossword has a grid and some clues. The clues are grouped by 'Across' and 'Down'. The cells of the grid are numbered in the upper left corner of the cell. The 'data' is the size of the grid (often a 15x15 square), which cells hold letters and which are not used, the answer letter for each cell, and the clues that correspond to each entry in each direction. There are also 'rules' that a puzzle of a given type and style will follow, such as each letter cell being covered by both an Across and a Down clue, or, even more simply, the rule that there are no blanks within a 'word'. With the data and the rules of the type and style of puzzle, a puzzle player should be able to recreate the puzzle.

A basic structure of the data for an American-style crossword puzzle with a 15x15 grid would be:

1 grid (columns=15, rows=15)
15x15 = 225 cells (a cell is letter or blank)
some number of clues
a clue is across or down
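
The structure listed above could be modelled roughly as follows. This is only a sketch: the class and field names (Grid, Clue, Crossword, cell_id) are made up for illustration and are not part of CrosswordPlayer, and the 3x3 grid is a toy example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Clue:
    direction: str  # "across" or "down"
    cell_id: str    # id of the letter cell where the answer starts
    text: str


@dataclass
class Grid:
    columns: int
    rows: int
    # One entry per cell, row by row; None marks a blank cell.
    cells: List[Optional[str]] = field(default_factory=list)

    def cell(self, row: int, column: int) -> Optional[str]:
        """Return the letter at a 1-based (row, column), or None if blank."""
        return self.cells[(row - 1) * self.columns + (column - 1)]


@dataclass
class Crossword:
    grid: Grid
    clues: List[Clue] = field(default_factory=list)


# Toy 3x3 grid; "." stands for a blank cell in this sketch.
layout = ["CAT", "A.O", "BEN"]
letters = [None if ch == "." else ch for row in layout for ch in row]
puzzle = Crossword(
    grid=Grid(columns=3, rows=3, cells=letters),
    clues=[Clue("across", "1,1", "Feline pet")],
)
```

A 15x15 puzzle would have the same shape, with 225 entries in cells.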

metadata

The format of the 'metadata' of the puzzle attempts to follow Dublin Core standards. The Dublin Core Metadata Initiative is an open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models.

metadata
title
date
creator
rights
publisher
identifier
description

XML is a good exchange format for the information (data and metadata) in the crossword itself. For exchange of information about the puzzles between crossword players (both software and human) use of Resource Description Framework (RDF) and RDF Schema (RDFS) following Dublin Core standards seems like a good fit. And that is another cool project, out of the scope of the current document.

vocabulary

grid
two-dimensional 'grid' structure of the puzzle
cell
one box or square in the grid
letter cell
cell containing a letter
blank cell
cell that is not active in the puzzle
letter
one letter to one cell
word
word or words that are the answer to the clue (sometimes referred to as 'entry')
unchecked letter or unkeyed letter
letters that appear in only one word across or down
letter interlocks
letters that appear in both an across and a down word
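
To make the last two terms concrete, here is a small sketch that sorts the letter cells of a grid into interlocked and unchecked letters. The function name classify_letters and the use of "." for a blank cell are assumptions made for this example, and a 'word' is taken to be any run of two or more letter cells.

```python
def classify_letters(rows):
    """Split letter cells into (interlocked, unchecked) lists of 1-based (row, column)."""
    height, width = len(rows), len(rows[0])

    def is_letter(r, c):
        return 0 <= r < height and 0 <= c < width and rows[r][c] != "."

    interlocked, unchecked = [], []
    for r in range(height):
        for c in range(width):
            if not is_letter(r, c):
                continue
            # A cell is part of an across word if it has a letter to its left or right,
            # and part of a down word if it has a letter above or below.
            in_across = is_letter(r, c - 1) or is_letter(r, c + 1)
            in_down = is_letter(r - 1, c) or is_letter(r + 1, c)
            if in_across and in_down:
                interlocked.append((r + 1, c + 1))
            else:
                unchecked.append((r + 1, c + 1))
    return interlocked, unchecked


interlocked, unchecked = classify_letters(["CAT", "A.O", "BEN"])
```

In this toy grid only the four corner letters interlock. A grid with complete letter interlock, as in the American style, would return an empty unchecked list.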

file formats

Crossword players expect the crossword data to be formatted in a way that they can understand. Some formats found on the web are .txt (plain text file), .puz (Across Lite format), .cmo (Crossword Maestro format), and .xwd (Crossdown format). An open format for crossword puzzles allows separation of the puzzle information from the puzzle player. A puzzle could then be played on multiple players, without the conversion difficulties caused by closed proprietary formats.

The November 23, 2003 - "Sunday Challenge" is used as an example to show the .puz binary format (shown in a hexdump) and the .xml format currently used in CrosswordPlayer.

example puz (hexdump)

Example of a crossword in puz format. Because .puz is a binary format, a hexdump makes it easier to read: the hex values on the left correspond to the characters on the right.

0000: 6A 7A 41 43 52 4F 53 53 26 44 4F 57 4E 00 02 AA 	jzACROSS&DOWN..©
0010: 4B E9 72 04 EB EF 7E C6 31 2E 32 00 00 00 00 00 	K©r.©©~©1.2.....
0020: 00 00 00 00 00 00 00 00 00 00 00 00 0F 0F 46 00 	..............F.
0030: 01 00 00 00 4A 41 4D 45 53 4D 41 53 4F 4E 2E 54 	....JAMESMASON.T
0040: 45 53 53 49 54 41 4C 49 41 4E 49 43 45 2E 49 4C 	ESSITALIANICE.IL
0050: 57 55 47 4F 4C 44 44 49 47 47 45 52 2E 50 49 45 	WUGOLDDIGGER.PIE
0060: 52 53 50 4C 45 45 4E 2E 4E 41 56 59 2E 57 45 45 	RSPLEEN.NAVY.WEE
0070: 2E 2E 2E 53 41 52 47 2E 4E 45 57 4D 41 54 48 43 	...SARG.NEWMATHC
0080: 4F 41 54 52 4F 4F 4D 2E 53 43 41 4C 49 41 4F 44 	OATROOM.SCALIAOD
0090: 44 2E 4D 41 4C 49 43 2E 41 52 4C 45 4E 4D 45 41 	D.MALIC.ARLENMEA
00A0: 54 2E 44 45 4D 4F 4E 2E 4C 41 50 44 50 52 4D 45 	T.DEMON.LAPDPRME
00B0: 4E 2E 4D 45 44 4F 43 2E 43 49 45 55 4E 53 45 41 	N.MEDOC.CIEUNSEA
00C0: 54 2E 44 45 54 41 43 48 45 44 4C 45 41 4E 54 4F 	T.DETACHEDLEANTO
00D0: 53 2E 44 41 54 41 2E 2E 2E 53 49 50 2E 4F 4D 4E 	S.DATA...SIP.OMN
00E0: 49 2E 42 41 52 54 41 42 49 53 50 53 2E 43 41 4E 	I.BARTABISPS.CAN
00F0: 4E 45 4C 4C 4F 4E 49 4F 53 4C 4F 2E 41 54 47 55 	NELLONIOSLO.ATGU

0100: 4E 50 4F 49 4E 54 4E 45 45 44 2E 54 48 45 42 45 	NPOINTNEED.THEBE
0110: 41 54 4C 45 53 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2E 	ATLES----------.
0120: 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2E 2D 	--------------.-
0130: 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2E 2D 2D 	-------------.--
0140: 2D 2D 2D 2D 2D 2D 2D 2D 2E 2D 2D 2D 2D 2E 2D 2D 	--------.----.--
0150: 2D 2E 2E 2E 2D 2D 2D 2D 2E 2D 2D 2D 2D 2D 2D 2D 	-...----.-------
0160: 2D 2D 2D 2D 2D 2D 2D 2D 2E 2D 2D 2D 2D 2D 2D 2D 	--------.-------
0170: 2D 2D 2E 2D 2D 2D 2D 2D 2E 2D 2D 2D 2D 2D 2D 2D 	--.-----.-------
0180: 2D 2D 2E 2D 2D 2D 2D 2D 2E 2D 2D 2D 2D 2D 2D 2D 	--.-----.-------
0190: 2D 2D 2E 2D 2D 2D 2D 2D 2E 2D 2D 2D 2D 2D 2D 2D 	--.-----.-------
01A0: 2D 2D 2E 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 	--.-------------
01B0: 2D 2D 2E 2D 2D 2D 2D 2E 2E 2E 2D 2D 2D 2E 2D 2D 	--.----...---.--
01C0: 2D 2D 2E 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2E 2D 2D 	--.----------.--
01D0: 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2E 2D 2D 2D 	------------.---
01E0: 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2E 2D 2D 2D 2D 	-----------.----
01F0: 2D 2D 2D 2D 2D 2D 4E 6F 76 65 6D 62 65 72 20 32 	------November 2

0200: 33 2C 20 32 30 30 33 20 2D 20 22 53 75 6E 64 61 	3, 2003 - "Sunda
0210: 79 20 43 68 61 6C 6C 65 6E 67 65 22 00 20 20 42 	y Challenge".  B
0220: 79 20 42 6F 62 20 4B 6C 61 68 6E 20 20 00 A9 20 	y Bob Klahn  .© 
0230: 32 30 30 33 20 42 6F 62 20 4B 6C 61 68 6E 2E 20 	2003 Bob Klahn. 
0240: 44 69 73 74 72 69 62 75 74 65 64 20 62 79 20 43 	Distributed by C
0250: 72 6F 73 53 79 6E 65 72 67 79 28 54 4D 29 20 53 	rosSynergy(TM) S
0260: 79 6E 64 69 63 61 74 65 00 22 54 68 65 20 56 65 	yndicate."The Ve
0270: 72 64 69 63 74 22 20 61 63 74 6F 72 00 46 65 61 	rdict" actor.Fea
0280: 74 68 65 72 65 64 20 66 69 73 68 69 6E 67 20 68 	thered fishing h
0290: 6F 6F 6B 73 00 53 74 72 61 64 64 6C 69 6E 67 00 	ooks.Straddling.
02A0: 4F 6E 69 6F 6D 61 6E 69 61 63 27 73 20 6D 65 63 	Oniomaniac's mec
02B0: 63 61 00 46 69 72 73 74 20 74 6F 20 61 72 72 69 	ca.First to arri
02C0: 76 65 3F 00 50 69 73 74 6F 6C 20 6F 72 20 73 61 	ve?.Pistol or sa
02D0: 62 65 72 00 41 72 74 65 72 79 00 45 6D 6D 61 27 	ber.Artery.Emma'
02E0: 73 20 22 53 65 6E 73 65 20 61 6E 64 20 53 65 6E 	s "Sense and Sen
02F0: 73 69 62 69 6C 69 74 79 22 20 64 69 72 65 63 74 	sibility" direct

0300: 6F 72 00 52 61 74 69 66 79 00 4B 65 79 20 6C 6F 	or.Ratify.Key lo
0310: 63 61 74 69 6F 6E 3F 00 42 75 74 74 65 72 66 6C 	cation?.Butterfl
0320: 69 65 73 00 31 39 37 39 20 4E 61 73 74 61 73 73 	ies.1979 Nastass
0330: 6A 61 20 4B 69 6E 73 6B 69 20 72 6F 6C 65 00 42 	ja Kinski role.B
0340: 65 67 69 6E 20 74 6F 20 75 70 73 65 74 00 22 42 	egin to upset."B
0350: 61 62 79 20 44 6F 6C 6C 22 20 77 61 73 20 68 69 	aby Doll" was hi
0360: 73 20 66 69 6C 6D 20 64 65 62 75 74 00 53 75 67 	s film debut.Sug
0370: 61 72 00 44 69 73 70 6C 61 79 69 6E 67 20 74 68 	ar.Displaying th
0380: 65 20 73 6B 69 6C 6C 20 61 6E 64 20 65 78 70 65 	e skill and expe
0390: 72 69 65 6E 63 65 20 6F 66 20 61 6E 20 65 78 70 	rience of an exp
03A0: 65 72 74 00 42 6F 61 72 64 77 61 6C 6B 20 62 75 	ert.Boardwalk bu
03B0: 79 00 4C 61 62 6F 72 20 6F 72 67 2E 20 62 6F 72 	y.Labor org. bor
03C0: 6E 20 6F 6E 20 74 68 65 20 50 61 63 69 66 69 63 	n on the Pacific
03D0: 20 63 6F 61 73 74 20 69 6E 20 74 68 65 20 6C 61 	 coast in the la
03E0: 74 65 20 27 33 30 73 00 43 61 74 68 65 72 69 6E 	te '30s.Catherin
03F0: 65 20 5A 65 74 61 2D 4A 6F 6E 65 73 27 73 20 63 	e Zeta-Jones's c

0400: 68 61 72 61 63 74 65 72 20 69 6E 20 22 49 6E 74 	haracter in "Int
0410: 6F 6C 65 72 61 62 6C 65 20 43 72 75 65 6C 74 79 	olerable Cruelty
0420: 2C 22 20 65 2E 67 2E 00 57 61 6C 6B 20 6F 6E 20 	," e.g..Walk on 
0430: 77 61 74 65 72 3F 00 4C 79 6D 70 68 6F 63 79 74 	water?.Lymphocyt
0440: 65 20 70 72 6F 64 75 63 65 72 00 42 6C 75 65 20 	e producer.Blue 
0450: 68 75 65 00 42 6C 75 65 20 54 72 69 61 6E 67 6C 	hue.Blue Triangl
0460: 65 20 67 70 2E 00 42 61 72 65 6C 79 20 70 65 72 	e gp..Barely per
0470: 63 65 70 74 69 62 6C 65 00 4D 61 72 69 6F 6E 65 	ceptible.Marione
0480: 74 74 65 20 6D 61 6B 65 72 20 54 6F 6E 79 00 41 	tte maker Tony.A
0490: 75 74 6F 6D 61 74 6F 6E 00 49 74 20 77 61 73 20 	utomaton.It was 
04A0: 62 61 73 65 64 20 6F 6E 20 73 65 74 20 74 68 65 	based on set the
04B0: 6F 72 79 00 45 61 72 74 68 79 20 64 65 70 6F 73 	ory.Earthy depos
04C0: 69 74 20 6F 66 20 63 6C 61 79 20 61 6E 64 20 63 	it of clay and c
04D0: 61 6C 63 69 75 6D 20 63 61 72 62 6F 6E 61 74 65 	alcium carbonate
04E0: 00 59 6F 75 20 63 61 6E 20 63 68 65 63 6B 20 79 	.You can check y
04F0: 6F 75 72 20 68 61 6E 67 2D 75 70 73 20 68 65 72 	our hang-ups her

0500: 65 00 49 74 27 73 20 75 73 75 61 6C 6C 79 20 69 	e.It's usually i
0510: 72 72 65 73 69 73 74 69 62 6C 65 00 5F 5F 5F 20 	rresistible.___ 
0520: 4C 69 6E 65 20 28 70 6F 73 74 2D 57 57 49 49 20 	Line (post-WWII 
0530: 47 65 72 6D 61 6E 2D 50 6F 6C 69 73 68 20 62 6F 	German-Polish bo
0540: 72 64 65 72 29 00 49 74 27 73 20 6D 6F 72 65 20 	rder).It's more 
0550: 70 72 6F 6D 69 6E 65 6E 74 20 69 6E 20 6D 65 6E 	prominent in men
0560: 20 74 68 61 6E 20 69 6E 20 77 6F 6D 65 6E 00 57 	 than in women.W
0570: 65 6E 74 20 77 69 74 68 6F 75 74 20 73 61 79 69 	ent without sayi
0580: 6E 67 3F 00 52 65 61 67 61 6E 20 43 6F 75 72 74 	ng?.Reagan Court
0590: 20 61 70 70 6F 69 6E 74 65 65 00 4D 61 74 63 68 	 appointee.Match
05A0: 6C 65 73 73 00 41 67 69 6E 67 20 61 63 69 64 20 	less.Aging acid 
05B0: 66 6F 75 6E 64 20 69 6E 20 66 72 75 69 74 00 49 	found in fruit.I
05C0: 6E 20 43 2C 20 70 65 72 68 61 70 73 00 43 6F 6D 	n C, perhaps.Com
05D0: 70 6F 73 65 72 20 6F 66 20 22 54 68 65 20 57 69 	poser of "The Wi
05E0: 7A 61 72 64 20 6F 66 20 4F 7A 22 00 43 68 61 72 	zard of Oz".Char
05F0: 63 75 74 65 72 69 65 20 6F 66 66 65 72 69 6E 67 	cuterie offering

0600: 00 4E 61 6E 63 79 20 44 72 65 77 20 6F 72 20 4B 	.Nancy Drew or K
0610: 69 6E 67 20 54 75 74 00 53 70 65 65 64 79 20 6F 	ing Tut.Speedy o
0620: 6E 65 3F 00 22 52 65 61 64 20 74 68 69 73 21 22 	ne?."Read this!"
0630: 00 22 52 75 73 68 20 48 6F 75 72 22 20 6F 72 67 	."Rush Hour" org
0640: 2E 00 46 6C 61 63 6B 73 00 50 61 72 74 6E 65 72 	..Flacks.Partner
0650: 73 68 69 70 20 66 6F 72 20 50 65 61 63 65 20 67 	ship for Peace g
0660: 70 2E 00 52 65 64 20 42 6F 72 64 65 61 75 78 00 	p..Red Bordeaux.
0670: 49 6E 64 69 61 6E 20 62 65 61 6E 00 43 61 6E 6E 	Indian bean.Cann
0680: 65 73 20 63 6F 6E 63 65 72 6E 20 28 61 62 62 72 	es concern (abbr
0690: 2E 29 00 54 6F 70 70 6C 65 00 46 2D 31 34 20 66 	.).Topple.F-14 f
06A0: 69 67 68 74 65 72 00 49 63 79 00 57 68 65 72 65 	ighter.Icy.Where
06B0: 20 74 6F 20 66 69 6E 64 20 62 6F 74 68 20 63 72 	 to find both cr
06C0: 65 61 6D 20 70 75 66 66 73 20 61 6E 64 20 6C 65 	eam puffs and le
06D0: 6D 6F 6E 73 00 53 68 65 64 73 00 53 63 79 74 68 	mons.Sheds.Scyth
06E0: 65 20 68 61 6E 64 6C 65 00 41 70 70 6C 65 20 63 	e handle.Apple c
06F0: 6F 6F 6B 69 65 2C 20 65 2E 67 2E 00 53 6D 61 6C 	ookie, e.g..Smal

0700: 6C 20 73 77 61 6C 6C 6F 77 00 41 6C 6C 20 61 74 	l swallow.All at
0710: 20 66 69 72 73 74 3F 00 22 57 68 65 72 65 27 73 	 first?."Where's
0720: 20 44 61 64 64 79 3F 22 20 64 72 61 6D 61 74 69 	 Daddy?" dramati
0730: 73 74 00 52 6F 75 6E 64 73 20 6B 65 65 70 65 72 	st.Rounds keeper
0740: 3F 00 44 72 75 64 67 65 20 6F 72 20 74 72 75 64 	?.Drudge or trud
0750: 67 65 00 41 72 63 68 65 72 20 77 69 74 68 6F 75 	ge.Archer withou
0760: 74 20 61 20 71 75 69 76 65 72 3F 00 54 68 65 79 	t a quiver?.They
0770: 27 72 65 20 73 74 72 61 69 67 68 74 20 66 72 6F 	're straight fro
0780: 6D 20 74 68 65 20 68 6F 72 73 65 27 73 20 6D 6F 	m the horse's mo
0790: 75 74 68 00 47 61 74 65 6B 65 65 70 65 72 73 20 	uth.Gatekeepers 
07A0: 77 69 74 68 20 63 6F 6E 6E 65 63 74 69 6F 6E 73 	with connections
07B0: 20 28 61 62 62 72 2E 29 00 49 6E 74 65 6E 74 69 	 (abbr.).Intenti
07C0: 6F 6E 61 6C 20 67 72 6F 75 6E 64 69 6E 67 3F 00 	onal grounding?.
07D0: 54 75 62 75 6C 61 72 20 69 6E 76 65 6E 74 69 6F 	Tubular inventio
07E0: 6E 20 6F 66 20 74 68 65 20 6C 61 74 65 20 74 65 	n of the late te
07F0: 65 6E 73 20 6F 72 20 65 61 72 6C 79 20 74 77 65 	ens or early twe

0800: 6E 74 69 65 73 00 48 65 61 72 74 20 6F 66 20 74 	nties.Heart of t
0810: 68 65 20 6D 61 74 74 65 72 00 43 61 70 69 74 61 	he matter.Capita
0820: 6C 20 61 74 20 74 68 65 20 63 65 6E 74 65 72 20 	l at the center 
0830: 6F 66 20 43 7A 65 63 68 6F 73 6C 6F 76 61 6B 69 	of Czechoslovaki
0840: 61 3F 00 42 61 64 20 77 61 79 20 74 6F 20 62 65 	a?.Bad way to be
0850: 20 6D 61 72 72 69 65 64 00 43 61 6C 6C 20 66 6F 	 married.Call fo
0860: 72 00 27 36 30 73 20 69 6E 76 61 64 65 72 73 00 	r.'60s invaders.
0870: 00                                              	.
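
The layout visible in the hexdump can be read back with a few lines of code. The sketch below relies only on offsets read off the dump above, not on a published specification: bytes 0x2C and 0x2D are both 0F (15), the two bytes at 0x2E are 46 00 (70, which matches the 35 across plus 35 down clues of this puzzle), the answer letters start at 0x34 with "." for unused cells, a second block of the same size filled with "-" follows, and then come NUL-terminated strings (title, creator, rights, clues). Treating 0x2C as the width and 0x2D as the height is an assumption, since both are 15 here, and parse_puz is a made-up name.

```python
import struct

HEADER_SIZE = 0x34        # the answer grid starts here in the dump
WIDTH_OFFSET = 0x2C       # assumed: width first, then height
HEIGHT_OFFSET = 0x2D
CLUE_COUNT_OFFSET = 0x2E  # read as a 16-bit little-endian number


def parse_puz(data: bytes) -> dict:
    """Pull grid size, answers, and text strings out of .puz-like bytes."""
    width = data[WIDTH_OFFSET]
    height = data[HEIGHT_OFFSET]
    (clue_count,) = struct.unpack_from("<H", data, CLUE_COUNT_OFFSET)
    cell_count = width * height

    answers = data[HEADER_SIZE:HEADER_SIZE + cell_count].decode("latin-1")
    rows = [answers[i:i + width] for i in range(0, cell_count, width)]

    # Skip the second grid (the block of '-' characters) and split the rest on NUL.
    strings = data[HEADER_SIZE + 2 * cell_count:].split(b"\x00")
    texts = [s.decode("latin-1") for s in strings]
    return {
        "width": width,
        "height": height,
        "rows": rows,
        "title": texts[0],
        "creator": texts[1],
        "rights": texts[2],
        "clues": texts[3:3 + clue_count],
    }


# Tiny synthetic file laid out the same way: a 3x2 grid with two clues.
header = bytearray(HEADER_SIZE)
header[WIDTH_OFFSET] = 3
header[HEIGHT_OFFSET] = 2
struct.pack_into("<H", header, CLUE_COUNT_OFFSET, 2)
sample = (bytes(header) + b"ABC.DE" + b"---.--"
          + b"Title\x00Author\x00\xa9 2003\x00First clue\x00Second clue\x00\x00")
parsed = parse_puz(sample)
```

The strings are decoded as Latin-1 because the byte A9 in the dump is shown as the copyright sign. Fields of the header not mentioned here are ignored by this sketch.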

example xml

The same example crossword in XML.

<?xml version="1.0" encoding="utf-8"?>
<puzzle>
    <crossword language="en">
        <metadata>
            <title>"Sunday Challenge"</title>
            <date>November 23, 2003</date>
            <creator>  By Bob Klahn  </creator>
            <rights>©2003 Bob Klahn. Distributed by CrosSynergy(TM) Syndicate</rights>
            <publisher>Houston Chronicle</publisher>
            <identifier>http://www.example.com/puzzles/crossword.puz</identifier>
            <description>Puzzle of type Crossword and style American in language English translated from format .puz into format .xml following schema http://www.koonts.com/some/dir/crossword.</description>
        </metadata>
        <american>
            <grid rows="15" columns="15">
                <letter id="1,1">J</letter>
                <letter id="1,2">A</letter>
                <letter id="1,3">M</letter>
                <letter id="1,4">E</letter>
                <letter id="1,5">S</letter>
                <letter id="1,6">M</letter>
                <letter id="1,7">A</letter>
                <letter id="1,8">S</letter>
                <letter id="1,9">O</letter>
                <letter id="1,10">N</letter>
                <blank></blank>
                <letter id="1,12">T</letter>
                <letter id="1,13">E</letter>
                <letter id="1,14">S</letter>
                <letter id="1,15">S</letter>
                <letter id="2,1">I</letter>
                <letter id="2,2">T</letter>
                <letter id="2,3">A</letter>
                <letter id="2,4">L</letter>
                <letter id="2,5">I</letter>
                <letter id="2,6">A</letter>
                <letter id="2,7">N</letter>
                <letter id="2,8">I</letter>
                <letter id="2,9">C</letter>
                <letter id="2,10">E</letter>
                <blank></blank>
                <letter id="2,12">I</letter>
                <letter id="2,13">L</letter>
                <letter id="2,14">W</letter>
                <letter id="2,15">U</letter>
                <letter id="3,1">G</letter>
                <letter id="3,2">O</letter>
                <letter id="3,3">L</letter>
                <letter id="3,4">D</letter>
                <letter id="3,5">D</letter>
                <letter id="3,6">I</letter>
                <letter id="3,7">G</letter>
                <letter id="3,8">G</letter>
                <letter id="3,9">E</letter>
                <letter id="3,10">R</letter>
                <blank></blank>
                <letter id="3,12">P</letter>
                <letter id="3,13">I</letter>
                <letter id="3,14">E</letter>
                <letter id="3,15">R</letter>
                <letter id="4,1">S</letter>
                <letter id="4,2">P</letter>
                <letter id="4,3">L</letter>
                <letter id="4,4">E</letter>
                <letter id="4,5">E</letter>
                <letter id="4,6">N</letter>
                <blank></blank>
                <letter id="4,8">N</letter>
                <letter id="4,9">A</letter>
                <letter id="4,10">V</letter>
                <letter id="4,11">Y</letter>
                <blank></blank>
                <letter id="4,13">W</letter>
                <letter id="4,14">E</letter>
                <letter id="4,15">E</letter>
                <blank></blank>
                <blank></blank>
                <blank></blank>
                <letter id="5,4">S</letter>
                <letter id="5,5">A</letter>
                <letter id="5,6">R</letter>
                <letter id="5,7">G</letter>
                <blank></blank>
                <letter id="5,9">N</letter>
                <letter id="5,10">E</letter>
                <letter id="5,11">W</letter>
                <letter id="5,12">M</letter>
                <letter id="5,13">A</letter>
                <letter id="5,14">T</letter>
                <letter id="5,15">H</letter>
                <letter id="6,1">C</letter>
                <letter id="6,2">O</letter>
                <letter id="6,3">A</letter>
                <letter id="6,4">T</letter>
                <letter id="6,5">R</letter>
                <letter id="6,6">O</letter>
                <letter id="6,7">O</letter>
                <letter id="6,8">M</letter>
                <blank></blank>
                <letter id="6,10">S</letter>
                <letter id="6,11">C</letter>
                <letter id="6,12">A</letter>
                <letter id="6,13">L</letter>
                <letter id="6,14">I</letter>
                <letter id="6,15">A</letter>
                <letter id="7,1">O</letter>
                <letter id="7,2">D</letter>
                <letter id="7,3">D</letter>
                <blank></blank>
                <letter id="7,5">M</letter>
                <letter id="7,6">A</letter>
                <letter id="7,7">L</letter>
                <letter id="7,8">I</letter>
                <letter id="7,9">C</letter>
                <blank></blank>
                <letter id="7,11">A</letter>
                <letter id="7,12">R</letter>
                <letter id="7,13">L</letter>
                <letter id="7,14">E</letter>
                <letter id="7,15">N</letter>
                <letter id="8,1">M</letter>
                <letter id="8,2">E</letter>
                <letter id="8,3">A</letter>
                <letter id="8,4">T</letter>
                <blank></blank>
                <letter id="8,6">D</letter>
                <letter id="8,7">E</letter>
                <letter id="8,8">M</letter>
                <letter id="8,9">O</letter>
                <letter id="8,10">N</letter>
                <blank></blank>
                <letter id="8,12">L</letter>
                <letter id="8,13">A</letter>
                <letter id="8,14">P</letter>
                <letter id="8,15">D</letter>
                <letter id="9,1">P</letter>
                <letter id="9,2">R</letter>
                <letter id="9,3">M</letter>
                <letter id="9,4">E</letter>
                <letter id="9,5">N</letter>
                <blank></blank>
                <letter id="9,7">M</letter>
                <letter id="9,8">E</letter>
                <letter id="9,9">D</letter>
                <letter id="9,10">O</letter>
                <letter id="9,11">C</letter>
                <blank></blank>
                <letter id="9,13">C</letter>
                <letter id="9,14">I</letter>
                <letter id="9,15">E</letter>
                <letter id="10,1">U</letter>
                <letter id="10,2">N</letter>
                <letter id="10,3">S</letter>
                <letter id="10,4">E</letter>
                <letter id="10,5">A</letter>
                <letter id="10,6">T</letter>
                <blank></blank>
                <letter id="10,8">D</letter>
                <letter id="10,9">E</letter>
                <letter id="10,10">T</letter>
                <letter id="10,11">A</letter>
                <letter id="10,12">C</letter>
                <letter id="10,13">H</letter>
                <letter id="10,14">E</letter>
                <letter id="10,15">D</letter>
                <letter id="11,1">L</letter>
                <letter id="11,2">E</letter>
                <letter id="11,3">A</letter>
                <letter id="11,4">N</letter>
                <letter id="11,5">T</letter>
                <letter id="11,6">O</letter>
                <letter id="11,7">S</letter>
                <blank></blank>
                <letter id="11,9">D</letter>
                <letter id="11,10">A</letter>
                <letter id="11,11">T</letter>
                <letter id="11,12">A</letter>
                <blank></blank>
                <blank></blank>
                <blank></blank>
                <letter id="12,1">S</letter>
                <letter id="12,2">I</letter>
                <letter id="12,3">P</letter>
                <blank></blank>
                <letter id="12,5">O</letter>
                <letter id="12,6">M</letter>
                <letter id="12,7">N</letter>
                <letter id="12,8">I</letter>
                <blank></blank>
                <letter id="12,10">B</letter>
                <letter id="12,11">A</letter>
                <letter id="12,12">R</letter>
                <letter id="12,13">T</letter>
                <letter id="12,14">A</letter>
                <letter id="12,15">B</letter>
                <letter id="13,1">I</letter>
                <letter id="13,2">S</letter>
                <letter id="13,3">P</letter>
                <letter id="13,4">S</letter>
                <blank></blank>
                <letter id="13,6">C</letter>
                <letter id="13,7">A</letter>
                <letter id="13,8">N</letter>
                <letter id="13,9">N</letter>
                <letter id="13,10">E</letter>
                <letter id="13,11">L</letter>
                <letter id="13,12">L</letter>
                <letter id="13,13">O</letter>
                <letter id="13,14">N</letter>
                <letter id="13,15">I</letter>
                <letter id="14,1">O</letter>
                <letter id="14,2">S</letter>
                <letter id="14,3">L</letter>
                <letter id="14,4">O</letter>
                <blank></blank>
                <letter id="14,6">A</letter>
                <letter id="14,7">T</letter>
                <letter id="14,8">G</letter>
                <letter id="14,9">U</letter>
                <letter id="14,10">N</letter>
                <letter id="14,11">P</letter>
                <letter id="14,12">O</letter>
                <letter id="14,13">I</letter>
                <letter id="14,14">N</letter>
                <letter id="14,15">T</letter>
                <letter id="15,1">N</letter>
                <letter id="15,2">E</letter>
                <letter id="15,3">E</letter>
                <letter id="15,4">D</letter>
                <blank></blank>
                <letter id="15,6">T</letter>
                <letter id="15,7">H</letter>
                <letter id="15,8">E</letter>
                <letter id="15,9">B</letter>
                <letter id="15,10">E</letter>
                <letter id="15,11">A</letter>
                <letter id="15,12">T</letter>
                <letter id="15,13">L</letter>
                <letter id="15,14">E</letter>
                <letter id="15,15">S</letter>
            </grid>
            <clues>
                <across cellid="1,1">"The Verdict" actor</across>
                <across cellid="1,12">1979 Nastassja Kinski role</across>
                <across cellid="2,1">Boardwalk buy</across>
                <across cellid="2,12">Labor org. born on the Pacific coast in the late '30s</across>
                <across cellid="3,1">Catherine Zeta-Jones's character in "Intolerable Cruelty," e.g.</across>
                <across cellid="3,12">Walk on water?</across>
                <across cellid="4,1">Lymphocyte producer</across>
                <across cellid="4,8">Blue hue</across>
                <across cellid="4,13">Barely perceptible</across>
                <across cellid="5,4">Marionette maker Tony</across>
                <across cellid="5,9">It was based on set theory</across>
                <across cellid="6,1">You can check your hang-ups here</across>
                <across cellid="6,10">Reagan Court appointee</across>
                <across cellid="7,1">Matchless</across>
                <across cellid="7,5">Aging acid found in fruit</across>
                <across cellid="7,11">Composer of "The Wizard of Oz"</across>
                <across cellid="8,1">Charcuterie offering</across>
                <across cellid="8,6">Speedy one?</across>
                <across cellid="8,12">"Rush Hour" org.</across>
                <across cellid="9,1">Flacks</across>
                <across cellid="9,7">Red Bordeaux</across>
                <across cellid="9,13">Cannes concern (abbr.)</across>
                <across cellid="10,1">Topple</across>
                <across cellid="10,8">Icy</across>
                <across cellid="11,1">Sheds</across>
                <across cellid="11,9">Apple cookie, e.g.</across>
                <across cellid="12,1">Small swallow</across>
                <across cellid="12,5">All at first?</across>
                <across cellid="12,10">Rounds keeper?</across>
                <across cellid="13,1">Gatekeepers with connections (abbr.)</across>
                <across cellid="13,6">Tubular invention of the late teens or early twenties</across>
                <across cellid="14,1">Capital at the center of Czechoslovakia?</across>
                <across cellid="14,6">Bad way to be married</across>
                <across cellid="15,1">Call for</across>
                <across cellid="15,6">'60s invaders</across>
                <down cellid="1,1">Feathered fishing hooks</down>
                <down cellid="1,2">Straddling</down>
                <down cellid="1,3">Oniomaniac's mecca</down>
                <down cellid="1,4">First to arrive?</down>
                <down cellid="1,5">Pistol or saber</down>
                <down cellid="1,6">Artery</down>
                <down cellid="1,7">Emma's "Sense and Sensibility" director</down>
                <down cellid="1,8">Ratify</down>
                <down cellid="1,9">Key location?</down>
                <down cellid="1,10">Butterflies</down>
                <down cellid="1,12">Begin to upset</down>
                <down cellid="1,13">"Baby Doll" was his film debut</down>
                <down cellid="1,14">Sugar</down>
                <down cellid="1,15">Displaying the skill and experience of an expert</down>
                <down cellid="4,11">Blue Triangle gp.</down>
                <down cellid="5,7">Automaton</down>
                <down cellid="5,12">Earthy deposit of clay and calcium carbonate</down>
                <down cellid="6,1">It's usually irresistible</down>
                <down cellid="6,2">___ Line (post-WWII German-Polish border)</down>
                <down cellid="6,3">It's more prominent in men than in women</down>
                <down cellid="6,8">Went without saying?</down>
                <down cellid="7,9">In C, perhaps</down>
                <down cellid="8,4">Nancy Drew or King Tut</down>
                <down cellid="8,10">"Read this!"</down>
                <down cellid="9,5">Partnership for Peace gp.</down>
                <down cellid="9,11">Indian bean</down>
                <down cellid="10,6">F-14 fighter</down>
                <down cellid="10,12">Where to find both cream puffs and lemons</down>
                <down cellid="11,7">Scythe handle</down>
                <down cellid="12,8">"Where's Daddy?" dramatist</down>
                <down cellid="12,13">Drudge or trudge</down>
                <down cellid="12,14">Archer without a quiver?</down>
                <down cellid="12,15">They're straight from the horse's mouth</down>
                <down cellid="13,4">Intentional grounding?</down>
                <down cellid="13,9">Heart of the matter</down>
            </clues>
        </american>
    </crossword>
</puzzle>
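
The grid part of this XML could be produced from rows of answer letters along these lines. This is a hedged sketch rather than the converter actually used for the example: build_grid is a made-up name, "." marks a blank cell as in the .puz dump, and every letter cell is given an id of the form row,column as in the listing above.

```python
import xml.etree.ElementTree as ET


def build_grid(rows):
    """Build a <grid> element from equal-length strings; '.' marks a blank cell."""
    grid = ET.Element("grid", rows=str(len(rows)), columns=str(len(rows[0])))
    for r, row in enumerate(rows, start=1):
        for c, ch in enumerate(row, start=1):
            if ch == ".":
                # Blank cells carry no id and no text.
                ET.SubElement(grid, "blank")
            else:
                letter = ET.SubElement(grid, "letter", id=f"{r},{c}")
                letter.text = ch
    return grid


grid = build_grid(["AB", ".D"])
xml_text = ET.tostring(grid, encoding="unicode")
```

ElementTree writes an empty element in its short form, which is equivalent to the <blank></blank> form used in the example.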

schemas

outline in xml

Outline of the XML structure in XML. The example crossword XML above follows this form.

<?xml version="1.0" encoding="utf-8"?>
<puzzle>
    <crossword language="en">
        <metadata>
            <title>"Title of Crossword"</title>
            <date>month DD, YYYY</date> 
            <creator>By John Smith</creator>
            <rights>Copyright, ...</rights>
            <publisher>Houston Chronicle</publisher>
            <identifier>http://www.example.com/puzzles/crossword.puz</identifier>
            <description>Puzzle of type Crossword and style American in language English.
            Format .xml following schema http://www.koonts.com/some/dir/crossword.</description>
        </metadata>
    
        <american>
            <grid rows="15" columns="15">
                <letter id="1,1">W</letter>
                <letter id="1,2">O</letter>
                <letter id="1,3">R</letter>
                <letter id="1,4">D</letter>
                <letter id="1,5">A</letter>
                <letter id="1,6">C</letter>
                <letter id="1,7">R</letter>
                <letter id="1,8">O</letter>
                <letter id="1,9">S</letter>
                <letter id="1,10">S</letter>
                <blank></blank>
                <letter id="1,12">W</letter>
                ...
            </grid>
                
            <clues>
                <across cellid="1,1">Clue text</across>
                <across cellid="1,12">Clue text</across>
                <across cellid="2,1">Clue text</across>
                ...
                <down cellid="1,1">Clue text</down>
                <down cellid="1,2">Clue text</down>
                <down cellid="1,3">Clue text</down>
                ...
            </clues>
        </american>
    </crossword>
</puzzle>

A puzzle can be of type 'crossword'. Its language is given by the language attribute, whose value follows XML Language Identification (the xml:lang tag). A crossword has 'metadata' and a 'style'. The metadata can contain various fields and must have a title and a creator. The 'style' contains the grid and clue information. A style can be 'American', 'French', and so on.
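
As a rough illustration of that structure, the following Python sketch reads the language, style, and metadata from a document and checks the two required metadata fields. The sample document, its date value, and the function name describe are made up for this example; only the element and attribute names come from the outline.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened document that follows the outline above.
SAMPLE = """<puzzle>
  <crossword language="en">
    <metadata>
      <title>Title of Crossword</title>
      <date>2005-01-31</date>
      <creator>By John Smith</creator>
    </metadata>
    <american>
      <grid rows="1" columns="2">
        <letter id="1,1">O</letter>
        <letter id="1,2">K</letter>
      </grid>
      <clues>
        <across cellid="1,1">Clue text</across>
        <down cellid="1,1">Clue text</down>
      </clues>
    </american>
  </crossword>
</puzzle>"""


def describe(xml_text):
    """Return (language, style, metadata dict) for a puzzle document."""
    crossword = ET.fromstring(xml_text).find("crossword")
    metadata = {
        child.tag: (child.text or "").strip()
        for child in crossword.find("metadata")
    }
    # Title and creator are the two fields the metadata must have.
    missing = [name for name in ("title", "creator") if not metadata.get(name)]
    if missing:
        raise ValueError("missing required metadata: " + ", ".join(missing))
    # The style element is the child of <crossword> that is not <metadata>.
    style = next(child.tag for child in crossword if child.tag != "metadata")
    return crossword.get("language"), style, metadata


language, style, metadata = describe(SAMPLE)
print(language, style, metadata["title"])
```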

The letter and blank cells appear in the order of the cells in the puzzle, starting at the top row and going left to right. The only remaining information is which clue, in which direction, goes with which cell. Each letter that is the first letter of a word is assigned an id, and each clue references that id. So the letter at position column=1, row=1 could have id="foo", and the corresponding across and down clues would then have cellid="foo". In practice, for human clarity in the XML, the characters representing the cell's position are used (row, then column, in the examples here), but they have no meaning other than matching the letter cell with its across and down clues: letter id="1,1" matches clue cellid="1,1".
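
The matching can be sketched in Python as follows. This is only an illustration, not the CrosswordPlayer code: the tiny grid, its clues, and the function name locate_clues are invented for the example. The first cell deliberately uses id="foo" to show that the position is taken from document order and the columns attribute, never from the id text.

```python
import xml.etree.ElementTree as ET

# Hypothetical 2 x 3 style element; the ids are opaque labels.
GRID_XML = """<american>
  <grid rows="2" columns="3">
    <letter id="foo">C</letter>
    <letter id="1,2">A</letter>
    <letter id="1,3">T</letter>
    <letter id="2,1">O</letter>
    <blank></blank>
    <letter id="2,3">O</letter>
  </grid>
  <clues>
    <across cellid="foo">Feline</across>
    <down cellid="foo">Company, briefly</down>
    <down cellid="1,3">Toward</down>
  </clues>
</american>"""


def locate_clues(style):
    """Return a list of (direction, (row, column), clue text) tuples."""
    grid = style.find("grid")
    columns = int(grid.get("columns"))
    position_of = {}
    for index, cell in enumerate(grid):
        if cell.tag == "letter":
            # Cells are listed top row first, left to right (1-based result).
            position_of[cell.get("id")] = (index // columns + 1, index % columns + 1)
    return [
        (clue.tag, position_of[clue.get("cellid")], clue.text)
        for clue in style.find("clues")
    ]


for direction, position, text in locate_clues(ET.fromstring(GRID_XML)):
    print(direction, position, text)
```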

When data is entered by humans for humans, the same data may appear in multiple places to help the human keep track of what goes where. This can lead to problems if the copies conflict. An XML representation of a crossword, while still human readable, can avoid these errors by keeping each piece of necessary information in one place, on the assumption that software will generate a playable puzzle from the XML file. This is why, for example, the clue numbers are not in the XML; they are generated while building the puzzle.
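
A minimal Python sketch of how clue numbers could be generated from the grid alone is given here. It assumes the usual American-style convention, in which cells are scanned row by row and a cell receives the next number when it starts an across or a down word. The list-of-strings grid with '#' for a blank cell and the function name number_cells are assumptions of this example, not part of the schema.

```python
def number_cells(rows):
    """Map (row, column), 1-based, to a clue number.

    rows is a list of equal-length strings, one per grid row,
    with '#' standing for a blank cell.
    """
    def is_letter(r, c):
        return 0 <= r < len(rows) and 0 <= c < len(rows[r]) and rows[r][c] != "#"

    numbers = {}
    next_number = 1
    for r in range(len(rows)):
        for c in range(len(rows[r])):
            if not is_letter(r, c):
                continue
            # A word starts here if the previous cell is blank or off the
            # grid and the following cell is a letter.
            starts_across = not is_letter(r, c - 1) and is_letter(r, c + 1)
            starts_down = not is_letter(r - 1, c) and is_letter(r + 1, c)
            if starts_across or starts_down:
                numbers[(r + 1, c + 1)] = next_number
                next_number += 1
    return numbers


print(number_cells(["CAT", "O#O"]))
```

A cell that starts both an across and a down word gets a single number shared by both clues, so the numbers never need to be stored next to the clue text.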

While the XML outline shows the general structure, it does not describe all of the information necessary for a schema. For this, a schema language for XML is helpful. RELAX NG is a schema language for XML with an XML syntax (.rng) and a compact non-XML syntax (.rnc). The W3C schema language is XML Schema; its schema documents use the .xsd extension.

.rnc - RELAX NG Compact Syntax

default namespace = ""

start =
  element puzzle {
    element crossword {
      attribute language { xsd:language },
      element metadata {
        element title { text },
        element date { xsd:date },
        element creator { text },
        element rights { text }?,
        element publisher { text }?,
        element identifier { xsd:anyURI }?,
        element description { text }?
      }+,
      element american {
        element grid {
          attribute columns { xsd:positiveInteger },
          attribute rows { xsd:positiveInteger },
          (element blank { empty }
           | element letter {
               attribute id { text },
               xsd:string { minLength = "1" maxLength = "1" }
             })+
        },
        element clues {
          element across {
            attribute cellid { text },
            text
          }+,
          element down {
            attribute cellid { text },
            text
          }+
        }
      }
    }
  }
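
Some relationships seem to fall outside what a grammar like the one above expresses, for example that the number of cells equals rows times columns, or that every cellid refers to an existing letter id. The Python sketch below shows one way such checks might be done by hand. It is not a RELAX NG validator, and the function name check_grid and both sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def check_grid(style):
    """Return a list of problems found in a style element (empty if none)."""
    problems = []
    grid = style.find("grid")
    rows, columns = int(grid.get("rows")), int(grid.get("columns"))
    cells = list(grid)
    if len(cells) != rows * columns:
        problems.append(f"expected {rows * columns} cells, found {len(cells)}")
    ids = set()
    for cell in cells:
        if cell.tag != "letter":
            continue
        if len(cell.text or "") != 1:
            problems.append(f"letter {cell.get('id')!r} does not hold exactly one character")
        if cell.get("id") in ids:
            problems.append(f"duplicate letter id {cell.get('id')!r}")
        ids.add(cell.get("id"))
    for clue in style.find("clues"):
        if clue.get("cellid") not in ids:
            problems.append(f"{clue.tag} clue refers to unknown cell {clue.get('cellid')!r}")
    return problems


# Hypothetical documents: the first is consistent, the second is one cell
# short and has a clue that points at a cell that does not exist.
GOOD = """<american>
  <grid rows="1" columns="2">
    <letter id="1,1">O</letter><letter id="1,2">K</letter>
  </grid>
  <clues><across cellid="1,1">Fine</across><down cellid="1,1">Fine</down></clues>
</american>"""
BAD = """<american>
  <grid rows="1" columns="3">
    <letter id="1,1">O</letter><letter id="1,2">K</letter>
  </grid>
  <clues><across cellid="1,1">Fine</across><down cellid="9,9">Fine</down></clues>
</american>"""

print(check_grid(ET.fromstring(GOOD)))
print(check_grid(ET.fromstring(BAD)))
```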

.xsd - XML Schema Document

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="puzzle">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="crossword"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="crossword">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="metadata"/>
        <xs:element ref="american"/>
      </xs:sequence>
      <xs:attribute name="language" use="required" type="xs:NCName"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="metadata">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="title"/>
        <xs:element ref="date"/>
        <xs:element ref="creator"/>
        <xs:element minOccurs="0" ref="rights"/>
        <xs:element minOccurs="0" ref="publisher"/>
        <xs:element minOccurs="0" ref="identifier"/>
        <xs:element minOccurs="0" ref="description"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="title" type="xs:string"/>
  <xs:element name="date" type="xs:string"/>
  <xs:element name="creator" type="xs:string"/>
  <xs:element name="rights" type="xs:string"/>
  <xs:element name="publisher" type="xs:string"/>
  <xs:element name="identifier" type="xs:anyURI"/>
  <xs:element name="description" type="xs:string"/>
  <xs:element name="american">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="grid"/>
        <xs:element ref="clues"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="grid">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <xs:element ref="blank"/>
        <xs:element ref="letter"/>
      </xs:choice>
      <xs:attribute name="columns" use="required" type="xs:integer"/>
      <xs:attribute name="rows" use="required" type="xs:integer"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="blank">
    <xs:complexType/>
  </xs:element>
  <xs:element name="letter">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:NCName">
          <xs:attribute name="id" use="required"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="clues">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" ref="across"/>
        <xs:element maxOccurs="unbounded" ref="down"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="across">
    <xs:complexType mixed="true">
      <xs:attribute name="cellid" use="required"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="down">
    <xs:complexType mixed="true">
      <xs:attribute name="cellid" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

.dtd

<?xml encoding="UTF-8"?>

<!ELEMENT puzzle (crossword)>
<!ATTLIST puzzle
  xmlns CDATA #FIXED ''>

<!ELEMENT crossword (metadata,american)>
<!ATTLIST crossword
  xmlns CDATA #FIXED ''
  language NMTOKEN #REQUIRED>

<!ELEMENT metadata (title,date,creator,rights?,publisher?,identifier?,
                    description?)>
<!ATTLIST metadata
  xmlns CDATA #FIXED ''>

<!ELEMENT american (grid,clues)>
<!ATTLIST american
  xmlns CDATA #FIXED ''>

<!ELEMENT title (#PCDATA)>
<!ATTLIST title
  xmlns CDATA #FIXED ''>

<!ELEMENT date (#PCDATA)>
<!ATTLIST date
  xmlns CDATA #FIXED ''>

<!ELEMENT creator (#PCDATA)>
<!ATTLIST creator
  xmlns CDATA #FIXED ''>

<!ELEMENT rights (#PCDATA)>
<!ATTLIST rights
  xmlns CDATA #FIXED ''>

<!ELEMENT publisher (#PCDATA)>
<!ATTLIST publisher
  xmlns CDATA #FIXED ''>

<!ELEMENT identifier (#PCDATA)>
<!ATTLIST identifier
  xmlns CDATA #FIXED ''>

<!ELEMENT description (#PCDATA)>
<!ATTLIST description
  xmlns CDATA #FIXED ''>

<!ELEMENT grid (blank|letter)+>
<!ATTLIST grid
  xmlns CDATA #FIXED ''
  columns CDATA #REQUIRED
  rows CDATA #REQUIRED>

<!ELEMENT clues (across+,down+)>
<!ATTLIST clues
  xmlns CDATA #FIXED ''>

<!ELEMENT blank EMPTY>
<!ATTLIST blank
  xmlns CDATA #FIXED ''>

<!ELEMENT letter (#PCDATA)>
<!ATTLIST letter
  xmlns CDATA #FIXED ''
  id CDATA #REQUIRED>

<!ELEMENT across (#PCDATA)>
<!ATTLIST across
  xmlns CDATA #FIXED ''
  cellid CDATA #REQUIRED>

<!ELEMENT down (#PCDATA)>
<!ATTLIST down
  xmlns CDATA #FIXED ''
  cellid CDATA #REQUIRED>

references

normative

[RDF]
Resource Description Framework (RDF)
[RDFS]
RDF Schema (RDFS)
[RELAX NG Compact Syntax Tutorial]
RELAX NG Compact Syntax Tutorial, OASIS Working Draft, 26 March 2003
[RNG]
RELAX NG Specification, OASIS Committee Specification, 3 December 2001. Definitive specification for RELAX NG using the XML syntax.
[RNC]
RELAX NG Compact Syntax, OASIS Committee Specification, 21 November 2002. Definitive specification for the compact syntax in terms of the XML syntax.
[Unicode]
The Unicode Consortium. The Unicode Standard, Version 3.2 or later
[XML 1.0]
Extensible Markup Language (XML) 1.0
[XML:LANG]
XML Language Identification, xml:lang tag.
[XSD]
XML Schema Document (.xsd)

informative

[Crossword Puzzles from around the World]
Crossword Puzzles from around the World. Adapted From: 101 Crossword Puzzles For Dummies, Volume 1 by Dummies.com.
[CMO]
Crossword Maestro software has .cmo file format.
[CWML]
CRUCIVERB.COM Crossword Constructors Community Center. Mailing list for a "CrossWord Markup Language"
[Dublin Core]
Dublin Core Metadata Initiative
[PUZ]
Across Lite, from Literate Software Systems, has the .puz format.
[SPECIFICATION]
List of links to CROSSWORD SPECIFICATION SHEETS for various publications. CRUCIVERB.COM Crossword Constructors Community Center
[XML]
W3C Extensible Markup Language (XML)
[XML Schema]
W3C XML Schema
[XWD]
Crossdown has .xwd format.