Tom MacWright

tom@macwright.com

parse-gedcom

parse-gedcom is a simple parser for the GEDCOM genealogy data format. I’ve maintained my family tree in Geni since 2007, and it currently has 168 people in the graph. Having invested a significant amount of time tracking down distant relatives, and now being on the cusp of figuring out what happened before Ellis Island, I wanted to understand this data and be able to do interesting things with it.

As Nelson Minar wrote yesterday, genealogy and the technology around it are fascinating, and often ahead of their time in annotation, representing uncertainty, and finding corresponding entries across datasets.

parse-gedcom is a small, tasteful parser that does the bare minimum to make GEDCOM’s unusual encoding palatable to computers. GEDCOM is a weird tree: a flat list of lines, each prefixed with a level number that gives its depth. parse-gedcom transforms it into a nested JavaScript object that can be serialized to and deserialized from JSON.

/* INPUT

0 @I58346@ INDI
 1 NAME Tom /MacWright/
  2 GIVN Tom
  2 SURN MacWright
 1 SEX M
*/
({ "pointer": "@I58346@",
  "tag": "INDI",
  "tree": [{
    "tag": "NAME", "data": "Tom /MacWright/",
    "tree": [{
      "tag": "GIVN", "data": "Tom"
    }, {
      "tag": "SURN", "data": "MacWright"
    }]
  }, {
    "tag": "SEX", "data": "M"
  }]
})
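To make that transformation concrete, here is a minimal sketch of how lines like the input above can be folded into the nested shape, using a stack indexed by level number: the parent of a level-N line is always the most recent node seen at level N - 1. It assumes every line has the form LEVEL [@POINTER@] TAG [DATA] and ignores GEDCOM details such as CONC/CONT continuation lines and character encodings. The function name parseGedcomSketch is hypothetical; this is an illustration, not parse-gedcom's actual implementation.

```javascript
// Hypothetical sketch: fold GEDCOM lines ("LEVEL [@POINTER@] TAG [DATA]")
// into nested { pointer, tag, data, tree } nodes. Not parse-gedcom's source.
function parseGedcomSketch(text) {
  const root = { tree: [] };
  // stack[i] is the most recent node at level i - 1; stack[0] is the root.
  const stack = [root];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim(); // tolerate indentation like the example above
    if (line === '') continue;   // skip blank lines
    const match = line.match(/^(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$/);
    if (!match) throw new Error(`Malformed GEDCOM line: ${line}`);
    const [, levelText, pointer, tag, data] = match;
    const level = Number(levelText);
    const node = { tag, tree: [] };
    if (pointer) node.pointer = pointer;
    if (data !== undefined) node.data = data;
    // A level-N line belongs to the latest node at level N - 1.
    const parent = stack[level];
    if (!parent) throw new Error(`Level ${level} has no parent: ${line}`);
    parent.tree.push(node);
    stack[level + 1] = node;
    stack.length = level + 2; // deeper levels are now closed
  }
  return root.tree;
}

const sample = `0 @I58346@ INDI
 1 NAME Tom /MacWright/
  2 GIVN Tom
  2 SURN MacWright
 1 SEX M`;
console.log(JSON.stringify(parseGedcomSketch(sample), null, 2));
```

One difference from the output shown above: this sketch gives every node a tree array, even leaves, which keeps traversal code uniform at the cost of some empty arrays in the JSON.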

The package also includes parse-gedcom-d3, a tool that translates parse-gedcom’s output into JSON that fits perfectly into a d3 force-directed graph, which expects separate arrays of nodes and links. d3’s data expectations are a fairly common point of confusion.
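As a rough illustration of the kind of translation involved, here is a sketch that maps parsed records onto d3's nodes-and-links shape: every INDI and FAM record becomes a node, and each HUSB, WIFE, or CHIL entry inside a family becomes a link from the family to that person. The function name toForceGraph and the node and link fields (id, type, name, role) are assumptions for this example, not parse-gedcom-d3's actual output format. It expects records shaped like the parse-gedcom output shown earlier.

```javascript
// Hypothetical sketch: turn parsed GEDCOM records into { nodes, links } for a
// d3 force layout. Field names are assumptions, not parse-gedcom-d3's output.
function toForceGraph(records) {
  const nodes = [];
  const links = [];
  // Data of the first child with the given tag, if any.
  const childData = (record, tag) => {
    const child = (record.tree || []).find((c) => c.tag === tag);
    return child ? child.data : undefined;
  };

  for (const record of records) {
    if (record.tag === 'INDI') {
      nodes.push({ id: record.pointer, type: 'person', name: childData(record, 'NAME') || '' });
    } else if (record.tag === 'FAM') {
      nodes.push({ id: record.pointer, type: 'family' });
      // Spouse and child entries hold pointers to INDI records as their data.
      for (const member of record.tree || []) {
        if (['HUSB', 'WIFE', 'CHIL'].includes(member.tag)) {
          links.push({ source: record.pointer, target: member.data, role: member.tag });
        }
      }
    }
  }
  // Drop links to individuals that are not present in this file.
  const ids = new Set(nodes.map((node) => node.id));
  return { nodes, links: links.filter((link) => ids.has(link.target)) };
}

const graph = toForceGraph([
  { pointer: '@I1@', tag: 'INDI', tree: [{ tag: 'NAME', data: 'John /Doe/' }] },
  { pointer: '@I2@', tag: 'INDI', tree: [{ tag: 'NAME', data: 'Jane /Doe/' }] },
  { pointer: '@F1@', tag: 'FAM', tree: [
    { tag: 'HUSB', data: '@I1@' },
    { tag: 'CHIL', data: '@I2@' },
    { tag: 'CHIL', data: '@I9@' }, // no such individual: link is dropped
  ] },
]);
console.log(graph.nodes.length, graph.links.length); // prints "3 2"
```

Because these links refer to nodes by GEDCOM pointer strings rather than array indices, a d3 v4+ simulation would need an id accessor, for example d3.forceLink(graph.links).id((d) => d.id). Treating families as their own nodes is one reasonable choice: each child gets a single edge to its family instead of one edge per parent.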

Finally, there’s a web interface where you can drag & drop a GEDCOM file and get a live d3 chart in an instant.