A GENTECH Primer

1. Where to get it

You can get the GENTECH Genealogical Data Model specifications at www.gentech.org/gdm. You will want both the description document and the model and process diagrams. By following the sections below, you will gain a good working knowledge of the most important parts of the data model.

2. Personas: a slice of a timeline

In the data model, a Persona comes in two flavors:

a "base" Persona, which is a person based strictly on one source document, and
a "composite" Persona, which is a person based on other, "lower-level" base or composite Personas.
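To make the two flavors concrete, here is a minimal Python sketch. The class names `BasePersona` and `CompositePersona` and the helper `sources_of` are hypothetical, not part of the GENTECH specification. The point is that a composite Persona only points at its lower-level Personas, so every underlying document stays reachable.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BasePersona:
    """A person as described by exactly one source document (hypothetical class)."""
    working_name: str
    source: str  # the single document this Persona comes from


@dataclass
class CompositePersona:
    """A person built from lower-level base or composite Personas (hypothetical class)."""
    working_name: str
    parts: list[BasePersona | CompositePersona] = field(default_factory=list)


def sources_of(persona: BasePersona | CompositePersona) -> list[str]:
    """Collect every source document underlying a Persona, depth first."""
    if isinstance(persona, BasePersona):
        return [persona.source]
    found: list[str] = []
    for part in persona.parts:
        found.extend(sources_of(part))
    return found


brown_a = BasePersona("John Brown", "document stating his age as 43")
brown_b = BasePersona("John Brown", "document placing him in New York in 1903")
john = CompositePersona("John Brown", [brown_a, brown_b])
print(sources_of(john))
```

If the two John Browns later turn out to be different people, only the composite has to be discarded; the two base Personas are untouched.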

For example, suppose you came across one document stating that John Brown is 43 years old, and another stating that John Brown lived in New York in 1903. If you created a single Persona, "John Brown", who was 43 and lived in New York in 1903, then you have assumed that the John Browns in the two documents are one and the same. Using a single Persona this way has many disadvantages:

If you later find out that those John Browns are different people, you will have a difficult time picking them apart, since you made only one Persona.
A researcher looking at your work may not spot the assumption, and base their work on possibly erroneous reasoning.
A researcher looking at your work may spot the assumption, and wonder what other hidden assumptions you made.
You will forget why you made that assumption.

The rule is:

one document = one Persona.

A Persona is given a name which you can consider a "working" name. It is not the "official" name of the person. Should the official name be the name a person was born with? The name a person called themselves? The name on their death certificate? The answer is, anything you want. A person has a name, but the name is not the person. Names change over time, just as age does. Just as you wouldn't uniquely identify a person by age, you wouldn't uniquely identify a person by name. So if one document gives the name as "John Braun" and another gives it as "John Brown", you will have two Personas with different names, which you might later conclude are one and the same person. A researcher searching through databases may know only one of the two names; because both names are recorded, they will find your data whichever name they know.

3. Assertions: the basic unit of genealogy

Assertions are claimed facts: statements that a source, or a researcher, puts forward as true. The types of facts that can be recorded in the data model are summarized in this table:

Column A	Column B
A Persona...	...is a Persona
An Event...	...played a role in an Event
A Characteristic...	...had a Characteristic
A Group...	...is part of a Group

Choose one item from Column A and one from Column B. Certain combinations are not allowed. We will go through all 16 possible combinations.

3.1.1. A Persona is a Persona (disallowed)

This is not allowed. If you want to state that Persona A is the same as Persona B, then you must create a new Persona C which combines the two (see "A Group is a Persona" and "A Persona is part of a Group").

3.1.2. A Persona played a role in an Event

Here you are stating that something happened, and the Persona played a part in it. For example, an event could be a birth. The roles are things like father, mother, child, midwife, etc. When you say "John Smith was born to Abe Smith and Betty Tipper", you are stating three facts:

John Smith was the child in the event "birth of John Smith",
Abe Smith was the father in the event "birth of John Smith", and
Betty Tipper was the mother in the event "birth of John Smith".

These assertions could come from different documents, which is why the assertions in that seemingly simple statement need to be kept separate.
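A sketch of how the three assertions might be held as separate records, each with its own source. The `EventRoleAssertion` layout and the source names are hypothetical, chosen only to show that the one sentence splits into three independently sourced facts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRoleAssertion:
    """'A Persona played a role in an Event' (hypothetical record layout)."""
    persona: str
    role: str
    event: str
    source: str  # every Assertion keeps its own source


BIRTH = "birth of John Smith"
assertions = [
    EventRoleAssertion("John Smith", "child", BIRTH, "hypothetical baptismal register"),
    EventRoleAssertion("Abe Smith", "father", BIRTH, "hypothetical family Bible"),
    EventRoleAssertion("Betty Tipper", "mother", BIRTH, "hypothetical baptismal register"),
]

# List who took part in the event, and on whose word
for a in assertions:
    if a.event == BIRTH:
        print(f"{a.persona} was the {a.role} (source: {a.source})")
```

If the family Bible later proves unreliable, only the father Assertion is affected.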

3.1.3. A Persona had a Characteristic

Here you are asserting a characteristic of a Persona. Age, race, religion, hair color, height, and occupation are all examples of characteristics.

3.1.4. A Persona is part of a Group

Groups are logical aggregations of things of the same type. Personas can only be a part of a Group composed of other Personas. Persona Groups are used primarily to create higher-level Personas from lower-level ones.

However, one could also use Persona Groups to represent actual groupings of people. For example, suppose you wanted to represent the assertion that "John Smith was in Platoon X". You could:

create a group called "People in Platoon X" and assert that the Persona "John Smith" was part of that group, or
consider membership in Platoon X as a Characteristic and assert that the Persona "John Smith" had that characteristic.

If you really care who else was in that platoon, then you would create a Group. Otherwise it is just as reasonable to create a Characteristic.

3.2.1. An Event is a Persona (disallowed)

This doesn't make any sense.

3.2.2. An Event played a role in an Event

At first this may not seem to make sense, but you can read the statement as "an Event happened relative to an Event". Thus, one Event could happen before another Event, and you would represent this with an Event-Event Assertion. Whether you say Event A happened before Event B or Event B happened after Event A is irrelevant, since both mean the same thing, and the program should be expected to understand the equivalence.
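One way a program could recognize the equivalence is to rewrite every "after" as a "before" by swapping the two Events. This Python sketch assumes a hypothetical `EventEventAssertion` record supporting only the two relations "before" and "after".

```python
from typing import NamedTuple


class EventEventAssertion(NamedTuple):
    """'first happened <relation> second' (hypothetical layout)."""
    first: str
    relation: str  # "before" or "after"
    second: str


def normalize(a: EventEventAssertion) -> EventEventAssertion:
    """Rewrite 'B after A' as 'A before B' so equivalent statements compare equal."""
    if a.relation == "after":
        return EventEventAssertion(a.second, "before", a.first)
    if a.relation == "before":
        return a
    raise ValueError(f"unknown relation: {a.relation!r}")


x = EventEventAssertion("birth of Adam Smith", "before", "birth of John Smith")
y = EventEventAssertion("birth of John Smith", "after", "birth of Adam Smith")
print(normalize(x) == normalize(y))  # True
```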

3.2.3. An Event had a Characteristic (disallowed)

While on the surface this may seem to make sense, the only Characteristics an Event might have, namely its date and place, are stored within the Event itself.

3.2.4. An Event is part of a Group

Just as Personas can optionally be Grouped for membership, so too can Events. For example, a Group could be "things that happened to John Smith". This is not recommended: which Persona is "John Smith"? The same information can be obtained by starting with a Persona and then finding all the Events tied to that Persona and its lower-level Personas.
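Finding "all the Events tied to a Persona and its lower-level Personas" amounts to a walk down the composition tree. In this sketch the `components` mapping and the `(persona, role, event)` triples are hypothetical stand-ins for Group and Persona-Event Assertions.

```python
from __future__ import annotations

# Hypothetical data: composite Persona -> the lower-level Personas it was built from
components: dict[str, list[str]] = {
    "John Smith": ["base A", "base B"],
}
# Hypothetical Persona-Event Assertions as (persona, role, event) triples
event_roles: list[tuple[str, str, str]] = [
    ("base A", "child", "birth, 1887"),
    ("base B", "head of household", "census, 1910"),
    ("Adam Smith", "child", "birth, 1886"),
]


def events_for(persona: str) -> list[str]:
    """Events tied to a Persona or to any of its lower-level Personas."""
    found: set[str] = set()
    seen: set[str] = set()
    stack = [persona]
    while stack:
        current = stack.pop()
        if current in seen:  # guard against a Persona reachable by two paths
            continue
        seen.add(current)
        found.update(event for p, _role, event in event_roles if p == current)
        stack.extend(components.get(current, []))
    return sorted(found)


print(events_for("John Smith"))  # ['birth, 1887', 'census, 1910']
```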

However, this kind of assertion can represent Events that didn't happen to Personas. For example, a Group could be "things that happened during World War I". This could serve as a useful reference tool.

3.3.1. A Characteristic is a Persona (disallowed)

This doesn't make any sense.

3.3.2. A Characteristic played a role in an Event (disallowed)

This also doesn't make any sense.

3.3.3. A Characteristic had a Characteristic (disallowed)

This doesn't make sense, even in the sense of Event-Event assertions, since Characteristics are not relative to each other.

3.3.4. A Characteristic is part of a Group (not recommended)

This use is not recommended for Groups of Characteristics about Personas, since the same information can be obtained by finding all the Characteristics tied to that Persona and its lower-level Personas.

3.4.1. A Group is a Persona

This is the second part of composing a higher-level Persona out of lower-level Personas. Once a Group of lower-level Personas is created, the Group can be asserted to be equivalent to a new Persona. This only applies to Groups of Personas.

3.4.2. A Group played a role in an Event (disallowed)

This doesn't make any sense.

3.4.3. A Group had a Characteristic (disallowed)

This also doesn't make any sense.

3.4.4. A Group is part of a Group

This can be used when sub-groups are grouped together.

3.5. All the combinations

Of the 16 combinations, fully half are disallowed. Actually, the GENTECH organization merely proposes their prohibition, so they might come up with a use for a disallowed combination.
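The table of combinations can be encoded as a lookup of allowed (Column A, Column B) kinds. This Python sketch mirrors the prohibitions described in this primer; since GENTECH has only proposed them, treat `ALLOWED` as a hypothetical, editable policy rather than a fixed rule.

```python
from itertools import product

KINDS = ("Persona", "Event", "Characteristic", "Group")

# (Column A, Column B) pairs this primer treats as allowed; the other 8 are disallowed
ALLOWED = {
    ("Persona", "Event"),           # played a role in an Event
    ("Persona", "Characteristic"),  # had a Characteristic
    ("Persona", "Group"),           # is part of a Group
    ("Event", "Event"),             # happened relative to an Event
    ("Event", "Group"),
    ("Characteristic", "Group"),    # allowed, but not recommended for Personas' Characteristics
    ("Group", "Persona"),           # only for Groups of Personas
    ("Group", "Group"),
}


def is_allowed(subject: str, obj: str) -> bool:
    """True if an Assertion linking these two kinds is permitted."""
    return (subject, obj) in ALLOWED


disallowed = [pair for pair in product(KINDS, repeat=2) if not is_allowed(*pair)]
print(len(ALLOWED), len(disallowed))  # 8 8
```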

3.6. Examples

These diagrams show how statements in documents are converted to Assertions in the data model.

3.6.1 John Smith was born in New York in 1887.

Diagram 3.6.1: An Assertion exemplifying 3.1.2.

The objects ET1, ROLE1, and PT1 are "standard" objects which can be reused. All other objects are created at the time of the assertion. Note that the program should be savvy enough to understand that a date of "1887" means any time in 1887. It would probably be useful to have some kind of starting library of Places, and perhaps a repository of Places accessible on the Internet (such as JewishGen's "Shtetl Seeker").
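A sketch of how a program might turn an imprecise date into the earliest and latest day it could mean. The `date_range` helper and its `YYYY`, `YYYY-MM`, and `YYYY-MM-DD` input formats are assumptions for illustration.

```python
from __future__ import annotations

import calendar
from datetime import date


def date_range(text: str) -> tuple[date, date]:
    """Earliest and latest day meant by 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' (assumed formats)."""
    parts = [int(p) for p in text.split("-")]
    if len(parts) == 1:  # a bare year means any day of that year
        return date(parts[0], 1, 1), date(parts[0], 12, 31)
    if len(parts) == 2:  # a month means any day of that month
        year, month = parts
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    year, month, day = parts  # a full date is exact
    return date(year, month, day), date(year, month, day)


print(date_range("1887"))  # (datetime.date(1887, 1, 1), datetime.date(1887, 12, 31))
```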

3.6.2. Adam Smith was born 3 months before John Smith.

Diagram 3.6.2: An Assertion exemplifying 3.2.2.

The key feature of this diagram is the assertion that one event took place three months prior to another event. The program should know that the relative time could be anywhere from 3 months 0 days to 3 months 31 days.

3.6.3. The John Smith from 3.6.1 is the same as the John Smith from 3.6.2.

Diagram 3.6.3: An Assertion exemplifying 3.4.1.

The p3.6.1 and p3.6.2 Persona objects are the same objects as in diagrams 3.6.1 and 3.6.2.

4. Evidence

As stated in section 2, one document = one Persona. The GENTECH data model provides for the representation of documents or, more generally, sources of information.

To show where a given Assertion came from, we can connect it to a Source. Sources also have hierarchies. For example, a book is composed of pages, and pages are composed of lines. If it is important to cite different pages in a book, or different lines in a page, then there can be a book-level Source describing the publication information of the book, page-level Sources referring to individual page numbers (or ranges of page numbers) within the book, and line-level Sources referring to individual lines (or ranges of lines) within a page.
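A book/page/line hierarchy can be modeled as Sources that point to a parent Source. In this hypothetical sketch, a citation is built by walking up to the top-level Source; the book title and author are borrowed from the example in 4.1, and the line number is invented.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    """A Source, optionally refining a higher-level Source (hypothetical layout)."""
    label: str
    parent: Optional[Source] = None

    def citation(self) -> str:
        """Join labels from the top-level Source down to this one."""
        labels = []
        node: Optional[Source] = self
        while node is not None:
            labels.append(node.label)
            node = node.parent
        return ", ".join(reversed(labels))


book = Source('"Smiths in America", Hokey Pokey')
page = Source("page 16", parent=book)
line = Source("line 4", parent=page)  # hypothetical line number
print(line.citation())  # "Smiths in America", Hokey Pokey, page 16, line 4
```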

If it isn't important to cite different pages or lines in a book, as when you will probably only ever use one page or one line from it, then there need be no hierarchy for that book. A single Source representing the publication information, page range, and line range suffices.

Sources are not necessarily books. They could be personal interviews, images, or any source of information.

It is important not only to cite a source of information properly, but also to record where that source of information exists, so that other researchers can follow your path. Thus, Sources exist in Repositories. A Repository could be a library, or someone's personal possessions.

4.1. Example

An example should illustrate the above more clearly. Suppose the assertion from example 3.6.1 comes from a book called "Smiths in America", by Hokey Pokey, published by Jenny Holojii Co., ISBN 0880090011, page 16. The book is owned by the researcher.

Diagram 4.1: A Source and some of its "decorations"

I haven't included all of the CitationParts in this diagram, otherwise it would get too cluttered. The other parts would be things like publisher, year of publication, ISBN number, author, and so on.

The Source is owned by John Q. Researcher, as shown by the RepositorySource link. In the case of an Internet-accessible repository (which should normally be backed up by a non-Internet repository, given the transient nature of content on the Internet) the Repository's address would be the top-level URL, and the RepositorySource's callNumber would be the URL relative to the top-level URL.
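Resolving the callNumber against the Repository's address is a standard relative-URL join. This sketch uses Python's `urllib.parse.urljoin`; the function name `source_url` and the example URL are hypothetical.

```python
from urllib.parse import urljoin


def source_url(repository_address: str, call_number: str) -> str:
    """Full URL of a Source: the Repository's top-level URL plus the relative callNumber."""
    # urljoin drops the last path segment of a base that lacks a trailing "/"
    base = repository_address if repository_address.endswith("/") else repository_address + "/"
    return urljoin(base, call_number.lstrip("/"))


print(source_url("http://www.example.org/archive", "census/1900/page16.html"))
# http://www.example.org/archive/census/1900/page16.html
```

The trailing-slash check matters: without it, `urljoin` would replace "archive" instead of appending to it.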

The Representation is where you can attach files, use extracts, or refer to physical files in your desk drawer (does anyone really still use physical files? JOKE!).

5. Reasoning

Assertions have a field called "rationale", in which you can put your reasons for making the Assertion. Generally speaking, the lowest-level Assertions don't need a rationale, since they are simply representations of information from a Source.

However, when you combine Assertions into a new Assertion, or an Assertion doesn't come from a Source, you should provide your reasoning.

5.1 Assertions without Sources

For example, the Assertions in 3.6.3 are something the researcher asserted. The Assertion adding p3.6.1 to the Group probably wouldn't need a rationale, since it could be the first Persona added to the Group. However, the Assertion adding p3.6.2 could use an explanation of the researcher's reasoning for thinking that p3.6.2 is the same as p3.6.1. The Assertion equating the Group to the new Persona probably wouldn't need a rationale, since it is an "administrative" function. The reason we would want to put rationales in the Persona-to-Group Assertions and not in the Group-to-Persona Assertion is that any one of the Persona-to-Group Assertions could be proven wrong in the future without affecting the rest of the Persona-to-Group Assertions.
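The placement of rationales described above can be sketched as follows. The `Assertion` layout, the Group id `G1`, and the rationale text are hypothetical; the sketch shows that disproving one membership leaves the other membership and the Group-to-Persona Assertion untouched.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Assertion:
    """Hypothetical layout: subject, relation, object, rationale, disproved flag."""
    subject: str
    relation: str
    obj: str
    rationale: str = ""
    disproved: bool = False


def compose(group: str, persona: str, members: dict[str, str]) -> list[Assertion]:
    """Persona-to-Group Assertions (each with its rationale), then one Group-to-Persona."""
    steps = [Assertion(p, "is part of", group, why) for p, why in members.items()]
    steps.append(Assertion(group, "is", persona))  # administrative: no rationale
    return steps


def current_members(steps: list[Assertion], group: str) -> list[str]:
    """Personas whose membership in the Group has not been disproved."""
    return [a.subject for a in steps
            if a.relation == "is part of" and a.obj == group and not a.disproved]


steps = compose("G1", "John Smith", {
    "p3.6.1": "",  # first member: nothing to justify yet
    "p3.6.2": "hypothetical: same name, same family",
})
steps[1].disproved = True  # suppose the p3.6.2 membership is later shown wrong
print(current_members(steps, "G1"))  # ['p3.6.1']
```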

5.2 Conclusional Assertions

By building an Assertion upon other Assertions, we can draw conclusions. For example, if one Assertion says Adam Smith was born 3 months before John Smith, and another Assertion says John Smith was born in 1887, we can conclude (via a new Assertion) that Adam Smith was born between 1 SEP 1886 and 30 SEP 1887. The rationale for this would be something like, "date range calculation".
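The "date range calculation" can be sketched in Python. Reading "3 months" as anywhere from 3 to 4 whole months is an assumption (it is the reading that reproduces the range above), and `minus_months` and `born_before` are hypothetical helpers.

```python
from __future__ import annotations

import calendar
from datetime import date


def minus_months(d: date, months: int) -> date:
    """Subtract whole months, clamping the day to the end of a shorter month."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(d.day, last_day))


def born_before(other: tuple[date, date], min_months: int, max_months: int) -> tuple[date, date]:
    """Range for a birth between min_months and max_months before another birth's range."""
    earliest, latest = other
    return minus_months(earliest, max_months), minus_months(latest, min_months)


john = (date(1887, 1, 1), date(1887, 12, 31))  # "John Smith was born in 1887"
adam = born_before(john, 3, 4)  # assumption: "3 months" means 3 to 4 months
print(adam)  # (datetime.date(1886, 9, 1), datetime.date(1887, 9, 30))
```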

5.3 Contradictory Assertions

Suppose that, after concluding that Adam Smith was born between 1 SEP 1886 and 30 SEP 1887, we find a birth certificate for Adam Smith which states that he was born on 20 AUG 1886. That is close to our range, but still outside it. Although our conclusion was correct given the evidence, if we trust the birth certificate then at least one of the supporting pieces of evidence must be incorrect: either Adam Smith was not born 3 months before John Smith, or John Smith was not born in 1887, or both. Nevertheless, it is not a good idea to erase incorrect evidence, because it may still be useful, and keeping it will prevent future researchers from going down the wrong path.

Assertions have a "disproved" field, which signals whether an Assertion has been shown to be incorrect. We would first mark our conclusion as disproved. This is a signal that one or more of its supporting Assertions are incorrect, but we don't know yet which ones. Also, we would have to add the birth certificate Assertion to the list of lower-level Assertions maintained by the conclusional Assertion, and add to the rationale that we really, really believe the birth certificate.
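A minimal sketch of that bookkeeping: nothing is deleted, the conclusion is flagged, and the new evidence and reasoning are appended. The `Conclusion` class and its `confront` method are hypothetical; only the disproved flag, the rationale, and the list of lower-level Assertions come from the model as described here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class Conclusion:
    """A conclusional Assertion about a date range (hypothetical layout)."""
    earliest: date
    latest: date
    supporting: list[str] = field(default_factory=list)  # lower-level Assertions
    rationale: str = "date range calculation"
    disproved: bool = False

    def confront(self, evidence: str, observed: date, why: str) -> None:
        """Record contradicting evidence; existing Assertions are kept, never erased."""
        if not (self.earliest <= observed <= self.latest):
            self.disproved = True
            self.supporting.append(evidence)
            self.rationale += "; " + why


adam_birth = Conclusion(date(1886, 9, 1), date(1887, 9, 30),
                        ["Adam born 3 months before John", "John born in 1887"])
adam_birth.confront("birth certificate of Adam Smith", date(1886, 8, 20),
                    "we really, really believe the birth certificate")
print(adam_birth.disproved)  # True
```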