The EPUB version of the CDC STD has been used to pull in most of the content for Project Shirly. However, the treatment/regimen tables are all images, so their text cannot be extracted and placed into HTML tables. This information will have to be gathered manually from the PDF and placed into a spreadsheet for import into HTML tables. This document outlines how the table information will be extracted and saved in a file for transformation into HTML.

The table data will be placed into one or more files using the following comma-separated values (CSV) format:

 

condition,Chlamydia

patient,"Adolescents and Adults"

pdf-page,45

pdf-column,2

table-in-column,2

header,"Alternative Regimens"

subheader,

regimen,"Erythromycin base 500 mg orally four times a day for 7 days"

separator,OR

regimen,"Erythromycin ethylsuccinate 800 mg orally four times a day for 7 days"

separator,OR

regimen,"Levofloxacin 500 mg orally once daily for 7 days"

separator,OR

regimen,"Ofloxacin 300 mg orally twice a day for 7 days"

footer,
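Because each line is a key followed by a value, an entry in this format can be read with Python's standard csv module. The sketch below is illustrative only: the helper name parse_entry and the choice to keep regimen and separator rows in an ordered "body" list are assumptions, not part of an existing Project Shirly tool.

```python
import csv
import io

# Keys that describe where a table came from and its titles.
# Assumption: each appears at most once per entry (a repeat overwrites).
METADATA_KEYS = {
    "condition", "patient", "pdf-page", "pdf-column",
    "table-in-column", "header", "subheader", "footer",
}
# Keys that make up the ordered body of the table.
BODY_KEYS = {"regimen", "separator"}


def parse_entry(text):
    """Parse the CSV rows of one table entry (hypothetical helper).

    Metadata values are stored under their own key; regimen and separator
    rows are kept in order as (key, value) tuples under "body".
    """
    entry = {"body": []}
    for row in csv.reader(io.StringIO(text)):
        if not any(field.strip() for field in row):
            continue  # skip blank or whitespace-only lines between rows
        key = row[0].strip()
        value = row[1].strip() if len(row) > 1 else ""
        if key in BODY_KEYS:
            entry["body"].append((key, value))
        elif key in METADATA_KEYS:
            entry[key] = value
        else:
            raise ValueError(f"unknown key: {key!r}")
    return entry


sample = """condition,Chlamydia

patient,"Adolescents and Adults"

pdf-page,45

header,"Alternative Regimens"

subheader,

regimen,"Levofloxacin 500 mg orally once daily for 7 days"

separator,OR

regimen,"Ofloxacin 300 mg orally twice a day for 7 days"

footer,
"""

entry = parse_entry(sample)
print(entry["header"])     # Alternative Regimens
print(len(entry["body"]))  # 3
```

Values are kept as strings (pdf-page is "45", not 45); converting them is left to whatever step consumes the parsed entry.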

 

Multiple entries can be placed in one file but should be separated by a couple of blank lines for readability. Any value longer than one word should be enclosed in double quotes. A blank table template follows.

 

condition,

patient,

pdf-page,

pdf-column,

table-in-column,

header,

regimen,

separator,

regimen,

separator,

regimen,

separator,

regimen,

footer,
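When a file holds several entries, the rows need to be grouped back into separate tables, and the quoting rule matters: a value containing a comma splits into extra fields unless it is quoted. A minimal sketch, assuming every entry starts with a condition row (as both the example and the template do); the helper name split_entries is hypothetical.

```python
import csv
import io


def split_entries(text):
    """Group the rows of a multi-entry file into one list per table.

    Assumption: every entry begins with a "condition" row, so a new
    "condition" row starts a new entry and the amount of blank space
    between entries does not matter. Rows become (key, value) tuples.
    """
    entries = []
    for row in csv.reader(io.StringIO(text)):
        if not any(field.strip() for field in row):
            continue  # skip blank or whitespace-only lines
        if len(row) > 2:
            # An unquoted comma inside a value produced extra fields.
            raise ValueError(f"value should be quoted: {row!r}")
        key = row[0].strip()
        value = row[1].strip() if len(row) > 1 else ""
        if key == "condition" or not entries:
            entries.append([])  # start a new table entry
        entries[-1].append((key, value))
    return entries


# A filled-in entry followed by a blank template entry.
two_entries = """condition,Chlamydia

header,"Alternative Regimens"

regimen,"Levofloxacin 500 mg orally once daily for 7 days"



condition,

header,

regimen,
"""

entries = split_entries(two_entries)
print(len(entries))  # 2
```

Keying on the condition row rather than on blank lines keeps the split working even when individual rows are themselves separated by blank lines.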

 

LINK TO FILE
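Once an entry's rows have been read as (key, value) pairs, they can be turned into an HTML table. The sketch below assumes a layout that this page does not specify: header and subheader go in thead, each regimen and separator becomes a body row, a non-empty footer goes in tfoot, and the location keys (condition, patient, pdf-*) are not rendered. The function name render_table and the "separator" CSS class are hypothetical.

```python
import html


def render_table(rows):
    """Render (key, value) rows of one entry as an HTML table string.

    Hypothetical layout; empty values (as in an unfilled template) are
    skipped. Text is HTML-escaped so characters like "&" are safe.
    """
    head, body, foot = [], [], []
    for key, value in rows:
        if not value:
            continue  # nothing to render for an empty field
        text = html.escape(value)
        if key in ("header", "subheader"):
            head.append(f"<tr><th>{text}</th></tr>")
        elif key == "regimen":
            body.append(f"<tr><td>{text}</td></tr>")
        elif key == "separator":
            body.append(f'<tr><td class="separator">{text}</td></tr>')
        elif key == "footer":
            foot.append(f"<tr><td>{text}</td></tr>")
    parts = ["<table>"]
    if head:
        parts.append("<thead>" + "".join(head) + "</thead>")
    parts.append("<tbody>" + "".join(body) + "</tbody>")
    if foot:
        parts.append("<tfoot>" + "".join(foot) + "</tfoot>")
    parts.append("</table>")
    return "\n".join(parts)


rows = [
    ("condition", "Chlamydia"),
    ("header", "Alternative Regimens"),
    ("subheader", ""),
    ("regimen", "Levofloxacin 500 mg orally once daily for 7 days"),
    ("separator", "OR"),
    ("regimen", "Ofloxacin 300 mg orally twice a day for 7 days"),
    ("footer", ""),
]
print(render_table(rows))
```

Rendering the separator as its own row keeps the "OR" between regimens visible, matching how it is recorded in the file.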
