I have the following sample SGML data from my .sgm file, and I want to convert it into XML:
<?dtd name="viewed">
<?XMLDOC>
<viewed >xyz
<cite>
<yr>2010
<pno cite="2010 abc 1188">10
<?/XMLDOC>
<?XMLDOC>
<viewed>abc.
<cite>
<yr>2010
<pno cite="2010 xyz 5133">9
<?/XMLDOC>
The output should look like this:
<index1>
<num viewed="xyz"/>
<heading>xyz</heading>
<index-refs>
<link caseno="2010 abc 1188"/>
</index-refs>
</index1>
<index1>
<num viewed="abc"/>
<heading>abc</heading>
<index-refs>
<link caseno="2010 xyz 5133"/>
</index-refs>
</index1>
Can this be done in C#, or can XSLT 2.0 be used for this kind of conversion?
Others have already given some good advice. Here's one way to put it all together: first convert the input SGML to well-formed XML, then use XSLT to transform that into the exact format you need.
Converting your SGML to well-formed XML
The osx tool from the OpenSP package, suggested by mzjn, is a good tool for this. Since your SGML markup omits end tags, you need a DTD from which the correct nesting of elements can be determined; if you don't have one, you need to create it. In each element declaration, the two characters after the element name say whether the start tag and the end tag may be omitted (o) or are required (-). For your example input, the DTD could be as simple as this:
<!ELEMENT toplevel o o (viewed)+>
<!ELEMENT viewed - o (#PCDATA,cite)>
<!ELEMENT cite - o (yr,pno)>
<!ELEMENT yr - o (#PCDATA)>
<!ELEMENT pno - o (#PCDATA)>
<!ATTLIST pno cite CDATA #REQUIRED>
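To illustrate how declarations like these let a parser put back the omitted end tags, here is a toy Python sketch. It is not a replacement for osx or a real SGML parser: the CHILDREN table and the infer_end_tags function are hypothetical names that hard-code which children each element may contain (mirroring the DTD above), and whenever a start tag arrives, the function closes open elements until one of them may contain it. It assumes lower-case tag names, ignores sequence order and occurrence indicators, drops processing instructions and declarations, and does not handle entity references or text containing markup characters.

```python
import re

# Allowed child elements per element, mirroring the content models in the DTD
# above (#PCDATA is handled separately). Toy stand-in for what osx derives from the DTD.
CHILDREN = {
    "toplevel": {"viewed"},
    "viewed": {"cite"},
    "cite": {"yr", "pno"},
    "yr": set(),
    "pno": set(),
}

# <?...> PIs and <!...> declarations | start tags | character data
TOKEN = re.compile(r"<[?!][^>]*>|<(\w+)([^>]*)>|([^<]+)")


def infer_end_tags(sgml: str) -> str:
    """Return XML with the end tags that the DTD makes omissible filled in."""
    out = ["<toplevel>"]  # the toplevel start tag is omissible too ("o o")
    stack = ["toplevel"]
    for m in TOKEN.finditer(sgml):
        name, attrs, text = m.group(1), m.group(2), m.group(3)
        if name:
            if name not in CHILDREN:
                raise ValueError(f"element <{name}> is not declared")
            # Implied end tags: close open elements until one may contain `name`.
            while stack and name not in CHILDREN[stack[-1]]:
                out.append(f"</{stack.pop()}>")
            if not stack:
                raise ValueError(f"<{name}> is not allowed here")
            out.append(f"<{name}{attrs.rstrip()}>")
            stack.append(name)
        elif text is not None and text.strip():
            # Keep character data, trimming the surrounding line breaks.
            out.append(text.strip())
        # Anything else is a <?...> PI or <!...> declaration and is skipped.
    # End of input closes everything that is still open.
    while stack:
        out.append(f"</{stack.pop()}>")
    return "".join(out)
```

Called on the sample from the question, it returns a single toplevel element in which the end tags for yr, pno, cite and viewed have been filled in.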
You also need to add a proper doctype declaration to the beginning of your SGML file. Assuming your DTD is in the file viewed.dtd:
<!DOCTYPE toplevel SYSTEM "viewed.dtd" >
With this addition, you should now be able to use osx to convert the SGML to XML. (It won't be able to convert the processing instructions that start with a /, as those are not allowed in XML, and it will emit a warning about them.)
osx input.sgm > input.xml
Transforming the resulting XML to your desired format
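For the above case, you could use something like the following XSLT stylesheet. Note that it matches upper-case names (VIEWED, CITE, PNO/@CITE): under the default SGML declaration, element and attribute names are case-insensitive and folded to upper case, which is how osx writes them. The stylesheet is XSLT 1.0, so an XSLT 2.0 processor will run it as well.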
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="VIEWED">
    <index1>
      <num viewed="{normalize-space(text())}"/>
      <heading>
        <xsl:value-of select="normalize-space(text())"/>
      </heading>
      <index-refs>
        <xsl:apply-templates select="CITE"/>
      </index-refs>
    </index1>
  </xsl:template>
  <xsl:template match="CITE">
    <link caseno="{PNO/@CITE}"/>
  </xsl:template>
</xsl:stylesheet>
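Since the question also asks about doing this in code, here is a rough sketch of the stylesheet's logic using only Python's standard xml.etree.ElementTree (Python stands in for C# here; the same tree walk maps onto any DOM-style XML API). It assumes the input is XML with upper-case element and attribute names (TOPLEVEL, VIEWED, CITE, PNO and a CITE attribute), matching what the stylesheet expects; to_index is a hypothetical helper name, and the heading only approximates normalize-space(text()) by using the text before the first child element.

```python
import xml.etree.ElementTree as ET


def to_index(xml_text: str) -> list:
    """Build one <index1> string per VIEWED element, mirroring the XSLT above."""
    root = ET.fromstring(xml_text)
    results = []
    for viewed in root.iter("VIEWED"):
        # Approximates normalize-space(text()): leading text, whitespace collapsed.
        heading = " ".join((viewed.text or "").split())
        index1 = ET.Element("index1")
        ET.SubElement(index1, "num", viewed=heading)
        ET.SubElement(index1, "heading").text = heading
        refs = ET.SubElement(index1, "index-refs")
        # Like <xsl:apply-templates select="CITE"/>: one <link> per CITE child,
        # using the CITE attribute of its first PNO, as {PNO/@CITE} does.
        for cite in viewed.findall("CITE"):
            pno = cite.find("PNO")
            caseno = pno.get("CITE", "") if pno is not None else ""
            ET.SubElement(refs, "link", caseno=caseno)
        results.append(ET.tostring(index1, encoding="unicode"))
    return results
```

Like the stylesheet, this yields a sequence of index1 elements rather than a single-rooted document, so wrap them in a container element if you need to save the result as well-formed XML.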