XML Parser


Tolnaftate2004
08-23-2006, 12:00 PM
parseXML()

Description
Here is a little code that can transform an XML file into something that GScript can understand. The format is similar to that of ActionScript's XML Object.
XML.childNodes is an array of entities within the XML object.
XML.parent is the parent entity of the XML object.
XML.nodeName is the name of the entity (tag) within the XML object.
XML.nodeType is an integer representing the type of the XML object (1 for plain text, 2 for XML entity).
XML.nodeValue is the XML / plain text nested within the XML object.
XML.attributes is a string in the pattern of var1=val1&var2=val2&...&varN=valN. Individual values can also be read with XML.attributes.var.
XML.xmlDecl is the XML declaration of the XML object.
XML.docTypeDecl is the DOCTYPE declaration (DTD) of the XML object.

Except for nodeType, none of these values are read-only.
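To make the layout above concrete, here is a rough Python sketch of the same node structure (Python rather than GScript, purely as an illustration). The class XMLNode, its snake_case fields and the two helper methods are hypothetical stand-ins for childNodes, parent, nodeName, nodeType, nodeValue and attributes; xmlDecl and docTypeDecl are left out.

```python
from dataclasses import dataclass, field
from typing import Optional

TEXT_NODE = 1     # "1 for plain text"
ELEMENT_NODE = 2  # "2 for XML entity"


@dataclass
class XMLNode:
    """Hypothetical Python mirror of one parsed node (not the GScript API)."""
    node_type: int                                    # nodeType
    node_name: str = ""                               # nodeName (empty for text)
    node_value: str = ""                              # nodeValue
    attributes: dict = field(default_factory=dict)   # attributes.var lookups
    child_nodes: list = field(default_factory=list)  # childNodes
    # parent is excluded from repr/compare so the parent <-> child
    # cycle cannot cause infinite recursion.
    parent: Optional["XMLNode"] = field(default=None, repr=False, compare=False)

    def append(self, child: "XMLNode") -> "XMLNode":
        """Add child to child_nodes and point its parent back at self."""
        child.parent = self
        self.child_nodes.append(child)
        return child

    def attribute_string(self) -> str:
        """Render attributes in the var1=val1&var2=val2 form described above."""
        return "&".join(f"{k}={v}" for k, v in self.attributes.items())


# Usage: <note><to id="1" lang="en">Tove</to></note>
root = XMLNode(ELEMENT_NODE, "note")
to = root.append(XMLNode(ELEMENT_NODE, "to", attributes={"id": "1", "lang": "en"}))
to.append(XMLNode(TEXT_NODE, node_value="Tove"))
print(to.attribute_string())  # id=1&lang=en
```

Keeping the attributes as a dict and only building the var1=val1 string on demand avoids the ambiguity the string form has when a value itself contains & or =.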

The Code
function parseXML(strXML) {
temp.xml = strXML;
this.childNodes = new [0];
temp.intOTag = temp.xml.pos("<");
temp.intETag = temp.xml.pos(">");
temp.intPrev = -1;
if (temp.intOTag == -1) {
this.childNodes.add(temp.xml);
this.childNodes[++temp.intPrev].nodeType = 1;
} else {
while (temp.intOTag > -1) {
if (temp.intOTag > 0) {
this.childNodes.add(temp.xml.substring(0,temp.intOTag));
this.childNodes[++temp.intPrev].nodeType = 1;
this.childNodes[temp.intPrev].parent = this;
temp.xml = temp.xml.substring(temp.intOTag);
temp.intOTag = temp.xml.pos("<");
temp.intETag = temp.xml.pos(">");
}
temp.strTag = temp.xml.substring(1,temp.intETag-1);
temp.strTagName = temp.strTag.substring(0,temp.strTag.pos(" ")).trim();
temp.strTag = temp.strTag.substring(temp.strTagName.length());
this.childNodes[temp.intPrev].nodeName = temp.strTagName;
if (!temp.strTagName.starts("?xml") && temp.strTagName != "!DOCTYPE" && !temp.strTagName.starts("!--") && !temp.strTagName.starts("![CDATA[")) {
if (temp.strTag != " /") {
temp.xml = temp.xml.substring(temp.intETag+1);
temp.strXMLCopy = temp.xml;
temp.intDepth = 1;
do {
temp.intOPos = temp.strXMLCopy.pos("<" @ temp.strTagName);
temp.intEPos = temp.strXMLCopy.pos("</" @ temp.strTagName @ ">");
if (temp.intOPos == -1) temp.intOPos = temp.strXMLCopy.length()+1;
switch (temp.intOPos < temp.intEPos) {
case true:
temp.intDepth ++;
temp.strXMLCopy = temp.strXMLCopy.substring(temp.strXMLCopy.substring(temp.intOPos+1).pos(">")+1);
break;
case false:
temp.intDepth --;
temp.strXMLCopy = temp.strXMLCopy.substring(temp.strXMLCopy.substring(temp.intEPos+1).pos(">")+1);
break;
}
} while (temp.intDepth > 0);
temp.strChild = temp.xml.substring(0,temp.intEPos);
temp.xml = temp.xml.substring(temp.intEPos+3+temp.strTagName.length());
this.childNodes.add("<" @ temp.strTagName @ temp.strTag @ ">" @ temp.strChild @ "</" @ temp.strTagName @ ">");
with(this.childNodes[++temp.intPrev]) parseXML(temp.strChild);
this.childNodes[temp.intPrev].nodeType = 2;
this.childNodes[temp.intPrev].nodeName = temp.strTagName;
this.childNodes[temp.intPrev].parent = this;
this.childNodes[temp.intPrev].nodeValue = temp.strChild;
} else {
this.childNodes.add("<" @ temp.strTagName @ temp.strTag @ ">");
this.childNodes[++temp.intPrev].nodeName = temp.strTagName;
this.childNodes[temp.intPrev].nodeType = 2;
this.childNodes[temp.intPrev].parent = this;
temp.xml = temp.xml.substring(temp.intETag+1);
}
with(this.childNodes[temp.intPrev].attributes) getAttributes(temp.strTag);
} else {
switch (temp.strTagName) {
case "?xml":
this.xmlDecl = "<?xml" @ temp.strTag @ ">";
with (this.xmlDecl.attributes) getAttributes(temp.strTag);
break;
case "!DOCTYPE":
this.docTypeDecl = "<!DOCTYPE" @ temp.strTag @ ">";
break;
}
if (temp.strTagName.starts("!--")) temp.intETag = temp.xml.pos("-->") + 2; // index of the comment's closing '>'
elseif (temp.strTagName.starts("![CDATA[")) temp.intETag = temp.xml.pos("]]>") + 2; // index of the CDATA section's closing '>'
temp.xml = temp.xml.substring(temp.intETag+1);
}
temp.intOTag = temp.xml.pos("<");
temp.intETag = temp.xml.pos(">");
}
if (temp.xml.length() > 0) {
this.childNodes.add(temp.xml);
this.childNodes[++temp.intPrev].nodeType = 1;
this.childNodes[temp.intPrev].parent = this;
temp.xml = "";
}
}
return strXML;
}

function getAttributes(strTag) {
temp.arr = "";
while(strTag.length() > 0) {
strTag = strTag.trim();
temp.pos = strTag.pos("=");
if (temp.pos > -1) {
temp.var = strTag.substring(0,temp.pos);
if (strTag.charat(temp.pos+1) == "\"") temp.npos = strTag.substring(temp.pos+2).pos("\"");
else temp.npos = strTag.substring(temp.pos+2).pos(" ");
temp.val = strTag.substring(temp.pos+2,temp.npos);
this.(@temp.var) = temp.val;
temp.arr.add(temp.var @ "=" @ temp.val);
strTag = strTag.substring(temp.npos+temp.pos+3).trim();
} else strTag = "";
}
temp.len = temp.arr.size();
temp.str = temp.arr[0];
for (temp.i=1;temp.i<temp.len;temp.i++)
temp.str @= "&" @ temp.arr[temp.i];
this = temp.str;
return;
}
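The core of parseXML is the do/while loop that counts depth until it reaches the </tag> closing the current element. Here is a hedged Python sketch of that idea; the name find_matching_close is hypothetical. Unlike the GScript loop, which keeps cutting strXMLCopy shorter, it keeps one running index into the unmodified string, so the returned position is always relative to the text right after the opening tag. It also skips names that merely share a prefix (<notebook> when looking for <note>) and self-closing tags.

```python
def find_matching_close(xml: str, tag: str) -> int:
    """Hypothetical helper: return the index of the </tag> that closes an
    element whose opening tag was already consumed (`xml` starts right
    after it), or -1 if it is never closed."""
    open_pat, close_pat = "<" + tag, "</" + tag + ">"
    depth, i = 1, 0
    while True:
        close = xml.find(close_pat, i)
        if close == -1:
            return -1  # never closed (ERROR.NO_CLOSE_TAG in the later version)
        opn = xml.find(open_pat, i)
        # Skip names that only share a prefix, e.g. "<notebook" for "note".
        while opn != -1 and opn < close and xml[opn + len(open_pat)] not in " \t\r\n/>":
            opn = xml.find(open_pat, opn + 1)
        if opn != -1 and opn < close:
            # A same-named element opens first: go one level deeper,
            # unless it is self-closing (<tag/> or <tag />).
            end = xml.find(">", opn)
            if xml[end - 1] != "/":
                depth += 1
            i = end + 1
        else:
            # The next close tag comes first: leave one level.
            depth -= 1
            if depth == 0:
                return close
            i = close + len(close_pat)


body = "<a><b/></a>x</a>"  # text after an opening <a>
print(find_matching_close(body, "a"))  # 12
```

In the example the nested <a> raises the depth to 2, its </a> brings it back to 1, and the second </a> at index 12 is the match, so body[:12] is that element's content.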

Usage
with(var) this = parseXML(strXML);

Parameters
strXML: A string of XML or well-formed HTML.

Returns
The parameter passed to the function.

Example
function onCreated() {
temp.query = requesturl("http://www.w3schools.com/xml/note.xml"); /* request the file */
catchevent(temp.query,"onReceiveData","onData");
}

function onData(obj) {
with(me) this = parseXML(obj.fulldata); /* parse */
echo(me.xmlDecl.attributes.version); /* prints "1.0" */
}



The scene is set; rebuilding the XML file from the parsed object is up to you, so you're going to have to get creative. Post problems / comments / etc.
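One way to get creative is to walk the tree and print it back out. Below is a minimal Python sketch, assuming nodes shaped like the ones described above (type 1 = text, 2 = element) whose values hold unescaped text; the Node class and the to_xml function are hypothetical. Since parseXML keeps text exactly as it appeared in the file, drop the escape/quoteattr calls if you feed it those raw values, otherwise an existing &lt; would come out as &amp;lt;. The XML declaration and DOCTYPE are not handled.

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Node:
    """Hypothetical stand-in for a parsed node."""
    node_type: int                                   # 1 = text, 2 = element
    node_name: str = ""
    node_value: str = ""                             # used for text nodes
    attributes: dict = field(default_factory=dict)
    child_nodes: list = field(default_factory=list)


def to_xml(node: Node) -> str:
    """Hypothetical helper: serialize a node and its descendants to XML."""
    if node.node_type == 1:
        return escape(node.node_value)               # &, <, > become entities
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in node.attributes.items())
    if not node.child_nodes:
        # Empty element, written as "<name />" like <key /> in this thread.
        return f"<{node.node_name}{attrs} />"
    inner = "".join(to_xml(child) for child in node.child_nodes)
    return f"<{node.node_name}{attrs}>{inner}</{node.node_name}>"


doc = Node(2, "node", attributes={"id": "1"}, child_nodes=[Node(2, "childnode")])
print(to_xml(doc))  # <node id="1"><childnode /></node>
```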

Admins
08-23-2006, 03:31 PM
Sounds interesting and useful.
A little improvement for using the requesturl object:
I think you can use "obj.fulldata" instead of merging the obj.data lines:
with(me) this = parseXML(obj.fulldata);

Tolnaftate2004
08-23-2006, 09:48 PM
Thanks, I was not aware that existed.

ApothiX
08-24-2006, 05:30 AM
Wouldn't it work if you did it like this:
me = new TStaticVar();
me = parseXML(blahblah);?

Not sure if setting 'this' is a wise thing to do.

Tolnaftate2004
08-24-2006, 06:22 AM
Wouldn't it work if you did it like this:
me = new TStaticVar();
me = parseXML(blahblah);?
No, parseXML just returns blahblah, so in effect I'd have just parsed some crazy string to come up with me = blahblah. It is written that way so that 'me' is given the properties stated above.

Edit:
function parseXML(strXML) {
ERROR = NULL;
ERROR.NO_CLOSE_TAG = "There is no closing tag for tag <%s>.";
ERROR.NO_CLOSE_ENT = "An entity was not terminated.";
ERROR.XML_DECL_END = "The XML declaration was not terminated properly.";
ERROR.COMMENT_END = "A comment was never terminated.";
ERROR.IMPROP_FORM = "The XML file is improperly formed.";
temp.xml = strXML;
this.childNodes = new [0];
temp.intOTag = temp.xml.pos("<");
if (~temp.intOTag) temp.intETag = getEndIndex(temp.xml,">");
temp.intPrev = -1;
if (temp.intOTag == -1) {
this.childNodes.add(temp.xml);
this.childNodes[++temp.intPrev].nodeType = 1;
} else {
while (temp.intOTag > -1) {
if (temp.intOTag > 0) {
this.childNodes.add(temp.xml.substring(0,temp.intOTag));
this.childNodes[++temp.intPrev].nodeType = 1;
this.childNodes[temp.intPrev].parent = this;
temp.xml = temp.xml.substring(temp.intOTag);
temp.intOTag = temp.xml.pos("<");
if (~temp.intOTag) temp.intETag = getEndIndex(temp.xml,">");
}
temp.strTag = temp.xml.substring(1,temp.intETag-1);
temp.strTagName = temp.strTag.substring(0,temp.strTag.pos(" ")).trim();
temp.strTag = temp.strTag.substring(temp.strTagName.length());
this.childNodes[temp.intPrev].nodeName = temp.strTagName;
if (!temp.strTagName.starts("?xml") && temp.strTagName != "!DOCTYPE" && !temp.strTagName.starts("!--") && !temp.strTagName.starts("![CDATA[")) {
if (!temp.strTagName.starts("/")) {
if (temp.strTag != " /") {
temp.xml = temp.xml.substring(temp.intETag+1);
temp.strXMLCopy = temp.xml;
temp.intDepth = 1;
do {
temp.intOPos = temp.strXMLCopy.pos("<" @ temp.strTagName);
temp.intEPos = temp.strXMLCopy.pos("</" @ temp.strTagName @ ">");
if (temp.intOPos == -1) temp.intOPos = temp.strXMLCopy.length()+1;
if (temp.intEPos == -1) produce_error(ERROR.NO_CLOSE_TAG,temp.strTagName);
switch (temp.intOPos < temp.intEPos) {
case true:
temp.intDepth ++;
temp.strXMLCopy = temp.strXMLCopy.substring(temp.strXMLCopy.substring(temp.intOPos+1).pos(">")+1);
break;
case false:
temp.intDepth --;
temp.strXMLCopy = temp.strXMLCopy.substring(temp.strXMLCopy.substring(temp.intEPos+1).pos(">")+1);
break;
}
} while (temp.intDepth > 0);
temp.strChild = temp.xml.substring(0,temp.intEPos);
temp.xml = temp.xml.substring(temp.intEPos+3+temp.strTagName.length());
this.childNodes.add("<" @ temp.strTagName @ temp.strTag @ ">" @ temp.strChild @ "</" @ temp.strTagName @ ">");
with(this.childNodes[++temp.intPrev]) parseXML(temp.strChild);
this.childNodes[temp.intPrev].nodeType = 2;
this.childNodes[temp.intPrev].nodeName = temp.strTagName;
this.childNodes[temp.intPrev].parent = this;
this.childNodes[temp.intPrev].nodeValue = temp.strChild;
} else {
this.childNodes.add("<" @ temp.strTagName @ temp.strTag @ ">");
this.childNodes[++temp.intPrev].nodeName = temp.strTagName;
this.childNodes[temp.intPrev].nodeType = 2;
this.childNodes[temp.intPrev].parent = this;
temp.xml = temp.xml.substring(temp.intETag+1);
}
with(this.childNodes[temp.intPrev].attributes) getAttributes(temp.strTag);
} else produce_error(ERROR.IMPROP_FORM);
} else {
switch (temp.strTagName) {
case "?xml":
temp.intETag = temp.xml.pos("?>");
if (temp.intETag == -1) produce_error(ERROR.XML_DECL_END);
temp.strTag = temp.xml.substring(5,temp.intETag-5);
this.xmlDecl = "<?xml" @ temp.strTag @ "?>";
with (this.xmlDecl.attributes) getAttributes(temp.strTag);
break;
case "!DOCTYPE":
this.docTypeDecl = "<!DOCTYPE" @ temp.strTag @ ">";
break;
}
if (temp.strTagName.starts("!--")) {
temp.pos = temp.xml.pos("-->");
if (temp.pos > -1) temp.intETag = temp.pos + 2;
else produce_error(ERROR.COMMENT_END);
}
elseif (temp.strTagName.starts("![CDATA[")) {
temp.pos = temp.xml.pos("]]>");
if (temp.pos > -1) temp.intETag = temp.pos + 2; // index of the closing '>'
else produce_error(ERROR.COMMENT_END);
}
temp.xml = temp.xml.substring(temp.intETag+1);
}
temp.intOTag = temp.xml.pos("<");
if (~temp.intOTag) temp.intETag = getEndIndex(temp.xml,">");
}
if (temp.xml.length() > 0) {
this.childNodes.add(temp.xml);
this.childNodes[++temp.intPrev].nodeType = 1;
this.childNodes[temp.intPrev].parent = this;
temp.xml = "";
}
}
return strXML;
}

function getAttributes(strTag) {
temp.arr = "";
while(strTag.length() > 0) {
strTag = strTag.trim();
temp.pos = strTag.pos("=");
if (temp.pos > -1) {
temp.var = strTag.substring(0,temp.pos);
if (strTag.charat(temp.pos+1) == char(34)) {
temp.npos = strTag.substring(temp.pos+2).pos(char(34) @ " ");
temp.len = strTag.substring(temp.pos+2).length();
if (temp.npos == -1 && strTag.substring(temp.pos+2).substring(temp.len-1,1) == char(34)) temp.npos = temp.len-1;
}
else temp.npos = strTag.substring(temp.pos+2).pos(" ");
temp.val = strTag.substring(temp.pos+2,temp.npos);
this.(@temp.var) = temp.val;
temp.arr.add(temp.var @ "=" @ temp.val);
strTag = strTag.substring(temp.npos+temp.pos+3).trim();
} else strTag = "";
}
temp.len = temp.arr.size();
temp.str = temp.arr[0];
for (temp.i=1;temp.i<temp.len;temp.i++)
temp.str @= "&" @ temp.arr[temp.i];
this = temp.str;
return;
}

function getEndIndex(strXML,endtag) {
temp.olen = 0;
while (1) { // We'll return directly from inside
temp.test = strXML.pos(endtag);
if (temp.test == -1) {
produce_error(ERROR.NO_CLOSE_ENT);
return -1;
}
temp.before = strXML.substring(0,temp.test); // text before the candidate '>'
temp.open = 0;
do {
temp.pos = temp.before.pos("=\"");
if (temp.pos > -1) {
temp.before = temp.before.substring(temp.pos+2);
temp.open = 1;
temp.pos = temp.before.pos("\" ");
if (temp.pos > -1) {
temp.before = temp.before.substring(temp.pos+2);
temp.open = 0;
} elseif (temp.before.substring(temp.before.length()-1,1) == "\"") {
// the closing quote is the very last character
temp.before = "";
temp.open = 0;
}
}
} while(temp.pos > -1 && temp.open == 0);
if (temp.open == 1) {
strXML = strXML.substring(temp.test+endtag.length());
temp.olen += temp.test+endtag.length();
} else return temp.olen + temp.test;
}
return -1; // only reached if while (1) were not treated as true
}

function produce_error(strError) {
echo(format(strError,params.subarray(1)));
return;
}
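What getAttributes builds can also be sketched in Python. This version swaps the pos()/substring() scanning for a regular expression (a different technique, used only for brevity) and returns both a dict, playing the role of XML.attributes.var, and the var1=val1&var2=val2 string. The function get_attributes is hypothetical, and unquoted values such as id=1 are simply not matched, since they are not valid XML.

```python
import re

# name="value" or name='value'; the value may contain spaces or '>'.
ATTR_RE = re.compile(r"""([A-Za-z_:][-\w.:]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def get_attributes(tag_body: str) -> tuple[dict, str]:
    """Hypothetical helper: parse the text after a tag name into
    (dict, "a=1&b=2")."""
    attrs = {}
    for m in ATTR_RE.finditer(tag_body):
        # Group 2 holds a double-quoted value, group 3 a single-quoted one.
        attrs[m.group(1)] = m.group(2) if m.group(2) is not None else m.group(3)
    return attrs, "&".join(f"{k}={v}" for k, v in attrs.items())


attrs, query = get_attributes(' id="1" title="a > b" lang=\'en\'')
print(attrs["title"])  # a > b
print(query)           # id=1&title=a > b&lang=en
```

As in the GScript version, values are joined as-is, so a value containing & or = makes the string form ambiguous; the dict does not have that problem.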

Some bug fixes:

The XML declaration now looks for the '?>' closing tag.
A quote directly before the closing '>' is no longer included in the last attribute's value.
Attributes can now contain '>'.
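The '>' fix works by ignoring any '>' that sits inside a quoted attribute value, which getEndIndex does with repeated pos() calls. The same idea as a single left-to-right scan in Python (a different approach from the post's; get_end_index is a hypothetical name, and it also accepts single quotes): remember whether the scan is inside a quote and only accept a '>' when it is not.

```python
def get_end_index(xml: str, start: int = 0) -> int:
    """Hypothetical helper: index of the first '>' at or after `start` that
    is not inside a quoted attribute value, or -1 if the tag never ends."""
    quote = None  # the quote character we are currently inside, if any
    for i in range(start, len(xml)):
        ch = xml[i]
        if quote is not None:
            if ch == quote:
                quote = None  # closing quote: back outside the value
        elif ch in "\"'":
            quote = ch        # opening quote of an attribute value
        elif ch == ">":
            return i
    return -1


tag = '<a expr="x > y" b="2">text</a>'
print(get_end_index(tag))  # 21
```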


Some additions:

Error messages.
XML.nodeType now works for all nodes. (I had originally scripted it as XML.type, and some of the .type code was still there.) These values were also never read-only at all, it seems.

ApothiX
08-25-2006, 11:40 AM
Ah, I see what it's doing now, but I still stand by what I said: I don't think you should be setting this, especially if you're just setting it to a string. Why not just make it set 'me.parsedXML' and return an error code?

with(me) {
temp.errorcode = parseXML(strXML);
}

switch(temp.errorcode) {
// Error Checking
}

Tolnaftate2004
08-25-2006, 08:45 PM
Ah, I see what it's doing now, but I still stand by what I said: I don't think you should be setting this, especially if you're just setting it to a string.

Because the way I have it is the only way I could get the recursion to work correctly. If I set the string inside the function, the value of any child that has children of its own would be replaced by that child's children. If you get what I'm saying.


<key />
<node id="1">
<childnode />
</node>

'me' would still be '<key />\n<node id="1">\n <childnode />\n</node>', but 'me.childNodes' would come up with '<key />' and '<childnode />' instead of '<key />' and the whole <node> element.
Yes, it is possible to change that, but it would mean reparsing bits that have already been parsed.

And setting this is not harming anything. I am merely setting a variable a different way.

new TGraalVar("me") {
this = "This is perfectly acceptable.";
}

If I were setting the NPC's this, then it might be a problem.

Why not just make it set 'me.parsedXML' and return an error code?

with(me) {
temp.errorcode = parseXML(strXML);
}

switch(temp.errorcode) {
// Error Checking
}

Well, if I really wanted to follow ActionScript, I would set XML.status, which gives an error code.
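Returning a status code instead of echoing messages could look something like the Python sketch below, which reuses the error names from the ERROR table in the updated code. The numeric values are made up for illustration (they are not ActionScript's XML.status values), the function check is hypothetical, and it tracks nesting with a stack of open tag names rather than the depth counter parseXML uses. It does not handle '>' inside quoted attribute values.

```python
from enum import IntEnum


class Status(IntEnum):
    """Illustrative status codes named after the ERROR table (values made up)."""
    OK = 0
    NO_CLOSE_TAG = -1
    NO_CLOSE_ENT = -2
    XML_DECL_END = -3
    COMMENT_END = -4
    IMPROP_FORM = -5


def check(xml: str) -> Status:
    """Hypothetical validator: return the first problem found, or Status.OK."""
    i, stack = 0, []  # stack holds the names of currently open elements
    while (lt := xml.find("<", i)) != -1:
        if xml.startswith("<!--", lt):
            end = xml.find("-->", lt + 4)
            if end == -1:
                return Status.COMMENT_END
            i = end + 3
            continue
        if xml.startswith("<?xml", lt):
            end = xml.find("?>", lt)
            if end == -1:
                return Status.XML_DECL_END
            i = end + 2
            continue
        gt = xml.find(">", lt)
        if gt == -1:
            return Status.NO_CLOSE_ENT
        body = xml[lt + 1:gt]
        if body.startswith("/"):
            # A closing tag must match the most recently opened element.
            if not stack or stack.pop() != body[1:].strip():
                return Status.IMPROP_FORM
        elif body and body[0] not in "!?" and not body.endswith("/"):
            stack.append((body.split() or [""])[0])
        i = gt + 1
    # Anything still open was never closed.
    return Status.NO_CLOSE_TAG if stack else Status.OK


print(check('<?xml version="1.0"?><a><b /></a>').name)  # OK
print(check("<a><b></a>").name)                          # IMPROP_FORM
```

A caller can then branch on the result, much like the switch(temp.errorcode) suggested above.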

Edit:
temp.before = strXML.substring(temp.test);
should be
temp.before = strXML.substring(0,temp.test);
in the code.