FAQ
So I set out to learn handling three-letter-acronym files in Python,
and SAX worked nicely until I encountered badly formed XMLs, like with
bad characters in it (well Unicode supposed to handle it all but
apparently doesn't), using http://dchublist.com/hublist.xml.bz2 as
example data, with goal to extract Users and Address properties where
number of Users is greater than given number.

So I extended my First XML Example with an error handler:

# ========= snip ===========
from xml.sax import make_parser
from xml.sax.handler import ContentHandler
from xml.sax.handler import ErrorHandler

class HubHandler(ContentHandler):
def __init__(self, hublist):
self.Address = ''
self.Users = ''
hl = hublist
def startElement(self, name, attrs):
self.Address = attrs.get('Address',"")
self.Users = attrs.get('Users', "")
def endElement(self, name):
if name == "Hub" and int(self.Users) > 2000:
#print self.Address, self.Users
hl.append({self.Address: int(self.Users)})

class HubErrorHandler(ErrorHandler):
def __init__(self):
pass
def error(self, exception):
import sys
print "Error, exception: %s\n" % exception
def fatalError(self, exception):
print "Fatal Error, exception: %s\n" % exception

hl = []

parser = make_parser()

hHandler = HubHandler(hl)
errHandler = HubErrorHandler()

parser.setContentHandler(hHandler)
parser.setErrorHandler(errHandler)

fh = file('hublist.xml')
parser.parse(fh)

def compare(x,y):
if x.values()[0] > y.values()[0]:
return 1
elif x.values()[0] < y.values()[0]:
return -1
return 0

hl.sort(cmp=compare, reverse=True)

for h in hl:
print h.keys()[0], " ", h.values()[0]
# ========= snip ===========

And then BAM, Pythonwin has hit me:

execfile('ph.py')
Fatal Error, exception: hublist.xml:2247:11: not well-formed (invalid
token)

Fatal Error, exception: hublist.xml:2247:11: not well-formed (invalid
token)

Fatal Error, exception: hublist.xml:2247:11: not well-formed (invalid
token)

Fatal Error, exception: hublist.xml:2247:11: not well-formed (invalid
token)

Fatal Error, exception: hublist.xml:2247:11: not well-formed (invalid
token)

================================ RESTART ================================
Just before the "RESTART" line, Windows has announced it killed
pythonw.exe process (I suppose it was a child process).

WTF is happening here? Wasn't fatalError method in the HubErrorHandler
supposed to handle the invalid tokens? And why is the message repeated
many times? My method is called apparently, but something in SAX goes
awry and the interpreter crashes.

Search Discussions

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppython-list @
categoriespython
postedMay 3, '08 at 8:50p
activeMay 3, '08 at 8:50p
posts1
users1
websitepython.org

1 user in discussion

Mrkafk: 1 post

People

Translate

site design / logo © 2018 Grokbase