Discussion:
What do people know about MarkLogic, - the db mandated for Healthcare.gov
Steve Lewis
2013-11-23 16:24:03 UTC
Permalink
What do people know about MarkLogic, a NoSQL database apparently mandated
for Healthcare.gov
http://www.nytimes.com/2013/11/23/us/politics/tension-and-woes-before-health-website-crash.html?ref=us&_r=0

I am interested in both the technical merits/demerits and the politics of
how this could be chosen.

However much I might be intrigued by NoSQL, I am trying to imagine why any
choice but Oracle (or maybe DB2) would be
seriously considered.
--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
Eric Jain
2013-11-23 17:46:50 UTC
Permalink
However I might be intrigued with NoSQL, I am trying to imagine why any choice but Oracle (or maybe DB2) would be
seriously considered;
Oracle was not an option:

http://gop12.thehill.com/2012/12/oracle-ceo-made-last-minute-donation-to.html

:-)


------------------------------------

Yahoo Groups Links

<*> To visit your group on the web, go to:
http://groups.yahoo.com/group/seajug/

<*> Your email settings:
Individual Email | Traditional

<*> To change settings online go to:
http://groups.yahoo.com/group/seajug/join
(Yahoo! ID required)

<*> To change settings via email:
seajug-digest-***@public.gmane.org
seajug-fullfeatured-***@public.gmane.org

<*> To unsubscribe from this group, send an email to:
seajug-unsubscribe-***@public.gmane.org

<*> Your use of Yahoo Groups is subject to:
http://info.yahoo.com/legal/us/yahoo/utos/terms/
j***@public.gmane.org
2013-11-23 18:03:37 UTC
Permalink
I don't have any inside information on healthcare.gov's design or the politics involved, but I do know that MarkLogic is hugely more performant than Oracle in both machine resources and cost when you have a very large number of XML documents to store and process. MarkLogic was the first commercial document database that used a scalable grid architecture from the ground up. Beating the pants off Oracle in benchmarks involving hundreds of millions of XML documents is exactly how MarkLogic got their customer wins. For a technical overview I'd suggest this:


http://developer.marklogic.com/inside-marklogic



Jim


Konstantin Ignatyev
2013-11-23 18:14:09 UTC
Permalink
XML is always a wrong answer as Jason likes to say and I happen to agree.
--
Konstantin Ignatyev

PS: If this is a typical day on planet Earth, humans will add fifteen
million tons of carbon to the atmosphere, destroy 115 square miles of
tropical rainforest, create seventy-two miles of desert, eliminate between
forty to one hundred species, erode seventy-one million tons of topsoil,
add 2,700 tons of CFCs to the stratosphere, and increase their population
by 263,000

Bowers, C.A. The Culture of Denial: Why the Environmental Movement Needs a
Strategy for Reforming Universities and Public Schools. New York: State
University of New York Press, 1997: (4) (5) (p.206)
j***@public.gmane.org
2013-11-23 18:52:04 UTC
Permalink
Really? What is it that is a plausible alternative for interchange of hundreds of different kinds of structured and semi-structured data amongst tens of thousands of different computer systems? Surely you wouldn't prefer that we're strictly limited to HL7's tagged record format (like EDIFACT) forever. If you mean something like JSON then surely you jest; such solutions are strictly for idiosyncratic systems and have no connection to the reality of healthcare IT (which I do have some decades of familiarity with). Personally I prefer HTML-based interchange, especially eRDF, but unfortunately politics snatched defeat from the jaws of great success by forcing RDFa on everyone.

Jim


Jason Osgood
2013-11-23 19:26:39 UTC
Permalink
Hi James.
Post by j***@public.gmane.org
Surely you wouldn't prefer that we're strictly limited to HL7's tagged record format
Yes.

Whatever the source system’s “native” supported format is fine by me.

Further, HL7 2.x, warts and all, is FAR SUPERIOR to HL7 3.x.

To “modernize” healthcare IT, some group of clowns amused themselves by divining a spec that’d make Kafka proud. They didn’t merely transliterate HL7’s EDIness into XML. You know, permitting trivial round tripping.

Oh no.

They provided XSDs (which don’t compile). There’s new abstraction, archetypes, type system, and so forth. Something they call RIM (Reference Information Model). The spec is absolutely immense. And ambiguous. So there’s no definitive agreement on what values should appear where or how. So everyone wings it, resulting in far more interop and compatibility problems than HL7 2.x ever had. (Sorry, it’s been years, so I don’t have examples.)

They did add a few new document types, like continuity of care document (CCD), which is just a simple report. Not worth a whole new ontological system of ultimate awesomeness.
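
To make "trivial round tripping" concrete, here is a minimal sketch of what a plain transliteration could look like: one pipe-delimited HL7 2.x-style segment turned field-by-field into XML and back again. The class name, the `PID.n` element naming, and the sample segment are made up for illustration, and the sketch ignores components (`^`), repetitions, the special handling of MSH, and escaping of delimiters or XML special characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Hl7RoundTrip {
    /** Transliterate one segment, e.g. "PID|1||12345", into flat XML. */
    public static String toXml(String segment) {
        // limit -1 keeps trailing empty fields so nothing is lost
        String[] fields = segment.split("\\|", -1);
        String name = fields[0];
        StringBuilder xml = new StringBuilder("<" + name + ">");
        for (int i = 1; i < fields.length; i++) {
            String tag = name + "." + i;
            xml.append("<").append(tag).append(">")
               .append(fields[i])
               .append("</").append(tag).append(">");
        }
        return xml.append("</").append(name).append(">").toString();
    }

    /** Reverse of toXml: rebuild the pipe-delimited segment. */
    public static String fromXml(String xml) {
        Matcher root = Pattern.compile("^<(\\w+)>").matcher(xml);
        if (!root.find()) {
            throw new IllegalArgumentException("no root element");
        }
        String name = root.group(1);
        List<String> parts = new ArrayList<>();
        parts.add(name);
        Matcher field = Pattern
            .compile("<" + name + "\\.\\d+>(.*?)</" + name + "\\.\\d+>")
            .matcher(xml);
        while (field.find()) {
            parts.add(field.group(1));
        }
        return String.join("|", parts);
    }

    public static void main(String[] args) {
        String original = "PID|1||12345|DOE^JOHN|";
        String xml = toXml(original);
        System.out.println(xml);
        System.out.println(fromXml(xml).equals(original));
    }
}
```

The point of the sketch is only that a one-to-one mapping needs no new type system: every field keeps its position, so the original message can be regenerated exactly.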


Cheers, Jason

Daniel Kirkdorffer
2013-11-23 19:34:12 UTC
Permalink
They provided XSDs (which don't compile).
Meaning? Are you saying _their_ didn't render to objects (via JAXB for
example), or that XSD documents in general don't?
Daniel Kirkdorffer
2013-11-23 19:43:19 UTC
Permalink
uh... _their_ XSDs, of course.


Jason Osgood
2013-11-23 19:43:25 UTC
Permalink
Hi Dan.
Post by Daniel Kirkdorffer
They provided XSDs (which don't compile).
Meaning? Are you saying _their_ didn't render to objects (via JAXB for
example), or that XSD documents in general don't?
The best I could figure out, they created the HL7 XML RIM using a Microsoft stack and didn’t bother verifying the artifacts could be consumed by other stacks (e.g. Java).

The standards people were like “Works for us. Perhaps you should learn how to XML.”

Dennis Sosnoski
2013-11-23 21:30:05 UTC
Permalink
Post by Jason Osgood
...
The best I could figure out, they created the HL7 XML RIM using a Microsoft stack and didn’t bother verifying the artifacts could be consumed by other stacks (e.g. Java).
The standards people were like “Works for us. Perhaps you should learn how to XML."
Ahhh... that doesn't sound unreasonable to me. XML is XML, and Microsoft
doesn't have any proprietary extensions to the format that I know of.
And when you say "don't compile" I assume you mean that you tried
running them through JAXB and had a failure of some sort. JAXB is not
the greatest tool in the world, but it does allow customizations. Do you
know why it failed? There are also other alternatives, including
XMLBeans (which is not the most convenient to use, but does handle
anything with a schema) and my own JiBX (limited schema support, but
very flexible - though now sadly shelved, since corporate users haven't
been supportive of the development).
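
One way to narrow down a "doesn't compile" report before involving any binding tool is to ask the JDK's own schema processor to load the XSD and see whether it objects. The following is a minimal sketch using the standard `javax.xml.validation` API; the class name and the two tiny inline schemas are placeholders, not the HL7 artifacts.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class XsdCheck {
    /** Returns null if the schema loads, otherwise the processor's error message. */
    public static String compileError(String xsd) {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            factory.newSchema(new StreamSource(new StringReader(xsd)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String good = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                    + "<xs:element name='patient' type='xs:string'/></xs:schema>";
        // Refers to a type that is not defined anywhere.
        String bad  = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                    + "<xs:element name='patient' type='xs:noSuchType'/></xs:schema>";
        System.out.println("good: " + compileError(good));
        System.out.println("bad:  " + compileError(bad));
    }
}
```

A schema can load cleanly here and still trip a binding compiler (name collisions in generated classes, for instance), so a check like this only separates invalid schemas from tool-specific problems.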

Ironically, JAXB 2.x and JAX-WS 2.x are both products of a Sun effort to
slavishly imitate Microsoft. As explained to me by a senior Sun manager
at the time, they felt they were losing the enterprise market to
Microsoft and needed to match everything Microsoft did, feature by
feature. Idiots.

- Dennis


Konstantin Ignatyev
2013-11-23 19:28:17 UTC
Permalink
As if that reality is near perfect and does not need any changes!

IMO the whole of healthcare and its IT just needs to be wiped clean and
started from scratch with complete transparency: no privacy in medical
records, no denials based on history or conditions, providers charge the
same price to everyone (with and without insurance), and insurance
companies cover it or a portion of it.

As it stands today, the health (I cringe to say care) industry is a giant
mafia preying on us.

And IT in the industry is just twisted to serve its corrupt masters.

Back to technical: there is a legitimate need for a parseable and
manipulable format to represent tree-like structures. But XML has evolved
into a horrible format for that; not that it could not have been good for
that, it is just that it morphed into a complete mess.
... If you mean something like JSON then surely you jest, such solutions
are strictly for idiosyncratic systems and have no connection to the
reality of healthcare IT (which I do have some decades of familiarity
with)....
Jim
Jason Osgood
2013-11-23 19:40:50 UTC
Permalink
Hi James.
Post by j***@public.gmane.org
What is it that is a plausible alternative for interchange of hundreds of different kinds of structured and semi-structured data amongst tens of thousands of different computer systems?
The most simple thing that works.

The whole idea is to copy a string from one spinning platter and paste it onto a different platter. Or paste that string into a report, such as a web page.

Any extra method, message, API, transformation, parse, layer, abstraction, indirection, schema, model, cache, or whatever that does anything more than a cut and a paste is wasted effort (muda).

But even that is working too hard. As my friend Stan Dyck concluded, we shouldn’t be shuttling data around. Rather, we should just access the source data directly.


Cheers, Jason




Jason Osgood
2013-11-23 19:07:23 UTC
Permalink
Post by j***@public.gmane.org
MarkLogic is hugely more performant than Oracle in both machine resources and cost when you have a very large number of XML documents to store and process.
Apologies, but that means nothing to me.
Post by j***@public.gmane.org
XML is always a wrong answer as Jason likes to say and I happen to agree.
Good to know you have my back. :)

However.

We spent our time slurping, storing, exchanging, and report generating for formats, primarily HL7 2.x (a mutant form of EDI) and HL7 3.x (XML syntax).

The ETL work scraped content out of these files, stuffed it all into a nearly normalized RDBMS, and then turned around moments later and used wicked joins to reconstitute the exact same information.

It was a terrible idea, strategy, architecture, mess. Hard to scale, high cost of change, difficult to debug.

My idea was to simply log the messages we receive (as loose files). Then index it with Lucene. Text search to find stuff then map/reduce to divine the single best record (information could arrive out of order, with corrections). No need for SQL’s join queries, because the data was already in the format needed (mostly).
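
The "divine the single best record" step could be sketched roughly as below, leaving Lucene and the file log out entirely: group the logged messages by a record key and keep the one with the latest sender timestamp, so corrections win no matter what order things arrive in. The `Message` fields and the latest-timestamp-wins rule are assumptions for illustration; nothing above says how the best record was actually meant to be chosen.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BestRecord {
    /** One logged message; the fields are hypothetical. */
    public static final class Message {
        public final String recordId; // e.g. a patient or order identifier
        public final long sentAt;     // timestamp assigned by the sender
        public final String raw;      // original payload, kept untouched

        public Message(String recordId, long sentAt, String raw) {
            this.recordId = recordId;
            this.sentAt = sentAt;
            this.raw = raw;
        }
    }

    /** Reduce step: for each recordId keep the message with the highest sentAt. */
    public static Map<String, Message> latestPerRecord(List<Message> log) {
        Map<String, Message> best = new HashMap<>();
        for (Message m : log) {
            // a is the current winner, b is the candidate just read from the log
            best.merge(m.recordId, m, (a, b) -> b.sentAt > a.sentAt ? b : a);
        }
        return best;
    }

    public static void main(String[] args) {
        List<Message> log = Arrays.asList(
            new Message("A", 2, "A|corrected"), // the correction arrives first
            new Message("A", 1, "A|original"),
            new Message("B", 5, "B|only"));
        Map<String, Message> best = latestPerRecord(log);
        System.out.println(best.get("A").raw);
        System.out.println(best.get("B").raw);
    }
}
```

Because the raw payload is carried through unchanged, the winning record is still the original message as received, which is the whole appeal of the approach.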

(We didn’t get very far prototyping that idea before we got bought. Our new Overlords mandated we use InterSystems’ Caché, a chaotic mutation of MUMPS, and probably the only thing I’ve seen worse than ASN.1, which is worse than XML. Imagine a Smalltalk developer environment written in Perl, where any compiler error can wreck your entire runtime, with no undo.)

So my operating premise was that while XML blows, it’s best to just log original data received. Makes the inevitable multivendor troubleshooting conference calls much shorter.

As for MarkLogic. Whatever. As a NoSQL key/value store, I’m sure it’s fantastic. /sarcasm

The complete non-starter with this “XML store” is they use XQuery. Terrible idea. No one knows it. The tool stack is mostly nonexistent. And do you really want to require document and schema validation before you can access the data? Because I guarantee the garbage you receive will have problems pop up unannounced.

As for scale, this whole healthcare.gov thing feels overwrought. This is just eligibility data and guarantor signups, right? I could run it on my laptop.

The real problem with these IT systems is trying to get competitors to play nicely with each other. The feds set up a “meaningful use” data standard (which I never understood) as a carrot and stick situation. They spec’d the minimum data set to be supported (trivial) or you’d have your funding cut.

I support any efforts to kick in the teeth of the insurance companies, pharma, laboratories, etc. My primary complaint with “meaningful use” is that it accomplished so little. Like bribing your kids with a gallon of ice cream to eat an ounce of vegetables, pretty please.


Cheers, Jason

j***@public.gmane.org
2013-11-23 19:26:37 UTC
Permalink
Just to be clear, MarkLogic is *not* a key/value store. It is a schema-agnostic document database. And while it is true that all kinds of XML shredding is going on inside the indexing nodes, none of that is visible to a database user. The primary query language for MarkLogic is XQuery but there are all kinds of text search smarts beyond just elements and attributes.


The insanely hard problem of interchange of healthcare documents, which necessarily entails the maintenance and implementation of national and international standards, is neither caused by the ACA nor a cause of healthcare.gov's troubles. I will say, though, that the fact that healthcare.gov chose a database that makes the document's format primary, rather than thinking that they should invest any effort at all in reformulating its structure, shows the architects had their priorities straight (at least in that regard).


Jim


Jason Osgood
2013-11-23 19:29:35 UTC
Permalink
Just to be clear, MarkLogic … is a schema-agnostic document database.
aka file system?
And while it is true that all kinds of XML shredding is going on inside the indexing nodes, none of that is visible to a database user.
I really despise working with black boxes.

Dennis Sosnoski
2013-11-23 21:03:47 UTC
Permalink
Haven't done anything with MarkLogic, but I have been involved in the
ACA project from the state side. I helped Nebraska implement their
interfaces to the Federal Data Service Hub (incidentally, they're one of
the states with the fewest problems :-) ). I got involved in May, and
was surprised that the web services involved in communicating between
the states and the hub were still changing well past that point.

From the article, it sounds like both Medicare and CGI were incredibly
incompetent. CGI apparently didn't like being told to use MarkLogic from
the beginning, and probably didn't make any real effort to understand
how to use it properly. Certainly the types of problems with MarkLogic
usage discussed in their emails at the end of September
(http://energycommerce.house.gov/sites/republicans.energycommerce.house.gov/files/20131121-Sept26to30AdministraitonEmails.pdf)
are the sorts of things you'd expect any competent developer to have
looked into early on.

So my guess is that the CGI people were probably used to working with
relational databases and weren't interested in learning anything new,
instead just assuming they could do everything the same way they would
with Oracle or whatever. Does that mean MarkLogic was a bad choice?

- Dennis
Douglas Pearson
2013-11-23 21:31:46 UTC
Permalink
Reading that chain of emails is pretty fascinating stuff. Interesting that
they were able to handle 500 concurrents, were aiming to test 10,000 and
knew they needed probably 50,000.

Those numbers look strangely familiar since our little game site also gets
10,000 concurrents at daily peak traffic and the backend was built by a
team of 2 and for a few dollars less than $600 million.

Maybe the problem isn't the technology choices as much as the people doing
the work?

Doug
Dennis Sosnoski
2013-11-23 21:38:35 UTC
Permalink
I'm sure it was also the requirements, which probably kept changing. One
of the main points from the NYT article was that the guy in charge of
the project, Henry Chao, didn't have the authority to actually make any
decisions on his own! When every design decision has to go through a
process of inter-agency negotiation nothing is ever going to get done in
a timely manner.

- Dennis
Post by Douglas Pearson
Reading that chain of emails is pretty fascinating stuff. Interesting
that they were able to handle 500 concurrents, were aiming to test
10,000 and knew they needed probably 50,000.
Those numbers look strangely familiar since our little game site also
gets 10,000 concurrents at daily peak traffic and the backend was
built by a team of 2 and for a few dollars less than $600 million.
Maybe the problem isn't the technology choices as much as the people
doing the work?
Doug
P.Hill
2013-11-24 23:40:18 UTC
Permalink
I am trying to imagine why any choice but Oracle (or maybe DB2) would be seriously considered;
Might I suggest that the major thrust of the site is to build up (1) a
customer profile, (2) a record of choices selected and tried, and (3) an
actual choice of a policy from a particular insurance company. This seems
like a natural fit for a document-oriented, non-relational database,
because it's not very far from a customer profile, (persistent) session
information, and a (persistent) shopping cart. Many in the web business
store such things as documents in NOSQL solutions.

As demonstrated, we can argue about the problems of actual structured
document standards, but modeling one user as one document (or a few
documents) does not seem unreasonable. The other primary use cases are
about searching for policies by various criteria. Therefore I'd ask
whether this problem is close enough to the faceted document searching
that many product sites do these days without an RDBMS, using SOLR or
ElasticSearch if there is a need for word search within fields, or
Cassandra or BigTable if you just have lots of extra fields to search
(price, coverage, availability, ...). Does this problem really need more
than structured documents for a USER, a lot of POLICY documents, and maybe
a document representing an INSURANCE COMPANY?

I can't imagine what use case makes you balk. What use case just calls
out to you for a network of more tables with all kinds of handy primary
keys, secondary keys and foreign keys (aka the relations of the RDBMS),
because there are requirements to approach or combine the data in so
many other ways?

Why not NOSQL?

-Paul
Dennis Sosnoski
2013-11-25 00:33:33 UTC
Permalink
Just to add to Paul's points, the federal government uses XML standards
- NIEM in particular (https://www.niem.gov/Pages/default.aspx). A big
part of the ACA implementation involves data exchanges between different
groups, for information such as citizenship, income verification, social
security status, etc., all defined by XML documents. Interactions with
the states involve even more XML documents. I suspect the XML database
was intended to simplify the processing of these document exchanges (and
perhaps to provide audit trails as well). You could certainly slice and
dice the data into relational form instead, but it's a lot of work and
it's not clear to me that there's any benefit in doing so when you're
basically just acting as an intelligent message switch.

- Dennis
Paul Z. Wu
2013-11-25 22:47:28 UTC
Permalink
Well, Oregon's site was contracted to Oracle... which is even more shameful: not a single user has been able to sign up as of today! So the DB may not be one of the main reasons for the problems.

 
Paul Z. Wu
 



Jason Osgood
2013-11-26 01:48:24 UTC
Permalink
Well, Oregon's site was contracted to Oracle.... it is even more shameful.... no single user has been able to sign up up to today! So DB may be not one of the main reasons for the problems.
I’d never work with Oracle professional services.

Our office was brought in as a subcontractor by the prime contractor Sun (which became Oracle). They spec’d *ridiculously* huge servers, with matching software licensing. Their “architects” were utterly opposed to any efficiency improvement which would reduce the hardware or software payload.

I’m sure gold plating is the norm everywhere. It’s still gross when seen up close.


Cheers, Jason

Jason Osgood
2013-11-26 13:46:08 UTC
Permalink
Just to add to Paul's points, the federal government uses XML standards - NIEM in particular… A big part of the ACA implementation involves data exchanges between different groups, for information such as citizenship, income verification, social security status, etc., all defined by XML documents.
Interesting.

I was curious how NIEM related to NHIN (which may have a new name since my time).

https://www.niem.gov/

http://en.wikipedia.org/wiki/Nationwide_Health_Information_Network

Gods, I wish these standards write-ups would start with “Here are some use cases showing why you need to use NIEM.” But no, it’s reams about governance, policies, architecture, blah, blah, blah.

Learning that it builds on the GJXDM work helps.

http://en.wikipedia.org/wiki/GJXDM

And I finally found the goods:

http://release.niem.gov/niem/3.0/schemas.html

Quite an impressive body of work.

Despite my antipathy towards all things XML, this work must still be done. If not XSD, then what? I don’t have an answer for that.


Cheers, Jason

Dennis Sosnoski
2013-11-26 20:44:27 UTC
Permalink
Post by Jason Osgood
...
http://release.niem.gov/niem/3.0/schemas.html
Quite an impressive body of work.
Despite my antipathy towards all things XML, this work must still be done. If not XSD, then what? I don’t have an answer for that.
Yes, unfortunately I don't have an answer for that either. And
impressive as they may be, these schemas are a real pain to work with
using JAXB. Layers and layers of classes to dig through to get to the
actual data. :-(

- Dennis



Konstantin Ignatyev
2013-11-26 21:28:56 UTC
Permalink
Well, I think the answer should be something like Scala classes ;)

Really, because Scala has the Option class to capture the notion of optional
values. Consider this:

case class HCDocument(
  id: String,
  name: String,
  person: Person,
  tags: Option[List[String]] = None
)

What useful thing can be expressed in XSD but cannot be expressed as a
class with optionals?

Note the "useful" adjective: in my book, things like a schema definition
for ISBN are not useful: http://www.xfront.com/isbn.html

Is it just me, or is there a reasonable use for schemas like that one for
ISBN (all 3589 lines of it)?

I think structure definitions and validations should live in separate
definitions, so an implementer can use any validator that makes sense in
the context, when it makes sense.
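
For readers who think in Java rather than Scala, roughly the same shape can be sketched with java.util.Optional. This is only an illustration under assumptions: the post never shows what Person contains, so the single fullName field here is hypothetical, and using Optional as a field type is a style choice some Java teams avoid.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in: the post does not show the contents of Person.
final class Person {
    final String fullName;

    Person(String fullName) {
        this.fullName = fullName;
    }
}

// Java counterpart of the HCDocument case class: structure only, no validation.
final class HCDocument {
    final String id;
    final String name;
    final Person person;
    final Optional<List<String>> tags; // absent by default, like "= None"

    HCDocument(String id, String name, Person person) {
        this(id, name, person, Optional.empty());
    }

    HCDocument(String id, String name, Person person, Optional<List<String>> tags) {
        this.id = id;
        this.name = name;
        this.person = person;
        this.tags = tags;
    }

    // Callers handle the "no tags" case explicitly instead of checking for null.
    int tagCount() {
        return tags.map(List::size).orElse(0);
    }
}
```

The type says which parts may be missing and nothing more; whether a given id or name is acceptable is left to whatever validator the implementer picks.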
--
Konstantin Ignatyev

PS: If this is a typical day on planet Earth, humans will add fifteen
million tons of carbon to the atmosphere, destroy 115 square miles of
tropical rainforest, create seventy-two miles of desert, eliminate between
forty to one hundred species, erode seventy-one million tons of topsoil,
add 2,700 tons of CFCs to the stratosphere, and increase their population
by 263,000

Bowers, C.A. The Culture of Denial: Why the Environmental Movement Needs a
Strategy for Reforming Universities and Public Schools. New York: State
University of New York Press, 1997: (4) (5) (p.206)
Stan Dyck
2013-11-26 23:20:42 UTC
Permalink
I don't think the relative expressibility of XSD vs. X is at issue. It is possible to *express* a typing system in any
language, right? The problem is that a Scala expression is necessarily going to tie you to the JVM which won't be
acceptable everywhere.

If you have decided that your particular business domain needs a *universal* type system (which I think is a dubious
goal (hello HL7!), but we'll leave that for another discussion), I think the means by which you express that type system
should at least have a couple of properties. It should be...

1. universally parsable, including by the human eye, and
2. programmatically producible

Of course, with enough work, all programming languages meet these requirements, but XSD, for whatever limitations it
might have, is built with those properties in mind. (Where XSD falls down, IMHO, is that it isn't really human producible.)

Interestingly (to me anyway), homoiconic languages like Lisp also have these properties. That's one of the reasons they
deserve a closer look.

StanD.
Konstantin Ignatyev
2013-11-27 01:56:12 UTC
Permalink
Apologies for not making myself clearer: I did not mean Scala classes
specifically; I meant using something like them to describe the structure
of documents because of their expressiveness. Other decent IDLs would work
too (https://developers.google.com/protocol-buffers/docs/overview for
example).

And I think that the expressiveness of a language is VERY important; that
is why we have domain-specific languages like SQL and CSS.

My other major point is that XSD fails because it mixes structure
with validation. Validation needs its own definition language, but it
should not be mixed with structure definition.

A typical rookie mistake ;) just as early HTML tried to be both structure
and presentation; we only recently started to recover from all that madness.
--
Konstantin Ignatyev

Eric Jain
2013-11-27 02:09:17 UTC
Permalink
On Tue, Nov 26, 2013 at 5:56 PM, Konstantin Ignatyev
And my other major point is that XSD fails because it mixes structure with validation. Validation needs its own definition language but it should not be mixed with structure definition.
It's not all that clear to me where "structure" should stop and
"validation" start: Most people (at least those working with strongly
typed languages) might agree that basic data types are structure, and
that constraints such as "the value of x must be greater than the
value of y" are validation, but what about constraints such as "x must
be between 0 and 100" or "there must be at least 10 items in the
list"?

Also, how do you describe validation rules?
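
One possible answer, offered only as a sketch: keep the structure as a plain class and describe each rule as a named predicate in a separate list. The Measurement type, the Rule class, and the field names below are all made up for illustration; the three rules are the constraints mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Structure only: field names and basic data types (hypothetical example type).
final class Measurement {
    final int x;
    final int y;
    final List<String> items;

    Measurement(int x, int y, List<String> items) {
        this.x = x;
        this.y = y;
        this.items = items;
    }
}

// A validation rule is a named predicate kept apart from the structure.
final class Rule<T> {
    final String description;
    final Predicate<T> check;

    Rule(String description, Predicate<T> check) {
        this.description = description;
        this.check = check;
    }
}

final class MeasurementRules {
    // The three constraints written as data rather than as part of the type.
    static final List<Rule<Measurement>> ALL = Arrays.asList(
        new Rule<Measurement>("x must be between 0 and 100", m -> m.x >= 0 && m.x <= 100),
        new Rule<Measurement>("there must be at least 10 items in the list", m -> m.items.size() >= 10),
        new Rule<Measurement>("the value of x must be greater than the value of y", m -> m.x > m.y));

    // Returns the description of every violated rule (an empty list means valid).
    static List<String> violations(Measurement m) {
        List<String> failed = new ArrayList<>();
        for (Rule<Measurement> rule : ALL) {
            if (!rule.check.test(m)) {
                failed.add(rule.description);
            }
        }
        return failed;
    }
}
```

Written this way, the range and list-size checks sit in the same place as the cross-field check, so the question of which side of the line they belong on stops mattering to the structure definition.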
--
Eric Jain
zenobase.com -- What do you want to track today?


Konstantin Ignatyev
2013-11-27 02:25:39 UTC
Permalink
How to express validation rules efficiently is a darn good question, and
there are many attempts to do that
(http://camel.apache.org/validate.html). I do not know the best one, but my
point is that good validation definitions are defined separately from the
structure definition, pretty much like in statically typed languages ;)

And many validations cannot be expressed in any sort of schema: for
example, "SSN should be valid". The validity of an SSN is not about length
and symbols; it is a matter of having been issued, and of belonging to a
still-living person with the same name as the submitter of the document.
And how many elements should be in a list often depends on other things:
for example, the number of exemptions claimed on a tax return determines
whether any dependents are expected in the list.
--
Konstantin Ignatyev

Jason Osgood
2013-11-27 04:16:41 UTC
Permalink
Hi Eric Jain.
Post by Eric Jain
how do you describe validation rules?
Excellent question. I currently believe that validation should be done as close to the input stream as possible.

0 - InputStream (encoding)
1 - Lexer
2 - Parser
3 - Compiler / Interpreter + Type system
4 - Program Logic

This is the simplest approach, the easiest to express, and the least likely to produce conflicting validation rules.


Hi Dennis Sosnoski.
Post by Dennis Sosnoski
Schema is a horrible mess, but does at least allow you to specify both [structure and data types] of the document (though poorly in some cases, such as with the date/time types).
Date/time is a great example.

ARON supports dates natively, using a hybrid of syntax and helper classes (#s 1 and 3 above).

ARON's grammar tokenizes the input stream. Then date values are verified using SimpleDateFormat. Instead of me coding an uber date parser, I premade various SDF instances using common date formats, hoping one will succeed.
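
A minimal sketch of that try-each-format idea, not ARON's actual code: the pattern list is an assumption, and a fresh SimpleDateFormat is built per attempt (rather than premade instances) because SimpleDateFormat is not thread-safe.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Optional;
import java.util.TimeZone;

final class MultiFormatDateParser {
    // Assumed set of "common" formats; the list ARON really uses is not shown in the thread.
    private static final String[] PATTERNS = {
        "yyyy-MM-dd'T'HH:mm:ss",
        "yyyy-MM-dd",
        "MM/dd/yyyy"
    };

    // Tries each pattern in order and returns the first that consumes the whole input.
    static Optional<Date> parse(String text) {
        for (String pattern : PATTERNS) {
            SimpleDateFormat format = new SimpleDateFormat(pattern);
            format.setLenient(false); // reject impossible dates such as February 30
            format.setTimeZone(TimeZone.getTimeZone("UTC"));
            ParsePosition position = new ParsePosition(0);
            Date parsed = format.parse(text, position);
            // parse() returns null on failure; also reject trailing unparsed text.
            if (parsed != null && position.getIndex() == text.length()) {
                return Optional.of(parsed);
            }
        }
        return Optional.empty();
    }
}
```

Checking that the whole input was consumed keeps a short pattern from silently accepting the prefix of a longer value, so the order of the patterns matters less.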


Where the wheels fall off, in my opinion, is when validation is done in multiple places. At the day gig, we have services calling services calling Hibernate calling SQL. A constraint violation can happen anywhere, which makes it hard to report / log. Rules are repeated across layers, making maintenance a nightmare.

I’m not a big fan of validation being done by the database, other than foreign keys and uniqueness, because I don’t know how to treat those rules as source code, and I don’t know how to step through them with a debugger.

I would love to have user-defined types, like David Bacon’s Kava language, so the type system can do bounds checking, such as requiring day of week to be 1 through 7.

For stuff like SSN and part numbers, after the lexing, that’s all best done in program logic.


Cheers, Jason

PS- Apologies, ARON’s docs are out of date. ARON data files can now include other files, and there’s a rudimentary override implementation; both are not reflected in the wiki. https://code.google.com/p/aron/




Dennis Sosnoski
2013-11-27 06:25:43 UTC
Permalink
Hi Jason,
...Hi Dennis Sosnoski.
Post by Dennis Sosnoski
Schema is a horrible mess, but does at least allow you to specify both [structure and data types] of the document (though poorly in some cases, such as with the date/time types).
Date/time is a great example.
ARON supports dates natively, using a hybrid of syntax and helper classes (#s 1 and 3 above).
ARON's grammar tokenizes the input stream. Then date values are verified using SimpleDateFormat. Instead of me coding an uber date parser, I premade various SDF instances using common date formats, hoping one will succeed.
The big problem I have with schema date/time data types is the
*optional* time zone. This probably makes sense for the sloppy
hand-generated markup documents that the POP crowd work with, but for
data exchange between programs it makes no sense at all. And all the
date/time types explicitly make the zone information optional, so
there's no way to say in schema that you want a date or time value and
it always needs to be in UTC (the only choice that really makes sense) -
instead, you say you want a date or time and it's up to whoever
constructs the document whether they want to tell you what that value
actually means. Idiots.
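
Since the schema cannot require the zone, the check has to live on the receiving side. A rough sketch of such a check, assuming a Java 8 runtime and the java.time API; the class and method names are invented for illustration:

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

final class StrictTimestamps {
    // Accepts only ISO-8601 date-times that carry an explicit offset ("Z" or "+hh:mm")
    // and normalizes them to a UTC instant. Values without an offset are rejected.
    static Instant parseRequiringOffset(String text) {
        try {
            return OffsetDateTime.parse(text, DateTimeFormatter.ISO_OFFSET_DATE_TIME).toInstant();
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException(
                "not an ISO-8601 date-time with an explicit offset: " + text, e);
        }
    }
}
```

With this, "2013-11-27T06:25:43Z" and "2013-11-26T22:25:43-08:00" both come out as the same instant, while "2013-11-27T06:25:43" is refused instead of being guessed at.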

ARON looks roughly equivalent to JSON. Do you have a document
description format for it?
Where the wheels fall off, in my opinion, is when validation is done in multiple places. At the day gig, we have services calling services calling Hibernate calling SQL. A constraint violation can happen any where, which is hard to report / log. Rules are repeated across layers, making maintenance a nightmare.
Yes, that is annoying. That's one of the reasons I'd prefer to limit the
document interchange specification to just saying the structure of the
data and the data type for each value, and to have that enforced by the
parser. After that it's really up to the application to do what it wants
in terms of validation (such as by annotations, by some sort of rules
engine, or by hard-coded checks).

- Dennis



Eric Jain
2013-11-27 07:03:57 UTC
Permalink
[...] And all the
date/time types explicitly make the zone information optional, so
there's no way to say in schema that you want a date or time value and
it always needs to be in UTC (the only choice that really makes sense) -
instead, you say you want a date or time and it's up to whoever
constructs the document whether they want to tell you what that value
actually means. Idiots.
That does seem like a major shortcoming...

I think there are valid use cases both for time with offset and for
local time (the latter can make sense, e.g., if a time zone ID is given
elsewhere or the value is assumed to be UTC, though mapping tools
probably can't handle that).

Java 8 got it all figured out with LocalDateTime, OffsetDateTime and
ZonedDateTime :-)
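For anyone who has not looked at the Java 8 API yet, a small sketch of how the three types differ. The class and method names (`TimeKinds` and friends) are made up for illustration; the `java.time` calls are the standard ones.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

final class TimeKinds {
    // Wall-clock reading with no zone or offset: cannot be placed on the timeline by itself.
    static LocalDateTime local() {
        return LocalDateTime.of(2013, 11, 27, 7, 3, 57);
    }

    // Same reading pinned to a fixed offset from UTC: an unambiguous instant.
    static OffsetDateTime withOffset() {
        return local().atOffset(ZoneOffset.ofHours(-8));
    }

    // Same reading in a named zone: the offset is derived from the zone's rules, DST included.
    static ZonedDateTime withZone() {
        return local().atZone(ZoneId.of("America/Los_Angeles"));
    }
}
```

In November both `withOffset()` and `withZone()` denote the same instant. Add six months to each and they diverge: the `OffsetDateTime` keeps -08:00, while the `ZonedDateTime` picks up -07:00 because the zone is on daylight time by then.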
--
Eric Jain
zenobase.com -- What do you want to track today?


Konstantin Ignatyev
2013-11-27 08:28:39 UTC
Permalink
Don't you think, Eric, that Java is a bit slow? JodaTime figured it all
out circa 2003.....
Post by Eric Jain
Java 8 got it all figured out with LocalDateTime, OffsetDateTime and
ZonedDateTime :-)
--
Konstantin Ignatyev

PS: If this is a typical day on planet Earth, humans will add fifteen
million tons of carbon to the atmosphere, destroy 115 square miles of
tropical rainforest, create seventy-two miles of desert, eliminate between
forty to one hundred species, erode seventy-one million tons of topsoil,
add 2,700 tons of CFCs to the stratosphere, and increase their population
by 263,000

Bowers, C.A. The Culture of Denial: Why the Environmental Movement Needs a
Strategy for Reforming Universities and Public Schools. New York: State
University of New York Press, 1997: (4) (5) (p.206)
Eric Jain
2013-11-27 08:46:57 UTC
Permalink
On Wed, Nov 27, 2013 at 12:28 AM, Konstantin Ignatyev
Don't you think, Eric, that Java is a bit slow? JodaTime figured it all out circa 2003.....
Almost: Joda doesn't distinguish explicitly between OffsetDateTime and
ZonedDateTime :-)
--
Eric Jain
zenobase.com -- What do you want to track today?


P.Hill
2013-12-03 04:59:16 UTC
Permalink
Post by Konstantin Ignatyev
Don't you think, Eric, that Java is a bit slow? JodaTime figured it all
out circa 2003.....
JodaTime was the direct progenitor of the current stuff, and several of
the JodaTime folks worked to get it into the standard.
There is not really an us vs. them in that case, just a very long
committee release based directly on JodaTime, a flaw that needed fixing
in JodaTime, and then Sun sold to Oracle, long release cycles .... zzzzzzzzz

Yes, there are use cases for no-TZ (floating) datetimes, lots of uses
for Z (UTC/GMT) datetimes, and plenty for local datetimes. I think they
can all be used in the iCal standards. In fact, I was on that mailing list
back in the day. I certainly think that Joda's, and now Java's, separate but
related classes for each kind of datetime are the right approach, NOT the
.NET approach, where there is one class with an optional TZ (meaning
unspecified floating, not default to UTC).

But I digress.

-Paul

Dennis Sosnoski
2013-11-27 11:34:34 UTC
Permalink
Post by Eric Jain
[...] And all the
date/time types explicitly make the zone information optional, so
there's no way to say in schema that you want a date or time value and
it always needs to be in UTC (the only choice that really makes sense) -
instead, you say you want a date or time and it's up to whoever
constructs the document whether they want to tell you what that value
actually means. Idiots.
That does seem like a major shortcoming...
I think there are valid use cases both for time with offset and for
local time (latter can make sense e.g. if a time zone ID is given
elsewhere or assumed to be UTC, though mapping tools probably can't
handle that)?
Yes, I definitely agree. If you're giving a birthdate, for instance, it
doesn't usually make sense to have a time zone. If you're giving the
effective date of an insurance policy, it generally does. The problem
with schema is that it doesn't allow you to say which one you expect,
and explicitly requires you to treat all date/time values which do not
have a time zone as though they could be in any time zone (so +/- about
13 hours).
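Since the schema cannot express the expectation, the check ends up in application code. A hedged sketch using `java.time` (Java 8): `parseBest` reports whether the text carried an offset, and the `requireOffset` policy is only an assumption about what an application might want for something like an effective date. The class name `SchemaDateTimes` is made up.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;

final class SchemaDateTimes {
    // Returns an OffsetDateTime when the text carries an offset, otherwise a LocalDateTime.
    static TemporalAccessor parse(String text) {
        return DateTimeFormatter.ISO_DATE_TIME.parseBest(
                text, OffsetDateTime::from, LocalDateTime::from);
    }

    // Example policy (an assumption): reject values that do not pin down an instant.
    static OffsetDateTime requireOffset(String text) {
        TemporalAccessor parsed = parse(text);
        if (parsed instanceof OffsetDateTime) {
            return (OffsetDateTime) parsed;
        }
        throw new IllegalArgumentException("No zone offset in: " + text);
    }
}
```

A birthdate-style field would go the other way and accept only the zone-less form; either way the rule lives outside the schema.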

- Dennis



George Smith
2013-11-27 16:18:14 UTC
Permalink
Not to totally derail this conversation into the Timestamp space, but I was
talking with one of the GitHub developers/presenters about how they handle
the transport/presentation of timestamps, and he said they used something
similar to:

2014/01/16T08:15:23.123Z-09:30

An ISO 8601 UTC form WITH an optional local offset (if not there then the
local time WAS UTC). The power of this form is that it is sortable (zulu
time), and can reliably (without resorting to some stupid massive
historical table of timezones and offsets w/ date-time ranges for Daylight
savings time BS) show local time as entered and zulu time. It can also,
with resorting to a stupid (slightly smaller) historical table of
(only your local timezone) offsets w/ date-time ranges for Daylight savings
time BS, show the time in the current timezone.
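A rough sketch of that idea with `java.time`. The exact wire format here is a guess at "something similar to" the above (written with dashes), and the class `UtcWithLocalOffset` is hypothetical rather than anything GitHub is known to use.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

final class UtcWithLocalOffset {
    final Instant utc;            // the instant on the UTC timeline
    final ZoneOffset localOffset; // offset in effect where the value was entered

    UtcWithLocalOffset(Instant utc, ZoneOffset localOffset) {
        this.utc = utc;
        this.localOffset = localOffset;
    }

    // Parses e.g. "2014-01-16T08:15:23.123Z-09:30"; a missing suffix means local time was UTC.
    static UtcWithLocalOffset parse(String text) {
        int z = text.indexOf('Z');
        if (z < 0) {
            throw new IllegalArgumentException("Expected a UTC instant ending in 'Z': " + text);
        }
        Instant utc = Instant.parse(text.substring(0, z + 1));
        String suffix = text.substring(z + 1);
        ZoneOffset offset = suffix.isEmpty() ? ZoneOffset.UTC : ZoneOffset.of(suffix);
        return new UtcWithLocalOffset(utc, offset);
    }

    // Local wall-clock time as entered, recovered without any time zone database lookup.
    OffsetDateTime asEntered() {
        return utc.atOffset(localOffset);
    }

    // Writes the UTC instant first, then the offset suffix unless the offset is UTC.
    String format() {
        return utc + (localOffset.equals(ZoneOffset.UTC) ? "" : localOffset.getId());
    }
}
```

Sorting on the `utc` field orders values chronologically regardless of where they were entered, and `asEntered()` needs only the stored offset, not a time zone table.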

George
--
"And the users exclaimed with a laugh and a taunt: It's just what we
asked for but not what we want." -- Unknown
Jason Osgood
2013-11-27 16:54:12 UTC
Permalink
Hi George.
Post by George Smith
2014/01/16T08:15:23.123Z-09:30
An ISO 8601 UTC form WITH an optional local offset (if not there then the local time WAS UTC).
Big fan of Zulu time. For ARON I decided to use hyphens. Because sometimes dates are embedded in URLs.

Trying to understand what you, Eric, Bob, Dennis, others have been trying to edumacate me on date, time, timezone, AND allowing for conciseness when time doesn’t matter, ARON has a fall thru, where it tries to parse the most specific format first.


"yyyy-MM-dd'T'HH:mm:ssZ"
"yyyy-MM-dd'T'HH:mm:ssz"
"yyyy-MM-dd'T'HH:mm:ss"
"yyyy-MM-dd'T'HH:mm"
"yyyy-MM-dd"

What I call “optimistic parsing”. Harhar.
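For illustration only, a sketch of what such a fall-through could look like with `SimpleDateFormat`. This is not ARON's actual code: the class name is made up, UTC as the default zone for zone-less input is an assumption, and the patterns are the five listed above.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class OptimisticDateParser {
    // Patterns as listed in the post, most specific first.
    static final String[] PATTERNS = {
        "yyyy-MM-dd'T'HH:mm:ssZ",
        "yyyy-MM-dd'T'HH:mm:ssz",
        "yyyy-MM-dd'T'HH:mm:ss",
        "yyyy-MM-dd'T'HH:mm",
        "yyyy-MM-dd"
    };

    // Returns the first pattern that consumes the whole input, or null if none does.
    static String matchingPattern(String text) {
        for (String pattern : PATTERNS) {
            if (tryParse(pattern, text) != null) {
                return pattern;
            }
        }
        return null;
    }

    // Returns the parsed date from the first matching pattern, or null if none matches.
    static Date parse(String text) {
        for (String pattern : PATTERNS) {
            Date date = tryParse(pattern, text);
            if (date != null) {
                return date;
            }
        }
        return null;
    }

    private static Date tryParse(String pattern, String text) {
        // SimpleDateFormat is not thread-safe, so a fresh instance is created per attempt.
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        format.setLenient(false);
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: zone-less input is UTC
        ParsePosition position = new ParsePosition(0);
        Date date = format.parse(text, position);
        // Require full consumption so "yyyy-MM-dd" cannot match a longer timestamp's prefix.
        return (date != null && position.getIndex() == text.length()) ? date : null;
    }
}
```

The full-consumption check matters because `SimpleDateFormat` otherwise accepts a matching prefix and silently ignores the rest of the input.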








George Smith
2013-11-27 17:01:58 UTC
Permalink
Jason,

Right, I "should" have used dashes as that is the ISO 8601 format. It
looks like you forgot to start with the version that includes fractional
seconds.

George
--
"And the users exclaimed with a laugh and a taunt: It's just what we
asked for but not what we want." -- Unknown
Stan Dyck
2013-11-27 03:06:46 UTC
Permalink
Excellent observation! If you agree and want to eschew XSD I suggest you
use RELAX NG for describing document structure and Schematron for
specifying validation rules. Both have the advantage of being available now.

StanD.
Post by Konstantin Ignatyev
Apologies for not making myself clearer: I did not mean that Scala
classes, I meant using them to describe structure of documents because
of expressiveness. Other decent IDLs would work too
(https://developers.google.com/protocol-buffers/docs/overview for
example).
And I think that expressiveness of a language is VERY important, that
is why we have domain specific languages like SQL and CSS.
And my other major point is that XSD fails because it mixes
structure with validation. Validation needs its own definition
language but it should not be mixed with structure definition.
Typical rookie mistake ;) like early HTML tried to be structure and
presentation - we just recently started to recover from all that madness.
I don't think the relative expressibility of XSD vs. X is at
issue. It is possible to *express* a typing system in any
language, right? The problem is that a Scala expression is
necessarily going to tie you to the JVM which won't be
acceptable everywhere.
If you have decided that your particular business domain needs a
*universal* type system (which I think is a dubious
goal (hello HL7!), but we'll leave that for another discussion), I
think the means by which you express that type system
should at least have a couple of properties. It should be...
1. universally parseable, including human eye parsable, and
2. programmatically producible
Of course, with enough work, all programming languages meet these
requirements, but XSD, for whatever limitations it
might have, is built with those properties in mind. (Where XSD
falls down, IMHO is that it isn't really human producible.)
Interestingly, (to me anyway) homoiconic languages like lisp also
have these properties. That's one of the reasons they
deserve a closer look.
StanD.
Well, I think the answer should be something like Scala classes ;)
Really, because Scala has class Option to capture notion of
case class HCDocument (
  id: String,
  name: String,
  person: Person,
  tags: Option[List[String]] = None
)
What useful thing can be expressed in XSD but cannot be expressed as a
class with optionals?
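For readers on plain Java, roughly the same shape can be sketched with `java.util.Optional` (Java 8). The `Person` class below is a hypothetical stand-in, since its fields are not given in the post.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for the Person type referenced in the post.
final class Person {
    final String name;

    Person(String name) {
        this.name = name;
    }
}

final class HCDocument {
    final String id;
    final String name;
    final Person person;
    final Optional<List<String>> tags; // absent by default, like Option[...] = None

    // Convenience constructor mirroring the Scala default argument.
    HCDocument(String id, String name, Person person) {
        this(id, name, person, Optional.empty());
    }

    HCDocument(String id, String name, Person person, Optional<List<String>> tags) {
        this.id = id;
        this.name = name;
        this.person = person;
        this.tags = tags;
    }
}
```

It is noticeably wordier than the case class, which is arguably the expressiveness point being made.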
Note the "useful" adjective, in my books things like schema
definition for ISBN are not useful
http://www.xfront.com/isbn.html
Is it just me or there is a reasonable use for schemas like that
one for ISBN ( all 3589 lines of it)?
I think structure definitions and validations should live in
separate definitions
so implementer can use any validator that makes sense in the
context when it makes sense
On Tue, Nov 26, 2013 at 12:44 PM, Dennis Sosnoski
Post by Jason Osgood
...
http://release.niem.gov/niem/3.0/schemas.html
Quite an impressive body of work.
Despite my antipathy towards all things XML, this work must
still be done. If not XSD, then what? I don’t have an
answer for that.
Yes, unfortunately I don't have an answer for that either. And
impressive they may be, but these schemas are a real pain to
work with
using JAXB. Layers and layers of classes to dig through to get
to the
actual data. :-(
- Dennis
Dennis Sosnoski
2013-11-27 04:01:24 UTC
Permalink
I've liked Relax NG ever since I first saw it. It's cleaner and much
easier to understand than schema - but there's no sign that it's
becoming an accepted alternative for standards, so I'm afraid it's going
to stay a niche technology.

- Dennis
Post by Stan Dyck
Excellent observation! If you agree and want to eschew XSD I suggest
you use RELAX NG for describing document structure and Schematron for
specifying validation rules. Both have the advantage of being
available now.
StanD.
Stan Dyck
2013-11-27 04:18:18 UTC
Permalink
No sign? That's what all the Windows people say about their OS. I don't
listen to them either ;)

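The javax.xml.validation API has a hook for RELAX NG (XMLConstants.RELAXNG_NS_URI), though as far as I know the JDK only guarantees W3C XML Schema out of the box, so a RELAX NG implementation has to be on the classpath.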

http://docs.oracle.com/javase/7/docs/api/javax/xml/validation/SchemaFactory.html

If it's better, you should use it.
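A quick way to check what a given runtime actually supports, as a sketch (the class name `SchemaLanguageCheck` is made up; `SchemaFactory.newInstance` is documented to throw `IllegalArgumentException` when no implementation of the requested schema language is available):

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

final class SchemaLanguageCheck {
    // True if some SchemaFactory provider visible to this JVM supports the given schema language.
    static boolean isSupported(String schemaLanguage) {
        try {
            SchemaFactory.newInstance(schemaLanguage);
            return true;
        } catch (IllegalArgumentException e) {
            // Thrown when no provider for this schema language can be found.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("XSD:      " + isSupported(XMLConstants.W3C_XML_SCHEMA_NS_URI));
        System.out.println("RELAX NG: " + isSupported(XMLConstants.RELAXNG_NS_URI));
    }
}
```

W3C XML Schema should always report true. Whether RELAX NG does depends on a third-party implementation (Jing is one commonly cited option) being installed, so treat the second result as something to verify on your own classpath.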

StanD.
Post by Dennis Sosnoski
I've liked Relax NG ever since I first saw it. It's cleaner and much
easier to understand than schema - but there's no sign that it's
becoming an accepted alternative for standards, so I'm afraid it's
going to stay a niche technology.
- Dennis
Dennis Sosnoski
2013-11-27 06:13:26 UTC
Permalink
The problem is that a close-approximation-to-nobody is using Relax NG to
define their data structures. It's the dark side of the network effect -
if everybody in a dispersed market is using technology A, competing
technology B is not going to have a chance even if it's really a better
choice (same reason Windows is still around, IMHO). The only exception
is if B gets a backer with sufficient strength to swing the market into
line, or if A becomes such a big pain that everyone agrees to change
(and in the latter case, it's more likely to be a change to a "new,
improved" version of A that happens to incorporate some features of B,
rather than to B). In this case, if you can get Microsoft to back Relax
NG it'll take over... but Microsoft was one of the big backers of the
horrible XML Schema mess, so good luck with that!

As it is, almost all the XML I'm doing is for web services, and Relax NG
is a non-starter there because the web services stacks just don't have
support (at least not all of them - and for web services you really want
to stay with what has support across all stacks, since the whole point
is interoperability).

- Dennis
Post by Stan Dyck
No sign? That's what all the Windows people say about their OS. I
don't listen to them either ;)
Support for RELAX NG is built into javax.xml.validation.
http://docs.oracle.com/javase/7/docs/api/javax/xml/validation/SchemaFactory.html
If it's better, you should use it.
StanD.
Dennis Sosnoski
2013-11-27 03:38:55 UTC
Permalink
Post by Konstantin Ignatyev
Apologies for not making myself clearer: I did not mean that Scala
classes, I meant using them to describe structure of documents because
of expressiveness. Other decent IDLs would work too
(https://developers.google.com/protocol-buffers/docs/overview for
example).
And I think that expressiveness of a language is VERY important, that
is why we have domain specific languages like SQL and CSS.
And my other major point is that XSD fails because it mixes
structure with validation. Validation needs its own definition
language but it should not be mixed with structure definition.
I think that what's needed for business document exchange is a document
definition format that specifies both structure and data types. Schema
is a horrible mess, but does at least allow you to specify both these
aspects of the document (though poorly in some cases, such as with the
date/time types). Of course, .NET decided to discard much of the
structure in data binding (processing elements as a stew where order and
optionality are ignored), and Sun followed suit (because they wanted to
be compatible with .NET), so a lot of the benefits of schema are lost
unless you turn on validation (which very few people do for business
documents).
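To make "turn on validation" concrete, a sketch using the JDK's `javax.xml.validation` API with a tiny made-up schema (the `policy` element and the `OrderCheck` class are hypothetical). With validation on, element order is enforced; without it, a binding layer may accept either order.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class OrderCheck {
    // Hypothetical schema: <policy> must contain <id> then <effectiveDate>, in that order.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='policy'><xs:complexType><xs:sequence>"
      + "<xs:element name='id' type='xs:string'/>"
      + "<xs:element name='effectiveDate' type='xs:date'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns true when the document satisfies the schema, false on any violation.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            // With no ErrorHandler set, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // schema violation or malformed XML
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With JAXB, the same `Schema` object can be handed to `Unmarshaller.setSchema(...)` so that unmarshalling validates as it goes; that part is left out here to keep the sketch JDK-only.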

There have been proposals to provide similar document definitions for
other formats, such as JSON. I'd personally love to see some alternative
become widely accepted, but don't see any chance of something replacing
XML and Schema. Despite all the warts, these became the accepted
standard for business document exchange and will probably be with us for
a very, very long time to come. :-(

- Dennis
Jason Osgood
2013-11-27 01:56:05 UTC
Permalink
Hi Dennis Sosnoski.
...these schemas are a real pain to work with using JAXB. Layers and layers of classes to dig through to get to the actual data. :-(
Seems like the “favor composition over inheritance” heuristic is ignored by many of the (XML) standards.

Hmmm.


Cheers, Jason
