Sep 8


DocOrigin performance is ... GREAT!

Ask your SE to provide you with the DocOrigin Performance test collateral so you can run that test on your own machine. The collateral includes our standard Sample_Invoice test form and a script to create as much data for it as you like. The test form is not just a couple of objects on a page; it has more than a full page of detail lines, with totals, a logo, etc. It should be fairly representative of the real world. Of course, if you wish to substitute your own form and data to get results closer to home, please do so.

The trouble with performance is... it varies

Plain old multi-processing means results will vary a little from run to run. How powerful your machine is, how much memory it has, and how fast its disk is will of course matter. What else you have running at the same time clearly matters as well. It's pretty much impossible to be definitive. But, I repeat...

DocOrigin performance is ... REALLY GREAT!

On Windows, the performance script looks at your systeminfo results and reports a bit of context info so as to edge a bit closer to definitiveness. E.g., on my machine it says:

DocOrigin Performance Statistics
Based on form Sample_Invoice.xatw

This is system 'CYLENT', with 5996K memory, with speed rating '1392'
Using DocOrigin Merge version ''

Ok, it's a mere $500 laptop, Intel i5, 1 processor, dual core, 6GB, 5400 rpm drive, running Windows 7. Standard, basic machine.

After that it runs through various combinations of number of documents and pages per document. That matters a lot. Be wary of statistics that don't report such metrics. The (ok, 'A', it varies...) result is...

 #Docs  Pgs/Doc   ms/Page    ppm
    1 x     1      118ms     510 ppm
    1 x     2       62ms     974 ppm
    1 x     5       29ms   2,091 ppm
    1 x    50       10ms   6,216 ppm
    1 x   100       10ms   5,783 ppm
    1 x   200       10ms   6,093 ppm
    1 x   300       11ms   5,340 ppm
    1 x   400       12ms   4,947 ppm
    1 x   500       13ms   4,583 ppm
    2 x     1       63ms     952 ppm
    5 x     1       28ms   2,151 ppm
   50 x     1        7ms   8,192 ppm
   50 x     2        7ms   8,477 ppm
  100 x     1        7ms   9,119 ppm
  500 x     1        5ms  10,983 ppm
 1000 x     1        5ms  11,765 ppm
 2000 x     1        5ms  11,926 ppm
 1000 x     2        5ms  11,090 ppm
 2000 x     2        5ms  11,170 ppm
10000 x     2        5ms  10,940 ppm
20000 x     2        5ms  11,021 ppm
    1 x  1000       18ms   3,271 ppm
50000 x     2        6ms  10,461 ppm
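
For anyone who wants to check the columns, here is a minimal Python sketch of how ms/Page and ppm relate to a timed run. It is not the performance script itself, and the function names and the 700 ms timing are made up for illustration. Note that the ms/Page column above is rounded, so recomputing ppm from it will not land exactly on the printed ppm (118ms works out to about 508 ppm, not 510).

```python
def ms_per_page(elapsed_ms: float, docs: int, pages_per_doc: int) -> float:
    """Average milliseconds spent per page in a run of docs x pages_per_doc pages."""
    return elapsed_ms / (docs * pages_per_doc)


def pages_per_minute(elapsed_ms: float, docs: int, pages_per_doc: int) -> float:
    """Throughput in pages per minute (ppm) for the same run."""
    total_pages = docs * pages_per_doc
    return total_pages * 60_000 / elapsed_ms  # 60,000 ms in a minute


# Hypothetical timing: 50 documents x 2 pages finished in 700 ms.
print(ms_per_page(700, 50, 2))              # 7.0
print(round(pages_per_minute(700, 50, 2)))  # 8571
```

The fixed start-up cost of a run is spread over every page, which is why the one-page runs look so much slower per page than the big batches.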

Biased as I am, I think those numbers are not too shabby. Unsurprisingly, I have Visual Studio, Gmail, virus checking, and whatever other flotsam Windows likes to run, all going on in the background. Perhaps your production server would not have that! Where's the 15,000 ppm metric? It varies, maybe I should stop some processes. Our test suite runs this every night and complains bitterly if the speed is down by more than a few percent. BTW, it runs on an older machine, 4 GB, that is supporting an online webserver too.

Oh, important. These runs are each producing, as it happens, one "combined" PDF output file. That's pretty normal I think, when comparing apples to apples.

BTW, DocOrigin is doing "page n of m" processing with no second pass of the data required. It's all done, and all done at once.

Finally, that is just one instance of DocOrigin Merge running on my little laptop. There is nothing stopping you from running multiple instances.

Comparing with Adobe LiveCycle Output

I ran across a March 2013 community blog posting for Adobe LiveCycle Output. It was interesting. It expounded upon how, with some configuration changes, they were able to get a 300% improvement, from an out-of-the-box throughput of 980 surfaces/minute to 2750 surfaces/minute (by those figures, about 2.8 times the starting rate).

Performance Tuning Tips for Faster LiveCycle Output

Well, that's a pretty good improvement. But let's compare. Their blog says

  • OS: Win 2008 R2
  • Arch : 64 bit
  • CPU Cores: 8
  • RAM : 32 GB
** Results may vary depending upon the hardware spec of the machine, on which tests are performed.

Yes, I like that footnote. Couldn't agree more. Hmmm. 32GB, 8 cores, compared to 6GB, 2 cores. Wow, 8 cores, what does that do to the license price?

I don't know if Win 2008 R2 is seriously different, performance-wise, from Win 7.

BTW, that "Arch: 64 bit" - yes, of course, in 2013(!) but as I understand it, XMLForm.exe is still just a 32-bit app, with reachable limits.

The blog also says:

For running the tests, configurations were as follows:
  • BMC pool size = 10
  • Watch Folder Batch size = 13
  • Max inline size = 512 KB
  • Input file size(XML) = 490 KB
  • JVM Args: -Xms2048m -Xmx4096m -XX:PermSize=512m -XX:MaxPermSize=1024m

From a DocOrigin perspective, the initial reaction to that is: "Yikes! What is all that? Sounds complex". With DocOrigin you just run Merge, and it does it. The whole blog posting seemed like rocket science.

The big thing that's missing is the number of documents and the pages per document. I have a hunch that what is being used for this test is the PurchaseOrder form from the standard Installation Verification Sample. Also, the 490 KB XML file size correlates very nicely with the sample of 50 documents of 2 pages each.

So let's look at that 50x2 scenario:

              DocOrigin    Adobe LiveCycle Output
Throughput    8,475 ppm    2,750 ppm
CPU cores     2            8

Is this really apples-to-apples? Well, it's easy to open the PurchaseOrder.xdp in DocOrigin and save it as a DocOrigin .xatw file. So let's test the performance using what we think is the same form.

 #Docs  Pgs/Doc   ms/Page    ppm
   50 x     2       11ms   5,455 ppm

Oops, that was with producing individual PDFs, one per document. Let's try the normal, combined PDF result way. I believe that LiveCycle Output has no choice but to do it the combined output way.

 #Docs  Pgs/Doc   ms/Page    ppm
   50 x     2        7ms   8,982 ppm

Ok, that's about right: 9,000 ppm versus 2,750 ppm, on vastly different machines.
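
To put rough numbers on that, here is a small Python sketch using only the figures quoted above. The per-core figure is a crude normalization of my own: it assumes throughput scales with core count, which nothing here proves, so treat it as a talking point and not a measurement.

```python
def speedup(ppm_a: float, ppm_b: float) -> float:
    """How many times faster a is than b, by raw pages per minute."""
    return ppm_a / ppm_b


def ppm_per_core(ppm: float, cores: int) -> float:
    """Crude per-core throughput; assumes (unverified) linear scaling with cores."""
    return ppm / cores


docorigin = {"ppm": 8982, "cores": 2}   # combined-PDF PurchaseOrder run above
livecycle = {"ppm": 2750, "cores": 8}   # tuned figure from the Adobe blog post

print(round(speedup(docorigin["ppm"], livecycle["ppm"]), 2))  # 3.27
print(ppm_per_core(**docorigin))                              # 4491.0
print(ppm_per_core(**livecycle))                              # 343.75
```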

Multiple Instances

But what about multiple instances of Merge? Well, I tried that on my little 2-core box. With 3 instances I was up to about 18K+ ppm. Adding many more instances didn't help much (as expected on a 2-core box). I once did see it hit just over 20K ppm, on a 1-CPU box. I do wonder what it would do on an 8-core, 32 GB machine.
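
If you want to try the multi-instance experiment yourself, here is a minimal Python sketch of the idea: start several processes at once, wait for all of them, and divide total pages by wall-clock time. It is not part of the DocOrigin test collateral. The `run_instances` helper is my own invention, and the command shown is a do-nothing placeholder; substitute whatever Merge command line you actually use, and the real page count of each run.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_instances(commands, pages_per_instance):
    """Run every command as its own process, all at once; return aggregate ppm."""
    start = time.perf_counter()
    # One thread per command; each thread just waits on its child process.
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        return_codes = list(
            pool.map(lambda cmd: subprocess.run(cmd).returncode, commands)
        )
    elapsed_minutes = (time.perf_counter() - start) / 60
    if any(return_codes):
        raise RuntimeError(f"an instance failed: {return_codes}")
    return len(commands) * pages_per_instance / elapsed_minutes


if __name__ == "__main__":
    # Placeholder that does nothing; replace with your real Merge command line.
    placeholder = [sys.executable, "-c", "pass"]
    ppm = run_instances([placeholder] * 3, pages_per_instance=100)
    print(f"{ppm:,.0f} ppm across 3 instances")
```

Expect the gain to flatten out once you have more instances than cores, as it did on my laptop.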

Ok, lots of numbers, but I think two impressions should be clear:

  1. DocOrigin is a lot faster
  2. DocOrigin is a lot easier to run

BUT... performance varies. Try it for yourself on your machine. Invest half an hour. Contact a distributor to get a download link. Download it, install it (under 2 minutes). Tell the distributor that you want the performance test tool too. Download it, unzip anywhere, and run it.

Footnotes (not caveats)

  1. A side note is that DocOrigin combined PDFs do have a concept of "document". It is not just a whole whack of pages, some of which are notionally "Page 1 of". DocOrigin can extract documents out of its combined PDF, by document number or by any bookmark tags that you may have supplied. Of course, DocOrigin can also produce one PDF per document from a data file of thousands of documents. That is slower than producing a single combined PDF, but nothing to lose sleep over.
  2. A LiveCycle Output owner tried to run the 500 document batch. On his machine, after 10 minutes(!), it crashed the whole server (blue screen of death). I ran the 500 and 1,000 document batches with DocOrigin. They ran fine, with the same stunning performance. Who cares how many docs there are in a batch?
  3. While DocOrigin processes these documents, i.e. in mid run, you can email them to wherever; you can invoke external infrastructure processes of your choice; copy the output to web sites; use data-driven values in the file names; collect stats for a designed-by-you summary report; oh, and of course you can generate the same document in multiple output formats, from exactly the same document DOM, no re-merge of the data; ... Ever heard of Multi-Channel? See: DocOrigin Multi-Channel
  4. Note that DocOrigin does HTML output and also fillable HTML output, complete with all the HTML5 input types at your disposal. All output formats, including HTML, support dynamic business charting driven by data from the data stream, and not via embedded jaggy raster images. I suppose that is not performance related, but it felt worth saying. Sorry.
  5. Rather than all, one or nothing, you can create a new document whenever the branch, region or whatever changes, so you can group the documents as you go. BTW, DocOrigin includes data filters, one of which will sort your XML data on various keys so that, for example, all of a branch's documents come out together.
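
The sort-then-group idea in footnote 5 can be sketched in a few lines of Python. This is only an illustration of the concept, not DocOrigin's data filter; the `group_by_key` helper, the record layout and the field names are all made up.

```python
from itertools import groupby
from operator import itemgetter


def group_by_key(records, key):
    """Sort records on key, then start a new group each time the key changes."""
    ordered = sorted(records, key=itemgetter(key))  # stable: input order kept per key
    return {k: list(group) for k, group in groupby(ordered, key=itemgetter(key))}


# Hypothetical invoice records arriving in no particular branch order.
invoices = [
    {"branch": "East", "invoice": 101},
    {"branch": "West", "invoice": 102},
    {"branch": "East", "invoice": 103},
]

for branch, docs in group_by_key(invoices, "branch").items():
    print(branch, [d["invoice"] for d in docs])
# East [101, 103]
# West [102]
```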