Generating Excel Charts with MarkLogic

GIVE MY CREATION LIFE!and some fun with Formulas too…

This is another interesting one I see regularly: “How can I generate Excel Charts from MarkLogic Server?”

Charts are actually rendered from DrawingML found in the .xlsx package.  The DrawingML is embedded in SpreadsheetML, which is the Open XML format for Excel 2007/2010.

You don’t want to mess with DrawingML, as it’s a nasty frickin riddle, wrapped in an engima, inside a russian doll style matrix of insanity and pain.

Word, Excel and PowerPoint are producers and consumers of XML.  To some extent and to varying degrees, each of their respective XML formats can be understood and worked with in a relatively straightforward and reasonable way.  Sometimes though, the XML generated by these applications is really just a serialization of their object model and you’ll just waste a ton of time and find yourself in an extremely uncomfortable place (ed-like the back of a Volkswagen?) trying to figure the XML out when you don’t have to.  So let’s leave the DrawingML be. Capisci?

Think about it this way:  A chart in a workbook is tied to certain cell values in a worksheet.  When the cell values update, the chart dynamically updates.  At the end of the day, the DrawingML is just a snapshot of what the chart looked like based on the cell values when the Workbook was saved in Excel. (ed-Pivot tables work similarly in this way, but that’s a post for another day.)

Now let’s say we have a workbook containing a chart.  We know we can save our .xlsx to MarkLogic Server and have it automatically unzipped for us, its component XML parts made immediately available for search and re-use.   We can then update our extracted worksheets in the Server using XQuery.  Finally, we can re-zip the extracted workbook components back up and open the updated .xlsx into Excel.  Excel will automatically refresh its chart for us when it consumes the XML so we see the latest visualization of our chart based on the information we added to the worksheets.

5 Steps to Chart Freedom

Step 1

Create your chart in a workbook and drive it off of some cell values.  Note the cells and the name of the worksheet you’re driving your chart from. (example: Sheet1, cells: B2, B3, B4, etc.) I’ve provided a sample .xlsx here.

On Sheet1 we see download counts for a fictional company’s widgets for the month of September. The chart shows downloads for the widgets Foo, Bar, and Wumpus.  The chart columns correspond to cells B2, B3, and B4.

On Sheet2 we see sales counts for a fictional company’s widgets for each salesperson. The chart shows total sales for each salesperson.  The chart sections correspond to cells E2, E3, and E4.  Look closer and you’ll see that the cell values in column E driving the chart are actually the result of formulas; they are SUMs of all widgets for each salesperson row.  Note that the cells B6, C6, D6, and E6 all contain SUM formulas for their respective columns as well.

Step 2

Enable the Office OpenXML Extract and Status Change Handling CPF pipelines for your MarkLogic database so the .xlsx will automatically be unzipped when ingested into MarkLogic and its component parts made available for update.  Also insure you have the URI Lexicon enabled for your database. An example how to set this up can be found here.

Step 3

Save your .xlsx to MarkLogic. Once saved, the .xlsx is unzipped, and we can now manipulate it’s extracted XML component parts directly.  The idea is to save workbooks containing your charts as templates within MarkLogic and then update the extracted worksheet parts based on new information being saved to your database.

Step 4

Use the XQuery API that comes with the MarkLogic Toolkit for Excel to set the cell values for your chart in the extracted worksheet.  In particular, look at the function excel:set-cells() for updating worksheets.  Evaluate the following in CQ.

Note: you may need to update the code samples below to reflect your workbook and where you saved it in MarkLogic.

xquery version “1.0-ml”;

import module namespace excel=”” at “/MarkLogic/openxml/spreadsheet-ml-support.xqy”;

let $doc1 := “/MySpreadsheet_xlsx_parts/xl/worksheets/sheet1.xml”
let $doc2 := “/MySpreadsheet_xlsx_parts/xl/worksheets/sheet2.xml”
let $sheet1:= fn:doc($doc1)/node()
let $sheet2 := fn:doc($doc2)/node()

let $cell1 := excel:cell(“B2”,120)
let $cell2 := excel:cell(“B3”,99)
let $cell3 := excel:cell(“B4”,456)

let $cell4 := excel:cell(“D3”,127)
let $cell5:= excel:cell(“E3″,(),”SUM(B3:D3)”)

return (xdmp:document-insert($doc1, excel:set-cells($sheet1,($cell1, $cell2, $cell3))),
                xdmp:document-insert($doc2, excel:set-cells($sheet2,($cell4, $cell5))))

In the code above, for Sheet1, we see that we use the excel:cell() constructor to create cells for B2, B3, and B4.  We set the values for these cells to new numbers. These numbers could be coming from the results of another query.  We update the worksheet, using excel:set-cells(), passing the function the sheet we want to update, as well as a sequence of cells we’d like added and/or updated on the referenced sheet.  Finally, we xdmp:document-insert() our updated document, overwriting the existing one with our updated worksheet.  Remember, Sheet1 just held the simple chart driven directly from the cell values.

With Sheet2, we again use excel:cell() to create cells for D3 and E3. Sheet2 is more interesting as the chart here is driven from cells that contain formulas. For E3, we create a cell using excel:cell(), setting the value of the cell to the empty sequence, () , and passing in the formula for the cell.  Again we excel:set-cells() to update our worksheet and xdmp:document-insert() to save our updated worksheet back to the Server.

Note on excel:cell(): This function creates a new cell, so if you wish to retain an existing formula for a cell before you update it in a worksheet, you can’t use the 2 argument excel:cell() function.  If you did that, you’d lose the formula for the cell in the worksheet when you overwrite the XML.  You must create the cell with the formula, as we did above for E3.  If this doesn’t work for you, you can always roll your own XQuery to update the cell values for worksheets containing formulas in a different way.

Note on Excel formulas: Unlike charts, cells containing formulas will not calculate and refresh automatically when you open the updated worksheet in Excel if those cells already contain values. The value of the cell within the XML for the worksheet is considered the cached value by Excel and will be displayed when the workbook is opened.  This is done for performance reasons, so formula heavy worksheets don’t take forever to open as they calculate the value for every cell containing a formula when a workbook is opened.  Formula calculation is postponed to avoid wait time when opening a workbook.  As a result of this though, you can create XML for a worksheet that when consumed by Excel, will result in a cell displaying the wrong results given its formula.

To get a formula to calculate the value for a cell when you open a workbook in Excel and insure the correct cell value is displayed, you need to set the value of the cell to nothing.  You can do this using excel:cell(), setting the value of the cell to the empty sequence: ().

For more information on the excel:* functions,  check out the XQuery API docs that come with the Toolkit for Excel.  There are a lot of functions available, all documented and with examples of usage.

Step 5

Zip up the updated .xlsx from it’s extracted component parts and open into Excel.  When you do this, it doesn’t matter what the DrawingML is.  Excel reads the cell values when it consumes the XML and will update the chart automatically.  The next time you save the workbook, the DrawingML is updated to reflect what the chart looks like based on the latest cell values. Evaluate the following in CQ.

xquery version “1.0-ml”;

let $directory := “/MySpreadsheet_xlsx_parts/”
let $uris := cts:uris(“”,”document”,cts:directory-query($directory,”infinity”))
let $parts := for $i in $uris let $x := fn:doc($i) return  $x

let $manifest := <parts xmlns=”xdmp:zip”>
                              for $i in $uris
                              let $dir := fn:substring-after($i,$directory)
                              let $part :=  <part>{$dir}</part>
                              return $part

let $xlsx := xdmp:zip-create($manifest, $parts)
return xdmp:save(“C:\MyUpdatedSheet.xlsx”,$xlsx)

Open MyUpdatedSheet.xlsx into Excel.

BooYaa!  We update a few cells on Sheet1, and our chart automatically updates for us when we open the .xlsx into Excel.

Now take a look at Sheet2.  We updated D3 and set the value of E3 to (). Subsequently, the formula in E3 calculated its SUM formula when the workbook was opened.  Since the chart is driven from E2, E3, and E4, it updated properly as well.  WooHaa!

But take a closer look at cells D6 and E6.  They each contain SUM formulas for their columns, and they’re displaying the wrong values!  (ed-#fail) This is because we didn’t set their values to nothing.  Since the cells contained values in the XML for the worksheet, the cell formulas were not calculated when the workbook was opened and the cached value was displayed.  If you click on each of those cells, you’ll see the formula, click off of them, and the cells will recalculate and update with the correct values.

Bring Excel Workbooks to Life!

There's always another way.So the title was a bit misleading, as we don’t actually generate charts, so much as create the appropriate XML for Excel worksheets so that the Excel application will update and render charts for us when it consumes the XML.   But understanding a little bit of the SpreadsheetML format and how Excel behaves when consuming XML for charts and formulas, the doors open up to some very interesting possibilities.

The above examples are intentionally simple, but think about this…

Instead of a dead .xlsx, that sits lifeless on the filesystem and only alive when opened and active and being manipulated directly in Excel,  workbooks can now stay alive in the Server, constantly updated from complex queries being evaluated as new information is saved to the database.  These workbooks can then be dynamically zipped up when called upon to open in Excel and provide snapshot visualizations and results for a point in time.  Excel will consume the XML and can update charts and calculate formulas when opening this snapshot workbook, while the underlying, extracted workbook lives on and continues to be updated.

But this is just one way to use MarkLogic and Excel together.  There’s always another way…


Dude, Where’s My Worksheet?

An Excel workbook can get “hidden” when you embed it within a PowerPoint.

In Office 2007, its possible to embed an Excel worksheet within a PowerPoint slide. You can do this from within PowerPoint by navigating to the ‘Insert’ tab in the Ribbon and then clicking ‘Object’ in the Text grouping.  The dialog box that appears will allow you to create one of several predefined New objects in your slide, or you can Browse to insert an existing object from a file.  From here we can select an Excel .xlsx we’ve created, and the worksheet will appear within our slide.

This can be very cool and useful if we have some analysis we’d like to present.  Also, whenever we select the inserted worksheet within the slide, it will automatically open in Excel so we can continue to tweak on it using Excel’s functionality.

So embed an Excel document in a PowerPoint (from an existing file). But don’t touch or select or tweak the embedding in the slide, just save the presentation .pptx somewhere. Change the .pptx extension to .zip, unzip the document, and you’ll find the embedded Excel .xlsx under /ppt/embeddings.  Yep, there’s a complete Excel .xlsx package within your PowerPoint .pptx.  Now open the .xlsx directly in Excel by double-clicking it; the Excel application will launch, but NOTHING IS VISIBLE! EVEN THOUGH THE SPREADSHEET IS THERE! EXCEL WILL APPEAR AS IF NO WORKBOOKS ARE OPENED!!!


You have to navigate to ‘View’ on the Ribbon and click ‘Unhide’ to see the document.

What Happened?

If you unzip the .xlsx and examine workbook.xml, you’ll find:

/workbook/bookViews/workbookView/@visibility = “hidden” .

This attribute is set for the embedded document, even though it wasn’t set in the original prior to embedding.

But guess what, if you would’ve tweaked the embedding by selecting it or resizing it in the slide prior to saving the .pptx, this attribute wouldn’t be set. You can open the .xlsx and the workbook IS VISIBLE.  This seems to be an odd quirk of PowerPoint’s default behavior that I stumbled upon while embedding documents programmatically.

I can understand why a behavior like this might go unnoticed if it’s expected the embedded document will only ever be opened within PowerPoint again. In PowerPoint the embedded .xlsx opens and is always visible within the context of the slides.

Embed Shmembed.

The thing is, I’m saving these documents to MarkLogic Server, where I automatically unzip all Office 2007 documents so I can search and reuse the XML, document parts, and original source .pptx, .xlsx, .docx, etc.  in the creation of new Office and other documents.  In MarkLogic, one of the things I’m doing for an app is making embedded Office docs available as independent documents within search results.  It’s freedom baby, yeah! And while we’re seeing this behavior with embedding, we now know that its possible to hide any workbook using that simple attribute.

So if we want our users to be able to view the Excel documents they are selecting for use,  a simple solution is to just remove the @visible and rezip the .xlsx in MarkLogic, either in the saved .xlsx, or when opening the embedded doc from the rezipping of its extracted parts, or both.  Remove that attribute and we’re all good for visibility.

Or,  if you’re feeling particularly ornery,  I guess you could add the attribute to all your organization’s Excel workbooks and watch your users go mad?  Maybe there’s a use-case for that, I don’t know.

But I thought this was interesting so I’d share. Maybe this will help someone who’s tweaking on this random behavior or with AddOLEObject as well.  You can have a .xlsx on your filesystem.  You can unzip and see the workbook and worksheet parts.  You can double-click the .xlsx and it will open in Excel, but nothing is visible, even though the worksheet information *IS* saved in the .xlsx package and available in the Excel application.  It turns out the document is actually open, but hidden.  You just have to ‘Unhide’ it.

Load A Custom UI for Office 2007 from MarkLogic Server

We all know that with Office 2007 the traditional Office file navigation menus and toolbars were replaced with The Ribbon.  The Ribbon a.k.a. RibbonX a.k.a. The Office Fluent Ribbon a.k.a. The Fluent UI a.k.a. …

The naming has evolved over time, but basically the Ribbon is the interface, now called the Fluent UI.  RibbonX is the original name given to the XML that  you can use to customize the Ribbon/Fluent UI to add your own tabs and controls.

Many examples demonstrate how to load a custom Ribbon UI from within a Document-Level customization or an Application-Level Add-in.  These solutions load the XML for the Ribbon from within either the Document itself or the deployed Add-in solution respectively.  But what if we want to keep the XML for our custom Ribbon interface someplace else? How about someplace where we can access and modify it so we can change the user experience in Office without requiring our users to re-install an Add-in or use a new Document template?

It is possible to load a customized Ribbon from an external resource.  Today’s post demonstrates how to load a custom Ribbon for an Application-Level Add-in from Ribbon XML that’s been saved in MarkLogic Server.

Save the Ribbon XML in MarkLogic

To keep it brief, we’ll just assume we’re using the default Documents database and the Docs HTTP Server in MarkLogic.

Open CQ in your browser and evaluate the following:

xquery version "1.0-ml";
<customUI xmlns="">
       <tab id="MyTab" label="My Tab">
         <group id="MyGroup" label="My Group">
           <button id="Button1"
                   label="My Button"
                   screentip="My Button Screentip"
                   imageMso="HappyFace" />

We can manipulate the Ribbon interface with a minimal amount of XML.  The tabs, groups, and controls are actually defined using XML elements. This example creates a new tab,  which includes a new control, a button.  The attributes of controls define callbacks.  Callbacks are the functions that will be called when a user interacts with the controls on our Ribbon. For our button @onAction defines the callback which refers to a function named “OnClick”.

We now need a module that the Add-in can call to fetch the XML for the Ribbon.  Create a file called ribbon.xqy and place it under /Docs.  The contents of the file is simply

xquery version "1.0-ml";

We use xdmp:quote() to return the XML as a string as that’s what our Add-in will parse to create the customization in Office for us.

Create an Add-in for Word

First, create an Add-in project for Word:

  1. Start Visual Studio 2008
  2. From the File menu, select New -> Project
  3. In Project Types, select Office 2007
  4. In Templates, select Word 2007 Add-in
  5. Name your Project, I’m calling mine MyRibbonAddin
  6. click OK

Now that we have our project, we need to add a Ribbon to it.

  1. In the Solution Explorer on the right, right-click the MyRibbonAddin project and select Add -> New Item
  2. In Templates, select Ribbon(XML)
  3. Click Add

This adds a  Ribbon1.cs and Ribbon1.xml to our project.

You might’ve noticed there was a Ribbon(Designer) available in the templates as well. Visual Studio 2008 comes with a Ribbon Designer that allows you to drag and drop components to your Ribbon. You can then use this Ribbon in your deployed Add-in.  You can also export the Ribbon from the Designer to XML, which you may find useful.  We selected Ribbon(XML) as we just want to use the simple Ribbon1.cs that is generated for us.

Now that we’ve started our Ribbon customization, we just need to load the Ribbon from MarkLogic, define our callback so our button will do something, and then enable the Ribbon within our Add-in.

Open Ribbon1.cs and go to the GetCustomUI function.  By default we see its loading the Ribbon1.xml it generated.  Change the function to look like the following:

 public string GetCustomUI(string ribbonID)
   string url = "http://localhost:8000/ribbon.xqy";
   System.Net.WebClient Client = new System.Net.WebClient();
   Client.Credentials = new System.Net.NetworkCredential("user", "password");
   string ribbon = Client.DownloadString(url);

   return ribbon;
   //return GetResourceText("MyRibbonAddin.Ribbon1.xml");

You’ll want to update the network credentials to be the ones you defined when you installed MarkLogic.

In the Ribbon Callbacks #region, define the OnClick function:

public void OnClick(Office.IRibbonControl ribbonControl)

Finally, there’s a TODO section that’s been commented out at the top of Ribbon1.cs.  To enable the Ribbon in the Add-in, copy the following function from the TODO section, paste and uncomment it in ThisAddin.cs, just above the ThisAddin_Startup function:

protected override Microsoft.Office.Core.IRibbonExtensibility CreateRibbonExtensibilityObject()
    return new Ribbon1();

Hit F5 on your keyboard to test the customization in debug mode.  Once Word starts, navigate to “My Tab” and click the Smiley Face “My Button”. Our MessageBox tells us we’ve clicked the button.


Boom! That’s it.  Pretty cool.  We can alter the Office interface by feeding it Ribbon XML at startup. And we can query the XML from an external resource like MarkLogic Server.

Next Steps

The example provided here is quite simple.  But you now have the basic steps required to create your own customization for the Office UI.  There’s a wealth of information on the Ribbon/Fluent UI online.  The Visual Studio 2008 Ribbon designer can help to familiarize you with the basic RibbonX elements as well.

Now that you have a custom UI, you can improve the callback functions to interact with the document a user is currently authoring.  You can also use Ribbon controls to send and receive information from MarkLogic.  In this way you can reuse content stored in MarkLogic in a document as well as save selected text back to MarkLogic for future reuse.

It is possible to dynamically generate a Ribbon customization from a query as well. You may want to do this depending on the content your author is accessing, or based on their approved level of access for that information, or both. Just modify the ribbon.xqy we created earlier in the post.

As an example, in the past I’ve found it useful to dynamically generate dropdown menus based on Word document metadata searches to provide users related reusable content chunks to insert into their documents.


Final Ribbon Notes

The Ribbon customization loads once, whenever you start the Office application.  You can’t change the layout or controls at runtime. You can however, update values within those controls.   So in the search dropdown example above, it is possible to update the contents of the dropdown based on some event, but you can’t change a dropdown to a checkbox while the Office application is running. Capiche?

The good news is that if you’ve generalized your callbacks you can update the ribbon.xqy and any other .xqy files on the server, and the next time a user restarts their Office application, they can have new functionality available to them.  This could be useful so you don’t have to constantly recompile your Add-in and/or send your users a new .msi to install.

How did we get the Smiley Face? There’s a list of icons you can download which provides you ids for several of the icons available throughout Office.  There’s also a list of  control ids available for download. This second list shows you all the built in controls available to you for use in Office.

Which brings us to: Yes, you can change the layout of the Ribbon provided in Office. You can override the functionality of existing controls, and you can remove the existing Ribbon entirely and provide your own, forcing users to only use the controls you provide.  But think of your users and what makes sense.  We want to help them, not hinder them.  You may find the  Office 2007 UI Style guide useful.

Finally, the MarkLogic Toolkit for Word and MarkLogic Toolkit for Excel are both free Application-Level Add-ins and  they are open source.  If you’re interested, these solutions may jump start your development, as well as provide you with other insights into how to make your customizations more useful.  They both provide multiple functions for interacting with documents while users are authoring them.  The Ribbon can be useful, but real estate is limited.  The Toolkit Add-ins make extensive use of a Custom Task Pane to enable authors to integrate information in MarkLogic and Office 2007.  Sometimes the Ribbon, Task Pane combo can be very useful.