The WordPress.com stats helper monkeys prepared a 2011 annual report for this blog. Thought I’d share the love for those interested.
2011 in review
Posted by wpaven on January 1, 2012
Posted in Uncategorized | Tagged: 2011, annual, report, review | Leave a Comment »
Generating Excel Charts with MarkLogic
Posted by wpaven on September 21, 2011
and some fun with Formulas too…
This is another interesting one I see regularly: “How can I generate Excel Charts from MarkLogic Server?”
Charts are actually rendered from DrawingML found in the .xlsx package. The DrawingML is embedded in SpreadsheetML, which is the Open XML format for Excel 2007/2010.
You don’t want to mess with DrawingML, as it’s a nasty frickin riddle, wrapped in an engima, inside a russian doll style matrix of insanity and pain.
Word, Excel and PowerPoint are producers and consumers of XML. To some extent and to varying degrees, each of their respective XML formats can be understood and worked with in a relatively straightforward and reasonable way. Sometimes though, the XML generated by these applications is really just a serialization of their object model and you’ll just waste a ton of time and find yourself in an extremely uncomfortable place (ed-like the back of a Volkswagen?) trying to figure the XML out when you don’t have to. So let’s leave the DrawingML be. Capisci?
Think about it this way: A chart in a workbook is tied to certain cell values in a worksheet. When the cell values update, the chart dynamically updates. At the end of the day, the DrawingML is just a snapshot of what the chart looked like based on the cell values when the Workbook was saved in Excel. (ed-Pivot tables work similarly in this way, but that’s a post for another day.)
Now let’s say we have a workbook containing a chart. We know we can save our .xlsx to MarkLogic Server and have it automatically unzipped for us, its component XML parts made immediately available for search and re-use. We can then update our extracted worksheets in the Server using XQuery. Finally, we can re-zip the extracted workbook components back up and open the updated .xlsx into Excel. Excel will automatically refresh its chart for us when it consumes the XML so we see the latest visualization of our chart based on the information we added to the worksheets.
5 Steps to Chart Freedom
Step 1
Create your chart in a workbook and drive it off of some cell values. Note the cells and the name of the worksheet you’re driving your chart from. (example: Sheet1, cells: B2, B3, B4, etc.) I’ve provided a sample .xlsx here.
On Sheet1 we see download counts for a fictional company’s widgets for the month of September. The chart shows downloads for the widgets Foo, Bar, and Wumpus. The chart columns correspond to cells B2, B3, and B4.
On Sheet2 we see sales counts for a fictional company’s widgets for each salesperson. The chart shows total sales for each salesperson. The chart sections correspond to cells E2, E3, and E4. Look closer and you’ll see that the cell values in column E driving the chart are actually the result of formulas; they are SUMs of all widgets for each salesperson row. Note that the cells B6, C6, D6, and E6 all contain SUM formulas for their respective columns as well.
Step 2
Enable the Office OpenXML Extract and Status Change Handling CPF pipelines for your MarkLogic database so the .xlsx will automatically be unzipped when ingested into MarkLogic and its component parts made available for update. Also insure you have the URI Lexicon enabled for your database. An example how to set this up can be found here.
Step 3
Save your .xlsx to MarkLogic. Once saved, the .xlsx is unzipped, and we can now manipulate it’s extracted XML component parts directly. The idea is to save workbooks containing your charts as templates within MarkLogic and then update the extracted worksheet parts based on new information being saved to your database.
Step 4
Use the XQuery API that comes with the MarkLogic Toolkit for Excel to set the cell values for your chart in the extracted worksheet. In particular, look at the function excel:set-cells() for updating worksheets. Evaluate the following in CQ.
Note: you may need to update the code samples below to reflect your workbook and where you saved it in MarkLogic.
xquery version “1.0-ml”;
import module namespace excel=”http://marklogic.com/openxml/excel” at “/MarkLogic/openxml/spreadsheet-ml-support.xqy”;
let $doc1 := “/MySpreadsheet_xlsx_parts/xl/worksheets/sheet1.xml”
let $doc2 := “/MySpreadsheet_xlsx_parts/xl/worksheets/sheet2.xml”
let $sheet1:= fn:doc($doc1)/node()
let $sheet2 := fn:doc($doc2)/node()
let $cell1 := excel:cell(“B2″,120)
let $cell2 := excel:cell(“B3″,99)
let $cell3 := excel:cell(“B4″,456)
let $cell4 := excel:cell(“D3″,127)
let $cell5:= excel:cell(“E3″,(),”SUM(B3:D3)”)
return (xdmp:document-insert($doc1, excel:set-cells($sheet1,($cell1, $cell2, $cell3))),
xdmp:document-insert($doc2, excel:set-cells($sheet2,($cell4, $cell5))))
In the code above, for Sheet1, we see that we use the excel:cell() constructor to create cells for B2, B3, and B4. We set the values for these cells to new numbers. These numbers could be coming from the results of another query. We update the worksheet, using excel:set-cells(), passing the function the sheet we want to update, as well as a sequence of cells we’d like added and/or updated on the referenced sheet. Finally, we xdmp:document-insert() our updated document, overwriting the existing one with our updated worksheet. Remember, Sheet1 just held the simple chart driven directly from the cell values.
With Sheet2, we again use excel:cell() to create cells for D3 and E3. Sheet2 is more interesting as the chart here is driven from cells that contain formulas. For E3, we create a cell using excel:cell(), setting the value of the cell to the empty sequence, () , and passing in the formula for the cell. Again we excel:set-cells() to update our worksheet and xdmp:document-insert() to save our updated worksheet back to the Server.
Note on excel:cell(): This function creates a new cell, so if you wish to retain an existing formula for a cell before you update it in a worksheet, you can’t use the 2 argument excel:cell() function. If you did that, you’d lose the formula for the cell in the worksheet when you overwrite the XML. You must create the cell with the formula, as we did above for E3. If this doesn’t work for you, you can always roll your own XQuery to update the cell values for worksheets containing formulas in a different way.
Note on Excel formulas: Unlike charts, cells containing formulas will not calculate and refresh automatically when you open the updated worksheet in Excel if those cells already contain values. The value of the cell within the XML for the worksheet is considered the cached value by Excel and will be displayed when the workbook is opened. This is done for performance reasons, so formula heavy worksheets don’t take forever to open as they calculate the value for every cell containing a formula when a workbook is opened. Formula calculation is postponed to avoid wait time when opening a workbook. As a result of this though, you can create XML for a worksheet that when consumed by Excel, will result in a cell displaying the wrong results given its formula.
To get a formula to calculate the value for a cell when you open a workbook in Excel and insure the correct cell value is displayed, you need to set the value of the cell to nothing. You can do this using excel:cell(), setting the value of the cell to the empty sequence: ().
For more information on the excel:* functions, check out the XQuery API docs that come with the Toolkit for Excel. There are a lot of functions available, all documented and with examples of usage.
Step 5
Zip up the updated .xlsx from it’s extracted component parts and open into Excel. When you do this, it doesn’t matter what the DrawingML is. Excel reads the cell values when it consumes the XML and will update the chart automatically. The next time you save the workbook, the DrawingML is updated to reflect what the chart looks like based on the latest cell values. Evaluate the following in CQ.
xquery version “1.0-ml”;
let $directory := “/MySpreadsheet_xlsx_parts/”
let $uris := cts:uris(“”,”document”,cts:directory-query($directory,”infinity”))
let $parts := for $i in $uris let $x := fn:doc($i) return $x
let $manifest := <parts xmlns=”xdmp:zip”>
{
for $i in $uris
let $dir := fn:substring-after($i,$directory)
let $part := <part>{$dir}</part>
return $part
}
</parts>
let $xlsx := xdmp:zip-create($manifest, $parts)
return xdmp:save(“C:\MyUpdatedSheet.xlsx”,$xlsx)
Open MyUpdatedSheet.xlsx into Excel.
BooYaa! We update a few cells on Sheet1, and our chart automatically updates for us when we open the .xlsx into Excel.
Now take a look at Sheet2. We updated D3 and set the value of E3 to (). Subsequently, the formula in E3 calculated its SUM formula when the workbook was opened. Since the chart is driven from E2, E3, and E4, it updated properly as well. WooHaa!
But take a closer look at cells D6 and E6. They each contain SUM formulas for their columns, and they’re displaying the wrong values! (ed-#fail) This is because we didn’t set their values to nothing. Since the cells contained values in the XML for the worksheet, the cell formulas were not calculated when the workbook was opened and the cached value was displayed. If you click on each of those cells, you’ll see the formula, click off of them, and the cells will recalculate and update with the correct values.
Bring Excel Workbooks to Life!
So the title was a bit misleading, as we don’t actually generate charts, so much as create the appropriate XML for Excel worksheets so that the Excel application will update and render charts for us when it consumes the XML. But understanding a little bit of the SpreadsheetML format and how Excel behaves when consuming XML for charts and formulas, the doors open up to some very interesting possibilities.
The above examples are intentionally simple, but think about this…
Instead of a dead .xlsx, that sits lifeless on the filesystem and only alive when opened and active and being manipulated directly in Excel, workbooks can now stay alive in the Server, constantly updated from complex queries being evaluated as new information is saved to the database. These workbooks can then be dynamically zipped up when called upon to open in Excel and provide snapshot visualizations and results for a point in time. Excel will consume the XML and can update charts and calculate formulas when opening this snapshot workbook, while the underlying, extracted workbook lives on and continues to be updated.
But this is just one way to use MarkLogic and Excel together. There’s always another way…
Posted in MarkLogic, Office, XQuery | Tagged: Charts, Excel, Formulas, MarkLogic, Office, OpenXML, XQuery | 1 Comment »
MarkLogic Toolkits for Word, Excel, and PowerPoint
Posted by wpaven on May 18, 2010
I thought I’d use this post to provide a brief introduction to the MarkLogic Toolkits for Office. So here’s an overview:
What is a Toolkit?
A Toolkit is a set of tools for jumpstarting your development with MarkLogic Server and Microsoft Office 2007 / Office 2010 / Open XML.
There are currently 3 Office Toolkits:
We care about Word, Excel, and PowerPoint, because with Office 2007, their respective document formats are now XML. Take a .docx, .xlsx, or .pptx and change it’s file extension to .zip. Extract the file and inside you’ll find a bunch of interrelated XML parts.
This update to the document formats provides an interesting opportunity as people can now work with XML without learning new, specialized tools, or even really being aware of the fact that they’re working with XML. Authors continue to use the tools they know and are familiar with in Office, and we can provide additional functionality to them by taking advantage of the XML. The Toolkits provide ways for us to enhance the authoring experience within Office as well as on the Server where we can prepare content for Office as well as additional consumers.
Each Toolkit is composed of 3 major components:
- Add-in for Word | Excel | PowerPoint
- XQuery API
- Sample Applications
Add-in with supporting JavaScript API
The Add-in is just a standard Windows application you install using a .msi. Double-click the .msi to start installation, click next, next, next, through the dialog screens as you would with any Windows app, and the next time you start Office you’ll find a Task Pane on the right hand side of the application (see image below).
NOTE: The Task Pane is just a browser! It’s using whatever version of IE is installed on the client, and exposing that within Office.
The Addin may just be a browser, but it also installs a supporting library for interacting with the active document (the document being authored). Access to this libary from the browser is available from the JavaScript API that comes with the Addin. Developers can quickly create a webapp within Word, Excel, or PowerPoint that communicates with and/or is even served from MarkLogic Server and they can use the JavaScript APIs to get XML in and out of the document being authored.
We wanted to avoid creating a situation where users had to constantly re-install Add-ins on the client. By making it a browser, we can update functionality by simply changing the application code on the Server.
JSDocs are provided with each Toolkit for the respective JavaScript API.
XQuery API
The XQuery APIs for Word, Excel, and PowerPoint provide functions for developers to manipulate and generate Office documents on the Server.
The goal is to simplify the use of Open XML. Along with each XQuery API, a CPF pipeline for MarkLogic is also provided that will automatically update Office documents on the Server as they are ingested to make all the XML more friendly for search and reuse. These updates are done without using any custom XML and without losing any document fidelity.
NOTE: The Add-in and XQuery API can work in concert or separately. If your authors use Office, it might make sense to use both. If you’re querying Office documents (or some other XML format) on the Server, and delivering results through a regular browser or some other consumer, you might not need Office on the client at all. But if you’re delivering an Office document as a result, yes, you still may require the Office application on the client, but you don’t necessarily require Add-in. It all depends on your use-cases and particular goals.
XQuery API docs are provided with each TK as well.
Sample Applications
Rather than just give developers an empty browser with a .js file and JavaScript API documentation to start development with, each TK comes with Sample Applications. A developer can just drop the Sample right into MarkLogic, configure their Addin to reference the URL of the Server, and quickly be up and running with applications within the Task Pane.
These Samples are VERY simple. They provide just a sliver of the available API functionality. Again, they’re intended to jumpstart development. We provide these samples so a developer can quickly see some useful functionality, open the source to see how the code looks, and get in there and start hacking to create the app they actually want. Developers can reference the API docs and add/change functionality as they require. When you look at the docs you’ll see that there is a LOT that can be done on the client and in the Server that isn’t demonstrated in the Samples at all.
NOTE: The Sample Application is not the Toolkit. It’s just one example of the type of application you can build using a TK.
NOTE: The Sample Application is not Office. We skinned the samples to be the colors of Office. But it’s just HTML, JavaScript, and CSS. Remember, the Pane is just a browser serving up pages from MarkLogic. The goal is to keep authors comfortable in their authoring environment, letting MarkLogic do what it does best (search, reuse, enrich, analyze, etc.) and let Office do what it does best (author,analyze,present). If you want to use crazy colors and the blink tag for your app, go for it!
A Toolkit Guide rounds out the documentation with details on creating, configuring, and delivering solutions that use the Toolkits.
Overview
Office is ubiquitous. The goal of the Toolkits is to keep authors authoring and analysts analyzing in the tools they are already using and comfortable with. Office is a publisher and consumer of XML, MarkLogic is an XML Server. The products compliment each other very nicely and we can create a much richer Office experience for authors without requiring them to learn new, custom tools, or even be aware of the fact that behind the scenes, it’s all XML.
Conclusion
The Toolkits are all free and now available on codeplex. They are all open source, released under the Apache 2 license.
The response to the TKs has been very positive. I’ve seen an increase in interest lately, and it’s been great to hear people are using these and finding them very useful. I was surprised to hear at #MLUC10 how one person has deployed all 3 TKs across his organization and is very excited about the possibilities. He also told me that multiple authors are enjoying the Sample apps in PowerPoint as-is. Very cool!
I just have to say, I get excited too! Each Office application has a different degree of XML friendliness and Word is by far the friendliest. With the Toolkit for Word we can use Word as a browser into MarkLogic. At work I send content back and forth between Word and MarkLogic and never have to save a local .docx on the client. It’s just XML going back and forth. Office consumes and publishes Open XML. Using the XQuery API, I can dynamically create the XML Word requires for consumption from alternative XML formats. Also, Word publishes WordprocessingML, but my destination XML format isn’t necessarily always Office docs. It’s pretty awesome, and that’s just Word! Similar opportunities exist for Excel and PowerPoint as well.
Spoiler Alert: there’s more awesome coming!
So that’s it. You’re now Toolkit experts. Go download the Toolkits and have fun creating your own MarkLogic applications for Office. If you have any questions , comments, or suggestions for the TKs please feel free to drop me a line in the comments. Thanks!
Posted in MarkLogic, Office, XQuery | Tagged: Excel, ooxml, open xml, powerpoint, word | Leave a Comment »
Ain’t No Party like a MarkLogic Party!
Posted by wpaven on May 11, 2010
Last week we had the 2010 MarkLogic User Conference and it was awesome! You can read all about it here and get some sense of the action (for a few days at least) here. If you were in attendance, thanks to everyone for helping put on such a great event!
I find the conference inspiring for many reasons. I could go on and on about why, instead you’ll definitely find the influence of #MLUC10 in subsequent posts. In a nutshell, once a year the entire MarkLogic Community gets together for a 3 day collaborative brainstorming session and celebrates Big, Screaming Fast XML and all things Server. This was my third, and they keep getting bigger and better. This year though, for the first time, we had a Conference party. The shindig was amazing and definitely deserves a special shout-out!
MarkLogic rented out the California Academy of Sciences and it was nuts. Well first, they gave out temporary tattoos featuring our new logo to everyone. You can check out the awesomeness on this person’s dome (click images for larger views), as well as the mighty fist here. Some people really got into the tats.
They had buses on a loop from the Intercontinental Hotel, where the main event was held, to the Academy to bring people continuously to and from the event. We got over there around 7 and started the evening by exploring the new 4-story rainforest exhibit. We walked spirally up a ramp within a giant, clear, enclosed, warm, humid dome, viewing butterflies, frogs, snakes, geckos, bats, birds, and other creatures of the rainforest. The habitat circles a giant pool in the center floor where we had the chance to see humongous gars and other fish swimming around. At the top level, the docents did a butterfly check of clothing before putting us on an elevator that took us down to a level below the rainforest, to the aquarium and an area where we could look at the giant rainforest pool from the under side.
And then it just got ridiculous (in a good way). You got off the elevator and people were waiting there to serve you wine and drinks the moment you walked in the room. You could walk through the aquarium, and every time you turned a corner, there was a sushi/hor d’oeuvre station or an open bar. I wandered around, exploring and talking to people, checking out jellyfish and snacking on salmon and dim-sum, while others ate sushi and watched the fish. (yeah, i know!).
I was having a good time and really thought that was it with regard to the party area. We’d seen the rainforest, and the aquarium was lots of fun. The Academy closed the rainforest at 8, so I thought what we’d seen was all very impressive and we’d finish the evening drinking and socializing among the eel and octopi.
I’m wandering around and I find a set of stairs that ends up taking me to the main exhibit area. The top of the stairs opened into a whole other party section with candle-lit tables, more food and open bars! There was prime-rib with mashed potatoes, mexican food, sliders and chips, dessert tables with cakes, cookies, and build your own sundaes. I don’t think i saw all the food available, there was so much of it.
It was just awesome. Everywhere you turned you found good people, good food, good drink, good conversation, and good science! And to top it off, they had someone taking photos in front of this large skeletal mouth. When you left the party at the end of the evening, the photo was in a small folder, waiting to be picked up by the door.
I took my pic with Josh, our Vice President of People. This is a pic of a pic, and looks blurry as I resized smaller, click for a better view. I’m the one throwin up the ML gang sign of course. In retrospect, I wish I would’ve stood behind Josh with my hands on his hips, both of us at an angle looking over one shoulder, prom style.
Yeah, that would be just SO wrong, and yet i still think it would’ve been absolutely hilarious. But this pic is great too! Ok, the pic tangent is all in good fun, but all kidding aside…
Well done MarkLogic! and Thanks for another great User Conference!
Posted in MarkLogic | Tagged: 2010, MarkLogic User Conference, MLUC, MLUC10 | Leave a Comment »
XQuery and MarkLogic Developer Blogs
Posted by wpaven on April 14, 2010
Inspired by this list of Computational Linguistic Blogs, I thought I’d aggregate and share the blogs I follow that provide useful information on working with XQuery and/or MarkLogic Server.
The blogs noted here are alive and active. I like these as they’re written by people who are actually writing code, building things, and solving problems. They provide code examples, as well as practical insight. I give a brief synopsis of the activity, and if the author also provides information on working with MarkLogic, I note that as well. Finally, if the author is active on twitter, you can click their name to follow.
The following are all awesome and listed in no particular order.
Blogroll
- alex bleasdale / Developer Notes
Self-described as notes on XML and Web Development, he’s been on a MarkLogic and XQuery tear lately. Lots of useful information here.
- norman walsh / blog
Norm is a shotgun blast of all things X: XML, XProc, XQuery, XSLT and more. He’s now an Engineer at MarkLogic too. Posts frequently, and its always good.
- rob whitby / XQuery Hacker
Always sharing the code. Check out his Search Interface to MarkLogic API docs and DQ, an enhanced version of CQ.
- matt turner / Discovering XQuery
Irregular posting, but when published, they’re always good. The blog archive provides many useful tutorials for working with XQuery as well as MarkLogic Server.
- kit wallace / blog
A person who sees solutions written in other languages, asks himself “how would I do that in XQuery”, figures it out, and shares the code. Useful, fun, and frequent posts on XQuery , XSLT, XProc and more.
- jeni tennison / Musings
Excellent posts on XQuery, XML, RDF, and Linked Data published on a regular basis.
- miguel de melo / blog
Blog links to his two other blogs on XSLT and XQuery which provide great code samples and explanations.
- mattio valentino /Rendition Protocol
Posts on MarkLogic, XQuery, and more. The blog is a collection of his development notes and is full of useful snippets and insight.
Alright, that’s it for now. If you know of others who share XQuery code, experience and/or information on working with MarkLogic Server on a regular basis, I’d really like to know about them. Please let me know who they are so I can check them out and maybe add them to the list.
Thanks!
Posted in MarkLogic, XQuery | Tagged: blogs, developer, XML, xproc, XQuery, xslt | Leave a Comment »
Fun with XQuery, Images encoded as base64 Strings, and Word 2007
Posted by wpaven on March 26, 2010
or: There and Back Again, A JPEGs Tale.
This is a fun one that comes up every once in awhile. When you save a Word 2007 document as .xml, Word serializes images as base 64 strings. It turns out that organizations regularly save Word documents as .xml and they want the ability to view these images in a browser or some other application so they can decide how they’d like to re-use them. So the first question that comes up is: How can I transform the base 64 string back into an image?
If you want to play along at home, copy the image of Bilbo here and save it in a Word 2007 document: In the Ribbon, select the ‘Insert’ tab, from the ‘Illustrations’ group, choose ‘Picture’, then select your pic and insert it. Next: Go to the Button, click ‘Save As’, select ‘Other Formats’, and for the ‘Save as format’ choose ‘Word XML Document (*.xml)’.
Don’t choose the 2003 XML, cause that’s something else. It similar, but different (cause it’s not the same).
So now, open that bad boy in vi, Visual Studio, or some other editor and take a peek. I want to take this opportunity to introduce you to the Flat OPC format. When you save a Word doc as a .docx, you end up with a .zip file that contains all these interrelated .xml files and their associated assets (such as images). When you save as .xml, you end up with all those same XML parts serialized in a single .xml file, with images serialized as base 64 strings. This .xml format is known affectionately in Redmond as Flat OPC. Once you understand just a little about this format, the amount of Word document @$$ you can kick in MarkLogic Server and/or using an Add-in in Word is awesome.
SPOILER ALERT: An upcoming post is going to dive into how we can exploit the Flat OPC format for document re-use.
The main body of content for a Word 2007 document can be found in the document.xml part. With images, you’ll find a reference to the image in document.xml, but the image will be stored separately as its own part in the document package; as a binary in the .docx, but serialized as base 64 when saved as .xml. Knowing this, let’s convert the string to an image.
Throw BilboBaggins.xml into MarkLogic Server. (I use WebDAV). To view the base 64 string, evaluate the following in CQ:
xquery version "1.0-ml";
declare namespace pkg="http://schemas.microsoft.com/office/2006/xmlPackage";
let $doc := fn:doc("/BilboBaggins.xml")/node()[2]
let $image-string := $doc/pkg:part[@pkg:name="/word/media/image1.jpeg"]/pkg:binaryData/node()
return $image-string
Yep, it’s that ugly. Luckily viewing the image is as simple as:
xquery version "1.0-ml";
declare namespace pkg="http://schemas.microsoft.com/office/2006/xmlPackage";
let $doc := fn:doc("/BilboBaggins.xml")/node()[2]
let $image-string := $doc/pkg:part[@pkg:name="/word/media/image1.jpeg"]/pkg:binaryData/node()
return binary{xs:hexBinary(xs:base64Binary($image-string))}

Now, what about the reverse? What if we have images in the Server that we want to serialize as base 64 strings?
Take the image you copied at the beginning and save it to MarkLogic. We can convert it to a base 64 string by evaluating the following in CQ:
xquery version "1.0-ml";
declare namespace ooxml= "http://marklogic.com/ooxml";
declare namespace pkg="http://schemas.microsoft.com/office/2006/xmlPackage";
declare function ooxml:base64-string-to-binary(
$string as xs:string
) as binary()
{
binary{xs:hexBinary(xs:base64Binary($string))}
};
declare function ooxml:binary-to-base64-string(
$node as binary()
) as xs:string
{
xs:base64Binary(xs:hexBinary($node)) cast as xs:string
};
let $doc := fn:doc("/bilbo-200x200.jpg")/node()
return ooxml:binary-to-base64-string($doc)
Now, the above will work if we just want the base 64 string, but if we want a string we can use with Word and the Flat OPC format, certain rules apply: 1) the string must be broken into lines of 76 characters, and 2) there must not be a line break at the beginning or end of the content. No big deal, we just do the following:
xquery version "1.0-ml";
declare namespace ooxml= "http://marklogic.com/ooxml";
declare namespace pkg="http://schemas.microsoft.com/office/2006/xmlPackage";
declare function ooxml:base64-string-to-binary(
$string as xs:string
) as binary()
{
binary{xs:hexBinary(xs:base64Binary($string))}
};
declare function ooxml:binary-to-base64-string(
$node as binary()
) as xs:string
{
xs:base64Binary(xs:hexBinary($node)) cast as xs:string
};
declare function ooxml:base64-opc-format(
$binstring as xs:string)
{
fn:string-join(ooxml:format-binary($binstring),"
")
};
declare function ooxml:format-binary(
$binstring as xs:string
)as xs:string*
{
for $i in 0 to (fn:string-length($binstring) idiv 76)
let $start := ($i * 76)
return fn:substring($binstring,$start,76)
};
let $doc := fn:doc("/bilbo-200x200.jpg")/node()
return ooxml:base64-opc-format(ooxml:binary-to-base64-string($doc))
And that’s all there is to it! You are now a Master of the image-encoded-as-base64-string Universe! Cheers!
Posted in MarkLogic, Office, XQuery | Tagged: base64, images, OPC, XML | 2 Comments »
Saving macro code from an Active Excel Workbook as text in a Custom XML Part
Posted by wpaven on March 23, 2010
So we can audit and search it later!
Interesting question came my way last week: Is it possible to save macro code as text in an Excel 2007 .xlsm package?
The answer is: Yes!
Why would someone want to do this? Well, this particular organization wants to save Excel workbooks to MarkLogic Server. In the Server, the workbooks are automatically unzipped and they can search and run reports on the SpreadsheetML, running analytics on their analytics.
But macros are saved as .bin files within the .xlsm (in vbaProject.bin) and they want the ability to audit the macros being used, as well as search and possibly re-use them.
So wouldn’t it be dreamy if we could create an Add-in for Excel that could capture when an Analyst saves a workbook and serialize the code for any macros in the workbook as text, storing them in Custom XML Parts within the .xlsm package?
Recipe for Awesome
We can prototype this quickly using the MarkLogic Toolkit for Excel.
I’m just going to focus on the macro to text part for this post. I choose to handwave at events right now as that’s a topic for another time. There are many you could possibly capture, but let’s just say we (handwave, handwave) capture the WorkbookBeforeSave event. When that event fires, we’d call some function that uses the code below to get the text for each macro, XML-ize it, and add it to the .xlsm package. Score-Double-Bonus-Action: the Add-in that comes with the Toolkit already has a function addCustomXMLPart(string myxml), that provides a way for us to add the Custom XML Parts to the .xlsm.
I used Excel to record a simple macro that merges a couple of cells and does some formatting. To tell you the truth, what I know about macros in Office is more meta than practical, so this was an interesting learning exercise for me. I created my macro using this tutorial, which I found useful.
All that’s left is to create two functions: one to get the count of the macros in the workbook, and another to get each macro and serialize it as text:
//you probably need to add the reference using VBA = Microsoft.Vbe.Interop; public int getMacroCount() { VBA.VBProject proj = Globals.ThisAddIn.Application.ActiveWorkbook.VBProject; return proj.VBComponents.Count; } public string getMacroText(int idx) { VBA.VBProject proj = Globals.ThisAddIn.Application.ActiveWorkbook.VBProject; string projName = proj.Name; string componentFile = ""; object o_idx = idx; try { VBA.VBComponent vbComponent = proj.VBComponents.Item(o_idx); if (vbComponent != null) { VBA.CodeModule componentCode = vbComponent.CodeModule; componentFile = ""; if (componentCode.CountOfLines > 0) { for (int i = 0; i < componentCode.CountOfLines; i++) { componentFile += componentCode.get_Lines(i + 1, 1) + Environment.NewLine; } } } } catch (Exception e) { componentFile = "error: "+e.Message; } return componentFile; }
So we call the count, and for each one, we get the macro text, if there is any. If a macro exists, we can just wrap it in some well formed XML and add it to the .xlsm package using addCustomXMLPart(ourmacroinxml).
Now once we’ve saved the .xlsm toMarkLogic, we can query from CQ:
xquery version "1.0-ml";
declare namespace xl="http://marklogic.com/xl";
cts:search(//xl:metadata, cts:word-query("format"))
And view our results:
Notes
MSDN documentation isn’t the greatest to begin with, but I couldn’t find anything useful in the library on Microsoft.Vbe.Interop, VBProject, or VBComponents.
I did find some older blog posts and some suggestions on the forums. Consensus seems to be that you have to rely on intellisense and just play around with these objects to figure them out. Of course, after monkeying around with this stuff for a couple of hours I then knew what to look for and found some useful posts on stackoverflow. But if anyone knows where I can find decent API documentation or any useful information on using Microsoft.Vbe.Interop, that would be much appreciated.
Finally, the example here is pretty simple and you can easily add more metadata about the .xlsm and/or any embedded macros to the Custom XML Part. Take a closer look at componentCode (vbComponent.CodeModule) above. You’ll find properties that return the procedure name, signature, line counts, and more, which could be useful for reporting and diffing. Enjoy!
public int getMacroCount()
{
int count=0;
try
{
VBA.VBProject proj = Globals.ThisAddIn.Application.ActiveWorkbook.VBProject;
count = proj.VBComponents.Count;
}
catch
{
count = 0;
}
return count;
}
public string getMacroText(int idx)
{
VBA.VBProject proj = Globals.ThisAddIn.Application.ActiveWorkbook.VBProject;
string projName = proj.Name;
string componentFile = “”;
object o_idx = idx;
try
{
VBA.VBComponent vbComponent = proj.VBComponents.Item(o_idx);
if (vbComponent != null)
{
VBA.CodeModule componentCode = vbComponent.CodeModule;
componentFile = “”;
if (componentCode.CountOfLines > 0)
{
for (int i = 0; i < componentCode.CountOfLines; i++)
{
componentFile += componentCode.get_Lines(i + 1, 1) + Environment.NewLine;
}
}
}
}
catch (Exception e)
{
componentFile = “error: “+e.Message;
}
return componentFile;
}
Posted in MarkLogic, Office | Tagged: Excel, macros, VBProject, XML | Leave a Comment »
Dude, Where’s My Worksheet?
Posted by wpaven on September 22, 2009
An Excel workbook can get “hidden” when you embed it within a PowerPoint.
In Office 2007, its possible to embed an Excel worksheet within a PowerPoint slide. You can do this from within PowerPoint by navigating to the ‘Insert’ tab in the Ribbon and then clicking ‘Object’ in the Text grouping. The dialog box that appears will allow you to create one of several predefined New objects in your slide, or you can Browse to insert an existing object from a file. From here we can select an Excel .xlsx we’ve created, and the worksheet will appear within our slide.
This can be very cool and useful if we have some analysis we’d like to present. Also, whenever we select the inserted worksheet within the slide, it will automatically open in Excel so we can continue to tweak on it using Excel’s functionality.
So embed an Excel document in a PowerPoint (from an existing file). But don’t touch or select or tweak the embedding in the slide, just save the presentation .pptx somewhere. Change the .pptx extension to .zip, unzip the document, and you’ll find the embedded Excel .xlsx under /ppt/embeddings. Yep, there’s a complete Excel .xlsx package within your PowerPoint .pptx. Now open the .xlsx directly in Excel by double-clicking it; the Excel application will launch, but NOTHING IS VISIBLE! EVEN THOUGH THE SPREADSHEET IS THERE! EXCEL WILL APPEAR AS IF NO WORKBOOKS ARE OPENED!!!

You have to navigate to ‘View’ on the Ribbon and click ‘Unhide’ to see the document.
What Happened?
If you unzip the .xlsx and examine workbook.xml, you’ll find:
/workbook/bookViews/workbookView/@visibility = “hidden” .
This attribute is set for the embedded document, even though it wasn’t set in the original prior to embedding.
But guess what, if you would’ve tweaked the embedding by selecting it or resizing it in the slide prior to saving the .pptx, this attribute wouldn’t be set. You can open the .xlsx and the workbook IS VISIBLE. This seems to be an odd quirk of PowerPoint’s default behavior that I stumbled upon while embedding documents programmatically.
I can understand why a behavior like this might go unnoticed if it’s expected the embedded document will only ever be opened within PowerPoint again. In PowerPoint the embedded .xlsx opens and is always visible within the context of the slides.
Embed Shmembed.
The thing is, I’m saving these documents to MarkLogic Server, where I automatically unzip all Office 2007 documents so I can search and reuse the XML, document parts, and original source .pptx, .xlsx, .docx, etc. in the creation of new Office and other documents. In MarkLogic, one of the things I’m doing for an app is making embedded Office docs available as independent documents within search results. It’s freedom baby, yeah! And while we’re seeing this behavior with embedding, we now know that its possible to hide any workbook using that simple attribute.
So if we want our users to be able to view the Excel documents they are selecting for use, a simple solution is to just remove the @visible and rezip the .xlsx in MarkLogic, either in the saved .xlsx, or when opening the embedded doc from the rezipping of its extracted parts, or both. Remove that attribute and we’re all good for visibility.
Or, if you’re feeling particularly ornery, I guess you could add the attribute to all your organization’s Excel workbooks and watch your users go mad? Maybe there’s a use-case for that, I don’t know.
But I thought this was interesting so I’d share. Maybe this will help someone who’s tweaking on this random behavior or with AddOLEObject as well. You can have a .xlsx on your filesystem. You can unzip and see the workbook and worksheet parts. You can double-click the .xlsx and it will open in Excel, but nothing is visible, even though the worksheet information *IS* saved in the .xlsx package and available in the Excel application. It turns out the document is actually open, but hidden. You just have to ‘Unhide’ it.
Posted in MarkLogic, Office | Tagged: AddOLEObject, MarkLogic, Office, OpenXML | Leave a Comment »
Getting started with Open XML, ODF, IDML, and other zipped XML documents in MarkLogic
Posted by wpaven on August 31, 2009
Microsoft Office 2007, Open Office, and Adobe’s InDesign CS4 all have something in common. These applications all save their documents as .zip archives of XML files. As an example, take any .docx, .odt, or .idml file and change the file extension to .zip. Extract the contents and within you’ll find multiple interrelated XML documents. There are probably other document formats that follow this model as well, but these are the 3 I see questions for on a regular basis. When working with documents of this type, the first thing many people want to do is work with the XML within the zip packages. Today’s post will demonstrate tools provided by MarkLogic Server for working with these document types, as well as some methods for managing and using the extracted pieces.
To play along at home, install MarkLogic Server, and setup CQ. The following examples work with the default Documents database and default Docs HTTP Server. I’ll be using Word 2007 documents, as that’s kinda my thing, but you can use the same code, tools, and methods with the other file formats as well. Just replace the .docx in the samples below with a .odt or .idml file, etc., and everything will just auto-magically work. Ultimately they’re all just zip files of XML and associated resources. Let’s do this!
The Office OpenXML Extract pipeline
It doesn’t get any easier than this. If you’re working with Office 2007 documents, you can quickly configure the Server to automatically unzip these files when you save them to your database. The XML within will be saved in a directory named for the original file, maintaining the naming and directory structure they had within the .zip. To make this happen, we just need to install content processing and attach the Office OpenXML Extract pipeline.
Install Content Processing
In the Admin UI, navigate to:
Databases -> Documents -> Content Processing
We see a message informing us that Content Processing is not installed. That’s ok. Click the ‘Install’ tab.
Next we’re presented with an option to ‘enable conversion’. If you’re running the Community Edition, leave this as false. Click ‘install’.
Finally, we’re presented with the message ‘Content Processing will be installed for the database test without conversion.’ Click ‘ok’.
Note: There are conversion utilities available that require a separate license. If you are running a version of the Server other than Community, feel free to install these utilities by setting the enable conversion option to true. For those on the Community Edition, installing content processing allows us to build our own conversion utilities, as well as take advantage of other available pipelines that don’t require the separate license. w00t!
Now that we’ve installed Content Processing, we just need to attach the Office OpenXML Extract pipeline.
Attach a Content Processing Pipeline
In the Admin UI, navigate to:
Databases -> Documents -> Content Processing -> Domains -> Default Documents -> Pipelines
Make sure that the pipelines ‘Status Change Handling’ and ‘Office OpenXML Extract’ are checked. Then click ‘ok’.
You’ll now see those pipelines are attached.

Let’s take advantage our freshly configured pipeline. Open CQ and evaluate the following to insert a document. I’m inserting C:\foo.docx. The default domain for CPF is the root directory “/”, so remember to prefix the name of your file with “/” in the uri option so the document will be processed.
xquery version "1.0-ml";
xdmp:document-load("C:\foo.docx",
<options xmlns="xdmp:document-load">
<uri>/foo.docx</uri>
</options>)
To validate the pipeline ran for the document, evaluate the following in CQ:
xquery version "1.0-ml";
xdmp:document-properties("/foo.docx"),
xdmp:document-properties("/foo_docx_parts/word/document.xml")
Notice in the results returned that A) there are CPF properties on the .docx informing us the XML has been extracted, and B) we are looking at the properties for an extracted Part. The document.xml is in a sibling directory of the original foo.docx named /foo_docx_parts/. We now have the original zip package and all its extracted pieces available to us in the Server. With Content Processing installed and our pipeline configured, anytime we save an Office document to the Server in the future they will be automatically unzipped and their parts will be saved similarly for us in a _parts directory.

Note: The .docx and the parts directory are linked. Delete the .docx, and the related _parts folder and its extracted pieces will be deleted as well.
So, What are pipelines? What’s Content Processing? What was all that configuration we did?
CPF in a Nutshell ( the Content Processing Framework )
MarkLogic Server includes a framework for processing content that we refer to lovingly as CPF. CPF stands for the Content Processing Framework. The gist is this: a document has a lifecycle that starts with creation and advances as users/applications update and modify the document. CPF provides a way to take action on documents based on where they are in their lifecyle.
A pipeline is an XML document that describes a set of content processing steps. It defines the steps that occur during the processing of documents and defines actions that occur at each step. These actions can be found in supporting XQuery functions and modules. CPF was built for you to create your own content processing applications, with your own content processing code, and following your own logical and business processes.
If you’d like to know more, you can check out the Content Processing Framework guide, I’ve also provided a quick intro here as well.
Note: the quick intro linked here was written for MarkLogic Server 3.2. For 4.*.* some minor updates are required. I’ll revisit and update that post in the future, but if you’re interested, the guide should provide enough info to help you modify the example successfully.
But what if we’re not working with Office 2007 documents? Or we don’t want to use CPF? Can we still unzip these documents, extract the individual XML files and insert them into our MarkLogic Database? Yes.
xdmp:zip utilities
When working with .zip files, you’ll want to take a look at the functions xdmp:zip-create(), xdmp:zip-get(), and xdmp:zip-manifest().
I have a Word document, sampleManuscript.docx, that I’ve saved in the directory C:\test. I can take a look at the names of the files inside the .docx by evaluating the following in CQ:
xquery version "1.0-ml";
xdmp:zip-manifest(xdmp:document-get("C:\test\sampleManuscript.docx"))
MarkLogic provides utilities for working with files on the filesystem, but let’s load our document into the Server.
Note: For the following examples, we want to insure CPF does not process our documents for us. So navigate to pipelines, as we did in our configuration steps above. Uncheck the Office OpenXML Extract pipeline, and click ‘ok’. This will detach the pipeline so it will not action the example document below on load.
xquery version "1.0-ml";
xdmp:document-load("C:\test\sampleManuscript.docx",
<options xmlns="xdmp:document-load">
<uri>/myManuscript/sampleManuscript.docx</uri>
</options>)
The above returns the empty sequence. You can validate that your document inserted properly by clicking ‘explore’ in CQ. Or by evaluating the following:
xquery version "1.0-ml";
xdmp:document-properties("/myManuscript/sampleManuscript.docx")
At a minimum, you’ll see a last-modified metadata timestamp for the document. If you’ve enabled content processing for the database as we did above, you will see other cpf:* properties. Assuming no pipelines are attached, the .docx will be in cpf:state initial and no action has been taken to extract its XML parts.
If we want to access a file within the .docx, assuming we know the name of the piece we want in the .zip, we can get it using xdmp:zip-get(). The following returns the XML for the document.xml file located within our .docx package.
xquery version "1.0-ml";
xdmp:zip-get(fn:doc("/myManuscript/sampleManuscript.docx"),"word/document.xml")
Instead of having to know the names of individual pieces within a .zip package and/or having to extract individual files each time we want to access a piece of XML, let’s just unzip and extract the pieces from our.docx and insert them into a directory, similar to how CPF did for us.
xquery version "1.0-ml";
declare namespace zip="xdmp:zip";
let $doc := "/myManuscript/sampleManuscript.docx"
let $directory-uri := "/myManuscript/sampleManuscript_docx_parts/"
let $zipfile := fn:doc($doc)
let $manifest := xdmp:zip-manifest($zipfile)
for $part-name in $manifest/zip:part
let $options := if ($part-name = "/_rels/.rels") then
<options xmlns="xdmp:zip-get">
<format>xml</format>
</options>
else
<options xmlns="xdmp:zip-get"/>
let $part := xdmp:zip-get($zipfile, $part-name, $options)
let $part-uri := fn:concat($directory-uri, $part-name)
return xdmp:document-insert($part-uri, $part)
We just loop through the manifest, extract the pieces, and insert into a directory. Here we’re explicitly telling the server to treat .rels as XML. With a little modification and refinement, we can easily take the above and make it into a re-usable module. We could even use this as a starting point for creating our own CPF pipeline action.
Zip it!

Now that we’ve extracted XML documents from zip files and saved them to the Server, lets zip ‘em back up.
When we extracted the pieces for sampleManuscript.docx above, we saved them to a directory named /sampleManuscript_docx_parts/ and we specified this as a subdirectory of the folder /myManuscript/.
When we insert a document to MarkLogic, we specify the name of the file we are saving with a uri parameter (xs:string). If this string has “/”s in it, the strings in between the slashes will be treated as directories. So “/” marks the path for our content’s location on the Server. By default, MarkLogic is configured for automatic directory creation. This is done as many people like using directories to manage their content, and it can be very helpful when loading documents using webDAV. It’s also very helpful for managing unzipped XML packages, as we want to retain the original structure of the document, so when we zip the pieces back up the document will open successfully it in its respective application.
We saved our extracted documents pieces to directories, but we don’t know what pieces were inserted, or what subdirectories were created. If we enable the uri lexicon for our database, we can evaluate a query using cts:uris that will provide us the uris for all the extracted pieces. We can then use this to create a manifest and zip the pieces back up.
Validate Directory Creation is Automatic
In Admin UI, navigate to:
Databases -> Documents
Scroll down until you see the property ‘directory creation‘. The dropdown selection is set to ‘automatic‘.
For more info on directories, refer to the Admin’s Guide, and Application Developer’s Guide.
Enable URI Lexicons
Scroll up, and a few properties up from ‘directory creation’ you’ll find ‘uri lexicon‘. Set the enabled option to ‘true’ and click ‘ok’.
Now head back to CQ, and evaluate the following:
xquery version "1.0-ml";
let $directory := "/myManuscript/sampleManuscript_docx_parts/"
return cts:uris("","document",cts:directory-query($directory,"infinity"))
The results are a list of all the XML documents we extracted from sampleManuscript.docx. We can zip them back up and save locally by evaluating the following:
xquery version "1.0-ml";
let $directory := "/myManuscript/sampleManuscript_docx_parts/"
let $uris := cts:uris("","document",cts:directory-query($directory,"infinity"))
let $parts := (for $i in $uris let $x := fn:doc($i) return $x)
let $manifest := <parts xmlns="xdmp:zip">
{
for $i in $uris
let $file := fn:substring-after($i,$directory)
let $part := <part>{$file}</part>
return $part
}
</parts>
let $pkg := xdmp:zip-create($manifest, ($parts))
return xdmp:save("C:\test.docx",$pkg)
Double-click on the saved file to open it in Word. I’ve named it test.docx to demonstrate we can name it anything we want too. More likely we would’ve used the document’s original name, sampleManuscript.docx.
Most of the time though, we’ll want users to be able to dynamically generate their documents on the fly. Place the following code in a module named opendocx.xqy. Place it under the /Docs directory for the Server, which is found in the directory where MarkLogic is installed. On Windows the default is C:\Program Files\MarkLogic\Docs.
xquery version "1.0-ml";
let $directory := "/myManuscript/sampleManuscript_docx_parts/"
let $uris := cts:uris("","document",cts:directory-query($directory,"infinity"))
let $parts := (for $i in $uris let $x := fn:doc($i) return $x)
let $manifest := <parts xmlns="xdmp:zip">
{
for $i in $uris
let $file := fn:substring-after($i,$directory)
let $part := <part>{$file}</part>
return $part
}
</parts>
let $filename := "test.docx"
let $pkg := xdmp:zip-create($manifest, ($parts))
let $disposition := concat("attachment; filename=""",$filename,"""")
let $x := xdmp:add-response-header("Content-Disposition", $disposition)
let $x:= xdmp:set-response-content-type("application/vnd.openxmlformats-officedocument.wordprocessingml.document")
return $pkg
In a browser, navigate to the url http://localhost:8000/opendocx.xqy.
Assuming Office 2007 is installed on your machine, the document opens right up into its respective application.

We can just open the .docx as well. Save the following under /Docs as opendocx2.xqy. Update the url in your browser, and the file will again open from the Server into Word.
xquery version "1.0-ml";
let $docname := "/myManuscript/sampleManuscript.docx"
let $pkg := fn:doc($docname)
let $filename := "test.docx"
let $disposition := concat("attachment; filename=""",$filename,"""")
let $x := xdmp:add-response-header("Content-Disposition", $disposition)
let $x:= xdmp:set-response-content-type("application/vnd.openxmlformats-officedocument.wordprocessingml.document")
return $pkg
And there you have it! To the Server and back again! Kind of like the Hobbit, but you didn’t need Gandalf or a bunch of dwarves to help you make the journey.
That should be more than enough to get you started, and to continue with something super awesome, just ponder this: when you have a solid understanding of the document format you’re working with, you can generate Word, Excel, PowerPoint, OpenOffice, InDesign4, and any other document type on the Server, and you don’t even need the original application to start with! We can generate these documents dynamically, on-the-fly, and serve ‘em up to our users who use these applications, but to us on the Server, it’s just a set of XML and related parts. Cheers!
Posted in MarkLogic | Tagged: IDML, ODF, OpenXML, zip | 1 Comment »
Load A Custom UI for Office 2007 from MarkLogic Server
Posted by wpaven on August 7, 2009
We all know that with Office 2007 the traditional Office file navigation menus and toolbars were replaced with The Ribbon. The Ribbon a.k.a. RibbonX a.k.a. The Office Fluent Ribbon a.k.a. The Fluent UI a.k.a. …
The naming has evolved over time, but basically the Ribbon is the interface, now called the Fluent UI. RibbonX is the original name given to the XML that you can use to customize the Ribbon/Fluent UI to add your own tabs and controls.
Many examples demonstrate how to load a custom Ribbon UI from within a Document-Level customization or an Application-Level Add-in. These solutions load the XML for the Ribbon from within either the Document itself or the deployed Add-in solution respectively. But what if we want to keep the XML for our custom Ribbon interface someplace else? How about someplace where we can access and modify it so we can change the user experience in Office without requiring our users to re-install an Add-in or use a new Document template?
It is possible to load a customized Ribbon from an external resource. Today’s post demonstrates how to load a custom Ribbon for an Application-Level Add-in from Ribbon XML that’s been saved in MarkLogic Server.
Save the Ribbon XML in MarkLogic
To keep it brief, we’ll just assume we’re using the default Documents database and the Docs HTTP Server in MarkLogic.
Open CQ in your browser and evaluate the following:
xquery version "1.0-ml";
xdmp:document-insert("myribbon.xml",
<customUI xmlns="http://schemas.microsoft.com/office/2006/01/customui">
<ribbon>
<tabs>
<tab id="MyTab" label="My Tab">
<group id="MyGroup" label="My Group">
<button id="Button1"
size="large"
label="My Button"
screentip="My Button Screentip"
onAction="OnClick"
imageMso="HappyFace" />
</group>
</tab>
</tabs>
</ribbon>
</customUI>)
We can manipulate the Ribbon interface with a minimal amount of XML. The tabs, groups, and controls are actually defined using XML elements. This example creates a new tab, which includes a new control, a button. The attributes of controls define callbacks. Callbacks are the functions that will be called when a user interacts with the controls on our Ribbon. For our button @onAction defines the callback which refers to a function named “OnClick”.
We now need a module that the Add-in can call to fetch the XML for the Ribbon. Create a file called ribbon.xqy and place it under /Docs. The contents of the file is simply
xquery version "1.0-ml";
xdmp:quote(fn:doc("myribbon.xml")
We use xdmp:quote() to return the XML as a string as that’s what our Add-in will parse to create the customization in Office for us.
Create an Add-in for Word
First, create an Add-in project for Word:
- Start Visual Studio 2008
- From the File menu, select New -> Project
- In Project Types, select Office 2007
- In Templates, select Word 2007 Add-in
- Name your Project, I’m calling mine MyRibbonAddin
- click OK
Now that we have our project, we need to add a Ribbon to it.
- In the Solution Explorer on the right, right-click the MyRibbonAddin project and select Add -> New Item
- In Templates, select Ribbon(XML)
- Click Add
This adds a Ribbon1.cs and Ribbon1.xml to our project.
You might’ve noticed there was a Ribbon(Designer) available in the templates as well. Visual Studio 2008 comes with a Ribbon Designer that allows you to drag and drop components to your Ribbon. You can then use this Ribbon in your deployed Add-in. You can also export the Ribbon from the Designer to XML, which you may find useful. We selected Ribbon(XML) as we just want to use the simple Ribbon1.cs that is generated for us.
Now that we’ve started our Ribbon customization, we just need to load the Ribbon from MarkLogic, define our callback so our button will do something, and then enable the Ribbon within our Add-in.
Open Ribbon1.cs and go to the GetCustomUI function. By default we see its loading the Ribbon1.xml it generated. Change the function to look like the following:
public string GetCustomUI(string ribbonID) { string url = "http://localhost:8000/ribbon.xqy"; System.Net.WebClient Client = new System.Net.WebClient(); Client.Credentials = new System.Net.NetworkCredential("user", "password"); string ribbon = Client.DownloadString(url); Client.Dispose(); return ribbon; //return GetResourceText("MyRibbonAddin.Ribbon1.xml"); }
You’ll want to update the network credentials to be the ones you defined when you installed MarkLogic.
In the Ribbon Callbacks #region, define the OnClick function:
public void OnClick(Office.IRibbonControl ribbonControl)
{
System.Windows.Forms.MessageBox.Show("clicked");
}
Finally, there’s a TODO section that’s been commented out at the top of Ribbon1.cs. To enable the Ribbon in the Add-in, copy the following function from the TODO section, paste and uncomment it in ThisAddin.cs, just above the ThisAddin_Startup function:
protected override Microsoft.Office.Core.IRibbonExtensibility CreateRibbonExtensibilityObject()
{
return new Ribbon1();
}
Hit F5 on your keyboard to test the customization in debug mode. Once Word starts, navigate to “My Tab” and click the Smiley Face “My Button”. Our MessageBox tells us we’ve clicked the button.

Boom! That’s it. Pretty cool. We can alter the Office interface by feeding it Ribbon XML at startup. And we can query the XML from an external resource like MarkLogic Server.
Next Steps
The example provided here is quite simple. But you now have the basic steps required to create your own customization for the Office UI. There’s a wealth of information on the Ribbon/Fluent UI online. The Visual Studio 2008 Ribbon designer can help to familiarize you with the basic RibbonX elements as well.
Now that you have a custom UI, you can improve the callback functions to interact with the document a user is currently authoring. You can also use Ribbon controls to send and receive information from MarkLogic. In this way you can reuse content stored in MarkLogic in a document as well as save selected text back to MarkLogic for future reuse.
It is possible to dynamically generate a Ribbon customization from a query as well. You may want to do this depending on the content your author is accessing, or based on their approved level of access for that information, or both. Just modify the ribbon.xqy we created earlier in the post.
As an example, in the past I’ve found it useful to dynamically generate dropdown menus based on Word document metadata searches to provide users related reusable content chunks to insert into their documents.

Final Ribbon Notes
The Ribbon customization loads once, whenever you start the Office application. You can’t change the layout or controls at runtime. You can however, update values within those controls. So in the search dropdown example above, it is possible to update the contents of the dropdown based on some event, but you can’t change a dropdown to a checkbox while the Office application is running. Capiche?
The good news is that if you’ve generalized your callbacks you can update the ribbon.xqy and any other .xqy files on the server, and the next time a user restarts their Office application, they can have new functionality available to them. This could be useful so you don’t have to constantly recompile your Add-in and/or send your users a new .msi to install.
How did we get the Smiley Face? There’s a list of icons you can download which provides you ids for several of the icons available throughout Office. There’s also a list of control ids available for download. This second list shows you all the built in controls available to you for use in Office.
Which brings us to: Yes, you can change the layout of the Ribbon provided in Office. You can override the functionality of existing controls, and you can remove the existing Ribbon entirely and provide your own, forcing users to only use the controls you provide. But think of your users and what makes sense. We want to help them, not hinder them. You may find the Office 2007 UI Style guide useful.
Finally, the MarkLogic Toolkit for Word and MarkLogic Toolkit for Excel are both free Application-Level Add-ins and they are open source. If you’re interested, these solutions may jump start your development, as well as provide you with other insights into how to make your customizations more useful. They both provide multiple functions for interacting with documents while users are authoring them. The Ribbon can be useful, but real estate is limited. The Toolkit Add-ins make extensive use of a Custom Task Pane to enable authors to integrate information in MarkLogic and Office 2007. Sometimes the Ribbon, Task Pane combo can be very useful.
Posted in MarkLogic, Office | Tagged: FluentUI, MarkLogic, Office, Ribbon, RibbonX | 1 Comment »







