Notes on WST StructuredDocument
-------------------------------

Created: 2010/11/26
References: WST 3.1.x, Eclipse 3.5 Galileo

To manipulate XML documents in refactorings, we sometimes use the WST/SSE
"StructuredDocument" API. There isn't much documentation on this out there,
so this is a short explanation of how it works, based entirely on
_empirical_ evidence. As such, it must be taken with a grain of salt.

Examples of usage can be found in
    sdk/eclipse/plugins/com.android.ide.eclipse.adt/src/com/android/ide/eclipse/adt/internal/refactorings/


1- Get a document instance
--------------------------

To get a document from an existing IFile resource:

    IModelManager modelMan = StructuredModelManager.getModelManager();
    IStructuredDocument sdoc = modelMan.createStructuredDocumentFor(file);

Note that IStructuredDocument and all the associated interfaces we'll use
below are located under org.eclipse.wst.sse.core.internal.provisional,
meaning they _might_ change later.

Also note that this parses the content of the file on disk, not of a buffer
with pending unsaved modifications opened in an editor.

There is a counterpart for non-existent resources:

    IModelManager.createNewStructuredDocumentFor(IFile)

However our goal so far has been to _parse_ existing documents, find
the place that we wanted to modify and then generate a TextFileChange
for a refactoring operation. Consequently this document doesn't say
anything about using this model to modify content directly.


2- Structured Document overview
-------------------------------

The IStructuredDocument is organized in "regions", which are little pieces
of text.

The document contains a list of region collections, each one being
a list of regions. Each region has a type, as well as text.
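
To make the two-level shape concrete, here is a minimal plain-Java sketch of it.
It deliberately does not use the WST classes: ToyRegion, ToyRegionCollection and
sample() are hypothetical names made up for this note, and the type strings are
simply borrowed from the region names WST reports for XML. It models the fragment
"<color/>" followed by a newline.

```java
import java.util.Arrays;
import java.util.List;

class RegionModelSketch {

    /** Hypothetical inner region: a type plus the text it covers (not a WST class). */
    static final class ToyRegion {
        final String type;
        final String text;

        ToyRegion(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    /** Hypothetical region collection: its own type plus an ordered list of regions. */
    static final class ToyRegionCollection {
        final String type;
        final List<ToyRegion> regions;

        ToyRegionCollection(String type, ToyRegion... regions) {
            this.type = type;
            this.regions = Arrays.asList(regions);
        }

        /** The text of a collection is the concatenation of its regions' text. */
        String text() {
            StringBuilder sb = new StringBuilder();
            for (ToyRegion r : regions) {
                sb.append(r.text);
            }
            return sb.toString();
        }
    }

    /** Models "<color/>" followed by a newline as two region collections. */
    static List<ToyRegionCollection> sample() {
        return Arrays.asList(
            new ToyRegionCollection("XML_TAG_NAME",
                new ToyRegion("XML_TAG_OPEN", "<"),
                new ToyRegion("XML_TAG_NAME", "color"),
                new ToyRegion("XML_EMPTY_TAG_CLOSE", "/>")),
            new ToyRegionCollection("XML_CONTENT",
                new ToyRegion("XML_CONTENT", "\n")));
    }

    public static void main(String[] args) {
        for (ToyRegionCollection c : sample()) {
            System.out.println(c.type + " (" + c.regions.size() + " regions)");
        }
    }
}
```

The point of the sketch is only the nesting: a tag is one collection holding one
region per token, and the text between tags is a collection of its own.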

Since we use this to parse XML, let's look at this XML example:

<?xml version="1.0" encoding="utf-8"?> \n
<resources> \n
    <color/>
    <string name="my_string">Some Value</string>  <!-- comment -->\n
</resources>


This will result in the following regions and sub-regions:
(all the constants below are located in DOMRegionContext)

XML_PI_OPEN
    XML_PI_OPEN:<?
    XML_TAG_NAME:xml
    XML_TAG_ATTRIBUTE_NAME:version
    XML_TAG_ATTRIBUTE_EQUALS:=
    XML_TAG_ATTRIBUTE_VALUE:"1.0"
    XML_TAG_ATTRIBUTE_NAME:encoding
    XML_TAG_ATTRIBUTE_EQUALS:=
    XML_TAG_ATTRIBUTE_VALUE:"utf-8"
    XML_PI_CLOSE:?>

XML_CONTENT
    XML_CONTENT:\n

XML_TAG_NAME
    XML_TAG_OPEN:<
    XML_TAG_NAME:resources
    XML_TAG_CLOSE:>

XML_CONTENT
    XML_CONTENT:\n + whitespace before color

XML_TAG_NAME
    XML_TAG_OPEN:<
    XML_TAG_NAME:color
    XML_EMPTY_TAG_CLOSE:/>

XML_CONTENT
    XML_CONTENT:\n + whitespace before string

XML_TAG_NAME
    XML_TAG_OPEN:<
    XML_TAG_NAME:string
    XML_TAG_ATTRIBUTE_NAME:name
    XML_TAG_ATTRIBUTE_EQUALS:=
    XML_TAG_ATTRIBUTE_VALUE:"my_string"
    XML_TAG_CLOSE:>

XML_CONTENT
    XML_CONTENT:Some Value

XML_TAG_NAME
    XML_END_TAG_OPEN:</
    XML_TAG_NAME:string
    XML_TAG_CLOSE:>

XML_CONTENT
    XML_CONTENT: (2 spaces before the comment)

XML_COMMENT_TEXT
    XML_COMMENT_OPEN:<!--
    XML_COMMENT_TEXT: comment
    XML_COMMENT_CLOSE:-->

XML_CONTENT
    XML_CONTENT: \n after comment

XML_TAG_NAME
    XML_END_TAG_OPEN:</
    XML_TAG_NAME:resources
    XML_TAG_CLOSE:>

XML_CONTENT
    XML_CONTENT:


3- Iterating through regions
----------------------------

To iterate through all regions, we need to process the list of top-level regions and then
iterate over inner regions:

    for (IStructuredDocumentRegion regions : sdoc.getStructuredDocumentRegions()) {
        // process inner regions
        for (int i = 0; i < regions.getNumberOfRegions(); i++) {
            ITextRegion region = regions.getRegions().get(i);
            String type = region.getType();
            // the text of an inner region is queried from the outer region
            String text = regions.getText(region);
        }
    }

Each "region collection" basically matches one XML tag, with sub-regions for all the tokens
inside a tag.

Note that an XML_CONTENT region is the text between tags (often just whitespace), what is
known as a TEXT node in the W3C DOM.

Also note that each outer region has a type, but the inner regions also reuse a similar type.
So for example an outer XML_TAG_NAME region collection is a proper XML tag, and it will contain
an opening tag, a closing tag but also an XML_TAG_NAME that is the tag name itself.

Surprisingly, the inner regions do not have many access methods we can use on them, except their
type and start/length/end. There are two variants of the length and end methods:
- getLength() and getEnd() take any whitespace into account.
- getTextLength() and getTextEnd() exclude some typical trailing whitespace.

Note that regarding the trailing whitespace, empirical evidence shows that in the XML case
here, the only case where it matters is in a tag such as <string name="my_string">: for the
XML_TAG_NAME region, getLength is 7 (string + space) and getTextLength is 6 (string, no space).
Spacing between XML elements is its own collapsed region.

If you want the text of the inner region, you actually need to query it from the outer region.
The outer IStructuredDocumentRegion (the region collection) contains lots more useful access
methods, some of which return details on the inner regions:
- getText : without the whitespace.
- getFullText : with the whitespace.
- getStart / getLength / getEnd : type-dependent offset, including whitespace.
- getStart / getTextLength / getTextEnd : type-dependent offset, excluding "irrelevant" whitespace.
- getStartOffset / getEndOffset / getTextEndOffset : relative to document.
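
The length versus text-length arithmetic can be reproduced with a small plain-Java
sketch. This is a toy model, not the WST implementation: ToyRegion, layout() and
stringTag() are hypothetical names, and the sketch assumes that inner-region offsets
are counted from the start of their tag and that only trailing whitespace separates
the two lengths. It lays out the tokens of <string name="my_string"> end to end.

```java
import java.util.ArrayList;
import java.util.List;

class RegionLengthSketch {

    /** Hypothetical stand-in for an inner region; NOT the WST ITextRegion. */
    static final class ToyRegion {
        final String type;
        final int start;      // offset inside the enclosing tag (assumption)
        final int length;     // token plus trailing whitespace
        final int textLength; // token without trailing whitespace

        ToyRegion(String type, int start, String fullText) {
            this.type = type;
            this.start = start;
            this.length = fullText.length();
            this.textLength = trimTrailing(fullText).length();
        }

        int getEnd()     { return start + length; }
        int getTextEnd() { return start + textLength; }
    }

    /** Removes trailing whitespace only; leading characters are left untouched. */
    static String trimTrailing(String s) {
        int end = s.length();
        while (end > 0 && Character.isWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(0, end);
    }

    /** Lays tokens end to end, the way the sub-regions of one tag follow each other. */
    static List<ToyRegion> layout(String[][] typedTokens) {
        List<ToyRegion> out = new ArrayList<>();
        int offset = 0;
        for (String[] t : typedTokens) {
            ToyRegion r = new ToyRegion(t[0], offset, t[1]);
            out.add(r);
            offset = r.getEnd();
        }
        return out;
    }

    /** Tokens of <string name="my_string">; the tag name carries the trailing space. */
    static List<ToyRegion> stringTag() {
        return layout(new String[][] {
            {"XML_TAG_OPEN", "<"},
            {"XML_TAG_NAME", "string "},
            {"XML_TAG_ATTRIBUTE_NAME", "name"},
            {"XML_TAG_ATTRIBUTE_EQUALS", "="},
            {"XML_TAG_ATTRIBUTE_VALUE", "\"my_string\""},
            {"XML_TAG_CLOSE", ">"},
        });
    }

    public static void main(String[] args) {
        for (ToyRegion r : stringTag()) {
            System.out.println(r.type + " start=" + r.start
                    + " length=" + r.length + " textLength=" + r.textLength);
        }
    }
}
```

In this model the XML_TAG_NAME region has a length of 7 and a text length of 6,
matching the observation above, while every other token in the tag has identical
values for both.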

Empirical evidence shows that there is no discernible difference between the getStart/getEnd
values and those returned by getStartOffset/getEndOffset. Still, follow the javadoc when
choosing between them.

All offsets start at zero.

Given a region collection, you can also browse regions either using the getRegions() list, or
using getFirst/getLastRegion, or using getRegionAtCharacterOffset(). Iterating the region
list seems the most useful scenario. There's no actual iterator provided for inner regions.

There are a few other methods available in the region classes. This is not an exhaustive list.


----