page.title=Parsing XML Data
parent.title=Performing Network Operations
parent.link=index.html

trainingnavtop=true

previous.title=Managing Network Usage
previous.link=managing.html

@jd:body

<div id="tb-wrapper">
<div id="tb">



<h2>This lesson teaches you to</h2>
<ol>
  <li><a href="#choose">Choose a Parser</a></li>
  <li><a href="#analyze">Analyze the Feed</a></li>
  <li><a href="#instantiate">Instantiate the Parser</a></li>
  <li><a href="#read">Read the Feed</a></li>
  <li><a href="#parse">Parse XML</a></li>
  <li><a href="#skip">Skip Tags You Don't Care About</a></li>
  <li><a href="#consume">Consume XML Data</a></li>
</ol>

<h2>You should also read</h2>
<ul>
  <li><a href="{@docRoot}guide/webapps/index.html">Web Apps Overview</a></li>
</ul>

<h2>Try it out</h2>

<div class="download-box">
  <a href="{@docRoot}shareables/training/NetworkUsage.zip"
class="button">Download the sample</a>
 <p class="filename">NetworkUsage.zip</p>
</div>

</div>
</div>

<p>Extensible Markup Language (XML) is a set of rules for encoding documents in
machine-readable form. XML is a popular format for sharing data on the internet.
Websites that frequently update their content, such as news sites or blogs,
often provide an XML feed so that external programs can keep abreast of content
changes. Downloading and parsing XML data is a common task for network-connected
apps. This lesson explains how to parse XML documents and use their data.</p>

<h2 id="choose">Choose a Parser</h2>

<p>We recommend {@link org.xmlpull.v1.XmlPullParser}, which is an efficient and
maintainable way to parse XML on Android. Historically Android has had two
implementations of this interface:</p>

<ul>
  <li><a href="http://kxml.sourceforge.net/"><code>KXmlParser</code></a>
  via {@link org.xmlpull.v1.XmlPullParserFactory#newPullParser XmlPullParserFactory.newPullParser()}.
  </li>
  <li><code>ExpatPullParser</code>, via
  {@link android.util.Xml#newPullParser Xml.newPullParser()}.
  </li>
</ul>

<p>Either choice is fine. The
example in this section uses <code>ExpatPullParser</code>, via
{@link android.util.Xml#newPullParser Xml.newPullParser()}. </p>

<h2 id="analyze">Analyze the Feed</h2>

<p>The first step in parsing a feed is to decide which fields you're interested in.
The parser extracts data for those fields and ignores the rest.</p>

<p>Here is an excerpt from the feed that's being parsed in the sample app. Each
post to <a href="http://stackoverflow.com">StackOverflow.com</a> appears in the
feed as an <code>entry</code> tag that contains several nested tags:</p>

<pre>&lt;?xml version="1.0" encoding="utf-8"?&gt;
&lt;feed xmlns="http://www.w3.org/2005/Atom" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" ...&gt;
&lt;title type="text"&gt;newest questions tagged android - Stack Overflow&lt;/title&gt;
...
    &lt;entry&gt;
    ...
    &lt;/entry&gt;
    &lt;entry&gt;
        &lt;id&gt;http://stackoverflow.com/q/9439999&lt;/id&gt;
        &lt;re:rank scheme="http://stackoverflow.com"&gt;0&lt;/re:rank&gt;
        &lt;title type="text"&gt;Where is my data file?&lt;/title&gt;
        &lt;category scheme="http://stackoverflow.com/feeds/tag?tagnames=android&amp;sort=newest/tags" term="android"/&gt;
        &lt;category scheme="http://stackoverflow.com/feeds/tag?tagnames=android&amp;sort=newest/tags" term="file"/&gt;
        &lt;author&gt;
            &lt;name&gt;cliff2310&lt;/name&gt;
            &lt;uri&gt;http://stackoverflow.com/users/1128925&lt;/uri&gt;
        &lt;/author&gt;
        &lt;link rel="alternate" href="http://stackoverflow.com/questions/9439999/where-is-my-data-file" /&gt;
        &lt;published&gt;2012-02-25T00:30:54Z&lt;/published&gt;
        &lt;updated&gt;2012-02-25T00:30:54Z&lt;/updated&gt;
        &lt;summary type="html"&gt;
            &lt;p&gt;I have an Application that requires a data file...&lt;/p&gt;

        &lt;/summary&gt;
    &lt;/entry&gt;
    &lt;entry&gt;
    ...
    &lt;/entry&gt;
...
&lt;/feed&gt;</pre>

<p>The sample app
extracts data for the <code>entry</code> tag and its nested tags
<code>title</code>, <code>link</code>, and <code>summary</code>.</p>


<h2 id="instantiate">Instantiate the Parser</h2>

<p>The next step is to
instantiate a parser and kick off the parsing process. In this snippet, a parser
is initialized to not process namespaces, and to use the provided {@link
java.io.InputStream} as its input. It starts the parsing process with a call to
{@link org.xmlpull.v1.XmlPullParser#nextTag() nextTag()} and invokes the
<code>readFeed()</code> method, which extracts and processes the data the app is
interested in:</p>

<pre>public class StackOverflowXmlParser {
    // We don't use namespaces
    private static final String ns = null;

    public List&lt;Entry&gt; parse(InputStream in) throws XmlPullParserException, IOException {
        try {
            XmlPullParser parser = Xml.newPullParser();
            parser.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, false);
            parser.setInput(in, null);
            parser.nextTag();
            return readFeed(parser);
        } finally {
            in.close();
        }
    }
 ...
}</pre>

<h2 id="read">Read the Feed</h2>

<p>The <code>readFeed()</code> method does the actual work of processing the
feed. It looks for elements tagged "entry" as a starting point for recursively
processing the feed. If a tag isn't an {@code entry} tag, it skips it. Once the whole
feed has been recursively processed, <code>readFeed()</code> returns a {@link
java.util.List} containing the entries (including nested data members) it
extracted from the feed.
This {@link java.util.List} is then returned by the
parser.</p>

<pre>
private List&lt;Entry&gt; readFeed(XmlPullParser parser) throws XmlPullParserException, IOException {
    List&lt;Entry&gt; entries = new ArrayList&lt;Entry&gt;();

    parser.require(XmlPullParser.START_TAG, ns, "feed");
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.getEventType() != XmlPullParser.START_TAG) {
            continue;
        }
        String name = parser.getName();
        // Starts by looking for the entry tag
        if (name.equals("entry")) {
            entries.add(readEntry(parser));
        } else {
            skip(parser);
        }
    }
    return entries;
}</pre>


<h2 id="parse">Parse XML</h2>


<p>The steps for parsing an XML feed are as follows:</p>
<ol>

  <li>As described in <a href="#analyze">Analyze the Feed</a>, identify the tags you want to include in your app. This
example extracts data for the <code>entry</code> tag and its nested tags
<code>title</code>, <code>link</code>, and <code>summary</code>.</li>

<li>Create the following methods:

<ul>

<li>A "read" method for each tag you're interested in. For example,
<code>readEntry()</code>, <code>readTitle()</code>, and so on. The parser reads
tags from the input stream. When it encounters a tag named <code>entry</code>,
<code>title</code>,
<code>link</code> or <code>summary</code>, it calls the appropriate method
for that tag. Otherwise, it skips the tag.
</li>

<li>Methods to extract data for each different type of tag and to advance the
parser to the next tag. For example:
<ul>

<li>For the <code>title</code> and <code>summary</code> tags, the parser calls
<code>readText()</code>. This method extracts data for these tags by calling
<code>parser.getText()</code>.</li>

<li>For the <code>link</code> tag, the parser extracts data for links by first
determining if the link is the kind
it's interested in.
Then it uses <code>parser.getAttributeValue()</code> to
extract the link's value.</li>

<li>For the <code>entry</code> tag, the parser calls <code>readEntry()</code>.
This method parses the entry's nested tags and returns an <code>Entry</code>
object with the data members <code>title</code>, <code>link</code>, and
<code>summary</code>.</li>

</ul>
</li>
<li>A helper <code>skip()</code> method that skips a tag together with everything nested inside it. For more discussion of this topic, see <a href="#skip">Skip Tags You Don't Care About</a>.</li>
</ul>

  </li>
</ol>

<p>This snippet shows how the parser parses entries, titles, links, and summaries.</p>
<pre>public static class Entry {
    public final String title;
    public final String link;
    public final String summary;

    private Entry(String title, String summary, String link) {
        this.title = title;
        this.summary = summary;
        this.link = link;
    }
}

// Parses the contents of an entry. If it encounters a title, summary, or link tag, hands them off
// to their respective "read" methods for processing. Otherwise, skips the tag.
private Entry readEntry(XmlPullParser parser) throws XmlPullParserException, IOException {
    parser.require(XmlPullParser.START_TAG, ns, "entry");
    String title = null;
    String summary = null;
    String link = null;
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.getEventType() != XmlPullParser.START_TAG) {
            continue;
        }
        String name = parser.getName();
        if (name.equals("title")) {
            title = readTitle(parser);
        } else if (name.equals("summary")) {
            summary = readSummary(parser);
        } else if (name.equals("link")) {
            link = readLink(parser);
        } else {
            skip(parser);
        }
    }
    return new Entry(title, summary, link);
}

// Processes title tags in the feed.
private String readTitle(XmlPullParser parser) throws IOException, XmlPullParserException {
    parser.require(XmlPullParser.START_TAG, ns, "title");
    String title = readText(parser);
    parser.require(XmlPullParser.END_TAG, ns, "title");
    return title;
}

// Processes link tags in the feed.
private String readLink(XmlPullParser parser) throws IOException, XmlPullParserException {
    String link = "";
    parser.require(XmlPullParser.START_TAG, ns, "link");
    String relType = parser.getAttributeValue(null, "rel");
    // Only "alternate" links are of interest. Comparing from the constant
    // avoids a NullPointerException when the rel attribute is absent.
    if ("alternate".equals(relType)) {
        link = parser.getAttributeValue(null, "href");
    }
    // Always advances to the closing tag, whether or not the link was kept.
    parser.nextTag();
    parser.require(XmlPullParser.END_TAG, ns, "link");
    return link;
}

// Processes summary tags in the feed.
private String readSummary(XmlPullParser parser) throws IOException, XmlPullParserException {
    parser.require(XmlPullParser.START_TAG, ns, "summary");
    String summary = readText(parser);
    parser.require(XmlPullParser.END_TAG, ns, "summary");
    return summary;
}

// For the tags title and summary, extracts their text values.
private String readText(XmlPullParser parser) throws IOException, XmlPullParserException {
    String result = "";
    if (parser.next() == XmlPullParser.TEXT) {
        result = parser.getText();
        parser.nextTag();
    }
    return result;
}
  ...
}</pre>

<h2 id="skip">Skip Tags You Don't Care About</h2>

<p>One step in the XML parsing described above is for the parser to skip tags it's not interested in.
Here is the parser's <code>skip()</code> method:</p>

<pre>
private void skip(XmlPullParser parser) throws XmlPullParserException, IOException {
    if (parser.getEventType() != XmlPullParser.START_TAG) {
        throw new IllegalStateException();
    }
    int depth = 1;
    while (depth != 0) {
        switch (parser.next()) {
        case XmlPullParser.END_TAG:
            depth--;
            break;
        case XmlPullParser.START_TAG:
            depth++;
            break;
        }
    }
}
</pre>

<p>This is how it works:</p>

<ul>

<li>It throws an exception if the current event isn't a
<code>START_TAG</code>.</li>

<li>It consumes the <code>START_TAG</code>, and all events up to and including
the matching <code>END_TAG</code>.</li>

<li>To make sure that it stops at the correct <code>END_TAG</code> and not at
the first tag it encounters after the original <code>START_TAG</code>, it keeps
track of the nesting depth.</li>

</ul>

<p>Thus if the current element has nested elements, the value of
<code>depth</code> won't be 0 until the parser has consumed all events between
the original <code>START_TAG</code> and its matching <code>END_TAG</code>. For
example, consider how the parser skips the <code>&lt;author&gt;</code> element,
which has 2 nested elements, <code>&lt;name&gt;</code> and
<code>&lt;uri&gt;</code>. Text events, such as the contents of
<code>&lt;name&gt;</code>, also pass through the <code>while</code> loop but
leave <code>depth</code> unchanged, so only the tag events are listed here:</p>

<ul>

<li>The first tag the parser
encounters after <code>&lt;author&gt;</code> is the <code>START_TAG</code> for
<code>&lt;name&gt;</code>. The value for <code>depth</code> is incremented to
2.</li>

<li>The second tag the parser
encounters is the <code>END_TAG</code> <code>&lt;/name&gt;</code>. The value
for <code>depth</code> is decremented to 1.</li>

<li>The third tag the parser
encounters is the <code>START_TAG</code> <code>&lt;uri&gt;</code>.
The value
for <code>depth</code> is incremented to 2.</li>

<li>The fourth tag the parser
encounters is the <code>END_TAG</code> <code>&lt;/uri&gt;</code>. The value for
<code>depth</code> is decremented to 1.</li>

<li>The fifth and final
tag the parser encounters is the <code>END_TAG</code>
<code>&lt;/author&gt;</code>. The value for <code>depth</code> is decremented to
0, indicating that the <code>&lt;author&gt;</code> element has been successfully
skipped.</li>

</ul>

<h2 id="consume">Consume XML Data</h2>

<p>The example application fetches and parses the XML feed within an {@link
android.os.AsyncTask}. This takes the processing off the main UI thread. When
processing is complete, the app updates the UI in the main activity
(<code>NetworkActivity</code>).</p>
<p>In the excerpt shown below, the <code>loadPage()</code> method does the
following:</p>

<ul>

  <li>Initializes a string variable with the URL for the XML feed.</li>

  <li>If the user's settings and the network connection allow it, invokes
<code>new DownloadXmlTask().execute(url)</code>. This instantiates a new
<code>DownloadXmlTask</code> object ({@link android.os.AsyncTask} subclass) and
runs its {@link android.os.AsyncTask#execute execute()} method, which downloads
and parses the feed and returns a string result to be displayed in the UI.</li>

</ul>
<pre>
public class NetworkActivity extends Activity {
    public static final String WIFI = "Wi-Fi";
    public static final String ANY = "Any";
    private static final String URL = "http://stackoverflow.com/feeds/tag?tagnames=android&amp;sort=newest";

    // Whether there is a Wi-Fi connection.
    private static boolean wifiConnected = false;
    // Whether there is a mobile connection.
    private static boolean mobileConnected = false;
    // Whether the display should be refreshed.
    public static boolean refreshDisplay = true;
    public static String sPref = null;

    ...

    // Uses AsyncTask to download the XML feed from stackoverflow.com.
    public void loadPage() {

        if ((sPref.equals(ANY)) &amp;&amp; (wifiConnected || mobileConnected)) {
            new DownloadXmlTask().execute(URL);
        }
        else if ((sPref.equals(WIFI)) &amp;&amp; (wifiConnected)) {
            new DownloadXmlTask().execute(URL);
        } else {
            // show error
        }
    }</pre>

<p>The {@link android.os.AsyncTask} subclass shown below,
<code>DownloadXmlTask</code>, implements the following {@link
android.os.AsyncTask} methods:</p>

  <ul>

    <li>{@link android.os.AsyncTask#doInBackground doInBackground()} executes
the method <code>loadXmlFromNetwork()</code>. It passes the feed URL as a
parameter. The method <code>loadXmlFromNetwork()</code> fetches and processes
the feed. When it finishes, it passes back a result string.</li>

    <li>{@link android.os.AsyncTask#onPostExecute onPostExecute()} takes the
returned string and displays it in the UI.</li>

  </ul>

<pre>
// Implementation of AsyncTask used to download XML feed from stackoverflow.com.
private class DownloadXmlTask extends AsyncTask&lt;String, Void, String&gt; {
    @Override
    protected String doInBackground(String... urls) {
        try {
            return loadXmlFromNetwork(urls[0]);
        } catch (IOException e) {
            return getResources().getString(R.string.connection_error);
        } catch (XmlPullParserException e) {
            return getResources().getString(R.string.xml_error);
        }
    }

    @Override
    protected void onPostExecute(String result) {
        setContentView(R.layout.main);
        // Displays the HTML string in the UI via a WebView
        WebView myWebView = (WebView) findViewById(R.id.webview);
        myWebView.loadData(result, "text/html", null);
    }
}</pre>

   <p>Below is the method <code>loadXmlFromNetwork()</code> that is invoked from
<code>DownloadXmlTask</code>. It does the following:</p>

  <ol>

    <li>Instantiates a <code>StackOverflowXmlParser</code>. It also creates variables for
a {@link java.util.List} of <code>Entry</code> objects (<code>entries</code>), and
<code>title</code>, <code>url</code>, and <code>summary</code>, to hold the
values extracted from the XML feed for those fields.</li>

    <li>Calls <code>downloadUrl()</code>, which fetches the feed and returns it as
    an {@link java.io.InputStream}.</li>

    <li>Uses <code>StackOverflowXmlParser</code> to parse the {@link java.io.InputStream}.
    <code>StackOverflowXmlParser</code> populates a
    {@link java.util.List} of <code>entries</code> with data from the feed.</li>

    <li>Processes the <code>entries</code> {@link java.util.List},
 and combines the feed data with HTML markup.</li>

    <li>Returns an HTML string that is displayed in the main activity
UI by the {@link android.os.AsyncTask} method {@link
android.os.AsyncTask#onPostExecute onPostExecute()}.</li>

</ol>

<pre>
// Downloads XML from stackoverflow.com, parses it, and combines it with
// HTML markup. Returns HTML string.
private String loadXmlFromNetwork(String urlString) throws XmlPullParserException, IOException {
    InputStream stream = null;
    // Instantiate the parser
    StackOverflowXmlParser stackOverflowXmlParser = new StackOverflowXmlParser();
    List&lt;Entry&gt; entries = null;
    String title = null;
    String url = null;
    String summary = null;
    Calendar rightNow = Calendar.getInstance();
    DateFormat formatter = new SimpleDateFormat("MMM dd h:mmaa");

    // Checks whether the user set the preference to include summary text
    SharedPreferences sharedPrefs = PreferenceManager.getDefaultSharedPreferences(this);
    boolean pref = sharedPrefs.getBoolean("summaryPref", false);

    StringBuilder htmlString = new StringBuilder();
    htmlString.append("&lt;h3&gt;" + getResources().getString(R.string.page_title) + "&lt;/h3&gt;");
    htmlString.append("&lt;em&gt;" + getResources().getString(R.string.updated) + " " +
            formatter.format(rightNow.getTime()) + "&lt;/em&gt;");

    try {
        stream = downloadUrl(urlString);
        entries = stackOverflowXmlParser.parse(stream);
    // Makes sure that the InputStream is closed after the app is
    // finished using it.
    } finally {
        if (stream != null) {
            stream.close();
        }
    }

    // StackOverflowXmlParser returns a List (called "entries") of Entry objects.
    // Each Entry object represents a single post in the XML feed.
    // This section processes the entries list to combine each entry with HTML markup.
    // Each entry is displayed in the UI as a link that optionally includes
    // a text summary.
    for (Entry entry : entries) {
        htmlString.append("&lt;p&gt;&lt;a href='");
        htmlString.append(entry.link);
        htmlString.append("'&gt;" + entry.title + "&lt;/a&gt;&lt;/p&gt;");
        // If the user set the preference to include summary text,
        // adds it to the display.
        if (pref) {
            htmlString.append(entry.summary);
        }
    }
    return htmlString.toString();
}

// Given a string representation of a URL, sets up a connection and gets
// an input stream.
private InputStream downloadUrl(String urlString) throws IOException {
    URL url = new URL(urlString);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setReadTimeout(10000 /* milliseconds */);
    conn.setConnectTimeout(15000 /* milliseconds */);
    conn.setRequestMethod("GET");
    conn.setDoInput(true);
    // Starts the query
    conn.connect();
    return conn.getInputStream();
}</pre>
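
<p>To experiment with the read-and-skip flow from this lesson on a desktop JVM,
where {@link android.util.Xml} is not available, the sketch below swaps in the
JDK's StAX <code>XMLStreamReader</code>, a different pull parser with a similar
event model. It is not the sample app's code: the class name
<code>PullParserSketch</code>, the method <code>readEntryTitles()</code>, and the
second entry in the inline feed are made up for illustration. Its
<code>skip()</code> method uses the same depth counting described in
<a href="#skip">Skip Tags You Don't Care About</a>.</p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical desktop sketch: StAX stands in for XmlPullParser here.
class PullParserSketch {

    // Returns the title of every <entry> in the feed and skips all other tags.
    static List<String> readEntryTitles(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            List<String> titles = new ArrayList<>();
            reader.nextTag(); // Moves to the root element.
            reader.require(XMLStreamConstants.START_ELEMENT, null, "feed");
            while (reader.next() != XMLStreamConstants.END_ELEMENT) {
                if (reader.getEventType() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                if (reader.getLocalName().equals("entry")) {
                    titles.add(readEntryTitle(reader));
                } else {
                    skip(reader);
                }
            }
            return titles;
        } finally {
            reader.close();
        }
    }

    // Reads one <entry> and returns its title, or null if it has none.
    private static String readEntryTitle(XMLStreamReader reader) throws XMLStreamException {
        String title = null;
        while (reader.next() != XMLStreamConstants.END_ELEMENT) {
            if (reader.getEventType() != XMLStreamConstants.START_ELEMENT) {
                continue;
            }
            if (reader.getLocalName().equals("title")) {
                // getElementText() leaves the reader on the closing </title> tag.
                title = reader.getElementText();
            } else {
                skip(reader);
            }
        }
        return title;
    }

    // Consumes the current element and everything nested inside it by
    // tracking the nesting depth until the matching end tag is reached.
    private static void skip(XMLStreamReader reader) throws XMLStreamException {
        if (reader.getEventType() != XMLStreamConstants.START_ELEMENT) {
            throw new IllegalStateException();
        }
        int depth = 1;
        while (depth != 0) {
            switch (reader.next()) {
                case XMLStreamConstants.END_ELEMENT:
                    depth--;
                    break;
                case XMLStreamConstants.START_ELEMENT:
                    depth++;
                    break;
                default:
                    break; // Text and other events leave depth unchanged.
            }
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Trimmed-down feed; the second entry is invented sample data.
        String feed = "<feed><title>newest questions tagged android - Stack Overflow</title>"
                + "<entry><title>Where is my data file?</title>"
                + "<author><name>cliff2310</name>"
                + "<uri>http://stackoverflow.com/users/1128925</uri></author></entry>"
                + "<entry><author><name>someone</name></author>"
                + "<title>Made-up second entry</title></entry>"
                + "</feed>";
        System.out.println(readEntryTitles(feed));
    }
}
```

<p>Running <code>main()</code> prints the titles of the two entries. The
feed-level <code>title</code> and both <code>author</code> elements are skipped.
On Android, keep using {@link org.xmlpull.v1.XmlPullParser} as shown in the
lesson.</p>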