Get tag values from multi-line XML using a shell script [duplicate]

I have an XML file like the following:

<Module dataPath="/abc/def/xyz" handler="DataRegistry" id="id1" path="test.so"/>
<Module id="id2" path="/my/file/path">
  <Config>
    <Source cutoffpackage="1" dailyStart="20060819" dataPath="/abc/def/xyz" />
    <Source cutoffpackage="1" dailyStart="20060819" dataPath="/abc/def/xyz" id="V2"/>
  </Config>
</Module>

I just want to extract the value of dataPath for every module id.

I am using a command like:

`grep 'id2' file | grep -ioPm1 "(?<=DataPath=)[^ ]+"`

This gives me the value for the first module id but not for the second, because the second module spans multiple lines.

How can I do this with a shell script?

The desired output is as follows. If I want the data path for module id1, I should get

/my/file/path

For the second module id, i.e. id2, I should get the data paths separated by commas

/my/file/path, /my/file/path

Alternatively, my second approach would be to replace the newline characters between <Module and </Module> only, so that I can then use grep.

Replies
  • 輸給o時咣 replied

    -m1 tells grep to exit after the first matching line, which is why it prints only one line of output.
    I wouldn't use a line-oriented tool for this, though. There are more convenient tools for parsing XML, such as XMLStarlet:

    xml sel -t -m '//@dataPath' -v . -n file.xml
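To limit the result to a single module, as the question asks, the XPath can be narrowed with an `@id` predicate. The following is only a minimal sketch. It assumes the XMLStarlet binary is installed as `xml` (some distributions install it as `xmlstarlet`), and the names `join_lines`, `module_datapaths` and `XML_BIN` are made up for illustration:

```shell
#!/usr/bin/env bash

# Name of the XMLStarlet binary; override with XML_BIN=xmlstarlet if needed.
XML_BIN=${XML_BIN:-xml}

# join_lines (hypothetical helper): joins the lines read from stdin
# into a single line separated by ", ".
join_lines() {
  paste -sd, - | sed 's/,/, /g'
}

# module_datapaths ID FILE (hypothetical helper): prints every dataPath
# attribute found on, or below, the <Module> element whose id is ID.
# Assumption: ID contains no single quote, because it is pasted into the XPath.
module_datapaths() {
  local id=$1 file=$2
  "$XML_BIN" sel -t -m "//Module[@id='${id}']//@dataPath" -v . -n "$file" | join_lines
}

# Usage: module_datapaths id2 file.xml
```

For a well-formed file containing the modules from the question, `module_datapaths id2 file.xml` should print both `Source` paths on one line, separated by `, `.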
    

  • yhic replied

    First, my answer assumes that your actual source XML is well-formed. The example you provided has no root element, but I assume that one exists.
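If the real file turns out to be a rootless fragment like the sample in the question, XML tools will reject it. One possible workaround, sketched here under the assumption that the fragment carries no `<?xml ...?>` declaration, is to wrap it in a root element first. The helper name `wrap_in_root` and the element name `root` are illustrative only:

```shell
#!/usr/bin/env bash

# wrap_in_root (hypothetical helper): reads a rootless XML fragment on stdin
# and writes it to stdout enclosed in a single <root> element.
# Assumption: the fragment does not start with an <?xml ...?> declaration.
wrap_in_root() {
  printf '<root>\n'
  cat
  printf '</root>\n'
}

# Usage: wrap_in_root < fragment.xml > input.xml
```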

    Bash features by themselves are not very well suited to parsing XML.

    This renowned Bash FAQ states the following:

    Do not attempt [to extract data from an XML file] with sed, awk, grep, and so on (it leads to undesired results)

    If you must use a shell script, use an XML-specific command-line tool such as XMLStarlet or xsltproc. If XMLStarlet is not already installed, refer to its download information.

    Solution:

    1. Given your source XML and your desired output, consider using the following template.

      template.xsl

      <?xml version="1.0" encoding="UTF-8"?>
      <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
        <xsl:output method="text"/>
      
        <xsl:template match="node()|@*">
          <xsl:apply-templates select="node()|@*"/>
        </xsl:template>
      
        <xsl:template match="Module">
          <xsl:choose>
      
            <xsl:when test="@dataPath and not(descendant::*/@dataPath)">
              <xsl:value-of select="@dataPath"/>
              <xsl:text>&#xa;</xsl:text>
            </xsl:when>
      
            <xsl:when test="not(@dataPath) and descendant::*/@dataPath">
              <xsl:for-each select="descendant::*/@dataPath">
                <xsl:value-of select="."/>
                <xsl:if test="position()!=last()">
                  <xsl:text>, </xsl:text>
                </xsl:if>
              </xsl:for-each>
              <xsl:text>&#xa;</xsl:text>
            </xsl:when>
      
            <xsl:when test="@dataPath and descendant::*/@dataPath">
              <xsl:value-of select="@dataPath"/>
              <xsl:text>, </xsl:text>
              <xsl:for-each select="descendant::*/@dataPath">
                <xsl:value-of select="."/>
                <xsl:if test="position()!=last()">
                  <xsl:text>, </xsl:text>
                </xsl:if>
              </xsl:for-each>
              <xsl:text>&#xa;</xsl:text>
            </xsl:when>
      
          </xsl:choose>
        </xsl:template>
      
      </xsl:stylesheet>
      
    2. Then run either:

      • the following XMLStarlet command:

        $ xml tr /path/to/template.xsl /path/to/input.xml
        
      • Or the following xsltproc command:

        $ xsltproc /path/to/template.xsl /path/to/input.xml
        

      Note: Change the pathnames of template.xsl and input.xml in the commands above to wherever those files reside.

      Either command transforms your input.xml file and prints the desired results.

    Demo:

    1. Using the following input.xml file:

      <?xml version="1.0" encoding="UTF-8"?>
      <root>
        <Module dataPath="/abc/def/1" handler="DataRegistry" id="id1" path="test.so"/>
      
        <Module id="id2" path="/my/file/path">
          <Config>
            <Source cutoffpackage="1" dailyStart="20060819" dataPath="/abc/def/2" />
            <Source cutoffpackage="1" dailyStart="20060819" dataPath="/abc/def/3" id="V2"/>
          </Config>
        </Module>
      
        <Module id="id3" path="/my/file/path" dataPath="/abc/def/4">
          <Config>
            <Source cutoffpackage="1" dailyStart="20060819" dataPath="/abc/def/5" />
            <Source cutoffpackage="1" dailyStart="20060819" dataPath="/abc/def/6" id="V2"/>
          </Config>
        </Module>
      
        <Module id="id4" path="/my/file/path" dataPath="/abc/def/7"/>
        <Module id="id5" path="/my/file/path" dataPath="/abc/def/8"/>
      
      
        <!-- The following <Module>'s have no associated `dataPath` attribute -->
        <Module id="id6">
          <Config>
            <Source cutoffpackage="1" dailyStart="20060819" id="V2"/>
          </Config>
        </Module>
      
        <Module id="id7"/>
      </root>
      
    2. Then running either of the aforementioned commands prints the following result:

      /abc/def/1
      /abc/def/2, /abc/def/3
      /abc/def/4, /abc/def/5, /abc/def/6
      /abc/def/7
      /abc/def/8
      

    Additional notes:

    If you want to avoid a separate .xsl file, you can inline the XSLT template in your shell script as follows:

    script.sh

    #!/usr/bin/env bash
    
    xslt() {
    cat <<EOX
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:output method="text"/>
    
      <xsl:template match="node()|@*">
        <xsl:apply-templates select="node()|@*"/>
      </xsl:template>
    
      <xsl:template match="Module">
        <xsl:choose>
    
          <xsl:when test="@dataPath and not(descendant::*/@dataPath)">
            <xsl:value-of select="@dataPath"/>
            <xsl:text>&#xa;</xsl:text>
          </xsl:when>
    
          <xsl:when test="not(@dataPath) and descendant::*/@dataPath">
            <xsl:for-each select="descendant::*/@dataPath">
              <xsl:value-of select="."/>
              <xsl:if test="position()!=last()">
                <xsl:text>, </xsl:text>
              </xsl:if>
            </xsl:for-each>
            <xsl:text>&#xa;</xsl:text>
          </xsl:when>
    
          <xsl:when test="@dataPath and descendant::*/@dataPath">
            <xsl:value-of select="@dataPath"/>
            <xsl:text>, </xsl:text>
            <xsl:for-each select="descendant::*/@dataPath">
              <xsl:value-of select="."/>
              <xsl:if test="position()!=last()">
                <xsl:text>, </xsl:text>
              </xsl:if>
            </xsl:for-each>
            <xsl:text>&#xa;</xsl:text>
          </xsl:when>
    
        </xsl:choose>
      </xsl:template>
    
    </xsl:stylesheet>
    EOX
    }
    
    # 1. Using XMLStarlet
    xml tr <(xslt) /path/to/input.xml
    
    # 2. Or using xsltproc
    xsltproc <(xslt) - </path/to/input.xml
    

    Note: Again, change the pathname of input.xml (the /path/to/input.xml part in script.sh above) to wherever that file resides.
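As written, script.sh runs both tools one after the other, so in practice one of the two command lines would be kept. A small sketch of choosing whichever tool is available at run time follows. The helper name `pick_xslt_command` is made up for illustration, and it assumes XMLStarlet may be installed as either `xmlstarlet` or `xml`:

```shell
#!/usr/bin/env bash

# pick_xslt_command (hypothetical helper): prints the command prefix of the
# first XSLT-capable tool found on PATH, or returns 1 if none is installed.
pick_xslt_command() {
  if command -v xsltproc >/dev/null 2>&1; then
    printf 'xsltproc\n'
  elif command -v xmlstarlet >/dev/null 2>&1; then
    printf 'xmlstarlet tr\n'
  elif command -v xml >/dev/null 2>&1; then
    printf 'xml tr\n'
  else
    return 1
  fi
}

# Usage inside script.sh, after the xslt() function is defined.
# $cmd is left unquoted on purpose so that "xml tr" splits into two words:
#   if cmd=$(pick_xslt_command); then
#     $cmd <(xslt) /path/to/input.xml
#   else
#     echo "No XSLT tool found" >&2
#   fi
```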