Product attribute group names and attribute names are unknown and differ from product to product. The number of attribute groups in the HTML table is also unknown, but we can count it with:
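```python
# count() yields a single number serialized as a string (e.g. '3.0'), so take .get() and convert it.
# Only the group header cells are counted; count(//th) would also count every attribute-name cell.
product_attribute_group_number = int(float(response.xpath('count(//th[@class="tech-specs-category"])').get()))
print('###product_attribute_group_number###', product_attribute_group_number)
```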
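To check the counting step outside a running spider, here is a small sketch that evaluates both expressions against the sample table from this post. It assumes lxml is installed. Scrapy's selectors are built on lxml, so the XPath behaves the same, but lxml's `xpath()` returns the number directly as a Python float instead of a string inside a SelectorList.

```python
import lxml.html

# Sample table from this post, compacted (assumed representative of real product pages)
HTML = """<div class="parameters-wrapper">
<table class="techSpecs">
<tr><th class="tech-specs-category" colspan="2">Operating System:</th></tr>
<tr><th>OS</th><td>Windows 8</td></tr>
<tr><th>OS Language</th><td>English</td></tr>
<tr><th class="tech-specs-category" colspan="2">Audio:</th></tr>
<tr><th>Speakers</th><td>Stereo Speakers</td></tr>
<tr><th>Mic In</th><td>Yes</td></tr>
<tr><th>Headphone</th><td>Yes</td></tr>
<tr><th class="tech-specs-category" colspan="2">Battery:</th></tr>
<tr><th>Battery Type</th><td>4 Cell Li-ion</td></tr>
<tr><th>Battery life</th><td>41 WHr</td></tr>
</table>
</div>"""

root = lxml.html.fromstring(HTML)

# lxml returns the result of count() as a Python float
all_th = root.xpath('count(//th)')  # group headers plus attribute-name cells
group_headers = root.xpath('count(//th[@class="tech-specs-category"])')  # group headers only

print(all_th, group_headers)  # 10.0 3.0
print(int(group_headers))     # 3
```

Counting every `<th>` gives 10 for this table (3 group headers plus 7 attribute names), which is why the group count needs the narrower predicate.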
We can loop over every attribute group with:
```python
# Group header rows, assumed to be the <tr> whose <th> has class "tech-specs-category"
HEADER = '(//tr[th/@class="tech-specs-category"])[%d]'
for x in range(1, product_attribute_group_number + 1):
    product_attribute_group_name = response.xpath(HEADER % x + '/th/text()').get()
    print('###product_attribute_group_name###', product_attribute_group_name)
    # Kayessian intersection: rows after header x that are also before header x+1
    group_rows = response.xpath(
        '%s/following-sibling::tr[count(.|%s/preceding-sibling::tr)=count(%s/preceding-sibling::tr)]'
        % (HEADER % x, HEADER % (x + 1), HEADER % (x + 1)))
    item = {}
    for prop_row in group_rows:
        prop = prop_row.xpath('th/text()').get()
        val = prop_row.xpath('td/text()').get()
        if prop is None or val is None:
            continue  # ignore rows without a name/value pair
        item[prop.strip()] = val.strip()
    yield item
```
Is this the correct approach, and is the XPath selector correct? Next question: what is the correct XPath selector for the last attribute group? (There is no group header row after it, so the x+1 header used to bound the selection does not exist.) Are there more elegant methods to parse an HTML table of product attributes that are grouped into attribute groups?
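One possible way around the last-group problem, sketched under the assumption that group headers are exactly the rows whose `<th>` has class `tech-specs-category` (as in the sample markup in this post): instead of bounding a group by the next header, select the non-header rows that have exactly x header rows before them. The helper name `group_rows_xpath` is hypothetical, and the sketch uses lxml directly so it runs without a spider.

```python
import lxml.html

# Sample table from this post, compacted (assumed representative of real product pages)
HTML = """<div class="parameters-wrapper">
<table class="techSpecs">
<tr><th class="tech-specs-category" colspan="2">Operating System:</th></tr>
<tr><th>OS</th><td>Windows 8</td></tr>
<tr><th>OS Language</th><td>English</td></tr>
<tr><th class="tech-specs-category" colspan="2">Audio:</th></tr>
<tr><th>Speakers</th><td>Stereo Speakers</td></tr>
<tr><th>Mic In</th><td>Yes</td></tr>
<tr><th>Headphone</th><td>Yes</td></tr>
<tr><th class="tech-specs-category" colspan="2">Battery:</th></tr>
<tr><th>Battery Type</th><td>4 Cell Li-ion</td></tr>
<tr><th>Battery life</th><td>41 WHr</td></tr>
</table>
</div>"""

TABLE = '//table[@class="techSpecs"]'
# A header row is a <tr> whose <th> carries the category class (assumption from the sample)
HEADER = 'tr[th/@class="tech-specs-category"]'


def group_rows_xpath(x):
    """XPath for the rows of the x-th group (1-based): non-header rows preceded by
    exactly x header rows. No upper bound is needed, so the last group works too."""
    return ('%s/tr[not(th/@class="tech-specs-category")]'
            '[count(preceding-sibling::%s) = %d]' % (TABLE, HEADER, x))


root = lxml.html.fromstring(HTML)
n_groups = int(root.xpath('count(%s/%s)' % (TABLE, HEADER)))
specs = {}
for x in range(1, n_groups + 1):
    # string value of the x-th header cell, e.g. "Battery:" -> "Battery"
    name = root.xpath('string((%s/%s)[%d]/th)' % (TABLE, HEADER, x)).strip().rstrip(':')
    specs[name] = {
        row.xpath('string(th)').strip(): row.xpath('string(td)').strip()
        for row in root.xpath(group_rows_xpath(x))
    }

print(specs['Battery'])  # {'Battery Type': '4 Cell Li-ion', 'Battery life': '41 WHr'}
```

In a spider the same expression should work as `response.xpath(group_rows_xpath(x))`, reading each cell with `row.xpath('string(th)').get()` rather than lxml's plain string results.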
Table example:

Operating System: (attribute group name)
OS (attribute name): Windows 8 (attribute value)
OS Language (attribute name): English (attribute value)

Audio: (attribute group name)
Speakers (attribute name): Stereo Speakers (attribute value)
Mic In (attribute name): Yes (attribute value)
Headphone (attribute name): Yes (attribute value)

Battery: (attribute group name)
Battery Type (attribute name): 4 Cell Li-ion (attribute value)
Battery life (attribute name): 41 WHr (attribute value)

The same table as HTML:
```html
<div class="parameters-wrapper">
<table class="techSpecs">
<tr>
<th class="tech-specs-category" colspan="2">Operating System:</th>
</tr>
<tr>
<th>OS</th>
<td>Windows 8</td>
</tr>
<tr>
<th>OS Language</th>
<td>English</td>
</tr>
<tr>
<th class="tech-specs-category" colspan="2">Audio:</th>
</tr>
<tr>
<th>Speakers</th>
<td>Stereo Speakers</td>
</tr>
<tr>
<th>Mic In</th>
<td>Yes</td>
</tr>
<tr>
<th>Headphone</th>
<td>Yes</td>
</tr>
<tr>
<th class="tech-specs-category" colspan="2">Battery:</th>
</tr>
<tr>
<th>Battery Type</th>
<td>4 Cell Li-ion</td>
</tr>
<tr>
<th>Battery life</th>
<td>41 WHr</td>
</tr>
</table>
</div>
```
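A more direct alternative is to walk the rows once and remember the most recent group header. This needs no positional XPath at all and treats the last group like any other. A minimal sketch, again assuming lxml and the `tech-specs-category` class from the markup above; `parse_tech_specs` is a hypothetical name.

```python
import lxml.html

# Sample table from this post, compacted (assumed representative of real product pages)
HTML = """<div class="parameters-wrapper">
<table class="techSpecs">
<tr><th class="tech-specs-category" colspan="2">Operating System:</th></tr>
<tr><th>OS</th><td>Windows 8</td></tr>
<tr><th>OS Language</th><td>English</td></tr>
<tr><th class="tech-specs-category" colspan="2">Audio:</th></tr>
<tr><th>Speakers</th><td>Stereo Speakers</td></tr>
<tr><th>Mic In</th><td>Yes</td></tr>
<tr><th>Headphone</th><td>Yes</td></tr>
<tr><th class="tech-specs-category" colspan="2">Battery:</th></tr>
<tr><th>Battery Type</th><td>4 Cell Li-ion</td></tr>
<tr><th>Battery life</th><td>41 WHr</td></tr>
</table>
</div>"""


def parse_tech_specs(html):
    """Walk the table rows once: a category header opens a new group, and every
    following name/value row is stored under the most recent header."""
    root = lxml.html.fromstring(html)
    groups = {}
    current = None
    for row in root.xpath('//table[@class="techSpecs"]/tr'):
        # Empty string unless this row is a group header
        header = row.xpath('string(th[@class="tech-specs-category"])').strip()
        if header:
            current = header.rstrip(':')
            groups[current] = {}
        elif current is not None:
            name = row.xpath('string(th)').strip()
            if name:
                groups[current][name] = row.xpath('string(td)').strip()
    return groups


specs = parse_tech_specs(HTML)
print(list(specs))  # ['Operating System', 'Audio', 'Battery']
```

In Scrapy the loop would presumably iterate over `response.xpath('//table[@class="techSpecs"]/tr')` and call `.get()` on each cell expression; the grouping logic stays the same.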