Learn about the challenges with content transformations in Alfresco, including inconsistent results, non-deterministic behavior, and lack of visibility. Discover how to improve performance and transparency.
Transformations – WTF’s going on? • Andy.hunt@alfresco.com
Basics… • What’s a Transformation? • Indexing • Doclib Thumbnails • Previews • Rules • ….
What’s the problem? • Lots of transformers • Lots of mimetypes • Lots of permutations of the above • Inconsistent results / Non-deterministic • Transformations not working • Lack of visibility
How does Alfresco choose? • Active Transformers • “Explicit” takes precedence • Any Limits • Speed
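The four criteria above can be read as a filter-then-sort pipeline. The following is a minimal Python sketch of that reading, not Alfresco's actual implementation: the `Transformer` class, the `rank` function, and the order in which the filters are applied are all assumptions made for illustration. The timings and the 5 MB limit are borrowed from the txt-to-html examples later in the deck.

```python
from dataclasses import dataclass
from typing import List

UNLIMITED = -1  # assumed sentinel meaning "no source size limit"


@dataclass
class Transformer:
    """Hypothetical stand-in for a registered transformer."""
    name: str
    avg_ms: int                      # average transformation time
    active: bool = True
    explicit: bool = False
    max_source_kb: int = UNLIMITED


def within_limit(t: Transformer, source_bytes: int) -> bool:
    """True if the source is small enough for this transformer."""
    return t.max_source_kb == UNLIMITED or source_bytes <= t.max_source_kb * 1024


def rank(transformers: List[Transformer], source_bytes: int) -> List[Transformer]:
    """Apply the slide's criteria in order and return candidates, fastest first."""
    pool = [t for t in transformers if t.active]           # 1. active transformers
    explicit = [t for t in pool if t.explicit]
    if explicit:                                           # 2. explicit takes precedence
        pool = explicit
    pool = [t for t in pool if within_limit(t, source_bytes)]  # 3. any limits
    return sorted(pool, key=lambda t: t.avg_ms)            # 4. speed


candidates = [
    Transformer("transformer.TikaAuto", avg_ms=6724),
    Transformer("transformer.OpenOffice", avg_ms=1918),
    Transformer("transformer.complex.OpenOffice.PdfBox", avg_ms=204,
                max_source_kb=5 * 1024),
]

small = [t.name for t in rank(candidates, 5)]                       # 5-byte file
large = [t.name for t in rank(candidates, int(16.5 * 1024 * 1024))]  # 16.5 MB file
print(small)
print(large)
```

With a 5-byte source all three candidates survive and the complex OpenOffice/PdfBox transformer comes first because it is fastest; with a 16.5 MB source it drops out because of its size limit, which mirrors the `--c)` entry in Example 2.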
Make it transparent • log4j.logger.org.alfresco.repo.content.transform.TransformerDebug=DEBUG • debugTransformers.txt • Exactly 18 bytes
Example 1 – txt to html
] 193 text/plain text/html
] 193 txt html 24.txt 5 bytes ContentService.transform(...)
] 193 **a) transformer.complex.OpenOffice.PdfBox<<Complex>> < 5 MB 204 ms
] 193 b) transformer.OpenOffice<<Proxy>> 1,918 ms
] 193 c) transformer.TikaAuto 6,724 ms
] 193.1 text/plain text/html
] 193.1 txt html 24.txt 5 bytes transformer.complex.OpenOffice.PdfBox<<Complex>>
] 193.1.1 text/plain application/pdf
] 193.1.1 txt pdf 24.txt 5 bytes transformer.OpenOffice<<Proxy>>
] 193.1.1 Finished in 43 ms
] 193.1.2 store:///installs/3411e/tomcat/temp/Alfresco/ComplextTransformer_intermediate_txt_5927671274426616985.pdf
] 193.1.2 application/pdf text/html
] 193.1.2 pdf html <<TemporaryFile>> 6.2 KB transformer.PdfBox
] 193.1.2 Finished in 7 ms
] 193.1 Finished in 50 ms
] 193 Finished in 56 ms
Example 2 – large txt to html
] 204 txt html alfresco.biggerlog.txt 16.5 MB ContentService.transform(...)
] 204 **a) transformer.TikaAuto 526 ms
] 204 b) transformer.OpenOffice<<Proxy>> 853 ms
] 204 --c) transformer.complex.OpenOffice.PdfBox<<Complex>> > 5 MB
Example lists
] 13.2 transformer.StringExtracter 0 ms
] 13.2 1) txt txt unlimited
] 13.2 2) csv txt unlimited
] 13.2 3) html txt unlimited disabled not explicit
] 14.1243 txt jp2 a) transformer.complex.OpenOffice.Image<<Complex>> 1,171 ms 5 MB
] 14.1249 txt txt a) transformer.StringExtracter 0 ms unlimited
] 14.1249 b) transformer.TikaAuto 0 ms unlimited
] 14.1249 c) transformer.complex.OpenOffice.PdfBox<<Complex>> 0 ms 0 bytes disabled
What can we do? Available transformers
content-services-context.xml
<!-- This one does excel only -->
<bean id="transformer.Poi"
      class="org.alfresco.repo.content.transform.PoiHssfContentTransformer"
      parent="baseContentTransformer" />
What can we do? Explicit transformers
html txt
a) transformer.StringExtracter 0 ms unlimited disabled not explicit
b) transformer.OpenOffice<<Proxy>> 831 ms 0 bytes disabled not explicit
c) transformer.TikaAuto 0 ms unlimited disabled not explicit
d) transformer.HtmlParser 0 ms unlimited EXPLICIT
e) transformer.complex.OpenOffice.PdfBox<<Complex>> 0 ms unlimited disabled not explicit
<property name="explicitTransformations">
  <list>
    <bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails">
      <property name="sourceMimetype"><value>text/html</value></property>
      <property name="targetMimetype"><value>text/plain</value></property>
    </bean>
  </list>
</property>
What can we do? Explicit transformers - 2
html txt
a) transformer.StringExtracter 0 ms unlimited disabled not explicit
b) transformer.OpenOffice<<Proxy>> 831 ms 0 bytes disabled not explicit
c) transformer.TikaAuto 0 ms unlimited disabled not explicit
d) transformer.HtmlParser 0 ms unlimited EXPLICIT
e) transformer.complex.OpenOffice.PdfBox<<Complex>> 0 ms unlimited disabled not explicit
<property name="supportedTransformations">
  <list>
    <bean class="org.alfresco.repo.content.transform.SupportedTransformation">
      <property name="sourceMimetype"><value>text/html</value></property>
      <property name="targetMimetype"><value>text/csv</value></property>
    </bean>
  </list>
</property>
What can we do? Any Limits
maxSourceSizeKBytes
content.transformer.PdfBox.TextToPdf.maxSourceSizeKBytes
Listed in repository.properties
content.transformer.default.maxSourceSizeKBytes=-1
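To make the relationship between the per-transformer key and the default key concrete, here is a small Python sketch of how such a limit could be resolved. It assumes that a transformer-specific `maxSourceSizeKBytes` overrides the `default` one and that a negative value means "unlimited"; the helper names `max_source_kb` and `accepts` are hypothetical, and the value 5120 is an illustrative figure chosen to match the 5 MB limit seen in the debug output, not a value stated on this slide.

```python
from typing import Dict

DEFAULT_KEY = "content.transformer.default.maxSourceSizeKBytes"


def max_source_kb(props: Dict[str, str], transformer: str) -> int:
    """Look up the transformer-specific limit, falling back to the default key."""
    key = f"content.transformer.{transformer}.maxSourceSizeKBytes"
    return int(props.get(key, props.get(DEFAULT_KEY, "-1")))


def accepts(props: Dict[str, str], transformer: str, source_kb: int) -> bool:
    """A negative limit is treated as unlimited (assumption)."""
    limit = max_source_kb(props, transformer)
    return limit < 0 or source_kb <= limit


props = {
    DEFAULT_KEY: "-1",
    # Illustrative value only: 5120 KB = 5 MB.
    "content.transformer.PdfBox.TextToPdf.maxSourceSizeKBytes": "5120",
}

print(accepts(props, "PdfBox.TextToPdf", 5120))   # at the limit
print(accepts(props, "PdfBox.TextToPdf", 5121))   # just over the limit
print(accepts(props, "TikaAuto", 1_000_000))      # falls back to the default
```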
What can we do? Speed - Startup Averages
transformer.OpenOffice.time=123456
transformer.PdfBox.TextToPdf.time=50000
transformer.complex.Text.Image.time=10000
transformer.complex.Text.Image.count=10000
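One way to see why a startup `time` paired with a large `count` matters is to model the average as a running mean seeded with those two values. The Python sketch below is only an illustration under that assumption; the `AverageTime` class is hypothetical and does not reproduce Alfresco's bookkeeping.

```python
class AverageTime:
    """Running mean of transformation times, optionally seeded at startup."""

    def __init__(self, initial_ms: float = 0.0, initial_count: int = 0) -> None:
        self.average_ms = float(initial_ms)
        self.count = initial_count

    def record(self, elapsed_ms: float) -> None:
        """Fold one observed transformation time into the mean."""
        self.count += 1
        self.average_ms += (elapsed_ms - self.average_ms) / self.count


# Seeded like transformer.complex.Text.Image.time=10000 / .count=10000
seeded = AverageTime(initial_ms=10000, initial_count=10000)
seeded.record(0)            # one instantaneous transformation
print(round(seeded.average_ms, 2))

# Unseeded: the average follows the observations directly
fresh = AverageTime()
fresh.record(100)
fresh.record(200)
print(fresh.average_ms)
```

Under this model a seeded transformer barely moves after a single fast run, so a high initial count effectively pins its place in the speed ordering, while an unseeded one is ranked purely on what has been observed since startup.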