Supported content formatting features

Listed below are all content formatting features that are currently supported by the Import from Word converter.

# Inline formatting

Support for inline formatting includes conversion of inline styling (from UI toolbar buttons), styles, and default document properties.

The inline formatting is generated as native HTML tags (e.g. <strong> for bold text) or as <span> tags with proper CSS styling declarations. Applying multiple inline styles to the same content will produce nested HTML, like:

<p>
    <span style="font-size: 21.33px; font-family: Georgia, serif">
        <i>
            <u>Hello, World!</u>
        </i>
    </span>
</p>
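The nesting shown above can be reproduced with a small sketch. This is not the converter's implementation: the `InlineFormat` shape and the `renderInline` function are hypothetical names introduced only for illustration, and the nesting order (span, then strong, i, u, s) is an assumption that happens to match the examples on this page.

```typescript
// Hypothetical description of the inline formatting applied to one piece of text.
interface InlineFormat {
  bold?: boolean;
  italic?: boolean;
  underline?: boolean;
  strikethrough?: boolean;
  // Styles without a native tag, e.g. { "font-size": "21.33px" }.
  css?: Record<string, string>;
}

// Wraps the text in native tags first, then in one <span> carrying the CSS,
// and finally in the paragraph that holds all inline content.
// The text is assumed to be HTML-escaped already.
function renderInline(text: string, format: InlineFormat): string {
  let html = text;
  if (format.strikethrough) html = `<s>${html}</s>`;
  if (format.underline) html = `<u>${html}</u>`;
  if (format.italic) html = `<i>${html}</i>`;
  if (format.bold) html = `<strong>${html}</strong>`;

  const css: Record<string, string> = format.css || {};
  const declarations = Object.keys(css)
    .map(property => `${property}: ${css[property]}`)
    .join("; ");
  if (declarations.length > 0) {
    html = `<span style="${declarations}">${html}</span>`;
  }
  return `<p>${html}</p>`;
}
```

Calling `renderInline` with italic, underline, and the two font declarations from the sample yields the same markup as above, only without the indentation.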

All inline content is placed inside a paragraph (<p>) element.

# Basic styling

Provides support for basic inline styling: bold, italics, underline, and strikethrough.

Basic styles are converted to semantic HTML tags, making the content more accessible and SEO-friendly.

| Feature name | Markup |
| --- | --- |
| Bold | <strong> |
| Italics | <i> |
| Underline | <u> |
| Strikethrough | <s> |

# Font styles

Provides support for a range of font styles, including font family, font size, font color, background color, and more.

| Feature name | Markup |
| --- | --- |
| Font size | <span style="font-size: 16px"> |
| Font family | <span style="font-family: 'Comic Sans MS'"> |
| Font color | <span style="color: #ff0000"> |
| Font background | <span style="background-color: #00ffff"> |
| Subscript | <sub> |
| Superscript | <sup> |
| Small caps | <span style="font-variant-caps: small-caps"> |
| All caps | <span style="text-transform: uppercase"> |
| Letter spacing | <span style="letter-spacing: 1px"> |
| Font stretching | <span style="font-stretch: 125%"> |
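Fractional pixel sizes such as 21.33px or 26.67px, which appear in the samples on this page, are consistent with point sizes (16pt and 20pt) converted at the standard CSS ratio of 96px per 72pt. The sketch below only illustrates that arithmetic. The function name and the rounding to two decimals are assumptions, not a description of the converter's internals.

```typescript
// CSS defines 1in = 96px = 72pt, so one point equals 96/72 pixels.
function pointsToCssPx(points: number): string {
  const px = (points * 96) / 72;
  // Round to two decimal places (an assumption based on the sample values).
  const rounded = Math.round(px * 100) / 100;
  return `${rounded}px`;
}
```

For example, `pointsToCssPx(12)` gives `16px`, the value used in the font size row above.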

# Links

Provides support for links.

| Feature name | Markup |
| --- | --- |
| Link | <a href="https://cksource.com"> |

Links can be fully styled just as any other inline content.

<p>
    <a href="https://cksource.com">
        <span style="color: #f12ec5; font-size: 26.67px;">
            <strong>
                <u>Colorful link!</u>
            </strong>
        </span>
    </a>
</p>

# Hidden text

Hidden text is supported by simply not outputting any HTML markup for it.
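Conceptually, this means hidden runs are dropped before any markup is generated. A minimal sketch, assuming a hypothetical `TextRun` shape with a `hidden` flag (neither name comes from the converter):

```typescript
// Hypothetical representation of a run of text read from the document.
interface TextRun {
  text: string;
  hidden: boolean;
}

// Keeps only visible runs, so hidden text never reaches the output.
function visibleText(runs: TextRun[]): string {
  return runs
    .filter(run => !run.hidden)
    .map(run => run.text)
    .join("");
}
```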

# Inline images

Inline images are converted to <img> tags with proper src and alt attributes.

| Feature name | Markup |
| --- | --- |
| Inline image | <img alt="An image" src="https://cksource.com/image.png"> |

Images that include an external hyperlink are converted to inline images, wrapped in an anchor (<a>) tag.

# Known limitations for inline formatting

  • Due to their complicated representation in Word documents, tab characters are always converted to 4 spaces.
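The effect of this limitation on text content can be pictured with the following sketch. The function name is hypothetical and the code only mirrors the documented behavior. It does not show how the converter handles tabs internally.

```typescript
// Replaces every tab character with four spaces, regardless of tab stops.
function expandTabs(text: string): string {
  return text.replace(/\t/g, "    ");
}
```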

# Block formatting

Block-level Word features are converted to proper block-level HTML elements to represent the same semantic meaning and visual appearance. Some elements like images can be either inline or block-level, depending on their position in the document.

As with inline formatting, block formatting can be applied directly to the Word content, defined (depending on the feature) as inline styling or as a separate style, or it can be part of the default document properties.

# Paragraphs

Any text that occurs in the document, regardless of its styling, is placed inside a paragraph element. Paragraphs are converted to <p> tags, and can be nested inside table cells and list items.

<p>
    Representative paragraph.
</p>

Paragraphs in Word can additionally be styled with the following features:

| Feature name | Markup |
| --- | --- |
| Text alignment | <p style="text-align: right"> |
| Indentation | <p style="margin-left: 48px"> |
| First line indentation | <p style="text-indent: 190px"> |
| Hanging indentation | <p style="margin-left: 190px; text-indent: -190px"> |
| Line height | <p style="line-height: 1.5"> |
| Paragraph spacing | <p style="margin-top: 20px; margin-bottom: 10px"> |
| Paragraph borders | <p style="border-top: 1px solid #000000"> |
| Background color | <p style="background-color: #ffc000"> |
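The hanging indentation row combines two declarations: a left margin that moves the whole paragraph, and a negative text indent that pulls the first line back. A minimal sketch of that combination follows. The function and parameter names are hypothetical, and the split into a left indent and a hanging amount is an assumption about how the two values relate.

```typescript
// Builds the style for a paragraph whose lines start at `leftPx`,
// except the first line, which is pulled back by `hangingPx`.
function hangingIndentStyle(leftPx: number, hangingPx: number): string {
  return `margin-left: ${leftPx}px; text-indent: ${-hangingPx}px`;
}
```

`hangingIndentStyle(190, 190)` reproduces the value from the table, where the first line ends up flush with the page margin.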

# Known limitations

  • The “at least” line height type is treated the same as the “exact” one.
  • Spacing between paragraphs of the same style is always preserved, even when explicitly disabled.

# Headings

Headings in Word documents are represented by paragraphs with a special formatting property called “outline level”. It can be applied either by using styles, such as the built-in “Heading 1” style, or by paragraph formatting.

This property determines which heading element a paragraph is converted to. The converter only supports headings from <h1> to <h6>. Headings imported with a level outside this range are clamped to the closest supported level (i.e. levels above the top of the range become <h1>, while levels deeper than 6 are rendered as <h6>). Otherwise, they act like paragraphs.

Styles that can be applied to paragraphs can also be applied to headings.

<h1>Heading 1</h1>
<h2 style="color: #ffc000">Colored heading 2</h2>
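The clamping of outline levels to the supported heading range can be sketched as follows. The function name is hypothetical, and the sketch assumes numeric outline levels where 1 is the top level.

```typescript
// Maps an outline level to a heading tag name, clamping to the h1..h6 range.
function headingTagForOutlineLevel(outlineLevel: number): string {
  const clamped = Math.min(6, Math.max(1, Math.round(outlineLevel)));
  return `h${clamped}`;
}
```

With this sketch, an outline level of 9 maps to `h6`, and any value below 1 maps to `h1`.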

# Images

Depending on the formatting of the paragraph that contains an image, and on whether other content is present in the same paragraph, some images are treated as block images instead of using the default inline layout.

Block images are represented as <figure> elements with an <img> tag inside, to keep the same content semantics as CKEditor 5.

<figure class="image">
    <img src="https://cksource.com/image.png" alt="An image" />
</figure>

Every figure element includes the image class, giving the integrator an easy way to apply custom styles to all block images, e.g. to reposition them inside the document.

Import from Word is able to recognize both embedded images in the document and images that come from external sources.

Images that include an external link are converted to inline images wrapped in an <a> tag.

Additionally, the converter supports these Word features:

| Feature name | Markup |
| --- | --- |
| Alternative text | <img alt="Alternative text" src="..." /> |
| Image height | <img style="height: 100px" height="100" src="..." /> |
| Image width | <img style="width: 100px" width="100" src="..." /> |

An image caption is converted to a regular paragraph with the caption styling applied.

# Known limitations

  • Positioning and text wrapping settings are not supported.
  • Captions are currently converted to regular paragraphs.

# Lists

Import from Word supports both the ordered and unordered lists available in Word, converting them to proper HTML elements. The text content of a list item is always wrapped in a <p> element.

The most basic list structure of an unordered list is shown below:

<ul style="list-style-type: disc">
  <li><p>List item 1</p></li>
  <li>
    <p>List item 2</p>
    <ul style="list-style-type: circle">
      <li><p>List item 3</p></li>
    </ul>
  </li>
</ul>

And the basic structure for an ordered list:

<ol style="list-style-type: decimal">
  <li><p>List item 1</p></li>
  <li>
    <p>List item 2</p>
    <ol style="list-style-type: lower-latin">
      <li><p>List item 3</p></li>
    </ol>
  </li>
</ol>

The converter supports multi-level lists that do not include some intermediary levels. Such lists are created by indenting list items. As an example, we can have a list that, after the first level, skips the second one and goes directly to the third one. This is supported by the converter, and the result will be the following:

<ol style="list-style-type: decimal">
  <li>
    <p>Level 1</p>
    <ul style="list-style-type: none">
      <li>
        <ol style="list-style-type: lower-roman">
          <li><p>Level 3</p></li>
        </ol>
      </li>
    </ul>
  </li>
</ol>
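The filler structure for a skipped level can be sketched as a wrapper applied once per missing level. The function name is hypothetical. It only mirrors the shape of the output above and is not the converter's own code.

```typescript
// Wraps an inner list in one marker-less <ul><li> pair per skipped level.
function wrapSkippedLevels(innerListHtml: string, skippedLevels: number): string {
  let html = innerListHtml;
  for (let i = 0; i < skippedLevels; i++) {
    html = `<ul style="list-style-type: none"><li>${html}</li></ul>`;
  }
  return html;
}
```

In the example above, the third-level <ol> would be wrapped once, because only the second level is missing.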

Ordered lists can start from a number other than 1. The converter sets the HTML start attribute to the starting value defined for the corresponding list level in the Word document:

<ol start="4" style="list-style-type: decimal">
  <li><p>Item 4</p></li>
  <li><p>Item 5</p></li>
  <li><p>Item 6</p></li>
</ol>
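A minimal sketch of how such an opening tag could be assembled. The function name is hypothetical, and omitting the attribute for lists that start at 1 is an assumption based on the earlier samples, which carry no start attribute.

```typescript
// Builds the opening <ol> tag, adding `start` only when the list does not begin at 1.
function openOrderedList(start: number, listStyleType: string): string {
  const startAttribute = start !== 1 ? ` start="${start}"` : "";
  return `<ol${startAttribute} style="list-style-type: ${listStyleType}">`;
}
```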

Because Word lets the user set any character as a marker in unordered lists, the converter first tries to recognize the marker character and match it to the closest supported one. If the marker cannot be recognized, the converter falls back to the disc list style type.

  • Default supported unordered list styles are: disc, circle, square, and none.
  • Default supported ordered list styles are: decimal, lower-latin, lower-roman, and decimal-leading-zero.
  • Other numbering types, such as Hebrew or Thai, are also supported. However, some of them do not have a CSS counterpart, and they are converted to some other numbering that is available in CSS. An example of such a numbering is the numbering using the Russian alphabet that is converted to lower-latin.
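The recognize-then-fall-back behavior for unordered list markers can be pictured as a lookup with a default. The table below is purely illustrative: the characters in it are assumptions, the names are hypothetical, and the sketch ignores the font, which the converter also has to take into account.

```typescript
// Hypothetical lookup table. The characters the converter actually
// recognizes are not listed in this guide.
const MARKER_TO_LIST_STYLE: Record<string, string> = {
  "\u2022": "disc",   // bullet
  "\u25CF": "disc",   // black circle
  "o": "circle",
  "\u25CB": "circle", // white circle
  "\u25A0": "square", // black square
  "\u25AA": "square", // small black square
};

// Returns the matching list style type, or "disc" for unrecognized markers.
function listStyleForMarker(marker: string): string {
  const isKnown = Object.prototype.hasOwnProperty.call(MARKER_TO_LIST_STYLE, marker);
  return isKnown ? MARKER_TO_LIST_STYLE[marker] : "disc";
}
```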

# Known limitations

  • Due to differences between built-in DOCX and CSS numbering definitions, some numbering types may have discrepancies. The extent of those discrepancies depends on specific numbering, so for some lists the first list item with different numbering may be the thousandth one, whereas for other ones the numbering may start being different at around the 30th item.
  • Custom list markers are not supported. This applies to some built-in list styles, and, in general, to markers that cannot be represented by the built-in CSS list style types.
  • Marker detection, due to DOCX representation of unordered lists, depends on fonts used for list markers and cannot be generalized. This means that there is always a possibility of producing an invalid marker.
  • Indentation of list items is not supported right now.

# Tables

Tables are represented as <figure> elements with a <table> element inside, to keep the same content semantics as CKEditor 5.

Every figure element includes the table class, giving the integrator an easy way to apply custom styles to all tables, e.g. to reposition them inside the document.

An example of a simple table that will be output by the converter (styles skipped for clarity):

<figure class="table">
  <table>
    <tbody>
      <tr>
        <td>
          <p>Cell 1</p>
        </td>
        <td>
          <p>Cell 2</p>
        </td>
      </tr>
    </tbody>
  </table>
</figure>

Tables support most features that are available in Word, including:

| Feature name | Markup |
| --- | --- |
| Table / cell width | <td style="width: 100px;"> |
| Table / cell height | <td style="height: 20px;"> |
| Cell merging | <td colspan="2"> |
| Cell padding | <td style="padding-top: 20px"> |
| Cell spacing | <table style="border-spacing: 10px"> |
| Cell’s vertical alignment | <td style="vertical-align: top"> |
| Table background color | <table style="background-color: #f4b083;"> |
| Cell background color | <td style="background-color: #f4b083;"> |
| Table border style | <table style="border-top: 1px solid #f4b083;"> |
| Cell border style | <td style="border-top: 1px solid #f4b083;"> |
| Table header | <th scope="col"> |
| Table alignment/floating | <figure style="margin-left: auto; margin-right: 0"> |

Table nesting is supported by the converter, so tables can be placed inside one another. Table captions are converted to regular paragraphs to keep the same representation of the caption as in Word.

# Known limitations

  • Conditional table formatting is not supported.
  • Captions are currently converted to regular paragraphs.
  • Some border styles may not be properly resolved in the browser, as the resolution of border conflicts differs between HTML and Word.

# Page breaks

Page breaks in Word can be added in two ways: by using the “Page Break” button in the toolbar, or by applying a special “Page break before” paragraph formatting that adds a page break before the paragraph.

Both methods are supported by Import from Word and produce the following HTML:

<div class="page-break" style="page-break-after: always">
  <span style="display: none">&nbsp;</span>
</div>
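An integrator who needs to process the output page by page could split it on this marker. The sketch below is one possible approach, not part of the product. The names are hypothetical, and the regular expression assumes the exact markup shown above, with no nested <div> inside the page break. An HTML parser would be more robust.

```typescript
// Matches the whole page break element, including its hidden <span>.
const PAGE_BREAK_PATTERN = /<div class="page-break"[^>]*>[\s\S]*?<\/div>/g;

// Splits converted HTML into per-page fragments, dropping empty ones.
function splitOnPageBreaks(html: string): string[] {
  return html
    .split(PAGE_BREAK_PATTERN)
    .map(fragment => fragment.trim())
    .filter(fragment => fragment.length > 0);
}
```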

# Known limitations

  • Page breaks applied through the paragraph formatting are not supported inside tables.

# Horizontal lines

Horizontal lines in Word are, similarly to HTML, designed to guide the flow of a text or to separate parts of a document. Import from Word properly recognizes and converts horizontal lines to semantically correct <hr> elements.

<p>Paragraph 1</p>
<hr />
<p>Paragraph 2</p>

# Known limitations

  • Horizontal line styling is not supported at the moment.

# Complex objects

Word also includes more complex objects, such as tables of contents, text boxes, and citation fields. Import from Word attempts to support all of them, but some may not be converted properly due to the limitations of HTML and CSS themselves.

# Known limitations

  • The content of a table of contents is preserved, but content bookmarks are not supported.
  • Form objects: only text and styling are retained.