Class RawText

java.lang.Object
org.eclipse.jgit.diff.Sequence
org.eclipse.jgit.diff.RawText

public class RawText extends Sequence
A Sequence supporting UNIX formatted text in byte[] format.

Elements of the sequence are the lines of the file, as delimited by the UNIX newline character ('\n'). The file content is treated as 8 bit binary text, with no assumptions or requirements on character encoding.

Note that the first line of the file is element 0, as defined by the Sequence interface API. Traditionally, in a text editor or a patch file, the first line is line number 1. Callers may need to subtract 1 prior to invoking methods if they are converting from "line number" to "element index".
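The index convention above can be illustrated with a small standalone sketch. It does not use JGit; `LineIndexSketch`, `splitLines`, and `toElementIndex` are hypothetical names introduced only for this example, and the splitting logic is a simplified stand-in for what a line-based sequence does.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class LineIndexSketch {
    /** Splits content at '\n'; the LF itself is not part of any element. */
    static List<String> splitLines(byte[] content) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < content.length; i++) {
            if (content[i] == '\n') {
                // ISO-8859-1 maps each byte to one char, so no encoding is assumed.
                lines.add(new String(content, start, i - start, StandardCharsets.ISO_8859_1));
                start = i + 1;
            }
        }
        if (start < content.length) { // final line without a trailing LF
            lines.add(new String(content, start, content.length - start, StandardCharsets.ISO_8859_1));
        }
        return lines;
    }

    /** Converts a 1-based editor line number to a 0-based element index. */
    static int toElementIndex(int lineNumber) {
        if (lineNumber < 1) {
            throw new IllegalArgumentException("line numbers start at 1");
        }
        return lineNumber - 1;
    }
}
```

With this sketch, the line an editor shows as line 1 is fetched with `splitLines(content).get(toElementIndex(1))`.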

  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    private static final AtomicInteger
    BUFFER_SIZE
    Number of bytes to check for heuristics in isBinary(byte[]).
    protected final byte[]
    content
    The file content for this sequence.
    static final RawText
    EMPTY_TEXT
    A RawText of length 0
    private static final int
    FIRST_FEW_BYTES
    Default and minimum for BUFFER_SIZE.
    protected final IntList
    lines
    Map of line number to starting position within content.
  • Constructor Summary

    Constructors
    Constructor
    Description
    RawText(byte[] input)
    Create a new sequence from an existing content byte array.
    RawText(byte[] input, IntList lineMap)
    Create a new sequence from the existing content byte array and the line map indicating line boundaries.
    RawText(File file)
    Create a new sequence from a file.
  • Method Summary

    Modifier and Type
    Method
    Description
    protected String
    decode(int start, int end)
    Decode a region of the text into a String.
    static int
    getBufferSize()
    Obtains the buffer size to use for analyzing whether certain content is text or binary, or what line endings are used if it's text.
    private int
    getEnd(int i)
    String
    getLineDelimiter()
    Get the line delimiter for the first line.
    byte[]
    getRawContent()
    ByteBuffer
    getRawString(int i)
    Get the raw text for a single line.
    private int
    getStart(int i)
    String
    getString(int i)
    Get the text for a single line.
    String
    getString(int begin, int end, boolean dropLF)
    Get the text for a region of lines.
    static boolean
    isBinary(byte[] raw)
    Determine heuristically whether a byte array represents binary (as opposed to text) content.
    static boolean
    isBinary(byte[] raw, int length)
    Determine heuristically whether a byte array represents binary (as opposed to text) content.
    static boolean
    isBinary(byte[] raw, int length, boolean complete)
    Determine heuristically whether a byte array represents binary (as opposed to text) content.
    static boolean
    isBinary(byte curr, byte prev)
    Determines from the last two bytes read from a source if it looks like binary content.
    static boolean
    isBinary(InputStream raw)
    Determine heuristically whether the bytes contained in a stream represent binary (as opposed to text) content.
    static boolean
    isCrLfText(byte[] raw)
    Determine heuristically whether a byte array represents text content using CR-LF as line separator.
    static boolean
    isCrLfText(byte[] raw, int length)
    Determine heuristically whether a byte array represents text content using CR-LF as line separator.
    static boolean
    isCrLfText(byte[] raw, int length, boolean complete)
    Determine heuristically whether a byte array represents text content using CR-LF as line separator.
    static boolean
    isCrLfText(InputStream raw)
    Determine heuristically whether the bytes contained in a stream represent text content using CR-LF as line separator.
    boolean
    isMissingNewlineAtEnd()
    Determine if the file is missing a trailing LF ('\n').
    static RawText
    load(ObjectLoader ldr, int threshold)
    Read a blob object into RawText, or throw BinaryBlobException if the blob is binary.
    static int
    setBufferSize(int bufferSize)
    Sets the buffer size to use for analyzing whether certain content is text or binary, or what line endings are used if it's text.
    int
    size()
    Get the number of elements (lines) in this sequence.
    void
    writeLine(OutputStream out, int i)
    Write a specific line to the output stream, without its trailing LF.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • EMPTY_TEXT

      public static final RawText EMPTY_TEXT
      A RawText of length 0
    • FIRST_FEW_BYTES

      private static final int FIRST_FEW_BYTES
      Default and minimum for BUFFER_SIZE.
    • BUFFER_SIZE

      private static final AtomicInteger BUFFER_SIZE
      Number of bytes to check for heuristics in isBinary(byte[]).
    • content

      protected final byte[] content
      The file content for this sequence.
    • lines

      protected final IntList lines
      Map of line number to starting position within content.
  • Constructor Details

    • RawText

      public RawText(byte[] input)
      Create a new sequence from an existing content byte array.

      The entire array (indexes 0 through length-1) is used as the content.

      Parameters:
      input - the content array. The object retains a reference to this array, so it should be immutable.
    • RawText

      public RawText(byte[] input, IntList lineMap)
      Create a new sequence from the existing content byte array and the line map indicating line boundaries.
      Parameters:
      input - the content array. The object retains a reference to this array, so it should be immutable.
      lineMap - an array with 1-based offsets for the start of each line. The first and last entries should be Integer.MIN_VALUE and an offset one past the end of the last line, respectively.
      Since:
      5.0
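The line map layout described for `lineMap` can be sketched without JGit. This is an illustration only: it uses a plain `List<Integer>` in place of `IntList`, `LineMapSketch` and `buildLineMap` are hypothetical names, and it assumes that "1-based" means the first line's start offset sits at list index 1, directly after the `Integer.MIN_VALUE` entry.

```java
import java.util.ArrayList;
import java.util.List;

final class LineMapSketch {
    /** Builds [MIN_VALUE, start of line 0, start of line 1, ..., one past the end]. */
    static List<Integer> buildLineMap(byte[] content) {
        List<Integer> map = new ArrayList<>();
        map.add(Integer.MIN_VALUE); // first entry, as the parameter description requires
        int pos = 0;
        while (pos < content.length) {
            map.add(pos); // byte offset where this line starts
            while (pos < content.length && content[pos] != '\n') {
                pos++;
            }
            if (pos < content.length) {
                pos++; // step over the LF
            }
        }
        map.add(content.length); // last entry: one past the end of the last line
        return map;
    }
}
```

Under these assumptions, line `i` spans the bytes from `map.get(i + 1)` up to, but not including, `map.get(i + 2)`.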
    • RawText

      public RawText(File file) throws IOException
      Create a new sequence from a file.

      The entire file contents are used.

      Parameters:
      file - the text file.
      Throws:
      IOException - if an error occurs while reading the file
  • Method Details

    • getRawContent

      public byte[] getRawContent()
      Returns:
      the raw, unprocessed content read.
      Since:
      4.11
    • size

      public int size()
      Get the number of elements (lines) in this sequence.
      Specified by:
      size in class Sequence
      Returns:
      the number of lines
    • writeLine

      public void writeLine(OutputStream out, int i) throws IOException
      Write a specific line to the output stream, without its trailing LF.

      The specified line is copied as-is, with no character encoding translation performed.

      If the specified line ends with an LF ('\n'), the LF is not copied. It is up to the caller to write the LF, if desired, between output lines.

      Parameters:
      out - stream to copy the line data onto.
      i - index of the line to extract. Note this is 0-based, so line number 1 is actually index 0.
      Throws:
      IOException - the stream write operation failed.
    • isMissingNewlineAtEnd

      public boolean isMissingNewlineAtEnd()
      Determine if the file is missing a trailing LF ('\n').
      Returns:
      true if the last line does not end with an LF; false otherwise.
    • getString

      public String getString(int i)
      Get the text for a single line.
      Parameters:
      i - index of the line to extract. Note this is 0-based, so line number 1 is actually index 0.
      Returns:
      the text for the line, without a trailing LF.
    • getRawString

      public ByteBuffer getRawString(int i)
      Get the raw text for a single line.
      Parameters:
      i - index of the line to extract. Note this is 0-based, so line number 1 is actually index 0.
      Returns:
      the text for the line, without a trailing LF, as a ByteBuffer that is backed by a slice of the raw content, with the buffer's position on the start of the line and the limit at the end.
      Since:
      5.12
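A buffer with the properties described for `getRawString` (backed by the content array, position at the line start, limit at the line end, no trailing LF) can be produced with `ByteBuffer.wrap`. The sketch below is standalone and not JGit's code; `RawLineSketch` and `lineBuffer` are hypothetical names, and the caller is assumed to already know the byte offsets of the line.

```java
import java.nio.ByteBuffer;

final class RawLineSketch {
    /**
     * Wraps content[start, end) without copying it, dropping one trailing LF if present.
     * The returned buffer shares the array: position == start, limit == end of the text.
     */
    static ByteBuffer lineBuffer(byte[] content, int start, int end) {
        int limit = end;
        if (limit > start && content[limit - 1] == '\n') {
            limit--; // exclude the trailing LF from the view
        }
        return ByteBuffer.wrap(content, start, limit - start);
    }
}
```

Because the buffer is a view rather than a copy, changes to the underlying array would be visible through it, which is one reason the constructors ask for the content array to be treated as immutable.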
    • getString

      public String getString(int begin, int end, boolean dropLF)
      Get the text for a region of lines.
      Parameters:
      begin - index of the first line to extract. Note this is 0-based, so line number 1 is actually index 0.
      end - index of one past the last line to extract.
      dropLF - if true the trailing LF ('\n') of the last returned line is dropped, if present.
      Returns:
      the text for lines [begin, end).
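The half-open range `[begin, end)` and the effect of `dropLF` can be shown with a standalone sketch. It is not JGit's implementation: `RegionSketch`, `region`, and the `starts` array (start offset of each line, followed by the content length) are assumptions made for this example, and UTF-8 is used for decoding only to keep the sketch short.

```java
import java.nio.charset.StandardCharsets;

final class RegionSketch {
    /**
     * Returns the text of lines [begin, end).
     * starts[i] is the byte offset of line i; starts[lineCount] is content.length.
     */
    static String region(byte[] content, int[] starts, int begin, int end, boolean dropLF) {
        if (begin == end) {
            return ""; // empty range
        }
        int s = starts[begin];
        int e = starts[end];
        if (dropLF && e > s && content[e - 1] == '\n') {
            e--; // drop only the LF of the last returned line
        }
        return new String(content, s, e - s, StandardCharsets.UTF_8);
    }
}
```

For content `"a\nb\nc\n"` with `starts = {0, 2, 4, 6}`, the range `[0, 2)` yields `"a\nb\n"` when `dropLF` is false and `"a\nb"` when it is true; the LF between the two lines is kept in both cases.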
    • decode

      protected String decode(int start, int end)
      Decode a region of the text into a String. The default implementation of this method tries to guess the character set by considering UTF-8, the platform default, and falling back on ISO-8859-1 if neither of those can correctly decode the region given.
      Parameters:
      start - first byte of the content to decode.
      end - one past the last byte of the content to decode.
      Returns:
      the region [start, end) decoded as a String.
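The fallback order described for `decode` (UTF-8, then the platform default, then ISO-8859-1) can be approximated with strict `CharsetDecoder` instances that report errors instead of substituting replacement characters. This is a sketch of that idea using only the JDK, not JGit's own decoding code; `DecodeSketch` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class DecodeSketch {
    /** Decodes content[start, end), trying strict UTF-8 and the platform default first. */
    static String decode(byte[] content, int start, int end) {
        Charset[] candidates = { StandardCharsets.UTF_8, Charset.defaultCharset() };
        for (Charset cs : candidates) {
            try {
                return cs.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(content, start, end - start))
                        .toString();
            } catch (CharacterCodingException e) {
                // This charset cannot decode the region cleanly; try the next one.
            }
        }
        // ISO-8859-1 maps every byte to a character, so this step cannot fail.
        return new String(content, start, end - start, StandardCharsets.ISO_8859_1);
    }
}
```

Since `decode` is protected, a subclass that knows the real encoding of its content could override it and skip the guessing entirely.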
    • getStart

      private int getStart(int i)
    • getEnd

      private int getEnd(int i)
    • getBufferSize

      public static int getBufferSize()
      Obtains the buffer size to use for analyzing whether certain content is text or binary, or what line endings are used if it's text.
      Returns:
      the buffer size, by default FIRST_FEW_BYTES bytes
      Since:
      6.0
    • setBufferSize

      public static int setBufferSize(int bufferSize)
      Sets the buffer size to use for analyzing whether certain content is text or binary, or what line endings are used if it's text. If the given bufferSize is smaller than FIRST_FEW_BYTES set the buffer size to FIRST_FEW_BYTES.
      Parameters:
      bufferSize - Size to set
      Returns:
      the size actually set
      Since:
      6.0
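The clamping behavior described for `setBufferSize` can be sketched as follows. This is a standalone illustration, not JGit's source: `BufferSizeSketch` is a hypothetical class, and the value given to `FIRST_FEW_BYTES` here is an assumption, since this page does not state the constant's value.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class BufferSizeSketch {
    /** Assumed value for illustration only; the real constant is not shown on this page. */
    static final int FIRST_FEW_BYTES = 8 * 1024;

    private static final AtomicInteger BUFFER_SIZE = new AtomicInteger(FIRST_FEW_BYTES);

    static int getBufferSize() {
        return BUFFER_SIZE.get();
    }

    /** Stores the requested size, raised to FIRST_FEW_BYTES if smaller, and returns it. */
    static int setBufferSize(int bufferSize) {
        int newSize = Math.max(FIRST_FEW_BYTES, bufferSize);
        BUFFER_SIZE.set(newSize);
        return newSize;
    }
}
```

Returning the size actually set lets a caller detect that a too-small request was raised to the minimum.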
    • isBinary

      public static boolean isBinary(InputStream raw) throws IOException
      Determine heuristically whether the bytes contained in a stream represent binary (as opposed to text) content. Note: Do not use this stream again after calling this method! The stream may not be fully read and will be left at an unknown position after consuming an unknown number of bytes. The caller is responsible for closing the stream.
      Parameters:
      raw - input stream containing the raw file content.
      Returns:
      true if raw is likely to be a binary file, false otherwise
      Throws:
      IOException - if input stream could not be read
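The stream contract above (the probe consumes an unknown number of bytes, and the caller closes the stream) suggests opening a dedicated stream for the check and discarding it afterwards. The sketch below shows that calling pattern with JDK classes only; `StreamProbeSketch`, `containsNul`, and `probe` are hypothetical, and the NUL check is a simplified stand-in for the real heuristic.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class StreamProbeSketch {
    /** Reads at most limit bytes and reports whether any of them is NUL. */
    static boolean containsNul(InputStream in, int limit) throws IOException {
        byte[] buf = new byte[limit];
        int n = in.readNBytes(buf, 0, limit); // may stop early at end of stream
        for (int i = 0; i < n; i++) {
            if (buf[i] == 0) {
                return true;
            }
        }
        return false;
    }

    /** Opens a fresh stream for the probe, closes it, and never reuses it. */
    static boolean probe(byte[] data, int limit) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            return containsNul(in, limit);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Code that needs the content after the check would open a second stream rather than continue reading from the probed one.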
    • isBinary

      public static boolean isBinary(byte[] raw)
      Determine heuristically whether a byte array represents binary (as opposed to text) content.
      Parameters:
      raw - the raw file content.
      Returns:
      true if raw is likely to be a binary file, false otherwise
    • isBinary

      public static boolean isBinary(byte[] raw, int length)
      Determine heuristically whether a byte array represents binary (as opposed to text) content.
      Parameters:
      raw - the raw file content.
      length - number of bytes in raw to evaluate. This should be raw.length unless raw was over-allocated by the caller.
      Returns:
      true if raw is likely to be a binary file, false otherwise
    • isBinary

      public static boolean isBinary(byte[] raw, int length, boolean complete)
      Determine heuristically whether a byte array represents binary (as opposed to text) content.
      Parameters:
      raw - the raw file content.
      length - number of bytes in raw to evaluate. This should be raw.length unless raw was over-allocated by the caller.
      complete - whether raw contains the whole data
      Returns:
      true if raw is likely to be a binary file, false otherwise
      Since:
      6.0
    • isBinary

      public static boolean isBinary(byte curr, byte prev)
      Determines from the last two bytes read from a source if it looks like binary content.
      Parameters:
      curr - the last byte, read after prev
      prev - the previous byte, read before curr
      Returns:
      true if either byte is NUL, or if prev is CR and curr is not LF, false otherwise
      Since:
      6.0
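The rule stated in the Returns clause translates directly into a boolean expression. The sketch below restates it and adds a hypothetical loop that applies it to consecutive bytes; `BinaryPairSketch`, `looksBinary`, and `anyBinaryPair` are names made up for this example, and the neutral starting value for `prev` is an assumption.

```java
final class BinaryPairSketch {
    /** True if either byte is NUL, or if prev is CR and curr is not LF. */
    static boolean looksBinary(byte curr, byte prev) {
        return curr == 0 || prev == 0 || (prev == '\r' && curr != '\n');
    }

    /** Applies the pair rule to the first length bytes of raw. */
    static boolean anyBinaryPair(byte[] raw, int length) {
        byte prev = 'x'; // neutral placeholder before the first byte (assumption)
        for (int i = 0; i < length; i++) {
            byte curr = raw[i];
            if (looksBinary(curr, prev)) {
                return true;
            }
            prev = curr;
        }
        return false;
    }
}
```

Under this rule a CR followed by LF counts as text, while a CR followed by any other byte counts as binary.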
    • isCrLfText

      public static boolean isCrLfText(byte[] raw)
      Determine heuristically whether a byte array represents text content using CR-LF as line separator.
      Parameters:
      raw - the raw file content.
      Returns:
      true if raw is likely to be CR-LF delimited text, false otherwise
      Since:
      5.3
    • isCrLfText

      public static boolean isCrLfText(InputStream raw) throws IOException
      Determine heuristically whether the bytes contained in a stream represent text content using CR-LF as line separator. Note: Do not use this stream again after calling this method! The stream may not be fully read and will be left at an unknown position after consuming an unknown number of bytes. The caller is responsible for closing the stream.
      Parameters:
      raw - input stream containing the raw file content.
      Returns:
      true if raw is likely to be CR-LF delimited text, false otherwise
      Throws:
      IOException - if input stream could not be read
      Since:
      5.3
    • isCrLfText

      public static boolean isCrLfText(byte[] raw, int length)
      Determine heuristically whether a byte array represents text content using CR-LF as line separator.
      Parameters:
      raw - the raw file content.
      length - number of bytes in raw to evaluate.
      Returns:
      true if raw is likely to be CR-LF delimited text, false otherwise
      Since:
      5.3
    • isCrLfText

      public static boolean isCrLfText(byte[] raw, int length, boolean complete)
      Determine heuristically whether a byte array represents text content using CR-LF as line separator.
      Parameters:
      raw - the raw file content.
      length - number of bytes in raw to evaluate.
      complete - whether raw contains the whole data
      Returns:
      true if raw is likely to be CR-LF delimited text, false otherwise
      Since:
      6.0
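This page does not spell out the exact criteria used by `isCrLfText`. One plausible reading, sketched below purely as an assumption, is that content counts as CR-LF text when it shows no binary indicators (NUL bytes or a CR not followed by LF) and contains at least one CR-LF pair. `CrLfSketch` and `looksLikeCrLfText` are hypothetical names, and the `complete` flag is not modeled.

```java
final class CrLfSketch {
    /** Assumed heuristic: no NUL, no lone CR, and at least one CR-LF pair. */
    static boolean looksLikeCrLfText(byte[] raw, int length) {
        boolean sawCrLf = false;
        for (int i = 0; i < length; i++) {
            byte b = raw[i];
            if (b == 0) {
                return false; // NUL byte: treat as binary
            }
            if (b == '\r') {
                if (i + 1 < length && raw[i + 1] == '\n') {
                    sawCrLf = true;
                    i++; // skip the LF that completes the pair
                } else {
                    return false; // CR without LF: treat as binary
                }
            }
        }
        return sawCrLf;
    }
}
```

With this reading, LF-only text returns false because it never contains a CR-LF pair, even though it is text.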
    • getLineDelimiter

      public String getLineDelimiter()
      Get the line delimiter for the first line.
      Returns:
      the line delimiter or null
      Since:
      2.0
    • load

      public static RawText load(ObjectLoader ldr, int threshold) throws IOException, BinaryBlobException
      Read a blob object into RawText, or throw BinaryBlobException if the blob is binary.
      Parameters:
      ldr - the ObjectLoader for the blob
      threshold - if the blob is larger than this size, it is always assumed to be binary.
      Returns:
      the RawText representing the blob.
      Throws:
      BinaryBlobException - if the blob contains binary data.
      IOException - if the input could not be read.
      Since:
      4.10