User Tools

Site Tools


cross_platform:encoding:encoding

  

Makefile

QR code

The use of QR codes is free of any license. The QR code is clearly defined and published as an ISO standard.

Text Wrangling

  • Here I'm talking about raw text, no formatting, except for the pesky EOL.
  • LanguageTool - Open Source proofreading software

Carriage return

In ASCII and Unicode, the carriage return is defined as 13 (or hexadecimal 0D); it may also be seen as control+M or ^M.

Tab key

In C and many other programming languages the escape code \t can be used to put this character into a string constant.

ASCII

American Standard Code for Information Interchange was the encoding of the WWW until 2008.

ASCII developed from telegraphic codes. Its first commercial use was as a seven-bit teleprinter code promoted by Bell data services.

And it's a very very limited tool for a World Wide Web.

gVim

Install gVim

  • echo $VIMRUNTIME shows where it's installed.
  • If you're in a gVim, you can see what's available to you (including GUI functionality), with :version.

MSWin

http://vim.sourceforge.net/download.php#pc (= http://www.vim.org/download.php#pc ) is the source for gVim, but Cream has more up-to-date (Creamless) versions.

gvim-7-4-423.exe

- on Windows 7, oddly, has these important configuration options off-ticked by default:

Desktop icon
Contextual menu
Command prompt.BAT files 
Create plugin directories in %HOMEDRIVE%%HOMEPATH%

- I tick everything on to get full functionality.

Those BAT files are: C:\Windows\gview.bat, gvim.bat, gvimdiff.bat.

They get around the need to do this:

(Win-Brk for) System > Advanced system settings (System Properties > Advanced) > Environment Variables > System Variables > Path > Edit

then append gvim.exe's location to the existing path - to be able to run gvim from console:

;C:\Program Files\vim\vim74

- which you'd have to do anyway if you also want access to vim without GUI (eg as called by Vifm).


vim Easy is offered as a Desktop shortcut (= gvim -y). It runs in insert mode (Esc doesn't take you back to normal mode), but you might as well use Notepad++.

Deciding

my case

Well it took me quite some time to select vim, and then about 80 hours to get up to a level of competency that feels useful, so I understand why people - including me in the past - are put off.

I do a little coding, and I tend to take a lot of notes as I'm learning, and I've just written another story (for my theatre work) in LibreOffice, and I keep my private thoughts and address books in odt format too. But recently I got fed up with the slow load times of LibreOffice, and concerned that I was generating too many notes, and finding things hard to locate once I'd noted them. Most of all, I couldn't get LibreOffice to compress, expand, section, and move around my long prose in the fast and easy way that I wanted.

So I decided to look for something more flexible, but that would allow me to work on Windows 7, GNU-Linux, and Android! It was an article by Cory Doctorow that convinced me that I should downgrade my format to plain text. Then I read about how fast vim is once you get used to it, and the scripts that can make it quickly do clever stuff like make a simple wiki, and I was convinced. Then I went down the rabbit hole, as vimmers sometimes say, discovering the vast power of this text-wrangler.

I still have some doubts about how working in monospaced text is going to be - we'll see. But I've no doubt that the time I've spent learning vim will pay off - it really is a cleverly powerful tool that makes working in plain text files fun


Regular Expressions

further Guidance

Regexper zenly simple, to just type in this, for example, which looks for hr followed by any number of t's:

/hrt*/


ZyTrax's User Guide

RegExLib.com- a library!

Learn Regex The Hard Way

from Wikipedia

Glob

http://en.wikipedia.org/wiki/Glob_%28programming%29

is an abbreviation for “global command”
glob(), used by programs such as the shell.
gray|grey gr(a|e)y Boolean ORs to match “gray” or “grey”.
colou?r matches both “color” and “colour”
ab*c matches “ac”, “abc”, “abbc”, “abbbc”, and so on
ab+c matches “abc”, “abbc”, “abbbc”, and so on, but not “ac”.
the three strings “Handel”, “Händel”, and “Haendel” can be specified by the pattern H(ä|ae?)ndel

Typography

characters in typesetting

Soft hyphen (U+00AD)

may also be called a discretionary hyphen or optional hyphen. It serves as an invisible marker used to specify a place in text where a hyphenated break is allowed without forcing a line break in an inconvenient place if the text is re-flowed. It becomes visible only after word wrapping at the end of a line.

Fonts

CJK fonts

computer fonts which contain a large range of Chinese/Japanese/Korean characters

Monospaced font

TeX

Device independent file format

dvipdfmx is an extended version of the dvipdfm DVI-to-PDF translator, included current TeX distributions like TeX Live 2014 and MiKTeX 2.9. The primary goal of the dvipdfmx project is to support multi-byte character encodings and CJK character sets for East Asian languages. dvipdfmx is also included (in a somewhat modified form) in XeTeX.

TeX

The name TeX is intended by its developer to be with the final consonant of loch or Bach. The letters of the name are meant to represent the capital Greek letters tau, epsilon, and chi, as TeX is an abbreviation of τέχνη (ΤΕΧΝΗ – technē), Greek for both “art” and “craft”, which is also the root word of technical. English speakers often pronounce it like the first syllable of technical.

XeTeX

http://en.wikipedia.org/wiki/XeTeX

xdvipdfmx, a modified version of dvipdfmx, which uses FreeType. This driver works on all platforms.
works well with both LaTeX and ConTeXt macro packages. Its LaTeX counterpart is invoked as xelatex. It is usually used with the fontspec package, which provides a configurable interface for font selection, and allows complex font choices to be named and later reused.

Internationalization

babel “Multilingual support for Plain TEX or LATEX”

http://en.wikibooks.org/wiki/LaTeX/Internationalization

The babel package … will take care of everything (with XeTeX and LuaTeX you should consider polyglossia). \usepackage[language]{babel} - place soon after the \documentclass command, so that all the other packages you load afterwards will know the language you are using. Babel will automatically activate the appropriate hyphenation rules for the language you choose.

TeX Live

TeX Directory Structure

http://en.wikipedia.org/wiki/TeX_Live

is now the default TeX distribution for several Linux distributions such as Fedora, Debian, Ubuntu and Gentoo.

Texdoc helps you find and view documentation in TeX Live.

Texdoc is part of the TeX Live distribution. (A texdoc command is available in MikTeX, but it is actually a shortcut for another program: mthelp, developed independently.)

TeXworks

requires a TeX-Installation: TeX Live, MiKTeX, or MacTeX

Get it, install it, and run it on a tex file - Ctrl-T. It's lightweight and very fast. You'll have to manually run it twice if you're using something like lastpage package. My Preferences are:

Preview > Fit to window | Typesetting > Processing tools > Default > XeLaTeX

Coding

Namespace

In general, a namespace is a container for a set of identifiers (names), and allows the disambiguation of homonym identifiers residing in different namespaces.

printf format string

The format string is written in a simple template language, and specifies a method for rendering an arbitrary number of varied data type parameters into a string. This string is then by default printed on the standard output stream

Procedural programming

a programming paradigm, derived from structured programming, based upon the concept of the procedure call. Procedures, also known as routines, subroutines, methods, or functions (not to be confused with mathematical functions, but similar to those used in functional programming), simply contain a series of computational steps to be carried out. Any given procedure might be called at any point during a program's execution, including by other procedures or itself. Procedural programming languages include C, Go, Fortran, Pascal, and BASIC.

Tuple:

In mathematics, computer science, linguistics, and philosophy a tuple is an ordered list of elements… The term originated as an abstraction of the sequence: single, double, triple, quadruple, quintuple, sextuple, septuple, octuple, …, n-tuple, …

Trimming (computer programming)

In computer programming, trimming (trim) or stripping (strip) is a string manipulation in which leading and trailing whitespace is removed from a string.

Coding Craft

Code reading becomes much easier with syntax highlighting and Code folding. Comments are of course helpful.

MWE (Minimal Working Example)

Object-oriented programming

Method (or a member function)

Methods define the behaviour of objects at program run time. Methods can be defined both inside and outside a class.

Languages

Fortran

Since Fortran has been in use for more than fifty years, there is a vast body of Fortran in daily use throughout the scientific and engineering communities.

C

Java

Criticism of Java

Java (programming language)

a general-purpose computer programming language that is concurrent, class-based, object-oriented, and specifically designed to have as few implementation dependencies as possible. It is intended to let application developers “write once, run anywhere” (WORA), meaning that compiled Java code can run on all platforms that support Java without the need for recompilation. Java applications are typically compiled to bytecode that can run on any Java virtual machine (JVM) regardless of computer architecture. As of 2015, Java is one of the most popular programming languages in use, particularly for client-server web applications, with a reported 9 million developers.

What Is a Class?

Java Class Library

https://en.wikipedia.org/wiki/Java_Class_Library

The Java Class Library (JCL) is a set of dynamically loadable libraries that Java applications can call at run time. Because the Java Platform is not dependent on a specific operating system, applications cannot rely on any of the platform-native libraries. Instead, the Java Platform provides a comprehensive set of standard class libraries, containing the functions common to modern operating systems.
Almost all of JCL is stored in a single Java archive file called “rt.jar”, which is provided with JRE and JDK distributions. The Java Class Library (rt.jar) is located in the default bootstrap classpath, and does not have to appear in the classpath declared for the application. The runtime uses the bootstrap class loader to find the JCL.

JAR (file format)

In software, JAR (Java Archive) is a package file format typically used to aggregate many Java class files and associated metadata and resources (text, images, etc.) into one file to distribute application software or libraries on the Java platform.
JAR files are fundamentally archive files, built on the ZIP file format and have the .jar file extension. Computer users can create or extract JAR files using the jar command that comes with a JDK. They can also use zip tools to do so; however, the order of entries in the zip file headers is important when compressing, as the manifest often needs to be first. Inside a JAR, file names are unicode text.
An executable Java program can be packaged in a JAR file, along with any libraries the program uses. Executable JAR files have the manifest specifying the entry point class with Main-Class: myPrograms.MyClass and an explicit Class-Path (and the -cp argument is ignored). Some operating systems can run these directly when clicked. The typical invocation is “java -jar foo.jar” from a command line.

R

The Comprehensive R Archive Network

Google's R Style Guide

R (programming language)

Like other similar languages such as APL and MATLAB, R supports matrix arithmetic. R's data structures include vectors, matrices, arrays, data frames (similar to tables in a relational database) and lists. R's extensible object system includes objects for (among others): regression models, time-series and geo-spatial coordinates. The scalar data type was never a data structure of R. A scalar is represented as a vector with length one in R.

Setting the line length in R - options()$width tells me what it is, and options(width=120) sets it.

Terminate an R Session: q(save=“no”)

R Programming/Settings

https://en.wikibooks.org/wiki/R_Programming/Settings

Once you have installed R, you may want to choose a working environment. This can be a simple text editor (such as Emacs, Vim or Gedit), an integrated development interface (IDE) or graphical user interface (GUI). RStudio is now a popular option.
For Linux and Mac OS users it is possible to use R from the terminal.
$ R
> q("no") # to leave R and return to the terminal


R can be customized using the Rprofile file. On Linux, this file is stored in the home directory. You can edit it by running the following command in a terminal : $ gedit ~/.Rprofile

- which ain't in my N130 openSUSE (I've got ~/.Rhistory).

Markup

  • When HTML & CSS are done wrong, you get tag soup.
  • WYSIWYM an acronym for “what you see is what you mean”.

Cascading Style Sheets

CSS does it all! Checkout Mozilla's guide.

XML

Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.

XHTML Extensible HyperText Markup Language

a family of XML markup languages that mirror or extend versions of the widely used Hypertext Markup Language (HTML), the language in which web pages are written.

Windows 7 opens a *.xml in Internet Explorer by default, but Vim has good syntax highlighting too.

HyperText Markup Language

Character encodings & Special characters

http://en.wikipedia.org/wiki/HTML

HTML Color Shades

HTML Codes Table

HTML Comments

If you've got some HTML to try, here's a good place: http://htmledit.squarefree.com

I found HTML weird until I realised what markup means, and that any word processor is doing a kind of hidden markup. Now I've taken to writing in plain text with gVim, if I want a fancier output, like these DokuWiki pages, writing the markup is easy. HTML is just a standard markup that breaks web pages down into elements, which are defined by tags. It's really not hard, but appears bewildering at first because web pages are rarely simple static documents, but usually heavily styled with CSS, and full of complicated dynamic content that changes in response to the viewer.

HTML5

HTML Elements

at Wikipedia

http://en.wikipedia.org/wiki/HTML_element eg <br line break

an end tag, in which the element name is prefixed with a slash: </tag>
Element (and attribute) names may be written in any combination of upper or lower case in HTML, but must be in lower case in XHTML.
An HTML document can also be extended through the use of scripts to provide additional behaviors beyond the abilities of HTML hyperlinks and forms.
The elements style and script, with related attributes, provide reference points in HTML markup for links to stylesheets and scripts. They can also contain instructions directly.

Pandoc

Templates:

Batch conversion in Windows Console

I adapted Endoro's answer for recursive conversion of all `*.md` files under the current directory to `*.pdf`:

for /r . %i in (*.md) do pandoc -f markdown_strict %~fi -o %~dpni.pdf

Adding in my personal preferences:

for /r . %i in (*.md) do pandoc -V mainfont="Arial" --toc --toc-depth=4 -f markdown_strict --latex-engine=xelatex %~fi -o %~dpni.pdf

markdown to pdf

For converting my SM files, I need to specify markdown_strict:

pandoc -f markdown_strict WorkLog_140308-.txt -o w.pdf --latex-engine=xelatex

- I'm also using the XeLaTeX conversion engine for better font support.

LaTeX geometry package is recommended in the FAQs for setting margins - I find any less than 1.2cm and the page number gets cropped by the bottom edge of the page. The default paper turns out to be US Letter, so I change that too, and increase the LaTeX fontsize (from default 10):

pandoc -V papersize:"a4paper" -V geometry:margin=1.2cm -V fontsize=12pt -f markdown_strict Minecraft.txt -o Minecraft.pdf --latex-engine=xelatex

This gets landscape:

pandoc -Vpapersize:a4paper,landscape -Vgeometry:margin=1in -f markdown_strict md.txt -o md.pdf --latex-engine=xelatex

documentclass & 4th level header

A search of the *.hs files in https://github.com/jgm/pandoc/ shows it to refer to documentclass only in Readers\LaTeX.hs & Writers\LaTeX.hs. In the latter, it appears to default to article.

In any case, as Ivan Zakharyaschev demonstrated, the 4th level header, eg #### fourth becomes \paragraph{fourth}. Unfortunately, this is generally distinguished in LaTeX from the third level only by the fact that the contents begin on the same line as the header. This MWE, if you compile it, demonstrates:

\documentclass{article}
\setcounter{secnumdepth}{0}
\begin{document}
\section{A Section}
this text is nicely separated from the header
\subsection{A Subsection}
this text is nicely separated from the header
\subsubsection{A Subsubsection}
this text is nicely separated from the header
\paragraph{A Paragraph}
this first line of text runs on from the header
\subparagraph{A Subparagraph}
this first line of text runs on from the header
\end{document}

- changing the documentclass to report, memoir, scrartcl doesn't improve the formatting of the section headers. This is unfortunate for me because I sometimes want to follow a 4th level header with some quoted verse, and I can't find any discrete and robust (ie works in conversion to other formats) way to tell markdown to begin the first line of text on a line of it's own, and not a continuation of the 4th level header's line. For example, this, and other tricks, don't work universally:

#### 4th level header
>  
first line of text now following a false comment of two blank spaces

I considered adding header formatting into pandoc\templates\default.latex, but then found what seems to be a simpler solution, by elevating the headers with –chapters. The first four header levels (the only ones I find consistently well formatted in other markdown conversion engines) become Chapter, Section, Subsection, Subsubsection, with much clearer formatting distinctions between them, but Chapter comes with some now unwanted formatting, which I deal with in the header.

To automate my conversion preferences, I wrote a Windows Batch file and a Powershell script.

table-of-contents

is created with –toc –toc-depth=<depth>, where <depth> is from 1 to 6, like this:

pandoc -V mainfont="Arial" --toc --toc-depth=6 -f markdown_strict Verifiable.md -o Verifiable.pdf --latex-engine=xelatex

The hyperlinking works nice and unobtrusively, but toc only descends to depth 5 (my issue), a LaTeX limitation.

pdf bookmarks are on by default to a depth of 3 (or as set by –toc-depth), and you'd have to adjust Pandoc's default.latex template to turn them off.

cross_platform/encoding/encoding.txt · Last modified: 2016/09/27 20:49 (external edit)