# /etc/wgetrc is the default location of the global startup file; .wgetrc is the user startup file.
# How to Download a Website Using wget
wget -r www.dlsite.com
#This downloads the pages recursively up to a maximum of 5 levels deep.
#Five levels deep might not be enough to get everything from the site. You can use the -l switch to set the number of levels you wish to go to as follows:
wget -r -l10 www.dlsite.com
#If you want infinite recursion you can use the following:
wget -r -l inf www.dlsite.com
# How to Download Certain File Types
wget -A "*.mp3" -r www.dlsite.com
#The reverse of this is to ignore certain files. Perhaps you don't want to download executables. In this case, you would use the following syntax:
wget -R "*.exe" -r www.dlsite.com
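#The -A/-R lists follow a simple rule (spelled out under --accept/--reject below): an element containing any of the wildcard characters *, ?, [ or ] is matched as a glob pattern against the file name; anything else is treated as a plain suffix. A rough sketch of that rule in portable shell (matches_element is illustrative, not wget code):

```shell
# Sketch of wget's documented accept/reject rule; matches_element
# is an illustrative helper, not part of wget itself.
matches_element() {
  file=$1 elem=$2
  case $elem in
    *[]*?[]*)   # element contains a wildcard: glob match
      case $file in
        $elem) return 0 ;;
        *)     return 1 ;;
      esac ;;
    *)          # no wildcard: plain suffix test
      case $file in
        *"$elem") return 0 ;;
        *)        return 1 ;;
      esac ;;
  esac
}

matches_element song.mp3 '*.mp3' && echo "pattern match"
matches_element song.mp3 .mp3    && echo "suffix match"
matches_element setup.exe .mp3   || echo "no match"
```

#So -A "*.mp3" and -A .mp3 accept the same files; the quoted pattern form just makes the glob explicit.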
#Other Parameters
-b, --background Go to background immediately after startup. If no output file is specified via -o, output is redirected to wget-log.
-o logfile, --output-file=logfile Log all messages to logfile. The messages are normally reported to standard error.
-a logfile, --append-output=logfile Append to logfile. This option is the same as -o, only it appends to logfile instead of overwriting the old log file. If logfile does not exist, a new file is created.
-q, --quiet Turn off wget's output.
-v, --verbose Turn on verbose output, with all the available data. The default output is verbose.
-nv, --no-verbose Non-verbose output. Turn off verbose without being completely quiet (use -q for that), which means that error messages and basic information still get printed.
-i file, --input-file=file Read URLs from a local or external file. If "-" is specified as file, URLs are read from the standard input. (Use "./-" to read from a file literally named "-".)
-F, --force-html When input is read from a file, force it to be treated as an HTML file. This enables you to retrieve relative links from existing HTML files on your local disk, by adding <base href="url"> to HTML, or using the --base command-line option.
-t number, --tries=number Set number of retries to number. Specify 0 or inf for infinite retrying. The default is to retry 20 times, with the exception of fatal errors like "connection refused" or "not found" (404), which are not retried.
-O file, --output-document=file The documents will not be written to the appropriate files, but all will be concatenated together and written to file.
-c, --continue Continue getting a partially-downloaded file. This option is useful when you want to finish up a download started by a previous instance of wget, or by another program. For instance: wget -c ftp://dlsite/filename
--progress=type Select the progress indicator you want to use. Legal indicators are "dot" and "bar".
-N, --timestamping Turn on time stamping. Output file will have timestamp matching remote copy; if file already exists locally, and remote file is not newer, no download will occur.
--no-use-server-timestamps Don't set the local file's timestamp by the one on the server.
-S, --server-response Print the headers sent by HTTP servers and responses sent by FTP servers.
--spider When invoked with this option, wget will behave as a web spider, which means that it will not download the pages, just check that they are there. For example, you can use wget to check your bookmarks: wget --spider --force-html -i bookmarks.html
-T seconds, --timeout=seconds Set the network timeout to seconds seconds. This option is equivalent to specifying --dns-timeout, --connect-timeout, and --read-timeout, all at the same time.
--dns-timeout=seconds Set the DNS lookup timeout to seconds seconds. DNS lookups that don't complete within the specified time will fail. By default, there is no timeout on DNS lookups, other than that implemented by system libraries.
--connect-timeout=seconds Set the connect timeout to seconds seconds. TCP connections that take longer to establish will be aborted. By default, there is no connect timeout, other than that implemented by system libraries.
--read-timeout=seconds Set the read (and write) timeout to seconds seconds. Reads that take longer will fail. The default value for read timeout is 900 seconds.
--limit-rate=amount Limit the download speed to amount bytes per second. The amount may be expressed in bytes, kilobytes (with the k suffix), or megabytes (with the m suffix). For example, --limit-rate=20k will limit the retrieval rate to 20 KB/s. This option is useful when, for whatever reason, you don't want wget to consume the entire available bandwidth.
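#The k and m suffixes here are binary multiples (1k = 1024 bytes), assuming the conventional kilobyte wget's documentation describes. A tiny helper sketch of the conversion (to_bytes is illustrative, not part of wget):

```shell
# Convert a --limit-rate style amount to bytes per second,
# assuming binary multiples (1k = 1024). Illustrative only.
to_bytes() {
  case $1 in
    *k) echo $(( ${1%k} * 1024 )) ;;
    *m) echo $(( ${1%m} * 1024 * 1024 )) ;;
    *)  echo "$1" ;;
  esac
}
to_bytes 20k   # 20480, the bytes/s cap set by --limit-rate=20k
```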
-w seconds, --wait=seconds Wait the specified number of seconds between the retrievals. Use of this option is recommended, as it lightens the server load by making the requests less frequent. Instead of in seconds, the time can be specified in minutes using the m suffix, in hours using h suffix, or in days using d suffix.
--waitretry=seconds If you don't want wget to wait between every retrieval, but only between retries of failed downloads, you can use this option. wget will use linear backoff, waiting 1 second after the first failure on a given file, then waiting 2 seconds after the second failure on that file, up to the maximum number of seconds you specify. Therefore, a value of 10 will actually make wget wait up to (1 + 2 + ... + 10) = 55 seconds per file. By default, wget will assume a value of 10 seconds.
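#The arithmetic above is just a linear backoff sum. A small sketch of the worst-case wait per file (total_wait is an illustrative helper, not wget code):

```shell
# Cumulative wait implied by --waitretry=N: wget backs off
# linearly, so the worst case per file is 1 + 2 + ... + N seconds.
total_wait() {
  n=$1 sum=0 i=1
  while [ "$i" -le "$n" ]; do
    sum=$(( sum + i ))
    i=$(( i + 1 ))
  done
  echo "$sum"
}
total_wait 10   # 55, matching the figure quoted above for the default of 10
```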
--random-wait Some websites may perform log analysis to identify retrieval programs such as wget by looking for statistically significant similarities in the time between requests. This option causes the time between requests to vary between 0 and 2*wait seconds, where wait was specified using the --wait option, to mask wget's presence from such analysis.
--no-dns-cache Turn off caching of DNS lookups. Normally, wget remembers the addresses it looked up from DNS so it doesn't have to repeatedly contact the DNS server for the same (typically small) set of addresses it retrieves. This cache exists in memory only; a new wget run will contact DNS again.
--retry-connrefused Consider "connection refused" a transient error and try again. Normally wget gives up on a URL when it is unable to connect to the site because failure to connect is taken as a sign that the server is not running at all and that retries would not help. This option is for mirroring unreliable sites whose servers tend to disappear for short periods of time.
--user=user, --password=password Specify the username user and password for both FTP and HTTP file retrieval. These parameters can be overridden using the --ftp-user and --ftp-password options for FTP connections and the --http-user and --http-password options for HTTP connections.
--ask-password Prompt for a password for each connection established. Cannot be specified when --password is being used, because they are mutually exclusive.
--unlink Force wget to unlink the file instead of clobbering the existing file. This option is useful for downloading to a directory with hardlinks.
-nd, --no-directories Do not create a hierarchy of directories when retrieving recursively. With this option turned on, all files will get saved to the current directory, without clobbering (if a name shows up more than once, the file names will get extensions .n).
-x, --force-directories The opposite of -nd; create a hierarchy of directories, even if one would not have been created otherwise. For example, wget -x http://fly.srk.fer.hr/robots.txt will save the downloaded file to fly.srk.fer.hr/robots.txt.
-nH, --no-host-directories Disable generation of host-prefixed directories. By default, invoking wget with -r http://dlsite/ will create a structure of directories beginning with dlsite/. This option disables such behaviour.
--protocol-directories Use the protocol name as a directory component of local file names. For example, with this option, wget -r http://host will save to http/host/... rather than just to host/....
--cut-dirs=number Ignore number directory components. This option is useful for getting a fine-grained control over the directory where recursive retrieval will be saved.
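#For example, combined with -nH, a file retrieved from http://host/pub/sub/dir/file would be saved as dir/file under --cut-dirs=2. A sketch of that trimming in portable shell (local_path is illustrative, not wget code; --cut-dirs only removes directory components, never the file name itself):

```shell
# Sketch of the path trimming done by -nH --cut-dirs=N:
# -nH drops the host component, then N leading directory
# components are cut. local_path is illustrative only.
local_path() {
  path=$1 n=$2
  while [ "$n" -gt 0 ] && [ "${path#*/}" != "$path" ]; do
    path=${path#*/}     # drop one leading directory component
    n=$(( n - 1 ))
  done
  echo "$path"
}
local_path pub/sub/dir/file 0   # pub/sub/dir/file  (plain -nH)
local_path pub/sub/dir/file 2   # dir/file          (-nH --cut-dirs=2)
```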
--http-user=user, --http-password=password Specify the username user and password password on an HTTP server. According to the challenge, wget will encode them using either the "basic" (insecure) or the "digest" authentication scheme.
--ignore-length Unfortunately, some HTTP servers (CGI programs, to be more precise) send out bogus "Content-Length" headers, which makes wget start to bray like a stuck pig, as it thinks not all the document was retrieved. You can spot this syndrome if wget retries getting the same document again and again, each time claiming that the (otherwise normal) connection has closed on the very same byte. With this option, wget ignores the "Content-Length" header, as if it never existed.
--private-key=file Read the private key from file. This option allows you to provide the private key in a file separate from the certificate.
--private-key-type=type Specify the type of the private key. Accepted values are PEM (the default) and DER.
-r, --recursive Turn on recursive retrieving.
-l depth, --level=depth Specify recursion maximum depth level depth. The default maximum depth is 5.
-K, --backup-converted When converting a file, backup the original version with an .orig suffix. Affects the behavior of -N.
-m, --mirror Turn on options suitable for mirroring. This option turns on recursion and time-stamping, sets infinite recursion depth and keeps FTP directory listings. It is currently equivalent to -r -N -l inf --no-remove-listing.
-p, --page-requisites This option causes wget to download all the files that are necessary to properly display a given HTML page, including such things as inlined images, sounds, and referenced stylesheets. Ordinarily, when downloading a single HTML page, any requisite documents that may be needed to display it properly are not downloaded. Using -r together with -l can help, but since wget does not ordinarily distinguish between external and inlined documents, one is generally left with "leaf documents" that are missing their requisites.
-A acclist, --accept acclist; -R rejlist, --reject rejlist Specify comma-separated lists of file name suffixes or patterns to accept or reject. Note that if any of the wildcard characters *, ?, [ or ] appear in an element of acclist or rejlist, it will be treated as a pattern, rather than a suffix.
-D domain-list, --domains=domain-list Set domains to be followed. domain-list is a comma-separated list of domains. Note that it does not turn on -H.
--exclude-domains domain-list Specify the domains that are not to be followed.
--follow-ftp Follow FTP links from HTML documents. Without this option, wget will ignore all the FTP links.
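#Putting several of the options above together, a polite mirroring invocation might look like the following. The URL is a placeholder; the command is only printed here, nothing is fetched:

```shell
# A combined mirror command assembled from the options described
# above (-m, -p, --wait, --random-wait, -A). Placeholder URL;
# printed rather than run, so this sketch performs no download.
cmd='wget -m -p --wait=1 --random-wait -A "*.html,*.css" https://example.com/'
echo "$cmd"
```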